<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Observability - Tag - Shengxu · Cloud Architecture &amp; DevOps</title><link>https://shengxu.pages.dev/en/tags/observability/</link><description>Cloud architecture &amp; DevOps notes by Shengxu: Kubernetes, Cilium, observability, LLM infra, AI agents.</description><generator>Hugo 0.153.2 &amp; FixIt v0.4.0-alpha.3-20251225101113-8ffb9a95</generator><language>en</language><lastBuildDate>Thu, 05 Feb 2026 16:00:00 +0800</lastBuildDate><atom:link href="https://shengxu.pages.dev/en/tags/observability/index.xml" rel="self" type="application/rss+xml"/><item><title>Hands-On · Building a Memory-Enabled AI Writing Partner (Part 4): Observability (Metrics + Logs + Trace + Cost)</title><link>https://shengxu.pages.dev/en/posts/fantasy-novel-agent-observability/</link><pubDate>Thu, 05 Feb 2026 16:00:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/fantasy-novel-agent-observability/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;In the previous post, we discussed the security of RAG systems and prompt injection protection. Today, let&amp;rsquo;s dive into another engineering deep-water zone: &lt;strong&gt;Observability&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When a system evolves from &amp;ldquo;it works&amp;rdquo; to &amp;ldquo;it works reliably long-term,&amp;rdquo; you will inevitably encounter three types of problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Slow&lt;/strong&gt;: Is retrieval slow? Is the LLM slow? Or is some Agent stuck in a retry loop?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expensive&lt;/strong&gt;: Is token consumption being silently drained by a specific chain? Why doesn&amp;rsquo;t this month&amp;rsquo;s API bill add up?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strange&lt;/strong&gt;: Intermittent bugs that can&amp;rsquo;t be reproduced, leaving you to fix code based on &amp;ldquo;gut feeling.&amp;rdquo;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At this stage, I chose to build a complete &lt;strong&gt;Metrics + Logs&lt;/strong&gt; system, rather than just sprinkling in a few &lt;code&gt;print&lt;/code&gt; statements.&lt;/p&gt;</description></item></channel></rss>