<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Observability - Category - Shengxu · Cloud Architecture &amp; DevOps</title><link>https://shengxu.pages.dev/en/categories/observability/</link><description>Cloud architecture &amp; DevOps notes by Shengxu: Kubernetes, Cilium, observability, LLM infra, AI agents.</description><generator>Hugo 0.153.2 &amp; FixIt v0.4.0-alpha.3-20251225101113-8ffb9a95</generator><language>en</language><lastBuildDate>Fri, 17 Apr 2026 19:40:00 +0800</lastBuildDate><atom:link href="https://shengxu.pages.dev/en/categories/observability/index.xml" rel="self" type="application/rss+xml"/><item><title>From Azure SRE Agent to HolmesGPT: AIOps Practices in Multi-Cloud Kubernetes Environments</title><link>https://shengxu.pages.dev/en/posts/azure-sre-agent-to-holmesgpt/</link><pubDate>Fri, 17 Apr 2026 19:40:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/azure-sre-agent-to-holmesgpt/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/kubernetes/">Kubernetes</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;In the multi-cloud Kubernetes era, the pain point for SREs is no longer just &amp;ldquo;too many alerts,&amp;rdquo; but rather investigation chains that are too long, context that is too scattered, and troubleshooting costs across clouds that are too high. What truly drains people isn&amp;rsquo;t glancing at a chart, but constantly switching between multiple cloud platforms, logging systems, deployment records, and ticketing systems.&lt;/p&gt;
&lt;p&gt;This is why AI SRE Agents are starting to deliver real value. Their goal isn&amp;rsquo;t to be a better conversational Copilot, but to proactively take over the highly repetitive first half of the work—&amp;ldquo;checking logs, finding correlations, guessing root causes, and giving suggestions&amp;rdquo;—once an alert is triggered.&lt;/p&gt;</description></item><item><title>Cilium 2026 (Continued): How the Unified Data Plane Is Reshaping Kubernetes Platform Architecture</title><link>https://shengxu.pages.dev/en/posts/cilium-2026-part-2-unified-dataplane/</link><pubDate>Sat, 21 Mar 2026 14:31:56 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/cilium-2026-part-2-unified-dataplane/</guid><category domain="https://shengxu.pages.dev/en/categories/kubernetes/">Kubernetes</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><category domain="https://shengxu.pages.dev/en/categories/security/">Security</category><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><description>&lt;p&gt;In &lt;a href="https://shengxu.pages.dev/posts/cilium-2026/"&gt;the previous article on Cilium&lt;/a&gt;, we explored the real reasons behind the 2026 migration wave: it&amp;rsquo;s no longer just &amp;ldquo;a faster CNI,&amp;rdquo; but rather a reorganization of Kubernetes networking, security, observability, and multi-cluster capabilities into a more unified infrastructure foundation, while clarifying its division of labor and boundaries with Istio.&lt;/p&gt;
&lt;p&gt;If the previous article answered &amp;ldquo;What exactly can Cilium bring us?&amp;rdquo;, this one goes further, focusing on its core evolution: the &lt;strong&gt;Unified Dataplane&lt;/strong&gt;.&lt;/p&gt;</description></item><item><title>What Cilium Can Really Bring Us in 2026</title><link>https://shengxu.pages.dev/en/posts/cilium-2026/</link><pubDate>Sun, 08 Mar 2026 10:30:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/cilium-2026/</guid><category domain="https://shengxu.pages.dev/en/categories/kubernetes/">Kubernetes</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;h2 class="heading-element" id="what-meaningful-changes-it-actually-brings-and-how-to-divide-work-with-istio"&gt;&lt;span&gt;——What Meaningful Changes It Actually Brings, and How to Divide Work with Istio&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;By 2026, many teams discussing Cilium are no longer asking &amp;ldquo;Is it worth trying?&amp;rdquo; but rather &amp;ldquo;When should we migrate?&amp;rdquo;&lt;/p&gt;</description></item><item><title>Weekend Project: Building a Local Load Balancer for LLM API Keys</title><link>https://shengxu.pages.dev/en/posts/llm-api-load-balancer/</link><pubDate>Sat, 14 Feb 2026 10:18:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/llm-api-load-balancer/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;Lately, because I&amp;rsquo;ve been using various LLM services (OpenAI, Gemini, DeepSeek, etc.) intensively, I&amp;rsquo;ve run into a very real pain point: &lt;strong&gt;being broke&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;To save money, I applied for multiple free API keys (like Google Gemini&amp;rsquo;s Free Tier or DeepSeek&amp;rsquo;s complimentary credits), but these free keys often come with strict rate limits (RPM/TPM). Just when I&amp;rsquo;m in the flow writing code, a &lt;code&gt;429 Too Many Requests&lt;/code&gt; error pops up, completely breaking my train of thought. It&amp;rsquo;s really frustrating.&lt;/p&gt;</description></item><item><title>Hands-On · Building a Memory-Enabled AI Writing Partner (Part 4): Observability (Metrics + Logs + Trace + Cost)</title><link>https://shengxu.pages.dev/en/posts/fantasy-novel-agent-observability/</link><pubDate>Thu, 05 Feb 2026 16:00:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/fantasy-novel-agent-observability/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;In the previous post, we discussed the security of RAG systems and prompt injection protection. Today, let&amp;rsquo;s tackle another hard engineering problem: &lt;strong&gt;Observability&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When a system evolves from &amp;ldquo;it works&amp;rdquo; to &amp;ldquo;it works reliably long-term,&amp;rdquo; you will inevitably encounter three types of problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Slow&lt;/strong&gt;: Is retrieval slow? Is the LLM slow? Or is some Agent stuck in a retry loop?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expensive&lt;/strong&gt;: Is one specific chain silently burning through tokens? Why doesn&amp;rsquo;t this month&amp;rsquo;s API bill add up?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strange&lt;/strong&gt;: Intermittent bugs that can&amp;rsquo;t be reproduced, leaving you to fix code based on &amp;ldquo;gut feeling.&amp;rdquo;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At this stage, I chose to build a complete &lt;strong&gt;Metrics + Logs&lt;/strong&gt; system, rather than just sprinkling in a few &lt;code&gt;print&lt;/code&gt; statements.&lt;/p&gt;</description></item><item><title>Practical · Building a Memory-Enabled AI Writing Partner (Part 3): Security Architecture (RAG Protection, Fact Guard, and BYOK)</title><link>https://shengxu.pages.dev/en/posts/fantasy-novel-agent-security/</link><pubDate>Wed, 04 Feb 2026 10:00:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/fantasy-novel-agent-security/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/security/">Security</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;In the previous 2.5 articles, I&amp;rsquo;ve already laid out the backbone of &lt;a href="https://shengxu.pages.dev/posts/fantasy-novel-agent-architecture-evolution/"&gt;FantasyNovelAgent&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://shengxu.pages.dev/posts/fantasy-novel-agent-architecture-evolution/"&gt;Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://shengxu.pages.dev/posts/fantasy-novel-agent-database-evolution/"&gt;Building a Memory-Enabled AI Writing Partner (Part 2): Database Evolution&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://shengxu.pages.dev/posts/fantasy-novel-agent-retrieval-evolution/"&gt;Building a Memory-Enabled AI Writing Partner (Part 3): Retrieval System Evolution&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This article dives deep into the most overlooked yet critical aspect of AI systems: &lt;strong&gt;Security&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re thinking, &amp;ldquo;I&amp;rsquo;m just writing a novel, what security issues could there be?&amp;rdquo;, consider this:&lt;/p&gt;</description></item><item><title>From Traffic Gatekeeping to Quality Insight: A 2026 Guide to Building Enterprise-Grade LLM Observability Systems</title><link>https://shengxu.pages.dev/en/posts/llm-observability-guide-2026/</link><pubDate>Mon, 19 Jan 2026 15:00:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/llm-observability-guide-2026/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;As large language models (LLMs) evolve from &amp;ldquo;novelty toys&amp;rdquo; into the &amp;ldquo;productivity backbone&amp;rdquo; of enterprises, a question that every technical leader keeps coming back to has surfaced: &lt;strong&gt;When API calls become a black box, how do we manage these massive, expensive AI models with the same rigor we apply to databases or microservices?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If 2024 was the year everyone was busy &amp;ldquo;getting demos to work,&amp;rdquo; then 2026 marks the dawn of &amp;ldquo;fine-grained governance.&amp;rdquo; The simple &amp;ldquo;call succeeded/failed&amp;rdquo; logs of the past can no longer answer today&amp;rsquo;s complex operational questions: &lt;em&gt;&amp;ldquo;Why was this agent so smart yesterday, but today it&amp;rsquo;s spouting nonsense?&amp;rdquo;&lt;/em&gt;, &lt;em&gt;&amp;ldquo;Why did our token costs suddenly double last month?&amp;rdquo;&lt;/em&gt;, &lt;em&gt;&amp;ldquo;Is someone trying to attack our customer service bot with a prompt injection?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;</description></item><item><title>From Improvement to Reinvention: Deconstructing the Three Philosophies and Selection Truths of Prometheus Monitoring Architecture</title><link>https://shengxu.pages.dev/en/posts/prometheus-monitoring-architecture-evolution/</link><pubDate>Sun, 04 Jan 2026 17:00:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/prometheus-monitoring-architecture-evolution/</guid><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;Looking back at the years spent navigating the observability space—especially around building metrics systems—it feels like a long architectural pilgrimage. From the early days of babysitting a standalone Prometheus and worrying about disk space, to introducing Thanos in an attempt to achieve &amp;ldquo;infinite storage,&amp;rdquo; and now rebuilding the entire monitoring hub with Mimir, these experiences are scattered in memory, with some details already starting to blur.&lt;/p&gt;
&lt;p&gt;Recently, I took some time to systematically revisit the pitfalls I&amp;rsquo;ve encountered and the technical decisions I&amp;rsquo;ve made over the years. It struck me that this isn&amp;rsquo;t just a story of technical iteration; it&amp;rsquo;s a series of philosophical choices made in response to pain points at different scales. What I once thought were &amp;ldquo;upgrades&amp;rdquo; turned out to be fundamentally different species. This post is an attempt to write those experiences down before they fade: it covers what I see as three architectural patterns and why, at a certain scale, Mimir becomes the &amp;ldquo;right&amp;rdquo; choice.&lt;/p&gt;</description></item></channel></rss>