Observability - Category - Shengxu · Cloud Architecture & DevOps

From Azure SRE Agent to HolmesGPT: AIOps Practices in Multi-Cloud Kubernetes Environments

Fri, 17 Apr 2026 19:40:00 +0800

In the multi-cloud Kubernetes era, the pain point for SREs is no longer just “too many alerts,” but rather investigation chains that are too long, context that is too scattered, and troubleshooting costs across clouds that are too high. What truly drains people isn’t glancing at a chart, but constantly switching between multiple cloud platforms, logging systems, deployment records, and ticketing systems.

This is why AI SRE Agents are starting to deliver real value. Their goal isn’t to be a better conversational Copilot, but to proactively take over the highly repetitive first half of the work once an alert fires: checking logs, finding correlations, guessing root causes, and giving suggestions.

Cilium 2026 (Continued): How the Unified Data Plane Is Reshaping Kubernetes Platform Architecture

Sat, 21 Mar 2026 14:31:56 +0800

In the previous article on Cilium, we explored the real reasons behind the 2026 migration wave: Cilium is no longer just “a faster CNI,” but a reorganization of Kubernetes networking, security, observability, and multi-cluster capabilities into a more unified infrastructure foundation. We also clarified its division of labor and boundaries with Istio.

If the previous article answered “What exactly can Cilium bring us?”, this one goes further, focusing on its core evolution: the Unified Dataplane.

What Cilium Can Really Bring Us in 2026

Sun, 08 Mar 2026 10:30:00 +0800

What Meaningful Changes It Actually Brings, and How to Divide Work with Istio

By 2026, many teams discussing Cilium are no longer asking “Is it worth trying?” but rather “When should we migrate?”

Weekend Project: Building a Local Load Balancer for LLM API Keys

Sat, 14 Feb 2026 10:18:00 +0800

Lately, because I’ve been using various LLM services (OpenAI, Gemini, DeepSeek, etc.) intensively, I’ve run into a very real pain point: being broke.

To save money, I applied for multiple free API keys (like Google Gemini’s Free Tier or DeepSeek’s complimentary credits), but these free keys often come with strict rate limits (RPM/TPM). Just when I’m in the flow writing code, a 429 Too Many Requests error pops up, completely breaking my train of thought. It’s really frustrating.
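One way to picture the idea is a small key pool that rotates through keys and temporarily benches any key that has just returned a 429. The sketch below is only an illustration under stated assumptions, not the implementation from the post: the `KeyPool` class, its method names, and the 60-second cooldown are all hypothetical, and real rate-limit windows differ by provider.

```python
import time
from dataclasses import dataclass, field


@dataclass
class KeyPool:
    """Round-robin pool that skips keys still cooling down after a 429.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    keys: list
    cooldown_s: float = 60.0  # assumed cooldown; real limits vary by provider
    _blocked_until: dict = field(default_factory=dict)
    _next: int = 0

    def acquire(self, now=None):
        """Return the next usable key, or None if every key is cooling down."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.keys)):
            key = self.keys[self._next]
            self._next = (self._next + 1) % len(self.keys)
            if self._blocked_until.get(key, 0.0) <= now:
                return key
        return None

    def report_rate_limited(self, key, now=None):
        """Bench a key for cooldown_s seconds after it hit a 429."""
        now = time.monotonic() if now is None else now
        self._blocked_until[key] = now + self.cooldown_s
```

A caller would `acquire()` a key before each request, call `report_rate_limited(key)` when the response status is 429, and retry with the next key. The optional `now` argument exists only so the behavior can be checked without waiting on a real clock.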

Hands-On · Building a Memory-Enabled AI Writing Partner (Part 4): Observability (Metrics + Logs + Trace + Cost)

Thu, 05 Feb 2026 16:00:00 +0800

In the previous post, we discussed the security of RAG systems and prompt injection protection. Today, let’s dive into another hard engineering problem: Observability.

When a system evolves from “it works” to “it works reliably long-term,” you will inevitably encounter three types of problems:

Slow: Is retrieval slow? Is the LLM slow? Or is some Agent stuck in a retry loop?
Expensive: Is token consumption being silently drained by a specific chain? Why doesn’t this month’s API bill add up?
Strange: Intermittent bugs that can’t be reproduced, leaving you to fix code based on “gut feeling.”

At this stage, I chose to build a complete Metrics + Logs system, rather than just sprinkling in a few print statements.
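As a rough illustration of how the "slow" and "expensive" questions can be answered with data rather than guesswork, here is a minimal in-process sketch. It is not the system built in the post: `StageMetrics`, the stage names, and the chain names are hypothetical, and a real setup would export these numbers to a metrics backend instead of holding them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageMetrics:
    """Hypothetical in-memory recorder for per-stage latency and token usage."""

    def __init__(self):
        self.latency_s = defaultdict(list)  # stage name -> list of durations
        self.tokens = defaultdict(int)      # chain name -> total tokens

    @contextmanager
    def timed(self, stage):
        """Record how long the wrapped block took, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.latency_s[stage].append(time.perf_counter() - start)

    def add_tokens(self, chain, count):
        """Accumulate token usage for one chain."""
        self.tokens[chain] += count

    def slowest_stage(self):
        """Return the stage with the largest total recorded time, or None."""
        totals = {stage: sum(times) for stage, times in self.latency_s.items()}
        return max(totals, key=totals.get) if totals else None
```

Wrapping retrieval and LLM calls in `with metrics.timed("retrieval"):` style blocks, and calling `add_tokens` after each model response, is enough to tell which stage dominates latency and which chain dominates spend.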

Practical · Building a Memory-Enabled AI Writing Partner (Part 3): Security Architecture (RAG Protection, Fact Guard, and BYOK)

Wed, 04 Feb 2026 10:00:00 +0800

In the previous 2.5 articles, I’ve already laid out the backbone of FantasyNovelAgent:

This article dives deep into the most overlooked yet critical aspect of AI systems: Security.

If you’re thinking, “I’m just writing a novel, what security issues could there be?”, consider this:

From Traffic Gatekeeping to Quality Insight: A 2026 Guide to Building Enterprise-Grade LLM Observability Systems

Mon, 19 Jan 2026 15:00:00 +0800

As large language models (LLMs) evolve from “novelty toys” into the “productivity backbone” of enterprises, a question that every technical leader keeps coming back to has surfaced: When API calls become a black box, how do we manage these massive, expensive AI models with the same rigor we apply to databases or microservices?

If 2024 was the year everyone was busy “getting demos to work,” then 2026 marks the dawn of “fine-grained governance.” The simple “call succeeded/failed” logs of the past can no longer answer today’s complex operational questions: “Why was this agent so smart yesterday, but today it’s spouting nonsense?”, “Why did our token costs suddenly double last month?”, “Is someone trying to attack our customer service bot with a prompt injection?”

From Improvement to Reinvention: Deconstructing the Three Philosophies and Selection Truths of Prometheus Monitoring Architecture

Sun, 04 Jan 2026 17:00:00 +0800

Looking back at the years spent navigating the observability space—especially around building metrics systems—it feels like a long architectural pilgrimage. From the early days of babysitting a standalone Prometheus and worrying about disk space, to introducing Thanos in an attempt to achieve “infinite storage,” and now rebuilding the entire monitoring hub with Mimir, these experiences are scattered in memory, with some details already starting to blur.

Recently, I took some time to systematically revisit the pitfalls I’ve encountered and the technical decisions I’ve made over the years. Suddenly, it struck me: this isn’t just a story of technical iteration; it’s a series of philosophical choices made when facing pain points at different scales. What I once thought were “upgrades” turned out to be fundamentally different species. This post is my attempt to salvage those fading experiences: it discusses what I see as three architectural patterns and why, at a certain scale, Mimir becomes the “right” choice.