All Posts - Shengxu · Cloud Architecture & DevOps

Two Real Problems in AI Programming: Multi-Project Task Management and Multi-User Collaboration Isolation

Sat, 09 May 2026 16:28:25 +0800

In multi-project, multi-developer AI programming practice, the continuity of task status and the isolation of personal configurations are key pain points affecting efficiency. This article proposes an engineering solution based on “sub-project Source of Truth” and “local rule isolation,” aiming to address cross-project task breakpoint management and team configuration pollution, while providing a replicable directory structure, read/write boundaries, and backup strategy.

Once an engineer starts using AI agents to write code frequently, the problem they quickly encounter isn’t “Can AI write functions?” but a more practical set of issues.

From Azure SRE Agent to HolmesGPT: AIOps Practices in Multi-Cloud Kubernetes Environments

Fri, 17 Apr 2026 19:40:00 +0800

In the multi-cloud Kubernetes era, the pain point for SREs is no longer just “too many alerts,” but rather investigation chains that are too long, context that is too scattered, and troubleshooting costs across clouds that are too high. What truly drains people isn’t glancing at a chart, but constantly switching between multiple cloud platforms, logging systems, deployment records, and ticketing systems.

This is why AI SRE Agents are starting to deliver real value. Their goal isn’t to be a better conversational Copilot, but to proactively take over the highly repetitive first half of the work—“checking logs, finding correlations, guessing root causes, and giving suggestions”—once an alert is triggered.

Cilium 2026 (Continued): How the Unified Data Plane Is Reshaping Kubernetes Platform Architecture

Sat, 21 Mar 2026 14:31:56 +0800

In the previous article on Cilium, we explored the real reasons behind the 2026 migration wave: it’s no longer just “a faster CNI,” but rather a reorganization of Kubernetes networking, security, observability, and multi-cluster capabilities into a more unified infrastructure foundation, while clarifying its division of labor and boundaries with Istio.

If the previous article answered “What exactly can Cilium bring us?”, this one goes further, focusing on its core evolution: the Unified Dataplane.

Before Discussing LLM Security, Is Your Kubernetes Foundation Up to Standard?

Sat, 14 Mar 2026 10:00:00 +0800

The explosion of Large Language Models (LLMs) and AI Agents has not only revolutionized business models but also introduced new application-layer security challenges such as prompt injection and data poisoning. While everyone’s attention is drawn to these cutting-edge vulnerabilities, let’s first pause and ask ourselves a fundamental question: Before diving into these complex AI security issues, is the cloud-native foundation that supports all our business workloads even up to par?

What Cilium Can Really Bring Us in 2026

Sun, 08 Mar 2026 10:30:00 +0800

What Meaningful Changes It Actually Brings, and How to Divide Work with Istio

By 2026, many teams discussing Cilium are no longer asking “Is it worth trying?” but rather “When should we migrate?”

Weekend Project: Building a Local Load Balancer for LLM API Keys

Sat, 14 Feb 2026 10:18:00 +0800

Lately, because I’ve been using various LLM services (OpenAI, Gemini, DeepSeek, etc.) intensively, I’ve run into a very real pain point: being broke.

To save money, I applied for multiple free API keys (like Google Gemini’s Free Tier or DeepSeek’s complimentary credits), but these free keys often come with strict rate limits (RPM/TPM). Just when I’m in the flow writing code, a 429 Too Many Requests error pops up, completely breaking my train of thought. It’s really frustrating.
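The core idea behind such a load balancer can be sketched in a few lines: rotate through the keys round-robin, and when one of them returns a 429, take it out of rotation for a cooldown period. This is a minimal illustration rather than the project's actual implementation; the `KeyPool` class, its method names, and the 60-second default cooldown are all assumptions made for the example.

```python
import itertools
import time


class KeyPool:
    """Round-robin pool of API keys that skips keys cooling down after a 429.

    Hypothetical helper for illustration; not taken from the original project.
    """

    def __init__(self, keys, cooldown_seconds=60.0, clock=time.monotonic):
        if not keys:
            raise ValueError("at least one key is required")
        self._keys = list(keys)
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the pool can be tested without sleeping
        self._blocked_until = {key: 0.0 for key in self._keys}
        self._cycle = itertools.cycle(self._keys)

    def next_key(self):
        """Return the next key that is not cooling down, or None if all are."""
        now = self._clock()
        # Look at each key at most once per call.
        for _ in range(len(self._keys)):
            key = next(self._cycle)
            if self._blocked_until[key] <= now:
                return key
        return None

    def report_rate_limited(self, key):
        """Call this when a request made with `key` came back as HTTP 429."""
        self._blocked_until[key] = self._clock() + self._cooldown
```

A caller would request `pool.next_key()` before each upstream call, invoke `pool.report_rate_limited(key)` whenever the provider answers 429, and retry with the next key. If `next_key()` returns `None`, every key is cooling down and the request has to wait or fail.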

Hands-On · Building a Memory-Enabled AI Writing Partner (Part 4): Observability (Metrics + Logs + Trace + Cost)

Thu, 05 Feb 2026 16:00:00 +0800

In the previous post, we discussed the security of RAG systems and prompt injection protection. Today, let’s dive into another area where the engineering gets deep: Observability.

When a system evolves from “it works” to “it works reliably long-term,” you will inevitably encounter three types of problems:

Slow: Is retrieval slow? Is the LLM slow? Or is some Agent stuck in a retry loop?
Expensive: Is token consumption being silently drained by a specific chain? Why doesn’t this month’s API bill add up?
Strange: Intermittent bugs that can’t be reproduced, leaving you to fix code based on “gut feeling.”

At this stage, I chose to build a complete Metrics + Logs system, rather than just sprinkling in a few print statements.
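To make the “slow” and “expensive” questions concrete, here is a minimal sketch of per-stage bookkeeping: each stage (retrieval, LLM call, and so on) is wrapped in a context manager that records call count, elapsed time, and token usage. The `StageMetrics` class, the stage names, and the price per 1,000 tokens are assumptions for illustration only and are not the system described in the article.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageMetrics:
    """Accumulates call counts, latency, and token usage per pipeline stage.

    Hypothetical helper for illustration only.
    """

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable for deterministic tests
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)
        self.tokens = defaultdict(int)

    @contextmanager
    def track(self, stage):
        """Time one execution of `stage`; the caller fills in usage["tokens"]."""
        start = self._clock()
        usage = {"tokens": 0}
        try:
            yield usage
        finally:
            # Record even when the stage raises, so failures still show up.
            self.calls[stage] += 1
            self.seconds[stage] += self._clock() - start
            self.tokens[stage] += usage["tokens"]

    def cost(self, stage, usd_per_1k_tokens):
        """Estimated spend for a stage at an assumed price per 1,000 tokens."""
        return self.tokens[stage] / 1000 * usd_per_1k_tokens
```

Usage would look like `with metrics.track("llm") as usage: ...; usage["tokens"] = response_token_count`, after which `metrics.seconds["llm"]` and `metrics.cost("llm", price)` show which stage is slow and which is draining the budget.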

Practical · Building a Memory-Enabled AI Writing Partner (Part 3): Security Architecture (RAG Protection, Fact Guard, and BYOK)

Wed, 04 Feb 2026 10:00:00 +0800

In the previous 2.5 articles, I’ve already laid out the backbone of FantasyNovelAgent:

This article dives deep into the most overlooked yet critical aspect of AI systems: Security.

If you’re thinking, “I’m just writing a novel, what security issues could there be?”, consider this:

Practical Guide: Building a Memory-Enabled AI Writing Partner (Kun) – Retrieval System (Vector Search, Hybrid Search & Cloud Deployment)

Wed, 28 Jan 2026 10:30:00 +0800

In “Practical · Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution”, I clarified how multiple agents collaborate and how memory is chained together. In “Practical · Building a Memory-Enabled AI Writing Partner (Part 2): Database Evolution (From JSON to Single Database to Relational Tables)”, I reviewed the evolution of the “fact layer” from JSON to SQLite and then to relational tables.

However, when the text length reaches hundreds of thousands of words, what truly determines the experience is often not “whether the data exists,” but “whether I can retrieve it”: exact lookup (did it appear or not), structured filtering (who belongs to whom), and semantic association (is it similar, is it the same atmosphere) must all work simultaneously. So I added a clear “index layer” to FantasyNovelAgent and expanded retrieval from “chapters” to the “full knowledge graph.”

Practical · Building a Memory-Enabled AI Writing Partner (Part 2): Database (From JSON to Single Table to Relational Tables)

Wed, 28 Jan 2026 10:00:00 +0800

If you’ve already read “Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution”, you likely have a high-level understanding of how multiple agents collaborate and how memory is chained together. But what truly makes a system viable long-term isn’t just a pretty architecture diagram—it requires a data foundation that can withstand growth: one that supports querying, modification, and rollback.

This article focuses on the evolution of the “fact layer” (the database): JSON files → SQLite single database (KV) → SQLite single database (relational tables). Semantic search, hybrid search, full graph indexing, and cloud migration are covered separately in the next article, “Building a Memory-Enabled AI Writing Partner (Part 2): Retrieval Systems (Vector Search, Hybrid Search, and Cloud Migration)”.
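The difference between the last two stages can be shown with Python's built-in `sqlite3` module. In the key-value stage each fact is an opaque JSON blob that can only be fetched by key; in the relational stage the fields become columns that SQL can filter and join on. The table names, columns, and sample characters below are invented for this sketch and are not the project's real schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage 2 (KV): one table; each value is an opaque JSON blob looked up by key.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.execute(
    "INSERT INTO kv VALUES (?, ?)",
    ("character:lin", json.dumps({"name": "Lin", "faction": "North", "injured": True})),
)

# Stage 3 (relational): fields become columns, relationships become foreign keys.
conn.execute("CREATE TABLE factions (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute(
    """CREATE TABLE characters (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           faction_id INTEGER REFERENCES factions(id),
           injured INTEGER NOT NULL DEFAULT 0
       )"""
)
conn.execute("INSERT INTO factions (id, name) VALUES (1, 'North')")
conn.execute("INSERT INTO characters (name, faction_id, injured) VALUES ('Lin', 1, 1)")
conn.execute("INSERT INTO characters (name, faction_id, injured) VALUES ('Mo', 1, 0)")


def kv_get(conn, key):
    """KV access: fetch one blob by exact key and decode it in Python."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


def injured_in_faction(conn, faction_name):
    """Relational access: the database itself answers "who belongs to whom"."""
    rows = conn.execute(
        """SELECT c.name FROM characters AS c
           JOIN factions AS f ON f.id = c.faction_id
           WHERE f.name = ? AND c.injured = 1
           ORDER BY c.name""",
        (faction_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

With the KV table, answering “which characters in a faction are injured?” means loading and decoding every blob in application code; with relational tables it is a single indexed query.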

Practical Guide · Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution

Sun, 25 Jan 2026 10:00:00 +0800

When writing a long novel, the most painful part isn’t “not being able to write,” but “forgetting what you’ve already written”: Did I set up that foreshadowing properly? Was the character already injured in the last chapter? When exactly was that specific rule established? Once the word count reaches hundreds of thousands, relying solely on human memory and scattered notes quickly spirals out of control.

FantasyNovelAgent grew out of this very need, evolving step by step: starting as a simple Python script, then adding dynamic memory and automatic archiving, followed by multi-device sync support, and finally moving toward a front-end/back-end separation with a cloud-native storage prototype. This article reviews that evolutionary path and explains the key trade-offs made along the way, offering a reference for similar projects.

Kubernetes Complexity: Starting from a Job Interview Question

Sat, 24 Jan 2026 12:47:00 +0800

I recently went through a job interview where the interviewer posed a seemingly routine question: “In your opinion, when should you use Kubernetes, and when is it unnecessary and just adds complexity?”

I answered it fairly smoothly at the time, but the question lingered in my mind long afterward. What made it so “sharp” was that it stepped beyond the technical details of “how to use K8s” and cut straight to the core trade-off in architecture design: Are we introducing a tech stack to solve a real business pain point, or just to satisfy the team’s “anxiety about being cutting-edge”?

Hands-On: Building an Automated AI Semantic Search with Cloudflare Vectorize and Gemini

Fri, 23 Jan 2026 15:30:00 +0800

In 2026, adding AI search to a personal blog is nothing new. But achieving it with zero cost, full automation, and high performance remains a technical topic worth exploring.

This article breaks down the technical architecture behind this site’s AI Search feature, showing how to combine Cloudflare Workers, Vectorize, D1, and Google Gemini to build a closed-loop RAG (Retrieval-Augmented Generation) system.

1. Core Architecture Design

Our goal is a fully automated workflow: write and deploy. The author only needs to push Markdown articles; everything else—vector generation, index updates, frontend deployment—is automated.

OWASP LLM Top 10 Security in Practice

Fri, 23 Jan 2026 10:00:00 +0800

Yesterday I had the privilege of attending a talk by Sergey Saburov from Acronis on “Agentic Engineering & LLM Security.” Sergey provided an in-depth analysis of security threats facing modern LLM applications, along with numerous real-world case studies aligned with the OWASP LLM Top 10 framework.

I’ve organized and summarized the content based on the latest OWASP LLM Top 10 v2.0 (2025) official standard. I’ve corrected some terminology discrepancies from the original talk (e.g., LLM06, LLM10) and compiled Python PoC (Proof of Concept) and defense scripts tailored for Kubernetes platform engineers, hoping this serves as a reference for building secure AI systems.

Helm 4 Deep Dive: More Than a Version Bump – A New Beginning for the Kubernetes-Native Era

Thu, 22 Jan 2026 10:00:00 +0800

In the infrastructure world, some version updates are “icing on the cake,” while others are “transformative.” If Helm 3 freed us from the nightmare of Tiller, then Helm 4, officially released in November 2025, marks the coming-of-age moment when Helm truly understood and embraced Kubernetes’ declarative philosophy.

After two months of community validation and official documentation refinement, this article will clarify the easily misunderstood technical details based on Helm 4’s actual release state.

Kubernetes 1.35 Native Gang Scheduling: The Eve of Scheduling Ecosystem Unification

Wed, 21 Jan 2026 00:00:00 +0000

Kubernetes 1.35 introduces native Workload API and Gang Scheduling support, widely regarded as a “kernel-level refactoring” of cloud-native AI infrastructure. To truly grasp the significance of this upgrade, we need to look not only at what it brings but also at what it aims to replace (or merge with).

Before v1.35, to address the “resource deadlock” pain point of AI training tasks, the community had actually evolved a complex “third-party scheduler zoo.” This article starts from the native primitives, takes stock of existing ecosystem options, and reveals the architectural evolution direction in production environments.
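The “resource deadlock” arises when pods of a distributed training job are scheduled one at a time: two jobs can each grab part of the cluster's GPUs and then wait forever for the rest. Gang scheduling avoids this by placing all pods of a group or none of them. The toy function below illustrates only that all-or-nothing rule; it is not the Kubernetes scheduler's algorithm and does not use the Workload API, and the function name, parameters, and node names are assumptions for the example.

```python
def gang_schedule(free_gpus_per_node, gang_size, gpus_per_pod):
    """Place every pod of a gang, or none of them.

    free_gpus_per_node: dict mapping node name -> free GPU count.
    Returns a list with one node name per pod, or None if the whole gang
    does not fit. The input dict is never modified.
    """
    remaining = dict(free_gpus_per_node)  # work on a copy: a trial placement
    placement = []
    for _ in range(gang_size):
        # First-fit: pick the first node that still has room for one pod.
        node = next(
            (name for name, free in remaining.items() if free >= gpus_per_pod),
            None,
        )
        if node is None:
            # One pod cannot be placed, so the whole gang is rejected and
            # no capacity is held by a partially placed job.
            return None
        remaining[node] -= gpus_per_pod
        placement.append(node)
    return placement
```

Because a rejected gang reserves nothing, a second job that does fit can still be admitted, which is exactly the situation per-pod scheduling gets wrong.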

When AI Gets Your Database Password: A Practical Guide to MCP Exposure Risks

Tue, 20 Jan 2026 00:00:00 +0000

Last year, a typical scenario sparked heated debate in the security community: a developer installed Supabase’s MCP plugin in Cursor and configured a service_role key (database super admin privileges) so the AI could query the database directly. One day, a customer casually asked in a ticket, “Can you show me our integration configuration?” The AI interpreted this as an instruction and printed the token directly in the reply.

While this case often appears in security reports as a “risk demonstration,” the problem it reveals is real: The MCP protocol grants AI operational permissions, and prompt injection attacks allow hackers to “hijack” these permissions through natural language.

From Traffic Gatekeeping to Quality Insight: A 2026 Guide to Building Enterprise-Grade LLM Observability Systems

Mon, 19 Jan 2026 15:00:00 +0800

As large language models (LLMs) evolve from “novelty toys” into the “productivity backbone” of enterprises, a question that every technical leader keeps coming back to has surfaced: When API calls become a black box, how do we manage these massive, expensive AI models with the same rigor we apply to databases or microservices?

If 2024 was the year everyone was busy “getting demos to work,” then 2026 marks the dawn of “fine-grained governance.” The simple “call succeeded/failed” logs of the past can no longer answer today’s complex operational questions: “Why was this agent so smart yesterday, but today it’s spouting nonsense?”, “Why did our token costs suddenly double last month?”, “Is someone trying to attack our customer service bot with a prompt injection?”

Dragonfly: Image and Model Distribution Infrastructure for the Cloud-Native Era

Thu, 15 Jan 2026 10:00:00 +0800

In 2026, as AI and cloud-native infrastructure continue to evolve, image and model distribution is shifting from a “peripheral optimization point” to a critical factor affecting platform efficiency. Traditional approaches relying on centralized Registry + CDN often face dual challenges of speed and cost when dealing with scenarios involving large-scale concurrent nodes and large-volume images or models. Against this backdrop, Dragonfly has grown into a CNCF Graduated project and is adopted in production environments by companies such as Ant Group, Alibaba, Datadog, DiDi, and Kuaishou to support efficient distribution of containers and AI models.

Farewell to iptables: The Nftables Revolution in Kubernetes Network Data Plane

Fri, 09 Jan 2026 14:00:00 +0800

In the networking world of Kubernetes, kube-proxy has long played the role of “gatekeeper,” responsible for distributing Service traffic to backend Pods. However, for years, we’ve endured the performance pain of iptables mode or been forced to migrate to the more complex IPVS mode.

Fast forward to 2026: nftables mode reached General Availability (GA) in Kubernetes 1.33, released in April 2025, so it is no longer an experimental option and has become the “new standard” for production environments. In fact, with the release of v1.35 at the end of 2025, the once-reliable ipvs mode has been officially marked as Deprecated. This marks a complete “return to fundamentals” for the Linux kernel network stack in the cloud-native era.

From Improvement to Reinvention: Deconstructing the Three Philosophies and Selection Truths of Prometheus Monitoring Architecture

Sun, 04 Jan 2026 17:00:00 +0800

Looking back at the years spent navigating the observability space—especially around building metrics systems—it feels like a long architectural pilgrimage. From the early days of babysitting a standalone Prometheus and worrying about disk space, to introducing Thanos in an attempt to achieve “infinite storage,” and now rebuilding the entire monitoring hub with Mimir, these experiences are scattered in memory, with some details already starting to blur.

Recently, I took some time to systematically revisit the pitfalls I’ve encountered and the technical decisions I’ve made over the years. Suddenly, it struck me: this isn’t just a story of technical iteration; it’s a series of philosophical choices made when facing pain points at different scales. What I once thought were “upgrades” turned out to be fundamentally different species. This post is an attempt to salvage those fading experiences: it discusses what I see as three architectural patterns and why, at a certain scale, Mimir becomes the “right” choice.

Kubernetes 1.34/1.35 Certificate Revolution: From Manual Hell to Zero-Trust Heaven

Sat, 03 Jan 2026 19:00:00 +0800

I recently upgraded to 1.35 and found that the certificate management changes are nothing short of revolutionary, especially for self-managed K8s users, where operational overhead has been cut in half.

In the past, certificate issues were the “silent killer” of security incidents: expired certificates causing outages, token leaks, and manual rotation consuming 30% of ops time. Versions 1.34/1.35 introduce native automated mTLS, making zero trust no longer exclusive to Istio. Today, let’s dive into these new features and compare them in a self-managed K8s vs. cloud K8s hands-on scenario.

Kubernetes v1.33–v1.35 Deep Dive: From Native Sidecar to AI Compute Foundation

Fri, 02 Jan 2026 09:50:00 +0800

Timeline Overview

v1.33 (Octarine): Released April 2025, Native Sidecar GA, security features enabled by default.
v1.34 (Of Wind & Will): Released August 2025, DRA GA, marking the native era of AI/GPU scheduling.
v1.35 (Timbernetes): Released December 2025, In-Place Pod Resize GA, zero-disruption elasticity becomes reality.

1. v1.33 “Octarine”: Sidecar Graduation and Default Security

The keywords for v1.33 are “Native Sidecar” and “Security Enabled by Default.” This release transforms long-standing experimental capabilities into dependable infrastructure for daily engineering.

IngressNightmare (CVE-2025-1974): Vulnerability Deep Dive and Gateway API Migration Guide

Sat, 27 Dec 2025 10:00:00 +0800

The recently disclosed “IngressNightmare” vulnerability in Ingress-NGINX has once again thrust ingress-nginx into the spotlight, serving as a stark warning for clusters still relying on traditional Ingress.

Below is a technical review focused on engineering practice, covering the vulnerability recap, risk analysis, short-term fixes, how to leverage this as an opportunity to migrate to Gateway API, and a comparison of pros and cons before and after migration.

Vulnerability Brief: IngressNightmare (CVE-2025-1974)

Severity: In March 2025, researchers disclosed a set of high-severity vulnerabilities in the Ingress-NGINX controller, collectively known as “IngressNightmare.” Among them, CVE-2025-1974 has a CVSS score of 9.8, rated as “Critical” by the official team and multiple security vendors, affecting a vast number of Kubernetes clusters.
Root Cause: The core issue lies in the Validating Admission Webhook. When validating an Ingress object, the controller generates an NGINX configuration based on the object and its annotations, then runs nginx -t to validate it. During this process, insufficient filtering of annotations and configuration fragments allows attackers to inject arbitrary NGINX directives, ultimately leading to Remote Code Execution (RCE) on the controller Pod.
Low Attack Barrier: An attacker only needs network access to the admission webhook from within the Pod network (many clusters even expose it to the public internet) to trigger the vulnerability via unauthenticated requests. This is an unauthenticated RCE, highly susceptible to mass exploitation by worms or automated attack tools.
Vulnerability Chain: The same disclosure includes several other high-severity injection vulnerabilities (e.g., CVE-2025-24514, CVE-2025-1097, CVE-2025-1098), collectively forming the IngressNightmare vulnerability chain, with an attack surface far exceeding a single CVE.

Risk and Impact: From NGINX to Full Cluster Takeover

Sensitive Information Leakage: Once RCE is achieved within the ingress-nginx controller container, attackers can read every Kubernetes Secret that Pod can access. Crucially, the ingress-nginx controller typically runs with extremely high privileges (a ClusterRole), because it needs to read Secrets from all namespaces in the cluster to obtain TLS certificates. This means the impact of RCE is not limited to the controller’s own Namespace; it extends to all certificates and credentials stored as Secrets in the cluster.
Traffic Hijacking and Tampering: The controller usually has read and write permissions for Ingress resources in the cluster. Combined with RCE, attackers can further tamper with routing, transparently forwarding user traffic to attacker-controlled backends for man-in-the-middle attacks or data theft.
“One Hole to Rule the Cloud”: Practical tests by multiple security vendors show that in clusters with loose default network policies, an attacker only needs code execution in any Pod to reach the admission webhook laterally, thereby escalating to cluster-level control.

Short-Term Remediation: Patch First, Rebuild Later

Before discussing Gateway API migration, all clusters still running ingress-nginx need to take two immediate actions: