AI - Category - Shengxu · Cloud Architecture & DevOps

Two Real Problems in AI Programming: Multi-Project Task Management and Multi-User Collaboration Isolation

Sat, 09 May 2026 16:28:25 +0800

In multi-project, multi-developer AI programming practice, the continuity of task status and the isolation of personal configurations are key pain points affecting efficiency. This article proposes an engineering solution based on “sub-project Source of Truth” and “local rule isolation,” aiming to address cross-project task breakpoint management and team configuration pollution, while providing a replicable directory structure, read/write boundaries, and backup strategy.

Once an engineer starts using AI agents to write code frequently, the problem they quickly encounter isn’t “Can AI write functions?” but a more practical set of issues.

From Azure SRE Agent to HolmesGPT: AIOps Practices in Multi-Cloud Kubernetes Environments

Fri, 17 Apr 2026 19:40:00 +0800

In the multi-cloud Kubernetes era, the pain point for SREs is no longer just “too many alerts,” but rather investigation chains that are too long, context that is too scattered, and troubleshooting costs across clouds that are too high. What truly drains people isn’t glancing at a chart, but constantly switching between multiple cloud platforms, logging systems, deployment records, and ticketing systems.

This is why AI SRE Agents are starting to deliver real value. Their goal isn’t to be a better conversational Copilot, but to proactively take over the highly repetitive first half of the work—“checking logs, finding correlations, guessing root causes, and giving suggestions”—once an alert is triggered.

Cilium 2026 (Continued): How the Unified Data Plane Is Reshaping Kubernetes Platform Architecture

Sat, 21 Mar 2026 14:31:56 +0800

In the previous article on Cilium, we explored the real reasons behind the 2026 migration wave: it’s no longer just “a faster CNI,” but rather a reorganization of Kubernetes networking, security, observability, and multi-cluster capabilities into a more unified infrastructure foundation, while clarifying its division of labor and boundaries with Istio.

If the previous article answered “What exactly can Cilium bring us?”, this one goes further, focusing on its core evolution: the Unified Dataplane.

Weekend Project: Building a Local Load Balancer for LLM API Keys

Sat, 14 Feb 2026 10:18:00 +0800

Lately, because I’ve been using various LLM services (OpenAI, Gemini, DeepSeek, etc.) intensively, I’ve run into a very real pain point: being broke.

To save money, I applied for multiple free API keys (like Google Gemini’s Free Tier or DeepSeek’s complimentary credits), but these free keys often come with strict rate limits (RPM/TPM). Just when I’m in the flow writing code, a 429 Too Many Requests error pops up, completely breaking my train of thought. It’s really frustrating.
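One way to picture the idea of balancing requests across several rate-limited keys is a small round-robin pool that sidelines a key for a while after it returns a 429. This is only a minimal sketch, not the article's implementation: the `KeyPool` class, its method names, and the 60-second cooldown are all assumptions made for illustration.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class KeyPool:
    """Round-robin pool that skips keys cooling down after a 429 (hypothetical helper)."""

    keys: list[str]
    cooldown_s: float = 60.0  # assumed value; real limits vary by provider
    _blocked_until: dict[str, float] = field(default_factory=dict)
    _next: int = 0

    def acquire(self, now: float | None = None) -> str | None:
        """Return the next usable key, or None if every key is cooling down."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.keys)):
            key = self.keys[self._next]
            self._next = (self._next + 1) % len(self.keys)
            if self._blocked_until.get(key, float("-inf")) <= now:
                return key
        return None

    def report_429(self, key: str, now: float | None = None) -> None:
        """Mark a key as rate-limited so acquire() skips it until the cooldown ends."""
        now = time.monotonic() if now is None else now
        self._blocked_until[key] = now + self.cooldown_s
```

A caller would `acquire()` a key before each request, call `report_429(key)` when the provider answers 429, and retry with the next key. The explicit `now` parameter exists only to make the behaviour easy to test.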

Hands-On · Building a Memory-Enabled AI Writing Partner (Part 4): Observability (Metrics + Logs + Trace + Cost)

Thu, 05 Feb 2026 16:00:00 +0800

In the previous post, we discussed the security of RAG systems and prompt injection protection. Today, let’s dive into another hard engineering area: Observability.

When a system evolves from “it works” to “it works reliably long-term,” you will inevitably encounter three types of problems:

Slow: Is retrieval slow? Is the LLM slow? Or is some Agent stuck in a retry loop?
Expensive: Is token consumption being silently drained by a specific chain? Why doesn’t this month’s API bill add up?
Strange: Intermittent bugs that can’t be reproduced, leaving you to fix code based on “gut feeling.”

At this stage, I chose to build a complete Metrics + Logs system, rather than just sprinkling in a few print statements.
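As a rough illustration of how metrics and structured logs can answer the "slow" and "expensive" questions above, the sketch below times one pipeline step, counts tokens, and emits a JSON log line. It is a sketch under stated assumptions rather than the article's code: the `observe` helper, the `METRICS` dictionary, the `llm_obs` logger name, and the step names are all hypothetical.

```python
import json
import logging
import time
from collections import defaultdict
from contextlib import contextmanager

logger = logging.getLogger("llm_obs")  # hypothetical logger name

# In-memory counters; a real setup would export these to a metrics backend.
METRICS = defaultdict(float)


@contextmanager
def observe(step):
    """Time one step; the caller fills in record["tokens"] inside the block."""
    record = {"step": step, "status": "ok", "tokens": 0}
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        record["status"] = "error"
        METRICS[f"{step}.errors"] += 1
        raise
    finally:
        record["seconds"] = round(time.perf_counter() - start, 4)
        METRICS[f"{step}.calls"] += 1
        METRICS[f"{step}.seconds"] += record["seconds"]
        METRICS[f"{step}.tokens"] += record["tokens"]
        logger.info(json.dumps(record))  # one structured log line per step
```

Wrapping retrieval and LLM calls separately, for example `with observe("retrieval") as rec: ...`, makes it possible to see which step dominates latency and which one consumes the tokens.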

Practical · Building a Memory-Enabled AI Writing Partner (Part 3): Security Architecture (RAG Protection, Fact Guard, and BYOK)

Wed, 04 Feb 2026 10:00:00 +0800

In the previous 2.5 articles, I’ve already laid out the backbone of FantasyNovelAgent:

This article dives deep into the most overlooked yet critical aspect of AI systems: Security.

If you’re thinking, “I’m just writing a novel, what security issues could there be?”, consider this:

Practical Guide: Building a Memory-Enabled AI Writing Partner (Kun) – Retrieval System (Vector Search, Hybrid Search & Cloud Deployment)

Wed, 28 Jan 2026 10:30:00 +0800

In “Practical · Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution”, I clarified how multiple agents collaborate and how memory is chained together. In “Practical · Building a Memory-Enabled AI Writing Partner (Part 2): Database Evolution (From JSON to Single Database to Relational Tables)”, I reviewed the evolution of the “fact layer” from JSON to SQLite and then to relational tables.

However, when the text length reaches hundreds of thousands of words, what truly determines the experience is often not “whether the data exists,” but “whether I can retrieve it”: exact lookup (did it appear or not), structured filtering (who belongs to whom), and semantic association (is it similar, is it the same atmosphere) must all work simultaneously. So I added a clear “index layer” to FantasyNovelAgent and expanded retrieval from “chapters” to the “full knowledge graph.”

Practical · Building a Memory-Enabled AI Writing Partner (Part 2): Database (From JSON to Single Table to Relational Tables)

Wed, 28 Jan 2026 10:00:00 +0800

If you’ve already read “Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution”, you likely have a high-level understanding of how multiple agents collaborate and how memory is chained together. But what makes a system viable long-term isn’t a pretty architecture diagram; it is a data foundation that can withstand growth, one that supports querying, modification, and rollback.

This article focuses on the evolution of the “fact layer” (the database): JSON files → SQLite single database (KV) → SQLite single database (relational tables). Semantic search, hybrid search, full graph indexing, and cloud migration are covered separately in the next article, “Building a Memory-Enabled AI Writing Partner (Part 2): Retrieval Systems (Vector Search, Hybrid Search, and Cloud Migration)”.
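The last two stages of that path can be sketched with Python's built-in `sqlite3` module: a key-value table holding JSON blobs, then a relational table that makes structured queries possible. This is an illustrative sketch only. The table names, columns, and sample characters are invented here and are not taken from FantasyNovelAgent.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage 2 (assumed shape): one KV table, each value is a JSON blob.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
samples = {
    "character:lin": {"name": "Lin", "faction": "Azure Sect", "injured": True},
    "character:mo": {"name": "Mo", "faction": "Azure Sect", "injured": False},
}
conn.executemany(
    "INSERT INTO kv (key, value) VALUES (?, ?)",
    [(k, json.dumps(v)) for k, v in samples.items()],
)

# Stage 3 (assumed shape): a relational table with typed, queryable columns.
conn.execute(
    """CREATE TABLE characters (
           id INTEGER PRIMARY KEY,
           name TEXT UNIQUE NOT NULL,
           faction TEXT,
           injured INTEGER NOT NULL DEFAULT 0
       )"""
)

# Migrate the JSON blobs into rows.
rows = conn.execute("SELECT value FROM kv WHERE key LIKE 'character:%'").fetchall()
for (value,) in rows:
    data = json.loads(value)
    conn.execute(
        "INSERT INTO characters (name, faction, injured) VALUES (?, ?, ?)",
        (data["name"], data.get("faction"), int(data.get("injured", False))),
    )
conn.commit()

# A structured filter that the KV layout could only answer by parsing every blob.
injured = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM characters WHERE injured = 1 ORDER BY name"
    )
]
```

With the relational layout, a question such as "which characters are currently injured?" becomes a single indexed query instead of a scan over JSON values.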

Practical Guide · Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution

Sun, 25 Jan 2026 10:00:00 +0800

When writing a long novel, the most painful part isn’t “not being able to write,” but “forgetting what you’ve already written”: Did I set up that foreshadowing properly? Was the character already injured in the last chapter? When exactly was that specific rule established? Once the word count reaches hundreds of thousands, relying solely on human memory and scattered notes quickly spirals out of control.

FantasyNovelAgent grew out of this very need, evolving step by step: starting as a simple Python script, then adding dynamic memory and automatic archiving, followed by multi-device sync support, and finally moving toward a front-end/back-end separation with a cloud-native storage prototype. This article reviews that evolutionary path and explains the key trade-offs made along the way, offering a reference for similar projects.

Hands-On: Building an Automated AI Semantic Search with Cloudflare Vectorize and Gemini

Fri, 23 Jan 2026 15:30:00 +0800

In 2026, adding AI search to a personal blog is nothing new. But achieving it with zero cost, full automation, and high performance remains a technical topic worth exploring.

This article breaks down the technical architecture behind this site’s AI Search feature, showing how to combine Cloudflare Workers, Vectorize, D1, and Google Gemini to build a closed-loop RAG (Retrieval-Augmented Generation) system.

1. Core Architecture Design

Our goal is a fully automated workflow: write and deploy. The author only needs to push Markdown articles; everything else—vector generation, index updates, frontend deployment—is automated.

OWASP LLM Top 10 Security in Practice

Fri, 23 Jan 2026 10:00:00 +0800

Yesterday I had the privilege of attending a talk by Sergey Saburov from Acronis on “Agentic Engineering & LLM Security.” Sergey provided an in-depth analysis of security threats facing modern LLM applications, along with numerous real-world case studies aligned with the OWASP LLM Top 10 framework.

I’ve organized and summarized the content based on the latest OWASP LLM Top 10 v2.0 (2025) official standard. I’ve corrected some terminology discrepancies from the original talk (e.g., LLM06, LLM10) and compiled Python PoC (Proof of Concept) and defense scripts tailored for Kubernetes platform engineers, hoping this serves as a reference for building secure AI systems.

Kubernetes 1.35 Native Gang Scheduling: The Eve of Scheduling Ecosystem Unification

Wed, 21 Jan 2026 00:00:00 +0000

Kubernetes 1.35 introduces native Workload API and Gang Scheduling support, widely regarded as a “kernel-level refactoring” of cloud-native AI infrastructure. To truly grasp the significance of this upgrade, we need to look not only at what it brings but also at what it aims to replace (or merge with).

Before v1.35, to address the “resource deadlock” pain point of AI training tasks, the community had actually evolved a complex “third-party scheduler zoo.” This article starts from the native primitives, takes stock of existing ecosystem options, and reveals the architectural evolution direction in production environments.

When AI Gets Your Database Password: A Practical Guide to MCP Exposure Risks

Tue, 20 Jan 2026 00:00:00 +0000

Last year, a typical scenario sparked heated debate in the security community: a developer installed Supabase’s MCP plugin in Cursor and configured a service_role key (database super admin privileges) so the AI could query the database directly. One day, a customer casually asked in a ticket, “Can you show me our integration configuration?” The AI interpreted this as an instruction and printed the token directly in the reply.

While this case often appears in security reports as a “risk demonstration,” the problem it reveals is real: The MCP protocol grants AI operational permissions, and prompt injection attacks allow hackers to “hijack” these permissions through natural language.

From Traffic Gatekeeping to Quality Insight: A 2026 Guide to Building Enterprise-Grade LLM Observability Systems

Mon, 19 Jan 2026 15:00:00 +0800

As large language models (LLMs) evolve from “novelty toys” into the “productivity backbone” of enterprises, a question that every technical leader keeps coming back to has surfaced: When API calls become a black box, how do we manage these massive, expensive AI models with the same rigor we apply to databases or microservices?

If 2024 was the year everyone was busy “getting demos to work,” then 2026 marks the dawn of “fine-grained governance.” The simple “call succeeded/failed” logs of the past can no longer answer today’s complex operational questions: “Why was this agent so smart yesterday, but today it’s spouting nonsense?”, “Why did our token costs suddenly double last month?”, “Is someone trying to attack our customer service bot with a prompt injection?”

Dragonfly: Image and Model Distribution Infrastructure for the Cloud-Native Era

Thu, 15 Jan 2026 10:00:00 +0800

In 2026, as AI and cloud-native infrastructure continue to evolve, image and model distribution is shifting from a “peripheral optimization point” to a critical factor affecting platform efficiency. Traditional approaches relying on centralized Registry + CDN often face dual challenges of speed and cost when dealing with scenarios involving large-scale concurrent nodes and large-volume images or models. Against this backdrop, Dragonfly has grown into a CNCF Graduated project and is adopted in production environments by companies such as Ant Group, Alibaba, Datadog, DiDi, and Kuaishou to support efficient distribution of containers and AI models.