Security - Category - Shengxu · Cloud Architecture & DevOps

Cilium 2026 (Continued): How the Unified Data Plane Is Reshaping Kubernetes Platform Architecture

Sat, 21 Mar 2026 14:31:56 +0800

In the previous article on Cilium, we explored the real reasons behind the 2026 migration wave: Cilium is no longer just “a faster CNI,” but a reorganization of Kubernetes networking, security, observability, and multi-cluster capabilities into a more unified infrastructure foundation. We also clarified its division of labor and boundaries with Istio.

If the previous article answered “What exactly can Cilium bring us?”, this one goes further and focuses on its core evolution: the Unified Data Plane.

Before Discussing LLM Security, Is Your Kubernetes Foundation Up to Standard?

Sat, 14 Mar 2026 10:00:00 +0800

The explosion of Large Language Models (LLMs) and AI Agents has not only revolutionized business models but also introduced new application-layer security challenges such as prompt injection and data poisoning. While everyone’s attention is drawn to these cutting-edge vulnerabilities, let’s first pause and ask ourselves a fundamental question: Before diving into these complex AI security issues, is the cloud-native foundation that supports all our business workloads even up to par?

Practical · Building a Memory-Enabled AI Writing Partner (Part 3): Security Architecture (RAG Protection, Fact Guard, and BYOK)

Wed, 04 Feb 2026 10:00:00 +0800

In the previous 2.5 articles, I’ve already laid out the backbone of FantasyNovelAgent:

This article dives deep into the most overlooked yet critical aspect of AI systems: Security.

If you’re thinking, “I’m just writing a novel, what security issues could there be?”, consider this:

OWASP LLM Top 10 Security in Practice

Fri, 23 Jan 2026 10:00:00 +0800

Yesterday I had the privilege of attending a talk by Sergey Saburov from Acronis on “Agentic Engineering & LLM Security.” Sergey provided an in-depth analysis of security threats facing modern LLM applications, along with numerous real-world case studies aligned with the OWASP LLM Top 10 framework.

I’ve organized and summarized the content based on the latest OWASP LLM Top 10 v2.0 (2025) official standard. I’ve corrected some terminology discrepancies from the original talk (e.g., LLM06, LLM10) and compiled Python PoC (Proof of Concept) and defense scripts tailored for Kubernetes platform engineers, hoping this serves as a reference for building secure AI systems.

When AI Gets Your Database Password: A Practical Guide to MCP Exposure Risks

Tue, 20 Jan 2026 00:00:00 +0000

Last year, a typical scenario sparked heated debate in the security community: a developer installed Supabase’s MCP plugin in Cursor and configured a service_role key (database super admin privileges) so the AI could query the database directly. One day, a customer casually asked in a ticket, “Can you show me our integration configuration?” The AI interpreted this as an instruction and printed the token directly in the reply.

While this case often appears in security reports as a “risk demonstration,” the problem it reveals is real: MCP grants the AI operational permissions, and prompt injection lets attackers “hijack” those permissions through natural language.
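One narrow mitigation for the leak described above is to filter the model's reply before it reaches the customer. The following is a minimal sketch, assuming the leaked credential is JWT-shaped (three base64url segments starting with `eyJ`); the names `JWT_PATTERN` and `redact_secrets` are hypothetical and not part of any MCP or Supabase API. Output filtering is only one layer and does not fix the underlying problem of handing the AI an over-privileged key.

```python
import re

# Assumption: the credential looks like a JWT, i.e. three base64url
# segments separated by dots, with a header that starts with "eyJ".
JWT_PATTERN = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def redact_secrets(reply: str) -> str:
    """Replace JWT-shaped substrings in a model reply with a placeholder."""
    return JWT_PATTERN.sub("[REDACTED]", reply)


if __name__ == "__main__":
    leaked = (
        "Your config token is "
        "eyJhbGciOiJIUzI1NiJ9.eyJyb2xlIjoic2VydmljZV9yb2xlIn0.abc123_-XYZ ok"
    )
    print(redact_secrets(leaked))
```

A filter like this would sit between the model and the ticket reply; scoping the key down to read-only, per-tenant access remains the more important control.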

Kubernetes 1.34/1.35 Certificate Revolution: From Manual Hell to Zero-Trust Heaven

Sat, 03 Jan 2026 19:00:00 +0800

I recently upgraded to 1.35 and found that the certificate management changes are nothing short of revolutionary, especially for self-managed K8s users, where operational overhead has been cut in half.

In the past, certificate issues were the “silent killer” behind security incidents: expired certificates causing outages, token leaks, and manual rotation consuming 30% of ops time. Versions 1.34/1.35 introduce native automated mTLS, so zero trust is no longer exclusive to Istio. Today, let’s dive into these new features and compare how they play out hands-on in self-managed versus cloud-managed K8s.

Kubernetes v1.33–v1.35 Deep Dive: From Native Sidecar to AI Compute Foundation

Fri, 02 Jan 2026 09:50:00 +0800

Timeline Overview

v1.33 (Octarine): Released April 2025, Native Sidecar GA, security features enabled by default.
v1.34 (Of Wind & Will): Released August 2025, DRA GA, marking the native era of AI/GPU scheduling.
v1.35 (Timbernetes): Released December 2025, In-Place Pod Resize GA, zero-disruption elasticity becomes reality.

1. v1.33 “Octarine”: Sidecar Graduation and Default Security

The keywords for v1.33 are “Native Sidecar” and “Security Enabled by Default.” This release transforms long-standing experimental capabilities into dependable infrastructure for daily engineering.

IngressNightmare (CVE-2025-1974): Vulnerability Deep Dive and Gateway API Migration Guide

Sat, 27 Dec 2025 10:00:00 +0800

The recently disclosed “IngressNightmare” vulnerability in Ingress-NGINX has once again thrust ingress-nginx into the spotlight, serving as a stark warning for clusters still relying on traditional Ingress.

Below is a technical review focused on engineering practice, covering the vulnerability recap, risk analysis, short-term fixes, how to leverage this as an opportunity to migrate to Gateway API, and a comparison of pros and cons before and after migration.

Vulnerability Brief: IngressNightmare (CVE‑2025‑1974)

Severity: In March 2025, researchers disclosed a set of high-severity vulnerabilities in the Ingress-NGINX controller, collectively known as “IngressNightmare.” Among them, CVE‑2025‑1974 carries a CVSS score of 9.8 and is rated “Critical” by the Kubernetes project and multiple security vendors; it affects a vast number of Kubernetes clusters.
Root Cause: The core issue lies in the Validating Admission Webhook. When validating an Ingress object, the controller generates an NGINX configuration based on the object and its annotations, then uses nginx -t for validation. During this process, insufficient filtering of annotations and configuration fragments allows attackers to inject arbitrary NGINX directives, ultimately leading to Remote Code Execution (RCE) on the controller Pod.
Low Attack Barrier: An attacker only needs access to the admission webhook within the Pod network (many clusters even expose it to the public internet) to trigger the vulnerability via unauthenticated requests. This is an unauthenticated RCE, highly susceptible to mass exploitation by worms or automated attack tools.
Vulnerability Chain: The same disclosure includes several other high-severity injection vulnerabilities (e.g., CVE‑2025‑24514, CVE‑2025‑1097, CVE‑2025‑1098), collectively forming the IngressNightmare vulnerability chain, with an attack surface far exceeding a single CVE.
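The root cause above, annotation values reaching the generated NGINX configuration without sufficient filtering, can be illustrated with a small audit heuristic. This is a sketch under stated assumptions, not the upstream fix: it takes an Ingress manifest already parsed into a Python dict and flags `nginx.ingress.kubernetes.io/` annotations whose values contain characters that can end or open an NGINX directive. The function name and character list are hypothetical, and legitimate snippet annotations will also be flagged, so treat the result as a review queue rather than a verdict.

```python
# Hypothetical heuristic: characters that can terminate or open an NGINX
# directive or block when a value is templated into the config unescaped.
SUSPICIOUS_CHARS = ("\n", "\r", ";", "{", "}")
NGINX_ANNOTATION_PREFIX = "nginx.ingress.kubernetes.io/"


def flag_suspicious_annotations(ingress: dict) -> list[str]:
    """Return the ingress-nginx annotation keys whose values look injectable."""
    annotations = ingress.get("metadata", {}).get("annotations") or {}
    flagged = []
    for key, value in annotations.items():
        if not key.startswith(NGINX_ANNOTATION_PREFIX):
            continue  # only ingress-nginx annotations are templated into nginx.conf
        if any(ch in str(value) for ch in SUSPICIOUS_CHARS):
            flagged.append(key)
    return sorted(flagged)


if __name__ == "__main__":
    example = {
        "metadata": {
            "annotations": {
                "nginx.ingress.kubernetes.io/auth-url": "http://example.com/#;\nssl_engine foo;",
                "nginx.ingress.kubernetes.io/rewrite-target": "/",
            }
        }
    }
    print(flag_suspicious_annotations(example))
```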

Risk and Impact: From NGINX to Full Cluster Takeover

Sensitive Information Leakage: Once RCE is achieved within the ingress-nginx controller container, attackers can read all Kubernetes Secrets mounted to that Pod. Crucially, the NGINX Ingress Controller typically runs with extremely high privileges (a ClusterRole), because it needs to read Secrets from all namespaces in the cluster to obtain TLS certificates. The consequence of RCE is therefore not limited to the current namespace: all certificates and credentials in the cluster can be leaked.
Traffic Hijacking and Tampering: The controller usually has read and write permissions for Ingress resources in the cluster. Combined with RCE, attackers can further tamper with routing, transparently forwarding user traffic to attacker-controlled backends for man-in-the-middle attacks or data theft.
“One Hole to Rule the Cloud”: Practical tests by multiple security vendors show that in clusters with loose default network policies, an attacker only needs code execution in any Pod to reach the admission webhook laterally and escalate to cluster-level control.
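The blast radius described above depends on whether the controller's ClusterRole really grants read access to Secrets. A minimal sketch for checking that follows, assuming the ClusterRole manifest has already been loaded into a Python dict (for example from exported YAML); the function name `grants_secret_read` is hypothetical. It inspects only the role's rules and does not resolve bindings, so a role attached through a namespaced RoleBinding would have a narrower effective scope than this check suggests.

```python
READ_VERBS = {"get", "list", "watch", "*"}


def grants_secret_read(cluster_role: dict) -> bool:
    """Return True if any rule allows reading Secrets without a name restriction."""
    for rule in cluster_role.get("rules") or []:
        api_groups = rule.get("apiGroups") or []
        resources = rule.get("resources") or []
        verbs = rule.get("verbs") or []
        in_core_group = "" in api_groups or "*" in api_groups  # Secrets live in the core group
        covers_secrets = "secrets" in resources or "*" in resources
        can_read = bool(READ_VERBS.intersection(verbs))
        unrestricted = not rule.get("resourceNames")  # resourceNames narrows the grant
        if in_core_group and covers_secrets and can_read and unrestricted:
            return True
    return False


if __name__ == "__main__":
    role = {
        "kind": "ClusterRole",
        "rules": [
            {"apiGroups": [""], "resources": ["configmaps", "secrets"], "verbs": ["list", "watch"]},
        ],
    }
    print(grants_secret_read(role))
```

If the check returns True for the controller's role, a compromise of the controller Pod should be treated as a compromise of every Secret that role can reach.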

Short-Term Remediation: Patch First, Rebuild Later

Before discussing Gateway API migration, all clusters still running ingress-nginx need to take two immediate actions: