<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>MacOS - Tag - Shengxu · Cloud Architecture &amp; DevOps</title><link>https://shengxu.pages.dev/en/tags/macos/</link><description>Cloud architecture &amp; DevOps notes by Shengxu: Kubernetes, Cilium, observability, LLM infra, AI agents.</description><generator>Hugo 0.153.2 &amp; FixIt v0.4.0-alpha.3-20251225101113-8ffb9a95</generator><language>en</language><lastBuildDate>Sat, 14 Feb 2026 10:18:00 +0800</lastBuildDate><atom:link href="https://shengxu.pages.dev/en/tags/macos/index.xml" rel="self" type="application/rss+xml"/><item><title>Weekend Project: Building a Local Load Balancer for LLM API Keys</title><link>https://shengxu.pages.dev/en/posts/llm-api-load-balancer/</link><pubDate>Sat, 14 Feb 2026 10:18:00 +0800</pubDate><guid>https://shengxu.pages.dev/en/posts/llm-api-load-balancer/</guid><category domain="https://shengxu.pages.dev/en/categories/ai/">AI</category><category domain="https://shengxu.pages.dev/en/categories/devops/">DevOps</category><category domain="https://shengxu.pages.dev/en/categories/observability/">Observability</category><description>&lt;p&gt;Lately, because I&amp;rsquo;ve been using various LLM services (OpenAI, Gemini, DeepSeek, etc.) intensively, I&amp;rsquo;ve run into a very real pain point: &lt;strong&gt;being broke&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;To save money, I applied for multiple free API keys (like Google Gemini&amp;rsquo;s Free Tier or DeepSeek&amp;rsquo;s complimentary credits), but these free keys often come with strict rate limits (RPM/TPM). Just when I&amp;rsquo;m in the flow writing code, a &lt;code&gt;429 Too Many Requests&lt;/code&gt; error pops up, completely breaking my train of thought. It&amp;rsquo;s really frustrating.&lt;/p&gt;</description></item></channel></rss>