Hands-On: Building an Automated AI Semantic Search With Cloudflare Vectorize and Gemini
In 2026, adding AI search to a personal blog is nothing new. But achieving it with zero cost, full automation, and high performance remains a technical topic worth exploring.
This article breaks down the technical architecture behind this site’s AI Search feature, showing how to combine Cloudflare Workers, Vectorize, D1, and Google Gemini to build a closed-loop RAG (Retrieval-Augmented Generation) system.
1. Core Architecture Design
Our goal is a fully automated “write and deploy” workflow: the author only pushes Markdown articles, and everything else (vector generation, index updates, frontend deployment) happens automatically.
graph TD
subgraph "Control Plane (GitHub Actions)"
Push[Git Push Markdown] --> Action[Sync Workflow]
Action -->|Extract Text| Script[Python Script]
Script -->|Embed| Gemini[Gemini API]
Script -->|Upsert Vectors| Vectorize[Cloudflare Vectorize]
end
subgraph "Data Plane (Cloudflare)"
User[User Query] -->|Request| Worker[Cloudflare Worker]
Worker -->|Embed Query| Gemini
Worker -->|Search| Vectorize
Worker -->|Log & Stats| D1[D1 SQL Database]
Worker -->|Return Matches| User
end
Key component choices:
- Embedding Model: text-embedding-004 (Google Gemini), 768 dimensions, free and performs well.
- Vector Database: Cloudflare Vectorize, edge-native with extremely low query latency.
- Persistent Storage: Cloudflare D1 (SQLite), used for storing search logs and statistics.
- Compute Runtime: Cloudflare Workers, handling business logic.
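Both the sync script and the Worker need the same basic operation: turn a piece of text into a 768-dimensional vector. The sketch below builds a Gemini `embedContent` request for text-embedding-004 using only the Python standard library. The function names are hypothetical, and the endpoint, header, and body fields follow Google's public v1beta REST documentation; they should be checked against the current API reference before use.

```python
import json
import urllib.request

# Endpoint and body shape assumed from the public Gemini REST API (v1beta).
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "text-embedding-004"  # produces 768-dimensional vectors


def build_embed_request(text: str, api_key: str,
                        task_type: str = "RETRIEVAL_DOCUMENT") -> urllib.request.Request:
    """Build (but do not send) an embedContent request for one piece of text."""
    body = {
        "model": f"models/{MODEL}",
        "content": {"parts": [{"text": text}]},
        # Articles and search queries use different task types so the model
        # can optimize each side of the retrieval pair.
        "taskType": task_type,
    }
    return urllib.request.Request(
        url=f"{GEMINI_BASE}/models/{MODEL}:embedContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def embed(text: str, api_key: str, task_type: str = "RETRIEVAL_DOCUMENT") -> list[float]:
    """Send the request and return the embedding values."""
    req = build_embed_request(text, api_key, task_type)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return payload["embedding"]["values"]
```

The sync script would embed articles with the default `RETRIEVAL_DOCUMENT` task type, while query-time embedding would pass `RETRIEVAL_QUERY`.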
2. Automated Vector Sync (Control Plane)
To avoid the hassle of “publishing an article and then manually running a script,” we built an automated sync pipeline using GitHub Actions.
Recursive File Scanning and Vectorization
Naive sync scripts often scan only the root directory, but Hugo blogs typically use the Page Bundle structure (content/posts/xxx/index.md). The script therefore has to find every Markdown file recursively and extract its metadata from the front matter.
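A minimal sketch of the scan-and-vectorize step, assuming YAML front matter between `---` lines with flat `key: value` pairs (a real script would use a YAML library, and Hugo also allows TOML and JSON front matter). The helpers `parse_front_matter`, `scan_posts`, and `to_ndjson` are hypothetical; `embed_fn` stands in for whatever function turns text into a 768-dimensional vector. The output follows the NDJSON record shape (`id`, `values`, `metadata`) that Vectorize accepts for bulk inserts and upserts.

```python
import json
from pathlib import Path


def parse_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into (front matter, body).

    Simplified: handles only '---'-delimited front matter with flat
    `key: value` lines; lists and nested YAML are not supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"').strip("'")
    return {}, text  # no closing delimiter: treat the whole file as body


def scan_posts(content_dir: str) -> list[dict]:
    """Recursively collect every Markdown file, including Page Bundles (xxx/index.md)."""
    root = Path(content_dir)
    posts = []
    for path in sorted(root.rglob("*.md")):
        if path.name == "_index.md":
            continue  # Hugo section list pages, not articles
        meta, body = parse_front_matter(path.read_text(encoding="utf-8"))
        rel = path.relative_to(root)
        # A Page Bundle is identified by its folder, not by index.md.
        if path.name == "index.md":
            slug = rel.parent.as_posix()
        else:
            slug = rel.with_suffix("").as_posix()
        posts.append({"id": slug, "title": meta.get("title", slug), "body": body})
    return posts


def to_ndjson(posts: list[dict], embed_fn) -> str:
    """Build an NDJSON upsert body: one {"id", "values", "metadata"} object per line."""
    lines = []
    for post in posts:
        # Long posts may need truncation or chunking to fit the model's input limit.
        values = embed_fn(post["title"] + "\n\n" + post["body"])
        record = {"id": post["id"], "values": values, "metadata": {"title": post["title"]}}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Using the bundle folder as the vector id keeps a post's id stable across edits, so re-running the sync overwrites its vector instead of adding a duplicate.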
GitHub Actions Trigger
Configure the Workflow to listen for changes in content/**. Once a push is detected, the sync is triggered immediately.
3. Edge Semantic Search (Data Plane)
The Worker handles frontend search requests. Its core responsibility is “translation”: converting the user’s natural language query into a vector, then searching the database for the “closest” articles.
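Vectorize performs this nearest-neighbor search internally using its index, so the Worker never computes distances itself. To make “closest” concrete, here is a brute-force sketch assuming the index uses the cosine metric, where a higher score means more similar; `cosine` and `top_k` are illustrative helpers, not part of any Cloudflare API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k document ids closest to the query, highest score first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Note that `top_k` returns k results no matter how low their scores are, which is exactly the behavior the next subsection has to guard against.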
Vector Space Distance and Threshold Control
During implementation, we discovered a key issue: vector search always returns its top K nearest neighbors, even when none of them is relevant. For example, a search for “Master of Laws” might still return an article about “Prometheus monitoring,” simply because a few implicit dimensions (such as “learning” or “exam”) overlap slightly.
To solve this, we introduced dynamic threshold logic at the Worker layer, discarding matches whose similarity score falls below a cutoff.
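A minimal sketch of the filtering step, using a fixed cutoff of 0.40. The article does not spell out what makes its threshold dynamic, so that part is not reproduced. It also assumes, without confirmation from the text, that `has_results` is computed after filtering, so a query whose only matches are weak counts as unanswered. `Match`, `MIN_SCORE`, and `filter_matches` are hypothetical names.

```python
from dataclasses import dataclass

MIN_SCORE = 0.40  # empirical cutoff quoted in the article


@dataclass
class Match:
    id: str
    score: float  # similarity from the vector index; higher is closer (cosine metric assumed)


def filter_matches(matches: list[Match], min_score: float = MIN_SCORE) -> tuple[list[Match], bool]:
    """Drop matches below the cutoff and report whether anything survived.

    The boolean is the value that would be logged as has_results.
    """
    kept = [m for m in matches if m.score >= min_score]
    return kept, len(kept) > 0
```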
This 0.40 threshold is an empirical value derived from extensive testing. Matches below this score are typically noise.
4. Content Gap Insight System
This is the most distinctive feature of this site’s AI search: it tells users what exists, and it also tells the author what’s missing.
We use the D1 database to record the has_results status for every search. By aggregating queries where has_results = 0, we can generate an “Unanswered Questions (Content Gaps)” list.
SQL Aggregation Analysis
The Worker exposes an action=stats endpoint that aggregates these logs with a SQL query.
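A minimal sketch of the logging table and the gap query, run against SQLite (the engine D1 is built on) so it can be tried locally. Only the has_results column comes from the article; the table name `search_logs`, the `query` and `created_at` columns, the lower-case/trim normalization, and the helper functions are assumptions. In the Worker, the same statements would go through the D1 binding (`prepare(...).bind(...).all()`) instead of Python's `sqlite3`.

```python
import sqlite3

# Hypothetical schema: the article only names the has_results column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS search_logs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    query       TEXT    NOT NULL,
    has_results INTEGER NOT NULL,  -- 1 if any match passed the threshold, else 0
    created_at  TEXT    NOT NULL DEFAULT (datetime('now'))
)
"""

# Content gaps: the most frequent queries that returned nothing.
GAPS_SQL = """
SELECT LOWER(TRIM(query)) AS q, COUNT(*) AS hits
FROM search_logs
WHERE has_results = 0
GROUP BY q
ORDER BY hits DESC, q ASC
LIMIT ?
"""


def log_search(db: sqlite3.Connection, query: str, has_results: bool) -> None:
    """Record one search and whether it produced any results."""
    db.execute(
        "INSERT INTO search_logs (query, has_results) VALUES (?, ?)",
        (query, int(has_results)),
    )


def content_gaps(db: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Return (normalized query, count) pairs for unanswered searches."""
    return db.execute(GAPS_SQL, (limit,)).fetchall()
```

Normalizing case and whitespace before grouping keeps “Master of Laws” and “master of laws ” from being counted as two separate gaps.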
The frontend renders the result as an “Everyone is asking (Unanswered)” panel.
This closes the loop between reader demand and content production:
- User searches with no results.
- System records it as a Content Gap.
- Author sees the demand on the Dashboard.
- Author writes a new article.
- Vectors are automatically synced, filling the Gap.
5. Summary
Using the Cloudflare stack (Workers + Vectorize + D1) together with Gemini embeddings, we built a complete AI search system in fewer than 200 lines of code. It is fast, because it runs at the edge, and at a personal blog’s scale it costs nothing.
Most importantly, it transforms a blog from a one-way static site into a dynamic system that can sense user needs and guide content creation.