Hands-On: Building an Automated AI Semantic Search With Cloudflare Vectorize and Gemini

By Shengxu, filed under AI DevOps

2026-01-23, about 1,000 words, 5-minute read

In 2026, adding AI search to a personal blog is nothing new. But achieving it with zero cost, full automation, and high performance remains a technical topic worth exploring.

This article breaks down the technical architecture behind this site’s AI Search feature, showing how to combine Cloudflare Workers, Vectorize, D1, and Google Gemini to build a closed-loop RAG (Retrieval-Augmented Generation) system.

1. Core Architecture Design

Our goal is a fully automated workflow: write and deploy. The author only needs to push Markdown articles; everything else—vector generation, index updates, frontend deployment—is automated.

graph TD
    subgraph "Control Plane (GitHub Actions)"
        Push[Git Push Markdown] --> Action[Sync Workflow]
        Action -->|Extract Text| Script[Python Script]
        Script -->|Embed| Gemini[Gemini API]
        Script -->|Upsert Vectors| Vectorize[Cloudflare Vectorize]
    end

    subgraph "Data Plane (Cloudflare)"
        User[User Query] -->|Request| Worker[Cloudflare Worker]
        Worker -->|Embed Query| Gemini
        Worker -->|Search| Vectorize
        Worker -->|Log & Stats| D1[D1 SQL Database]
        Worker -->|Return Matches| User
    end

Key component choices:

Embedding Model: text-embedding-004 (Google Gemini), 768 dimensions, free and performs well.
Vector Database: Cloudflare Vectorize, edge-native with extremely low query latency.
Persistent Storage: Cloudflare D1 (SQLite), used for storing search logs and statistics.
Compute Runtime: Cloudflare Workers, handling business logic.

2. Automated Vector Sync (Control Plane)

To avoid the hassle of “publishing an article and then manually running a script,” we built an automated sync pipeline using GitHub Actions.

Recursive File Scanning and Vectorization

Traditional sync scripts often only scan the root directory, but Hugo blogs typically use a Page Bundle structure (content/posts/xxx/index.md). We need to recursively find all Markdown files and extract metadata from the Frontmatter.

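# scripts/sync_vectors.py core logic
import glob
import os

import frontmatter  # python-frontmatter package


def sync_all():
    vectors = []
    # 1. Recursively find all articles, including Page Bundles (content/posts/xxx/index.md)
    files = glob.glob("content/posts/**/*.md", recursive=True)

    for filepath in files:
        post = frontmatter.load(filepath)
        title = post.metadata.get("title", "")
        description = post.metadata.get("description", "")
        # 2. Build semantic fingerprint: title + description + first 800 characters of the body
        text_chunk = f"{title} {description} {post.content[:800]}"

        # 3. Call Gemini to generate embedding (get_embedding is defined elsewhere in the script)
        embedding = get_embedding(text_chunk)

        # Assumption: the slug is the bundle directory name, or the file stem for single-file posts
        stem = os.path.splitext(os.path.basename(filepath))[0]
        slug = os.path.basename(os.path.dirname(filepath)) if stem == "index" else stem

        # 4. Prepare Upsert data (assumption: the URL is /posts/<slug>/ unless the Frontmatter sets one)
        vectors.append({
            "id": slug,
            "values": embedding,
            "metadata": {"title": title, "url": post.metadata.get("url", f"/posts/{slug}/")},
        })

    return vectors  # passed on to the Vectorize upsert step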
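
The snippet above ends once the `vectors` list is built. As a minimal sketch of the next step, assuming the records are handed to Vectorize as newline-delimited JSON (the format its bulk insert and upsert tooling reads, to our understanding), the list could be serialized as follows. `to_ndjson` and `EXPECTED_DIMENSIONS` are illustrative names, not part of the original script.

```python
import json

# Assumption: the index was created with 768 dimensions to match text-embedding-004.
EXPECTED_DIMENSIONS = 768


def to_ndjson(vectors):
    """Serialize vector records to newline-delimited JSON, one record per line."""
    lines = []
    for record in vectors:
        # Fail early on a dimension mismatch instead of letting the upload reject it.
        if len(record["values"]) != EXPECTED_DIMENSIONS:
            raise ValueError(
                f"{record['id']}: expected {EXPECTED_DIMENSIONS} dimensions, "
                f"got {len(record['values'])}"
            )
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{
        "id": "hello-world",
        "values": [0.0] * EXPECTED_DIMENSIONS,
        "metadata": {"title": "Hello", "url": "/posts/hello-world/"},
    }]
    payload = to_ndjson(sample)
    print(f"{len(payload.splitlines())} record(s), {len(payload)} bytes")
```

Checking the dimension count before upload is a design choice rather than a requirement: it surfaces a wrong model or a truncated embedding in the CI log instead of at query time.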

GitHub Actions Trigger

Configure the Workflow to listen for changes in content/**. Once a push is detected, the sync is triggered immediately.

# .github/workflows/ai-sync.yml
on:
  push:
    paths:
      - 'content/**'  # Only trigger on content changes

3. Edge Semantic Search (Data Plane)

The Worker handles frontend search requests. Its core responsibility is “translation”: converting the user’s natural language query into a vector, then searching the database for the “closest” articles.
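
To make “closest” concrete, here is a brute-force sketch of what a top-K vector query does conceptually. It assumes cosine similarity as the metric (the article does not say which metric the index uses), and the three-dimensional toy vectors and the names `cosine_similarity` and `top_k` are purely illustrative; Vectorize itself uses an approximate index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=5):
    """Score every stored vector against the query and return the k best, best first."""
    scored = [
        {"id": doc_id, "score": cosine_similarity(query_vector, vector)}
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda match: match["score"], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones have 768 dimensions.
    toy_index = {
        "prometheus-monitoring": [0.9, 0.1, 0.0],
        "hugo-page-bundles": [0.1, 0.9, 0.1],
        "cloudflare-workers": [0.2, 0.3, 0.9],
    }
    for match in top_k([0.8, 0.2, 0.1], toy_index, k=2):
        print(match["id"], round(match["score"], 3))
```

Note that `top_k` ranks whatever is stored: it returns k entries for any query, however distant they are.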

Vector Space Distance and Threshold Control

During implementation, we discovered a key issue: a vector search always returns its top-K nearest neighbors, even when none of them is relevant. For example, searching for “Master of Laws” might return an article about “Prometheus monitoring,” simply because some implicit dimensions (like “learning” or “exam”) overlap slightly.

To solve this, we added a similarity-score threshold at the Worker layer:

// my-blog-ai/src/index.js
const matches = await env.VECTOR_INDEX.query(vector, { topK: 5 });

// Determine if it's a "valid search"
// Only consider it a match if the most relevant result's similarity > 0.40
const hasResults = matches.matches.length > 0 && matches.matches[0].score > 0.40;

// Asynchronously log to D1 for later analysis
ctx.waitUntil(
  env.DB.prepare("INSERT INTO search_logs (query, has_results) VALUES (?, ?)")
    .bind(query, hasResults ? 1 : 0).run()
);

This 0.40 threshold is an empirical value derived from extensive testing. Matches below this score are typically noise.
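
The gating rule can be isolated and tested on its own. The sketch below mirrors the Worker's check in Python, assuming matches arrive sorted best first as dictionaries with a `score` key; `filter_matches` is a hypothetical extension (dropping individual low-score matches) that the Worker code shown here does not perform.

```python
SCORE_THRESHOLD = 0.40  # the empirical value quoted in the article


def has_valid_results(matches, threshold=SCORE_THRESHOLD):
    """True only if the best match (first in the list) scores strictly above the threshold."""
    return len(matches) > 0 and matches[0]["score"] > threshold


def filter_matches(matches, threshold=SCORE_THRESHOLD):
    """Hypothetical extension: keep only the matches that clear the threshold."""
    return [match for match in matches if match["score"] > threshold]


if __name__ == "__main__":
    relevant = [{"id": "prometheus-monitoring", "score": 0.71},
                {"id": "cloudflare-workers", "score": 0.33}]
    noise = [{"id": "prometheus-monitoring", "score": 0.28}]

    print(has_valid_results(relevant))   # best score clears the threshold
    print(has_valid_results(noise))      # best score is below the threshold
    print([m["id"] for m in filter_matches(relevant)])
```

Because the comparison is strict, a score of exactly 0.40 counts as no result, the same as in the JavaScript version.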

4. Content Gap Insight System

This is the most distinctive feature of this site’s AI search: it not only tells users what exists, but also tells the author what’s missing.

We use the D1 database to record the has_results status for every search. By aggregating queries where has_results = 0, we can generate an “Unanswered Questions (Content Gaps)” list.

SQL Aggregation Analysis

The Worker exposes an action=stats endpoint that executes the following SQL:

-- Find high-frequency questions users searched for but got no results
SELECT query, COUNT(*) as count 
FROM search_logs 
WHERE has_results = 0 
GROUP BY query 
ORDER BY count DESC 
LIMIT 10
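
Since D1 is SQLite-based, the aggregation can be tried locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table has only the two columns the Worker's INSERT uses, and the sample rows and the helper names `make_demo_db` and `content_gaps` are invented for illustration.

```python
import sqlite3

GAP_QUERY = """
SELECT query, COUNT(*) AS count
FROM search_logs
WHERE has_results = 0
GROUP BY query
ORDER BY count DESC
LIMIT 10
"""


def make_demo_db():
    """Create an in-memory database with an assumed two-column schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_logs (query TEXT NOT NULL, has_results INTEGER NOT NULL)"
    )
    rows = [
        ("master of laws", 0),
        ("master of laws", 0),
        ("kubernetes operators", 0),
        ("prometheus", 1),  # answered search, excluded from the gap list
    ]
    conn.executemany(
        "INSERT INTO search_logs (query, has_results) VALUES (?, ?)", rows
    )
    return conn


def content_gaps(conn):
    """Return (query, count) pairs for unanswered searches, most frequent first."""
    return conn.execute(GAP_QUERY).fetchall()


if __name__ == "__main__":
    for query, count in content_gaps(make_demo_db()):
        print(f"{count:>3}  {query}")
```

The query groups on the raw string, so “Master of Laws” and “master of laws” would be counted separately; normalizing case and whitespace before logging is one way to avoid that.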

The frontend renders this as an “Everyone is asking (Unanswered)” panel:

(Illustration: Popular valid searches on the left, content users are interested in but missing from this site on the right)

This creates a closed feedback loop for content production:

User searches with no results.
System records it as a Content Gap.
Author sees the demand on the Dashboard.
Author writes a new article.
Vectors are automatically synced, filling the Gap.

5. Summary

Using the Cloudflare family (Workers + Vectorize + D1), we built an enterprise-grade AI search system with fewer than 200 lines of code. It’s not only extremely fast (edge computing) but also completely free (for a personal blog’s scale).

Most importantly, it transforms a blog from a one-way static site into a dynamic system that can sense user needs and guide content creation.
