Practical Guide · Building a Memory-Enabled AI Writing Partner (Part 1): Multi-Agent Architecture Evolution
When writing a long novel, the most painful part isn’t “not being able to write,” but “forgetting what you’ve already written”: Did I set up that foreshadowing properly? Was the character already injured in the last chapter? When exactly was that specific rule established? Once the word count reaches hundreds of thousands, relying solely on human memory and scattered notes quickly spirals out of control.
FantasyNovelAgent grew out of this very need, evolving step by step: starting as a simple Python script, then adding dynamic memory and automatic archiving, followed by multi-device sync support, and finally moving toward a front-end/back-end separation with a cloud-native storage prototype. This article reviews that evolutionary path and explains the key trade-offs made along the way, offering a reference for similar projects.
If you’d like to try the project yourself, an online demo is available (feel free to test it). To prevent abuse and runaway costs, the demo requires you to enter your own LLM API key in the settings before it will actually call the model.

1. Core Features: How AI Writes Like a Partner
Before diving into the technical architecture, let’s look at what it can do. FantasyNovelAgent is not a simple “continuation tool”; it’s more like a “writing studio” staffed by multiple experts.
1.1 Brainstorming
When you hit a wall, click “Auto Brainstorm.” The system analyzes the plot direction of the last 10 chapters, unresolved plot points (future plans), and the world’s setting, then provides 3 distinct plot branches. You can choose one or blend their ideas.
1.2 Writing & Polishing
- Muse: Handles the “skeleton.” Based on your chosen outline, it quickly generates a ~2000-word first draft, focusing on plot progression and planting foreshadowing.
- Stylist: Handles the “flesh.” It deeply polishes the draft, transforming a bland “he threw a punch” into “a fist howled through the air, carrying the force of a thunderbolt…”, ensuring the style matches the tone of a “modern xianxia power fantasy.”
1.3 Active Memory System
This is the project’s killer feature. You don’t need to manually maintain a “character sheet” or “inventory.”
- The Archivist works silently in the background. After you finish a chapter, it automatically analyzes the text: “The protagonist obtained the ‘Azure Cloud Sword’.” “‘Li Si’ was mortally wounded and died.”
- This information is extracted as structured data and stored in the SQLite database. When writing the next chapter, the AI won’t confuse whether the protagonist is holding a sword or a knife.
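As a minimal sketch of how extracted facts might land in SQLite: the `facts` table, its columns, and the helpers `save_facts` and `latest_fact` are assumptions made for illustration, and the LLM extraction step is replaced by a hard-coded list.

```python
import sqlite3

# Hypothetical schema: one row per extracted fact. The project's real
# tables are not shown in this article, so these names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    chapter INTEGER NOT NULL,
    subject TEXT NOT NULL,
    kind TEXT NOT NULL,
    value TEXT NOT NULL
)
"""

def save_facts(conn, chapter, facts):
    """Store (subject, kind, value) tuples extracted from one chapter."""
    conn.execute(SCHEMA)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO facts (chapter, subject, kind, value) VALUES (?, ?, ?, ?)",
            [(chapter, subject, kind, value) for subject, kind, value in facts],
        )

def latest_fact(conn, subject, kind):
    """Return the most recently recorded value for a subject, or None."""
    row = conn.execute(
        "SELECT value FROM facts WHERE subject = ? AND kind = ? "
        "ORDER BY chapter DESC, id DESC LIMIT 1",
        (subject, kind),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
# Stand-in for the Archivist's output after chapter 12.
save_facts(conn, 12, [
    ("protagonist", "item", "Azure Cloud Sword"),
    ("Li Si", "status", "dead"),
])
```

Before drafting the next chapter, a context builder could call `latest_fact(conn, "protagonist", "item")` to confirm which weapon the protagonist is carrying.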

graph TD
User[User Input] --> Router{Intent Router}
Router -->|Writing| Muse[Muse]
Router -->|Polishing| Stylist[Stylist]
Router -->|Checking| Guard[Guard]
Context[(Context Builder)] --> Muse
Context --> Stylist
Muse --> Result[Generated Content]
Result --> Archivist[Archivist]
Archivist -->|Extract & Update| Memory[(Memory/DB)]
Memory --> Context
1.4 Logic Guard
Want the protagonist to suddenly learn a forbidden technique from a rival sect? The Guard will immediately warn you: “Detected setting conflict: This forbidden technique requires ‘Demonic Bloodline,’ but the protagonist currently has a ‘Pure Yang Body’.”
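The kind of check behind that warning can be sketched with plain rules. The article describes the Guard as one of the Agents, so this rule-based version is only a stand-in; the dictionaries and the `check_learn` function are hypothetical.

```python
# Hypothetical rule data. In the project the Guard is an agent, so this
# lookup-based check only illustrates the idea of a setting conflict.
TECHNIQUE_REQUIREMENTS = {"Forbidden Technique": "Demonic Bloodline"}
CHARACTER_TRAITS = {"protagonist": {"Pure Yang Body"}}

def check_learn(character, technique):
    """Return a warning string if a required trait is missing, else None."""
    required = TECHNIQUE_REQUIREMENTS.get(technique)
    traits = CHARACTER_TRAITS.get(character, set())
    if required and required not in traits:
        current = ", ".join(sorted(traits)) or "no recorded traits"
        return (
            f"Detected setting conflict: {technique} requires "
            f"'{required}', but {character} currently has: {current}."
        )
    return None

warning = check_learn("protagonist", "Forbidden Technique")
```

In this sketch the structured facts act as hard constraints: the check consults recorded traits rather than asking a model to recall them.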
1.5 LLM Strategy
To achieve the best results, I didn’t tie the project to a single model; instead I adopted a “horses for courses” strategy:
| Task Type | Recommended Model | Reason |
|---|---|---|
| Logic Check / Complex Reasoning | DeepSeek R1 / OpenAI o1 | These “reasoning” models perform long chain-of-thought (CoT) thinking before outputting, making them excellent for finding plot holes or designing complex intellectual battles. |
| Drafting / Polishing | Claude 3.5 Sonnet / GPT-4o | Excellent prose, natural language flow, especially good at environmental descriptions and emotional rendering. |
| Memory Extraction / Summarization | Gemini Flash / DeepSeek V3 | Fast, low cost, large context window, suitable for processing large volumes of text for analysis tasks. |
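One simple way to encode the table is a task-to-model lookup. This is a sketch: the task keys, the `pick_model` helper, and the fallback choice are assumptions, and the strings are labels from the table rather than exact API model identifiers.

```python
# Labels copied from the table above; they are not exact API model names.
MODEL_BY_TASK = {
    "logic_check": "DeepSeek R1",
    "drafting": "Claude 3.5 Sonnet",
    "polishing": "Claude 3.5 Sonnet",
    "memory_extraction": "Gemini Flash",
    "summarization": "Gemini Flash",
}
DEFAULT_MODEL = "GPT-4o"  # assumed fallback for tasks not listed above

def pick_model(task):
    """Return the model label configured for a task type."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)
```

Keeping the mapping in one place means a model can be swapped without touching agent code.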

2. Architecture Evolution: From Files to Database
In the project’s early days, to quickly validate the idea, I used the simplest “file system storage” approach.
- Chapters: Each chapter was a `.txt` file.
- Memory: Character cards, world settings, and plot outlines were stored as `character_db.json`, `world_settings.md`, etc.
- Advantages: Extremely fast development, Git-friendly version control, human-readable.
- Disadvantages: As the number of chapters grew (e.g., reaching chapter 100), the `data/` directory would become cluttered with hundreds of small files. File I/O became frequent, and complex queries (like “search all chapters mentioning ‘Azure Cloud Sword’”) were difficult.
3. Feature Completion and Automation
As the core logic solidified, I introduced more engineering features:
- Intent Router: Routes requests to different Agents based on the user’s natural language instruction (“Help me write a fight scene” vs. “Check this chapter for bugs”).
- Usage Tracking: Integrated token consumption statistics for clear cost visibility.
- Auto-Archiving: When the user clicks “Save,” the system not only writes the file but also triggers a series of background tasks—updating the summary chain, checking the completion status of future plans, etc.
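The routing step can be sketched with keyword matching. This is an assumption for illustration: the article does not say how the router classifies intent (it may use an LLM), and the keyword lists and the `route` function are hypothetical.

```python
# Keyword lists are illustrative assumptions. Order matters: "rewrite"
# must be tested before "write" so polishing requests reach the Stylist.
ROUTES = [
    ("guard", ("check", "bug", "conflict", "consistency")),
    ("stylist", ("polish", "rewrite", "refine")),
    ("muse", ("write", "draft", "continue")),
]

def route(instruction, default="muse"):
    """Return the name of the agent that should handle the instruction."""
    text = instruction.lower()
    for agent, keywords in ROUTES:
        if any(word in text for word in keywords):
            return agent
    return default
```

With these lists, `route("Help me write a fight scene")` selects the Muse and `route("Check this chapter for bugs")` selects the Guard.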
4. Deployment: Putting AI on a Raspberry Pi
To enable writing anytime, anywhere, I deployed the project on my home Raspberry Pi.
- Tunneling: Used Cloudflare Tunnel for secure access via a custom domain without needing a public IP.
- Automated Ops: Wrote `systemd` service scripts for auto-start on boot and process monitoring.
- One-Click Deploy: Developed a `deploy.sh` script. After writing code on my Mac, a single command automatically handles Git commit, code sync (Rsync), and remote service restart.
5. Key Turning Point: SQLite Architecture Refactoring
This was the most significant recent overhaul of the underlying storage layer.
As the drawbacks of the “file-as-database” model became increasingly apparent, I decided to introduce SQLite.
5.1 Why Change?
- Data Integrity: The file system lacks transaction support; a write interruption could corrupt JSON files.
- Query Capability: I needed more powerful retrieval to support the AI’s “long-term memory.”
- Deployment Complexity: Syncing 1000 small files is far more error-prone than syncing a single `.db` file.
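The data-integrity point can be illustrated with Python’s built-in `sqlite3` module. In this sketch the two tables and the `save_chapter` helper are hypothetical; the failure is simulated by raising an exception halfway through the write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chapters (number INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE summaries (number INTEGER PRIMARY KEY, text TEXT)")

def save_chapter(conn, number, body, summary):
    """Write a chapter and its summary as one atomic unit."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO chapters VALUES (?, ?)", (number, body))
        if summary is None:
            # Simulated interruption between the two writes.
            raise ValueError("summary generation failed")
        conn.execute("INSERT INTO summaries VALUES (?, ?)", (number, summary))

save_chapter(conn, 1, "Chapter one text", "Hero leaves home")
try:
    save_chapter(conn, 2, "Chapter two text", None)
except ValueError:
    pass  # the half-written chapter 2 has been rolled back

chapter_count = conn.execute("SELECT COUNT(*) FROM chapters").fetchone()[0]
summary_count = conn.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
```

After the failed second save, both tables still hold exactly one row, whereas two separate file writes could have left a chapter without its summary.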
5.2 Refactoring Plan
I designed an Abstract Storage Layer:
- Interface-based: Decoupled the business logic in `memory_manager.py` from the underlying I/O.
- Data Migration: Wrote scripts to seamlessly import old JSON/TXT data into `novel.db`.
- Hybrid Architecture:
  - Core Data (chapters, memories, drafts) → SQLite
  - Configuration & Logs (API Keys, Logs) → Separate JSON files (easier for Git to ignore and for log rotation)
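An interface-based storage layer along these lines might look as follows. This is a sketch under assumptions: `ChapterStore`, its two methods, and both implementations are hypothetical names, not the actual contents of `memory_manager.py`.

```python
import sqlite3
from typing import Dict, Optional, Protocol

class ChapterStore(Protocol):
    """Hypothetical interface that the business logic depends on."""
    def save_chapter(self, number: int, body: str) -> None: ...
    def load_chapter(self, number: int) -> Optional[str]: ...

class SQLiteChapterStore:
    def __init__(self, path: str = "novel.db") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chapters "
            "(number INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save_chapter(self, number: int, body: str) -> None:
        with self._conn:  # one transaction per save
            self._conn.execute(
                "INSERT OR REPLACE INTO chapters (number, body) VALUES (?, ?)",
                (number, body),
            )

    def load_chapter(self, number: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT body FROM chapters WHERE number = ?", (number,)
        ).fetchone()
        return row[0] if row else None

class InMemoryChapterStore:
    """Drop-in replacement, handy for tests."""
    def __init__(self) -> None:
        self._chapters: Dict[int, str] = {}

    def save_chapter(self, number: int, body: str) -> None:
        self._chapters[number] = body

    def load_chapter(self, number: int) -> Optional[str]:
        return self._chapters.get(number)

def word_count(store: ChapterStore, number: int) -> int:
    """Business logic that only knows about the interface."""
    body = store.load_chapter(number)
    return len(body.split()) if body else 0
```

Because `word_count` accepts anything that satisfies `ChapterStore`, the backend can change without touching the calling code.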
5.3 Bidirectional Sync Flow
To prevent the disaster of “writing new chapters on the Raspberry Pi, only to have them overwritten by a stale copy pushed from the Mac,” I added protection against data rollback to the deployment script:
- Sync Back: Before deployment, the script pulls the latest `novel.db` from the Raspberry Pi to the local machine.
- Backup: Automatically commits the pulled data to a private repository for backup.
- Push: Only after ensuring data safety does it push the new code to the Raspberry Pi.
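The ordering of these steps can be sketched in Python. The real `deploy.sh` is a shell script that is not shown here, so everything below is an assumption: the host, remote directory, and service name are placeholders, and the working directory is assumed to be a clone of the private backup repository.

```python
import subprocess

def build_deploy_plan(host, remote_dir, service):
    """Return the commands in the order they must run."""
    return [
        # 1. Sync back: pull the latest database from the Pi first.
        ["rsync", "-az", f"{host}:{remote_dir}/novel.db", "novel.db"],
        # 2. Backup: commit the pulled data to the private repository.
        ["git", "add", "novel.db"],
        ["git", "commit", "-m", "Backup novel.db before deploy"],
        ["git", "push"],
        # 3. Push: only now send the new code (and the fresh DB) to the Pi.
        ["rsync", "-az", "--exclude", ".git", "./", f"{host}:{remote_dir}/"],
        # 4. Restart the service on the Pi.
        ["ssh", host, "sudo", "systemctl", "restart", service],
    ]

def deploy(plan, dry_run=True):
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # abort at the first failure

# Placeholder values, not the author's actual host or paths.
plan = build_deploy_plan("pi@raspberrypi.local", "/home/pi/novel-agent", "novel-agent")
deploy(plan)  # dry run: prints the commands without executing them
```

Note that `git commit` exits with a non-zero status when there is nothing to commit, so a real script would need to tolerate that case instead of aborting.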
sequenceDiagram
participant Mac as Local Mac
participant GitHub as Backup Repo
participant Pi as Raspberry Pi
Note over Mac: Run deploy.sh
Mac->>Pi: 1. Pull remote data (Sync Back)
Pi-->>Mac: Return latest novel.db
Mac->>GitHub: 2. Backup data
Mac->>Pi: 3. Push new code & DB (Rsync)
Mac->>Pi: 4. Restart service (Systemd)
6. Transition Phase: Front-End/Back-End Separation (The Great Decoupling)
Before moving towards a more “service-oriented” architecture, I realized the current Streamlit monolith was becoming bloated: UI rendering, business logic, and database operations were all crammed into one entry point.
To support potential future mobile apps or multi-user collaboration, I planned a front-end/back-end separation:
- Backend as API: Introduced FastAPI to encapsulate the capabilities of Agents like `Muse` and `Guard` into standard REST interfaces (e.g., `/api/v1/brainstorm`).
- Lightweight Frontend: Streamlit would be relegated to a pure “frontend panel,” responsible only for display and sending requests; it could later be replaced by React/Vue.
- Independent Deployment: The backend could run independently in a Docker container, serving multiple frontends.
While this step doesn’t involve changes to the underlying storage, it’s a crucial leap from “script” to “platform”: once the boundaries are clear, the system can more naturally expand towards platform capabilities like multi-tenancy, permission isolation, canary releases, and asynchronous tasks.
7. Future Outlook: Cloud Native Architecture
Phase Two: Retrieval Upgrade (SQLite + Vector Retrieval Dual System)
As the story grows longer, simply “remembering facts” isn’t enough. The system needs to both maintain structured facts (who holds what, who is injured, which settings are active) and perform fuzzy recall during writing (similar scenes, atmospheric text, foreshadowing/memory triggers, character voice consistency). Therefore, I define the next phase’s goal as a SQLite + Vector Retrieval Dual System:
- SQLite continues to handle “facts and structured memory”: Verifiable, traceable data like character states, settings, and timelines that can be used for constraint checking.
- Vector retrieval handles “fuzzy recall”: Similar passages, related dialogues, writing references for similar scenes, and semantically related content that can activate “foreshadowing/memory” triggers.
The corresponding deliverables will be more engineering-oriented and iterable:
- A Pluggable Retrieval Module: Exposes a unified interface `retrieve(query) -> passages[]` to the upper layers, with swappable underlying implementations (SQLite built-in / sidecar index / remote vector database).
- Context Assembly Rules: For writing/polishing/Q&A, the context is assembled uniformly with the priority: “structured facts + vector recall passages (TopK) + recent chapters,” ensuring both reliability and inspiration.
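The two deliverables above can be sketched together. Only the `retrieve(query)` shape and the assembly priority come from the text; `KeywordRetriever` is a naive stand-in for a vector backend, and `assemble_context` and its section titles are hypothetical.

```python
from typing import List, Protocol, Sequence

class Retriever(Protocol):
    """Unified interface from the text: retrieve(query) -> passages."""
    def retrieve(self, query: str) -> List[str]: ...

class KeywordRetriever:
    """Stand-in backend: keyword overlap instead of vector similarity."""
    def __init__(self, passages: Sequence[str], top_k: int = 3) -> None:
        self._passages = list(passages)
        self._top_k = top_k

    def retrieve(self, query: str) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(p.lower().split())), p) for p in self._passages]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [passage for _, passage in hits[: self._top_k]]

def assemble_context(facts, retriever, query, recent_chapters):
    """Order: structured facts, then recalled passages, then recent chapters."""
    sections = [
        ("Structured facts", facts),
        ("Recalled passages", retriever.retrieve(query)),
        ("Recent chapters", recent_chapters),
    ]
    lines = []
    for title, items in sections:
        if items:  # skip empty sections entirely
            lines.append(f"## {title}")
            lines.extend(items)
    return "\n".join(lines)

retriever = KeywordRetriever([
    "The Azure Cloud Sword hummed in its sheath",
    "Rain fell on the market street",
])
context = assemble_context(
    ["Protagonist holds the Azure Cloud Sword"],
    retriever,
    "azure sword",
    ["Chapter 41 summary"],
)
```

Swapping in a vector-backed class later only requires it to expose the same `retrieve` method.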
For gradual implementation, I’ll prioritize a “local closed-loop → then replace” path:
- Start Local: Add an `embeddings` table in SQLite or use a sidecar file index to first get the “vector recall loop” working, validating chunking strategies, recall quality, and context assembly tactics.
- Then Replace: When multi-device/multi-user/larger scale is needed, migrate to pgvector/Milvus/Pinecone, which are better suited for online retrieval and concurrency.
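A “start local” loop can be as small as the sketch below. It assumes vectors are stored as JSON text in a hypothetical `embeddings` table and compared by brute-force cosine similarity; the three-dimensional vectors are toy values standing in for real model embeddings.

```python
import json
import math
import sqlite3

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings ("
    "chunk_id INTEGER PRIMARY KEY, text TEXT NOT NULL, vector TEXT NOT NULL)"
)

def add_chunk(conn, text, vector):
    with conn:
        conn.execute(
            "INSERT INTO embeddings (text, vector) VALUES (?, ?)",
            (text, json.dumps(vector)),
        )

def top_k(conn, query_vector, k=3):
    """Scan every row and return the k most similar chunk texts."""
    rows = conn.execute("SELECT text, vector FROM embeddings").fetchall()
    scored = [(cosine(query_vector, json.loads(vec)), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy vectors; a real system would obtain these from an embedding model.
add_chunk(conn, "sword scene", [1.0, 0.0, 0.0])
add_chunk(conn, "market scene", [0.0, 1.0, 0.0])
add_chunk(conn, "duel scene", [0.9, 0.1, 0.0])
```

A full scan like this grows linearly with the number of chunks, which is one reason the plan later migrates to a dedicated vector store.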
Here are two design principles I believe must be upheld:
- Chunking Strategy Matters More Than “Which Vector DB”: Chunking by paragraph, event, or dialogue often yields significantly better recall usability than chunking by a fixed number of words (especially for tasks like “character voice consistency” and “foreshadowing callback”).
- Facts First (Conflict Resolution): When a vector-recalled passage conflicts with a structured fact in SQLite, SQLite takes precedence. Vector recall provides inspiration and context, not the “source of truth” for the world’s facts.
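As a sketch of the first principle, paragraph-based chunking can be written in a few lines. The `chunk_by_paragraph` function and its `max_chars` budget are assumptions; event-based or dialogue-based chunking would need richer segmentation than blank lines.

```python
import re

def chunk_by_paragraph(text, max_chars=1200):
    """Split on blank lines, then merge neighbouring paragraphs up to max_chars.

    A single paragraph longer than max_chars is kept whole, so a chunk
    never cuts a paragraph in the middle.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample = "A first paragraph.\n\nA second one.\n\nThird."
small_chunks = chunk_by_paragraph(sample, max_chars=20)
```

Compared with fixed-size splitting, each chunk here starts and ends on a paragraph boundary, which keeps a line of dialogue or a narrative beat intact when it is recalled.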
Phase Three: Cloud Native Prototype (Database + Object Storage)
SQLite is just the first step. Once the novel reaches millions of words, I still plan to move to a “Database + Object Storage” layout:
| Data Type | Storage Solution | Reason |
|---|---|---|
| Metadata/Index | Cloudflare D1 / AWS RDS | Chapter lists, character relationship graphs, etc., require high-frequency, complex structured queries. |
| Content/Materials | Cloudflare R2 / AWS S3 | Novel text and illustrations are large in size but simple to read/write; separating storage significantly reduces database load. |
To make “multi-device writing + multi-device sync” truly reliable, the core of the next phase will no longer be “can it generate,” but “can it stably govern creative assets long-term”: data consistency, backup and rollback, permissions and auditing, cost and observability will all gradually become the main themes of architectural evolution.
Conclusion
The evolution of FantasyNovelAgent is also a microcosm of a developer’s journey from “just making it work” to pursuing “architectural beauty.” Every refactoring is aimed at making the AI assistant more stable and smarter, allowing me to focus on the most important thing—telling a good story.