Practical · Building a Memory-Enabled AI Writing Partner (Part 2): Database (From JSON to Single Table to Relational Tables)

If you’ve already read Building a Memory-Powered AI Writing Partner (Part 1): Multi-Agent Architecture Evolution, you likely have a high-level understanding of how multiple agents collaborate and how memory is chained together. But what truly makes a system viable long-term isn’t just a pretty architecture diagram—it requires a data foundation that can withstand growth: one that supports querying, modification, and rollback.

This article focuses on the evolution of the “fact layer” (the database): JSON files → SQLite single database (KV) → SQLite single database (relational tables). Semantic search, hybrid search, full graph indexing, and cloud migration are covered separately in the next article, Building a Memory-Powered AI Writing Partner (Part 2): Retrieval Systems (Vector Search, Hybrid Search, and Cloud Migration).

The essence of a long-form novel writing system isn’t “writing a block of text.” It’s about maintaining a constantly growing world over time: character states, faction relationships, item flows, location hierarchies, foreshadowing chains… As the word count grows, this information expands exponentially.

When data is just “a pile of text,” you’ll inevitably encounter three types of problems:

  • Hard to query: Finding a passage with a “similar atmosphere/conflict” or precisely listing “current members of a sect” becomes difficult.
  • Poor consistency: Deletions aren’t clean, changing A forgets to update B, and the same entity gets defined redundantly in different places.
  • Cross-device maintenance breaks down: Multi-device sync, merge conflicts, and rollback backups become manual labor.

The goal has always been clear:

Transform data into an “entity-relationship system,” then layer on a “retrieval index layer,” so the AI can not only write but also query, remember, and stay organized.


0. Phase Zero: JSON Files (Easiest, but Quickly Hits Limits)

0.1 The Initial Choice

To get started quickly, I used the file system for storage: character libraries, maps, world-building settings, etc., were saved as JSON (or JSON-like) files.

The benefits were straightforward:

  • Zero dependencies: No database, no migration scripts needed.
  • Readable and diffable: Seeing changes with Git was very convenient.
  • LLM-friendly: Large models could extract data directly as JSON, making storage frictionless.

0.2 Problems That Quickly Emerged

As data volume and functionality grew, JSON files exposed several hard limitations:

  • Lack of globally unique IDs: Everything relied on names as keys. Renaming, duplicate names, and aliases made data uncontrollable.
  • Difficult relationship modeling: Relationships like character↔sect history, character↔skill proficiency, and character↔artifact ownership had to be manually written as nested structures, becoming increasingly hard to maintain.
  • Painful cross-device sync: When two devices modified the same JSON file simultaneously, reliably resolving merge conflicts was difficult.
  • Weak querying: Without indexes, queries devolved into “load JSON → Python loop and filter → maintain your own cache.”

The point of upgrading wasn’t just “switching to something more complex.” It was about turning a “save file” into a “runnable data system.”


1. Phase One: SQLite Single Database (KV-Focused) — Stabilizing Data Aggregation and Backup

1.1 The Core Problem Solved

I migrated the early JSON content into SQLite’s kv_store (key/value): for example, character_db, map_db, world settings, future plans, etc.

The value of this step was upgrading the writing system from “scattered multiple files” to a “single-file source of truth” prototype (note: this doesn’t solve multi-device concurrent merging):

  • Simple deployment and backup: A single novel.db file could run (backup/rollback became more controllable).
  • Unified read/write path: Read/write logic was no longer scattered everywhere.
  • Retained JSON advantages: The KV store still held human-readable JSON.

Let’s be clear about the boundary: SQLite consolidates the “source of truth” into a single file. However, if you sync the entire db file via a cloud drive, simultaneous edits on multiple devices will still create “conflict copies” that can’t be reliably merged like text. True cross-device sync requires “centralized arbitration (cloud)” or “mergeable sync based on operation logs (op-log)” (more on this in the cloud migration section).

(Implementation-wise, during app initialization, basic tables like kv_store, chapters, and drafts are created, converging data reads/writes from “multiple files” into a “single database.”)

1.2 Remaining Problems

The limits of KV were also clear:

  • Query limits: All complex queries required “loading JSON and then iterating.”
  • Relationship expression limits: Relationships were forced into nested JSON, making deletion/updates hard to keep consistent.
  • Blurry consistency boundaries: The same entity could be described redundantly across multiple JSON blobs, making conflict resolution difficult.

This phase is suitable for “rapid early iteration” but not for “long-term maintenance of an entity-relationship graph.”


2. Phase Two: SQLite Single Database (Content Table + KV) — Establishing a Clear Source of Truth

2.1 What I Did

Within the same data/novel.db, alongside kv_store, I maintained well-structured content tables:

  • chapters: Chapter metadata (title/ulid/timestamp/index fields; chapter content stored in data/blob_store/)
  • drafts: Drafts

The significance was upgrading “writing content” from file reads/writes to database records, creating a more stable versioning and sync path.

2.2 Source of Truth

From this point, I established a core principle:

Source of Truth = data/novel.db (structured data/metadata/KV/FTS) + data/blob_store/ (chapter content objects). Any index, cache, or derived structure must be rebuildable from the Source of Truth.

This principle directly determines how the “retrieval layer” is designed: whether it’s full-text search or vector search, it must only be an index layer, never a second source of truth.


3. Phase Three: SQLite Single Database + Relational Tables — Transforming the “Memory Bank” from a Text Pile into an Entity-Relationship System

The core decision in this phase was:

Use the Source of Truth (data/novel.db + data/blob_store/) as the foundation: add relational tables within the same SQLite file to hold structured knowledge.

3.1 Why Relational Tables?

Because a writing knowledge base is fundamentally an “entity-relationship system.” When you start wanting to run these queries, the KV model becomes a maintenance nightmare:

  • “What artifacts/skills does Nanhai Crocodile God possess? What are their proficiency levels?”
  • “Who are the members of the Manlin Ancient Tribe? Who are active? What are their positions?”
  • “Which characters practice a specific skill? Sort by proficiency.”
  • “Which characters/locations/artifacts are involved in a specific unresolved plot thread? In which chapter did it first appear?”

3.2 The Two Most Critical Constraints for Relational Tables: Entity Table + Unique ID

More specifically, getting “unique IDs” right is crucial because it determines the cost of all future joins, indexes, migrations, and merge conflicts:

  • Don’t use name as the primary key: Names change, have duplicates, and have aliases/titles; name is a mutable field.
  • Distinguish between “internal row ID” and “globally unique ID”:
    • Local single-machine: Use auto-incrementing integer primary keys (good performance, lightweight joins) as internal fact anchors.
    • Multi-device/cloud: Use globally unique IDs like ULID/UUIDv7 for external references to avoid ID conflicts during offline editing and merging.
  • Use unique constraints for “business uniqueness”: You can add a UNIQUE constraint to name (depending on project tolerance), but still don’t use it as the primary key.
  • Separate table for aliases/titles: Introduce entity_aliases(entity_type, entity_id, alias) to handle “same name/nickname/title” and lookup issues.

In the current implementation, relational tables primarily use id INTEGER PRIMARY KEY. I’ve also added ulid to the chapters table for index alignment and future multi-device sync. The next step is to add ulid/public_id to entity tables as well.

3.3 Query Advantages of Relational Tables: From “Iterating JSON” to “A Few Lines of SQL”

Once many-to-many relationships are extracted, many features suddenly become simple, reliable, and optimizable:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- Example 1: What skills does Nanhai Crocodile God possess? Sort by proficiency.
SELECT
  c.name AS character_name,
  m.name AS method_name,
  cc.proficiency,
  cc.note
FROM characters c
JOIN char_cultivations cc ON cc.char_id = c.id
JOIN cultivation_methods m ON m.id = cc.method_id
WHERE c.name = 'Nanhai Crocodile God'
ORDER BY cc.proficiency DESC;

-- Example 2: Who are the members of the Manlin Ancient Tribe? Who are active? What are their positions?
SELECT
  o.name AS org_name,
  c.name AS character_name,
  ca.position,
  ca.is_current
FROM organizations o
JOIN char_affiliations ca ON ca.org_id = o.id
JOIN characters c ON c.id = ca.char_id
WHERE o.name = 'Manlin Ancient Tribe'
ORDER BY ca.is_current DESC, ca.position;

-- Example 3: Unresolved plot threads related to a specific character, sorted by the chapter they were introduced.
SELECT
  um.id AS mystery_id,
  um.content,
  c.name AS subject_character_name,
  um.created_at_chapter AS created_at_chapter_no
FROM unresolved_mysteries um
JOIN characters c
  ON um.subject_type = 'character'
 AND um.subject_id = c.id
WHERE um.status = 'open'
ORDER BY created_at_chapter_no ASC;

3.4 Engineering Implementation: Start from “Read/Write Paths,” Not “Table Design”

The most common pitfall in migration isn’t whether the schema is pretty, but whether the read/write paths are too aggressive.

My strategy was “get the system running first, then gradually make relational tables the primary path”:

  • Migration scripts: Provide import scripts from KV to relational tables, allowing historical data to be moved into the new structure incrementally.
  • Storage layer fallback: Prioritize reading from relational tables, but still write JSON back to kv_store (for transitional backup/rollback).
    • This allows the primary read path to be slowly switched to relational tables without breaking existing functionality.

Also, this phase must implement “delete semantics”; otherwise, the UI will exhibit the classic problem: “It looks deleted, but it reappears after a refresh.”

3.5 A Realistic Compromise: mentioned_character_ids (Denormalized Field)

Strictly speaking, “characters mentioned in this chapter” could be dynamically computed at query time via a structured entity reference table (or FTS/NER parsing). However, to make the chapter library UI’s “character filter” and “display mentioned characters” more intuitive, I added chapters.mentioned_character_ids, storing an array of character table IDs as a JSON string.

Meanwhile, the UI and retrieval filtering associated with chapters.primary_character_id (the “main perspective”) have been removed. In multi-perspective writing, using a single field to express perspective often creates more confusion. The field is temporarily retained only for compatibility and potential future redesign.


4. Summary

This article has clarified the evolution path of the “fact layer”:

  • Started with JSON files for rapid prototyping.
  • Migrated to SQLite KV to unify backup and read/write paths.
  • Introduced relational tables to advance the world-building from a “text pile” to an “entity-relationship system.”

The next article will thoroughly cover the “index layer”: how vector search is implemented, how FTS5 and vectors are combined for hybrid search, how indexing is extended to the full graph, and why Cloudflare is the first choice for cloud migration:


Want updates? Subscribe via RSS


Related Content