Posts
RSS feed
Posts in 2026
-
Storing HNSW vectors as regular LSM entries in Theseon — the encoding format, key layout, per-collection concurrency model, a graph connectivity bug, and binary snapshot persistence to avoid full rebuilds on restart.
-
Implementing the HNSW approximate nearest neighbor algorithm from the original paper in Go — the insert and search algorithms, neighbor selection heuristics, tombstone-aware traversal, and building an evaluation framework to prove it works.
-
How Theseon stores writes destined for dead nodes and replays them on recovery — the capacity accounting race, the iterator deadlock, and why skipping fsync is the right call for ephemeral data.
-
How Theseon's coordinator fans out operations to replicas, resolves conflicts by HLC timestamp, and repairs stale data in the background — plus the latency traps and test design bugs that emerged.
-
How Theseon detects node failures without a leader — implementing the SWIM gossip protocol from the paper, the bugs that emerged, and why liveness and data ownership must be decoupled.
-
How Theseon gained MVCC snapshot isolation and optimistic transactions — by separating merge from dedup and making versioning a first-class feature.
- Updated:
How the manifest, leveled compaction, block cache, and write batches turned Theseon into a self-maintaining storage engine.
-
How the storage foundation pieces are wired into a working engine, including internal key encoding and the merge iterator.
-
A deep dive into the bottom half of the Theseon stack: data structures and on-disk formats.
- Updated:
Architecture and design decisions behind a hand-built distributed LSM engine with HNSW vector search: from skip lists and SSTables to SWIM gossip, quorum coordination, and approximate nearest neighbors.