A Setsum Manifest
I was motivated by a singular problem: How does one detect data loss or corruption in a new key-value store? How does one ensure that data loss and corruption don’t happen?
For an LSM tree, there are two points at which data can be lost: during ingest or during compaction. A bug that loses data at ingest time is harder to believe, so I’ll focus on compaction for this post. The same mechanism that protects compaction extends across ingest, so as long as something makes it to the write batch, it will get written.
The key to using mani and setsum is to construct an end-to-end checksum over the database. With each edit to the tree (each new mani edit), the setsum gets updated to reflect data added or removed by the edit. Pure compaction neither adds nor removes data, so it should leave the checksum unaffected.
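To make the invariant concrete, here is a minimal sketch of an order-independent checksum that supports both adding and removing items. It is a toy stand-in, not the setsum algorithm itself: the names ToySetsum, item_hash, and checksum_of, the use of SHA-256, and the 2^256 modulus are all assumptions made for illustration.

```python
import hashlib

MODULUS = 1 << 256  # toy modulus chosen for this sketch (an assumption)


def item_hash(key: bytes, value: bytes) -> int:
    """Hash one key-value pair; the key is length-prefixed to avoid ambiguity."""
    h = hashlib.sha256()
    h.update(len(key).to_bytes(8, "big"))
    h.update(key)
    h.update(value)
    return int.from_bytes(h.digest(), "big")


class ToySetsum:
    """Hypothetical add/remove checksum; insertion order does not matter."""

    def __init__(self, state: int = 0):
        self.state = state

    def insert(self, key: bytes, value: bytes) -> None:
        self.state = (self.state + item_hash(key, value)) % MODULUS

    def remove(self, key: bytes, value: bytes) -> None:
        self.state = (self.state - item_hash(key, value)) % MODULUS

    def __add__(self, other: "ToySetsum") -> "ToySetsum":
        # Combining two checksums equals the checksum of the combined data.
        return ToySetsum((self.state + other.state) % MODULUS)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ToySetsum) and self.state == other.state


def checksum_of(pairs) -> ToySetsum:
    """Checksum of an iterable of (key, value) byte pairs."""
    s = ToySetsum()
    for key, value in pairs:
        s.insert(key, value)
    return s


if __name__ == "__main__":
    file_a = [(b"k1", b"v1"), (b"k3", b"v3")]
    file_b = [(b"k2", b"v2")]
    compacted = sorted(file_a + file_b)  # pure compaction: same pairs, new layout
    print(checksum_of(file_a) + checksum_of(file_b) == checksum_of(compacted))
```

In this sketch, a compaction that merges file_a and file_b into one sorted file leaves the combined checksum unchanged, while an output that silently drops a pair no longer matches the checksum of its inputs.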
In the implementation of the key-value store, a verifier thread takes care of verifying checksums, and it unlinks stale files only after verifying the checksums of the files that succeed them.
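The verify-before-unlink ordering can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the store's actual verifier: the function names, the one-record-per-line file layout, the .sst file names, and the toy checksum (a sum of SHA-256 hashes modulo 2^256) are all hypothetical.

```python
import hashlib
import os
import tempfile

MODULUS = 1 << 256  # toy modulus for this sketch (an assumption)


def file_checksum(path: str) -> int:
    """Order-independent toy checksum: sum of per-record SHA-256 hashes."""
    total = 0
    with open(path, "rb") as f:
        for record in f:  # one record per line in this hypothetical format
            digest = hashlib.sha256(record).digest()
            total = (total + int.from_bytes(digest, "big")) % MODULUS
    return total


def verify_then_unlink(stale_paths, successor_paths) -> bool:
    """Unlink stale files only if their successors carry the same checksum."""
    before = sum(file_checksum(p) for p in stale_paths) % MODULUS
    after = sum(file_checksum(p) for p in successor_paths) % MODULUS
    if before != after:
        return False  # keep the stale files so the data stays recoverable
    for p in stale_paths:
        os.unlink(p)
    return True


def write_records(path: str, records) -> None:
    with open(path, "wb") as f:
        for record in records:
            f.write(record + b"\n")


def demo():
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.sst"), os.path.join(d, "b.sst")
        good, bad = os.path.join(d, "good.sst"), os.path.join(d, "bad.sst")
        write_records(a, [b"k1=v1", b"k3=v3"])
        write_records(b, [b"k2=v2"])
        write_records(good, [b"k1=v1", b"k2=v2", b"k3=v3"])
        write_records(bad, [b"k1=v1", b"k3=v3"])  # compaction output that lost k2
        rejected = verify_then_unlink([a, b], [bad])
        inputs_kept = os.path.exists(a) and os.path.exists(b)
        accepted = verify_then_unlink([a, b], [good])
        inputs_gone = not os.path.exists(a) and not os.path.exists(b)
        return rejected, inputs_kept, accepted, inputs_gone


if __name__ == "__main__":
    print(demo())
```

The point of the ordering is that a faulty compaction output is caught while its inputs still exist on disk, so nothing has been lost yet when the mismatch is detected.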
To summarize, mani and setsum provide an efficient way to keep a rolling checksum over a key-value store, capturing all writes to the data store as part of its checksum.