Irons in the Fire

I have multiple irons in the fire that I switch between regularly. They are mostly interconnected and revolve around building the core for services in the Rust language. I’ve been using these projects to gauge my recovery and ability to work.

Cadence

In life and in business, there are recurring rhythms that are necessary to sustain them. Example rhythms include eating breakfast, paying a particular bill, or examining particular reports. The defining feature of a rhythm is that it describes work at a prescribed interval; on an infinite timeline, a rhythm will happen an infinite number of times.

Cadence is a tool for managing rhythms. The core interface is a list of rhythms that must be addressed. Rhythms can be marked “done” or can be skipped, and will shift in the schedule accordingly. It superficially resembles a TODO list.
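
Here’s a minimal sketch of that interface in Rust; the names are hypothetical and not Cadence’s actual API:

    use std::time::{Duration, SystemTime};

    /// A recurring obligation: a description and a prescribed interval.
    struct Rhythm {
        description: String,
        interval: Duration,
        next_due: SystemTime,
    }

    impl Rhythm {
        /// Marking a rhythm done schedules the next occurrence one interval out.
        fn mark_done(&mut self) {
            self.next_due = SystemTime::now() + self.interval;
        }

        /// Skipping shifts the schedule forward without claiming the work happened.
        fn skip(&mut self) {
            self.next_due += self.interval;
        }

        /// A rhythm is delinquent once its due time has passed.
        fn is_delinquent(&self) -> bool {
            SystemTime::now() > self.next_due
        }
    }

    fn main() {
        let mut breakfast = Rhythm {
            description: "eat breakfast".to_string(),
            interval: Duration::from_secs(24 * 60 * 60),
            next_due: SystemTime::now(),
        };
        breakfast.mark_done();
        assert!(!breakfast.is_delinquent());
        println!("{} due again in a day", breakfast.description);
    }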

The insight that powers Cadence is that delinquency in tasks correlates with negative mental health conditions. Individuals with executive functioning differences, like those found in autism, bipolar disorder, schizophrenia, depression, and other conditions, may find it difficult to maintain life’s tasks when the condition acts up. Being able to detect such swings early allows for proper medical intervention.

There have been three prototypes of Cadence; the first was lost to time. That original prototype was built on recurring tasks in the taskwarrior project: recurring tasks would build up, and I used a collection of scripts to hold things together. The second prototype, the first that still exists, was written in Rust for single-user use. It is rough around the edges and likely not usable by the general public, but it served as a proving ground for the algorithm. The third prototype is written in Python as a multi-user application.

I’m currently looking for a home for Cadence so that I don’t have to host the system I would depend upon. What currently exists is good for single-user use, but one could imagine re-implementing the system for a multi-user setup like Jira, where rhythms get managed like tasks. A team that falls behind on its rhythms is objectively not healthy.

I haven’t investigated this thoroughly, but it seems to me that a solid timeseries database would make Cadence a breeze to support for large organizations.

The current status of Cadence is beta software. The Python version is usable and recommended. The Rust version has advanced features but likely will require manual setup. The Python variant will see further development.

TupleDB

TupleDB is a record-oriented layer that sits on top of a key-value store that supports a bytes-to-bytes abstraction. The core idea is that each row in the key-value store is identified by a tuple-key, and the rows that share a tuple-key together specify the object at that tuple-key. This provides the illusion of a primary-keyed table where each row is returned as a protocol buffers message.
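
Here’s a minimal sketch of the aggregation idea, assuming keys are encoded so that lexicographic byte order matches tuple order (the encoding is elided). It leans on the protocol buffers rule that concatenating two serialized messages is equivalent to merging them:

    use std::collections::BTreeMap;

    /// Concatenate every row whose key begins with the object's tuple-key.
    /// Because concatenated protobuf messages merge field-by-field, the
    /// concatenation is itself a valid message describing the whole object.
    fn assemble_object(kv: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<u8> {
        let mut object = Vec::new();
        for (key, value) in kv.range(prefix.to_vec()..) {
            if !key.starts_with(prefix) {
                break; // past the last row of this object
            }
            object.extend_from_slice(value);
        }
        object
    }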

Fields and maps can be broken out from the row that contains them so that they are stored under a different tuple-key. In this way, data that is written frequently can be separated from data that is written infrequently. Similarly, maps can be broken out so that their elements become elements of the tuple-key, which makes it possible to leverage the key-value store abstraction to look up a single element of the map.

The interface to TupleDB is intentionally similar to protocol buffers in that the records returned will always be valid protocol buffers messages. Scanning through the key-value store, elements with the same key prefix are aggregated into a relation that maps the prefix to the aggregated value. For example, it would be possible to break out the last-seen field from a user’s profile so that a small value can be written each time without rewriting any of the rest of the profile. Similarly, maps break out by key, so a user’s profile could have a map (perhaps of all post IDs the user has posted to a social networking site) that’s too large to retrieve entirely, but possible to index into directly using nested map operations.
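
To illustrate the break-outs, here are hypothetical tuple-key shapes for that user-profile example; none of the names or layouts come from TupleDB itself:

    /// Hypothetical tuple-key shapes (not TupleDB's actual encoding).
    type TupleKey = Vec<Vec<u8>>;

    /// The profile object itself: ("users", user_id).
    fn profile_key(user_id: &str) -> TupleKey {
        vec![b"users".to_vec(), user_id.as_bytes().to_vec()]
    }

    /// Broken-out field: writing last_seen rewrites only this small row.
    fn last_seen_key(user_id: &str) -> TupleKey {
        vec![b"users".to_vec(), user_id.as_bytes().to_vec(), b"last_seen".to_vec()]
    }

    /// Broken-out map: each post ID is addressable without fetching the map.
    fn post_key(user_id: &str, post_id: u64) -> TupleKey {
        vec![
            b"users".to_vec(),
            user_id.as_bytes().to_vec(),
            b"posts".to_vec(),
            post_id.to_be_bytes().to_vec(),
        ]
    }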

TupleDB is designed to build on top of a bytes-to-bytes key-value store like that provided in the SST module. Systems like LevelDB and RocksDB provide this interface natively, as do several distributed key-value stores.
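
A sketch of the kind of contract that implies, with an in-memory stand-in for testing; this is not the SST module’s actual interface:

    use std::collections::BTreeMap;

    /// The bytes-to-bytes contract TupleDB builds upon: ordered point reads,
    /// writes, and scans.  LevelDB and RocksDB offer the same general shape.
    trait KeyValueStore {
        fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
        fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
        fn scan(&self, start: &[u8]) -> Box<dyn Iterator<Item = (Vec<u8>, Vec<u8>)> + '_>;
    }

    /// An in-memory stand-in, good enough for tests.
    struct MemStore(BTreeMap<Vec<u8>, Vec<u8>>);

    impl KeyValueStore for MemStore {
        fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
            self.0.get(key).cloned()
        }
        fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
            self.0.insert(key, value);
        }
        fn scan(&self, start: &[u8]) -> Box<dyn Iterator<Item = (Vec<u8>, Vec<u8>)> + '_> {
            Box::new(self.0.range(start.to_vec()..).map(|(k, v)| (k.clone(), v.clone())))
        }
    }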

SST/LSM

To power TupleDB, I’ve built an SST abstraction and designed an LSM-tree abstraction. The SST format is entirely protocol-buffers compatible so that programs in other languages can parse the format too.

Every iron in the fire that logs data, from the biometrics/counters/gauges/moments to the log tracing, outputs to the SST format. The idea is that each process can output to localhost, and then a background process can collect the sealed SSTs and move them to a central location. This allows SREs to interrogate the data on the host where it is generated, or to move it to a central location and run tools across the cluster.
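
Here’s a sketch of the collection step under assumptions of my choosing: sealed SSTs land in a local spool directory, writers only give a file the .sst suffix once it is sealed, and “moving” begins as a rename into an outbox that a shipper drains. The paths and conventions are hypothetical, not the actual tooling:

    use std::fs;
    use std::io;
    use std::path::Path;

    /// Move every sealed SST out of the local spool directory and into an
    /// outbox for shipping to the central location.
    fn collect_sealed(spool: &Path, outbox: &Path) -> io::Result<()> {
        for entry in fs::read_dir(spool)? {
            let path = entry?.path();
            // Assumes the .sst suffix only appears once a file is sealed.
            if path.extension().map_or(false, |ext| ext == "sst") {
                if let Some(name) = path.file_name() {
                    fs::rename(&path, outbox.join(name))?;
                }
            }
        }
        Ok(())
    }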

Another side effect of outputting everything in the same SST and LSM abstraction is that automation can query the tracing and biometrics libraries directly in order to raise alarms straight to the user. Everything builds over the SST library, so future command-line tooling for the library automatically benefits every library that depends upon it.

Saros

Saros is an approximately eighteen-year-long cycle. It is also the name of a timeseries database. It works in the same vein as Graphite or Prometheus, except that it is written on top of TupleDB’s map functionality. For example, lookup[table][key][timestamp] conceptually illustrates that the timestamp sub-indexes the key.
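
A conceptual sketch of that nesting with in-memory maps; in Saros the nesting would be TupleDB map break-outs rather than literal nested BTreeMaps:

    use std::collections::BTreeMap;

    // lookup[table][key][timestamp]: the timestamp sub-indexes the key, so
    // reading one series over a window is a range scan on the innermost map.
    type Series = BTreeMap<u64, f64>;      // timestamp -> value
    type Table = BTreeMap<String, Series>; // key -> series
    type Lookup = BTreeMap<String, Table>; // table -> keys

    fn points_between(lookup: &Lookup, table: &str, key: &str, lo: u64, hi: u64) -> Vec<(u64, f64)> {
        lookup
            .get(table)
            .and_then(|t| t.get(key))
            .map(|series| series.range(lo..=hi).map(|(&ts, &v)| (ts, v)).collect())
            .unwrap_or_default()
    }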

It also has the ability to detect trends over longer periods of time, and alert when the trends are not sustainable. Cadence would be a perfect first user.

That’s all I’ll say about Saros for now.

Setsum

Setsum provides an abstraction for hashing sets of data. The paper outlines how to apply this to a database, and it is built into SST.
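
To illustrate the concept (and only the concept; this toy is not the setsum algorithm), here’s an order-independent set hash that supports incremental insert and remove:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    /// A toy order-independent set hash: items contribute by wrapping
    /// addition, so insertion order doesn't matter and removal is just
    /// subtraction.  Setsum's real construction is stronger.
    #[derive(Default, PartialEq, Debug)]
    struct ToySetHash(u64);

    fn item_hash(item: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        h.finish()
    }

    impl ToySetHash {
        fn insert(&mut self, item: &[u8]) {
            self.0 = self.0.wrapping_add(item_hash(item));
        }
        fn remove(&mut self, item: &[u8]) {
            self.0 = self.0.wrapping_sub(item_hash(item));
        }
    }

Two parties can each fold their data into such a hash and compare the results to check that they hold the same set, without shipping the data itself.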

Indicio

I didn’t find a good tracing library I liked, so I wrote indicio.

tatl

I have a private set of hypotheses around monitoring and alerting, and this library is where I’ll prove out those ideas for the rest of the project. The core idea is to sidestep the traditional mechanism that collects data and then alerts over the data: the alert is calculated right next to the counter so that it can react quickly.
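
A minimal sketch of that idea with hypothetical names; this is not tatl’s API:

    use std::sync::atomic::{AtomicU64, Ordering};

    /// A counter that evaluates its alarm condition at increment time,
    /// instead of waiting for a collector to scrape and evaluate it later.
    struct AlarmedCounter {
        count: AtomicU64,
        threshold: u64,
    }

    impl AlarmedCounter {
        fn click(&self) {
            let now = self.count.fetch_add(1, Ordering::Relaxed) + 1;
            if now == self.threshold {
                // React immediately: page, log, or flip a breaker.
                eprintln!("threshold {} reached", self.threshold);
            }
        }
    }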

Parallel Paxos

One million operations per second across the continental US. That’s the goal. Paper forthcoming.

Lexico

I came up with a way of partitioning and repartitioning data in grad school that is superior to both B+-trees and consistent hashing, the two ways people largely configure things nowadays.

Scrunch

Scrunch is a full-text search engine. The idea is to make it so that any TupleDB SST can be encoded as a scrunched file that supports regular expressions over the data: for example, finding all objects that contain a particular prefix and match a full-text search. The format is both compressing and indexing, so disk usage for the full-text search is expected to grow by no more than a factor of two.

Scrunch is currently an alpha library. I need to talk to people to see the best way forward, as I keep getting stuck. The only things implemented are reference structures.

All My Rust Crates

I’ve been publishing to crates.io.