guacamole

Take a moment to ponder the following question: Imagine you were benchmarking a key-value store and needed to represent 1e12 distinct keys to share among the workload generators. How could you share such a large data set without materializing it?

My solution to this problem is guacamole, a linearly-seekable, flexible, and deterministic pseudo-random number generator.

Linearly-Seekable: A psuedo-random number generator is said to be linearly-seekable when advancing the input predictably advances the output. In the case of guacamole, every seed’s output is exactly 64 bytes away from its predecessor and successor seeds. The key property of a linearly-seekable pseudo-random number generator is that it allows one to divide the seed space into N seeds and get N non-overlapping buckets of random data to draw from.

Flexible: Guacamole provides a combinators module for generating complex random data structures.

Deterministic: Given the same seed, guacamole will produce the same random output stream. This is necessary for the linearly-seekable property to be meaningful.

The answer to our opening question presents itself: Using guacamole, divide the input key space into 1e12 distinct buckets, each corresponding to a seed that is u64::MAX / 1e12 * 64 bytes away from other seeds. Now, generate a random length and string of that length using each individual bucket. It’s easy to enumerate the keys, and it’s easy to choose and construct a single key.