Stronger Cryptography for Every Device, Everywhere

Introduction: Cryptography, NSA Backdoors, and the Problem of Entropy

Cryptography gives us the ability to encrypt and decrypt messages, and with it confidentiality, privacy, and security. All of this relies on large amounts of strong random numbers. Unfortunately, computers, being deterministic machines, are terrible at generating random numbers.

(Why does our cryptographic technology require large amounts of randomness? Here's a gentle introduction for the layman: link to follow in next update)

I'm going to use the word entropy here, and by entropy I simply mean a measure of randomness. Low entropy means low randomness (more predictable); high entropy means high randomness (less predictable).

For our encryption technologies to actually work and protect our privacy and security, we rely on large amounts of randomness (high entropy). When there isn't enough entropy available in a system, the privacy and security technologies break down really badly.

Since computers are so bad at randomness, the industry has relied on hardware random number generators, also known as "True" Random Number Generators or TRNGs. These come in several form factors – USB dongles, add-in cards, or, most famously today, integrated directly into CPUs, like Intel Secure Key and VIA PadLock. This was a good, working solution.

But everything changed when the Fire Nation attacked.

But, really, everything changed with the Edward Snowden / NSA leaks: HWRNGs came under suspicion of being backdoored and became untrustworthy.

Whether these suspicions are an overreaction or not isn't the point. The real problem is that relying on blackboxes, particularly ones we are barely able to review or audit (if at all), is never a good idea for something so central to our security and privacy.

So if HWRNGs are out, computers are still deterministic machines, and we know for a fact (thanks to Heninger et al.) how widespread low-entropy devices are and how catastrophic the results can be, how do we move on? What solution can we use and deploy widely?

In this line of research and prototype testing, I'll introduce the Virtual Coin Flip paradigm and its prototype implementation, Micro-Benchmark Timing Aggregation. With these techniques, it is possible to have stronger cryptography for every device, everywhere: they resist NSA backdoors through openness and auditability, and they are cheap to deploy because they demand nothing more than a component every device MUST already have – a CPU.

No More Blackboxes: Openness and Auditability

Before we proceed to the overarching paradigm and the specific implementation, it's important to first mention two key principles that guided their design.

These two key principles are openness and auditability.

  1. Openness - being so central to privacy and security, whatever our source of secure random numbers is, it should be truly open, as in Open Source – with the entire algorithm available – and not a blackbox. If the algorithm is truly a CSPRNG, then the algorithm itself need not be secret. This also applies to seeding techniques. The method of generating secure random numbers or collecting the initial seed entropy should be easy for interested parties to study and review. This is essential so that the very core of our security and privacy technologies is not something that can easily be backdoored or otherwise tampered with by nation-states and their security agencies.
  2. Auditability – an extension of Openness, this criterion demands that, aside from being open in spirit (which is easily accomplished by releasing something under an Open Source license and hosting the code publicly, such as on GitHub), the program itself is practically auditable – that is, it is as simple as possible, and avoids forms of programming or implementation "cleverness" that make the act of review harder.

I propose these key guiding principles as criteria for our confidence in any source of randomness. This doesn't refer to the quality of the randomness output, but rather to the level of trust – that is, confidence that it has not been tampered with, backdoored, or otherwise obviously broken.

A 25,000-line, mostly monolithic algorithm, for example, can easily be released as Open Source and hosted in a public GitHub repository. While it passes criterion #1 easily, it may not pass criterion #2: being a monolithic, 25,000-line monster makes it extremely hard or impractical to thoroughly review and audit, which lessens confidence in it as a trusted source, regardless of the quality of its output. In such a case, we would be better off with a similarly open but far less complex implementation, which is easier to review and audit, and therefore more trusted.

The Virtual Coin Flip Paradigm

Virtual Coin Flip (VCF) is a paradigm for collecting entropy (generating randomness) in our computing devices. This paradigm is important because our devices are deterministic machines, so they're terrible at generating the randomness we need for cryptographic purposes. VCF outlines a model for reasonable and reliable randomness generation across all types of devices.

Imagine you have a fair coin. You toss it once; it's a 50/50 chance of heads or tails. That's just 1 bit of entropy, so a single toss is a very low entropy event.

Now imagine you're tasked with manually collecting strong-encryption-grade randomness, and all you have is that one fair coin. That means you need to come up with 256 bits of entropy. Are you up a creek without a paddle?

Even with just one coin, you can straightforwardly generate 256 bits of entropy – you just flip that one coin 256 times. The sequence of heads and tails – the entire chronological order of the flip results – becomes your 256 bits of entropy.

You also obviously aren't limited to just 256 bits. You can generate practically unlimited bits of entropy from that one coin by flipping it even more times – so 512, 1024, or 2048 bits of entropy isn't actually any more complicated to generate than 256 bits. We just flip more times.
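To make the bookkeeping concrete, here's a minimal Python sketch of the idea. The flips are simulated here purely as a stand-in for tosses of a physical coin; the point is only to show how 256 recorded results become a 256-bit value:

```python
import secrets

# Stand-in for 256 tosses of a physical fair coin; in the thought
# experiment, a person records each result by hand.
flips = [secrets.choice("HT") for _ in range(256)]

# The chronological sequence of results IS the entropy: one bit per flip.
bits = "".join("1" if flip == "H" else "0" for flip in flips)
seed = int(bits, 2)  # a 256-bit number usable as key material
print(f"{seed:064x}")
```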

That's basically how the VCF paradigm works.

VCF prescribes that, for all our computing devices to have simplified and reliable entropy collection across the board, a few things should be true of the implementation:

  1. There is a source of entropy that can be triggered by the device at will. The coin flip in our example works because a person can flip a coin an arbitrary number of times, at any time; it doesn't depend on external events. For VCF to work, the entropy source has to be triggerable at any time. This also explicitly means it can be triggered several times, in succession, at will. A lot of traditional sources of computing entropy fail this criterion. Things like disk latency, network latency or network activity, mouse movements, and keystroke latency are all things the computer CAN'T trigger at will, as these are all external events (from the point of view of the device).
  2. The entropy source can be a very low entropy event. We've gone through how someone could straightforwardly generate 256 bits of randomness even with just a very low entropy event (flipping one fair coin). As long as the entropy source follows criterion #1 (triggerable at will), it can be of any quality, because even very low entropy events will suffice for generating cryptographically secure randomness. We can just trigger that entropy source repeatedly, an arbitrarily large number of times, to collect the entropy we need.
  3. Implementation simplicity. Flip, record. Flip, record. Flip, record. That's all a person has to do when collecting 256 bits of entropy from one fair coin. Ideally, VCF implementations should have that same simplicity. While implementation simplicity doesn't have a direct effect on the quality of the generated randomness, it does have a significant effect on the trustworthiness and backdoor-resistance of the implementation. Having simple, straightforward code that avoids programming "cleverness" makes the implementation inherently safer by making it more easily auditable.

Micro-Benchmark Timing Aggregation

Micro-Benchmark Timing Aggregation (MBTA) is an implementation of VCF.

In a nutshell, MBTA relies on CPU runtime irreproducibility – the reality that the running time of a CPU task is irreproducible. Due to CPU design complexity, data locality, transistor variability, background task noise, temperature, and a host of other factors, the same benchmark will not run for exactly the same time when run multiple times (as long as we can measure with enough precision).
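You can observe this directly with a few lines of Python. This is only an illustration (the loop body and counts are arbitrary choices for this sketch, not taken from any published MBTA code); on most machines the printed timings will all differ:

```python
import time

def trivial_work(scale=10_000):
    # A deliberately boring operation: repeated addition of two large numbers.
    a, b = 1234567890123456789, 9876543210987654321
    for _ in range(scale):
        a = a + b
    return a

# Time the exact same workload several times; the results won't match exactly.
for _ in range(5):
    start = time.perf_counter_ns()
    trivial_work()
    print(time.perf_counter_ns() - start)
```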

Broken down to the essentials, MBTA is composed of the following components:

  1. A trivial CPU operation. In MBTA, the actual operation being done and considered as a "micro-benchmark" is not important. Any trivial operation will do. The secret sauce isn't in how complex the operation is. In my default MBTA implementation, this trivial operation is just the addition of two large numbers. No complex mathematical formulas necessary.
  2. Scaling up the trivial CPU operation using a loop. Just to make sure the trivial operation presents a heavy enough load, there is a scale factor. The scale is simply the number of times the trivial operation is looped before a time measurement is made.
  3. Running the scaled micro-benchmark to collect a target number of samples. Since timing the micro-benchmarks may not necessarily be a high-entropy event, we need to collect a large number of them. The samples parameter indicates how many timing measurements to collect in total.
  4. Collecting an ordered sequence of results. The end result of MBTA is the sequence of runtime values in the order they were collected, one per sample. This sequence does not need to be altered in any way; the values just have to be aggregated as-is (for example, stored in an array or appended to a string), and the whole thing is then passed to a cryptographic hash function to derive a key of a specific length. (A minimal sketch of the whole process follows this list.)
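Putting those four components together, a minimal Python sketch of MBTA might look like the following. This is not the published SideRand/MBTA code; the scale and sample counts are illustrative values, and SHA-256 stands in for whatever hash a deployment chooses:

```python
import hashlib
import time

SCALE = 10_000    # component 2: how many times the trivial operation loops per sample
SAMPLES = 1_000   # component 3: how many timing measurements to collect

def micro_benchmark(scale=SCALE):
    # Component 1: a trivial CPU operation - adding two large numbers.
    a = 1234567890123456789
    b = 9876543210987654321
    for _ in range(scale):
        a = a + b
    return a

# Component 4: collect the ordered sequence of runtimes, unaltered.
timings = []
for _ in range(SAMPLES):
    start = time.perf_counter_ns()
    micro_benchmark()
    timings.append(time.perf_counter_ns() - start)

# Aggregate the sequence as-is and hash the whole thing down to a 256-bit seed.
aggregate = ",".join(str(t) for t in timings)
seed = hashlib.sha256(aggregate.encode()).digest()
print(seed.hex())
```

The point of the sketch is that nothing in it is clever: the entropy comes from the timer readings, not from the operation being timed.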

Lab results show that MBTA is able to collect thousands to millions of bits of entropy per second, depending on platform and the timer resolution available. Since only 256 bits of entropy is needed for initial seeding, MBTA is overkill for practically all platforms, from very small and simple devices to large complex server CPUs with nanosecond-level timers.

Using MBTA, an Arduino Uno (with a 16 MHz processor and only a 4-microsecond-resolution timer) was able to collect >3,000 bits of entropy per second. This is the worst case for MBTA across all of the lab experiments done, as it combines a simple processor with a low-resolution timer.
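For context on how figures like these can be estimated: Paper 2 (linked below) uses the frequency distribution of the collected values and the percentage of the most frequent value to get a minimum entropy estimate. A rough Python sketch of that estimate, assuming `timings` is a list of raw samples like the one collected above, might be:

```python
import math
from collections import Counter

def min_entropy_per_sample(samples):
    # Min-entropy is driven by the single most likely outcome:
    # H_min = -log2(p_max), where p_max is the proportion of the
    # most frequent value among the collected samples.
    counts = Counter(samples)
    p_max = counts.most_common(1)[0][1] / len(samples)
    return -math.log2(p_max)

# The entropy rate is then roughly (bits per sample) x (samples per second).
```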

Key Differences From Previous Work

1. HAVEGE / haveged

HAVEGE and haveged are related – HAVEGE is the defining initial work (dating back to 2002), and haveged is a specific implementation for Linux with some improvements.

The common complaint about HAVEGE/haveged is that it's too complex. The HAVEGE code itself has not been updated since 2006; in the wild, I've only ever seen haveged. You can inspect the haveged code yourself, since it is Open Source.

Other researchers in this field have also commented on the perceived complexity. HAVEGE/haveged also seems to need specific types of CPUs, or to require tuning specific to the CPU or CPU architecture.

Being Open Source, HAVEGE/haveged passes the criterion of Openness.

The complexity fails Auditability, however.

2. Stephan Mueller's Jitter Entropy

Jitter Entropy by Stephan Mueller is the most recent related work, and the only one that's very actively being developed and maintained.

It fits favorably within the VCF paradigm – it's easily argued that it succeeds in both principles of openness and auditability. By most accounts, it's already a good, feasible replacement for most HWRNG use.

Where MBTA improves on Jitter Entropy is simplicity: the Jitter Entropy code is relatively complex compared to MBTA.

A comment in the Jitter Entropy code reads: "The code is deliberately inefficient and shall stay that way. This function is the root cause why the code shall be compiled without optimization. This function not only acts as folding operation, but *this function's execution is used to measure the CPU execution time jitter*. Any change to the loop in this function implies that careful retesting must be done." Emphasis mine. In Jitter Entropy, the operation matters; in MBTA, it doesn't. In Jitter Entropy, the operation is complex (folding, memory access, etc.) and, as emphasized in the quote, is central to the entropy collection. In MBTA, the operation doesn't really matter; it's much simpler and easier to audit.

Jitter Entropy also seems to rely on a high-resolution timer being available. MBTA does not. To target the largest possible set of devices, being able to easily handle low resolution timers is a key advantage.

3. Maxwell by Sandy Harris

Maxwell is another implementation that arguably fits the VCF paradigm. Keeping the code base small is an explicit part of its design.

Of the prior work cited, Maxwell is probably the closest match to MBTA. There are still key differences, though. Maxwell performs some modulo operations to extract randomness from the collected values; MBTA does nothing more than collect the sequence of values as-is, with the intention of hashing it later.

Perhaps the only real disagreement I have with Maxwell is "spiritual": Sandy Harris considers Maxwell and its operation a supplemental feature – helpful, but only as a supplement, or only as a last resort (if there's nothing better). I argue the opposite: Maxwell (if it can easily be ported outside Linux, and it seems it can) and MBTA should be considered essential to all devices, everywhere, acting as the primary source of strong random numbers to seed the OS RNG. Everything else – entropy from disk latency, network activity, keystrokes, mouse movements, maybe even some entropy from HWRNGs – can be treated as supplemental, but something as open, auditable, and portable as Maxwell or MBTA should be the standard, to simplify reliability and auditability across the entire ecosystem of secure devices.

It bears noting that a key difference between MBTA and all the cited similar past work is that MBTA is only meant to seed the OS RNG. There is no need for MBTA to fulfill the function of a CSPRNG other than for the initial seed.
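As a hedged illustration of what "seeding the OS RNG" can look like on Linux: writing bytes to /dev/urandom mixes them into the kernel pool (crediting entropy additionally requires the RNDADDENTROPY ioctl and root privileges). The `seed` below is assumed to be the 32-byte digest derived in the MBTA sketch above:

```python
def mix_into_os_rng(seed: bytes) -> None:
    # Mixes the seed into the Linux kernel entropy pool. This does not
    # increase the kernel's entropy estimate; it only contributes input.
    with open("/dev/urandom", "wb") as pool:
        pool.write(seed)
```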

History of Publications, Lab Results, and Experiments

Note: During the first iterations of this research, before the VCF paradigm and MBTA were fully formed concepts, the project was called 'SideRand'. For all files, code, and publications below, it is safe to read 'MBTA' wherever 'SideRand' is mentioned.

Research Papers

  1. SideRand Paper 1 – First released version, April 8, 2018.
  2. [PRE-PRINT] Paper 2: Stronger Cryptography for Every Device, Everywhere – Pre-print for the conference abstract presentation at the National University of Singapore, Sep 29-30, 2018 (ICIRSTM2018). Won the Best Conference Paper Award. A shorter paper (7 pages) that includes a better approach to analyzing entropy: using the frequency distribution and the percentage of the most frequent value to get a minimum entropy estimate. Conference presentation materials: Slide Deck | Spiel

Supplemental Information (for Paper 1, Q1 2018 data)

  1. Experiment Results – Browse the different experiments done and download the results data. NOTE: Currently just a default Apache directory listing.
  2. Scripts – The various Python 3 and bash scripts created to run the SideRand experiments, as well as the various SideRand prototypes created during the R&D phase. NOTE: Currently just a default Apache directory listing.

Supplemental Information (for Paper 2, May-Jul 2018 data)

  1. Summary Spreadsheet – Current data summary (still a work in progress; will be updated often and reformatted).
  2. Experiment Results – Browse the different experiments done and download the results data. NOTE: Currently just a default Apache directory listing.
  3. Scripts – The Python 3, C, PHP, and Ruby implementations of SideRand-based entropy collection, as well as the bash scripts created to run the improved SideRand experiments. NOTE: Currently just a default Apache directory listing.