Want entropy? Don’t use a floating ADC input
Following on from our post discussing the risks of inadequate entropy in IoT devices, I wanted to look at one particular method used to gather entropy – using a floating ADC (analog-to-digital converter) input.
The idea is that you take a floating (i.e. unconnected) ADC input to a microcontroller, take a sample, and use that as your entropy source. This sounds great – all that noise! It must be random.
Rather than use this value directly as entropy, you seed a pseudo-random number generator with it. This is a common pattern – true entropy is generally rate-limited, so you gather a little of it and use it to generate as many pseudo-random numbers as you need. Linux works along similar lines: the kernel gathers entropy from hardware events into a pool and uses it to seed a cryptographically secure generator, which then supplies both /dev/random and /dev/urandom.
I’ve drawn this example directly from the Arduino documentation. You can see it done many times in various Arduino projects. I’ve seen it done in commercial devices. But it has so many problems that it isn’t even suitable for hobby projects – they end up being blogged, others copy them, and the effect of poor entropy snowballs.
The analog-to-digital converter in most of the microcontrollers used on Arduino boards is 10 bits. At most, it will produce 2^10 = 1024 distinct values. Yet randomSeed() takes an unsigned long, which is 32 bits. We have already cut the potential seed space from about 4.3 billion values to 1024. That isn’t even enough for randomly generating levels in a dungeon game – users will notice repeating patterns.
But it is far worse than this. A floating input tends to settle at roughly the same voltage. Exactly where varies from microcontroller to microcontroller, board to board, power supply to power supply, and even with the surrounding environment, but on any one device the spread is narrow. I have seen a given board produce only 32 distinct values, and fewer than 100 is common. We are now in territory where the “random” seed is trivially guessable.
Why not take the least significant bit of each ADC reading and concatenate these bits to build up more entropy? A simple concept, but not a great one. Try it, then run a randomness test suite such as ent or Diehard against the output. You will see failures:
- Bias – either 1 or 0 will be more common.
- Periodicity – plot the values and you may see a 50 or 60 Hz pattern picked up from the mains supply.
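Now people try to remove the bias and correlation by writing some custom code. Maybe they use the von Neumann method to remove bias (take bits in pairs: 10->1, 01->0, discard 00 and 11). Because each pair of input bits yields at most one output bit, and only when the two bits differ, this cuts the rate at which you can draw entropy to at most a quarter of the input bit rate; with a relatively biased source, 10% or less is more common. It also assumes successive bits are independent, so it does not remove correlation such as the mains-frequency pattern above. You can increase the rate at which you sample the ADC – but have you checked whether the values you get are still different when you do this? It all gets quite complex quite quickly.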
There are plenty of forum answers offering little throwaway, single-sentence algorithms:
“Just take the 4 lowest significant bits, sample it 8 times, use that”
But why is this considered secure? We don’t know that it is. It has essentially become “roll your own crypto”. The problem is that people think “randomness” is an easy problem to solve, and don’t even think of it as crypto.
This is why the HAL (Hardware Abstraction Layer) or libc (the C library) should clearly state the methods used to source entropy, and how rand() works. Secure options should be provided and, if possible, used as the default options.
We haven’t even started on the PRNG (Pseudo Random Number Generator) used by Arduino and by most other implementations of rand(). Arduino uses an LCG (Linear Congruential Generator) to produce pseudo-random numbers. The problem with an LCG is that, when its outputs expose the full internal state and the modulus is known, as few as three consecutive outputs are enough to recover its parameters and the seed, allowing you to predict all future outputs. This really sucks for security purposes.
You can mix and reduce the output down to something like a coin flip, reducing the information available to an attacker, but it isn’t enough. You can still often recover the seed, or at least reduce the number of potential seeds down to a very small number.
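You need to use a CSPRNG (Cryptographically Secure Pseudo Random Number Generator) if you need security. With one of these, no matter how many outputs an attacker sees, they should not be able to recover the seed or predict future outputs by anything short of brute force. It still needs a good seed, though: a CSPRNG seeded from one of 1024 possible values can be brute-forced just as easily.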
But many beginners – who then become not-beginners – don’t know the difference between true entropy and pseudo-randomness. Time and time again you see people asking why they are getting the same random numbers out of rand() – something must be broken! This just isn’t taught to developers.
A common rebuttal I see to this is “It’s just an Arduino game”. No! Even in toy scenarios, weak entropy sources result in poor user experience. But it’s bigger than that. A developer making an Arduino toy today is making a remote keyless entry system tomorrow – all using the same entropy gathering techniques.