// ------------------------------------------------------------ 66. Exploiting Implementation Errors in Diffie-Hellman Most of the problems we've exploited have been application- or protocol-layer errors of design. But cryptographic primitives can suffer from implementation errors too! One seldom-seen (but never the less deadly!) class of implementation error is the notorious carry bug. It goes like this: 1. Most public-key (and some symmetric) cryptographic primitives take advantage of bignum arithmetic. 2. Bignums are big. They can be thousands of bits long. They do not fit in machine registers. 3. To compute the correct results, implementations are forced to break up bignums into pieces (henceforth "limbs") that will fit in registers. 4. Just like long addition in grade school, intermediate results are computed one column (pair of limbs) at a time. 5. Just like long addition in grade school, intermediate results can overflow. The implementation must track and apply all these carry bits with utmost care. 6. Finally, the implementation has to stitch all the limbs back into a coherent result. There are opportunities for errors everywhere, but step 5 is especially pernicious. Carry bugs can be difficult to detect, since the incidence of random faults might be 2^-64 or worse. And because these bugs are so implementation-dependent, standard test vectors, even those meant to exercise pathological cases, typically are not very good at finding these flaws. Ladies and gentlemen, cryptopals, please: a moment of silence for the crypto implementers. *Sheds a single tear* Okay, that's enough sympathy. Now let's find a nontrivial homomorphism from their sorrow to the profit domain. Let's target Diffie-Hellman, the bread-and-butter public-key primitive of cryptographic protocols. We'll stick with EC notation. Here's the high-level view of scalar multiplication: function scalarmult(Q, k): R := Q # assume n-bit k with k[1] = 1 # iterate bits from high to low, skip first bit for b in bits(k)[2..n]: R := add(R, R) if b = 1: R := add(R, Q) return R In ECDH, Q is an attacker-supplied public point, and k is our secret scalar. For simplicity, let's suppose k is a fixed bit-length. (This is often true in DH implementations anyway.) We number the bits of k from 1 to n with k[1] the most-significant bit and k[n] the least-significant bit. We need to hide some low-level bug in this high-level logic. We could go through the exercise of implementing multiple-precision arithmetic with some subtle-but-realistic carry bug embedded deep within the logic, but that would be incredibly tedious, and it would only obfuscate what is (at heart) a beautifully simple attack. Instead, let's cheat: 1. If you worked through the previous ECDH problems, you should be able to borrow code from your existing implementation. If not, gin one up with the standard EC addition laws from Wikipedia. 2. Replace add() with faultyadd(), which should look something like this: function faultyadd(Q1, Q2): if fault(Q1, Q2): raise return add(Q1, Q2) 3. Fault() should just hash the points and compare them to some sentinel value. I did something like this: function fault(Q1, Q2): return (Q1.x * Q2.x) % p = 0 With p the probability of a fault. It doesn't really matter what fault() is, but it must be deterministic. Just make sure it's not too slow, since it's going to get called 2n times per scalar multiplication. And make sure you can adjust p easily, for your own sanity. Before we think about how to attack, I think it's helpful to think about the DH implementation above. Write yourself a little tracer that steps through a scalar multiplication and, at each call to add, prints the coefficients c and d of Q in each call add(cQ, dQ). Feel free to add whatever debug outputs make sense. For example, with n = 6, trace(58) gives me: # k = 111010 # i = 2, b = 1 add(1Q, 1Q) add(2Q, 1Q) # i = 3, b = 1 add(3Q, 3Q) add(6Q, 1Q) # i = 4, b = 0 add(7Q, 7Q) # i = 5, b = 1 add(14Q, 14Q) add(28Q, 1Q) # i = 6, b = 0 add(29Q, 29Q) And here's trace(62): # k = 111110 # i = 2, b = 1 add(1Q, 1Q) add(2Q, 1Q) # i = 3, b = 1 add(3Q, 3Q) add(6Q, 1Q) # i = 4, b = 1 add(7Q, 7Q) add(14Q, 1Q) # i = 5, b = 1 add(15Q, 15Q) add(30Q, 1Q) # i = 6, b = 0 add(31Q, 31Q) Play around with different inputs until you get an intuitive feel for it. Here are some things to note: 1. The coefficients of Q in each call to k is a function of the secret scalar alone. This implies another model for the secret: a unique sequence of group operations. While the attacker chooses the input point, the victim alone chooses the sequence of operations on it. 2. For any given k, each pair of inputs to add() is unique. That is, we never repeat a (Q1, Q2) pair. This is because the accumulator doubles in each iteration. 3. In each iteration, the inputs to add() depend on the current bit and all preceding bits. The succeeding bits are not considered until later. The two examples above differ only in k[4]. Notice that they encode an identical sequence of operations before diverging forever when i = 4. Another way to put this is that the bits of k encode a binary decision tree: add(1Q, 1Q) /---/ \---\ b = 0 b = 1 add(2Q, 2Q) add(2Q, 1Q) add(3Q, 3Q) / \ / \ b = 0 b = 1 b = 0 b = 1 add(4Q, 4Q) add(4Q, 1Q) add(6Q, 6Q) add(6Q, 1Q) add(5Q, 5Q) add(7Q, 7Q) ... And so on. Now, let's attack. We know k[1] = 1 by definition. Let's consider k[2]. By inspecting the tree above, we can see add(2Q, 2Q) is computed if and only if k[2] = 0. So: 1. Generate a random scalar d. Keep it secret and keep it safe. This is the target of our attack. 2. Define an oracle that accepts a point Q, multiplies it by d, and returns true or false depending upon whether a fault is triggered. In a realistic setting, this could be an endpoint that computes the ECDH handshake and decrypts a message. You can build this out if you're feeling fancy, but the artificial oracle is okay too. 3. Working offline, generate random points and compute add(1Q, 1Q) (the unconditional first step of the multiplication) followed by add(2Q, 2Q). You're looking for a point that survives the first addition and triggers a fault on the second one. 4. When you find a point that triggers a fault, query the oracle. The response from the oracle will tell you the value of k[2]. If it succeeds, k[2] = 1. If it triggers the fault, k[2] = 0 (probably.) Probably? Well, sure. There is, of course, a chance for false positives. Since we're treating faults as random, there is a small but nonzero chance your input point will trigger a fault on some later step. We can eliminate these by improving our simulation: 1. Instead of simulating only the b = 0 branch, simulate both branches. Find a candidate point that triggers a fault on one but not the other. 2. Suppose your point triggers a fault when k[2] = 1. Query the oracle. If there is no fault, k[2] = 0. Otherwise, we are unsure, so return to step 1. (The same logic applies if your point triggers when k[2] = 0.) From here, we can generalize the above algorithm to recover succeeding bits: 1. Generate random points and simulate the bits of the key that you know. Ensure there is no error. 2. Simulate all possible computations for the next unknown bit of the key. If you trigger a fault, proceed. Otherwise, return to step 1. 3. Query the oracle. A negative result leaks a key bit with certainty. A positive result (i.e. a fault) may leak a key bit. Depending on how you tuned your fault probability, this may give you enough confidence to proceed. 4. Rinse and repeat until all key bits are recovered. That's pretty much it. Well, there are a few opportunities to get fancy, if you feel so inclined: 1. Even in the presence of uncertainty, positive results have value. You can calculate the probability of a false positive and determine whether you have enough confidence to proceed. 2. Once you have enough bits of the key, you can compute the remainder of the attack offline using standard discrete logarithm attacks (e.g. Pollard's kangaroo).