RAVENS was started in part as a security research project. Therefore, it has to go beyond simply installing an update: it also needs to download it, in a potentially hostile environment.
Although this sounds like a solved problem (HTTPS!), it turned into a surprisingly complex one once platform limitations were taken into account. Specifically, the devices we were targeting (that is, very low-power devices built around a monolithic MCU) may not have the memory to run a TLS stack. Moreover, those devices may lack a persistent clock and be unable to get a trustworthy source of time (which makes expiration tricky). To top it all off, those devices are mostly deterministic: they don’t always have proper sources of entropy, nor can they always be provisioned with device-specific secrets. Our transfer protocol thus had to do a lot with very little.
Our final design aims at providing the following:
- Integrity checks, including against incomplete update packages;
- Protection against any kind of forged update, even when based on a valid update;
- Prevention of accidental distribution errors ending up with bricked devices;
- Protection against downgrade, and partial upgrade (upgrade to a more recent, but obsolete version);
- Protection against running delta updates on an unknown state;
- Limiting the impact of a server-side leak of cryptographic material;
- Use of fast cryptography with minimal device provisioning;
- Supporting simple evolution toward encrypting the update payload.
Those features are achieved with the following provisioning:
- A unique public key (for libHydrogen’s Curve25519-based signatures) is provisioned on the device, and is shared by every device running on the same hardware;
- Some pseudo-random data is included in the firmware, and may be shared between multiple devices.
The parties
Three parties are involved in this protocol, each with different constraints and available tools.
The device, Munin
As mentioned earlier, the device is extremely limited in its hardware capabilities, which forces the communication to take place over cleartext HTTP/MQTT. On the other hand, it has two tools to protect its communication: a 32-byte public key that can be used to check cryptographic signatures (hereafter referred to as the Device Public Key) and a 64-byte field of pseudo-random data (hereafter referred to as the Challenge Data).
An important note is that the Device Public Key is the only embedded cryptographic key, and leaking the corresponding private key would be a Very Bad Thing™ (basically, the server has no other means to authenticate itself).
The server, Odin
The server is only a means of distribution: it has access to the updates and to some secrets, and needs to get the devices updated. In addition, Odin needs to detect corrupted devices to the best of its abilities, so that they don’t try to reuse corrupted data when building the new firmware image.
An additional design goal is that a breach of Odin should not leak the Device Private Key, which is reflected in the update format (I’ll touch on that a bit later). This reduces the potential for error and makes deploying a CDN less risky.
Hugin
Hugin is mainly the command-line tool used to generate and sign the update packages. To ensure maximum consistency between all the parsers involved, Hugin and Munin share part of their codebase. Odin also uses Hugin to perform cryptographic operations, although Hugin keeps no state to speak of.
Protocol
With that being said, here is the protocol; I’ll then go a bit deeper into how it achieves its goals.
The core principle follows the common method of downloading an update manifest with some metadata on the update, then downloading the main package. Our differences become more obvious when looking at the details.
First Request: Munin asks Odin whether an update is available
Munin sends a simple HTTP request whose user agent carries the name and the version of the device. The version used is an internal counter that strictly prevents downgrades, so a rollback requires re-releasing the older firmware under a new version number. On top of that, Munin sends an X-Update-Challenge header with the Challenge Data I mentioned earlier, encoded in base64 to avoid sending raw, random, binary data in the header.
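A minimal sketch of the headers of this first request, in Python. The `name/version` User-Agent format, the function name and the device name are assumptions made for illustration; only the `X-Update-Challenge` header and its base64 encoding come from the protocol description.

```python
import base64


def build_update_request_headers(device_name: str, version: int, challenge: bytes) -> dict:
    """Build the headers of Munin's first request (hypothetical helper).

    Assumes the User-Agent carries "<name>/<version>", where the version is the
    internal, strictly increasing counter, and that the Challenge Data is 64 bytes.
    """
    if len(challenge) != 64:
        raise ValueError("Challenge Data is expected to be 64 bytes")
    return {
        "User-Agent": f"{device_name}/{version}",
        # Base64 keeps raw, random binary data out of the HTTP header.
        "X-Update-Challenge": base64.b64encode(challenge).decode("ascii"),
    }


headers = build_update_request_headers("munin-board", 42, bytes(range(64)))
print(headers["User-Agent"])               # munin-board/42
print(len(headers["X-Update-Challenge"]))  # 88
```

Since 64 bytes of Challenge Data always encode to 88 base64 characters, the header has a fixed, predictable size, which is convenient on a device with a small request buffer.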
First Reply: Odin sends back the update manifest
For the sake of brevity, I’ll skip over the “no update” reply, as it’s nothing more than an HTTP error code.
On the other hand, if Odin determines an update is required (that is, the device is outdated and all server-side policies accept the device), it replies with the update manifest.
On top of the actual manifest, Odin sends back the signed challenge (the Challenge Response) and a third component consisting of multiple memory addresses and lengths. This third component is an integrity check: those ranges correspond to the parts of the device memory that the delta update will reuse. In reply, Odin expects the device to send a cryptographic hash of those ranges.
Second Request: Munin replies to Odin’s request for information
After receiving Odin’s reply, Munin checks the cryptographic signatures of the update manifest, of the signed challenge and of the ranges Odin wants it to hash. Assuming all signature verifications succeed, Munin hashes those ranges and concatenates the resulting raw hashes (not encoded in hexadecimal/base64). It then sends them to Odin in a POST request, again including the user agent, which Odin uses when checking whether the hashes are correct.
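Munin’s side of the integrity check can be sketched as follows. This assumes SHA-256 (the hash the update manifest already uses), that Odin’s ranges arrive as `(address, length)` pairs, and that `memory` is a flat image of the device storage starting at address 0; `hash_ranges` is a hypothetical name. On the real MCU the ranges would be read from flash rather than sliced from a bytes object.

```python
import hashlib


def hash_ranges(memory: bytes, ranges: list[tuple[int, int]]) -> bytes:
    """Hash each (address, length) range Odin asked for and concatenate the raw digests."""
    response = bytearray()
    for address, length in ranges:
        end = address + length
        # Refuse ranges that fall outside the device storage.
        if address < 0 or length < 0 or end > len(memory):
            raise ValueError(f"range {address:#x}+{length:#x} is outside memory")
        # Raw 32-byte digests, not hex or base64.
        response += hashlib.sha256(memory[address:end]).digest()
    return bytes(response)


flash = bytes(range(256)) * 16  # hypothetical 4 KiB flash image
body = hash_ranges(flash, [(0x000, 0x100), (0x400, 0x80)])
print(len(body))  # 64
```

The POST body is then simply 32 bytes per requested range, in the order Odin listed them.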
Second Reply: Odin sends back the update payload
After confirming that the device replied with the expected hashes, Odin sends back the main update payload. Munin downloads it and writes it to flash, then initiates a reboot if the payload is valid (that is, if its hash matches the one in the signed update manifest).
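Odin’s check on that reply could look like the sketch below. The `references` table keyed by User-Agent is a hypothetical stand-in for wherever Odin stores the reference responses Hugin computed, and the function name is also hypothetical; using a constant-time comparison is a defensive choice of this sketch rather than something the protocol requires.

```python
import hmac


def device_reply_is_expected(user_agent: str, reply: bytes,
                             references: dict[str, bytes]) -> bool:
    """Return True if Munin's hashes match the reference for its firmware version."""
    expected = references.get(user_agent)
    if expected is None:
        return False  # unknown device/version: no payload to serve
    # compare_digest returns False on length mismatch and runs in constant time.
    return hmac.compare_digest(reply, expected)
```

If the check fails, Odin simply withholds the payload: the device is either corrupted or not in the state the delta update expects.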
Data structure
As described, the protocol appears very standard, and it may not be clear how it provides its numerous security guarantees. The magic is largely contained within the three components of Odin’s first response. The main payload (sent during the second reply) is actually largely untouched, its integrity being ensured by the update manifest.
Okay then, what is Odin replying, and what’s so fancy about it?
As mentioned earlier, Odin is sending back three independently secured items, put back to back in the data stream: the Update Manifest, the Challenge Response and the Integrity Check.
The Update Manifest
The update manifest’s main purpose is to provide some information on the update. Specifically, it provides the firmware version on top of which the update should be installed, the version it is installing and the length of the update payload. The manifest also contains a cryptographic (SHA-256) hash of the update payload, making sure the payload can’t be tampered with during transit. So far, so good.
The Update Manifest is secured by a cryptographic signature (Curve25519-based, through libHydrogen) made with the Device Private Key. Being signed means that the manifest can’t be tampered with. Using the Device Private Key means that the signature can be checked using the Device Public Key embedded in the device. However, it also means that whoever signed the manifest had access to a super critical secret key whose leak would destroy our security guarantees. That key must thus be used as little as possible, and we still have two other items to secure.
The last relevant item of the Update Manifest is a new public key, called the Update Public Key. This key is generated at the same time as the update manifest, and is thus only valid for this specific update (that is, from one specific version to another specific version). This creates a much lower-value kind of key: a leak of the corresponding private key can only be used with this specific update, so even a leak of every existing Update Private Key is absolutely useless against any up-to-date device. But what is the Update Public Key used for?
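To make the Update Manifest more concrete, here is one possible binary layout, sketched in Python with `struct`. The field order, the little-endian encoding, the 32-bit version counters and the 64-bit length are all assumptions; the protocol only specifies which fields exist. The signature made with the Device Private Key would travel next to these bytes and must be verified before `unpack` is ever called.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout of the signed part of the manifest (little-endian):
# from_version (u32), to_version (u32), payload_length (u64),
# payload SHA-256 (32 bytes), Update Public Key (32 bytes).
MANIFEST_FORMAT = "<IIQ32s32s"
MANIFEST_SIZE = struct.calcsize(MANIFEST_FORMAT)  # 80 bytes


@dataclass(frozen=True)
class UpdateManifest:
    from_version: int         # firmware the delta update must be applied on
    to_version: int           # firmware the update installs
    payload_length: int       # exact size of the Update Payload, in bytes
    payload_sha256: bytes     # hash binding the (unsigned) payload to this manifest
    update_public_key: bytes  # per-update key for the Challenge Response and Integrity Check

    def __post_init__(self) -> None:
        if len(self.payload_sha256) != 32 or len(self.update_public_key) != 32:
            raise ValueError("hash and Update Public Key must both be 32 bytes")

    def pack(self) -> bytes:
        """Serialize the fields covered by the Device Private Key signature."""
        return struct.pack(MANIFEST_FORMAT, self.from_version, self.to_version,
                           self.payload_length, self.payload_sha256,
                           self.update_public_key)

    @classmethod
    def unpack(cls, data: bytes) -> "UpdateManifest":
        """Parse the fields; only call this once the signature has been verified."""
        return cls(*struct.unpack(MANIFEST_FORMAT, data))
```

A fixed-size layout like this one keeps the device-side parser trivial, which matters when the same parsing code is shared between Hugin and Munin.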
The Challenge Response
The challenge response is a fairly simple structure: it only contains a cryptographic signature and a 64-bit random number. This is where things get a bit weird: the signature is not over the random number alone, and it is not made with the Device Private Key.
What is signed is actually the challenge we sent, with the random number and a few more metadata appended. Why append a random value instead of simply signing the challenge and sending back the signature? This works around the lack of a random number generator on those devices: the signature will be used as the challenge of the next update request (assuming the update payload is accepted). This makes sure that even if all the devices are provisioned with the same initial challenge, it will change after every update, and two devices with the same challenge will end up with different signatures, and thus different challenges, after a single update.
This explains the signature, but not the key. The signature is generated using the Update Private Key, whose public key was transferred in the Update Manifest and is only valid for the update the device thinks it is receiving. This has two major advantages: it prevents mixing and matching challenge responses between different updates in order to replay one, and it means that the only keys Odin needs access to are the low-value Update Private Keys. A total compromise of the Odin infrastructure would thus only leak keys that let you forge Challenge Responses, not Update Manifests. An attacker may then be able to replay older updates (assuming they have the update starting from the right version, and that the version it installs is newer than the one installed on the device).
Together, this enables our system to strongly resist partial updates (upgrading to an obsolete firmware, even if it is newer than the installed firmware). The only way to replay an update is by capturing a valid challenge response beforehand and preventing the device from updating (as any update will change the challenge). This is achievable within the context of a targeted attack, but not afterward or at large scale.
A few more metadata (the starting and destination version IDs and the Update Public Key) are also included in the signature so that reusing Update Keys doesn’t open too large a vulnerability.
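The bytes covered by the Challenge Response signature, and how the result becomes the next challenge, can be sketched as below. The field order and the `CHALLENGE_CONTEXT` domain tag are assumptions, as are the function names. Because the Python standard library has no Curve25519 signatures, `stand_in_sign` uses HMAC-SHA512 purely so the sketch runs: it is a symmetric MAC, not the public-key signature libHydrogen produces, and the real Munin would verify the result with the Update Public Key.

```python
import hashlib
import hmac
import secrets
import struct

# Hypothetical 8-byte domain tag so a Challenge Response signature can never be
# mistaken for a signature over another structure (e.g. the Integrity Check).
CHALLENGE_CONTEXT = b"RVNSCHAL"


def challenge_response_message(challenge: bytes, nonce: bytes, from_version: int,
                               to_version: int, update_public_key: bytes) -> bytes:
    """Assemble the bytes covered by the Challenge Response signature (assumed layout)."""
    if len(challenge) != 64:
        raise ValueError("Challenge Data is expected to be 64 bytes")
    if len(nonce) != 8:
        raise ValueError("Odin's random value is expected to be 64 bits")
    if len(update_public_key) != 32:
        raise ValueError("Update Public Key is expected to be 32 bytes")
    return (CHALLENGE_CONTEXT + challenge + nonce
            + struct.pack("<II", from_version, to_version) + update_public_key)


def stand_in_sign(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA512 stand-in for the Update Private Key signature (NOT public-key crypto)."""
    return hmac.new(key, message, hashlib.sha512).digest()


# Usage: two devices that left the factory with identical Challenge Data.
factory_challenge = bytes(64)
update_key = secrets.token_bytes(32)  # stand-in for the Update Private Key
update_public_key = bytes(32)         # placeholder Update Public Key
next_challenges = [
    stand_in_sign(update_key, challenge_response_message(
        factory_challenge, secrets.token_bytes(8), 41, 42, update_public_key))
    for _ in range(2)
]
# Each device stores its signature as the challenge for its next update request;
# the two differ because Odin's random value differs.
```

Binding both version IDs and the Update Public Key into the signed message is what stops a Challenge Response captured for one update from being accepted alongside another, even if Update Keys were ever reused.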
The Integrity Check
Cryptographically, the Integrity Check is very simple: it’s generated at the same time as the Update Manifest and is signed with the Update Private Key, tying it to the Update Manifest. Since it isn’t needed to install the update, it is kept separate from the Update Manifest and the Update Challenge. This way, it doesn’t have to be written to flash with the rest of the data the bootloader will need, and it can be discarded once the requested hashes have been computed and sent to Odin.
Functionally, there is again very little surprise. The Integrity Check is generated shortly after the Update Payload, by listing every region of memory the Update Payload reads. Those regions are then sorted and written in the Integrity Check. Meanwhile, a reference response is computed from those ranges and the reference original firmware, and stored with the update’s metadata for use by Odin.
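Hugin’s side can be sketched as two small functions: one that sorts the regions read by the Update Payload, and one that computes the reference response from the reference original firmware. Merging overlapping or adjacent ranges goes beyond the sorting described above and is an assumption of this sketch; SHA-256 and the `(address, length)` representation are assumed as well, and both function names are hypothetical.

```python
import hashlib


def normalize_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort (address, length) ranges and merge overlapping or adjacent ones."""
    merged: list[tuple[int, int]] = []
    for address, length in sorted(ranges):
        if merged and address <= merged[-1][0] + merged[-1][1]:
            # Extend the previous range so it also covers this one.
            start, prev_length = merged[-1]
            merged[-1] = (start, max(prev_length, address + length - start))
        else:
            merged.append((address, length))
    return merged


def reference_response(reference_firmware: bytes, ranges: list[tuple[int, int]]) -> bytes:
    """Hashes a healthy device running the reference firmware should send back."""
    return b"".join(hashlib.sha256(reference_firmware[a:a + n]).digest()
                    for a, n in ranges)


print(normalize_ranges([(0x100, 0x10), (0x0, 0x20), (0x18, 0x10)]))
# [(0, 40), (256, 16)]
```

Merging keeps the Integrity Check short and avoids hashing the same bytes twice on the device; since Munin only hashes what it is told to, Hugin and Odin are free to choose any such normalization.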
No… More… Context…
Wow, okay, no need to be rude, I was having fun. With the tools I just described, let’s see how we can deliver on our promises.
Integrity checks, including against incomplete update packages
The Update Manifest is fully signed, and thus can’t be tampered with in any way without breaking the signature. Moreover, the signature is the first thing checked after the download, before any field is even read.
If the Update Manifest is valid, the Update Challenge and the Integrity Check are validated using the Update Public Key included in the Update Manifest. This makes it impossible to swap the Update Challenge or the Integrity Check between updates. Moreover, the Update Challenge signature covers more than the data in the structure itself, which prevents accidentally confusing the two structures (such as processing the Integrity Check as if it were the Update Challenge, were they swapped in the data stream).
The Update Payload is validated by computing its cryptographic hash and comparing it with the reference included in the Update Manifest. It is thus practically impossible to use an Update Payload with the wrong Update Manifest, even though the Update Payload isn’t itself signed.
Protection against any kind of forged update, even when based on a valid update
On top of the measures described for the previous guarantee, the size of the Update Payload is also included in the Update Manifest. This protects Munin against an attacker sending an overly large Update Payload, which might otherwise result in a DoS. The download is aborted as soon as the payload size exceeds what is expected.
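A sketch of that size-bounded download, assuming the payload arrives as an iterable of chunks and that the expected length and SHA-256 come from an already verified Update Manifest. `receive_payload` and `PayloadRejected` are hypothetical names; on the device the chunks would be streamed to flash rather than buffered in RAM.

```python
import hashlib
from collections.abc import Iterable


class PayloadRejected(Exception):
    """Raised when the Update Payload does not match the verified Update Manifest."""


def receive_payload(chunks: Iterable[bytes], expected_length: int,
                    expected_sha256: bytes) -> bytes:
    """Accumulate payload chunks, aborting as soon as the announced size is exceeded."""
    hasher = hashlib.sha256()
    received = bytearray()  # on the device, this would be written to flash instead
    for chunk in chunks:
        # Abort before touching the oversized chunk at all.
        if len(received) + len(chunk) > expected_length:
            raise PayloadRejected("payload larger than announced in the manifest")
        hasher.update(chunk)
        received += chunk
    if len(received) != expected_length:
        raise PayloadRejected("payload truncated")
    if hasher.digest() != expected_sha256:
        raise PayloadRejected("payload hash mismatch")
    return bytes(received)
```

Hashing incrementally means the device never needs the whole payload in memory to validate it; the size check bounds how much an attacker can make it download and write.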
Prevention of accidental distribution errors ending up with bricked devices
Update Manifests are signed using the Device Private Key. As we’re assuming each hardware revision ships with a different Device Public Key, an Update Manifest destined for another device (and thus another Device Public Key) will not verify, which stops any further processing. Because the starting firmware version of the delta update is also included in the Update Manifest (which also contains the expected hash of the Update Payload), an incompatible Update Payload can’t be installed on a device, even if Odin makes a mistake.
Protection against downgrade, and partial upgrade (upgrade to a more recent, but obsolete version)
The destination version is checked very early in the update process, from a trusted source (the verified Update Manifest). This prevents Munin from ever downgrading a firmware version (this also prevents rolling back, although this could be patched in).
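Once the Update Manifest’s signature has been verified, the version checks from this section and the previous one reduce to a comparison like the one sketched below. The integer version counters and the function name are assumptions.

```python
def manifest_is_applicable(installed_version: int, from_version: int,
                           to_version: int) -> bool:
    """Decide, from a VERIFIED Update Manifest, whether Munin may apply the update.

    - the delta update must start from exactly the installed firmware;
    - the destination must be strictly newer (no downgrade, hence no rollback
      unless the old firmware is re-released under a higher version number).
    """
    return from_version == installed_version and to_version > installed_version
```

Both values come from the signed manifest, so an attacker cannot alter them without breaking the Device Private Key signature.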
Partial upgrades are mostly mitigated by the Update Challenge, although the approach is a bit more complex than usual in order to comply with our stringent hardware restrictions and some of our other security guarantees. We’re still vulnerable to update replay, assuming an attacker managed to capture a valid update for our device, including the device-specific Update Challenge.
Protection against running delta updates on an unknown state
Before sending the Update Payload, Odin requires from Munin a “proof” that the most important regions of its storage (that is, the ones the delta update will reuse) are in the state the delta update expects. This prevents the update from running if the device isn’t in the expected state, and thus mitigates the risk of the delta update generating garbage.
Limiting the impact of a server-side leak of cryptographic material
The main weakness of the earliest draft of the protocol was the power of the Device Private Key. This key had to be available to sign Odin’s Challenge Response, but also had the power to issue new Update Manifests. The use of Update Private Keys, provisioned by the Update Manifest and thus strongly tied to a single update, limits that risk.
Thanks to this design, we can freely distribute the Update Private Keys to the CDN architecture in which Odin resides. If the CDN gets breached, the only security guarantee weakened by the loss of cryptographic material is the protection against partial updates (upgrading to an obsolete version that is still newer than the one the device is running). It also makes stricter management of the Device Private Keys possible (e.g. storing them in a dedicated hardware security module (HSM) with tight access control).
Use of fast cryptography with minimal device provisioning
As I mentioned earlier, the device is only provisioned with two pieces of data.
The Device Public Key, a 32-byte cryptographic key, is generated by our cryptographic library, libHydrogen, and is hardcoded in the firmware of the device. This key is used to verify cryptographic signatures made with an algorithm based on Curve25519, a secure but also very fast elliptic curve. This library and algorithm were selected for:
- their density: equivalent security using RSA would require a key of roughly 384 bytes (3072 bits), and a similarly sized signature;
- the speed of signature verification: more than 10× faster than the equivalent elliptic curves included in the system library, and on par with RSA at a similar level of security;
- their track record: well known maintainers, no known security issues despite widespread use.
Astute readers may have noticed that the second piece of provisioning is also hardcoded in the firmware, and is the first challenge the device will send to Odin. Considering this challenge is used to prevent update replay, that’s pretty bad! Thankfully, after every update, the challenge is replaced by the signature of the Challenge Response. Because the Challenge Response includes some request-specific random components, two devices with identical challenges will get two different Challenge Responses, and will thus have different challenges for the next update. Ideally, devices should receive a “dummy” update after manufacturing but before sale, so each device gets a unique challenge at minimal cost when leaving the factory.
Supporting simple evolution toward encrypting the Update Payload
Today, the Update Payload is sent unencrypted over the Internet. This is not an integrity concern (it can’t be tampered with in any way) but it may leak trade secrets the designer would rather keep private. This is especially a concern now that physical counterfeits can be made at scale very quickly: the main defense designers turn to nowadays is embedding as much differentiation as possible in the software, and keeping it secret (thus making physical copies useless). Although not initially a design goal, encrypting the Update Payload could probably be added fairly easily.
To do that, we assume the current firmware is secret, which makes the Device Public Key secret as well. If the firmware were dumped, the game would already be lost, as the attacker wouldn’t really need to extract data from the updates. If that assumption holds, we can add to the Update Manifest a field holding the symmetric key used to encrypt the Update Payload. That key would then be encrypted using the Device Private Key, or another provisioned key, depending on the technical details.
Final words
The goal of this article was to demonstrate a minimal protocol enabling a device to authenticate a server payload in very constrained circumstances (no time, no real randomness, basic provisioning). If you have any comments, feel free to fire them my way, or to open an issue on GitHub. Thanks for your time :)