Debian Project Aims to Keep the CIA Off Our Computers
on September 15, 2015
Lunar, one of the lead developers on the Debian ReproducibleBuilds project, has recently outlined a serious security hole that could impact all open-source software, including most Linux distributions. It potentially exposes users to unwanted scrutiny from third parties, including security agencies. His project is designed to close this hole.
One of the big advantages of open source software is that third parties can inspect the code to ensure it does what it's supposed to. If any malicious code is present, it can be detected and eliminated. But when software is distributed in the form of a binary executable, there is a risk that malicious code (not present in the original source code) has been added.
This doesn't necessarily mean that the developer intended to distribute corrupted code. If the developer is using a compromised compiler, the compiler itself could insert malware as it turns the source code into an executable.
This may sound a little far-fetched, but it is a real security concern. Documents from the Snowden leak indicate that the CIA has been working on ways to exploit these weaknesses to install snooping software on consumer devices all over the world.
At a recent conference organized by the CIA, a team of developers presented a proof of concept. They had managed to bypass Apple's digital certificates to produce a corrupted version of Xcode, Apple's proprietary development toolchain. Independent developers use Xcode to build OS X and iOS apps. The corrupted version embeds spyware into any application it compiles, without the developer's knowledge.
These apps could find their way into the app stores, and potentially onto millions of consumer devices. This would allow security agencies to snoop on the conversations and private messages of millions of innocent users all over the world.
If Apple is a hot target, then Linux is an even more tempting one. Security-conscious users who understand the risks of commercial platforms often use Linux for its tighter security features. These users include people whom the security agencies are VERY interested in spying on.
Anti-virus software can detect fragments of known malware, but this is only possible after instances of the malware have been discovered and analyzed. It doesn't protect against new or previously undetected malware infections. In short, anti-virus software is not enough to protect against this type of attack.
The only way to be sure that a binary executable does not include any unexpected code is to compile the source code yourself and compare the result with the distributed binary. If the two files do not match, the distributed binary may contain added code, possibly malware.
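The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the Debian project's tooling: the function names sha256_of and binaries_match are made up for this example, and it assumes you already have both the distributed binary and your own rebuild on disk.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(distributed_path, rebuilt_path):
    """True only if the two files are byte-for-byte identical (same SHA-256)."""
    return sha256_of(distributed_path) == sha256_of(rebuilt_path)
```

Calling binaries_match("package-from-mirror.bin", "package-built-locally.bin") (both file names are hypothetical) returns False if even a single byte differs. That is exactly why the comparison is only meaningful when the build itself is reproducible.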
While this is a basically sound idea, there is a major fly in the ointment. The source code for the majority of Linux packages is written in such a way that it doesn't always compile to produce an identical binary file.
There are several reasons why two builds of the same source code may differ. Common causes include:
- Timestamps embedded in the compiled output.
- Incremental build numbers.
- Differences between file systems (for example, the order in which the files in a directory are listed), so a binary compiled on my computer is different from one compiled on yours.
- File paths from the build machine embedded in the binary, because different computers can store resources and code in different locations.
- Random data from memory or the CPU embedded in the compiled file.
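To see how a single embedded timestamp is enough to break a byte-for-byte comparison, here is a small Python sketch. It uses gzip as a stand-in for a build step, because the gzip format stores a modification time in its header. The build_artifact helper and the two build times are invented for this example.

```python
import gzip

# The "program" is identical in every build; only the recorded time changes.
payload = b"identical program bytes\n"


def build_artifact(data, build_time):
    """Compress data, storing build_time (seconds since the epoch) in the gzip header."""
    return gzip.compress(data, mtime=build_time)


# Two hypothetical builds, one second apart.
first = build_artifact(payload, 1442275200)
second = build_artifact(payload, 1442275201)

# Two builds with the timestamp pinned to a fixed value.
pinned_a = build_artifact(payload, 0)
pinned_b = build_artifact(payload, 0)

print(first == second)       # False: same input, different output
print(pinned_a == pinned_b)  # True: a fixed timestamp gives identical output
```

The contents are the same in all four cases. Only the recorded time differs, yet that alone makes the first two outputs fail a comparison.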
Producing reproducible builds requires a number of changes:
1: The source code must be changed so that variables are always initialized to static values (not dynamic values from memory, which can be random).
2: Eliminate the use of timestamps, source code file paths, and build numbers.
3: Specify the exact build environment, so that it can be reproduced on different computers.
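As a rough illustration of the second and third points, the Python sketch below packs files into a tar archive in a way that does not depend on the clock, the current user, or the order in which the files happen to be listed. The function names are invented for this example. The SOURCE_DATE_EPOCH environment variable is a convention from the wider Reproducible Builds effort and is not mentioned in this article; treat its use here as an assumption.

```python
import io
import os
import tarfile


def build_epoch():
    """Timestamp to embed: SOURCE_DATE_EPOCH if set, otherwise 0 (an assumed default)."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def deterministic_tar(files, epoch):
    """Pack a dict of {name: bytes} into tar bytes that depend only on the inputs."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        # Sort the names so the result does not depend on listing order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = epoch          # fixed time instead of "now"
            info.uid = info.gid = 0     # no trace of the building user
            info.uname = info.gname = ""
            info.mode = 0o644           # fixed permissions
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


archive_one = deterministic_tar({"b.txt": b"two", "a.txt": b"one"}, build_epoch())
archive_two = deterministic_tar({"a.txt": b"one", "b.txt": b"two"}, build_epoch())
print(archive_one == archive_two)  # True: same inputs, same bytes
```

Real packages need far more than this, since compilers, linkers and documentation generators can each introduce their own variations. The principle is the same, though: every value that would otherwise come from the build machine is replaced by one that is fixed or derived from the source.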
As you can imagine, this could be painstaking work on a single project. But the Debian project has over 20,000 packages, and the majority of them need to be overhauled. This is a major undertaking, to say the least.
But it has to be done. A single corrupted package could result in thousands of infected computers.
You can read more about the project at:
https://wiki.debian.org/ReproducibleBuilds
and
https://reproducible.alioth.debian.org/presentations/2015-08-13-CCCamp15-outline.pdf