Contextual Bloat
Background
Bert Hubert recently wrote an excellent piece on software bloat and its downstream effects on security: “Why Bloat Is Still Software’s Biggest Vulnerability” (IEEE Spectrum). I suggest reading that first, as well as Niklaus Wirth’s 1995 article “A Plea for Lean Software”. This post is mostly an attempt at delineating a strategy to improve the situation in the short term.
One of the examples given in Hubert’s original post is a garage door opener that uses 50 million lines of code. That seems like a very large number, especially when compared to some choice open source projects. Chromium, the open-source project behind Google Chrome, is approximately 40 million lines of code. The Linux kernel contains 46 million lines of code.
Reminder of why there’s so much code
As noted by Hubert and Wirth, one of the reasons for increasing bloat is cost. If you want to create anything involving software, it’s much easier and faster to build on existing components others have created. For those making components others build on, it’s also easier and faster to build on existing components. If existing open source software can get you 90% of the way to what you’re looking to build, why wouldn’t you use it?
If a given platform and programming language have a large pool of developers to draw from, the cost of creating software on that platform should generally be lower. In addition to the obvious benefit of being able to find developers to complete the work at a lower price point, larger platforms should have more example code to use. There will likely also be far more support available, and more extensions or plugins that bring you closer to your goal before you’ve even started on the project. Those extensions might increase the number of lines of code involved in the product, but if the overall cost of the product is lower then it’s more likely to be a marketplace success.
Context
One way to improve the situation is to examine the security context in which the code is running. Decompose the software into the components operating with different privilege levels. If everything runs at the same privilege level (such as the root user), it behooves you to consider privilege separation. Consider separating processes and running them as different users, with well-defined interfaces between them. In the case of a microcontroller without an MMU this can be more difficult, but separating a few components is sometimes possible with only an MPU.
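As a sketch of this separation pattern, the snippet below hands untrusted input to a separate worker process over a narrow JSON-over-pipes interface. All names here are invented for illustration; in a real deployment the worker would additionally drop to an unprivileged user (e.g. via `os.setuid`) before touching the input, a step omitted so the sketch runs without root.

```python
import json
import subprocess
import sys

# Hypothetical worker: processes untrusted bytes in a separate process,
# speaking a narrow, well-defined interface (JSON over stdin/stdout) back
# to the privileged parent.
WORKER = r"""
import json, sys
data = sys.stdin.read()
# "Parsing" here stands in for any risky processing of untrusted input.
result = {"length": len(data), "lines": data.count("\n") + 1}
json.dump(result, sys.stdout)
"""

def parse_untrusted(data: str) -> dict:
    proc = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=data,
        capture_output=True,
        text=True,
        timeout=5,  # a crash or hang in the worker cannot take down the caller
    )
    if proc.returncode != 0:
        raise ValueError("worker failed on untrusted input")
    return json.loads(proc.stdout)
```

The value is in the interface: the parent only ever sees a small, structured result, so a bug triggered by the untrusted input is confined to the worker process.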
Another example I thought of while considering software bloat is a video game. A single-player game distributed by a large platform will have few, if any, entry points for malicious content. Separating privileges might not be worth any effort there, because there is no meaningfully different context or point of entry for an attacker. A multiplayer game which displays arbitrary models and textures from other players has a very different attack surface, though there are techniques to design for this. Here the usual approach of fuzzing the model and texture parsers to discover vulnerabilities is good, but fuzzing might not discover all vulnerabilities. Adding a sandbox in which the model and texture parsers operate might provide more value than extending fuzzing campaigns.
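One crude illustration of such a sandbox, assuming a POSIX system: run the parser in a child process under hard resource limits, so a malicious asset that triggers a decompression bomb or an infinite loop is contained. The parser and function names are invented for this sketch; real games would reach for OS-level sandboxes (seccomp filters, job objects, and the like) on top of this.

```python
import resource
import subprocess
import sys

def _limit_child():
    # Applied in the child just before exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))            # 2 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (512 << 20,) * 2)   # 512 MiB address space

# Stand-in for a real model/texture parser operating on untrusted bytes.
PARSER = r"""
import sys
data = sys.stdin.buffer.read()
sys.stdout.write(str(len(data)))
"""

def parse_in_sandbox(data: bytes) -> int:
    proc = subprocess.run(
        [sys.executable, "-c", PARSER],
        input=data,
        capture_output=True,
        preexec_fn=_limit_child,
        timeout=10,  # wall-clock backstop in the parent
    )
    if proc.returncode != 0:
        raise ValueError("parser killed or crashed on untrusted input")
    return int(proc.stdout)
```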
Threat Modeling
One tried and tested way to determine where to focus these efforts is threat modeling. There has been a significant amount written on the subject, and there are various frameworks around for it. The tl;dr of those is that security (hopefully in collaboration with the development teams) will enumerate assets to protect and possible threats to those assets, along with likely interfaces attackers can use to find a path to those assets.
If you can take that threat model and augment it with the properties of the code you’d like to measure, you’ll be a lot closer to figuring out where to focus efforts. Huge codebases written in memory-unsafe languages like C++ which also process highly complex inputs will likely benefit from attention, whereas smaller codebases written in, say, a formally verified language, and which have already received significant attention, might warrant being deprioritized. (While memory safety is not the only source of vulnerabilities, I usually see the number 70% thrown around as an estimate of how many vulnerabilities arise from memory corruption.)
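To make this concrete, here is a toy prioritization score that folds code properties into a threat-model-driven ranking. The fields, weights, and component names are entirely my own assumptions for illustration, not a published methodology.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    lines_of_code: int
    memory_unsafe: bool      # e.g. written in C/C++ without isolation
    parses_untrusted: bool   # reachable by attacker-controlled input
    formally_verified: bool

def risk_score(c: Component) -> float:
    score = c.lines_of_code / 1000   # more code, more surface
    if c.memory_unsafe:
        score *= 3.0   # rough nod to the ~70% memory-corruption estimate
    if c.parses_untrusted:
        score *= 5.0   # attacker-reachable parsers dominate
    if c.formally_verified:
        score *= 0.1   # already received deep scrutiny
    return score

components = [
    Component("ble_packet_parser", 12_000, True, True, False),
    Component("crypto_core", 3_000, False, False, True),
    Component("ui_renderer", 40_000, True, False, False),
]
ranked = sorted(components, key=risk_score, reverse=True)
```

With these (made-up) weights the attacker-reachable BLE parser outranks a much larger renderer that never sees untrusted input, which matches the intuition in the paragraph above.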
Conclusion
Holistically, the garage door example could reasonably contain millions of lines of code and still provide an adequate amount of security. To get to that point, I would want to consider every piece of code that goes into creating the hardware to open the garage door. For each component, consider:
- In what security context is it running? (Does everything run as root?)
- Does the system support running this component in a meaningfully unprivileged context? For a low-powered microcontroller without an MMU there may only be a limited amount of RAM, and with an MPU there may only be a few regions available in which to isolate code. If any sandbox implementation is trivial to escape, then efforts might be best placed elsewhere, such as reducing the amount of code.
- Is the code processing untrusted input? (Is it parsing packets sent over Bluetooth LE? Are there appropriate permissions on things exposed over BLE?)
- If a single device is compromised, are they all compromised? If all of the garage door openers talk to the same cloud backend and all use the same private key for authentication, could someone who compromised a single opener open all garage doors everywhere at the same time?
- What is the impact to users of a compromise? Might it be a company ending event, or is it a mild denial of service that ends when the attacker is no longer physically present?
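For the shared-key question above, one mitigation can be sketched as follows. The scheme and names are illustrative assumptions, not a recommendation for any specific product: the backend derives a distinct key per device from a master secret it alone holds, so compromising one opener yields only that opener’s key. (A real deployment might instead provision independent per-device keys at manufacture, or use public-key authentication.)

```python
import hashlib
import hmac

# Hypothetical master secret, held only on the cloud backend and never
# shipped to devices.
MASTER_KEY = b"backend-master-secret"

def device_key(device_id: str) -> bytes:
    # Derive a per-device key; distinct device IDs yield distinct keys.
    return hmac.new(MASTER_KEY, device_id.encode(), hashlib.sha256).digest()

def authenticate(device_id: str, challenge: bytes, response: bytes) -> bool:
    # Backend-side check of a device's response to a challenge.
    expected = hmac.new(device_key(device_id), challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

An attacker who extracts the key from one opener can impersonate that opener, but cannot open every garage door everywhere, which bounds the blast radius of a single-device compromise.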
This is not to say that I think it’s good that there is far more code than necessary in implementing the garage door opener. I wholeheartedly agree with the desire for smaller codebases. Code that doesn’t exist can’t be exploited. (Though to be pedantic, removing code that implements permission checks will likely not improve security.) When faced with a product that has already been created, a strategy of scoping the existing product’s security through threat modeling, privilege separation, and compartmentalizing dangerous or sensitive code can get us to a place where the vulnerabilities aren’t as immediately exploitable.