Spectre, Meltdown, and Virtual Systems

In June of 2017 I wrote a blog for InfoWorld on How to handle the risks of hypervisor hacking. In it, I described the theoretical points where Virtual Machines (VMs) and hypervisors could be hacked. My crystal ball must have been well polished. Spectre and Meltdown prey on one of the points I described there.

What I did not predict is where the vulnerability would come from. As a software engineer, I always think about software vulnerabilities, but I tend to assume that the hardware is seldom at fault. I took one class in computer hardware design thirty years ago. Since then, my working approach is to look first for software flaws and only consider hardware when I am forced, kicking and screaming, to examine for hardware failure. This is usually a good plan for a software engineer. As a rule, when hardware fails, the device bricks (is completely dead), seldom does it continue to function. There is usually not much beyond rewriting drivers that a coder can do to fix a hardware issue. Even rewriting a driver is usually beyond me because it takes more hardware expertise than I have to write a correct driver.

In my previous blog here, I wrote that Spectre and Meltdown probably will not affect individual users much. So far, that is still true, but the real impact of these vulnerabilities is being felt by service providers, businesses, and organizations that make extensive use of virtual systems. Although the performance degradation after temporary fixes have been applied is not as serious as previously estimated, some loads are seeing serious hits and even single digit degradation can be significant is scaled up systems. Already, we’ve seen some botched fixes, which never help anyone.

Hardware flaws are more serious than software flaws for several reasons. A software flaw is usually limited to a single piece of software, often an application. A vulnerability limited to a single application is relatively easy to defend against. Just disable or uninstall the application until it is fixed. Inconvenient, but less of a problem than an operating system vulnerability that may force you to shut down many applications and halt work until the operating system supplier offers a fix to the OS. A flaw in a basic software library can be worse: it may affect many applications and operating systems. The bright side is that software patches can be written and applied quickly and even automatically installed without computer user intervention— sometimes too quickly when the fix is rushed and inadequately tested before deployment— but the interval from discovery of a vulnerability to patch deployment is usually weeks or months, not years.

Hardware chip level flaws cut a wider and longer swathe. A hardware flaw typically affects every application, operating system, and embedded system running on the hardware. In some cases, new microcode can correct hardware flaws, but in the most serious cases, new chips must be installed, and sometimes new sets of chips and new circuit boards are required. If installing microcode will not fix the problem, at the very least, someone has to physically open a case and replace a component. Not a trivial task with more than one or two boxes to fix and a major project in a data center with hundreds or thousands of devices. Often, a fix requires replacing an entire unit, either because that is the only way to fix the problem, or because replacing the entire unit is easier and ultimately cheaper.

Both Intel and AMD have announced hardware fixes to the Spectre and Meltdown vulnerabilities. The replacement chips will probably roll out within the year. The fix may only entail a single chip replacement, but it is a solid prediction that many computers will be replaced. The Spectre and Meltdown vulnerabilities exist in processors deployed ten years ago. Many of the computers using these processors are obsolete, considering that a processor over eighteen months old is often no longer supported by the manufacturer. These machines are probably cheaper to replace than upgrade, even if an upgrade is available. More recent upgradable machines will frequently be replaced anyway because upgrading a machine near the end of its lifecycle is a poor investment. Some sites will put off costly replacements. In other words, the computing industry will struggle with the issues raised by Spectre and Meltdown for years to come.

There is yet another reason vulnerabilities in hardware are worse than software vulnerabilities. The software industry is still coping with the aftermath of a period when computer security was given inadequate attention. At the turn of the 21st century, most developers had no idea that losses due to insecure computing would soon be measured in billions of dollars per year. The industry has changed— software engineers no longer dismiss security as an optional afterthought, but a decade after the problems became unmistakable, we are still learning to build secure software. I discuss this at length in my book, Personal Cybersecurity.

Spectre and Meltdown suggest that the hardware side may not have taken security as seriously as the software side. Now that criminal and state-sponsored hackers are aware that hardware has vulnerabilities, they will begin to look hard to find new flaws in hardware for subverting systems. A whole new world of hacking possibilities awaits.

We know from the software experience that it takes time for engineers to develop and internalize methodologies for creating secure systems. We can hope that hardware engineers will take advantage of software security lessons, but secure processor design methodologies are unlikely to appear overnight, and a backlog of insecure hardware surprises may be waiting for us.

The next year or two promises to be interesting.

Spectre and Meltdown

Will Spectre and Meltdown be the flagship computer security crisis of 2018? There is a good chance that it will be, although I doubt that many personal computer users will be directly affected.

Good news

These flaws are hard to understand and take advanced engineering skills to implement; when implemented they are hard to exploit; I struggle to imagine results that would be worth a hacker’s trouble. Also, exploiting these flaws on a computer you do not already have access to is close to impossible. Consequently, good basic computer hygiene will protect you from these attacks as well as everything else thrown at you. In addition, the exploits are read-only; they do not corrupt data or processes.

The patches are going out this week to all the major operating systems and so far, the bruited predictions of devastating across-the-board 30% performance degradations have not proven out. 10% degradation and only in limited circumstances seems more realistic according to early testing reports.

Less good news

Nevertheless, the fallout from Spectre and Meltdown is likely to cause migraines and insomnia among computer security experts for months, even years to come. And the picture is not quite as rosy for businesses, especially for businesses that rely on virtual computing in various forms, as it is for individuals.

Scope

These are not your garden variety zero-day exploits. When I wrote about KRACK a few months ago, I explained that the flaw is particularly bad because it is in the standard and every correct implementation is vulnerable. The Spectre and Meltdown flaws are in the processor chip design. Intel processors have the worst problems and they perform the vast majority of computer processing in the world today, but AMD and ARM processors are also affected. That covers most of the rest of computing, including phones and tablets. For reasons I will elaborate on later, I suspect other processors have not been cited only because no one has looked hard enough yet.

The patches that have been applied are crack sealers; they do not repair the broken foundation that caused the cracks. Fixing the source of the cracks will require new processor designs and new chips. In order to explain just what Spectre and Meltdown are, I have to explain several unfamiliar concepts.

Protection rings

One of the pillars of computer security is called a “protection ring.” They are what prevents one computer process from interfering with another. For example, without protection rings, forcing a user to pass through a login gate before using a computer is easier to circumvent. Protection rings have been built right into the silicon of most processors since the eighties and the concept goes back to the beginnings of multi-processing in the 60s.

To science fiction readers, I liken protection rings to Asimov’s laws of robotics—they are intended to be intrinsic in all computers. In theory, protection rings when properly used make it impossible to break into a well-written operating system without physically altering the processor. When a computer is hacked into, it usually stems from a flaw in the operating system’s use of protection rings, not the physical processor chip.

The Spectre and Meltdown flaws are special because they are gaps in the integrity of privilege rings that were inadvertently built into the processor chips. To see how these gaps were opened, we have to look at concepts of modern processor design.

Multi-core processors

One of these concepts is “multi-core processors.” Before the advent of multi-cores, the capacity of processors was beginning to be limited by the great physical speed limit: the speed of light. When a processor reaches a certain number of instructions per second, it is limited by the time a signal takes to travel across the chip at the speed of light. The processor can’t move on to the next instruction in less time than it takes to read he previous instruction’s results.

Processor designers got around that by putting multiple processors, cores, on a single chip. In theory, by putting two cores on a chip, the speed is doubled. But that does not really solve the problem because taking advantage of the doubled speed requires complex and expensive changes in program design.

Speculative execution

The designers hit on a solution to this: speculative execution. Most computer programs are long chains of “if-thens”. If X condition is met, do Y; if it is not met, do Z. Traditional computers first evaluate X, then decide whether to perform Y or Z. With speculative execution, at the same time one core evaluates X, another core performs Y, and a third performs Z. Depending on how X comes out, Y or Z is discarded. This is a gross simplification, but in the time a single core uses to evaluate X, the three cores already have both the Y and Z results. Thus, the multi-core processor executes a conventionally written program in much less time than a single core. And the speed of computing doubles in 18 months again. Nifty, huh?

Not so nifty. Those discarded speculative chunks of execution can be manipulated in such a way that protection rings are violated. I won’t go into how it’s done. A Google researcher explains it here.

Migraines and insomnia

I am not optimistic when I think about what these defects reveal about processor design. Software development underwent a revolution in the early part of this century when security rose in priority. You can read about it in my book, Personal Cybersecurity. Security was a neglected step-child in the pioneering days of software development in the last century, but around 2000, the industry realized that computing would die if software was not built with more secure methodologies. The revolution is still going on, but the slap-dash attitude toward security that characterized the software cowboys of the 90s is gone.

Spectre and Meltdown tell me that the security revolution did not make it into processor design. Makes you think about why the CEO of Intel sold a big block of Intel stock after the flaws in Intel chips were discovered.

I am afraid we have not heard the last of chip level security flaws. I hope processor designs are not easy pickings for hackers, but the fact that these flaws have been present for at least a decade is daunting. Also, to completely eradicate these flaws, processor chips or entire computers will have to be replaced, which suggests that heads will ache on for years.

Coming soon

I wrote a blog on hypervisor hacking and one on virtual machine security for Network World last year that are affected by the Spectre and Meltdown flaws, but I’ll save comments on the safety of virtual computing for another blog.