It’s no longer necessary to run attacker code on the victim system.

When the Spectre and Meltdown attacks were disclosed earlier this year, the initial exploits required an attacker to be able to run code of their choosing on a victim system. This made browsers vulnerable, as suitably crafted JavaScript could be used to perform Spectre attacks. Cloud hosts were susceptible, too. But outside these situations, the impact seemed relatively limited.

That impact is now a little larger. Researchers from Graz University of Technology, including one of the original Meltdown discoverers, Daniel Gruss, have described NetSpectre: a fully remote attack based on Spectre. With NetSpectre, an attacker can remotely read the memory of a victim system without running any code on that system.

All the variants of the Spectre attacks follow a common set of principles. Each processor has an architectural behavior (the documented behavior that describes how the instructions work and that programmers depend on to write their programs) and a microarchitectural behavior (the way an actual implementation of the architecture behaves). These can diverge in subtle ways. For example, architecturally, a program that loads a value from a particular address in memory will wait until the address is known before trying to perform the load. Microarchitecturally, however, the processor might try to speculatively guess at the address so that it can start loading the value from memory (which is slow) even before it’s absolutely certain of which address it should use.

If the processor guesses wrong, it will ignore the guessed-at value and perform the load again, this time with the correct address. The architecturally defined behavior is thus preserved. But that faulty guess will disturb other parts of the processor—in particular the contents of the cache. These microarchitectural disturbances can be detected and measured by timing how long it takes to access data that should (or shouldn’t) be in the cache, allowing a malicious program to make inferences about the values stored in memory. These information paths are known collectively as side channels.
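The principle can be illustrated with a toy model. The sketch below is a Python simulation with invented latencies (real attacks, such as Flush+Reload, time actual memory accesses on real hardware); it shows how a speculatively touched cache line survives the rollback and can be recovered purely from timing:

```python
# Toy model of a cache timing side channel. The latencies and the
# ToyCache class are inventions for illustration; a real attack
# measures genuine CPU cache hit/miss times.

FAST, SLOW = 1, 100  # pretend access latencies, in cycles

class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, addr):
        """Return a pretend latency; accessing a line also caches it."""
        t = FAST if addr in self.lines else SLOW
        self.lines.add(addr)
        return t

def victim_speculative_load(cache, secret):
    # Architecturally this load is rolled back, but the cache line it
    # touched stays resident -- that residue is the side channel.
    cache.access(("probe", secret))

def attacker_recover(cache, trigger):
    cache.flush()   # 1. put the cache into a known state
    trigger()       # 2. make the victim speculate
    # 3. time every probe line; the one the victim touched is fast
    times = {v: cache.access(("probe", v)) for v in range(256)}
    return min(times, key=times.get)

cache = ToyCache()
secret = 42
leaked = attacker_recover(cache, lambda: victim_speculative_load(cache, secret))
print(leaked)  # recovers 42 without ever reading `secret` directly
```

The attacker never observes the secret value itself, only which memory access was fast; that is what makes the channel "side" rather than direct.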

NetSpectre builds on these principles; it just has to work a lot harder to exploit them. With malicious JavaScript, for example, exploitation is fairly straightforward: the JavaScript developer has relatively fine control over the instructions the processor executes and can both trigger speculative execution and measure differences in cache performance quite easily. With a remote attack, that’s much harder: the code that performs a vulnerable speculative operation (the “leak gadget”) and the code that discloses the resulting differences in microarchitectural state over the network (the “transmit gadget”) must both already exist somewhere on the remote system, where a remote attacker can reliably invoke them.

The researchers found that both of these pieces exist in real networked applications. Rather than measuring cache performance directly, the networked attack measures the time the remote system takes to respond to network requests: the disturbance to the microarchitectural state is enough to cause a measurably different response time.

New side channels

Two different remote measurements were developed. The first is a variation on the cache timing approach already demonstrated with Spectre. The attacker makes the remote system perform a large data transfer (in this case, a file download), which fills the processor’s cache with useless data. The attacker then calls the leak gadget, which will speculatively load (or not load) some value into the processor’s cache, followed by the transmit gadget. If the speculative execution loaded the value, the transmit gadget will be fast; if it didn’t, it’ll be slow.
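The measurement loop can be sketched as follows. This is a Python simulation with a stand-in `FakeServer` object and made-up latencies; in the real attack, the three steps are separate network requests to the victim, and the "response time" is genuine network latency:

```python
# Sketch of the remote measurement loop. FakeServer and all timings
# are hypothetical stand-ins for real network requests to a victim.
import random

class FakeServer:
    """Models a service whose response time leaks one bit of memory."""
    def __init__(self, secret_bit):
        self.secret_bit = secret_bit
        self.value_cached = False

    def download_large_file(self):
        self.value_cached = False        # big transfer evicts the value

    def leak_gadget(self):
        # Speculation loads the value into the cache only if the
        # targeted secret bit is 1.
        if self.secret_bit:
            self.value_cached = True

    def transmit_gadget(self):
        # Response time depends on whether the value is cached,
        # plus some simulated network jitter.
        base = 50 if self.value_cached else 120
        return base + random.gauss(0, 10)

def measure_bit(server, rounds=1000):
    total = 0.0
    for _ in range(rounds):
        server.download_large_file()       # 1. evict
        server.leak_gadget()               # 2. speculate
        total += server.transmit_gadget()  # 3. time the response
    # A fast average response means the value was cached: the bit is 1.
    return 1 if total / rounds < 85 else 0

random.seed(0)  # deterministic jitter for the demonstration
print(measure_bit(FakeServer(secret_bit=1)))  # prints 1
```

Averaging over many rounds is what lets the small timing difference show through the jitter, which is why the attack is so slow in practice.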

The second measurement is novel and doesn’t use the cache at all. Instead, it relies on the behavior of the AVX2 vector instruction set on Intel processors. The units that process AVX2 instructions are large and power hungry. Accordingly, the processor powers down those units when it hasn’t run any AVX2 code for a millisecond or two, powering them up again when needed. There’s also an intermediate half-powered state. Brief uses of AVX2 run in this half-powered state (at the cost of lower performance); the processor only fully enables (or fully disables) the AVX2 units after extended periods of use (or non-use). This microarchitectural feature can be measured: if the AVX2 units are fully powered down, running an AVX2 instruction takes longer than if the units are fully powered up.

For this AVX2 side channel, the leak gadget is a fragment of code that speculatively uses an AVX2 instruction. The transmit gadget is a fragment of code that always uses an AVX2 instruction. If the processor speculates that AVX2 is needed then it’ll start powering up the AVX2 units; this will make the subsequent transmit gadget run quickly. If, however, the processor speculates that the AVX2 code won’t be used, the transmit gadget will take longer. These small performance differences are large enough to be measured over a network.
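The interplay of the two gadgets can be modeled in a few lines of Python with invented timings (a simulation only; the real attack times actual AVX2 instructions on Intel hardware):

```python
# Toy model of the AVX2 power-state side channel. The timings and the
# AVX2Unit class are invented for illustration.

WARM, COLD = 1, 10   # pretend execution times for one AVX2 instruction

class AVX2Unit:
    def __init__(self):
        self.powered = False

    def execute(self):
        """Run one AVX2 instruction; slow if the unit was powered down."""
        t = WARM if self.powered else COLD
        self.powered = True    # any use starts powering the unit up
        return t

def leak_gadget(unit, secret_bit):
    # The AVX2 instruction runs (speculatively) only when the secret
    # bit is 1 -- and even a rolled-back use warms the unit.
    if secret_bit:
        unit.execute()

def transmit_gadget(unit):
    return unit.execute()  # timing reveals the unit's power state

unit = AVX2Unit()
leak_gadget(unit, secret_bit=1)
fast = transmit_gadget(unit)   # unit warm: runs quickly

unit = AVX2Unit()
leak_gadget(unit, secret_bit=0)
slow = transmit_gadget(unit)   # unit still cold: runs slowly
```

The secret never appears in the output; only the difference between `fast` and `slow` does, and that difference is what survives the trip across the network.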

The AVX2 side channel was found to be quite a bit faster than the cache side channel, but both are very slow. Network stacks are complicated, and other traffic makes latency variable. The side channels still work in spite of this, but even on a local network the researchers needed about 100,000 measurements to discern the value of a single bit; to make their attack reliable and consistent, they used 1,000,000 measurements per bit. Using a gigabit network to an Intel-based system, this yielded an overall data extraction rate of about one byte every 30 minutes with the cache-based side channel. The AVX2 side channel is much faster—one byte every eight minutes—but still very slow.
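The measurement counts follow from basic statistics: the per-request timing difference is tiny compared with network jitter, and the noise on an n-sample average shrinks only as 1/sqrt(n). A seeded Python simulation with made-up numbers makes the point:

```python
# Why a bit costs so many measurements: averaging n noisy samples
# reduces the noise on the mean by a factor of sqrt(n). SIGNAL and
# JITTER are invented magnitudes for illustration.
import random

random.seed(1)
SIGNAL = 1.0     # assumed latency difference between bit=1 and bit=0
JITTER = 100.0   # assumed standard deviation of network noise

def one_measurement(bit):
    # A single response time: a tiny signal buried in large noise.
    return (SIGNAL if bit else 0.0) + random.gauss(0.0, JITTER)

def estimate(bit, n):
    # The mean of n samples has residual noise of JITTER / sqrt(n).
    return sum(one_measurement(bit) for _ in range(n)) / n

# One sample tells you nothing; a million samples leaves ~0.1 units of
# noise on each mean, so the 1-unit signal becomes clearly visible.
print(estimate(1, 1_000_000) - estimate(0, 1_000_000))
```

With the assumed 100:1 noise-to-signal ratio, roughly a million samples are needed before the averaged difference stands clear of the residual noise, which mirrors the per-bit measurement counts the researchers report.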

Over a remote network to a system hosted in Google Cloud, 20 million measurements were needed for each bit, and the data rate dropped to one byte every eight hours for the cache side channel, every three hours for the AVX2 one.

These data rates are far too slow to extract any significant amount of data; even the fastest side channel (AVX2 over the local network) would take about 15 years to read 1MB of data. They might, however, be sufficient for highly targeted data extraction: a few hundred bits of an encryption key, for example. The cache side channel can also be used to leak memory addresses, which in turn can be used to defeat the randomized memory layout of ASLR (address space layout randomization); leaking a memory address this way took about two hours. With that address information in hand, an attacker could more easily exploit other flaws on the remote system.

The same countermeasures that are effective against Spectre—code changes that one way or another prevent speculative execution of sensitive code—are effective against NetSpectre. NetSpectre does, however, make the label “sensitive code” rather broader than it was before; there are now many more pathways and system components that might potentially be used to leak information. The slow transfer rates mean that NetSpectre’s practical utility is limited, but it underscores how the initial Spectre research was a launching point for a wide range of related attacks. We doubt it will be the last.

Update: Intel has issued a statement that says much the same thing.

NetSpectre is an application of Bounds Check Bypass (CVE-2017-5753), and is mitigated in the same manner – through code inspection and modification of software to ensure a speculation stopping barrier is in place where appropriate. We provide guidance for developers in our whitepaper, Analyzing Potential Bounds Check Bypass Vulnerabilities, which has been updated to incorporate this method. We are thankful to Michael Schwarz, Daniel Gruss, Martin Schwarzl, Moritz Lipp, & Stefan Mangard of Graz University of Technology for reporting their research.