[RFC] spectre hardware-software cooperative mitigation

* [RFC] spectre hardware-software cooperative mitigation
       [not found] <CAPweEDyvUdGvwy4u5qoA2QgDow0npZBMqjeMiU2RqzrJ_q4Omw@mail.gmail.com>
@ 2019-01-14 18:55 ` Luke Kenneth Casson Leighton
  2019-01-18 15:07   ` Alan Cox
  0 siblings, 1 reply; 3+ messages in thread
From: Luke Kenneth Casson Leighton @ 2019-01-14 18:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi all, please cc me on replies. Hardware discussion may be found here:
https://groups.google.com/forum/?nomobile=true#!topic/comp.arch/mzXXTU2GUSo

I am designing a new processor, based on RISC-V, that is intended as a
hybrid GPU, VPU and CPU. For various reasons, it needs to be a
multi-issue Out of Order engine. The innocent question was therefore
asked, "how is Spectre to be dealt with?" which threw a massive
spanner in the works.

The processor is being designed to use multi-issue as a means to
implement Vector Processing. For example: for predicated elements,
several instructions (one per element) will be thrown into the
*standard* multi-issue instruction queue, and cancelled only when the
register containing the predicate mask is available and has been
decoded. Thus, resources are taken up that will affect and be affected
by other instructions, which is the very definition of Spectre timing
attacks.

ooops.
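
A toy model of the predicated case above, purely as an illustration.
The names (VLEN, struct slot, issue_all, resolve_mask) are made up for
this sketch and do not describe the real design. It shows only that
every element occupies a queue slot before the mask is known, and that
the number of slots later thrown away depends on the mask data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN 8 /* hypothetical vector length, one queue slot each */

struct slot {
	int busy; /* 1 while the element op occupies the issue queue */
};

/* Issue one op per element BEFORE the predicate mask is known.
 * Returns the number of queue slots taken up. */
static unsigned issue_all(struct slot *q)
{
	unsigned taken = 0;

	assert(q != NULL);
	for (unsigned i = 0; i < VLEN; i++) {
		q[i].busy = 1;
		taken++;
	}
	return taken;
}

/* The mask register has been decoded: cancel masked-out elements.
 * Returns the number of slots that were occupied for nothing. */
static unsigned resolve_mask(struct slot *q, uint8_t mask)
{
	unsigned cancelled = 0;

	assert(q != NULL);
	for (unsigned i = 0; i < VLEN; i++) {
		if (!(mask & (1u << i))) {
			q[i].busy = 0;
			cancelled++;
		}
	}
	return cancelled;
}
```

Between issue_all() and resolve_mask() all VLEN slots are unavailable
to other instructions, regardless of how many elements survive.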

Standard Spectre mitigation would completely destroy the performance
and viability of the project's Vector Engine, as well as many other
features.

So I have a proposal that, if correct and implemented, may be adopted
by other architectures as a mitigation solution that allows out of
order to continue to be used. It is a collaborative solution that
specifically requires explicit instructions to be added (and called)
at the appropriate time(s).

The issue with Spectre attacks is that untrusted code may cause past
OR FUTURE instructions to change the amount of time in which they will
complete. An in-order architecture does not have this problem (except
where pipeline stalls occur), as there are always [almost always]
enough resources available to allow instructions (pipelines) to
proceed without blocking.

OoO typically has resource bottlenecks that are affected by other
instructions. The whole POINT of an OoO design is to run ahead,
utilising these resources speculatively and, duh, out of order.

To deal with absolutely every possible flaw in the OoO paradigm is a
total nightmare. Performance, as people are discovering, is utterly
trashed. Code complexity, in both software and hardware terms, goes
mental. Intel had to REMOVE hyperthreading from its latest
processors: the crossover timing leakage is that bad.

There is another way to ensure that untrusted code cannot affect
secure code: clear out the "internal state" of the processor before
letting it proceed to run the untrusted code.

In this way it becomes impossible for untrusted code to ascertain the
state of the processor, because it has been reset back to a known
uniform (blank) state.

This REQUIRES an actual instruction that programs (and the kernel) may
call. It is NOT ENOUGH that the Linux kernel try to deal with
absolutely every possible situation automatically, and it is a total
nightmare to even try.

It is also not enough that the hardware try to deal with this on its
own: that is insanely complex as well. The only real safe way is to
abandon all of the benefits of OoO and go back to in-order SINGLE
issue performance levels.

Clearly, neither option is viable or acceptable.

A hybrid solution is a reasonable compromise, that may even be
possible to implement right now, with code that, on processors that do
not have the proposed new instruction, issues sufficient NOPs (or
other suitably researched instructions) such that they create a
"processor internal state" firebreak between secure and untrusted
code.
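
A minimal sketch of what the software fallback might look like,
assuming GCC or Clang inline asm. Everything here is hypothetical:
HAVE_FIREBREAK_INSN, FIREBREAK_NOPS and firebreak() are names invented
for illustration, no firebreak opcode exists yet, and the NOP count is
a placeholder, not a researched value. Whether plain NOPs are enough
on any given core is exactly the open question.

```c
#include <assert.h>

/* Placeholder only: the real count would have to be researched
 * per microarchitecture. */
#define FIREBREAK_NOPS 64

/* Bookkeeping so callers (and tests) can see it was invoked. */
static unsigned long firebreak_calls;

static inline void firebreak(void)
{
#ifdef HAVE_FIREBREAK_INSN
	/* The proposed instruction would be emitted here. */
#error "no firebreak opcode has been defined yet"
#else
	/* assert() only sanity-checks the placeholder count. */
	assert(FIREBREAK_NOPS > 0);

	/* Fallback: pad with NOPs. Note that this loop itself uses a
	 * branch; a serious version would probably be unrolled. */
	for (int i = 0; i < FIREBREAK_NOPS; i++)
		__asm__ volatile("nop" ::: "memory");
#endif
	firebreak_calls++;
}
```

The "memory" clobber only stops the compiler from moving memory
accesses across the NOPs; it says nothing about what the hardware does.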

The hardware version of the firebreak opcode would WAIT until the
processor internal state has cleared out. All outstanding speculative
instructions would be cancelled. All instructions still in flight in
the pipelines would be allowed to complete, and their results written
to the register file. Only then would the processor be allowed to
proceed.
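
The sequence above, as a toy software model. This is a sketch under
invented names (struct core, struct entry, firebreak_model), not a
description of any real pipeline: it just cancels speculative entries,
drains in-flight ones into the register file, and reports how long the
stall would be.

```c
#include <assert.h>
#include <stddef.h>

#define NENTRIES 4 /* hypothetical number of in-flight slots */
#define NREGS    8 /* hypothetical register file size */

enum entry_state { EMPTY = 0, SPECULATIVE, IN_FLIGHT };

struct entry {
	enum entry_state st;
	int dest;             /* destination register number */
	int value;            /* result once the pipeline finishes */
	unsigned cycles_left; /* cycles until completion */
};

struct core {
	struct entry rob[NENTRIES];
	int regs[NREGS];
};

/* Model of the firebreak: returns the number of cycles stalled. */
static unsigned firebreak_model(struct core *c)
{
	unsigned stalled = 0;

	assert(c != NULL);

	/* 1. Cancel all outstanding speculative instructions. */
	for (size_t i = 0; i < NENTRIES; i++)
		if (c->rob[i].st == SPECULATIVE)
			c->rob[i].st = EMPTY;

	/* 2. Wait for everything in flight, then write back. */
	for (size_t i = 0; i < NENTRIES; i++) {
		struct entry *e = &c->rob[i];

		if (e->st != IN_FLIGHT)
			continue;
		if (e->cycles_left > stalled)
			stalled = e->cycles_left;
		c->regs[e->dest] = e->value;
		e->st = EMPTY;
	}

	/* 3. Only now may the next instruction issue. */
	return stalled;
}
```

In this model the stall is as long as the slowest in-flight
instruction, and cancelled speculative results never reach a register.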

It is not enough to have these "firebreak" calls done automatically by
the Linux kernel: they need to be part of standard applications. An
example is Firefox, which has a single process for JavaScript. Spectre
attacks have been shown to exist using untrusted arbitrary JavaScript,
and if that JavaScript is being executed by a single process, then it
is the responsibility of that process to call the "firebreak" just
before allowing the untrusted JavaScript to execute.
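
From the application side, the call pattern might look roughly like
this. It is a sketch only: firebreak() is a counting stub standing in
for the proposed instruction, and run_untrusted(), untrusted_fn and
example_script() are hypothetical names, not any real browser API.

```c
#include <assert.h>
#include <stddef.h>

/* Entry point of some untrusted code, e.g. an interpreted script. */
typedef int (*untrusted_fn)(void *arg);

static unsigned long firebreak_calls;

/* Stub: stands in for the proposed instruction (or its fallback). */
static void firebreak(void)
{
	firebreak_calls++;
}

/* The process itself calls the firebreak, immediately before
 * handing control to the untrusted code. */
static int run_untrusted(untrusted_fn fn, void *arg)
{
	assert(fn != NULL);
	firebreak();
	return fn(arg);
}

/* Stand-in for untrusted code, only so the sketch can be called. */
static int example_script(void *arg)
{
	return *(int *)arg + 1;
}
```

A caller would write something like run_untrusted(example_script, &x)
and never invoke the untrusted entry point directly, so that the
firebreak cannot be skipped by accident.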

This is going to be a mammoth task. The alternative is to continue
as things are, which is a mess that cannot be cleaned up by hardware
alone or by software alone.

Thoughts and feedback appreciated.

l.
