[RFD] x86: Curing the exception and syscall trainwreck in hardware

From: Thomas Gleixner @ 2020-08-24 12:24 UTC
  To: LKML
  Cc: x86, Linus Torvalds, Tom Lendacky, Pu Wen, Stephen Hemminger,
	Sasha Levin, Andrew Cooper, Dirk Hohndel, Jan Kiszka,
	Tony W Wang-oc, H. Peter Anvin, Asit Mallick, Gordon Tetlow,
	David Kaplan, Tony Luck

It's a sad state of affairs that I have to write this mail at all and it's
nothing other than an act of desperation.

The x86 exception handling, including the various ways of syscall
entry/exit, is a constant source of trouble. Aside from being a functional
disaster, quite a few of these issues have severe security implications.

There are similar issues on the virtualization side including the handling
of essential MSRs which are required to run a guest OS and even more so
with the upcoming virt specific exceptions of various vendors.

We have been asking the vendors for more than a decade to fix this
situation, but even the most trivial requests, like an IRET variant which
does not reenable NMIs unconditionally and other small things which would
make our life less miserable, aren't happening.

Instead of fixing the underlying design failures first and creating a solid
base, the vendors add even more ill-defined exception variants on top of
the existing pile. Unsurprisingly these add-ons are creating more
problems than they solve, but being based on the existing house of cards
that's obviously expected.

This really has to stop and the underlying issues have to be resolved
before more problems are inflicted upon operating systems and hypervisors.
The amount of code to work around these issues is already by far larger
than the actual functional code. Some of these workarounds are just
bandaids which try to prevent the most obvious damage, but they are mostly
based on the hope that the unfixable corner cases never happen.

There has been talk about solutions for years, but it's just talk and we
have not yet seen a coordinated effort across the x86 vendors to come up
with a sane replacement for the x86 exception and syscall trainwreck.

The important word here is 'coordinated'. We are not at all interested
in different solutions from different vendors. It's going to be
challenging enough to maintain ONE parallel exception/syscall handling
implementation. In other words, the kernel is going to support exactly
ONE new exception/syscall handling mechanism and is not going to
accommodate every vendor.

So I call on the x86 vendors to sit together and come up with a unified
and consolidated base on which each of the vendors can build their
differentiating features.

Aside from coordination between the x86 vendors this also requires
coordination with the people who finally have to deal with that on the
software side. The prevailing hardware engineering principle "That can
be fixed in software" does not work; it never worked - especially not in
the area of x86 exception and syscall handling.

This coordination must include all major operating systems and hypervisors
whether open source or proprietary to ensure that the different
requirements are met. This kind of coordination has happened in the context
of the hardware vulnerability mitigations already in a fruitful way so
this request is not asking for something impossible.

If the x86 vendors are unable to talk to each other and coordinate on a
solution, then the ultimate backstop might be to take the first reasonable
design specification and the first reasonable silicon implementation of it
as the ONE alternative solution to the existing trainwreck. How the other
vendors are going to deal with that is none of our business. That's the
least useful and least desired outcome and will only happen when the x86
vendors are not able to get their act together and sort that out upfront.

Thanks,

	Thomas
