Re: Linux guest kernel threat model for Confidential Computing

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>,
	"Shishkin, Alexander" <alexander.shishkin@intel.com>,
	"Shutemov, Kirill" <kirill.shutemov@intel.com>,
	"Kuppuswamy,
	Sathyanarayanan" <sathyanarayanan.kuppuswamy@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	"Wunner, Lukas" <lukas.wunner@intel.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	"Poimboe, Josh" <jpoimboe@redhat.com>,
	"aarcange@redhat.com" <aarcange@redhat.com>,
	Cfir Cohen <cfir@google.com>, Marc Orr <marcorr@google.com>,
	"jbachmann@google.com" <jbachmann@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"keescook@chromium.org" <keescook@chromium.org>,
	James Morris <jmorris@namei.org>,
	Michael Kelley <mikelley@microsoft.com>,
	"Lange, Jon" <jlange@microsoft.com>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux guest kernel threat model for Confidential Computing
Date: Wed, 25 Jan 2023 15:50:36 +0000	[thread overview]
Message-ID: <Y9FPzD/kpL4sI/Qa@work-vm> (raw)
In-Reply-To: <Y9FHsvVoPbgMR2s3@kroah.com>

* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg, 
> > > > > > 
> > > > > > You mentioned couple of times (last time in this recent thread:
> > > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > > > > discussing the updated threat model for kernel, so this email is a start in this direction. 
> > > > > 
> > > > > Any specific reason you didn't cc: the linux-hardening mailing list?
> > > > > This seems to be in their area as well, right?
> > > > > 
> > > > > > As we have shared before in various lkml threads/conference presentations
> > > > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a 
> > > > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. 
> > > > > 
> > > > > That is, frankly, a very funny threat model.  How realistic is it really
> > > > > given all of the other ways that a hypervisor can mess with a guest?
> > > > 
> > > > It's what a lot of people would like; in the early attempts it was easy
> > > > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> > > > can mess with - remember that not just the memory is encrypted, so is
> > > > the register state, and the guest gets to see changes to mapping and a
> > > > lot of control over interrupt injection etc.
> > > 
> > > And due to the fact that SEV and TDX really do not work, how is anyone
> > > expecting any of this to work?  As one heckler on IRC recently put it,
> > > if you squint hard enough, you can kind of ignore the real-world issues
> > > here, so perhaps this should all be called "squint-puting" in order to
> > > feel like you have a "confidential" system?  :)
> > 
> > I agree the original SEV was that weak; I've not seen anyone give a good
> > argument against SNP or TDX.
> 
> Argument that it doesn't work?  I thought that ship sailed a long time
> ago but I could be wrong as I don't really pay attention to that stuff
> as it's just vaporware :)

You're being unfair claiming it's vaporware.  You can go out and buy SNP
hardware now (for over a year), the patches are on list and under review
(and have been for quite a while).
If you're claiming it doesn't, please justify it.

> > > > > So what do you actually trust here?  The CPU?  A device?  Nothing?
> > > > 
> > > > We trust the actual physical CPU, provided that it can prove that it's a
> > > > real CPU with the CoCo hardware enabled.
> > > 
> > > Great, so why not have hardware attestation also for your devices you
> > > wish to talk to?  Why not use that as well?  Then you don't have to
> > > worry about anything in the guest.
> > 
> > There were some talks at Plumbers where PCIe is working on adding that;
> > it's not there yet though.  I think that's PCIe 'Integrity and Data
> > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > SPDM.   I don't know much of the detail of those, just that they're far
> > enough off that people aren't depending on them yet.
> 
> Then work with those groups to implement that in an industry-wide way
> and then take advantage of it by adding support for it to Linux!  Don't
> try to reinvent the same thing in a totally different way please.

Sure, people are working with them; but those are going to take time
and people want to use existing PCIe devices; and given that the hosts
are available that seems reasonable.   

> > > > Both the SNP and TDX hardware
> > > > can perform an attestation signed by the CPU to prove to someone
> > > > external that the guest is running on a real trusted CPU.
> > > 
> > > And again, do the same thing for the other hardware devices and all is
> > > good.  To not do that is to just guess and wave hands.  You know this :)
> > 
> > That wouldn't help you necessarily for virtual devices - where the
> > hypervisor implements the device (like a virtual NIC).
> 
> Then create a new bus for that if you don't trust the virtio bus today.

It's not that I distrust the virtio bus - just that we need to make sure
it's implementation is pessimistic enough for CoCo.

> > > > > I hate the term "hardening".  Please just say it for what it really is,
> > > > > "fixing bugs to handle broken hardware".  We've done that for years when
> > > > > dealing with PCI and USB and even CPUs doing things that they shouldn't
> > > > > be doing.  How is this any different in the end?
> > > > > 
> > > > > So what you also are saying here now is "we do not trust any PCI
> > > > > devices", so please just say that (why do you trust USB devices?)  If
> > > > > that is something that you all think that Linux should support, then
> > > > > let's go from there.
> > > > 
> > > > I don't think generally all PCI device drivers guard against all the
> > > > nasty things that a broken implementation of their hardware can do.
> > > 
> > > I know that all PCI drivers can NOT do that today as that was never
> > > anything that Linux was designed for.
> > 
> > Agreed; which again is why I only really worry about the subset of
> > devices I'd want in a CoCo VM.
> 
> Everyone wants a subset, different from other's subset, which means you
> need them all.  Sorry.

I think for CoCo the subset is fairly small, even including all the
people discussing it.  It's the virtual devices, and a few of their
favourite physical devices, but a fairly small subset.

> > > > The USB devices are probably a bit better, because they actually worry
> > > > about people walking up with a nasty HID device;  I'm skeptical that
> > > > a kernel would survive a purposely broken USB controller.
> > > 
> > > I agree with you there, USB drivers are only starting to be fuzzed at
> > > the descriptor level, that's all.  Which is why they too can be put into
> > > the "untrusted" area until you trust them.
> > > 
> > > > I'm not sure the request here isn't really to make sure *all* PCI devices
> > > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
> > > > and potentially ones that people will want to pass-through (which
> > > > generally needs a lot more work to make safe).
> > > > (I've not looked at these Intel tools to see what they cover)
> > > 
> > > Why not just create a whole new bus path for these "trusted" devices to
> > > attach to and do that instead of tyring to emulate a protocol that was
> > > explicitly designed NOT to this model at all?  Why are you trying to
> > > shoehorn something here and not just designing it properly from the
> > > beginning?
> > 
> > I'd be kind of OK with that for the virtual devices; but:
> > 
> >   a) I think you'd start reinventing PCIe with enumeration etc
> 
> Great, then work with the PCI group as talked about above to solve it
> properly and not do whack-a-mole like seems to be happening so far.
> 
> >   b) We do want those pass through NICs etc that are PCIe
> >     - as long as you use normal guest crypto stuff then the host
> >     can be just as nasty as it likes with the data they present.
> 
> Great, work with the PCI spec for verified devices.
> 
> >   c) The world has enough bus protocols, and people understand the
> >    basics of PCI(e) - we really don't need another one.
> 
> Great, work with the PCI spec people please.

As I say above; all happening - but it's going to take years.
It's wrong to leave users with less secure solutions if there are simple
fixes available.  I agree that if it involves major pain all over then
I can see your dislike - but if it's small fixes then what's the
problem?

> > > > Having said that, how happy are you with Thunderbolt PCI devices being
> > > > plugged into your laptop or into the hotplug NVMe slot on a server?
> > > 
> > > We have protection for that, and have had it for many years.  Same for
> > > USB devices.  This isn't new, perhaps you all have not noticed those
> > > features be added and taken advantage of already by many Linux distros
> > > and system images (i.e. ChromeOS and embedded systems?)
> > 
> > What protection?  I know we have an IOMMU, and that stops the device
> > stamping all over RAM by itself - but I think Intel's worries are more
> > subtle, things where the device starts playing with what PCI devices
> > are expected to do to try and trigger untested kernel paths.  I don't
> > think there's protection against that.
> > I know we can lock by PCI/USB vendor/device ID - but those can be made
> > up trivially; protection like that is meaningless.
> 
> Then combine it with device attestation and you have a solved solution,
> don't ignore others working on this please.
> 
> > > > We're now in the position we were with random USB devices years ago.
> > > 
> > > Nope, we are not, again, we already handle random PCI devices being
> > > plugged in.  It's up to userspace to make the policy decision if it
> > > should be trusted or not before the kernel has access to it.
> > > 
> > > So a meta-comment, why not just use that today?  If your guest OS can
> > > not authenticate the PCI device passed to it, don't allow the kernel to
> > > bind to it.  If it can be authenticated, wonderful, bind away!  You can
> > > do this today with no kernel changes needed.
> > 
> > Because:
> >    a) there's no good way to authenticate a PCI device yet
> >      - any nasty device can claim to have a given PCI ID.
> >    b) Even if you could, there's no man-in-the-middle protection yet.
> 
> Where is the "man" here in the middle of?

I'm worried what a malicious hypervisor could do.

> And any PCI attestation should handle that, if not, work with them to
> solve that please.

I believe the two mechanisms I mentioned above would handle that; when
it eventually gets there.

> Thunderbolt has authenticated device support today, and so does PCI, and
> USB has had it for a decade or so.  Use the in-kernel implementation
> that we already have or again, show us where it is lacking and we will
> be glad to take patches to cover the holes (as we did last year when
> ChromeOS implemented support for it in their userspace.)

I'd appreciate pointers to the implementations you're referring to.

> > > > Also we would want to make sure that any config data that the hypervisor
> > > > can pass to the guest is validated.
> > > 
> > > Define "validated" please.
> > 
> > Lets say you get something like a ACPI table or qemu fw.cfg table
> > giving details of your devices; if the hypervisor builds those in a
> > nasty way what happens?
> 
> You tell me, as we trust ACPI tables today, and if we can not, again
> then you need to change the model of what Linux does.  Why isn't the
> BIOS authentication path working properly for ACPI tables already today?
> I thought that was a long-solved problem with UEFI (if not, I'm sure the
> UEFI people would be interested.)

If it's part of the BIOS image that's measured/loaded during startup
then we're fine; if it's a table dynamically generated by the hypervisor
I'm more worried.

> Anyway, I'll wait until I see real patches as this thread seems to be
> totally vague and ignores our current best-practices for pluggable
> devices for some odd reason.

Please point people at those best practices rather than just ranting
about how pointless you feel all this is!

The patches here from Intel are a TOOL to find problems; I can't see the
objections to having a tool like this.

(I suspect some of these fixes might make the kernel a bit more robust
against unexpected hot-remove of PCIe devices as well; but that's more
of a guess)

Dave

> thanks,
> 
> greg k-h
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK