* Linux guest kernel threat model for Confidential Computing
From: Reshetova, Elena @ 2023-01-25 12:28 UTC
To: Greg Kroah-Hartman
Cc: Shishkin, Alexander; Shutemov, Kirill; Kuppuswamy, Sathyanarayanan; Kleen, Andi; Hansen, Dave; Thomas Gleixner; Peter Zijlstra; Wunner, Lukas; Mika Westerberg; Michael S. Tsirkin; Jason Wang; Poimboe, Josh; aarcange; Cfir Cohen; Marc Orr; jbachmann; pgonda; keescook; James Morris; Michael Kelley; Lange, Jon; linux-coco; Linux Kernel Mailing List

Hi Greg,

You mentioned a couple of times (most recently in this thread:
https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
discussing the updated threat model for the kernel, so this email is a start in that
direction. (Note: I tried to include relevant people from different companies, as
well as the linux-coco mailing list, but I hope everyone can help by including
additional people as needed.)

As we have shared before in various lkml threads and conference presentations
([1], [2], [3] and many others), for the Confidential Computing guest kernel we
have a change in the threat model: the guest kernel no longer trusts the
hypervisor. This is a big change and requires both careful assessment of the new
(hypervisor <-> guest kernel) attack surface and careful design of mitigations
and security validation techniques. This is the activity we started back at
Intel, and the current status can be found in:

1) Threat model and potential mitigations:
https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html

2) One of the mitigations described in the above doc is "hardening of the
enabled code". What we mean by this, as well as the techniques being used, is
described in this document:
https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html

3) All the tools are open source and everyone can start using them right away,
even without any special HW (the readme describes what is needed). Tools and
documentation are here:
https://github.com/intel/ccc-linux-guest-hardening

4) All not-yet-upstreamed Linux patches (which we are slowly submitting) can be
found here: https://github.com/intel/tdx/commits/guest-next

So, my main question before we start to argue about the threat model,
mitigations, etc., is: what is a good way to get this reviewed to make sure
everyone is aligned? There are a lot of angles and details, so what is the most
efficient method? Should I split the threat model from
https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
into logical pieces and start submitting them to the mailing list for discussion
one by one? Any other methods?

The original plan we had in mind was to start discussing the relevant pieces
when submitting the code, i.e. when submitting the device filter patches, we
will include the problem statement, a threat model link, data, alternatives
considered, etc.

Best Regards,
Elena.

[1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/
[2] https://lpc.events/event/16/contributions/1328/
[3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/

^ permalink raw reply [flat|nested] 102+ messages in thread
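The "device filter" patches Elena mentions can be pictured as a default-deny
allow-list over host-exposed devices: in a CoCo guest, only drivers for devices
that have been audited and hardened may bind. A toy sketch follows (illustrative
Python, not the actual kernel patches; the virtio PCI IDs are real, the function
and table names are invented):

```python
# Toy sketch of a CoCo guest device allow-list ("device filter").
# Hypothetical names; the real kernel patches work differently.

ALLOWED_DEVICES = {
    # (vendor_id, device_id) pairs whose drivers have been audited/hardened.
    (0x1AF4, 0x1001),  # virtio-blk (legacy/transitional)
    (0x1AF4, 0x1041),  # virtio-net (modern)
}

def authorize(vendor_id: int, device_id: int) -> bool:
    """Default-deny: only drivers for audited devices may bind."""
    return (vendor_id, device_id) in ALLOWED_DEVICES

print(authorize(0x1AF4, 0x1001))  # audited virtio device -> True
print(authorize(0x8086, 0x10D3))  # arbitrary host-exposed NIC -> False
```

The point of the default-deny shape is that the hardening effort then only has
to cover the drivers on the list, not every driver the kernel could probe.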
* Re: Linux guest kernel threat model for Confidential Computing
From: Greg Kroah-Hartman @ 2023-01-25 12:43 UTC
To: Reshetova, Elena

On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> Hi Greg,
>
> You mentioned couple of times (last time in this recent thread:
> https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> discussing the updated threat model for kernel, so this email is a start in this direction.

Any specific reason you didn't cc: the linux-hardening mailing list?
This seems to be in their area as well, right?

> As we have shared before in various lkml threads/conference presentations
> ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> change in the threat model where guest kernel doesn’t anymore trust the hypervisor.

That is, frankly, a very funny threat model.  How realistic is it really
given all of the other ways that a hypervisor can mess with a guest?

So what do you actually trust here?  The CPU?  A device?  Nothing?

> This is a big change in the threat model and requires both careful assessment of the
> new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
> and security validation techniques. This is the activity that we have started back at Intel
> and the current status can be found in
>
> 1) Threat model and potential mitigations:
> https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html

So you trust all of qemu but not Linux?  Or am I misreading that
diagram?

> 2) One of the described in the above doc mitigations is "hardening of the enabled
> code". What we mean by this, as well as techniques that are being used are
> described in this document:
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html

I hate the term "hardening".  Please just say it for what it really is,
"fixing bugs to handle broken hardware".  We've done that for years when
dealing with PCI and USB and even CPUs doing things that they shouldn't
be doing.  How is this any different in the end?

So what you also are saying here now is "we do not trust any PCI
devices", so please just say that (why do you trust USB devices?)  If
that is something that you all think that Linux should support, then
let's go from there.

> 3) All the tools are open-source and everyone can start using them right away even
> without any special HW (readme has description of what is needed).
> Tools and documentation is here:
> https://github.com/intel/ccc-linux-guest-hardening

Again, as our documentation states, when you submit patches based on
these tools, you HAVE TO document that.  Otherwise we think you all are
crazy and will get your patches rejected.  You all know this, why ignore
it?

> 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> here: https://github.com/intel/tdx/commits/guest-next

Random github trees of kernel patches are just that, sorry.

> So, my main question before we start to argue about the threat model, mitigations, etc,
> is what is the good way to get this reviewed to make sure everyone is aligned?
> There are a lot of angles and details, so what is the most efficient method?
> Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> into logical pieces and start submitting it to mailing list for discussion one by one?

Yes, start out by laying out what you feel the actual problem is, what
you feel should be done for it, and the patches you have proposed to
implement this, for each and every logical piece.

Again, nothing new here, that's how Linux is developed, again, you all
know this, it's not anything I should have to say.

> Any other methods?
>
> The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> data, alternatives considered, etc.

As always, we can't do anything without actual working changes to the
code, otherwise it's just a pipe dream and we can't waste our time on it
(neither would you want us to).

thanks, and good luck!

greg k-h
* Re: Linux guest kernel threat model for Confidential Computing
From: Dr. David Alan Gilbert @ 2023-01-25 13:42 UTC
To: Greg Kroah-Hartman

* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > As we have shared before in various lkml threads/conference presentations
> > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
>
> That is, frankly, a very funny threat model.  How realistic is it really
> given all of the other ways that a hypervisor can mess with a guest?

It's what a lot of people would like; in the early attempts it was easy
to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
can mess with - remember that not just the memory is encrypted, so is
the register state, and the guest gets to see changes to mappings, a lot
of control over interrupt injection, etc.

> So what do you actually trust here?  The CPU?  A device?  Nothing?

We trust the actual physical CPU, provided that it can prove that it's a
real CPU with the CoCo hardware enabled.  Both the SNP and TDX hardware
can perform an attestation signed by the CPU to prove to someone
external that the guest is running on a real trusted CPU.

Note that the trust is limited:
  a) We don't trust that we can make forward progress - if something
     does something bad it's OK for the guest to stop.
  b) We don't trust devices; we deal with that by having the guest do
     normal encryption, e.g. just LUKS on the disk and normal encrypted
     networking.  [There are a lot of schemes people are working on for
     how the guest gets the keys etc for that.]

> > 1) Threat model and potential mitigations:
> > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
>
> So you trust all of qemu but not Linux?  Or am I misreading that
> diagram?

You're misreading it; this is about the grey part (i.e. the guest) not
trusting the host (the white part, including qemu and the host kernel).

> I hate the term "hardening".  Please just say it for what it really is,
> "fixing bugs to handle broken hardware".  We've done that for years when
> dealing with PCI and USB and even CPUs doing things that they shouldn't
> be doing.  How is this any different in the end?
>
> So what you also are saying here now is "we do not trust any PCI
> devices", so please just say that (why do you trust USB devices?)  If
> that is something that you all think that Linux should support, then
> let's go from there.

I don't think PCI device drivers generally guard against all the nasty
things that a broken implementation of their hardware can do.  The USB
devices are probably a bit better, because they actually worry about
people walking up with a nasty HID device; I'm skeptical that a kernel
would survive a purposely broken USB controller.

I don't think the request here is really to make sure *all* PCI devices
are safe - just the ones we care about in a CoCo guest (e.g. the virtual
devices), and potentially ones that people will want to pass through
(which generally needs a lot more work to make safe).
(I've not looked at these Intel tools to see what they cover.)

Having said that, how happy are you with Thunderbolt PCI devices being
plugged into your laptop or into the hotplug NVMe slot on a server?
We're now in the position we were in with random USB devices years ago.

Also, we would want to make sure that any config data the hypervisor can
pass to the guest is validated.

> Yes, start out by laying out what you feel the actual problem is, what
> you feel should be done for it, and the patches you have proposed to
> implement this, for each and every logical piece.
>
> Again, nothing new here, that's how Linux is developed, again, you all
> know this, it's not anything I should have to say.

That seems harsh.  The problem seems reasonably well understood within
the CoCo world - how far people want to push it probably varies; but
it's good to make the problem more widely understood.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
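Dave's attestation description can be sketched end to end: the CPU signs the
guest's launch measurement plus a verifier-chosen nonce, and an external
verifier checks the signature. The HMAC below is a simplified stand-in for the
vendor-rooted asymmetric signing used by real SNP/TDX report signing; all names
and values here are illustrative, not the actual report format:

```python
# Sketch of the CoCo attestation flow, with HMAC standing in for the CPU's
# hardware signing key (real TDX/SNP reports use asymmetric, vendor-rooted
# keys and a far richer report structure).
import hashlib
import hmac

CPU_KEY = b"hardware-fused-secret"  # hypothetical; known only to the CPU

def cpu_sign_report(measurement: bytes, nonce: bytes) -> bytes:
    """The CPU binds the guest's launch measurement to a verifier nonce."""
    return hmac.new(CPU_KEY, measurement + nonce, hashlib.sha256).digest()

def verifier_check(measurement: bytes, nonce: bytes, report: bytes) -> bool:
    """The remote verifier recomputes and compares; it trusts only the CPU key."""
    expected = hmac.new(CPU_KEY, measurement + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, report)

measurement = hashlib.sha256(b"guest kernel + initrd image").digest()
report = cpu_sign_report(measurement, b"nonce-1")
print(verifier_check(measurement, b"nonce-1", report))   # True
print(verifier_check(b"tampered", b"nonce-1", report))   # False
```

The nonce prevents the host from replaying an old report; the measurement ties
the verdict to the exact guest image, which is why Dave says the trust chain
bottoms out at the physical CPU.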
* Re: Linux guest kernel threat model for Confidential Computing
From: Daniel P. Berrangé @ 2023-01-25 14:13 UTC
To: Dr. David Alan Gilbert

On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> We trust the actual physical CPU, provided that it can prove that it's a
> real CPU with the CoCo hardware enabled.  Both the SNP and TDX hardware
> can perform an attestation signed by the CPU to prove to someone
> external that the guest is running on a real trusted CPU.
>
> Note that the trust is limited:
>  a) We don't trust that we can make forward progress - if something
>     does something bad it's OK for the guest to stop.
>  b) We don't trust devices, and we don't trust them by having the guest
>     do normal encryption; e.g. just LUKS on the disk and normal encrypted
>     networking. [There's a lot of schemes people are working on about how
>     the guest gets the keys etc for that)

I think we need to say more precisely what we mean by 'trust', as it can
have quite a broad interpretation.

As a baseline requirement, in the context of confidential computing the
guest would not trust the hypervisor with data that needs to remain
confidential, but would generally still expect it to provide a faithful
implementation of a given device.

IOW, the guest would expect the implementation of virtio-blk devices to
be functionally correct per the virtio-blk specification, but would not
trust the host to protect the confidentiality of any data stored on the
disk.

Any virtual device exposed to the guest that can transfer potentially
sensitive data needs to have some form of guest-controlled encryption
applied.  For disks this is easy with FDE like LUKS; for NICs this is
already best practice for services, by using TLS.  Other devices may not
have good existing options for applying encryption.

If the guest has a virtual keyboard, mouse and graphical display, which
is backed by a VNC/RDP server in the host, then all of that is visible
to the host.  There are no pre-existing solutions I know of that could
offer easy confidentiality for basic console I/O from the start of guest
firmware onwards.  The best option is to spawn a VNC/RDP server in the
guest at some point during boot.  That means you can't log in to the
guest in single-user mode with your root password, though, without
compromising it.

The problem also applies to common solutions today where the host passes
config data in to the guest, for consumption by tools like cloud-init.
This has been used in the past to inject an SSH key, for example, or to
set the guest root password.  Such data received from the host can no
longer be trusted, as the host can see the data, or substitute its own
SSH key(s) in order to gain access.  Cloud-init needs to get its config
data from a trusted source, likely an external attestation server.

A further challenge surrounds the handling of undesirable devices.  A
goal of OS development has been to ensure that both coldplugged and
hotplugged devices "just work" out of the box with zero guest admin
config required.  To some extent this is contrary to what a confidential
guest will want.  It doesn't want a getty spawned on any console
exposed, and it doesn't want to use a virtio-rng exposed by the host,
which could be feeding it non-random data.

Protecting against malicious implementations of devices is conceivably
interesting, as a hardening task.  A malicious host may try to take
advantage of the guest OS device driver implementation to exploit the
guest OS kernel, with an end goal of getting into a state where it can
be made to reveal confidential data that was otherwise protected.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
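Daniel's cloud-init point suggests a concrete pattern: the guest treats
host-supplied config as untrusted input, and accepts it only if it matches a
digest delivered over a separate trusted (attested) channel, with a strict
allow-list of keys on top. A minimal sketch with invented names (this is not a
real cloud-init mechanism):

```python
# Sketch: accept host-supplied guest config only if it matches a digest
# obtained over a trusted/attested channel, and only with known keys.
import hashlib
import json

def accept_config(raw: bytes, trusted_digest: str) -> dict:
    # Integrity: the digest came from an attested source, not from the host.
    if hashlib.sha256(raw).hexdigest() != trusted_digest:
        raise ValueError("config does not match attested digest")
    cfg = json.loads(raw)
    # Strict key allow-list: unknown keys are rejected, not ignored.
    allowed = {"hostname", "ssh_authorized_keys"}
    unknown = set(cfg) - allowed
    if unknown:
        raise ValueError(f"unexpected config keys: {sorted(unknown)}")
    return cfg

good = json.dumps({"hostname": "coco-guest"}).encode()
digest = hashlib.sha256(good).hexdigest()  # delivered via attestation

print(accept_config(good, digest))  # accepted

evil = json.dumps({"hostname": "x", "root_password": "pwned"}).encode()
try:
    accept_config(evil, digest)     # host substitution -> digest mismatch
except ValueError as err:
    print("rejected:", err)
```

Note confidentiality is still not addressed here: anything the host can read
(like an injected password) must not travel this path at all, which is
Daniel's larger point.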
* Re: Linux guest kernel threat model for Confidential Computing
From: Dr. David Alan Gilbert @ 2023-01-25 15:29 UTC
To: Daniel P. Berrangé

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> Protecting against malicious implementations of devices is conceivably
> interesting, as a hardening task.  A malicious host may try to take
> advantage of the guest OS device driver implementation to exploit the
> guest OS kernel, with an end goal of getting into a state where it can
> be made to reveal confidential data that was otherwise protected.

I think this is really what the Intel stuff is trying to protect
against.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: Linux guest kernel threat model for Confidential Computing
From: Richard Weinberger @ 2023-01-26 14:23 UTC
To: Daniel P. Berrangé

On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Any virtual device exposed to the guest that can transfer potentially
> sensitive data needs to have some form of guest controlled encryption
> applied.  For disks this is easy with FDE like LUKS, for NICs this is
> already best practice for services by using TLS.  Other devices may not
> have good existing options for applying encryption.

I disagree wrt. LUKS.  The cryptography behind LUKS protects persistent
data but not transport.  If an attacker can observe all IO, you had
better consult a cryptographer.
LUKS has no concept of session keys or such, so the same disk sector
will always get encrypted with the very same key/iv.

--
Thanks,
//richard
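Richard's objection can be made concrete: because dm-crypt derives the IV
deterministically from the sector number, equal plaintext written to the same
sector always yields equal ciphertext, so a host observing the I/O stream
learns when a sector's contents repeat or revert, even without the key. A toy
model (the XOR-with-hash keystream below stands in for AES-XTS and is not real
cryptography):

```python
# Toy model of deterministic per-sector encryption, as in dm-crypt/LUKS:
# same (key, sector, plaintext) -> same ciphertext on every write.
import hashlib

VOLUME_KEY = b"volume-key"  # illustrative; a real volume key is random

def encrypt_sector(sector: int, plaintext: bytes) -> bytes:
    # Keystream derived only from key + sector number: no per-write nonce.
    stream = hashlib.sha256(VOLUME_KEY + sector.to_bytes(8, "little")).digest()
    return bytes(p ^ s for p, s in zip(plaintext, stream))

ct1 = encrypt_sector(7, b"secret-balance=100")
ct2 = encrypt_sector(7, b"secret-balance=999")
ct3 = encrypt_sector(7, b"secret-balance=100")  # value reverted

print(ct1 == ct3)  # True: observer sees the sector return to an old value
print(ct1 == ct2)  # False: and sees exactly when/where the data changed
```

This is fine for the at-rest threat LUKS was designed for (a stolen disk seen
once), but against an adversary watching every I/O it leaks equality and
change patterns, which is Richard's point.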
* Re: Linux guest kernel threat model for Confidential Computing
From: Dr. David Alan Gilbert @ 2023-01-26 14:58 UTC
To: Richard Weinberger

* Richard Weinberger (richard.weinberger@gmail.com) wrote:
> I disagree wrt. LUKS.  The cryptography behind LUKS protects persistent data
> but not transport.  If an attacker can observe all IO you better
> consult a cryptographer.
> LUKS has no concept of session keys or such, so the same disk sector will
> always get encrypted with the very same key/iv.

Are you aware of anything that you'd use instead?

Are you happy with dm-verity for protection against modification?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 14:58 ` Dr. David Alan Gilbert @ 2023-01-26 15:13 ` Richard Weinberger 2023-01-26 15:22 ` Dr. David Alan Gilbert ` (2 more replies) 0 siblings, 3 replies; 102+ messages in thread From: Richard Weinberger @ 2023-01-26 15:13 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Daniel P. Berrangé, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > * Richard Weinberger (richard.weinberger@gmail.com) wrote: > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > Any virtual device exposed to the guest that can transfer potentially > > > sensitive data needs to have some form of guest controlled encryption > > > applied. For disks this is easy with FDE like LUKS, for NICs this is > > > already best practice for services by using TLS. Other devices may not > > > have good existing options for applying encryption. > > > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data > > but not transport. If an attacker can observe all IO you better > > consult a cryptographer. > > LUKS has no concept of session keys or such, so the same disk sector will > > always get encrypted with the very same key/iv. > > Are you aware of anything that you'd use instead? Well, I'd think towards iSCSI over TLS to protect the IO transport. > Are you happy with dm-verity for protection against modification? Like LUKS (actually dm-crypt) the crypto behind is designed to protect persistent data not transport. 
My fear is that an attacker who is able to observe IOs can do bad things. ^ permalink raw reply [flat|nested] 102+ messages in thread
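Richard's point about the missing session keys can be illustrated with a toy deterministic sector cipher. This is a stand-in for XTS, not the real construction (the keystream here is just SHA-256 over the key and sector number), but it shares the property he objects to: the keystream depends only on (key, sector), so rewriting the same plaintext to the same sector always produces the same ciphertext, and an observer of the I/O stream can see when a sector's contents repeat or revert.

```python
import hashlib

def toy_sector_encrypt(key: bytes, sector: int, data: bytes) -> bytes:
    # Keystream is derived only from (key, sector): like an FDE tweak,
    # there is no per-write nonce or session key.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            key + sector.to_bytes(8, "little") + counter.to_bytes(4, "little")
        ).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

key = b"k" * 32

# Writing the same plaintext to the same sector twice: identical ciphertext.
c1 = toy_sector_encrypt(key, 7, b"attack at dawn..")
c2 = toy_sector_encrypt(key, 7, b"attack at dawn..")
print(c1 == c2)   # True: a repeated write is visible to whoever sees the I/O
```

The same data at a different sector encrypts differently, so the leak is specifically about rewrites of one location over time.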
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 15:13 ` Richard Weinberger @ 2023-01-26 15:22 ` Dr. David Alan Gilbert 2023-01-26 15:55 ` Daniel P. Berrangé 2023-01-27 9:02 ` Jörg Rödel 2 siblings, 0 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-01-26 15:22 UTC (permalink / raw) To: Richard Weinberger Cc: Daniel P. Berrangé, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List * Richard Weinberger (richard.weinberger@gmail.com) wrote: > On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > > > * Richard Weinberger (richard.weinberger@gmail.com) wrote: > > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > Any virtual device exposed to the guest that can transfer potentially > > > > sensitive data needs to have some form of guest controlled encryption > > > > applied. For disks this is easy with FDE like LUKS, for NICs this is > > > > already best practice for services by using TLS. Other devices may not > > > > have good existing options for applying encryption. > > > > > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data > > > but not transport. If an attacker can observe all IO you better > > > consult a cryptographer. > > > LUKS has no concept of session keys or such, so the same disk sector will > > > always get encrypted with the very same key/iv. > > > > Are you aware of anything that you'd use instead? > > Well, I'd think towards iSCSI over TLS to protect the IO transport. Yeh, that's not entirely crazy for VMs which tend to come off some remote storage system. 
> > Are you happy with dm-verity for protection against modification? > > Like LUKS (actually dm-crypt) the crypto behind is designed to protect > persistent data not transport. > My fear is that an attacker who is able to observe IOs can do bad things. Hmm, OK, I'd assumed dm-verity was OK since it's more hashlike and unchanging. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
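For reference on the dm-verity side of this exchange: its modification detection comes from a Merkle tree of block hashes over a read-only image, with only the root hash needing to be trusted (e.g. measured at attestation time). A minimal sketch of the idea, with hypothetical helper names and stdlib hashing only:

```python
import hashlib

BLOCK = 4096  # dm-verity's default data block size

def merkle_root(data: bytes) -> bytes:
    # Hash each data block, then hash pairs upward to a single root.
    level = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, len(data), BLOCK)]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]   # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

image = bytes(3 * BLOCK)              # a small read-only "disk image"
trusted_root = merkle_root(image)     # the only value the guest must trust

tampered = bytearray(image)
tampered[5000] ^= 0x01                # host flips one bit in one block
print(merkle_root(bytes(tampered)) != trusted_root)   # True: detected
```

Note this detects modification of read-only data; it says nothing about confidentiality or about writable devices, which is where the transport-observation concern remains.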
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 15:13 ` Richard Weinberger 2023-01-26 15:22 ` Dr. David Alan Gilbert @ 2023-01-26 15:55 ` Daniel P. Berrangé 2023-01-27 9:02 ` Jörg Rödel 2 siblings, 0 replies; 102+ messages in thread From: Daniel P. Berrangé @ 2023-01-26 15:55 UTC (permalink / raw) To: Richard Weinberger Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Thu, Jan 26, 2023 at 04:13:11PM +0100, Richard Weinberger wrote: > On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > > > * Richard Weinberger (richard.weinberger@gmail.com) wrote: > > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > Any virtual device exposed to the guest that can transfer potentially > > > > sensitive data needs to have some form of guest controlled encryption > > > > applied. For disks this is easy with FDE like LUKS, for NICs this is > > > > already best practice for services by using TLS. Other devices may not > > > > have good existing options for applying encryption. > > > > > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data > > > but not transport. If an attacker can observe all IO you better > > > consult a cryptographer. > > > LUKS has no concept of session keys or such, so the same disk sector will > > > always get encrypted with the very same key/iv. > > > > Are you aware of anything that you'd use instead? > > Well, I'd think towards iSCSI over TLS to protect the IO transport. That just moves the problem elsewhere though surely. 
The remote iSCSI server still has to persist the VMs' data, and the cloud service provider can observe any I/O before it hits the final hardware storage. So the remote iSCSI server needs to apply an FDE-like encryption scheme for the exported iSCSI block device, using a key only accessible to the tenant that owns the VM. It still needs to solve the same problem of having some kind of "generation ID" that can tweak the IV for each virtual disk sector, to protect against time-based analysis. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 102+ messages in thread
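One way to read Daniel's "generation ID" suggestion in code: fold a per-write counter into the per-sector tweak, so rewriting identical data no longer yields identical ciphertext. This is a hedged toy sketch (SHA-256 keystream as a stand-in for a real tweakable cipher; the function name and layout are made up), not any existing dm-crypt mode:

```python
import hashlib

def encrypt_sector(key: bytes, sector: int, generation: int, data: bytes) -> bytes:
    # The tweak covers (sector, generation); bumping the generation on every
    # write gives each rewrite a fresh keystream.
    tweak = sector.to_bytes(8, "little") + generation.to_bytes(8, "little")
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + tweak + counter.to_bytes(4, "little")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

key = b"k" * 32
first   = encrypt_sector(key, 7, generation=0, data=b"unchanged payload")
rewrite = encrypt_sector(key, 7, generation=1, data=b"unchanged payload")
print(first != rewrite)   # True: the observer can't tell the data repeated
```

The practical cost, which is why FDE avoids it, is that the generation value must be stored somewhere per sector and read back to decrypt.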
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 15:13 ` Richard Weinberger 2023-01-26 15:22 ` Dr. David Alan Gilbert 2023-01-26 15:55 ` Daniel P. Berrangé @ 2023-01-27 9:02 ` Jörg Rödel 2 siblings, 0 replies; 102+ messages in thread From: Jörg Rödel @ 2023-01-27 9:02 UTC (permalink / raw) To: Richard Weinberger Cc: Dr. David Alan Gilbert, Daniel P. Berrangé, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Thu, Jan 26, 2023 at 04:13:11PM +0100, Richard Weinberger wrote: > On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > > > * Richard Weinberger (richard.weinberger@gmail.com) wrote: > > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > Are you aware of anything that you'd use instead? > > Well, I'd think towards iSCSI over TLS to protect the IO transport. In the context of confidential computing this only makes sense if the SCSI target is part of the trusted base, which means it needs to be attested and protected against outside attacks. Currently all CoCo implementations I know of treat disk storage as untrusted. Besides that, the same problems exist with a VM's encrypted memory. The hardware does not guarantee that the HV can not fiddle with your private memory, it only guarantees that you can detect such fiddling and that the private data is encrypted. The HV can also still trace memory access patterns of confidential guests by setting the right permissions in the nested page table.
So storage and memory of a CoCo VM have in common that the transport is not secure, but there are measures to detect if someone fiddles with your data in transit or at rest: for memory this is implemented in hardware, and for storage in software, by using dm-crypt together with dm-verity or dm-integrity. Regards, Joerg ^ permalink raw reply [flat|nested] 102+ messages in thread
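Jörg's point, detection rather than prevention, is what per-sector authenticated encryption (the dm-crypt-over-dm-integrity combination) provides. A minimal encrypt-then-MAC sketch with stdlib primitives only; this is illustrative of the idea, not the kernel's actual on-disk construction, and the data must fit one SHA-256 keystream block here:

```python
import hashlib
import hmac

def seal(enc_key: bytes, mac_key: bytes, sector: int, data: bytes):
    # Toy per-sector encrypt-then-MAC: the tag binds the ciphertext to its sector.
    sid = sector.to_bytes(8, "little")
    ks = hashlib.sha256(enc_key + sid).digest()
    ct = bytes(a ^ b for a, b in zip(data, ks))
    tag = hmac.new(mac_key, sid + ct, hashlib.sha256).digest()
    return ct, tag

def unseal(enc_key: bytes, mac_key: bytes, sector: int, ct: bytes, tag: bytes) -> bytes:
    sid = sector.to_bytes(8, "little")
    expect = hmac.new(mac_key, sid + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expect, tag):
        raise ValueError("tampering detected")  # the guest may simply stop here
    ks = hashlib.sha256(enc_key + sid).digest()
    return bytes(a ^ b for a, b in zip(ct, ks))

ek, mk = b"e" * 32, b"m" * 32
ct, tag = seal(ek, mk, 3, b"guest secret")
print(unseal(ek, mk, 3, ct, tag))   # round-trips

flipped = bytes([ct[0] ^ 1]) + ct[1:]
try:
    unseal(ek, mk, 3, flipped, tag)
except ValueError as e:
    print(e)   # modification by the host is detected
```

Note what this does and does not give: modification and relocation to another sector are detected, but replaying an old valid (ciphertext, tag) pair for the same sector is not, without an additional freshness input.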
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 14:23 ` Richard Weinberger 2023-01-26 14:58 ` Dr. David Alan Gilbert @ 2023-01-26 15:43 ` Daniel P. Berrangé 2023-01-27 11:23 ` Reshetova, Elena 2 siblings, 0 replies; 102+ messages in thread From: Daniel P. Berrangé @ 2023-01-26 15:43 UTC (permalink / raw) To: Richard Weinberger Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Thu, Jan 26, 2023 at 03:23:34PM +0100, Richard Weinberger wrote: > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > Any virtual device exposed to the guest that can transfer potentially > > sensitive data needs to have some form of guest controlled encryption > > applied. For disks this is easy with FDE like LUKS, for NICs this is > > already best practice for services by using TLS. Other devices may not > > have good existing options for applying encryption. > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data > but not transport. If an attacker can observe all IO you better > consult a cryptographer. > LUKS has no concept of session keys or such, so the same disk sector will > always get encrypted with the very same key/iv. Yes, you're right, all the FDE cipher modes are susceptible to time based analysis of I/O, so very far from ideal. You'll get protection for your historically written confidential data at the time a VM host is first compromised, but if (as) they retain long term access to the host, confidentiality is increasingly undermined the longer they can observe the ongoing I/O. 
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-26 14:23 ` Richard Weinberger 2023-01-26 14:58 ` Dr. David Alan Gilbert 2023-01-26 15:43 ` Daniel P. Berrangé @ 2023-01-27 11:23 ` Reshetova, Elena 2 siblings, 0 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-27 11:23 UTC (permalink / raw) To: Richard Weinberger, Daniel P. Berrangé Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> > wrote: > > Any virtual device exposed to the guest that can transfer potentially > > sensitive data needs to have some form of guest controlled encryption > > applied. For disks this is easy with FDE like LUKS, for NICs this is > > already best practice for services by using TLS. Other devices may not > > have good existing options for applying encryption. > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data > but not transport. If an attacker can observe all IO you better > consult a cryptographer. > LUKS has no concept of session keys or such, so the same disk sector will > always get encrypted with the very same key/iv. I guess you are referring to the aes-xts-plain64 mode of LUKS operation or to LUKS in general? Different modes of operation (including AEAD modes) can provide different levels of protection, so I would not state it so generally. But the point you raised is good to discuss through: XTS for example is a confidentiality mode, based on a concept of tweakable blockcipher, designed as you pointed out with disk encryption use case in mind. 
It does have a bunch of limitations/weaknesses that are known (a good classical reference I can suggest on this is [1]), but as with any blockcipher mode, its confidentiality guarantees are evaluated in terms of security against a chosen ciphertext attack (CCA) where an adversary has access to both encryption and decryption oracles (he can perform encryptions and decryptions of plaintexts/ciphertexts of his liking up to the allowed number of queries). This is a very powerful attack model which to me seems to cover the model of an untrusted host/VMM being able to observe disk reads/writes. Also, if I remember right, disk encryption also assumes that the disk operations are fully visible to the attacker, i.e. he can see all encrypted data on the disk, observe how it changes when a new block is written, etc. So, where do we have a change in the attacker model here? What am I missing? What AES-XTS was never designed to provide is integrity protection (it offers only some very limited malleability resistance): it is not an AEAD mode, and it does not provide replay protection either. So, the same limitations are going to apply in our case also. Best Regards, Elena. [1] Chapter 6. XTS mode, https://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf ^ permalink raw reply [flat|nested] 102+ messages in thread
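For reference, the deterministic structure under discussion is visible in the XTS equations themselves (the IEEE P1619 construction): with $i$ the sector number used as tweak, $j$ the index of the 16-byte block within the sector, and $\otimes$ multiplication in $GF(2^{128})$,

```latex
\Delta_j = E_{K_2}(i) \otimes \alpha^{j}, \qquad
C_j = E_{K_1}(P_j \oplus \Delta_j) \oplus \Delta_j
```

Every term on the right is a fixed function of the keys, the sector number and the plaintext; there is no nonce, counter or authentication tag. That is why identical writes to a sector repeat exactly over time, and why the mode by itself can neither detect modification nor replay.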
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:13 ` Daniel P. Berrangé 2023-01-25 15:29 ` Dr. David Alan Gilbert 2023-01-26 14:23 ` Richard Weinberger @ 2023-01-30 11:30 ` Christophe de Dinechin 2 siblings, 0 replies; 102+ messages in thread From: Christophe de Dinechin @ 2023-01-30 11:30 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On 2023-01-25 at 14:13 UTC, Daniel P. Berrangé <berrange@redhat.com> wrote... > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote: >> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: >> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: >> > > Hi Greg, >> > > >> > > You mentioned couple of times (last time in this recent thread: >> > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start >> > > discussing the updated threat model for kernel, so this email is a start in this direction. >> > >> > Any specific reason you didn't cc: the linux-hardening mailing list? >> > This seems to be in their area as well, right? >> > >> > > As we have shared before in various lkml threads/conference presentations >> > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a >> > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. >> > >> > That is, frankly, a very funny threat model. How realistic is it really >> > given all of the other ways that a hypervisor can mess with a guest? 
>> >> It's what a lot of people would like; in the early attempts it was easy >> to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it >> can mess with - remember that not just the memory is encrypted, so is >> the register state, and the guest gets to see changes to mapping and a >> lot of control over interrupt injection etc. >> >> > So what do you actually trust here? The CPU? A device? Nothing? >> >> We trust the actual physical CPU, provided that it can prove that it's a >> real CPU with the CoCo hardware enabled. Both the SNP and TDX hardware >> can perform an attestation signed by the CPU to prove to someone >> external that the guest is running on a real trusted CPU. >> >> Note that the trust is limited: >> a) We don't trust that we can make forward progress - if something >> does something bad it's OK for the guest to stop. >> b) We don't trust devices, and we don't trust them by having the guest >> do normal encryption; e.g. just LUKS on the disk and normal encrypted >> networking. [There's a lot of schemes people are working on about how >> the guest gets the keys etc for that) > > I think we need to more precisely say what we mean by 'trust' as it > can have quite a broad interpretation. > > As a baseline requirement, in the context of confidential computing the > guest would not trust the hypervisor with data that needs to remain > confidential, but would generally still expect it to provide a faithful > implementation of a given device. ... or to have a reliable faulting behaviour (e.g. panic) if the device is found to be malicious, e.g. attempting to inject bogus data in the driver to trigger unexpected paths in the guest kernel. I think that part of the original discussion is really about being able to do that at least for the small subset of (mostly virtio) devices that would typically be of use in a CoCo setup. 
As was pointed out elsewhere in that thread, doing so for physical devices, to the point of enabling end-to-end attestation and encryption, is work that is presently underway, but there is work to do already with the comparatively small subset of devices we need in the short-term. Also, that work needs only the Linux kernel community, whereas changes for example at the PCI level are much broader, and therefore require a lot more time. -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 13:42 ` Dr. David Alan Gilbert 2023-01-25 14:13 ` Daniel P. Berrangé @ 2023-01-25 14:22 ` Greg Kroah-Hartman 2023-01-25 14:30 ` James Bottomley ` (3 more replies) 1 sibling, 4 replies; 102+ messages in thread From: Greg Kroah-Hartman @ 2023-01-25 14:22 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote: > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > Hi Greg, > > > > > > You mentioned couple of times (last time in this recent thread: > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start > > > discussing the updated threat model for kernel, so this email is a start in this direction. > > > > Any specific reason you didn't cc: the linux-hardening mailing list? > > This seems to be in their area as well, right? > > > > > As we have shared before in various lkml threads/conference presentations > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. > > > > That is, frankly, a very funny threat model. How realistic is it really > > given all of the other ways that a hypervisor can mess with a guest? 
> > It's what a lot of people would like; in the early attempts it was easy > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it > can mess with - remember that not just the memory is encrypted, so is > the register state, and the guest gets to see changes to mapping and a > lot of control over interrupt injection etc. And due to the fact that SEV and TDX really do not work, how is anyone expecting any of this to work? As one heckler on IRC recently put it, if you squint hard enough, you can kind of ignore the real-world issues here, so perhaps this should all be called "squint-puting" in order to feel like you have a "confidential" system? :) > > So what do you actually trust here? The CPU? A device? Nothing? > > We trust the actual physical CPU, provided that it can prove that it's a > real CPU with the CoCo hardware enabled. Great, so why not have hardware attestation also for your devices you wish to talk to? Why not use that as well? Then you don't have to worry about anything in the guest. > Both the SNP and TDX hardware > can perform an attestation signed by the CPU to prove to someone > external that the guest is running on a real trusted CPU. And again, do the same thing for the other hardware devices and all is good. To not do that is to just guess and wave hands. You know this :) > Note that the trust is limited: > a) We don't trust that we can make forward progress - if something > does something bad it's OK for the guest to stop. So the guest can stop itself? > b) We don't trust devices, and we don't trust them by having the guest > do normal encryption; e.g. just LUKS on the disk and normal encrypted > networking. [There's a lot of schemes people are working on about how > the guest gets the keys etc for that) How do you trust you got real data on the disk? On the network? Those are coming from the host, how is any of that data to be trusted? Where does the trust stop and why? > > I hate the term "hardening". 
Please just say it for what it really is, > > "fixing bugs to handle broken hardware". We've done that for years when > > dealing with PCI and USB and even CPUs doing things that they shouldn't > > be doing. How is this any different in the end? > > > > So what you also are saying here now is "we do not trust any PCI > > devices", so please just say that (why do you trust USB devices?) If > > that is something that you all think that Linux should support, then > > let's go from there. > > I don't think generally all PCI device drivers guard against all the > nasty things that a broken implementation of their hardware can do. I know that all PCI drivers can NOT do that today as that was never anything that Linux was designed for. > The USB devices are probably a bit better, because they actually worry > about people walking up with a nasty HID device; I'm skeptical that > a kernel would survive a purposely broken USB controller. I agree with you there, USB drivers are only starting to be fuzzed at the descriptor level, that's all. Which is why they too can be put into the "untrusted" area until you trust them. > I'm not sure the request here isn't really to make sure *all* PCI devices > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) - > and potentially ones that people will want to pass-through (which > generally needs a lot more work to make safe). > (I've not looked at these Intel tools to see what they cover) Why not just create a whole new bus path for these "trusted" devices to attach to and do that instead of trying to emulate a protocol that was explicitly designed NOT to fit this model at all? Why are you trying to shoehorn something here and not just designing it properly from the beginning? > Having said that, how happy are you with Thunderbolt PCI devices being > plugged into your laptop or into the hotplug NVMe slot on a server? We have protection for that, and have had it for many years. Same for USB devices.
This isn't new, perhaps you all have not noticed those features being added and taken advantage of already by many Linux distros and system images (e.g. ChromeOS and embedded systems?) > We're now in the position we were with random USB devices years ago. Nope, we are not, again, we already handle random PCI devices being plugged in. It's up to userspace to make the policy decision if it should be trusted or not before the kernel has access to it. So a meta-comment, why not just use that today? If your guest OS can not authenticate the PCI device passed to it, don't allow the kernel to bind to it. If it can be authenticated, wonderful, bind away! You can do this today with no kernel changes needed. > Also we would want to make sure that any config data that the hypervisor > can pass to the guest is validated. Define "validated" please. > The problem seems reasonably well understood within the CoCo world - how > far people want to push it probably varies; but it's good to make the > problem more widely understood. The "CoCo" world seems distant and separate from the real-world of Linux kernel development if you all do not even know about the authentication methods that we have had for years for enabling access to PCI and USB devices as described above. If the implementations that we currently have are lacking in some way, wonderful, please submit changes for them and we will be glad to review them as needed. Remember, it's up to you all to convince us that your changes make actual sense and are backed up with working implementations. Not us :) good luck! greg k-h ^ permalink raw reply [flat|nested] 102+ messages in thread
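The existing mechanisms Greg alludes to are the sysfs authorization and driver-binding knobs. A hedged sketch of such a userspace policy follows; the sysfs paths are the standard driver-core/USB ones, but the bus number, the device address 0000:03:00.0 and the choice of virtio-pci are made-up examples, and the approval logic itself (attestation, allow-list, ...) is left to the distro or guest image:

```shell
# Require explicit userspace authorization for new USB devices on bus 1.
echo 0 > /sys/bus/usb/devices/usb1/authorized_default

# Stop the driver core from auto-binding drivers to newly enumerated
# PCI devices.
echo 0 > /sys/bus/pci/drivers_autoprobe

# Later, once userspace policy approves a specific device, bind it
# to its driver explicitly.
echo 0000:03:00.0 > /sys/bus/pci/drivers/virtio-pci/bind
```

This is a configuration fragment requiring root on a running system, shown to make "no kernel changes needed" concrete rather than as a complete policy.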
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:22 ` Greg Kroah-Hartman @ 2023-01-25 14:30 ` James Bottomley 2023-01-25 14:57 ` Dr. David Alan Gilbert ` (2 subsequent siblings) 3 siblings, 0 replies; 102+ messages in thread From: James Bottomley @ 2023-01-25 14:30 UTC (permalink / raw) To: Greg Kroah-Hartman, Dr. David Alan Gilbert Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, 2023-01-25 at 15:22 +0100, Greg Kroah-Hartman wrote: > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert > wrote: > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > Hi Greg, > > > > > > > > You mentioned couple of times (last time in this recent thread: > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that > > > > we ought to start > > > > discussing the updated threat model for kernel, so this email > > > > is a start in this direction. > > > > > > Any specific reason you didn't cc: the linux-hardening mailing > > > list? This seems to be in their area as well, right? > > > > > > > As we have shared before in various lkml threads/conference > > > > presentations ([1], [2], [3] and many others), for the > > > > Confidential Computing guest kernel, we have a change in the > > > > threat model where guest kernel doesn’t anymore trust the > > > > hypervisor. > > > > > > That is, frankly, a very funny threat model. How realistic is it > > > really given all of the other ways that a hypervisor can mess > > > with a guest? 
> > > > It's what a lot of people would like; in the early attempts it was > > easy to defeat, but in TDX and SEV-SNP the hypervisor has a lot > > less that it can mess with - remember that not just the memory is > > encrypted, so is the register state, and the guest gets to see > > changes to mapping and a lot of control over interrupt injection > > etc. > > And due to the fact that SEV and TDX really do not work, how is > anyone expecting any of this to work? As one heckler on IRC recently > put it, if you squint hard enough, you can kind of ignore the real- > world issues here, so perhaps this should all be called "squint- > puting" in order to feel like you have a "confidential" system? :) There's a difference between no trust, which requires defeating all attacks as they occur and limited trust, which merely means you want to detect an attack from the limited trust entity to show that trust was violated. Trying to achieve the former with CC is a good academic exercise, but not required for the technology to be useful. Most cloud providers are working towards the latter ... we know there are holes, but as long as the guest can always detect interference they can be confident in their trust in the CSP not to attack them via various hypervisor mechanisms. James ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:22 ` Greg Kroah-Hartman 2023-01-25 14:30 ` James Bottomley @ 2023-01-25 14:57 ` Dr. David Alan Gilbert 2023-01-25 15:16 ` Greg Kroah-Hartman 2023-01-25 21:53 ` Lukas Wunner 2023-01-25 20:13 ` Jiri Kosina 2023-01-26 13:13 ` Reshetova, Elena 3 siblings, 2 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-01-25 14:57 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote: > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > Hi Greg, > > > > > > > > You mentioned couple of times (last time in this recent thread: > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start > > > > discussing the updated threat model for kernel, so this email is a start in this direction. > > > > > > Any specific reason you didn't cc: the linux-hardening mailing list? > > > This seems to be in their area as well, right? > > > > > > > As we have shared before in various lkml threads/conference presentations > > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a > > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. > > > > > > That is, frankly, a very funny threat model. How realistic is it really > > > given all of the other ways that a hypervisor can mess with a guest? 
> > It's what a lot of people would like; in the early attempts it was easy > > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it > > can mess with - remember that not just the memory is encrypted, so is > > the register state, and the guest gets to see changes to mapping and a > > lot of control over interrupt injection etc. > > And due to the fact that SEV and TDX really do not work, how is anyone > expecting any of this to work? As one heckler on IRC recently put it, > if you squint hard enough, you can kind of ignore the real-world issues > here, so perhaps this should all be called "squint-puting" in order to > feel like you have a "confidential" system? :) I agree the original SEV was that weak; I've not seen anyone give a good argument against SNP or TDX. > > > So what do you actually trust here? The CPU? A device? Nothing? > > > > We trust the actual physical CPU, provided that it can prove that it's a > > real CPU with the CoCo hardware enabled. > > Great, so why not have hardware attestation also for your devices you > wish to talk to? Why not use that as well? Then you don't have to > worry about anything in the guest. There were some talks at Plumbers where PCIe is working on adding that; it's not there yet though. I think that's PCIe 'Integrity and Data Encryption' (IDE - sigh), and PCIe 'Security Protocol and Data Model' - SPDM. I don't know much of the detail of those, just that they're far enough off that people aren't depending on them yet. > > Both the SNP and TDX hardware > > can perform an attestation signed by the CPU to prove to someone > > external that the guest is running on a real trusted CPU. > > And again, do the same thing for the other hardware devices and all is > good. To not do that is to just guess and wave hands. You know this :) That wouldn't help you necessarily for virtual devices - where the hypervisor implements the device (like a virtual NIC).
> > Note that the trust is limited: > > a) We don't trust that we can make forward progress - if something > > does something bad it's OK for the guest to stop. > > So the guest can stop itself? Sure. > > b) We don't trust devices, and we don't trust them by having the guest > > do normal encryption; e.g. just LUKS on the disk and normal encrypted > > networking. [There's a lot of schemes people are working on about how > > the guest gets the keys etc for that) > > How do you trust you got real data on the disk? On the network? Those > are coming from the host, how is any of that data to be trusted? Where > does the trust stop and why? We don't; you use LUKS2 on the disk and/or dm-verity; so there's no trust in the disk. You use whatever your favorite network encryption already is that you're using to send data across the untrusted net. So no trust in the data from the NIC. > > > I hate the term "hardening". Please just say it for what it really is, > > > "fixing bugs to handle broken hardware". We've done that for years when > > > dealing with PCI and USB and even CPUs doing things that they shouldn't > > > be doing. How is this any different in the end? > > > > > > So what you also are saying here now is "we do not trust any PCI > > > devices", so please just say that (why do you trust USB devices?) If > > > that is something that you all think that Linux should support, then > > > let's go from there. > > > > I don't think generally all PCI device drivers guard against all the > > nasty things that a broken implementation of their hardware can do. > > I know that all PCI drivers can NOT do that today as that was never > anything that Linux was designed for. Agreed; which again is why I only really worry about the subset of devices I'd want in a CoCo VM. > > The USB devices are probably a bit better, because they actually worry > > about people walking up with a nasty HID device; I'm skeptical that > > a kernel would survive a purposely broken USB controller. 
> > I agree with you there, USB drivers are only starting to be fuzzed at > the descriptor level, that's all. Which is why they too can be put into > the "untrusted" area until you trust them. > > > I'm not sure the request here isn't really to make sure *all* PCI devices > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) - > > and potentially ones that people will want to pass-through (which > > generally needs a lot more work to make safe). > > (I've not looked at these Intel tools to see what they cover) > > Why not just create a whole new bus path for these "trusted" devices to > attach to and do that instead of trying to emulate a protocol that was > explicitly designed NOT to fit this model at all? Why are you trying to > shoehorn something here and not just designing it properly from the > beginning? I'd be kind of OK with that for the virtual devices; but: a) I think you'd start reinventing PCIe with enumeration etc b) We do want those pass through NICs etc that are PCIe - as long as you use normal guest crypto stuff then the host can be just as nasty as it likes with the data they present. c) The world has enough bus protocols, and people understand the basics of PCI(e) - we really don't need another one. > > Having said that, how happy are you with Thunderbolt PCI devices being > > plugged into your laptop or into the hotplug NVMe slot on a server? > > We have protection for that, and have had it for many years. Same for > USB devices. This isn't new, perhaps you all have not noticed those > features be added and taken advantage of already by many Linux distros > and system images (i.e. ChromeOS and embedded systems?) What protection? I know we have an IOMMU, and that stops the device stamping all over RAM by itself - but I think Intel's worries are more subtle, things where the device starts playing with what PCI devices are expected to do to try and trigger untested kernel paths. 
I don't think there's protection against that. I know we can lock by PCI/USB vendor/device ID - but those can be made up trivially; protection like that is meaningless. > > We're now in the position we were with random USB devices years ago. > > Nope, we are not, again, we already handle random PCI devices being > plugged in. It's up to userspace to make the policy decision if it > should be trusted or not before the kernel has access to it. > > So a meta-comment, why not just use that today? If your guest OS can > not authenticate the PCI device passed to it, don't allow the kernel to > bind to it. If it can be authenticated, wonderful, bind away! You can > do this today with no kernel changes needed. Because: a) there's no good way to authenticate a PCI device yet - any nasty device can claim to have a given PCI ID. b) Even if you could, there's no man-in-the-middle protection yet. > > Also we would want to make sure that any config data that the hypervisor > > can pass to the guest is validated. > > Define "validated" please. Let's say you get something like an ACPI table or qemu fw.cfg table giving details of your devices; if the hypervisor builds those in a nasty way what happens? > > The problem seems reasonably well understood within the CoCo world - how > > far people want to push it probably varies; but it's good to make the > > problem more widely understood. > > The "CoCo" world seems distant and separate from the real world of Linux > kernel development if you all do not even know about the authentication > methods that we have for years for enabling access to PCI and USB > devices as described above. If the implementations that we currently > have are lacking in some way, wonderful, please submit changes for them > and we will be glad to review them as needed. 
That's probably fair to some degree - the people looking at this are VM people, not desktop people; I'm not sure what the overlap is; but as I say above, I don't think the protection currently available really helps here. Please show us where we're wrong. > Remember, it's up to you all to convince us that your changes make > actual sense and are backed up with working implementations. Not us :) Sure; I'm seeing existing implementations being used in vendors' clouds at the moment, and they're slowly getting the security that people want. I'd like to see that being done with upstream kernels and firmware. Dave > good luck! > > greg k-h > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:57 ` Dr. David Alan Gilbert @ 2023-01-25 15:16 ` Greg Kroah-Hartman 2023-01-25 15:45 ` Michael S. Tsirkin ` (3 more replies) 2023-01-25 21:53 ` Lukas Wunner 1 sibling, 4 replies; 102+ messages in thread From: Greg Kroah-Hartman @ 2023-01-25 15:16 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote: > > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > Hi Greg, > > > > > > > > > > You mentioned couple of times (last time in this recent thread: > > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start > > > > > discussing the updated threat model for kernel, so this email is a start in this direction. > > > > > > > > Any specific reason you didn't cc: the linux-hardening mailing list? > > > > This seems to be in their area as well, right? > > > > > > > > > As we have shared before in various lkml threads/conference presentations > > > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a > > > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. > > > > > > > > That is, frankly, a very funny threat model. How realistic is it really > > > > given all of the other ways that a hypervisor can mess with a guest? 
> > > > > > It's what a lot of people would like; in the early attempts it was easy > > > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it > > > can mess with - remember that not just the memory is encrypted, so is > > > the register state, and the guest gets to see changes to mapping and a > > > lot of control over interrupt injection etc. > > > > And due to the fact that SEV and TDX really do not work, how is anyone > > expecting any of this to work? As one heckler on IRC recently put it, > > if you squint hard enough, you can kind of ignore the real-world issues > > here, so perhaps this should all be called "squint-puting" in order to > > feel like you have a "confidential" system? :) > > I agree the original SEV was that weak; I've not seen anyone give a good > argument against SNP or TDX. Argument that it doesn't work? I thought that ship sailed a long time ago but I could be wrong as I don't really pay attention to that stuff as it's just vaporware :) > > > > So what do you actually trust here? The CPU? A device? Nothing? > > > > > > We trust the actual physical CPU, provided that it can prove that it's a > > > real CPU with the CoCo hardware enabled. > > > > Great, so why not have hardware attestation also for your devices you > > wish to talk to? Why not use that as well? Then you don't have to > > worry about anything in the guest. > > There were some talks at Plumbers where PCIe is working on adding that; > it's not there yet though. I think that's PCIe 'Integrity and Data > Encryption' (IDE - sigh), and PCIe 'Security Protocol and Data Model' - > SPDM. I don't know much of the detail of those, just that they're far > enough off that people aren't depending on them yet. Then work with those groups to implement that in an industry-wide way and then take advantage of it by adding support for it to Linux! Don't try to reinvent the same thing in a totally different way please. 
> > > Both the SNP and TDX hardware > > > can perform an attestation signed by the CPU to prove to someone > > > external that the guest is running on a real trusted CPU. > > > > And again, do the same thing for the other hardware devices and all is > > good. To not do that is to just guess and wave hands. You know this :) > > That wouldn't help you necessarily for virtual devices - where the > hypervisor implements the device (like a virtual NIC). Then create a new bus for that if you don't trust the virtio bus today. > > > > I hate the term "hardening". Please just say it for what it really is, > > > > "fixing bugs to handle broken hardware". We've done that for years when > > > > dealing with PCI and USB and even CPUs doing things that they shouldn't > > > > be doing. How is this any different in the end? > > > > > > > > So what you also are saying here now is "we do not trust any PCI > > > > devices", so please just say that (why do you trust USB devices?) If > > > > that is something that you all think that Linux should support, then > > > > let's go from there. > > > > > > I don't think generally all PCI device drivers guard against all the > > > nasty things that a broken implementation of their hardware can do. > > > > I know that all PCI drivers can NOT do that today as that was never > > anything that Linux was designed for. > > Agreed; which again is why I only really worry about the subset of > devices I'd want in a CoCo VM. Everyone wants a subset, different from other's subset, which means you need them all. Sorry. > > > The USB devices are probably a bit better, because they actually worry > > > about people walking up with a nasty HID device; I'm skeptical that > > > a kernel would survive a purposely broken USB controller. > > > > I agree with you there, USB drivers are only starting to be fuzzed at > > the descriptor level, that's all. Which is why they too can be put into > > the "untrusted" area until you trust them. 
> > > > > I'm not sure the request here isn't really to make sure *all* PCI devices > > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) - > > > and potentially ones that people will want to pass-through (which > > > generally needs a lot more work to make safe). > > > (I've not looked at these Intel tools to see what they cover) > > > > Why not just create a whole new bus path for these "trusted" devices to > > attach to and do that instead of trying to emulate a protocol that was > > explicitly designed NOT to fit this model at all? Why are you trying to > > shoehorn something here and not just designing it properly from the > > beginning? > > I'd be kind of OK with that for the virtual devices; but: > > a) I think you'd start reinventing PCIe with enumeration etc Great, then work with the PCI group as talked about above to solve it properly and not do whack-a-mole like seems to be happening so far. > b) We do want those pass through NICs etc that are PCIe > - as long as you use normal guest crypto stuff then the host > can be just as nasty as it likes with the data they present. Great, work with the PCI spec for verified devices. > c) The world has enough bus protocols, and people understand the > basics of PCI(e) - we really don't need another one. Great, work with the PCI spec people please. > > > Having said that, how happy are you with Thunderbolt PCI devices being > > > plugged into your laptop or into the hotplug NVMe slot on a server? > > > > We have protection for that, and have had it for many years. Same for > > USB devices. This isn't new, perhaps you all have not noticed those > > features be added and taken advantage of already by many Linux distros > > and system images (i.e. ChromeOS and embedded systems?) > > What protection? 
I know we have an IOMMU, and that stops the device > stamping all over RAM by itself - but I think Intel's worries are more > subtle, things where the device starts playing with what PCI devices > are expected to do to try and trigger untested kernel paths. I don't > think there's protection against that. > I know we can lock by PCI/USB vendor/device ID - but those can be made > up trivially; protection like that is meaningless. Then combine it with device attestation and you have a solved solution, don't ignore others working on this please. > > > We're now in the position we were with random USB devices years ago. > > > > Nope, we are not, again, we already handle random PCI devices being > > plugged in. It's up to userspace to make the policy decision if it > > should be trusted or not before the kernel has access to it. > > > > So a meta-comment, why not just use that today? If your guest OS can > > not authenticate the PCI device passed to it, don't allow the kernel to > > bind to it. If it can be authenticated, wonderful, bind away! You can > > do this today with no kernel changes needed. > > Because: > a) there's no good way to authenticate a PCI device yet > - any nasty device can claim to have a given PCI ID. > b) Even if you could, there's no man-in-the-middle protection yet. Where is the "man" here in the middle of? And any PCI attestation should handle that, if not, work with them to solve that please. Thunderbolt has authenticated device support today, and so does PCI, and USB has had it for a decade or so. Use the in-kernel implementation that we already have or again, show us where it is lacking and we will be glad to take patches to cover the holes (as we did last year when ChromeOS implemented support for it in their userspace.) > > > Also we would want to make sure that any config data that the hypervisor > > > can pass to the guest is validated. > > > > Define "validated" please. 
> > Let's say you get something like an ACPI table or qemu fw.cfg table > giving details of your devices; if the hypervisor builds those in a > nasty way what happens? You tell me, as we trust ACPI tables today, and if we can not, again then you need to change the model of what Linux does. Why isn't the BIOS authentication path working properly for ACPI tables already today? I thought that was a long-solved problem with UEFI (if not, I'm sure the UEFI people would be interested.) Anyway, I'll wait until I see real patches as this thread seems to be totally vague and ignores our current best practices for pluggable devices for some odd reason. thanks, greg k-h ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:16 ` Greg Kroah-Hartman @ 2023-01-25 15:45 ` Michael S. Tsirkin 2023-01-25 16:02 ` Kirill A. Shutemov 2023-01-25 15:50 ` Dr. David Alan Gilbert ` (2 subsequent siblings) 3 siblings, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-25 15:45 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Dr. David Alan Gilbert, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote: > Everyone wants a subset, different from other's subset, which means you > need them all. Sorry. Well if there's a very popular system (virtual in this case) that needs a specific config to work well, then I guess arch/x86/configs/ccguest.config or whatever might be acceptable, no? Lots of precedent here. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:45 ` Michael S. Tsirkin @ 2023-01-25 16:02 ` Kirill A. Shutemov 2023-01-25 17:47 ` Michael S. Tsirkin 0 siblings, 1 reply; 102+ messages in thread From: Kirill A. Shutemov @ 2023-01-25 16:02 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Greg Kroah-Hartman, Dr. David Alan Gilbert, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Jan 25, 2023 at 10:45:48AM -0500, Michael S. Tsirkin wrote: > On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote: > > Everyone wants a subset, different from other's subset, which means you > > need them all. Sorry. > > Well if there's a very popular system (virtual in this case) that needs > a specific config to work well, then I guess > arch/x86/configs/ccguest.config or whatever might be acceptable, no? > Lots of precedent here. OS vendors want the single kernel that fits all sizes: it should be possible (and secure) to run a generic distro kernel within a TDX/SEV guest. -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 16:02 ` Kirill A. Shutemov @ 2023-01-25 17:47 ` Michael S. Tsirkin 0 siblings, 0 replies; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-25 17:47 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Greg Kroah-Hartman, Dr. David Alan Gilbert, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Jan 25, 2023 at 07:02:03PM +0300, Kirill A. Shutemov wrote: > On Wed, Jan 25, 2023 at 10:45:48AM -0500, Michael S. Tsirkin wrote: > > On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote: > > > Everyone wants a subset, different from other's subset, which means you > > > need them all. Sorry. > > > > Well if there's a very popular system (virtual in this case) that needs > > a specific config to work well, then I guess > > arch/x86/configs/ccguest.config or whatever might be acceptable, no? > > Lots of precedent here. > > OS vendors want the single kernel that fits all sizes: it should be > possible (and secure) to run a generic distro kernel within a TDX/SEV guest. If they want that, sure. But it then becomes this distro's responsibility to configure things in a sane way. At least if there's a known good config that's a place to document what is known to work well. No? -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
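There is in-tree precedent for the fragment MST suggests: kernel/configs/kvm_guest.config and arch/x86/configs/xen.config already exist and are merged on top of a base config. A hypothetical ccguest.config could pin down the guest's device surface along these lines; the option names below are real, but the selection is purely illustrative, not a reviewed hardening list:

```
# Illustrative CoCo-guest config fragment (hypothetical ccguest.config)
CONFIG_INTEL_TDX_GUEST=y
CONFIG_AMD_MEM_ENCRYPT=y
# Keep the enabled driver surface small: the virtio devices the
# hypervisor actually presents, and little else.
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
```

A distro kernel would not use such a fragment at build time, which is Kirill's point: for a single generic kernel the filtering has to happen at runtime instead.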
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:16 ` Greg Kroah-Hartman 2023-01-25 15:45 ` Michael S. Tsirkin @ 2023-01-25 15:50 ` Dr. David Alan Gilbert 2023-01-25 18:47 ` Jiri Kosina 2023-01-26 9:19 ` Jörg Rödel 3 siblings, 0 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-01-25 15:50 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote: > > > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > > Hi Greg, > > > > > > > > > > > > You mentioned couple of times (last time in this recent thread: > > > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start > > > > > > discussing the updated threat model for kernel, so this email is a start in this direction. > > > > > > > > > > Any specific reason you didn't cc: the linux-hardening mailing list? > > > > > This seems to be in their area as well, right? > > > > > > > > > > > As we have shared before in various lkml threads/conference presentations > > > > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a > > > > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. > > > > > > > > > > That is, frankly, a very funny threat model. 
How realistic is it really > > > > > given all of the other ways that a hypervisor can mess with a guest? > > > > > > > > It's what a lot of people would like; in the early attempts it was easy > > > > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it > > > > can mess with - remember that not just the memory is encrypted, so is > > > > the register state, and the guest gets to see changes to mapping and a > > > > lot of control over interrupt injection etc. > > > > > > And due to the fact that SEV and TDX really do not work, how is anyone > > > expecting any of this to work? As one heckler on IRC recently put it, > > > if you squint hard enough, you can kind of ignore the real-world issues > > > here, so perhaps this should all be called "squint-puting" in order to > > > feel like you have a "confidential" system? :) > > > > I agree the original SEV was that weak; I've not seen anyone give a good > > argument against SNP or TDX. > > Argument that it doesn't work? I thought that ship sailed a long time > ago but I could be wrong as I don't really pay attention to that stuff > as it's just vaporware :) You're being unfair claiming it's vaporware. You can go out and buy SNP hardware now (for over a year), the patches are on list and under review (and have been for quite a while). If you're claiming it doesn't, please justify it. > > > > > So what do you actually trust here? The CPU? A device? Nothing? > > > > > > > > We trust the actual physical CPU, provided that it can prove that it's a > > > > real CPU with the CoCo hardware enabled. > > > > > > Great, so why not have hardware attestation also for your devices you > > > wish to talk to? Why not use that as well? Then you don't have to > > > worry about anything in the guest. > > > > There were some talks at Plumbers where PCIe is working on adding that; > > it's not there yet though. 
I think that's PCIe 'Integrity and Data > > Encryption' (IDE - sigh), and PCIe 'Security Protocol and Data Model' - > > SPDM. I don't know much of the detail of those, just that they're far > > enough off that people aren't depending on them yet. > > Then work with those groups to implement that in an industry-wide way > and then take advantage of it by adding support for it to Linux! Don't > try to reinvent the same thing in a totally different way please. Sure, people are working with them; but those are going to take time and people want to use existing PCIe devices; and given that the hosts are available that seems reasonable. > > > > Both the SNP and TDX hardware > > > > can perform an attestation signed by the CPU to prove to someone > > > > external that the guest is running on a real trusted CPU. > > > > > > And again, do the same thing for the other hardware devices and all is > > > good. To not do that is to just guess and wave hands. You know this :) > > > > That wouldn't help you necessarily for virtual devices - where the > > hypervisor implements the device (like a virtual NIC). > > Then create a new bus for that if you don't trust the virtio bus today. It's not that I distrust the virtio bus - just that we need to make sure its implementation is pessimistic enough for CoCo. > > > > > I hate the term "hardening". Please just say it for what it really is, > > > > > "fixing bugs to handle broken hardware". We've done that for years when > > > > > dealing with PCI and USB and even CPUs doing things that they shouldn't > > > > > be doing. How is this any different in the end? > > > > > > > > > > So what you also are saying here now is "we do not trust any PCI > > > > > devices", so please just say that (why do you trust USB devices?) If > > > > > that is something that you all think that Linux should support, then > > > > > let's go from there. 
> > > > > > > > I don't think generally all PCI device drivers guard against all the > > > > nasty things that a broken implementation of their hardware can do. > > > > > > I know that all PCI drivers can NOT do that today as that was never > > > anything that Linux was designed for. > > > > Agreed; which again is why I only really worry about the subset of > > devices I'd want in a CoCo VM. > > Everyone wants a subset, different from other's subset, which means you > need them all. Sorry. I think for CoCo the subset is fairly small, even including all the people discussing it. It's the virtual devices, and a few of their favourite physical devices, but a fairly small subset. > > > > The USB devices are probably a bit better, because they actually worry > > > > about people walking up with a nasty HID device; I'm skeptical that > > > > a kernel would survive a purposely broken USB controller. > > > > > > I agree with you there, USB drivers are only starting to be fuzzed at > > > the descriptor level, that's all. Which is why they too can be put into > > > the "untrusted" area until you trust them. > > > > > > > I'm not sure the request here isn't really to make sure *all* PCI devices > > > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) - > > > > and potentially ones that people will want to pass-through (which > > > > generally needs a lot more work to make safe). > > > > (I've not looked at these Intel tools to see what they cover) > > > > > > Why not just create a whole new bus path for these "trusted" devices to > > > attach to and do that instead of trying to emulate a protocol that was > > > explicitly designed NOT to fit this model at all? Why are you trying to > > > shoehorn something here and not just designing it properly from the > > > beginning? 
> > > > I'd be kind of OK with that for the virtual devices; but: > > > > a) I think you'd start reinventing PCIe with enumeration etc > > Great, then work with the PCI group as talked about above to solve it > properly and not do whack-a-mole like seems to be happening so far. > > > b) We do want those pass through NICs etc that are PCIe > > - as long as you use normal guest crypto stuff then the host > > can be just as nasty as it likes with the data they present. > > Great, work with the PCI spec for verified devices. > > > c) The world has enough bus protocols, and people understand the > > basics of PCI(e) - we really don't need another one. > > Great, work with the PCI spec people please. As I say above; all happening - but it's going to take years. It's wrong to leave users with less secure solutions if there are simple fixes available. I agree that if it involves major pain all over then I can see your dislike - but if it's small fixes then what's the problem? > > > > Having said that, how happy are you with Thunderbolt PCI devices being > > > > plugged into your laptop or into the hotplug NVMe slot on a server? > > > > > > We have protection for that, and have had it for many years. Same for > > > USB devices. This isn't new, perhaps you all have not noticed those > > > features be added and taken advantage of already by many Linux distros > > > and system images (i.e. ChromeOS and embedded systems?) > > > > What protection? I know we have an IOMMU, and that stops the device > > stamping all over RAM by itself - but I think Intel's worries are more > > subtle, things where the device starts playing with what PCI devices > > are expected to do to try and trigger untested kernel paths. I don't > > think there's protection against that. > > I know we can lock by PCI/USB vendor/device ID - but those can be made > > up trivially; protection like that is meaningless. 
> > Then combine it with device attestation and you have a solved solution, > don't ignore others working on this please. > > > > We're now in the position we were with random USB devices years ago. > > > > > > Nope, we are not, again, we already handle random PCI devices being > > > plugged in. It's up to userspace to make the policy decision if it > > > should be trusted or not before the kernel has access to it. > > > > > > So a meta-comment, why not just use that today? If your guest OS can > > > not authenticate the PCI device passed to it, don't allow the kernel to > > > bind to it. If it can be authenticated, wonderful, bind away! You can > > > do this today with no kernel changes needed. > > > > Because: > > a) there's no good way to authenticate a PCI device yet > > - any nasty device can claim to have a given PCI ID. > > b) Even if you could, there's no man-in-the-middle protection yet. > > Where is the "man" here in the middle of? I'm worried what a malicious hypervisor could do. > And any PCI attestation should handle that, if not, work with them to > solve that please. I believe the two mechanisms I mentioned above would handle that; when it eventually gets there. > Thunderbolt has authenticated device support today, and so does PCI, and > USB has had it for a decade or so. Use the in-kernel implementation > that we already have or again, show us where it is lacking and we will > be glad to take patches to cover the holes (as we did last year when > ChromeOS implemented support for it in their userspace.) I'd appreciate pointers to the implementations you're referring to. > > > > Also we would want to make sure that any config data that the hypervisor > > > > can pass to the guest is validated. > > > > > > Define "validated" please. > > > > Let's say you get something like an ACPI table or qemu fw.cfg table > > giving details of your devices; if the hypervisor builds those in a > > nasty way what happens? 
> > You tell me, as we trust ACPI tables today, and if we can not, again > then you need to change the model of what Linux does. Why isn't the > BIOS authentication path working properly for ACPI tables already today? > I thought that was a long-solved problem with UEFI (if not, I'm sure the > UEFI people would be interested.) If it's part of the BIOS image that's measured/loaded during startup then we're fine; if it's a table dynamically generated by the hypervisor I'm more worried. > Anyway, I'll wait until I see real patches as this thread seems to be > totally vague and ignores our current best-practices for pluggable > devices for some odd reason. Please point people at those best practices rather than just ranting about how pointless you feel all this is! The patches here from Intel are a TOOL to find problems; I can't see the objections to having a tool like this. (I suspect some of these fixes might make the kernel a bit more robust against unexpected hot-remove of PCIe devices as well; but that's more of a guess) Dave > thanks, > > greg k-h > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
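The "validated" Dave is after amounts to never believing a length or count that the hypervisor wrote into a table. A minimal sketch of that style of check, using a made-up table layout (nothing below follows the real ACPI or fw.cfg formats; the structure and macro names are invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical hypervisor-supplied table header: a claimed total size
 * and entry count, with fixed-size entries following.  The layout is
 * invented; the checks are the point. */
struct dev_table {
    uint32_t total_len;   /* claimed size, including this header */
    uint32_t n_entries;
};
#define ENTRY_SIZE 16u

/* Return 0 only if every claim in the header is consistent with the
 * buffer we actually mapped; any lie fails closed. */
static int validate_table(const uint8_t *buf, size_t buf_len)
{
    const struct dev_table *t = (const struct dev_table *)buf;

    if (buf_len < sizeof(*t))
        return -1;                      /* header truncated */
    if (t->total_len < sizeof(*t) || t->total_len > buf_len)
        return -1;                      /* claims more than we mapped */
    /* Dividing instead of multiplying keeps the entry-count check
     * free of integer overflow. */
    if (t->n_entries > (t->total_len - sizeof(*t)) / ENTRY_SIZE)
        return -1;
    return 0;
}
```

In the ordinary threat model a driver that trusts total_len here is merely fragile; in the CoCo model the same missing check is an attack surface, which is the shift the thread is arguing about.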
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:16 ` Greg Kroah-Hartman 2023-01-25 15:45 ` Michael S. Tsirkin 2023-01-25 15:50 ` Dr. David Alan Gilbert @ 2023-01-25 18:47 ` Jiri Kosina 2023-01-26 9:19 ` Jörg Rödel 3 siblings, 0 replies; 102+ messages in thread From: Jiri Kosina @ 2023-01-25 18:47 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Dr. David Alan Gilbert, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, 25 Jan 2023, Greg Kroah-Hartman wrote: > Argument that it doesn't work? I thought that ship sailed a long time > ago but I could be wrong as I don't really pay attention to that stuff > as it's just vaporware :) Greg, are you sure you are talking about *SEV-SNP* here? (*) That ship hasn't sailed as far as I can tell, it's being actively worked on. With SEV-SNP launch attestation, FDE, and runtime remote attestation (**) one thing that you get is a way to ensure that the guest image that you have booted in a (public) cloud hasn't been tampered with, even if you have zero trust in the cloud provider and their hypervisor. And that without the issues and side-channels previous SEV and SEV-ES had. Which to me is a rather valid usecase in today's world, rather than vaporware. (*) and corresponding Intel-TDX support counterpart, once it exists (**) which is not necessarily a kernel work of course, but rather userspace integration work, e.g. based on Keylime -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:16 ` Greg Kroah-Hartman ` (2 preceding siblings ...) 2023-01-25 18:47 ` Jiri Kosina @ 2023-01-26 9:19 ` Jörg Rödel 3 siblings, 0 replies; 102+ messages in thread From: Jörg Rödel @ 2023-01-26 9:19 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Dr. David Alan Gilbert, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote: > Argument that it doesn't work? I thought that ship sailed a long time > ago but I could be wrong as I don't really pay attention to that stuff > as it's just vaporware :) Well, "vaporware" is a bold word, especially given the fact that one can get a confidential VM using AMD SEV[1] or SEV-SNP[2] in the cloud today. Hardware for SEV-SNP has also been widely available since at least October 2021. But okay, there seems to be some misunderstanding about what Confidential Computing (CoCo) implies, so let me state my view here. The vision for CoCo is to remove trust from the hypervisor (HV), so that a guest owner only needs to trust the hardware and the OS vendor for the VM to be trusted and the data in it to be secure. The implication is that the guest-HV interface becomes an attack surface for the guest, and there are two basic strategies to mitigate the risk: 1) Move HV functionality into the guest or the hardware and reduce the guest-HV interface. This already happened to some degree with the SEV-ES enablement, where instruction decoding and handling of most intercepts moved into the guest kernel. 2) Harden the guest-HV interface against malicious input. 
Where possible we are going with option 1, up to the point where scheduling our VCPUs is the only point we need to trust the HV on. For example, the whole interrupt injection logic will also move either into guest context or the hardware (depends on the HW vendor). That covers most of the CPU emulation that the HV was doing, but an equally important part is device emulation. Device emulation is harder to move into the trusted guest context, first of all because there is limited hardware support for that, and secondly because it will not perform well. So device emulation will have to stay in the HV for the foreseeable future (except for devices carrying secrets, like the TPM). What Elena and others are trying in this thread is to make the wider kernel community aware that malicious input to a device driver is a real problem in some environments and that driver hardening is actually worthwhile. Regards, Joerg [1] https://cloud.google.com/confidential-computing [2] https://learn.microsoft.com/en-us/azure/confidential-computing/confidential-vm-overview ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:57 ` Dr. David Alan Gilbert 2023-01-25 15:16 ` Greg Kroah-Hartman @ 2023-01-25 21:53 ` Lukas Wunner 2023-01-26 10:48 ` Dr. David Alan Gilbert [not found] ` <CAGXJix9-cXNW7EwJf0PVzj_Qmt5fmQvBX1KvXfRX5NAeEpnMvw@mail.gmail.com> 1 sibling, 2 replies; 102+ messages in thread From: Lukas Wunner @ 2023-01-25 21:53 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci [cc += Jonathan Cameron, linux-pci] On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > Great, so why not have hardware attestation also for your devices you > > wish to talk to? Why not use that as well? Then you don't have to > > worry about anything in the guest. > > There were some talks at Plumbers where PCIe is working on adding that; > it's not there yet though. I think that's PCIe 'Integrity and Data > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > SPDM. I don't know much of the detail of those, just that they're far > enough off that people aren't depending on them yet. CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: https://github.com/l1k/linux/commits/doe It will allow for authentication of PCIe devices. Goal is to submit this quarter (Q1). Afterwards we'll look into retrieving measurements via CMA/SPDM and bringing up IDE encryption. 
It's a kernel-native implementation which uses the existing crypto and keys infrastructure and is wired into the appropriate places in the PCI core to authenticate devices on enumeration and reauthenticate when CMA/SPDM state is lost (after resume from D3cold, after a Secondary Bus Reset and after a DPC-induced Hot Reset). The device authentication service afforded here is generic. It is up to users and vendors to decide how to employ it, be it for "confidential computing" or something else. Trusted root certificates to validate device certificates can be installed into a kernel keyring using the familiar keyctl(1) utility, but platform-specific roots of trust (such as a HSM) could be supported as well. I would like to stress that this particular effort is a collaboration of multiple vendors. It is decidedly not a single vendor trying to shoehorn something into upstream, so the criticism that has been leveled upthread against other things does not apply here. The Plumbers BoF you're referring to was co-authored by Jonathan Cameron and me and its purpose was precisely to have an open discussion and align on an approach that works for everyone: https://lpc.events/event/16/contributions/1304/ > a) there's no good way to authenticate a PCI device yet > - any nasty device can claim to have a given PCI ID. CMA/SPDM prescribes that the Subject Alternative Name of the device certificate contains the Vendor ID, Device ID, Subsystem Vendor ID, Subsystem ID, Class Code, Revision and Serial Number (PCIe r6.0 sec 6.31.3). Thus a forged Device ID in the Configuration Space Header will result in authentication failure. Thanks, Lukas ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 21:53 ` Lukas Wunner @ 2023-01-26 10:48 ` Dr. David Alan Gilbert 2023-01-26 11:24 ` Jonathan Cameron 2023-01-26 13:32 ` Samuel Ortiz [not found] ` <CAGXJix9-cXNW7EwJf0PVzj_Qmt5fmQvBX1KvXfRX5NAeEpnMvw@mail.gmail.com> 1 sibling, 2 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-01-26 10:48 UTC (permalink / raw) To: Lukas Wunner Cc: Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci * Lukas Wunner (lukas@wunner.de) wrote: > [cc += Jonathan Cameron, linux-pci] > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > Great, so why not have hardware attestation also for your devices you > > > wish to talk to? Why not use that as well? Then you don't have to > > > worry about anything in the guest. > > > > There were some talks at Plumbers where PCIe is working on adding that; > > it's not there yet though. I think that's PCIe 'Integrity and Data > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > > SPDM. I don't know much of the detail of those, just that they're far > > enough off that people aren't depending on them yet. > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > https://github.com/l1k/linux/commits/doe Thanks for the pointer - I'll go and hunt down that spec. > It will allow for authentication of PCIe devices. Goal is to submit > this quarter (Q1). Afterwards we'll look into retrieving measurements > via CMA/SPDM and bringing up IDE encryption. 
> > It's a kernel-native implementation which uses the existing crypto and > keys infrastructure and is wired into the appropriate places in the > PCI core to authenticate devices on enumeration and reauthenticate > when CMA/SPDM state is lost (after resume from D3cold, after a > Secondary Bus Reset and after a DPC-induced Hot Reset). > > The device authentication service afforded here is generic. > It is up to users and vendors to decide how to employ it, > be it for "confidential computing" or something else. As Samuel asks, there's the question of who is doing the challenge; but I guess there are also things like what happens when the host controls intermediate switches and BAR access, and when only VFs are passed to guests. > Trusted root certificates to validate device certificates can be > installed into a kernel keyring using the familiar keyctl(1) utility, > but platform-specific roots of trust (such as a HSM) could be > supported as well. > > I would like to stress that this particular effort is a collaboration > of multiple vendors. It is decidedly not a single vendor trying to > shoehorn something into upstream, so the criticism that has been > leveled upthread against other things does not apply here. > > The Plumbers BoF you're referring to was co-authored by Jonathan Cameron > and me and its purpose was precisely to have an open discussion and > align on an approach that works for everyone: > > https://lpc.events/event/16/contributions/1304/ > > > > a) there's no good way to authenticate a PCI device yet > > - any nasty device can claim to have a given PCI ID. > > CMA/SPDM prescribes that the Subject Alternative Name of the device > certificate contains the Vendor ID, Device ID, Subsystem Vendor ID, > Subsystem ID, Class Code, Revision and Serial Number (PCIe r6.0 > sec 6.31.3). > > Thus a forged Device ID in the Configuration Space Header will result > in authentication failure. Good! 
It'll be nice when people figure out the CoCo integration for that; I'm still guessing it's a little way off until we get hardware for that. Dave > Thanks, > > Lukas > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 10:48 ` Dr. David Alan Gilbert @ 2023-01-26 11:24 ` Jonathan Cameron 2023-01-26 13:32 ` Samuel Ortiz 1 sibling, 0 replies; 102+ messages in thread From: Jonathan Cameron @ 2023-01-26 11:24 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Lukas Wunner, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, linux-pci On Thu, 26 Jan 2023 10:48:50 +0000 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > * Lukas Wunner (lukas@wunner.de) wrote: > > [cc += Jonathan Cameron, linux-pci] > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > Great, so why not have hardware attestation also for your devices you > > > > wish to talk to? Why not use that as well? Then you don't have to > > > > worry about anything in the guest. > > > > > > There were some talks at Plumbers where PCIe is working on adding that; > > > it's not there yet though. I think that's PCIe 'Integrity and Data > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > > > SPDM. I don't know much of the detail of those, just that they're far > > > enough off that people aren't depending on them yet. > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > https://github.com/l1k/linux/commits/doe > > Thanks for the pointer - I'll go and hunt down that spec. > > > It will allow for authentication of PCIe devices. Goal is to submit > > this quarter (Q1). Afterwards we'll look into retrieving measurements > > via CMA/SPDM and bringing up IDE encryption. 
> > > > It's a kernel-native implementation which uses the existing crypto and > > keys infrastructure and is wired into the appropriate places in the > > PCI core to authenticate devices on enumeration and reauthenticate > > when CMA/SPDM state is lost (after resume from D3cold, after a > > Secondary Bus Reset and after a DPC-induced Hot Reset). > > > > The device authentication service afforded here is generic. > > It is up to users and vendors to decide how to employ it, > > be it for "confidential computing" or something else. > > As Samuel asks about who is doing the challenge; but I guess there are > also things like what happens when the host controls intermediate > switches and BAR access and when only VFs are passed to guests. Hmm. Bringing switches into the TCB came up at Plumbers. You can get partly around that using selective IDE (end-to-end encryption) but it has some disadvantages. You can attest the switches if you don't mind bringing them into the TCB (one particular cloud vendor person was very strongly against doing so!) but they don't have nice VF-type abstractions, so the switch attestation needs to go through someone who isn't the guest. > > > Trusted root certificates to validate device certificates can be > > installed into a kernel keyring using the familiar keyctl(1) utility, > > but platform-specific roots of trust (such as a HSM) could be > > supported as well. > > > > I would like to stress that this particular effort is a collaboration > > of multiple vendors. It is decidedly not a single vendor trying to > > shoehorn something into upstream, so the criticism that has been > > leveled upthread against other things does not apply here. 
> > > > The Plumbers BoF you're referring to was co-authored by Jonathan Cameron > > and me and its purpose was precisely to have an open discussion and > > align on an approach that works for everyone: > > > > https://lpc.events/event/16/contributions/1304/ > > > > > > > a) there's no good way to authenticate a PCI device yet > > > - any nasty device can claim to have a given PCI ID. > > > > CMA/SPDM prescribes that the Subject Alternative Name of the device > > certificate contains the Vendor ID, Device ID, Subsystem Vendor ID, > > Subsystem ID, Class Code, Revision and Serial Number (PCIe r6.0 > > sec 6.31.3). > > > > Thus a forged Device ID in the Configuration Space Header will result > > in authentication failure. > > Good! It'll be nice when people figure out the CoCo integration for > that; I'm still guessing it's a little way off until we get hardware > for that. FYI: We have QEMU using the DMTF reference implementation (libspdm/spdm-emu) if anyone wants to play with it. Avery Design folk did the qemu bridging to that a while back. Not upstream yet*, but I'm carrying it on my staging CXL qemu tree. https://gitlab.com/jic23/qemu/-/commit/8d0ad6bc84a5d96039aaf8f929c60b9f7ba02832 In combination with Lukas' tree mentioned earlier you can get all the handshaking to happen to attest against certs. Don't think we are yet actually checking the IDs but trivial to add (mainly a case of generating the right certs with the Subject Alternative Name set). Jonathan * It's a hack using the socket interface of spdm-emu tools - at some point I need to start a discussion on QEMU list / with dmtf tools group on whether to fix libspdm to actually work as a shared library, or cope with the current approach (crossing fingers the socket interface remains stable in spdm-emu). > > Dave > > > Thanks, > > > > Lukas > > ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 10:48 ` Dr. David Alan Gilbert 2023-01-26 11:24 ` Jonathan Cameron @ 2023-01-26 13:32 ` Samuel Ortiz 1 sibling, 0 replies; 102+ messages in thread From: Samuel Ortiz @ 2023-01-26 13:32 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Lukas Wunner, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci On Thu, Jan 26, 2023 at 10:48:50AM +0000, Dr. David Alan Gilbert wrote: > * Lukas Wunner (lukas@wunner.de) wrote: > > [cc += Jonathan Cameron, linux-pci] > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > Great, so why not have hardware attestation also for your devices you > > > > wish to talk to? Why not use that as well? Then you don't have to > > > > worry about anything in the guest. > > > > > > There were some talks at Plumbers where PCIe is working on adding that; > > > it's not there yet though. I think that's PCIe 'Integrity and Data > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > > > SPDM. I don't know much of the detail of those, just that they're far > > > enough off that people aren't depending on them yet. > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > https://github.com/l1k/linux/commits/doe > > Thanks for the pointer - I'll go and hunt down that spec. > > > It will allow for authentication of PCIe devices. Goal is to submit > > this quarter (Q1). Afterwards we'll look into retrieving measurements > > via CMA/SPDM and bringing up IDE encryption. 
> > > > It's a kernel-native implementation which uses the existing crypto and > > keys infrastructure and is wired into the appropriate places in the > > PCI core to authenticate devices on enumeration and reauthenticate > > when CMA/SPDM state is lost (after resume from D3cold, after a > > Secondary Bus Reset and after a DPC-induced Hot Reset). > > > > The device authentication service afforded here is generic. > > It is up to users and vendors to decide how to employ it, > > be it for "confidential computing" or something else. > > As Samuel asks about who is doing the challenge; but I guess there are > also things like what happens when the host controls intermediate > switches You'd want to protect that through IDE selective streams. > and BAR access and when only VFs are passed to guests. TDISP aims at addressing that afaiu. Once the VF (aka TDI) is locked, any changes to its BAR(s) or any PF MMIO that would affect the VF would get the VF back to unlocked (and let the guest reject it). Cheers, Samuel. ^ permalink raw reply [flat|nested] 102+ messages in thread
[parent not found: <CAGXJix9-cXNW7EwJf0PVzj_Qmt5fmQvBX1KvXfRX5NAeEpnMvw@mail.gmail.com>]
* Re: Linux guest kernel threat model for Confidential Computing [not found] ` <CAGXJix9-cXNW7EwJf0PVzj_Qmt5fmQvBX1KvXfRX5NAeEpnMvw@mail.gmail.com> @ 2023-01-26 10:58 ` Jonathan Cameron 2023-01-26 13:15 ` Samuel Ortiz 2023-01-26 15:44 ` Lukas Wunner 1 sibling, 1 reply; 102+ messages in thread From: Jonathan Cameron @ 2023-01-26 10:58 UTC (permalink / raw) To: Samuel Ortiz Cc: Lukas Wunner, Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, linux-pci On Thu, 26 Jan 2023 10:24:32 +0100 Samuel Ortiz <sameo@rivosinc.com> wrote: > Hi Lukas, > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote: > > > [cc += Jonathan Cameron, linux-pci] > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > Great, so why not have hardware attestation also for your devices you > > > > wish to talk to? Why not use that as well? Then you don't have to > > > > worry about anything in the guest. > > > > > > There were some talks at Plumbers where PCIe is working on adding that; > > > it's not there yet though. I think that's PCIe 'Integrity and Data > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > > > SPDM. I don't know much of the detail of those, just that they're far > > > enough off that people aren't depending on them yet. > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > https://github.com/l1k/linux/commits/doe > > Nice, thanks a lot for that. > > > > > The device authentication service afforded here is generic. 
> > It is up to users and vendors to decide how to employ it, > > be it for "confidential computing" or something else. > > > > Trusted root certificates to validate device certificates can be > > installed into a kernel keyring using the familiar keyctl(1) utility, > > but platform-specific roots of trust (such as a HSM) could be > > supported as well. > > > > This may have been discussed at LPC, but are there any plans to also > support confidential computing flows where the host kernel is not part > of the TCB and would not be trusted for validating the device cert chain > nor for running the SPDM challenge? There are lots of possible models for this. One simple option if the assigned VF supports it is a CMA instance per VF. That will let the guest do full attestation including measurement of whether the device is appropriately locked down so the hypervisor can't mess with configuration that affects the guest (without a reset anyway, and that is guest visible). Whether anyone builds that option isn't yet clear though. If they do, Lukas' work should work there as well as for the host OS. (Note I'm not a security expert so may be missing something!) For extra fun, why should the device trust the host? Mutual authentication fun (there are use cases where that matters) There are way more complex options supported in PCIe TDISP (TEE Device Interface Security Protocol). Anyone have any visibility of open solutions that make use of that? May be too new. Jonathan > > Cheers, > Samuel. > ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 10:58 ` Jonathan Cameron @ 2023-01-26 13:15 ` Samuel Ortiz 2023-01-26 16:07 ` Jonathan Cameron 0 siblings, 1 reply; 102+ messages in thread From: Samuel Ortiz @ 2023-01-26 13:15 UTC (permalink / raw) To: Jonathan Cameron Cc: Lukas Wunner, Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, linux-pci On Thu, Jan 26, 2023 at 10:58:47AM +0000, Jonathan Cameron wrote: > On Thu, 26 Jan 2023 10:24:32 +0100 > Samuel Ortiz <sameo@rivosinc.com> wrote: > > > Hi Lukas, > > > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote: > > > > > [cc += Jonathan Cameron, linux-pci] > > > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > > Great, so why not have hardware attestation also for your devices you > > > > > wish to talk to? Why not use that as well? Then you don't have to > > > > > worry about anything in the guest. > > > > > > > > There were some talks at Plumbers where PCIe is working on adding that; > > > > it's not there yet though. I think that's PCIe 'Integrity and Data > > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > > > > SPDM. I don't know much of the detail of those, just that they're far > > > > enough off that people aren't depending on them yet. > > > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > > > https://github.com/l1k/linux/commits/doe > > > > Nice, thanks a lot for that. > > > > > > > > > The device authentication service afforded here is generic. 
> > > It is up to users and vendors to decide how to employ it, > > > be it for "confidential computing" or something else. > > > > > > Trusted root certificates to validate device certificates can be > > > installed into a kernel keyring using the familiar keyctl(1) utility, > > > but platform-specific roots of trust (such as a HSM) could be > > > supported as well. > > > > > > > This may have been discussed at LPC, but are there any plans to also > > support confidential computing flows where the host kernel is not part > > of the TCB and would not be trusted for validating the device cert chain > > nor for running the SPDM challenge? > > There are lots of possible models for this. One simple option if the assigned > VF supports it is a CMA instance per VF. That will let the guest > do full attestation including measurement of whether the device is > appropriately locked down so the hypervisor can't mess with > configuration that affects the guest (without a reset anyway and that > is guest visible). So the VF would be directly assigned to the guest, and the guest kernel would create a CMA instance for the VF, and do the SPDM authentication (based on a guest-provided trusted root certificate). I think one security concern with that approach is assigning the VF to the (potentially confidential) guest address space without the guest being able to attest to the device's trustworthiness first. That's what TDISP is aiming at fixing (establish a secure SPDM session between the confidential guest and the device, lock the device from the guest, attest and then enable DMA). > Whether anyone builds that option isn't yet clear > though. If they do, Lukas' work should work there as well as for the > host OS. (Note I'm not a security expert so may be missing something!) > > For extra fun, why should the device trust the host? 
Mutual authentication > fun (there are usecases where that matters) > > There are way more complex options supported in PCIe TDISP (Tee Device > security interface protocols). Anyone have an visibility of open solutions > that make use of that? May be too new. It's still a PCI ECN, so quite new indeed. FWIW the rust spdm crate [1] implements the TDISP state machine. Cheers, Samuel. [1] https://github.com/jyao1/rust-spdm > ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 13:15 ` Samuel Ortiz @ 2023-01-26 16:07 ` Jonathan Cameron 2023-01-27 7:02 ` Samuel Ortiz 0 siblings, 1 reply; 102+ messages in thread From: Jonathan Cameron @ 2023-01-26 16:07 UTC (permalink / raw) To: Samuel Ortiz Cc: Lukas Wunner, Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, linux-pci On Thu, 26 Jan 2023 14:15:05 +0100 Samuel Ortiz <sameo@rivosinc.com> wrote: > On Thu, Jan 26, 2023 at 10:58:47AM +0000, Jonathan Cameron wrote: > > On Thu, 26 Jan 2023 10:24:32 +0100 > > Samuel Ortiz <sameo@rivosinc.com> wrote: > > > > > Hi Lukas, > > > > > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote: > > > > > > > [cc += Jonathan Cameron, linux-pci] > > > > > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > > > Great, so why not have hardware attestation also for your devices you > > > > > > wish to talk to? Why not use that as well? Then you don't have to > > > > > > worry about anything in the guest. > > > > > > > > > > There were some talks at Plumbers where PCIe is working on adding that; > > > > > it's not there yet though. I think that's PCIe 'Integrity and Data > > > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' - > > > > > SPDM. I don't know much of the detail of those, just that they're far > > > > > enough off that people aren't depending on them yet. 
> > > > > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > > > > > https://github.com/l1k/linux/commits/doe > > > > > > Nice, thanks a lot for that. > > > > > > > > > > > > > The device authentication service afforded here is generic. > > > > It is up to users and vendors to decide how to employ it, > > > > be it for "confidential computing" or something else. > > > > > > > > Trusted root certificates to validate device certificates can be > > > > installed into a kernel keyring using the familiar keyctl(1) utility, > > > > but platform-specific roots of trust (such as a HSM) could be > > > > supported as well. > > > > > > > > > > This may have been discussed at LPC, but are there any plans to also > > > support confidential computing flows where the host kernel is not part > > > of the TCB and would not be trusted for validating the device cert chain > > > nor for running the SPDM challenge? > > > > There are lots of possible models for this. One simple option if the assigned > > VF supports it is a CMA instance per VF. That will let the guest > > do full attestation including measurement of whether the device is > > appropriately locked down so the hypervisor can't mess with > > configuration that affects the guest (without a reset anyway and that > > is guest visible). > > So the VF would be directly assigned to the guest, and the guest kernel > would create a CMA instance for the VF, and do the SPDM authentication > (based on a guest provided trusted root certificate). I think one > security concern with that approach is assigning the VF to the > (potentially confidential) guest address space without the guest being > able to attest of the device trustworthiness first. That's what TDISP is > aiming at fixing (establish a secure SPDM between the confidential guest > and the device, lock the device from the guest, attest and then enable > DMA). 
Agreed, TDISP is more comprehensive, but also much more complex with more moving parts that we don't really have yet. Depending on your IOMMU design (+ related stuff) and interaction with the secure guest, you might be able to block any rogue DMA until after attestation / lock down checks even if the Hypervisor was letting it through. > > > Whether anyone builds that option isn't yet clear > > though. If they do, Lukas' work should work there as well as for the > > host OS. (Note I'm not a security expert so may be missing something!) > > > > For extra fun, why should the device trust the host? Mutual authentication > > fun (there are usecases where that matters) > > > > There are way more complex options supported in PCIe TDISP (Tee Device > > security interface protocols). Anyone have an visibility of open solutions > > that make use of that? May be too new. > > It's still a PCI ECN, so quite new indeed. > FWIW the rust spdm crate [1] implements the TDISP state machine. Cool. thanks for the reference. > > Cheers, > Samuel. > > [1] https://github.com/jyao1/rust-spdm > > ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 16:07 ` Jonathan Cameron @ 2023-01-27 7:02 ` Samuel Ortiz 0 siblings, 0 replies; 102+ messages in thread From: Samuel Ortiz @ 2023-01-27 7:02 UTC (permalink / raw) To: Jonathan Cameron Cc: Lukas Wunner, Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, linux-pci On Thu, Jan 26, 2023 at 04:07:29PM +0000, Jonathan Cameron wrote: > On Thu, 26 Jan 2023 14:15:05 +0100 > Samuel Ortiz <sameo@rivosinc.com> wrote: > > > On Thu, Jan 26, 2023 at 10:58:47AM +0000, Jonathan Cameron wrote: > > > On Thu, 26 Jan 2023 10:24:32 +0100 > > > Samuel Ortiz <sameo@rivosinc.com> wrote: > > > > > > > Hi Lukas, > > > > > > > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote: > > > > > > > > > [cc += Jonathan Cameron, linux-pci] > > > > > > > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote: > > > > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > > > > > > > Great, so why not have hardware attestation also for your devices you > > > > > > > wish to talk to? Why not use that as well? Then you don't have to > > > > > > > worry about anything in the guest. > > > > > > > > > > > > There were some talks at Plumbers where PCIe is working on adding that; > > > > > > it's not there yet though. I think that's PCIe 'Integrity and Data > > > > > > Encryption' (IDE - sigh), and PCIe 'Security Protocol and Data Model' - > > > > > > SPDM. I don't know much of the detail of those, just that they're far > > > > > > enough off that people aren't depending on them yet. 
> > > > > > > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > > > > > > > https://github.com/l1k/linux/commits/doe > > > > > > > > Nice, thanks a lot for that. > > > > > > > > > > > > > > > > > The device authentication service afforded here is generic. > > > > > It is up to users and vendors to decide how to employ it, > > > > > be it for "confidential computing" or something else. > > > > > > > > > > Trusted root certificates to validate device certificates can be > > > > > installed into a kernel keyring using the familiar keyctl(1) utility, > > > > > but platform-specific roots of trust (such as a HSM) could be > > > > > supported as well. > > > > > > > > > > > > > This may have been discussed at LPC, but are there any plans to also > > > > support confidential computing flows where the host kernel is not part > > > > of the TCB and would not be trusted for validating the device cert chain > > > > nor for running the SPDM challenge? > > > > > > There are lots of possible models for this. One simple option if the assigned > > > VF supports it is a CMA instance per VF. That will let the guest > > > do full attestation including measurement of whether the device is > > > appropriately locked down so the hypervisor can't mess with > > > configuration that affects the guest (without a reset anyway and that > > > is guest visible). > > > > So the VF would be directly assigned to the guest, and the guest kernel > > would create a CMA instance for the VF, and do the SPDM authentication > > (based on a guest provided trusted root certificate). I think one > > security concern with that approach is assigning the VF to the > > (potentially confidential) guest address space without the guest being > > able to attest of the device trustworthiness first. That's what TDISP is > > aiming at fixing (establish a secure SPDM between the confidential guest > > and the device, lock the device from the guest, attest and then enable > > DMA). 
> > Agreed, TDISP is more comprehensive, but also much more complex with > more moving parts that we don't really have yet. > > Depending on your IOMMU design (+ related stuff) and interaction with > the secure guest, you might be able to block any rogue DMA until > after attestation / lock down checks even if the Hypervisor was letting > it through. Provided that the guest or, in the TDX and AP-TEE cases, the TSM has protected access to the IOMMU, yes. But then the implementation becomes platform-specific. Cheers, Samuel. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing [not found] ` <CAGXJix9-cXNW7EwJf0PVzj_Qmt5fmQvBX1KvXfRX5NAeEpnMvw@mail.gmail.com> 2023-01-26 10:58 ` Jonathan Cameron @ 2023-01-26 15:44 ` Lukas Wunner 2023-01-26 16:25 ` Michael S. Tsirkin 2023-01-27 7:17 ` Samuel Ortiz 1 sibling, 2 replies; 102+ messages in thread From: Lukas Wunner @ 2023-01-26 15:44 UTC (permalink / raw) To: Samuel Ortiz Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci On Thu, Jan 26, 2023 at 10:24:32AM +0100, Samuel Ortiz wrote: > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote: > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > https://github.com/l1k/linux/commits/doe > > > > The device authentication service afforded here is generic. > > It is up to users and vendors to decide how to employ it, > > be it for "confidential computing" or something else. > > > > Trusted root certificates to validate device certificates can be > > installed into a kernel keyring using the familiar keyctl(1) utility, > > but platform-specific roots of trust (such as a HSM) could be > > supported as well. > > This may have been discussed at LPC, but are there any plans to also > support confidential computing flows where the host kernel is not part > of the TCB and would not be trusted for validating the device cert chain > nor for running the SPDM challenge? As long as a device is passed through to a guest, the guest owns that device. It is the guest's prerogative and duty to perform CMA/SPDM authentication on its own behalf. 
If the guest uses memory encryption via TDX or SEV, key material established through a Diffie-Hellman exchange between guest and device is invisible to the host. Consequently using that key material for IDE encryption protects device accesses from the guest against snooping by the host. SPDM authentication consists of a sequence of exchanges, the first being GET_VERSION. When a responder (=device) receives a GET_VERSION request, it resets the connection and all internal state related to that connection. (SPDM 1.2.1 margin no 185: "a Requester can issue a GET_VERSION to a Responder to reset a connection at any time"; see also SPDM 1.1.0 margin no 161 for details.) Thus, even though the host may have authenticated the device, once it's passed through to a guest and the guest performs authentication again, SPDM state on the device is reset. I'll amend the patches so that the host refrains from performing reauthentication as long as a device is passed through. The host has no business mutating SPDM state on the device once ownership has passed to the guest. The first few SPDM exchanges are transmitted in the clear, so the host can eavesdrop on the negotiated algorithms, exchanged certificates and nonces. However the host cannot successfully modify the exchanged data due to the man in the middle protection afforded by SPDM: The challenge/response hash is computed over the concatenation of the exchanged messages, so modification of the messages by a man in the middle leads to authentication failure. Obviously the host can DoS guest access to the device by modifying exchanged messages, but there are much simpler ways for it to do that, say, by clearing Bus Master Enable or Memory Space Enable bits in the Command Register. DoS attacks from the host against the guest cannot be part of the threat model at this point. Thanks, Lukas ^ permalink raw reply [flat|nested] 102+ messages in thread
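The man-in-the-middle protection described above hinges on the transcript: the challenge/response hash is computed over the concatenation of all exchanged messages, so a host that rewrites any cleartext message produces a digest the device will not vouch for. A toy sketch of that property (illustrative only; the real SPDM transcript covers specific message fields and the CHALLENGE_AUTH response is signed with the device's leaf key):

```python
import hashlib

def transcript_digest(messages):
    """Hash the concatenation of exchanged messages (toy model of SPDM's
    transcript hash; the real spec hashes defined message sequences)."""
    h = hashlib.sha384()
    for msg in messages:
        h.update(msg)
    return h.digest()

# Requester (guest) and responder (device) each record the exchange as seen.
exchange = [b"GET_VERSION", b"VERSION 1.2", b"NEGOTIATE_ALGORITHMS sha384"]
guest_view = transcript_digest(exchange)
device_view = transcript_digest(exchange)
assert guest_view == device_view          # transcripts agree -> challenge succeeds

# A man in the middle rewrites one cleartext message in transit.
tampered = [b"GET_VERSION", b"VERSION 1.1", b"NEGOTIATE_ALGORITHMS sha384"]
assert transcript_digest(tampered) != device_view  # mismatch -> authentication fails
```

The point of the sketch is only that eavesdropping on the cleartext phase gains the host nothing it can use to forge the exchange, since any modification shows up in the digest both endpoints compute independently.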
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 15:44 ` Lukas Wunner @ 2023-01-26 16:25 ` Michael S. Tsirkin 2023-01-26 21:41 ` Lukas Wunner 2023-01-27 7:17 ` Samuel Ortiz 1 sibling, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-26 16:25 UTC (permalink / raw) To: Lukas Wunner Cc: Samuel Ortiz, Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci On Thu, Jan 26, 2023 at 04:44:49PM +0100, Lukas Wunner wrote: > Obviously the host can DoS guest access to the device by modifying > exchanged messages, but there are much simpler ways for it to > do that, say, by clearing Bus Master Enable or Memory Space Enable > bits in the Command Register. There's a single key per guest though, isn't it? Also used for regular memory? -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 16:25 ` Michael S. Tsirkin @ 2023-01-26 21:41 ` Lukas Wunner 0 siblings, 0 replies; 102+ messages in thread From: Lukas Wunner @ 2023-01-26 21:41 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Samuel Ortiz, Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci On Thu, Jan 26, 2023 at 11:25:21AM -0500, Michael S. Tsirkin wrote: > On Thu, Jan 26, 2023 at 04:44:49PM +0100, Lukas Wunner wrote: > > Obviously the host can DoS guest access to the device by modifying > > exchanged messages, but there are much simpler ways for it to > > do that, say, by clearing Bus Master Enable or Memory Space Enable > > bits in the Command Register. > > There's a single key per guest though, isn't it? Also used > for regular memory? The current design is to have a global keyring (per kernel, i.e. per guest). A device presents a certificate chain and the first certificate in that chain needs to be signed by one of the certificates on the keyring. This is completely independent from the key used for memory encryption. A device can have up to 8 certificate chains (called "slots" in the SPDM spec) and I've implemented it such that all slots are iterated and validation is considered to be successful as soon as a slot with a valid signature is found. We can discuss having a per-device keyring if anyone thinks it makes sense. The PCISIG's idea seems to be that each vendor of PCIe cards publishes a trusted root certificate and users would then have to keep all those vendor certificates in their global keyring. 
This follows from the last paragraph of PCIe r6.0.1 sec 6.31.3, which says "it is strongly recommended that authentication requesters [i.e. the kernel] confirm that the information provided in the Subject Alternative Name entry [of the device's leaf certificate] is signed by the vendor indicated by the Vendor ID." The astute reader will notice that for this to work, the Vendor ID must be included in the trusted root certificate in a machine-readable way. Unfortunately the PCIe Base Spec fails to specify that, so I don't know how to associate a trusted root certificate with a Vendor ID. I'll report this and several other gaps I've found in the spec to the editor at the PCISIG so that they can be filled in a future revision. Thanks, Lukas ^ permalink raw reply [flat|nested] 102+ messages in thread
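The slot-iteration behaviour described above (try each of the up-to-8 certificate slots, succeed on the first chain that validates against a trusted root on the global keyring) can be sketched as follows. All names here are hypothetical, and the string-based `verify_chain` callback is a stand-in for real signature verification, not the actual kernel implementation:

```python
def authenticate_device(slots, trusted_roots, verify_chain):
    """Return the first slot index whose certificate chain validates
    against a trusted root, or None if no slot validates."""
    for slot, chain in enumerate(slots):
        if chain is None:           # SPDM allows up to 8 slots, not all populated
            continue
        for root in trusted_roots:
            if verify_chain(chain, root):
                return slot         # first valid slot wins
    return None                     # no valid slot -> authentication fails

# Toy verification: a chain "validates" if it names the trusted root vendor.
verify = lambda chain, root: chain.endswith(root)
slots = [None, "dev-cert:vendorA", "dev-cert:vendorB"] + [None] * 5
print(authenticate_device(slots, ["vendorB"], verify))  # slot 2 validates
```

Per-device keyrings, as floated above, would simply mean passing a different `trusted_roots` set per device instead of one global set.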
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 15:44 ` Lukas Wunner 2023-01-26 16:25 ` Michael S. Tsirkin @ 2023-01-27 7:17 ` Samuel Ortiz 1 sibling, 0 replies; 102+ messages in thread From: Samuel Ortiz @ 2023-01-27 7:17 UTC (permalink / raw) To: Lukas Wunner Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Jonathan Cameron, linux-pci On Thu, Jan 26, 2023 at 04:44:49PM +0100, Lukas Wunner wrote: > On Thu, Jan 26, 2023 at 10:24:32AM +0100, Samuel Ortiz wrote: > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote: > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch: > > > > > > https://github.com/l1k/linux/commits/doe > > > > > > The device authentication service afforded here is generic. > > > It is up to users and vendors to decide how to employ it, > > > be it for "confidential computing" or something else. > > > > > > Trusted root certificates to validate device certificates can be > > > installed into a kernel keyring using the familiar keyctl(1) utility, > > > but platform-specific roots of trust (such as a HSM) could be > > > supported as well. > > > > This may have been discussed at LPC, but are there any plans to also > > support confidential computing flows where the host kernel is not part > > of the TCB and would not be trusted for validating the device cert chain > > nor for running the SPDM challenge? > > As long as a device is passed through to a guest, the guest owns > that device. I agree. On a SRIOV setup, the host typically owns the PF and assigns VFs to the guests. 
Devices must be enlightened to guarantee that once one of their VFs/interfaces is passed to a trusted VM, it can no longer be modified by anything untrusted (e.g. the hypervisor). > It is the guest's prerogative and duty to perform > CMA/SPDM authentication on its own behalf. If the guest uses > memory encryption via TDX or SEV, key material established through > a Diffie-Hellman exchange between guest and device is invisible > to the host. Consequently using that key material for IDE encryption > protects device accesses from the guest against snooping by the host. On confidential computing platforms where a security manager (e.g. Intel TDX module) manages the confidential guests, the IDE key management and stream settings would be handled by this manager. In other words, the SPDM requester would not be a Linux kernel. FWIW, Intel recently published an interesting description of TEE-IO enabling with TDX [1]. > SPDM authentication consists of a sequence of exchanges, the first > being GET_VERSION. When a responder (=device) receives a GET_VERSION > request, it resets the connection and all internal state related to > that connection. (SPDM 1.2.1 margin no 185: "a Requester can issue > a GET_VERSION to a Responder to reset a connection at any time"; > see also SPDM 1.1.0 margin no 161 for details.) > > Thus, even though the host may have authenticated the device, > once it's passed through to a guest and the guest performs > authentication again, SPDM state on the device is reset. > > I'll amend the patches so that the host refrains from performing > reauthentication as long as a device is passed through. The host > has no business mutating SPDM state on the device once ownership > has passed to the guest. > > The first few SPDM exchanges are transmitted in the clear, > so the host can eavesdrop on the negotiated algorithms, > exchanged certificates and nonces. 
However the host cannot > successfully modify the exchanged data due to the man in the middle > protection afforded by SPDM: The challenge/response hash is > computed over the concatenation of the exchanged messages, > so modification of the messages by a man in the middle leads > to authentication failure. Right, I was not concerned by the challenge messages' integrity but by trusting the host with verifying the response and validating the device cert chains. > Obviously the host can DoS guest access to the device by modifying > exchanged messages, but there are much simpler ways for it to > do that, say, by clearing Bus Master Enable or Memory Space Enable > bits in the Command Register. DoS attacks from the host against > the guest cannot be part of the threat model at this point. Yes, the host can DoS the guest at any time it wants and in multiple ways. It's definitely out of the confidential computing threat model at least. Cheers, Samuel. [1] https://cdrdv2-public.intel.com/742542/software-enabling-for-tdx-tee-io-fixed.pdf ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:22 ` Greg Kroah-Hartman 2023-01-25 14:30 ` James Bottomley 2023-01-25 14:57 ` Dr. David Alan Gilbert @ 2023-01-25 20:13 ` Jiri Kosina 2023-01-26 13:13 ` Reshetova, Elena 3 siblings, 0 replies; 102+ messages in thread From: Jiri Kosina @ 2023-01-25 20:13 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Dr. David Alan Gilbert, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, 25 Jan 2023, Greg Kroah-Hartman wrote: > How do you trust you got real data on the disk? On the network? Those > are coming from the host, how is any of that data to be trusted? Where > does the trust stop and why? This is all well described in AMD SEV-SNP documentation, see page 5 of [1]. All the external devices are treated as untrusted in that model. [1] https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-25 14:22 ` Greg Kroah-Hartman ` (2 preceding siblings ...) 2023-01-25 20:13 ` Jiri Kosina @ 2023-01-26 13:13 ` Reshetova, Elena 3 siblings, 0 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-26 13:13 UTC (permalink / raw) To: Greg Kroah-Hartman, Dr. David Alan Gilbert Cc: Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List > > > I hate the term "hardening". Please just say it for what it really is, > > > "fixing bugs to handle broken hardware". We've done that for years when > > > dealing with PCI and USB and even CPUs doing things that they shouldn't > > > be doing. How is this any different in the end? > > > > > > So what you also are saying here now is "we do not trust any PCI > > > devices", so please just say that (why do you trust USB devices?) If > > > that is something that you all think that Linux should support, then > > > let's go from there. > > > > I don't think generally all PCI device drivers guard against all the > > nasty things that a broken implementation of their hardware can do. > > I know that all PCI drivers can NOT do that today as that was never > anything that Linux was designed for. > > > The USB devices are probably a bit better, because they actually worry > > about people walking up with a nasty HID device; I'm skeptical that > > a kernel would survive a purposely broken USB controller. > > I agree with you there, USB drivers are only starting to be fuzzed at > the descriptor level, that's all. Which is why they too can be put into > the "untrusted" area until you trust them. 
> > I'm not sure the request here isn't really to make sure *all* PCI devices > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) - > > and potentially ones that people will want to pass-through (which > > generally needs a lot more work to make safe). > > (I've not looked at these Intel tools to see what they cover) > > Why not just create a whole new bus path for these "trusted" devices to > attach to and do that instead of trying to emulate a protocol that was > explicitly designed NOT to fit this model at all? Why are you trying to > shoehorn something here and not just designing it properly from the > beginning? > > > Having said that, how happy are you with Thunderbolt PCI devices being > > plugged into your laptop or into the hotplug NVMe slot on a server? > > We have protection for that, and have had it for many years. Same for > USB devices. This isn't new, perhaps you all have not noticed those > features being added and taken advantage of already by many Linux distros > and system images (i.e. ChromeOS and embedded systems?) > > > We're now in the position we were with random USB devices years ago. > > Nope, we are not, again, we already handle random PCI devices being > plugged in. It's up to userspace to make the policy decision if it > should be trusted or not before the kernel has access to it. > > So a meta-comment, why not just use that today? If your guest OS can > not authenticate the PCI device passed to it, don't allow the kernel to > bind to it. If it can be authenticated, wonderful, bind away! You can > do this today with no kernel changes needed. > > > Also we would want to make sure that any config data that the hypervisor > > can pass to the guest is validated. > > Define "validated" please. > > > The problem seems reasonably well understood within the CoCo world - how > > far people want to push it probably varies; but it's good to make the > > problem more widely understood. 
> > The "CoCo" world seems distant and separate from the real-world of Linux > kernel development if you all do not even know about the authentication > methods that we have for years for enabling access to PCI and USB > devices as described above. If the implementations that we currently > have are lacking in some way, wonderful, please submit changes for them > and we will be glad to review them as needed. We are aware of the USB/Thunderbolt authorization framework and this is what we have been extending now for our CC usage in order to apply this to all devices. The patches are currently under testing/polishing, but we will be submitting them in the near future. That said, even with the above in place we don't get protection from man-in-the-middle attacks that are possible by an untrusted hypervisor or host. In order to get full protection here, we need attestation and an end-to-end secure channel between devices and the CC guest. However, since it is going to take a long time before we have all the infrastructure in place in Linux, as well as devices that are capable of supporting all required functionality (and some devices will never have this support, such as virtual devices), we need to have a reasonable security model now, vs. waiting until researchers start posting proof-of-concept privilege escalation exploits on something that is (thanks to the tools we created in [1]) not even so hard to find: you run our fuzzing tools on the guest kernel tree of your liking and it gives you a nice set of KASAN issues to play with. What we are trying to do is to address these findings (among other things) for a more robust guest kernel. Best Regards, Elena [1] https://github.com/intel/ccc-linux-guest-hardening ^ permalink raw reply [flat|nested] 102+ messages in thread
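The authorization framework both sides refer to boils down to a default-deny bind decision: a device is not bound to a driver until userspace policy (or, in the CC case, an attestation flow) has marked it authorized. A minimal conceptual sketch of that flow, with all names hypothetical rather than taken from the kernel:

```python
# Toy model of bus-level device authorization: the bus core refuses to
# bind a driver to a device that policy has not authorized.
class Device:
    def __init__(self, bus, dev_id):
        self.bus = bus
        self.dev_id = dev_id
        self.authorized = False   # default-deny, as inside a CC guest
        self.driver = None

def authorize(device, policy):
    """Userspace policy decision, e.g. taken after attestation succeeds."""
    device.authorized = policy(device)

def probe(device, driver):
    """Bind `driver` only if the device has been authorized first."""
    if not device.authorized:
        return False
    device.driver = driver
    return True

nic = Device("pci", "0000:00:03.0")
assert probe(nic, "virtio-net") is False                  # not yet authorized
authorize(nic, lambda d: d.dev_id == "0000:00:03.0")      # allow-list hit
assert probe(nic, "virtio-net") is True
```

This captures Greg's "don't allow the kernel to bind to it" point; the man-in-the-middle gap Elena raises is that the policy input itself (what the device claims to be) still comes via the untrusted host unless attestation backs it.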
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-25 12:43 ` Greg Kroah-Hartman 2023-01-25 13:42 ` Dr. David Alan Gilbert @ 2023-01-25 15:29 ` Reshetova, Elena 2023-01-25 16:40 ` Theodore Ts'o ` (2 more replies) 1 sibling, 3 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-25 15:29 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening Replying only to the not-so-far addressed points. > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > Hi Greg, > > > > You mentioned couple of times (last time in this recent thread: > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to > start > > discussing the updated threat model for kernel, so this email is a start in this > direction. > > Any specific reason you didn't cc: the linux-hardening mailing list? > This seems to be in their area as well, right? Added now, I am just not sure how many mailing lists I want to cross spam this. And this is a very special aspect of 'hardening' since it is about hardening a kernel under different threat model/assumptions. > I hate the term "hardening". Please just say it for what it really is, > "fixing bugs to handle broken hardware". We've done that for years when > dealing with PCI and USB and even CPUs doing things that they shouldn't > be doing. How is this any different in the end? Well, that would not be fully correct in this case. You can really see it from two angles: 1. fixing bugs to handle broken hardware 2. 
fixing bugs that are the result of correctly operating HW but an incorrectly or maliciously operating hypervisor (acting as a man in the middle) We focus on 2 but it happens to address 1 also to some level. > > So what you also are saying here now is "we do not trust any PCI > devices", so please just say that (why do you trust USB devices?) If > that is something that you all think that Linux should support, then > let's go from there. > > > 3) All the tools are open-source and everyone can start using them right away > even > > without any special HW (readme has description of what is needed). > > Tools and documentation is here: > > https://github.com/intel/ccc-linux-guest-hardening > > Again, as our documentation states, when you submit patches based on > these tools, you HAVE TO document that. Otherwise we think you all are > crazy and will get your patches rejected. You all know this, why ignore > it? Sorry, I didn't know that for every bug found in the Linux kernel we have to list, when submitting a fix, how it was found. We will fix this in future submissions, but some bugs we have are found by plain code audit, so 'human' is the tool. > > > 4) all not yet upstreamed linux patches (that we are slowly submitting) can be > found > > here: https://github.com/intel/tdx/commits/guest-next > > Random github trees of kernel patches are just that, sorry. This was just for completeness, or for anyone who is curious to see the actual code already now. Of course they will be submitted for review using the normal process. > > > So, my main question before we start to argue about the threat model, > mitigations, etc, > > is what is the good way to get this reviewed to make sure everyone is aligned? > > There are a lot of angles and details, so what is the most efficient method? 
> > Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html > > into logical pieces and start submitting it to mailing list for discussion one by > one? > > Yes, start out by laying out what you feel the actual problem is, what > you feel should be done for it, and the patches you have proposed to > implement this, for each and every logical piece. OK, so this thread is about the actual threat model and overall problem. We can re-write the current bug fix patches (virtio and MSI) to refer to this threat model properly and explain that they fix actual bugs under this threat model. The rest of the pieces will come when other patches are submitted for review in logical groups. Does this work? > > Again, nothing new here, that's how Linux is developed, again, you all > know this, it's not anything I should have to say. > > > Any other methods? > > > > The original plan we had in mind is to start discussing the relevant pieces when > submitting the code, > > i.e. when submitting the device filter patches, we will include problem > statement, threat model link, > > data, alternatives considered, etc. > > As always, we can't do anything without actual working changes to the > code, otherwise it's just a pipe dream and we can't waste our time on it > (neither would you want us to). Of course, code exists, we are just starting to submit it. We started with easy bug fixes because they are small trivial fixes that are easy to review. Bigger pieces will follow (for example Satya has been addressing your comments about the device filter in his new implementation). Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:29 ` Reshetova, Elena @ 2023-01-25 16:40 ` Theodore Ts'o 2023-01-26 8:08 ` Reshetova, Elena 2023-01-26 11:19 ` Leon Romanovsky 2023-01-26 16:29 ` Michael S. Tsirkin 2 siblings, 1 reply; 102+ messages in thread From: Theodore Ts'o @ 2023-01-25 16:40 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > Again, as our documentation states, when you submit patches based on > > these tools, you HAVE TO document that. Otherwise we think you all are > > crazy and will get your patches rejected. You all know this, why ignore > > it? > > Sorry, I didn’t know that for every bug that is found in linux kernel when > we are submitting a fix that we have to list the way how it has been found. > We will fix this in the future submissions, but some bugs we have are found by > plain code audit, so 'human' is the tool. So the concern is that *you* may think it is a bug, but other people may not agree. Perhaps what is needed is a full description of the goals of Confidential Computing, and what is in scope, and what is deliberately *not* in scope. I predict that when you do this, that people will come out of the wood work and say, no wait, "CoCo ala S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V means QUUX". 
Others may end up objecting, "no wait, doing this is going to mean ***insane*** changes to the entire kernel, and this will be a performance / maintenance nightmare and unless you fix your hardware in future chips, we will consider this a hardware bug and reject all of your patches". But it's better to figure this out now than after you get hundreds of patches into the upstream kernel, we discover that this is only 5% of the necessary changes, and then the rest of your patches are rejected, and you have to end up fixing the hardware anyway, with the patches upstreamed so far being wasted effort. :-) If we get consensus on that document, then that can get checked into Documentation, and that can represent general consensus on the problem early on. - Ted ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-25 16:40 ` Theodore Ts'o @ 2023-01-26 8:08 ` Reshetova, Elena 0 siblings, 0 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-26 8:08 UTC (permalink / raw) To: Theodore Ts'o Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > Again, as our documentation states, when you submit patches based on > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > crazy and will get your patches rejected. You all know this, why ignore > > > it? > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > we are submitting a fix that we have to list the way how it has been found. > > We will fix this in the future submissions, but some bugs we have are found by > > plain code audit, so 'human' is the tool. > > So the concern is that *you* may think it is a bug, but other people > may not agree. Perhaps what is needed is a full description of the > goals of Confidential Computing, and what is in scope, and what is > deliberately *not* in scope. I predict that when you do this, that > people will come out of the wood work and say, no wait, "CoCo ala > S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V > means QUUX". Agree, and this is the reason behind starting this thread: to make sure people agree on the threat model. 
The only reason why we submitted some trivial bug fixes separately is the fact that they *also* can be considered bugs under the existing threat model, if one thinks that the kernel should be as robust as possible against potentially erroneous devices. As described right at the beginning of the doc I shared [1] (adjusted now to remove 'TDX' and put generic 'CC guest kernel'), we want to make sure that an untrusted host (and hypervisor) is not able to:

1. achieve privilege escalation into a CC guest kernel
2. compromise the confidentiality or integrity of CC guest private memory

The above security objectives give us two primary assets we want to protect: the CC guest execution context and CC guest private memory confidentiality and integrity. DoS from the host towards the CC guest is explicitly out of scope and a non-security objective. The attack surface in question is any interface exposed from a CC guest kernel towards the untrusted host that is not covered by the CC HW protections. Here the exact list can differ somewhat depending on what technology is being used, but as David already pointed out before: both CC guest memory and register state are protected from host attacks, so we are focusing on other communication channels and on generic interfaces used by Linux today. Examples of such interfaces for TDX (and I think SEV shares most of them, but please correct me if I am wrong here) are access to some MSRs and CPUIDs, port IO, MMIO and DMA, access to PCI config space, KVM hypercalls (if the hypervisor is KVM), TDX-specific hypercalls (this is technology specific), data consumed from the untrusted host during CC guest initialization (including the kernel itself, the kernel command line, provided ACPI tables, etc) and others described in [1].
An important note here is that these interfaces are not limited just to device drivers (albeit device drivers are the biggest users of some of them); they are present throughout the whole kernel in different subsystems and need careful examination and development of mitigations. The possible range of mitigations that we can apply is also wide, but you can roughly split it into two groups:

1. mitigations that use various attestation mechanisms (we can attest the kernel code, cmdline, ACPI tables being provided and other potential configurations, and one day we will hopefully also be able to attest devices we connect to the CC guest and their configuration)
2. other mitigations for threats that attestation cannot cover, i.e. mainly runtime interactions with the host.

The above sounds conceptually simple, but the devil is, as usual, in the details; still, it doesn't look impossible or something that would need ***insane*** changes to the entire kernel. > > Others may end up objecting, "no wait, doing this is going to mean > ***insane*** changes to the entire kernel, and this will be a > performance / maintenance nightmare and unless you fix your hardware > in future chips, we wlil consider this a hardware bug and reject all > of your patches". > > But it's better to figure this out now, then after you get hundreds of > patches into the upstream kernel, we discover that this is only 5% of > the necessary changes, and then the rest of your patches are rejected, > and you have to end up fixing the hardware anyway, with the patches > upstreamed so far being wasted effort. :-) > > If we get consensus on that document, then that can get checked into > Documentation, and that can represent general consensus on the problem > early on. Sure, I am willing to work on this since we already spent quite a lot of effort looking into this problem.
My only question is how to organize a review of such a document in a sane and productive way and make sure all relevant people are included in the discussion. As I said, this spans many areas in the kernel, and ideally you would want different people to review their area in detail. For example, one of the many aspects we need to worry about is the security of the CC guest LRNG (especially in cases when we don’t have a trusted security HW source of entropy) [2], and here feedback from LRNG experts would be important. I guess the first clear step I can do is to re-write the relevant part of [1] in CC-technology-neutral language; we would then need feedback and input from the AMD guys to make sure it correctly reflects their case also. We can probably do this preparation work on the linux-coco mailing list and then post for a wider review? Best Regards, Elena. [1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#threat-model [2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest > > - Ted ^ permalink raw reply [flat|nested] 102+ messages in thread
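[Editorial note: the "hardening of the enabled code" mitigation class mentioned above boils down to patterns like the following minimal userspace C sketch. All struct, field, and function names here are invented purely for illustration; this is not code from the actual hardening patches. The pattern: a host- or device-controlled field is validated before use instead of being trusted.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor whose length field is written by the
 * untrusted host/device. Names are invented for this sketch. */
#define DESC_DATA_SIZE 256

struct desc {
    uint32_t len;                  /* host-controlled, cannot be trusted */
    uint8_t  data[DESC_DATA_SIZE];
};

/* Hardened copy: clamp the host-supplied length to both buffer
 * bounds instead of trusting it, so a malicious host cannot force
 * an out-of-bounds access. Returns the number of bytes copied. */
static size_t copy_from_untrusted(uint8_t *dst, size_t dst_size,
                                  const struct desc *d)
{
    size_t n = d->len;

    if (n > dst_size)            /* host lied about the length */
        n = dst_size;
    if (n > sizeof(d->data))     /* never read past the source either */
        n = sizeof(d->data);
    memcpy(dst, d->data, n);
    return n;
}
```

The same clamp-before-use idea applies to any value read from MSRs, CPUID leaves, MMIO, port IO, or PCI config space in a CC guest.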
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:29 ` Reshetova, Elena 2023-01-25 16:40 ` Theodore Ts'o @ 2023-01-26 11:19 ` Leon Romanovsky 2023-01-26 11:29 ` Reshetova, Elena 2023-01-26 16:29 ` Michael S. Tsirkin 2 siblings, 1 reply; 102+ messages in thread From: Leon Romanovsky @ 2023-01-26 11:19 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > Replying only to the not-so-far addressed points. > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > Hi Greg, <...> > > > 3) All the tools are open-source and everyone can start using them right away > > even > > > without any special HW (readme has description of what is needed). > > > Tools and documentation is here: > > > https://github.com/intel/ccc-linux-guest-hardening > > > > Again, as our documentation states, when you submit patches based on > > these tools, you HAVE TO document that. Otherwise we think you all are > > crazy and will get your patches rejected. You all know this, why ignore > > it? > > Sorry, I didn’t know that for every bug that is found in linux kernel when > we are submitting a fix that we have to list the way how it has been found. > We will fix this in the future submissions, but some bugs we have are found by > plain code audit, so 'human' is the tool. My problem with that statement is that by applying a different threat model you "invent" bugs which didn't exist in the first place. For example, in this [1] latest submission, the authors labeled correct behaviour as a "bug".
[1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/ Thanks ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-26 11:19 ` Leon Romanovsky @ 2023-01-26 11:29 ` Reshetova, Elena 2023-01-26 12:30 ` Leon Romanovsky 2023-01-26 13:58 ` Dr. David Alan Gilbert 0 siblings, 2 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-26 11:29 UTC (permalink / raw) To: Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > Replying only to the not-so-far addressed points. > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > Hi Greg, > > <...> > > > > > 3) All the tools are open-source and everyone can start using them right > away > > > even > > > > without any special HW (readme has description of what is needed). > > > > Tools and documentation is here: > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > Again, as our documentation states, when you submit patches based on > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > crazy and will get your patches rejected. You all know this, why ignore > > > it? > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > we are submitting a fix that we have to list the way how it has been found. > > We will fix this in the future submissions, but some bugs we have are found by > > plain code audit, so 'human' is the tool. > > My problem with that statement is that by applying different threat > model you "invent" bugs which didn't exist in a first place. 
> > For example, in this [1] latest submission, authors labeled correct > behaviour as "bug". > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > alexander.shishkin@linux.intel.com/ Hm.. Does everyone think that when the kernel dies with an unhandled page fault (such as in that case) or the detection of a KASAN out-of-bounds violation (as it is in some other cases where we already have fixes or are investigating) it represents correct behavior, even if you expect that all your PCI HW devices are trusted? What about an error in two consecutive PCI reads? What about just some failure that results in erroneous input? Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 11:29 ` Reshetova, Elena @ 2023-01-26 12:30 ` Leon Romanovsky 2023-01-26 13:28 ` Reshetova, Elena 2023-01-27 9:32 ` Jörg Rödel 2023-01-26 13:58 ` Dr. David Alan Gilbert 1 sibling, 2 replies; 102+ messages in thread From: Leon Romanovsky @ 2023-01-26 12:30 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote: > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > Replying only to the not-so-far addressed points. > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > Hi Greg, > > > > <...> > > > > > > > 3) All the tools are open-source and everyone can start using them right > > away > > > > even > > > > > without any special HW (readme has description of what is needed). > > > > > Tools and documentation is here: > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > it? > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > we are submitting a fix that we have to list the way how it has been found. > > > We will fix this in the future submissions, but some bugs we have are found by > > > plain code audit, so 'human' is the tool. 
> > > > My problem with that statement is that by applying different threat > > model you "invent" bugs which didn't exist in a first place. > > > > For example, in this [1] latest submission, authors labeled correct > > behaviour as "bug". > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > alexander.shishkin@linux.intel.com/ > > Hm.. Does everyone think that when kernel dies with unhandled page fault > (such as in that case) or detection of a KASAN out of bounds violation (as it is in some > other cases we already have fixes or investigating) it represents a correct behavior even if > you expect that all your pci HW devices are trusted? This is exactly what I said. You presented me cases which exist only in your invented world. The mentioned unhandled page fault doesn't exist in the real world. If a PCI device doesn't work, it needs to be replaced/blocked and not left operable and accessible from the kernel/user. > What about an error in two consequent pci reads? What about just some > failure that results in erroneous input? Yes, some bugs need to be fixed, but they are not related to the trust/not-trust discussion or to PCI spec violations. Thanks > > Best Regards, > Elena. > ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-26 12:30 ` Leon Romanovsky @ 2023-01-26 13:28 ` Reshetova, Elena 2023-01-26 13:50 ` Leon Romanovsky ` (2 more replies) 2023-01-27 9:32 ` Jörg Rödel 1 sibling, 3 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-26 13:28 UTC (permalink / raw) To: Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote: > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > Replying only to the not-so-far addressed points. > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > > Hi Greg, > > > > > > <...> > > > > > > > > > 3) All the tools are open-source and everyone can start using them right > > > away > > > > > even > > > > > > without any special HW (readme has description of what is needed). > > > > > > Tools and documentation is here: > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > > it? > > > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > > we are submitting a fix that we have to list the way how it has been found. > > > > We will fix this in the future submissions, but some bugs we have are found > by > > > > plain code audit, so 'human' is the tool. 
> > > > > > My problem with that statement is that by applying different threat > > > model you "invent" bugs which didn't exist in a first place. > > > > > > For example, in this [1] latest submission, authors labeled correct > > > behaviour as "bug". > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > > alexander.shishkin@linux.intel.com/ > > > > Hm.. Does everyone think that when kernel dies with unhandled page fault > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in > some > > other cases we already have fixes or investigating) it represents a correct > behavior even if > > you expect that all your pci HW devices are trusted? > > This is exactly what I said. You presented me the cases which exist in > your invented world. Mentioned unhandled page fault doesn't exist in real > world. If PCI device doesn't work, it needs to be replaced/blocked and not > left to be operable and accessible from the kernel/user. Can we really assure correct operation of *all* PCI devices out there? How would such an audit be performed given the huge set of them available? Isn't it better instead to make a small fix in the kernel behavior that would guard us from such potentially incorrectly operating devices? > > > What about an error in two consequent pci reads? What about just some > > failure that results in erroneous input? > > Yes, some bugs need to be fixed, but they are not related to trust/not-trust > discussion and PCI spec violations. Let's forget the trust angle here (it only applies to the Confidential Computing threat model, and you are clearly implying the existing threat model instead) and stick just to the not-correctly operating device. What you are proposing is to fix *unknown* bugs in a multitude of PCI devices that (in the case of this particular MSI bug) can lead to two different values being read from the config space and the kernel incorrectly handling this situation.
Isn't it better to do the clear fix in one place to ensure such a situation (two subsequent reads with different values) cannot even happen in theory? In security we have a saying that fixing the root cause of a problem is the most efficient way to mitigate it. The root cause here is a double read with different values, so if it can be substituted with an easy and clear patch that probably even improves performance (we do one less PCI read and use the cached value instead), where is the problem in this particular case? If there are technical issues with the patch, of course we need to discuss/fix them, but it seems we are arguing here about whether or not we want to be fixing kernel code when we notice such cases... Best Regards, Elena ^ permalink raw reply [flat|nested] 102+ messages in thread
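[Editorial note: the double-read hazard and the cache-the-first-read fix under discussion can be sketched in a few lines of self-contained userspace C. The simulated device, the function names, and the values returned are all invented purely for illustration; this is not the actual kernel patch.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the double-read hazard: the (simulated) device may
 * return a different value on each config-space read. */
static int read_count;

static uint16_t simulated_config_read(void)
{
    /* A malicious/broken device: the second read differs from the first. */
    return (read_count++ == 0) ? 0x0005 : 0x0001;
}

/* Buggy pattern: two reads of the same register may disagree,
 * so code that reads a field twice can act on inconsistent data. */
static int msi_flags_consistent_double_read(void)
{
    uint16_t a = simulated_config_read();
    uint16_t b = simulated_config_read();
    return a == b;
}

/* Hardened pattern: read once, cache, and reuse the cached value.
 * The device never gets a second chance to change its answer. */
static uint16_t msi_flags_cached(void)
{
    static uint16_t cached;
    static int valid;

    if (!valid) {
        cached = simulated_config_read();
        valid = 1;
    }
    return cached;
}
```

As a bonus, the cached variant performs one fewer device read, which is the performance angle mentioned above.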
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 13:28 ` Reshetova, Elena @ 2023-01-26 13:50 ` Leon Romanovsky 2023-01-26 20:54 ` Theodore Ts'o 2023-01-27 19:24 ` James Bottomley 2 siblings, 0 replies; 102+ messages in thread From: Leon Romanovsky @ 2023-01-26 13:50 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote: > > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote: > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > > Replying only to the not-so-far addressed points. > > > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > > > Hi Greg, > > > > > > > > <...> > > > > > > > > > > > 3) All the tools are open-source and everyone can start using them right > > > > away > > > > > > even > > > > > > > without any special HW (readme has description of what is needed). > > > > > > > Tools and documentation is here: > > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > > > it? > > > > > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > > > we are submitting a fix that we have to list the way how it has been found. 
> > > > > We will fix this in the future submissions, but some bugs we have are found > > by > > > > > plain code audit, so 'human' is the tool. > > > > > > > > My problem with that statement is that by applying different threat > > > > model you "invent" bugs which didn't exist in a first place. > > > > > > > > For example, in this [1] latest submission, authors labeled correct > > > > behaviour as "bug". > > > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > > > alexander.shishkin@linux.intel.com/ > > > > > > Hm.. Does everyone think that when kernel dies with unhandled page fault > > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in > > some > > > other cases we already have fixes or investigating) it represents a correct > > behavior even if > > > you expect that all your pci HW devices are trusted? > > > > This is exactly what I said. You presented me the cases which exist in > > your invented world. Mentioned unhandled page fault doesn't exist in real > > world. If PCI device doesn't work, it needs to be replaced/blocked and not > > left to be operable and accessible from the kernel/user. > > Can we really assure correct operation of *all* pci devices out there? Why do we need to do it in 2022? *All* these PCI devices work. > How would such an audit be performed given a huge set of them available? Compliance tests? https://pcisig.com/developers/compliance-program > Isnt it better instead to make a small fix in the kernel behavior that would guard > us from such potentially not correctly operating devices? Like Greg already said, this is a small drop in the ocean of what needs to be changed. However, even in the case I mentioned, you are not fixing but hiding the real problem of having a broken device in my machine. It is the worst possible solution for the users. > > > > > > > What about an error in two consequent pci reads? What about just some > > > failure that results in erroneous input?
> > > > Yes, some bugs need to be fixed, but they are not related to trust/not-trust > > discussion and PCI spec violations. > > Let's forget the trust angle here (it only applies to the Confidential Computing > threat model and you clearly implying the existing threat model instead) and stick just to > the not-correctly operating device. What you are proposing is to fix *unknown* bugs > in multitude of pci devices that (in case of this particular MSI bug) can > lead to two different values being read from the config space and kernel incorrectly > handing this situation. Let's not call something a bug when it is not one. Random crashes are much more tolerable than a "working" device which sends random results. > Isn't it better to do the clear fix in one place to ensure such > situation (two subsequent reads with different values) cannot even happen in theory? > In security we have a saying that fixing a root cause of the problem is the most efficient > way to mitigate the problem. The root cause here is a double-read with different values, > so if it can be substituted with an easy and clear patch that probably even improves > performance as we do one less pci read and use cached value instead, where is the > problem in this particular case? If there are technical issues with the patch, of course we > need to discuss it/fix it, but it seems we are arguing here about whenever or not we want > to be fixing kernel code when we notice such cases... Not really, we are arguing about what is the right thing to do: 1. Fix the root cause - the device 2. Hide the failure and pretend that everything is perfect despite having a problematic device. Thanks > > Best Regards, > Elena > > ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 13:28 ` Reshetova, Elena 2023-01-26 13:50 ` Leon Romanovsky @ 2023-01-26 20:54 ` Theodore Ts'o 2023-01-27 19:24 ` James Bottomley 2 siblings, 0 replies; 102+ messages in thread From: Theodore Ts'o @ 2023-01-26 20:54 UTC (permalink / raw) To: Reshetova, Elena Cc: Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote: > > This is exactly what I said. You presented me the cases which exist in > > your invented world. Mentioned unhandled page fault doesn't exist in real > > world. If PCI device doesn't work, it needs to be replaced/blocked and not > > left to be operable and accessible from the kernel/user. > > Can we really assure correct operation of *all* pci devices out there? > How would such an audit be performed given a huge set of them available? > Isnt it better instead to make a small fix in the kernel behavior that would guard > us from such potentially not correctly operating devices? We assume that hardware works according to the spec; that's why we have a specification. Otherwise, things would be pretty insane, and would lead to massive bloat *everywhere*. If there are broken PCI devices out there, then we can blacklist the PCI device. If a manufacturer is consistently creating devices which don't obey the spec, we could block all devices from that manufacturer, and have an explicit white list for those devices from that manufacturer that actually work. If we can't count on a floating point instruction to return the right value, what are we supposed to do? 
Create code which double-checks every single floating point instruction just in case 2 + 2 = 3.99999999? :-) Ultimately, changing what is considered the trust boundary is a fundamentally hard thing, and trying to fault code for assuming that things inside the trust boundary are, well, trusted is not a great way to win friends and influence people. > Let's forget the trust angle here (it only applies to the Confidential Computing > threat model and you clearly implying the existing threat model instead) and stick just to > the not-correctly operating device. What you are proposing is to fix *unknown* bugs > in multitude of pci devices that (in case of this particular MSI bug) can > lead to two different values being read from the config space and kernel incorrectly > handing this situation. I don't think that's what people are saying. If there are buggy PCI devices, we can put them on block lists. But checking that every single read from the config space is unchanged is not something we should do, period. > Isn't it better to do the clear fix in one place to ensure such > situation (two subsequent reads with different values) cannot even happen in theory? > In security we have a saying that fixing a root cause of the problem is the most efficient > way to mitigate the problem. The root cause here is a double-read with different values, > so if it can be substituted with an easy and clear patch that probably even improves > performance as we do one less pci read and use cached value instead, where is the > problem in this particular case? If there are technical issues with the patch, of course we > need to discuss it/fix it, but it seems we are arguing here about whenever or not we want > to be fixing kernel code when we notice such cases... Well, if there is a performance win to cache a read from config space, then make the argument from a performance perspective. But caching values takes memory, and will potentially bloat data structures.
It's not necessarily cost-free to cache every single config space variable to prevent double-reads from either buggy or malicious devices. So it's one thing if we make each decision from a cost-benefit perspective. But then it's an *optimization*, not a *bug-fix*, and it also means that we aren't obligated to cache every single read from config space, lest someone wag their fingers at us saying, "Buggy! Your code is Buggy!". Cheers, - Ted ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 13:28 ` Reshetova, Elena 2023-01-26 13:50 ` Leon Romanovsky 2023-01-26 20:54 ` Theodore Ts'o @ 2023-01-27 19:24 ` James Bottomley 2023-01-30 7:42 ` Reshetova, Elena 2 siblings, 1 reply; 102+ messages in thread From: James Bottomley @ 2023-01-27 19:24 UTC (permalink / raw) To: Reshetova, Elena, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote: > > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote: > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena > > > > wrote: > > > > > Replying only to the not-so-far addressed points. > > > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena > > > > > > wrote: > > > > > > > Hi Greg, > > > > > > > > <...> > > > > > > > > > > > 3) All the tools are open-source and everyone can start > > > > > > > using them right away even without any special HW (readme > > > > > > > has description of what is needed). > > > > > > > Tools and documentation is here: > > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > > > Again, as our documentation states, when you submit patches > > > > > > based on these tools, you HAVE TO document that. Otherwise > > > > > > we think you all are crazy and will get your patches > > > > > > rejected. You all know this, why ignore it? > > > > > > > > > > Sorry, I didn’t know that for every bug that is found in > > > > > linux kernel when we are submitting a fix that we have to > > > > > list the way how it has been found. 
We will fix this in the > > > > > future submissions, but some bugs we have are found by > > > > > plain code audit, so 'human' is the tool. > > > > My problem with that statement is that by applying different > > > > threat model you "invent" bugs which didn't exist in a first > > > > place. > > > > > > > > For example, in this [1] latest submission, authors labeled > > > > correct behaviour as "bug". > > > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > > > alexander.shishkin@linux.intel.com/ > > > > > > Hm.. Does everyone think that when kernel dies with unhandled > > > page fault (such as in that case) or detection of a KASAN out of > > > bounds violation (as it is in some other cases we already have > > > fixes or investigating) it represents a correct behavior even if > > > you expect that all your pci HW devices are trusted? > > > > This is exactly what I said. You presented me the cases which exist > > in your invented world. Mentioned unhandled page fault doesn't > > exist in real world. If PCI device doesn't work, it needs to be > > replaced/blocked and not left to be operable and accessible from > > the kernel/user. > > Can we really assure correct operation of *all* pci devices out > there? How would such an audit be performed given a huge set of them > available? Isnt it better instead to make a small fix in the kernel > behavior that would guard us from such potentially not correctly > operating devices? I think this is really the wrong question from the confidential computing (CC) point of view. The question shouldn't be about assuring that the PCI device is operating completely correctly all the time (for some value of correct). It's if it were programmed to be malicious what could it do to us? If we take all DoS and Crash outcomes off the table (annoying but harmless if they don't reveal the confidential contents), we're left with it trying to extract secrets from the confidential environment. 
The big threat from most devices (including the thunderbolt classes) is that they can DMA all over memory. However, this isn't really a threat in CC (well, until PCI becomes able to do encrypted DMA) because the device has specific unencrypted buffers set aside for the expected DMA. If it writes outside that, CC integrity will detect it, and if it reads outside that, it gets unintelligible ciphertext. So we're left with the device trying to trick secrets out of us by returning unexpected data. If I set this as the problem, verifying device correct operation is a possible solution (albeit hugely expensive), but there are likely many other cheaper ways to defeat or detect a device trying to trick us into revealing something. James ^ permalink raw reply [flat|nested] 102+ messages in thread
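A minimal sketch (not from the thread; names and the buffer size are hypothetical) of the hardening style this argument implies — treating a value returned by the device, i.e. by the untrusted hypervisor, as hostile input and clamping it before it is used against guest memory:

```c
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 2048u  /* size of the shared (unencrypted) DMA buffer */

/*
 * In the CC threat model, a length reported by the device/hypervisor is
 * untrusted input: clamp it to the buffer actually set aside for DMA
 * before using it to copy or index guest memory.
 */
static inline size_t sanitize_dev_len(uint32_t dev_reported_len)
{
    return dev_reported_len > RX_BUF_SIZE ? (size_t)RX_BUF_SIZE
                                          : (size_t)dev_reported_len;
}
```

The point of contention later in the thread is exactly whether such clamps are worthwhile hardening or pointless overhead for correctly behaving hardware.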
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-27 19:24 ` James Bottomley @ 2023-01-30 7:42 ` Reshetova, Elena 2023-01-30 12:40 ` James Bottomley 0 siblings, 1 reply; 102+ messages in thread From: Reshetova, Elena @ 2023-01-30 7:42 UTC (permalink / raw) To: jejb, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote: > > > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote: > > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena > > > > > wrote: > > > > > > Replying only to the not-so-far addressed points. > > > > > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena > > > > > > > wrote: > > > > > > > > Hi Greg, > > > > > > > > > > <...> > > > > > > > > > > > > > 3) All the tools are open-source and everyone can start > > > > > > > > using them right away even without any special HW (readme > > > > > > > > has description of what is needed). > > > > > > > > Tools and documentation is here: > > > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > > > > > Again, as our documentation states, when you submit patches > > > > > > > based on these tools, you HAVE TO document that. Otherwise > > > > > > > we think you all are crazy and will get your patches > > > > > > > rejected. You all know this, why ignore it? > > > > > > > > > > > > Sorry, I didn’t know that for every bug that is found in > > > > > > linux kernel when we are submitting a fix that we have to > > > > > > list the way how it has been found. 
We will fix this in the > > > > > > future submissions, but some bugs we have are found by > > > > > > plain code audit, so 'human' is the tool. > > > > > My problem with that statement is that by applying different > > > > > threat model you "invent" bugs which didn't exist in a first > > > > > place. > > > > > > > > > > For example, in this [1] latest submission, authors labeled > > > > > correct behaviour as "bug". > > > > > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > > > > alexander.shishkin@linux.intel.com/ > > > > > > > > Hm.. Does everyone think that when kernel dies with unhandled > > > > page fault (such as in that case) or detection of a KASAN out of > > > > bounds violation (as it is in some other cases we already have > > > > fixes or investigating) it represents a correct behavior even if > > > > you expect that all your pci HW devices are trusted? > > > > > > This is exactly what I said. You presented me the cases which exist > > > in your invented world. Mentioned unhandled page fault doesn't > > > exist in real world. If PCI device doesn't work, it needs to be > > > replaced/blocked and not left to be operable and accessible from > > > the kernel/user. > > > > Can we really assure correct operation of *all* pci devices out > > there? How would such an audit be performed given a huge set of them > > available? Isnt it better instead to make a small fix in the kernel > > behavior that would guard us from such potentially not correctly > > operating devices? > > I think this is really the wrong question from the confidential > computing (CC) point of view. The question shouldn't be about assuring > that the PCI device is operating completely correctly all the time (for > some value of correct). It's if it were programmed to be malicious > what could it do to us? Sure, but Leon didn’t agree with CC threat model to begin with, so I was trying to argue here how this fix can be useful for non-CC threat model case. 
But obviously my argument for the non-CC case wasn't good (especially reading Ted's reply here https://lore.kernel.org/all/Y9Lonw9HzlosUPnS@mit.edu/ ), so I better stick to the CC threat model case indeed. > If we take all DoS and Crash outcomes off the > table (annoying but harmless if they don't reveal the confidential > contents), we're left with it trying to extract secrets from the > confidential environment. Yes, this is the ultimate end goal. > > The big threat from most devices (including the thunderbolt classes) is > that they can DMA all over memory. However, this isn't really a threat > in CC (well until PCI becomes able to do encrypted DMA) because the > device has specific unencrypted buffers set aside for the expected DMA. > If it writes outside that CC integrity will detect it and if it reads > outside that it gets unintelligible ciphertext. So we're left with the > device trying to trick secrets out of us by returning unexpected data. Yes, by supplying the input that hasn't been expected. This is exactly the case we were trying to fix here for example: https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/ I do agree that this case is less severe than others where memory corruption/buffer overrun can happen, like here: https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/ But we are trying to fix all issues we see now (prioritizing the second ones though). > > If I set this as the problem, verifying device correct operation is a > possible solution (albeit hugely expensive) but there are likely many > other cheaper ways to defeat or detect a device trying to trick us into > revealing something. What do you have in mind here for the actual devices we need to enable for CC cases? We have been using here a combination of extensive fuzzing and static code analysis. Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-30 7:42 ` Reshetova, Elena @ 2023-01-30 12:40 ` James Bottomley 2023-01-31 11:31 ` Reshetova, Elena 0 siblings, 1 reply; 102+ messages in thread From: James Bottomley @ 2023-01-30 12:40 UTC (permalink / raw) To: Reshetova, Elena, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: [...] > > The big threat from most devices (including the thunderbolt > > classes) is that they can DMA all over memory. However, this isn't > > really a threat in CC (well until PCI becomes able to do encrypted > > DMA) because the device has specific unencrypted buffers set aside > > for the expected DMA. If it writes outside that CC integrity will > > detect it and if it reads outside that it gets unintelligible > > ciphertext. So we're left with the device trying to trick secrets > > out of us by returning unexpected data. > > Yes, by supplying the input that hasn’t been expected. This is > exactly the case we were trying to fix here for example: > https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/ > I do agree that this case is less severe when others where memory > corruption/buffer overrun can happen, like here: > https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/ > But we are trying to fix all issues we see now (prioritizing the > second ones though). I don't see how MSI table sizing is a bug in the category we've defined. The very text of the changelog says "resulting in a kernel page fault in pci_write_msg_msix()." 
which is a crash, which I thought we were agreeing was out of scope for CC attacks? > > > > If I set this as the problem, verifying device correct operation is > > a possible solution (albeit hugely expensive) but there are likely > > many other cheaper ways to defeat or detect a device trying to > > trick us into revealing something. > > What do you have in mind here for the actual devices we need to > enable for CC cases? Well, the most dangerous devices seem to be the virtio set a CC system will rely on to boot up. After that, there are other ways (like SPDM) to verify a real PCI device is on the other end of the transaction. > We have been using here a combination of extensive fuzzing and static > code analysis. By fuzzing, I assume you mean fuzzing from the PCI configuration space? Firstly, I'm not so sure how useful a tool fuzzing is if we take oopses off the table, because fuzzing primarily triggers those, so it's hard to see what else it could detect given that the signal will be smothered by oopses; and secondly, I think the PCI interface is likely the wrong place to begin, and you should probably begin on the virtio bus and the hypervisor-generated configuration space. James ^ permalink raw reply [flat|nested] 102+ messages in thread
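For concreteness, the kind of MSI-X check being debated here looks roughly like the following. This is a sketch, not the actual patch under discussion; `PCI_MSIX_FLAGS_QSIZE` follows the PCI spec's Message Control register layout, and `msix_entry_valid()` is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSIX_FLAGS_QSIZE 0x07ff  /* Table Size field, bits 10:0, encoded as N-1 */

/* Decode the MSI-X table size from the Message Control register. */
static unsigned int msix_table_size(uint16_t control)
{
    return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
}

/*
 * Refuse to touch an entry beyond the advertised table. With a trusted
 * device the advertised size and the mapped table always agree; the CC
 * argument is that an untrusted hypervisor can make them disagree.
 */
static bool msix_entry_valid(uint16_t control, unsigned int entry)
{
    return entry < msix_table_size(control);
}
```

James's objection is precisely that on real hardware this bound always holds, so the extra comparison is overhead with no non-CC benefit.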
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-30 12:40 ` James Bottomley @ 2023-01-31 11:31 ` Reshetova, Elena 2023-01-31 13:28 ` James Bottomley 2023-02-02 14:51 ` Jeremi Piotrowski 0 siblings, 2 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-31 11:31 UTC (permalink / raw) To: jejb, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: > [...] > > > The big threat from most devices (including the thunderbolt > > > classes) is that they can DMA all over memory. However, this isn't > > > really a threat in CC (well until PCI becomes able to do encrypted > > > DMA) because the device has specific unencrypted buffers set aside > > > for the expected DMA. If it writes outside that CC integrity will > > > detect it and if it reads outside that it gets unintelligible > > > ciphertext. So we're left with the device trying to trick secrets > > > out of us by returning unexpected data. > > > > Yes, by supplying the input that hasn’t been expected. This is > > exactly the case we were trying to fix here for example: > > https://lore.kernel.org/all/20230119170633.40944-2- > alexander.shishkin@linux.intel.com/ > > I do agree that this case is less severe when others where memory > > corruption/buffer overrun can happen, like here: > > https://lore.kernel.org/all/20230119135721.83345-6- > alexander.shishkin@linux.intel.com/ > > But we are trying to fix all issues we see now (prioritizing the > > second ones though). > > I don't see how MSI table sizing is a bug in the category we've > defined. 
The very text of the changelog says "resulting in a kernel > page fault in pci_write_msg_msix()." which is a crash, which I thought > we were agreeing was out of scope for CC attacks? As I said, this is an example of a crash and at first look it might not lead to an exploitable condition (albeit attackers are creative). But we noticed this one while fuzzing and it was common enough that it prevented the fuzzer from going deeper into the virtio device driver fuzzing. The core PCI/MSI code doesn't seem to have that many easily triggerable issues. Other examples in the virtio patchset are more severe. > > > > > > > If I set this as the problem, verifying device correct operation is > > > a possible solution (albeit hugely expensive) but there are likely > > > many other cheaper ways to defeat or detect a device trying to > > > trick us into revealing something. > > > > What do you have in mind here for the actual devices we need to > > enable for CC cases? > > Well, the most dangerous devices seem to be the virtio set a CC system > will rely on to boot up. After that, there are other ways (like SPDM) > to verify a real PCI device is on the other end of the transaction. Yes, in the future, but not yet. Other vendors will not necessarily be using virtio devices at this point, so we will have non-virtio and non-CC-enabled devices that we want to securely add to the guest. > > > We have been using here a combination of extensive fuzzing and static > > code analysis. > > by fuzzing, I assume you mean fuzzing from the PCI configuration space? > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses > off the table because fuzzing primarily triggers those If you enable memory sanitizers you can detect more severe conditions like out of bounds accesses and such. I think given that we have a way to verify that fuzzing is reaching the code locations we want it to reach, it can be a pretty effective method to find at least low-hanging bugs.
And these will be the bugs that most of the attackers will go after in the first place. But of course it is not a formal verification of any kind. so it's hard to > see what else it could detect given the signal will be smothered by > oopses and secondly I think the PCI interface is likely the wrong place > to begin and you should probably begin on the virtio bus and the > hypervisor generated configuration space. This is exactly what we do. We don't fuzz from the PCI config space, we supply inputs from the host/vmm via the legitimate interfaces through which it can inject them into the guest: whenever the guest requests a PCI config space (which is controlled by the host/hypervisor, as you said) read operation, it gets input injected by the kAFL fuzzer. Same for other interfaces that are under control of the host/VMM (MSRs, port IO, MMIO, anything that goes via the #VE handler in our case). When it comes to virtio, we employ two different fuzzing techniques: directly injecting kAFL fuzz input when the virtio core or virtio drivers get the data received from the host (via injecting input in functions virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory pages using the kfx fuzzer. More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
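The injection point described above — wrapping the endianness helpers that all virtio device data funnels through — can be sketched like this. All names here are hypothetical stand-ins for the kernel/kAFL machinery, not the actual harness code:

```c
#include <stdbool.h>
#include <stdint.h>

static bool fuzz_enabled;      /* toggled by the (hypothetical) harness */
static uint16_t fuzz_payload;  /* next fuzzer-chosen value */

/* Hook: return the fuzzer's value instead of what the device sent. */
static uint16_t fuzz_inject_u16(uint16_t from_device)
{
    return fuzz_enabled ? fuzz_payload : from_device;
}

/*
 * Stand-in for the kernel's virtio16_to_cpu(): every 16-bit field the
 * guest reads from a virtio device passes through a helper like this,
 * so overriding the return value exercises the driver with arbitrary
 * host-controlled input. (Byte-swapping elided; assume a little-endian
 * guest.)
 */
static uint16_t sketch_virtio16_to_cpu(uint16_t dev_val)
{
    return fuzz_inject_u16(dev_val);
}
```

Because the conversion helpers sit on the exact trust boundary between host-written memory and guest driver logic, a single hook there covers every virtio driver without per-driver instrumentation.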
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 11:31 ` Reshetova, Elena @ 2023-01-31 13:28 ` James Bottomley 2023-01-31 15:14 ` Christophe de Dinechin 2023-01-31 16:34 ` Reshetova, Elena 2023-02-02 14:51 ` Jeremi Piotrowski 1 sibling, 2 replies; 102+ messages in thread From: James Bottomley @ 2023-01-31 13:28 UTC (permalink / raw) To: Reshetova, Elena, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote: > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: > > [...] > > > > The big threat from most devices (including the thunderbolt > > > > classes) is that they can DMA all over memory. However, this > > > > isn't really a threat in CC (well until PCI becomes able to do > > > > encrypted DMA) because the device has specific unencrypted > > > > buffers set aside for the expected DMA. If it writes outside > > > > that CC integrity will detect it and if it reads outside that > > > > it gets unintelligible ciphertext. So we're left with the > > > > device trying to trick secrets out of us by returning > > > > unexpected data. > > > > > > Yes, by supplying the input that hasn’t been expected. 
This is > > > exactly the case we were trying to fix here for example: > > > https://lore.kernel.org/all/20230119170633.40944-2- > > alexander.shishkin@linux.intel.com/ > > > I do agree that this case is less severe when others where memory > > > corruption/buffer overrun can happen, like here: > > > https://lore.kernel.org/all/20230119135721.83345-6- > > alexander.shishkin@linux.intel.com/ > > > But we are trying to fix all issues we see now (prioritizing the > > > second ones though). > > > > I don't see how MSI table sizing is a bug in the category we've > > defined. The very text of the changelog says "resulting in a > > kernel page fault in pci_write_msg_msix()." which is a crash, > > which I thought we were agreeing was out of scope for CC attacks? > > As I said this is an example of a crash and on the first look > might not lead to the exploitable condition (albeit attackers are > creative). But we noticed this one while fuzzing and it was common > enough that prevented fuzzer going deeper into the virtio devices > driver fuzzing. The core PCI/MSI doesn’t seem to have that many > easily triggerable Other examples in virtio patchset are more severe. You cited this as your example. I'm pointing out it seems to be an event of the class we've agreed not to consider because it's an oops not an exploit. If there are examples of fixing actual exploits to CC VMs, what are they? This patch is, however, an example of the problem everyone else on the thread is complaining about: a patch which adds an unnecessary check to the MSI subsystem; unnecessary because it doesn't fix a CC exploit and in the real world the tables are correct (or the manufacturer is quickly chastened), so it adds overhead to no benefit. [...] 
> > see what else it could detect given the signal will be smothered by > > oopses and secondly I think the PCI interface is likely the wrong > > place to begin and you should probably begin on the virtio bus and > > the hypervisor generated configuration space. > > This is exactly what we do. We don't fuzz from the PCI config space, > we supply inputs from the host/vmm via the legitimate interfaces that > it can inject them to the guest: whenever guest requests a pci config > space (which is controlled by host/hypervisor as you said) read > operation, it gets input injected by the kafl fuzzer. Same for other > interfaces that are under control of host/VMM (MSRs, port IO, MMIO, > anything that goes via #VE handler in our case). When it comes to > virtio, we employ two different fuzzing techniques: directly > injecting kafl fuzz input when virtio core or virtio drivers gets the > data received from the host (via injecting input in functions > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory > pages using kfx fuzzer. More information can be found in > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing Given that we previously agreed that oopses and other DoS attacks are out of scope for CC, I really don't think fuzzing, which primarily finds oopses, is at all a useful tool unless you filter the results by the question "could we exploit this in a CC VM to reveal secrets". Without applying that filter, you're sending a load of patches which don't really do much to reduce the CC attack surface and which do annoy non-CC people because they add pointless checks to things they expect the cards and config tables to get right. James ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 13:28 ` James Bottomley @ 2023-01-31 15:14 ` Christophe de Dinechin 2023-01-31 17:39 ` Michael S. Tsirkin 2023-02-01 10:24 ` Christophe de Dinechin 2023-01-31 16:34 ` Reshetova, Elena 1 sibling, 2 replies; 102+ messages in thread From: Christophe de Dinechin @ 2023-01-31 15:14 UTC (permalink / raw) To: jejb Cc: Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote... > On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote: >> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: >> > [...] >> > > > The big threat from most devices (including the thunderbolt >> > > > classes) is that they can DMA all over memory. However, this >> > > > isn't really a threat in CC (well until PCI becomes able to do >> > > > encrypted DMA) because the device has specific unencrypted >> > > > buffers set aside for the expected DMA. If it writes outside >> > > > that CC integrity will detect it and if it reads outside that >> > > > it gets unintelligible ciphertext. So we're left with the >> > > > device trying to trick secrets out of us by returning >> > > > unexpected data. >> > > >> > > Yes, by supplying the input that hasn’t been expected. 
This is >> > > exactly the case we were trying to fix here for example: >> > > https://lore.kernel.org/all/20230119170633.40944-2- >> > alexander.shishkin@linux.intel.com/ >> > > I do agree that this case is less severe than others where memory >> > > corruption/buffer overrun can happen, like here: >> > > https://lore.kernel.org/all/20230119135721.83345-6- >> > alexander.shishkin@linux.intel.com/ >> > > But we are trying to fix all issues we see now (prioritizing the >> > > second ones though). >> > >> > I don't see how MSI table sizing is a bug in the category we've >> > defined. The very text of the changelog says "resulting in a >> > kernel page fault in pci_write_msg_msix()." which is a crash, >> > which I thought we were agreeing was out of scope for CC attacks? >> >> As I said this is an example of a crash and on the first look >> might not lead to the exploitable condition (albeit attackers are >> creative). But we noticed this one while fuzzing and it was common >> enough that prevented fuzzer going deeper into the virtio devices >> driver fuzzing. The core PCI/MSI doesn't seem to have that many >> easily triggerable issues. Other examples in virtio patchset are more severe. > > You cited this as your example. I'm pointing out it seems to be an > event of the class we've agreed not to consider because it's an oops > not an exploit. If there are examples of fixing actual exploits to CC > VMs, what are they? > > This patch is, however, an example of the problem everyone else on the > thread is complaining about: a patch which adds an unnecessary check to > the MSI subsystem; unnecessary because it doesn't fix a CC exploit and > in the real world the tables are correct (or the manufacturer is > quickly chastened), so it adds overhead to no benefit. I'd like to backtrack a little here. 1/ PCI-as-a-threat: where does it come from? On physical devices, we have to assume that the device is working. As others pointed out, there are things like PCI compliance tests, etc.
So Linux has to trust the device. You could manufacture a broken device intentionally, but the value you would get from that would be limited. On a CC system, the "PCI" values are really provided by the hypervisor, which is not trusted. This leads to this peculiar way of thinking where we say "what happens if a virtual device feeds us a bogus value *intentionally*". We cannot assume that the *virtual* PCI device ran through the compliance tests. Instead, we see the PCI interface as hostile, which makes us look like weirdos to the rest of the community. Consequently, as James pointed out, we first need to focus on consequences that would break what I would call the "CC promise", which is essentially that we'd rather kill the guest than reveal its secrets. Unless you have a credible path to a secret being revealed, don't bother "fixing" a bug. And as was pointed out elsewhere in this thread, checking has a cost, so you can't really use the "optimization" angle either. 2/ Clarification of the "CC promise" and value proposition Based on the above, the very first thing is to clarify that "CC promise", because if exchanges on this thread have proved anything, it is that it's quite unclear to anyone outside the "CoCo world". The Linux Guest Kernel Security Specification needs to really elaborate on what the value proposition of CC is, not assume it is a given. "Bug fixes" before this value proposition has been understood and accepted by the non-CoCo community are likely to go absolutely nowhere. Here is a quick proposal for the Purpose and Scope section: <doc> Purpose and Scope Confidential Computing (CC) is a set of technologies that allows a guest to run without having to trust either the hypervisor or the host.
CC offers two new guarantees to the guest compared to the non-CC case: a) The guest will be able to measure and attest, by cryptographic means, the guest software stack that it is running, and be assured that this software stack cannot be tampered with by the host or the hypervisor after it was measured. The root of trust for this aspect of CC is typically the CPU manufacturer (e.g. through a private key that can be used to respond to cryptographic challenges). b) Guest state, including memory, become secrets which must remain inaccessible to the host. In a CC context, it is considered preferable to stop or kill a guest rather than risk leaking its secrets. This aspect of CC is typically enforced by means such as memory encryption and new semantics for memory protection. CC leads to a different threat model for a Linux kernel running as a guest inside a confidential virtual machine (CVM). Notably, whereas the machine (CPU, I/O devices, etc) is usually considered as trustworthy, in the CC case, the hypervisor emulating some aspects of the virtual machine is now considered as potentially malicious. Consequently, effects of any data provided by the guest to the hypervisor, including ACPI configuration tables, MMIO interfaces or machine specific registers (MSRs) need to be re-evaluated. This document describes the security architecture of the Linux guest kernel running inside a CVM, with a particular focus on the Intel TDX implementation. Many aspects of this document will be applicable to other CC implementations such as AMD SEV. Aspects of the guest-visible state that are under direct control of the hardware, such as the CPU state or memory protection, will be considered as being handled by the CC implementations. This document will therefore only focus on aspects of the virtual machine that are typically managed by the hypervisor or the host. 
Since the host ultimately owns the resources and can allocate them at will, including denying their use at any point, this document will not address denial of service or performance degradation. It will, however, cover random number generation, which is central for cryptographic security. Finally, security considerations that apply irrespective of whether the platform is confidential or not are also outside of the scope of this document. This includes topics ranging from timing attacks to social engineering. </doc> Feel free to comment and reword at will ;-) 3/ PCI-as-a-threat: where does that come from Isn't there a fundamental difference, from a threat model perspective, between a bad actor, say a rogue sysadmin dumping the guest memory (which CC should defeat), and compromised software feeding us bad data? I think there is: at least inside the TCB, we can detect bad software using measurements, and prevent it from running using attestation. In other words, we first check what we will run, then we run it. The security there is that we know what we are running. The trust we have in the software is from testing, reviewing or using it. This relies on a key aspect provided by TDX and SEV, which is that the software being measured is largely tamper-resistant thanks to memory encryption. In other words, after you have measured your guest software stack, the host or hypervisor cannot willy-nilly change it. So this brings me to the next question: is there any way we could offer the same kind of service for KVM and qemu? The measurement part seems relatively easy. The tamper-resistant part, on the other hand, seems quite difficult to me. But maybe someone else will have a brilliant idea?
So I'm asking the question, because if you could somehow prove to the guest not only that it's running the right guest stack (as we can do today) but also a known host/KVM/hypervisor stack, we would also switch the potential issues with PCI, MSRs and the like from "malicious" to merely "bogus", and this is something which is evidently easier to deal with. I briefly discussed this with James, and he pointed out two interesting aspects of that question: 1/ In the CC world, we don't really care about *virtual* PCI devices. We care about either virtio devices, or physical ones being passed through to the guest. Let's assume physical ones can be trusted, see above. That leaves virtio devices. How much damage can a malicious virtio device do to the guest kernel, and can this lead to secrets being leaked? 2/ He was not as negative as I anticipated on the possibility of somehow being able to prevent tampering of the guest. One example he mentioned is a research paper [1] about running the hypervisor itself inside an "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved with TDX using secure enclaves or some other mechanism? Sorry, this mail is a bit long ;-) > > > [...] >> > see what else it could detect given the signal will be smothered by >> > oopses and secondly I think the PCI interface is likely the wrong >> > place to begin and you should probably begin on the virtio bus and >> > the hypervisor generated configuration space. >> >> This is exactly what we do. We don’t fuzz from the PCI config space, >> we supply inputs from the host/vmm via the legitimate interfaces that >> it can inject them to the guest: whenever guest requests a pci config >> space (which is controlled by host/hypervisor as you said) read >> operation, it gets input injected by the kafl fuzzer. Same for other >> interfaces that are under control of host/VMM (MSRs, port IO, MMIO, >> anything that goes via #VE handler in our case). 
When it comes to >> virtio, we employ two different fuzzing techniques: directly >> injecting kafl fuzz input when virtio core or virtio drivers gets the >> data received from the host (via injecting input in functions >> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory >> pages using kfx fuzzer. More information can be found in >> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing > > Given that we previously agreed that oopses and other DoS attacks are > out of scope for CC, I really don't think fuzzing, which primarily > finds oopses, is at all a useful tool unless you filter the results by > the question "could we exploit this in a CC VM to reveal secrets". > Without applying that filter you're sending a load of patches which > don't really do much to reduce the CC attack surface and which do annoy > non-CC people because they add pointless checks to things they expect > the cards and config tables to get right. Indeed. [1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592 -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 15:14 ` Christophe de Dinechin @ 2023-01-31 17:39 ` Michael S. Tsirkin 2023-02-01 10:52 ` Christophe de Dinechin Dupont de Dinechin 2023-02-01 10:24 ` Christophe de Dinechin 1 sibling, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-31 17:39 UTC (permalink / raw) To: Christophe de Dinechin Cc: jejb, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: > Finally, security considerations that apply irrespective of whether the > platform is confidential or not are also outside of the scope of this > document. This includes topics ranging from timing attacks to social > engineering. Why are timing attacks by hypervisor on the guest out of scope? > </doc> > > Feel free to comment and reword at will ;-) > > > 3/ PCI-as-a-threat: where does that come from > > Isn't there a fundamental difference, from a threat model perspective, > between a bad actor, say a rogue sysadmin dumping the guest memory (which CC > should defeat) and compromised software feeding us bad data? I think there > is: at least inside the TCB, we can detect bad software using measurements, > and prevent it from running using attestation. In other words, we first > check what we will run, then we run it. The security there is that we know > what we are running. The trust we have in the software is from testing, > reviewing or using it.
> > This relies on a key aspect provided by TDX and SEV, which is that the > software being measured is largely tamper-resistant thanks to memory > encryption. In other words, after you have measured your guest software > stack, the host or hypervisor cannot willy-nilly change it. > > So this brings me to the next question: is there any way we could offer the > same kind of service for KVM and qemu? The measurement part seems relatively > easy. The tamper-resistant part, on the other hand, seems quite difficult to > me. But maybe someone else will have a brilliant idea? > > So I'm asking the question, because if you could somehow prove to the guest > not only that it's running the right guest stack (as we can do today) but > also a known host/KVM/hypervisor stack, we would also switch the potential > issues with PCI, MSRs and the like from "malicious" to merely "bogus", and > this is something which is evidently easier to deal with. Agree absolutely that's much easier. > I briefly discussed this with James, and he pointed out two interesting > aspects of that question: > > 1/ In the CC world, we don't really care about *virtual* PCI devices. We > care about either virtio devices, or physical ones being passed through > to the guest. Let's assume physical ones can be trusted, see above. > That leaves virtio devices. How much damage can a malicious virtio device > do to the guest kernel, and can this lead to secrets being leaked? > > 2/ He was not as negative as I anticipated on the possibility of somehow > being able to prevent tampering of the guest. One example he mentioned is > a research paper [1] about running the hypervisor itself inside an > "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved > with TDX using secure enclaves or some other mechanism? Or even just secureboot based root of trust? -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
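The "measure first, then run it" property quoted above is, at its core, a hash chain. A deliberately simplified sketch (real TDX/SEV launch measurements are SHA-384 over platform-defined structures and registers; only the accumulation idea is shown here, and the function name is invented):

```python
import hashlib

def extend(measurement: bytes, chunk: bytes) -> bytes:
    # new = H(old || H(chunk)); both the content and the order of every
    # loaded chunk are bound into the final value
    return hashlib.sha384(measurement + hashlib.sha384(chunk).digest()).digest()

m = bytes(48)  # initial (zeroed) measurement register
for chunk in (b"firmware", b"kernel", b"initrd"):
    m = extend(m, chunk)

# Tampering with any stage yields a different final measurement, which
# attestation later lets a remote verifier detect.
m_tampered = bytes(48)
for chunk in (b"firmware", b"patched kernel", b"initrd"):
    m_tampered = extend(m_tampered, chunk)
assert m != m_tampered
```

This is also why the tamper-resistance question matters: the chain only vouches for what was loaded at measurement time, so memory encryption is what keeps the measured state from being silently swapped out afterwards.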
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 17:39 ` Michael S. Tsirkin @ 2023-02-01 10:52 ` Christophe de Dinechin Dupont de Dinechin 2023-02-01 11:01 ` Michael S. Tsirkin 0 siblings, 1 reply; 102+ messages in thread From: Christophe de Dinechin Dupont de Dinechin @ 2023-02-01 10:52 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: >> Finally, security considerations that apply irrespective of whether the >> platform is confidential or not are also outside of the scope of this >> document. This includes topics ranging from timing attacks to social >> engineering. > > Why are timing attacks by hypervisor on the guest out of scope? Good point. I was thinking that mitigation against timing attacks is the same irrespective of the source of the attack. However, because the HV controls CPU time allocation, there are presumably attacks that are made much easier through the HV. Those should be listed. > >> </doc> >> >> Feel free to comment and reword at will ;-) >> >> >> 3/ PCI-as-a-threat: where does that come from >> >> Isn't there a fundamental difference, from a threat model perspective, >> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC >> should defeat) and compromised software feeding us bad data? 
I think there >> is: at least inside the TCB, we can detect bad software using measurements, >> and prevent it from running using attestation. In other words, we first >> check what we will run, then we run it. The security there is that we know >> what we are running. The trust we have in the software is from testing, >> reviewing or using it. >> >> This relies on a key aspect provided by TDX and SEV, which is that the >> software being measured is largely tamper-resistant thanks to memory >> encryption. In other words, after you have measured your guest software >> stack, the host or hypervisor cannot willy-nilly change it. >> >> So this brings me to the next question: is there any way we could offer the >> same kind of service for KVM and qemu? The measurement part seems relatively >> easy. The tamper-resistant part, on the other hand, seems quite difficult to >> me. But maybe someone else will have a brilliant idea? >> >> So I'm asking the question, because if you could somehow prove to the guest >> not only that it's running the right guest stack (as we can do today) but >> also a known host/KVM/hypervisor stack, we would also switch the potential >> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and >> this is something which is evidently easier to deal with. > > Agree absolutely that's much easier. > >> I briefly discussed this with James, and he pointed out two interesting >> aspects of that question: >> >> 1/ In the CC world, we don't really care about *virtual* PCI devices. We >> care about either virtio devices, or physical ones being passed through >> to the guest. Let's assume physical ones can be trusted, see above. >> That leaves virtio devices. How much damage can a malicious virtio device >> do to the guest kernel, and can this lead to secrets being leaked? >> >> 2/ He was not as negative as I anticipated on the possibility of somehow >> being able to prevent tampering of the guest.
One example he mentioned is >> a research paper [1] about running the hypervisor itself inside an >> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved >> with TDX using secure enclaves or some other mechanism? > > Or even just secureboot based root of trust? You mean host secureboot? Or guest? If it’s host, then the problem is detecting malicious tampering with host code (whether it’s kernel or hypervisor). If it’s guest, at the moment at least, the measurements do not extend beyond the TCB. > > -- > MST > ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-01 10:52 ` Christophe de Dinechin Dupont de Dinechin @ 2023-02-01 11:01 ` Michael S. Tsirkin 2023-02-01 13:15 ` Christophe de Dinechin Dupont de Dinechin 2023-02-02 3:24 ` Jason Wang 0 siblings, 2 replies; 102+ messages in thread From: Michael S. Tsirkin @ 2023-02-01 11:01 UTC (permalink / raw) To: Christophe de Dinechin Dupont de Dinechin Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote: > > > > On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: > >> Finally, security considerations that apply irrespective of whether the > >> platform is confidential or not are also outside of the scope of this > >> document. This includes topics ranging from timing attacks to social > >> engineering. > > > > Why are timing attacks by hypervisor on the guest out of scope? > > Good point. > > I was thinking that mitigation against timing attacks is the same > irrespective of the source of the attack. However, because the HV > controls CPU time allocation, there are presumably attacks that > are made much easier through the HV. Those should be listed. Not just that, also because it can and does emulate some devices. For example, are disk encryption systems protected against timing of disk accesses? This is why some people keep saying "forget about emulated devices, require passthrough, include devices in the trust zone". 
> > > >> </doc> > >> > >> Feel free to comment and reword at will ;-) > >> > >> > >> 3/ PCI-as-a-threat: where does that come from > >> > >> Isn't there a fundamental difference, from a threat model perspective, > >> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC > >> should defeat) and compromised software feeding us bad data? I think there > >> is: at least inside the TCB, we can detect bad software using measurements, > >> and prevent it from running using attestation. In other words, we first > >> check what we will run, then we run it. The security there is that we know > >> what we are running. The trust we have in the software is from testing, > >> reviewing or using it. > >> > >> This relies on a key aspect provided by TDX and SEV, which is that the > >> software being measured is largely tamper-resistant thanks to memory > >> encryption. In other words, after you have measured your guest software > >> stack, the host or hypervisor cannot willy-nilly change it. > >> > >> So this brings me to the next question: is there any way we could offer the > >> same kind of service for KVM and qemu? The measurement part seems relatively > >> easy. The tamper-resistant part, on the other hand, seems quite difficult to > >> me. But maybe someone else will have a brilliant idea? > >> > >> So I'm asking the question, because if you could somehow prove to the guest > >> not only that it's running the right guest stack (as we can do today) but > >> also a known host/KVM/hypervisor stack, we would also switch the potential > >> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and > >> this is something which is evidently easier to deal with. > > > > Agree absolutely that's much easier. > > > >> I briefly discussed this with James, and he pointed out two interesting > >> aspects of that question: > >> > >> 1/ In the CC world, we don't really care about *virtual* PCI devices.
We > >> care about either virtio devices, or physical ones being passed through > >> to the guest. Let's assume physical ones can be trusted, see above. > >> That leaves virtio devices. How much damage can a malicious virtio device > >> do to the guest kernel, and can this lead to secrets being leaked? > >> > >> 2/ He was not as negative as I anticipated on the possibility of somehow > >> being able to prevent tampering of the guest. One example he mentioned is > >> a research paper [1] about running the hypervisor itself inside an > >> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved > >> with TDX using secure enclaves or some other mechanism? > > > > Or even just secureboot based root of trust? > > You mean host secureboot? Or guest? > > If it’s host, then the problem is detecting malicious tampering with > host code (whether it’s kernel or hypervisor). Host. Lots of existing systems do this. As an extreme boot a RO disk, limit which packages are allowed. > If it’s guest, at the moment at least, the measurements do not extend > beyond the TCB. > > > > > -- > > MST > > ^ permalink raw reply [flat|nested] 102+ messages in thread
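MST's point above about emulated disks can be made concrete with a toy model (all names are invented for illustration): even if every block is encrypted, the host emulating the device still observes which blocks are accessed and when, which is itself a side channel.

```python
import hashlib

host_observed = []  # what a host emulating the disk can log

def guest_read_block(index: int) -> bytes:
    host_observed.append(index)  # which block, and when, leaks to the host
    # contents stay opaque: model the ciphertext as a hash of the index
    return hashlib.sha256(index.to_bytes(8, "little")).digest()

# A secret-dependent access pattern is visible even though every byte
# returned is "encrypted".
secret_bit = 1
guest_read_block(100 if secret_bit else 200)
assert host_observed == [100]
```

This is the argument behind "forget about emulated devices, require passthrough": for a passed-through device the host no longer sits on the per-request observation point.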
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-01 11:01 ` Michael S. Tsirkin @ 2023-02-01 13:15 ` Christophe de Dinechin Dupont de Dinechin 2023-02-01 16:02 ` Michael S. Tsirkin 2023-02-02 3:24 ` Jason Wang 1 sibling, 1 reply; 102+ messages in thread From: Christophe de Dinechin Dupont de Dinechin @ 2023-02-01 13:15 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote: >> >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: >>> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: >>>> Finally, security considerations that apply irrespective of whether the >>>> platform is confidential or not are also outside of the scope of this >>>> document. This includes topics ranging from timing attacks to social >>>> engineering. >>> >>> Why are timing attacks by hypervisor on the guest out of scope? >> >> Good point. >> >> I was thinking that mitigation against timing attacks is the same >> irrespective of the source of the attack. However, because the HV >> controls CPU time allocation, there are presumably attacks that >> are made much easier through the HV. Those should be listed. > > Not just that, also because it can and does emulate some devices. > For example, are disk encryption systems protected against timing of > disk accesses? 
> This is why some people keep saying "forget about emulated devices, require > passthrough, include devices in the trust zone". > >>> >>>> </doc> >>>> >>>> Feel free to comment and reword at will ;-) >>>> >>>> >>>> 3/ PCI-as-a-threat: where does that come from >>>> >>>> Isn't there a fundamental difference, from a threat model perspective, >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC >>>> should defeat) and compromised software feeding us bad data? I think there >>>> is: at least inside the TCB, we can detect bad software using measurements, >>>> and prevent it from running using attestation. In other words, we first >>>> check what we will run, then we run it. The security there is that we know >>>> what we are running. The trust we have in the software is from testing, >>>> reviewing or using it. >>>> >>>> This relies on a key aspect provided by TDX and SEV, which is that the >>>> software being measured is largely tamper-resistant thanks to memory >>>> encryption. In other words, after you have measured your guest software >>>> stack, the host or hypervisor cannot willy-nilly change it. >>>> >>>> So this brings me to the next question: is there any way we could offer the >>>> same kind of service for KVM and qemu? The measurement part seems relatively >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to >>>> me. But maybe someone else will have a brilliant idea? >>>> >>>> So I'm asking the question, because if you could somehow prove to the guest >>>> not only that it's running the right guest stack (as we can do today) but >>>> also a known host/KVM/hypervisor stack, we would also switch the potential >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and >>>> this is something which is evidently easier to deal with. >>> >>> Agree absolutely that's much easier.
>>> >>>> I briefly discussed this with James, and he pointed out two interesting >>>> aspects of that question: >>>> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We >>>> care about either virtio devices, or physical ones being passed through >>>> to the guest. Let's assume physical ones can be trusted, see above. >>>> That leaves virtio devices. How much damage can a malicious virtio device >>>> do to the guest kernel, and can this lead to secrets being leaked? >>>> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow >>>> being able to prevent tampering of the guest. One example he mentioned is >>>> a research paper [1] about running the hypervisor itself inside an >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved >>>> with TDX using secure enclaves or some other mechanism? >>> >>> Or even just secureboot based root of trust? >> >> You mean host secureboot? Or guest? >> >> If it’s host, then the problem is detecting malicious tampering with >> host code (whether it’s kernel or hypervisor). > > Host. Lots of existing systems do this. As an extreme boot a RO disk, > limit which packages are allowed. Is that provable to the guest? Consider a cloud provider doing that: how do they prove to their guest: a) What firmware, kernel and kvm they run b) That what they booted cannot be maliciously modified, e.g. by a rogue device driver installed by a rogue sysadmin My understanding is that SecureBoot is only intended to prevent non-verified operating systems from booting. So the proof is given to the cloud provider, and the proof is that the system boots successfully. After that, I think all bets are off. SecureBoot does little AFAICT to prevent malicious modifications of the running system by someone with root access, including deliberately loading a malicious kvm-zilog.ko. It does not mean it cannot be done, just that I don’t think we have the tools at the moment.
> >> If it’s guest, at the moment at least, the measurements do not extend >> beyond the TCB. >> >>> >>> -- >>> MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-01 13:15 ` Christophe de Dinechin Dupont de Dinechin @ 2023-02-01 16:02 ` Michael S. Tsirkin 2023-02-01 17:13 ` Christophe de Dinechin 0 siblings, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-02-01 16:02 UTC (permalink / raw) To: Christophe de Dinechin Dupont de Dinechin Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote: > > > > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote: > >> > >> > >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: > >>> > >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: > >>>> Finally, security considerations that apply irrespective of whether the > >>>> platform is confidential or not are also outside of the scope of this > >>>> document. This includes topics ranging from timing attacks to social > >>>> engineering. > >>> > >>> Why are timing attacks by hypervisor on the guest out of scope? > >> > >> Good point. > >> > >> I was thinking that mitigation against timing attacks is the same > >> irrespective of the source of the attack. However, because the HV > >> controls CPU time allocation, there are presumably attacks that > >> are made much easier through the HV. Those should be listed. > > > > Not just that, also because it can and does emulate some devices. 
> > For example, are disk encryption systems protected against timing of > > disk accesses? > > This is why some people keep saying "forget about emulated devices, require > > passthrough, include devices in the trust zone". > > > >>> > >>>> </doc> > >>>> > >>>> Feel free to comment and reword at will ;-) > >>>> > >>>> > >>>> 3/ PCI-as-a-threat: where does that come from > >>>> > >>>> Isn't there a fundamental difference, from a threat model perspective, > >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC > >>>> should defeat) and compromised software feeding us bad data? I think there > >>>> is: at least inside the TCB, we can detect bad software using measurements, > >>>> and prevent it from running using attestation. In other words, we first > >>>> check what we will run, then we run it. The security there is that we know > >>>> what we are running. The trust we have in the software is from testing, > >>>> reviewing or using it. > >>>> > >>>> This relies on a key aspect provided by TDX and SEV, which is that the > >>>> software being measured is largely tamper-resistant thanks to memory > >>>> encryption. In other words, after you have measured your guest software > >>>> stack, the host or hypervisor cannot willy-nilly change it. > >>>> > >>>> So this brings me to the next question: is there any way we could offer the > >>>> same kind of service for KVM and qemu? The measurement part seems relatively > >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to > >>>> me. But maybe someone else will have a brilliant idea?
> >>>> > >>>> So I'm asking the question, because if you could somehow prove to the guest > >>>> not only that it's running the right guest stack (as we can do today) but > >>>> also a known host/KVM/hypervisor stack, we would also switch the potential > >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and > >>>> this is something which is evidently easier to deal with. > >>> > >>> Agree absolutely that's much easier. > >>> > >>>> I briefly discussed this with James, and he pointed out two interesting > >>>> aspects of that question: > >>>> > >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We > >>>> care about either virtio devices, or physical ones being passed through > >>>> to the guest. Let's assume physical ones can be trusted, see above. > >>>> That leaves virtio devices. How much damage can a malicious virtio device > >>>> do to the guest kernel, and can this lead to secrets being leaked? > >>>> > >>>> 2/ He was not as negative as I anticipated on the possibility of somehow > >>>> being able to prevent tampering of the guest. One example he mentioned is > >>>> a research paper [1] about running the hypervisor itself inside an > >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved > >>>> with TDX using secure enclaves or some other mechanism? > >>> > >>> Or even just secureboot based root of trust? > >> > >> You mean host secureboot? Or guest? > >> > >> If it’s host, then the problem is detecting malicious tampering with > >> host code (whether it’s kernel or hypervisor). > > > > Host. Lots of existing systems do this. As an extreme boot a RO disk, > > limit which packages are allowed. > > Is that provable to the guest? > > Consider a cloud provider doing that: how do they prove to their guest: > > a) What firmware, kernel and kvm they run > > b) That what they booted cannot be maliciously modified, e.g.
by a rogue > device driver installed by a rogue sysadmin > > My understanding is that SecureBoot is only intended to prevent non-verified > operating systems from booting. So the proof is given to the cloud provider, > and the proof is that the system boots successfully. I think I should have said measured boot not secure boot. > > After that, I think all bets are off. SecureBoot does little AFAICT > to prevent malicious modifications of the running system by someone with > root access, including deliberately loading a malicious kvm-zilog.ko So disable module loading then or don't allow root access? > > It does not mean it cannot be done, just that I don’t think we > have the tools at the moment. Phones, chromebooks do this all the time ... > > > >> If it’s guest, at the moment at least, the measurements do not extend > >> beyond the TCB. > >> > >>> > >>> -- > >>> MST > ^ permalink raw reply [flat|nested] 102+ messages in thread
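The host-side knobs MST alludes to above ("disable module loading then or don't allow root access") do exist on a stock Linux host. A sketch of what that lockdown could look like, not a complete or tenant-provable recipe (which is precisely the gap the thread identifies):

```shell
# One-way switch: no further module loads until reboot
sysctl -w kernel.modules_disabled=1

# Keep the OS image read-only (pair with dm-verity for integrity)
mount -o remount,ro /

# If the lockdown LSM is built in, restrict even root from modifying
# the running kernel (kexec, /dev/mem, unsigned modules, ...)
echo integrity > /sys/kernel/security/lockdown
```

As the reply below notes, doing this is not the hard part; proving to a tenant that it was done, and stays done, is.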
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-01 16:02 ` Michael S. Tsirkin @ 2023-02-01 17:13 ` Christophe de Dinechin 2023-02-06 18:58 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 102+ messages in thread From: Christophe de Dinechin @ 2023-02-01 17:13 UTC (permalink / raw) To: Michael S. Tsirkin Cc: James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote... > On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote: >> >> >> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote: >> > >> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote: >> >> >> >> >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: >> >>> >> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: >> >>>> Finally, security considerations that apply irrespective of whether the >> >>>> platform is confidential or not are also outside of the scope of this >> >>>> document. This includes topics ranging from timing attacks to social >> >>>> engineering. >> >>> >> >>> Why are timing attacks by hypervisor on the guest out of scope? >> >> >> >> Good point. >> >> >> >> I was thinking that mitigation against timing attacks is the same >> >> irrespective of the source of the attack. However, because the HV >> >> controls CPU time allocation, there are presumably attacks that >> >> are made much easier through the HV. Those should be listed. >> > >> > Not just that, also because it can and does emulate some devices. 
>> > For example, are disk encryption systems protected against timing of >> > disk accesses? >> > This is why some people keep saying "forget about emulated devices, require >> > passthrough, include devices in the trust zone". >> > >> >>> >> >>>> </doc> >> >>>> >> >>>> Feel free to comment and reword at will ;-) >> >>>> >> >>>> >> >>>> 3/ PCI-as-a-threat: where does that come from >> >>>> >> >>>> Isn't there a fundamental difference, from a threat model perspective, >> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC >> >>>> should defeat) and compromised software feeding us bad data? I think there >> >>>> is: at least inside the TCB, we can detect bad software using measurements, >> >>>> and prevent it from running using attestation. In other words, we first >> >>>> check what we will run, then we run it. The security there is that we know >> >>>> what we are running. The trust we have in the software is from testing, >> >>>> reviewing or using it. >> >>>> >> >>>> This relies on a key aspect provided by TDX and SEV, which is that the >> >>>> software being measured is largely tamper-resistant thanks to memory >> >>>> encryption. In other words, after you have measured your guest software >> >>>> stack, the host or hypervisor cannot willy-nilly change it. >> >>>> >> >>>> So this brings me to the next question: is there any way we could offer the >> >>>> same kind of service for KVM and qemu? The measurement part seems relatively >> >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to >> >>>> me. But maybe someone else will have a brilliant idea?
>> >>>> >> >>>> So I'm asking the question, because if you could somehow prove to the guest >> >>>> not only that it's running the right guest stack (as we can do today) but >> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential >> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and >> >>>> this is something which is evidently easier to deal with. >> >>> >> >>> Agree absolutely that's much easier. >> >>> >> >>>> I briefly discussed this with James, and he pointed out two interesting >> >>>> aspects of that question: >> >>>> >> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We >> >>>> care about either virtio devices, or physical ones being passed through >> >>>> to the guest. Let's assume physical ones can be trusted, see above. >> >>>> That leaves virtio devices. How much damage can a malicious virtio device >> >>>> do to the guest kernel, and can this lead to secrets being leaked? >> >>>> >> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow >> >>>> being able to prevent tampering of the guest. One example he mentioned is >> >>>> a research paper [1] about running the hypervisor itself inside an >> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved >> >>>> with TDX using secure enclaves or some other mechanism? >> >>> >> >>> Or even just secureboot based root of trust? >> >> >> >> You mean host secureboot? Or guest? >> >> >> >> If it’s host, then the problem is detecting malicious tampering with >> >> host code (whether it’s kernel or hypervisor). >> > >> > Host. Lots of existing systems do this. As an extreme boot a RO disk, >> > limit which packages are allowed. >> >> Is that provable to the guest? >> >> Consider a cloud provider doing that: how do they prove to their guest: >> >> a) What firmware, kernel and kvm they run >> >> b) That what they booted cannot be maliciously modified, e.g.
by a rogue >> device driver installed by a rogue sysadmin >> >> My understanding is that SecureBoot is only intended to prevent non-verified >> operating systems from booting. So the proof is given to the cloud provider, >> and the proof is that the system boots successfully. > > I think I should have said measured boot not secure boot. The problem again is: how do you prove to the guest that you are not lying? We know how to do that from a guest [1], but you will note that in the normal process, a trusted hardware component (e.g. the PSP for AMD SEV) proves the validity of the measurements of the TCB by signing them with an attestation key derived from some chip-unique secret. For AMD, this is called the VCEK, and TDX has something similar. In the case of SEV, this goes through firmware, and you have to tell the firmware each time you insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied to a VM execution context. I do not believe there is any provision to do the same thing to measure host data. And again, it would be somewhat pointless if there isn't also a mechanism to ensure the host data is not changed after the measurement. Now, I don't think it would be super-difficult to add a firmware service that would let the host do some kind of equivalent to PVALIDATE, setting some physical pages aside that then get measured and become inaccessible to the host. The PSP or similar could then integrate these measurements as part of the TCB, and the fact that the pages were "transferred" to this special invariant block would assure the guests that the code will not change after being measured. I am not aware that such a mechanism exists on any of the existing CC platforms. Please feel free to enlighten me if I'm wrong. [1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow > >> >> After that, I think all bets are off.
SecureBoot does little AFAICT >> to prevent malicious modifications of the running system by someone with >> root access, including deliberately loading a malicious kvm-zilog.ko > > So disable module loading then or don't allow root access? Who would do that? The problem is that we have a host and a tenant, and the tenant does not trust the host in principle. So it is not sufficient for the host to disable module loading or carefully control root access. It is also necessary to prove to the tenant(s) that this was done. > >> >> It does not mean it cannot be done, just that I don’t think we >> have the tools at the moment. > > Phones, chromebooks do this all the time ... Indeed, but there, this is to prove to the phone's real owner (which, surprise, is not the naive person who thought they'd get some kind of ownership by buying the phone) that the software running on the phone has not been replaced by some horribly jailbroken goo. In other words, the user of the phone gets no proof whatsoever of anything, except that the phone appears to work. This is somewhat the situation in the cloud today: the owners of the hardware get all sorts of useful checks, from SecureBoot to error-correction for memory or I/O devices. However, someone running in a VM on the cloud gets none of that, just like the user of your phone. -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
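To make the measurement discussion above concrete: the reason a measured TCB is tamper-evident is that each component is folded into a running measurement register, so the final value depends on every component and on their order. The hash chain below is a minimal illustrative sketch in the style of a TPM PCR extend or SNP_LAUNCH_UPDATE, not the actual PSP or TDX-module algorithm.

```python
import hashlib

def extend(measurement: bytes, component: bytes) -> bytes:
    """Fold one TCB component into the running measurement register:
    the result depends on every component seen so far and their order."""
    return hashlib.sha256(measurement + hashlib.sha256(component).digest()).digest()

def measure_stack(components: list[bytes]) -> bytes:
    m = b"\x00" * 32  # initial (zeroed) measurement register
    for c in components:
        m = extend(m, c)
    return m

# The verifier holds a reference value computed from the known-good stack.
guest_stack = [b"firmware", b"kernel", b"initrd"]
reference = measure_stack(guest_stack)

# Re-measuring the same stack reproduces the reference...
assert measure_stack([b"firmware", b"kernel", b"initrd"]) == reference
# ...while tampering with any measured component changes the final value.
assert measure_stack([b"firmware", b"tampered-kernel", b"initrd"]) != reference
```

This shows why, after launch, the host cannot silently swap out a measured component: any change surfaces in the attested value. It says nothing about host-side code measured outside the guest TCB, which is exactly the gap discussed above.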
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-01 17:13 ` Christophe de Dinechin @ 2023-02-06 18:58 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-02-06 18:58 UTC (permalink / raw) To: Christophe de Dinechin Cc: Michael S. Tsirkin, James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening * Christophe de Dinechin (dinechin@redhat.com) wrote: > > On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote... > > On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote: > >> > >> > >> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > > >> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote: > >> >> > >> >> > >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: > >> >>> > >> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: > >> >>>> Finally, security considerations that apply irrespective of whether the > >> >>>> platform is confidential or not are also outside of the scope of this > >> >>>> document. This includes topics ranging from timing attacks to social > >> >>>> engineering. > >> >>> > >> >>> Why are timing attacks by hypervisor on the guest out of scope? > >> >> > >> >> Good point. > >> >> > >> >> I was thinking that mitigation against timing attacks is the same > >> >> irrespective of the source of the attack. However, because the HV > >> >> controls CPU time allocation, there are presumably attacks that > >> >> are made much easier through the HV. 
Those should be listed. > >> > > >> > Not just that, also because it can and does emulate some devices. > >> > For example, are disk encryption systems protected against timing of > >> > disk accesses? > >> > This is why some people keep saying "forget about emulated devices, require > >> > passthrough, include devices in the trust zone". > >> > > >> >>> > >> >>>> </doc> > >> >>>> > >> >>>> Feel free to comment and reword at will ;-) > >> >>>> > >> >>>> > >> >>>> 3/ PCI-as-a-threat: where does that come from > >> >>>> > >> >>>> Isn't there a fundamental difference, from a threat model perspective, > >> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC > >> >>>> should defeat) and compromised software feeding us bad data? I think there > >> >>>> is: at leats inside the TCB, we can detect bad software using measurements, > >> >>>> and prevent it from running using attestation. In other words, we first > >> >>>> check what we will run, then we run it. The security there is that we know > >> >>>> what we are running. The trust we have in the software is from testing, > >> >>>> reviewing or using it. > >> >>>> > >> >>>> This relies on a key aspect provided by TDX and SEV, which is that the > >> >>>> software being measured is largely tamper-resistant thanks to memory > >> >>>> encryption. In other words, after you have measured your guest software > >> >>>> stack, the host or hypervisor cannot willy-nilly change it. > >> >>>> > >> >>>> So this brings me to the next question: is there any way we could offer the > >> >>>> same kind of service for KVM and qemu? The measurement part seems relatively > >> >>>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to > >> >>>> me. But maybe someone else will have a brilliant idea? 
> >> >>>> > >> >>>> So I'm asking the question, because if you could somehow prove to the guest > >> >>>> not only that it's running the right guest stack (as we can do today) but > >> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential > >> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and > >> >>>> this is something which is evidently easier to deal with. > >> >>> > >> >>> Agree absolutely that's much easier. > >> >>> > >> >>>> I briefly discussed this with James, and he pointed out two interesting > >> >>>> aspects of that question: > >> >>>> > >> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We > >> >>>> care about either virtio devices, or physical ones being passed through > >> >>>> to the guest. Let's assume physical ones can be trusted, see above. > >> >>>> That leaves virtio devices. How much damage can a malicious virtio device > >> >>>> do to the guest kernel, and can this lead to secrets being leaked? > >> >>>> > >> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow > >> >>>> being able to prevent tampering of the guest. One example he mentioned is > >> >>>> a research paper [1] about running the hypervisor itself inside an > >> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved > >> >>>> with TDX using secure enclaves or some other mechanism? > >> >>> > >> >>> Or even just secureboot based root of trust? > >> >> > >> >> You mean host secureboot? Or guest? > >> >> > >> >> If it’s host, then the problem is detecting malicious tampering with > >> >> host code (whether it’s kernel or hypervisor). > >> > > >> > Host. Lots of existing systems do this. As an extreme boot a RO disk, > >> > limit which packages are allowed. > >> > >> Is that provable to the guest? 
> >> > >> Consider a cloud provider doing that: how do they prove to their guest: > >> > >> a) What firmware, kernel and kvm they run > >> > >> b) That what they booted cannot be maliciouly modified, e.g. by a rogue > >> device driver installed by a rogue sysadmin > >> > >> My understanding is that SecureBoot is only intended to prevent non-verified > >> operating systems from booting. So the proof is given to the cloud provider, > >> and the proof is that the system boots successfully. > > > > I think I should have said measured boot not secure boot. > > The problem again is how you prove to the guest that you are not lying? > > We know how to do that from a guest [1], but you will note that in the > normal process, a trusted hardware component (e.g. the PSP for AMD SEV) > proves the validity of the measurements of the TCB by encrypting it with an > attestation signing key derived from some chip-unique secret. For AMD, this > is called the VCEK, and TDX has something similar. In the case of SEV, this > goes through firmware, and you have to tell the firmware each time you > insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied > to a VM execution context. I do not believe there is any provision to do the > same thing to measure host data. And again, it would be somewhat pointless > if there isn't also a mechanism to ensure the host data is not changed after > the measurement. > > Now, I don't think it would be super-difficult to add a firmware service > that would let the host do some kind of equivalent to PVALIDATE, setting > some physical pages aside that then get measured and become inaccessible to > the host. The PSP or similar could then integrate these measurements as part > of the TCB, and the fact that the pages were "transferred" to this special > invariant block would ensure the guests that the code will not change after > being measured. > > I am not aware that such a mechanism exists on any of the existing CC > platforms. 
Please feel free to enlighten me if I'm wrong. > > [1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow > > > >> > >> After that, I think all bets are off. SecureBoot does little AFAICT > >> to prevent malicious modifications of the running system by someone with > >> root access, including deliberately loading a malicious kvm-zilog.ko > > > > So disable module loading then or don't allow root access? > > Who would do that? > > The problem is that we have a host and a tenant, and the tenant does not > trust the host in principle. So it is not sufficient for the host to disable > module loading or carefully control root access. It is also necessary to > prove to the tenant(s) that this was done. > > > > >> > >> It does not mean it cannot be done, just that I don’t think we > >> have the tools at the moment. > > > > Phones, chromebooks do this all the time ... > > Indeed, but there, this is to prove to the phone's real owner (which, > surprise, is not the naive person who thought they'd get some kind of > ownership by buying the phone) that the software running on the phone has > not been replaced by some horribly jailbreaked goo. > > In other words, the user of the phone gets no proof whatsoever of anything, > except that the phone appears to work. This is somewhat the situation in the > cloud today: the owners of the hardware get all sorts of useful checks, from > SecureBoot to error-correction for memory or I/O devices. However, someone > running in a VM on the cloud gets none of that, just like the user of your > phone. Assuming you do a measured boot, the host OS and firmware is measured into the host TPM; people have thought in the past about triggering attestations of the host from the guest; then you could have something external attest the host and only release keys to the guests disks if the attestation is correct; or a key for the guests disks held in the hosts TPM. 
Dave > -- > Cheers, > Christophe de Dinechin (https://c3d.github.io) > Theory of Incomplete Measurements (https://c3d.github.io/TIM) > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
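The flow Dave describes, where something external attests the host and only releases the guests' disk keys if the attestation is correct, reduces to a policy decision at a relying party. The toy key broker below sketches only that decision; in a real deployment the measurement would arrive inside a signed attestation report (a TPM quote or PSP report) rather than as a bare hash, and all names here are invented for illustration.

```python
import hashlib
import secrets
from typing import Optional

def measure(blob: bytes) -> str:
    """Stand-in for a host measurement as it would appear in an attestation report."""
    return hashlib.sha256(blob).hexdigest()

class KeyBroker:
    """Toy relying party: releases a guest disk key only when the attested
    host measurement matches a known-good reference value."""
    def __init__(self, reference_measurement: str):
        self.reference = reference_measurement
        self.disk_key = secrets.token_bytes(32)

    def request_key(self, attested_measurement: str) -> Optional[bytes]:
        if attested_measurement == self.reference:
            return self.disk_key
        return None  # attestation failed: withhold the key

good_host = b"known-good firmware+kernel+kvm"
broker = KeyBroker(measure(good_host))

assert broker.request_key(measure(good_host)) == broker.disk_key
assert broker.request_key(measure(b"host with rogue kvm-zilog.ko")) is None
```

The hard parts omitted here are precisely the ones debated in the thread: proving the reported measurement is fresh and genuinely rooted in hardware, and ensuring the host cannot change after being measured.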
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-01 11:01 ` Michael S. Tsirkin 2023-02-01 13:15 ` Christophe de Dinechin Dupont de Dinechin @ 2023-02-02 3:24 ` Jason Wang 1 sibling, 0 replies; 102+ messages in thread From: Jason Wang @ 2023-02-02 3:24 UTC (permalink / raw) To: Michael S. Tsirkin, Christophe de Dinechin Dupont de Dinechin Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On 2023/2/1 19:01, Michael S. Tsirkin wrote: > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote: >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote: >>> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote: >>>> Finally, security considerations that apply irrespective of whether the >>>> platform is confidential or not are also outside of the scope of this >>>> document. This includes topics ranging from timing attacks to social >>>> engineering. >>> Why are timing attacks by hypervisor on the guest out of scope? >> Good point. >> >> I was thinking that mitigation against timing attacks is the same >> irrespective of the source of the attack. However, because the HV >> controls CPU time allocation, there are presumably attacks that >> are made much easier through the HV. Those should be listed. > Not just that, also because it can and does emulate some devices. > For example, are disk encryption systems protected against timing of > disk accesses? > This is why some people keep saying "forget about emulated devices, require > passthrough, include devices in the trust zone". 
One problem is that the device could be yet another emulated one that is running in the SmartNIC/DPU itself. Thanks ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 15:14 ` Christophe de Dinechin 2023-01-31 17:39 ` Michael S. Tsirkin @ 2023-02-01 10:24 ` Christophe de Dinechin 1 sibling, 0 replies; 102+ messages in thread From: Christophe de Dinechin @ 2023-02-01 10:24 UTC (permalink / raw) To: jejb Cc: Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening I typoed a lot in this email... On 2023-01-31 at 16:14 +01, Christophe de Dinechin <dinechin@redhat.com> wrote... > On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote... >> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote: >>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: >>> > [...] >>> > > > The big threat from most devices (including the thunderbolt >>> > > > classes) is that they can DMA all over memory. However, this >>> > > > isn't really a threat in CC (well until PCI becomes able to do >>> > > > encrypted DMA) because the device has specific unencrypted >>> > > > buffers set aside for the expected DMA. If it writes outside >>> > > > that CC integrity will detect it and if it reads outside that >>> > > > it gets unintelligible ciphertext. So we're left with the >>> > > > device trying to trick secrets out of us by returning >>> > > > unexpected data. >>> > > >>> > > Yes, by supplying the input that hasn’t been expected. 
This is >>> > > exactly the case we were trying to fix here for example: >>> > > https://lore.kernel.org/all/20230119170633.40944-2- >>> > alexander.shishkin@linux.intel.com/ >>> > > I do agree that this case is less severe when others where memory >>> > > corruption/buffer overrun can happen, like here: >>> > > https://lore.kernel.org/all/20230119135721.83345-6- >>> > alexander.shishkin@linux.intel.com/ >>> > > But we are trying to fix all issues we see now (prioritizing the >>> > > second ones though). >>> > >>> > I don't see how MSI table sizing is a bug in the category we've >>> > defined. The very text of the changelog says "resulting in a >>> > kernel page fault in pci_write_msg_msix()." which is a crash, >>> > which I thought we were agreeing was out of scope for CC attacks? >>> >>> As I said this is an example of a crash and on the first look >>> might not lead to the exploitable condition (albeit attackers are >>> creative). But we noticed this one while fuzzing and it was common >>> enough that prevented fuzzer going deeper into the virtio devices >>> driver fuzzing. The core PCI/MSI doesn’t seem to have that many >>> easily triggerable Other examples in virtio patchset are more severe. >> >> You cited this as your example. I'm pointing out it seems to be an >> event of the class we've agreed not to consider because it's an oops >> not an exploit. If there are examples of fixing actual exploits to CC >> VMs, what are they? >> >> This patch is, however, an example of the problem everyone else on the >> thread is complaining about: a patch which adds an unnecessary check to >> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and >> in the real world the tables are correct (or the manufacturer is >> quickly chastened), so it adds overhead to no benefit. > > I'd like to backtrack a little here. > > > 1/ PCI-as-a-thread, where does it come from? PCI-as-a-threat > > On physical devices, we have to assume that the device is working. 
As other > pointed out, there are things like PCI compliance tests, etc. So Linux has > to trust the device. You could manufacture a broken device intentionally, > but the value you would get from that would be limited. > > On a CC system, the "PCI" values are really provided by the hypervisor, > which is not trusted. This leads to this peculiar way of thinking where we > say "what happens if virtual device feeds us a bogus value *intentionally*". > We cannot assume that the *virtual* PCI device ran through the compliance > tests. Instead, we see the PCI interface as hostile, which makes us look > like weirdos to the rest of the community. > > Consequently, as James pointed out, we first need to focus on consequences > that would break what I would call the "CC promise", which is essentially > that we'd rather kill the guest than reveal its secrets. Unless you have a > credible path to a secret being revealed, don't bother "fixing" a bug. And > as was pointed out elsewhere in this thread, caching has a cost, so you > can't really use the "optimization" angle either. > > > 2/ Clarification of the "CC promise" and value proposition > > Based on the above, the very first thing is to clarify that "CC promise", > because if exchanges on this thread have proved anything, it is that it's > quite unclear to anyone outside the "CoCo world". > > The Linux Guest Kernel Security Specification needs to really elaborate on > what the value proposition of CC is, not assume it is a given. "Bug fixes" > before this value proposition has been understood and accepted by the > non-CoCo community are likely to go absolutely nowhere. > > Here is a quick proposal for the Purpose and Scope section: > > <doc> > Purpose and Scope > > Confidential Computing (CC) is a set of technologies that allows a guest to > run without having to trust either the hypervisor or the host. 
CC offers two > new guarantees to the guest compared to the non-CC case: > > a) The guest will be able to measure and attest, by cryptographic means, the > guest software stack that it is running, and be assured that this > software stack cannot be tampered with by the host or the hypervisor > after it was measured. The root of trust for this aspect of CC is > typically the CPU manufacturer (e.g. through a private key that can be > used to respond to cryptographic challenges). > > b) Guest state, including memory, become secrets which must remain > inaccessible to the host. In a CC context, it is considered preferable to > stop or kill a guest rather than risk leaking its secrets. This aspect of > CC is typically enforced by means such as memory encryption and new > semantics for memory protection. > > CC leads to a different threat model for a Linux kernel running as a guest > inside a confidential virtual machine (CVM). Notably, whereas the machine > (CPU, I/O devices, etc) is usually considered as trustworthy, in the CC > case, the hypervisor emulating some aspects of the virtual machine is now > considered as potentially malicious. Consequently, effects of any data > provided by the guest to the hypervisor, including ACPI configuration to the guest by the hypervisor > tables, MMIO interfaces or machine specific registers (MSRs) need to be > re-evaluated. > > This document describes the security architecture of the Linux guest kernel > running inside a CVM, with a particular focus on the Intel TDX > implementation. Many aspects of this document will be applicable to other > CC implementations such as AMD SEV. > > Aspects of the guest-visible state that are under direct control of the > hardware, such as the CPU state or memory protection, will be considered as > being handled by the CC implementations. This document will therefore only > focus on aspects of the virtual machine that are typically managed by the > hypervisor or the host. 
> > Since the host ultimately owns the resources and can allocate them at will, > including denying their use at any point, this document will not address > denial or service or performance degradation. It will however cover random > number generation, which is central for cryptographic security. > > Finally, security considerations that apply irrespective of whether the > platform is confidential or not are also outside of the scope of this > document. This includes topics ranging from timing attacks to social > engineering. > </doc> > > Feel free to comment and reword at will ;-) > > > 3/ PCI-as-a-threat: where does that come from 3/ Can we shift from "malicious" hypervisor/host input to "bogus" input? > > Isn't there a fundamental difference, from a threat model perspective, > between a bad actor, say a rogue sysadmin dumping the guest memory (which CC > should defeat) and compromised software feeding us bad data? I think there > is: at leats inside the TCB, we can detect bad software using measurements, > and prevent it from running using attestation. In other words, we first > check what we will run, then we run it. The security there is that we know > what we are running. The trust we have in the software is from testing, > reviewing or using it. > > This relies on a key aspect provided by TDX and SEV, which is that the > software being measured is largely tamper-resistant thanks to memory > encryption. In other words, after you have measured your guest software > stack, the host or hypervisor cannot willy-nilly change it. > > So this brings me to the next question: is there any way we could offer the > same kind of service for KVM and qemu? The measurement part seems relatively > easy. Thetamper-resistant part, on the other hand, seems quite difficult to > me. But maybe someone else will have a brilliant idea? 
> > So I'm asking the question, because if you could somehow prove to the guest > not only that it's running the right guest stack (as we can do today) but > also a known host/KVM/hypervisor stack, we would also switch the potential > issues with PCI, MSRs and the like from "malicious" to merely "bogus", and > this is something which is evidently easier to deal with. > > I briefly discussed this with James, and he pointed out two interesting > aspects of that question: > > 1/ In the CC world, we don't really care about *virtual* PCI devices. We > care about either virtio devices, or physical ones being passed through > to the guest. Let's assume physical ones can be trusted, see above. > That leaves virtio devices. How much damage can a malicious virtio device > do to the guest kernel, and can this lead to secrets being leaked? > > 2/ He was not as negative as I anticipated on the possibility of somehow > being able to prevent tampering of the guest. One example he mentioned is > a research paper [1] about running the hypervisor itself inside an > "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved > with TDX using secure enclaves or some other mechanism? > > > Sorry, this mail is a bit long ;-) and was a bit rushed too... > > >> >> >> [...] >>> > see what else it could detect given the signal will be smothered by >>> > oopses and secondly I think the PCI interface is likely the wrong >>> > place to begin and you should probably begin on the virtio bus and >>> > the hypervisor generated configuration space. >>> >>> This is exactly what we do. We don’t fuzz from the PCI config space, >>> we supply inputs from the host/vmm via the legitimate interfaces that >>> it can inject them to the guest: whenever guest requests a pci config >>> space (which is controlled by host/hypervisor as you said) read >>> operation, it gets input injected by the kafl fuzzer. 
Same for other >>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO, >>> anything that goes via #VE handler in our case). When it comes to >>> virtio, we employ two different fuzzing techniques: directly >>> injecting kafl fuzz input when virtio core or virtio drivers gets the >>> data received from the host (via injecting input in functions >>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory >>> pages using kfx fuzzer. More information can be found in >>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing >> >> Given that we previously agreed that oppses and other DoS attacks are >> out of scope for CC, I really don't think fuzzing, which primarily >> finds oopses, is at all a useful tool unless you filter the results by >> the question "could we exploit this in a CC VM to reveal secrets". >> Without applying that filter you're sending a load of patches which >> don't really do much to reduce the CC attack surface and which do annoy >> non-CC people because they add pointless checks to things they expect >> the cards and config tables to get right. > > Indeed. > > [1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592 -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-31 13:28 ` James Bottomley 2023-01-31 15:14 ` Christophe de Dinechin @ 2023-01-31 16:34 ` Reshetova, Elena 2023-01-31 17:49 ` James Bottomley 1 sibling, 1 reply; 102+ messages in thread From: Reshetova, Elena @ 2023-01-31 16:34 UTC (permalink / raw) To: jejb, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote: > > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: > > > [...] > > > > > The big threat from most devices (including the thunderbolt > > > > > classes) is that they can DMA all over memory. However, this > > > > > isn't really a threat in CC (well until PCI becomes able to do > > > > > encrypted DMA) because the device has specific unencrypted > > > > > buffers set aside for the expected DMA. If it writes outside > > > > > that CC integrity will detect it and if it reads outside that > > > > > it gets unintelligible ciphertext. So we're left with the > > > > > device trying to trick secrets out of us by returning > > > > > unexpected data. > > > > > > > > Yes, by supplying the input that hasn’t been expected. 
This is > > > > exactly the case we were trying to fix here for example: > > > > https://lore.kernel.org/all/20230119170633.40944-2- > > > alexander.shishkin@linux.intel.com/ > > > > I do agree that this case is less severe when others where memory > > > > corruption/buffer overrun can happen, like here: > > > > https://lore.kernel.org/all/20230119135721.83345-6- > > > alexander.shishkin@linux.intel.com/ > > > > But we are trying to fix all issues we see now (prioritizing the > > > > second ones though). > > > > > > I don't see how MSI table sizing is a bug in the category we've > > > defined. The very text of the changelog says "resulting in a > > > kernel page fault in pci_write_msg_msix()." which is a crash, > > > which I thought we were agreeing was out of scope for CC attacks? > > > > As I said this is an example of a crash and on the first look > > might not lead to the exploitable condition (albeit attackers are > > creative). But we noticed this one while fuzzing and it was common > > enough that prevented fuzzer going deeper into the virtio devices > > driver fuzzing. The core PCI/MSI doesn’t seem to have that many > > easily triggerable Other examples in virtio patchset are more severe. > > You cited this as your example. I'm pointing out it seems to be an > event of the class we've agreed not to consider because it's an oops > not an exploit. If there are examples of fixing actual exploits to CC > VMs, what are they? > > This patch is, however, an example of the problem everyone else on the > thread is complaining about: a patch which adds an unnecessary check to > the MSI subsystem; unnecessary because it doesn't fix a CC exploit and > in the real world the tables are correct (or the manufacturer is > quickly chastened), so it adds overhead to no benefit. How can you make sure there is no exploit possible using this crash as a stepping stone into a CC guest? 
Or are you saying that we are back to the times when we can merge fixes for crashes and out-of-bounds errors in the kernel only if we submit a proof-of-concept exploit with the patch for every issue? > > > [...] > > > see what else it could detect given the signal will be smothered by > > > oopses and secondly I think the PCI interface is likely the wrong > > > place to begin and you should probably begin on the virtio bus and > > > the hypervisor generated configuration space. > > > > This is exactly what we do. We don’t fuzz from the PCI config space, > > we supply inputs from the host/vmm via the legitimate interfaces that > > it can inject them to the guest: whenever guest requests a pci config > > space (which is controlled by host/hypervisor as you said) read > > operation, it gets input injected by the kafl fuzzer. Same for other > > interfaces that are under control of host/VMM (MSRs, port IO, MMIO, > > anything that goes via #VE handler in our case). When it comes to > > virtio, we employ two different fuzzing techniques: directly > > injecting kafl fuzz input when virtio core or virtio drivers gets the > > data received from the host (via injecting input in functions > > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory > > pages using kfx fuzzer. More information can be found in > > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest- > hardening.html#td-guest-fuzzing > > Given that we previously agreed that oppses and other DoS attacks are > out of scope for CC, I really don't think fuzzing, which primarily > finds oopses, is at all a useful tool unless you filter the results by > the question "could we exploit this in a CC VM to reveal secrets". > Without applying that filter you're sending a load of patches which > don't really do much to reduce the CC attack surface and which do annoy > non-CC people because they add pointless checks to things they expect > the cards and config tables to get right. 
I don’t think we have agreed that random kernel crashes are out of scope in the CC threat model (a controlled safe panic is out of scope, but that is not what we have here). It all depends on whether this oops can be used in a successful attack against guest private memory or not, and this is *not* a trivial thing to decide. That said, we are mostly focusing on KASAN findings, which have a higher likelihood of being exploitable, at least for host -> guest privilege escalation (which in turn compromises guest private memory confidentiality). Fuzzing has a long history of finding such issues (including ones that were later exploited). But even for this oops bug, can anyone guarantee it cannot be chained with other ones to cause a more complex privilege escalation attack? I won't be making such a claim; I feel it is safer to fix this than to debate whether it can be used for an attack or not. Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
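For reference, the kind of hardening check being debated here, the MSI-X table-size fix, amounts to validating a hypervisor-supplied size before it is ever used to index a table, so that a bogus value produces a clean error instead of a wild access later in pci_write_msg_msix(). The schematic model below is not the kernel code; the 2048-entry cap mirrors the PCI MSI-X architectural limit, and everything else is invented for illustration.

```python
MSIX_TABLE_CAPACITY = 2048  # architectural maximum number of MSI-X entries

class MsixTable:
    """Toy MSI-X table whose advertised size comes from (virtual) PCI
    config space, i.e. from the untrusted hypervisor in a CC guest."""
    def __init__(self, advertised_entries: int):
        # Harden: reject a size the device cannot legally advertise,
        # instead of trusting it and faulting later on the write path.
        if not 1 <= advertised_entries <= MSIX_TABLE_CAPACITY:
            raise ValueError("bogus MSI-X table size from config space")
        self.entries = [0] * advertised_entries

    def write_msg(self, index: int, data: int) -> None:
        # Analogous to the MSI-X message write: with the size validated
        # up front, an index can no longer run past the real table.
        if not 0 <= index < len(self.entries):
            raise IndexError("MSI vector outside advertised table")
        self.entries[index] = data

tbl = MsixTable(8)
tbl.write_msg(3, 0xFEE00000)
assert tbl.entries[3] == 0xFEE00000

try:
    MsixTable(1 << 20)  # hypervisor lies about the table size
except ValueError:
    pass                # rejected at the trust boundary, no crash later
```

The disagreement in the thread is not about whether such a check works, but about whether a crash it prevents was ever more than a DoS, and whether non-CC kernels should pay for it.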
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 16:34 ` Reshetova, Elena @ 2023-01-31 17:49 ` James Bottomley 0 siblings, 0 replies; 102+ messages in thread From: James Bottomley @ 2023-01-31 17:49 UTC (permalink / raw) To: Reshetova, Elena, Leon Romanovsky Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Tue, 2023-01-31 at 16:34 +0000, Reshetova, Elena wrote: [...] > > You cited this as your example. I'm pointing out it seems to be an > > event of the class we've agreed not to consider because it's an > > oops not an exploit. If there are examples of fixing actual > > exploits to CC VMs, what are they? > > > > This patch is, however, an example of the problem everyone else on > > the thread is complaining about: a patch which adds an unnecessary > > check to the MSI subsystem; unnecessary because it doesn't fix a CC > > exploit and in the real world the tables are correct (or the > > manufacturer is quickly chastened), so it adds overhead to no > > benefit. > > How can you make sure there is no exploit possible using this crash > as a stepping stone into a CC guest? I'm not, what I'm saying is you haven't proved it can be used to exfiltrate secrets. In a world where the PCI device is expected to be correct, and the non-CC kernel doesn't want to second guess that, there are loads of lies you can tell to the PCI subsystem that causes a crash or a hang. If we fix every one, we end up with a massive patch set and a huge potential slow down for the non-CC kernel. 
If there's no way to tell which lies might leak data, the fuzzing results are a mass of noise with no real signal, and we can't even quantify by how much (or even if) we've improved the CC VM attack surface even after we merge the huge patch set it generates. > Or are you saying that we are back to the times when we can merge > fixes for crashes and out-of-bounds errors in the kernel only if > we submit a proof-of-concept exploit with the patch for every > issue? The PCI people have already said that crashing in the face of bogus configuration data is expected behaviour, so just generating the crash doesn't prove there's a problem to be fixed. That means you do have to go beyond that and demonstrate there could be an information leak in a CC VM on the back of it, yes. > > [...] > > > > see what else it could detect given the signal will be > > > > smothered by oopses and secondly I think the PCI interface is > > > > likely the wrong place to begin and you should probably begin > > > > on the virtio bus and the hypervisor generated configuration > > > > space. > > > > > > This is exactly what we do. We don’t fuzz from the PCI config > > > space, we supply inputs from the host/VMM via the legitimate > > > interfaces through which it can inject them into the guest: whenever the guest > > > requests a PCI config space (which is controlled by the > > > host/hypervisor, as you said) read operation, it gets input > > > injected by the kAFL fuzzer. Same for other interfaces that are > > > under the control of the host/VMM (MSRs, port IO, MMIO, anything that > > > goes via the #VE handler in our case). When it comes to virtio, we > > > employ two different fuzzing techniques: directly injecting kAFL > > > fuzz input when the virtio core or virtio drivers get the data > > > received from the host (via injecting input in functions > > > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory > > > pages using the kfx fuzzer.
More information can be found in > > > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing > > > > Given that we previously agreed that oopses and other DoS attacks > > are out of scope for CC, I really don't think fuzzing, which > > primarily finds oopses, is at all a useful tool unless you filter > > the results by the question "could we exploit this in a CC VM to > > reveal secrets". Without applying that filter you're sending a load > > of patches which don't really do much to reduce the CC attack > > surface and which do annoy non-CC people because they add pointless > > checks to things they expect the cards and config tables to get > > right. > > I don’t think we have agreed that random kernel crashes are out of > scope in the CC threat model (a controlled safe panic is out of scope, but > this is not what we have here). So perhaps making it a controlled panic in the CC VM, so we can guarantee no information leak, would be the first place to start? > It all depends on whether this oops can be used in a successful attack against > guest private memory or not, and this is *not* a trivial thing to > decide. Right, but if you can't decide that, you can't extract the signal from your fuzzing tool's noise. > That said, we are mostly focusing on KASAN findings, which > have a higher likelihood of being exploitable, at least for host -> guest > privilege escalation (which in turn compromises guest private memory > confidentiality). Fuzzing has a long history of finding such issues in the > past (including ones that have been exploited afterwards). But even > for this oops bug, can anyone guarantee it cannot be chained with > other ones to cause a more complex privilege escalation attack? > I won't be making such a claim; I feel it is safer to fix this vs. > debating whether it can be used for an attack or not.
The PCI people have already been clear that adding a huge framework of checks to PCI table parsing simply for the promise it "might possibly" improve CC VM security is way too much effort for too little result. If you can hone that down to a few places where you can show it will prevent a CC information leak, I'm sure they'll be more receptive. Telling them to disprove your assertion that there might be an exploit here isn't going to make them change their minds. James ^ permalink raw reply [flat|nested] 102+ messages in thread
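For reference, the MSI-X table sizing issue at the center of this exchange can be modelled in a few lines of plain C. The `PCI_MSIX_FLAGS_QSIZE` mask is the real value from the PCI spec (an 11-bit Table Size field encoding size minus one); the validation helper is a hypothetical sketch of the kind of check being debated, not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_FLAGS_QSIZE 0x07FF   /* 11-bit Table Size field mask (PCI spec) */

/* Table Size is encoded as N-1, so the decoded size is 1..2048.
 * In a CC guest, msg_control comes from the host-controlled config space. */
static int msix_table_size(uint16_t msg_control)
{
    return (msg_control & PCI_MSIX_FLAGS_QSIZE) + 1;
}

/* Hardening sketch: refuse a reported size larger than what the guest
 * actually mapped, instead of indexing past the table (the reported
 * pci_write_msg_msix() page fault). */
static int msix_validate_size(uint16_t msg_control, int mapped_entries)
{
    int n = msix_table_size(msg_control);
    if (n > mapped_entries)
        return -1;
    return n;
}
```

Whether a check of this shape belongs in common PCI code or only behind a CC-specific gate is precisely the disagreement in this thread.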
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 11:31 ` Reshetova, Elena 2023-01-31 13:28 ` James Bottomley @ 2023-02-02 14:51 ` Jeremi Piotrowski 2023-02-03 14:05 ` Reshetova, Elena 1 sibling, 1 reply; 102+ messages in thread From: Jeremi Piotrowski @ 2023-02-02 14:51 UTC (permalink / raw) To: Reshetova, Elena Cc: jejb, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote: > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: > > [...] > > > > The big threat from most devices (including the thunderbolt > > > > classes) is that they can DMA all over memory. However, this isn't > > > > really a threat in CC (well until PCI becomes able to do encrypted > > > > DMA) because the device has specific unencrypted buffers set aside > > > > for the expected DMA. If it writes outside that CC integrity will > > > > detect it and if it reads outside that it gets unintelligible > > > > ciphertext. So we're left with the device trying to trick secrets > > > > out of us by returning unexpected data. > > > > > > Yes, by supplying the input that hasn’t been expected. 
This is > > > exactly the case we were trying to fix here, for example: > > > https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/ > > > I do agree that this case is less severe than others where memory > > > corruption/buffer overrun can happen, like here: > > > https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/ > > > But we are trying to fix all issues we see now (prioritizing the > > > second ones though). > > > > I don't see how MSI table sizing is a bug in the category we've > > defined. The very text of the changelog says "resulting in a kernel > > page fault in pci_write_msg_msix()." which is a crash, which I thought > > we were agreeing was out of scope for CC attacks? > > As I said, this is an example of a crash that at first look > might not lead to an exploitable condition (albeit attackers are creative). > But we noticed this one while fuzzing and it was common enough > that it prevented the fuzzer from going deeper into virtio device driver fuzzing. > The core PCI/MSI doesn’t seem to have that many easily triggerable bugs. > Other examples in the virtio patchset are more severe. > > > > > > > > > > > If I set this as the problem, verifying correct device operation is > > > > a possible solution (albeit hugely expensive) but there are likely > > > > many other cheaper ways to defeat or detect a device trying to > > > > trick us into revealing something. > > > > > > What do you have in mind here for the actual devices we need to > > > enable for CC cases? > > > > Well, the most dangerous devices seem to be the virtio set a CC system > > will rely on to boot up. After that, there are other ways (like SPDM) > > to verify a real PCI device is on the other end of the transaction. > > Yes, in the future, but not yet. Other vendors will not necessarily be > using virtio devices at this point, so we will have non-virtio and non-CC-enabled > devices that we want to securely add to the guest.
> > > > We have been using here a combination of extensive fuzzing and static > > > code analysis. > > > > by fuzzing, I assume you mean fuzzing from the PCI configuration space? > > Firstly, I'm not so sure how useful a tool fuzzing is if we take oopses > > off the table, because fuzzing primarily triggers those > > If you enable memory sanitizers you can detect more severe conditions like > out-of-bounds accesses and such. I think given that we have a way to > verify that fuzzing is reaching the code locations we want it to reach, it > can be a pretty effective method to find at least low-hanging bugs. And these > will be the bugs that most attackers will go after in the first place. > But of course it is not a formal verification of any kind. > > so it's hard to > > see what else it could detect given the signal will be smothered by > > oopses and secondly I think the PCI interface is likely the wrong place > > to begin and you should probably begin on the virtio bus and the > > hypervisor generated configuration space. > > This is exactly what we do. We don’t fuzz from the PCI config space, > we supply inputs from the host/VMM via the legitimate interfaces through which it can > inject them into the guest: whenever the guest requests a PCI config space > (which is controlled by the host/hypervisor, as you said) read operation, > it gets input injected by the kAFL fuzzer. Same for other interfaces that > are under the control of the host/VMM (MSRs, port IO, MMIO, anything that goes > via the #VE handler in our case). When it comes to virtio, we employ > two different fuzzing techniques: directly injecting kAFL fuzz input when > the virtio core or virtio drivers get the data received from the host > (via injecting input in functions virtio16/32/64_to_cpu and others) and > directly fuzzing DMA memory pages using the kfx fuzzer. > More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing > > Best Regards, > Elena.
Hi Elena, I think it might be a good idea to narrow down a configuration that *can* reasonably be hardened to be suitable for confidential computing, before proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices in the context of virtualization, but what about taking PCI out of scope completely by switching to virtio-mmio devices? Jeremi ^ permalink raw reply [flat|nested] 102+ messages in thread
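The injection technique described in the quoted text (hooking virtio16/32/64_to_cpu so that every value the virtio core reads from host-shared memory can be replaced with fuzzer input) can be sketched roughly as follows. The function and variable names are illustrative, not the real kAFL hooks, and the byte-order conversion that the real helpers perform is elided:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int fuzzing_enabled;                 /* toggled by the harness */
static uint16_t (*fuzz_fetch16)(void);      /* fuzzer input source */

/* Deterministic stand-in for the fuzzer's input stream. */
static uint16_t next_fuzz = 0xdead;
static uint16_t fuzz_fetch16_stub(void) { return next_fuzz; }

/* Model of the wrapped helper: every 16-bit value coming from the host
 * funnels through here, so one hook covers all consumers of host data. */
static uint16_t fake_virtio16_to_cpu(uint16_t wire_val)
{
    if (fuzzing_enabled && fuzz_fetch16)
        return fuzz_fetch16();              /* replace host data with fuzz input */
    return wire_val;                        /* normal path */
}
```

The attraction of this placement is coverage: rather than enumerating every driver-specific read of shared memory, the single conversion helper is the narrow waist all of them pass through.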
* RE: Linux guest kernel threat model for Confidential Computing 2023-02-02 14:51 ` Jeremi Piotrowski @ 2023-02-03 14:05 ` Reshetova, Elena 0 siblings, 0 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-02-03 14:05 UTC (permalink / raw) To: Jeremi Piotrowski Cc: jejb, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote: > > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote: > > > [...] > > > > > The big threat from most devices (including the thunderbolt > > > > > classes) is that they can DMA all over memory. However, this isn't > > > > > really a threat in CC (well until PCI becomes able to do encrypted > > > > > DMA) because the device has specific unencrypted buffers set aside > > > > > for the expected DMA. If it writes outside that CC integrity will > > > > > detect it and if it reads outside that it gets unintelligible > > > > > ciphertext. So we're left with the device trying to trick secrets > > > > > out of us by returning unexpected data. > > > > > > > > Yes, by supplying the input that hasn’t been expected. This is > > > > exactly the case we were trying to fix here for example: > > > > https://lore.kernel.org/all/20230119170633.40944-2- > > > alexander.shishkin@linux.intel.com/ > > > > I do agree that this case is less severe when others where memory > > > > corruption/buffer overrun can happen, like here: > > > > https://lore.kernel.org/all/20230119135721.83345-6- > > > alexander.shishkin@linux.intel.com/ > > > > But we are trying to fix all issues we see now (prioritizing the > > > > second ones though). 
> > > > > > I don't see how MSI table sizing is a bug in the category we've > > > defined. The very text of the changelog says "resulting in a kernel > > > page fault in pci_write_msg_msix()." which is a crash, which I thought > > > we were agreeing was out of scope for CC attacks? > > > > As I said this is an example of a crash and on the first look > > might not lead to the exploitable condition (albeit attackers are creative). > > But we noticed this one while fuzzing and it was common enough > > that prevented fuzzer going deeper into the virtio devices driver fuzzing. > > The core PCI/MSI doesn’t seem to have that many easily triggerable > > Other examples in virtio patchset are more severe. > > > > > > > > > > > > > > > If I set this as the problem, verifying device correct operation is > > > > > a possible solution (albeit hugely expensive) but there are likely > > > > > many other cheaper ways to defeat or detect a device trying to > > > > > trick us into revealing something. > > > > > > > > What do you have in mind here for the actual devices we need to > > > > enable for CC cases? > > > > > > Well, the most dangerous devices seem to be the virtio set a CC system > > > will rely on to boot up. After that, there are other ways (like SPDM) > > > to verify a real PCI device is on the other end of the transaction. > > > > Yes, it the future, but not yet. Other vendors will not necessary be > > using virtio devices at this point, so we will have non-virtio and not > > CC enabled devices that we want to securely add to the guest. > > > > > > > > > We have been using here a combination of extensive fuzzing and static > > > > code analysis. > > > > > > by fuzzing, I assume you mean fuzzing from the PCI configuration space? 
> > > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses > > > off the table because fuzzing primarily triggers those > > > > If you enable memory sanitizers you can detect more server conditions like > > out of bounds accesses and such. I think given that we have a way to > > verify that fuzzing is reaching the code locations we want it to reach, it > > can be pretty effective method to find at least low-hanging bugs. And these > > will be the bugs that most of the attackers will go after at the first place. > > But of course it is not a formal verification of any kind. > > > > so its hard to > > > see what else it could detect given the signal will be smothered by > > > oopses and secondly I think the PCI interface is likely the wrong place > > > to begin and you should probably begin on the virtio bus and the > > > hypervisor generated configuration space. > > > > This is exactly what we do. We don’t fuzz from the PCI config space, > > we supply inputs from the host/vmm via the legitimate interfaces that it can > > inject them to the guest: whenever guest requests a pci config space > > (which is controlled by host/hypervisor as you said) read operation, > > it gets input injected by the kafl fuzzer. Same for other interfaces that > > are under control of host/VMM (MSRs, port IO, MMIO, anything that goes > > via #VE handler in our case). When it comes to virtio, we employ > > two different fuzzing techniques: directly injecting kafl fuzz input when > > virtio core or virtio drivers gets the data received from the host > > (via injecting input in functions virtio16/32/64_to_cpu and others) and > > directly fuzzing DMA memory pages using kfx fuzzer. > > More information can be found in https://intel.github.io/ccc-linux-guest- > hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing > > > > Best Regards, > > Elena. 
> > Hi Elena, Hi Jeremi, > > I think it might be a good idea to narrow down a configuration that *can* > reasonably be hardened to be suitable for confidential computing, before > proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices > in the context of virtualization, but what about taking PCI out of scope > completely by switching to virtio-mmio devices? I agree that narrowing down is important, and we have spent a significant effort on disabling various code we don’t need (including PCI code, like quirks, early PCI, etc.). The decision to use virtio over PCI vs. MMIO comes, I believe, from performance and usage scenarios, and we have to do the best we can within these limitations. Moreover, even if we could remove PCI for the virtio devices by removing the transport dependency, this isn’t possible for other devices that we know are used in some CC setups: not all CSPs are using virtio-based drivers, so pretty quickly PCI comes back into hardening scope and we cannot just remove it, unfortunately. Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 12:30 ` Leon Romanovsky 2023-01-26 13:28 ` Reshetova, Elena @ 2023-01-27 9:32 ` Jörg Rödel 1 sibling, 0 replies; 102+ messages in thread From: Jörg Rödel @ 2023-01-27 9:32 UTC (permalink / raw) To: Leon Romanovsky Cc: Reshetova, Elena, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, Jan 26, 2023 at 02:30:19PM +0200, Leon Romanovsky wrote: > This is exactly what I said. You presented me cases which exist in > your invented world. The mentioned unhandled page fault doesn't exist in the real > world. If a PCI device doesn't work, it needs to be replaced/blocked, not > left operable and accessible from the kernel/user. Believe it or not, this "invented" world is already part of the real world, and will become even more so in the future. This has been stated elsewhere in the thread already, but I would also like to stress that hiding misbehavior of devices (real or emulated) is not the goal of this work. In fact, the best action for a CoCo guest, in case it detects a (possible) attack, is to stop whatever it is doing and crash. And a misbehaving device in a CoCo guest is a possible attack. But what needs to be prevented at all costs is undefined behavior in the CoCo guest that is triggerable by the HV, e.g. by letting an emulated device misbehave. That undefined behavior can lead to an information leak, which is a far bigger problem for a guest owner than a crashed VM. Regards, Joerg ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 11:29 ` Reshetova, Elena 2023-01-26 12:30 ` Leon Romanovsky @ 2023-01-26 13:58 ` Dr. David Alan Gilbert 2023-01-26 17:48 ` Reshetova, Elena 1 sibling, 1 reply; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-01-26 13:58 UTC (permalink / raw) To: Reshetova, Elena Cc: Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening * Reshetova, Elena (elena.reshetova@intel.com) wrote: > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > Replying only to the not-so-far addressed points. > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > Hi Greg, > > > > <...> > > > > > > > 3) All the tools are open-source and everyone can start using them right > > away > > > > even > > > > > without any special HW (readme has description of what is needed). > > > > > Tools and documentation is here: > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > it? > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > we are submitting a fix that we have to list the way how it has been found. > > > We will fix this in the future submissions, but some bugs we have are found by > > > plain code audit, so 'human' is the tool. 
> > > > My problem with that statement is that by applying a different threat > > model you "invent" bugs which didn't exist in the first place. > > > > For example, in this [1] latest submission, authors labeled correct > > behaviour as a "bug". > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/ > > Hm.. Does everyone think that when the kernel dies with an unhandled page fault > (such as in that case) or detects a KASAN out-of-bounds violation (as in some > other cases where we already have fixes or are investigating) it represents correct behavior, even if > you expect that all your PCI HW devices are trusted? What about an error in two > consecutive PCI reads? What about just some failure that results in erroneous input? I'm not sure you'll get general agreement on those answers for all devices and situations; I think for most devices in non-CoCo situations, people are generally OK with a misbehaving PCI device causing a kernel crash; since most people are running without an IOMMU anyway, a misbehaving device can cause otherwise undetectable chaos. I'd say: a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't guarantee forward progress or stop the hypervisor doing something truly stupid. b) For CoCo, information disclosure, or corruption IS a problem c) For non-CoCo some people might care about robustness of the kernel against a failing PCI device, but generally I think they worry about a fairly clean failure, even in the unexpected hot-unplug case. d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe device; if it's a device that attests OK and we trust it is the device it says it is, do we give it freedom or are we still wary? Dave > Best Regards, > Elena. > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-26 13:58 ` Dr. David Alan Gilbert @ 2023-01-26 17:48 ` Reshetova, Elena 2023-01-26 18:06 ` Leon Romanovsky 0 siblings, 1 reply; 102+ messages in thread From: Reshetova, Elena @ 2023-01-26 17:48 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > * Reshetova, Elena (elena.reshetova@intel.com) wrote: > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > Replying only to the not-so-far addressed points. > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > > Hi Greg, > > > > > > <...> > > > > > > > > > 3) All the tools are open-source and everyone can start using them right > > > away > > > > > even > > > > > > without any special HW (readme has description of what is needed). > > > > > > Tools and documentation is here: > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > > it? > > > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > > we are submitting a fix that we have to list the way how it has been found. > > > > We will fix this in the future submissions, but some bugs we have are found > by > > > > plain code audit, so 'human' is the tool. 
> > > > > > My problem with that statement is that by applying a different threat > > > model you "invent" bugs which didn't exist in the first place. > > > > > > For example, in this [1] latest submission, authors labeled correct > > > behaviour as a "bug". > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/ > > > > Hm.. Does everyone think that when the kernel dies with an unhandled page fault > > (such as in that case) or detects a KASAN out-of-bounds violation (as in > some > > other cases where we already have fixes or are investigating) it represents correct > behavior, even if > > you expect that all your PCI HW devices are trusted? What about an error in > two > > consecutive PCI reads? What about just some failure that results in erroneous > input? > > I'm not sure you'll get general agreement on those answers for all > devices and situations; I think for most devices in non-CoCo > situations, people are generally OK with a misbehaving PCI device > causing a kernel crash; since most people are running without an IOMMU > anyway, a misbehaving device can cause otherwise undetectable chaos. Ok, if this is the consensus within the kernel community, then we can consider the fixes strictly from the CoCo threat model point of view. > > I'd say: > a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't > guarantee forward progress or stop the hypervisor doing something > truly stupid. Yes, denial of service is out of scope, but I would not automatically label all crashes as 'safe'. Depending on the crash, it can be used as a primitive to launch further attacks: privilege escalation, information disclosure and corruption. This is especially true for memory corruption issues.
So if a bug is detected and the fix is easy, instead of debating its possible implications and potential usage in exploit writing, it is safer to fix it. > > c) For non-CoCo some people might care about robustness of the kernel > against a failing PCI device, but generally I think they worry about > a fairly clean failure, even in the unexpected hot-unplug case. Ok. > > d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe > device; if it's a device that attests OK and we trust it is the device > it says it is, do we give it freedom or are we still wary? I would say that attestation and an established secure channel to an end device mean that we don’t have to employ additional measures to secure data transfer, and that we 'trust' the device at least to some degree to keep our data protected (both from the untrusted host and from other CC guests). I don’t think there is anything else behind this concept. Best Regards, Elena ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 17:48 ` Reshetova, Elena @ 2023-01-26 18:06 ` Leon Romanovsky 2023-01-26 18:14 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 102+ messages in thread From: Leon Romanovsky @ 2023-01-26 18:06 UTC (permalink / raw) To: Reshetova, Elena Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote: > > > * Reshetova, Elena (elena.reshetova@intel.com) wrote: > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > > Replying only to the not-so-far addressed points. > > > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > > > Hi Greg, > > > > > > > > <...> > > > > > > > > > > > 3) All the tools are open-source and everyone can start using them right > > > > away > > > > > > even > > > > > > > without any special HW (readme has description of what is needed). > > > > > > > Tools and documentation is here: > > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > > > it? > > > > > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > > > we are submitting a fix that we have to list the way how it has been found. 
> > > > > We will fix this in the future submissions, but some bugs we have are found > > by > > > > > plain code audit, so 'human' is the tool. > > > > > > > > My problem with that statement is that by applying different threat > > > > model you "invent" bugs which didn't exist in a first place. > > > > > > > > For example, in this [1] latest submission, authors labeled correct > > > > behaviour as "bug". > > > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > > > alexander.shishkin@linux.intel.com/ > > > > > > Hm.. Does everyone think that when kernel dies with unhandled page fault > > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in > > some > > > other cases we already have fixes or investigating) it represents a correct > > behavior even if > > > you expect that all your pci HW devices are trusted? What about an error in > > two > > > consequent pci reads? What about just some failure that results in erroneous > > input? > > > > I'm not sure you'll get general agreement on those answers for all > > devices and situations; I think for most devices for non-CoCo > > situations, then people are generally OK with a misbehaving PCI device > > causing a kernel crash, since most people are running without IOMMU > > anyway, a misbehaving device can cause otherwise undetectable chaos. > > Ok, if this is a consensus within the kernel community, then we can consider > the fixes strictly from the CoCo threat model point of view. > > > > > I'd say: > > a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't > > guarantee forward progress or stop the hypervisor doing something > > truly stupid. > > Yes, denial of service is out of scope but I would not pile all crashes as > 'safe' automatically. Depending on the crash, it can be used as a > primitive to launch further attacks: privilege escalation, information > disclosure and corruption. It is especially true for memory corruption > issues. 
> > > b) For CoCo, information disclosure or corruption IS a problem > > Agreed, but the path to it can incorporate a number of attack > primitives, as well as bug chaining. So, if the bug is detected and the > fix is easy, instead of reasoning about its possible implications and > potential use in exploit writing, it is safer to just fix it. > > > > > c) For non-CoCo some people might care about robustness of the kernel > > against a failing PCI device, but generally I think they worry about > > a fairly clean failure, even in the unexpected-hot-unplug case. > > Ok. With my other hat on, as a representative of a hardware vendor (at least for the NIC part) who cares about the quality of our devices: we don't want to hide ANY crash related to our devices, especially if it is related to misbehaving PCI HW logic. Any uncontrolled "robustness" hides real issues and makes QA/customer support much harder. Thanks ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-26 18:06 ` Leon Romanovsky @ 2023-01-26 18:14 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-01-26 18:14 UTC (permalink / raw) To: Leon Romanovsky Cc: Reshetova, Elena, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening * Leon Romanovsky (leon@kernel.org) wrote: > On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote: > > > > > * Reshetova, Elena (elena.reshetova@intel.com) wrote: > > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > > > Replying only to the not-so-far addressed points. > > > > > > > > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote: > > > > > > > > Hi Greg, > > > > > > > > > > <...> > > > > > > > > > > > > > 3) All the tools are open-source and everyone can start using them right > > > > > away > > > > > > > even > > > > > > > > without any special HW (readme has description of what is needed). > > > > > > > > Tools and documentation is here: > > > > > > > > https://github.com/intel/ccc-linux-guest-hardening > > > > > > > > > > > > > > Again, as our documentation states, when you submit patches based on > > > > > > > these tools, you HAVE TO document that. Otherwise we think you all are > > > > > > > crazy and will get your patches rejected. You all know this, why ignore > > > > > > > it? > > > > > > > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when > > > > > > we are submitting a fix that we have to list the way how it has been found. 
> > > > > > We will fix this in the future submissions, but some bugs we have are found > > > by > > > > > > plain code audit, so 'human' is the tool. > > > > > > > > > > My problem with that statement is that by applying different threat > > > > > model you "invent" bugs which didn't exist in a first place. > > > > > > > > > > For example, in this [1] latest submission, authors labeled correct > > > > > behaviour as "bug". > > > > > > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1- > > > > > alexander.shishkin@linux.intel.com/ > > > > > > > > Hm.. Does everyone think that when kernel dies with unhandled page fault > > > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in > > > some > > > > other cases we already have fixes or investigating) it represents a correct > > > behavior even if > > > > you expect that all your pci HW devices are trusted? What about an error in > > > two > > > > consequent pci reads? What about just some failure that results in erroneous > > > input? > > > > > > I'm not sure you'll get general agreement on those answers for all > > > devices and situations; I think for most devices for non-CoCo > > > situations, then people are generally OK with a misbehaving PCI device > > > causing a kernel crash, since most people are running without IOMMU > > > anyway, a misbehaving device can cause otherwise undetectable chaos. > > > > Ok, if this is a consensus within the kernel community, then we can consider > > the fixes strictly from the CoCo threat model point of view. > > > > > > > > I'd say: > > > a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't > > > guarantee forward progress or stop the hypervisor doing something > > > truly stupid. > > > > Yes, denial of service is out of scope but I would not pile all crashes as > > 'safe' automatically. 
Depending on the crash, it can be used as a > > primitive to launch further attacks: privilege escalation, information > > disclosure and corruption. It is especially true for memory corruption > > issues. > > > > > b) For CoCo, information disclosure, or corruption IS a problem > > > > Agreed, but the path to this can incorporate a number of attack > > primitives, as well as bug chaining. So, if the bug is detected, and > > fix is easy, instead of thinking about possible implications and its > > potential usage in exploit writing, safer to fix it. > > > > > > > > c) For non-CoCo some people might care about robustness of the kernel > > > against a failing PCI device, but generally I think they worry about > > > a fairly clean failure, even in the unexpected-hot unplug case. > > > > Ok. > > With my other hat as a representative of hardware vendor (at least for > NIC part), who cares about quality of our devices, we don't want to hide > ANY crash related to our devices, especially if it is related to misbehaving > PCI HW logic. Any uncontrolled "robustness" hides real issues and makes > QA/customer support much harder. Yeah, if you're adding new code to be more careful, you want the code to fail/log the problem, not hide it. (Although, heck, I suspect there are a million apparently working PCI cards out there that break some spec somewhere.) Dave > Thanks > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 15:29 ` Reshetova, Elena 2023-01-25 16:40 ` Theodore Ts'o 2023-01-26 11:19 ` Leon Romanovsky @ 2023-01-26 16:29 ` Michael S. Tsirkin 2023-01-27 8:52 ` Reshetova, Elena 2 siblings, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-26 16:29 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > And this is a very special aspect of 'hardening' since it is about hardening a kernel > under different threat model/assumptions. I am not sure it's that special: hardening, IMHO, is not a specific threat model or a set of assumptions; IIUC it's just something that helps reduce the severity of vulnerabilities. Similarly, one can use the CC hardware in a variety of ways, I guess. And one way is just that - hardening Linux such that the ability to corrupt guest memory does not automatically escalate into guest code execution. If you put it this way, you get to participate in a well-understood problem space instead of constantly saying "yes, but CC is special". And further, you will now talk about features as opposed to fixing bugs, which will stop annoying people who currently seem annoyed by the implication that their code is buggy simply because it does not cache in memory all data read from hardware. Finally, you then don't really need to explain why e.g. DoS is not a problem but an info leak is a problem - when for many users it's actually the reverse - the reason is not that it's not part of a threat model (which then makes you work hard to define the threat model) but simply that CC hardware does not support this kind of hardening. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-26 16:29 ` Michael S. Tsirkin @ 2023-01-27 8:52 ` Reshetova, Elena 2023-01-27 10:04 ` Michael S. Tsirkin 0 siblings, 1 reply; 102+ messages in thread From: Reshetova, Elena @ 2023-01-27 8:52 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > And this is a very special aspect of 'hardening' since it is about hardening a > kernel > > under different threat model/assumptions. > > I am not sure it's that special in that hardening IMHO is not a specific > threat model or a set of assumptions. IIUC it's just something that > helps reduce severity of vulnerabilities. Similarly, one can use the CC > hardware in a variety of ways I guess. And one way is just that - > hardening linux such that ability to corrupt guest memory does not > automatically escalate into guest code execution. I am not sure if I fully follow you on this. I do agree that it is in principle the same 'hardening' that we have been doing in Linux for decades, just applied to a new attack surface: host <-> guest, vs. userspace <-> kernel. Interfaces have changed, but the types of vulnerabilities, etc. are the same. The attacker model is somewhat different because we have different expectations on what the host/hypervisor should be able to do to the guest (following business reasons and use cases), versus what we expect normal userspace to be able to "do" to the kernel. The host and hypervisor still have a lot of control over the guest (ability to start/stop it, manage its resources, etc.). 
But the reason behind this doesn't come from CoCo HW being unable to support a stricter security model (indeed it cannot now, but that is a design decision); it comes from the fact that it is important for cloud service providers to retain that level of control over their infrastructure. > > If you put it this way, you get to participate in a well understood > problem space instead of constantly saying "yes but CC is special". And > further, you will now talk about features as opposed to fixing bugs. > Which will stop annoying people who currently seem annoyed by the > implication that their code is buggy simply because it does not cache in > memory all data read from hardware. Finally, you then don't really need > to explain why e.g. DoS is not a problem but info leak is a problem - when > for many users it's actually the reverse - the reason is not that it's > not part of a threat model - which then makes you work hard to define > the threat model - but simply that CC hardware does not support this > kind of hardening. But this wouldn't be a correct statement, because it is not a limitation of the HW but of the threat and business model that Confidential Computing exists in. I am not aware of a single cloud provider who would be willing to use HW that takes full control of their infrastructure when running confidential guests, leaving them with no mechanisms to control the load balancing, enforce resource usage, etc. So, given that nobody needs, or is willing to use, such HW, such HW simply doesn't exist. So, I would still say that the model we operate under in CoCo use cases is somewhat special, but I do agree that given that we list a couple of these special assumptions (over which we have no control or ability to influence; none of us are business people), then the rest becomes just careful enumeration of attack surface interfaces and a breakdown of potential mitigations. Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-27 8:52 ` Reshetova, Elena @ 2023-01-27 10:04 ` Michael S. Tsirkin 2023-01-27 12:25 ` Reshetova, Elena 0 siblings, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-27 10:04 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote: > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > And this is a very special aspect of 'hardening' since it is about hardening a > > kernel > > > under different threat model/assumptions. > > > > I am not sure it's that special in that hardening IMHO is not a specific > > threat model or a set of assumptions. IIUC it's just something that > > helps reduce severity of vulnerabilities. Similarly, one can use the CC > > hardware in a variety of ways I guess. And one way is just that - > > hardening linux such that ability to corrupt guest memory does not > > automatically escalate into guest code execution. > > I am not sure if I fully follow you on this. I do agree that it is in principle > the same 'hardening' that we have been doing in Linux for decades just > applied to a new attack surface, host <-> guest, vs userspace <->kernel. Sorry about being unclear; this is not the type of hardening I really meant. The "hardening" you meant is preventing kernel vulnerabilities, right? This is what we've been doing for decades. But I meant slightly newer things like e.g. 
KASLR, or indeed ASLR generally: we are trying to reduce the chance that a vulnerability causes arbitrary code execution as opposed to a DoS. To think in these terms you do not need to think about attack surfaces: in a system comprising a hypervisor, guest supervisor and guest userspace, hiding one component from the others is helpful even if they share a privilege level. > Interfaces have changed, but the types of vulnerabilities, etc are the same. > The attacker model is somewhat different because we have > different expectations on what host/hypervisor should be able to do > to the guest (following business reasons and use-cases), versus what we > expect normal userspace being able to "do" towards kernel. The host and > hypervisor still has a lot of control over the guest (ability to start/stop it, > manage its resources, etc). But the reasons behind this doesn’t come > from the fact that security CoCo HW not being able to support this stricter > security model (it cannot now indeed, but this is a design decision), but > from the fact that it is important for Cloud service providers to retain that > level of control over their infrastructure. Surely they need the ability to control resource usage, not the ability to execute DoS attacks. Current hardware just does not have the ability to allow the former without the latter. 
> > But this won't be correct statement, because it is not limitation of HW, but the > threat and business model that Confidential Computing exists in. I am not > aware of a single cloud provider who would be willing to use the HW that > takes the full control of their infrastructure and running confidential guests, > leaving them with no mechanisms to control the load balancing, enforce > resource usage, etc. So, given that nobody needs/willing to use such HW, > such HW simply doesn’t exist. > > So, I would still say that the model we operate in CoCo usecases is somewhat > special, but I do agree that given that we list a couple of these special assumptions > (over which ones we have no control or ability to influence, none of us are business > people), then the rest becomes just careful enumeration of attack surface interfaces > and break up of potential mitigations. > > Best Regards, > Elena. > I'd say each business has a slightly different business model, no? Finding common ground is what helps us share code ... -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-27 10:04 ` Michael S. Tsirkin @ 2023-01-27 12:25 ` Reshetova, Elena 2023-01-27 14:32 ` Michael S. Tsirkin 2023-01-27 20:51 ` Carlos Bilbao 0 siblings, 2 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-01-27 12:25 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening > On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote: > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > And this is a very special aspect of 'hardening' since it is about hardening a > > > kernel > > > > under different threat model/assumptions. > > > > > > I am not sure it's that special in that hardening IMHO is not a specific > > > threat model or a set of assumptions. IIUC it's just something that > > > helps reduce severity of vulnerabilities. Similarly, one can use the CC > > > hardware in a variety of ways I guess. And one way is just that - > > > hardening linux such that ability to corrupt guest memory does not > > > automatically escalate into guest code execution. > > > > I am not sure if I fully follow you on this. I do agree that it is in principle > > the same 'hardening' that we have been doing in Linux for decades just > > applied to a new attack surface, host <-> guest, vs userspace <->kernel. > > Sorry about being unclear this is not the type of hardening I meant > really. The "hardening" you meant is preventing kernel vulnerabilities, > right? This is what we've been doing for decades. > But I meant slightly newer things like e.g. 
KASLR or indeed ASLR generally - > we are trying to reduce a chance a vulnerability causes random > code execution as opposed to a DOS. To think in these terms you do not > need to think about attack surfaces - in the system including > a hypervisor, guest supervisor and guest userspace hiding > one component from others is helpful even if they share > a privelege level. Do you mean that the fact that CoCo guest has memory encrypted can help even in non-CoCo scenarios? I am sorry, I still seem not to be able to grasp your idea fully. When the privilege level is shared, there is no incentive to perform privilege escalation attacks across components, so why hide them from each other? Data protection? But I don’t think you are talking about this? I do agree that KASLR is stronger when you remove the possibility to read the memory (make sure kernel code is execute only) you are trying to attack, but again not sure if you mean this. > > > > > Interfaces have changed, but the types of vulnerabilities, etc are the same. > > The attacker model is somewhat different because we have > > different expectations on what host/hypervisor should be able to do > > to the guest (following business reasons and use-cases), versus what we > > expect normal userspace being able to "do" towards kernel. The host and > > hypervisor still has a lot of control over the guest (ability to start/stop it, > > manage its resources, etc). But the reasons behind this doesn’t come > > from the fact that security CoCo HW not being able to support this stricter > > security model (it cannot now indeed, but this is a design decision), but > > from the fact that it is important for Cloud service providers to retain that > > level of control over their infrastructure. > > Surely they need ability to control resource usage, not ability to execute DOS > attacks. Current hardware just does not have ability to allow the former > without the later. 
I don't see why it cannot be added to HW if the requirement comes. However, I think in the cloud provider world being able to control resources equals being able to deny these resources when required, so being able to deny service to its clients is a kind of built-in expectation that everyone just agrees on. > > > > > > > > If you put it this way, you get to participate in a well understood > > > problem space instead of constantly saying "yes but CC is special". And > > > further, you will now talk about features as opposed to fixing bugs. > > > Which will stop annoying people who currently seem annoyed by the > > > implication that their code is buggy simply because it does not cache in > > > memory all data read from hardware. Finally, you then don't really need > > > to explain why e.g. DoS is not a problem but info leak is a problem - when > > > for many users it's actually the reverse - the reason is not that it's > > > not part of a threat model - which then makes you work hard to define > > > the threat model - but simply that CC hardware does not support this > > > kind of hardening. > > > > But this won't be correct statement, because it is not limitation of HW, but the > > threat and business model that Confidential Computing exists in. I am not > > aware of a single cloud provider who would be willing to use the HW that > > takes the full control of their infrastructure and running confidential guests, > > leaving them with no mechanisms to control the load balancing, enforce > > resource usage, etc. So, given that nobody needs/willing to use such HW, > > such HW simply doesn’t exist. 
> > > > So, I would still say that the model we operate in CoCo usecases is somewhat > > special, but I do agree that given that we list a couple of these special > assumptions > > (over which ones we have no control or ability to influence, none of us are > business > > people), then the rest becomes just careful enumeration of attack surface > interfaces > > and break up of potential mitigations. > > > > Best Regards, > > Elena. > > > > I'd say each business has a slightly different business model, no? > Finding common ground is what helps us share code ... Fully agree, and a good discussion with everyone willing to listen and cooperate can go a long way toward defining the best implementation. Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-27 12:25 ` Reshetova, Elena @ 2023-01-27 14:32 ` Michael S. Tsirkin 2023-01-27 20:51 ` Carlos Bilbao 1 sibling, 0 replies; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-27 14:32 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On Fri, Jan 27, 2023 at 12:25:09PM +0000, Reshetova, Elena wrote: > > > On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote: > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: > > > > > And this is a very special aspect of 'hardening' since it is about hardening a > > > > kernel > > > > > under different threat model/assumptions. > > > > > > > > I am not sure it's that special in that hardening IMHO is not a specific > > > > threat model or a set of assumptions. IIUC it's just something that > > > > helps reduce severity of vulnerabilities. Similarly, one can use the CC > > > > hardware in a variety of ways I guess. And one way is just that - > > > > hardening linux such that ability to corrupt guest memory does not > > > > automatically escalate into guest code execution. > > > > > > I am not sure if I fully follow you on this. I do agree that it is in principle > > > the same 'hardening' that we have been doing in Linux for decades just > > > applied to a new attack surface, host <-> guest, vs userspace <->kernel. > > > > Sorry about being unclear this is not the type of hardening I meant > > really. The "hardening" you meant is preventing kernel vulnerabilities, > > right? This is what we've been doing for decades. > > But I meant slightly newer things like e.g. 
KASLR or indeed ASLR generally - > > we are trying to reduce a chance a vulnerability causes random > > code execution as opposed to a DOS. To think in these terms you do not > > need to think about attack surfaces - in the system including > > a hypervisor, guest supervisor and guest userspace hiding > > one component from others is helpful even if they share > > a privelege level. > > Do you mean that the fact that CoCo guest has memory encrypted > can help even in non-CoCo scenarios? Yes. > I am sorry, I still seem not to be able > to grasp your idea fully. When the privilege level is shared, there is no > incentive to perform privilege escalation attacks across components, > so why hide them from each other? Because limiting horizontal movement between components is still valuable. > Data protection? But I don’t think you > are talking about this? I do agree that KASLR is stronger when you remove > the possibility to read the memory (make sure kernel code is execute only) > you are trying to attack, but again not sure if you mean this. It's an example. If the kernel were 100% secure we wouldn't need KASLR. Nothing ever is, though. > > > > > > > > > Interfaces have changed, but the types of vulnerabilities, etc are the same. > > > The attacker model is somewhat different because we have > > > different expectations on what host/hypervisor should be able to do > > > to the guest (following business reasons and use-cases), versus what we > > > expect normal userspace being able to "do" towards kernel. The host and > > > hypervisor still has a lot of control over the guest (ability to start/stop it, > > > manage its resources, etc). But the reasons behind this doesn’t come > > > from the fact that security CoCo HW not being able to support this stricter > > > security model (it cannot now indeed, but this is a design decision), but > > > from the fact that it is important for Cloud service providers to retain that > > > level of control over their infrastructure. 
> > > > Surely they need ability to control resource usage, not ability to execute DOS > > attacks. Current hardware just does not have ability to allow the former > > without the later. > > I don’t see why it cannot be added to HW if requirement comes. However, I think > in cloud provider world being able to control resources equals to being able > to deny these resources when required, so being able to denial of service its clients > is kind of build-in expectation that everyone just agrees on. > > > > > > > > > > > If you put it this way, you get to participate in a well understood > > > > problem space instead of constantly saying "yes but CC is special". And > > > > further, you will now talk about features as opposed to fixing bugs. > > > > Which will stop annoying people who currently seem annoyed by the > > > > implication that their code is buggy simply because it does not cache in > > > > memory all data read from hardware. Finally, you then don't really need > > > > to explain why e.g. DoS is not a problem but info leak is a problem - when > > > > for many users it's actually the reverse - the reason is not that it's > > > > not part of a threat model - which then makes you work hard to define > > > > the threat model - but simply that CC hardware does not support this > > > > kind of hardening. > > > > > > But this won't be correct statement, because it is not limitation of HW, but the > > > threat and business model that Confidential Computing exists in. I am not > > > aware of a single cloud provider who would be willing to use the HW that > > > takes the full control of their infrastructure and running confidential guests, > > > leaving them with no mechanisms to control the load balancing, enforce > > > resource usage, etc. So, given that nobody needs/willing to use such HW, > > > such HW simply doesn’t exist. 
> > > > > > So, I would still say that the model we operate in CoCo usecases is somewhat > > > special, but I do agree that given that we list a couple of these special > > assumptions > > > (over which ones we have no control or ability to influence, none of us are > > business > > > people), then the rest becomes just careful enumeration of attack surface > > interfaces > > > and break up of potential mitigations. > > > > > > Best Regards, > > > Elena. > > > > > > > I'd say each business has a slightly different business model, no? > > Finding common ground is what helps us share code ... > > Fully agree, and a good discussion with everyone willing to listen and cooperate > can go a long way into defining the best implementation. > > Best Regards, > Elena. Right. My point was that trying to show how CC usecases are similar to other existing ones will be more helpful for everyone than just focusing on how they are different. I hope I was able to show some similarities. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-27 12:25 ` Reshetova, Elena 2023-01-27 14:32 ` Michael S. Tsirkin @ 2023-01-27 20:51 ` Carlos Bilbao 1 sibling, 0 replies; 102+ messages in thread From: Carlos Bilbao @ 2023-01-27 20:51 UTC (permalink / raw) To: Reshetova, Elena, Michael S. Tsirkin Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening On 1/27/23 6:25 AM, Reshetova, Elena wrote: > >> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote: >>>> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote: >>>>> And this is a very special aspect of 'hardening' since it is about hardening a >>>> kernel >>>>> under different threat model/assumptions. >>>> >>>> I am not sure it's that special in that hardening IMHO is not a specific >>>> threat model or a set of assumptions. IIUC it's just something that >>>> helps reduce severity of vulnerabilities. Similarly, one can use the CC >>>> hardware in a variety of ways I guess. And one way is just that - >>>> hardening linux such that ability to corrupt guest memory does not >>>> automatically escalate into guest code execution. >>> >>> I am not sure if I fully follow you on this. I do agree that it is in principle >>> the same 'hardening' that we have been doing in Linux for decades just >>> applied to a new attack surface, host <-> guest, vs userspace <->kernel. >> >> Sorry about being unclear this is not the type of hardening I meant >> really. The "hardening" you meant is preventing kernel vulnerabilities, >> right? This is what we've been doing for decades. >> But I meant slightly newer things like e.g. 
KASLR or indeed ASLR generally - >> we are trying to reduce the chance that a vulnerability causes random >> code execution as opposed to a DOS. To think in these terms you do not >> need to think about attack surfaces - in the system including >> a hypervisor, guest supervisor and guest userspace hiding >> one component from others is helpful even if they share >> a privilege level. > > Do you mean that the fact that the CoCo guest has memory encrypted > can help even in non-CoCo scenarios? I am sorry, I still cannot seem > to grasp your idea fully. When the privilege level is shared, there is no > incentive to perform privilege escalation attacks across components, > so why hide them from each other? Data protection? But I don’t think you > are talking about this? I do agree that KASLR is stronger when you remove > the possibility to read the memory (make sure kernel code is execute only) > you are trying to attack, but again not sure if you mean this. > >> >> >> >>> Interfaces have changed, but the types of vulnerabilities, etc are the same. >>> The attacker model is somewhat different because we have >>> different expectations on what the host/hypervisor should be able to do >>> to the guest (following business reasons and use-cases), versus what we >>> expect normal userspace to be able to "do" towards the kernel. The host and >>> hypervisor still have a lot of control over the guest (ability to start/stop it, >>> manage its resources, etc). But the reasons behind this don’t come >>> from the CoCo security HW being unable to support this stricter >>> security model (it cannot now indeed, but this is a design decision), but >>> from the fact that it is important for Cloud service providers to retain that >>> level of control over their infrastructure. >> >> Surely they need the ability to control resource usage, not the ability to execute DOS >> attacks. Current hardware just does not have the ability to allow the former >> without the latter. 
> > I don’t see why it cannot be added to HW if the requirement comes. However, I think > in the cloud provider world being able to control resources equals being able > to deny these resources when required, so being able to deny service to its clients > is kind of a built-in expectation that everyone just agrees on. > Just a thought, but I wouldn't rule out availability guarantees like that at some point. As a client I would certainly like it, and if it's good for business... >> >>>> >>>> If you put it this way, you get to participate in a well understood >>>> problem space instead of constantly saying "yes but CC is special". And >>>> further, you will now talk about features as opposed to fixing bugs. >>>> Which will stop annoying people who currently seem annoyed by the >>>> implication that their code is buggy simply because it does not cache in >>>> memory all data read from hardware. Finally, you then don't really need >>>> to explain why e.g. DoS is not a problem but info leak is a problem - when >>>> for many users it's actually the reverse - the reason is not that it's >>>> not part of a threat model - which then makes you work hard to define >>>> the threat model - but simply that CC hardware does not support this >>>> kind of hardening. >>> >>> But this won't be a correct statement, because it is not a limitation of the HW, but the >>> threat and business model that Confidential Computing exists in. I am not >>> aware of a single cloud provider who would be willing to use HW that >>> takes full control of their infrastructure and the confidential guests running on it, >>> leaving them with no mechanisms to control the load balancing, enforce >>> resource usage, etc. So, given that nobody needs/is willing to use such HW, >>> such HW simply doesn’t exist. 
>>> >>> So, I would still say that the model we operate in CoCo usecases is somewhat >>> special, but I do agree that given that we list a couple of these special >> assumptions >>> (over which ones we have no control or ability to influence, none of us are >> business >>> people), then the rest becomes just careful enumeration of attack surface >> interfaces >>> and break up of potential mitigations. >>> >>> Best Regards, >>> Elena. >>> >> >> I'd say each business has a slightly different business model, no? >> Finding common ground is what helps us share code ... > > Fully agree, and a good discussion with everyone willing to listen and cooperate > can go a long way into defining the best implementation. > > Best Regards, > Elena. Thanks for sharing the threat model with the list! Carlos ^ permalink raw reply [flat|nested] 102+ messages in thread
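As an aside on the ASLR/KASLR point discussed in the exchange above: the effect of address randomization is easy to observe from userspace. A minimal Python sketch (illustrative only, not part of any patch in this thread) that prints the load address of libc's printf in several fresh processes; with ASLR enabled (the Linux default) the addresses typically differ per run:

```python
import subprocess
import sys

# Print the load address of libc's printf in several fresh processes.
# With ASLR enabled (the default on Linux), the addresses differ per run.
child = ("import ctypes;"
         "print(ctypes.cast(ctypes.CDLL(None).printf, ctypes.c_void_p).value)")

addrs = [int(subprocess.run([sys.executable, "-c", child],
                            capture_output=True, text=True).stdout)
         for _ in range(5)]

# On an ASLR-enabled system this typically reports several distinct values;
# with ASLR disabled it would report one.
print(len(set(addrs)), "distinct libc addresses out of", len(addrs))
```

KASLR does the same for the kernel image: without a memory-read primitive, an attacker cannot know where to redirect execution, which is why read protection of guest memory strengthens it.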
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 12:28 Linux guest kernel threat model for Confidential Computing Reshetova, Elena 2023-01-25 12:43 ` Greg Kroah-Hartman @ 2023-01-30 11:36 ` Christophe de Dinechin 2023-01-30 12:00 ` Kirill A. Shutemov 2023-01-31 10:06 ` Reshetova, Elena 2023-02-07 0:27 ` Carlos Bilbao 2 siblings, 2 replies; 102+ messages in thread From: Christophe de Dinechin @ 2023-01-30 11:36 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List Hi Elena, On 2023-01-25 at 12:28 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> wrote... > Hi Greg, > > You mentioned couple of times (last time in this recent thread: > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start > discussing the updated threat model for kernel, so this email is a start in this direction. > > (Note: I tried to include relevant people from different companies, as well as linux-coco > mailing list, but I hope everyone can help by including additional people as needed). > > As we have shared before in various lkml threads/conference presentations > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. > This is a big change in the threat model and requires both careful assessment of the > new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations > and security validation techniques. 
This is the activity that we have started back at Intel > and the current status can be found in > > 1) Threat model and potential mitigations: > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html I only looked at this one so far. Here are a few quick notes: DoS attacks are out of scope. What about timing attacks, which were the basis of some of the most successful attacks in the past years? My understanding is that TDX relies on existing mitigations, and does not introduce anything new in that space. Worth mentioning in that "out of scope" section IMO. Why are TDVMCALL hypercalls listed as an "existing" communication interface? That seems to exclude the TDX module from the TCB. Also, "shared memory for I/Os" seems unnecessarily restrictive, since it excludes interrupts, timing attacks, network or storage attacks, or devices passed through to the guest. The latter category seems important to list, since there are separate efforts to provide confidential computing capabilities e.g. to PCI devices, which were discussed elsewhere in this thread. I suspect that my question above is due to ambiguous wording. What I initially read as "this is out of scope for TDX" morphs in the next paragraph into "we are going to explain how to mitigate attacks through TDVMCALLS and shared memory for I/O". Consider rewording to clarify the intent of these paragraphs. Nit: I suggest adding bullets to the items below "between host/VMM and the guest". You could count the "unique code locations" that can consume malicious input in drivers; why not in the core kernel? I think you write elsewhere that the drivers account for the vast majority, so I suspect you have the numbers. "The implementation of the #VE handler is simple and does not require an in-depth security audit or fuzzing since it is not the actual consumer of the host/VMM supplied untrusted data": The assumption there seems to be that the host will never be able to supply data (e.g. 
through a bounce buffer) that it can trick the guest into executing. If that is indeed the assumption, it is worth mentioning explicitly. I suspect it is a bit weak, since many earlier attacks were based on executing the wrong code. Notably, it is worth pointing out that I/O buffers are _not_ encrypted with the CPU key (as opposed to any device key e.g. for PCI encryption) in either TDX or SEV. Is there for example anything that precludes TDX or SEV from executing code in the bounce buffers? "We only care about users that read from MMIO": Why? My guess is that this is the only way bad data could be fed to the guest. But what if a bad MMIO write due to poisoned data injected earlier was a necessary step to open the door to a successful attack? > > 2) One of the described in the above doc mitigations is "hardening of the enabled > code". What we mean by this, as well as techniques that are being used are > described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html > > 3) All the tools are open-source and everyone can start using them right away even > without any special HW (readme has description of what is needed). > Tools and documentation is here: > https://github.com/intel/ccc-linux-guest-hardening > > 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found > here: https://github.com/intel/tdx/commits/guest-next > > So, my main question before we start to argue about the threat model, mitigations, etc, > is what is the good way to get this reviewed to make sure everyone is aligned? > There are a lot of angles and details, so what is the most efficient method? > Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html > into logical pieces and start submitting it to mailing list for discussion one by one? > Any other methods? 
> > The original plan we had in mind is to start discussing the relevant pieces when submitting the code, > i.e. when submitting the device filter patches, we will include problem statement, threat model link, > data, alternatives considered, etc. > > Best Regards, > Elena. > > [1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/ > [2] https://lpc.events/event/16/contributions/1328/ > [3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/ -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-30 11:36 ` Christophe de Dinechin @ 2023-01-30 12:00 ` Kirill A. Shutemov 2023-01-30 15:14 ` Michael S. Tsirkin 2023-01-31 10:06 ` Reshetova, Elena 1 sibling, 1 reply; 102+ messages in thread From: Kirill A. Shutemov @ 2023-01-30 12:00 UTC (permalink / raw) To: Christophe de Dinechin Cc: Reshetova, Elena, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Mon, Jan 30, 2023 at 12:36:34PM +0100, Christophe de Dinechin wrote: > Is there for example anything that precludes TDX or SEV from executing > code in the bounce buffers? In TDX, attempt to fetch instructions from shared memory (i.e. bounce buffer) will cause #GP, only data fetch is allowed. Page table also cannot be placed there and will cause the same #GP. -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 102+ messages in thread
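The #GP on instruction fetch from shared memory that Kirill describes is enforced by the CPU/TDX module, but a loose userspace analogy is a page mapped without execute permission: a jump into it faults on the instruction fetch before any byte placed there can run. A minimal sketch of the analogy (mmap NX is of course not the TDX mechanism itself):

```python
import subprocess
import sys
import textwrap

# A child process maps an anonymous page WITHOUT PROT_EXEC and then tries
# to execute from it. The CPU faults on the instruction fetch, so the
# child dies with a signal (SIGSEGV) before any injected byte can run.
child = textwrap.dedent("""
    import ctypes, mmap
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)  # no PROT_EXEC
    buf.write(b'\\xc3')  # x86-64 'ret' -- never actually reached
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    ctypes.CFUNCTYPE(None)(addr)()  # jump into the non-executable page
""")

r = subprocess.run([sys.executable, "-c", child], capture_output=True)
# A negative return code means the child was killed by a signal.
print("child killed by signal:", r.returncode < 0)
```

In TDX the equivalent property is stronger: the restriction applies to guest supervisor code as well, and page tables also cannot live in shared memory.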
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-30 12:00 ` Kirill A. Shutemov @ 2023-01-30 15:14 ` Michael S. Tsirkin 0 siblings, 0 replies; 102+ messages in thread From: Michael S. Tsirkin @ 2023-01-30 15:14 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Christophe de Dinechin, Reshetova, Elena, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Mon, Jan 30, 2023 at 03:00:52PM +0300, Kirill A. Shutemov wrote: > On Mon, Jan 30, 2023 at 12:36:34PM +0100, Christophe de Dinechin wrote: > > Is there for example anything that precludes TDX or SEV from executing > > code in the bounce buffers? > > In TDX, attempt to fetch instructions from shared memory (i.e. bounce > buffer) will cause #GP, only data fetch is allowed. Page table also cannot > be placed there and will cause the same #GP. Same with SEV IIRC. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-30 11:36 ` Christophe de Dinechin 2023-01-30 12:00 ` Kirill A. Shutemov @ 2023-01-31 10:06 ` Reshetova, Elena 2023-01-31 16:52 ` Christophe de Dinechin 1 sibling, 1 reply; 102+ messages in thread From: Reshetova, Elena @ 2023-01-31 10:06 UTC (permalink / raw) To: Christophe de Dinechin Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List Hi Dinechin, Thank you very much for your review! Please find the replies inline. > > Hi Elena, > > On 2023-01-25 at 12:28 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> > wrote... > > Hi Greg, > > > > You mentioned couple of times (last time in this recent thread: > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to > start > > discussing the updated threat model for kernel, so this email is a start in this > direction. > > > > (Note: I tried to include relevant people from different companies, as well as > linux-coco > > mailing list, but I hope everyone can help by including additional people as > needed). > > > > As we have shared before in various lkml threads/conference presentations > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we > have a > > change in the threat model where guest kernel doesn’t anymore trust the > hypervisor. > > This is a big change in the threat model and requires both careful assessment of > the > > new (hypervisor <-> guest kernel) attack surface, as well as careful design of > mitigations > > and security validation techniques. 
This is the activity that we have started back > at Intel > > and the current status can be found in > > > > 1) Threat model and potential mitigations: > > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html > > I only looked at this one so far. Here are a few quick notes: > > DoS attacks are out of scope. What about timing attacks, which were the > basis of some of the most successful attacks in the past years? My > understanding is that TDX relies on existing mitigations, and does not > introduce anything new in that space. Worth mentioning in that "out of > scope" section IMO. It is not out of scope because TD guest SW has to think about these matters and protect itself adequately. We have a section further down on "Transient Execution attacks mitigation" https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#transient-execution-attacks-and-their-mitigation but I agree it is worth pointing to this (and generic side-channel attacks) already in the scoping. I will make an update. > > Why are TDVMCALL hypercalls listed as an "existing" communication interface? > That seems to exclude the TDX module from the TCB. I believe this is just ambiguous wording; I need to find a better one. TDVMCALL is indeed a *new* TDX-specific communication interface, but it is only a transport in this case for the actual *existing* legacy communication interfaces between the VM guest and host/hypervisor (read/write MSRs, PCI config space access, port IO and MMIO, etc). Also, "shared memory for > I/Os" seems unnecessarily restrictive, since it excludes interrupts, timing > attacks, network or storage attacks, or devices passed through to the guest. > The latter category seems important to list, since there are separate > efforts to provide confidential computing capabilities e.g. to PCI devices, > which were discussed elsewhere in this thread. 
The second bullet meant to say that we also have another interface through which the CoCo guest and host/VMM can communicate, and it is done via shared pages (vs private pages that are only accessible to the confidential computing guest). Maybe I should drop the "IO" part of this and it would avoid confusion. The other means (some are higher-level abstractions like disk operations that happen over a bounce buffer in shared memory), like interrupts, disk, etc, we do cover below in separate sections of the doc, with the exception of covering CoCo-enabled devices. This is something we can briefly mention as an addition, but since we don’t have these devices yet, and neither do we have a Linux implementation that can securely add them to the CoCo guest, I find it premature to discuss details at this point. > I suspect that my question above is due to ambiguous wording. What I > initially read as "this is out of scope for TDX" morphs in the next > paragraph into "we are going to explain how to mitigate attacks through > TDVMCALLS and shared memory for I/O". Consider rewording to clarify the > intent of these paragraphs. > Sure, sorry for the ambiguous wording, will try to clarify. > Nit: I suggest adding bullets to the items below "between host/VMM and the > guest" Yes, it used to have them actually; I have to see what happened with the recent docs update. > > You could count the "unique code locations" that can consume malicious input > in drivers, why not in core kernel? I think you write elsewhere that the > drivers account for the vast majority, so I suspect you have the numbers. I don’t have ready numbers for the core kernel, but if really needed, I can calculate them. 
Here https://github.com/intel/ccc-linux-guest-hardening/tree/master/bkc/audit/sample_output/6.0-rc2 you can find the public files that would produce this data: https://github.com/intel/ccc-linux-guest-hardening/blob/master/bkc/audit/sample_output/6.0-rc2/smatch_warns_6.0_tdx_allyesconfig is all hits (with taint propagation) for the whole allyesconfig (x86 build, CONFIG_COMPILE_TEST is off). https://github.com/intel/ccc-linux-guest-hardening/blob/master/bkc/audit/sample_output/6.0-rc2/smatch_warns_6.0_tdx_allyesconfig_filtered is the same but with most of the drivers dropped. > > "The implementation of the #VE handler is simple and does not require an > in-depth security audit or fuzzing since it is not the actual consumer of > the host/VMM supplied untrusted data": The assumption there seems to be that > the host will never be able to supply data (e.g. through a bounce buffer) > that it can trick the guest into executing. If that is indeed the > assumption, it is worth mentioning explicitly. I suspect it is a bit weak, > since many earlier attacks were based on executing the wrong code. Notably, > it is worth pointing out that I/O buffers are _not_ encrypted with the CPU > key (as opposed to any device key e.g. for PCI encryption) in either > TDX or SEV. Is there for example anything that precludes TDX or SEV from > executing code in the bounce buffers? This was already replied by Kirill, any code execution out of shared memory generates a #GP. > > "We only care about users that read from MMIO": Why? My guess is that this > is the only way bad data could be fed to the guest. But what if a bad MMIO > write due to poisoned data injected earlier was a necessary step to open the > door to a successful attack? The entry point of the attack is still a "read". The situation you describe can happen, but the root cause would be still an incorrectly handled MMIO read and this is what we try to check with both fuzzing and auditing the 'read' entry points. 
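The "read entry point" hardening discussed here amounts to treating every value read from MMIO (or other shared memory) as attacker-controlled and validating it before use. A toy sketch of the pattern (hypothetical names, not actual kernel code):

```python
RING_SIZE = 16
ring = list(range(RING_SIZE))  # stands in for a virtqueue descriptor ring

def read_descriptor_index(mmio_value):
    """mmio_value plays the role of an index the guest just read from
    MMIO/shared memory, i.e. it is fully host-controlled."""
    idx = mmio_value & 0xFFFF          # field extraction
    if idx >= RING_SIZE:               # hardening: bounds-check host input
        raise ValueError("out-of-range descriptor index from host")
    return ring[idx]

print(read_descriptor_index(5))        # well-behaved host

try:
    read_descriptor_index(0xDEAD)      # host-injected out-of-range value
except ValueError as err:
    print("rejected:", err)
```

Without the bounds check, the host-supplied index would drive an out-of-bounds access in the guest — exactly the class of bug the fuzzing and audit of MMIO reads is meant to flush out.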
Thank you again for the review! Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
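On the question of counting "unique code locations": given hit lists in the usual smatch `file.c:line func() warn: ...` format, a rough per-area count is a few lines of scripting. A sketch (the sample lines below are made up for illustration, not taken from the linked files):

```python
from collections import Counter

# Illustrative smatch-style lines (hypothetical sample, not real output);
# the real hit lists are linked in the message above.
sample = """\
drivers/net/virtio_net.c:1201 virtnet_probe() warn: check_host_input 'len'
drivers/net/virtio_net.c:1201 virtnet_probe() warn: check_host_input 'len'
drivers/char/random.c:310 add_device_randomness() warn: check_host_input 'buf'
arch/x86/kernel/tsc.c:88 tsc_read() warn: check_host_input 'msr'
"""

def count_locations(lines):
    """Count unique file:line hit locations, split drivers vs core."""
    locs = {line.split()[0] for line in lines if line.strip()}
    per_area = Counter("drivers" if loc.startswith("drivers/") else "core"
                       for loc in locs)
    return len(locs), per_area

total, per_area = count_locations(sample.splitlines())
print(total, dict(per_area))   # 3 unique locations: 2 in drivers, 1 in core
```

Running this over the allyesconfig and filtered files would give the drivers-vs-core split Christophe asked about.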
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-31 10:06 ` Reshetova, Elena @ 2023-01-31 16:52 ` Christophe de Dinechin 2023-02-02 11:31 ` Reshetova, Elena 0 siblings, 1 reply; 102+ messages in thread From: Christophe de Dinechin @ 2023-01-31 16:52 UTC (permalink / raw) To: Reshetova, Elena Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On 2023-01-31 at 10:06 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> wrote... > Hi Dinechin, Nit: My first name is actually Christophe ;-) [snip] >> "The implementation of the #VE handler is simple and does not require an >> in-depth security audit or fuzzing since it is not the actual consumer of >> the host/VMM supplied untrusted data": The assumption there seems to be that >> the host will never be able to supply data (e.g. through a bounce buffer) >> that it can trick the guest into executing. If that is indeed the >> assumption, it is worth mentioning explicitly. I suspect it is a bit weak, >> since many earlier attacks were based on executing the wrong code. Notably, >> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU >> key (as opposed to any device key e.g. for PCI encryption) in either >> TDX or SEV. Is there for example anything that precludes TDX or SEV from >> executing code in the bounce buffers? > > This was already replied by Kirill, any code execution out of shared memory generates > a #GP. Apologies for my wording. Everyone interpreted "executing" as "executing directly on the bounce buffer page", when what I meant is "consuming data fetched from the bounce buffers as code" (not necessarily directly). 
For example, in the diagram in your document, the guest kernel is a monolithic piece. In reality, there are dynamically loaded components. In the original SEV implementation, with pre-attestation, the measurement could only apply before loading any DLKM (I believe, not really sure). As another example, SEVerity (CVE-2020-12967 [1]) worked by injecting a payload directly into the guest kernel using virtio-based network I/O. That is what I referred to when I wrote "many earlier attacks were based on executing the wrong code". The fact that I/O buffers are not encrypted matters here, because it gives the host ample latitude to observe or even corrupt all I/Os, as many others have pointed out. Notably, disk crypto may not be designed to resist to a host that can see and possibly change the I/Os. So let me rephrase my vague question as a few more precise ones: 1) What are the effects of semi-random kernel code injection? If the host knows that a given bounce buffer happens to be used later to execute some kernel code, it can start flipping bits in it to try and trigger arbitrary code paths in the guest. My understanding is that crypto alone (i.e. without additional layers like dm-integrity) will happily decrypt that into a code stream with pseudo-random instructions in it, not vehemently error out. So, while TDX precludes the host from writing into guest memory directly, since the bounce buffers are shared, TDX will not prevent the host from flipping bits there. It's then just a matter of guessing where the bits will go, and hoping that some bits execute at guest PL0. Of course, this can be mitigated by either only using static configs, or using dm-verity/dm-integrity, or maybe some other mechanisms. Shouldn't that be part of your document? 
To be clear: you mention under "Storage protection" that you use dm-crypt and dm-integrity, so I believe *you* know, but your readers may not figure out why dm-integrity is integral to the process, notably after you write "Users could use other encryption schemes". 2) What are the effects of random user code injection? It's the same as above, except that now you can target a much wider range of input data, including shell scripts, etc. So the attack surface is much larger. 3) What is the effect of data poisoning? You don't necessarily need to corrupt code. Being able to corrupt a system configuration file for example can be largely sufficient. 4) Are there I/O-based replay attacks that would work pre-attestation? My current mental model is that you load a "base" software stack into the TCB and then measure a relevant part of it. What you measure is somewhat implementation-dependent, but in the end, if the system is attested, you respond to a cryptographic challenge based on what was measured, and you then get relevant secrets, e.g. a disk decryption key, that let you make forward progress. However, what happens if every time you boot, the host feeds you bogus disk data just to try to steer the boot sequence along some specific path? I believe that the short answer is: the guest either: a) reaches attestation, but with bad in-memory data, so it fails the crypto exchange, and secrets are not leaked. b) does not reach attestation, so never gets the secrets, and therefore still fulfils the CC promise of not leaking secrets. So I personally feel this is OK, but it's worth writing up in your doc. Back to the #VE handler, if I can find a way to inject malicious code into my guest, what you wrote in that paragraph as a justification for no in-depth security still seems like "not exactly defense in depth". I would just remove the sentence, audit and fuzz that code with the same energy as for anything else that could face bad input. 
[1]: https://www.sec.in.tum.de/i20/student-work/code-execution-attacks-against-encrypted-virtual-machines -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
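Christophe's point about bit flips in the shared buffers can be illustrated with a toy construction: a bare XOR keystream (standing in for encryption-only disk crypto; NOT a real cipher or dm-crypt itself) "successfully" decrypts tampered ciphertext into garbage without any error, whereas an authentication tag (standing in for what dm-integrity or an AEAD mode like AES-GCM provides) detects the tampering:

```python
import hashlib
import hmac
import os

def keystream(key, n):
    """Toy XOR keystream derived from SHA-256 counters (illustration only)."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "little")).digest()
        ctr += 1
    return out[:n]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = os.urandom(32)
plaintext = b"\x55\x48\x89\xe5"   # pretend these bytes are code or config
ct = xor(plaintext, keystream(key, len(plaintext)))

# The host flips one bit in the shared (bounce) buffer:
tampered = bytes([ct[0] ^ 0x01]) + ct[1:]

# 1) Encryption alone: decryption "succeeds", silently yielding garbage.
garbled = xor(tampered, keystream(key, len(tampered)))
print(garbled != plaintext)       # True: the bit flip passed through undetected

# 2) With integrity (as dm-integrity/AEAD would provide): tamper is caught.
tag = hmac.new(key, ct, hashlib.sha256).digest()
ok = hmac.compare_digest(hmac.new(key, tampered, hashlib.sha256).digest(), tag)
print(ok)                          # False: the flipped bit is rejected
```

This is why confidentiality alone is not enough for the disk path: without authentication, the host retains a controlled-corruption primitive over everything that crosses the shared buffers.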
* RE: Linux guest kernel threat model for Confidential Computing 2023-01-31 16:52 ` Christophe de Dinechin @ 2023-02-02 11:31 ` Reshetova, Elena 0 siblings, 0 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-02-02 11:31 UTC (permalink / raw) To: Christophe de Dinechin Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List > On 2023-01-31 at 10:06 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> > wrote... > > Hi Dinechin, > > Nit: My first name is actually Christophe ;-) I am sorry, my automation of extracting names from emails failed here (( > > [snip] > > >> "The implementation of the #VE handler is simple and does not require an > >> in-depth security audit or fuzzing since it is not the actual consumer of > >> the host/VMM supplied untrusted data": The assumption there seems to be > that > >> the host will never be able to supply data (e.g. through a bounce buffer) > >> that it can trick the guest into executing. If that is indeed the > >> assumption, it is worth mentioning explicitly. I suspect it is a bit weak, > >> since many earlier attacks were based on executing the wrong code. Notably, > >> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU > >> key (as opposed to any device key e.g. for PCI encryption) in either > >> TDX or SEV. Is there for example anything that precludes TDX or SEV from > >> executing code in the bounce buffers? > > > > This was already replied by Kirill, any code execution out of shared memory > generates > > a #GP. > > Apologies for my wording. 
Everyone interpreted "executing" as "executing > directly on the bounce buffer page", when what I meant is "consuming data > fetched from the bounce buffers as code" (not necessarily directly). I guess in theory it is possible, but we have not seen such usage in guest kernel code in practice during our audit. This would be a pretty ugly thing to do, IMO, even if you forget about confidential computing. > > For example, in the diagram in your document, the guest kernel is a > monolithic piece. In reality, there are dynamically loaded components. In > the original SEV implementation, with pre-attestation, the measurement could > only apply before loading any DLKM (I believe, not really sure). As another > example, SEVerity (CVE-2020-12967 [1]) worked by injecting a payload > directly into the guest kernel using virtio-based network I/O. That is what > I referred to when I wrote "many earlier attacks were based on executing the > wrong code". The above attack was only possible because an attacker was able to directly modify the code execution pointer to an arbitrary guest memory address (in that case the guest NMI handler was substituted to point to the attacker payload). This is an obvious hole in the integrity protection of the guest private memory and its page table mappings. This is not possible with TDX, nor, I believe, with new versions of AMD SEV. 
My understanding is that > crypto alone (i.e. without additional layers like dm-integrity) will > happily decrypt that into a code stream with pseudo-random instructions > in it, not vehemently error out. > > So, while TDX precludes the host from writing into guest memory directly, > since the bounce buffers are shared, TDX will not prevent the host from > flipping bits there. It's then just a matter of guessing where the bits > will go, and hoping that some bits execute at guest PL0. Of course, this > can be mitigated by either only using static configs, or using > dm-verity/dm-integrity, or maybe some other mechanisms. > > Shouldn't that be part of your document? To be clear: you mention under > "Storage protection" that you use dm-crypt and dm-integrity, so I believe > *you* know, but your readers may not figure out why dm-integrity is > integral to the process, notably after you write "Users could use other > encryption schemes". Sure, I can elaborate in the storage protection section about the importance of disk integrity protection. > > 2) What are the effects of random user code injection? > > It's the same as above, except that now you can target a much wider range > of input data, including shell scripts, etc. So the attack surface is > much larger. > > 3) What is the effect of data poisoning? > > You don't necessarily need to corrupt code. Being able to corrupt a > system configuration file for example can be largely sufficient. > > 4) Are there I/O-based replay attacks that would work pre-attestation? > > My current mental model is that you load a "base" software stack into the > TCB and then measure a relevant part of it. What you measure is somewhat > implementation-dependent, but in the end, if the system is attested, you > respond to a cryptographic challenge based on what was measured, and you > then get relevant secrets, e.g. a disk decryption key, that let you make > forward progress. 
However, what happens if every time you boot, the host > feeds you bogus disk data just to try to steer the boot sequence along > some specific path? What you ideally want is full disk encryption with additional integrity protection, such as the AES-GCM authenticated encryption mode. Then there are no questions about disk integrity, and many attacks are mitigated. > > I believe that the short answer is: the guest either: > > a) reaches attestation, but with bad in-memory data, so it fails the > crypto exchange, and secrets are not leaked. > > b) does not reach attestation, so never gets the secrets, and therefore > still fulfils the CC promise of not leaking secrets. > > So I personally feel this is OK, but it's worth writing up in your doc. > Yes, I will expand the storage section more on this. > > Back to the #VE handler, if I can find a way to inject malicious code into > my guest, what you wrote in that paragraph as a justification for no > in-depth security still seems like "not exactly defense in depth". I would > just remove the sentence, audit and fuzz that code with the same energy as > for anything else that could face bad input. In fact, most of our fuzzing hooks are inside the #VE handler itself, if you take a look at the implementation. They just don’t cover things like the #VE info decoding (that information is provided by a trusted party, the TDX module). Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
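The point about authenticated encryption above can be sketched with a small self-contained example (illustrative only, a toy stand-in for dm-crypt's real XTS/GCM setup, not code from the thread): with confidentiality-only crypto, a host that flips bits in a shared, host-visible I/O buffer gets corrupted plaintext silently accepted, while an integrity tag (dm-integrity, or an AEAD mode such as AES-GCM) lets the guest reject the tampered sector.

```python
# Toy sketch (illustrative only, NOT dm-crypt's real cipher setup): a host
# that can flip bits in a shared, host-visible I/O buffer defeats
# confidentiality-only disk crypto, while an integrity tag detects it.
import hashlib
import hmac
import os

def keystream(key, nonce, n):
    # CTR-style keystream built from SHA-256 -- a stand-in for a real cipher.
    out = b""
    for ctr in range(0, n, 32):
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
    return out[:n]

def encrypt(key, nonce, data):
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

decrypt = encrypt  # XOR stream cipher: same operation in both directions

key, nonce, mac_key = os.urandom(32), os.urandom(16), os.urandom(32)
sector = b"exec /sbin/init"

ct = encrypt(key, nonce, sector)
tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # stored with the sector

# The host flips one bit in the shared bounce buffer...
bad = bytearray(ct)
bad[5] ^= 0x01
bad = bytes(bad)

# ...and confidentiality-only decryption returns corrupted plaintext, no error:
assert decrypt(key, nonce, bad) != sector

# With integrity protection (dm-integrity / AES-GCM style), the flip is caught
# and the I/O can be rejected instead of consumed:
tampered = not hmac.compare_digest(
    hmac.new(mac_key, bad, hashlib.sha256).digest(), tag)
assert tampered
```

With a real stream or CTR-mode cipher the situation is even worse than "pseudo-random instructions": a ciphertext bit flip flips exactly the corresponding plaintext bit, so the host's corruption can be targeted rather than random, which is another reason the integrity layer is not optional.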
* Re: Linux guest kernel threat model for Confidential Computing 2023-01-25 12:28 Linux guest kernel threat model for Confidential Computing Reshetova, Elena 2023-01-25 12:43 ` Greg Kroah-Hartman 2023-01-30 11:36 ` Christophe de Dinechin @ 2023-02-07 0:27 ` Carlos Bilbao 2023-02-07 6:03 ` Greg Kroah-Hartman 2 siblings, 1 reply; 102+ messages in thread From: Carlos Bilbao @ 2023-02-07 0:27 UTC (permalink / raw) To: Reshetova, Elena, Greg Kroah-Hartman Cc: Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On 1/25/23 6:28 AM, Reshetova, Elena wrote: > Hi Greg, > > You mentioned couple of times (last time in this recent thread: > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start > discussing the updated threat model for kernel, so this email is a start in this direction. > > (Note: I tried to include relevant people from different companies, as well as linux-coco > mailing list, but I hope everyone can help by including additional people as needed). > > As we have shared before in various lkml threads/conference presentations > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a > change in the threat model where guest kernel doesn’t anymore trust the hypervisor. > This is a big change in the threat model and requires both careful assessment of the > new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations > and security validation techniques. 
This is the activity that we have started back at Intel > and the current status can be found in > > 1) Threat model and potential mitigations: > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html> > 2) One of the described in the above doc mitigations is "hardening of the enabled > code". What we mean by this, as well as techniques that are being used are > described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html Regarding driver hardening, does anyone have a better filtering idea? The current solution assumes the kernel command line is trusted and cannot avoid the __init() functions that waste memory. I don't know if the __exit() routines of the filtered devices are called, but it doesn't sound much better to allocate memory and free it right after. > > 3) All the tools are open-source and everyone can start using them right away even > without any special HW (readme has description of what is needed). > Tools and documentation is here: > https://github.com/intel/ccc-linux-guest-hardening > > 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found > here: https://github.com/intel/tdx/commits/guest-next > > So, my main question before we start to argue about the threat model, mitigations, etc, > is what is the good way to get this reviewed to make sure everyone is aligned? > There are a lot of angles and details, so what is the most efficient method? > Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html > into logical pieces and start submitting it to mailing list for discussion one by one? > Any other methods? > > The original plan we had in mind is to start discussing the relevant pieces when submitting the code, > i.e. when submitting the device filter patches, we will include problem statement, threat model link, > data, alternatives considered, etc. > > Best Regards, > Elena. 
> > [1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/ > [2] https://lpc.events/event/16/contributions/1328/ > [3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/ Thanks, Carlos ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-07 0:27 ` Carlos Bilbao @ 2023-02-07 6:03 ` Greg Kroah-Hartman 2023-02-07 19:53 ` Carlos Bilbao 0 siblings, 1 reply; 102+ messages in thread From: Greg Kroah-Hartman @ 2023-02-07 6:03 UTC (permalink / raw) To: Carlos Bilbao Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote: > On 1/25/23 6:28 AM, Reshetova, Elena wrote: > > 2) One of the described in the above doc mitigations is "hardening of the enabled > > code". What we mean by this, as well as techniques that are being used are > > described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html > > Regarding driver hardening, does anyone have a better filtering idea? > > The current solution assumes the kernel command line is trusted and cannot > avoid the __init() functions that waste memory. That is two different things (command line trust and __init() functions), so I do not understand the relationship at all here. Please explain it better. Also, why would an __init() function waste memory? Memory usage isn't an issue here, right? > I don't know if the > __exit() routines of the filtered devices are called, but it doesn't sound > much better to allocate memory and free it right after. What device has a __exit() function? Drivers have module init/exit functions but they should do nothing but register themselves with the relevant busses and they are only loaded if the device is found in the system. And what exactly is incorrect about allocating memory and then freeing it when not needed? 
So again, I don't understand the question, sorry. thanks, greg k-h ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-07 6:03 ` Greg Kroah-Hartman @ 2023-02-07 19:53 ` Carlos Bilbao 2023-02-07 21:55 ` Michael S. Tsirkin ` (3 more replies) 0 siblings, 4 replies; 102+ messages in thread From: Carlos Bilbao @ 2023-02-07 19:53 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On 2/7/23 00:03, Greg Kroah-Hartman wrote: > On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote: >> On 1/25/23 6:28 AM, Reshetova, Elena wrote: >>> 2) One of the described in the above doc mitigations is "hardening of the enabled >>> code". What we mean by this, as well as techniques that are being used are >>> described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html >> Regarding driver hardening, does anyone have a better filtering idea? >> >> The current solution assumes the kernel command line is trusted and cannot >> avoid the __init() functions that waste memory. > That is two different things (command line trust and __init() > functions), so I do not understand the relationship at all here. Please > explain it better. No relation other than it would be nice to have a solution that does not require kernel command line and that prevents __init()s. > > Also, why would an __init() function waste memory? Memory usage isn't > an issue here, right? > >> I don't know if the >> __exit() routines of the filtered devices are called, but it doesn't sound >> much better to allocate memory and free it right after. > What device has a __exit() function? 
Drivers have module init/exit > functions but they should do nothing but register themselves with the > relevant busses and they are only loaded if the device is found in the > system. > > And what exactly is incorrect about allocating memory and then freeing > it when not needed? The currently proposed device filtering does not stop the __init() functions of these drivers from being called. Whatever memory is allocated by blacklisted drivers is wasted because those drivers can never be used. Sure, memory can be allocated and freed as soon as it is no longer needed, but this memory was never needed in the first place. A more pressing concern than the wasted memory, which may be unimportant, is what those driver init functions are doing. For example, device setup may involve MMIO registers, which we cannot trust. It's a lot more code to worry about from a CoCo perspective. > > So again, I don't understand the question, sorry. Given the limitations of the current approach, does anyone have any other ideas for filtering devices prior to their initialization? > > thanks, > > greg k-h Thanks, Carlos ^ permalink raw reply [flat|nested] 102+ messages in thread
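Carlos's concern can be modeled in a few lines (a toy sketch with made-up driver names and numbers, not kernel code): a filter that acts only at probe/bind time still lets every built-in driver's __init() run first, touching memory (and possibly MMIO) before the policy ever applies.

```python
# Toy model (made-up driver names and costs, not kernel code) of filtering at
# probe time: every built-in driver's __init() runs unconditionally, and only
# the later bind step is subject to the allow-list.
allow_list = {"virtio_net", "virtio_blk"}  # the small hardened set

init_ran, probed, wasted_kb = [], [], 0

def module_init(name, setup_cost_kb):
    # __init() runs for every built-in driver -- the filter acts later.
    global wasted_kb
    init_ran.append(name)
    if name not in allow_list:
        wasted_kb += setup_cost_kb  # allocated before the filter ever acts

def driver_probe(name):
    # The runtime filter only intervenes here, at bind time.
    if name in allow_list:
        probed.append(name)

for drv, cost in [("virtio_net", 4), ("virtio_blk", 4), ("legacy_fb", 64)]:
    module_init(drv, cost)
    driver_probe(drv)

assert init_ran == ["virtio_net", "virtio_blk", "legacy_fb"]  # all inits ran
assert probed == ["virtio_net", "virtio_blk"]                 # filter worked
assert wasted_kb == 64                                        # but too late
```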
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-07 19:53 ` Carlos Bilbao @ 2023-02-07 21:55 ` Michael S. Tsirkin 2023-02-08 1:51 ` Theodore Ts'o ` (2 subsequent siblings) 3 siblings, 0 replies; 102+ messages in thread From: Michael S. Tsirkin @ 2023-02-07 21:55 UTC (permalink / raw) To: Carlos Bilbao Cc: Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote: > Given the limitations of current approach, does anyone have any other ideas > for filtering devices prior to their initialization? /me mumbles ... something something ... bpf ... -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-07 19:53 ` Carlos Bilbao 2023-02-07 21:55 ` Michael S. Tsirkin @ 2023-02-08 1:51 ` Theodore Ts'o 2023-02-08 9:31 ` Michael S. Tsirkin 2023-02-08 7:19 ` Greg Kroah-Hartman 2023-02-08 10:16 ` Reshetova, Elena 3 siblings, 1 reply; 102+ messages in thread From: Theodore Ts'o @ 2023-02-08 1:51 UTC (permalink / raw) To: Carlos Bilbao Cc: Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote: > Currently proposed device filtering does not stop the __init() functions > from these drivers to be called. Whatever memory is allocated by > blacklisted drivers is wasted because those drivers cannot ever be used. > Sure, memory can be allocated and freed as soon as it is no longer needed, > but these memory would never be needed. > > > More pressing concern than wasted memory, which may be unimportant, there's > the issue of what are those driver init functions doing. For example, as > part of device setup, MMIO regs may be involved, which we cannot trust. It's > a lot more code to worry about from a CoCo perspective. Why not just simply compile a special CoCo kernel that doesn't have any drivers that you don't trust. Now, the distros may be pushing back in that they don't want to support a separate kernel image. But this apparently really a pain allocation negotiation, isn't it? Intel and other companies want to make $$$$$ with CoCo. In order to make $$$$$, you need to push the costs onto various different players in the ecosystem. 
This is cleverly disguised as taking a currently perfectly acceptable design paradigm, where the trust boundary is in the traditional location, and recasting all of the assumptions which you have broken as "bugs" that must be fixed by upstream developers. But another place to push the costs is the distro vendors, who might need to maintain a separate, differently configured CoCo kernel. Now, Red Hat and company will no doubt push back. But the upstream development community will also push back if you try to dump too much work on *us*. - Ted ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 1:51 ` Theodore Ts'o @ 2023-02-08 9:31 ` Michael S. Tsirkin 2023-02-08 10:44 ` Reshetova, Elena 0 siblings, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-02-08 9:31 UTC (permalink / raw) To: Theodore Ts'o Cc: Carlos Bilbao, Greg Kroah-Hartman, Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote: > Why not just simply compile a special CoCo kernel that doesn't have > any drivers that you don't trust. Or at least, start with that? You can then gradually expand that until some config is both acceptable to distros and seems sufficiently trusty to the CoCo project. Lots of kernel features got upstreamed this way. Requirement to have an arbitrary config satisfy CoCo seems like a very high bar to clear. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-02-08 9:31 ` Michael S. Tsirkin @ 2023-02-08 10:44 ` Reshetova, Elena 2023-02-08 10:58 ` Greg Kroah-Hartman ` (2 more replies) 0 siblings, 3 replies; 102+ messages in thread From: Reshetova, Elena @ 2023-02-08 10:44 UTC (permalink / raw) To: Michael S. Tsirkin, Theodore Ts'o Cc: Carlos Bilbao, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List > On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote: > > Why not just simply compile a special CoCo kernel that doesn't have > > any drivers that you don't trust. Aside from the complexity and scalability management of such a config, which has to change with every kernel release, what about the built-in platform drivers? I am not a driver expert here, but as far as I understand they cannot be disabled via config. Please correct me if this statement is wrong. > In order to make $$$$$, you need to push the costs onto various > different players in the ecosystem. This is cleverly disguised as > taking current perfectly acceptable design paradigm when the trust > boundary is in the traditional location, and causing all of the > assumptions which you have broken as "bugs" that must be fixed by > upstream developers. The CC threat model does change the traditional Linux trust boundary regardless of what mitigations are used (kernel config vs. runtime filtering). Because for the drivers that a CoCo guest happens to need, there is no way to fix this problem by either of these mechanisms (we cannot disable the code that we need), unless somebody writes a totally new set of CoCo-specific drivers (who needs another set of CoCo-specific virtio drivers in the kernel?).
So, if the path is to be able to use existing driver kernel code, then we need: 1. The selected CoCo-guest-required drivers (a small set) need to be hardened (or whatever word people prefer to use here), which only means that in the presence of a malicious host/hypervisor that can manipulate PCI config space, port IO and MMIO, these drivers should not expose CC guest memory confidentiality or integrity (including via privilege escalation into the CC guest). Please note that this only applies to a small set of drivers (in the TDX virtio setup we have fewer than 10 of them) and does not present invasive changes to the kernel code. There is also additional core PCI/MSI code involved in the discovery and configuration of these drivers; this code also falls into the category we need to make robust. 2. The rest of the non-needed drivers must be disabled. Here we can argue about what the correct method of doing this is and who should bear the costs of enforcing it. But from a pure security point of view: the method that is simple and clear, and that requires as little maintenance as possible, usually has the biggest chance of enforcing security. And given that we already have the concept of authorized devices in Linux, does this method really bring so much additional complexity to the kernel? But it is hard to argue here without the code: we need to submit the filter proposal first (it is still under internal review). Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
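The "authorized devices" concept mentioned above already exists for buses like USB and Thunderbolt. As a rough illustration (the device names and sysfs paths here are hypothetical, and the actual CoCo filter proposal had not yet been posted), a default-deny policy amounts to computing a small set of sysfs writes:

```python
# Rough illustration of the existing Linux "authorized devices" mechanism
# (used today by USB and Thunderbolt): default-deny at the bus level, then
# explicitly authorize only a vetted allow-list. Names are hypothetical.
ALLOWED = {"1-1", "1-2"}  # hypothetical allow-list of bus device names

def authorization_writes(bus, root, present):
    """Return (sysfs_path, value) pairs enforcing default-deny + allow-list."""
    # First, stop authorizing newly appearing devices by default...
    writes = [("/sys/bus/%s/devices/%s/authorized_default" % (bus, root), "0")]
    # ...then explicitly authorize only the vetted devices.
    for dev in present:
        value = "1" if dev in ALLOWED else "0"
        writes.append(("/sys/bus/%s/devices/%s/authorized" % (bus, dev), value))
    return writes

plan = authorization_writes("usb", "usb1", ["1-1", "1-9"])
assert ("/sys/bus/usb/devices/1-1/authorized", "1") in plan
assert ("/sys/bus/usb/devices/1-9/authorized", "0") in plan
```

This only models the policy computation; whether a bus-level authorization attribute alone is early enough (given the __init() concern raised earlier in the thread) is exactly what the filter proposal would have to address.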
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 10:44 ` Reshetova, Elena @ 2023-02-08 10:58 ` Greg Kroah-Hartman 2023-02-08 16:19 ` Christophe de Dinechin 2023-02-08 13:00 ` Michael S. Tsirkin 2023-02-08 13:42 ` Theodore Ts'o 2 siblings, 1 reply; 102+ messages in thread From: Greg Kroah-Hartman @ 2023-02-08 10:58 UTC (permalink / raw) To: Reshetova, Elena Cc: Michael S. Tsirkin, Theodore Ts'o, Carlos Bilbao, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote: > > > > On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote: > > > Why not just simply compile a special CoCo kernel that doesn't have > > > any drivers that you don't trust. > > Aside from complexity and scalability management of such a config that has > to change with every kernel release, what about the build-in platform drivers? What do you mean by "built in platform drivers"? You are creating a .config for a specific cloud platform, just only select the drivers for that exact configuration and you should be fine. And as for the management of such a config, distros do this just fine, why can't you? It's not that hard to manage properly. > I am not a driver expert here but as far as I understand they cannot be disabled > via config. Please correct if this statement is wrong. Again, which specific drivers are you referring to? And why are they a problem? > > In order to make $$$$$, you need to push the costs onto various > > different players in the ecosystem. 
This is cleverly disguised as > > taking current perfectly acceptable design paradigm when the trust > > boundary is in the traditional location, and causing all of the > > assumptions which you have broken as "bugs" that must be fixed by > > upstream developers. > > The CC threat model does change the traditional linux trust boundary regardless of > what mitigations are used (kernel config vs. runtime filtering). Because for the > drivers that CoCo guest happens to need, there is no way to fix this problem by > either of these mechanisms (we cannot disable the code that we need), unless somebody > writes a totally new set of coco specific drivers (who needs another set of > CoCo specific virtio drivers in the kernel?). It sounds like you want such a set of drivers, why not just write them? We have zillions of drivers already, it's not hard to write new ones, as it really sounds like that's exactly what you want to have happen here in the end as you don't trust the existing set of drivers you are using for some reason. > So, if the path is to be able to use existing driver kernel code, then we need: Wait, again, why? Why not just have your own? That should be the simplest thing overall. What's wrong with that? > 1. these selective CoCo guest required drivers (small set) needs to be hardened > (or whatever word people prefer to use here), which only means that in > the presence of malicious host/hypervisor that can manipulate pci config space, > port IO and MMIO, these drivers should not expose CC guest memory > confidentiality or integrity (including via privilege escalation into CC guest). Again, stop it please with the "hardened" nonsense, that means nothing. Either the driver has bugs, or it doesn't. I welcome you to prove it doesn't :) > Please note that this only applies to a small set (in tdx virtio setup we have less > than 10 of them) of drivers and does not present invasive changes to the kernel > code. 
There is also an additional core pci/msi code that is involved with discovery > and configuration of these drivers, this code also falls into the category we need to > make robust. Again, why wouldn't we all want "robust" drivers? This is not anything new here, all you are somehow saying is that you are changing the threat model that the kernel "must" support. And for that, you need to then change the driver code to support that. So again, why not just have your own drivers and driver subsystem that meets your new requirements? Let's see what that looks like and if there even is any overlap between that and the existing kernel driver subsystems. > 2. rest of non-needed drivers must be disabled. Here we can argue about what > is the correct method of doing this and who should bear the costs of enforcing it. You bear that cost. Or you get a distro to do that. That's not up to us in the kernel community, sorry, we give you the option to do that if you want to, that's all that we can do. > But from pure security point of view: the method that is simple and clear, that > requires as little maintenance as possible usually has the biggest chance of > enforcing security. Again, that's up to your configuration management. Please do it, tell us what doesn't work and send changes if you find better ways to do it. Again, this is all there for you to do today, nothing for us to have to do for you. > And given that we already have the concept of authorized devices in Linux, > does this method really bring so much additional complexity to the kernel? No idea, you tell us! :) Again, I recommend you just have your own drivers; that will allow you to show us all exactly what you mean by the terms you keep using. Why not just submit that for review instead? good luck! greg k-h ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 10:58 ` Greg Kroah-Hartman @ 2023-02-08 16:19 ` Christophe de Dinechin 2023-02-08 17:29 ` Greg Kroah-Hartman 0 siblings, 1 reply; 102+ messages in thread From: Christophe de Dinechin @ 2023-02-08 16:19 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Reshetova, Elena, Michael S. Tsirkin, Theodore Ts'o, Carlos Bilbao, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote... > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote: >> >> The CC threat model does change the traditional linux trust boundary regardless of >> what mitigations are used (kernel config vs. runtime filtering). Because for the >> drivers that CoCo guest happens to need, there is no way to fix this problem by >> either of these mechanisms (we cannot disable the code that we need), unless somebody >> writes a totally new set of coco specific drivers (who needs another set of >> CoCo specific virtio drivers in the kernel?). > > It sounds like you want such a set of drivers, why not just write them? > We have zillions of drivers already, it's not hard to write new ones, as > it really sounds like that's exactly what you want to have happen here > in the end as you don't trust the existing set of drivers you are using > for some reason. In the CC approach, the hypervisor is considered as hostile. The rest of the system is not changed much. If we pass-through some existing NIC, we'd rather use the existing driver for that NIC rather than reinvent it. 
However, we need to also consider the possibility that someone maliciously replaced the actual NIC with a cleverly crafted software emulator designed to cause the driver to leak confidential data. >> So, if the path is to be able to use existing driver kernel code, then we need: > > Wait, again, why? Why not just have your own? That should be the > simplest thing overall. What's wrong with that? That would require duplication for the majority of hardware drivers. >> 1. these selective CoCo guest required drivers (small set) needs to be hardened >> (or whatever word people prefer to use here), which only means that in >> the presence of malicious host/hypervisor that can manipulate pci config space, >> port IO and MMIO, these drivers should not expose CC guest memory >> confidentiality or integrity (including via privilege escalation into CC guest). > > Again, stop it please with the "hardened" nonsense, that means nothing. > Either the driver has bugs, or it doesn't. I welcome you to prove it > doesn't :) In a non-CC scenario, a driver is correct if, among other things, it does not leak kernel data to user space. However, it assumes that PCI devices are working correctly and according to spec. In a CC scenario, an additional condition for correctness is that it must not leak data from the trusted environment to the host. It assumes that a _virtual_ PCI device can be implemented on the host side to cause an existing driver to leak secrets to the host. It is this additional condition that we are talking about. Think of this as a bit similar to the introduction of IOMMUs, which meant there was a new condition impacting _the entire kernel_ that you had to make sure your DMA operations and IOMMU were in agreement. Here, it is a bit of a similar situation: CC forbids some specific operations the same way an IOMMU does, except instead of stray DMAs, it's stray accesses from the host. 
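The extra correctness condition described above can be shown with a minimal sketch (Python pseudo-driver logic, not actual kernel code, with illustrative constants): any value a driver reads from MMIO or PCI config space is attacker-controlled under the CC threat model and must be validated before it is used for allocation sizes, indices, or loop bounds.

```python
# Minimal sketch (pseudo-driver logic, not kernel code) of the new
# correctness condition: a device-supplied value -- e.g. a queue size read
# from MMIO -- is attacker-controlled in the CC model.
QUEUE_MAX = 1024  # illustrative upper bound the driver is willing to honor
DESC_SIZE = 16    # illustrative per-descriptor size in bytes

def setup_queue_naive(mmio_queue_size):
    # Pre-CC assumption: the device follows the spec, so trust the value.
    return mmio_queue_size * DESC_SIZE  # unbounded allocation/index range!

def setup_queue_hardened(mmio_queue_size):
    # CC assumption: a malicious virtual device can advertise anything.
    if not 0 < mmio_queue_size <= QUEUE_MAX:
        raise ValueError("untrusted device advertised bogus queue size")
    return mmio_queue_size * DESC_SIZE

assert setup_queue_hardened(256) == 4096
try:
    setup_queue_hardened(1 << 32)  # host-supplied garbage is rejected
except ValueError:
    pass
```

The naive version is perfectly "correct" under the traditional trust boundary; only the changed threat model makes the missing bounds check a bug, which is the point of the IOMMU analogy above.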
Note that, as James Bottomley pointed out, a crash is not seen as a failure of the CC model, unless it leads to a subsequent leak of confidential data. Denial of service, through crash or otherwise, is so easy to do from host or hypervisor side that it is entirely out of scope. > >> Please note that this only applies to a small set (in tdx virtio setup we have less >> than 10 of them) of drivers and does not present invasive changes to the kernel >> code. There is also an additional core pci/msi code that is involved with discovery >> and configuration of these drivers, this code also falls into the category we need to >> make robust. > > Again, why wouldn't we all want "robust" drivers? This is not anything > new here, What is new is that CC requires driver to be "robust" against a new kind of attack "from below" (i.e. from the [virtual] hardware side). > all you are somehow saying is that you are changing the thread > model that the kernel "must" support. And for that, you need to then > change the driver code to support that. What is being argued is that CC is not robust unless we block host-side attacks that can cause the guest to leak data to the host. > > So again, why not just have your own drivers and driver subsystem that > meets your new requirements? Let's see what that looks like and if > there even is any overlap between that and the existing kernel driver > subsystems. Would a "CC-aware PCI" subsystem fit your definition? > >> 2. rest of non-needed drivers must be disabled. Here we can argue about what >> is the correct method of doing this and who should bare the costs of enforcing it. > > You bare that cost. I believe the CC community understands that. The first step before introducing modifications in the drivers is getting an understanding of why we think that CC introduces a new condition for robustness. We will not magically turn all drivers into CC-safe drivers. 
It will take a lot of time, and the patches are likely to come from the CC community. At that stage, though, the question is: "do you understand the problem we are trying to solve?". I hope that my IOMMU analogy above helps. > Or you get a distro to do that. The best a distro can do is to have a minified kernel tuned for CC use cases, or to enable a hypothetical CONFIG_COCO_SAFETY configuration. A distro cannot decide what work goes behind CONFIG_COCO_SAFETY. > That's not up to us in the kernel community, sorry, we give you the option > to do that if you want to, that's all that we can do. I hope that the explanations above will help you change your mind on that statement. That cannot be a config-only or custom-drivers-only solution. (or maybe you can convince us it can ;-) > >> But from pure security point of view: the method that is simple and clear, that >> requires as little maintenance as possible usually has the biggest chance of >> enforcing security. > > Again, that's up to your configuration management. Please do it, tell > us what doesn't work and send changes if you find better ways to do it. > Again, this is all there for you to do today, nothing for us to have to > do for you. > >> And given that we already have the concept of authorized devices in Linux, >> does this method really brings so much additional complexity to the kernel? > > No idea, you tell us! :) > > Again, I recommend you just having your own drivers, that will allow you > to show us all exactly what you mean by the terms you keep using. Why > not just submit that for review instead? > > good luck! > > greg k-h -- Cheers, Christophe de Dinechin (https://c3d.github.io) Theory of Incomplete Measurements (https://c3d.github.io/TIM) ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 16:19 ` Christophe de Dinechin @ 2023-02-08 17:29 ` Greg Kroah-Hartman 2023-02-08 18:02 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 102+ messages in thread From: Greg Kroah-Hartman @ 2023-02-08 17:29 UTC (permalink / raw) To: Christophe de Dinechin Cc: Reshetova, Elena, Michael S. Tsirkin, Theodore Ts'o, Carlos Bilbao, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Feb 08, 2023 at 05:19:37PM +0100, Christophe de Dinechin wrote: > > On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote... > > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote: > >> > >> The CC threat model does change the traditional linux trust boundary regardless of > >> what mitigations are used (kernel config vs. runtime filtering). Because for the > >> drivers that CoCo guest happens to need, there is no way to fix this problem by > >> either of these mechanisms (we cannot disable the code that we need), unless somebody > >> writes a totally new set of coco specific drivers (who needs another set of > >> CoCo specific virtio drivers in the kernel?). > > > > It sounds like you want such a set of drivers, why not just write them? > > We have zillions of drivers already, it's not hard to write new ones, as > > it really sounds like that's exactly what you want to have happen here > > in the end as you don't trust the existing set of drivers you are using > > for some reason. > > In the CC approach, the hypervisor is considered as hostile. The rest of the > system is not changed much. 
If we pass-through some existing NIC, we'd > rather use the existing driver for that NIC rather than reinvent > it. But that is not what was proposed. I thought this was all about virtio. If not, again, someone needs to write a solid definition. So if you want to use existing drivers, wonderful, please work on making the needed changes to meet your goals to all of them. I was trying to give you a simple way out :) > >> 1. these selective CoCo guest required drivers (small set) needs to be hardened > >> (or whatever word people prefer to use here), which only means that in > >> the presence of malicious host/hypervisor that can manipulate pci config space, > >> port IO and MMIO, these drivers should not expose CC guest memory > >> confidentiality or integrity (including via privilege escalation into CC guest). > > > > Again, stop it please with the "hardened" nonsense, that means nothing. > > Either the driver has bugs, or it doesn't. I welcome you to prove it > > doesn't :) > > In a non-CC scenario, a driver is correct if, among other things, it does > not leak kernel data to user space. However, it assumes that PCI devices are > working correctly and according to spec. And you also assume that your CPU is working properly. And what spec exactly are you referring to? How can you validate any of that without using the PCI authentication protocol already discussed in this thread? > >> Please note that this only applies to a small set (in tdx virtio setup we have less > >> than 10 of them) of drivers and does not present invasive changes to the kernel > >> code. There is also an additional core pci/msi code that is involved with discovery > >> and configuration of these drivers, this code also falls into the category we need to > >> make robust. > > > > Again, why wouldn't we all want "robust" drivers? This is not anything > > new here, > > What is new is that CC requires driver to be "robust" against a new kind of > attack "from below" (i.e. 
from the [virtual] hardware side). And as I have said multiple times, that is a totally new "requirement" and one that Linux does not meet in any way at this point in time. If you somehow feel this is a change that is ok to make for Linux, you will need to do a lot of work to make this happen. Anyway, you all are just spinning in circles now. I'll just mute this thread until I see an actual code change as it seems to be full of people not actually sending anything we can actually do anything with. greg k-h ^ permalink raw reply [flat|nested] 102+ messages in thread
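[Editorial note] To make "robust against an attack from below" concrete: the classic bug class is a driver trusting a device-supplied length field. A minimal sketch, with invented names, of what a CC-hardened receive path has to do differently — validate the value before using it, since under the CC threat model it is attacker-controlled:

```c
#include <stdint.h>
#include <string.h>

#define RX_BUF_SIZE 1536   /* illustrative receive buffer size */

/* Illustrative only: a driver copies a packet whose length field came
 * from (virtual) hardware.  In a non-CC setting, drivers often assume
 * the device follows its spec; under CC the field must be validated. */
static int rx_copy(uint8_t *dst, const uint8_t *dev_ring, uint32_t dev_len)
{
    /* Clamp/reject instead of trusting dev_len: drop the frame rather
     * than overflow dst. */
    if (dev_len == 0 || dev_len > RX_BUF_SIZE)
        return -1;
    memcpy(dst, dev_ring, dev_len);
    return (int)dev_len;
}
```

A host-controlled `dev_len` of, say, 5000 would otherwise smash 3.5 KB past the buffer; the check turns a memory-safety bug into a dropped frame, which (per the earlier point about denial of service being out of scope) is an acceptable outcome.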
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 17:29 ` Greg Kroah-Hartman @ 2023-02-08 18:02 ` Dr. David Alan Gilbert 2023-02-08 18:58 ` Thomas Gleixner 0 siblings, 1 reply; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-02-08 18:02 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Christophe de Dinechin, Reshetova, Elena, Michael S. Tsirkin, Theodore Ts'o, Carlos Bilbao, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > On Wed, Feb 08, 2023 at 05:19:37PM +0100, Christophe de Dinechin wrote: > > > > On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote... > > > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote: > > >> > > >> The CC threat model does change the traditional linux trust boundary regardless of > > >> what mitigations are used (kernel config vs. runtime filtering). Because for the > > >> drivers that CoCo guest happens to need, there is no way to fix this problem by > > >> either of these mechanisms (we cannot disable the code that we need), unless somebody > > >> writes a totally new set of coco specific drivers (who needs another set of > > >> CoCo specific virtio drivers in the kernel?). > > > > > > It sounds like you want such a set of drivers, why not just write them? > > > We have zillions of drivers already, it's not hard to write new ones, as > > > it really sounds like that's exactly what you want to have happen here > > > in the end as you don't trust the existing set of drivers you are using > > > for some reason. > > > > In the CC approach, the hypervisor is considered as hostile. The rest of the > > system is not changed much. 
If we pass-through some existing NIC, we'd > > rather use the existing driver for that NIC rather than reinvent > > it. > > But that is not what was proposed. I thought this was all about virtio. > If not, again, someone needs to write a solid definition. As I said in my reply to you a couple of weeks ago: I'm not sure the request here isn't really to make sure *all* PCI devices are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) - and potentially ones that people will want to pass-through (which generally needs a lot more work to make safe). (I've not looked at these Intel tools to see what they cover) so *mostly* virtio, and just a few of the other devices. > So if you want to use existing drivers, wonderful, please work on making > the needed changes to meet your goals to all of them. I was trying to > give you a simple way out :) > > > >> 1. these selective CoCo guest required drivers (small set) needs to be hardened > > >> (or whatever word people prefer to use here), which only means that in > > >> the presence of malicious host/hypervisor that can manipulate pci config space, > > >> port IO and MMIO, these drivers should not expose CC guest memory > > >> confidentiality or integrity (including via privilege escalation into CC guest). > > > > > > Again, stop it please with the "hardened" nonsense, that means nothing. > > > Either the driver has bugs, or it doesn't. I welcome you to prove it > > > doesn't :) > > > > In a non-CC scenario, a driver is correct if, among other things, it does > > not leak kernel data to user space. However, it assumes that PCI devices are > > working correctly and according to spec. > > And you also assume that your CPU is working properly. We require the CPU to give us a signed attestation to prove that it's a trusted CPU, that someone external can validate. So, not quite 'assume'. > And what spec > exactly are you referring to? 
How can you validate any of that without > using the PCI authentication protocol already discussed in this thread? The PCI auth protocol looks promising and is possibly the right long-term answer. But for a pass-through NIC, for example, all we'd want is that (with the help of the IOMMU) it can't get or corrupt any data the guest doesn't give it - and then it's up to the guest to run encryption over the protocols over the NIC. > > > >> Please note that this only applies to a small set (in tdx virtio setup we have less > > >> than 10 of them) of drivers and does not present invasive changes to the kernel > > >> code. There is also an additional core pci/msi code that is involved with discovery > > >> and configuration of these drivers, this code also falls into the category we need to > > >> make robust. > > > > > > Again, why wouldn't we all want "robust" drivers? This is not anything > > > new here, > > > > What is new is that CC requires driver to be "robust" against a new kind of > > attack "from below" (i.e. from the [virtual] hardware side). > > And as I have said multiple times, that is a totally new "requirement" > and one that Linux does not meet in any way at this point in time. Yes, that's a fair statement. > If > you somehow feel this is a change that is ok to make for Linux, you will > need to do a lot of work to make this happen. > > Anyway, you all are just spinning in circles now. I'll just mute this > thread until I see an actual code change as it seems to be full of > people not actually sending anything we can actually do anything with. I think the challenge will be to come up with non-intrusive, minimal changes; obviously you don't want stuff shotgunned everywhere. Dave > greg k-h > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 18:02 ` Dr. David Alan Gilbert @ 2023-02-08 18:58 ` Thomas Gleixner 2023-02-09 19:48 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 102+ messages in thread From: Thomas Gleixner @ 2023-02-08 18:58 UTC (permalink / raw) To: Dr. David Alan Gilbert, Greg Kroah-Hartman Cc: Christophe de Dinechin, Reshetova, Elena, Michael S. Tsirkin, Theodore Ts'o, Carlos Bilbao, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Feb 08 2023 at 18:02, David Alan Gilbert wrote: > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: >> Anyway, you all are just spinning in circles now. I'll just mute this >> thread until I see an actual code change as it seems to be full of >> people not actually sending anything we can actually do anything with. There have been random patches posted which finally caused this discussion to start. Wrong order obviously :) > I think the challenge will be to come up with non-intrusive, minimal > changes; obviously you don't want stuff shotgunned everywhere. That has been tried by doing random surgery, e.g. caching some particular PCI config value. While that might not look intrusive at first glance, these kinds of punctual changes are the beginning of a whack-a-mole game and will end up in an uncoordinated maze of tiny mitigations which make the code harder to maintain. The real challenge is to come up with threat classes and mechanisms which squash the whole class. Done right, e.g. caching a range of config space values (or all of it) might give a benefit even for the bare metal or general virtualization case. 
That's quite some work, but it's much more palatable than a trickle of "fixes" when yet another source of trouble has been detected by a tool or human inspection. It's also more future-proof, because with the current approach of scratching the itch of the day, the probability that the just-"mitigated" issue comes back due to unrelated changes is very close to 100%. It's not any different from any other threat class problem. Thanks, tglx ^ permalink raw reply [flat|nested] 102+ messages in thread
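[Editorial note] The class-wide mitigation tglx describes — caching a range of config space values so the host cannot change its answers between two reads (a TOCTOU class, not one bug at a time) — can be sketched as follows. This is a hedged userspace model with invented names, not a proposed kernel patch:

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define CFG_SPACE_SIZE 256   /* standard PCI config space, illustrative */

/* Snapshot the (untrusted) config space once, e.g. at probe time, then
 * serve every later read from the snapshot.  The host can still supply
 * garbage at snapshot time, but it can no longer give two different
 * answers to two reads of the same register. */
struct cfg_cache {
    uint8_t bytes[CFG_SPACE_SIZE];
    bool valid;
};

/* hw_bytes stands in for the real accessor to untrusted hardware. */
static void cfg_cache_fill(struct cfg_cache *c, const uint8_t *hw_bytes)
{
    memcpy(c->bytes, hw_bytes, CFG_SPACE_SIZE);
    c->valid = true;
}

static int cfg_cache_read(const struct cfg_cache *c, unsigned int off,
                          uint8_t *val)
{
    if (!c->valid || off >= CFG_SPACE_SIZE)
        return -1;           /* fail closed on out-of-range offsets */
    *val = c->bytes[off];
    return 0;
}
```

As the mail notes, such a cache could plausibly help even outside CC (fewer trapped config accesses under general virtualization), which is what makes it a class mechanism rather than a punctual fix.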
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 18:58 ` Thomas Gleixner @ 2023-02-09 19:48 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 102+ messages in thread From: Dr. David Alan Gilbert @ 2023-02-09 19:48 UTC (permalink / raw) To: Thomas Gleixner Cc: Greg Kroah-Hartman, Christophe de Dinechin, Reshetova, Elena, Michael S. Tsirkin, Theodore Ts'o, Carlos Bilbao, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List * Thomas Gleixner (tglx@linutronix.de) wrote: > On Wed, Feb 08 2023 at 18:02, David Alan Gilbert wrote: > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote: > >> Anyway, you all are just spinning in circles now. I'll just mute this > >> thread until I see an actual code change as it seems to be full of > >> people not actually sending anything we can actually do anything with. > > There have been random patchs posted which finally caused this > discussion to start. Wrong order obviously :) > > > I think the challenge will be to come up with non-intrusive, minimal > > changes; obviously you don't want stuff shutgunned everywhere. > > That has been tried by doing random surgery, e.g. caching some > particular PCI config value. While that might not look intrusive on the > first glance, these kind of punctual changes are the begin of a whack a > mole game and will end up in an uncoordinated maze of tiny mitigations > which make the code harder to maintain. > > The real challenge is to come up with threat classes and mechanisms > which squash the whole class. Done right, e.g. caching a range of config > space values (or all of it) might give a benefit even for the bare metal > or general virtualization case. Yeh, reasonable. 
> That's quite some work, but its much more palatable than a trickle of > "fixes" when yet another source of trouble has been detected by a tool > or human inspection. > > It's also more future proof because with the current approach of > scratching the itch of the day the probability that the just "mitigated" > issue comes back due to unrelated changes is very close to 100%. > > It's not any different than any other threat class problem. I wonder if trying to group/categorise the output of Intel's tool would allow common problematic patterns to be found to then try and come up with more concrete fixes for whole classes of issues. Dave > Thanks, > > tglx > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 10:44 ` Reshetova, Elena 2023-02-08 10:58 ` Greg Kroah-Hartman @ 2023-02-08 13:00 ` Michael S. Tsirkin 2023-02-08 13:42 ` Theodore Ts'o 2 siblings, 0 replies; 102+ messages in thread From: Michael S. Tsirkin @ 2023-02-08 13:00 UTC (permalink / raw) To: Reshetova, Elena Cc: Theodore Ts'o, Carlos Bilbao, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote: > Because for the > drivers that CoCo guest happens to need, there is no way to fix this problem by > either of these mechanisms (we cannot disable the code that we need), unless somebody > writes a totally new set of coco specific drivers (who needs another set of > CoCo specific virtio drivers in the kernel?). I think it's more about pci and all that jazz, no? As a virtio maintainer I applied patches adding validation and intend to do so in the future simply because for virtio specifically people build all kind of weird setups out of software and so validating everything is a good idea. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
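[Editorial note] The validation MST refers to applying in virtio boils down to bounds-checking everything the other side hands over before using it. A hedged sketch: the struct below mirrors the field layout of a split-ring descriptor from the virtio spec, but the names and the function are invented stand-ins, not the in-kernel vring code:

```c
#include <stdint.h>
#include <stdbool.h>

/* Field layout follows a virtio split-queue descriptor (addr/len/flags/next);
 * the struct and function names are illustrative only. */
struct vring_desc_sketch {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

#define DESC_F_NEXT 0x1   /* value of VRING_DESC_F_NEXT in the spec */

/* Validate a descriptor supplied by the untrusted side before
 * dereferencing anything it points at. */
static bool desc_valid(const struct vring_desc_sketch *d,
                       uint64_t mem_base, uint64_t mem_size,
                       uint16_t queue_size)
{
    if (d->len == 0)
        return false;
    /* overflow-safe check that [addr, addr+len) lies inside guest memory */
    if (d->addr < mem_base || d->len > mem_size ||
        d->addr - mem_base > mem_size - d->len)
        return false;
    /* a chained descriptor must point back inside the ring */
    if ((d->flags & DESC_F_NEXT) && d->next >= queue_size)
        return false;
    return true;
}
```

The subtraction-based range check avoids the `addr + len` overflow that a naive `addr + len <= base + size` comparison would permit — exactly the kind of "weird setups out of software" hardening the mail says is worthwhile for virtio regardless of CC.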
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 10:44 ` Reshetova, Elena 2023-02-08 10:58 ` Greg Kroah-Hartman 2023-02-08 13:00 ` Michael S. Tsirkin @ 2023-02-08 13:42 ` Theodore Ts'o 2 siblings, 0 replies; 102+ messages in thread From: Theodore Ts'o @ 2023-02-08 13:42 UTC (permalink / raw) To: Reshetova, Elena Cc: Michael S. Tsirkin, Carlos Bilbao, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote: > 2. rest of non-needed drivers must be disabled. Here we can argue about what > is the correct method of doing this and who should bare the costs of enforcing it. > But from pure security point of view: the method that is simple and clear, that > requires as little maintenance as possible usually has the biggest chance of > enforcing security. > And given that we already have the concept of authorized devices in Linux, > does this method really brings so much additional complexity to the kernel? > But hard to argue here without the code: we need to submit the filter proposal first > (under internal review still). I think the problem here is that we've had a lot of painful experience where fuzzing produces a lot of false positives which then security-types then insist that all kernel developers must fix so that we can see the "important" security issues from the false positives. So "as little maintenance as possible" and fuzzing have not necessarily gone together. It might be less maintenance costs for *you*, but it's not necessarily less maintenance work for *us*. 
I've seen Red Hat principal engineers take completely bogus issues and raise them to CVE "high" priority levels, when it was nothing like that, thus forcing distro and data center people to do global pushes to production because it's easier than trying to explain to FedRAMP auditors why the CVSS score is bogus --- and every single unnecessary push to production has its own costs and risks. I've seen the constant load of syzbot false positives that generate noise in my inbox and in bug tracking issues assigned to me at $WORK. I've seen the false positives generated by DEPT, which is why I've pushed back on it. So if you are going to insist on fuzzing all of the PCI config space, and treat the findings all as "bugs", there is going to be huge pushback. Even if the "fixes" are minor, and don't have any massive impact on memory used or cache line misses or code/maintainability bloat, the fact that we treat them as P3 quality-of-implementation issues, and *you* treat them as P1 security bugs that must be fixed Now! Now! Now! is going to cause friction. (This is especially true since CVSS scores are unidimensional, and what might be high security --- or at least embarrassing --- for CoCo might be completely innocuous QOI bugs for the rest of the world.) So it might be that a simple, separate kernel config is going to be the massively simpler way to go, instead of insisting that all PCI device drivers must be fuzzed and made CoCo-safe, even if they will never be used in a CoCo context. Again, please be cognizant of the costs that CoCo may be imposing and pushing onto the rest of the ecosystem. Cheers, - Ted ^ permalink raw reply [flat|nested] 102+ messages in thread
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-07 19:53 ` Carlos Bilbao 2023-02-07 21:55 ` Michael S. Tsirkin 2023-02-08 1:51 ` Theodore Ts'o @ 2023-02-08 7:19 ` Greg Kroah-Hartman 2023-02-08 10:16 ` Reshetova, Elena 3 siblings, 0 replies; 102+ messages in thread From: Greg Kroah-Hartman @ 2023-02-08 7:19 UTC (permalink / raw) To: Carlos Bilbao Cc: Reshetova, Elena, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote: > On 2/7/23 00:03, Greg Kroah-Hartman wrote: > > > On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote: > > > On 1/25/23 6:28 AM, Reshetova, Elena wrote: > > > > 2) One of the described in the above doc mitigations is "hardening of the enabled > > > > code". What we mean by this, as well as techniques that are being used are > > > > described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html > > > Regarding driver hardening, does anyone have a better filtering idea? > > > > > > The current solution assumes the kernel command line is trusted and cannot > > > avoid the __init() functions that waste memory. > > That is two different things (command line trust and __init() > > functions), so I do not understand the relationship at all here. Please > > explain it better. > > > No relation other than it would be nice to have a solution that does not > require kernel command line and that prevents __init()s. Again, __init() has nothing to do with the kernel command line so I do not understand the relationship here. Have a specific example? > > Also, why would an __init() function waste memory? 
Memory usage isn't > an issue here, right? > > > > > I don't know if the > > > __exit() routines of the filtered devices are called, but it doesn't sound > > > much better to allocate memory and free it right after. > > What device has a __exit() function? Drivers have module init/exit > > functions but they should do nothing but register themselves with the > > relevant busses and they are only loaded if the device is found in the > > system. > > > > And what exactly is incorrect about allocating memory and then freeing > > it when not needed? > > > The currently proposed device filtering does not stop the __init() functions > of these drivers from being called. Whatever memory is allocated by > blacklisted drivers is wasted because those drivers cannot ever be used. > Sure, memory can be allocated and freed as soon as it is no longer needed, > but this memory would never be needed. Drivers are never even loaded if the hardware is not present, and a driver init function should do nothing anyway if it is written properly, so again, I do not understand what you are referring to here. Again, a real example might help explain your concerns, pointers to the code? > A more pressing concern than wasted memory, which may be unimportant, is > the issue of what those driver init functions are doing. For example, as > part of device setup, MMIO regs may be involved, which we cannot trust. It's > a lot more code to worry about from a CoCo perspective. Again, specific example? And if you don't want a driver to be loaded, don't build it into your kernel as Ted said. Or better yet, use the in-kernel functionality to prevent drivers from ever loading or binding to a device until you tell it from userspace that it is safe to do so. So I don't think this is a real issue unless you have pointers to code you are concerned about. > So again, I don't understand the question, sorry. 
> > Given the limitations of current approach, does anyone have any other ideas > for filtering devices prior to their initialization? What is wrong with the functionality we have today for this very thing? Does it not work properly for you? If so, why not, for what devices and drivers and busses do you still have problems with? thanks, greg k-h ^ permalink raw reply [flat|nested] 102+ messages in thread
* RE: Linux guest kernel threat model for Confidential Computing 2023-02-07 19:53 ` Carlos Bilbao ` (2 preceding siblings ...) 2023-02-08 7:19 ` Greg Kroah-Hartman @ 2023-02-08 10:16 ` Reshetova, Elena 2023-02-08 13:15 ` Michael S. Tsirkin 3 siblings, 1 reply; 102+ messages in thread From: Reshetova, Elena @ 2023-02-08 10:16 UTC (permalink / raw) To: Carlos Bilbao, Greg Kroah-Hartman Cc: Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List > No relation other than it would be nice to have a solution that does not >require kernel command line and that prevents __init()s. For __inits see below. For the command line, it is pretty straightforward to measure it and attest its integrity later: we need to do it for other parts anyhow as acpi tables, etc. So I don’t see why we need to do smth special about it? In any case it is indeed very different from driver discussion and goes into "what should be covered by attestation for CC guest" topic. > More pressing concern than wasted memory, which may be unimportant, there's > the issue of what are those driver init functions doing. For example, as > part of device setup, MMIO regs may be involved, which we cannot trust. It's > a lot more code to worry about from a CoCo perspective. Yes, we have seen such cases in kernel where drivers or modules would access MMIO or pci config space already in their __init() functions. 
Some concrete examples from modules and drivers (there are more): intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch() skx_init() -> get_all_munits() skx_init() -> skx_register_mci() -> skx_get_dimm_config() intel_rng_mod_init() -> intel_init_hw_struct() i10nm_exit() -> enable_retry_rd_err_log() -> __enable_retry_rd_err_log() However, this is how we address this from a security point of view: 1. In order for an MMIO read to obtain data from an untrusted host, the memory range must be shared with the host to begin with. We enforce that all MMIO mappings are private by default to the CC guest unless explicitly shared (and we automatically share for the authorized devices and their drivers from the allow list). This removes the problem of an "unexpected MMIO region interaction" (modulo acpi AML operation regions that we unfortunately also have to share, but acpi is a whole different difficult case on its own). 2. For pci config space, we limit any interaction with pci config space only to authorized devices and their drivers (that are in the allow list). As a result, device drivers outside of the allow list are not able to access pci config space even in their __init routines. It is done by setting to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized devices. So, even if the host made a driver's __init function run (by faking the device on the host side), it should not be able to supply any malicious data to it via MMIO or pci config space, so running their __init routines should be ok from a security point of view - or does anyone see any holes here? Best Regards, Elena. ^ permalink raw reply [flat|nested] 102+ messages in thread
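[Editorial note] The error_state mechanism Elena describes can be modelled in userspace as follows. The constant values mirror the kernel's (`pci_channel_io_perm_failure` is the third enumerator of `enum pci_channel_state`, starting at 1; `PCIBIOS_DEVICE_NOT_FOUND` is 0x86), but the structs are simplified stand-ins, not `struct pci_dev`; the point is only that config reads for non-authorized devices fail closed, returning all-ones:

```c
#include <stdint.h>

/* Simplified stand-ins for kernel types; constant values mirror the
 * kernel's, everything else is illustrative. */
enum pci_channel_state_sketch {
    CHAN_IO_NORMAL = 1,
    CHAN_IO_FROZEN = 2,
    CHAN_IO_PERM_FAILURE = 3,   /* pci_channel_io_perm_failure */
};

#define PCIBIOS_DEVICE_NOT_FOUND 0x86

struct pci_dev_sketch {
    enum pci_channel_state_sketch error_state;
    uint32_t cfg[64];           /* 256 bytes of config space as dwords */
};

/* Dword config read; off is assumed dword-aligned for simplicity.
 * A permanently-failed device yields all-ones plus an error code, so a
 * blocked driver's __init code never sees host-controlled data. */
static int cfg_read_dword(const struct pci_dev_sketch *dev,
                          unsigned int off, uint32_t *val)
{
    if (dev->error_state == CHAN_IO_PERM_FAILURE || off / 4 >= 64) {
        *val = ~0u;
        return PCIBIOS_DEVICE_NOT_FOUND;
    }
    *val = dev->cfg[off / 4];
    return 0;
}
```

MST's follow-up objection applies here: this only helps if callers actually check the return value or treat all-ones as invalid, which (as he notes) is not a well-tested path in every driver.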
* Re: Linux guest kernel threat model for Confidential Computing 2023-02-08 10:16 ` Reshetova, Elena @ 2023-02-08 13:15 ` Michael S. Tsirkin 2023-02-09 14:30 ` Reshetova, Elena 0 siblings, 1 reply; 102+ messages in thread From: Michael S. Tsirkin @ 2023-02-08 13:15 UTC (permalink / raw) To: Reshetova, Elena Cc: Carlos Bilbao, Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco, Linux Kernel Mailing List On Wed, Feb 08, 2023 at 10:16:14AM +0000, Reshetova, Elena wrote: > > No relation other than it would be nice to have a solution that does not > >require kernel command line and that prevents __init()s. > > For __inits see below. For the command line, it is pretty straightforward to > measure it and attest its integrity later: we need to do it for other parts > anyhow as acpi tables, etc. So I don’t see why we need to do smth special > about it? In any case it is indeed very different from driver discussion and > goes into "what should be covered by attestation for CC guest" topic. > > > More pressing concern than wasted memory, which may be unimportant, there's > > the issue of what are those driver init functions doing. For example, as > > part of device setup, MMIO regs may be involved, which we cannot trust. It's > > a lot more code to worry about from a CoCo perspective. > > Yes, we have seen such cases in kernel where drivers or modules would access > MMIO or pci config space already in their __init() functions. > Some concrete examples from modules and drivers (there are more): > > intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch() An iommu driver. So maybe you want to use virtio iommu then? 
> skx_init() -> get_all_munits() > skx_init() -> skx_register_mci() -> skx_get_dimm_config() A memory controller driver, right? And you need it in a VM? why? > intel_rng_mod_init() -> intel_init_hw_struct() And virtio iommu? > i10nm_exit()->enable_retry_rd_err_log ->__enable_retry_rd_err_log() Another memory controller driver? Can we decide on a single one? > However, this is how we address this from security point of view: > > 1. In order for a MMIO read to obtain data from a untrusted host, the memory > range must be shared with the host to begin with. We enforce that > all MMIO mappings are private by default to the CC guest unless it is > explicitly shared (and we do automatically share for the authorized devices > and their drivers from the allow list). This removes a problem of an > "unexpected MMIO region interaction" > (modulo acpi AML operation regions that we do have to share also unfortunately, > but acpi is a whole different difficult case on its own). How does it remove the problem? You basically get trash from host, no? But it seems that whether said trash is exploitable will really depend on how it's used, e.g. if it's an 8 bit value host can just scan all options in a couple of hundred attempts. What did I miss? > 2. For pci config space, we limit any interaction with pci config > space only to authorized devices and their drivers (that are in the allow list). > As a result device drivers outside of the allow list are not able to access pci > config space even in their __init routines. It is done by setting the > to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized > devices. This seems to be assuming drivers check return code from pci config space accesses, right? I doubt all drivers do though. Even if they do that's unlikely to be a well tested path, right? 
> So, even if host made the driver __init function to run > (by faking the device on the host side), it should not be able to supply any > malicious data to it via MMIO or pci config space, so running their __init > routines should be ok from security point of view or does anyone see any > holes here? > > Best Regards, > Elena. See above. I am not sure the argument that the bugs are unexploitable sits well with the idea that all this effort is improving code quality. -- MST ^ permalink raw reply [flat|nested] 102+ messages in thread
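[Editorial note] MST's "couple of hundred attempts" remark is easy to make concrete: an 8-bit value has only 256 possibilities, so a host that can observe guest behaviour after each guess finds it in at most 256 tries (~128 on average). A trivial illustration, with an invented function name:

```c
#include <stdint.h>

/* Linear scan over all possible byte values; returns the number of
 * attempts needed to hit `secret`.  For a value v found at guess v,
 * that is v + 1 attempts, bounded by 256. */
static int guess_byte(uint8_t secret)
{
    int attempts = 0;
    for (int g = 0; g < 256; g++) {
        attempts++;
        if ((uint8_t)g == secret)
            break;
    }
    return attempts;
}
```

The implication for the thread: the secrecy of any single byte read from host-controlled memory is effectively nil, so an argument that "the garbage is unexploitable" cannot rest on the attacker not knowing which byte value triggers a given code path.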
* RE: Linux guest kernel threat model for Confidential Computing
  2023-02-08 13:15         ` Michael S. Tsirkin
@ 2023-02-09 14:30           ` Reshetova, Elena
  0 siblings, 0 replies; 102+ messages in thread
From: Reshetova, Elena @ 2023-02-09 14:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Carlos Bilbao, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List

> On Wed, Feb 08, 2023 at 10:16:14AM +0000, Reshetova, Elena wrote:
> > > No relation other than it would be nice to have a solution that does not
> > > require kernel command line and that prevents __init()s.
> >
> > For __inits see below. For the command line, it is pretty straightforward to
> > measure it and attest its integrity later: we need to do it for other parts
> > anyhow as acpi tables, etc. So I don’t see why we need to do smth special
> > about it? In any case it is indeed very different from driver discussion and
> > goes into "what should be covered by attestation for CC guest" topic.
> >
> > > More pressing concern than wasted memory, which may be unimportant, there's
> > > the issue of what are those driver init functions doing. For example, as
> > > part of device setup, MMIO regs may be involved, which we cannot trust. It's
> > > a lot more code to worry about from a CoCo perspective.
> >
> > Yes, we have seen such cases in kernel where drivers or modules would access
> > MMIO or pci config space already in their __init() functions.
> > Some concrete examples from modules and drivers (there are more):
> >
> > intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch()
>
> An iommu driver. So maybe you want to use virtio iommu then?
>
> > skx_init() -> get_all_munits()
> > skx_init() -> skx_register_mci() -> skx_get_dimm_config()
>
> A memory controller driver, right? And you need it in a VM? why?
>
> > intel_rng_mod_init() -> intel_init_hw_struct()
>
> And virtio iommu?
>
> > i10nm_exit()->enable_retry_rd_err_log ->__enable_retry_rd_err_log()
>
> Another memory controller driver? Can we decide on a single one?

We don’t need any of the above in a CC guest. The point was to indicate
that, with the current device filter design we have, we will not
necessarily prevent the __init functions of drivers from running in a CC
guest, and we have seen code paths in the Linux codebase that may
potentially execute and consume malicious host input already in __init
functions (luckily, most drivers do it in probe). However, the argument
I gave below is why we think such __init functions are not that big a
security problem in our case.

> > However, this is how we address this from security point of view:
> >
> > 1. In order for a MMIO read to obtain data from a untrusted host, the memory
> > range must be shared with the host to begin with. We enforce that
> > all MMIO mappings are private by default to the CC guest unless it is
> > explicitly shared (and we do automatically share for the authorized devices
> > and their drivers from the allow list). This removes a problem of an
> > "unexpected MMIO region interaction"
> > (modulo acpi AML operation regions that we do have to share also unfortunately,
> > but acpi is a whole different difficult case on its own).
>
> How does it remove the problem? You basically get trash from host, no?
> But it seems that whether said trash is exploitable will really depend
> on how it's used, e.g. if it's an 8 bit value host can just scan all
> options in a couple of hundred attempts. What did I miss?

No, it won't work like that.
Guest code will never be able to consume the garbage data written into
its private memory by the host: we will get a memory integrity violation
and the guest is killed for safety reasons. The confidentiality and
integrity of private memory is guaranteed by the CC technology itself.

> > 2. For pci config space, we limit any interaction with pci config
> > space only to authorized devices and their drivers (that are in the allow list).
> > As a result device drivers outside of the allow list are not able to access pci
> > config space even in their __init routines. It is done by setting the
> > to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized
> > devices.
>
> This seems to be assuming drivers check return code from pci config
> space accesses, right? I doubt all drivers do though. Even if they do
> that's unlikely to be a well tested path, right?

This is a good thing to double check, thank you for pointing this out!

Best Regards,
Elena.