kernel-hardening.lists.openwall.com archive mirror
 help / color / mirror / Atom feed
* RE: Linux guest kernel threat model for Confidential Computing
       [not found] ` <Y9EkCvAfNXnJ+ATo@kroah.com>
@ 2023-01-25 15:29   ` Reshetova, Elena
  2023-01-25 16:40     ` Theodore Ts'o
                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-25 15:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy,
	Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner,
	Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

Replying only to the points not yet addressed. 

> On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > Hi Greg,
> >
> > You mentioned couple of times (last time in this recent thread:
> > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to
> start
> > discussing the updated threat model for kernel, so this email is a start in this
> direction.
> 
> Any specific reason you didn't cc: the linux-hardening mailing list?
> This seems to be in their area as well, right?

Added now; I was just not sure how many mailing lists I wanted to cross-post to.
And this is a rather special aspect of 'hardening', since it is about hardening a kernel
under a different threat model and set of assumptions. 
 

> I hate the term "hardening".  Please just say it for what it really is,
> "fixing bugs to handle broken hardware".  We've done that for years when
> dealing with PCI and USB and even CPUs doing things that they shouldn't
> be doing.  How is this any different in the end?

Well, that would not be fully correct in this case. You can really see it from two
angles:

1. fixing bugs to handle broken hardware
2. fixing bugs that are the result of correctly operating HW, but an incorrectly or
maliciously operating hypervisor (acting as a man in the middle)

We focus on 2, but it happens to also address 1 to some extent.  

> 
> So what you also are saying here now is "we do not trust any PCI
> devices", so please just say that (why do you trust USB devices?)  If
> that is something that you all think that Linux should support, then
> let's go from there.
> 
> > 3) All the tools are open-source and everyone can start using them right away
> even
> > without any special HW (readme has description of what is needed).
> > Tools and documentation is here:
> > https://github.com/intel/ccc-linux-guest-hardening
> 
> Again, as our documentation states, when you submit patches based on
> these tools, you HAVE TO document that.  Otherwise we think you all are
> crazy and will get your patches rejected.  You all know this, why ignore
> it?

Sorry, I didn't know that for every bug found in the Linux kernel we have to
document, when submitting the fix, how it was found.
We will fix this in future submissions, but some of the bugs we have were found by
plain code audit, so 'human' is the tool. 

> 
> > 4) all not yet upstreamed linux patches (that we are slowly submitting) can be
> found
> > here: https://github.com/intel/tdx/commits/guest-next
> 
> Random github trees of kernel patches are just that, sorry.

This was just for completeness, and for anyone curious to see the actual
code already now. Of course the patches will be submitted for review 
using the normal process. 

> 
> > So, my main question before we start to argue about the threat model,
> mitigations, etc,
> > is what is the good way to get this reviewed to make sure everyone is aligned?
> > There are a lot of angles and details, so what is the most efficient method?
> > Should I split the threat model from https://intel.github.io/ccc-linux-guest-
> hardening-docs/security-spec.html
> > into logical pieces and start submitting it to mailing list for discussion one by
> one?
> 
> Yes, start out by laying out what you feel the actual problem is, what
> you feel should be done for it, and the patches you have proposed to
> implement this, for each and every logical piece.

OK, so this thread is about the actual threat model and overall problem. 
We can re-write the current bug-fix patches (virtio and MSI) to refer to this threat model
properly and explain that they fix actual bugs under this threat model.
The rest of the pieces will come as other patches are submitted for review
in logical groups. 

Does this work? 

> 
> Again, nothing new here, that's how Linux is developed, again, you all
> know this, it's not anything I should have to say.
> 
> > Any other methods?
> >
> > The original plan we had in mind is to start discussing the relevant pieces when
> submitting the code,
> > i.e. when submitting the device filter patches, we will include problem
> statement, threat model link,
> > data, alternatives considered, etc.
> 
> As always, we can't do anything without actual working changes to the
> code, otherwise it's just a pipe dream and we can't waste our time on it
> (neither would you want us to).

Of course the code exists; we have only just started submitting it. We started with
easy bug fixes because they are small, trivial, and easy to review. 
Bigger pieces will follow (for example, Satya has been addressing your comments about the
device filter in his new implementation). 

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-25 15:29   ` Linux guest kernel threat model for Confidential Computing Reshetova, Elena
@ 2023-01-25 16:40     ` Theodore Ts'o
  2023-01-26  8:08       ` Reshetova, Elena
  2023-01-26 11:19     ` Leon Romanovsky
  2023-01-26 16:29     ` Michael S. Tsirkin
  2 siblings, 1 reply; 39+ messages in thread
From: Theodore Ts'o @ 2023-01-25 16:40 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > Again, as our documentation states, when you submit patches based on
> > these tools, you HAVE TO document that.  Otherwise we think you all are
> > crazy and will get your patches rejected.  You all know this, why ignore
> > it?
> 
> Sorry, I didn’t know that for every bug that is found in linux kernel when
> we are submitting a fix that we have to list the way how it has been found.
> We will fix this in the future submissions, but some bugs we have are found by
> plain code audit, so 'human' is the tool.

So the concern is that *you* may think it is a bug, but other people
may not agree.  Perhaps what is needed is a full description of the
goals of Confidential Computing, and what is in scope, and what is
deliberately *not* in scope.  I predict that when you do this, that
people will come out of the wood work and say, no wait, "CoCo ala
S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V
means QUUX".

Others may end up objecting, "no wait, doing this is going to mean
***insane*** changes to the entire kernel, and this will be a
performance / maintenance nightmare and unless you fix your hardware
in future chips, we will consider this a hardware bug and reject all
of your patches".

But it's better to figure this out now, than after you get hundreds of
patches into the upstream kernel, we discover that this is only 5% of
the necessary changes, and then the rest of your patches are rejected,
and you have to end up fixing the hardware anyway, with the patches
upstreamed so far being wasted effort.  :-)

If we get consensus on that document, then that can get checked into
Documentation, and that can represent general consensus on the problem
early on.

						- Ted

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-25 16:40     ` Theodore Ts'o
@ 2023-01-26  8:08       ` Reshetova, Elena
  0 siblings, 0 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-26  8:08 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening


> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Again, as our documentation states, when you submit patches based on
> > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > crazy and will get your patches rejected.  You all know this, why ignore
> > > it?
> >
> > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > we are submitting a fix that we have to list the way how it has been found.
> > We will fix this in the future submissions, but some bugs we have are found by
> > plain code audit, so 'human' is the tool.
> 
> So the concern is that *you* may think it is a bug, but other people
> may not agree.  Perhaps what is needed is a full description of the
> goals of Confidential Computing, and what is in scope, and what is
> deliberately *not* in scope.  I predict that when you do this, that
> people will come out of the wood work and say, no wait, "CoCo ala
> S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V
> means QUUX".

Agreed, and this is the reason behind starting this thread: to make sure people
agree on the threat model.  The only reason we submitted some trivial bug
fixes separately is that they can *also* be considered bugs under the existing
threat model, if one thinks the kernel should be as robust as possible against 
potentially erroneous devices.

As described right at the beginning of the doc I shared [1] (adjusted now to remove
'TDX' and use the generic 'CC guest kernel'), we want to make sure that an untrusted
host (and hypervisor) is not able to

1. achieve privilege escalation into a CC guest kernel
2. compromise the confidentiality or integrity of CC guest private memory

The above security objectives give us two primary assets we want to protect:
the CC guest execution context, and the confidentiality and integrity of CC guest
private memory. 

DoS from the host towards the CC guest is explicitly out of scope and a non-security
objective. 

The attack surface in question is any interface exposed from the CC guest kernel
towards the untrusted host that is not covered by the CC HW protections. The
exact list can differ somewhat depending on the technology being used, but as
David already pointed out: both CC guest memory and register state are
protected from host attacks, so we are focusing on the other communication channels
and on the generic interfaces used by Linux today. 

Examples of such interfaces for TDX (and I think SEV shares most of them, but please
correct me if I am wrong here) are access to some MSRs and CPUIDs, port IO, MMIO
and DMA, access to PCI config space, KVM hypercalls (if the hypervisor is KVM), TDX-specific
hypercalls (this is technology-specific), data consumed from the untrusted host during
CC guest initialization (including the kernel itself, the kernel command line, provided ACPI tables, 
etc.) and others described in [1].
An important note here is that these interfaces are not limited to device drivers
(albeit device drivers are the biggest users of some of them); they are present throughout the whole 
kernel in different subsystems and need careful examination and development of 
mitigations. 

The possible range of mitigations we can apply is also wide, but you can roughly split it into
two groups: 

1. mitigations that use various attestation mechanisms (we can attest the kernel code,
cmdline, provided ACPI tables and other potential configuration, and one day we will 
hopefully also be able to attest the devices we connect to a CC guest and their configuration)

2. other mitigations for threats that attestation cannot cover, i.e. mainly runtime 
interactions with the host. 

The above sounds conceptually simple, and the devil is as usual in the details, but it doesn't look
impossible or like something that would need ***insane*** changes to the entire kernel.

> 
> Others may end up objecting, "no wait, doing this is going to mean
> ***insane*** changes to the entire kernel, and this will be a
> performance / maintenance nightmare and unless you fix your hardware
> in future chips, we will consider this a hardware bug and reject all
> of your patches".
> 
> But it's better to figure this out now, than after you get hundreds of
> patches into the upstream kernel, we discover that this is only 5% of
> the necessary changes, and then the rest of your patches are rejected,
> and you have to end up fixing the hardware anyway, with the patches
> upstreamed so far being wasted effort.  :-)
> 
> If we get consensus on that document, then that can get checked into
> Documentation, and that can represent general consensus on the problem
> early on.

Sure, I am willing to work on this, since we have already spent quite a lot of effort
looking into this problem. My only question is how to organize a review of such a
document in a sane and productive way, and how to make sure all relevant people
are included in the discussion. As I said, this spans many areas of the kernel,
and ideally you would want different people to review their area in detail. 
For example, one of the many aspects we need to worry about is the security of the CC guest LRNG
(especially in cases where we don't have a trusted security HW source of entropy)
[2], and here feedback from LRNG experts would be important. 

I guess the first clear step I can take is to re-write the relevant part of [1] into CC-technology-neutral
language; we would then need feedback and input from the AMD folks to make 
sure it correctly reflects their case as well. We can probably do this preparation work 
on the linux-coco mailing list and then post it for a wider review? 

Best Regards,
Elena.

[1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#threat-model
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest

> 
> 						- Ted

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-25 15:29   ` Linux guest kernel threat model for Confidential Computing Reshetova, Elena
  2023-01-25 16:40     ` Theodore Ts'o
@ 2023-01-26 11:19     ` Leon Romanovsky
  2023-01-26 11:29       ` Reshetova, Elena
  2023-01-26 16:29     ` Michael S. Tsirkin
  2 siblings, 1 reply; 39+ messages in thread
From: Leon Romanovsky @ 2023-01-26 11:19 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> Replying only to the not-so-far addressed points. 
> 
> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > Hi Greg,

<...>

> > > 3) All the tools are open-source and everyone can start using them right away
> > even
> > > without any special HW (readme has description of what is needed).
> > > Tools and documentation is here:
> > > https://github.com/intel/ccc-linux-guest-hardening
> > 
> > Again, as our documentation states, when you submit patches based on
> > these tools, you HAVE TO document that.  Otherwise we think you all are
> > crazy and will get your patches rejected.  You all know this, why ignore
> > it?
> 
> Sorry, I didn’t know that for every bug that is found in linux kernel when
> we are submitting a fix that we have to list the way how it has been found.
> We will fix this in the future submissions, but some bugs we have are found by
> plain code audit, so 'human' is the tool. 

My problem with that statement is that by applying a different threat
model you "invent" bugs which didn't exist in the first place.

For example, in this [1] latest submission, the authors labeled correct
behaviour as a "bug".

[1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/

Thanks

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-26 11:19     ` Leon Romanovsky
@ 2023-01-26 11:29       ` Reshetova, Elena
  2023-01-26 12:30         ` Leon Romanovsky
  2023-01-26 13:58         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-26 11:29 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > Replying only to the not-so-far addressed points.
> >
> > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > Hi Greg,
> 
> <...>
> 
> > > > 3) All the tools are open-source and everyone can start using them right
> away
> > > even
> > > > without any special HW (readme has description of what is needed).
> > > > Tools and documentation is here:
> > > > https://github.com/intel/ccc-linux-guest-hardening
> > >
> > > Again, as our documentation states, when you submit patches based on
> > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > crazy and will get your patches rejected.  You all know this, why ignore
> > > it?
> >
> > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > we are submitting a fix that we have to list the way how it has been found.
> > We will fix this in the future submissions, but some bugs we have are found by
> > plain code audit, so 'human' is the tool.
> 
> My problem with that statement is that by applying different threat
> model you "invent" bugs which didn't exist in a first place.
> 
> For example, in this [1] latest submission, authors labeled correct
> behaviour as "bug".
> 
> [1] https://lore.kernel.org/all/20230119170633.40944-1-
> alexander.shishkin@linux.intel.com/

Hm.. Does everyone think that when the kernel dies with an unhandled page fault 
(as in that case), or on detection of a KASAN out-of-bounds violation (as in some
other cases where we already have fixes or are investigating), it represents correct behavior, even if
you expect all your PCI HW devices to be trusted? What about an error in two 
consecutive PCI reads? What about just some failure that results in erroneous input? 

Best Regards,
Elena.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 11:29       ` Reshetova, Elena
@ 2023-01-26 12:30         ` Leon Romanovsky
  2023-01-26 13:28           ` Reshetova, Elena
  2023-01-27  9:32           ` Jörg Rödel
  2023-01-26 13:58         ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 39+ messages in thread
From: Leon Romanovsky @ 2023-01-26 12:30 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Replying only to the not-so-far addressed points.
> > >
> > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > Hi Greg,
> > 
> > <...>
> > 
> > > > > 3) All the tools are open-source and everyone can start using them right
> > away
> > > > even
> > > > > without any special HW (readme has description of what is needed).
> > > > > Tools and documentation is here:
> > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > >
> > > > Again, as our documentation states, when you submit patches based on
> > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > it?
> > >
> > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > we are submitting a fix that we have to list the way how it has been found.
> > > We will fix this in the future submissions, but some bugs we have are found by
> > > plain code audit, so 'human' is the tool.
> > 
> > My problem with that statement is that by applying different threat
> > model you "invent" bugs which didn't exist in a first place.
> > 
> > For example, in this [1] latest submission, authors labeled correct
> > behaviour as "bug".
> > 
> > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > alexander.shishkin@linux.intel.com/
> 
> Hm.. Does everyone think that when kernel dies with unhandled page fault 
> (such as in that case) or detection of a KASAN out of bounds violation (as it is in some
> other cases we already have fixes or investigating) it represents a correct behavior even if
> you expect that all your pci HW devices are trusted? 

This is exactly what I said. You presented me cases which exist only in
your invented world. The mentioned unhandled page fault doesn't exist in the real
world. If a PCI device doesn't work, it needs to be replaced/blocked, not
left operable and accessible from the kernel/user.

> What about an error in two consequent pci reads? What about just some
> failure that results in erroneous input?

Yes, some bugs need to be fixed, but they are not related to the trust/no-trust
discussion or to PCI spec violations.

Thanks

> 
> Best Regards,
> Elena.
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-26 12:30         ` Leon Romanovsky
@ 2023-01-26 13:28           ` Reshetova, Elena
  2023-01-26 13:50             ` Leon Romanovsky
                               ` (2 more replies)
  2023-01-27  9:32           ` Jörg Rödel
  1 sibling, 3 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-26 13:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

> On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > Replying only to the not-so-far addressed points.
> > > >
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg,
> > >
> > > <...>
> > >
> > > > > > 3) All the tools are open-source and everyone can start using them right
> > > away
> > > > > even
> > > > > > without any special HW (readme has description of what is needed).
> > > > > > Tools and documentation is here:
> > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > >
> > > > > Again, as our documentation states, when you submit patches based on
> > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > it?
> > > >
> > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > we are submitting a fix that we have to list the way how it has been found.
> > > > We will fix this in the future submissions, but some bugs we have are found
> by
> > > > plain code audit, so 'human' is the tool.
> > >
> > > My problem with that statement is that by applying different threat
> > > model you "invent" bugs which didn't exist in a first place.
> > >
> > > For example, in this [1] latest submission, authors labeled correct
> > > behaviour as "bug".
> > >
> > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > alexander.shishkin@linux.intel.com/
> >
> > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> some
> > other cases we already have fixes or investigating) it represents a correct
> behavior even if
> > you expect that all your pci HW devices are trusted?
> 
> This is exactly what I said. You presented me the cases which exist in
> your invented world. Mentioned unhandled page fault doesn't exist in real
> world. If PCI device doesn't work, it needs to be replaced/blocked and not
> left to be operable and accessible from the kernel/user.

Can we really assure correct operation of *all* PCI devices out there? 
How would such an audit be performed, given the huge set of them available? 
Isn't it better instead to make a small fix in the kernel behavior that would guard
us against such potentially incorrectly operating devices? 


> 
> > What about an error in two consequent pci reads? What about just some
> > failure that results in erroneous input?
> 
> Yes, some bugs need to be fixed, but they are not related to trust/not-trust
> discussion and PCI spec violations.

Let's forget the trust angle here (it only applies to the Confidential Computing 
threat model, and you are clearly implying the existing threat model instead) and stick just to
the incorrectly operating device. What you are proposing is to fix *unknown* bugs
in a multitude of PCI devices that (in the case of this particular MSI bug) can
lead to two different values being read from the config space and the kernel incorrectly
handling this situation. Isn't it better to do a clear fix in one place to ensure such a
situation (two subsequent reads returning different values) cannot even happen in theory?
In security we have a saying that fixing the root cause of a problem is the most efficient
way to mitigate it. The root cause here is a double read returning different values,
so if it can be replaced with an easy and clear patch that probably even improves
performance, as we do one less PCI read and use the cached value instead, where is the
problem in this particular case? If there are technical issues with the patch, of course we 
need to discuss and fix them, but it seems we are arguing here about whether or not we want
to be fixing kernel code when we notice such cases... 

Best Regards,
Elena
 
 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 13:28           ` Reshetova, Elena
@ 2023-01-26 13:50             ` Leon Romanovsky
  2023-01-26 20:54             ` Theodore Ts'o
  2023-01-27 19:24             ` James Bottomley
  2 siblings, 0 replies; 39+ messages in thread
From: Leon Romanovsky @ 2023-01-26 13:50 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote:
> > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > >
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > Hi Greg,
> > > >
> > > > <...>
> > > >
> > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > away
> > > > > > even
> > > > > > > without any special HW (readme has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > >
> > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > > it?
> > > > >
> > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > We will fix this in the future submissions, but some bugs we have are found
> > by
> > > > > plain code audit, so 'human' is the tool.
> > > >
> > > > My problem with that statement is that by applying different threat
> > > > model you "invent" bugs which didn't exist in a first place.
> > > >
> > > > For example, in this [1] latest submission, authors labeled correct
> > > > behaviour as "bug".
> > > >
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > >
> > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > some
> > > other cases we already have fixes or investigating) it represents a correct
> > behavior even if
> > > you expect that all your pci HW devices are trusted?
> > 
> > This is exactly what I said. You presented me the cases which exist in
> > your invented world. Mentioned unhandled page fault doesn't exist in real
> > world. If PCI device doesn't work, it needs to be replaced/blocked and not
> > left to be operable and accessible from the kernel/user.
> 
> Can we really assure correct operation of *all* pci devices out there?

Why do we need to do it in 2023? These PCI devices all work.

> How would such an audit be performed given a huge set of them available?

Compliance tests?
https://pcisig.com/developers/compliance-program

> Isnt it better instead to make a small fix in the kernel behavior that would guard
> us from such potentially not correctly operating devices? 

Like Greg already said, this is a small drop in a ocean which needs to be changed.

However, even in the case I mentioned, you are not fixing but hiding the real
problem of having a broken device in my machine. It is the worst possible solution
for the users. 

> 
> 
> > 
> > > What about an error in two consequent pci reads? What about just some
> > > failure that results in erroneous input?
> > 
> > Yes, some bugs need to be fixed, but they are not related to trust/not-trust
> > discussion and PCI spec violations.
> 
> Let's forget the trust angle here (it only applies to the Confidential Computing 
> threat model and you clearly implying the existing threat model instead) and stick just to
> the not-correctly operating device. What you are proposing is to fix *unknown* bugs
> in multitude of pci devices that (in case of this particular MSI bug) can
> lead to two different values being read from the config space and kernel incorrectly
> handing this situation. 

Let's not call something a bug when it isn't one.

Random crashes are much more tolerable than a "working" device which
sends random results.

> Isn't it better to do the clear fix in one place to ensure such
> situation (two subsequent reads with different values) cannot even happen in theory?
> In security we have a saying that fixing a root cause of the problem is the most efficient
> way to mitigate the problem. The root cause here is a double-read with different values,
> so if it can be substituted with an easy and clear patch that probably even improves
> performance as we do one less pci read and use cached value instead, where is the
> problem in this particular case? If there are technical issues with the patch, of course we 
> need to discuss it/fix it, but it seems we are arguing here about whenever or not we want
> to be fixing kernel code when we notice such cases... 

Not really; we are arguing about what the right thing to do is:
1. Fix the root cause: the device.
2. Hide the failure and pretend that everything is perfect despite
having a problematic device.

Thanks

> 
> Best Regards,
> Elena
>  
>  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 11:29       ` Reshetova, Elena
  2023-01-26 12:30         ` Leon Romanovsky
@ 2023-01-26 13:58         ` Dr. David Alan Gilbert
  2023-01-26 17:48           ` Reshetova, Elena
  1 sibling, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-26 13:58 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

* Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Replying only to the not-so-far addressed points.
> > >
> > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > Hi Greg,
> > 
> > <...>
> > 
> > > > > 3) All the tools are open-source and everyone can start using them right
> > away
> > > > even
> > > > > without any special HW (readme has description of what is needed).
> > > > > Tools and documentation is here:
> > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > >
> > > > Again, as our documentation states, when you submit patches based on
> > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > it?
> > >
> > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > we are submitting a fix that we have to list the way how it has been found.
> > > We will fix this in the future submissions, but some bugs we have are found by
> > > plain code audit, so 'human' is the tool.
> > 
> > My problem with that statement is that by applying different threat
> > model you "invent" bugs which didn't exist in a first place.
> > 
> > For example, in this [1] latest submission, authors labeled correct
> > behaviour as "bug".
> > 
> > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > alexander.shishkin@linux.intel.com/
> 
> Hm.. Does everyone think that when kernel dies with unhandled page fault 
> (such as in that case) or detection of a KASAN out of bounds violation (as it is in some
> other cases we already have fixes or investigating) it represents a correct behavior even if
> you expect that all your pci HW devices are trusted? What about an error in two 
> consequent pci reads? What about just some failure that results in erroneous input? 

I'm not sure you'll get general agreement on those answers for all
devices and situations; I think for most devices in non-CoCo
situations, people are generally OK with a misbehaving PCI device
causing a kernel crash. Since most people are running without an IOMMU
anyway, a misbehaving device can cause otherwise undetectable chaos.

I'd say:
  a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
  guarantee forward progress or stop the hypervisor doing something
  truly stupid.

  b) For CoCo, information disclosure, or corruption IS a problem

  c) For non-CoCo, some people might care about robustness of the kernel
  against a failing PCI device, but generally I think they worry about
  a fairly clean failure, even in the unexpected hot-unplug case.

  d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe
  device; if it's a device that attests OK and we trust it is the device
  it says it is, do we give it freedom or are we still wary?

Dave


> Best Regards,
> Elena.
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-25 15:29   ` Linux guest kernel threat model for Confidential Computing Reshetova, Elena
  2023-01-25 16:40     ` Theodore Ts'o
  2023-01-26 11:19     ` Leon Romanovsky
@ 2023-01-26 16:29     ` Michael S. Tsirkin
  2023-01-27  8:52       ` Reshetova, Elena
  2 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2023-01-26 16:29 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening

On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> And this is a very special aspect of 'hardening' since it is about hardening a kernel
> under different threat model/assumptions. 

I am not sure it's that special, in that hardening IMHO is not a specific
threat model or a set of assumptions. IIUC it's just something that
helps reduce the severity of vulnerabilities.  Similarly, one can use the CC
hardware in a variety of ways, I guess. And one way is just that:
hardening Linux such that the ability to corrupt guest memory does not
automatically escalate into guest code execution.

If you put it this way, you get to participate in a well-understood
problem space instead of constantly saying "yes, but CC is special".  And
further, you will now talk about features as opposed to fixing bugs,
which will stop annoying the people who currently seem annoyed by the
implication that their code is buggy simply because it does not cache in
memory all data read from hardware. Finally, you then don't really need
to explain why e.g. DoS is not a problem but an info leak is a problem (when
for many users it's actually the reverse): the reason is not that DoS is
not part of the threat model, which then makes you work hard to define
that threat model, but simply that CC hardware does not support this
kind of hardening.

-- 
MST



* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-26 13:58         ` Dr. David Alan Gilbert
@ 2023-01-26 17:48           ` Reshetova, Elena
  2023-01-26 18:06             ` Leon Romanovsky
  0 siblings, 1 reply; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-26 17:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening


> * Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > Replying only to the not-so-far addressed points.
> > > >
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg,
> > >
> > > <...>
> > >
> > > > > > 3) All the tools are open-source and everyone can start using them right
> > > away
> > > > > even
> > > > > > without any special HW (readme has description of what is needed).
> > > > > > Tools and documentation is here:
> > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > >
> > > > > Again, as our documentation states, when you submit patches based on
> > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > it?
> > > >
> > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > we are submitting a fix that we have to list the way how it has been found.
> > > > We will fix this in the future submissions, but some bugs we have are found
> by
> > > > plain code audit, so 'human' is the tool.
> > >
> > > My problem with that statement is that by applying different threat
> > > model you "invent" bugs which didn't exist in a first place.
> > >
> > > For example, in this [1] latest submission, authors labeled correct
> > > behaviour as "bug".
> > >
> > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > alexander.shishkin@linux.intel.com/
> >
> > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> some
> > other cases we already have fixes or investigating) it represents a correct
> behavior even if
> > you expect that all your pci HW devices are trusted? What about an error in
> two
> > consequent pci reads? What about just some failure that results in erroneous
> input?
> 
> I'm not sure you'll get general agreement on those answers for all
> devices and situations; I think for most devices for non-CoCo
> situations, then people are generally OK with a misbehaving PCI device
> causing a kernel crash, since most people are running without IOMMU
> anyway, a misbehaving device can cause otherwise undetectable chaos.

Ok, if this is the consensus within the kernel community, then we can consider
the fixes strictly from the CoCo threat model point of view. 

> 
> I'd say:
>   a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
>   guarantee forward progress or stop the hypervisor doing something
>   truly stupid.

Yes, denial of service is out of scope, but I would not automatically
class all crashes as 'safe'. Depending on the crash, it can be used as a
primitive to launch further attacks: privilege escalation, information
disclosure and corruption. This is especially true for memory corruption
issues. 

>   b) For CoCo, information disclosure, or corruption IS a problem

Agreed, but the path to this can incorporate a number of attack 
primitives, as well as bug chaining. So, if a bug is detected and the
fix is easy, it is safer to fix it than to reason about its possible
implications and potential use in exploit writing.

> 
>   c) For non-CoCo some people might care about robustness of the kernel
>   against a failing PCI device, but generally I think they worry about
>   a fairly clean failure, even in the unexpected-hot unplug case.

Ok.

> 
>   d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe
>   device; if it's a device that attests OK and we trust it is the device
>   it says it is, do we give it freedom or are we still wary?

I would say that attestation and an established secure channel to an end
device mean that we don't have to employ additional measures to secure
the data transfer, and that we 'trust' the device, at least to some degree,
to keep our data protected (both from the untrusted host and from other
CC guests). I don't think there is anything else behind this concept. 

Best Regards,
Elena






* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 17:48           ` Reshetova, Elena
@ 2023-01-26 18:06             ` Leon Romanovsky
  2023-01-26 18:14               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 39+ messages in thread
From: Leon Romanovsky @ 2023-01-26 18:06 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Dr. David Alan Gilbert, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
> 
> > * Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > >
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > Hi Greg,
> > > >
> > > > <...>
> > > >
> > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > away
> > > > > > even
> > > > > > > without any special HW (readme has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > >
> > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > > it?
> > > > >
> > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > We will fix this in the future submissions, but some bugs we have are found
> > by
> > > > > plain code audit, so 'human' is the tool.
> > > >
> > > > My problem with that statement is that by applying different threat
> > > > model you "invent" bugs which didn't exist in a first place.
> > > >
> > > > For example, in this [1] latest submission, authors labeled correct
> > > > behaviour as "bug".
> > > >
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > >
> > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > some
> > > other cases we already have fixes or investigating) it represents a correct
> > behavior even if
> > > you expect that all your pci HW devices are trusted? What about an error in
> > two
> > > consequent pci reads? What about just some failure that results in erroneous
> > input?
> > 
> > I'm not sure you'll get general agreement on those answers for all
> > devices and situations; I think for most devices for non-CoCo
> > situations, then people are generally OK with a misbehaving PCI device
> > causing a kernel crash, since most people are running without IOMMU
> > anyway, a misbehaving device can cause otherwise undetectable chaos.
> 
> Ok, if this is a consensus within the kernel community, then we can consider
> the fixes strictly from the CoCo threat model point of view. 
> 
> > 
> > I'd say:
> >   a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> >   guarantee forward progress or stop the hypervisor doing something
> >   truly stupid.
> 
> Yes, denial of service is out of scope but I would not pile all crashes as
> 'safe' automatically. Depending on the crash, it can be used as a
> primitive to launch further attacks: privilege escalation, information
> disclosure and corruption. It is especially true for memory corruption
> issues. 
> 
> >   b) For CoCo, information disclosure, or corruption IS a problem
> 
> Agreed, but the path to this can incorporate a number of attack 
> primitives, as well as bug chaining. So, if the bug is detected, and
> fix is easy, instead of thinking about possible implications and its 
> potential usage in exploit writing, safer to fix it.
> 
> > 
> >   c) For non-CoCo some people might care about robustness of the kernel
> >   against a failing PCI device, but generally I think they worry about
> >   a fairly clean failure, even in the unexpected-hot unplug case.
> 
> Ok.

Wearing my other hat, as a representative of a hardware vendor (at least
for the NIC part) who cares about the quality of our devices: we don't want
to hide ANY crash related to our devices, especially if it is related to
misbehaving PCI HW logic. Any uncontrolled "robustness" hides real issues
and makes QA/customer support much harder.

Thanks


* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 18:06             ` Leon Romanovsky
@ 2023-01-26 18:14               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-26 18:14 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Reshetova, Elena, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

* Leon Romanovsky (leon@kernel.org) wrote:
> On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
> > 
> > > * Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > > away
> > > > > > > even
> > > > > > > > without any special HW (readme has description of what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > > > it?
> > > > > >
> > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > > We will fix this in the future submissions, but some bugs we have are found
> > > by
> > > > > > plain code audit, so 'human' is the tool.
> > > > >
> > > > > My problem with that statement is that by applying different threat
> > > > > model you "invent" bugs which didn't exist in a first place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled correct
> > > > > behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > > alexander.shishkin@linux.intel.com/
> > > >
> > > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > > some
> > > > other cases we already have fixes or investigating) it represents a correct
> > > behavior even if
> > > > you expect that all your pci HW devices are trusted? What about an error in
> > > two
> > > > consequent pci reads? What about just some failure that results in erroneous
> > > input?
> > > 
> > > I'm not sure you'll get general agreement on those answers for all
> > > devices and situations; I think for most devices for non-CoCo
> > > situations, then people are generally OK with a misbehaving PCI device
> > > causing a kernel crash, since most people are running without IOMMU
> > > anyway, a misbehaving device can cause otherwise undetectable chaos.
> > 
> > Ok, if this is a consensus within the kernel community, then we can consider
> > the fixes strictly from the CoCo threat model point of view. 
> > 
> > > 
> > > I'd say:
> > >   a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> > >   guarantee forward progress or stop the hypervisor doing something
> > >   truly stupid.
> > 
> > Yes, denial of service is out of scope but I would not pile all crashes as
> > 'safe' automatically. Depending on the crash, it can be used as a
> > primitive to launch further attacks: privilege escalation, information
> > disclosure and corruption. It is especially true for memory corruption
> > issues. 
> > 
> > >   b) For CoCo, information disclosure, or corruption IS a problem
> > 
> > Agreed, but the path to this can incorporate a number of attack 
> > primitives, as well as bug chaining. So, if the bug is detected, and
> > fix is easy, instead of thinking about possible implications and its 
> > potential usage in exploit writing, safer to fix it.
> > 
> > > 
> > >   c) For non-CoCo some people might care about robustness of the kernel
> > >   against a failing PCI device, but generally I think they worry about
> > >   a fairly clean failure, even in the unexpected-hot unplug case.
> > 
> > Ok.
> 
> With my other hat as a representative of hardware vendor (at least for
> NIC part), who cares about quality of our devices, we don't want to hide
> ANY crash related to our devices, especially if it is related to misbehaving
> PCI HW logic. Any uncontrolled "robustness" hides real issues and makes
> QA/customer support much harder.

Yeah, if you're adding new code to be more careful, you want the code to
fail/log the problem, not hide it.
(Although, heck, I suspect there are a million apparently-working PCI
cards out there that break some spec somewhere.)

Dave

> Thanks
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 13:28           ` Reshetova, Elena
  2023-01-26 13:50             ` Leon Romanovsky
@ 2023-01-26 20:54             ` Theodore Ts'o
  2023-01-27 19:24             ` James Bottomley
  2 siblings, 0 replies; 39+ messages in thread
From: Theodore Ts'o @ 2023-01-26 20:54 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote:
> > This is exactly what I said. You presented me the cases which exist in
> > your invented world. Mentioned unhandled page fault doesn't exist in real
> > world. If PCI device doesn't work, it needs to be replaced/blocked and not
> > left to be operable and accessible from the kernel/user.
> 
> Can we really assure correct operation of *all* pci devices out there? 
> How would such an audit be performed given a huge set of them available? 
> Isnt it better instead to make a small fix in the kernel behavior that would guard
> us from such potentially not correctly operating devices?

We assume that hardware works according to the spec; that's why we
have a specification.  Otherwise, things would be pretty insane, and
would lead to massive bloat *everywhere*.  If there are broken PCI
devices out there, then we can blacklist the PCI device.  If a
manufacturer is consistently creating devices which don't obey the
spec, we could block all devices from that manufacturer, and have an
explicit white list for those devices from that manufacturer that
actually work.

If we can't count on a floating point instruction to return the right
value, what are we supposed to do?  Write code which double-checks
every single floating point instruction, just in case 2 + 2 = 3.99999999?   :-)

Ultimately, changing where the trust boundary is considered to lie is a
fundamentally hard thing, and trying to claim that code is buggy because
it assumes that things inside the trust boundary are, well, trusted is
not a great way to win friends and influence people.

> Let's forget the trust angle here (it only applies to the Confidential Computing 
> threat model and you clearly implying the existing threat model instead) and stick just to
> the not-correctly operating device. What you are proposing is to fix *unknown* bugs
> in multitude of pci devices that (in case of this particular MSI bug) can
> lead to two different values being read from the config space and kernel incorrectly
> handing this situation.

I don't think that's what people are saying.  If there are buggy PCI
devices, we can put them on block lists.  But checking that every
single read from the config space is unchanged is not something we
should do, period.

> Isn't it better to do the clear fix in one place to ensure such
> situation (two subsequent reads with different values) cannot even happen in theory?
> In security we have a saying that fixing a root cause of the problem is the most efficient
> way to mitigate the problem. The root cause here is a double-read with different values,
> so if it can be substituted with an easy and clear patch that probably even improves
> performance as we do one less pci read and use cached value instead, where is the
> problem in this particular case? If there are technical issues with the patch, of course we 
> need to discuss it/fix it, but it seems we are arguing here about whenever or not we want
> to be fixing kernel code when we notice such cases...

Well, if there is a performance win from caching a read from config space,
then make the argument from a performance perspective.  But caching
values takes memory, and will potentially bloat data structures.  It's
not necessarily cost-free to cache every single config-space
variable to prevent double-reads from either buggy or malicious devices.

So it's one thing if we make each decision from a cost-benefit
perspective.  But then it's an *optimization*, not a *bug-fix*, and it
also means that we aren't obligated to cache every single read from
config space, lest someone wag their fingers at us saying, "Buggy!
Your code is Buggy!".

Cheers,

						- Ted


* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-26 16:29     ` Michael S. Tsirkin
@ 2023-01-27  8:52       ` Reshetova, Elena
  2023-01-27 10:04         ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-27  8:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening

> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > And this is a very special aspect of 'hardening' since it is about hardening a
> kernel
> > under different threat model/assumptions.
> 
> I am not sure it's that special in that hardening IMHO is not a specific
> threat model or a set of assumptions. IIUC it's just something that
> helps reduce severity of vulnerabilities.  Similarly, one can use the CC
> hardware in a variety of ways I guess. And one way is just that -
> hardening linux such that ability to corrupt guest memory does not
> automatically escalate into guest code execution.

I am not sure I fully follow you on this. I do agree that it is in principle
the same 'hardening' that we have been doing in Linux for decades, just
applied to a new attack surface: host <-> guest, vs userspace <-> kernel.
Interfaces have changed, but the types of vulnerabilities, etc. are the same.
The attacker model is somewhat different because we have
different expectations of what a host/hypervisor should be able to do
to the guest (following business reasons and use cases), versus what we
expect normal userspace to be able to "do" towards the kernel. The host and
hypervisor still have a lot of control over the guest (the ability to
start/stop it, manage its resources, etc.). But the reason behind this is
not that the CoCo security HW is unable to support a stricter security
model (indeed it cannot right now, but this is a design decision); it is
that it is important for cloud service providers to retain that level of
control over their infrastructure. 
 
> 
> If you put it this way, you get to participate in a well understood
> problem space instead of constantly saying "yes but CC is special".  And
> further, you will now talk about features as opposed to fixing bugs.
> Which will stop annoying people who currently seem annoyed by the
> implication that their code is buggy simply because it does not cache in
> memory all data read from hardware. Finally, you then don't really need
> to explain why e.g. DoS is not a problem but info leak is a problem - when
> for many users it's actually the reverse - the reason is not that it's
> not part of a threat model - which then makes you work hard to define
> the threat model - but simply that CC hardware does not support this
> kind of hardening.

But that would not be a correct statement, because it is not a limitation
of the HW but of the threat and business model that Confidential Computing
exists in. I am not aware of a single cloud provider who would be willing
to use HW that takes away their control of the infrastructure running
confidential guests, leaving them with no mechanisms to control load
balancing, enforce resource usage, etc. So, given that nobody needs or is
willing to use such HW, such HW simply doesn't exist. 

So, I would still say that the model we operate under in CoCo use cases is
somewhat special, but I do agree that once we list this couple of special
assumptions (over which we have no control or ability to influence, as none
of us are business people), the rest becomes just a careful enumeration of
attack surface interfaces and a breakdown of potential mitigations. 

Best Regards,
Elena.




* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 12:30         ` Leon Romanovsky
  2023-01-26 13:28           ` Reshetova, Elena
@ 2023-01-27  9:32           ` Jörg Rödel
  1 sibling, 0 replies; 39+ messages in thread
From: Jörg Rödel @ 2023-01-27  9:32 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Reshetova, Elena, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Thu, Jan 26, 2023 at 02:30:19PM +0200, Leon Romanovsky wrote:
> This is exactly what I said. You presented me the cases which exist in
> your invented world. Mentioned unhandled page fault doesn't exist in real
> world. If PCI device doesn't work, it needs to be replaced/blocked and not
> left to be operable and accessible from the kernel/user.

Believe it or not, this "invented" world is already part of the real
world, and will become even more so in the future.

This has been stated elsewhere in the thread already, but I would also
like to stress that hiding misbehavior of devices (real or emulated) is
not the goal of this work.

In fact, the best action for a CoCo guest in case it detects a
(possible) attack is to stop whatever it is doing and crash. And a
misbehaving device in a CoCo guest is a possible attack.

But what needs to be prevented at all costs is undefined behavior in the
CoCo guest that is triggerable by the HV, e.g. by letting an emulated
device misbehave. That undefined behavior can lead to information leak,
which is a way bigger problem for a guest owner than a crashed VM.
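To make the "crash rather than undefined behavior" rule above concrete, here is a minimal, hypothetical sketch (the constant and the function name are invented for illustration, not taken from any real driver): a guest-side check that validates a host-supplied length before it can drive a later copy, failing closed instead of continuing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example: SHARED_BUF_SIZE and host_len_is_valid() are
 * invented names.  The idea is that any length supplied by the
 * untrusted HV/device is validated before use; on failure the caller
 * aborts the operation (and the guest may simply stop) rather than
 * proceeding into undefined behavior such as an out-of-bounds copy. */
#define SHARED_BUF_SIZE 4096u

bool host_len_is_valid(uint32_t host_len)
{
	/* Reject zero and anything beyond the shared buffer. */
	return host_len != 0 && host_len <= SHARED_BUF_SIZE;
}
```

A caller would treat a false result as a possible attack indicator and fail the request outright, matching the fail-closed behavior described above.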

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-27  8:52       ` Reshetova, Elena
@ 2023-01-27 10:04         ` Michael S. Tsirkin
  2023-01-27 12:25           ` Reshetova, Elena
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2023-01-27 10:04 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening

On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > And this is a very special aspect of 'hardening' since it is about hardening a
> > kernel
> > > under different threat model/assumptions.
> > 
> > I am not sure it's that special in that hardening IMHO is not a specific
> > threat model or a set of assumptions. IIUC it's just something that
> > helps reduce severity of vulnerabilities.  Similarly, one can use the CC
> > hardware in a variety of ways I guess. And one way is just that -
> > hardening linux such that ability to corrupt guest memory does not
> > automatically escalate into guest code execution.
> 
> I am not sure if I fully follow you on this. I do agree that it is in principle
> the same 'hardening' that we have been doing in Linux for decades just
> applied to a new attack surface, host <-> guest, vs userspace <->kernel.

Sorry about being unclear; this is not the type of hardening I meant,
really.  The "hardening" you meant is preventing kernel vulnerabilities,
right? This is what we've been doing for decades.
But I meant slightly newer things like e.g. KASLR, or indeed ASLR generally -
we are trying to reduce the chance that a vulnerability causes random
code execution as opposed to a DoS. To think in these terms you do not
need to think about attack surfaces - in a system including
a hypervisor, guest supervisor and guest userspace, hiding
one component from the others is helpful even if they share
a privilege level.
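As a hedged, back-of-the-envelope illustration of that point (the entropy figure is a free parameter here, not a statement about any particular kernel's KASLR layout): randomizing a base address with N bits of entropy forces a blind attacker to guess, and each wrong guess produces a crash (DoS) rather than code execution.

```c
#include <stdint.h>

/* Illustrative only: with N bits of base-address entropy, a single
 * blind guess at a code address succeeds with probability 1 in 2^N;
 * the other 2^N - 1 outcomes are crashes, i.e. a code-execution
 * primitive degrades into (at worst) a DoS. */
uint64_t kaslr_guess_space(unsigned entropy_bits)
{
	return (uint64_t)1 << entropy_bits;
}
```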



> Interfaces have changed, but the types of vulnerabilities, etc are the same.
> The attacker model is somewhat different because we have 
> different expectations on what host/hypervisor should be able to do
> to the guest (following business reasons and use-cases), versus what we
> expect normal userspace being able to "do" towards kernel. The host and
> hypervisor still has a lot of control over the guest (ability to start/stop it, 
> manage its resources, etc). But the reasons behind this doesn’t come
> from the fact that security CoCo HW not being able to support this stricter
> security model (it cannot now indeed, but this is a design decision), but
> from the fact that it is important for Cloud service providers to retain that
> level of control over their infrastructure. 

Surely they need the ability to control resource usage, not the ability to
execute DoS attacks. Current hardware just does not have the ability to allow
the former without the latter.

> > 
> > If you put it this way, you get to participate in a well understood
> > problem space instead of constantly saying "yes but CC is special".  And
> > further, you will now talk about features as opposed to fixing bugs.
> > Which will stop annoying people who currently seem annoyed by the
> > implication that their code is buggy simply because it does not cache in
> > memory all data read from hardware. Finally, you then don't really need
> > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > for many users it's actually the reverse - the reason is not that it's
> > not part of a threat model - which then makes you work hard to define
> > the threat model - but simply that CC hardware does not support this
> > kind of hardening.
> 
> But this won't be correct statement, because it is not limitation of HW, but the
> threat and business model that Confidential Computing exists in. I am not 
> aware of a single cloud provider who would be willing to use the HW that
> takes the full control of their infrastructure and running confidential guests,
> leaving them with no mechanisms to control the load balancing, enforce
> resource usage, etc. So, given that nobody needs/willing to use such HW, 
> such HW simply doesn’t exist. 
> 
> So, I would still say that the model we operate in CoCo usecases is somewhat
> special, but I do agree that given that we list a couple of these special assumptions
> (over which ones we have no control or ability to influence, none of us are business
> people), then the rest becomes just careful enumeration of attack surface interfaces
> and break up of potential mitigations. 
> 
> Best Regards,
> Elena.
> 

I'd say each business has a slightly different business model, no?
Finding common ground is what helps us share code ...

-- 
MST


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-27 10:04         ` Michael S. Tsirkin
@ 2023-01-27 12:25           ` Reshetova, Elena
  2023-01-27 14:32             ` Michael S. Tsirkin
  2023-01-27 20:51             ` Carlos Bilbao
  0 siblings, 2 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-27 12:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening


> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > And this is a very special aspect of 'hardening' since it is about hardening a
> > > kernel
> > > > under different threat model/assumptions.
> > >
> > > I am not sure it's that special in that hardening IMHO is not a specific
> > > threat model or a set of assumptions. IIUC it's just something that
> > > helps reduce severity of vulnerabilities.  Similarly, one can use the CC
> > > hardware in a variety of ways I guess. And one way is just that -
> > > hardening linux such that ability to corrupt guest memory does not
> > > automatically escalate into guest code execution.
> >
> > I am not sure if I fully follow you on this. I do agree that it is in principle
> > the same 'hardening' that we have been doing in Linux for decades just
> > applied to a new attack surface, host <-> guest, vs userspace <->kernel.
> 
> Sorry about being unclear this is not the type of hardening I meant
> really.  The "hardening" you meant is preventing kernel vulnerabilities,
> right? This is what we've been doing for decades.
> But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
> we are trying to reduce a chance a vulnerability causes random
> code execution as opposed to a DOS. To think in these terms you do not
> need to think about attack surfaces - in the system including
> a hypervisor, guest supervisor and guest userspace hiding
> one component from others is helpful even if they share
> a privelege level.

Do you mean that the fact that a CoCo guest has its memory encrypted
can help even in non-CoCo scenarios? I am sorry, I still don’t seem to be able
to grasp your idea fully. When the privilege level is shared, there is no
incentive to perform privilege-escalation attacks across components,
so why hide them from each other? Data protection? But I don’t think you
are talking about this. I do agree that KASLR is stronger when you remove
the possibility of reading the memory you are trying to attack (make sure
kernel code is execute-only), but again I am not sure if you mean this.

> 
> 
> 
> > Interfaces have changed, but the types of vulnerabilities, etc are the same.
> > The attacker model is somewhat different because we have
> > different expectations on what host/hypervisor should be able to do
> > to the guest (following business reasons and use-cases), versus what we
> > expect normal userspace being able to "do" towards kernel. The host and
> > hypervisor still has a lot of control over the guest (ability to start/stop it,
> > manage its resources, etc). But the reasons behind this doesn’t come
> > from the fact that security CoCo HW not being able to support this stricter
> > security model (it cannot now indeed, but this is a design decision), but
> > from the fact that it is important for Cloud service providers to retain that
> > level of control over their infrastructure.
> 
> Surely they need ability to control resource usage, not ability to execute DOS
> attacks. Current hardware just does not have ability to allow the former
> without the later.

I don’t see why it cannot be added to the HW if the requirement comes. However,
I think in the cloud-provider world being able to control resources equals
being able to deny these resources when required, so being able to deny service
to its clients is kind of a built-in expectation that everyone just agrees on.

> 
> > >
> > > If you put it this way, you get to participate in a well understood
> > > problem space instead of constantly saying "yes but CC is special".  And
> > > further, you will now talk about features as opposed to fixing bugs.
> > > Which will stop annoying people who currently seem annoyed by the
> > > implication that their code is buggy simply because it does not cache in
> > > memory all data read from hardware. Finally, you then don't really need
> > > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > > for many users it's actually the reverse - the reason is not that it's
> > > not part of a threat model - which then makes you work hard to define
> > > the threat model - but simply that CC hardware does not support this
> > > kind of hardening.
> >
> > But this won't be correct statement, because it is not limitation of HW, but the
> > threat and business model that Confidential Computing exists in. I am not
> > aware of a single cloud provider who would be willing to use the HW that
> > takes the full control of their infrastructure and running confidential guests,
> > leaving them with no mechanisms to control the load balancing, enforce
> > resource usage, etc. So, given that nobody needs/willing to use such HW,
> > such HW simply doesn’t exist.
> >
> > So, I would still say that the model we operate in CoCo usecases is somewhat
> > special, but I do agree that given that we list a couple of these special
> assumptions
> > (over which ones we have no control or ability to influence, none of us are
> business
> > people), then the rest becomes just careful enumeration of attack surface
> interfaces
> > and break up of potential mitigations.
> >
> > Best Regards,
> > Elena.
> >
> 
> I'd say each business has a slightly different business model, no?
> Finding common ground is what helps us share code ...

Fully agree, and a good discussion with everyone willing to listen and cooperate
can go a long way toward defining the best implementation.

Best Regards,
Elena. 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-27 12:25           ` Reshetova, Elena
@ 2023-01-27 14:32             ` Michael S. Tsirkin
  2023-01-27 20:51             ` Carlos Bilbao
  1 sibling, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2023-01-27 14:32 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening

On Fri, Jan 27, 2023 at 12:25:09PM +0000, Reshetova, Elena wrote:
> 
> > On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > And this is a very special aspect of 'hardening' since it is about hardening a
> > > > kernel
> > > > > under different threat model/assumptions.
> > > >
> > > > I am not sure it's that special in that hardening IMHO is not a specific
> > > > threat model or a set of assumptions. IIUC it's just something that
> > > > helps reduce severity of vulnerabilities.  Similarly, one can use the CC
> > > > hardware in a variety of ways I guess. And one way is just that -
> > > > hardening linux such that ability to corrupt guest memory does not
> > > > automatically escalate into guest code execution.
> > >
> > > I am not sure if I fully follow you on this. I do agree that it is in principle
> > > the same 'hardening' that we have been doing in Linux for decades just
> > > applied to a new attack surface, host <-> guest, vs userspace <->kernel.
> > 
> > Sorry about being unclear this is not the type of hardening I meant
> > really.  The "hardening" you meant is preventing kernel vulnerabilities,
> > right? This is what we've been doing for decades.
> > But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
> > we are trying to reduce a chance a vulnerability causes random
> > code execution as opposed to a DOS. To think in these terms you do not
> > need to think about attack surfaces - in the system including
> > a hypervisor, guest supervisor and guest userspace hiding
> > one component from others is helpful even if they share
> > a privelege level.
> 
> Do you mean that the fact that CoCo guest has memory encrypted
> can help even in non-CoCo scenarios?

Yes.

> I am sorry, I still seem not to be able
> to grasp your idea fully. When the privilege level is shared, there is no
> incentive to perform privilege escalation attacks across components,
> so why hide them from each other?

Because limiting horizontal movement between components is still valuable.

> Data protection? But I don’t think you
> are talking about this? I do agree that KASLR is stronger when you remove
> the possibility to read the memory (make sure kernel code is execute only)
> you are trying to attack, but again not sure if you mean this. 

It's an example. If the kernel were 100% secure we wouldn't need KASLR. Nothing
ever is, though.

> > 
> > 
> > 
> > > Interfaces have changed, but the types of vulnerabilities, etc are the same.
> > > The attacker model is somewhat different because we have
> > > different expectations on what host/hypervisor should be able to do
> > > to the guest (following business reasons and use-cases), versus what we
> > > expect normal userspace being able to "do" towards kernel. The host and
> > > hypervisor still has a lot of control over the guest (ability to start/stop it,
> > > manage its resources, etc). But the reasons behind this doesn’t come
> > > from the fact that security CoCo HW not being able to support this stricter
> > > security model (it cannot now indeed, but this is a design decision), but
> > > from the fact that it is important for Cloud service providers to retain that
> > > level of control over their infrastructure.
> > 
> > Surely they need ability to control resource usage, not ability to execute DOS
> > attacks. Current hardware just does not have ability to allow the former
> > without the later.
> 
> I don’t see why it cannot be added to HW if requirement comes. However, I think 
> in cloud provider world being able to control resources equals to being able
> to deny these resources when required, so being able to denial of service its clients
> is kind of build-in expectation that everyone just agrees on.  
> 
> > 
> > > >
> > > > If you put it this way, you get to participate in a well understood
> > > > problem space instead of constantly saying "yes but CC is special".  And
> > > > further, you will now talk about features as opposed to fixing bugs.
> > > > Which will stop annoying people who currently seem annoyed by the
> > > > implication that their code is buggy simply because it does not cache in
> > > > memory all data read from hardware. Finally, you then don't really need
> > > > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > > > for many users it's actually the reverse - the reason is not that it's
> > > > not part of a threat model - which then makes you work hard to define
> > > > the threat model - but simply that CC hardware does not support this
> > > > kind of hardening.
> > >
> > > But this won't be correct statement, because it is not limitation of HW, but the
> > > threat and business model that Confidential Computing exists in. I am not
> > > aware of a single cloud provider who would be willing to use the HW that
> > > takes the full control of their infrastructure and running confidential guests,
> > > leaving them with no mechanisms to control the load balancing, enforce
> > > resource usage, etc. So, given that nobody needs/willing to use such HW,
> > > such HW simply doesn’t exist.
> > >
> > > So, I would still say that the model we operate in CoCo usecases is somewhat
> > > special, but I do agree that given that we list a couple of these special
> > assumptions
> > > (over which ones we have no control or ability to influence, none of us are
> > business
> > > people), then the rest becomes just careful enumeration of attack surface
> > interfaces
> > > and break up of potential mitigations.
> > >
> > > Best Regards,
> > > Elena.
> > >
> > 
> > I'd say each business has a slightly different business model, no?
> > Finding common ground is what helps us share code ...
> 
> Fully agree, and a good discussion with everyone willing to listen and cooperate
> can go a long way into defining the best implementation. 
> 
> Best Regards,
> Elena. 

Right. My point was that trying to show how CC use cases are similar to other
existing ones will be more helpful for everyone than just focusing on how they
are different. I hope I was able to show some similarities.

-- 
MST


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-26 13:28           ` Reshetova, Elena
  2023-01-26 13:50             ` Leon Romanovsky
  2023-01-26 20:54             ` Theodore Ts'o
@ 2023-01-27 19:24             ` James Bottomley
  2023-01-30  7:42               ` Reshetova, Elena
  2 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2023-01-27 19:24 UTC (permalink / raw)
  To: Reshetova, Elena, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > > 
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > wrote:
> > > > > > > Hi Greg,
> > > > 
> > > > <...>
> > > > 
> > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > using them right away even without any special HW (readme
> > > > > > > has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > 
> > > > > > Again, as our documentation states, when you submit patches
> > > > > > based on these tools, you HAVE TO document that.  Otherwise
> > > > > > we think you all are crazy and will get your patches
> > > > > > rejected.  You all know this, why ignore it?
> > > > > 
> > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > linux kernel when we are submitting a fix that we have to
> > > > > list the way how it has been found. We will fix this in the
> > > > > future submissions, but some bugs we have are found by
> > > > > plain code audit, so 'human' is the tool. 
> > > > My problem with that statement is that by applying different
> > > > threat model you "invent" bugs which didn't exist in a first
> > > > place.
> > > > 
> > > > For example, in this [1] latest submission, authors labeled
> > > > correct behaviour as "bug".
> > > > 
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > > 
> > > Hm.. Does everyone think that when kernel dies with unhandled
> > > page fault (such as in that case) or detection of a KASAN out of
> > > bounds violation (as it is in some other cases we already have
> > > fixes or investigating) it represents a correct behavior even if
> > > you expect that all your pci HW devices are trusted?
> > 
> > This is exactly what I said. You presented me the cases which exist
> > in your invented world. Mentioned unhandled page fault doesn't
> > exist in real world. If PCI device doesn't work, it needs to be
> > replaced/blocked and not left to be operable and accessible from
> > the kernel/user.
> 
> Can we really assure correct operation of *all* PCI devices out
> there? How would such an audit be performed given the huge set of them
> available? Isn't it better instead to make a small fix in the kernel
> behavior that would guard us from such potentially incorrectly
> operating devices?

I think this is really the wrong question from the confidential
computing (CC) point of view.  The question shouldn't be about assuring
that the PCI device is operating completely correctly all the time (for
some value of correct).  It's: if it were programmed to be malicious,
what could it do to us?  If we take all DoS and crash outcomes off the
table (annoying but harmless if they don't reveal the confidential
contents), we're left with it trying to extract secrets from the
confidential environment.

The big threat from most devices (including the thunderbolt classes) is
that they can DMA all over memory.  However, this isn't really a threat
in CC (well until PCI becomes able to do encrypted DMA) because the
device has specific unencrypted buffers set aside for the expected DMA.
If it writes outside that CC integrity will detect it and if it reads
outside that it gets unintelligible ciphertext.  So we're left with the
device trying to trick secrets out of us by returning unexpected data.

If I set this as the problem, verifying device correct operation is a
possible solution (albeit hugely expensive), but there are likely many
other cheaper ways to defeat or detect a device trying to trick us into
revealing something.
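The "specific unencrypted buffers set aside for the expected DMA" point above can be sketched as a simple containment check (the window base, size, and function name below are invented for illustration; a real guest would derive them from its bounce-buffer configuration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared (unencrypted) DMA window; values are examples. */
static const uint64_t shared_base = 0x100000;
static const uint64_t shared_size = 0x10000;

/* Return true only if [addr, addr + len) lies entirely inside the
 * shared window.  Written to avoid integer overflow on addr + len. */
bool dma_range_in_shared_window(uint64_t addr, uint64_t len)
{
	if (len == 0 || len > shared_size)
		return false;
	if (addr < shared_base)
		return false;
	return addr - shared_base <= shared_size - len;
}
```

Anything outside the window is either caught by CC integrity (writes) or yields unintelligible ciphertext (reads), as described above; a check like this just lets the guest reject such ranges up front instead of relying on those properties.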

James


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-27 12:25           ` Reshetova, Elena
  2023-01-27 14:32             ` Michael S. Tsirkin
@ 2023-01-27 20:51             ` Carlos Bilbao
  1 sibling, 0 replies; 39+ messages in thread
From: Carlos Bilbao @ 2023-01-27 20:51 UTC (permalink / raw)
  To: Reshetova, Elena, Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening

On 1/27/23 6:25 AM, Reshetova, Elena wrote:
> 
>> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
>>>> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
>>>>> And this is a very special aspect of 'hardening' since it is about hardening a
>>>> kernel
>>>>> under different threat model/assumptions.
>>>>
>>>> I am not sure it's that special in that hardening IMHO is not a specific
>>>> threat model or a set of assumptions. IIUC it's just something that
>>>> helps reduce severity of vulnerabilities.  Similarly, one can use the CC
>>>> hardware in a variety of ways I guess. And one way is just that -
>>>> hardening linux such that ability to corrupt guest memory does not
>>>> automatically escalate into guest code execution.
>>>
>>> I am not sure if I fully follow you on this. I do agree that it is in principle
>>> the same 'hardening' that we have been doing in Linux for decades just
>>> applied to a new attack surface, host <-> guest, vs userspace <->kernel.
>>
>> Sorry about being unclear this is not the type of hardening I meant
>> really.  The "hardening" you meant is preventing kernel vulnerabilities,
>> right? This is what we've been doing for decades.
>> But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
>> we are trying to reduce a chance a vulnerability causes random
>> code execution as opposed to a DOS. To think in these terms you do not
>> need to think about attack surfaces - in the system including
>> a hypervisor, guest supervisor and guest userspace hiding
>> one component from others is helpful even if they share
>> a privelege level.
> 
> Do you mean that the fact that CoCo guest has memory encrypted
> can help even in non-CoCo scenarios? I am sorry, I still seem not to be able
> to grasp your idea fully. When the privilege level is shared, there is no
> incentive to perform privilege escalation attacks across components,
> so why hide them from each other? Data protection? But I don’t think you
> are talking about this? I do agree that KASLR is stronger when you remove
> the possibility to read the memory (make sure kernel code is execute only)
> you are trying to attack, but again not sure if you mean this. 
> 
>>
>>
>>
>>> Interfaces have changed, but the types of vulnerabilities, etc are the same.
>>> The attacker model is somewhat different because we have
>>> different expectations on what host/hypervisor should be able to do
>>> to the guest (following business reasons and use-cases), versus what we
>>> expect normal userspace being able to "do" towards kernel. The host and
>>> hypervisor still has a lot of control over the guest (ability to start/stop it,
>>> manage its resources, etc). But the reasons behind this doesn’t come
>>> from the fact that security CoCo HW not being able to support this stricter
>>> security model (it cannot now indeed, but this is a design decision), but
>>> from the fact that it is important for Cloud service providers to retain that
>>> level of control over their infrastructure.
>>
>> Surely they need ability to control resource usage, not ability to execute DOS
>> attacks. Current hardware just does not have ability to allow the former
>> without the later.
> 
> I don’t see why it cannot be added to HW if requirement comes. However, I think 
> in cloud provider world being able to control resources equals to being able
> to deny these resources when required, so being able to denial of service its clients
> is kind of build-in expectation that everyone just agrees on.  
> 

Just a thought, but I wouldn't rule out availability guarantees like that
at some point. As a client I would certainly like them, and if it's good
for business...

>>
>>>>
>>>> If you put it this way, you get to participate in a well understood
>>>> problem space instead of constantly saying "yes but CC is special".  And
>>>> further, you will now talk about features as opposed to fixing bugs.
>>>> Which will stop annoying people who currently seem annoyed by the
>>>> implication that their code is buggy simply because it does not cache in
>>>> memory all data read from hardware. Finally, you then don't really need
>>>> to explain why e.g. DoS is not a problem but info leak is a problem - when
>>>> for many users it's actually the reverse - the reason is not that it's
>>>> not part of a threat model - which then makes you work hard to define
>>>> the threat model - but simply that CC hardware does not support this
>>>> kind of hardening.
>>>
>>> But this won't be correct statement, because it is not limitation of HW, but the
>>> threat and business model that Confidential Computing exists in. I am not
>>> aware of a single cloud provider who would be willing to use the HW that
>>> takes the full control of their infrastructure and running confidential guests,
>>> leaving them with no mechanisms to control the load balancing, enforce
>>> resource usage, etc. So, given that nobody needs/willing to use such HW,
>>> such HW simply doesn’t exist.
>>>
>>> So, I would still say that the model we operate in CoCo usecases is somewhat
>>> special, but I do agree that given that we list a couple of these special
>> assumptions
>>> (over which ones we have no control or ability to influence, none of us are
>> business
>>> people), then the rest becomes just careful enumeration of attack surface
>> interfaces
>>> and break up of potential mitigations.
>>>
>>> Best Regards,
>>> Elena.
>>>
>>
>> I'd say each business has a slightly different business model, no?
>> Finding common ground is what helps us share code ...
> 
> Fully agree, and a good discussion with everyone willing to listen and cooperate
> can go a long way into defining the best implementation. 
> 
> Best Regards,
> Elena. 

Thanks for sharing the threat model with the list!

Carlos

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-27 19:24             ` James Bottomley
@ 2023-01-30  7:42               ` Reshetova, Elena
  2023-01-30 12:40                 ` James Bottomley
  0 siblings, 1 reply; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-30  7:42 UTC (permalink / raw)
  To: jejb, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

 On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > > wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > > wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > > using them right away even without any special HW (readme
> > > > > > > > has description of what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches
> > > > > > > based on these tools, you HAVE TO document that.  Otherwise
> > > > > > > we think you all are crazy and will get your patches
> > > > > > > rejected.  You all know this, why ignore it?
> > > > > >
> > > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > > linux kernel when we are submitting a fix that we have to
> > > > > > list the way how it has been found. We will fix this in the
> > > > > > future submissions, but some bugs we have are found by
> > > > > > plain code audit, so 'human' is the tool.
> > > > > My problem with that statement is that by applying different
> > > > > threat model you "invent" bugs which didn't exist in a first
> > > > > place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled
> > > > > correct behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > > alexander.shishkin@linux.intel.com/
> > > >
> > > > Hm.. Does everyone think that when kernel dies with unhandled
> > > > page fault (such as in that case) or detection of a KASAN out of
> > > > bounds violation (as it is in some other cases we already have
> > > > fixes or investigating) it represents a correct behavior even if
> > > > you expect that all your pci HW devices are trusted?
> > >
> > > This is exactly what I said. You presented me the cases which exist
> > > in your invented world. Mentioned unhandled page fault doesn't
> > > exist in real world. If PCI device doesn't work, it needs to be
> > > replaced/blocked and not left to be operable and accessible from
> > > the kernel/user.
> >
> > Can we really assure correct operation of *all* pci devices out
> > there? How would such an audit be performed given a huge set of them
> > available? Isn't it better instead to make a small fix in the kernel
> > behavior that would guard us from such potentially incorrectly
> > operating devices?
> 
> I think this is really the wrong question from the confidential
> computing (CC) point of view.  The question shouldn't be about assuring
> that the PCI device is operating completely correctly all the time (for
> some value of correct).  It's: if it were programmed to be malicious,
> what could it do to us?

Sure, but Leon didn't agree with the CC threat model to begin with, so
I was trying to argue how this fix can be useful for the non-CC threat
model case as well. But obviously my argument for the non-CC case wasn't
good (especially reading Ted's reply here:
https://lore.kernel.org/all/Y9Lonw9HzlosUPnS@mit.edu/ ), so I'd better
stick to the CC threat model case indeed.

> If we take all DoS and Crash outcomes off the
> table (annoying but harmless if they don't reveal the confidential
> contents), we're left with it trying to extract secrets from the
> confidential environment.

Yes, this is the ultimate end goal. 

> 
> The big threat from most devices (including the thunderbolt classes) is
> that they can DMA all over memory.  However, this isn't really a threat
> in CC (well until PCI becomes able to do encrypted DMA) because the
> device has specific unencrypted buffers set aside for the expected DMA.
> If it writes outside that CC integrity will detect it and if it reads
> outside that it gets unintelligible ciphertext.  So we're left with the
> device trying to trick secrets out of us by returning unexpected data.

Yes, by supplying input that wasn't expected. This is exactly
the case we were trying to fix here, for example:
https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/
I do agree that this case is less severe than the ones where memory
corruption/buffer overruns can happen, like here:
https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/
But we are trying to fix all the issues we see now (prioritizing the
latter, though).

> 
> If I set this as the problem, verifying device correct operation is a
> possible solution (albeit hugely expensive) but there are likely many
> other cheaper ways to defeat or detect a device trying to trick us into
> revealing something.

What do you have in mind here for the actual devices we need to enable for CC cases?
We have been using a combination of extensive fuzzing and static code analysis.

Best Regards,
Elena.




* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-30  7:42               ` Reshetova, Elena
@ 2023-01-30 12:40                 ` James Bottomley
  2023-01-31 11:31                   ` Reshetova, Elena
  0 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2023-01-30 12:40 UTC (permalink / raw)
  To: Reshetova, Elena, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
[...]
> > The big threat from most devices (including the thunderbolt
> > classes) is that they can DMA all over memory.  However, this isn't
> > really a threat in CC (well until PCI becomes able to do encrypted
> > DMA) because the device has specific unencrypted buffers set aside
> > for the expected DMA. If it writes outside that CC integrity will
> > detect it and if it reads outside that it gets unintelligible
> > ciphertext.  So we're left with the device trying to trick secrets
> > out of us by returning unexpected data.
> 
> Yes, by supplying the input that hasn’t been expected. This is
> exactly the case we were trying to fix here for example:
> https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/
> I do agree that this case is less severe when others where memory
> corruption/buffer overrun can happen, like here:
> https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/
> But we are trying to fix all issues we see now (prioritizing the
> second ones though). 

I don't see how MSI table sizing is a bug in the category we've
defined.  The very text of the changelog says "resulting in a kernel
page fault in pci_write_msg_msix()."  which is a crash, which I thought
we were agreeing was out of scope for CC attacks?

> > 
> > If I set this as the problem, verifying device correct operation is
> > a possible solution (albeit hugely expensive) but there are likely
> > many other cheaper ways to defeat or detect a device trying to
> > trick us into revealing something.
> 
> What do you have in mind here for the actual devices we need to
> enable for CC cases?

Well, the most dangerous devices seem to be the virtio set a CC system
will rely on to boot up.  After that, there are other ways (like SPDM)
to verify a real PCI device is on the other end of the transaction.

> We have been using here a combination of extensive fuzzing and static
> code analysis.

by fuzzing, I assume you mean fuzzing from the PCI configuration space?
Firstly, I'm not so sure how useful a tool fuzzing is if we take oopses
off the table, because fuzzing primarily triggers those, so it's hard to
see what else it could detect given the signal will be smothered by
oopses. Secondly, I think the PCI interface is likely the wrong place
to begin and you should probably begin on the virtio bus and the
hypervisor-generated configuration space.

James



* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-30 12:40                 ` James Bottomley
@ 2023-01-31 11:31                   ` Reshetova, Elena
  2023-01-31 13:28                     ` James Bottomley
  2023-02-02 14:51                     ` Jeremi Piotrowski
  0 siblings, 2 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-31 11:31 UTC (permalink / raw)
  To: jejb, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

> On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> [...]
> > > The big threat from most devices (including the thunderbolt
> > > classes) is that they can DMA all over memory.  However, this isn't
> > > really a threat in CC (well until PCI becomes able to do encrypted
> > > DMA) because the device has specific unencrypted buffers set aside
> > > for the expected DMA. If it writes outside that CC integrity will
> > > detect it and if it reads outside that it gets unintelligible
> > > ciphertext.  So we're left with the device trying to trick secrets
> > > out of us by returning unexpected data.
> >
> > Yes, by supplying the input that hasn’t been expected. This is
> > exactly the case we were trying to fix here for example:
> > https://lore.kernel.org/all/20230119170633.40944-2-
> alexander.shishkin@linux.intel.com/
> > I do agree that this case is less severe when others where memory
> > corruption/buffer overrun can happen, like here:
> > https://lore.kernel.org/all/20230119135721.83345-6-
> alexander.shishkin@linux.intel.com/
> > But we are trying to fix all issues we see now (prioritizing the
> > second ones though).
> 
> I don't see how MSI table sizing is a bug in the category we've
> defined.  The very text of the changelog says "resulting in a kernel
> page fault in pci_write_msg_msix()."  which is a crash, which I thought
> we were agreeing was out of scope for CC attacks?

As I said, this is an example of a crash that on first look
might not lead to an exploitable condition (albeit attackers are creative).
But we noticed this one while fuzzing, and it was common enough that it
prevented the fuzzer from going deeper into the virtio device driver
fuzzing. The core PCI/MSI code doesn't seem to have that many easily
triggerable issues. Other examples in the virtio patchset are more severe.

> 
> > >
> > > If I set this as the problem, verifying device correct operation is
> > > a possible solution (albeit hugely expensive) but there are likely
> > > many other cheaper ways to defeat or detect a device trying to
> > > trick us into revealing something.
> >
> > What do you have in mind here for the actual devices we need to
> > enable for CC cases?
> 
> Well, the most dangerous devices seem to be the virtio set a CC system
> will rely on to boot up.  After that, there are other ways (like SPDM)
> to verify a real PCI device is on the other end of the transaction.

Yes, in the future, but not yet. Other vendors will not necessarily be
using virtio devices at this point, so we will have non-virtio and
non-CC-enabled devices that we want to securely add to the guest.

> 
> > We have been using here a combination of extensive fuzzing and static
> > code analysis.
> 
> by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> off the table because fuzzing primarily triggers those

If you enable memory sanitizers, you can detect more severe conditions like
out-of-bounds accesses and such. I think given that we have a way to
verify that fuzzing is reaching the code locations we want it to reach, it
can be a pretty effective method to find at least the low-hanging bugs. And
these will be the bugs that most attackers will go after in the first place.
But of course it is not a formal verification of any kind.

> so its hard to
> see what else it could detect given the signal will be smothered by
> oopses and secondly I think the PCI interface is likely the wrong place
> to begin and you should probably begin on the virtio bus and the
> hypervisor generated configuration space.

This is exactly what we do. We don't fuzz from the PCI config space;
we supply inputs from the host/VMM via the legitimate interfaces through
which it can inject them into the guest: whenever the guest requests a PCI
config space read operation (which is controlled by the host/hypervisor,
as you said), it gets input injected by the kAFL fuzzer. Same for other
interfaces that are under the control of the host/VMM (MSRs, port IO, MMIO,
anything that goes via the #VE handler in our case). When it comes to virtio,
we employ two different fuzzing techniques: directly injecting kAFL fuzz
input when the virtio core or virtio drivers get the data received from the
host (via injecting input in the functions virtio16/32/64_to_cpu and others),
and directly fuzzing DMA memory pages using the kfx fuzzer.
More information can be found at https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
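To make the injection point concrete, the hook can be pictured as a small
userspace toy. The names `fuzz_active`, `fuzz_fetch_u16` and the
pass-through-on-exhaustion policy are illustrative assumptions, not the
actual kAFL/kfx harness API; only the idea of wrapping the
virtio16_to_cpu-style conversion helpers comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the fuzzer's input stream; a real harness
 * would hand back bytes from its current fuzz input buffer. */
static const uint8_t *fuzz_buf;
static size_t fuzz_len, fuzz_pos;
static int fuzz_active;

static uint16_t fuzz_fetch_u16(uint16_t original)
{
	if (!fuzz_active || fuzz_pos + 2 > fuzz_len)
		return original;	/* no fuzzing, or stream exhausted */
	uint16_t v = (uint16_t)(fuzz_buf[fuzz_pos] |
				(fuzz_buf[fuzz_pos + 1] << 8));
	fuzz_pos += 2;
	return v;
}

/* The hook itself: every device-supplied 16-bit virtio field passes
 * through a conversion helper like this, so overriding it makes the
 * whole host->guest interface attacker-controlled. */
static uint16_t my_virtio16_to_cpu(uint16_t le_val)
{
	uint16_t cpu_val = le_val;	/* little-endian CPU assumed for brevity */
	return fuzz_fetch_u16(cpu_val);
}
```

With `fuzz_active` off the helper behaves normally; turned on, every value
the "device" reports becomes fuzzer-controlled, which is exactly the CC
threat model assumption being tested.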

Best Regards,
Elena.


* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 11:31                   ` Reshetova, Elena
@ 2023-01-31 13:28                     ` James Bottomley
  2023-01-31 15:14                       ` Christophe de Dinechin
  2023-01-31 16:34                       ` Reshetova, Elena
  2023-02-02 14:51                     ` Jeremi Piotrowski
  1 sibling, 2 replies; 39+ messages in thread
From: James Bottomley @ 2023-01-31 13:28 UTC (permalink / raw)
  To: Reshetova, Elena, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > [...]
> > > > The big threat from most devices (including the thunderbolt
> > > > classes) is that they can DMA all over memory.  However, this
> > > > isn't really a threat in CC (well until PCI becomes able to do
> > > > encrypted DMA) because the device has specific unencrypted
> > > > buffers set aside for the expected DMA. If it writes outside
> > > > that CC integrity will detect it and if it reads outside that
> > > > it gets unintelligible ciphertext.  So we're left with the
> > > > device trying to trick secrets out of us by returning
> > > > unexpected data.
> > > 
> > > Yes, by supplying the input that hasn’t been expected. This is
> > > exactly the case we were trying to fix here for example:
> > > https://lore.kernel.org/all/20230119170633.40944-2-
> > alexander.shishkin@linux.intel.com/
> > > I do agree that this case is less severe when others where memory
> > > corruption/buffer overrun can happen, like here:
> > > https://lore.kernel.org/all/20230119135721.83345-6-
> > alexander.shishkin@linux.intel.com/
> > > But we are trying to fix all issues we see now (prioritizing the
> > > second ones though).
> > 
> > I don't see how MSI table sizing is a bug in the category we've
> > defined.  The very text of the changelog says "resulting in a
> > kernel page fault in pci_write_msg_msix()."  which is a crash,
> > which I thought we were agreeing was out of scope for CC attacks?
> 
> As I said this is an example of a crash and on the first look
> might not lead to the exploitable condition (albeit attackers are
> creative). But we noticed this one while fuzzing and it was common
> enough that prevented fuzzer going deeper into the virtio devices
> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
> easily triggerable Other examples in virtio patchset are more severe.

You cited this as your example.  I'm pointing out it seems to be an
event of the class we've agreed not to consider because it's an oops
not an exploit.  If there are examples of fixing actual exploits to CC
VMs, what are they?

This patch is, however, an example of the problem everyone else on the
thread is complaining about: a patch which adds an unnecessary check to
the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
in the real world the tables are correct (or the manufacturer is
quickly chastened), so it adds overhead to no benefit.
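For readers following along, the disputed class of check can be sketched in
a simplified userspace model. The `PCI_MSIX_FLAGS_QSIZE` mask and the
size-minus-one encoding follow the PCI MSI-X capability layout;
`msix_entry_valid` and its parameters are made-up illustrations of the
pattern, not the actual kernel code under review.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_FLAGS_QSIZE 0x07FF	/* Table Size field: entries - 1 */

/* Table size as reported by the device in the MSI-X Message Control
 * register. A benign device reports its real size; in the CC model a
 * malicious hypervisor may report anything in 1..2048, and may report
 * different values on different reads. */
static unsigned int msix_table_size(uint16_t control)
{
	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
}

/* The debated hardening pattern: clamp the device-reported size to
 * what the driver actually allocated earlier, so a device that later
 * claims a bigger table cannot make us index past the allocation. */
static int msix_entry_valid(uint16_t control, unsigned int nr_entries,
			    unsigned int index)
{
	unsigned int hw_size = msix_table_size(control);

	if (hw_size > nr_entries)
		hw_size = nr_entries;
	return index < hw_size;
}
```

The objection in the text above is precisely that, for real hardware, the
two reads always agree, so this extra comparison is pure overhead; the CC
argument is that the second read is hypervisor-controlled and therefore
cannot be assumed consistent with the first.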


[...]
> > see what else it could detect given the signal will be smothered by
> > oopses and secondly I think the PCI interface is likely the wrong
> > place to begin and you should probably begin on the virtio bus and
> > the hypervisor generated configuration space.
> 
> This is exactly what we do. We don’t fuzz from the PCI config space,
> we supply inputs from the host/vmm via the legitimate interfaces that
> it can inject them to the guest: whenever guest requests a pci config
> space (which is controlled by host/hypervisor as you said) read
> operation, it gets input injected by the kafl fuzzer.  Same for other
> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
> anything that goes via #VE handler in our case). When it comes to
> virtio, we employ  two different fuzzing techniques: directly
> injecting kafl fuzz input when virtio core or virtio drivers gets the
> data received from the host (via injecting input in functions
> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> pages using kfx fuzzer. More information can be found in 
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing

Given that we previously agreed that oopses and other DoS attacks are
out of scope for CC, I really don't think fuzzing, which primarily
finds oopses, is at all a useful tool unless you filter the results by
the question "could we exploit this in a CC VM to reveal secrets". 
Without applying that filter you're sending a load of patches which
don't really do much to reduce the CC attack surface and which do annoy
non-CC people because they add pointless checks to things they expect
the cards and config tables to get right.

James



* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 13:28                     ` James Bottomley
@ 2023-01-31 15:14                       ` Christophe de Dinechin
  2023-01-31 17:39                         ` Michael S. Tsirkin
  2023-02-01 10:24                         ` Christophe de Dinechin
  2023-01-31 16:34                       ` Reshetova, Elena
  1 sibling, 2 replies; 39+ messages in thread
From: Christophe de Dinechin @ 2023-01-31 15:14 UTC (permalink / raw)
  To: jejb
  Cc: Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin,
	Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen,
	Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner,
	Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe,
	Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda,
	keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening


On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote...
> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
>> > [...]
>> > > > The big threat from most devices (including the thunderbolt
>> > > > classes) is that they can DMA all over memory.  However, this
>> > > > isn't really a threat in CC (well until PCI becomes able to do
>> > > > encrypted DMA) because the device has specific unencrypted
>> > > > buffers set aside for the expected DMA. If it writes outside
>> > > > that CC integrity will detect it and if it reads outside that
>> > > > it gets unintelligible ciphertext.  So we're left with the
>> > > > device trying to trick secrets out of us by returning
>> > > > unexpected data.
>> > >
>> > > Yes, by supplying the input that hasn’t been expected. This is
>> > > exactly the case we were trying to fix here for example:
>> > > https://lore.kernel.org/all/20230119170633.40944-2-
>> > alexander.shishkin@linux.intel.com/
>> > > I do agree that this case is less severe when others where memory
>> > > corruption/buffer overrun can happen, like here:
>> > > https://lore.kernel.org/all/20230119135721.83345-6-
>> > alexander.shishkin@linux.intel.com/
>> > > But we are trying to fix all issues we see now (prioritizing the
>> > > second ones though).
>> >
>> > I don't see how MSI table sizing is a bug in the category we've
>> > defined.  The very text of the changelog says "resulting in a
>> > kernel page fault in pci_write_msg_msix()."  which is a crash,
>> > which I thought we were agreeing was out of scope for CC attacks?
>>
>> As I said this is an example of a crash and on the first look
>> might not lead to the exploitable condition (albeit attackers are
>> creative). But we noticed this one while fuzzing and it was common
>> enough that prevented fuzzer going deeper into the virtio devices
>> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
>> easily triggerable Other examples in virtio patchset are more severe.
>
> You cited this as your example.  I'm pointing out it seems to be an
> event of the class we've agreed not to consider because it's an oops
> not an exploit.  If there are examples of fixing actual exploits to CC
> VMs, what are they?
>
> This patch is, however, an example of the problem everyone else on the
> thread is complaining about: a patch which adds an unnecessary check to
> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
> in the real world the tables are correct (or the manufacturer is
> quickly chastened), so it adds overhead to no benefit.

I'd like to backtrack a little here.


1/ PCI-as-a-threat, where does it come from?

On physical devices, we have to assume that the device is working. As others
pointed out, there are things like PCI compliance tests, etc. So Linux has
to trust the device. You could manufacture a broken device intentionally,
but the value you would get from that would be limited.

On a CC system, the "PCI" values are really provided by the hypervisor,
which is not trusted. This leads to this peculiar way of thinking where we
ask "what happens if the virtual device feeds us a bogus value *intentionally*?"
We cannot assume that the *virtual* PCI device ran through the compliance
tests. Instead, we see the PCI interface as hostile, which makes us look
like weirdos to the rest of the community.

Consequently, as James pointed out, we first need to focus on consequences
that would break what I would call the "CC promise", which is essentially
that we'd rather kill the guest than reveal its secrets. Unless you have a
credible path to a secret being revealed, don't bother "fixing" a bug. And
as was pointed out elsewhere in this thread, caching has a cost, so you
can't really use the "optimization" angle either.


2/ Clarification of the "CC promise" and value proposition

Based on the above, the very first thing is to clarify that "CC promise",
because if exchanges on this thread have proved anything, it is that it's
quite unclear to anyone outside the "CoCo world".

The Linux Guest Kernel Security Specification needs to really elaborate on
what the value proposition of CC is, not assume it is a given. "Bug fixes"
before this value proposition has been understood and accepted by the
non-CoCo community are likely to go absolutely nowhere.

Here is a quick proposal for the Purpose and Scope section:

<doc>
Purpose and Scope

Confidential Computing (CC) is a set of technologies that allows a guest to
run without having to trust either the hypervisor or the host. CC offers two
new guarantees to the guest compared to the non-CC case:

a) The guest will be able to measure and attest, by cryptographic means, the
   guest software stack that it is running, and be assured that this
   software stack cannot be tampered with by the host or the hypervisor
   after it was measured. The root of trust for this aspect of CC is
   typically the CPU manufacturer (e.g. through a private key that can be
   used to respond to cryptographic challenges).

b) Guest state, including memory, becomes a secret which must remain
   inaccessible to the host. In a CC context, it is considered preferable to
   stop or kill a guest rather than risk leaking its secrets. This aspect of
   CC is typically enforced by means such as memory encryption and new
   semantics for memory protection.

CC leads to a different threat model for a Linux kernel running as a guest
inside a confidential virtual machine (CVM). Notably, whereas the machine
(CPU, I/O devices, etc) is usually considered as trustworthy, in the CC
case, the hypervisor emulating some aspects of the virtual machine is now
considered as potentially malicious. Consequently, the effects of any data
provided to the guest by the hypervisor, including ACPI configuration
tables, MMIO interfaces or machine-specific registers (MSRs), need to be
re-evaluated.

This document describes the security architecture of the Linux guest kernel
running inside a CVM, with a particular focus on the Intel TDX
implementation. Many aspects of this document will be applicable to other
CC implementations such as AMD SEV.

Aspects of the guest-visible state that are under direct control of the
hardware, such as the CPU state or memory protection, will be considered as
being handled by the CC implementations. This document will therefore only
focus on aspects of the virtual machine that are typically managed by the
hypervisor or the host.

Since the host ultimately owns the resources and can allocate them at will,
including denying their use at any point, this document will not address
denial of service or performance degradation. It will however cover random
number generation, which is central to cryptographic security.

Finally, security considerations that apply irrespective of whether the
platform is confidential or not are also outside of the scope of this
document. This includes topics ranging from timing attacks to social
engineering.
</doc>

Feel free to comment and reword at will ;-)
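As a purely illustrative aside, guarantee (a) of the proposal reduces to a
measure-then-verify shape. In this toy sketch, FNV-1a stands in for the real
SHA-384 measurement and there is no hardware-rooted signature or challenge,
so none of the names below reflect an actual CC stack:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy "measurement": FNV-1a over the guest image. Real CC hardware
 * extends a SHA-384 digest into measurement registers instead. */
static uint64_t measure(const uint8_t *image, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= image[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Attestation reduced to its essence: the verifier compares the
 * reported measurement to the expected ("golden") value and refuses
 * to release secrets on mismatch. */
static int attest_ok(uint64_t reported, uint64_t expected)
{
	return reported == expected;
}
```

The ordering is the point the document makes: measure first, then rely on
memory encryption to keep the measured stack from changing afterwards;
without that tamper-resistance half, a matching golden value proves nothing
about what is running now.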


3/ Could we extend measurement and attestation to the host stack?

Isn't there a fundamental difference, from a threat model perspective,
between a bad actor, say a rogue sysadmin, dumping the guest memory (which CC
should defeat) and compromised software feeding us bad data? I think there
is: at least inside the TCB, we can detect bad software using measurements,
and prevent it from running using attestation.  In other words, we first
check what we will run, then we run it. The security there is that we know
what we are running. The trust we have in the software comes from testing,
reviewing or using it.

This relies on a key aspect provided by TDX and SEV, which is that the
software being measured is largely tamper-resistant thanks to memory
encryption. In other words, after you have measured your guest software
stack, the host or hypervisor cannot willy-nilly change it.

So this brings me to the next question: is there any way we could offer the
same kind of service for KVM and QEMU? The measurement part seems relatively
easy. The tamper-resistant part, on the other hand, seems quite difficult to
me. But maybe someone else will have a brilliant idea?

So I'm asking the question, because if you could somehow prove to the guest
not only that it's running the right guest stack (as we can do today) but
also a known host/KVM/hypervisor stack, we would also switch the potential
issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
this is something which is evidently easier to deal with.

I briefly discussed this with James, and he pointed out two interesting
aspects of that question:

1/ In the CC world, we don't really care about *virtual* PCI devices. We
   care about either virtio devices, or physical ones being passed through
   to the guest. Let's assume physical ones can be trusted, see above.
   That leaves virtio devices. How much damage can a malicious virtio device
   do to the guest kernel, and can this lead to secrets being leaked?

2/ He was not as negative as I anticipated on the possibility of somehow
   being able to prevent tampering of the guest. One example he mentioned is
   a research paper [1] about running the hypervisor itself inside an
   "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
   with TDX using secure enclaves or some other mechanism?


Sorry, this mail is a bit long ;-)


>
>
> [...]
>> > see what else it could detect given the signal will be smothered by
>> > oopses and secondly I think the PCI interface is likely the wrong
>> > place to begin and you should probably begin on the virtio bus and
>> > the hypervisor generated configuration space.
>>
>> This is exactly what we do. We don’t fuzz from the PCI config space,
>> we supply inputs from the host/vmm via the legitimate interfaces that
>> it can inject them to the guest: whenever guest requests a pci config
>> space (which is controlled by host/hypervisor as you said) read
>> operation, it gets input injected by the kafl fuzzer.  Same for other
>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
>> anything that goes via #VE handler in our case). When it comes to
>> virtio, we employ  two different fuzzing techniques: directly
>> injecting kafl fuzz input when virtio core or virtio drivers gets the
>> data received from the host (via injecting input in functions
>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
>> pages using kfx fuzzer. More information can be found in
>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>
> Given that we previously agreed that oopses and other DoS attacks are
> out of scope for CC, I really don't think fuzzing, which primarily
> finds oopses, is at all a useful tool unless you filter the results by
> the question "could we exploit this in a CC VM to reveal secrets".
> Without applying that filter you're sending a load of patches which
> don't really do much to reduce the CC attack surface and which do annoy
> non-CC people because they add pointless checks to things they expect
> the cards and config tables to get right.

Indeed.

[1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592
--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)



* RE: Linux guest kernel threat model for Confidential Computing
  2023-01-31 13:28                     ` James Bottomley
  2023-01-31 15:14                       ` Christophe de Dinechin
@ 2023-01-31 16:34                       ` Reshetova, Elena
  2023-01-31 17:49                         ` James Bottomley
  1 sibling, 1 reply; 39+ messages in thread
From: Reshetova, Elena @ 2023-01-31 16:34 UTC (permalink / raw)
  To: jejb, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
> > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > > [...]
> > > > > The big threat from most devices (including the thunderbolt
> > > > > classes) is that they can DMA all over memory.  However, this
> > > > > isn't really a threat in CC (well until PCI becomes able to do
> > > > > encrypted DMA) because the device has specific unencrypted
> > > > > buffers set aside for the expected DMA. If it writes outside
> > > > > that CC integrity will detect it and if it reads outside that
> > > > > it gets unintelligible ciphertext.  So we're left with the
> > > > > device trying to trick secrets out of us by returning
> > > > > unexpected data.
> > > >
> > > > Yes, by supplying the input that hasn’t been expected. This is
> > > > exactly the case we were trying to fix here for example:
> > > > https://lore.kernel.org/all/20230119170633.40944-2-
> > > alexander.shishkin@linux.intel.com/
> > > > I do agree that this case is less severe when others where memory
> > > > corruption/buffer overrun can happen, like here:
> > > > https://lore.kernel.org/all/20230119135721.83345-6-
> > > alexander.shishkin@linux.intel.com/
> > > > But we are trying to fix all issues we see now (prioritizing the
> > > > second ones though).
> > >
> > > I don't see how MSI table sizing is a bug in the category we've
> > > defined.  The very text of the changelog says "resulting in a
> > > kernel page fault in pci_write_msg_msix()."  which is a crash,
> > > which I thought we were agreeing was out of scope for CC attacks?
> >
> > As I said, this is an example of a crash that at first look
> > might not lead to an exploitable condition (albeit attackers are
> > creative). But we noticed this one while fuzzing, and it was common
> > enough that it prevented the fuzzer from going deeper into the virtio
> > device driver fuzzing. The core PCI/MSI code doesn’t seem to have that
> > many easily triggerable issues. Other examples in the virtio patchset
> > are more severe.
> 
> You cited this as your example.  I'm pointing out it seems to be an
> event of the class we've agreed not to consider because it's an oops
> not an exploit.  If there are examples of fixing actual exploits to CC
> VMs, what are they?
> 
> This patch is, however, an example of the problem everyone else on the
> thread is complaining about: a patch which adds an unnecessary check to
> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
> in the real world the tables are correct (or the manufacturer is
> quickly chastened), so it adds overhead to no benefit.

How can you make sure there is no exploit possible using this crash
as a stepping stone into a CC guest? Or are you saying that we are back
to the times when we could only merge fixes for crashes and out-of-bounds
errors in the kernel if we submitted a proof-of-concept exploit with the
patch for every issue?

> 
> 
> [...]
> > > see what else it could detect given the signal will be smothered by
> > > oopses and secondly I think the PCI interface is likely the wrong
> > > place to begin and you should probably begin on the virtio bus and
> > > the hypervisor generated configuration space.
> >
> > This is exactly what we do. We don’t fuzz from the PCI config space,
> > we supply inputs from the host/vmm via the legitimate interfaces
> > through which it can inject them into the guest: whenever the guest
> > requests a pci config
> > space (which is controlled by host/hypervisor as you said) read
> > operation, it gets input injected by the kafl fuzzer.  Same for other
> > interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
> > anything that goes via #VE handler in our case). When it comes to
> > virtio, we employ  two different fuzzing techniques: directly
> > injecting kafl fuzz input when virtio core or virtio drivers get the
> > data received from the host (via injecting input in functions
> > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> > pages using kfx fuzzer. More information can be found in
> > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-
> hardening.html#td-guest-fuzzing
> 
> Given that we previously agreed that oopses and other DoS attacks are
> out of scope for CC, I really don't think fuzzing, which primarily
> finds oopses, is at all a useful tool unless you filter the results by
> the question "could we exploit this in a CC VM to reveal secrets".
> Without applying that filter you're sending a load of patches which
> don't really do much to reduce the CC attack surface and which do annoy
> non-CC people because they add pointless checks to things they expect
> the cards and config tables to get right.

I don’t think we have agreed that random kernel crashes are out of scope in the CC threat model
(a controlled safe panic is out of scope, but that is not what we have here).
It all depends on whether this oops can be used in a successful attack against guest private
memory or not, and that is *not* a trivial thing to decide.
That said, we are mostly focusing on KASAN findings, which
have a higher likelihood of being exploitable, at least for host -> guest privilege escalation
(which in turn compromises guest private memory confidentiality). Fuzzing has a
long history of finding such issues in the past (including ones that have been
exploited afterwards). But even for this oops bug, can anyone guarantee it cannot be chained
with other ones to cause a more complex privilege escalation attack? I won't be making
such a claim; I feel it is safer to fix this than to debate whether it can be used for an
attack or not.
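
As a toy illustration of the injection point mentioned above (virtio16/32/64_to_cpu and
friends): the sketch below is a hypothetical, heavily simplified model, not the actual
kAFL/kfx harness, and all names in it are illustrative. The idea is only that every value
crossing a host-controlled boundary is substituted with fuzzer input before any driver
code consumes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of fuzz injection at a virtio16_to_cpu-style
 * boundary: when fuzzing is enabled, the value the host supplied is
 * replaced with fuzzer input before any driver code consumes it, so
 * every host-readable field becomes attacker-controlled for the run.
 * Names are illustrative, not the real harness API. */

static bool fuzz_enabled;
static uint16_t fuzz_next16;            /* next fuzzer-supplied value */

static uint16_t fuzz_fetch16(void)
{
	return fuzz_next16;
}

/* Stand-in for the guest's virtio16_to_cpu() on a little-endian
 * guest (modeled as identity here, ignoring endianness handling). */
static uint16_t virtio16_to_cpu_model(uint16_t host_val)
{
	if (fuzz_enabled)
		return fuzz_fetch16();  /* inject fuzz input */
	return host_val;                /* normal path */
}
```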

Best Regards,
Elena.





* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 15:14                       ` Christophe de Dinechin
@ 2023-01-31 17:39                         ` Michael S. Tsirkin
  2023-02-01 10:52                           ` Christophe de Dinechin Dupont de Dinechin
  2023-02-01 10:24                         ` Christophe de Dinechin
  1 sibling, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2023-01-31 17:39 UTC (permalink / raw)
  To: Christophe de Dinechin
  Cc: jejb, Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman,
	Shishkin, Alexander, Shutemov, Kirill, Kuppuswamy,
	Sathyanarayanan, Kleen, Andi, Hansen, Dave, Thomas Gleixner,
	Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> Finally, security considerations that apply irrespective of whether the
> platform is confidential or not are also outside of the scope of this
> document. This includes topics ranging from timing attacks to social
> engineering.

Why are timing attacks by hypervisor on the guest out of scope?

> </doc>
> 
> Feel free to comment and reword at will ;-)
> 
> 
> 3/ PCI-as-a-threat: where does that come from
> 
> Isn't there a fundamental difference, from a threat model perspective,
> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> should defeat) and compromised software feeding us bad data? I think there
> is: at least inside the TCB, we can detect bad software using measurements,
> and prevent it from running using attestation.  In other words, we first
> check what we will run, then we run it. The security there is that we know
> what we are running. The trust we have in the software is from testing,
> reviewing or using it.
> 
> This relies on a key aspect provided by TDX and SEV, which is that the
> software being measured is largely tamper-resistant thanks to memory
> encryption. In other words, after you have measured your guest software
> stack, the host or hypervisor cannot willy-nilly change it.
> 
> So this brings me to the next question: is there any way we could offer the
> same kind of service for KVM and qemu? The measurement part seems relatively
> easy. The tamper-resistant part, on the other hand, seems quite difficult to
> me. But maybe someone else will have a brilliant idea?
> 
> So I'm asking the question, because if you could somehow prove to the guest
> not only that it's running the right guest stack (as we can do today) but
> also a known host/KVM/hypervisor stack, we would also switch the potential
> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> this is something which is evidently easier to deal with.

Agree absolutely that's much easier.

> I briefly discussed this with James, and he pointed out two interesting
> aspects of that question:
> 
> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>    care about either virtio devices, or physical ones being passed through
>    to the guest. Let's assume physical ones can be trusted, see above.
>    That leaves virtio devices. How much damage can a malicious virtio device
>    do to the guest kernel, and can this lead to secrets being leaked?
> 
> 2/ He was not as negative as I anticipated on the possibility of somehow
>    being able to prevent tampering of the guest. One example he mentioned is
>    a research paper [1] about running the hypervisor itself inside an
>    "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>    with TDX using secure enclaves or some other mechanism?

Or even just secureboot based root of trust?

-- 
MST



* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 16:34                       ` Reshetova, Elena
@ 2023-01-31 17:49                         ` James Bottomley
  0 siblings, 0 replies; 39+ messages in thread
From: James Bottomley @ 2023-01-31 17:49 UTC (permalink / raw)
  To: Reshetova, Elena, Leon Romanovsky
  Cc: Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Michael S. Tsirkin, Jason Wang, Poimboe, Josh, aarcange,
	Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Tue, 2023-01-31 at 16:34 +0000, Reshetova, Elena wrote:
[...]
> > You cited this as your example.  I'm pointing out it seems to be an
> > event of the class we've agreed not to consider because it's an
> > oops not an exploit.  If there are examples of fixing actual
> > exploits to CC VMs, what are they?
> > 
> > This patch is, however, an example of the problem everyone else on
> > the thread is complaining about: a patch which adds an unnecessary
> > check to the MSI subsystem; unnecessary because it doesn't fix a CC
> > exploit and in the real world the tables are correct (or the
> > manufacturer is quickly chastened), so it adds overhead to no
> > benefit.
> 
> How can you make sure there is no exploit possible using this crash
> as a stepping stone into a CC guest?

I'm not, what I'm saying is you haven't proved it can be used to
exfiltrate secrets.  In a world where the PCI device is expected to be
correct, and the non-CC kernel doesn't want to second guess that, there
> are loads of lies you can tell to the PCI subsystem that cause a crash
or a hang.  If we fix every one, we end up with a massive patch set and
a huge potential slow down for the non-CC kernel.  If there's no way to
tell what lies might leak data, the fuzzing results are a mass of noise
with no real signal and we can't even quantify by how much (or even if)
we've improved the CC VM attack surface even after we merge the huge
patch set it generates.

>  Or are you saying that we are back to the times when we could only
> merge fixes for crashes and out-of-bounds errors in the kernel if we
> submitted a proof-of-concept exploit with the patch for every
> issue? 

The PCI people have already said that crashing in the face of bogus
configuration data is expected behaviour, so just generating the crash
doesn't prove there's a problem to be fixed.  That means you do have to
go beyond and demonstrate there could be an information leak in a CC VM
on the back of it, yes.
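
For concreteness, the style of check being debated looks roughly like the sketch below.
This is a hypothetical illustration, not the actual MSI patch: it only shows the shape of
"treat device-provided geometry as untrusted and bounds-check it against what the guest
itself allocated before using it to address memory".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: in a CC guest, the hypervisor controls what the
 * virtual device reports, so an index derived from device-provided data
 * is checked against the number of entries the guest actually allocated
 * before it is used to address guest memory. Not the real kernel code. */

static int msix_entry_in_range(uint32_t entry_idx, uint32_t nr_allocated)
{
	return entry_idx < nr_allocated;
}
```

On bare metal the reported geometry is trusted to be correct, which is exactly why such a
check reads as pure overhead to the non-CC PCI maintainers.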

> > [...]
> > > > see what else it could detect given the signal will be
> > > > smothered by oopses and secondly I think the PCI interface is
> > > > likely the wrong place to begin and you should probably begin
> > > > on the virtio bus and the hypervisor generated configuration
> > > > space.
> > > 
> > > This is exactly what we do. We don’t fuzz from the PCI config
> > > space, we supply inputs from the host/vmm via the legitimate
> > > interfaces through which it can inject them into the guest: whenever
> > > the guest requests a pci config space (which is controlled by
> > > host/hypervisor as you said) read operation, it gets input
> > > injected by the kafl fuzzer.  Same for other interfaces that are
> > > under control of host/VMM (MSRs, port IO, MMIO, anything that
> > > goes via #VE handler in our case). When it comes to virtio, we
> > > employ  two different fuzzing techniques: directly injecting kafl
> > > fuzz input when virtio core or virtio drivers get the data
> > > received from the host (via injecting input in functions
> > > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> > > pages using kfx fuzzer. More information can be found in
> > > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-
> > hardening.html#td-guest-fuzzing
> > 
> > Given that we previously agreed that oopses and other DoS attacks
> > are out of scope for CC, I really don't think fuzzing, which
> > primarily finds oopses, is at all a useful tool unless you filter
> > the results by the question "could we exploit this in a CC VM to
> > reveal secrets". Without applying that filter you're sending a load
> > of patches which don't really do much to reduce the CC attack
> > surface and which do annoy non-CC people because they add pointless
> > checks to things they expect the cards and config tables to get
> > right.
> 
> I don’t think we have agreed that random kernel crashes are out of
> scope in the CC threat model (a controlled safe panic is out of scope,
> but that is not what we have here). 

So perhaps making it a controlled panic in the CC VM, so we can
guarantee no information leak, would be the first place to start?

> It all depends on whether this oops can be used in a successful attack
> against guest private memory or not, and that is *not* a trivial thing
> to decide.

Right, but if you can't decide that, you can't extract the signal from
your fuzzing tool noise.

> That said, we are mostly focusing on KASAN findings, which
> have a higher likelihood of being exploitable, at least for host ->
> guest privilege escalation (which in turn compromises guest private
> memory confidentiality). Fuzzing has a long history of finding such
> issues in the past (including ones that have been exploited
> afterwards). But even for this oops bug, can anyone guarantee it cannot
> be chained with other ones to cause a more complex privilege escalation
> attack? I won't be making such a claim; I feel it is safer to fix this
> than to debate whether it can be used for an attack or not. 

The PCI people have already been clear that adding a huge framework of
checks to PCI table parsing simply for the promise it "might possibly"
improve CC VM security is way too much effort for too little result. 
If you can hone that down to a few places where you can show it will
prevent a CC information leak, I'm sure they'll be more receptive. 
Telling them to disprove your assertion that there might be an exploit
here isn't going to make them change their minds.

James



* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 15:14                       ` Christophe de Dinechin
  2023-01-31 17:39                         ` Michael S. Tsirkin
@ 2023-02-01 10:24                         ` Christophe de Dinechin
  1 sibling, 0 replies; 39+ messages in thread
From: Christophe de Dinechin @ 2023-02-01 10:24 UTC (permalink / raw)
  To: jejb
  Cc: Reshetova, Elena, Leon Romanovsky, Greg Kroah-Hartman, Shishkin,
	Alexander, Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen,
	Andi, Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner,
	Lukas, Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe,
	Josh, aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda,
	keescook, James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

I typoed a lot in this email...


On 2023-01-31 at 16:14 +01, Christophe de Dinechin <dinechin@redhat.com> wrote...
> On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote...
>> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
>>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
>>> > [...]
>>> > > > The big threat from most devices (including the thunderbolt
>>> > > > classes) is that they can DMA all over memory.  However, this
>>> > > > isn't really a threat in CC (well until PCI becomes able to do
>>> > > > encrypted DMA) because the device has specific unencrypted
>>> > > > buffers set aside for the expected DMA. If it writes outside
>>> > > > that CC integrity will detect it and if it reads outside that
>>> > > > it gets unintelligible ciphertext.  So we're left with the
>>> > > > device trying to trick secrets out of us by returning
>>> > > > unexpected data.
>>> > >
>>> > > Yes, by supplying the input that hasn’t been expected. This is
>>> > > exactly the case we were trying to fix here for example:
>>> > > https://lore.kernel.org/all/20230119170633.40944-2-
>>> > alexander.shishkin@linux.intel.com/
>>> > > I do agree that this case is less severe when others where memory
>>> > > corruption/buffer overrun can happen, like here:
>>> > > https://lore.kernel.org/all/20230119135721.83345-6-
>>> > alexander.shishkin@linux.intel.com/
>>> > > But we are trying to fix all issues we see now (prioritizing the
>>> > > second ones though).
>>> >
>>> > I don't see how MSI table sizing is a bug in the category we've
>>> > defined.  The very text of the changelog says "resulting in a
>>> > kernel page fault in pci_write_msg_msix()."  which is a crash,
>>> > which I thought we were agreeing was out of scope for CC attacks?
>>>
>>> As I said, this is an example of a crash that at first look
>>> might not lead to an exploitable condition (albeit attackers are
>>> creative). But we noticed this one while fuzzing, and it was common
>>> enough that it prevented the fuzzer from going deeper into the virtio
>>> device driver fuzzing. The core PCI/MSI code doesn’t seem to have that
>>> many easily triggerable issues. Other examples in the virtio patchset
>>> are more severe.
>>
>> You cited this as your example.  I'm pointing out it seems to be an
>> event of the class we've agreed not to consider because it's an oops
>> not an exploit.  If there are examples of fixing actual exploits to CC
>> VMs, what are they?
>>
>> This patch is, however, an example of the problem everyone else on the
>> thread is complaining about: a patch which adds an unnecessary check to
>> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
>> in the real world the tables are correct (or the manufacturer is
>> quickly chastened), so it adds overhead to no benefit.
>
> I'd like to backtrack a little here.
>
>
> 1/ PCI-as-a-thread, where does it come from?

PCI-as-a-threat

>
> On physical devices, we have to assume that the device is working. As other
> pointed out, there are things like PCI compliance tests, etc. So Linux has
> to trust the device. You could manufacture a broken device intentionally,
> but the value you would get from that would be limited.
>
> On a CC system, the "PCI" values are really provided by the hypervisor,
> which is not trusted. This leads to this peculiar way of thinking where we
> say "what happens if virtual device feeds us a bogus value *intentionally*".
> We cannot assume that the *virtual* PCI device ran through the compliance
> tests. Instead, we see the PCI interface as hostile, which makes us look
> like weirdos to the rest of the community.
>
> Consequently, as James pointed out, we first need to focus on consequences
> that would break what I would call the "CC promise", which is essentially
> that we'd rather kill the guest than reveal its secrets. Unless you have a
> credible path to a secret being revealed, don't bother "fixing" a bug. And
> as was pointed out elsewhere in this thread, caching has a cost, so you
> can't really use the "optimization" angle either.
>
>
> 2/ Clarification of the "CC promise" and value proposition
>
> Based on the above, the very first thing is to clarify that "CC promise",
> because if exchanges on this thread have proved anything, it is that it's
> quite unclear to anyone outside the "CoCo world".
>
> The Linux Guest Kernel Security Specification needs to really elaborate on
> what the value proposition of CC is, not assume it is a given. "Bug fixes"
> before this value proposition has been understood and accepted by the
> non-CoCo community are likely to go absolutely nowhere.
>
> Here is a quick proposal for the Purpose and Scope section:
>
> <doc>
> Purpose and Scope
>
> Confidential Computing (CC) is a set of technologies that allows a guest to
> run without having to trust either the hypervisor or the host. CC offers two
> new guarantees to the guest compared to the non-CC case:
>
> a) The guest will be able to measure and attest, by cryptographic means, the
>    guest software stack that it is running, and be assured that this
>    software stack cannot be tampered with by the host or the hypervisor
>    after it was measured. The root of trust for this aspect of CC is
>    typically the CPU manufacturer (e.g. through a private key that can be
>    used to respond to cryptographic challenges).
>
> b) Guest state, including memory, become secrets which must remain
>    inaccessible to the host. In a CC context, it is considered preferable to
>    stop or kill a guest rather than risk leaking its secrets. This aspect of
>    CC is typically enforced by means such as memory encryption and new
>    semantics for memory protection.
>
> CC leads to a different threat model for a Linux kernel running as a guest
> inside a confidential virtual machine (CVM). Notably, whereas the machine
> (CPU, I/O devices, etc) is usually considered as trustworthy, in the CC
> case, the hypervisor emulating some aspects of the virtual machine is now
> considered as potentially malicious. Consequently, effects of any data
> provided by the guest to the hypervisor, including ACPI configuration

to the guest by the hypervisor

> tables, MMIO interfaces or machine specific registers (MSRs) need to be
> re-evaluated.
>
> This document describes the security architecture of the Linux guest kernel
> running inside a CVM, with a particular focus on the Intel TDX
> implementation. Many aspects of this document will be applicable to other
> CC implementations such as AMD SEV.
>
> Aspects of the guest-visible state that are under direct control of the
> hardware, such as the CPU state or memory protection, will be considered as
> being handled by the CC implementations. This document will therefore only
> focus on aspects of the virtual machine that are typically managed by the
> hypervisor or the host.
>
> Since the host ultimately owns the resources and can allocate them at will,
> including denying their use at any point, this document will not address
> denial or service or performance degradation. It will however cover random
> number generation, which is central for cryptographic security.
>
> Finally, security considerations that apply irrespective of whether the
> platform is confidential or not are also outside of the scope of this
> document. This includes topics ranging from timing attacks to social
> engineering.
> </doc>
>
> Feel free to comment and reword at will ;-)
>
>
> 3/ PCI-as-a-threat: where does that come from

3/ Can we shift from "malicious" hypervisor/host input to "bogus" input?

>
> Isn't there a fundamental difference, from a threat model perspective,
> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> should defeat) and compromised software feeding us bad data? I think there
> is: at least inside the TCB, we can detect bad software using measurements,
> and prevent it from running using attestation.  In other words, we first
> check what we will run, then we run it. The security there is that we know
> what we are running. The trust we have in the software is from testing,
> reviewing or using it.
>
> This relies on a key aspect provided by TDX and SEV, which is that the
> software being measured is largely tamper-resistant thanks to memory
> encryption. In other words, after you have measured your guest software
> stack, the host or hypervisor cannot willy-nilly change it.
>
> So this brings me to the next question: is there any way we could offer the
> same kind of service for KVM and qemu? The measurement part seems relatively
> easy. The tamper-resistant part, on the other hand, seems quite difficult to
> me. But maybe someone else will have a brilliant idea?
>
> So I'm asking the question, because if you could somehow prove to the guest
> not only that it's running the right guest stack (as we can do today) but
> also a known host/KVM/hypervisor stack, we would also switch the potential
> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> this is something which is evidently easier to deal with.
>
> I briefly discussed this with James, and he pointed out two interesting
> aspects of that question:
>
> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>    care about either virtio devices, or physical ones being passed through
>    to the guest. Let's assume physical ones can be trusted, see above.
>    That leaves virtio devices. How much damage can a malicious virtio device
>    do to the guest kernel, and can this lead to secrets being leaked?
>
> 2/ He was not as negative as I anticipated on the possibility of somehow
>    being able to prevent tampering of the guest. One example he mentioned is
>    a research paper [1] about running the hypervisor itself inside an
>    "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>    with TDX using secure enclaves or some other mechanism?
>
>
> Sorry, this mail is a bit long ;-)

and was a bit rushed too...

>
>
>>
>>
>> [...]
>>> > see what else it could detect given the signal will be smothered by
>>> > oopses and secondly I think the PCI interface is likely the wrong
>>> > place to begin and you should probably begin on the virtio bus and
>>> > the hypervisor generated configuration space.
>>>
>>> This is exactly what we do. We don’t fuzz from the PCI config space,
>>> we supply inputs from the host/vmm via the legitimate interfaces
>>> through which it can inject them into the guest: whenever the guest
>>> requests a pci config
>>> space (which is controlled by host/hypervisor as you said) read
>>> operation, it gets input injected by the kafl fuzzer.  Same for other
>>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
>>> anything that goes via #VE handler in our case). When it comes to
>>> virtio, we employ  two different fuzzing techniques: directly
>>> injecting kafl fuzz input when virtio core or virtio drivers get the
>>> data received from the host (via injecting input in functions
>>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
>>> pages using kfx fuzzer. More information can be found in
>>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>>
>> Given that we previously agreed that oopses and other DoS attacks are
>> out of scope for CC, I really don't think fuzzing, which primarily
>> finds oopses, is at all a useful tool unless you filter the results by
>> the question "could we exploit this in a CC VM to reveal secrets".
>> Without applying that filter you're sending a load of patches which
>> don't really do much to reduce the CC attack surface and which do annoy
>> non-CC people because they add pointless checks to things they expect
>> the cards and config tables to get right.
>
> Indeed.
>
> [1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592


--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)



* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 17:39                         ` Michael S. Tsirkin
@ 2023-02-01 10:52                           ` Christophe de Dinechin Dupont de Dinechin
  2023-02-01 11:01                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Christophe de Dinechin Dupont de Dinechin @ 2023-02-01 10:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena,
	Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen,
	Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening



> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>> Finally, security considerations that apply irrespective of whether the
>> platform is confidential or not are also outside of the scope of this
>> document. This includes topics ranging from timing attacks to social
>> engineering.
> 
> Why are timing attacks by hypervisor on the guest out of scope?

Good point.

I was thinking that mitigation against timing attacks is the same
irrespective of the source of the attack. However, because the HV
controls CPU time allocation, there are presumably attacks that
are made much easier through the HV. Those should be listed.
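
As a purely illustrative model of the underlying data dependence (not an actual attack,
and deliberately ignoring the measurement side), the point is that secret-dependent code
paths do different amounts of work, and a hypervisor that schedules and times the guest at
fine granularity is far better placed to observe that than a remote attacker:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of secret-dependent execution cost: one extra unit of work
 * per set bit of the secret. This models only the data dependence a
 * scheduler-controlling hypervisor could in principle observe, not the
 * measurement itself. */

static unsigned int cost_of(uint8_t secret)
{
	unsigned int cycles = 0;
	int i;

	for (i = 0; i < 8; i++)
		if (secret & (1u << i))
			cycles++;       /* extra work only for set bits */
	return cycles;
}
```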

> 
>> </doc>
>> 
>> Feel free to comment and reword at will ;-)
>> 
>> 
>> 3/ PCI-as-a-threat: where does that come from
>> 
>> Isn't there a fundamental difference, from a threat model perspective,
>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>> should defeat) and compromised software feeding us bad data? I think there
>> is: at least inside the TCB, we can detect bad software using measurements,
>> and prevent it from running using attestation.  In other words, we first
>> check what we will run, then we run it. The security there is that we know
>> what we are running. The trust we have in the software is from testing,
>> reviewing or using it.
>> 
>> This relies on a key aspect provided by TDX and SEV, which is that the
>> software being measured is largely tamper-resistant thanks to memory
>> encryption. In other words, after you have measured your guest software
>> stack, the host or hypervisor cannot willy-nilly change it.
>> 
>> So this brings me to the next question: is there any way we could offer the
>> same kind of service for KVM and qemu? The measurement part seems relatively
>> easy. The tamper-resistant part, on the other hand, seems quite difficult to
>> me. But maybe someone else will have a brilliant idea?
>> 
>> So I'm asking the question, because if you could somehow prove to the guest
>> not only that it's running the right guest stack (as we can do today) but
>> also a known host/KVM/hypervisor stack, we would also switch the potential
>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>> this is something which is evidently easier to deal with.
> 
> Agree absolutely that's much easier.
> 
>> I briefly discussed this with James, and he pointed out two interesting
>> aspects of that question:
>> 
>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>>   care about either virtio devices, or physical ones being passed through
>>   to the guest. Let's assume physical ones can be trusted, see above.
>>   That leaves virtio devices. How much damage can a malicious virtio device
>>   do to the guest kernel, and can this lead to secrets being leaked?
>> 
>> 2/ He was not as negative as I anticipated on the possibility of somehow
>>   being able to prevent tampering of the guest. One example he mentioned is
>>   a research paper [1] about running the hypervisor itself inside an
>>   "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>>   with TDX using secure enclaves or some other mechanism?
> 
> Or even just secureboot based root of trust?

You mean host secureboot? Or guest?

If it’s host, then the problem is detecting malicious tampering with
host code (whether it’s kernel or hypervisor).

If it’s guest, at the moment at least, the measurements do not extend
beyond the TCB.

> 
> -- 
> MST
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux guest kernel threat model for Confidential Computing
  2023-02-01 10:52                           ` Christophe de Dinechin Dupont de Dinechin
@ 2023-02-01 11:01                             ` Michael S. Tsirkin
  2023-02-01 13:15                               ` Christophe de Dinechin Dupont de Dinechin
  2023-02-02  3:24                               ` Jason Wang
  0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2023-02-01 11:01 UTC (permalink / raw)
  To: Christophe de Dinechin Dupont de Dinechin
  Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena,
	Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen,
	Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> 
> 
> > On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >> Finally, security considerations that apply irrespective of whether the
> >> platform is confidential or not are also outside of the scope of this
> >> document. This includes topics ranging from timing attacks to social
> >> engineering.
> > 
> > Why are timing attacks by hypervisor on the guest out of scope?
> 
> Good point.
> 
> I was thinking that mitigation against timing attacks is the same
> irrespective of the source of the attack. However, because the HV
> controls CPU time allocation, there are presumably attacks that
> are made much easier through the HV. Those should be listed.

Not just that, also because it can and does emulate some devices.
For example, are disk encryption systems protected against timing of
disk accesses?
This is why some people keep saying "forget about emulated devices, require
passthrough, include devices in the trust zone".

> > 
> >> [...]
> > 
> > Or even just secureboot based root of trust?
> 
> You mean host secureboot? Or guest?
> 
> If it’s host, then the problem is detecting malicious tampering with
> host code (whether it’s kernel or hypervisor).

Host.  Lots of existing systems do this.  As an extreme example, boot from
a RO disk and limit which packages are allowed.

> If it’s guest, at the moment at least, the measurements do not extend
> beyond the TCB.
> 
> > 
> > -- 
> > MST
> > 



* Re: Linux guest kernel threat model for Confidential Computing
  2023-02-01 11:01                             ` Michael S. Tsirkin
@ 2023-02-01 13:15                               ` Christophe de Dinechin Dupont de Dinechin
  2023-02-01 16:02                                 ` Michael S. Tsirkin
  2023-02-02  3:24                               ` Jason Wang
  1 sibling, 1 reply; 39+ messages in thread
From: Christophe de Dinechin Dupont de Dinechin @ 2023-02-01 13:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena,
	Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen,
	Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening



> On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>> 
>> 
>>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> [...]
>>> 
>>> Or even just secureboot based root of trust?
>> 
>> You mean host secureboot? Or guest?
>> 
>> If it’s host, then the problem is detecting malicious tampering with
>> host code (whether it’s kernel or hypervisor).
> 
> Host.  Lots of existing systems do this.  As an extreme boot a RO disk,
> limit which packages are allowed.

Is that provable to the guest?

Consider a cloud provider doing that: how do they prove to their guest:

a) What firmware, kernel and kvm they run

b) That what they booted cannot be maliciously modified, e.g. by a rogue
   device driver installed by a rogue sysadmin

My understanding is that SecureBoot is only intended to prevent non-verified
operating systems from booting. So the proof is given to the cloud provider,
and the proof is that the system boots successfully.

After that, I think all bets are off. SecureBoot does little AFAICT
to prevent malicious modifications of the running system by someone with
root access, including deliberately loading a malicious kvm-zilog.ko

It does not mean it cannot be done, just that I don’t think we
have the tools at the moment.

> 
>> If it’s guest, at the moment at least, the measurements do not extend
>> beyond the TCB.
>> 
>>> 
>>> -- 
>>> MST




* Re: Linux guest kernel threat model for Confidential Computing
  2023-02-01 13:15                               ` Christophe de Dinechin Dupont de Dinechin
@ 2023-02-01 16:02                                 ` Michael S. Tsirkin
  2023-02-01 17:13                                   ` Christophe de Dinechin
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2023-02-01 16:02 UTC (permalink / raw)
  To: Christophe de Dinechin Dupont de Dinechin
  Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena,
	Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen,
	Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> 
> 
> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >> 
> >> 
> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> 
> >>> [...]
> >>> 
> >>> Or even just secureboot based root of trust?
> >> 
> >> You mean host secureboot? Or guest?
> >> 
> >> If it’s host, then the problem is detecting malicious tampering with
> >> host code (whether it’s kernel or hypervisor).
> > 
> > Host.  Lots of existing systems do this.  As an extreme boot a RO disk,
> > limit which packages are allowed.
> 
> Is that provable to the guest?
> 
> Consider a cloud provider doing that: how do they prove to their guest:
> 
> a) What firmware, kernel and kvm they run
> 
> > b) That what they booted cannot be maliciously modified, e.g. by a rogue
>    device driver installed by a rogue sysadmin
> 
> My understanding is that SecureBoot is only intended to prevent non-verified
> operating systems from booting. So the proof is given to the cloud provider,
> and the proof is that the system boots successfully.

I think I should have said measured boot not secure boot.

> 
> After that, I think all bets are off. SecureBoot does little AFAICT
> to prevent malicious modifications of the running system by someone with
> root access, including deliberately loading a malicious kvm-zilog.ko

So disable module loading then or don't allow root access?

> 
> It does not mean it cannot be done, just that I don’t think we
> have the tools at the moment.

Phones, chromebooks do this all the time ...

> > 
> >> If it’s guest, at the moment at least, the measurements do not extend
> >> beyond the TCB.
> >> 
> >>> 
> >>> -- 
> >>> MST
> 



* Re: Linux guest kernel threat model for Confidential Computing
  2023-02-01 16:02                                 ` Michael S. Tsirkin
@ 2023-02-01 17:13                                   ` Christophe de Dinechin
  2023-02-06 18:58                                     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 39+ messages in thread
From: Christophe de Dinechin @ 2023-02-01 17:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James Bottomley, Reshetova, Elena, Leon Romanovsky,
	Greg Kroah-Hartman, Shishkin, Alexander, Shutemov, Kirill,
	Kuppuswamy, Sathyanarayanan, Kleen, Andi, Hansen, Dave,
	Thomas Gleixner, Peter Zijlstra, Wunner, Lukas, Mika Westerberg,
	Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening


On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote...
> On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>
>> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >
>> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>> >>
>> >>
>> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >>>
>> >>> [...]
>> >>>
>> >>> Or even just secureboot based root of trust?
>> >>
>> >> You mean host secureboot? Or guest?
>> >>
>> >> If it’s host, then the problem is detecting malicious tampering with
>> >> host code (whether it’s kernel or hypervisor).
>> >
>> > Host.  Lots of existing systems do this.  As an extreme boot a RO disk,
>> > limit which packages are allowed.
>>
>> Is that provable to the guest?
>>
>> Consider a cloud provider doing that: how do they prove to their guest:
>>
>> a) What firmware, kernel and kvm they run
>>
>> b) That what they booted cannot be maliciously modified, e.g. by a rogue
>>    device driver installed by a rogue sysadmin
>>
>> My understanding is that SecureBoot is only intended to prevent non-verified
>> operating systems from booting. So the proof is given to the cloud provider,
>> and the proof is that the system boots successfully.
>
> I think I should have said measured boot not secure boot.

The problem again is: how do you prove to the guest that you are not lying?

We know how to do that from a guest [1], but you will note that in the
normal process, a trusted hardware component (e.g. the PSP for AMD SEV)
proves the validity of the TCB measurements by signing them with an
attestation key derived from some chip-unique secret. For AMD, this
is called the VCEK, and TDX has something similar. In the case of SEV, this
goes through firmware, and you have to tell the firmware each time you
insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied
to a VM execution context. I do not believe there is any provision to do the
same thing to measure host data. And again, it would be somewhat pointless
if there isn't also a mechanism to ensure the host data is not changed after
the measurement.

Now, I don't think it would be super-difficult to add a firmware service
that would let the host do some kind of equivalent to PVALIDATE, setting
some physical pages aside that then get measured and become inaccessible to
the host. The PSP or similar could then integrate these measurements as part
of the TCB, and the fact that the pages were "transferred" to this special
invariant block would assure the guests that the code will not change after
being measured.

I am not aware that such a mechanism exists on any of the existing CC
platforms. Please feel free to enlighten me if I'm wrong.

[1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow
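To make the "measured boot" notion discussed above concrete: a measurement register can only be extended, never written directly, so its final value commits to the entire ordered log of what was loaded. A toy sketch of the extend operation (SHA-256 stands in for the TPM/PSP implementation; names are illustrative, not any platform's actual API):

```python
import hashlib

def extend(measurement: bytes, event: bytes) -> bytes:
    # PCR-extend semantics: new = H(old || H(event)). Because the register
    # can only be folded forward, the final value depends on every event
    # and on the order in which they occurred.
    return hashlib.sha256(measurement + hashlib.sha256(event).digest()).digest()

def measure_log(events) -> bytes:
    value = bytes(32)  # register starts from a well-known initial value
    for ev in events:
        value = extend(value, ev)
    return value
```

A verifier replays the event log and checks that it reproduces the attested value; swapping in a different kvm module, or even reordering the same components, yields a different measurement. The point made in this thread still stands: this only helps the tenant if the value is signed by something the host cannot impersonate, and if the measured code cannot change after measurement.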
>
>>
>> After that, I think all bets are off. SecureBoot does little AFAICT
>> to prevent malicious modifications of the running system by someone with
>> root access, including deliberately loading a malicious kvm-zilog.ko
>
> So disable module loading then or don't allow root access?

Who would do that?

The problem is that we have a host and a tenant, and the tenant does not
trust the host in principle. So it is not sufficient for the host to disable
module loading or carefully control root access. It is also necessary to
prove to the tenant(s) that this was done.

>
>>
>> It does not mean it cannot be done, just that I don’t think we
>> have the tools at the moment.
>
> Phones, chromebooks do this all the time ...

Indeed, but there, this is to prove to the phone's real owner (which,
surprise, is not the naive person who thought they'd get some kind of
ownership by buying the phone) that the software running on the phone has
not been replaced by some horribly jailbroken goo.

In other words, the user of the phone gets no proof whatsoever of anything,
except that the phone appears to work. This is somewhat the situation in the
cloud today: the owners of the hardware get all sorts of useful checks, from
SecureBoot to error-correction for memory or I/O devices. However, someone
running in a VM on the cloud gets none of that, just like the user of your
phone.

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)



* Re: Linux guest kernel threat model for Confidential Computing
  2023-02-01 11:01                             ` Michael S. Tsirkin
  2023-02-01 13:15                               ` Christophe de Dinechin Dupont de Dinechin
@ 2023-02-02  3:24                               ` Jason Wang
  1 sibling, 0 replies; 39+ messages in thread
From: Jason Wang @ 2023-02-02  3:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christophe de Dinechin Dupont de Dinechin
  Cc: Christophe de Dinechin, James Bottomley, Reshetova, Elena,
	Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Poimboe, Josh, aarcange, Cfir Cohen, Marc Orr,
	jbachmann, pgonda, keescook, James Morris, Michael Kelley, Lange,
	Jon, linux-coco, Linux Kernel Mailing List, Kernel Hardening


On 2023/2/1 19:01, Michael S. Tsirkin wrote:
> On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>>>> Finally, security considerations that apply irrespective of whether the
>>>> platform is confidential or not are also outside of the scope of this
>>>> document. This includes topics ranging from timing attacks to social
>>>> engineering.
>>> Why are timing attacks by hypervisor on the guest out of scope?
>> Good point.
>>
>> I was thinking that mitigation against timing attacks is the same
>> irrespective of the source of the attack. However, because the HV
>> controls CPU time allocation, there are presumably attacks that
>> are made much easier through the HV. Those should be listed.
> Not just that, also because it can and does emulate some devices.
> For example, are disk encryption systems protected against timing of
> disk accesses?
> This is why some people keep saying "forget about emulated devices, require
> passthrough, include devices in the trust zone".


One problem is that the device could be yet another emulated one that is 
running in the SmartNIC/DPU itself.

Thanks




* Re: Linux guest kernel threat model for Confidential Computing
  2023-01-31 11:31                   ` Reshetova, Elena
  2023-01-31 13:28                     ` James Bottomley
@ 2023-02-02 14:51                     ` Jeremi Piotrowski
  2023-02-03 14:05                       ` Reshetova, Elena
  1 sibling, 1 reply; 39+ messages in thread
From: Jeremi Piotrowski @ 2023-02-02 14:51 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: jejb, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote:
> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > [...]
> > > > The big threat from most devices (including the thunderbolt
> > > > classes) is that they can DMA all over memory.  However, this isn't
> > > > really a threat in CC (well until PCI becomes able to do encrypted
> > > > DMA) because the device has specific unencrypted buffers set aside
> > > > for the expected DMA. If it writes outside that CC integrity will
> > > > detect it and if it reads outside that it gets unintelligible
> > > > ciphertext.  So we're left with the device trying to trick secrets
> > > > out of us by returning unexpected data.
> > >
> > > Yes, by supplying the input that hasn’t been expected. This is
> > > exactly the case we were trying to fix here for example:
> > > https://lore.kernel.org/all/20230119170633.40944-2-
> > alexander.shishkin@linux.intel.com/
> > > I do agree that this case is less severe than others where memory
> > > corruption/buffer overrun can happen, like here:
> > > https://lore.kernel.org/all/20230119135721.83345-6-
> > alexander.shishkin@linux.intel.com/
> > > But we are trying to fix all issues we see now (prioritizing the
> > > second ones though).
> > 
> > I don't see how MSI table sizing is a bug in the category we've
> > defined.  The very text of the changelog says "resulting in a kernel
> > page fault in pci_write_msg_msix()."  which is a crash, which I thought
> > we were agreeing was out of scope for CC attacks?
> 
> As I said, this is an example of a crash and at first look
> might not lead to an exploitable condition (albeit attackers are creative).
> But we noticed this one while fuzzing and it was common enough
> that it prevented the fuzzer from going deeper into virtio device driver fuzzing.
> The core PCI/MSI doesn’t seem to have that many easily triggerable 
> Other examples in virtio patchset are more severe. 
> 
> > 
> > > >
> > > > If I set this as the problem, verifying device correct operation is
> > > > a possible solution (albeit hugely expensive) but there are likely
> > > > many other cheaper ways to defeat or detect a device trying to
> > > > trick us into revealing something.
> > >
> > > What do you have in mind here for the actual devices we need to
> > > enable for CC cases?
> > 
> > Well, the most dangerous devices seem to be the virtio set a CC system
> > will rely on to boot up.  After that, there are other ways (like SPDM)
> > to verify a real PCI device is on the other end of the transaction.
> 
> Yes, in the future, but not yet. Other vendors will not necessarily be
> using virtio devices at this point, so we will have non-virtio and
> non-CC-enabled devices that we want to securely add to the guest.
> 
> > 
> > > We have been using here a combination of extensive fuzzing and static
> > > code analysis.
> > 
> > by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> > off the table because fuzzing primarily triggers those
> 
> If you enable memory sanitizers you can detect more severe conditions like
> out-of-bounds accesses and such. I think given that we have a way to 
> verify that fuzzing is reaching the code locations we want it to reach, it
> can be a pretty effective method to find at least the low-hanging bugs. And these
> will be the bugs that most attackers will go after in the first place. 
> But of course it is not a formal verification of any kind.
> 
>  so it's hard to
> > see what else it could detect given the signal will be smothered by
> > oopses and secondly I think the PCI interface is likely the wrong place
> > to begin and you should probably begin on the virtio bus and the
> > hypervisor generated configuration space.
> 
> This is exactly what we do. We don't fuzz from the PCI config space;
> we supply inputs from the host/VMM via the legitimate interfaces through which it can 
> inject them into the guest: whenever the guest requests a PCI config space
> read operation (the config space being controlled by the host/hypervisor, as you said), 
> it gets input injected by the kafl fuzzer.  Same for other interfaces that 
> are under control of the host/VMM (MSRs, port IO, MMIO, anything that goes
> via the #VE handler in our case). When it comes to virtio, we employ 
> two different fuzzing techniques: directly injecting kafl fuzz input when
> the virtio core or virtio drivers get the data received from the host 
> (via injecting input in the functions virtio16/32/64_to_cpu and others), and 
> directly fuzzing DMA memory pages using the kfx fuzzer. 
> More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
> 
> Best Regards,
> Elena.

Hi Elena,

I think it might be a good idea to narrow down a configuration that *can*
reasonably be hardened to be suitable for confidential computing, before
proceeding with fuzzing. E.g., a lot of time was spent discussing PCI devices
in the context of virtualization, but what about taking PCI out of scope
completely by switching to virtio-mmio devices?
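[For context on the suggestion above: virtio-mmio devices need no PCI enumeration at all — if I recall the format correctly, they can simply be declared to the guest kernel on the command line via the `virtio_mmio.device` parameter (size@base:irq), e.g.:]

```
virtio_mmio.device=4K@0xd0000000:5
```

[The register window base, size and IRQ here are of course only example values.]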

Jeremi


* RE: Linux guest kernel threat model for Confidential Computing
  2023-02-02 14:51                     ` Jeremi Piotrowski
@ 2023-02-03 14:05                       ` Reshetova, Elena
  0 siblings, 0 replies; 39+ messages in thread
From: Reshetova, Elena @ 2023-02-03 14:05 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: jejb, Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Michael S. Tsirkin, Jason Wang, Poimboe, Josh,
	aarcange, Cfir Cohen, Marc Orr, jbachmann, pgonda, keescook,
	James Morris, Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening


> On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote:
> > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > > [...]
> > > > > The big threat from most devices (including the thunderbolt
> > > > > classes) is that they can DMA all over memory.  However, this isn't
> > > > > really a threat in CC (well until PCI becomes able to do encrypted
> > > > > DMA) because the device has specific unencrypted buffers set aside
> > > > > for the expected DMA. If it writes outside that CC integrity will
> > > > > detect it and if it reads outside that it gets unintelligible
> > > > > ciphertext.  So we're left with the device trying to trick secrets
> > > > > out of us by returning unexpected data.
> > > >
> > > > Yes, by supplying the input that hasn’t been expected. This is
> > > > exactly the case we were trying to fix here for example:
> > > > https://lore.kernel.org/all/20230119170633.40944-2-
> > > alexander.shishkin@linux.intel.com/
> > > > I do agree that this case is less severe than others where memory
> > > > corruption/buffer overrun can happen, like here:
> > > > https://lore.kernel.org/all/20230119135721.83345-6-
> > > alexander.shishkin@linux.intel.com/
> > > > But we are trying to fix all issues we see now (prioritizing the
> > > > second ones though).
> > >
> > > I don't see how MSI table sizing is a bug in the category we've
> > > defined.  The very text of the changelog says "resulting in a kernel
> > > page fault in pci_write_msg_msix()."  which is a crash, which I thought
> > > we were agreeing was out of scope for CC attacks?
> >
> > As I said, this is an example of a crash and at first look
> > might not lead to an exploitable condition (albeit attackers are creative).
> > But we noticed this one while fuzzing and it was common enough
> > that it prevented the fuzzer from going deeper into virtio device driver fuzzing.
> > The core PCI/MSI doesn’t seem to have that many easily triggerable
> > Other examples in virtio patchset are more severe.
> >
> > >
> > > > >
> > > > > If I set this as the problem, verifying device correct operation is
> > > > > a possible solution (albeit hugely expensive) but there are likely
> > > > > many other cheaper ways to defeat or detect a device trying to
> > > > > trick us into revealing something.
> > > >
> > > > What do you have in mind here for the actual devices we need to
> > > > enable for CC cases?
> > >
> > > Well, the most dangerous devices seem to be the virtio set a CC system
> > > will rely on to boot up.  After that, there are other ways (like SPDM)
> > > to verify a real PCI device is on the other end of the transaction.
> >
> > Yes, in the future, but not yet. Other vendors will not necessarily be
> > using virtio devices at this point, so we will have non-virtio and
> > non-CC-enabled devices that we want to securely add to the guest.
> >
> > >
> > > > We have been using here a combination of extensive fuzzing and static
> > > > code analysis.
> > >
> > > by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> > > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> > > off the table because fuzzing primarily triggers those
> >
> > If you enable memory sanitizers you can detect more severe conditions like
> > out-of-bounds accesses and such. I think given that we have a way to
> > verify that fuzzing is reaching the code locations we want it to reach, it
> > can be a pretty effective method to find at least the low-hanging bugs. And these
> > will be the bugs that most attackers will go after in the first place.
> > But of course it is not a formal verification of any kind.
> >
> >  so it's hard to
> > > see what else it could detect given the signal will be smothered by
> > > oopses and secondly I think the PCI interface is likely the wrong place
> > > to begin and you should probably begin on the virtio bus and the
> > > hypervisor generated configuration space.
> >
> > This is exactly what we do. We don't fuzz from the PCI config space;
> > we supply inputs from the host/VMM via the legitimate interfaces through which it can
> > inject them into the guest: whenever the guest requests a PCI config space
> > read operation (the config space being controlled by the host/hypervisor, as you said),
> > it gets input injected by the kafl fuzzer.  Same for other interfaces that
> > are under control of the host/VMM (MSRs, port IO, MMIO, anything that goes
> > via the #VE handler in our case). When it comes to virtio, we employ
> > two different fuzzing techniques: directly injecting kafl fuzz input when
> > the virtio core or virtio drivers get the data received from the host
> > (via injecting input in the functions virtio16/32/64_to_cpu and others), and
> > directly fuzzing DMA memory pages using the kfx fuzzer.
> > More information can be found in https://intel.github.io/ccc-linux-guest-
> hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
> >
> > Best Regards,
> > Elena.
> 
> Hi Elena,

Hi Jeremi, 

> 
> I think it might be a good idea to narrow down a configuration that *can*
> reasonably be hardened to be suitable for confidential computing, before
> proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices
> in the context of virtualization, but what about taking PCI out of scope
> completely by switching to virtio-mmio devices?

I agree that narrowing down is important, and we spent significant effort
on disabling various code we don't need (including PCI code, like quirks, 
early PCI, etc.). The decision to use virtio over PCI vs. MMIO I believe comes
down to performance and usage scenarios, and we have to do the best we can
within these limitations. 

Moreover, even if we could remove PCI for the virtio devices by
removing the transport dependency, this isn't possible for other devices that we
know are used in some CC setups: not all CSPs are using virtio-based drivers,
so PCI comes back into the hardening scope pretty quickly, and unfortunately we
cannot just remove it. 

Best Regards,
Elena. 


* Re: Linux guest kernel threat model for Confidential Computing
  2023-02-01 17:13                                   ` Christophe de Dinechin
@ 2023-02-06 18:58                                     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2023-02-06 18:58 UTC (permalink / raw)
  To: Christophe de Dinechin
  Cc: Michael S. Tsirkin, James Bottomley, Reshetova, Elena,
	Leon Romanovsky, Greg Kroah-Hartman, Shishkin, Alexander,
	Shutemov, Kirill, Kuppuswamy, Sathyanarayanan, Kleen, Andi,
	Hansen, Dave, Thomas Gleixner, Peter Zijlstra, Wunner, Lukas,
	Mika Westerberg, Jason Wang, Poimboe, Josh, aarcange, Cfir Cohen,
	Marc Orr, jbachmann, pgonda, keescook, James Morris,
	Michael Kelley, Lange, Jon, linux-coco,
	Linux Kernel Mailing List, Kernel Hardening

* Christophe de Dinechin (dinechin@redhat.com) wrote:
> 
> On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote...
> > On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >>
> >>
> >> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >
> >> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >> >>
> >> >>
> >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >>>
> >> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >> >>>> Finally, security considerations that apply irrespective of whether the
> >> >>>> platform is confidential or not are also outside of the scope of this
> >> >>>> document. This includes topics ranging from timing attacks to social
> >> >>>> engineering.
> >> >>>
> >> >>> Why are timing attacks by hypervisor on the guest out of scope?
> >> >>
> >> >> Good point.
> >> >>
> >> >> I was thinking that mitigation against timing attacks is the same
> >> >> irrespective of the source of the attack. However, because the HV
> >> >> controls CPU time allocation, there are presumably attacks that
> >> >> are made much easier through the HV. Those should be listed.
> >> >
> >> > Not just that, also because it can and does emulate some devices.
> >> > For example, are disk encryption systems protected against timing of
> >> > disk accesses?
> >> > This is why some people keep saying "forget about emulated devices, require
> >> > passthrough, include devices in the trust zone".
> >> >
> >> >>>
> >> >>>> </doc>
> >> >>>>
> >> >>>> Feel free to comment and reword at will ;-)
> >> >>>>
> >> >>>>
> >> >>>> 3/ PCI-as-a-threat: where does that come from
> >> >>>>
> >> >>>> Isn't there a fundamental difference, from a threat model perspective,
> >> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >> >>>> should defeat) and compromised software feeding us bad data? I think there
> >> >>>> is: at least inside the TCB, we can detect bad software using measurements,
> >> >>>> and prevent it from running using attestation.  In other words, we first
> >> >>>> check what we will run, then we run it. The security there is that we know
> >> >>>> what we are running. The trust we have in the software is from testing,
> >> >>>> reviewing or using it.
> >> >>>>
> >> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
> >> >>>> software being measured is largely tamper-resistant thanks to memory
> >> >>>> encryption. In other words, after you have measured your guest software
> >> >>>> stack, the host or hypervisor cannot willy-nilly change it.
> >> >>>>
> >> >>>> So this brings me to the next question: is there any way we could offer the
> >> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
> >> >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to
> >> >>>> me. But maybe someone else will have a brilliant idea?
> >> >>>>
> >> >>>> So I'm asking the question, because if you could somehow prove to the guest
> >> >>>> not only that it's running the right guest stack (as we can do today) but
> >> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
> >> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >> >>>> this is something which is evidently easier to deal with.
> >> >>>
> >> >>> Agree absolutely that's much easier.
> >> >>>
> >> >>>> I briefly discussed this with James, and he pointed out two interesting
> >> >>>> aspects of that question:
> >> >>>>
> >> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >> >>>>  care about either virtio devices, or physical ones being passed through
> >> >>>>  to the guest. Let's assume physical ones can be trusted, see above.
> >> >>>>  That leaves virtio devices. How much damage can a malicious virtio device
> >> >>>>  do to the guest kernel, and can this lead to secrets being leaked?
> >> >>>>
> >> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
> >> >>>>  being able to prevent tampering of the guest. One example he mentioned is
> >> >>>>  a research paper [1] about running the hypervisor itself inside an
> >> >>>>  "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >> >>>>  with TDX using secure enclaves or some other mechanism?
> >> >>>
> >> >>> Or even just secureboot based root of trust?
> >> >>
> >> >> You mean host secureboot? Or guest?
> >> >>
> >> >> If it’s host, then the problem is detecting malicious tampering with
> >> >> host code (whether it’s kernel or hypervisor).
> >> >
> >> > Host.  Lots of existing systems do this.  As an extreme boot a RO disk,
> >> > limit which packages are allowed.
> >>
> >> Is that provable to the guest?
> >>
> >> Consider a cloud provider doing that: how do they prove to their guest:
> >>
> >> a) What firmware, kernel and kvm they run
> >>
> >> b) That what they booted cannot be maliciously modified, e.g. by a rogue
> >>    device driver installed by a rogue sysadmin
> >>
> >> My understanding is that SecureBoot is only intended to prevent non-verified
> >> operating systems from booting. So the proof is given to the cloud provider,
> >> and the proof is that the system boots successfully.
> >
> > I think I should have said measured boot not secure boot.
> 
> The problem again is how you prove to the guest that you are not lying?
> 
> We know how to do that from a guest [1], but you will note that in the
> normal process, a trusted hardware component (e.g. the PSP for AMD SEV)
> proves the validity of the measurements of the TCB by encrypting it with an
> attestation signing key derived from some chip-unique secret. For AMD, this
> is called the VCEK, and TDX has something similar. In the case of SEV, this
> goes through firmware, and you have to tell the firmware each time you
> insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied
> to a VM execution context. I do not believe there is any provision to do the
> same thing to measure host data. And again, it would be somewhat pointless
> if there isn't also a mechanism to ensure the host data is not changed after
> the measurement.
> 
> Now, I don't think it would be super-difficult to add a firmware service
> that would let the host do some kind of equivalent to PVALIDATE, setting
> some physical pages aside that then get measured and become inaccessible to
> the host. The PSP or similar could then integrate these measurements as part
> of the TCB, and the fact that the pages were "transferred" to this special
> invariant block would ensure the guests that the code will not change after
> being measured.
> 
> I am not aware that such a mechanism exists on any of the existing CC
> platforms. Please feel free to enlighten me if I'm wrong.
> 
> [1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow
> >
> >>
> >> After that, I think all bets are off. SecureBoot does little AFAICT
> >> to prevent malicious modifications of the running system by someone with
> >> root access, including deliberately loading a malicious kvm-zilog.ko
> >
> > So disable module loading then or don't allow root access?
> 
> Who would do that?
> 
> The problem is that we have a host and a tenant, and the tenant does not
> trust the host in principle. So it is not sufficient for the host to disable
> module loading or carefully control root access. It is also necessary to
> prove to the tenant(s) that this was done.
> 
> >
> >>
> >> It does not mean it cannot be done, just that I don’t think we
> >> have the tools at the moment.
> >
> > Phones, chromebooks do this all the time ...
> 
> Indeed, but there, this is to prove to the phone's real owner (which,
> surprise, is not the naive person who thought they'd get some kind of
> ownership by buying the phone) that the software running on the phone has
> not been replaced by some horribly jailbroken goo.
> 
> In other words, the user of the phone gets no proof whatsoever of anything,
> except that the phone appears to work. This is somewhat the situation in the
> cloud today: the owners of the hardware get all sorts of useful checks, from
> SecureBoot to error-correction for memory or I/O devices. However, someone
> running in a VM on the cloud gets none of that, just like the user of your
> phone.

Assuming you do a measured boot, the host OS and firmware are measured into the host TPM;
people have thought in the past about triggering attestations of the
host from the guest; then you could have something external attest the
host and only release keys to the guest's disks if the attestation is
correct; or a key for the guest's disks could be held in the host's TPM.

Dave

> --
> Cheers,
> Christophe de Dinechin (https://c3d.github.io)
> Theory of Incomplete Measurements (https://c3d.github.io/TIM)
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



end of thread, other threads:[~2023-02-06 18:59 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com>
     [not found] ` <Y9EkCvAfNXnJ+ATo@kroah.com>
2023-01-25 15:29   ` Linux guest kernel threat model for Confidential Computing Reshetova, Elena
2023-01-25 16:40     ` Theodore Ts'o
2023-01-26  8:08       ` Reshetova, Elena
2023-01-26 11:19     ` Leon Romanovsky
2023-01-26 11:29       ` Reshetova, Elena
2023-01-26 12:30         ` Leon Romanovsky
2023-01-26 13:28           ` Reshetova, Elena
2023-01-26 13:50             ` Leon Romanovsky
2023-01-26 20:54             ` Theodore Ts'o
2023-01-27 19:24             ` James Bottomley
2023-01-30  7:42               ` Reshetova, Elena
2023-01-30 12:40                 ` James Bottomley
2023-01-31 11:31                   ` Reshetova, Elena
2023-01-31 13:28                     ` James Bottomley
2023-01-31 15:14                       ` Christophe de Dinechin
2023-01-31 17:39                         ` Michael S. Tsirkin
2023-02-01 10:52                           ` Christophe de Dinechin Dupont de Dinechin
2023-02-01 11:01                             ` Michael S. Tsirkin
2023-02-01 13:15                               ` Christophe de Dinechin Dupont de Dinechin
2023-02-01 16:02                                 ` Michael S. Tsirkin
2023-02-01 17:13                                   ` Christophe de Dinechin
2023-02-06 18:58                                     ` Dr. David Alan Gilbert
2023-02-02  3:24                               ` Jason Wang
2023-02-01 10:24                         ` Christophe de Dinechin
2023-01-31 16:34                       ` Reshetova, Elena
2023-01-31 17:49                         ` James Bottomley
2023-02-02 14:51                     ` Jeremi Piotrowski
2023-02-03 14:05                       ` Reshetova, Elena
2023-01-27  9:32           ` Jörg Rödel
2023-01-26 13:58         ` Dr. David Alan Gilbert
2023-01-26 17:48           ` Reshetova, Elena
2023-01-26 18:06             ` Leon Romanovsky
2023-01-26 18:14               ` Dr. David Alan Gilbert
2023-01-26 16:29     ` Michael S. Tsirkin
2023-01-27  8:52       ` Reshetova, Elena
2023-01-27 10:04         ` Michael S. Tsirkin
2023-01-27 12:25           ` Reshetova, Elena
2023-01-27 14:32             ` Michael S. Tsirkin
2023-01-27 20:51             ` Carlos Bilbao
