* Unmapping KVM Guest Memory from Host Kernel
@ 2024-03-08 15:50 ` Gowans, James
  2024-03-08 16:25   ` Brendan Jackman
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Gowans, James @ 2024-03-08 15:50 UTC (permalink / raw)
  To: seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt,
	pbonzini, Woodhouse, David
  Cc: Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel,
	kirill.shutemov, vbabka, mst, somlo, Graf (AWS),
	Alexander, kvm, linux-coco

Hello KVM, MM and memfd_secret folks,

Currently when using anonymous memory for KVM guest RAM, the memory all
remains mapped into the kernel direct map. We are looking at options to
get KVM guest memory out of the kernel’s direct map as a principled
approach to mitigating speculative execution issues in the host kernel.
Our goal is to more completely address the class of issues whose leak
origin is categorized as "Mapped memory" [1].

We currently have downstream-only solutions to this, but we want to move
to purely upstream code.

So far we have been looking at using memfd_secret, which seems to be
designed exactly for use cases where it is undesirable to have some
memory range accessible through the kernel’s direct map.

However, memfd_secret doesn’t work out of the box for KVM guest memory;
the main reason seems to be that the GUP path is intentionally disabled
for memfd_secret, so if we use a memfd_secret-backed VMA for a memslot
then KVM is not able to fault the memory in. If it’s been pre-faulted in
by userspace then it seems to work.
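
For concreteness, a minimal sketch of what we do on the userspace side
(error handling omitted; vm_fd is an existing KVM VM fd; the memory
counts against RLIMIT_MEMLOCK; the kernel must have secretmem enabled
and headers new enough to define SYS_memfd_secret):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/kvm.h>

#define GUEST_RAM_SIZE (128UL << 20)

static void *create_secretmem_slot(int vm_fd)
{
        /* memfd_secret(2) has no libc wrapper yet; call it directly. */
        int fd = syscall(SYS_memfd_secret, 0);

        ftruncate(fd, GUEST_RAM_SIZE);
        void *ram = mmap(NULL, GUEST_RAM_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        /*
         * Pre-fault every page from userspace: with GUP disabled for
         * secretmem, KVM cannot fault the pages in itself.
         */
        memset(ram, 0, GUEST_RAM_SIZE);

        struct kvm_userspace_memory_region region = {
                .slot = 0,
                .guest_phys_addr = 0,
                .memory_size = GUEST_RAM_SIZE,
                .userspace_addr = (uint64_t)(uintptr_t)ram,
        };
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
        return ram;
}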

There are a few other issues arising when KVM itself accesses guest
memory. For example the KVM PV clock code goes directly to the PFN via
the pfncache, and that also breaks if the PFN is not in the direct map,
so we’d need to change that sort of thing, perhaps by going via
userspace addresses.

If we remove the memfd_secret check from the GUP path, and disable KVM’s
pvclock from userspace via KVM_CPUID_FEATURES, we are able to boot a
simple Linux initrd using a Firecracker VMM modified to use
memfd_secret.
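
The pvclock side is roughly the following sketch: mask the PV clock
feature bits out of the CPUID table returned by KVM_GET_SUPPORTED_CPUID
before handing it to KVM_SET_CPUID2, so the guest never registers a
pvclock page that the host would later touch through the pfncache.

#include <linux/kvm.h>
#include <asm/kvm_para.h> /* KVM_CPUID_FEATURES, KVM_FEATURE_* */

static void disable_pvclock(struct kvm_cpuid2 *cpuid)
{
        for (unsigned int i = 0; i < cpuid->nent; i++) {
                struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

                /* Clear both pvclock clocksource feature bits. */
                if (e->function == KVM_CPUID_FEATURES)
                        e->eax &= ~((1u << KVM_FEATURE_CLOCKSOURCE) |
                                    (1u << KVM_FEATURE_CLOCKSOURCE2));
        }
        /* Then: ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid); */
}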

We are also aware of ongoing work on guest_memfd. The current
implementation unmaps guest memory from VMM address space, but leaves it
in the kernel’s direct map. We’re not looking at unmapping from VMM
userspace yet; we still need guest RAM there for PV drivers like virtio
to continue to work. So KVM’s gmem doesn’t seem like the right solution?

With this in mind, what’s the best way to solve getting guest RAM out of
the direct map? Is memfd_secret integration with KVM the way to go, or
should we build a solution on top of guest_memfd, for example via some
flag that causes it to leave memory in the host userspace’s page tables,
but removes it from the direct map? 

We are keen to help contribute to getting this working; we’re just
looking for guidance from maintainers on what the correct way to solve
this is.

Cheers,
James + colleagues Derek and Patrick



* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James
@ 2024-03-08 16:25   ` Brendan Jackman
  2024-03-08 17:35     ` David Matlack
  2024-03-08 23:22   ` Sean Christopherson
  2024-03-09  5:01   ` Matthew Wilcox
  2 siblings, 1 reply; 14+ messages in thread
From: Brendan Jackman @ 2024-03-08 16:25 UTC (permalink / raw)
  To: Gowans, James
  Cc: seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt,
	pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Graf (AWS),
	Alexander, kvm, linux-coco

Hi James

On Fri, 8 Mar 2024 at 16:50, Gowans, James <jgowans@amazon.com> wrote:
> Our goal is to more completely address the class of issues whose leak
> origin is categorized as "Mapped memory" [1].

Did you forget a link below? I'm interested in hearing about that
categorisation.

> ... what’s the best way to solve getting guest RAM out of
> the direct map?

It's perhaps a bigger hammer than you are looking for, but the
solution we're working on at Google is "Address Space Isolation" (ASI)
- the latest posting about that is [2].

The sense in which it's a bigger hammer is that it doesn't only
support removing guest memory from the direct map, but rather
arbitrary data from arbitrary kernel mappings.

[2] https://lore.kernel.org/linux-mm/CA+i-1C169s8pyqZDx+iSnFmftmGfssdQA29+pYm-gqySAYWgpg@mail.gmail.com/


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 16:25   ` Brendan Jackman
@ 2024-03-08 17:35     ` David Matlack
  2024-03-08 17:45       ` David Woodhouse
  2024-03-09  2:45       ` Manwaring, Derek
  0 siblings, 2 replies; 14+ messages in thread
From: David Matlack @ 2024-03-08 17:35 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng,
	Manwaring, Derek, rppt, pbonzini, Woodhouse, David, Kalyazin,
	Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel,
	kirill.shutemov, vbabka, mst, somlo, Graf (AWS),
	Alexander, kvm, linux-coco

On Fri, Mar 8, 2024 at 8:25 AM Brendan Jackman <jackmanb@google.com> wrote:
>
> Hi James
>
> On Fri, 8 Mar 2024 at 16:50, Gowans, James <jgowans@amazon.com> wrote:
> > Our goal is to more completely address the class of issues whose leak
> > origin is categorized as "Mapped memory" [1].
>
> Did you forget a link below? I'm interested in hearing about that
> categorisation.
>
> > ... what’s the best way to solve getting guest RAM out of
> > the direct map?
>
> It's perhaps a bigger hammer than you are looking for, but the
> solution we're working on at Google is "Address Space Isolation" (ASI)
> - the latest posting about that is [2].
>
> The sense in which it's a bigger hammer is that it doesn't only
> support removing guest memory from the direct map, but rather
> arbitrary data from arbitrary kernel mappings.

I'm not sure if ASI provides a solution to the problem James is trying
to solve. ASI creates a separate "restricted" address space where, yes,
guest memory can be left unmapped. But any access to guest memory is
still allowed. An access will trigger a page fault, the kernel will
switch to the "full" kernel address space (flushing hardware buffers
along the way to prevent speculation), and then proceed. I.e. ASI
doesn't prevent accessing guest memory through the direct map, it just
prevents speculation on guest memory through the direct map.

I think what James is looking for (and what we are also interested
in), is _eliminating_ the ability to access guest memory from the
direct map entirely. And in general, eliminate the ability to access
guest memory in as many ways as possible.

For that goal, I have been thinking about guest_memfd as a solution.
Yes, guest_memfd today is backed by pages of memory that are mapped in
the direct map. But what we can do is add the ability to back
guest_memfd with pages of memory that aren't in the direct map. I
haven't thought it fully through yet, but something like... Hide the
majority of RAM from Linux (I believe there are kernel parameters to do
this) and hand it off to guest_memfd to allocate from as a source of
guest memory. Then the only way to access guest memory would be to
mmap() a guest_memfd (e.g. for PV userspace devices).
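
For example (hypothetical numbers; x86 memmap syntax, where
"memmap=nn$ss" marks a physical range reserved so the kernel never maps
or allocates it; some bootloaders need the '$' escaped):

# Reserve 60 GiB of RAM starting at the 4 GiB boundary for a
# guest_memfd-like allocator to hand out as guest memory.
memmap=60G$0x100000000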


* Re:  Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 17:35     ` David Matlack
@ 2024-03-08 17:45       ` David Woodhouse
  2024-03-08 22:47         ` Sean Christopherson
  2024-03-09  2:45       ` Manwaring, Derek
  1 sibling, 1 reply; 14+ messages in thread
From: David Woodhouse @ 2024-03-08 17:45 UTC (permalink / raw)
  To: David Matlack, Brendan Jackman
  Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng,
	Manwaring, Derek, rppt, pbonzini, Kalyazin, Nikita, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Graf (AWS),
	Alexander, kvm, linux-coco

On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> I think what James is looking for (and what we are also interested
> in), is _eliminating_ the ability to access guest memory from the
> direct map entirely. And in general, eliminate the ability to access
> guest memory in as many ways as possible.

Well, pKVM does that... 


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 17:45       ` David Woodhouse
@ 2024-03-08 22:47         ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2024-03-08 22:47 UTC (permalink / raw)
  To: David Woodhouse
  Cc: David Matlack, Brendan Jackman, James Gowans, akpm, Patrick Roy,
	chao.p.peng, Derek Manwaring, rppt, pbonzini, Nikita Kalyazin,
	lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov,
	vbabka, mst, somlo, Alexander Graf, kvm, linux-coco

On Fri, Mar 08, 2024, David Woodhouse wrote:
> On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > I think what James is looking for (and what we are also interested
> > in), is _eliminating_ the ability to access guest memory from the
> > direct map entirely. And in general, eliminate the ability to access
> > guest memory in as many ways as possible.
> 
> Well, pKVM does that... 

Out-of-tree :-)

I'm not just being snarky; when pKVM lands this functionality upstream, I fully
expect zapping direct map entries to be generic guest_memfd functionality that
would be opt-in, either by the in-kernel technology, e.g. pKVM, or by userspace,
or by some combination of the two, e.g. I can see making it optional to nuke the
direct map when using guest_memfd for TDX guests so that rogue accesses from the
host generate synchronous #PFs instead of latent #MCs.


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James
  2024-03-08 16:25   ` Brendan Jackman
@ 2024-03-08 23:22   ` Sean Christopherson
  2024-03-09 11:14     ` Mike Rapoport
  2024-03-14 21:45     ` Manwaring, Derek
  2024-03-09  5:01   ` Matthew Wilcox
  2 siblings, 2 replies; 14+ messages in thread
From: Sean Christopherson @ 2024-03-08 23:22 UTC (permalink / raw)
  To: James Gowans
  Cc: akpm, Patrick Roy, chao.p.peng, Derek Manwaring, rppt, pbonzini,
	David Woodhouse, Nikita Kalyazin, lstoakes, Liam.Howlett,
	linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo,
	Alexander Graf, kvm, linux-coco

On Fri, Mar 08, 2024, James Gowans wrote:
> However, memfd_secret doesn’t work out of the box for KVM guest memory;
> the main reason seems to be that the GUP path is intentionally disabled
> for memfd_secret, so if we use a memfd_secret-backed VMA for a memslot
> then KVM is not able to fault the memory in. If it’s been pre-faulted in
> by userspace then it seems to work.

Huh, that _shouldn't_ work.  The folio_is_secretmem() check in
gup_pte_range() is supposed to prevent the "fast gup" path from getting
secretmem pages.

Is this on an upstream kernel?  If so, and if you have bandwidth, can you figure
out why that isn't working?  At the very least, I suspect the memfd_secret
maintainers would be very interested to know that it's possible to fast gup
secretmem.
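
For reference, the check in question in mm/gup.c looks roughly like this
(paraphrased, not an exact quote of any particular kernel version):

        /* In gup_pte_range(), after the folio has been grabbed: */
        if (unlikely(folio_is_secretmem(folio))) {
                gup_put_folio(folio, 1, flags);
                goto pte_unmap;
        }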

> There are a few other issues arising when KVM itself accesses guest
> memory. For example the KVM PV clock code goes directly to the PFN via
> the pfncache, and that also breaks if the PFN is not in the direct map,
> so we’d need to change that sort of thing, perhaps by going via
> userspace addresses.
> 
> If we remove the memfd_secret check from the GUP path, and disable KVM’s
> pvclock from userspace via KVM_CPUID_FEATURES, we are able to boot a
> simple Linux initrd using a Firecracker VMM modified to use
> memfd_secret.
> 
> We are also aware of ongoing work on guest_memfd. The current
> implementation unmaps guest memory from VMM address space, but leaves it
> in the kernel’s direct map. We’re not looking at unmapping from VMM
> userspace yet; we still need guest RAM there for PV drivers like virtio
> to continue to work. So KVM’s gmem doesn’t seem like the right solution?

We (and by "we", I really mean the pKVM folks) are also working on allowing
userspace to mmap() guest_memfd[*].  pKVM aside, the long term vision I have for
guest_memfd is to be able to use it for non-CoCo VMs, precisely for the security
and robustness benefits it can bring.

What I am hoping to do with guest_memfd is get userspace to only map memory it
needs, e.g. for emulated/synthetic devices, on-demand.  I.e. to get to a state
where guest memory is mapped only when it needs to be.  More below.

> With this in mind, what’s the best way to solve getting guest RAM out of
> the direct map? Is memfd_secret integration with KVM the way to go, or
> should we build a solution on top of guest_memfd, for example via some
> flag that causes it to leave memory in the host userspace’s page tables,
> but removes it from the direct map? 

100% enhance guest_memfd.  If you're willing to wait long enough, pKVM might even
do all the work for you. :-)

The killer feature of guest_memfd is that it allows the guest mappings to be a
superset of the host userspace mappings.  Most obviously, it allows mapping memory
into the guest without first mapping the memory into the userspace page
tables.  More subtly, it also makes it easier (in theory) to do things like map
the memory with 1GiB hugepages for the guest, but selectively map at 4KiB granularity
in the host.  Or map memory as RWX in the guest, but RO in the host (I don't have
a concrete use case for this, just pointing out it'll be trivial to do once
guest_memfd supports mmap()).

Every attempt to allow mapping VMA-based memory into a guest without it being
accessible by host userspace has failed; it's literally why we ended up
implementing guest_memfd.  We could teach KVM to do the same with memfd_secret,
but we'd just end up re-implementing guest_memfd.

memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite
sure you'll be fighting memfd_secret all the way.  E.g. it's not dumpable, it
deliberately allocates at 4KiB granularity (though I suspect the bug you found
means that it can be inadvertently mapped with 2MiB hugepages), it has no line
of sight to taking userspace out of the equation, etc.

With guest_memfd on the other hand, everyone contributing to and maintaining it
has goals that are *very* closely aligned with what you want to do.

[*] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 17:35     ` David Matlack
  2024-03-08 17:45       ` David Woodhouse
@ 2024-03-09  2:45       ` Manwaring, Derek
  2024-03-18 14:11         ` Brendan Jackman
  1 sibling, 1 reply; 14+ messages in thread
From: Manwaring, Derek @ 2024-03-09  2:45 UTC (permalink / raw)
  To: David Matlack, Brendan Jackman
  Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt,
	pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Graf (AWS),
	Alexander, kvm, linux-coco, kvmarm, tabba, qperret,
	jason.cj.chen

On 2024-03-08 10:36-0700, David Matlack wrote:
> On Fri, Mar 8, 2024 at 8:25 AM Brendan Jackman <jackmanb@google.com> wrote:
> > On Fri, 8 Mar 2024 at 16:50, Gowans, James <jgowans@amazon> wrote:
> > > Our goal is to more completely address the class of issues whose leak
> > > origin is categorized as "Mapped memory" [1].
> >
> > Did you forget a link below? I'm interested in hearing about that
> > categorisation.

The paper from Hertogh et al. is
https://download.vusec.net/papers/quarantine_raid23.pdf
(specifically Table 1).

> > It's perhaps a bigger hammer than you are looking for, but the
> > solution we're working on at Google is "Address Space Isolation" (ASI)
> > - the latest posting about that is [2].
>
> I think what James is looking for (and what we are also interested
> in), is _eliminating_ the ability to access guest memory from the
> direct map entirely.

Actually, just preventing speculation of guest memory through the
direct map is sufficient for our current focus.

Brendan,
I will look into the general ASI approach, thank you. Did you consider
memfd_secret or a guest_memfd-based approach for Userspace-ASI? Based on
Sean's earlier reply to James it sounds like the vision of guest_memfd
aligns with ASI's goals.

Derek


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James
  2024-03-08 16:25   ` Brendan Jackman
  2024-03-08 23:22   ` Sean Christopherson
@ 2024-03-09  5:01   ` Matthew Wilcox
  2 siblings, 0 replies; 14+ messages in thread
From: Matthew Wilcox @ 2024-03-09  5:01 UTC (permalink / raw)
  To: Gowans, James
  Cc: seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt,
	pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Graf (AWS),
	Alexander, kvm, linux-coco

On Fri, Mar 08, 2024 at 03:50:05PM +0000, Gowans, James wrote:
> Currently when using anonymous memory for KVM guest RAM, the memory all
> remains mapped into the kernel direct map. We are looking at options to
> get KVM guest memory out of the kernel’s direct map as a principled
> approach to mitigating speculative execution issues in the host kernel.
> Our goal is to more completely address the class of issues whose leak
> origin is categorized as "Mapped memory" [1].

One of the things that is holding Linux back is the inability to do I/O
to memory which is not part of memmap.  _So Much_ of our infrastructure
is based on having a struct page available to stick into an sglist, bio,
skb_frag, or whatever.  The solution to this is to move to a (phys_addr,
length) tuple instead of a (page, offset, len) tuple.  I call this
"phyr" and I've written about it before.  I'm not working on this as I
have quite enough to do with the folio work, but I hope somebody works
on it before I get time to.
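
To illustrate the shape of it (a sketch of the idea, not an agreed-upon
API):

/* Today's (page, offset, len) shape, e.g. struct bio_vec: */
struct bio_vec {
        struct page     *bv_page;
        unsigned int    bv_len;
        unsigned int    bv_offset;
};

/*
 * The "phyr" idea: a bare (phys_addr, len) range, usable even for
 * memory that has no struct page / memmap entry behind it.
 */
struct phyr {
        phys_addr_t     addr;
        size_t          len;
};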


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 23:22   ` Sean Christopherson
@ 2024-03-09 11:14     ` Mike Rapoport
  2024-03-14 21:45     ` Manwaring, Derek
  1 sibling, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2024-03-09 11:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: James Gowans, akpm, Patrick Roy, chao.p.peng, Derek Manwaring,
	pbonzini, David Woodhouse, Nikita Kalyazin, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Alexander Graf, kvm, linux-coco

On Fri, Mar 08, 2024 at 03:22:50PM -0800, Sean Christopherson wrote:
> On Fri, Mar 08, 2024, James Gowans wrote:
> > However, memfd_secret doesn’t work out of the box for KVM guest memory;
> > the main reason seems to be that the GUP path is intentionally disabled
> > for memfd_secret, so if we use a memfd_secret-backed VMA for a memslot
> > then KVM is not able to fault the memory in. If it’s been pre-faulted in
> > by userspace then it seems to work.
> 
> Huh, that _shouldn't_ work.  The folio_is_secretmem() check in
> gup_pte_range() is supposed to prevent the "fast gup" path from getting
> secretmem pages.

I suspect this works because KVM only calls gup on faults, and if the
memory was pre-faulted via memfd_secret there won't be any faults, and
hence no gups from KVM.
 
> > With this in mind, what’s the best way to solve getting guest RAM out of
> > the direct map? Is memfd_secret integration with KVM the way to go, or
> > should we build a solution on top of guest_memfd, for example via some
> > flag that causes it to leave memory in the host userspace’s page tables,
> > but removes it from the direct map? 
> 
> memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite
> sure you'll be fighting memfd_secret all the way.  E.g. it's not dumpable, it
> deliberately allocates at 4KiB granularity (though I suspect the bug you found
> means that it can be inadvertently mapped with 2MiB hugepages), it has no line
> of sight to taking userspace out of the equation, etc.
> 
> With guest_memfd on the other hand, everyone contributing to and maintaining it
> has goals that are *very* closely aligned with what you want to do.

I agree with Sean, guest_memfd seems a better interface to use. It's
integrated by design with KVM and removing guest memory from the direct map
looks like a natural enhancement to guest_memfd. 

Unless I'm missing something, for a fast-and-dirty PoC it'll be a
one-liner that adds set_memory_np() to kvm_gmem_get_folio(), and then
figuring out what to do with virtio :)
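
Something like this, as a very rough sketch of that PoC (x86-only, since
set_memory_np() operates on the direct-map alias; restoring the mapping
when the folio is freed, hugepage splitting, and all error handling are
deliberately ignored):

#include <linux/pagemap.h>
#include <asm/set_memory.h>

static struct folio *kvm_gmem_get_folio_no_direct_map(struct inode *inode,
                                                      pgoff_t index)
{
        struct folio *folio = filemap_grab_folio(inode->i_mapping, index);

        if (IS_ERR(folio))
                return folio;

        /* The "one-liner": drop the guest pages from the direct map. */
        set_memory_np((unsigned long)folio_address(folio),
                      folio_nr_pages(folio));
        return folio;
}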

-- 
Sincerely yours,
Mike.


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 23:22   ` Sean Christopherson
  2024-03-09 11:14     ` Mike Rapoport
@ 2024-03-14 21:45     ` Manwaring, Derek
  1 sibling, 0 replies; 14+ messages in thread
From: Manwaring, Derek @ 2024-03-14 21:45 UTC (permalink / raw)
  To: Sean Christopherson, James Gowans
  Cc: akpm, Patrick Roy, chao.p.peng, rppt, pbonzini, David Woodhouse,
	Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel,
	kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm,
	linux-coco, xmarcalx, tabba, qperret, kvmarm

On Fri, 8 Mar 2024 15:22:50 -0800, Sean Christopherson wrote:
> On Fri, Mar 08, 2024, James Gowans wrote:
> > We are also aware of ongoing work on guest_memfd. The current
> > implementation unmaps guest memory from VMM address space, but leaves it
> > in the kernel’s direct map. We’re not looking at unmapping from VMM
> > userspace yet; we still need guest RAM there for PV drivers like virtio
> > to continue to work. So KVM’s gmem doesn’t seem like the right solution?
>
> We (and by "we", I really mean the pKVM folks) are also working on allowing
> userspace to mmap() guest_memfd[*].  pKVM aside, the long term vision I have for
> guest_memfd is to be able to use it for non-CoCo VMs, precisely for the security
> and robustness benefits it can bring.
>
> What I am hoping to do with guest_memfd is get userspace to only map memory it
> needs, e.g. for emulated/synthetic devices, on-demand.  I.e. to get to a state
> where guest memory is mapped only when it needs to be.

Thank you for the direction, this is super helpful.

We are new to the guest_memfd space, and for simplicity we'd prefer to
leave guest_memfd completely mapped in userspace. Even in the long term,
we actually don't have any use for unmapping from host userspace. The
current form of marking pages shared doesn't quite align with what we're
trying to do either since it also shares the pages with the host kernel.

What are your thoughts on a flag for KVM_CREATE_GUEST_MEMFD that only
removes the memory from the host kernel's direct map, but leaves
everything mapped in userspace?
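
I.e. something along these lines, where the flag name is purely
illustrative (nothing like it exists today) and vm_fd/guest_ram_size
are assumed:

/* Hypothetical flag, invented here for illustration only. */
#define KVM_GUEST_MEMFD_NO_DIRECT_MAP   (1ULL << 0)

struct kvm_create_guest_memfd gmem = {
        .size  = guest_ram_size,
        .flags = KVM_GUEST_MEMFD_NO_DIRECT_MAP,
};
int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);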

Derek


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-09  2:45       ` Manwaring, Derek
@ 2024-03-18 14:11         ` Brendan Jackman
  0 siblings, 0 replies; 14+ messages in thread
From: Brendan Jackman @ 2024-03-18 14:11 UTC (permalink / raw)
  To: Manwaring, Derek
  Cc: David Matlack, Gowans, James, seanjc, akpm, Roy, Patrick,
	chao.p.peng, rppt, pbonzini, Woodhouse, David, Kalyazin, Nikita,
	lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov,
	vbabka, mst, somlo, Graf (AWS),
	Alexander, kvm, linux-coco, kvmarm, tabba, qperret,
	jason.cj.chen

On Fri, 8 Mar 2024 at 18:36, David Matlack <dmatlack@google.com> wrote:
> I'm not sure if ASI provides a solution to the problem James is trying
> to solve. ASI creates a separate "restricted" address space where, yes,
> guest memory can be left unmapped. But any access to guest memory is
> still allowed. An access will trigger a page fault, the kernel will
> switch to the "full" kernel address space (flushing hardware buffers
> along the way to prevent speculation), and then proceed. I.e. ASI
> doesn't prevent accessing guest memory through the direct map, it just
> prevents speculation on guest memory through the direct map.

Yes, there's also a sense in which ASI is a "smaller hammer" in that
it _only_ protects against hardware-bug exploits.

> it just prevents speculation on guest memory through the direct map.

(Although, this is not _all_ it does, because when returning to the
restricted address space, i.e. right before VM Enter, we have an
opportunity to flush _data buffers_ too. So ASI also mitigates
Meltdown-style attacks, e.g. L1TF, where the speculation-related stuff
all happens on the attacker side)

On Sat, 9 Mar 2024 at 03:46, Manwaring, Derek <derekmn@amazon.com> wrote:
> Brendan,
> I will look into the general ASI approach, thank you. Did you consider
> memfd_secret or a guest_memfd-based approach for Userspace-ASI?

I might be misunderstanding you here: I guess you mean using
memfd_secret as a way for userspace to communicate about which parts
of userspace memory are "secret"?

If I didn't misunderstand: we have not looked into this so far because
we actually just consider _all_ userspace/guest memory to be "secret"
from the perspective of other processes/guests.

> Based on
> Sean's earlier reply to James it sounds like the vision of guest_memfd
> aligns with ASI's goals.

But yes, the more general point seems to make sense; I think I need to
research this topic some more, thanks!


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-11  9:26 ` Fuad Tabba
@ 2024-03-11  9:29   ` Fuad Tabba
  0 siblings, 0 replies; 14+ messages in thread
From: Fuad Tabba @ 2024-03-11  9:29 UTC (permalink / raw)
  To: Manwaring, Derek
  Cc: David Woodhouse, David Matlack, Brendan Jackman, qperret,
	jason.cj.chen, Gowans, James, seanjc, akpm, Roy, Patrick,
	chao.p.peng, rppt, pbonzini, Kalyazin, Nikita, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Graf (AWS),
	Alexander, kvm, linux-coco, kvmarm, kvmarm

On Mon, Mar 11, 2024 at 9:26 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi,
>
> On Fri, Mar 8, 2024 at 9:05 PM Manwaring, Derek <derekmn@amazon.com> wrote:
> >
> > On 2024-03-08 at 10:46-0700, David Woodhouse wrote:
> > > On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > > > I think what James is looking for (and what we are also interested
> > > > in), is _eliminating_ the ability to access guest memory from the
> > > > direct map entirely. And in general, eliminate the ability to access
> > > > guest memory in as many ways as possible.
> > >
> > > Well, pKVM does that...
> >
> > Yes we've been looking at pKVM and it accomplishes a lot of what we're trying
> > to do. Our initial inclination is that we want to stick with VHE for the lower
> > overhead. We also want flexibility across server parts, so we would need to
> > get pKVM working on Intel & AMD if we went this route.
> >
> > Certainly there are advantages of pKVM on the perf side like the in-place
> > memory sharing rather than copying as well as on the security side by simply
> > reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs
> > memfd_secret or general ASI.
>
> The work we've done for pKVM is still an RFC [*], but there is nothing
> in it that limits it to nVHE (at least not intentionally). It should
> work with VHE and hVHE as well. On respinning the patch series [*], we
> plan on adding support for normal VMs to use guest_memfd() in arm64 as
> well, mainly for testing, and to make it easier for others to base
> their work on it.

Just to clarify, I am referring specifically to the work we did in
porting guest_memfd() to pKVM/arm64. pKVM itself works only in nVHE
mode.
>
> Cheers,
> /fuad
>
> [*] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com
> >
> > Derek
> >


* Re: Unmapping KVM Guest Memory from Host Kernel
  2024-03-08 21:05 Manwaring, Derek
@ 2024-03-11  9:26 ` Fuad Tabba
  2024-03-11  9:29   ` Fuad Tabba
  0 siblings, 1 reply; 14+ messages in thread
From: Fuad Tabba @ 2024-03-11  9:26 UTC (permalink / raw)
  To: Manwaring, Derek
  Cc: David Woodhouse, David Matlack, Brendan Jackman, qperret,
	jason.cj.chen, Gowans, James, seanjc, akpm, Roy, Patrick,
	chao.p.peng, rppt, pbonzini, Kalyazin, Nikita, lstoakes,
	Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst,
	somlo, Graf (AWS),
	Alexander, kvm, linux-coco, kvmarm, kvmarm

Hi,

On Fri, Mar 8, 2024 at 9:05 PM Manwaring, Derek <derekmn@amazon.com> wrote:
>
> On 2024-03-08 at 10:46-0700, David Woodhouse wrote:
> > On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > > I think what James is looking for (and what we are also interested
> > > in), is _eliminating_ the ability to access guest memory from the
> > > direct map entirely. And in general, eliminate the ability to access
> > > guest memory in as many ways as possible.
> >
> > Well, pKVM does that...
>
> Yes we've been looking at pKVM and it accomplishes a lot of what we're trying
> to do. Our initial inclination is that we want to stick with VHE for the lower
> overhead. We also want flexibility across server parts, so we would need to
> get pKVM working on Intel & AMD if we went this route.
>
> Certainly there are advantages of pKVM on the perf side like the in-place
> memory sharing rather than copying as well as on the security side by simply
> reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs
> memfd_secret or general ASI.

The work we've done for pKVM is still an RFC [*], but there is nothing
in it that limits it to nVHE (at least not intentionally). It should
work with VHE and hVHE as well. On respinning the patch series [*], we
plan on adding support for normal VMs to use guest_memfd() in arm64 as
well, mainly for testing, and to make it easier for others to base
their work on it.

Cheers,
/fuad

[*] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com
>
> Derek
>


* Re: Unmapping KVM Guest Memory from Host Kernel
@ 2024-03-08 21:05 Manwaring, Derek
  2024-03-11  9:26 ` Fuad Tabba
  0 siblings, 1 reply; 14+ messages in thread
From: Manwaring, Derek @ 2024-03-08 21:05 UTC (permalink / raw)
  To: David Woodhouse, David Matlack, Brendan Jackman, tabba, qperret,
	jason.cj.chen
  Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt,
	pbonzini, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm,
	qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS),
	Alexander, kvm, linux-coco, kvmarm, kvmarm

On 2024-03-08 at 10:46-0700, David Woodhouse wrote:
> On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > I think what James is looking for (and what we are also interested
> > in), is _eliminating_ the ability to access guest memory from the
> > direct map entirely. And in general, eliminate the ability to access
> > guest memory in as many ways as possible.
>
> Well, pKVM does that...

Yes we've been looking at pKVM and it accomplishes a lot of what we're trying
to do. Our initial inclination is that we want to stick with VHE for the lower
overhead. We also want flexibility across server parts, so we would need to
get pKVM working on Intel & AMD if we went this route.

Certainly there are advantages of pKVM on the perf side like the in-place
memory sharing rather than copying as well as on the security side by simply
reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs
memfd_secret or general ASI.

Derek

