* Unmapping KVM Guest Memory from Host Kernel @ 2024-03-08 15:50 ` Gowans, James 2024-03-08 16:25 ` Brendan Jackman ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread
From: Gowans, James @ 2024-03-08 15:50 UTC (permalink / raw)
To: seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt, pbonzini, Woodhouse, David
Cc: Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco

Hello KVM, MM and memfd_secret folks,

Currently when using anonymous memory for KVM guest RAM, the memory all remains mapped into the kernel direct map. We are looking at options to get KVM guest memory out of the kernel’s direct map as a principled approach to mitigating speculative execution issues in the host kernel. Our goal is to more completely address the class of issues whose leak origin is categorized as "Mapped memory" [1]. We currently have downstream-only solutions to this, but we want to move to purely upstream code.

So far we have been looking at using memfd_secret, which seems to be designed exactly for use cases where it is undesirable to have some memory range accessible through the kernel’s direct map. However, memfd_secret doesn’t work out of the box for KVM guest memory; the main reason seems to be that the GUP path is intentionally disabled for memfd_secret, so if we use a memfd_secret backed VMA for a memslot then KVM is not able to fault the memory in. If it’s been pre-faulted in by userspace then it seems to work.

There are a few other issues around when KVM accesses the guest memory. For example the KVM PV clock code goes directly to the PFN via the pfncache, and that also breaks if the PFN is not in the direct map, so we’d need to change that sort of thing, perhaps going via userspace addresses.
If we remove the memfd_secret check from the GUP path, and disable KVM’s pvclock from userspace via KVM_CPUID_FEATURES, we are able to boot a simple Linux initrd using a Firecracker VMM modified to use memfd_secret.

We are also aware of ongoing work on guest_memfd. The current implementation unmaps guest memory from VMM address space, but leaves it in the kernel’s direct map. We’re not looking at unmapping from VMM userspace yet; we still need guest RAM there for PV drivers like virtio to continue to work. So KVM’s gmem doesn’t seem like the right solution?

With this in mind, what’s the best way to solve getting guest RAM out of the direct map? Is memfd_secret integration with KVM the way to go, or should we build a solution on top of guest_memfd, for example via some flag that causes it to leave memory in the host userspace’s page tables, but removes it from the direct map? We are keen to help contribute to getting this working; we’re just looking for guidance from maintainers on what the correct way to solve this is.

Cheers, James + colleagues Derek and Patrick

^ permalink raw reply [flat|nested] 21+ messages in thread
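For concreteness, the memfd_secret experiment described above boils down to a flow roughly like the following. This is an illustrative sketch rather than Firecracker's actual code: the slot number and size are made up, error handling is omitted, and it only works on a kernel booted with secretmem enabled and with the GUP check patched out as described in the mail.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/kvm.h>

int main(void)
{
	size_t size = 2 * 1024 * 1024;

	/* Guest RAM backed by secretmem: not present in the direct map. */
	int memfd = syscall(SYS_memfd_secret, 0);
	ftruncate(memfd, size);
	void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			 memfd, 0);

	/* Pre-fault from userspace, since KVM cannot GUP secretmem pages. */
	memset(ram, 0, size);

	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);

	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.guest_phys_addr = 0,
		.memory_size = size,
		.userspace_addr = (uint64_t)ram,
	};
	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);
	/* ... create vCPUs, load kernel/initrd into ram, run ... */
	return 0;
}
```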
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James @ 2024-03-08 16:25 ` Brendan Jackman 2024-03-08 17:35 ` David Matlack 2024-03-08 23:22 ` Sean Christopherson 2024-03-09 5:01 ` Matthew Wilcox 2 siblings, 1 reply; 21+ messages in thread From: Brendan Jackman @ 2024-03-08 16:25 UTC (permalink / raw) To: Gowans, James Cc: seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt, pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco Hi James On Fri, 8 Mar 2024 at 16:50, Gowans, James <jgowans@amazon.com> wrote: > Our goal is to more completely address the class of issues whose leak > origin is categorized as "Mapped memory" [1]. Did you forget a link below? I'm interested in hearing about that categorisation. > ... what’s the best way to solve getting guest RAM out of > the direct map? It's perhaps a bigger hammer than you are looking for, but the solution we're working on at Google is "Address Space Isolation" (ASI) - the latest posting about that is [2]. The sense in which it's a bigger hammer is that it doesn't only support removing guest memory from the direct map, but rather arbitrary data from arbitrary kernel mappings. [2] https://lore.kernel.org/linux-mm/CA+i-1C169s8pyqZDx+iSnFmftmGfssdQA29+pYm-gqySAYWgpg@mail.gmail.com/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 16:25 ` Brendan Jackman @ 2024-03-08 17:35 ` David Matlack 2024-03-08 17:45 ` David Woodhouse 2024-03-09 2:45 ` Manwaring, Derek 0 siblings, 2 replies; 21+ messages in thread From: David Matlack @ 2024-03-08 17:35 UTC (permalink / raw) To: Brendan Jackman Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt, pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco On Fri, Mar 8, 2024 at 8:25 AM Brendan Jackman <jackmanb@google.com> wrote: > > Hi James > > On Fri, 8 Mar 2024 at 16:50, Gowans, James <jgowans@amazon.com> wrote: > > Our goal is to more completely address the class of issues whose leak > > origin is categorized as "Mapped memory" [1]. > > Did you forget a link below? I'm interested in hearing about that > categorisation. > > > ... what's the best way to solve getting guest RAM out of > > the direct map? > > It's perhaps a bigger hammer than you are looking for, but the > solution we're working on at Google is "Address Space Isolation" (ASI) > - the latest posting about that is [2]. > > The sense in which it's a bigger hammer is that it doesn't only > support removing guest memory from the direct map, but rather > arbitrary data from arbitrary kernel mappings. I'm not sure if ASI provides a solution to the problem James is trying to solve. ASI creates a separate "restricted" address space where, yes, guest memory can be not mapped. But any access to guest memory is still allowed. An access will trigger a page fault, the kernel will switch to the "full" kernel address space (flushing hardware buffers along the way to prevent speculation), and then proceed. i.e. ASI doesn't prevent accessing guest memory through the direct map, it just prevents speculation of guest memory through the direct map.
I think what James is looking for (and what we are also interested in), is _eliminating_ the ability to access guest memory from the direct map entirely. And in general, eliminate the ability to access guest memory in as many ways as possible. For that goal, I have been thinking about guest_memfd as a solution. Yes guest_memfd today is backed by pages of memory that are mapped in the direct map. But what we can do is add the ability to back guest_memfd by pages of memory that aren't in the direct map. I haven't thought it fully through yet but something like... Hide the majority of RAM from Linux (I believe there are kernel parameters to do this) and hand it off to guest_memfd to allocate from as a source of guest memory. Then the only way to access guest memory is to mmap() a guest_memfd (e.g. for PV userspace devices). ^ permalink raw reply [flat|nested] 21+ messages in thread
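The "hide the majority of RAM from Linux" step mentioned above can indeed be done today with boot parameters; a hypothetical example follows (addresses and sizes are purely illustrative, and the `$` must be escaped in some bootloaders, e.g. as `\$` in GRUB config files):

```
# Mark 512M starting at host physical address 4G as reserved so the
# kernel never allocates from it; a guest_memfd-style allocator could
# then hand that range out as guest RAM.
memmap=512M$0x100000000

# Alternatively, simply cap how much memory Linux manages:
mem=4G
```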
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 17:35 ` David Matlack @ 2024-03-08 17:45 ` David Woodhouse 2024-03-08 22:47 ` Sean Christopherson 2024-03-09 2:45 ` Manwaring, Derek 1 sibling, 1 reply; 21+ messages in thread From: David Woodhouse @ 2024-03-08 17:45 UTC (permalink / raw) To: David Matlack, Brendan Jackman Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt, pbonzini, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco [-- Attachment #1: Type: text/plain, Size: 341 bytes --] On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote: > I think what James is looking for (and what we are also interested > in), is _eliminating_ the ability to access guest memory from the > direct map entirely. And in general, eliminate the ability to access > guest memory in as many ways as possible. Well, pKVM does that... [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 17:45 ` David Woodhouse @ 2024-03-08 22:47 ` Sean Christopherson 0 siblings, 0 replies; 21+ messages in thread From: Sean Christopherson @ 2024-03-08 22:47 UTC (permalink / raw) To: David Woodhouse Cc: David Matlack, Brendan Jackman, James Gowans, akpm, Patrick Roy, chao.p.peng, Derek Manwaring, rppt, pbonzini, Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm, linux-coco On Fri, Mar 08, 2024, David Woodhouse wrote: > On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote: > > I think what James is looking for (and what we are also interested > > in), is _eliminating_ the ability to access guest memory from the > > direct map entirely. And in general, eliminate the ability to access > > guest memory in as many ways as possible. > > Well, pKVM does that... Out-of-tree :-) I'm not just being snarky; when pKVM lands this functionality upstream, I fully expect zapping direct map entries to be generic guest_memfd functionality that would be opt-in, either by the in-kernel technology, e.g. pKVM, or by userspace, or by some combination of the two, e.g. I can see making it optional to nuke the direct map when using guest_memfd for TDX guests so that rogue accesses from the host generate synchronous #PFs instead of latent #MCs. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 17:35 ` David Matlack 2024-03-08 17:45 ` David Woodhouse @ 2024-03-09 2:45 ` Manwaring, Derek 2024-03-18 14:11 ` Brendan Jackman 1 sibling, 1 reply; 21+ messages in thread From: Manwaring, Derek @ 2024-03-09 2:45 UTC (permalink / raw) To: David Matlack, Brendan Jackman Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt, pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco, kvmarm, tabba, qperret, jason.cj.chen On 2024-03-08 10:36-0700, David Matlack wrote: > On Fri, Mar 8, 2024 at 8:25 AM Brendan Jackman <jackmanb@google.com> wrote: > > On Fri, 8 Mar 2024 at 16:50, Gowans, James <jgowans@amazon> wrote: > > > Our goal is to more completely address the class of issues whose leak > > > origin is categorized as "Mapped memory" [1]. > > > > Did you forget a link below? I'm interested in hearing about that > > categorisation. The paper from Hertogh, et al. is https://download.vusec.net/papers/quarantine_raid23.pdf specifically Table 1. > > It's perhaps a bigger hammer than you are looking for, but the > > solution we're working on at Google is "Address Space Isolation" (ASI) > > - the latest posting about that is [2]. > > I think what James is looking for (and what we are also interested > in), is _eliminating_ the ability to access guest memory from the > direct map entirely. Actually, just preventing speculation of guest memory through the direct map is sufficient for our current focus. Brendan, I will look into the general ASI approach, thank you. Did you consider memfd_secret or a guest_memfd-based approach for Userspace-ASI? Based on Sean's earlier reply to James it sounds like the vision of guest_memfd aligns with ASI's goals. Derek ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-09 2:45 ` Manwaring, Derek @ 2024-03-18 14:11 ` Brendan Jackman 0 siblings, 0 replies; 21+ messages in thread From: Brendan Jackman @ 2024-03-18 14:11 UTC (permalink / raw) To: Manwaring, Derek Cc: David Matlack, Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt, pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco, kvmarm, tabba, qperret, jason.cj.chen On Fri, 8 Mar 2024 at 18:36, David Matlack <dmatlack@google.com> wrote: > I'm not sure if ASI provides a solution to the problem James is trying > to solve. ASI creates a separate "restricted" address spaces where, yes, > guest memory can be not mapped. But any access to guest memory is > still allowed. An access will trigger a page fault, the kernel will > switch to the "full" kernel address space (flushing hardware buffers > along the way to prevent speculation), and then proceed. i.e. ASI > doesn't not prevent accessing guest memory through the > direct map, it just prevents speculation of guest memory through the > direct map. Yes, there's also a sense in which ASI is a "smaller hammer" in that it _only_ protects against hardware-bug exploits. > it just prevents speculation of guest memory through the > direct map. (Although, this is not _all_ it does, because when returning to the restricted address space, i.e. right before VM Enter, we have an opportunity to flush _data buffers_ too. So ASI also mitigates Meltdown-style attacks, e.g. L1TF, where the speculation-related stuff all happens on the attacker side) On Sat, 9 Mar 2024 at 03:46, Manwaring, Derek <derekmn@amazon.com> wrote: > Brendan, > I will look into the general ASI approach, thank you. Did you consider > memfd_secret or a guest_memfd-based approach for Userspace-ASI? 
I might be misunderstanding you here: I guess you mean using memfd_secret as a way for userspace to communicate about which parts of userspace memory are "secret"? If I didn't misunderstand: we have not looked into this so far because we actually just consider _all_ userspace/guest memory to be "secret" from the perspective of other processes/guests. > Based on > Sean's earlier reply to James it sounds like the vision of guest_memfd > aligns with ASI's goals. But yes, the more general point seems to make sense, I think I need to research this topic some more, thanks! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James 2024-03-08 16:25 ` Brendan Jackman @ 2024-03-08 23:22 ` Sean Christopherson 2024-03-09 11:14 ` Mike Rapoport 2024-03-14 21:45 ` Manwaring, Derek 2024-03-09 5:01 ` Matthew Wilcox 2 siblings, 2 replies; 21+ messages in thread From: Sean Christopherson @ 2024-03-08 23:22 UTC (permalink / raw) To: James Gowans Cc: akpm, Patrick Roy, chao.p.peng, Derek Manwaring, rppt, pbonzini, David Woodhouse, Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm, linux-coco On Fri, Mar 08, 2024, James Gowans wrote: > However, memfd_secret doesn’t work out the box for KVM guest memory; the > main reason seems to be that the GUP path is intentionally disabled for > memfd_secret, so if we use a memfd_secret backed VMA for a memslot then > KVM is not able to fault the memory in. If it’s been pre-faulted in by > userspace then it seems to work. Huh, that _shouldn't_ work. The folio_is_secretmem() in gup_pte_range() is supposed to prevent the "fast gup" path from getting secretmem pages. Is this on an upstream kernel? If so, and if you have bandwidth, can you figure out why that isn't working? At the very least, I suspect the memfd_secret maintainers would be very interested to know that it's possible to fast gup secretmem. > There are a few other issues around when KVM accesses the guest memory. > For example the KVM PV clock code goes directly to the PFN via the > pfncache, and that also breaks if the PFN is not in the direct map, so > we’d need to change that sort of thing, perhaps going via userspace > addresses. > > If we remove the memfd_secret check from the GUP path, and disable KVM’s > pvclock from userspace via KVM_CPUID_FEATURES, we are able to boot a > simple Linux initrd using a Firecracker VMM modified to use > memfd_secret. > > We are also aware of ongoing work on guest_memfd. 
The current > implementation unmaps guest memory from VMM address space, but leaves it > in the kernel’s direct map. We’re not looking at unmapping from VMM > userspace yet; we still need guest RAM there for PV drivers like virtio > to continue to work. So KVM’s gmem doesn’t seem like the right solution? We (and by "we", I really mean the pKVM folks) are also working on allowing userspace to mmap() guest_memfd[*]. pKVM aside, the long term vision I have for guest_memfd is to be able to use it for non-CoCo VMs, precisely for the security and robustness benefits it can bring. What I am hoping to do with guest_memfd is get userspace to only map memory it needs, e.g. for emulated/synthetic devices, on-demand. I.e. to get to a state where guest memory is mapped only when it needs to be. More below. > With this in mind, what’s the best way to solve getting guest RAM out of > the direct map? Is memfd_secret integration with KVM the way to go, or > should we build a solution on top of guest_memfd, for example via some > flag that causes it to leave memory in the host userspace’s page tables, > but removes it from the direct map? 100% enhance guest_memfd. If you're willing to wait long enough, pKVM might even do all the work for you. :-) The killer feature of guest_memfd is that it allows the guest mappings to be a superset of the host userspace mappings. Most obviously, it allows mapping memory into the guest without first mapping the memory into the userspace page tables. More subtly, it also makes it easier (in theory) to do things like map the memory with 1GiB hugepages for the guest, but selectively map at 4KiB granularity in the host. Or map memory as RWX in the guest, but RO in the host (I don't have a concrete use case for this, just pointing out it'll be trivial to do once guest_memfd supports mmap()).
Every attempt to allow mapping VMA-based memory into a guest without it being accessible by host userspace failed; it's literally why we ended up implementing guest_memfd. We could teach KVM to do the same with memfd_secret, but we'd just end up re-implementing guest_memfd. memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite sure you'll be fighting memfd_secret all the way. E.g. it's not dumpable, it deliberately allocates at 4KiB granularity (though I suspect the bug you found means that it can be inadvertently mapped with 2MiB hugepages), it has no line of sight to taking userspace out of the equation, etc. With guest_memfd on the other hand, everyone contributing to and maintaining it has goals that are *very* closely aligned with what you want to do. [*] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com ^ permalink raw reply [flat|nested] 21+ messages in thread
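As a sketch of how different this model is from a VMA-backed memslot: with the guest_memfd uAPI that later landed in mainline (KVM_CREATE_GUEST_MEMFD plus KVM_SET_USER_MEMORY_REGION2), a VMM can hand memory to the guest without any host userspace mapping at all. This is illustrative only, with error handling omitted and made-up parameters:

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* vm_fd is an existing KVM VM fd; gpa and size are illustrative. */
static int bind_gmem_slot(int vm_fd, uint64_t gpa, uint64_t size)
{
	struct kvm_create_guest_memfd gmem = { .size = size };
	int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = 0,	/* no host userspace mapping needed */
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = 0,
	};
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}
```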
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 23:22 ` Sean Christopherson @ 2024-03-09 11:14 ` Mike Rapoport 2024-05-13 10:31 ` Patrick Roy 2024-03-14 21:45 ` Manwaring, Derek 1 sibling, 1 reply; 21+ messages in thread From: Mike Rapoport @ 2024-03-09 11:14 UTC (permalink / raw) To: Sean Christopherson Cc: James Gowans, akpm, Patrick Roy, chao.p.peng, Derek Manwaring, pbonzini, David Woodhouse, Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm, linux-coco On Fri, Mar 08, 2024 at 03:22:50PM -0800, Sean Christopherson wrote: > On Fri, Mar 08, 2024, James Gowans wrote: > > However, memfd_secret doesn’t work out the box for KVM guest memory; the > > main reason seems to be that the GUP path is intentionally disabled for > > memfd_secret, so if we use a memfd_secret backed VMA for a memslot then > > KVM is not able to fault the memory in. If it’s been pre-faulted in by > > userspace then it seems to work. > > Huh, that _shouldn't_ work. The folio_is_secretmem() in gup_pte_range() is > supposed to prevent the "fast gup" path from getting secretmem pages. I suspect this works because KVM only calls gup on faults and if the memory was pre-faulted via memfd_secret there won't be faults and no gups from KVM. > > With this in mind, what’s the best way to solve getting guest RAM out of > > the direct map? Is memfd_secret integration with KVM the way to go, or > > should we build a solution on top of guest_memfd, for example via some > > flag that causes it to leave memory in the host userspace’s page tables, > > but removes it from the direct map? > > memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite > sure you'll be fighting memfd_secret all the way. E.g. 
it's not dumpable, it > deliberately allocates at 4KiB granularity (though I suspect the bug you found > means that it can be inadvertantly mapped with 2MiB hugepages), it has no line > of sight to taking userspace out of the equation, etc. > > With guest_memfd on the other hand, everyone contributing to and maintaining it > has goals that are *very* closely aligned with what you want to do. I agree with Sean, guest_memfd seems a better interface to use. It's integrated by design with KVM and removing guest memory from the direct map looks like a natural enhancement to guest_memfd. Unless I'm missing something, for a fast-and-dirty POC it'll be a one-liner that adds set_memory_np() to kvm_gmem_get_folio() and then figuring out what to do with virtio :) -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 21+ messages in thread
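Spelled out, the one-liner PoC Mike describes would look something like the fragment below, inside KVM's guest_memfd code. This is an untested sketch, not a real patch: as noted in the mail, virtio is left unsolved, and a real change would also have to restore the direct map entries before the pages go back to the page allocator.

```c
/* virt/kvm/guest_memfd.c: PoC sketch, inside kvm_gmem_get_folio() */
#include <asm/set_memory.h>

	folio = filemap_grab_folio(inode->i_mapping, index);
	/* ... existing allocation / clear-on-first-use handling ... */

	/* PoC: yank the freshly allocated folio out of the direct map. */
	if (!IS_ERR_OR_NULL(folio))
		set_memory_np((unsigned long)folio_address(folio),
			      folio_nr_pages(folio));
```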
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-09 11:14 ` Mike Rapoport @ 2024-05-13 10:31 ` Patrick Roy 2024-05-13 15:39 ` Sean Christopherson 0 siblings, 1 reply; 21+ messages in thread From: Patrick Roy @ 2024-05-13 10:31 UTC (permalink / raw) To: Mike Rapoport, Sean Christopherson Cc: James Gowans, akpm, chao.p.peng, Derek Manwaring, pbonzini, David Woodhouse, Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm, linux-coco Hi all, On 3/9/24 11:14, Mike Rapoport wrote: >>> >>> With this in mind, what’s the best way to solve getting guest RAM out of >>> >>> the direct map? Is memfd_secret integration with KVM the way to go, or >>> >>> should we build a solution on top of guest_memfd, for example via some >>> >>> flag that causes it to leave memory in the host userspace’s page tables, >>> >>> but removes it from the direct map? >> >> memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite >> >> sure you'll be fighting memfd_secret all the way. E.g. it's not dumpable, it >> >> deliberately allocates at 4KiB granularity (though I suspect the bug you found >> >> means that it can be inadvertantly mapped with 2MiB hugepages), it has no line >> >> of sight to taking userspace out of the equation, etc. >> >> >> >> With guest_memfd on the other hand, everyone contributing to and maintaining it >> >> has goals that are *very* closely aligned with what you want to do. > > I agree with Sean, guest_memfd seems a better interface to use. It's > > integrated by design with KVM and removing guest memory from the direct map > > looks like a natural enhancement to guest_memfd. > > > > Unless I'm missing something, for fast-and-dirty POC it'll be a oneliner > > that adds set_memory_np() to kvm_gmem_get_folio() and then figuring out > > what to do with virtio :) We’ve been playing around with extending guest_memfd to remove guest memory from the direct map. 
The direct map removal aspect is indeed fairly straightforward; since we cannot mmap guest_memfd, we don’t need to worry about folios without direct map entries getting to places where they will cause kernel panics. However, we ran into problems running non-CoCo VMs with guest_memfd for guest memory, independent of direct map entries being available or not. There’s a handful of places where a traditional KVM / Userspace setup currently touches guest memory:

* Loading the Guest Kernel into guest-owned memory
* Instruction fetch from arbitrary guest addresses and guest page table walks for MMIO emulation (for example for IOAPIC accesses)
* kvm-clock
* I/O devices

With guest_memfd, if the guest is running from guest-private memory, these need to be rethought, since now the memory is unavailable to userspace, and KVM is not enlightened about guest_memfd’s existence everywhere (when I was experimenting with this, it generally read garbage data from the shared VMA, but I think I’ve since seen some patches floating around that would make it return -EFAULT instead). CoCo VMs have various methods for working around these: You load a guest kernel using some “populate on first access” mechanism [1], kvm-clock and I/O are solved by having the guest mark the relevant address ranges as “shared” ahead of time [2] and bounce buffering via swiotlb [4], and Intel TDX solves the instruction emulation problem for MMIO by injecting a #VE and having the guest do the emulation itself [3]. For non-CoCo VMs, where memory is not encrypted, and the threat model assumes a trusted host userspace, we would like to avoid changing the VM model so completely. If we adopt CoCo’s approaches where KVM / Userspace touches guest memory we would get all the complexity, yet none of the encryption.
Particularly the complexity on the MMIO path seems nasty, but x86 does not pre-decode instructions on MMIO exits (which are just EPT_VIOLATIONs) like it does for PIO exits, so I also don’t really see a way around it in the guest_memfd model. We’ve played around a lot with allowing userspace mappings of guest_memfd, and then having KVM internally access guest_memfd via userspace page tables (and came up with multiple hacky ways to boot simple Linux initrds from guest_memfd), but this is fairly awkward for two reasons:

1. Now lots of codepaths in KVM end up accessing guest_memfd, which from my understanding goes against the guest_memfd goal of making machine checks because of incorrect accesses to TDX memory impossible, and
2. We need to somehow get a userspace mapping of guest_memfd into KVM (a hacky way I could make this work was setting up kvm_userspace_memory_region2 with userspace_addr set to a mmap of guest_memory, which actually "works" for everything but kvm-clock, but I also realized later that this is just memfd_secret with extra steps).

We also played around with having KVM access guest_memfd through the direct map (by temporarily reinserting pages into it when needed), but this again means lots of KVM code learns about how to access guest RAM via guest_memfd. There are a few other features we need to support, such as serving page faults using UFFD, which we are not too sure how to realize with guest_memfd since UFFD is VMA based (although to me some sort of “UFFD-for-FD” sounds like something that’d be useful even outside of our guest_memfd use case). With these challenges in mind, some variant of memfd_secret continues to look attractive for the non-CoCo case. Perhaps a variant that supports in-kernel faults and provides some way for gfn_to_pfn_cache users like kvm-clock to restore the direct map entries. Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs.
Do you have some thoughts about how to make the above cases work in the guest_memfd context? > > -- > > Sincerely yours, > > Mike. Best, Patrick [1]: https://lore.kernel.org/kvm/20240404185034.3184582-1-pbonzini@redhat.com/T/#m4cc08ce3142a313d96951c2b1286eb290c7d1dac [2]: https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/kvmclock.c#L227 [3]: https://www.kernel.org/doc/html/next/x86/tdx.html#mmio-handling [4]: https://www.kernel.org/doc/html/next/x86/tdx.html#shared-memory-conversions ^ permalink raw reply [flat|nested] 21+ messages in thread
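For reference, the "hacky way" from point 2 above amounts to something like the following. This is illustrative only: it assumes a kernel hacked to allow mmap() of guest_memfd (which mainline did not support at the time), and, as the mail says, it is essentially memfd_secret with extra steps.

```c
/* Map guest_memfd into userspace and point the memslot's
 * userspace_addr at that mapping, so KVM's existing HVA-based
 * paths keep working; vm_fd, gpa and size are illustrative. */
struct kvm_create_guest_memfd gmem = { .size = size };
int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 gmem_fd, 0);

struct kvm_userspace_memory_region2 region = {
	.slot = 0,
	.flags = KVM_MEM_GUEST_MEMFD,
	.guest_phys_addr = gpa,
	.memory_size = size,
	.userspace_addr = (uint64_t)hva,
	.guest_memfd = gmem_fd,
	.guest_memfd_offset = 0,
};
ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
```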
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-05-13 10:31 ` Patrick Roy @ 2024-05-13 15:39 ` Sean Christopherson 2024-05-13 16:01 ` Gowans, James 0 siblings, 1 reply; 21+ messages in thread From: Sean Christopherson @ 2024-05-13 15:39 UTC (permalink / raw) To: Patrick Roy Cc: Mike Rapoport, James Gowans, akpm, chao.p.peng, Derek Manwaring, pbonzini, David Woodhouse, Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm, linux-coco On Mon, May 13, 2024, Patrick Roy wrote: > For non-CoCo VMs, where memory is not encrypted, and the threat model assumes a > trusted host userspace, we would like to avoid changing the VM model so > completely. If we adopt CoCo’s approaches where KVM / Userspace touches guest > memory we would get all the complexity, yet none of the encryption. > Particularly the complexity on the MMIO path seems nasty, but x86 does not Uber nit, modern AMD CPUs do provide the byte stream, though there is at least one related erratum. Intel CPUs don't provide the byte stream or pre-decode in any way. > pre-decode instructions on MMIO exits (which are just EPT_VIOLATIONs) like it > does for PIO exits, so I also don’t really see a way around it in the > guest_memfd model. ... > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs. > Do you have some thoughts about how to make the above cases work in the > guest_memfd context? Yes. The hand-wavy plan is to allow selectively mmap()ing guest_memfd(). There is a long thread[*] discussing how exactly we want to do that. The TL;DR is that the basic functionality is also straightforward; the bulk of the discussion is around gup(), reclaim, page migration, etc. [*] https://lore.kernel.org/all/ZdfoR3nCEP3HTtm1@casper.infradead.org ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-05-13 15:39 ` Sean Christopherson @ 2024-05-13 16:01 ` Gowans, James 2024-05-13 17:09 ` Sean Christopherson 0 siblings, 1 reply; 21+ messages in thread From: Gowans, James @ 2024-05-13 16:01 UTC (permalink / raw) To: seanjc, Roy, Patrick Cc: kvm, Kalyazin, Nikita, qemu-devel, rppt, linux-coco, somlo, vbabka, akpm, Liam.Howlett, kirill.shutemov, Woodhouse, David, pbonzini, linux-mm, Graf (AWS), Alexander, Manwaring, Derek, chao.p.peng, lstoakes, mst On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote: > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs. > > Do you have some thoughts about how to make the above cases work in the > > guest_memfd context? > > Yes. The hand-wavy plan is to allow selectively mmap()ing guest_memfd(). There > is a long thread[*] discussing how exactly we want to do that. The TL;DR is that > the basic functionality is also straightforward; the bulk of the discussion is > around gup(), reclaim, page migration, etc. I still need to read this long thread, but just a thought on the word "restricted" here: for MMIO the instruction can be anywhere and similarly the load/store MMIO data can be anywhere. Does this mean that for running unmodified non-CoCo VMs with guest_memfd backend that we'll always need to have the whole of guest memory mmapped? I guess the idea is that this use case will still be subject to the normal restriction rules, but for a non-CoCo non-pKVM VM there will be no restriction in practice, and userspace will need to mmap everything always? It really seems yucky to need to have all of guest RAM mmapped all the time just for MMIO to work... But I suppose there is no way around that for Intel x86. JG > > [*] https://lore.kernel.org/all/ZdfoR3nCEP3HTtm1@casper.infradead.org ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-05-13 16:01 ` Gowans, James @ 2024-05-13 17:09 ` Sean Christopherson 2024-05-13 19:43 ` Gowans, James 0 siblings, 1 reply; 21+ messages in thread From: Sean Christopherson @ 2024-05-13 17:09 UTC (permalink / raw) To: James Gowans Cc: Patrick Roy, kvm, Nikita Kalyazin, qemu-devel, rppt, linux-coco, somlo, vbabka, akpm, Liam.Howlett, kirill.shutemov, David Woodhouse, pbonzini, linux-mm, Alexander Graf, Derek Manwaring, chao.p.peng, lstoakes, mst On Mon, May 13, 2024, James Gowans wrote: > On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote: > > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs. > > > Do you have some thoughts about how to make the above cases work in the > > > guest_memfd context? > > > > Yes. The hand-wavy plan is to allow selectively mmap()ing guest_memfd(). There > > is a long thread[*] discussing how exactly we want to do that. The TL;DR is that > > the basic functionality is also straightforward; the bulk of the discussion is > > around gup(), reclaim, page migration, etc. > > I still need to read this long thread, but just a thought on the word > "restricted" here: for MMIO the instruction can be anywhere and > similarly the load/store MMIO data can be anywhere. Does this mean that > for running unmodified non-CoCo VMs with guest_memfd backend that we'll > always need to have the whole of guest memory mmapped? Not necessarily, e.g. KVM could re-establish the direct map or mremap() on-demand. There are variations on that, e.g. if ASI[*] were to ever make its way upstream, which is a huge if, then we could have guest_memfd mapped into a KVM-only CR3.
> > It really seems yucky to need to have all of guest RAM mmapped all the > time just for MMIO to work... But I suppose there is no way around that > for Intel x86. It's not just MMIO. Nested virtualization, and more specifically shadowing nested TDP, is also problematic (probably more so than MMIO). And there are more cases, i.e. we'll need a generic solution for this. As above, there are a variety of options, it's largely just a matter of doing the work. I'm not saying it's a trivial amount of work/effort, but it's far from an unsolvable problem. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-05-13 17:09 ` Sean Christopherson @ 2024-05-13 19:43 ` Gowans, James 2024-05-13 20:36 ` Sean Christopherson 0 siblings, 1 reply; 21+ messages in thread From: Gowans, James @ 2024-05-13 19:43 UTC (permalink / raw) To: seanjc Cc: kvm, linux-coco, Kalyazin, Nikita, rppt, qemu-devel, Roy, Patrick, somlo, vbabka, akpm, kirill.shutemov, Liam.Howlett, Woodhouse, David, pbonzini, linux-mm, Graf (AWS), Alexander, Manwaring, Derek, chao.p.peng, lstoakes, mst On Mon, 2024-05-13 at 10:09 -0700, Sean Christopherson wrote: > On Mon, May 13, 2024, James Gowans wrote: > > On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote: > > > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs. > > > > Do you have some thoughts about how to make the above cases work in the > > > > guest_memfd context? > > > > > > Yes. The hand-wavy plan is to allow selectively mmap()ing guest_memfd(). There > > > is a long thread[*] discussing how exactly we want to do that. The TL;DR is that > > > the basic functionality is also straightforward; the bulk of the discussion is > > > around gup(), reclaim, page migration, etc. > > > > I still need to read this long thread, but just a thought on the word > > "restricted" here: for MMIO the instruction can be anywhere and > > similarly the load/store MMIO data can be anywhere. Does this mean that > > for running unmodified non-CoCo VMs with guest_memfd backend that we'll > > always need to have the whole of guest memory mmapped? > > Not necessarily, e.g. KVM could re-establish the direct map or mremap() on-demand. > There are variations on that, e.g. if ASI[*] were to ever make its way upstream, > which is a huge if, then we could have guest_memfd mapped into a KVM-only CR3. Yes, on-demand mapping in of guest RAM pages is definitely an option. 
It sounds quite challenging to need to always go via interfaces which demand map/fault memory, and also potentially quite slow needing to unmap and flush afterwards. Not too sure what you have in mind with "guest_memfd mapped into KVM-only CR3" - could you expand? > > I guess the idea is that this use case will still be subject to the > > normal restriction rules, but for a non-CoCo non-pKVM VM there will be > > no restriction in practice, and userspace will need to mmap everything > > always? > > > > It really seems yucky to need to have all of guest RAM mmapped all the > > time just for MMIO to work... But I suppose there is no way around that > > for Intel x86. > > It's not just MMIO. Nested virtualization, and more specifically shadowing nested > TDP, is also problematic (probably more so than MMIO). And there are more cases, > i.e. we'll need a generic solution for this. As above, there are a variety of > options, it's largely just a matter of doing the work. I'm not saying it's a > trivial amount of work/effort, but it's far from an unsolvable problem. I didn't even think of nested virt, but that will absolutely be an even bigger problem too. MMIO was just the first roadblock which illustrated the problem. Overall what I'm trying to figure out is whether there is any sane path here other than needing to mmap all guest RAM all the time. Trying to get nested virt and MMIO and whatever else needs access to guest RAM working by doing just-in-time (aka: on-demand) mappings and unmappings of guest RAM sounds like a painful game of whack-a-mole, potentially really bad for performance too. Do you think we should look at doing this on-demand mapping, or, for now, simply require that all guest RAM is mmapped all the time and KVM be given a valid virtual addr for the memslots? Note that I'm specifically referring to regular non-CoCo non-enlightened VMs here. For CoCo we definitely need all the cooperative MMIO and sharing. 
What we're trying to do here is to get guest RAM out of the direct map using guest_memfd, and now tackling the knock-on problem of whether or not to mmap all of guest RAM all the time in userspace. JG ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-05-13 19:43 ` Gowans, James @ 2024-05-13 20:36 ` Sean Christopherson 2024-05-13 22:01 ` Manwaring, Derek 0 siblings, 1 reply; 21+ messages in thread From: Sean Christopherson @ 2024-05-13 20:36 UTC (permalink / raw) To: James Gowans Cc: kvm, linux-coco, Nikita Kalyazin, rppt, qemu-devel, Patrick Roy, somlo, vbabka, akpm, kirill.shutemov, Liam.Howlett, David Woodhouse, pbonzini, linux-mm, Alexander Graf, Derek Manwaring, chao.p.peng, lstoakes, mst On Mon, May 13, 2024, James Gowans wrote: > On Mon, 2024-05-13 at 10:09 -0700, Sean Christopherson wrote: > > On Mon, May 13, 2024, James Gowans wrote: > > > On Mon, 2024-05-13 at 08:39 -0700, Sean Christopherson wrote: > > > > > Sean, you mentioned that you envision guest_memfd also supporting non-CoCo VMs. > > > > > Do you have some thoughts about how to make the above cases work in the > > > > > guest_memfd context? > > > > > > > > Yes. The hand-wavy plan is to allow selectively mmap()ing guest_memfd(). There > > > > is a long thread[*] discussing how exactly we want to do that. The TL;DR is that > > > > the basic functionality is also straightforward; the bulk of the discussion is > > > > around gup(), reclaim, page migration, etc. > > > > > > I still need to read this long thread, but just a thought on the word > > > "restricted" here: for MMIO the instruction can be anywhere and > > > similarly the load/store MMIO data can be anywhere. Does this mean that > > > for running unmodified non-CoCo VMs with guest_memfd backend that we'll > > > always need to have the whole of guest memory mmapped? > > > > Not necessarily, e.g. KVM could re-establish the direct map or mremap() on-demand. > > There are variations on that, e.g. if ASI[*] were to ever make its way upstream, > > which is a huge if, then we could have guest_memfd mapped into a KVM-only CR3. > > Yes, on-demand mapping in of guest RAM pages is definitely an option. 
> It sounds quite challenging to need to always go via interfaces which > demand map/fault memory, and also potentially quite slow needing to > unmap and flush afterwards. > > Not too sure what you have in mind with "guest_memfd mapped into KVM-only CR3" - could you expand? Remove guest_memfd from the kernel's direct map, e.g. so that the kernel at-large can't touch guest memory, but have a separate set of page tables that have the direct map, userspace page tables, _and_ kernel mappings for guest_memfd. On KVM_RUN (or vcpu_load()?), switch to KVM's CR3 so that KVM's map/unmap operations are always free (literal nops). That's an imperfect solution as IRQs and NMIs will run kernel code with KVM's page tables, i.e. guest memory would still be exposed to the host kernel. And of course we'd need to get buy-in from multiple architectures and maintainers, etc. > > > I guess the idea is that this use case will still be subject to the > > > normal restriction rules, but for a non-CoCo non-pKVM VM there will be > > > no restriction in practice, and userspace will need to mmap everything > > > always? > > > > > > It really seems yucky to need to have all of guest RAM mmapped all the > > > time just for MMIO to work... But I suppose there is no way around that > > > for Intel x86. > > > > It's not just MMIO. Nested virtualization, and more specifically shadowing nested > > TDP, is also problematic (probably more so than MMIO). And there are more cases, > > i.e. we'll need a generic solution for this. As above, there are a variety of > > options, it's largely just a matter of doing the work. I'm not saying it's a > > trivial amount of work/effort, but it's far from an unsolvable problem. > > I didn't even think of nested virt, but that will absolutely be an even > bigger problem too. MMIO was just the first roadblock which illustrated > the problem. > Overall what I'm trying to figure out is whether there is any sane path > here other than needing to mmap all guest RAM all the time. 
> Trying to get nested virt and MMIO and whatever else needs access to guest RAM > working by doing just-in-time (aka: on-demand) mappings and unmappings > of guest RAM sounds like a painful game of whack-a-mole, potentially > really bad for performance too. It's a whack-a-mole game that KVM already plays, e.g. for dirty tracking, post-copy demand paging, etc. There is still plenty of room for improvement, e.g. to reduce the number of touchpoints and thus the potential for missed cases. But KVM more or less needs to solve this basic problem no matter what, so I don't think that guest_memfd adds much, if any, burden. > Do you think we should look at doing this on-demand mapping, or, for > now, simply require that all guest RAM is mmapped all the time and KVM > be given a valid virtual addr for the memslots? I don't think "map everything into userspace" is a viable approach, precisely because it requires reflecting that back into KVM's memslots, which in turn means guest_memfd needs to allow gup(). And I don't think we want to allow gup(), because that opens a rather large can of worms (see the long thread I linked). Hmm, a slightly crazy idea (ok, maybe wildly crazy) would be to support mapping all of guest_memfd into kernel address space, but as USER=1 mappings. I.e. don't require a carve-out from userspace, but do require CLAC/STAC when accessing guest memory from the kernel. I think/hope that would provide the speculative execution mitigation properties you're looking for? Userspace would still have access to guest memory, but it would take a truly malicious userspace for that to matter. And when CPUs that support LASS come along, userspace would be completely unable to access guest memory through KVM's magic mapping. This too would require a decent amount of buy-in from outside of KVM, e.g. to carve out the virtual address range in the kernel. But the performance overhead would be identical to the status quo. 
And there could be advantages to being able to identify accesses to guest memory based purely on kernel virtual address. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-05-13 20:36 ` Sean Christopherson @ 2024-05-13 22:01 ` Manwaring, Derek 0 siblings, 0 replies; 21+ messages in thread From: Manwaring, Derek @ 2024-05-13 22:01 UTC (permalink / raw) To: Sean Christopherson, James Gowans Cc: kvm, linux-coco, Nikita Kalyazin, rppt, qemu-devel, Patrick Roy, somlo, vbabka, akpm, kirill.shutemov, Liam.Howlett, David Woodhouse, pbonzini, linux-mm, Alexander Graf, chao.p.peng, lstoakes, mst, Moritz Lipp, Claudio Canella On 2024-05-13 13:36-0700, Sean Christopherson wrote: > Hmm, a slightly crazy idea (ok, maybe wildly crazy) would be to support mapping > all of guest_memfd into kernel address space, but as USER=1 mappings. I.e. don't > require a carve-out from userspace, but do require CLAC/STAC when access guest > memory from the kernel. I think/hope that would provide the speculative execution > mitigation properties you're looking for? This is interesting. I'm hesitant to rely on SMAP since it can be enforced too late by the microarchitecture. But Canella, et al. [1] did say in 2019 that the kernel->user access route seemed to be free of any "Meltdown" effects. LASS sounds like it will be even stronger, though it's not clear to me from Intel's programming reference that speculative scenarios are in scope [2]. AMD does list SMAP specifically as a feature that can control speculation [3]. I don't see an equivalent read-access control on ARM. It has PXN for execute. Read access can probably also be controlled? But I think for the non-CoCo case we should favor solutions that are less dependent on hardware-specific protections. Derek [1] https://www.usenix.org/system/files/sec19-canella.pdf [2] https://cdrdv2.intel.com/v1/dl/getContent/671368 [3] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/software-techniques-for-managing-speculation.pdf ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 23:22 ` Sean Christopherson 2024-03-09 11:14 ` Mike Rapoport @ 2024-03-14 21:45 ` Manwaring, Derek 1 sibling, 0 replies; 21+ messages in thread From: Manwaring, Derek @ 2024-03-14 21:45 UTC (permalink / raw) To: Sean Christopherson, James Gowans Cc: akpm, Patrick Roy, chao.p.peng, rppt, pbonzini, David Woodhouse, Nikita Kalyazin, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Alexander Graf, kvm, linux-coco, xmarcalx, tabba, qperret, kvmarm On Fri, 8 Mar 2024 15:22:50 -0800, Sean Christopherson wrote: > On Fri, Mar 08, 2024, James Gowans wrote: > > We are also aware of ongoing work on guest_memfd. The current > > implementation unmaps guest memory from VMM address space, but leaves it > > in the kernel’s direct map. We’re not looking at unmapping from VMM > > userspace yet; we still need guest RAM there for PV drivers like virtio > > to continue to work. So KVM’s gmem doesn’t seem like the right solution? > > We (and by "we", I really mean the pKVM folks) are also working on allowing > userspace to mmap() guest_memfd[*]. pKVM aside, the long term vision I have for > guest_memfd is to be able to use it for non-CoCo VMs, precisely for the security > and robustness benefits it can bring. > > What I am hoping to do with guest_memfd is get userspace to only map memory it > needs, e.g. for emulated/synthetic devices, on-demand. I.e. to get to a state > where guest memory is mapped only when it needs to be. Thank you for the direction, this is super helpful. We are new to the guest_memfd space, and for simplicity we'd prefer to leave guest_memfd completely mapped in userspace. Even in the long term, we actually don't have any use for unmapping from host userspace. The current form of marking pages shared doesn't quite align with what we're trying to do either, since it also shares the pages with the host kernel. 
What are your thoughts on a flag for KVM_CREATE_GUEST_MEMFD that only removes from the host kernel's direct map, but leaves everything mapped in userspace? Derek ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James 2024-03-08 16:25 ` Brendan Jackman 2024-03-08 23:22 ` Sean Christopherson @ 2024-03-09 5:01 ` Matthew Wilcox 2 siblings, 0 replies; 21+ messages in thread From: Matthew Wilcox @ 2024-03-09 5:01 UTC (permalink / raw) To: Gowans, James Cc: seanjc, akpm, Roy, Patrick, chao.p.peng, Manwaring, Derek, rppt, pbonzini, Woodhouse, David, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco On Fri, Mar 08, 2024 at 03:50:05PM +0000, Gowans, James wrote: > Currently when using anonymous memory for KVM guest RAM, the memory all > remains mapped into the kernel direct map. We are looking at options to > get KVM guest memory out of the kernel’s direct map as a principled > approach to mitigating speculative execution issues in the host kernel. > Our goal is to more completely address the class of issues whose leak > origin is categorized as "Mapped memory" [1]. One of the things that is holding Linux back is the inability to do I/O to memory which is not part of memmap. _So Much_ of our infrastructure is based on having a struct page available to stick into an sglist, bio, skb_frag, or whatever. The solution to this is to move to a (phys_addr, length) tuple instead of (page, offset, len) tuple. I call this "phyr" and I've written about it before. I'm not working on this as I have quite enough to do with the folio work, but I hope somebody works on it before I get time to. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel
@ 2024-03-08 21:05 Manwaring, Derek
2024-03-11 9:26 ` Fuad Tabba
0 siblings, 1 reply; 21+ messages in thread
From: Manwaring, Derek @ 2024-03-08 21:05 UTC (permalink / raw)
To: David Woodhouse, David Matlack, Brendan Jackman, tabba, qperret,
jason.cj.chen
Cc: Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt,
pbonzini, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm,
qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS),
Alexander, kvm, linux-coco, kvmarm, kvmarm
On 2024-03-08 at 10:46-0700, David Woodhouse wrote:
> On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > I think what James is looking for (and what we are also interested
> > in), is _eliminating_ the ability to access guest memory from the
> > direct map entirely. And in general, eliminate the ability to access
> > guest memory in as many ways as possible.
>
> Well, pKVM does that...
Yes we've been looking at pKVM and it accomplishes a lot of what we're trying
to do. Our initial inclination is that we want to stick with VHE for the lower
overhead. We also want flexibility across server parts, so we would need to
get pKVM working on Intel & AMD if we went this route.
Certainly there are advantages of pKVM on the perf side, like the in-place
memory sharing rather than copying, as well as on the security side, by simply
reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs
memfd_secret or general ASI.
Derek
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-08 21:05 Manwaring, Derek @ 2024-03-11 9:26 ` Fuad Tabba 2024-03-11 9:29 ` Fuad Tabba 0 siblings, 1 reply; 21+ messages in thread From: Fuad Tabba @ 2024-03-11 9:26 UTC (permalink / raw) To: Manwaring, Derek Cc: David Woodhouse, David Matlack, Brendan Jackman, qperret, jason.cj.chen, Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt, pbonzini, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco, kvmarm, kvmarm Hi, On Fri, Mar 8, 2024 at 9:05 PM Manwaring, Derek <derekmn@amazon.com> wrote: > > On 2024-03-08 at 10:46-0700, David Woodhouse wrote: > > On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote: > > > I think what James is looking for (and what we are also interested > > > in), is _eliminating_ the ability to access guest memory from the > > > direct map entirely. And in general, eliminate the ability to access > > > guest memory in as many ways as possible. > > > > Well, pKVM does that... > > Yes we've been looking at pKVM and it accomplishes a lot of what we're trying > to do. Our initial inclination is that we want to stick with VHE for the lower > overhead. We also want flexibility across server parts, so we would need to > get pKVM working on Intel & AMD if we went this route. > > Certainly there are advantages of pKVM on the perf side like the in-place > memory sharing rather than copying as well as on the security side by simply > reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs > memfd_secret or general ASI. The work we've done for pKVM is still an RFC [*], but there is nothing in it that limits it to nVHE (at least not intentionally). It should work with VHE and hVHE as well. On respinning the patch series [*], we plan on adding support for normal VMs to use guest_memfd() as well in arm64, mainly for testing, and to make it easier for others to base their work on it. 
Cheers, /fuad [*] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com > > Derek > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Unmapping KVM Guest Memory from Host Kernel 2024-03-11 9:26 ` Fuad Tabba @ 2024-03-11 9:29 ` Fuad Tabba 0 siblings, 0 replies; 21+ messages in thread From: Fuad Tabba @ 2024-03-11 9:29 UTC (permalink / raw) To: Manwaring, Derek Cc: David Woodhouse, David Matlack, Brendan Jackman, qperret, jason.cj.chen, Gowans, James, seanjc, akpm, Roy, Patrick, chao.p.peng, rppt, pbonzini, Kalyazin, Nikita, lstoakes, Liam.Howlett, linux-mm, qemu-devel, kirill.shutemov, vbabka, mst, somlo, Graf (AWS), Alexander, kvm, linux-coco, kvmarm, kvmarm On Mon, Mar 11, 2024 at 9:26 AM Fuad Tabba <tabba@google.com> wrote: > > Hi, > > On Fri, Mar 8, 2024 at 9:05 PM Manwaring, Derek <derekmn@amazon.com> wrote: > > > > On 2024-03-08 at 10:46-0700, David Woodhouse wrote: > > > On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote: > > > > I think what James is looking for (and what we are also interested > > > > in), is _eliminating_ the ability to access guest memory from the > > > > direct map entirely. And in general, eliminate the ability to access > > > > guest memory in as many ways as possible. > > > > > > Well, pKVM does that... > > > > Yes we've been looking at pKVM and it accomplishes a lot of what we're trying > > to do. Our initial inclination is that we want to stick with VHE for the lower > > overhead. We also want flexibility across server parts, so we would need to > > get pKVM working on Intel & AMD if we went this route. > > > > Certainly there are advantages of pKVM on the perf side like the in-place > > memory sharing rather than copying as well as on the security side by simply > > reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs > > memfd_secret or general ASI. > > The work we've done for pKVM is still an RFC [*], but there is nothing > in it that limits it to nVHE (at least not intentionally). It should > work with VHE and hVHE as well. 
> On respinning the patch series [*], we plan on adding support for normal VMs to use guest_memfd() as well in > arm64, mainly for testing, and to make it easier for others to base > their work on it. Just to clarify, I am referring specifically to the work we did in porting guest_memfd() to pKVM/arm64. pKVM itself works only in nVHE mode. > > Cheers, > /fuad > > [*] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com > > > > Derek > > ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-05-13 22:02 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <AQHacXBJeX10YUH0O0SiQBg1zQLaEw==> 2024-03-08 15:50 ` Unmapping KVM Guest Memory from Host Kernel Gowans, James 2024-03-08 16:25 ` Brendan Jackman 2024-03-08 17:35 ` David Matlack 2024-03-08 17:45 ` David Woodhouse 2024-03-08 22:47 ` Sean Christopherson 2024-03-09 2:45 ` Manwaring, Derek 2024-03-18 14:11 ` Brendan Jackman 2024-03-08 23:22 ` Sean Christopherson 2024-03-09 11:14 ` Mike Rapoport 2024-05-13 10:31 ` Patrick Roy 2024-05-13 15:39 ` Sean Christopherson 2024-05-13 16:01 ` Gowans, James 2024-05-13 17:09 ` Sean Christopherson 2024-05-13 19:43 ` Gowans, James 2024-05-13 20:36 ` Sean Christopherson 2024-05-13 22:01 ` Manwaring, Derek 2024-03-14 21:45 ` Manwaring, Derek 2024-03-09 5:01 ` Matthew Wilcox 2024-03-08 21:05 Manwaring, Derek 2024-03-11 9:26 ` Fuad Tabba 2024-03-11 9:29 ` Fuad Tabba