All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: pbonzini@redhat.com, maz@kernel.org, oliver.upton@linux.dev,
	 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  willy@infradead.org,
	akpm@linux-foundation.org, paul@paul-moore.com,
	 jmorris@namei.org, serge@hallyn.com, kvm@vger.kernel.org,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org,  chao.p.peng@linux.intel.com,
	tabba@google.com, jarkko@kernel.org,  yu.c.zhang@linux.intel.com,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	 vbabka@suse.cz, david@redhat.com, qperret@google.com,
	michael.roth@amd.com,  wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Date: Tue, 15 Aug 2023 18:43:39 +0000	[thread overview]
Message-ID: <diqzh6p02lk4.fsf@ackerleytng-ctop.c.googlers.com> (raw)
In-Reply-To: <ZNKv9ul2I7A4V7IF@google.com> (message from Sean Christopherson on Tue, 8 Aug 2023 14:13:26 -0700)

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> I’d like to propose an alternative to the refcounting approach between
>> the gmem file and associated kvm, where we think of KVM’s memslots as
>> users of the gmem file.
>>
>> Instead of having the gmem file pin the VM (i.e. take a refcount on
>> kvm), we could let memslot take a refcount on the gmem file when the
>> memslots are configured.
>>
>> Here’s a POC patch that flips the refcounting (and modified selftests in
>> the next commit):
>> https://github.com/googleprodkernel/linux-cc/commit/7f487b029b89b9f3e9b094a721bc0772f3c8c797
>>
>> One side effect of having the gmem file pin the VM is that now the gmem
>> file becomes sort of a false handle on the VM:
>>
>> + Closing the file destroys the file pointers in the VM and invalidates
>>   the pointers
>
> Yeah, this is less than ideal.  But, it's also how things operate today.  KVM
> doesn't hold references to VMAs or files, e.g. if userspace munmap()s memory,
> any and all SPTEs pointing at the memory are zapped.  The only difference with
> gmem is that KVM needs to explicitly invalidate file pointers, instead of that
> happening behind the scenes (no more VMAs to find).  Again, I agree the resulting
> code is more complex than I would prefer, but from a userspace perspective I
> don't see this as problematic.
>
>> + Keeping the file open keeps the VM around in the kernel even though
>>   the VM fd may already be closed.
>
> That is perfectly ok.  There is plenty of prior art, as well as plenty of ways
> for userspace to shoot itself in the foot.  E.g. open a stats fd for a vCPU and
> the VM and all its vCPUs will be kept alive.  And conceptually it's sound,
> anything created in the scope of a VM _should_ pin the VM.
>

Thanks for explaining!

>> I feel that memslots form a natural way of managing usage of the gmem
>> file. When a memslot is created, it is using the file; hence we take a
>> refcount on the gmem file, and as memslots are removed, we drop
>> refcounts on the gmem file.
>
> Yes and no.  It's definitely more natural *if* the goal is to allow guest_memfd
> memory to exist without being attached to a VM.  But I'm not at all convinced
> that we want to allow that, or that it has desirable properties.  With TDX and
> SNP in particuarly, I'm pretty sure that allowing memory to outlive the VM is
> very underisable (more below).
>

This is a little confusing, with the file/inode split in gmem where the
physical memory/data is attached to the inode and the file represents
the VM's view of that memory, won't the memory outlive the VM?

This [1] POC was built based on that premise, that the gmem inode can be
linked to another file and handed off to another VM, to facilitate
intra-host migration, where the point is to save the work of rebuilding
the VM's memory in the destination VM.

With this, the bindings don't outlive the VM, but the data/memory
does. I think this split design you proposed is really nice.

>> The KVM pointer is shared among all the bindings in gmem’s xarray, and we can
>> enforce that a gmem file is used only with one VM:
>>
>> + When binding a memslot to the file, if a kvm pointer exists, it must
>>   be the same kvm as the one in this binding
>> + When the binding to the last memslot is removed from a file, NULL the
>>   kvm pointer.
>
> Nullifying the KVM pointer isn't sufficient, because without additional actions
> userspace could extract data from a VM by deleting its memslots and then binding
> the guest_memfd to an attacker controlled VM.  Or more likely with TDX and SNP,
> induce badness by coercing KVM into mapping memory into a guest with the wrong
> ASID/HKID.
>
> I can think of three ways to handle that:
>
>   (a) prevent a different VM from *ever* binding to the gmem instance
>   (b) free/zero physical pages when unbinding
>   (c) free/zero when binding to a different VM
>
> Option (a) is easy, but that pretty much defeats the purpose of decopuling
> guest_memfd from a VM.
>
> Option (b) isn't hard to implement, but it screws up the lifecycle of the memory,
> e.g. would require memory when a memslot is deleted.  That isn't necessarily a
> deal-breaker, but it runs counter to how KVM memlots currently operate.  Memslots
> are basically just weird page tables, e.g. deleting a memslot doesn't have any
> impact on the underlying data in memory.  TDX throws a wrench in this as removing
> a page from the Secure EPT is effectively destructive to the data (can't be mapped
> back in to the VM without zeroing the data), but IMO that's an oddity with TDX and
> not necessarily something we want to carry over to other VM types.
>
> There would also be performance implications (probably a non-issue in practice),
> and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. what
> should happen if the last memslot (binding) is deleted, but there outstanding userspace
> mappings?
>
> Option (c) is better from a lifecycle perspective, but it adds its own flavor of
> complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> (effectively the VM pointer), and so a deferred relcaim doesn't really work for
> TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must not
> outlive the VM; KVM can't reuse an ASID if there are pages assigned to that ASID
> in the RMP, i.e. until all memory belonging to the VM has been fully freed.
>

If we are on the same page that the memory should outlive the VM but not
the bindings, then associating the gmem inode to a new VM should be a
feature and not a bug.

What do we want to defend against here?

(a) Malicious host VMM

For a malicious host VMM to read guest memory (with TDX and SNP), it can
create a new VM with the same HKID/ASID as the victim VM, rebind the
gmem inode to a VM crafted with an image that dumps the memory.

I believe it is not possible for userspace to arbitrarily select a
matching HKID unless userspace uses the intra-host migration ioctls, but if the
migration ioctl is used, then EPTs are migrated and the memory dumper VM
can't successfully run a different image from the victim VM. If the
dumper VM needs to run the same image as the victim VM, then it would be
a successful migration rather than an attack. (Perhaps we need to clean
up some #MCs here but that can be a separate patch)

(b) Malicious host kernel

A malicious host kernel can allow a malicious host VMM to re-use a HKID
for the dumper VM, but this isn't something a better gmem design can
defend against.

(c) Attacks using gmem for software-protected VMs

Attacks using gmem for software-protected VMs are possible since there
is no real encryption with HKID/ASID (yet?). The selftest for [1]
actually uses this lack of encryption to test that the destination VM
can read the source VM's memory after the migration. In the POC [1], as
long as both destination VM knows where in the inode's memory to read,
it can read what it wants to. This is a problem for software-protected
VMs, but I feel that it is also a separate issue from gmem's design.

>> Could binding gmem files not on creation, but at memslot configuration
>> time be sufficient and simpler?
>
> After working through the flows, I think binding on-demand would simplify the
> refcounting (stating the obvious), but complicate the lifecycle of the memory as
> well as the contract between KVM and userspace,

If we are on the same page that the memory should outlive the VM but not
the bindings, does it still complicate the lifecycle of the memory and
the userspace/KVM contract? Could it just be a different contract?

> and would break the separation of
> concerns between the inode (physical memory / data) and file (VM's view / mappings).

Binding on-demand is orthogonal to the separation of concerns between
inode and file, because it can be built regardless of whether we do the
gmem file/inode split.

+ This flip-the-refcounting POC is built with the file/inode split and
+ In [2] (the delayed binding approach to solve intra-host migration), I
  also tried flipping the refcounting, and that without the gmem
  file/inode split. (Refcounting in [2] is buggy because the file can't
  take a refcount on KVM, but it would work without taking that refcount)

[1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
[2] https://github.com/googleprodkernel/linux-cc/commit/dd5ac5e53f14a1ef9915c9c1e4cc1006a40b49df

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)
From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: pbonzini@redhat.com, maz@kernel.org, oliver.upton@linux.dev,
	chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, willy@infradead.org,
	akpm@linux-foundation.org, paul@paul-moore.com,
	jmorris@namei.org, serge@hallyn.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org, chao.p.peng@linux.intel.com,
	tabba@google.com, jarkko@kernel.org, yu.c.zhang@linux.intel.com,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	vbabka@suse.cz, david@redhat.com, qperret@google.com,
	michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Date: Tue, 15 Aug 2023 18:43:39 +0000	[thread overview]
Message-ID: <diqzh6p02lk4.fsf@ackerleytng-ctop.c.googlers.com> (raw)
In-Reply-To: <ZNKv9ul2I7A4V7IF@google.com> (message from Sean Christopherson on Tue, 8 Aug 2023 14:13:26 -0700)

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> I’d like to propose an alternative to the refcounting approach between
>> the gmem file and associated kvm, where we think of KVM’s memslots as
>> users of the gmem file.
>>
>> Instead of having the gmem file pin the VM (i.e. take a refcount on
>> kvm), we could let memslot take a refcount on the gmem file when the
>> memslots are configured.
>>
>> Here’s a POC patch that flips the refcounting (and modified selftests in
>> the next commit):
>> https://github.com/googleprodkernel/linux-cc/commit/7f487b029b89b9f3e9b094a721bc0772f3c8c797
>>
>> One side effect of having the gmem file pin the VM is that now the gmem
>> file becomes sort of a false handle on the VM:
>>
>> + Closing the file destroys the file pointers in the VM and invalidates
>>   the pointers
>
> Yeah, this is less than ideal.  But, it's also how things operate today.  KVM
> doesn't hold references to VMAs or files, e.g. if userspace munmap()s memory,
> any and all SPTEs pointing at the memory are zapped.  The only difference with
> gmem is that KVM needs to explicitly invalidate file pointers, instead of that
> happening behind the scenes (no more VMAs to find).  Again, I agree the resulting
> code is more complex than I would prefer, but from a userspace perspective I
> don't see this as problematic.
>
>> + Keeping the file open keeps the VM around in the kernel even though
>>   the VM fd may already be closed.
>
> That is perfectly ok.  There is plenty of prior art, as well as plenty of ways
> for userspace to shoot itself in the foot.  E.g. open a stats fd for a vCPU and
> the VM and all its vCPUs will be kept alive.  And conceptually it's sound,
> anything created in the scope of a VM _should_ pin the VM.
>

Thanks for explaining!

>> I feel that memslots form a natural way of managing usage of the gmem
>> file. When a memslot is created, it is using the file; hence we take a
>> refcount on the gmem file, and as memslots are removed, we drop
>> refcounts on the gmem file.
>
> Yes and no.  It's definitely more natural *if* the goal is to allow guest_memfd
> memory to exist without being attached to a VM.  But I'm not at all convinced
> that we want to allow that, or that it has desirable properties.  With TDX and
> SNP in particuarly, I'm pretty sure that allowing memory to outlive the VM is
> very underisable (more below).
>

This is a little confusing, with the file/inode split in gmem where the
physical memory/data is attached to the inode and the file represents
the VM's view of that memory, won't the memory outlive the VM?

This [1] POC was built based on that premise, that the gmem inode can be
linked to another file and handed off to another VM, to facilitate
intra-host migration, where the point is to save the work of rebuilding
the VM's memory in the destination VM.

With this, the bindings don't outlive the VM, but the data/memory
does. I think this split design you proposed is really nice.

>> The KVM pointer is shared among all the bindings in gmem’s xarray, and we can
>> enforce that a gmem file is used only with one VM:
>>
>> + When binding a memslot to the file, if a kvm pointer exists, it must
>>   be the same kvm as the one in this binding
>> + When the binding to the last memslot is removed from a file, NULL the
>>   kvm pointer.
>
> Nullifying the KVM pointer isn't sufficient, because without additional actions
> userspace could extract data from a VM by deleting its memslots and then binding
> the guest_memfd to an attacker controlled VM.  Or more likely with TDX and SNP,
> induce badness by coercing KVM into mapping memory into a guest with the wrong
> ASID/HKID.
>
> I can think of three ways to handle that:
>
>   (a) prevent a different VM from *ever* binding to the gmem instance
>   (b) free/zero physical pages when unbinding
>   (c) free/zero when binding to a different VM
>
> Option (a) is easy, but that pretty much defeats the purpose of decopuling
> guest_memfd from a VM.
>
> Option (b) isn't hard to implement, but it screws up the lifecycle of the memory,
> e.g. would require memory when a memslot is deleted.  That isn't necessarily a
> deal-breaker, but it runs counter to how KVM memlots currently operate.  Memslots
> are basically just weird page tables, e.g. deleting a memslot doesn't have any
> impact on the underlying data in memory.  TDX throws a wrench in this as removing
> a page from the Secure EPT is effectively destructive to the data (can't be mapped
> back in to the VM without zeroing the data), but IMO that's an oddity with TDX and
> not necessarily something we want to carry over to other VM types.
>
> There would also be performance implications (probably a non-issue in practice),
> and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. what
> should happen if the last memslot (binding) is deleted, but there outstanding userspace
> mappings?
>
> Option (c) is better from a lifecycle perspective, but it adds its own flavor of
> complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> (effectively the VM pointer), and so a deferred relcaim doesn't really work for
> TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must not
> outlive the VM; KVM can't reuse an ASID if there are pages assigned to that ASID
> in the RMP, i.e. until all memory belonging to the VM has been fully freed.
>

If we are on the same page that the memory should outlive the VM but not
the bindings, then associating the gmem inode to a new VM should be a
feature and not a bug.

What do we want to defend against here?

(a) Malicious host VMM

For a malicious host VMM to read guest memory (with TDX and SNP), it can
create a new VM with the same HKID/ASID as the victim VM, rebind the
gmem inode to a VM crafted with an image that dumps the memory.

I believe it is not possible for userspace to arbitrarily select a
matching HKID unless userspace uses the intra-host migration ioctls, but if the
migration ioctl is used, then EPTs are migrated and the memory dumper VM
can't successfully run a different image from the victim VM. If the
dumper VM needs to run the same image as the victim VM, then it would be
a successful migration rather than an attack. (Perhaps we need to clean
up some #MCs here but that can be a separate patch)

(b) Malicious host kernel

A malicious host kernel can allow a malicious host VMM to re-use a HKID
for the dumper VM, but this isn't something a better gmem design can
defend against.

(c) Attacks using gmem for software-protected VMs

Attacks using gmem for software-protected VMs are possible since there
is no real encryption with HKID/ASID (yet?). The selftest for [1]
actually uses this lack of encryption to test that the destination VM
can read the source VM's memory after the migration. In the POC [1], as
long as both destination VM knows where in the inode's memory to read,
it can read what it wants to. This is a problem for software-protected
VMs, but I feel that it is also a separate issue from gmem's design.

>> Could binding gmem files not on creation, but at memslot configuration
>> time be sufficient and simpler?
>
> After working through the flows, I think binding on-demand would simplify the
> refcounting (stating the obvious), but complicate the lifecycle of the memory as
> well as the contract between KVM and userspace,

If we are on the same page that the memory should outlive the VM but not
the bindings, does it still complicate the lifecycle of the memory and
the userspace/KVM contract? Could it just be a different contract?

> and would break the separation of
> concerns between the inode (physical memory / data) and file (VM's view / mappings).

Binding on-demand is orthogonal to the separation of concerns between
inode and file, because it can be built regardless of whether we do the
gmem file/inode split.

+ This flip-the-refcounting POC is built with the file/inode split and
+ In [2] (the delayed binding approach to solve intra-host migration), I
  also tried flipping the refcounting, and that without the gmem
  file/inode split. (Refcounting in [2] is buggy because the file can't
  take a refcount on KVM, but it would work without taking that refcount)

[1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
[2] https://github.com/googleprodkernel/linux-cc/commit/dd5ac5e53f14a1ef9915c9c1e4cc1006a40b49df

WARNING: multiple messages have this Message-ID (diff)
From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, david@redhat.com,
	yu.c.zhang@linux.intel.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, chao.p.peng@linux.intel.com,
	linux-riscv@lists.infradead.org, isaku.yamahata@gmail.com,
	paul@paul-moore.com, maz@kernel.org, chenhuacai@kernel.org,
	jmorris@namei.org, willy@infradead.org, wei.w.wang@intel.com,
	tabba@google.com, jarkko@kernel.org, serge@hallyn.com,
	mail@maciej.szmigiero.name, aou@eecs.berkeley.edu,
	vbabka@suse.cz, michael.roth@amd.com, paul.walmsley@sifive.com,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	qperret@google.com, liam.merwick@oracle.com,
	linux-mips@vger.kernel.org, oliver.upton@linux.dev,
	linux-security-module@vger.kernel.org, palmer@dabbelt.com,
	kvm-riscv@lists.infradead.org, anup@brainfault.org,
	linux-fsdevel@vger.kernel.org, pbonzini@redhat.com,
	akpm@linux-foundation.org, vannapurve@google.com,
	linuxppc-dev@lists.ozlabs.org, kirill.shutemov@linux.intel.com
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Date: Tue, 15 Aug 2023 18:43:39 +0000	[thread overview]
Message-ID: <diqzh6p02lk4.fsf@ackerleytng-ctop.c.googlers.com> (raw)
In-Reply-To: <ZNKv9ul2I7A4V7IF@google.com> (message from Sean Christopherson on Tue, 8 Aug 2023 14:13:26 -0700)

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> I’d like to propose an alternative to the refcounting approach between
>> the gmem file and associated kvm, where we think of KVM’s memslots as
>> users of the gmem file.
>>
>> Instead of having the gmem file pin the VM (i.e. take a refcount on
>> kvm), we could let memslot take a refcount on the gmem file when the
>> memslots are configured.
>>
>> Here’s a POC patch that flips the refcounting (and modified selftests in
>> the next commit):
>> https://github.com/googleprodkernel/linux-cc/commit/7f487b029b89b9f3e9b094a721bc0772f3c8c797
>>
>> One side effect of having the gmem file pin the VM is that now the gmem
>> file becomes sort of a false handle on the VM:
>>
>> + Closing the file destroys the file pointers in the VM and invalidates
>>   the pointers
>
> Yeah, this is less than ideal.  But, it's also how things operate today.  KVM
> doesn't hold references to VMAs or files, e.g. if userspace munmap()s memory,
> any and all SPTEs pointing at the memory are zapped.  The only difference with
> gmem is that KVM needs to explicitly invalidate file pointers, instead of that
> happening behind the scenes (no more VMAs to find).  Again, I agree the resulting
> code is more complex than I would prefer, but from a userspace perspective I
> don't see this as problematic.
>
>> + Keeping the file open keeps the VM around in the kernel even though
>>   the VM fd may already be closed.
>
> That is perfectly ok.  There is plenty of prior art, as well as plenty of ways
> for userspace to shoot itself in the foot.  E.g. open a stats fd for a vCPU and
> the VM and all its vCPUs will be kept alive.  And conceptually it's sound,
> anything created in the scope of a VM _should_ pin the VM.
>

Thanks for explaining!

>> I feel that memslots form a natural way of managing usage of the gmem
>> file. When a memslot is created, it is using the file; hence we take a
>> refcount on the gmem file, and as memslots are removed, we drop
>> refcounts on the gmem file.
>
> Yes and no.  It's definitely more natural *if* the goal is to allow guest_memfd
> memory to exist without being attached to a VM.  But I'm not at all convinced
> that we want to allow that, or that it has desirable properties.  With TDX and
> SNP in particuarly, I'm pretty sure that allowing memory to outlive the VM is
> very underisable (more below).
>

This is a little confusing, with the file/inode split in gmem where the
physical memory/data is attached to the inode and the file represents
the VM's view of that memory, won't the memory outlive the VM?

This [1] POC was built based on that premise, that the gmem inode can be
linked to another file and handed off to another VM, to facilitate
intra-host migration, where the point is to save the work of rebuilding
the VM's memory in the destination VM.

With this, the bindings don't outlive the VM, but the data/memory
does. I think this split design you proposed is really nice.

>> The KVM pointer is shared among all the bindings in gmem’s xarray, and we can
>> enforce that a gmem file is used only with one VM:
>>
>> + When binding a memslot to the file, if a kvm pointer exists, it must
>>   be the same kvm as the one in this binding
>> + When the binding to the last memslot is removed from a file, NULL the
>>   kvm pointer.
>
> Nullifying the KVM pointer isn't sufficient, because without additional actions
> userspace could extract data from a VM by deleting its memslots and then binding
> the guest_memfd to an attacker controlled VM.  Or more likely with TDX and SNP,
> induce badness by coercing KVM into mapping memory into a guest with the wrong
> ASID/HKID.
>
> I can think of three ways to handle that:
>
>   (a) prevent a different VM from *ever* binding to the gmem instance
>   (b) free/zero physical pages when unbinding
>   (c) free/zero when binding to a different VM
>
> Option (a) is easy, but that pretty much defeats the purpose of decopuling
> guest_memfd from a VM.
>
> Option (b) isn't hard to implement, but it screws up the lifecycle of the memory,
> e.g. would require memory when a memslot is deleted.  That isn't necessarily a
> deal-breaker, but it runs counter to how KVM memlots currently operate.  Memslots
> are basically just weird page tables, e.g. deleting a memslot doesn't have any
> impact on the underlying data in memory.  TDX throws a wrench in this as removing
> a page from the Secure EPT is effectively destructive to the data (can't be mapped
> back in to the VM without zeroing the data), but IMO that's an oddity with TDX and
> not necessarily something we want to carry over to other VM types.
>
> There would also be performance implications (probably a non-issue in practice),
> and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. what
> should happen if the last memslot (binding) is deleted, but there outstanding userspace
> mappings?
>
> Option (c) is better from a lifecycle perspective, but it adds its own flavor of
> complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> (effectively the VM pointer), and so a deferred relcaim doesn't really work for
> TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must not
> outlive the VM; KVM can't reuse an ASID if there are pages assigned to that ASID
> in the RMP, i.e. until all memory belonging to the VM has been fully freed.
>

If we are on the same page that the memory should outlive the VM but not
the bindings, then associating the gmem inode to a new VM should be a
feature and not a bug.

What do we want to defend against here?

(a) Malicious host VMM

For a malicious host VMM to read guest memory (with TDX and SNP), it can
create a new VM with the same HKID/ASID as the victim VM, rebind the
gmem inode to a VM crafted with an image that dumps the memory.

I believe it is not possible for userspace to arbitrarily select a
matching HKID unless userspace uses the intra-host migration ioctls, but if the
migration ioctl is used, then EPTs are migrated and the memory dumper VM
can't successfully run a different image from the victim VM. If the
dumper VM needs to run the same image as the victim VM, then it would be
a successful migration rather than an attack. (Perhaps we need to clean
up some #MCs here but that can be a separate patch)

(b) Malicious host kernel

A malicious host kernel can allow a malicious host VMM to re-use a HKID
for the dumper VM, but this isn't something a better gmem design can
defend against.

(c) Attacks using gmem for software-protected VMs

Attacks using gmem for software-protected VMs are possible since there
is no real encryption with HKID/ASID (yet?). The selftest for [1]
actually uses this lack of encryption to test that the destination VM
can read the source VM's memory after the migration. In the POC [1], as
long as both destination VM knows where in the inode's memory to read,
it can read what it wants to. This is a problem for software-protected
VMs, but I feel that it is also a separate issue from gmem's design.

>> Could binding gmem files not on creation, but at memslot configuration
>> time be sufficient and simpler?
>
> After working through the flows, I think binding on-demand would simplify the
> refcounting (stating the obvious), but complicate the lifecycle of the memory as
> well as the contract between KVM and userspace,

If we are on the same page that the memory should outlive the VM but not
the bindings, does it still complicate the lifecycle of the memory and
the userspace/KVM contract? Could it just be a different contract?

> and would break the separation of
> concerns between the inode (physical memory / data) and file (VM's view / mappings).

Binding on-demand is orthogonal to the separation of concerns between
inode and file, because it can be built regardless of whether we do the
gmem file/inode split.

+ This flip-the-refcounting POC is built with the file/inode split and
+ In [2] (the delayed binding approach to solve intra-host migration), I
  also tried flipping the refcounting, and that without the gmem
  file/inode split. (Refcounting in [2] is buggy because the file can't
  take a refcount on KVM, but it would work without taking that refcount)

[1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
[2] https://github.com/googleprodkernel/linux-cc/commit/dd5ac5e53f14a1ef9915c9c1e4cc1006a40b49df

WARNING: multiple messages have this Message-ID (diff)
From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: pbonzini@redhat.com, maz@kernel.org, oliver.upton@linux.dev,
	 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  willy@infradead.org,
	akpm@linux-foundation.org, paul@paul-moore.com,
	 jmorris@namei.org, serge@hallyn.com, kvm@vger.kernel.org,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org,  chao.p.peng@linux.intel.com,
	tabba@google.com, jarkko@kernel.org,  yu.c.zhang@linux.intel.com,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	 vbabka@suse.cz, david@redhat.com, qperret@google.com,
	michael.roth@amd.com,  wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Date: Tue, 15 Aug 2023 18:43:39 +0000	[thread overview]
Message-ID: <diqzh6p02lk4.fsf@ackerleytng-ctop.c.googlers.com> (raw)
In-Reply-To: <ZNKv9ul2I7A4V7IF@google.com> (message from Sean Christopherson on Tue, 8 Aug 2023 14:13:26 -0700)

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> I’d like to propose an alternative to the refcounting approach between
>> the gmem file and associated kvm, where we think of KVM’s memslots as
>> users of the gmem file.
>>
>> Instead of having the gmem file pin the VM (i.e. take a refcount on
>> kvm), we could let memslot take a refcount on the gmem file when the
>> memslots are configured.
>>
>> Here’s a POC patch that flips the refcounting (and modified selftests in
>> the next commit):
>> https://github.com/googleprodkernel/linux-cc/commit/7f487b029b89b9f3e9b094a721bc0772f3c8c797
>>
>> One side effect of having the gmem file pin the VM is that now the gmem
>> file becomes sort of a false handle on the VM:
>>
>> + Closing the file destroys the file pointers in the VM and invalidates
>>   the pointers
>
> Yeah, this is less than ideal.  But, it's also how things operate today.  KVM
> doesn't hold references to VMAs or files, e.g. if userspace munmap()s memory,
> any and all SPTEs pointing at the memory are zapped.  The only difference with
> gmem is that KVM needs to explicitly invalidate file pointers, instead of that
> happening behind the scenes (no more VMAs to find).  Again, I agree the resulting
> code is more complex than I would prefer, but from a userspace perspective I
> don't see this as problematic.
>
>> + Keeping the file open keeps the VM around in the kernel even though
>>   the VM fd may already be closed.
>
> That is perfectly ok.  There is plenty of prior art, as well as plenty of ways
> for userspace to shoot itself in the foot.  E.g. open a stats fd for a vCPU and
> the VM and all its vCPUs will be kept alive.  And conceptually it's sound,
> anything created in the scope of a VM _should_ pin the VM.
>

Thanks for explaining!

>> I feel that memslots form a natural way of managing usage of the gmem
>> file. When a memslot is created, it is using the file; hence we take a
>> refcount on the gmem file, and as memslots are removed, we drop
>> refcounts on the gmem file.
>
> Yes and no.  It's definitely more natural *if* the goal is to allow guest_memfd
> memory to exist without being attached to a VM.  But I'm not at all convinced
> that we want to allow that, or that it has desirable properties.  With TDX and
> SNP in particuarly, I'm pretty sure that allowing memory to outlive the VM is
> very underisable (more below).
>

This is a little confusing, with the file/inode split in gmem where the
physical memory/data is attached to the inode and the file represents
the VM's view of that memory, won't the memory outlive the VM?

This [1] POC was built based on that premise, that the gmem inode can be
linked to another file and handed off to another VM, to facilitate
intra-host migration, where the point is to save the work of rebuilding
the VM's memory in the destination VM.

With this, the bindings don't outlive the VM, but the data/memory
does. I think this split design you proposed is really nice.

>> The KVM pointer is shared among all the bindings in gmem’s xarray, and we can
>> enforce that a gmem file is used only with one VM:
>>
>> + When binding a memslot to the file, if a kvm pointer exists, it must
>>   be the same kvm as the one in this binding
>> + When the binding to the last memslot is removed from a file, NULL the
>>   kvm pointer.
>
> Nullifying the KVM pointer isn't sufficient, because without additional actions
> userspace could extract data from a VM by deleting its memslots and then binding
> the guest_memfd to an attacker controlled VM.  Or more likely with TDX and SNP,
> induce badness by coercing KVM into mapping memory into a guest with the wrong
> ASID/HKID.
>
> I can think of three ways to handle that:
>
>   (a) prevent a different VM from *ever* binding to the gmem instance
>   (b) free/zero physical pages when unbinding
>   (c) free/zero when binding to a different VM
>
> Option (a) is easy, but that pretty much defeats the purpose of decopuling
> guest_memfd from a VM.
>
> Option (b) isn't hard to implement, but it screws up the lifecycle of the memory,
> e.g. would require memory when a memslot is deleted.  That isn't necessarily a
> deal-breaker, but it runs counter to how KVM memlots currently operate.  Memslots
> are basically just weird page tables, e.g. deleting a memslot doesn't have any
> impact on the underlying data in memory.  TDX throws a wrench in this as removing
> a page from the Secure EPT is effectively destructive to the data (can't be mapped
> back in to the VM without zeroing the data), but IMO that's an oddity with TDX and
> not necessarily something we want to carry over to other VM types.
>
> There would also be performance implications (probably a non-issue in practice),
> and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. what
> should happen if the last memslot (binding) is deleted, but there outstanding userspace
> mappings?
>
> Option (c) is better from a lifecycle perspective, but it adds its own flavor of
> complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> (effectively the VM pointer), and so a deferred relcaim doesn't really work for
> TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must not
> outlive the VM; KVM can't reuse an ASID if there are pages assigned to that ASID
> in the RMP, i.e. until all memory belonging to the VM has been fully freed.
>

If we are on the same page that the memory should outlive the VM but not
the bindings, then associating the gmem inode to a new VM should be a
feature and not a bug.

What do we want to defend against here?

(a) Malicious host VMM

For a malicious host VMM to read guest memory (with TDX and SNP), it can
create a new VM with the same HKID/ASID as the victim VM, rebind the
gmem inode to a VM crafted with an image that dumps the memory.

I believe it is not possible for userspace to arbitrarily select a
matching HKID unless userspace uses the intra-host migration ioctls, but if the
migration ioctl is used, then EPTs are migrated and the memory dumper VM
can't successfully run a different image from the victim VM. If the
dumper VM needs to run the same image as the victim VM, then it would be
a successful migration rather than an attack. (Perhaps we need to clean
up some #MCs here but that can be a separate patch)

(b) Malicious host kernel

A malicious host kernel can allow a malicious host VMM to re-use a HKID
for the dumper VM, but this isn't something a better gmem design can
defend against.

(c) Attacks using gmem for software-protected VMs

Attacks using gmem for software-protected VMs are possible since there
is no real encryption with HKID/ASID (yet?). The selftest for [1]
actually uses this lack of encryption to test that the destination VM
can read the source VM's memory after the migration. In the POC [1], as
long as both destination VM knows where in the inode's memory to read,
it can read what it wants to. This is a problem for software-protected
VMs, but I feel that it is also a separate issue from gmem's design.

>> Could binding gmem files not on creation, but at memslot configuration
>> time be sufficient and simpler?
>
> After working through the flows, I think binding on-demand would simplify the
> refcounting (stating the obvious), but complicate the lifecycle of the memory as
> well as the contract between KVM and userspace,

If we are on the same page that the memory should outlive the VM but not
the bindings, does it still complicate the lifecycle of the memory and
the userspace/KVM contract? Could it just be a different contract?

> and would break the separation of
> concerns between the inode (physical memory / data) and file (VM's view / mappings).

Binding on-demand is orthogonal to the separation of concerns between
inode and file, because it can be built regardless of whether we do the
gmem file/inode split.

+ This flip-the-refcounting POC is built with the file/inode split and
+ In [2] (the delayed binding approach to solve intra-host migration), I
  also tried flipping the refcounting, and that without the gmem
  file/inode split. (Refcounting in [2] is buggy because the file can't
  take a refcount on KVM, but it would work without taking that refcount)

[1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
[2] https://github.com/googleprodkernel/linux-cc/commit/dd5ac5e53f14a1ef9915c9c1e4cc1006a40b49df

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2023-08-15 18:43 UTC|newest]

Thread overview: 517+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18 23:44 [RFC PATCH v11 00/29] KVM: guest_memfd() and per-page attributes Sean Christopherson
2023-07-18 23:44 ` Sean Christopherson
2023-07-18 23:44 ` Sean Christopherson
2023-07-18 23:44 ` Sean Christopherson
2023-07-18 23:44 ` [RFC PATCH v11 01/29] KVM: Wrap kvm_gfn_range.pte in a per-action union Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19 13:39   ` Jarkko Sakkinen
2023-07-19 13:39     ` Jarkko Sakkinen
2023-07-19 13:39     ` Jarkko Sakkinen
2023-07-19 13:39     ` Jarkko Sakkinen
2023-07-19 15:39     ` Sean Christopherson
2023-07-19 15:39       ` Sean Christopherson
2023-07-19 15:39       ` Sean Christopherson
2023-07-19 15:39       ` Sean Christopherson
2023-07-19 16:55   ` Paolo Bonzini
2023-07-19 16:55     ` Paolo Bonzini
2023-07-19 16:55     ` Paolo Bonzini
2023-07-19 16:55     ` Paolo Bonzini
2023-07-26 20:22     ` Sean Christopherson
2023-07-26 20:22       ` Sean Christopherson
2023-07-26 20:22       ` Sean Christopherson
2023-07-21  6:26   ` Yan Zhao
2023-07-21  6:26     ` Yan Zhao
2023-07-21  6:26     ` Yan Zhao
2023-07-21  6:26     ` Yan Zhao
2023-07-21 10:45     ` Xu Yilun
2023-07-21 10:45       ` Xu Yilun
2023-07-21 10:45       ` Xu Yilun
2023-07-21 10:45       ` Xu Yilun
2023-07-25 18:05       ` Sean Christopherson
2023-07-25 18:05         ` Sean Christopherson
2023-07-25 18:05         ` Sean Christopherson
2023-07-18 23:44 ` [RFC PATCH v11 02/29] KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19 17:12   ` Paolo Bonzini
2023-07-19 17:12     ` Paolo Bonzini
2023-07-19 17:12     ` Paolo Bonzini
2023-07-19 17:12     ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 03/29] KVM: Use gfn instead of hva for mmu_notifier_retry Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19 17:12   ` Paolo Bonzini
2023-07-19 17:12     ` Paolo Bonzini
2023-07-19 17:12     ` Paolo Bonzini
2023-07-19 17:12     ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 04/29] KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19 17:34   ` Paolo Bonzini
2023-07-19 17:34     ` Paolo Bonzini
2023-07-19 17:34     ` Paolo Bonzini
2023-07-19 17:34     ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 05/29] KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19  7:31   ` Yuan Yao
2023-07-19  7:31     ` Yuan Yao
2023-07-19  7:31     ` Yuan Yao
2023-07-19  7:31     ` Yuan Yao
2023-07-19 14:15     ` Sean Christopherson
2023-07-19 14:15       ` Sean Christopherson
2023-07-19 14:15       ` Sean Christopherson
2023-07-19 14:15       ` Sean Christopherson
2023-07-20  1:15       ` Yuan Yao
2023-07-20  1:15         ` Yuan Yao
2023-07-20  1:15         ` Yuan Yao
2023-07-20  1:15         ` Yuan Yao
2023-07-18 23:44 ` [RFC PATCH v11 06/29] KVM: Introduce KVM_SET_USER_MEMORY_REGION2 Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-21  9:03   ` Paolo Bonzini
2023-07-21  9:03     ` Paolo Bonzini
2023-07-21  9:03     ` Paolo Bonzini
2023-07-21  9:03     ` Paolo Bonzini
2023-07-28  9:25   ` Quentin Perret
2023-07-28  9:25     ` Quentin Perret
2023-07-28  9:25     ` Quentin Perret
2023-07-28  9:25     ` Quentin Perret
2023-07-29  0:03     ` Sean Christopherson
2023-07-29  0:03       ` Sean Christopherson
2023-07-29  0:03       ` Sean Christopherson
2023-07-29  0:03       ` Sean Christopherson
2023-07-31  9:30       ` Quentin Perret
2023-07-31  9:30         ` Quentin Perret
2023-07-31  9:30         ` Quentin Perret
2023-07-31  9:30         ` Quentin Perret
2023-07-31 15:58       ` Paolo Bonzini
2023-07-31 15:58         ` Paolo Bonzini
2023-07-31 15:58         ` Paolo Bonzini
2023-07-31 15:58         ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 07/29] KVM: Add KVM_EXIT_MEMORY_FAULT exit Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19  7:54   ` Yuan Yao
2023-07-19  7:54     ` Yuan Yao
2023-07-19  7:54     ` Yuan Yao
2023-07-19  7:54     ` Yuan Yao
2023-07-19 14:16     ` Sean Christopherson
2023-07-19 14:16       ` Sean Christopherson
2023-07-19 14:16       ` Sean Christopherson
2023-07-19 14:16       ` Sean Christopherson
2023-07-18 23:44 ` [RFC PATCH v11 08/29] KVM: Introduce per-page memory attributes Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-20  8:09   ` Yuan Yao
2023-07-20  8:09     ` Yuan Yao
2023-07-20  8:09     ` Yuan Yao
2023-07-20  8:09     ` Yuan Yao
2023-07-20 19:02     ` Isaku Yamahata
2023-07-20 19:02       ` Isaku Yamahata
2023-07-20 19:02       ` Isaku Yamahata
2023-07-20 19:02       ` Isaku Yamahata
2023-07-20 20:20       ` Sean Christopherson
2023-07-20 20:20         ` Sean Christopherson
2023-07-20 20:20         ` Sean Christopherson
2023-07-20 20:20         ` Sean Christopherson
2023-07-21 10:57   ` Paolo Bonzini
2023-07-21 10:57     ` Paolo Bonzini
2023-07-21 10:57     ` Paolo Bonzini
2023-07-21 10:57     ` Paolo Bonzini
2023-07-21 15:56   ` Xiaoyao Li
2023-07-21 15:56     ` Xiaoyao Li
2023-07-21 15:56     ` Xiaoyao Li
2023-07-21 15:56     ` Xiaoyao Li
2023-07-24  4:43   ` Xu Yilun
2023-07-24  4:43     ` Xu Yilun
2023-07-24  4:43     ` Xu Yilun
2023-07-26 15:59     ` Sean Christopherson
2023-07-26 15:59       ` Sean Christopherson
2023-07-26 15:59       ` Sean Christopherson
2023-07-27  3:24       ` Xu Yilun
2023-07-27  3:24         ` Xu Yilun
2023-07-27  3:24         ` Xu Yilun
2023-07-27  3:24         ` Xu Yilun
2023-08-02 20:31   ` Isaku Yamahata
2023-08-02 20:31     ` Isaku Yamahata
2023-08-02 20:31     ` Isaku Yamahata
2023-08-02 20:31     ` Isaku Yamahata
2023-08-14  0:44   ` Binbin Wu
2023-08-14  0:44     ` Binbin Wu
2023-08-14  0:44     ` Binbin Wu
2023-08-14  0:44     ` Binbin Wu
2023-08-14 21:54     ` Sean Christopherson
2023-08-14 21:54       ` Sean Christopherson
2023-08-14 21:54       ` Sean Christopherson
2023-08-14 21:54       ` Sean Christopherson
2023-07-18 23:44 ` [RFC PATCH v11 09/29] KVM: x86: Disallow hugepages when memory attributes are mixed Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-21 11:59   ` Paolo Bonzini
2023-07-21 11:59     ` Paolo Bonzini
2023-07-21 11:59     ` Paolo Bonzini
2023-07-21 11:59     ` Paolo Bonzini
2023-07-21 17:41     ` Sean Christopherson
2023-07-21 17:41       ` Sean Christopherson
2023-07-21 17:41       ` Sean Christopherson
2023-07-21 17:41       ` Sean Christopherson
2023-07-18 23:44 ` [RFC PATCH v11 10/29] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-25 10:24   ` Kirill A . Shutemov
2023-07-25 10:24     ` Kirill A . Shutemov
2023-07-25 10:24     ` Kirill A . Shutemov
2023-07-25 12:51     ` Matthew Wilcox
2023-07-25 12:51       ` Matthew Wilcox
2023-07-25 12:51       ` Matthew Wilcox
2023-07-26 11:36       ` Kirill A . Shutemov
2023-07-26 11:36         ` Kirill A . Shutemov
2023-07-26 11:36         ` Kirill A . Shutemov
2023-07-28 16:02       ` Vlastimil Babka
2023-07-28 16:02         ` Vlastimil Babka
2023-07-28 16:02         ` Vlastimil Babka
2023-07-28 16:02         ` Vlastimil Babka
2023-07-28 16:13         ` Paolo Bonzini
2023-07-28 16:13           ` Paolo Bonzini
2023-07-28 16:13           ` Paolo Bonzini
2023-07-28 16:13           ` Paolo Bonzini
2023-09-01  8:23       ` Vlastimil Babka
2023-09-01  8:23         ` Vlastimil Babka
2023-09-01  8:23         ` Vlastimil Babka
2023-09-01  8:23         ` Vlastimil Babka
2023-07-18 23:44 ` [RFC PATCH v11 11/29] security: Export security_inode_init_security_anon() for use by KVM Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19  2:14   ` Paul Moore
2023-07-19  2:14     ` Paul Moore
2023-07-19  2:14     ` Paul Moore
2023-07-19  2:14     ` Paul Moore
2023-07-31 10:46   ` Vlastimil Babka
2023-07-31 10:46     ` Vlastimil Babka
2023-07-31 10:46     ` Vlastimil Babka
2023-07-31 10:46     ` Vlastimil Babka
2023-07-18 23:44 ` [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-19 17:21   ` Vishal Annapurve
2023-07-19 17:21     ` Vishal Annapurve
2023-07-19 17:21     ` Vishal Annapurve
2023-07-19 17:21     ` Vishal Annapurve
2023-07-19 17:47     ` Sean Christopherson
2023-07-19 17:47       ` Sean Christopherson
2023-07-19 17:47       ` Sean Christopherson
2023-07-19 17:47       ` Sean Christopherson
2023-07-20 14:45   ` Xiaoyao Li
2023-07-20 14:45     ` Xiaoyao Li
2023-07-20 14:45     ` Xiaoyao Li
2023-07-20 14:45     ` Xiaoyao Li
2023-07-20 15:14     ` Sean Christopherson
2023-07-20 15:14       ` Sean Christopherson
2023-07-20 15:14       ` Sean Christopherson
2023-07-20 15:14       ` Sean Christopherson
2023-07-20 21:28   ` Isaku Yamahata
2023-07-20 21:28     ` Isaku Yamahata
2023-07-20 21:28     ` Isaku Yamahata
2023-07-20 21:28     ` Isaku Yamahata
2023-07-21  6:13   ` Yuan Yao
2023-07-21  6:13     ` Yuan Yao
2023-07-21  6:13     ` Yuan Yao
2023-07-21  6:13     ` Yuan Yao
2023-07-21 22:27     ` Isaku Yamahata
2023-07-21 22:27       ` Isaku Yamahata
2023-07-21 22:27       ` Isaku Yamahata
2023-07-21 22:27       ` Isaku Yamahata
2023-07-21 22:33       ` Sean Christopherson
2023-07-21 22:33         ` Sean Christopherson
2023-07-21 22:33         ` Sean Christopherson
2023-07-21 22:33         ` Sean Christopherson
2023-07-21 15:05   ` Xiaoyao Li
2023-07-21 15:05     ` Xiaoyao Li
2023-07-21 15:05     ` Xiaoyao Li
2023-07-21 15:05     ` Xiaoyao Li
2023-07-21 15:42     ` Xiaoyao Li
2023-07-21 15:42       ` Xiaoyao Li
2023-07-21 15:42       ` Xiaoyao Li
2023-07-21 15:42       ` Xiaoyao Li
2023-07-21 17:42       ` Sean Christopherson
2023-07-21 17:42         ` Sean Christopherson
2023-07-21 17:42         ` Sean Christopherson
2023-07-21 17:42         ` Sean Christopherson
2023-07-21 17:17   ` Paolo Bonzini
2023-07-21 17:17     ` Paolo Bonzini
2023-07-21 17:17     ` Paolo Bonzini
2023-07-21 17:17     ` Paolo Bonzini
2023-07-21 17:50     ` Sean Christopherson
2023-07-21 17:50       ` Sean Christopherson
2023-07-21 17:50       ` Sean Christopherson
2023-07-21 17:50       ` Sean Christopherson
2023-07-25 15:09   ` Wang, Wei W
2023-07-25 15:09     ` Wang, Wei W
2023-07-25 15:09     ` Wang, Wei W
2023-07-25 16:03     ` Sean Christopherson
2023-07-25 16:03       ` Sean Christopherson
2023-07-25 16:03       ` Sean Christopherson
2023-07-26  1:51       ` Wang, Wei W
2023-07-26  1:51         ` Wang, Wei W
2023-07-26  1:51         ` Wang, Wei W
2023-07-31 16:23       ` Fuad Tabba
2023-07-31 16:23         ` Fuad Tabba
2023-07-31 16:23         ` Fuad Tabba
2023-07-31 16:23         ` Fuad Tabba
2023-07-26 17:18   ` Elliot Berman
2023-07-26 17:18     ` Elliot Berman
2023-07-26 17:18     ` Elliot Berman
2023-07-26 19:28     ` Sean Christopherson
2023-07-26 19:28       ` Sean Christopherson
2023-07-26 19:28       ` Sean Christopherson
2023-07-27 10:39   ` Fuad Tabba
2023-07-27 10:39     ` Fuad Tabba
2023-07-27 10:39     ` Fuad Tabba
2023-07-27 10:39     ` Fuad Tabba
2023-07-27 17:13     ` Sean Christopherson
2023-07-27 17:13       ` Sean Christopherson
2023-07-27 17:13       ` Sean Christopherson
2023-07-27 17:13       ` Sean Christopherson
2023-07-31 13:46       ` Fuad Tabba
2023-07-31 13:46         ` Fuad Tabba
2023-07-31 13:46         ` Fuad Tabba
2023-07-31 13:46         ` Fuad Tabba
2023-08-03 19:15   ` Ryan Afranji
2023-08-03 19:15     ` Ryan Afranji
2023-08-03 19:15     ` Ryan Afranji
2023-08-03 19:15     ` Ryan Afranji
2023-08-07 23:06   ` Ackerley Tng
2023-08-07 23:06     ` Ackerley Tng
2023-08-07 23:06     ` Ackerley Tng
2023-08-07 23:06     ` Ackerley Tng
2023-08-08 21:13     ` Sean Christopherson
2023-08-08 21:13       ` Sean Christopherson
2023-08-08 21:13       ` Sean Christopherson
2023-08-08 21:13       ` Sean Christopherson
2023-08-10 23:57       ` Vishal Annapurve
2023-08-10 23:57         ` Vishal Annapurve
2023-08-10 23:57         ` Vishal Annapurve
2023-08-10 23:57         ` Vishal Annapurve
2023-08-11 17:44         ` Sean Christopherson
2023-08-11 17:44           ` Sean Christopherson
2023-08-11 17:44           ` Sean Christopherson
2023-08-11 17:44           ` Sean Christopherson
2023-08-15 18:43       ` Ackerley Tng [this message]
2023-08-15 18:43         ` Ackerley Tng
2023-08-15 18:43         ` Ackerley Tng
2023-08-15 18:43         ` Ackerley Tng
2023-08-15 20:03         ` Sean Christopherson
2023-08-15 20:03           ` Sean Christopherson
2023-08-15 20:03           ` Sean Christopherson
2023-08-15 20:03           ` Sean Christopherson
2023-08-21 17:30           ` Ackerley Tng
2023-08-21 17:30             ` Ackerley Tng
2023-08-21 17:30             ` Ackerley Tng
2023-08-21 17:30             ` Ackerley Tng
2023-08-21 19:33             ` Sean Christopherson
2023-08-21 19:33               ` Sean Christopherson
2023-08-21 19:33               ` Sean Christopherson
2023-08-21 19:33               ` Sean Christopherson
2023-08-28 22:56               ` Ackerley Tng
2023-08-28 22:56                 ` Ackerley Tng
2023-08-28 22:56                 ` Ackerley Tng
2023-08-28 22:56                 ` Ackerley Tng
2023-08-29  2:53                 ` Elliot Berman
2023-08-29  2:53                   ` Elliot Berman
2023-08-29  2:53                   ` Elliot Berman
2023-08-29  2:53                   ` Elliot Berman
2023-09-14 19:12                   ` Sean Christopherson
2023-09-14 19:12                     ` Sean Christopherson
2023-09-14 19:12                     ` Sean Christopherson
2023-09-14 19:12                     ` Sean Christopherson
2023-09-14 18:15                 ` Sean Christopherson
2023-09-14 18:15                   ` Sean Christopherson
2023-09-14 18:15                   ` Sean Christopherson
2023-09-14 18:15                   ` Sean Christopherson
2023-09-14 23:19                   ` Ackerley Tng
2023-09-14 23:19                     ` Ackerley Tng
2023-09-14 23:19                     ` Ackerley Tng
2023-09-14 23:19                     ` Ackerley Tng
2023-09-15  0:33                     ` Sean Christopherson
2023-09-15  0:33                       ` Sean Christopherson
2023-09-15  0:33                       ` Sean Christopherson
2023-09-15  0:33                       ` Sean Christopherson
2023-08-30 15:12   ` Binbin Wu
2023-08-30 15:12     ` Binbin Wu
2023-08-30 15:12     ` Binbin Wu
2023-08-30 15:12     ` Binbin Wu
2023-08-30 16:44     ` Ackerley Tng
2023-08-30 16:44       ` Ackerley Tng
2023-08-30 16:44       ` Ackerley Tng
2023-08-30 16:44       ` Ackerley Tng
2023-09-01  3:45       ` Binbin Wu
2023-09-01  3:45         ` Binbin Wu
2023-09-01  3:45         ` Binbin Wu
2023-09-01  3:45         ` Binbin Wu
2023-09-01 16:46         ` Ackerley Tng
2023-09-01 16:46           ` Ackerley Tng
2023-09-01 16:46           ` Ackerley Tng
2023-09-01 16:46           ` Ackerley Tng
2023-07-18 23:44 ` [RFC PATCH v11 13/29] KVM: Add transparent hugepage support for dedicated guest memory Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-21 15:07   ` Paolo Bonzini
2023-07-21 15:07     ` Paolo Bonzini
2023-07-21 15:07     ` Paolo Bonzini
2023-07-21 15:07     ` Paolo Bonzini
2023-07-21 17:13     ` Sean Christopherson
2023-07-21 17:13       ` Sean Christopherson
2023-07-21 17:13       ` Sean Christopherson
2023-07-21 17:13       ` Sean Christopherson
2023-09-06 22:10       ` Paolo Bonzini
2023-09-06 22:10         ` Paolo Bonzini
2023-09-06 22:10         ` Paolo Bonzini
2023-09-06 22:10         ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 14/29] KVM: x86/mmu: Handle page fault for private memory Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-21 15:09   ` Paolo Bonzini
2023-07-21 15:09     ` Paolo Bonzini
2023-07-21 15:09     ` Paolo Bonzini
2023-07-21 15:09     ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 15/29] KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-21 15:07   ` Paolo Bonzini
2023-07-21 15:07     ` Paolo Bonzini
2023-07-21 15:07     ` Paolo Bonzini
2023-07-21 15:07     ` Paolo Bonzini
2023-07-18 23:44 ` [RFC PATCH v11 16/29] KVM: Allow arch code to track number of memslot address spaces per VM Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-18 23:44   ` Sean Christopherson
2023-07-21 15:12   ` Paolo Bonzini
2023-07-21 15:12     ` Paolo Bonzini
2023-07-21 15:12     ` Paolo Bonzini
2023-07-21 15:12     ` Paolo Bonzini
2023-07-18 23:45 ` [RFC PATCH v11 17/29] KVM: x86: Add support for "protected VMs" that can utilize private memory Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 18/29] KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-21 15:14   ` Paolo Bonzini
2023-07-21 15:14     ` Paolo Bonzini
2023-07-21 15:14     ` Paolo Bonzini
2023-07-21 15:14     ` Paolo Bonzini
2023-07-18 23:45 ` [RFC PATCH v11 19/29] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2 Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 20/29] KVM: selftests: Add support for creating private memslots Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 21/29] KVM: selftests: Add helpers to convert guest memory b/w private and shared Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 22/29] KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86) Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 23/29] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 24/29] KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 25/29] KVM: selftests: Add x86-only selftest for private memory conversions Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 26/29] KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45 ` [RFC PATCH v11 27/29] KVM: selftests: Expand set_memory_region_test to validate guest_memfd() Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-08-07 23:17   ` Ackerley Tng
2023-08-07 23:17     ` Ackerley Tng
2023-08-07 23:17     ` Ackerley Tng
2023-08-07 23:17     ` Ackerley Tng
2023-07-18 23:45 ` [RFC PATCH v11 28/29] KVM: selftests: Add basic selftest for guest_memfd() Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-08-07 23:20   ` Ackerley Tng
2023-08-07 23:20     ` Ackerley Tng
2023-08-07 23:20     ` Ackerley Tng
2023-08-07 23:20     ` Ackerley Tng
2023-08-18 23:03     ` Sean Christopherson
2023-08-18 23:03       ` Sean Christopherson
2023-08-18 23:03       ` Sean Christopherson
2023-08-18 23:03       ` Sean Christopherson
2023-08-07 23:25   ` Ackerley Tng
2023-08-07 23:25     ` Ackerley Tng
2023-08-07 23:25     ` Ackerley Tng
2023-08-07 23:25     ` Ackerley Tng
2023-08-18 23:01     ` Sean Christopherson
2023-08-18 23:01       ` Sean Christopherson
2023-08-18 23:01       ` Sean Christopherson
2023-08-18 23:01       ` Sean Christopherson
2023-08-21 19:49       ` Ackerley Tng
2023-08-21 19:49         ` Ackerley Tng
2023-08-21 19:49         ` Ackerley Tng
2023-08-21 19:49         ` Ackerley Tng
2023-07-18 23:45 ` [RFC PATCH v11 29/29] KVM: selftests: Test KVM exit behavior for private memory/access Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-18 23:45   ` Sean Christopherson
2023-07-24  6:38 ` [RFC PATCH v11 00/29] KVM: guest_memfd() and per-page attributes Nikunj A. Dadhania
2023-07-24  6:38   ` Nikunj A. Dadhania
2023-07-24  6:38   ` Nikunj A. Dadhania
2023-07-24 17:00   ` Sean Christopherson
2023-07-24 17:00     ` Sean Christopherson
2023-07-24 17:00     ` Sean Christopherson
2023-07-26 11:20     ` Nikunj A. Dadhania
2023-07-26 11:20       ` Nikunj A. Dadhania
2023-07-26 11:20       ` Nikunj A. Dadhania
2023-07-26 14:24       ` Sean Christopherson
2023-07-26 14:24         ` Sean Christopherson
2023-07-26 14:24         ` Sean Christopherson
2023-07-27  6:42         ` Nikunj A. Dadhania
2023-07-27  6:42           ` Nikunj A. Dadhania
2023-07-27  6:42           ` Nikunj A. Dadhania
2023-07-27  6:42           ` Nikunj A. Dadhania
2023-08-03 11:03       ` Vlastimil Babka
2023-08-03 11:03         ` Vlastimil Babka
2023-08-03 11:03         ` Vlastimil Babka
2023-08-03 11:03         ` Vlastimil Babka
2023-07-24 20:16 ` Sean Christopherson
2023-08-25 17:47 ` Sean Christopherson
2023-08-29  9:12   ` Chao Peng
2023-08-31 18:29     ` Sean Christopherson
2023-09-01  1:17       ` Chao Peng
2023-09-01  8:26         ` Vlastimil Babka
2023-09-01  9:10         ` Paolo Bonzini
2023-08-30  0:00   ` Isaku Yamahata
2023-09-09  0:16   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=diqzh6p02lk4.fsf@ackerleytng-ctop.c.googlers.com \
    --to=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=jarkko@kernel.org \
    --cc=jmorris@namei.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=paul@paul-moore.com \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=seanjc@google.com \
    --cc=serge@hallyn.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=wei.w.wang@intel.com \
    --cc=willy@infradead.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.