kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Marc Zyngier" <maz@kernel.org>,
	"Oliver Upton" <oliver.upton@linux.dev>,
	"Huacai Chen" <chenhuacai@kernel.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Anup Patel" <anup@brainfault.org>,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Christian Brauner" <brauner@kernel.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Xiaoyao Li" <xiaoyao.li@intel.com>,
	"Xu Yilun" <yilun.xu@intel.com>,
	"Chao Peng" <chao.p.peng@linux.intel.com>,
	"Jarkko Sakkinen" <jarkko@kernel.org>,
	"Anish Moorthy" <amoorthy@google.com>,
	"David Matlack" <dmatlack@google.com>,
	"Yu Zhang" <yu.c.zhang@linux.intel.com>,
	"Isaku Yamahata" <isaku.yamahata@intel.com>,
	"Mickaël Salaün" <mic@digikod.net>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Vishal Annapurve" <vannapurve@google.com>,
	"Ackerley Tng" <ackerleytng@google.com>,
	"Maciej Szmigiero" <mail@maciej.szmigiero.name>,
	"David Hildenbrand" <david@redhat.com>,
	"Quentin Perret" <qperret@google.com>,
	"Michael Roth" <michael.roth@amd.com>,
	Wang <wei.w.wang@intel.com>,
	"Liam Merwick" <liam.merwick@oracle.com>,
	"Isaku Yamahata" <isaku.yamahata@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Date: Wed, 1 Nov 2023 14:55:46 -0700	[thread overview]
Message-ID: <ZULJYg5cf1UrNq3e@google.com> (raw)
In-Reply-To: <CA+EHjTwTT9cFzYTtwT43nLJS01Sgt0NqzUgKAnfo2fiV3tEvXg@mail.gmail.com>

On Wed, Nov 01, 2023, Fuad Tabba wrote:
> > > > @@ -1034,6 +1034,9 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
> > > >  /* This does not remove the slot from struct kvm_memslots data structures */
> > > >  static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > > >  {
> > > > +       if (slot->flags & KVM_MEM_PRIVATE)
> > > > +               kvm_gmem_unbind(slot);
> > > > +
> > >
> > > Should this be called after kvm_arch_free_memslot()? Arch-specific ode
> > > might need some of the data before the unbinding, something I thought
> > > might be necessary at one point for the pKVM port when deleting a
> > > memslot, but realized later that kvm_invalidate_memslot() ->
> > > kvm_arch_guest_memory_reclaimed() was the more logical place for it.
> > > Also, since that seems to be the pattern for arch-specific handlers in
> > > KVM.
> >
> > Maybe?  But only if we can about symmetry between the allocation and free paths
> > I really don't think kvm_arch_free_memslot() should be doing anything beyond a
> > "pure" free.  E.g. kvm_arch_free_memslot() is also called after moving a memslot,
> > which hopefully we never actually have to allow for guest_memfd, but any code in
> > kvm_arch_free_memslot() would bring about "what if" questions regarding memslot
> > movement.  I.e. the API is intended to be a "free arch metadata associated with
> > the memslot".
> >
> > Out of curiosity, what does pKVM need to do at kvm_arch_guest_memory_reclaimed()?
> 
> It's about the host reclaiming ownership of guest memory when tearing
> down a protected guest. In pKVM, we currently teardown the guest and
> reclaim its memory when kvm_arch_destroy_vm() is called. The problem
> with guestmem is that kvm_gmem_unbind() could get called before that
> happens, after which the host might try to access the unbound guest
> memory. Since the host hasn't reclaimed ownership of the guest memory
> from hyp, hilarity ensues (it crashes).
> 
> Initially, I hooked reclaim guest memory to kvm_free_memslot(), but
> then I needed to move the unbind later in the function. I realized
> later that kvm_arch_guest_memory_reclaimed() gets called earlier (at
> the right time), and is more aptly named.

Aha!  I suspected that might be the case.

TDX and SNP also need to solve the same problem of "reclaiming" memory before it
can be safely accessed by the host.  The plan is to add an arch hook (or two?)
into guest_memfd that is invoked when memory is freed from guest_memfd.

Hooking kvm_arch_guest_memory_reclaimed() isn't completely correct as deleting a
memslot doesn't *guarantee* that guest memory is actually reclaimed (which reminds
me, we need to figure out a better name for that thing before introducing
kvm_arch_gmem_invalidate()).

The effective false positives aren't fatal for the current usage because the hook
is used only for x86 SEV guests to flush caches.  An unnecessary flush can cause
performance issues, but it doesn't affect correctness. For TDX and SNP, and IIUC
pKVM, false positives are fatal because KVM could assign memory back to the host
that is still owned by guest_memfd.

E.g. a misbehaving userspace could prematurely delete a memslot.  And the more
fun example is intrahost migration, where the plan is to allow pointing multiple
guest_memfd files at a single guest_memfd inode:
https://lore.kernel.org/all/cover.1691446946.git.ackerleytng@google.com

There was a lot of discussion for this, but it's scattered all over the place.
The TL;DR is is that the inode will represent physical memory, and a file will
represent a given "struct kvm" instance's view of that memory.  And so the memory
isn't reclaimed until the inode is truncated/punched.

I _think_ this reflects the most recent plan from the guest_memfd side:
https://lore.kernel.org/all/1233d749211c08d51f9ca5d427938d47f008af1f.1689893403.git.isaku.yamahata@intel.com

  reply	other threads:[~2023-11-01 21:55 UTC|newest]

Thread overview: 148+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-27 18:21 [PATCH v13 00/35] KVM: guest_memfd() and per-page attributes Sean Christopherson
2023-10-27 18:21 ` [PATCH v13 01/35] KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges Sean Christopherson
2023-11-01 12:46   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 02/35] KVM: Assert that mmu_invalidate_in_progress *never* goes negative Sean Christopherson
2023-10-30 16:27   ` Paolo Bonzini
2023-11-01 12:46   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 03/35] KVM: Use gfn instead of hva for mmu_notifier_retry Sean Christopherson
2023-10-30 16:30   ` Paolo Bonzini
2023-10-30 16:53   ` David Matlack
2023-10-30 17:00     ` Paolo Bonzini
2023-10-30 18:21       ` David Matlack
2023-10-30 18:19     ` David Matlack
2023-11-01 15:31   ` Xu Yilun
2023-10-27 18:21 ` [PATCH v13 04/35] KVM: WARN if there are dangling MMU invalidations at VM destruction Sean Christopherson
2023-10-30 16:32   ` Paolo Bonzini
2023-11-01 12:50   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 05/35] KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER Sean Christopherson
2023-10-30 16:34   ` Paolo Bonzini
2023-11-01 12:51   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 06/35] KVM: PPC: Return '1' unconditionally for KVM_CAP_SYNC_MMU Sean Christopherson
2023-10-27 18:21 ` [PATCH v13 07/35] KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER Sean Christopherson
2023-10-30 16:37   ` Paolo Bonzini
2023-11-01 12:54   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 08/35] KVM: Introduce KVM_SET_USER_MEMORY_REGION2 Sean Christopherson
2023-10-30 16:41   ` Paolo Bonzini
2023-10-30 20:25     ` Sean Christopherson
2023-10-30 22:12       ` Sean Christopherson
2023-10-30 23:22       ` Paolo Bonzini
2023-10-31  0:18         ` Sean Christopherson
2023-10-31  2:26   ` Xiaoyao Li
2023-10-31 14:04     ` Sean Christopherson
2023-11-01 14:19   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 09/35] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace Sean Christopherson
2023-10-30 17:22   ` Paolo Bonzini
2023-11-01  7:30   ` Binbin Wu
2023-11-01 10:52   ` Huang, Kai
2023-11-01 17:36     ` Sean Christopherson
2023-11-02  2:19       ` Xiaoyao Li
2023-11-02 15:51         ` Sean Christopherson
2023-11-02  3:17       ` Huang, Kai
2023-11-02  9:35         ` Huang, Kai
2023-11-02 11:03           ` Paolo Bonzini
2023-11-02 15:44             ` Sean Christopherson
2023-11-02 18:35               ` Huang, Kai
2023-11-02 15:56         ` Sean Christopherson
2023-11-02 11:01       ` Paolo Bonzini
2023-11-03  4:09   ` Xu Yilun
2023-10-27 18:21 ` [PATCH v13 10/35] KVM: Add a dedicated mmu_notifier flag for reclaiming freed memory Sean Christopherson
2023-10-30 17:11   ` Paolo Bonzini
2023-11-02 13:55   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 11/35] KVM: Drop .on_unlock() mmu_notifier hook Sean Christopherson
2023-10-30 17:18   ` Paolo Bonzini
2023-11-02 13:55   ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 12/35] KVM: Prepare for handling only shared mappings in mmu_notifier events Sean Christopherson
2023-10-30 17:21   ` Paolo Bonzini
2023-10-30 22:07     ` Sean Christopherson
2023-11-02  5:59   ` Binbin Wu
2023-11-02 11:14     ` Paolo Bonzini
2023-11-02 14:01   ` Fuad Tabba
2023-11-02 14:41     ` Sean Christopherson
2023-11-02 14:57       ` Fuad Tabba
2023-10-27 18:21 ` [PATCH v13 13/35] KVM: Introduce per-page memory attributes Sean Christopherson
2023-10-30  8:11   ` Chao Gao
2023-10-30 16:10     ` Sean Christopherson
2023-10-30 22:05       ` Sean Christopherson
2023-10-31 16:43   ` David Matlack
2023-11-02  3:01   ` Huang, Kai
2023-11-02 10:32     ` Paolo Bonzini
2023-11-02 10:55       ` Huang, Kai
2023-10-27 18:21 ` [PATCH v13 14/35] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable Sean Christopherson
2023-10-30 17:24   ` Paolo Bonzini
2023-10-27 18:21 ` [PATCH v13 15/35] fs: Export anon_inode_getfile_secure() for use by KVM Sean Christopherson
2023-10-30 17:30   ` Paolo Bonzini
2023-11-02 16:24   ` Christian Brauner
2023-11-03 10:40     ` Paolo Bonzini
2023-10-27 18:21 ` [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory Sean Christopherson
2023-10-31  2:27   ` Xiaoyao Li
2023-10-31  6:30   ` Chao Gao
2023-10-31 14:10     ` Sean Christopherson
2023-10-31 15:05   ` Fuad Tabba
2023-10-31 22:13     ` Sean Christopherson
2023-10-31 22:18       ` Paolo Bonzini
2023-11-01 10:51       ` Fuad Tabba
2023-11-01 21:55         ` Sean Christopherson [this message]
2023-11-02 13:52           ` Fuad Tabba
2023-11-03 23:17             ` Sean Christopherson
2023-10-31 18:24   ` David Matlack
2023-10-31 21:36     ` Sean Christopherson
2023-10-31 22:39       ` David Matlack
2023-11-02 15:48         ` Paolo Bonzini
2023-11-02 16:03           ` Sean Christopherson
2023-11-02 16:28             ` David Matlack
2023-11-02 17:37               ` Sean Christopherson
2023-11-03  9:42   ` Fuad Tabba
2023-11-04 10:26   ` Xu Yilun
2023-11-06 15:43     ` Sean Christopherson
2023-10-27 18:21 ` [PATCH v13 17/35] KVM: Add transparent hugepage support for dedicated guest memory Sean Christopherson
2023-10-31  8:35   ` Xiaoyao Li
2023-10-31 14:16     ` Sean Christopherson
2023-11-01  7:25       ` Xiaoyao Li
2023-11-01 13:41         ` Sean Christopherson
2023-11-01 13:49           ` Paolo Bonzini
2023-11-01 16:36             ` Sean Christopherson
2023-11-01 22:28               ` Paolo Bonzini
2023-11-01 22:34                 ` Sean Christopherson
2023-11-01 23:17                   ` Paolo Bonzini
2023-11-02 15:38                     ` Sean Christopherson
2023-11-02 15:46                       ` Paolo Bonzini
2023-11-27 11:13                         ` Vlastimil Babka
2023-11-29 22:40                           ` Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 18/35] KVM: x86: "Reset" vcpu->run->exit_reason early in KVM_RUN Sean Christopherson
2023-10-30 17:31   ` Paolo Bonzini
2023-11-02 14:16   ` Fuad Tabba
2023-10-27 18:22 ` [PATCH v13 19/35] KVM: x86: Disallow hugepages when memory attributes are mixed Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 20/35] KVM: x86/mmu: Handle page fault for private memory Sean Christopherson
2023-11-02 14:34   ` Fuad Tabba
2023-11-05 13:02   ` Xu Yilun
2023-11-05 16:19     ` Paolo Bonzini
2023-11-06 13:29       ` Xu Yilun
2023-11-06 15:56         ` Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 21/35] KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro Sean Christopherson
2023-11-02 14:35   ` Fuad Tabba
2023-10-27 18:22 ` [PATCH v13 22/35] KVM: Allow arch code to track number of memslot address spaces per VM Sean Christopherson
2023-10-30 17:34   ` Paolo Bonzini
2023-11-02 14:52   ` Fuad Tabba
2023-10-27 18:22 ` [PATCH v13 23/35] KVM: x86: Add support for "protected VMs" that can utilize private memory Sean Christopherson
2023-10-30 17:36   ` Paolo Bonzini
2023-11-06 11:00   ` Fuad Tabba
2023-11-06 11:03     ` Paolo Bonzini
2023-10-27 18:22 ` [PATCH v13 24/35] KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 25/35] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2 Sean Christopherson
2024-04-25 14:12   ` Dan Carpenter
2024-04-25 14:45     ` Shuah Khan
2024-04-25 15:09       ` Sean Christopherson
2024-04-25 16:22         ` Shuah Khan
2024-04-26  7:33         ` Jarkko Sakkinen
2023-10-27 18:22 ` [PATCH v13 26/35] KVM: selftests: Add support for creating private memslots Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 27/35] KVM: selftests: Add helpers to convert guest memory b/w private and shared Sean Christopherson
2023-11-06 11:26   ` Fuad Tabba
2023-10-27 18:22 ` [PATCH v13 28/35] KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86) Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 29/35] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 30/35] KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 31/35] KVM: selftests: Add x86-only selftest for private memory conversions Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 32/35] KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 33/35] KVM: selftests: Expand set_memory_region_test to validate guest_memfd() Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 34/35] KVM: selftests: Add basic selftest for guest_memfd() Sean Christopherson
2023-10-27 18:22 ` [PATCH v13 35/35] KVM: selftests: Test KVM exit behavior for private memory/access Sean Christopherson
2023-10-30 17:39 ` [PATCH v13 00/35] KVM: guest_memfd() and per-page attributes Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZULJYg5cf1UrNq3e@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jarkko@kernel.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).