linux-mm.kvack.org archive mirror
From: Michael Roth <michael.roth@amd.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Chao Peng <chao.p.peng@linux.intel.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-api@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	<x86@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	"Hugh Dickins" <hughd@google.com>,
	Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	<jun.nakajima@intel.com>, <dave.hansen@intel.com>,
	<ak@linux.intel.com>, <david@redhat.com>, <aarcange@redhat.com>,
	<ddutile@redhat.com>, <dhildenb@redhat.com>,
	"Quentin Perret" <qperret@google.com>, <mhocko@suse.com>,
	"Nikunj A. Dadhania" <nikunj@amd.com>
Subject: Re: [PATCH v6 4/8] KVM: Extend the memslot to support fd-based private memory
Date: Thu, 23 Jun 2022 17:59:49 -0500
Message-ID: <20220623225949.kkdx6uwjlk2ec4iq@amd.com>
In-Reply-To: <YofeZps9YXgtP3f1@google.com>

On Fri, May 20, 2022 at 06:31:02PM +0000, Sean Christopherson wrote:
> On Fri, May 20, 2022, Andy Lutomirski wrote:
> > The alternative would be to have some kind of separate table or bitmap (part
> > of the memslot?) that tells KVM whether a GPA should map to the fd.
> > 
> > What do you all think?
> 
> My original proposal was to have explicit shared vs. private memslots, and punch
> holes in KVM's memslots on conversion, but due to the way KVM (and userspace)
> handle memslot updates, conversions would be painfully slow.  That's how we ended
> up with the current proposal.
> 
> But a dedicated KVM ioctl() to add/remove shared ranges would be easy to implement
> and wouldn't necessarily even need to interact with the memslots.  It could be a
> consumer of memslots, e.g. if we wanted to disallow registering regions without an
> associated memslot, but I think we'd want to avoid even that because things will
> get messy during memslot updates, e.g. if dirty logging is toggled or a shared
> memory region is temporarily removed then we wouldn't want to destroy the tracking.
> 
> I don't think we'd want to use a bitmap, e.g. for a well-behaved guest, XArray
> should be far more efficient.
> 
> One benefit to explicitly tracking this in KVM is that it might be useful for
> software-only protected VMs, e.g. KVM could mark a region in the XArray as "pending"
> based on guest hypercalls to share/unshare memory, and then complete the transaction
> when userspace invokes the ioctl() to complete the share/unshare.

Another upside to implementing a KVM ioctl is basically the reverse of the
discussion around avoiding double-allocations: *supporting* double-allocations.

One thing I noticed while testing SNP+UPM support is a fairly dramatic
slow-down with how it handles OVMF, which does some really nasty stuff
with DMA where it takes 1 or 2 pages and flips them between
shared/private on every transaction. Obviously that's not ideal and
should be fixed directly at some point, but it's something that exists in the
wild and might not be the only such instance where we need to deal with that
sort of usage pattern. 

With the current implementation, one option I had to address this was to
disable hole-punching in QEMU when doing shared->private conversions:

Boot time for a 1GB guest:
                               SNP:   32s
                           SNP+UPM: 1m43s
  SNP+UPM (disable shared discard): 1m08s

Of course, we don't have the option of disabling discard/hole-punching
for private memory to see if we get similar gains there, since that also
doubles as the interface for doing private->shared conversions. A separate
KVM ioctl to decouple these 2 things would allow for that, and allow for a
way for userspace to implement things like batched/lazy-discard of
previously-converted pages to deal with cases like these.
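
To make the decoupling concrete, here is a rough userspace model of
lazy discard. It is only a sketch: it is not QEMU or KVM code, and every
name in it (page_state, convert_to_shared(), convert_to_private(),
flush_discards(), NR_PAGES) is made up for illustration. Conversion only
flips a per-page state, and the shared backing is freed later in a batch,
so a page that ping-pongs between shared and private never pays for a
hole-punch on each flip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 16 /* hypothetical guest size in pages */

/* Hypothetical per-page tracking; not an existing KVM/QEMU structure. */
struct page_state {
	bool is_private;    /* conversion state, as a separate ioctl might track it */
	bool shared_backed; /* shared backing page is still allocated */
};

static struct page_state pages[NR_PAGES];
static size_t nr_discards; /* number of simulated hole-punches so far */

/* private->shared: reuse the shared backing if it was never discarded. */
static void convert_to_shared(size_t gfn)
{
	assert(gfn < NR_PAGES);
	pages[gfn].is_private = false;
	pages[gfn].shared_backed = true;
}

/* shared->private: only flip the state, keep the shared backing for now. */
static void convert_to_private(size_t gfn)
{
	assert(gfn < NR_PAGES);
	pages[gfn].is_private = true;
}

/* Batched discard: free shared backing only for pages that are still private. */
static size_t flush_discards(void)
{
	size_t freed = 0;

	for (size_t gfn = 0; gfn < NR_PAGES; gfn++) {
		if (pages[gfn].is_private && pages[gfn].shared_backed) {
			pages[gfn].shared_backed = false;
			freed++;
		}
	}
	nr_discards += freed;
	return freed;
}
```

In this model, flipping one page back and forth any number of times
costs zero discards until flush_discards() runs, at which point at most
one discard is done for that page.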

Another motivator for a separate ioctl is that, since we're considering
'out-of-band' interactions with private memfd where userspace might
erroneously/inadvertently do things like double allocations, another thing it
might do is pre-allocate pages in the private memfd prior to associating
the memfd with a private memslot. Since the notifiers aren't registered until
that point, any associated callbacks that would normally need to be done as
part of those fallocate() notifications would be missed unless we do something
like 'replay' all the notifications once the private memslot is registered and
associated with a memfile notifier. But that seems a bit ugly, and I'm not
sure how well that would work. This also seems to hint at this additional
'conversion' state being something that should be owned and managed directly
by KVM rather than hooking into the allocations.
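
The missed-notification problem can be shown with a toy model. This is
a sketch under loud assumptions: struct memfile, memfile_fallocate() and
memfile_register() are invented for illustration and are not the
memfile_notifier API from this series. Allocations made before the
notifier is registered are never reported unless registration walks the
existing allocations and replays them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8 /* hypothetical file size in pages */

typedef void (*alloc_cb)(size_t pgoff);

/* Toy stand-in for a private memfd; not a real kernel structure. */
struct memfile {
	bool allocated[NR_PAGES];
	alloc_cb notifier; /* NULL until a private memslot is bound */
};

static size_t seen; /* allocation notifications delivered so far */

static void count_alloc(size_t pgoff)
{
	(void)pgoff;
	seen++;
}

/* Allocate one page; notify only if somebody is already listening. */
static void memfile_fallocate(struct memfile *f, size_t pgoff)
{
	if (pgoff >= NR_PAGES || f->allocated[pgoff])
		return;
	f->allocated[pgoff] = true;
	if (f->notifier)
		f->notifier(pgoff);
}

/* Bind a notifier; optionally replay allocations that predate it. */
static void memfile_register(struct memfile *f, alloc_cb cb, bool replay)
{
	assert(cb != NULL);
	f->notifier = cb;
	if (!replay)
		return;
	for (size_t i = 0; i < NR_PAGES; i++) {
		if (f->allocated[i])
			cb(i);
	}
}
```

Without replay, two pages allocated before memfile_register() produce no
callbacks at all; with replay they produce two, but only by rescanning
the file at registration time, which is the part that seems ugly.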

It would also nicely solve the question of how to handle in-place
encryption, since unlike userspace, KVM is perfectly capable of copying
data from shared->private prior to conversion / guest start, and
disallowing such things afterward. Would just need an extra flag basically.
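
As a sketch of what that flag could look like (purely hypothetical:
CONVERT_F_PRESERVE, struct vm_model and convert_preserving() are names
invented here, and memcpy() stands in for whatever copy/encrypt step the
platform would really do), the conversion path copies the shared
contents into the private page only while the guest has not started, and
rejects the flag afterward.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ 4096
#define CONVERT_F_PRESERVE 0x1u /* hypothetical "keep contents" flag */

/* Toy model of one guest page with separate shared/private backing. */
struct vm_model {
	bool started; /* guest has been launched */
	unsigned char shared[PAGE_SZ];
	unsigned char priv[PAGE_SZ];
};

/* shared->private conversion with optional in-place content preservation. */
static int convert_preserving(struct vm_model *vm, unsigned int flags)
{
	assert(vm != NULL);

	if (flags & ~CONVERT_F_PRESERVE)
		return -EINVAL; /* unknown flag bits */

	if (flags & CONVERT_F_PRESERVE) {
		if (vm->started)
			return -EPERM; /* only allowed before guest start */
		memcpy(vm->priv, vm->shared, PAGE_SZ);
	} else {
		memset(vm->priv, 0, PAGE_SZ); /* fresh private page */
	}
	return 0;
}
```

The error codes are arbitrary choices for the model; the point is only
that the "preserve" case is a pre-launch operation and a plain
conversion stays available at any time.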

