All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Vishal Annapurve <vannapurve@google.com>,
	Marc Orr <marcorr@google.com>, kvm list <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86 <x86@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret <qperret@google.com>,
	Michael Roth <michael.roth@amd.com>,
	mhocko@suse.com
Subject: Re: [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Wed, 15 Jun 2022 14:29:45 +0000	[thread overview]
Message-ID: <Yqns2ar0TND4RP9P@google.com> (raw)
In-Reply-To: <20220615091759.GB1823790@chaop.bj.intel.com>

On Wed, Jun 15, 2022, Chao Peng wrote:
> On Tue, Jun 14, 2022 at 01:59:41PM -0700, Andy Lutomirski wrote:
> > On Tue, Jun 14, 2022 at 12:09 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, Jun 14, 2022, Andy Lutomirski wrote:
> > > > This patch series is fairly close to implementing a rather more
> > > > efficient solution.  I'm not familiar enough with hypervisor userspace
> > > > to really know if this would work, but:
> > > >
> > > > What if shared guest memory could also be file-backed, either in the
> > > > same fd or with a second fd covering the shared portion of a memslot?
> > > > This would allow changes to the backing store (punching holes, etc) to
> > > > be some without mmap_lock or host-userspace TLB flushes?  Depending on
> > > > what the guest is doing with its shared memory, userspace might need
> > > > the memory mapped or it might not.
> > >
> > > That's what I'm angling for with the F_SEAL_FAULT_ALLOCATIONS idea.  The issue,
> > > unless I'm misreading code, is that punching a hole in the shared memory backing
> > > store doesn't prevent reallocating that hole on fault, i.e. a helper process that
> > > keeps a valid mapping of guest shared memory can silently fill the hole.
> > >
> > > What we're hoping to achieve is a way to prevent allocating memory without a very
> > > explicit action from userspace, e.g. fallocate().
> > 
> > Ah, I misunderstood.  I thought your goal was to mmap it and prevent
> > page faults from allocating.

I don't think you misunderstood, that's also one of the goals.  The use case is
that multiple processes in the host mmap() guest memory, and we'd like to be able
to punch a hole without having to rendezvous with all processes and also to prevent
an unintentional re-allocation.

> I think we still need the mmap, but want to prevent allocating when
> userspace touches previously mmaped area that has never filled the page.

Yes, or if a chunk was filled at some point but then was removed via PUNCH_HOLE.

> I don't have clear answer if other operations like read/write should be
> also prevented (probably yes). And only after an explicit fallocate() to
> allocate the page these operations would act normally.

I always forget about read/write.  I believe reads should be ok, the semantics of
holes are that they return zeros, i.e. can use ZERO_PAGE() and not allocate a new
backing page.  Not sure what to do about writes though.  Allocating on direct writes
might be ok for our use case, but that could also result in a rather wierd API.

> > It is indeed the case (and has been since before quite a few of us
> > were born) that a hole in a sparse file is logically just a bunch of
> > zeros.  A way to make a file for which a hole is an actual hole seems
> > like it would solve this problem nicely.  It could also be solved more
> > specifically for KVM by making sure that the private/shared mode that
> > userspace programs is strict enough to prevent accidental allocations
> > -- if a GPA is definitively private, shared, neither, or (potentially,
> > on TDX only) both, then a page that *isn't* shared will never be
> > accidentally allocated by KVM.
> 
> KVM is clever enough to not allocate since it knows a GPA is shared or
> not. This case it's the host userspace that can cause the allocating and
> is too complex to check on every access from guest.

Yes, KVM is not in the picture at all.  KVM won't trigger allocation, but KVM also
is not in a position to prevent userspace from touching memory.

> > If the shared backing is not mmapped,
> > it also won't be accidentally allocated by host userspace on a stray
> > or careless write.
> 
> As said above, mmap is still prefered, otherwise too many changes are
> needed for usespace VMM.

Forcing userspace to change doesn't bother me too much, the biggest concern is
having to take mmap_lock for write in a per-host process.

  reply	other threads:[~2022-06-15 14:29 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-19 15:37 [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2022-05-19 15:37 ` [PATCH v6 1/8] mm: Introduce memfile_notifier Chao Peng
2022-05-19 15:37 ` [PATCH v6 2/8] mm/shmem: Support memfile_notifier Chao Peng
2022-05-19 15:37 ` [PATCH v6 3/8] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2022-05-31 19:15   ` Vishal Annapurve
2022-06-01 10:17     ` Chao Peng
2022-06-01 12:11       ` Gupta, Pankaj
2022-06-02 10:07         ` Chao Peng
2022-06-14 20:23           ` Sean Christopherson
2022-06-15  8:53             ` Chao Peng
2022-05-19 15:37 ` [PATCH v6 4/8] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-05-20 17:57   ` Andy Lutomirski
2022-05-20 18:31     ` Sean Christopherson
2022-05-22  4:03       ` Andy Lutomirski
2022-05-23 13:21       ` Chao Peng
2022-05-23 15:22         ` Sean Christopherson
2022-05-30 13:26           ` Chao Peng
2022-06-10 16:14             ` Sean Christopherson
2022-06-14  6:45               ` Chao Peng
2022-06-23 22:59       ` Michael Roth
2022-06-24  8:54         ` Chao Peng
2022-06-24 13:01           ` Michael Roth
2022-06-17 20:52   ` Sean Christopherson
2022-06-17 21:27     ` Sean Christopherson
2022-06-20 14:09       ` Chao Peng
2022-06-20 14:08     ` Chao Peng
2022-05-19 15:37 ` [PATCH v6 5/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-05-19 15:37 ` [PATCH v6 6/8] KVM: Handle page fault for private memory Chao Peng
2022-06-17 21:30   ` Sean Christopherson
2022-06-20 14:16     ` Chao Peng
2022-08-19  0:40     ` Kirill A. Shutemov
2022-08-25 23:43       ` Sean Christopherson
2022-06-24  3:58   ` Nikunj A. Dadhania
2022-06-24  9:02     ` Chao Peng
2022-06-30 19:14       ` Vishal Annapurve
2022-06-30 22:21         ` Michael Roth
2022-07-01  1:21           ` Xiaoyao Li
2022-07-07 20:08             ` Sean Christopherson
2022-07-08  3:29               ` Xiaoyao Li
2022-07-20 23:08                 ` Vishal Annapurve
2022-07-21  9:45                   ` Chao Peng
2022-05-19 15:37 ` [PATCH v6 7/8] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-06-23 22:07   ` Michael Roth
2022-06-24  8:43     ` Chao Peng
2022-05-19 15:37 ` [PATCH v6 8/8] memfd_create.2: Describe MFD_INACCESSIBLE flag Chao Peng
2022-06-06 20:09 ` [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory Vishal Annapurve
2022-06-07  6:57   ` Chao Peng
2022-06-08  0:55     ` Marc Orr
2022-06-08  2:18       ` Chao Peng
2022-06-08 19:37         ` Vishal Annapurve
2022-06-09 20:29           ` Sean Christopherson
2022-06-14  7:28             ` Chao Peng
2022-06-14 17:37               ` Andy Lutomirski
2022-06-14 19:08                 ` Sean Christopherson
2022-06-14 20:59                   ` Andy Lutomirski
2022-06-15  9:17                     ` Chao Peng
2022-06-15 14:29                       ` Sean Christopherson [this message]
2022-06-10  0:11         ` Marc Orr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yqns2ar0TND4RP9P@google.com \
    --to=seanjc@google.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=marcorr@google.com \
    --cc=mhocko@suse.com \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qperret@google.com \
    --cc=rppt@kernel.org \
    --cc=steven.price@arm.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.