All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Andy Lutomirski" <luto@kernel.org>
To: "Quentin Perret" <qperret@google.com>
Cc: "Sean Christopherson" <seanjc@google.com>,
	"Steven Price" <steven.price@arm.com>,
	"Chao Peng" <chao.p.peng@linux.intel.com>,
	"kvm list" <kvm@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	"Linux API" <linux-api@vger.kernel.org>,
	qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Jim Mattson" <jmattson@google.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Jeff Layton" <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Vishal Annapurve" <vannapurve@google.com>,
	"Yu Zhang" <yu.c.zhang@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Andi Kleen" <ak@linux.intel.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Marc Zyngier" <maz@kernel.org>, "Will Deacon" <will@kernel.org>
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Fri, 01 Apr 2022 12:56:50 -0700	[thread overview]
Message-ID: <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com> (raw)
In-Reply-To: <YkcTTY4YjQs5BRhE@google.com>

On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:


> To answer your original question about memory 'conversion', the key
> thing is that the pKVM hypervisor controls the stage-2 page-tables for
> everyone in the system, all guests as well as the host. As such, a page
> 'conversion' is nothing more than a permission change in the relevant
> page-tables.
>

So I can see two different ways to approach this.

One is that you split the whole address space in half and, just like SEV and TDX, allocate one bit to indicate the shared/private status of a page.  This makes it work a lot like SEV and TDX.

The other is to have shared and private pages be distinguished only by their hypercall history and the (protected) page tables.  This saves some address space and some page table allocations, but it opens some cans of worms too.  In particular, the guest and the hypervisor need to coordinate, in a way that the guest can trust, to ensure that the guest's idea of which pages are private match the host's.  This model seems a bit harder to support nicely with the private memory fd model, but not necessarily impossible.

Also, what are you trying to accomplish by having the host userspace mmap private pages?  Is the idea that multiple guest could share the same page until such time as one of them tries to write to it?  That would be kind of like having a third kind of memory that's visible to host and guests but is read-only for everyone.  TDX and SEV can't support this at all (a private page belongs to one guest and one guest only, at least in SEV and in the current TDX SEAM spec).  I imagine that this could be supported with private memory fds with some care without mmap, though -- the host could still populate the page with memcpy.  Or I suppose a memslot could support using MAP_PRIVATE fds and have approximately the right semantics.

--Andy



WARNING: multiple messages have this Message-ID (diff)
From: "Andy Lutomirski" <luto@kernel.org>
To: "Quentin Perret" <qperret@google.com>
Cc: Wanpeng Li <wanpengli@tencent.com>,
	kvm list <kvm@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>,
	qemu-devel@nongnu.org, "J . Bruce Fields" <bfields@fieldses.org>,
	linux-mm@kvack.org, "H. Peter Anvin" <hpa@zytor.com>,
	Chao Peng <chao.p.peng@linux.intel.com>,
	Will Deacon <will@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Jonathan Corbet <corbet@lwn.net>, Marc Zyngier <maz@kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Steven Price <steven.price@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Borislav Petkov <bp@alien8.de>,
	"Nakajima,  Jun" <jun.nakajima@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Jim Mattson <jmattson@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Jeff Layton <jlayton@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux API <linux-api@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Vishal Annapurve <vannapurve@google.com>,
	Mike Rapoport <rppt@kernel.org>
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Fri, 01 Apr 2022 12:56:50 -0700	[thread overview]
Message-ID: <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com> (raw)
In-Reply-To: <YkcTTY4YjQs5BRhE@google.com>

On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:


> To answer your original question about memory 'conversion', the key
> thing is that the pKVM hypervisor controls the stage-2 page-tables for
> everyone in the system, all guests as well as the host. As such, a page
> 'conversion' is nothing more than a permission change in the relevant
> page-tables.
>

So I can see two different ways to approach this.

One is that you split the whole address space in half and, just like SEV and TDX, allocate one bit to indicate the shared/private status of a page.  This makes it work a lot like SEV and TDX.

The other is to have shared and private pages be distinguished only by their hypercall history and the (protected) page tables.  This saves some address space and some page table allocations, but it opens some cans of worms too.  In particular, the guest and the hypervisor need to coordinate, in a way that the guest can trust, to ensure that the guest's idea of which pages are private match the host's.  This model seems a bit harder to support nicely with the private memory fd model, but not necessarily impossible.

Also, what are you trying to accomplish by having the host userspace mmap private pages?  Is the idea that multiple guest could share the same page until such time as one of them tries to write to it?  That would be kind of like having a third kind of memory that's visible to host and guests but is read-only for everyone.  TDX and SEV can't support this at all (a private page belongs to one guest and one guest only, at least in SEV and in the current TDX SEAM spec).  I imagine that this could be supported with private memory fds with some care without mmap, though -- the host could still populate the page with memcpy.  Or I suppose a memslot could support using MAP_PRIVATE fds and have approximately the right semantics.

--Andy




  parent reply	other threads:[~2022-04-01 19:57 UTC|newest]

Thread overview: 183+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 14:08 [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2022-03-10 14:08 ` Chao Peng
2022-03-10 14:08 ` [PATCH v5 01/13] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2022-03-10 14:08   ` Chao Peng
2022-04-11 15:10   ` Kirill A. Shutemov
2022-04-11 15:10     ` Kirill A. Shutemov
2022-04-12 13:11     ` Chao Peng
2022-04-12 13:11       ` Chao Peng
2022-04-23  5:43   ` Vishal Annapurve
2022-04-24  8:15     ` Chao Peng
2022-04-24  8:15       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 02/13] mm: Introduce memfile_notifier Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-29 18:45   ` Sean Christopherson
2022-04-08 12:54     ` Chao Peng
2022-04-08 12:54       ` Chao Peng
2022-04-12 14:36   ` Hillf Danton
2022-04-13  6:47     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 03/13] mm/shmem: Support memfile_notifier Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-10 23:08   ` Dave Chinner
2022-03-10 23:08     ` Dave Chinner
2022-03-11  8:42     ` Chao Peng
2022-03-11  8:42       ` Chao Peng
2022-04-11 15:26   ` Kirill A. Shutemov
2022-04-11 15:26     ` Kirill A. Shutemov
2022-04-12 13:12     ` Chao Peng
2022-04-12 13:12       ` Chao Peng
2022-04-19 22:40   ` Vishal Annapurve
2022-04-20  3:24     ` Chao Peng
2022-04-20  3:24       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-04-07 16:05   ` Sean Christopherson
2022-04-07 17:09     ` Andy Lutomirski
2022-04-07 17:09       ` Andy Lutomirski
2022-04-08 17:56       ` Sean Christopherson
2022-04-08 18:54         ` David Hildenbrand
2022-04-08 18:54           ` David Hildenbrand
2022-04-12 14:36           ` Jason Gunthorpe
2022-04-12 14:36             ` Jason Gunthorpe
2022-04-12 21:27             ` Andy Lutomirski
2022-04-12 21:27               ` Andy Lutomirski
2022-04-13 16:30               ` David Hildenbrand
2022-04-13 16:30                 ` David Hildenbrand
2022-04-13 16:24             ` David Hildenbrand
2022-04-13 16:24               ` David Hildenbrand
2022-04-13 17:52               ` Jason Gunthorpe
2022-04-13 17:52                 ` Jason Gunthorpe
2022-04-25 14:07                 ` David Hildenbrand
2022-04-25 14:07                   ` David Hildenbrand
2022-04-08 13:02     ` Chao Peng
2022-04-08 13:02       ` Chao Peng
2022-04-11 15:34       ` Kirill A. Shutemov
2022-04-11 15:34         ` Kirill A. Shutemov
2022-04-12  5:14         ` Hugh Dickins
2022-04-11 15:32     ` Kirill A. Shutemov
2022-04-11 15:32       ` Kirill A. Shutemov
2022-04-12 13:39       ` Chao Peng
2022-04-12 13:39         ` Chao Peng
2022-04-12 19:28         ` Kirill A. Shutemov
2022-04-12 19:28           ` Kirill A. Shutemov
2022-04-13  9:15           ` Chao Peng
2022-04-13  9:15             ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 05/13] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-28 21:27   ` Sean Christopherson
2022-04-08 13:21     ` Chao Peng
2022-04-08 13:21       ` Chao Peng
2022-03-28 21:56   ` Sean Christopherson
2022-04-08 13:46     ` Chao Peng
2022-04-08 13:46       ` Chao Peng
2022-04-08 17:45       ` Sean Christopherson
2022-03-10 14:09 ` [PATCH v5 06/13] KVM: Use kvm_userspace_memory_region_ext Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-28 22:26   ` Sean Christopherson
2022-04-08 13:58     ` Chao Peng
2022-04-08 13:58       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 07/13] KVM: Add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-28 22:33   ` Sean Christopherson
2022-04-08 13:59     ` Chao Peng
2022-04-08 13:59       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 08/13] KVM: Use memfile_pfn_ops to obtain pfn for private pages Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-28 23:56   ` Sean Christopherson
2022-04-08 14:07     ` Chao Peng
2022-04-08 14:07       ` Chao Peng
2022-04-28 12:37     ` Chao Peng
2022-04-28 12:37       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 09/13] KVM: Handle page fault for private memory Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-29  1:07   ` Sean Christopherson
2022-04-12 12:10     ` Chao Peng
2022-04-12 12:10       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 10/13] KVM: Register private memslot to memory backing store Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-29 19:01   ` Sean Christopherson
2022-04-12 12:40     ` Chao Peng
2022-04-12 12:40       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 11/13] KVM: Zap existing KVM mappings when pages changed in the private fd Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-29 19:23   ` Sean Christopherson
2022-04-12 12:43     ` Chao Peng
2022-04-12 12:43       ` Chao Peng
2022-04-05 23:45   ` Michael Roth
2022-04-08  3:06     ` Sean Christopherson
2022-04-19 22:43   ` Vishal Annapurve
2022-04-20  3:17     ` Chao Peng
2022-04-20  3:17       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 12/13] KVM: Expose KVM_MEM_PRIVATE Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-29 19:13   ` Sean Christopherson
2022-04-12 12:56     ` Chao Peng
2022-04-12 12:56       ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 13/13] memfd_create.2: Describe MFD_INACCESSIBLE flag Chao Peng
2022-03-10 14:09   ` Chao Peng
2022-03-24 15:51 ` [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Quentin Perret
2022-03-28 17:13   ` Sean Christopherson
2022-03-28 18:00     ` Quentin Perret
2022-03-28 18:58       ` Sean Christopherson
2022-03-29 17:01         ` Quentin Perret
2022-03-30  8:58           ` Steven Price
2022-03-30  8:58             ` Steven Price
2022-03-30 10:39             ` Quentin Perret
2022-03-30 17:58               ` Sean Christopherson
2022-03-31 16:04                 ` Andy Lutomirski
2022-03-31 16:04                   ` Andy Lutomirski
2022-04-01 14:59                   ` Quentin Perret
2022-04-01 17:14                     ` Sean Christopherson
2022-04-01 18:03                       ` Quentin Perret
2022-04-01 18:24                         ` Sean Christopherson
2022-04-01 19:56                     ` Andy Lutomirski [this message]
2022-04-01 19:56                       ` Andy Lutomirski
2022-04-04 15:01                       ` Quentin Perret
2022-04-04 17:06                         ` Sean Christopherson
2022-04-04 22:04                           ` Andy Lutomirski
2022-04-04 22:04                             ` Andy Lutomirski
2022-04-05 10:36                             ` Quentin Perret
2022-04-05 17:51                               ` Andy Lutomirski
2022-04-05 17:51                                 ` Andy Lutomirski
2022-04-05 18:30                                 ` Sean Christopherson
2022-04-06 18:42                                   ` Andy Lutomirski
2022-04-06 18:42                                     ` Andy Lutomirski
2022-04-06 13:05                                 ` Quentin Perret
2022-04-05 18:03                               ` Sean Christopherson
2022-04-06 10:34                                 ` Quentin Perret
2022-04-22 10:56                                 ` Chao Peng
2022-04-22 10:56                                   ` Chao Peng
2022-04-22 11:06                                   ` Paolo Bonzini
2022-04-22 11:06                                     ` Paolo Bonzini
2022-04-24  8:07                                     ` Chao Peng
2022-04-24  8:07                                       ` Chao Peng
2022-04-24 16:59                                   ` Andy Lutomirski
2022-04-24 16:59                                     ` Andy Lutomirski
2022-04-25 13:40                                     ` Chao Peng
2022-04-25 13:40                                       ` Chao Peng
2022-04-25 14:52                                       ` Andy Lutomirski
2022-04-25 14:52                                         ` Andy Lutomirski
2022-04-25 20:30                                         ` Sean Christopherson
2022-06-10 19:18                                           ` Andy Lutomirski
2022-06-10 19:27                                             ` Sean Christopherson
2022-04-28 12:29                                         ` Chao Peng
2022-04-28 12:29                                           ` Chao Peng
2022-05-03 11:12                                           ` Quentin Perret
2022-05-09 22:30                                   ` Michael Roth
2022-05-09 23:29                                     ` Sean Christopherson
2022-07-21 20:05                                       ` Gupta, Pankaj
2022-07-21 21:19                                         ` Sean Christopherson
2022-07-21 21:36                                           ` Gupta, Pankaj
2022-07-23  3:09                                           ` Andy Lutomirski
2022-07-25  9:19                                             ` Gupta, Pankaj
2022-03-30 16:18             ` Sean Christopherson
2022-03-28 20:16 ` Andy Lutomirski
2022-03-28 20:16   ` Andy Lutomirski
2022-03-28 22:48   ` Nakajima, Jun
2022-03-28 22:48     ` Nakajima, Jun
2022-03-29  0:04     ` Sean Christopherson
2022-04-08 21:35   ` Vishal Annapurve
2022-04-12 13:00     ` Chao Peng
2022-04-12 13:00       ` Chao Peng
2022-04-12 19:58   ` Kirill A. Shutemov
2022-04-12 19:58     ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com \
    --to=luto@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qperret@google.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=steven.price@arm.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.