linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Quentin Perret <qperret@google.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com, david@redhat.com, maz@kernel.org,
	will@kernel.org
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Thu, 24 Mar 2022 15:51:48 +0000	[thread overview]
Message-ID: <YjyS6A0o4JASQK+B@google.com> (raw)
In-Reply-To: <20220310140911.50924-1-chao.p.peng@linux.intel.com>

Hi Chao,

+CC Will and Marc for visibility.

On Thursday 10 Mar 2022 at 22:08:58 (+0800), Chao Peng wrote:
> This is the v5 of this series which tries to implement the fd-based KVM
> guest private memory. The patches are based on latest kvm/queue branch
> commit:
> 
>   d5089416b7fb KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
>  
> Introduction
> ------------
> In general this patch series introduce fd-based memslot which provides
> guest memory through memory file descriptor fd[offset,size] instead of
> hva/size. The fd can be created from a supported memory filesystem
> like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM
> and the the memory backing store exchange callbacks when such memslot
> gets created. At runtime KVM will call into callbacks provided by the
> backing store to get the pfn with the fd+offset. Memory backing store
> will also call into KVM callbacks when userspace fallocate/punch hole
> on the fd to notify KVM to map/unmap secondary MMU page tables.
> 
> Comparing to existing hva-based memslot, this new type of memslot allows
> guest memory unmapped from host userspace like QEMU and even the kernel
> itself, therefore reduce attack surface and prevent bugs.
> 
> Based on this fd-based memslot, we can build guest private memory that
> is going to be used in confidential computing environments such as Intel
> TDX and AMD SEV. When supported, the memory backing store can provide
> more enforcement on the fd and KVM can use a single memslot to hold both
> the private and shared part of the guest memory. 
> 
> mm extension
> ---------------------
> Introduces new MFD_INACCESSIBLE flag for memfd_create(), the file created
> with these flags cannot read(), write() or mmap() etc via normal
> MMU operations. The file content can only be used with the newly
> introduced memfile_notifier extension.
> 
> The memfile_notifier extension provides two sets of callbacks for KVM to
> interact with the memory backing store:
>   - memfile_notifier_ops: callbacks for memory backing store to notify
>     KVM when memory gets allocated/invalidated.
>   - memfile_pfn_ops: callbacks for KVM to call into memory backing store
>     to request memory pages for guest private memory.
> 
> The memfile_notifier extension also provides APIs for memory backing
> store to register/unregister itself and to trigger the notifier when the
> bookmarked memory gets fallocated/invalidated.
> 
> memslot extension
> -----------------
> Add the private fd and the fd offset to existing 'shared' memslot so that
> both private/shared guest memory can live in one single memslot. A page in
> the memslot is either private or shared. A page is private only when it's
> already allocated in the backing store fd, all the other cases it's treated
> as shared, this includes those already mapped as shared as well as those
> having not been mapped. This means the memory backing store is the place
> which tells the truth of which page is private.
> 
> Private memory map/unmap and conversion
> ---------------------------------------
> Userspace's map/unmap operations are done by fallocate() ioctl on the
> backing store fd.
>   - map: default fallocate() with mode=0.
>   - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
> The map/unmap will trigger above memfile_notifier_ops to let KVM map/unmap
> secondary MMU page tables.

I recently came across this series which is interesting for the
Protected KVM work that's currently ongoing in the Android world (see
[1], [2] or [3] for more details). The idea is similar in a number of
ways to the Intel TDX stuff (from what I understand, but I'm clearly not
understanding it all so, ...) or the Arm CCA solution, but using stage-2
MMUs instead of encryption; and leverages the caveat of the nVHE
KVM/arm64 implementation to isolate the control of stage-2 MMUs from the
host.

For Protected KVM (and I suspect most other confidential computing
solutions), guests have the ability to share some of their pages back
with the host kernel using a dedicated hypercall. This is necessary
for e.g. virtio communications, so these shared pages need to be mapped
back into the VMM's address space. I'm a bit confused about how that
would work with the approach proposed here. What is going to be the
approach for TDX?

It feels like the most 'natural' thing would be to have a KVM exit
reason describing which pages have been shared back by the guest, and to
then allow the VMM to mmap those specific pages in response in the
memfd. Is this something that has been discussed or considered?

Thanks,
Quentin

[1] https://lwn.net/Articles/836693/
[2] https://www.youtube.com/watch?v=wY-u6n75iXc
[3] https://www.youtube.com/watch?v=54q6RzS9BpQ&t=10862s

  parent reply	other threads:[~2022-03-24 15:52 UTC|newest]

Thread overview: 117+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 14:08 [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2022-03-10 14:08 ` [PATCH v5 01/13] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2022-04-11 15:10   ` Kirill A. Shutemov
2022-04-12 13:11     ` Chao Peng
2022-04-23  5:43   ` Vishal Annapurve
2022-04-24  8:15     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 02/13] mm: Introduce memfile_notifier Chao Peng
2022-03-29 18:45   ` Sean Christopherson
2022-04-08 12:54     ` Chao Peng
     [not found]   ` <20220412143654.6313-1-hdanton@sina.com>
2022-04-13  6:47     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 03/13] mm/shmem: Support memfile_notifier Chao Peng
2022-03-10 23:08   ` Dave Chinner
2022-03-11  8:42     ` Chao Peng
2022-04-11 15:26   ` Kirill A. Shutemov
2022-04-12 13:12     ` Chao Peng
2022-04-19 22:40   ` Vishal Annapurve
2022-04-20  3:24     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK Chao Peng
2022-04-07 16:05   ` Sean Christopherson
2022-04-07 17:09     ` Andy Lutomirski
2022-04-08 17:56       ` Sean Christopherson
2022-04-08 18:54         ` David Hildenbrand
2022-04-12 14:36           ` Jason Gunthorpe
2022-04-12 21:27             ` Andy Lutomirski
2022-04-13 16:30               ` David Hildenbrand
2022-04-13 16:24             ` David Hildenbrand
2022-04-13 17:52               ` Jason Gunthorpe
2022-04-25 14:07                 ` David Hildenbrand
2022-04-08 13:02     ` Chao Peng
2022-04-11 15:34       ` Kirill A. Shutemov
2022-04-12  5:14         ` Hugh Dickins
2022-04-11 15:32     ` Kirill A. Shutemov
2022-04-12 13:39       ` Chao Peng
2022-04-12 19:28         ` Kirill A. Shutemov
2022-04-13  9:15           ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 05/13] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-03-28 21:27   ` Sean Christopherson
2022-04-08 13:21     ` Chao Peng
2022-03-28 21:56   ` Sean Christopherson
2022-04-08 13:46     ` Chao Peng
2022-04-08 17:45       ` Sean Christopherson
2022-03-10 14:09 ` [PATCH v5 06/13] KVM: Use kvm_userspace_memory_region_ext Chao Peng
2022-03-28 22:26   ` Sean Christopherson
2022-04-08 13:58     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 07/13] KVM: Add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2022-03-28 22:33   ` Sean Christopherson
2022-04-08 13:59     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 08/13] KVM: Use memfile_pfn_ops to obtain pfn for private pages Chao Peng
2022-03-28 23:56   ` Sean Christopherson
2022-04-08 14:07     ` Chao Peng
2022-04-28 12:37     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 09/13] KVM: Handle page fault for private memory Chao Peng
2022-03-29  1:07   ` Sean Christopherson
2022-04-12 12:10     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 10/13] KVM: Register private memslot to memory backing store Chao Peng
2022-03-29 19:01   ` Sean Christopherson
2022-04-12 12:40     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 11/13] KVM: Zap existing KVM mappings when pages changed in the private fd Chao Peng
2022-03-29 19:23   ` Sean Christopherson
2022-04-12 12:43     ` Chao Peng
2022-04-05 23:45   ` Michael Roth
2022-04-08  3:06     ` Sean Christopherson
2022-04-19 22:43   ` Vishal Annapurve
2022-04-20  3:17     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 12/13] KVM: Expose KVM_MEM_PRIVATE Chao Peng
2022-03-29 19:13   ` Sean Christopherson
2022-04-12 12:56     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 13/13] memfd_create.2: Describe MFD_INACCESSIBLE flag Chao Peng
2022-03-24 15:51 ` Quentin Perret [this message]
2022-03-28 17:13   ` [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Sean Christopherson
2022-03-28 18:00     ` Quentin Perret
2022-03-28 18:58       ` Sean Christopherson
2022-03-29 17:01         ` Quentin Perret
2022-03-30  8:58           ` Steven Price
2022-03-30 10:39             ` Quentin Perret
2022-03-30 17:58               ` Sean Christopherson
2022-03-31 16:04                 ` Andy Lutomirski
2022-04-01 14:59                   ` Quentin Perret
2022-04-01 17:14                     ` Sean Christopherson
2022-04-01 18:03                       ` Quentin Perret
2022-04-01 18:24                         ` Sean Christopherson
2022-04-01 19:56                     ` Andy Lutomirski
2022-04-04 15:01                       ` Quentin Perret
2022-04-04 17:06                         ` Sean Christopherson
2022-04-04 22:04                           ` Andy Lutomirski
2022-04-05 10:36                             ` Quentin Perret
2022-04-05 17:51                               ` Andy Lutomirski
2022-04-05 18:30                                 ` Sean Christopherson
2022-04-06 18:42                                   ` Andy Lutomirski
2022-04-06 13:05                                 ` Quentin Perret
2022-04-05 18:03                               ` Sean Christopherson
2022-04-06 10:34                                 ` Quentin Perret
2022-04-22 10:56                                 ` Chao Peng
2022-04-22 11:06                                   ` Paolo Bonzini
2022-04-24  8:07                                     ` Chao Peng
2022-04-24 16:59                                   ` Andy Lutomirski
2022-04-25 13:40                                     ` Chao Peng
2022-04-25 14:52                                       ` Andy Lutomirski
2022-04-25 20:30                                         ` Sean Christopherson
2022-06-10 19:18                                           ` Andy Lutomirski
2022-06-10 19:27                                             ` Sean Christopherson
2022-04-28 12:29                                         ` Chao Peng
2022-05-03 11:12                                           ` Quentin Perret
2022-05-09 22:30                                   ` Michael Roth
2022-05-09 23:29                                     ` Sean Christopherson
2022-07-21 20:05                                       ` Gupta, Pankaj
2022-07-21 21:19                                         ` Sean Christopherson
2022-07-21 21:36                                           ` Gupta, Pankaj
2022-07-23  3:09                                           ` Andy Lutomirski
2022-07-25  9:19                                             ` Gupta, Pankaj
2022-03-30 16:18             ` Sean Christopherson
2022-03-28 20:16 ` Andy Lutomirski
2022-03-28 22:48   ` Nakajima, Jun
2022-03-29  0:04     ` Sean Christopherson
2022-04-08 21:35   ` Vishal Annapurve
2022-04-12 13:00     ` Chao Peng
2022-04-12 19:58   ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjyS6A0o4JASQK+B@google.com \
    --to=qperret@google.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=steven.price@arm.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).