All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Wang, Wei W" <wei.w.wang@intel.com>
To: "Lutomirski, Andy" <luto@kernel.org>, "Christopherson,,
	Sean" <seanjc@google.com>, David Hildenbrand <david@redhat.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>,
	kvm list <kvm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	the arch/x86 maintainers <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>,
	Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, "Mike Rapoport" <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	"aarcange@redhat.com" <aarcange@redhat.com>,
	"ddutile@redhat.com" <ddutile@redhat.com>,
	"dhildenb@redhat.com" <dhildenb@redhat.com>,
	"Quentin Perret" <qperret@google.com>,
	Michael Roth <michael.roth@amd.com>,
	"Hocko, Michal" <mhocko@suse.com>,
	Muchun Song <songmuchun@bytedance.com>,
	"Will Deacon" <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Fuad Tabba <tabba@google.com>
Subject: RE: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
Date: Thu, 22 Sep 2022 13:23:42 +0000	[thread overview]
Message-ID: <DS0PR11MB63734C0FC6A7FABCAF826620DC4E9@DS0PR11MB6373.namprd11.prod.outlook.com> (raw)
In-Reply-To: <84e81d21-c800-4fd5-ad7c-f20bcdd7508b@www.fastmail.com>

On Thursday, September 22, 2022 5:11 AM, Andy Lutomirski wrote:
> To: Christopherson,, Sean <seanjc@google.com>; David Hildenbrand
> <david@redhat.com>
> Cc: Chao Peng <chao.p.peng@linux.intel.com>; kvm list
> <kvm@vger.kernel.org>; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; linux-mm@kvack.org;
> linux-fsdevel@vger.kernel.org; Linux API <linux-api@vger.kernel.org>;
> linux-doc@vger.kernel.org; qemu-devel@nongnu.org; Paolo Bonzini
> <pbonzini@redhat.com>; Jonathan Corbet <corbet@lwn.net>; Vitaly
> Kuznetsov <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>;
> Jim Mattson <jmattson@google.com>; Joerg Roedel <joro@8bytes.org>;
> Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>;
> Borislav Petkov <bp@alien8.de>; the arch/x86 maintainers <x86@kernel.org>;
> H. Peter Anvin <hpa@zytor.com>; Hugh Dickins <hughd@google.com>; Jeff
> Layton <jlayton@kernel.org>; J . Bruce Fields <bfields@fieldses.org>; Andrew
> Morton <akpm@linux-foundation.org>; Shuah Khan <shuah@kernel.org>;
> Mike Rapoport <rppt@kernel.org>; Steven Price <steven.price@arm.com>;
> Maciej S . Szmigiero <mail@maciej.szmigiero.name>; Vlastimil Babka
> <vbabka@suse.cz>; Vishal Annapurve <vannapurve@google.com>; Yu Zhang
> <yu.c.zhang@linux.intel.com>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; Nakajima, Jun <jun.nakajima@intel.com>;
> Hansen, Dave <dave.hansen@intel.com>; Andi Kleen <ak@linux.intel.com>;
> aarcange@redhat.com; ddutile@redhat.com; dhildenb@redhat.com; Quentin
> Perret <qperret@google.com>; Michael Roth <michael.roth@amd.com>;
> Hocko, Michal <mhocko@suse.com>; Muchun Song
> <songmuchun@bytedance.com>; Wang, Wei W <wei.w.wang@intel.com>;
> Will Deacon <will@kernel.org>; Marc Zyngier <maz@kernel.org>; Fuad Tabba
> <tabba@google.com>
> Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible
> memfd
> 
> (please excuse any formatting disasters.  my internet went out as I was
> composing this, and i did my best to rescue it.)
> 
> On Mon, Sep 19, 2022, at 12:10 PM, Sean Christopherson wrote:
> > +Will, Marc and Fuad (apologies if I missed other pKVM folks)
> >
> > On Mon, Sep 19, 2022, David Hildenbrand wrote:
> >> On 15.09.22 16:29, Chao Peng wrote:
> >> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >> >
> >> > KVM can use memfd-provided memory for guest memory. For normal
> >> > userspace accessible memory, KVM userspace (e.g. QEMU) mmaps the
> >> > memfd into its virtual address space and then tells KVM to use the
> >> > virtual address to setup the mapping in the secondary page table (e.g.
> EPT).
> >> >
> >> > With confidential computing technologies like Intel TDX, the
> >> > memfd-provided memory may be encrypted with special key for special
> >> > software domain (e.g. KVM guest) and is not expected to be directly
> >> > accessed by userspace. Precisely, userspace access to such
> >> > encrypted memory may lead to host crash so it should be prevented.
> >>
> >> Initially my thaught was that this whole inaccessible thing is TDX
> >> specific and there is no need to force that on other mechanisms.
> >> That's why I suggested to not expose this to user space but handle
> >> the notifier requirements internally.
> >>
> >> IIUC now, protected KVM has similar demands. Either access
> >> (read/write) of guest RAM would result in a fault and possibly crash
> >> the hypervisor (at least not the whole machine IIUC).
> >
> > Yep.  The missing piece for pKVM is the ability to convert from shared
> > to private while preserving the contents, e.g. to hand off a large
> > buffer (hundreds of MiB) for processing in the protected VM.  Thoughts
> > on this at the bottom.
> >
> >> > This patch introduces userspace inaccessible memfd (created with
> >> > MFD_INACCESSIBLE). Its memory is inaccessible from userspace
> >> > through ordinary MMU access (e.g. read/write/mmap) but can be
> >> > accessed via in-kernel interface so KVM can directly interact with
> >> > core-mm without the need to map the memory into KVM userspace.
> >>
> >> With secretmem we decided to not add such "concept switch" flags and
> >> instead use a dedicated syscall.
> >>
> >
> > I have no personal preference whatsoever between a flag and a
> > dedicated syscall, but a dedicated syscall does seem like it would
> > give the kernel a bit more flexibility.
> 
> The third option is a device node, e.g. /dev/kvm_secretmem or
> /dev/kvm_tdxmem or similar.  But if we need flags or other details in the
> future, maybe this isn't ideal.
> 
> >
> >> What about memfd_inaccessible()? Especially, sealing and hugetlb are
> >> not even supported and it might take a while to support either.
> >
> > Don't know about sealing, but hugetlb support for "inaccessible"
> > memory needs to come sooner than later.  "inaccessible" in quotes
> > because we might want to choose a less binary name, e.g.
> > "restricted"?.
> >
> > Regarding pKVM's use case, with the shim approach I believe this can
> > be done by allowing userspace mmap() the "hidden" memfd, but with a
> > ton of restrictions piled on top.
> >
> > My first thought was to make the uAPI a set of KVM ioctls so that KVM
> > could tightly tightly control usage without taking on too much
> > complexity in the kernel, but working through things, routing the
> > behavior through the shim itself might not be all that horrific.
> >
> > IIRC, we discarded the idea of allowing userspace to map the "private"
> > fd because
> > things got too complex, but with the shim it doesn't seem _that_ bad.
> 
> What's the exact use case?  Is it just to pre-populate the memory?

Add one more use case here. For TDX live migration support, on the destination side,
we map the private fd during migration to store the encrypted private memory data sent
from source, and at the end of migration, we unmap it and make it inaccessible before
resuming the TD to run.

  reply	other threads:[~2022-09-22 13:24 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-15 14:29 [PATCH v8 0/8] KVM: mm: fd-based approach for supporting KVM Chao Peng
2022-09-15 14:29 ` [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd Chao Peng
2022-09-19  9:12   ` David Hildenbrand
2022-09-19 19:10     ` Sean Christopherson
2022-09-21 21:10       ` Andy Lutomirski
2022-09-22 13:23         ` Wang, Wei W [this message]
2022-09-23 15:20         ` Fuad Tabba
2022-09-23 15:19       ` Fuad Tabba
2022-09-26 14:23         ` Chao Peng
2022-09-26 15:51           ` Fuad Tabba
2022-09-27 22:47             ` Sean Christopherson
2022-09-30 16:19               ` Fuad Tabba
2022-10-13 13:34                 ` Chao Peng
2022-10-17 10:31                   ` Fuad Tabba
2022-10-17 14:58                     ` Chao Peng
2022-10-17 19:05                       ` Fuad Tabba
2022-10-19 13:30                         ` Chao Peng
2022-10-18  0:33                 ` Sean Christopherson
2022-10-19 15:04                   ` Fuad Tabba
2022-09-23  0:58     ` Kirill A . Shutemov
2022-09-26 10:35       ` David Hildenbrand
2022-09-26 14:48         ` Kirill A. Shutemov
2022-09-26 14:53           ` David Hildenbrand
2022-09-27 23:23             ` Sean Christopherson
2022-09-28 13:36               ` Kirill A. Shutemov
2022-09-22 13:26   ` Wang, Wei W
2022-09-22 19:49     ` Sean Christopherson
2022-09-23  0:53       ` Kirill A . Shutemov
2022-09-23 15:20         ` Fuad Tabba
2022-09-30 16:14   ` Fuad Tabba
2022-09-30 16:23     ` Kirill A . Shutemov
2022-10-03  7:33       ` Fuad Tabba
2022-10-03 11:01         ` Kirill A. Shutemov
2022-10-04 15:39           ` Fuad Tabba
2022-10-06  8:50   ` Fuad Tabba
2022-10-06 13:04     ` Kirill A. Shutemov
2022-10-17 13:00   ` Vlastimil Babka
2022-10-17 16:19     ` Kirill A . Shutemov
2022-10-17 16:39       ` Gupta, Pankaj
2022-10-17 21:56         ` Kirill A . Shutemov
2022-10-18 13:42           ` Vishal Annapurve
2022-10-19 15:32             ` Kirill A . Shutemov
2022-10-20 10:50               ` Vishal Annapurve
2022-10-21 13:54                 ` Chao Peng
2022-10-21 16:53                   ` Sean Christopherson
2022-10-19 12:23   ` Vishal Annapurve
2022-10-21 13:47     ` Chao Peng
2022-10-21 16:18       ` Sean Christopherson
2022-10-24 14:59         ` Kirill A . Shutemov
2022-10-24 15:26           ` David Hildenbrand
2022-11-03 16:27           ` Vishal Annapurve
2022-09-15 14:29 ` [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-09-16  9:14   ` Bagas Sanjaya
2022-09-16  9:53     ` Chao Peng
2022-09-26 10:26   ` Fuad Tabba
2022-09-26 14:04     ` Chao Peng
2022-09-29 22:45   ` Isaku Yamahata
2022-09-29 23:22     ` Sean Christopherson
2022-10-05 13:04   ` Jarkko Sakkinen
2022-10-05 22:05     ` Jarkko Sakkinen
2022-10-06  9:00   ` Fuad Tabba
2022-10-06 14:58   ` Jarkko Sakkinen
2022-10-06 15:07     ` Jarkko Sakkinen
2022-10-06 15:34       ` Sean Christopherson
2022-10-07 11:14         ` Jarkko Sakkinen
2022-10-07 14:58           ` Sean Christopherson
2022-10-07 21:54             ` Jarkko Sakkinen
2022-10-08 16:15               ` Jarkko Sakkinen
2022-10-08 17:35                 ` Jarkko Sakkinen
2022-10-10  8:25                   ` Chao Peng
2022-10-12  8:14                     ` Jarkko Sakkinen
2022-09-15 14:29 ` [PATCH v8 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-09-16  9:17   ` Bagas Sanjaya
2022-09-16  9:54     ` Chao Peng
2022-09-15 14:29 ` [PATCH v8 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Chao Peng
2022-09-15 14:29 ` [PATCH v8 5/8] KVM: Register/unregister the guest private memory regions Chao Peng
2022-09-26 10:36   ` Fuad Tabba
2022-09-26 14:07     ` Chao Peng
2022-10-11  9:48   ` Fuad Tabba
2022-10-12  2:35     ` Chao Peng
2022-10-17 10:15       ` Fuad Tabba
2022-10-17 22:17         ` Sean Christopherson
2022-10-19 13:23           ` Chao Peng
2022-10-19 15:02             ` Fuad Tabba
2022-10-19 16:09               ` Sean Christopherson
2022-10-19 18:32                 ` Fuad Tabba
2022-09-15 14:29 ` [PATCH v8 6/8] KVM: Update lpage info when private/shared memory are mixed Chao Peng
2022-09-29 16:52   ` Isaku Yamahata
2022-09-30  8:59     ` Chao Peng
2022-09-15 14:29 ` [PATCH v8 7/8] KVM: Handle page fault for private memory Chao Peng
2022-10-14 18:57   ` Sean Christopherson
2022-10-17 14:48     ` Chao Peng
2022-09-15 14:29 ` [PATCH v8 8/8] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-10-04 14:55   ` Jarkko Sakkinen
2022-10-10  8:31     ` Chao Peng
2022-10-06  8:55   ` Fuad Tabba
2022-10-10  8:33     ` Chao Peng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DS0PR11MB63734C0FC6A7FABCAF826620DC4E9@DS0PR11MB6373.namprd11.prod.outlook.com \
    --to=wei.w.wang@intel.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mhocko@suse.com \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qperret@google.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=songmuchun@bytedance.com \
    --cc=steven.price@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.