From: David Hildenbrand <david@redhat.com>
To: Jason Gunthorpe <jgg@ziepe.ca>, Chao Peng <chao.p.peng@linux.intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, john.ji@intel.com, susie.li@intel.com,
	jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com
Subject: Re: [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST
Date: Fri, 19 Nov 2021 16:39:15 +0100
Message-ID: <df11d753-6242-8f7c-cb04-c095f68b41fa@redhat.com>
In-Reply-To: <20211119151943.GH876299@ziepe.ca>

On 19.11.21 16:19, Jason Gunthorpe wrote:
> On Fri, Nov 19, 2021 at 09:47:27PM +0800, Chao Peng wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> The new seal type provides the semantics required for KVM guest private
>> memory support. A file descriptor with the seal set is going to be used
>> as the source of guest memory in confidential computing environments
>> such as Intel TDX and AMD SEV.
>>
>> F_SEAL_GUEST can only be set on an empty memfd. After the seal is set,
>> userspace cannot read, write or mmap the memfd.
>>
>> Userspace is in charge of the guest memory lifecycle: it can allocate
>> memory with fallocate() or punch a hole to free memory from the guest.
>>
>> The file descriptor is passed down to KVM as the guest memory backend.
>> KVM registers itself as the owner of the memfd via
>> memfd_register_guest().
>>
>> KVM provides callbacks that need to be called on fallocate and punch
>> hole.
>>
>> memfd_register_guest() returns callbacks that need to be used for
>> requesting a new page from the memfd.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>> ---
>>  include/linux/memfd.h      |  24 ++++++++
>>  include/linux/shmem_fs.h   |   9 +++
>>  include/uapi/linux/fcntl.h |   1 +
>>  mm/memfd.c                 |  33 +++++++++-
>>  mm/shmem.c                 | 123 ++++++++++++++++++++++++++++++++++++-
>>  5 files changed, 186 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/memfd.h b/include/linux/memfd.h
>> index 4f1600413f91..ff920ef28688 100644
>> --- a/include/linux/memfd.h
>> +++ b/include/linux/memfd.h
>> @@ -4,13 +4,37 @@
>>  
>>  #include <linux/file.h>
>>  
>> +struct guest_ops {
>> +	void (*invalidate_page_range)(struct inode *inode, void *owner,
>> +				      pgoff_t start, pgoff_t end);
>> +	void (*fallocate)(struct inode *inode, void *owner,
>> +			  pgoff_t start, pgoff_t end);
>> +};
>> +
>> +struct guest_mem_ops {
>> +	unsigned long (*get_lock_pfn)(struct inode *inode, pgoff_t offset,
>> +				      bool alloc, int *order);
>> +	void (*put_unlock_pfn)(unsigned long pfn);
>> +
>> +};
> 
> Ignoring confidential compute for a moment:
> 
> If QEMU can put all the guest memory in a memfd and not map it, then
> I'd also like to see that the IOMMU can use this interface too, so we
> can have VFIO working in this configuration.
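The two ops tables in the quoted hunk describe a two-way contract: the owner (KVM in this series) registers a guest_ops table and gets a guest_mem_ops table back, and the memfd then calls into the owner when userspace allocates memory or punches holes. A minimal userspace model of that call flow follows. It reuses the quoted struct layouts, but everything else is an assumption for illustration: pgoff_t and struct inode are stand-ins, model_register_guest() only mimics memfd_register_guest() (whose full signature is not in the quoted hunk), and the single-owner rule, the -EBUSY return, the mapping of hole punch to invalidate_page_range and the fake PFNs are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types so the model compiles. */
typedef unsigned long pgoff_t;
struct inode { int unused; };

/* Same layout as the two ops tables in the quoted hunk. */
struct guest_ops {
	void (*invalidate_page_range)(struct inode *inode, void *owner,
				      pgoff_t start, pgoff_t end);
	void (*fallocate)(struct inode *inode, void *owner,
			  pgoff_t start, pgoff_t end);
};

struct guest_mem_ops {
	unsigned long (*get_lock_pfn)(struct inode *inode, pgoff_t offset,
				      bool alloc, int *order);
	void (*put_unlock_pfn)(unsigned long pfn);
};

/* Hypothetical per-file state: at most one registered owner. */
struct model_file {
	struct inode inode;
	void *owner;
	const struct guest_ops *ops;
};

/* Backing-store accessors: fake PFN per offset, always order 0. */
static unsigned long model_get_lock_pfn(struct inode *inode, pgoff_t offset,
					bool alloc, int *order)
{
	*order = 0;
	return 0x1000 + offset;
}

static void model_put_unlock_pfn(unsigned long pfn)
{
	/* A real backend would unlock and release the page here. */
}

static const struct guest_mem_ops model_mem_ops = {
	.get_lock_pfn = model_get_lock_pfn,
	.put_unlock_pfn = model_put_unlock_pfn,
};

/* Mimics memfd_register_guest(): record the owner, return accessors. */
int model_register_guest(struct model_file *f, void *owner,
			 const struct guest_ops *ops,
			 const struct guest_mem_ops **mem_ops)
{
	assert(f && ops && mem_ops);
	if (f->owner)
		return -EBUSY; /* assumed: one owner per file */
	f->owner = owner;
	f->ops = ops;
	*mem_ops = &model_mem_ops;
	return 0;
}

/* Userspace fallocate() or hole punch on the file notifies the owner. */
void model_fallocate(struct model_file *f, pgoff_t start, pgoff_t end,
		     bool punch_hole)
{
	if (!f->ops)
		return; /* no owner registered yet */
	if (punch_hole)
		f->ops->invalidate_page_range(&f->inode, f->owner, start, end);
	else
		f->ops->fallocate(&f->inode, f->owner, start, end);
}

/* Example owner, playing KVM's role, that counts notifications. */
struct model_owner {
	int invalidations, fallocs;
};

static void owner_invalidate(struct inode *inode, void *owner,
			     pgoff_t start, pgoff_t end)
{
	struct model_owner *o = owner;

	o->invalidations++;
}

static void owner_fallocate(struct inode *inode, void *owner,
			    pgoff_t start, pgoff_t end)
{
	struct model_owner *o = owner;

	o->fallocs++;
}

static const struct guest_ops model_owner_ops = {
	.invalidate_page_range = owner_invalidate,
	.fallocate = owner_fallocate,
};
```

Because the owner is only an opaque void * plus its own ops table, nothing in this shape is inherently KVM-specific, which is the point behind asking whether an IOMMU/VFIO importer could use the same interface and whether the "guest" naming fits.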

In QEMU we usually want to (and must) be able to access guest memory
from user space. With the current design we wouldn't even be able to
temporarily mmap it, which only makes sense for encrypted memory. The
corner case really is encrypted memory, so I don't think we'll see
broad use of this feature in QEMU outside of encrypted VMs. I might be
wrong, most probably I am :)
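The mmap restriction discussed here follows from the two rules in the quoted patch description: the seal can only be added to an empty memfd, and once it is set userspace can no longer read, write or mmap the file. As a hypothetical sketch, those rules reduce to a small state check; the model_* names and the -EBUSY/-EPERM return values are assumptions, not what the patch actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the F_SEAL_GUEST rules from the patch description.
 * Not kernel code; the errno values are assumptions.
 */
enum model_access { MODEL_READ, MODEL_WRITE, MODEL_MMAP };

struct model_memfd {
	size_t size;       /* current file size in bytes */
	bool guest_sealed; /* stands in for F_SEAL_GUEST being set */
};

/* The seal may only be added while the memfd is still empty. */
int model_add_guest_seal(struct model_memfd *f)
{
	assert(f != NULL);
	if (f->size != 0)
		return -EBUSY;
	f->guest_sealed = true;
	return 0;
}

/* Once sealed, every userspace read, write or mmap is refused. */
int model_user_access(const struct model_memfd *f, enum model_access op)
{
	assert(f != NULL);
	(void)op; /* all three access types are treated alike */
	return f->guest_sealed ? -EPERM : 0;
}
```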

> 
> As designed, the above looks useful for importing a memfd into a VFIO
> container, but could you consider some more generic naming than
> calling this 'guest'?

+1, the guest terminology is somewhat sub-optimal.

> 
> Along the same lines, to support fast migration, we'd want to be able
> to send these things to the RDMA subsystem as well so we can do data
> xfer. Very similar to VFIO.
> 
> Also, shouldn't this be two patches? F_SEAL is not really related to
> these accessors, is it?


Apart from the special "encrypted memory" semantics, I assume nothing
speaks against allowing these memfds to be mmapped, for example, for
any other VFIO use cases.
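Without the encrypted-memory seal, such a memfd is an ordinary shmem file, and the allocate/free lifecycle from the quoted patch description can already be driven from userspace with fallocate(2). The sketch below does that on a plain memfd; it does not set F_SEAL_GUEST (which exists only in this RFC) and so shows nothing of the KVM callbacks. The guest_mem_* helper names and the "guest-mem" memfd name are hypothetical; MFD_ALLOW_SEALING is passed only because seals can be added solely to memfds created with it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* fallocate() */
#include <linux/falloc.h> /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/mman.h>     /* memfd_create(), MFD_* flags */
#include <sys/stat.h>     /* fstat() */
#include <sys/types.h>
#include <unistd.h>       /* close() */

/* Create an empty, sealable memfd to act as a guest memory backend. */
int guest_mem_create(void)
{
	return memfd_create("guest-mem", MFD_CLOEXEC | MFD_ALLOW_SEALING);
}

/* Allocate backing pages for [off, off + len); grows the file if needed. */
int guest_mem_alloc(int fd, off_t off, off_t len)
{
	assert(fd >= 0 && len > 0);
	return fallocate(fd, 0, off, len);
}

/* Free the pages in [off, off + len) by punching a hole; size is kept. */
int guest_mem_free(int fd, off_t off, off_t len)
{
	assert(fd >= 0 && len > 0);
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 off, len);
}

/* Report file size and allocated 512-byte blocks; 0 on success. */
int guest_mem_usage(int fd, off_t *size, long long *blocks)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	*size = st.st_size;
	*blocks = (long long)st.st_blocks;
	return 0;
}
```

On a Linux memfd, punching a hole over the whole range should leave st_size unchanged while st_blocks drops, which is the "free memory from the guest" half of the lifecycle.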

-- 
Thanks,

David / dhildenb


Thread overview:
2021-11-19 13:47 [RFC v2 PATCH 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST Chao Peng
2021-11-19 13:51   ` David Hildenbrand
2021-11-22 13:59     ` Kirill A. Shutemov
2021-11-19 15:19   ` Jason Gunthorpe
2021-11-19 15:39     ` David Hildenbrand [this message]
2021-11-19 16:00       ` Jason Gunthorpe
2021-11-22  9:26         ` David Hildenbrand
2021-11-22 13:31           ` Jason Gunthorpe
2021-11-22 13:35             ` David Hildenbrand
2021-11-22 14:01               ` Jason Gunthorpe
2021-11-22 14:57                 ` David Hildenbrand
2021-11-22 15:09                   ` Jason Gunthorpe
2021-11-22 15:15                     ` David Hildenbrand
2021-11-19 19:18       ` Sean Christopherson
2021-11-19 19:47         ` Jason Gunthorpe
2021-11-19 22:21           ` Sean Christopherson
2021-11-19 23:33             ` Jason Gunthorpe
2021-11-20  1:23               ` Sean Christopherson
2021-11-21  0:05                 ` Jason Gunthorpe
2021-11-23  9:06       ` Paolo Bonzini
2021-11-23 14:33         ` Chao Peng
2021-11-23 15:20         ` David Hildenbrand
2021-11-23 17:17         ` Jason Gunthorpe
2021-11-23  8:54   ` Paolo Bonzini
2021-12-03  1:11   ` Andy Lutomirski
2021-11-19 13:47 ` [RFC v2 PATCH 02/13] KVM: Add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 03/13] KVM: Extend kvm_userspace_memory_region to support fd based memslot Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 04/13] KVM: Add fd-based memslot data structure and utils Chao Peng
2021-11-23  8:41   ` Paolo Bonzini
2021-11-23 14:30     ` Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 05/13] KVM: Implement fd-based memory using new memfd interfaces Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 06/13] KVM: Register/unregister memfd backed memslot Chao Peng
2021-11-25 16:55   ` Steven Price
2021-11-19 13:47 ` [RFC v2 PATCH 07/13] KVM: Handle page fault for fd based memslot Chao Peng
2021-11-20  1:55   ` Yao Yuan
2021-11-22  9:18     ` Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 08/13] KVM: Rename hva memory invalidation code to cover fd-based offset Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 09/13] KVM: Introduce kvm_memfd_invalidate_range Chao Peng
2021-11-23  8:46   ` Paolo Bonzini
2021-11-23 14:24     ` Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 10/13] KVM: Match inode for invalidation of fd-based slot Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 11/13] KVM: Add kvm_map_gfn_range Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 12/13] KVM: Introduce kvm_memfd_fallocate_range Chao Peng
2021-11-19 13:47 ` [RFC v2 PATCH 13/13] KVM: Enable memfd based page invalidation/fallocate Chao Peng
2021-11-22 14:16   ` Kirill A. Shutemov
2021-11-23  1:06     ` Chao Peng
2021-11-23  9:09       ` Paolo Bonzini
2021-11-23 15:00         ` Chao Peng
2021-11-23  8:51   ` Paolo Bonzini
2021-12-03  1:08 ` [RFC v2 PATCH 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Andy Lutomirski
