From: David Hildenbrand <david@redhat.com>
To: Chao Peng <chao.p.peng@linux.intel.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, john.ji@intel.com, susie.li@intel.com,
	jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com
Subject: Re: [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST
Date: Fri, 19 Nov 2021 14:51:11 +0100
Message-ID: <942e0dd6-e426-06f6-7b6c-0e80d23c27e6@redhat.com>
In-Reply-To: <20211119134739.20218-2-chao.p.peng@linux.intel.com>

On 19.11.21 14:47, Chao Peng wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> The new seal type provides semantics required for KVM guest private
> memory support. A file descriptor with the seal set is going to be used
> as source of guest memory in confidential computing environments such as
> Intel TDX and AMD SEV.
> 
> F_SEAL_GUEST can only be set on empty memfd. After the seal is set
> userspace cannot read, write or mmap the memfd.
> 
> Userspace is in charge of guest memory lifecycle: it can allocate the
> memory with falloc or punch hole to free memory from the guest.
> 
> The file descriptor passed down to KVM as guest memory backend. KVM
> register itself as the owner of the memfd via memfd_register_guest().
> 
> KVM provides callback that needed to be called on fallocate and punch
> hole.
> 
> memfd_register_guest() returns callbacks that need be used for
> requesting a new page from memfd.
> 

Repeating the feedback I already shared in a private mail thread:


As long as page migration / swapping is not supported, these pages
behave like any longterm pinned pages (e.g., VFIO) or secretmem pages.

1. These pages are not MOVABLE. They must not end up on ZONE_MOVABLE or
MIGRATE_CMA.

That should be easy to handle: you have to adjust the gfp_mask via
	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
just as mm/secretmem.c:secretmem_file_create() does.
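
To illustrate why dropping the movable bit keeps these pages out of
ZONE_MOVABLE, here is a small userspace model. It is only a sketch: the
SKETCH_* names and bit values are made up for illustration and are not
the kernel's definitions, and may_use_zone_movable() is a hypothetical
helper that mimics the rule that an allocation is only eligible for
ZONE_MOVABLE when both the highmem and the movable bits are requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
enum {
	SKETCH_GFP_HIGHMEM = 1u << 0,
	SKETCH_GFP_MOVABLE = 1u << 1,
	SKETCH_GFP_USER    = 1u << 2,
};

/* Models GFP_HIGHUSER: user allocation, highmem allowed, not movable. */
#define SKETCH_GFP_HIGHUSER         (SKETCH_GFP_USER | SKETCH_GFP_HIGHMEM)
/* Models GFP_HIGHUSER_MOVABLE: the same plus the movable bit. */
#define SKETCH_GFP_HIGHUSER_MOVABLE (SKETCH_GFP_HIGHUSER | SKETCH_GFP_MOVABLE)

/* The non-movable mask must never carry the movable bit. */
static_assert(!(SKETCH_GFP_HIGHUSER & SKETCH_GFP_MOVABLE),
	      "HIGHUSER must not include MOVABLE");

/*
 * Hypothetical helper: an allocation may be served from the movable
 * zone only if both the highmem and the movable bits are set.
 */
static bool may_use_zone_movable(unsigned int gfp)
{
	unsigned int need = SKETCH_GFP_HIGHMEM | SKETCH_GFP_MOVABLE;

	return (gfp & need) == need;
}
```

In this model, a mapping whose mask is SKETCH_GFP_HIGHUSER_MOVABLE is
eligible for the movable zone, while one switched to SKETCH_GFP_HIGHUSER
is not.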

2. These pages behave like mlocked pages and should be accounted as such.

This is probably where the accounting "fun" starts, but maybe it's
easier than I think to handle.

See mm/secretmem.c:secretmem_mmap(), where we account the pages as
VM_LOCKED and will consequently check per-process mlock limits. As we
don't mmap(), the same approach cannot be reused.

See drivers/vfio/vfio_iommu_type1.c:vfio_pin_map_dma() and
vfio_pin_pages_remote() for how to manually account via mm->locked_vm.
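
As a rough picture of what manual accounting against the mlock limit
looks like, here is a userspace model loosely inspired by the VFIO
pattern. It is a sketch under assumptions, not kernel code: struct
mm_model, account_locked_pages() and unaccount_locked_pages() are
hypothetical names, the limit is passed in directly in pages, and
can_ipc_lock stands in for a capability check that lets the caller
exceed the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the locked_vm counter of an mm. */
struct mm_model {
	unsigned long locked_vm;	/* pages currently accounted as locked */
};

/*
 * Charge npages against the limit (in pages). Returns 0 on success or
 * -ENOMEM if the charge would exceed the limit and the caller has no
 * capability to bypass it. The counter is unchanged on failure.
 */
static int account_locked_pages(struct mm_model *mm, unsigned long npages,
				unsigned long limit_pages, bool can_ipc_lock)
{
	unsigned long total = mm->locked_vm + npages;

	if (total < mm->locked_vm)	/* unsigned overflow */
		return -ENOMEM;
	if (!can_ipc_lock && total > limit_pages)
		return -ENOMEM;

	mm->locked_vm = total;
	return 0;
}

/* Undo an earlier charge; uncharging more than was charged is a bug. */
static void unaccount_locked_pages(struct mm_model *mm, unsigned long npages)
{
	assert(npages <= mm->locked_vm);
	mm->locked_vm -= npages;
}
```

For the memfd case one could imagine charging when pages are allocated
via fallocate and uncharging on punch hole. That placement is an
assumption for illustration; which mm to charge is exactly the open
question, since nothing is mapped into its page tables.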

But it's a bit hairy because these pages are not actually mapped into
the page tables of the MM, so it might need some thought. Similarly,
these pages actually behave like "pinned" (as in mm->pinned_vm), but we
just don't increase the refcount AFAIR. Again, accounting really is a
bit hairy ...



-- 
Thanks,

David / dhildenb

