All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michael Roth <michael.roth@amd.com>,
	Chao Peng <chao.p.peng@linux.intel.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com, david@redhat.com, aarcange@redhat.com,
	ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret <qperret@google.com>,
	tabba@google.com, mhocko@suse.com,
	Muchun Song <songmuchun@bytedance.com>,
	wei.w.wang@intel.com
Subject: Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory
Date: Mon, 14 Nov 2022 18:28:43 +0300	[thread overview]
Message-ID: <20221114152843.ylxe4dis254vrj5u@box.shutemov.name> (raw)
In-Reply-To: <20a11042-2cfb-8f42-9d80-6672e155ca2c@suse.cz>

On Mon, Nov 14, 2022 at 03:02:37PM +0100, Vlastimil Babka wrote:
> On 11/1/22 16:19, Michael Roth wrote:
> > On Tue, Nov 01, 2022 at 07:37:29PM +0800, Chao Peng wrote:
> >> > 
> >> >   1) restoring kernel directmap:
> >> > 
> >> >      Currently SNP (and I believe TDX) need to either split or remove kernel
> >> >      direct mappings for restricted PFNs, since there is no guarantee that
> >> >      other PFNs within a 2MB range won't be used for non-restricted
> >> >      (which will cause an RMP #PF in the case of SNP since the 2MB
> >> >      mapping overlaps with guest-owned pages)
> >> 
> >> Has the splitting and restoring been a well-discussed direction? I'm
> >> just curious whether there is other options to solve this issue.
> > 
> > For SNP it's been discussed for quite some time, and either splitting or
> > removing private entries from directmap are the well-discussed way I'm
> > aware of to avoid RMP violations due to some other kernel process using
> > a 2MB mapping to access shared memory if there are private pages that
> > happen to be within that range.
> > 
> > In both cases the issue of how to restore directmap as 2M becomes a
> > problem.
> > 
> > I was also under the impression TDX had similar requirements. If so,
> > do you know what the plan is for handling this for TDX?
> > 
> > There are also 2 potential alternatives I'm aware of, but these haven't
> > been discussed in much detail AFAIK:
> > 
> > a) Ensure confidential guests are backed by 2MB pages. shmem has a way to
> >    request 2MB THP pages, but I'm not sure how reliably we can guarantee
> >    that enough THPs are available, so if we went that route we'd probably
> >    be better off requiring the use of hugetlbfs as the backing store. But
> >    obviously that's a bit limiting and it would be nice to have the option
> >    of using normal pages as well. One nice thing with invalidation
> >    scheme proposed here is that this would "Just Work" if implement
> >    hugetlbfs support, so an admin that doesn't want any directmap
> >    splitting has this option available, otherwise it's done as a
> >    best-effort.
> > 
> > b) Implement general support for restoring directmap as 2M even when
> >    subpages might be in use by other kernel threads. This would be the
> >    most flexible approach since it requires no special handling during
> >    invalidations, but I think it's only possible if all the CPA
> >    attributes for the 2M range are the same at the time the mapping is
> >    restored/unsplit, so some potential locking issues there and still
> >    chance for splitting directmap over time.
> 
> I've been hoping that
> 
> c) using a mechanism such as [1] [2] where the goal is to group together
> these small allocations that need to increase directmap granularity so
> maximum number of large mappings are preserved.

As I mentioned in the other thread the restricted memfd can be backed by
secretmem instead of plain memfd. It already handles directmap with care.

But I don't think it has to be part of initial restricted memfd
implementation. It is SEV-specific requirement and AMD folks can extend
implementation as needed later.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

  reply	other threads:[~2022-11-14 15:29 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25 15:13 [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Chao Peng
2022-10-25 15:13 ` [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory Chao Peng
2022-10-26 17:31   ` Isaku Yamahata
2022-10-28  6:12     ` Chao Peng
2022-10-27 10:20   ` Fuad Tabba
2022-10-31 17:47   ` Michael Roth
2022-11-01 11:37     ` Chao Peng
2022-11-01 15:19       ` Michael Roth
2022-11-01 19:30         ` Michael Roth
2022-11-02 14:53           ` Chao Peng
2022-11-02 21:19             ` Michael Roth
2022-11-14 14:02         ` Vlastimil Babka
2022-11-14 15:28           ` Kirill A. Shutemov [this message]
2022-11-14 22:16             ` Michael Roth
2022-11-15  9:48               ` Chao Peng
2022-11-14 22:16           ` Michael Roth
2022-11-02 21:14     ` Kirill A. Shutemov
2022-11-02 21:26       ` Michael Roth
2022-11-02 22:07       ` Michael Roth
2022-11-03 16:30         ` Kirill A. Shutemov
2022-11-29  0:06   ` Michael Roth
2022-11-29 11:21     ` Kirill A. Shutemov
2022-11-29 11:39       ` David Hildenbrand
2022-11-29 13:59         ` Chao Peng
2022-11-29 13:58       ` Chao Peng
2022-11-29  0:37   ` Michael Roth
2022-11-29 14:06     ` Chao Peng
2022-11-29 19:06       ` Michael Roth
2022-11-29 19:18         ` Michael Roth
2022-11-30  9:39           ` Chao Peng
2022-11-30 14:31             ` Michael Roth
2022-11-29 18:01     ` Vishal Annapurve
2022-12-02  2:16   ` Vishal Annapurve
2022-12-02  6:49     ` Chao Peng
2022-12-02 13:44       ` Kirill A . Shutemov
2022-10-25 15:13 ` [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-10-27 10:25   ` Fuad Tabba
2022-10-28  7:04   ` Xiaoyao Li
2022-10-31 14:14     ` Chao Peng
2022-11-14 16:04   ` Alex Bennée
2022-11-15  9:29     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-10-25 15:26   ` Peter Maydell
2022-10-25 16:17     ` Sean Christopherson
2022-10-27 10:27   ` Fuad Tabba
2022-10-28  6:14     ` Chao Peng
2022-11-15 16:56   ` Alex Bennée
2022-11-16  3:14     ` Chao Peng
2022-11-16 19:03       ` Alex Bennée
2022-11-17 13:45         ` Chao Peng
2022-11-17 15:08           ` Alex Bennée
2022-11-18  1:32             ` Chao Peng
2022-11-18 13:23               ` Alex Bennée
2022-11-18 15:59                 ` Sean Christopherson
2022-11-22  9:50                   ` Chao Peng
2022-11-23 18:02                     ` Sean Christopherson
2022-11-16 18:15   ` Andy Lutomirski
2022-11-16 18:48     ` Sean Christopherson
2022-11-17 13:42       ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Chao Peng
2022-10-27 10:29   ` Fuad Tabba
2022-11-04  2:28     ` Chao Peng
2022-11-04 22:29       ` Sean Christopherson
2022-11-08  7:16         ` Chao Peng
2022-11-10 17:53           ` Sean Christopherson
2022-11-10 20:06   ` Sean Christopherson
2022-11-11  8:27     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions Chao Peng
2022-10-27 10:31   ` Fuad Tabba
2022-11-03 23:04   ` Sean Christopherson
2022-11-04  8:28     ` Chao Peng
2022-11-04 21:19       ` Sean Christopherson
2022-11-08  8:24         ` Chao Peng
2022-11-08  1:35   ` Yuan Yao
2022-11-08  9:41     ` Chao Peng
2022-11-09  5:52       ` Yuan Yao
2022-11-16 22:24   ` Sean Christopherson
2022-11-17 13:20     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed Chao Peng
2022-10-26 20:46   ` Isaku Yamahata
2022-10-28  6:38     ` Chao Peng
2022-11-08 12:08   ` Yuan Yao
2022-11-09  4:13     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 7/8] KVM: Handle page fault for private memory Chao Peng
2022-10-26 21:54   ` Isaku Yamahata
2022-10-28  6:55     ` Chao Peng
2022-11-01  0:02       ` Isaku Yamahata
2022-11-01 11:38         ` Chao Peng
2022-11-16 20:50   ` Ackerley Tng
2022-11-16 22:13     ` Sean Christopherson
2022-11-17 13:25       ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 8/8] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-10-27 10:31   ` Fuad Tabba
2022-11-03 12:13 ` [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Vishal Annapurve
2022-11-08  0:41   ` Isaku Yamahata
2022-11-09 15:54     ` Kirill A. Shutemov
2022-11-15 14:36       ` Kirill A. Shutemov
2022-11-14 11:43 ` Alex Bennée
2022-11-16  5:00   ` Chao Peng
2022-11-16  9:40     ` Alex Bennée
2022-11-17 14:16       ` Chao Peng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221114152843.ylxe4dis254vrj5u@box.shutemov.name \
    --to=kirill@shutemov.name \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=mhocko@suse.com \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qperret@google.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=songmuchun@bytedance.com \
    --cc=steven.price@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=wei.w.wang@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.