linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com, david@redhat.com, aarcange@redhat.com,
	ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret <qperret@google.com>,
	tabba@google.com, Michael Roth <michael.roth@amd.com>,
	mhocko@suse.com, Muchun Song <songmuchun@bytedance.com>,
	wei.w.wang@intel.com
Subject: Re: [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions
Date: Fri, 4 Nov 2022 21:19:31 +0000	[thread overview]
Message-ID: <Y2WB48kD0J4VGynX@google.com> (raw)
In-Reply-To: <20221104082843.GA4142342@chaop.bj.intel.com>

Paolo, any thoughts before I lead things further astray?

On Fri, Nov 04, 2022, Chao Peng wrote:
> On Thu, Nov 03, 2022 at 11:04:53PM +0000, Sean Christopherson wrote:
> > On Tue, Oct 25, 2022, Chao Peng wrote:
> > > @@ -4708,6 +4802,24 @@ static long kvm_vm_ioctl(struct file *filp,
> > >  		r = kvm_vm_ioctl_set_memory_region(kvm, &mem);
> > >  		break;
> > >  	}
> > > +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> > > +	case KVM_MEMORY_ENCRYPT_REG_REGION:
> > > +	case KVM_MEMORY_ENCRYPT_UNREG_REGION: {
> > 
> > I'm having second thoughts about usurping KVM_MEMORY_ENCRYPT_(UN)REG_REGION.  Aside
> > from the fact that restricted/protected memory may not be encrypted, there are
> > other potential use cases for per-page memory attributes[*], e.g. to make memory
> > read-only (or no-exec, or exec-only, etc...) without having to modify memslots.
> > 
> > Any paravirt use case where the attributes of a page are effectively dictated by
> > the guest is going to run into the exact same performance problems with memslots,
> > which isn't suprising in hindsight since shared vs. private is really just an
> > attribute, albeit with extra special semantics.
> > 
> > And if we go with a brand new ioctl(), maybe someday in the very distant future
> > we can deprecate and delete KVM_MEMORY_ENCRYPT_(UN)REG_REGION.
> > 
> > Switching to a new ioctl() should be a minor change, i.e. shouldn't throw too big
> > of a wrench into things.
> > 
> > Something like:
> > 
> >   KVM_SET_MEMORY_ATTRIBUTES
> > 
> >   struct kvm_memory_attributes {
> > 	__u64 address;
> > 	__u64 size;
> > 	__u64 flags;

Oh, this is half-baked.  I lost track of which flags were which.  What I intended
was a separate, initially-unused flags, e.g.

 struct kvm_memory_attributes {
	__u64 address;
	__u64 size;
	__u64 attributes;
	__u64 flags;
  }

so that KVM can tweak behavior and/or extend the effective size of the struct.

> I like the idea of adding a new ioctl(). But putting all attributes into
> a flags in uAPI sounds not good to me, e.g. forcing userspace to set all
> attributes in one call can cause pain for userspace, probably for KVM
> implementation as well. For private<->shared memory conversion, we
> actually only care the KVM_MEM_ATTR_SHARED or KVM_MEM_ATTR_PRIVATE bit,

Not necessarily, e.g. I can see pKVM wanting to convert from RW+PRIVATE => RO+SHARED
or even RW+PRIVATE => NONE+SHARED so that the guest can't write/access the memory
while it's accessible from the host.

And if this does extend beyond shared/private, dropping from RWX=>R, i.e. dropping
WX permissions, would also be a common operation.

Hmm, typing that out makes me think that if we do end up supporting other "attributes",
i.e. protections, we should go straight to full RWX protections instead of doing
things piecemeal, i.e. add individual protections instead of combinations like
NO_EXEC and READ_ONLY.  The protections would have to be inverted for backwards
compatibility, but that's easy enough to handle.  The semantics could be like
protection keys, which also have inverted persmissions, where the final protections
are the combination of memslot+attributes, i.e. a read-only memslot couldn't be made
writable via attributes.

E.g. userspace could do "NO_READ | NO_WRITE | NO_EXEC" to temporarily block access
to memory without needing to delete the memslot.  KVM would need to disallow
unsupported combinations, e.g. disallowed effective protections would be:

  - W or WX [unless there's an arch that supports write-only memory]
  - R or RW [until KVM plumbs through support for no-exec, or it's unsupported in hardware]
  - X       [until KVM plumbs through support for exec-only, or it's unsupported in hardware]

Anyways, that's all future work...

> but we force userspace to set other irrelevant bits as well if use this
> API.

They aren't irrelevant though, as the memory attributes are all describing the
allowed protections for a given page.  If there's a use case where userspace "can't"
keep track of the attributes for whatever reason, then userspace could do a RMW
to set/clear attributes.  Alternatively, the ioctl() could take an "operation" and
support WRITE/OR/AND to allow setting/clearing individual flags, e.g. tweak the
above to be: 
 
 struct kvm_memory_attributes {
	__u64 address;
	__u64 size;
	__u64 attributes;
	__u32 operation;
	__u32 flags;
  }

> I looked at kvm_device_attr, sounds we can do similar:

The device attributes deal with isolated, arbitrary values, whereas memory attributes
are flags, i.e. devices are 1:1 whereas memory is 1:MANY.  There is no "unset" for
device attributes, because they aren't flags.  Device attributes vs. memory attributes
really are two very different things that just happen to use a common name.

If it helped clarify things without creating naming problems, we could even use
PROTECTIONS instead of ATTRIBUTES.

>   KVM_SET_MEMORY_ATTR
> 
>   struct kvm_memory_attr {
> 	__u64 address;
> 	__u64 size;
> #define KVM_MEM_ATTR_SHARED	BIT(0)
> #define KVM_MEM_ATTR_READONLY	BIT(1)
> #define KVM_MEM_ATTR_NOEXEC	BIT(2)
> 	__u32 attr;

As above, letting userspace set only a single attribute would prevent setting
(or clearing) multiple attributes in a single ioctl().

> 	__u32 pad;
>   }
> 
> I'm not sure if we need KVM_GET_MEMORY_ATTR/KVM_HAS_MEMORY_ATTR as well,

Definitely would need to communicate to userspace that various attributes are
supported.  That doesn't necessarily require a common ioctl(), but I don't see
any reason not to add a common helper, and adding a common helper would mean
KVM_CAP_PRIVATE_MEM can go away.  But it should return a bitmask so that userspace
can do a single query to get all supported attributes, e.g. KVM_SUPPORTED_MEMORY_ATTRIBUTES.  

As for KVM_GET_MEMORY_ATTRIBUTES, we wouldn't necessarily have to provide such an
API, e.g. we could hold off until someone came along with a RMW use case (as above).
That said, debug would likely be a nightmare without KVM_GET_MEMORY_ATTRIBUTES,
so it's probably best to add it straightway.

> but sounds like we need a KVM_UNSET_MEMORY_ATTR.

No need if the setter operates on all attributes.

> Since we are exposing the attribute directly to userspace I also think
> we'd better treat shared memory as the default, so even when the private
> memory is not used, the bit can still be meaningful. So define BIT(0) as
> KVM_MEM_ATTR_PRIVATE instead of KVM_MEM_ATTR_SHARED.

Ah, right.


  reply	other threads:[~2022-11-04 21:19 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25 15:13 [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Chao Peng
2022-10-25 15:13 ` [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory Chao Peng
2022-10-26 17:31   ` Isaku Yamahata
2022-10-28  6:12     ` Chao Peng
2022-10-27 10:20   ` Fuad Tabba
2022-10-31 17:47   ` Michael Roth
2022-11-01 11:37     ` Chao Peng
2022-11-01 15:19       ` Michael Roth
2022-11-01 19:30         ` Michael Roth
2022-11-02 14:53           ` Chao Peng
2022-11-02 21:19             ` Michael Roth
2022-11-14 14:02         ` Vlastimil Babka
2022-11-14 15:28           ` Kirill A. Shutemov
2022-11-14 22:16             ` Michael Roth
2022-11-15  9:48               ` Chao Peng
2022-11-14 22:16           ` Michael Roth
2022-11-02 21:14     ` Kirill A. Shutemov
2022-11-02 21:26       ` Michael Roth
2022-11-02 22:07       ` Michael Roth
2022-11-03 16:30         ` Kirill A. Shutemov
2022-11-29  0:06   ` Michael Roth
2022-11-29 11:21     ` Kirill A. Shutemov
2022-11-29 11:39       ` David Hildenbrand
2022-11-29 13:59         ` Chao Peng
2022-11-29 13:58       ` Chao Peng
2022-11-29  0:37   ` Michael Roth
2022-11-29 14:06     ` Chao Peng
2022-11-29 19:06       ` Michael Roth
2022-11-29 19:18         ` Michael Roth
2022-11-30  9:39           ` Chao Peng
2022-11-30 14:31             ` Michael Roth
2022-11-29 18:01     ` Vishal Annapurve
2022-12-02  2:16   ` Vishal Annapurve
2022-12-02  6:49     ` Chao Peng
2022-12-02 13:44       ` Kirill A . Shutemov
2022-10-25 15:13 ` [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-10-27 10:25   ` Fuad Tabba
2022-10-28  7:04   ` Xiaoyao Li
2022-10-31 14:14     ` Chao Peng
2022-11-14 16:04   ` Alex Bennée
2022-11-15  9:29     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-10-25 15:26   ` Peter Maydell
2022-10-25 16:17     ` Sean Christopherson
2022-10-27 10:27   ` Fuad Tabba
2022-10-28  6:14     ` Chao Peng
2022-11-15 16:56   ` Alex Bennée
2022-11-16  3:14     ` Chao Peng
2022-11-16 19:03       ` Alex Bennée
2022-11-17 13:45         ` Chao Peng
2022-11-17 15:08           ` Alex Bennée
2022-11-18  1:32             ` Chao Peng
2022-11-18 13:23               ` Alex Bennée
2022-11-18 15:59                 ` Sean Christopherson
2022-11-22  9:50                   ` Chao Peng
2022-11-23 18:02                     ` Sean Christopherson
2022-11-16 18:15   ` Andy Lutomirski
2022-11-16 18:48     ` Sean Christopherson
2022-11-17 13:42       ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Chao Peng
2022-10-27 10:29   ` Fuad Tabba
2022-11-04  2:28     ` Chao Peng
2022-11-04 22:29       ` Sean Christopherson
2022-11-08  7:16         ` Chao Peng
2022-11-10 17:53           ` Sean Christopherson
2022-11-10 20:06   ` Sean Christopherson
2022-11-11  8:27     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions Chao Peng
2022-10-27 10:31   ` Fuad Tabba
2022-11-03 23:04   ` Sean Christopherson
2022-11-04  8:28     ` Chao Peng
2022-11-04 21:19       ` Sean Christopherson [this message]
2022-11-08  8:24         ` Chao Peng
2022-11-08  1:35   ` Yuan Yao
2022-11-08  9:41     ` Chao Peng
2022-11-09  5:52       ` Yuan Yao
2022-11-16 22:24   ` Sean Christopherson
2022-11-17 13:20     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed Chao Peng
2022-10-26 20:46   ` Isaku Yamahata
2022-10-28  6:38     ` Chao Peng
2022-11-08 12:08   ` Yuan Yao
2022-11-09  4:13     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 7/8] KVM: Handle page fault for private memory Chao Peng
2022-10-26 21:54   ` Isaku Yamahata
2022-10-28  6:55     ` Chao Peng
2022-11-01  0:02       ` Isaku Yamahata
2022-11-01 11:38         ` Chao Peng
2022-11-16 20:50   ` Ackerley Tng
2022-11-16 22:13     ` Sean Christopherson
2022-11-17 13:25       ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 8/8] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-10-27 10:31   ` Fuad Tabba
2022-11-03 12:13 ` [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Vishal Annapurve
2022-11-08  0:41   ` Isaku Yamahata
2022-11-09 15:54     ` Kirill A. Shutemov
2022-11-15 14:36       ` Kirill A. Shutemov
2022-11-14 11:43 ` Alex Bennée
2022-11-16  5:00   ` Chao Peng
2022-11-16  9:40     ` Alex Bennée
2022-11-17 14:16       ` Chao Peng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2WB48kD0J4VGynX@google.com \
    --to=seanjc@google.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=mhocko@suse.com \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qperret@google.com \
    --cc=rppt@kernel.org \
    --cc=shuah@kernel.org \
    --cc=songmuchun@bytedance.com \
    --cc=steven.price@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=wei.w.wang@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).