From: Chao Peng <chao.p.peng@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
"J . Bruce Fields" <bfields@fieldses.org>,
Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>,
Steven Price <steven.price@arm.com>,
"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
Vlastimil Babka <vbabka@suse.cz>,
Vishal Annapurve <vannapurve@google.com>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
ak@linux.intel.com, david@redhat.com, aarcange@redhat.com,
ddutile@redhat.com, dhildenb@redhat.com,
Quentin Perret <qperret@google.com>,
tabba@google.com, Michael Roth <michael.roth@amd.com>,
mhocko@suse.com, Muchun Song <songmuchun@bytedance.com>,
wei.w.wang@intel.com
Subject: Re: [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions
Date: Tue, 8 Nov 2022 16:24:10 +0800 [thread overview]
Message-ID: <20221108082410.GA84375@chaop.bj.intel.com> (raw)
In-Reply-To: <Y2WB48kD0J4VGynX@google.com>
On Fri, Nov 04, 2022 at 09:19:31PM +0000, Sean Christopherson wrote:
> Paolo, any thoughts before I lead things further astray?
>
> On Fri, Nov 04, 2022, Chao Peng wrote:
> > On Thu, Nov 03, 2022 at 11:04:53PM +0000, Sean Christopherson wrote:
> > > On Tue, Oct 25, 2022, Chao Peng wrote:
> > > > @@ -4708,6 +4802,24 @@ static long kvm_vm_ioctl(struct file *filp,
> > > > r = kvm_vm_ioctl_set_memory_region(kvm, &mem);
> > > > break;
> > > > }
> > > > +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> > > > + case KVM_MEMORY_ENCRYPT_REG_REGION:
> > > > + case KVM_MEMORY_ENCRYPT_UNREG_REGION: {
> > >
> > > I'm having second thoughts about usurping KVM_MEMORY_ENCRYPT_(UN)REG_REGION. Aside
> > > from the fact that restricted/protected memory may not be encrypted, there are
> > > other potential use cases for per-page memory attributes[*], e.g. to make memory
> > > read-only (or no-exec, or exec-only, etc...) without having to modify memslots.
> > >
> > > Any paravirt use case where the attributes of a page are effectively dictated by
> > > the guest is going to run into the exact same performance problems with memslots,
> > > which isn't suprising in hindsight since shared vs. private is really just an
> > > attribute, albeit with extra special semantics.
> > >
> > > And if we go with a brand new ioctl(), maybe someday in the very distant future
> > > we can deprecate and delete KVM_MEMORY_ENCRYPT_(UN)REG_REGION.
> > >
> > > Switching to a new ioctl() should be a minor change, i.e. shouldn't throw too big
> > > of a wrench into things.
> > >
> > > Something like:
> > >
> > > KVM_SET_MEMORY_ATTRIBUTES
> > >
> > > struct kvm_memory_attributes {
> > > __u64 address;
> > > __u64 size;
> > > __u64 flags;
>
> Oh, this is half-baked. I lost track of which flags were which. What I intended
> was a separate, initially-unused flags, e.g.
That makes sense.
>
> struct kvm_memory_attributes {
> __u64 address;
> __u64 size;
> __u64 attributes;
> __u64 flags;
> }
>
> so that KVM can tweak behavior and/or extend the effective size of the struct.
>
> > I like the idea of adding a new ioctl(). But putting all attributes into
> > a flags in uAPI sounds not good to me, e.g. forcing userspace to set all
> > attributes in one call can cause pain for userspace, probably for KVM
> > implementation as well. For private<->shared memory conversion, we
> > actually only care the KVM_MEM_ATTR_SHARED or KVM_MEM_ATTR_PRIVATE bit,
>
> Not necessarily, e.g. I can see pKVM wanting to convert from RW+PRIVATE => RO+SHARED
> or even RW+PRIVATE => NONE+SHARED so that the guest can't write/access the memory
> while it's accessible from the host.
>
> And if this does extend beyond shared/private, dropping from RWX=>R, i.e. dropping
> WX permissions, would also be a common operation.
>
> Hmm, typing that out makes me think that if we do end up supporting other "attributes",
> i.e. protections, we should go straight to full RWX protections instead of doing
> things piecemeal, i.e. add individual protections instead of combinations like
> NO_EXEC and READ_ONLY. The protections would have to be inverted for backwards
> compatibility, but that's easy enough to handle. The semantics could be like
> protection keys, which also have inverted persmissions, where the final protections
> are the combination of memslot+attributes, i.e. a read-only memslot couldn't be made
> writable via attributes.
>
> E.g. userspace could do "NO_READ | NO_WRITE | NO_EXEC" to temporarily block access
> to memory without needing to delete the memslot. KVM would need to disallow
> unsupported combinations, e.g. disallowed effective protections would be:
>
> - W or WX [unless there's an arch that supports write-only memory]
> - R or RW [until KVM plumbs through support for no-exec, or it's unsupported in hardware]
> - X [until KVM plumbs through support for exec-only, or it's unsupported in hardware]
>
> Anyways, that's all future work...
>
> > but we force userspace to set other irrelevant bits as well if use this
> > API.
>
> They aren't irrelevant though, as the memory attributes are all describing the
> allowed protections for a given page.
The 'allowed' protections seems answer my concern. But after we enabled
"NO_READ | NO_WRITE | NO_EXEC", are we going to check "NO_READ |
NO_WRITE | NO_EXEC" are also set together with the PRIVATE bit? I just
can't imagine what the semantic would be if we have the PRIVATE bit set
but other bits indicate it's actually can READ/WRITE/EXEC from usrspace.
> If there's a use case where userspace "can't"
> keep track of the attributes for whatever reason, then userspace could do a RMW
> to set/clear attributes. Alternatively, the ioctl() could take an "operation" and
> support WRITE/OR/AND to allow setting/clearing individual flags, e.g. tweak the
> above to be:
A getter would be good, it might also be needed for live migration.
>
> struct kvm_memory_attributes {
> __u64 address;
> __u64 size;
> __u64 attributes;
> __u32 operation;
> __u32 flags;
> }
>
> > I looked at kvm_device_attr, sounds we can do similar:
>
> The device attributes deal with isolated, arbitrary values, whereas memory attributes
> are flags, i.e. devices are 1:1 whereas memory is 1:MANY. There is no "unset" for
> device attributes, because they aren't flags. Device attributes vs. memory attributes
> really are two very different things that just happen to use a common name.
>
> If it helped clarify things without creating naming problems, we could even use
> PROTECTIONS instead of ATTRIBUTES.
>
> > KVM_SET_MEMORY_ATTR
> >
> > struct kvm_memory_attr {
> > __u64 address;
> > __u64 size;
> > #define KVM_MEM_ATTR_SHARED BIT(0)
> > #define KVM_MEM_ATTR_READONLY BIT(1)
> > #define KVM_MEM_ATTR_NOEXEC BIT(2)
> > __u32 attr;
>
> As above, letting userspace set only a single attribute would prevent setting
> (or clearing) multiple attributes in a single ioctl().
>
> > __u32 pad;
> > }
> >
> > I'm not sure if we need KVM_GET_MEMORY_ATTR/KVM_HAS_MEMORY_ATTR as well,
>
> Definitely would need to communicate to userspace that various attributes are
> supported. That doesn't necessarily require a common ioctl(), but I don't see
> any reason not to add a common helper, and adding a common helper would mean
> KVM_CAP_PRIVATE_MEM can go away. But it should return a bitmask so that userspace
> can do a single query to get all supported attributes, e.g. KVM_SUPPORTED_MEMORY_ATTRIBUTES.
Do you have preference on using a new ioctl or just keep it as a cap?
E.g. KVM_CAP_MEMORY_ATTIBUTES can also returns a mask.
>
> As for KVM_GET_MEMORY_ATTRIBUTES, we wouldn't necessarily have to provide such an
> API, e.g. we could hold off until someone came along with a RMW use case (as above).
> That said, debug would likely be a nightmare without KVM_GET_MEMORY_ATTRIBUTES,
> so it's probably best to add it straightway.
Dive into the implementation a bit, for KVM_GET_MEMORY_ATTRIBUTES we can
have different attributes for different pages in the same user-provided
range, in that case we will have to either return a list or just a error
number. Or we only support per-page attributes for the getter?
Chao
>
> > but sounds like we need a KVM_UNSET_MEMORY_ATTR.
>
> No need if the setter operates on all attributes.
>
> > Since we are exposing the attribute directly to userspace I also think
> > we'd better treat shared memory as the default, so even when the private
> > memory is not used, the bit can still be meaningful. So define BIT(0) as
> > KVM_MEM_ATTR_PRIVATE instead of KVM_MEM_ATTR_SHARED.
>
> Ah, right.
next prev parent reply other threads:[~2022-11-08 8:28 UTC|newest]
Thread overview: 101+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-25 15:13 [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Chao Peng
2022-10-25 15:13 ` [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory Chao Peng
2022-10-26 17:31 ` Isaku Yamahata
2022-10-28 6:12 ` Chao Peng
2022-10-27 10:20 ` Fuad Tabba
2022-10-31 17:47 ` Michael Roth
2022-11-01 11:37 ` Chao Peng
2022-11-01 15:19 ` Michael Roth
2022-11-01 19:30 ` Michael Roth
2022-11-02 14:53 ` Chao Peng
2022-11-02 21:19 ` Michael Roth
2022-11-14 14:02 ` Vlastimil Babka
2022-11-14 15:28 ` Kirill A. Shutemov
2022-11-14 22:16 ` Michael Roth
2022-11-15 9:48 ` Chao Peng
2022-11-14 22:16 ` Michael Roth
2022-11-02 21:14 ` Kirill A. Shutemov
2022-11-02 21:26 ` Michael Roth
2022-11-02 22:07 ` Michael Roth
2022-11-03 16:30 ` Kirill A. Shutemov
2022-11-29 0:06 ` Michael Roth
2022-11-29 11:21 ` Kirill A. Shutemov
2022-11-29 11:39 ` David Hildenbrand
2022-11-29 13:59 ` Chao Peng
2022-11-29 13:58 ` Chao Peng
2022-11-29 0:37 ` Michael Roth
2022-11-29 14:06 ` Chao Peng
2022-11-29 19:06 ` Michael Roth
2022-11-29 19:18 ` Michael Roth
2022-11-30 9:39 ` Chao Peng
2022-11-30 14:31 ` Michael Roth
2022-11-29 18:01 ` Vishal Annapurve
2022-12-02 2:16 ` Vishal Annapurve
2022-12-02 6:49 ` Chao Peng
2022-12-02 13:44 ` Kirill A . Shutemov
2022-10-25 15:13 ` [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-10-27 10:25 ` Fuad Tabba
2022-10-28 7:04 ` Xiaoyao Li
2022-10-31 14:14 ` Chao Peng
2022-11-14 16:04 ` Alex Bennée
2022-11-15 9:29 ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-10-25 15:26 ` Peter Maydell
2022-10-25 16:17 ` Sean Christopherson
2022-10-27 10:27 ` Fuad Tabba
2022-10-28 6:14 ` Chao Peng
2022-11-15 16:56 ` Alex Bennée
2022-11-16 3:14 ` Chao Peng
2022-11-16 19:03 ` Alex Bennée
2022-11-17 13:45 ` Chao Peng
2022-11-17 15:08 ` Alex Bennée
2022-11-18 1:32 ` Chao Peng
2022-11-18 13:23 ` Alex Bennée
2022-11-18 15:59 ` Sean Christopherson
2022-11-22 9:50 ` Chao Peng
2022-11-23 18:02 ` Sean Christopherson
2022-11-16 18:15 ` Andy Lutomirski
2022-11-16 18:48 ` Sean Christopherson
2022-11-17 13:42 ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Chao Peng
2022-10-27 10:29 ` Fuad Tabba
2022-11-04 2:28 ` Chao Peng
2022-11-04 22:29 ` Sean Christopherson
2022-11-08 7:16 ` Chao Peng
2022-11-10 17:53 ` Sean Christopherson
2022-11-10 20:06 ` Sean Christopherson
2022-11-11 8:27 ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions Chao Peng
2022-10-27 10:31 ` Fuad Tabba
2022-11-03 23:04 ` Sean Christopherson
2022-11-04 8:28 ` Chao Peng
2022-11-04 21:19 ` Sean Christopherson
2022-11-08 8:24 ` Chao Peng [this message]
2022-11-08 1:35 ` Yuan Yao
2022-11-08 9:41 ` Chao Peng
2022-11-09 5:52 ` Yuan Yao
2022-11-16 22:24 ` Sean Christopherson
2022-11-17 13:20 ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed Chao Peng
2022-10-26 20:46 ` Isaku Yamahata
2022-10-28 6:38 ` Chao Peng
2022-11-08 12:08 ` Yuan Yao
2022-11-09 4:13 ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 7/8] KVM: Handle page fault for private memory Chao Peng
2022-10-26 21:54 ` Isaku Yamahata
2022-10-28 6:55 ` Chao Peng
2022-11-01 0:02 ` Isaku Yamahata
2022-11-01 11:38 ` Chao Peng
2022-11-16 20:50 ` Ackerley Tng
2022-11-16 22:13 ` Sean Christopherson
2022-11-17 13:25 ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 8/8] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-10-27 10:31 ` Fuad Tabba
2022-11-03 12:13 ` [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Vishal Annapurve
2022-11-08 0:41 ` Isaku Yamahata
2022-11-09 15:54 ` Kirill A. Shutemov
2022-11-15 14:36 ` Kirill A. Shutemov
2022-11-14 11:43 ` Alex Bennée
2022-11-16 5:00 ` Chao Peng
2022-11-16 9:40 ` Alex Bennée
2022-11-17 14:16 ` Chao Peng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221108082410.GA84375@chaop.bj.intel.com \
--to=chao.p.peng@linux.intel.com \
--cc=aarcange@redhat.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bfields@fieldses.org \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=ddutile@redhat.com \
--cc=dhildenb@redhat.com \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jlayton@kernel.org \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=jun.nakajima@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mail@maciej.szmigiero.name \
--cc=mhocko@suse.com \
--cc=michael.roth@amd.com \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qperret@google.com \
--cc=rppt@kernel.org \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
--cc=songmuchun@bytedance.com \
--cc=steven.price@arm.com \
--cc=tabba@google.com \
--cc=tglx@linutronix.de \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=wei.w.wang@intel.com \
--cc=x86@kernel.org \
--cc=yu.c.zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).