All of lore.kernel.org
 help / color / mirror / Atom feed
From: Isaku Yamahata <isaku.yamahata@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: David Matlack <dmatlack@google.com>,
	Kai Huang <kai.huang@intel.com>,
	Isaku Yamahata <isaku.yamahata@linux.intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Isaku Yamahata <isaku.yamahata@intel.com>,
	"federico.parola@polito.it" <federico.parola@polito.it>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
	"michael.roth@amd.com" <michael.roth@amd.com>
Subject: Re: [RFC PATCH 1/8] KVM: Document KVM_MAP_MEMORY ioctl
Date: Thu, 7 Mar 2024 18:19:41 -0800	[thread overview]
Message-ID: <20240308021941.GM368614@ls.amr.corp.intel.com> (raw)
In-Reply-To: <ZepptFuo5ZK6w4TT@google.com>

On Thu, Mar 07, 2024 at 05:28:20PM -0800,
Sean Christopherson <seanjc@google.com> wrote:

> On Thu, Mar 07, 2024, David Matlack wrote:
> > On 2024-03-08 01:20 PM, Huang, Kai wrote:
> > > > > > +:Parameters: struct kvm_memory_mapping(in/out)
> > > > > > +:Returns: 0 on success, <0 on error
> > > > > > +
> > > > > > +KVM_MAP_MEMORY populates guest memory without running vcpu.
> > > > > > +
> > > > > > +::
> > > > > > +
> > > > > > +  struct kvm_memory_mapping {
> > > > > > +	__u64 base_gfn;
> > > > > > +	__u64 nr_pages;
> > > > > > +	__u64 flags;
> > > > > > +	__u64 source;
> > > > > > +  };
> > > > > > +
> > > > > > +  /* For kvm_memory_mapping:: flags */
> > > > > > +  #define KVM_MEMORY_MAPPING_FLAG_WRITE         _BITULL(0)
> > > > > > +  #define KVM_MEMORY_MAPPING_FLAG_EXEC          _BITULL(1)
> > > > > > +  #define KVM_MEMORY_MAPPING_FLAG_USER          _BITULL(2)
> > > > > 
> > > > > I am not sure what's the good of having "FLAG_USER"?
> > > > > 
> > > > > This ioctl is called from userspace, thus I think we can just treat this always
> > > > > as user-fault?
> > > > 
> > > > The point is how to emulate kvm page fault as if vcpu caused the kvm page
> > > > fault.  Not we call the ioctl as user context.
> > > 
> > > Sorry I don't quite follow.  What's wrong if KVM just append the #PF USER
> > > error bit before it calls into the fault handler?
> > > 
> > > My question is, since this is ABI, you have to tell how userspace is
> > > supposed to use this.  Maybe I am missing something, but I don't see how
> > > USER should be used here.
> > 
> > If we restrict this API to the TDP MMU then KVM_MEMORY_MAPPING_FLAG_USER
> > is meaningless, PFERR_USER_MASK is only relevant for shadow paging.
> 
> +1
> 
> > KVM_MEMORY_MAPPING_FLAG_WRITE seems useful to allow memslots to be
> > populated with writes (which avoids just faulting in the zero-page for
> > anon or tmpfs backed memslots), while also allowing populating read-only
> > memslots.
> > 
> > I don't really see a use-case for KVM_MEMORY_MAPPING_FLAG_EXEC.
> 
> It would midly be interesting for something like the NX hugepage mitigation.
> 
> For the initial implementation, I don't think the ioctl() should specify
> protections, period.
> 
> VMA-based mappings, i.e. !guest_memfd, already have a way to specify protections.
> And for guest_memfd, finer grained control in general, and long term compatibility
> with other features that are in-flight or proposed, I would rather userspace specify
> RWX protections via KVM_SET_MEMORY_ATTRIBUTES.  Oh, and dirty logging would be a
> pain too.
> 
> KVM doesn't currently support execute-only (XO) or !executable (RW), so I think
> we can simply define KVM_MAP_MEMORY to behave like a read fault.  E.g. map RX,
> and add W if all underlying protections allow it.
> 
> That way we can defer dealing with things like XO and RW *if* KVM ever does gain
> support for specifying those combinations via KVM_SET_MEMORY_ATTRIBUTES, which
> will likely be per-arch/vendor and non-trivial, e.g. AMD's NPT doesn't even allow
> for XO memory.
> 
> And we shouldn't need to do anything for KVM_MAP_MEMORY in particular if
> KVM_SET_MEMORY_ATTRIBUTES gains support for RWX protections the existing RWX and
> RX combinations, e.g. if there's a use-case for write-protecting guest_memfd
> regions.
> 
> We can always expand the uAPI, but taking away functionality is much harder, if
> not impossible.

Ok, let me drop all the flags.  Here is the updated one.

4.143 KVM_MAP_MEMORY
------------------------

:Capability: KVM_CAP_MAP_MEMORY
:Architectures: none
:Type: vcpu ioctl
:Parameters: struct kvm_memory_mapping(in/out)
:Returns: 0 on success, < 0 on error

Errors:

  ======   =============================================================
  EINVAL   vcpu state is not in TDP MMU mode or is in guest mode.
           Currently, this ioctl is restricted to TDP MMU.
  EAGAIN   The region is only processed partially.  The caller should
           issue the ioctl with the updated parameters.
  EINTR    An unmasked signal is pending.  The region may be processed
           partially.  If `nr_pages` > 0, the caller should issue the
           ioctl with the updated parameters.
  ======   =============================================================

KVM_MAP_MEMORY populates guest memory before the VM starts to run.  Multiple
vcpus can call this ioctl simultaneously.  It may result in the error of EAGAIN
due to race conditions.

::

  struct kvm_memory_mapping {
	__u64 base_gfn;
	__u64 nr_pages;
	__u64 flags;
	__u64 source;
  };

KVM_MAP_MEMORY populates guest memory at the specified range (`base_gfn`,
`nr_pages`) in the underlying mapping. `source` is an optional user pointer.  If
`source` is not NULL and the underlying technology supports it, the memory
contents of `source` are copied into the guest memory.  The backend may encrypt
it.  `flags` must be zero.  It's reserved for future use.

When the ioctl returns, the input values are updated.  If `nr_pages` is large,
it may return EAGAIN or EINTR for pending signal and update the values
(`base_gfn` and `nr_pages`.  `source` if not zero) to point to the remaining
range.

-- 
Isaku Yamahata <isaku.yamahata@linux.intel.com>

  reply	other threads:[~2024-03-08  2:19 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-01 17:28 [RFC PATCH 0/8] KVM: Prepopulate guest memory API isaku.yamahata
2024-03-01 17:28 ` [RFC PATCH 1/8] KVM: Document KVM_MAP_MEMORY ioctl isaku.yamahata
2024-03-07  0:43   ` David Matlack
2024-03-07  1:29     ` Isaku Yamahata
2024-03-07 12:30   ` Huang, Kai
2024-03-07 20:33     ` Isaku Yamahata
2024-03-08  0:20       ` Huang, Kai
2024-03-08  0:56         ` David Matlack
2024-03-08  1:28           ` Sean Christopherson
2024-03-08  2:19             ` Isaku Yamahata [this message]
2024-03-10 23:12               ` Michael Roth
2024-03-11  1:05               ` Huang, Kai
2024-03-11  1:08                 ` Huang, Kai
2024-03-12  1:34                   ` Isaku Yamahata
2024-03-01 17:28 ` [RFC PATCH 2/8] KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory isaku.yamahata
2024-03-07  0:49   ` David Matlack
2024-03-07  2:52     ` Isaku Yamahata
2024-03-07 12:45   ` Huang, Kai
2024-03-07 20:41     ` Isaku Yamahata
2024-03-11 17:23   ` Sean Christopherson
2024-03-11 22:19     ` Isaku Yamahata
2024-03-01 17:28 ` [RFC PATCH 3/8] KVM: x86/mmu: Introduce initialier macro for struct kvm_page_fault isaku.yamahata
2024-03-11 17:24   ` Sean Christopherson
2024-03-11 22:56     ` Isaku Yamahata
2024-03-01 17:28 ` [RFC PATCH 4/8] KVM: x86/mmu: Factor out kvm_mmu_do_page_fault() isaku.yamahata
2024-03-01 17:28 ` [RFC PATCH 5/8] KVM: x86/mmu: Introduce kvm_mmu_map_page() for prepopulating guest memory isaku.yamahata
2024-03-07  0:38   ` David Matlack
2024-03-19 15:53     ` Isaku Yamahata
2024-03-11 17:29   ` Sean Christopherson
2024-03-11 22:57     ` Isaku Yamahata
2024-03-01 17:28 ` [RFC PATCH 6/8] KVM: x86: Implement kvm_arch_{, pre_}vcpu_map_memory() isaku.yamahata
2024-03-07  0:30   ` David Matlack
2024-03-07  0:36     ` David Matlack
2024-03-07  1:51       ` Isaku Yamahata
2024-03-19 16:26         ` Isaku Yamahata
2024-04-03 23:15           ` Sean Christopherson
2024-03-07  1:34     ` Isaku Yamahata
2024-03-11 23:26   ` Sean Christopherson
2024-03-12 12:38     ` Huang, Kai
2024-03-12 14:20       ` Sean Christopherson
2024-03-12 21:41         ` Huang, Kai
2024-03-12 21:46           ` Huang, Kai
2024-03-12 23:03             ` Sean Christopherson
2024-03-01 17:28 ` [RFC PATCH 7/8] KVM: x86: Add hooks in kvm_arch_vcpu_map_memory() isaku.yamahata
2024-03-01 17:28 ` [RFC PATCH 8/8] KVM: selftests: x86: Add test for KVM_MAP_MEMORY isaku.yamahata
2024-03-07  0:53 ` [RFC PATCH 0/8] KVM: Prepopulate guest memory API David Matlack
2024-03-07  2:09   ` Isaku Yamahata
2024-03-19 16:33     ` Isaku Yamahata
2024-04-03 18:30       ` Sean Christopherson
2024-04-03 22:00         ` Isaku Yamahata
2024-04-03 22:42           ` Sean Christopherson
2024-03-11  3:20 ` Michael Roth
2024-03-11 23:44   ` Sean Christopherson
2024-03-12  1:32     ` Isaku Yamahata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240308021941.GM368614@ls.amr.corp.intel.com \
    --to=isaku.yamahata@linux.intel.com \
    --cc=dmatlack@google.com \
    --cc=federico.parola@polito.it \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.