From: Sean Christopherson <seanjc@google.com>
To: isaku.yamahata@intel.com
Cc: kvm@vger.kernel.org, isaku.yamahata@gmail.com,
	 linux-kernel@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Michael Roth <michael.roth@amd.com>,
	David Matlack <dmatlack@google.com>,
	 Federico Parola <federico.parola@polito.it>
Subject: Re: [RFC PATCH 6/8] KVM: x86: Implement kvm_arch_{, pre_}vcpu_map_memory()
Date: Mon, 11 Mar 2024 16:26:30 -0700	[thread overview]
Message-ID: <Ze-TJh0BBOWm9spT@google.com> (raw)
In-Reply-To: <66a957f4ec4a8591d2ff2550686e361ec648b308.1709288671.git.isaku.yamahata@intel.com>

On Fri, Mar 01, 2024, isaku.yamahata@intel.com wrote:
> +int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
> +			     struct kvm_memory_mapping *mapping)
> +{
> +	u8 max_level, goal_level = PG_LEVEL_4K;

goal_level is misleading.  kvm_page_fault.goal_level is appropriate, because for
the majority of the structure's lifetime, it is the level KVM is trying to map,
not the level that KVM has mapped.

For this variable, maybe "mapped_level" or just "level".


> +	u32 error_code;

u64 when this gets rebased...

> +	int r;
> +
> +	error_code = 0;
> +	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_WRITE)
> +		error_code |= PFERR_WRITE_MASK;
> +	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_EXEC)
> +		error_code |= PFERR_FETCH_MASK;
> +	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_USER)
> +		error_code |= PFERR_USER_MASK;
> +	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_PRIVATE) {
> +#ifdef PFERR_PRIVATE_ACCESS
> +		error_code |= PFERR_PRIVATE_ACCESS;

...because PFERR_PRIVATE_ACCESS is in bits 63:32.

> +#else
> +		return -OPNOTSUPP;

Broken code.  And I don't see any reason for this to exist, PFERR_PRIVATE_ACCESS
will be unconditionally #defined.

> +#endif
> +	}
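
E.g. once this is rebased on the u64 error_code plumbing, the whole flags setup
can simply be (rough sketch, untested):

	u64 error_code = 0;

	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_WRITE)
		error_code |= PFERR_WRITE_MASK;
	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_EXEC)
		error_code |= PFERR_FETCH_MASK;
	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_USER)
		error_code |= PFERR_USER_MASK;

	/* PFERR_PRIVATE_ACCESS lives in bits 63:32, hence the u64. */
	if (mapping->flags & KVM_MEMORY_MAPPING_FLAG_PRIVATE)
		error_code |= PFERR_PRIVATE_ACCESS;
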
> +
> +	if (IS_ALIGNED(mapping->base_gfn, KVM_PAGES_PER_HPAGE(PG_LEVEL_1G)) &&
> +	    mapping->nr_pages >= KVM_PAGES_PER_HPAGE(PG_LEVEL_1G))
> +		max_level = PG_LEVEL_1G;
> +	else if (IS_ALIGNED(mapping->base_gfn, KVM_PAGES_PER_HPAGE(PG_LEVEL_2M)) &&
> +		 mapping->nr_pages >= KVM_PAGES_PER_HPAGE(PG_LEVEL_2M))
> +		max_level = PG_LEVEL_2M;
> +	else
> +		max_level = PG_LEVEL_4K;

Hrm, I would much prefer to allow KVM to map more than what is requested, i.e.
not artificially restrict the hugepage size.  Restricting the mapping size is
just giving userspace another way to shoot themselves in the foot.  E.g. if
userspace prefaults the wrong size, KVM will likely either run with degraded
performance or immediately zap and rebuild the too-small SPTE.
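
I.e. (just sketching the idea) do:

	max_level = KVM_MAX_HUGEPAGE_LEVEL;

and let kvm_mmu_hugepage_adjust() clamp the actual mapping level based on the
memslot and lpage_info, instead of deriving an artificial cap from base_gfn and
nr_pages.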

Ugh, and TDX's S-EPT *must* be populated with 4KiB entries to start.  On one hand,
allowing KVM to map a larger page (but still bounded by the memslot(s)) won't affect
TDX's measurement, because KVM would never actually use a larger page.  On the
other hand, TDX
would need a way to restrict the mapping.

Another complication is that TDH.MEM.PAGE.ADD affects the measurement.  We can
push the ordering requirements to userspace, but for all intents and purposes,
calls to this ioctl() would need to be serialized when doing PAGE.ADD, whereas
they could run in parallel for every other use case, including PAGE.AUG and
pre-mapping shared memory for TDX.

Hrm.

Wait. KVM doesn't *need* to do PAGE.ADD from deep in the MMU.  The only inputs to
PAGE.ADD are the gfn, pfn, tdr (vm), and source.  The S-EPT structures need to be
pre-built, but when they are built is irrelevant, so long as they are in place
before PAGE.ADD.

Crazy idea.  For TDX S-EPT, what if KVM_MAP_MEMORY does all of the SEPT.ADD stuff,
which doesn't affect the measurement, and even fills in KVM's copy of the leaf EPTE, 
but tdx_sept_set_private_spte() doesn't do anything if the TD isn't finalized?
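
Roughly (completely untested sketch; is_td_finalized() and tdx_mem_page_aug()
are hand-waving for whatever the TDX series ends up with):

	static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
					     enum pg_level level, kvm_pfn_t pfn)
	{
		/*
		 * Pre-finalization, KVM_MAP_MEMORY only pre-builds the S-EPT
		 * structures and KVM's copy of the leaf SPTE; PAGE.ADD, and
		 * thus the measurement update, is deferred to the dedicated
		 * ioctl().
		 */
		if (!is_td_finalized(to_kvm_tdx(kvm)))
			return 0;

		/* Post-finalization this is the normal PAGE.AUG path. */
		return tdx_mem_page_aug(kvm, gfn, level, pfn);
	}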

Then KVM provides a dedicated TDX ioctl(), i.e. what is/was KVM_TDX_INIT_MEM_REGION,
to do PAGE.ADD.  KVM_TDX_INIT_MEM_REGION wouldn't need to map anything, it would
simply need to verify that the pfn from guest_memfd() is the same as what's in
the TDP MMU.
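
Strawman of the PAGE.ADD side (tdx_get_mapped_pfn() and tdx_mem_page_add() are
made-up helpers, just to illustrate the flow):

	/* Called per 4KiB page from the KVM_TDX_INIT_MEM_REGION ioctl(). */
	static int tdx_page_add(struct kvm *kvm, gfn_t gfn, kvm_pfn_t gmem_pfn,
				void __user *source)
	{
		kvm_pfn_t mapped_pfn;
		int r;

		/* KVM_MAP_MEMORY must have already installed the leaf SPTE. */
		r = tdx_get_mapped_pfn(kvm, gfn, &mapped_pfn);
		if (r)
			return r;

		if (mapped_pfn != gmem_pfn)
			return -EIO;

		/* TDH.MEM.PAGE.ADD consumes the gfn, pfn, TDR, and source page. */
		return tdx_mem_page_add(kvm, gfn, gmem_pfn, source);
	}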

Or if we want to make things more robust for userspace, set a software-available
flag in the leaf TDP MMU SPTE to indicate that the page is awaiting PAGE.ADD.
That way tdp_mmu_map_handle_target_level() wouldn't treat a fault as spurious
(KVM will see the SPTE as PRESENT, but the S-EPT entry will be !PRESENT).
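
E.g. (the name and bit position are made up, purely to illustrate):

	/* Leaf SPTE is PRESENT in KVM's copy but still awaiting PAGE.ADD. */
	#define SPTE_TDX_PENDING_PAGE_ADD	BIT_ULL(60)

and then the spurious-fault check in tdp_mmu_map_handle_target_level() becomes
something like:

	if (new_spte == iter->old_spte &&
	    !(iter->old_spte & SPTE_TDX_PENDING_PAGE_ADD))
		ret = RET_PF_SPURIOUS;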

Then KVM_MAP_MEMORY doesn't need to support @source, KVM_TDX_INIT_MEM_REGION
doesn't need to fake a page fault and doesn't need to temporarily stash the
source_pa in KVM, and KVM_MAP_MEMORY could be used to fully pre-map TDX memory.

I believe the only missing piece is a way for the TDX code to communicate that
hugepages are disallowed.
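
E.g. a kvm_x86_ops hook along the lines of (name made up):

	u8 (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);

so TDX can force PG_LEVEL_4K where needed while the common code keeps doing its
normal hugepage adjustments.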
