kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com, david@redhat.com
Subject: Re: [PATCH v5 09/13] KVM: Handle page fault for private memory
Date: Tue, 29 Mar 2022 01:07:18 +0000	[thread overview]
Message-ID: <YkJbxiL/Az7olWlq@google.com> (raw)
In-Reply-To: <20220310140911.50924-10-chao.p.peng@linux.intel.com>

On Thu, Mar 10, 2022, Chao Peng wrote:
> @@ -3890,7 +3893,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
>  }
>  
> -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
> +static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	/*
> +	 * At this time private gfn has not been supported yet. Other patch
> +	 * that enables it should change this.
> +	 */
> +	return false;
> +}
> +
> +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> +				    struct kvm_page_fault *fault,
> +				    bool *is_private_pfn, int *r)

@is_private_pfn should be a field in @fault, not a separate parameter, and it
should be a const property set by the original caller.  I would also name it
"is_private", because if KVM proceeds past this point, it will be a property of
the fault/access _and_ the pfn

I say it's a property of the fault because the below kvm_vcpu_is_private_gfn()
should instead be:

	if (fault->is_private)

The kvm_vcpu_is_private_gfn() check is TDX centric.  For SNP, private vs. shared
is communicated via error code.  For software-only (I'm being optimistic ;-) ),
we'd probably need to track private vs. shared internally in KVM, I don't think
we'd want to force it to be a property of the gfn.

Then you can also move the fault->is_private waiver into is_page_fault_stale(),
and drop the local is_private_pfn in direct_page_fault().

> +{
> +	int order;
> +	unsigned int flags = 0;
> +	struct kvm_memory_slot *slot = fault->slot;
> +	long pfn = kvm_memfile_get_pfn(slot, fault->gfn, &order);

If get_lock_pfn() and thus kvm_memfile_get_pfn() returns a pure error code instead
of multiplexing the pfn, then this can be:

	bool is_private_pfn;

	is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn, &fault->pfn, &order);

That self-documents the "pfn < 0" == shared logic.

> +
> +	if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) {
> +		if (pfn < 0)
> +			flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
> +		else {
> +			fault->pfn = pfn;
> +			if (slot->flags & KVM_MEM_READONLY)
> +				fault->map_writable = false;
> +			else
> +				fault->map_writable = true;
> +
> +			if (order == 0)
> +				fault->max_level = PG_LEVEL_4K;

This doesn't correctly handle order > 0, but less than the next page size, in which
case max_level needs to be PG_LEVEL_4k.  It also doesn't handle the case where
max_level > PG_LEVEL_2M.

That said, I think the proper fix is to have the get_lock_pfn() API return the max
mapping level, not the order.  KVM, and presumably any other secondary MMU that might
use these APIs, doesn't care about the order of the struct page, KVM cares about the
max size/level of page it can map into the guest.  And similar to the previous patch,
"order" is specific to struct page, which we are trying to avoid.

> +			*is_private_pfn = true;

This is where KVM guarantees that is_private_pfn == fault->is_private.

> +			*r = RET_PF_FIXED;
> +			return true;

Ewww.  This is super confusing.  Ditto for the "*r = -1" magic number.  I totally
understand why you took this approach, it's just hard to follow because it kinda
follows the kvm_faultin_pfn() semantics, but then inverts true and false in this
one case.

I think the least awful option is to forego the helper and open code everything.
If we ever refactor kvm_faultin_pfn() to be less weird then we can maybe move this
to a helper.

Open coding isn't too bad if you reorganize things so that the exit-to-userspace
path is a dedicated, early check.  IMO, it's a lot easier to read this way, open
coded or not.

I think this is correct?  "is_private_pfn" and "level" are locals, everything else
is in @fault.

	if (kvm_slot_is_private(slot)) {
		is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn,
						       &fault->pfn, &level);

		if (fault->is_private != is_private_pfn) {
			if (is_private_pfn)
				kvm_memfile_put_pfn(slot, fault->pfn);

			vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
			if (fault->is_private)
				vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
			else
				vcpu->run->memory.flags = 0;
			vcpu->run->memory.padding = 0;
			vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
			vcpu->run->memory.size = PAGE_SIZE;
			*r = 0;
			return true;
		}

		/*
		 * fault->pfn is all set if the fault is for a private pfn, just
		 * need to update other metadata.
		 */
		if (fault->is_private) {
			fault->max_level = min(fault->max_level, level);
			fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
			return false;
		}

		/* Fault is shared, fallthrough to the standard path. */
	}

	async = false;

> @@ -4016,7 +4076,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	else
>  		write_lock(&vcpu->kvm->mmu_lock);
>  
> -	if (is_page_fault_stale(vcpu, fault, mmu_seq))
> +	if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq))

As above, I'd prefer this check go in is_page_fault_stale().  It means shadow MMUs
will suffer a pointless check, but I don't think that's a big issue.  Oooh, unless
we support software-only, which would play nice with nested and probably even legacy
shadow paging.  Fun :-)

>  		goto out_unlock;
>  
>  	r = make_mmu_pages_available(vcpu);
> @@ -4033,7 +4093,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		read_unlock(&vcpu->kvm->mmu_lock);
>  	else
>  		write_unlock(&vcpu->kvm->mmu_lock);
> -	kvm_release_pfn_clean(fault->pfn);
> +
> +	if (is_private_pfn)

And this can be

	if (fault->is_private)

Same feedback for paging_tmpl.h.

  reply	other threads:[~2022-03-29  1:07 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 14:08 [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2022-03-10 14:08 ` [PATCH v5 01/13] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2022-04-11 15:10   ` Kirill A. Shutemov
2022-04-12 13:11     ` Chao Peng
2022-04-23  5:43   ` Vishal Annapurve
2022-04-24  8:15     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 02/13] mm: Introduce memfile_notifier Chao Peng
2022-03-29 18:45   ` Sean Christopherson
2022-04-08 12:54     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 03/13] mm/shmem: Support memfile_notifier Chao Peng
2022-03-10 23:08   ` Dave Chinner
2022-03-11  8:42     ` Chao Peng
2022-04-11 15:26   ` Kirill A. Shutemov
2022-04-12 13:12     ` Chao Peng
2022-04-19 22:40   ` Vishal Annapurve
2022-04-20  3:24     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK Chao Peng
2022-04-07 16:05   ` Sean Christopherson
2022-04-07 17:09     ` Andy Lutomirski
2022-04-08 17:56       ` Sean Christopherson
2022-04-08 18:54         ` David Hildenbrand
2022-04-12 14:36           ` Jason Gunthorpe
2022-04-12 21:27             ` Andy Lutomirski
2022-04-13 16:30               ` David Hildenbrand
2022-04-13 16:24             ` David Hildenbrand
2022-04-13 17:52               ` Jason Gunthorpe
2022-04-25 14:07                 ` David Hildenbrand
2022-04-08 13:02     ` Chao Peng
2022-04-11 15:34       ` Kirill A. Shutemov
2022-04-12  5:14         ` Hugh Dickins
2022-04-11 15:32     ` Kirill A. Shutemov
2022-04-12 13:39       ` Chao Peng
2022-04-12 19:28         ` Kirill A. Shutemov
2022-04-13  9:15           ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 05/13] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-03-28 21:27   ` Sean Christopherson
2022-04-08 13:21     ` Chao Peng
2022-03-28 21:56   ` Sean Christopherson
2022-04-08 13:46     ` Chao Peng
2022-04-08 17:45       ` Sean Christopherson
2022-03-10 14:09 ` [PATCH v5 06/13] KVM: Use kvm_userspace_memory_region_ext Chao Peng
2022-03-28 22:26   ` Sean Christopherson
2022-04-08 13:58     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 07/13] KVM: Add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2022-03-28 22:33   ` Sean Christopherson
2022-04-08 13:59     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 08/13] KVM: Use memfile_pfn_ops to obtain pfn for private pages Chao Peng
2022-03-28 23:56   ` Sean Christopherson
2022-04-08 14:07     ` Chao Peng
2022-04-28 12:37     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 09/13] KVM: Handle page fault for private memory Chao Peng
2022-03-29  1:07   ` Sean Christopherson [this message]
2022-04-12 12:10     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 10/13] KVM: Register private memslot to memory backing store Chao Peng
2022-03-29 19:01   ` Sean Christopherson
2022-04-12 12:40     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 11/13] KVM: Zap existing KVM mappings when pages changed in the private fd Chao Peng
2022-03-29 19:23   ` Sean Christopherson
2022-04-12 12:43     ` Chao Peng
2022-04-05 23:45   ` Michael Roth
2022-04-08  3:06     ` Sean Christopherson
2022-04-19 22:43   ` Vishal Annapurve
2022-04-20  3:17     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 12/13] KVM: Expose KVM_MEM_PRIVATE Chao Peng
2022-03-29 19:13   ` Sean Christopherson
2022-04-12 12:56     ` Chao Peng
2022-03-10 14:09 ` [PATCH v5 13/13] memfd_create.2: Describe MFD_INACCESSIBLE flag Chao Peng
2022-03-24 15:51 ` [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory Quentin Perret
2022-03-28 17:13   ` Sean Christopherson
2022-03-28 18:00     ` Quentin Perret
2022-03-28 18:58       ` Sean Christopherson
2022-03-29 17:01         ` Quentin Perret
2022-03-30  8:58           ` Steven Price
2022-03-30 10:39             ` Quentin Perret
2022-03-30 17:58               ` Sean Christopherson
2022-03-31 16:04                 ` Andy Lutomirski
2022-04-01 14:59                   ` Quentin Perret
2022-04-01 17:14                     ` Sean Christopherson
2022-04-01 18:03                       ` Quentin Perret
2022-04-01 18:24                         ` Sean Christopherson
2022-04-01 19:56                     ` Andy Lutomirski
2022-04-04 15:01                       ` Quentin Perret
2022-04-04 17:06                         ` Sean Christopherson
2022-04-04 22:04                           ` Andy Lutomirski
2022-04-05 10:36                             ` Quentin Perret
2022-04-05 17:51                               ` Andy Lutomirski
2022-04-05 18:30                                 ` Sean Christopherson
2022-04-06 18:42                                   ` Andy Lutomirski
2022-04-06 13:05                                 ` Quentin Perret
2022-04-05 18:03                               ` Sean Christopherson
2022-04-06 10:34                                 ` Quentin Perret
2022-04-22 10:56                                 ` Chao Peng
2022-04-22 11:06                                   ` Paolo Bonzini
2022-04-24  8:07                                     ` Chao Peng
2022-04-24 16:59                                   ` Andy Lutomirski
2022-04-25 13:40                                     ` Chao Peng
2022-04-25 14:52                                       ` Andy Lutomirski
2022-04-25 20:30                                         ` Sean Christopherson
2022-06-10 19:18                                           ` Andy Lutomirski
2022-06-10 19:27                                             ` Sean Christopherson
2022-04-28 12:29                                         ` Chao Peng
2022-05-03 11:12                                           ` Quentin Perret
2022-05-09 22:30                                   ` Michael Roth
2022-05-09 23:29                                     ` Sean Christopherson
2022-07-21 20:05                                       ` Gupta, Pankaj
2022-07-21 21:19                                         ` Sean Christopherson
2022-07-21 21:36                                           ` Gupta, Pankaj
2022-07-23  3:09                                           ` Andy Lutomirski
2022-07-25  9:19                                             ` Gupta, Pankaj
2022-03-30 16:18             ` Sean Christopherson
2022-03-28 20:16 ` Andy Lutomirski
2022-03-28 22:48   ` Nakajima, Jun
2022-03-29  0:04     ` Sean Christopherson
2022-04-08 21:35   ` Vishal Annapurve
2022-04-12 13:00     ` Chao Peng
2022-04-12 19:58   ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YkJbxiL/Az7olWlq@google.com \
    --to=seanjc@google.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rppt@kernel.org \
    --cc=steven.price@arm.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).