kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anish Moorthy <amoorthy@google.com>
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org,
	 robert.hoo.linux@gmail.com, jthoughton@google.com,
	amoorthy@google.com,  dmatlack@google.com,
	axelrasmussen@google.com, peterx@redhat.com,
	 nadav.amit@gmail.com, isaku.yamahata@gmail.com,
	kconsul@linux.vnet.ibm.com
Subject: [PATCH v6 06/14] KVM: Add memslot flag to let userspace force an exit on missing hva mappings
Date: Thu,  9 Nov 2023 21:03:17 +0000	[thread overview]
Message-ID: <20231109210325.3806151-7-amoorthy@google.com> (raw)
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>

Allowing KVM to fault in pages during vcpu-context guest memory accesses
can be undesirable: during userfaultfd-based postcopy, it can cause
significant performance issues due to vCPUs contending for
userfaultfd-internal locks.

Add a new memslot flag (KVM_MEM_EXIT_ON_MISSING) through which userspace
can indicate that KVM_RUN should exit instead of faulting in pages
during vcpu-context guest memory accesses. The unfaulted pages are
reported by the accompanying KVM_EXIT_MEMORY_FAULT_INFO, allowing
userspace to determine and take appropriate action.

The basic implementation strategy is to check the memslot flag from
within __gfn_to_pfn_memslot() and override the caller-provided arguments
accordingly. Some callers (such as kvm_vcpu_map()) must be able to opt
out of this behavior, and do so by passing can_exit_on_missing=false.

No functional change intended: nothing sets KVM_MEM_EXIT_ON_MISSING or
passes can_exit_on_missing=true to __gfn_to_pfn_memslot().

Suggested-by: James Houghton <jthoughton@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 Documentation/virt/kvm/api.rst         | 28 +++++++++++++++++++++++---
 arch/arm64/kvm/mmu.c                   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/x86/kvm/mmu/mmu.c                 |  4 ++--
 include/linux/kvm_host.h               | 12 ++++++++++-
 include/uapi/linux/kvm.h               |  2 ++
 virt/kvm/Kconfig                       |  3 +++
 virt/kvm/kvm_main.c                    | 25 ++++++++++++++++++-----
 9 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a07964f601de..1457865f6e98 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1365,6 +1365,8 @@ yet and must be cleared on entry.
   /* for kvm_userspace_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
+  #define KVM_MEM_GUEST_MEMFD      (1UL << 2)
+  #define KVM_MEM_EXIT_ON_MISSING  (1UL << 3)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1395,12 +1397,16 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
+The flags field supports four flags
+
+1.  KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
+use it.
+2.  KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
 to make a new slot read-only.  In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
+3.  KVM_MEM_GUEST_MEMFD
+4.  KVM_MEM_EXIT_ON_MISSING: see KVM_CAP_EXIT_ON_MISSING for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -8059,6 +8065,22 @@ error/annotated fault.
 
 See KVM_EXIT_MEMORY_FAULT for more information.
 
+7.35 KVM_CAP_EXIT_ON_MISSING
+----------------------------
+
+:Architectures: None
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that userspace may set the
+KVM_MEM_EXIT_ON_MISSING flag on memslots. Said flag will cause KVM_RUN to fail
+(-EFAULT) in response to guest-context memory accesses which would require KVM
+to page fault on the userspace mapping.
+
+The range of guest physical memory causing the fault is advertised to userspace
+through KVM_CAP_MEMORY_FAULT_INFO. Userspace should take appropriate action.
+This could mean, for instance, checking that the fault is resolvable, faulting
+in the relevant userspace mapping, then retrying KVM_RUN.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4e41ceed5468..13066a6fdfff 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1486,7 +1486,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmap_read_unlock(current->mm);
 
 	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-				   write_fault, &writable, NULL);
+				   write_fault, &writable, false, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index efd0ebf70a5e..2ce0e1d3f597 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -613,7 +613,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 	} else {
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, &write_ok, NULL);
+					   writing, &write_ok, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 572707858d65..9d40ca02747f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -847,7 +847,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, upgrade_p, NULL);
+					   writing, upgrade_p, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4de7670d5976..b1e5e42bdeb4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4375,7 +4375,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
-					  &fault->hva);
+					  false, &fault->hva);
 	if (!async)
 		return RET_PF_CONTINUE; /* *pfn has correct page already */
 
@@ -4397,7 +4397,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 */
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL,
 					  fault->write, &fault->map_writable,
-					  &fault->hva);
+					  false, &fault->hva);
 	return RET_PF_CONTINUE;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5201400358da..e8e30088289e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1219,7 +1219,8 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 			       bool atomic, bool interruptible, bool *async,
-			       bool write_fault, bool *writable, hva_t *hva);
+			       bool write_fault, bool *writable,
+			       bool can_exit_on_missing, hva_t *hva);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -2423,4 +2424,13 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+/*
+ * Whether vCPUs should exit upon trying to access memory for which the
+ * userspace mappings are missing.
+ */
+static inline bool kvm_is_slot_exit_on_missing(const struct kvm_memory_slot *slot)
+{
+	return slot && slot->flags & KVM_MEM_EXIT_ON_MISSING;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index bda5622a9c68..18546cbada61 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -116,6 +116,7 @@ struct kvm_userspace_memory_region2 {
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
 #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
+#define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -1231,6 +1232,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_EXIT_ON_MISSING 236
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 2c964586aa14..241f524a4e9d 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -109,3 +109,6 @@ config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_PRIVATE_MEM
        bool
+
+config HAVE_KVM_EXIT_ON_MISSING
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 725191333c4e..faaccdba179c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1614,7 +1614,7 @@ static void kvm_replace_memslot(struct kvm *kvm,
  * only allows these.
  */
 #define KVM_SET_USER_MEMORY_REGION_V1_FLAGS \
-	(KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)
+	(KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY | KVM_MEM_EXIT_ON_MISSING)
 
 static int check_memory_region_flags(struct kvm *kvm,
 				     const struct kvm_userspace_memory_region2 *mem)
@@ -1632,6 +1632,9 @@ static int check_memory_region_flags(struct kvm *kvm,
 	valid_flags |= KVM_MEM_READONLY;
 #endif
 
+	if (IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING))
+		valid_flags |= KVM_MEM_EXIT_ON_MISSING;
+
 	if (mem->flags & ~valid_flags)
 		return -EINVAL;
 
@@ -3047,7 +3050,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 			       bool atomic, bool interruptible, bool *async,
-			       bool write_fault, bool *writable, hva_t *hva)
+			       bool write_fault, bool *writable,
+			       bool can_exit_on_missing, hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -3070,6 +3074,15 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 		writable = NULL;
 	}
 
+	if (!atomic && can_exit_on_missing
+	    && kvm_is_slot_exit_on_missing(slot)) {
+		atomic = true;
+		if (async) {
+			*async = false;
+			async = NULL;
+		}
+	}
+
 	return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
 			  writable);
 }
@@ -3079,21 +3092,21 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 		      bool *writable)
 {
 	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
-				    NULL, write_fault, writable, NULL);
+				    NULL, write_fault, writable, false, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
-				    NULL, NULL);
+				    NULL, false, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
-				    NULL, NULL);
+				    NULL, false, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
@@ -4898,6 +4911,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
+	case KVM_CAP_EXIT_ON_MISSING:
+		return IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING);
 	default:
 		break;
 	}
-- 
2.42.0.869.gea05f2083d-goog


  parent reply	other threads:[~2023-11-09 21:03 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-09 21:03 [PATCH v6 00/14] Improve KVM + userfaultfd performance via KVM_MEMORY_FAULT_EXITs on stage-2 faults Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 01/14] KVM: Documentation: Clarify meaning of hva_to_pfn()'s 'atomic' parameter Anish Moorthy
2024-02-07 15:26   ` Sean Christopherson
2024-02-07 18:44     ` Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 02/14] KVM: Documentation: Add docstrings for __kvm_read/write_guest_page() Anish Moorthy
2024-02-07 15:30   ` Sean Christopherson
2023-11-09 21:03 ` [PATCH v6 03/14] KVM: Simplify error handling in __gfn_to_pfn_memslot() Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 04/14] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace Anish Moorthy
2024-02-07 15:30   ` Sean Christopherson
2024-02-07 18:57     ` Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 05/14] KVM: Try using fast GUP to resolve read faults Anish Moorthy
2024-02-05 21:55   ` Anish Moorthy
2023-11-09 21:03 ` Anish Moorthy [this message]
2024-01-31  0:25   ` [PATCH v6 06/14] KVM: Add memslot flag to let userspace force an exit on missing hva mappings James Houghton
2024-01-31 21:59     ` Anish Moorthy
2024-02-01  0:26       ` James Houghton
2024-02-01  1:19         ` Oliver Upton
2024-02-01 16:28           ` Sean Christopherson
2024-02-01 19:24             ` Anish Moorthy
2024-02-02  1:03               ` Oliver Upton
2024-02-02  1:01             ` Oliver Upton
2024-02-01 16:09       ` Sean Christopherson
2024-02-01 19:53         ` Anish Moorthy
2024-02-07 15:35           ` Sean Christopherson
2023-11-09 21:03 ` [PATCH v6 07/14] KVM: x86: Enable KVM_CAP_EXIT_ON_MISSING and annotate EFAULTs from stage-2 fault handler Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
2023-11-09 21:07   ` Anish Moorthy
2024-02-07 15:39     ` Sean Christopherson
2024-02-07 16:41       ` Oliver Upton
2024-02-07 21:21         ` Anish Moorthy
2024-02-07 21:41           ` Sean Christopherson
2024-02-07 22:07           ` Oliver Upton
2024-02-09  1:13             ` Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 09/14] KVM: arm64: Enable KVM_CAP_EXIT_ON_MISSING and annotate an EFAULT from stage-2 fault-handler Anish Moorthy
2024-01-30 23:58   ` James Houghton
2024-01-31 22:38     ` Anish Moorthy
2024-02-09  1:21       ` Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 10/14] KVM: selftests: Report per-vcpu demand paging rate from demand paging test Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 11/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in " Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 12/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm() Anish Moorthy
2023-11-09 21:03 ` [PATCH v6 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
2024-02-07 15:46 ` [PATCH v6 00/14] Improve KVM + userfaultfd performance via KVM_MEMORY_FAULT_EXITs on stage-2 faults Sean Christopherson
2024-02-09 16:00   ` Anish Moorthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231109210325.3806151-7-amoorthy@google.com \
    --to=amoorthy@google.com \
    --cc=axelrasmussen@google.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=jthoughton@google.com \
    --cc=kconsul@linux.vnet.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=robert.hoo.linux@gmail.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).