* [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature
@ 2022-04-14 13:19 Vitaly Kuznetsov
  2022-04-14 13:19 ` [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag Vitaly Kuznetsov
                   ` (34 more replies)
  0 siblings, 35 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Changes since v1:
To address Sean's review comments:
- s,Direct,L2, everywhere.
- s,tlbflush,tlb_flush, everywhere.
- "KVM: x86: hyper-v: Add helper to read hypercall data for array" patch
  added.
- "x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK
  constants" patch added.
- "KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK
  instead of raw '64'" patch added.
- Other code improvements.

Other changes:
- Rebase to the latest kvm/queue.
- "KVM: selftests: add hyperv_svm_test to .gitignore" patch dropped
 (already fixed).
- "KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'" 
 patch added.
- Fix a race in the newly introduced Hyper-V IPI test.

Original description:

Currently, KVM handles HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests
by flushing the whole VPID, which is sub-optimal. This series introduces
the mechanism required to handle these requests in a more fine-grained
way by flushing individual GVAs only (when requested). On this
foundation, the "Direct Virtual Flush" Hyper-V feature is implemented. The
feature allows L0 to handle Hyper-V TLB flush hypercalls directly at
L0 without the need to reflect the exit to L1. This has at least two
benefits: the reflected vmexit and the consequent vmenter are avoided, and
L0 has precise information on whether the target vCPU is actually running
(and thus requires a kick).

Sean Christopherson (1):
  KVM: x86: hyper-v: Add helper to read hypercall data for array

Vitaly Kuznetsov (33):
  KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
  KVM: x86: hyper-v: Introduce TLB flush ring
  KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls
    gently
  KVM: x86: hyper-v: Expose support for extended gva ranges for flush
    hypercalls
  KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
  x86/hyperv: Introduce
    HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  KVM: x86: hyper-v: Use
    HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw
    '64'
  KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in
    kvm_hv_send_ipi()
  KVM: x86: hyper-v: Create a separate ring for L2 TLB flush
  KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv'
    instead of on-stack 'sparse_banks'
  KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use
  KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
  KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook
  KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
  KVM: x86: hyper-v: L2 TLB flush
  KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check
  x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
  KVM: nVMX: hyper-v: Enable L2 TLB flush
  KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of
    KVM_REQ_HV_TLB_FLUSH too
  KVM: nSVM: hyper-v: Enable L2 TLB flush
  KVM: x86: Expose Hyper-V L2 TLB flush feature
  KVM: selftests: Better XMM read/write helpers
  KVM: selftests: Hyper-V PV IPI selftest
  KVM: selftests: Make it possible to replace PTEs with __virt_pg_map()
  KVM: selftests: Hyper-V PV TLB flush selftest
  KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with
    hyperv-tlfs.h
  KVM: selftests: nVMX: Allocate Hyper-V partition assist page
  KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist
    pages
  KVM: selftests: Sync 'struct hv_vp_assist_page' definition with
    hyperv-tlfs.h
  KVM: selftests: evmcs_test: Introduce L2 TLB flush test
  KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h
  KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test
  KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'

 arch/x86/include/asm/hyperv-tlfs.h            |   6 +-
 arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
 arch/x86/include/asm/kvm_host.h               |  37 +-
 arch/x86/kvm/Makefile                         |   3 +-
 arch/x86/kvm/hyperv.c                         | 376 ++++++++--
 arch/x86/kvm/hyperv.h                         |  48 ++
 arch/x86/kvm/svm/hyperv.c                     |  18 +
 arch/x86/kvm/svm/hyperv.h                     |  37 +
 arch/x86/kvm/svm/nested.c                     |  25 +-
 arch/x86/kvm/svm/svm_onhyperv.c               |   2 +-
 arch/x86/kvm/svm/svm_onhyperv.h               |   6 +-
 arch/x86/kvm/trace.h                          |  21 +-
 arch/x86/kvm/vmx/evmcs.c                      |  24 +
 arch/x86/kvm/vmx/evmcs.h                      |  11 +
 arch/x86/kvm/vmx/nested.c                     |  32 +
 arch/x86/kvm/vmx/vmx.c                        |   6 +-
 arch/x86/kvm/x86.c                            |  20 +-
 arch/x86/kvm/x86.h                            |   1 +
 include/asm-generic/hyperv-tlfs.h             |   5 +
 include/asm-generic/mshyperv.h                |  11 +-
 tools/testing/selftests/kvm/.gitignore        |   2 +
 tools/testing/selftests/kvm/Makefile          |   4 +-
 .../selftests/kvm/include/x86_64/evmcs.h      |  40 +-
 .../selftests/kvm/include/x86_64/hyperv.h     |  35 +
 .../selftests/kvm/include/x86_64/processor.h  |  72 +-
 .../selftests/kvm/include/x86_64/svm_util.h   |  10 +
 .../selftests/kvm/include/x86_64/vmx.h        |   4 +
 .../testing/selftests/kvm/lib/x86_64/hyperv.c |  21 +
 .../selftests/kvm/lib/x86_64/processor.c      |   6 +-
 tools/testing/selftests/kvm/lib/x86_64/svm.c  |  10 +
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   7 +
 .../selftests/kvm/max_guest_memory_test.c     |   2 +-
 .../testing/selftests/kvm/x86_64/evmcs_test.c |  53 +-
 .../selftests/kvm/x86_64/hyperv_features.c    |   5 +-
 .../testing/selftests/kvm/x86_64/hyperv_ipi.c | 374 ++++++++++
 .../selftests/kvm/x86_64/hyperv_svm_test.c    |  60 +-
 .../selftests/kvm/x86_64/hyperv_tlb_flush.c   | 647 ++++++++++++++++++
 .../selftests/kvm/x86_64/mmu_role_test.c      |   2 +-
 38 files changed, 1883 insertions(+), 162 deletions(-)
 create mode 100644 arch/x86/kvm/svm/hyperv.c
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/hyperv.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c

-- 
2.35.1



* [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:18   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring Vitaly Kuznetsov
                   ` (33 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

In preparation for implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect the dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear the
KVM_REQ_HV_TLB_FLUSH request in kvm_service_local_tlb_flush_requests()
when KVM_REQ_TLB_FLUSH_GUEST was also requested.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/hyperv.c           | 4 ++--
 arch/x86/kvm/x86.c              | 6 +++++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c20f715f009..1de3ad9308d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -105,6 +105,8 @@
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_HV_TLB_FLUSH \
+	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 46f9dfb60469..b402ad059eb9 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1876,11 +1876,11 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
 	if (all_cpus) {
-		kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
+		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
 	} else {
 		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
 
-		kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST, vcpu_mask);
+		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
 	}
 
 ret_success:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab336f7c82e4..f633cff8cd7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3360,8 +3360,12 @@ void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 		kvm_vcpu_flush_tlb_current(vcpu);
 
-	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
 		kvm_vcpu_flush_tlb_guest(vcpu);
+		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+	} else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
+		kvm_vcpu_flush_tlb_guest(vcpu);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);
 
-- 
2.35.1



* [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
  2022-04-14 13:19 ` [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:19   ` Maxim Levitsky
  2022-05-16 19:34   ` Sean Christopherson
  2022-04-14 13:19 ` [PATCH v3 03/34] KVM: x86: hyper-v: Add helper to read hypercall data for array Vitaly Kuznetsov
                   ` (32 subsequent siblings)
  34 siblings, 2 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

To allow flushing individual GVAs instead of always flushing the whole
VPID, a per-vCPU structure to pass the requests is needed. Introduce a
simple write-locked ring structure to hold two types of entries:
individual GVA (GFN + up to 4095 following GFNs encoded in the lower
12 bits) and 'flush all'.

The queuing rule is: if there's not enough space on the ring to put the
request and still leave at least one entry free for 'flush all', put a
'flush all' entry instead.

The size of the ring is arbitrarily set to '16'.

Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now, so
there's only a very small functional change, but the infrastructure is
prepared to handle individual GVA flush requests.
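
For illustration, here is a minimal, self-contained userspace sketch of
the free-space rule (an analogy to the ring logic in the diff below, not
the kernel code itself): the ring has KVM_HV_TLB_FLUSH_RING_SIZE - 1
usable slots, and the enqueue path falls back to a single 'flush all'
entry once a request would not fit.

  #include <stdio.h>

  #define RING_SIZE 16 /* mirrors KVM_HV_TLB_FLUSH_RING_SIZE */

  /* Free slots between write_idx and read_idx; one slot is always kept
   * empty so that 'read_idx == write_idx' unambiguously means "empty". */
  static int ring_free(int read_idx, int write_idx)
  {
          if (write_idx >= read_idx)
                  return RING_SIZE - (write_idx - read_idx) - 1;
          return read_idx - write_idx - 1;
  }

  int main(void)
  {
          printf("%d\n", ring_free(0, 0)); /* empty ring: 15 usable slots */
          printf("%d\n", ring_free(5, 4)); /* writer right behind reader: full */
          return 0;
  }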

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 16 +++++++
 arch/x86/kvm/hyperv.c           | 83 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/hyperv.h           | 13 ++++++
 arch/x86/kvm/x86.c              |  5 +-
 arch/x86/kvm/x86.h              |  1 +
 5 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1de3ad9308d8..b4dd2ff61658 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,6 +578,20 @@ struct kvm_vcpu_hv_synic {
 	bool dont_zero_synic_pages;
 };
 
+#define KVM_HV_TLB_FLUSH_RING_SIZE (16)
+
+struct kvm_vcpu_hv_tlb_flush_entry {
+	u64 addr;
+	u64 flush_all:1;
+	u64 pad:63;
+};
+
+struct kvm_vcpu_hv_tlb_flush_ring {
+	int read_idx, write_idx;
+	spinlock_t write_lock;
+	struct kvm_vcpu_hv_tlb_flush_entry entries[KVM_HV_TLB_FLUSH_RING_SIZE];
+};
+
 /* Hyper-V per vcpu emulation context */
 struct kvm_vcpu_hv {
 	struct kvm_vcpu *vcpu;
@@ -597,6 +611,8 @@ struct kvm_vcpu_hv {
 		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
 		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
 	} cpuid_cache;
+
+	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring;
 };
 
 /* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index b402ad059eb9..fb716cf919ed 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -29,6 +29,7 @@
 #include <linux/kvm_host.h>
 #include <linux/highmem.h>
 #include <linux/sched/cputime.h>
+#include <linux/spinlock.h>
 #include <linux/eventfd.h>
 
 #include <asm/apicdef.h>
@@ -954,6 +955,8 @@ static int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
 
 	hv_vcpu->vp_index = vcpu->vcpu_idx;
 
+	spin_lock_init(&hv_vcpu->tlb_flush_ring.write_lock);
+
 	return 0;
 }
 
@@ -1789,6 +1792,74 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
 			      var_cnt * sizeof(*sparse_banks));
 }
 
+static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
+					 int read_idx, int write_idx)
+{
+	if (write_idx >= read_idx)
+		return KVM_HV_TLB_FLUSH_RING_SIZE - (write_idx - read_idx) - 1;
+
+	return read_idx - write_idx - 1;
+}
+
+static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	int ring_free, write_idx, read_idx;
+	unsigned long flags;
+
+	if (!hv_vcpu)
+		return;
+
+	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
+
+	spin_lock_irqsave(&tlb_flush_ring->write_lock, flags);
+
+	/*
+	 * 'read_idx' is updated by the vCPU which does the flush, this
+	 * happens without 'tlb_flush_ring->write_lock' being held; make
+	 * sure we read it once.
+	 */
+	read_idx = READ_ONCE(tlb_flush_ring->read_idx);
+	/*
+	 * 'write_idx' is only updated here, under 'tlb_flush_ring->write_lock'.
+	 * It is safe to let the compiler re-read it: the value can't change.
+	 */
+	write_idx = tlb_flush_ring->write_idx;
+
+	ring_free = hv_tlb_flush_ring_free(hv_vcpu, read_idx, write_idx);
+	/* Full ring always contains 'flush all' entry */
+	if (!ring_free)
+		goto out_unlock;
+
+	tlb_flush_ring->entries[write_idx].addr = 0;
+	tlb_flush_ring->entries[write_idx].flush_all = 1;
+	/*
+	 * Advance write index only after filling in the entry to
+	 * synchronize with lockless reader.
+	 */
+	smp_wmb();
+	tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
+
+out_unlock:
+	spin_unlock_irqrestore(&tlb_flush_ring->write_lock, flags);
+}
+
+void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+	kvm_vcpu_flush_tlb_guest(vcpu);
+
+	if (!hv_vcpu)
+		return;
+
+	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
+
+	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
+}
+
 static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -1797,6 +1868,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
 	u64 valid_bank_mask;
 	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
+	struct kvm_vcpu *v;
+	unsigned long i;
 	bool all_cpus;
 
 	/*
@@ -1876,10 +1949,20 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
 	if (all_cpus) {
+		kvm_for_each_vcpu(i, v, kvm)
+			hv_tlb_flush_ring_enqueue(v);
+
 		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
 	} else {
 		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
 
+		for_each_set_bit(i, vcpu_mask, KVM_MAX_VCPUS) {
+			v = kvm_get_vcpu(kvm, i);
+			if (!v)
+				continue;
+			hv_tlb_flush_ring_enqueue(v);
+		}
+
 		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
 	}
 
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index da2737f2a956..6847caeaaf84 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -147,4 +147,17 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
 int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 		     struct kvm_cpuid_entry2 __user *entries);
 
+
+static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+	if (!hv_vcpu)
+		return;
+
+	hv_vcpu->tlb_flush_ring.read_idx = hv_vcpu->tlb_flush_ring.write_idx;
+}
+void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
+
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f633cff8cd7f..e5aec386d299 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3324,7 +3324,7 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
 	static_call(kvm_x86_flush_tlb_all)(vcpu);
 }
 
-static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
+void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.tlb_flush;
 
@@ -3362,7 +3362,8 @@ void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
 		kvm_vcpu_flush_tlb_guest(vcpu);
-		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+			kvm_hv_vcpu_empty_flush_tlb(vcpu);
 	} else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
 		kvm_vcpu_flush_tlb_guest(vcpu);
 	}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 588792f00334..2324f496c500 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -58,6 +58,7 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
 
 #define MSR_IA32_CR_PAT_DEFAULT  0x0007040600070406ULL
 
+void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu);
 void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
 int kvm_check_nested_events(struct kvm_vcpu *vcpu);
 
-- 
2.35.1



* [PATCH v3 03/34] KVM: x86: hyper-v: Add helper to read hypercall data for array
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
  2022-04-14 13:19 ` [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag Vitaly Kuznetsov
  2022-04-14 13:19 ` [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:20   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Vitaly Kuznetsov
                   ` (31 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

From: Sean Christopherson <seanjc@google.com>

Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
reading a guest-provided array can be reused in the future, e.g. for
getting a list of virtual addresses whose TLB entries need to be flushed.

Opportunistically swap the order of the data and XMM adjustment so that
the XMM/gpa offsets are bundled together.

No functional change intended.
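
As an illustration of the XMM-half indexing used for 'fast' hypercalls (a
self-contained userspace sketch, not the kernel code): element i of the
destination array comes from XMM register (i + consumed_xmm_halves) / 2,
taking the high or low 64-bit half depending on the parity of that sum.

  #include <stdio.h>

  /* Map array index 'i' to an XMM register and half, after 'consumed'
   * halves were already used for fixed hypercall parameters. */
  static void xmm_slot(int i, int consumed, int *reg, const char **half)
  {
          int j = i + consumed;

          *reg = j / 2;
          *half = (j % 2) ? "high" : "low";
  }

  int main(void)
  {
          const char *half;
          int reg, i;

          /* E.g. the EX flush hypercalls consume two halves (all of XMM0)
           * for their fixed parameters. */
          for (i = 0; i < 4; i++) {
                  xmm_slot(i, 2, &reg, &half);
                  printf("data[%d] <- XMM%d (%s half)\n", i, reg, half);
          }
          return 0;
  }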

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 53 +++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fb716cf919ed..d66c27fd1e8a 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1758,38 +1758,51 @@ struct kvm_hv_hcall {
 	sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
 };
 
-static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
-				 int consumed_xmm_halves,
-				 u64 *sparse_banks, gpa_t offset)
-{
-	u16 var_cnt;
-	int i;
 
-	if (hc->var_cnt > 64)
-		return -EINVAL;
-
-	/* Ignore banks that cannot possibly contain a legal VP index. */
-	var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
+static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
+			      u16 orig_cnt, u16 cnt_cap, u64 *data,
+			      int consumed_xmm_halves, gpa_t offset)
+{
+	/*
+	 * Preserve the original count when ignoring entries via a "cap", KVM
+	 * still needs to validate the guest input (though the non-XMM path
+	 * punts on the checks).
+	 */
+	u16 cnt = min(orig_cnt, cnt_cap);
+	int i, j;
 
 	if (hc->fast) {
 		/*
 		 * Each XMM holds two sparse banks, but do not count halves that
 		 * have already been consumed for hypercall parameters.
 		 */
-		if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
+		if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
 			return HV_STATUS_INVALID_HYPERCALL_INPUT;
-		for (i = 0; i < var_cnt; i++) {
-			int j = i + consumed_xmm_halves;
+
+		for (i = 0; i < cnt; i++) {
+			j = i + consumed_xmm_halves;
 			if (j % 2)
-				sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
+				data[i] = sse128_hi(hc->xmm[j / 2]);
 			else
-				sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
+				data[i] = sse128_lo(hc->xmm[j / 2]);
 		}
 		return 0;
 	}
 
-	return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
-			      var_cnt * sizeof(*sparse_banks));
+	return kvm_read_guest(kvm, hc->ingpa + offset, data,
+			      cnt * sizeof(*data));
+}
+
+static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
+				 u64 *sparse_banks, int consumed_xmm_halves,
+				 gpa_t offset)
+{
+	if (hc->var_cnt > 64)
+		return -EINVAL;
+
+	/* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
+	return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS,
+				  sparse_banks, consumed_xmm_halves, offset);
 }
 
 static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
@@ -1937,7 +1950,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 		if (!hc->var_cnt)
 			goto ret_success;
 
-		if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
+		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
 					  offsetof(struct hv_tlb_flush_ex,
 						   hv_vp_set.bank_contents)))
 			return HV_STATUS_INVALID_HYPERCALL_INPUT;
@@ -2048,7 +2061,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 		if (!hc->var_cnt)
 			goto ret_success;
 
-		if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
+		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1,
 					  offsetof(struct hv_send_ipi_ex,
 						   vp_set.bank_contents)))
 			return HV_STATUS_INVALID_HYPERCALL_INPUT;
-- 
2.35.1



* [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 03/34] KVM: x86: hyper-v: Add helper to read hypercall data for array Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:22   ` Maxim Levitsky
  2022-05-16 19:41   ` Sean Christopherson
  2022-04-14 13:19 ` [PATCH v3 05/34] KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls Vitaly Kuznetsov
                   ` (30 subsequent siblings)
  34 siblings, 2 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
exactly the same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
flushing the whole VPID, which is sub-optimal. Switch to handling
these requests with 'flush_tlb_gva()' hooks instead. Use the newly
introduced TLB flush ring to queue the requests.
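
For reference, a minimal self-contained sketch of how a ring consumer can
decode one entry (mirroring the arithmetic in the diff below, assuming
4 KiB pages; not the kernel code itself): the lower 12 bits give the
number of additional consecutive pages to flush after the page encoded
in the upper bits.

  #include <stdio.h>
  #include <stdint.h>

  #define PAGE_SHIFT 12
  #define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
  #define PAGE_MASK  (~(PAGE_SIZE - 1))

  int main(void)
  {
          /* Hypothetical entry: flush GVA 0x7f0000000000 plus 2 more pages. */
          uint64_t entry = 0x7f0000000000ULL | 2;
          uint64_t address = entry & PAGE_MASK;
          uint64_t count = (entry & ~PAGE_MASK) + 1;
          uint64_t j;

          for (j = 0; j < count; j++)
                  printf("flush gva 0x%llx\n",
                         (unsigned long long)(address + j * PAGE_SIZE));
          return 0;
  }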

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 132 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 115 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d66c27fd1e8a..759e1a16e5c3 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1805,6 +1805,13 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
 				  sparse_banks, consumed_xmm_halves, offset);
 }
 
+static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[],
+				       int consumed_xmm_halves, gpa_t offset)
+{
+	return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt,
+				  entries, consumed_xmm_halves, offset);
+}
+
 static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
 					 int read_idx, int write_idx)
 {
@@ -1814,12 +1821,13 @@ static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
 	return read_idx - write_idx - 1;
 }
 
-static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
+static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
 {
 	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
 	int ring_free, write_idx, read_idx;
 	unsigned long flags;
+	int i;
 
 	if (!hv_vcpu)
 		return;
@@ -1845,14 +1853,34 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
 	if (!ring_free)
 		goto out_unlock;
 
-	tlb_flush_ring->entries[write_idx].addr = 0;
-	tlb_flush_ring->entries[write_idx].flush_all = 1;
 	/*
-	 * Advance write index only after filling in the entry to
-	 * synchronize with lockless reader.
+	 * All entries should fit on the ring leaving one free for 'flush all'
+	 * entry in case another request comes in. In case there's not enough
+	 * space, just put 'flush all' entry there.
+	 */
+	if (!count || count >= ring_free - 1 || !entries) {
+		tlb_flush_ring->entries[write_idx].addr = 0;
+		tlb_flush_ring->entries[write_idx].flush_all = 1;
+		/*
+		 * Advance write index only after filling in the entry to
+		 * synchronize with lockless reader.
+		 */
+		smp_wmb();
+		tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
+		goto out_unlock;
+	}
+
+	for (i = 0; i < count; i++) {
+		tlb_flush_ring->entries[write_idx].addr = entries[i];
+		tlb_flush_ring->entries[write_idx].flush_all = 0;
+		write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
+	}
+	/*
+	 * Advance write index only after filling in the entry to synchronize
+	 * with lockless reader.
 	 */
 	smp_wmb();
-	tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
+	tlb_flush_ring->write_idx = write_idx;
 
 out_unlock:
 	spin_unlock_irqrestore(&tlb_flush_ring->write_lock, flags);
@@ -1862,15 +1890,58 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	struct kvm_vcpu_hv_tlb_flush_entry *entry;
+	int read_idx, write_idx;
+	u64 address;
+	u32 count;
+	int i, j;
 
-	kvm_vcpu_flush_tlb_guest(vcpu);
-
-	if (!hv_vcpu)
+	if (!tdp_enabled || !hv_vcpu) {
+		kvm_vcpu_flush_tlb_guest(vcpu);
 		return;
+	}
 
 	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
 
-	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
+	/*
+	 * TLB flush must be performed on the target vCPU so 'read_idx'
+	 * (AKA 'tail') cannot change underneath, the compiler is free
+	 * to re-read it.
+	 */
+	read_idx = tlb_flush_ring->read_idx;
+
+	/*
+	 * 'write_idx' (AKA 'head') can be concurrently updated by a different
+	 * vCPU so we must be sure it's read once.
+	 */
+	write_idx = READ_ONCE(tlb_flush_ring->write_idx);
+
+	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
+	smp_rmb();
+
+	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
+		entry = &tlb_flush_ring->entries[i];
+
+		if (entry->flush_all)
+			goto out_flush_all;
+
+		/*
+		 * Lower 12 bits of 'address' encode the number of additional
+		 * pages to flush.
+		 */
+		address = entry->addr & PAGE_MASK;
+		count = (entry->addr & ~PAGE_MASK) + 1;
+		for (j = 0; j < count; j++)
+			static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
+	}
+	++vcpu->stat.tlb_flush;
+	goto out_empty_ring;
+
+out_flush_all:
+	kvm_vcpu_flush_tlb_guest(vcpu);
+
+out_empty_ring:
+	tlb_flush_ring->read_idx = write_idx;
 }
 
 static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
@@ -1879,11 +1950,22 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	struct hv_tlb_flush_ex flush_ex;
 	struct hv_tlb_flush flush;
 	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
+	/*
+	 * Normally, there can be no more than 'KVM_HV_TLB_FLUSH_RING_SIZE - 1'
+	 * entries on the TLB Flush ring as when 'read_idx == write_idx' the
+	 * ring is considered as empty. The last entry on the ring, however,
+	 * needs to be always left free for 'flush all' entry which gets placed
+	 * when there is not enough space to put all the requested entries.
+	 */
+	u64 __tlb_flush_entries[KVM_HV_TLB_FLUSH_RING_SIZE - 2];
+	u64 *tlb_flush_entries;
 	u64 valid_bank_mask;
 	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
 	struct kvm_vcpu *v;
 	unsigned long i;
 	bool all_cpus;
+	int consumed_xmm_halves = 0;
+	gpa_t data_offset;
 
 	/*
 	 * The Hyper-V TLFS doesn't allow more than 64 sparse banks, e.g. the
@@ -1899,10 +1981,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 			flush.address_space = hc->ingpa;
 			flush.flags = hc->outgpa;
 			flush.processor_mask = sse128_lo(hc->xmm[0]);
+			consumed_xmm_halves = 1;
 		} else {
 			if (unlikely(kvm_read_guest(kvm, hc->ingpa,
 						    &flush, sizeof(flush))))
 				return HV_STATUS_INVALID_HYPERCALL_INPUT;
+			data_offset = sizeof(flush);
 		}
 
 		trace_kvm_hv_flush_tlb(flush.processor_mask,
@@ -1926,10 +2010,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 			flush_ex.flags = hc->outgpa;
 			memcpy(&flush_ex.hv_vp_set,
 			       &hc->xmm[0], sizeof(hc->xmm[0]));
+			consumed_xmm_halves = 2;
 		} else {
 			if (unlikely(kvm_read_guest(kvm, hc->ingpa, &flush_ex,
 						    sizeof(flush_ex))))
 				return HV_STATUS_INVALID_HYPERCALL_INPUT;
+			data_offset = sizeof(flush_ex);
 		}
 
 		trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
@@ -1945,25 +2031,37 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 			return HV_STATUS_INVALID_HYPERCALL_INPUT;
 
 		if (all_cpus)
-			goto do_flush;
+			goto read_flush_entries;
 
 		if (!hc->var_cnt)
 			goto ret_success;
 
-		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
-					  offsetof(struct hv_tlb_flush_ex,
-						   hv_vp_set.bank_contents)))
+		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, consumed_xmm_halves,
+					  data_offset))
+			return HV_STATUS_INVALID_HYPERCALL_INPUT;
+		data_offset += hc->var_cnt * sizeof(sparse_banks[0]);
+		consumed_xmm_halves += hc->var_cnt;
+	}
+
+read_flush_entries:
+	if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
+	    hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
+	    hc->rep_cnt > ARRAY_SIZE(__tlb_flush_entries)) {
+		tlb_flush_entries = NULL;
+	} else {
+		if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries,
+						consumed_xmm_halves, data_offset))
 			return HV_STATUS_INVALID_HYPERCALL_INPUT;
+		tlb_flush_entries = __tlb_flush_entries;
 	}
 
-do_flush:
 	/*
 	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
 	if (all_cpus) {
 		kvm_for_each_vcpu(i, v, kvm)
-			hv_tlb_flush_ring_enqueue(v);
+			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
 
 		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
 	} else {
@@ -1973,7 +2071,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 			v = kvm_get_vcpu(kvm, i);
 			if (!v)
 				continue;
-			hv_tlb_flush_ring_enqueue(v);
+			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
 		}
 
 		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
-- 
2.35.1



* [PATCH v3 05/34] KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:23   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs Vitaly Kuznetsov
                   ` (29 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

The 'extended GVA ranges' support bit seems to indicate whether the lower
12 bits of a GVA can be used to specify up to 4095 additional consecutive
GVAs to flush. This is somewhat described in the TLFS.

Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
requests by flushing the whole VPID, so technically, extended GVA
ranges were already supported. As such requests are now handled more
gently, advertising support for extended ranges starts making sense
as it reduces the size of TLB flush requests.
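
As a hedged illustration of what the advertised bit lets a guest do
(assuming 4 KiB pages; this is not KVM or guest code from the series): a
single flush-list entry can describe the starting page plus up to 4095
following pages.

  #include <stdio.h>
  #include <stdint.h>

  /* Build one extended-range flush entry covering 'npages' consecutive
   * pages starting at 'gva' (npages must be in 1..4096). */
  static uint64_t make_flush_entry(uint64_t gva, unsigned int npages)
  {
          return (gva & ~0xfffULL) | (npages - 1);
  }

  int main(void)
  {
          /* One entry is enough to flush a 64 KiB (16-page) region. */
          printf("0x%llx\n",
                 (unsigned long long)make_flush_entry(0x7f1234560000ULL, 16));
          return 0;
  }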

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 2 ++
 arch/x86/kvm/hyperv.c              | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 0a9407dc0859..5225a85c08c3 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -61,6 +61,8 @@
 #define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE		BIT(10)
 /* Support for debug MSRs available */
 #define HV_FEATURE_DEBUG_MSRS_AVAILABLE			BIT(11)
+/* Support for extended gva ranges for flush hypercalls available */
+#define HV_FEATURE_EXT_GVA_RANGES_FLUSH			BIT(14)
 /*
  * Support for returning hypercall output block via XMM
  * registers is available
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 759e1a16e5c3..1a6f9628cee9 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2702,6 +2702,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 			ent->ebx |= HV_DEBUGGING;
 			ent->edx |= HV_X64_GUEST_DEBUGGING_AVAILABLE;
 			ent->edx |= HV_FEATURE_DEBUG_MSRS_AVAILABLE;
+			ent->edx |= HV_FEATURE_EXT_GVA_RANGES_FLUSH;
 
 			/*
 			 * Direct Synthetic timers only make sense with in-kernel
-- 
2.35.1



* [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 05/34] KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:23   ` Maxim Levitsky
  2022-05-11 11:23   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
                   ` (28 subsequent siblings)
  34 siblings, 2 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

To handle L2 TLB flush requests, KVM needs to translate the specified
L2 GPA to L1 GPA to read hypercall arguments from there.

No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 1a6f9628cee9..fc4bb0ead9fa 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,6 +23,7 @@
 #include "ioapic.h"
 #include "cpuid.h"
 #include "hyperv.h"
+#include "mmu.h"
 #include "xen.h"
 
 #include <linux/cpu.h>
@@ -1975,6 +1976,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	 */
 	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
 
+	if (!hc->fast && is_guest_mode(vcpu)) {
+		hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);
+		if (unlikely(hc->ingpa == UNMAPPED_GVA))
+			return HV_STATUS_INVALID_HYPERCALL_INPUT;
+	}
+
 	if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
 	    hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE) {
 		if (hc->fast) {
-- 
2.35.1



* [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-04-25 15:47   ` Wei Liu
                     ` (4 more replies)
  2022-04-14 13:19 ` [PATCH v3 08/34] KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64' Vitaly Kuznetsov
                   ` (27 subsequent siblings)
  34 siblings, 5 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

It may not be clear where the magical '64' value used in
__cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
sparse bank number and the number of vCPUs per bank. Add defines to make
things clear. These defines are also going to be used by KVM.

No functional change.
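
A minimal userspace sketch of the bank/offset arithmetic these constants
name (mirroring __cpumask_to_vpset(); the VP index used is hypothetical):

  #include <stdio.h>

  #define HV_MAX_SPARSE_VCPU_BANKS 64
  #define HV_VCPUS_PER_SPARSE_BANK 64

  int main(void)
  {
          int vcpu = 130; /* hypothetical VP index */

          /* Each bank covers 64 vCPUs and the vpset can name at most 64
           * banks, so at most 64 * 64 = 4096 vCPUs can be encoded. */
          printf("bank %d, bit %d, max encodable vcpus %d\n",
                 vcpu / HV_VCPUS_PER_SPARSE_BANK,
                 vcpu % HV_VCPUS_PER_SPARSE_BANK,
                 HV_MAX_SPARSE_VCPU_BANKS * HV_VCPUS_PER_SPARSE_BANK);
          return 0;
  }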

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/asm-generic/hyperv-tlfs.h |  5 +++++
 include/asm-generic/mshyperv.h    | 11 ++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..020ca9bdbb79 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -399,6 +399,11 @@ struct hv_vpset {
 	u64 bank_contents[];
 } __packed;
 
+/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
+#define HV_MAX_SPARSE_VCPU_BANKS (64)
+/* The number of vCPUs in one sparse bank */
+#define HV_VCPUS_PER_SPARSE_BANK (64)
+
 /* HvCallSendSyntheticClusterIpi hypercall */
 struct hv_send_ipi {
 	u32 vector;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c08758b6b364..0abe91df1ef6 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
 {
 	int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
 	int this_cpu = smp_processor_id();
+	int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
 
-	/* valid_bank_mask can represent up to 64 banks */
-	if (hv_max_vp_index / 64 >= 64)
+	/* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
+	if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
 		return 0;
 
 	/*
@@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
 	 * structs are not cleared between calls, we risk flushing unneeded
 	 * vCPUs otherwise.
 	 */
-	for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
+	for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
 		vpset->bank_contents[vcpu_bank] = 0;
 
 	/*
@@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
 		vcpu = hv_cpu_number_to_vp_number(cpu);
 		if (vcpu == VP_INVAL)
 			return -1;
-		vcpu_bank = vcpu / 64;
-		vcpu_offset = vcpu % 64;
+		vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
+		vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
 		__set_bit(vcpu_offset, (unsigned long *)
 			  &vpset->bank_contents[vcpu_bank]);
 		if (vcpu_bank >= nr_bank)
-- 
2.35.1



* [PATCH v3 08/34] KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64'
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:24   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi() Vitaly Kuznetsov
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

It may not be clear where the '64' limit for the maximum sparse bank
number comes from; use the HV_MAX_SPARSE_VCPU_BANKS define instead. Use
HV_VCPUS_PER_SPARSE_BANK in the definition of
KVM_HV_MAX_SPARSE_VCPU_SET_BITS. Opportunistically adjust the comment
around BUILD_BUG_ON().

No functional change.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fc4bb0ead9fa..3cf68645a2e6 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -43,7 +43,7 @@
 /* "Hv#1" signature */
 #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
 
-#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
+#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, HV_VCPUS_PER_SPARSE_BANK)
 
 static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
 				bool vcpu_kick);
@@ -1798,7 +1798,7 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
 				 u64 *sparse_banks, int consumed_xmm_halves,
 				 gpa_t offset)
 {
-	if (hc->var_cnt > 64)
+	if (hc->var_cnt > HV_MAX_SPARSE_VCPU_BANKS)
 		return -EINVAL;
 
 	/* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
@@ -1969,12 +1969,11 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	gpa_t data_offset;
 
 	/*
-	 * The Hyper-V TLFS doesn't allow more than 64 sparse banks, e.g. the
-	 * valid mask is a u64.  Fail the build if KVM's max allowed number of
-	 * vCPUs (>4096) would exceed this limit, KVM will additional changes
-	 * for Hyper-V support to avoid setting the guest up to fail.
+	 * The Hyper-V TLFS doesn't allow more than HV_MAX_SPARSE_VCPU_BANKS
+	 * sparse banks. Fail the build if KVM's max allowed number of
+	 * vCPUs (>4096) exceeds this limit.
 	 */
-	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
+	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > HV_MAX_SPARSE_VCPU_BANKS);
 
 	if (!hc->fast && is_guest_mode(vcpu)) {
 		hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);
-- 
2.35.1



* [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 08/34] KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64' Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:24   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 10/34] KVM: x86: hyper-v: Create a separate ring for L2 TLB flush Vitaly Kuznetsov
                   ` (25 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Get rid of the on-stack allocation of vcpu_mask and optimize
kvm_hv_send_ipi() for a smaller number of vCPUs in the request. When
Hyper-V TLB flush is in use, HvSendSyntheticClusterIpi{,Ex} calls are not
commonly used to send IPIs to a large number of vCPUs (and are rarely
used in general).

Introduce hv_is_vp_in_sparse_set() to directly check whether the specified
VP_ID is present in the sparse vCPU set.
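
For illustration, a self-contained sketch of the sparse-set lookup (an
analogy to the new helper, not the kernel function; the example values
are made up): valid_bank_mask says which 64-vCPU banks are present, and
the Nth set bit selects the Nth element of the packed sparse_banks array.

  #include <stdio.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define VCPUS_PER_BANK 64

  static bool vp_in_sparse_set(uint32_t vp_id, uint64_t valid_bank_mask,
                               const uint64_t *sparse_banks)
  {
          uint32_t bank = vp_id / VCPUS_PER_BANK, b;
          int sbank = 0;

          if (!(valid_bank_mask & (1ULL << bank)))
                  return false;

          /* Count how many valid banks precede ours: that is its index
           * into the packed sparse_banks[] array. */
          for (b = 0; b < bank; b++)
                  if (valid_bank_mask & (1ULL << b))
                          sbank++;

          return sparse_banks[sbank] & (1ULL << (vp_id % VCPUS_PER_BANK));
  }

  int main(void)
  {
          /* Banks 0 and 2 are valid; bank 2's contents sit at index 1. */
          uint64_t banks[] = { 0x1, 1ULL << 2 };

          printf("%d\n", vp_in_sparse_set(130, 0x5, banks)); /* bank 2, bit 2 -> 1 */
          printf("%d\n", vp_in_sparse_set(64, 0x5, banks));  /* bank 1 invalid -> 0 */
          return 0;
  }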

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 3cf68645a2e6..aebbb598ad1d 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1746,6 +1746,25 @@ static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
 	}
 }
 
+static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_banks[])
+{
+	int bank, sbank = 0;
+
+	if (!test_bit(vp_id / HV_VCPUS_PER_SPARSE_BANK,
+		      (unsigned long *)&valid_bank_mask))
+		return false;
+
+	for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
+			 KVM_HV_MAX_SPARSE_VCPU_SET_BITS) {
+		if (bank == vp_id / HV_VCPUS_PER_SPARSE_BANK)
+			break;
+		sbank++;
+	}
+
+	return test_bit(vp_id % HV_VCPUS_PER_SPARSE_BANK,
+			(unsigned long *)&sparse_banks[sbank]);
+}
+
 struct kvm_hv_hcall {
 	u64 param;
 	u64 ingpa;
@@ -2089,8 +2108,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 		((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
 }
 
-static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
-				 unsigned long *vcpu_bitmap)
+static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
+				    u64 *sparse_banks, u64 valid_bank_mask)
 {
 	struct kvm_lapic_irq irq = {
 		.delivery_mode = APIC_DM_FIXED,
@@ -2100,7 +2119,10 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
 	unsigned long i;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
+		if (sparse_banks &&
+		    !hv_is_vp_in_sparse_set(kvm_hv_get_vpindex(vcpu),
+					    valid_bank_mask,
+					    sparse_banks))
 			continue;
 
 		/* We fail only when APIC is disabled */
@@ -2113,7 +2135,6 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	struct kvm *kvm = vcpu->kvm;
 	struct hv_send_ipi_ex send_ipi_ex;
 	struct hv_send_ipi send_ipi;
-	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
 	unsigned long valid_bank_mask;
 	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
 	u32 vector;
@@ -2175,13 +2196,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
 		return HV_STATUS_INVALID_HYPERCALL_INPUT;
 
-	if (all_cpus) {
-		kvm_send_ipi_to_many(kvm, vector, NULL);
-	} else {
-		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
-
-		kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
-	}
+	kvm_hv_send_ipi_to_many(kvm, vector, all_cpus ? NULL : sparse_banks, valid_bank_mask);
 
 ret_success:
 	return HV_STATUS_SUCCESS;
-- 
2.35.1



* [PATCH v3 10/34] KVM: x86: hyper-v: Create a separate ring for L2 TLB flush
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi() Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:24   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' Vitaly Kuznetsov
                   ` (24 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

To handle L2 TLB flush requests, KVM needs to use a ring separate from
the one for regular (L1) Hyper-V TLB flush requests: e.g. when a request
to flush something in L2 is made, the target vCPU can transition from L2
to L1, receive a request to flush a GVA for L1 and then try to re-enter
L2. The first request needs to be processed at this point. Similarly,
requests to flush GVAs in L1 must wait until L2 exits to L1.

No functional change as KVM doesn't handle L2 TLB flush requests from
L2 yet.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  8 +++++++-
 arch/x86/kvm/hyperv.c           |  8 +++++---
 arch/x86/kvm/hyperv.h           | 19 ++++++++++++++++---
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b4dd2ff61658..058061621872 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,6 +580,12 @@ struct kvm_vcpu_hv_synic {
 
 #define KVM_HV_TLB_FLUSH_RING_SIZE (16)
 
+enum hv_tlb_flush_rings {
+	HV_L1_TLB_FLUSH_RING,
+	HV_L2_TLB_FLUSH_RING,
+	HV_NR_TLB_FLUSH_RINGS,
+};
+
 struct kvm_vcpu_hv_tlb_flush_entry {
 	u64 addr;
 	u64 flush_all:1;
@@ -612,7 +618,7 @@ struct kvm_vcpu_hv {
 		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
 	} cpuid_cache;
 
-	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring;
+	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
 };
 
 /* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index aebbb598ad1d..1cef2b8f7001 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -956,7 +956,8 @@ static int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
 
 	hv_vcpu->vp_index = vcpu->vcpu_idx;
 
-	spin_lock_init(&hv_vcpu->tlb_flush_ring.write_lock);
+	for (i = 0; i < HV_NR_TLB_FLUSH_RINGS; i++)
+		spin_lock_init(&hv_vcpu->tlb_flush_ring[i].write_lock);
 
 	return 0;
 }
@@ -1852,7 +1853,8 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int c
 	if (!hv_vcpu)
 		return;
 
-	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
+	/* kvm_hv_flush_tlb() is not ready to handle requests for L2s yet */
+	tlb_flush_ring = &hv_vcpu->tlb_flush_ring[HV_L1_TLB_FLUSH_RING];
 
 	spin_lock_irqsave(&tlb_flush_ring->write_lock, flags);
 
@@ -1921,7 +1923,7 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 		return;
 	}
 
-	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
+	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
 
 	/*
 	 * TLB flush must be performed on the target vCPU so 'read_idx'
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 6847caeaaf84..d59f96700104 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -22,6 +22,7 @@
 #define __ARCH_X86_KVM_HYPERV_H__
 
 #include <linux/kvm_host.h>
+#include "x86.h"
 
 /*
  * The #defines related to the synthetic debugger are required by KDNet, but
@@ -147,15 +148,27 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
 int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 		     struct kvm_cpuid_entry2 __user *entries);
 
+static inline struct kvm_vcpu_hv_tlb_flush_ring *kvm_hv_get_tlb_flush_ring(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	int i = !is_guest_mode(vcpu) ? HV_L1_TLB_FLUSH_RING :
+		HV_L2_TLB_FLUSH_RING;
+
+	/* KVM does not handle L2 TLB flush requests yet */
+	WARN_ON_ONCE(i != HV_L1_TLB_FLUSH_RING);
+
+	return &hv_vcpu->tlb_flush_ring[i];
+}
 
 static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
 
-	if (!hv_vcpu)
+	if (!to_hv_vcpu(vcpu))
 		return;
 
-	hv_vcpu->tlb_flush_ring.read_idx = hv_vcpu->tlb_flush_ring.write_idx;
+	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
+	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
 }
 void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
 
-- 
2.35.1



* [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (9 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 10/34] KVM: x86: hyper-v: Create a separate ring for L2 TLB flush Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:25   ` Maxim Levitsky
  2022-05-16 20:05   ` Sean Christopherson
  2022-04-14 13:19 ` [PATCH v3 12/34] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use Vitaly Kuznetsov
                   ` (23 subsequent siblings)
  34 siblings, 2 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUS, as L1
may use vCPU overcommit for L2. To avoid growing the on-stack allocation,
make 'sparse_banks' part of the per-vCPU 'struct kvm_vcpu_hv' which is
allocated dynamically.

Note: sparse_set_to_vcpu_mask() keeps using on-stack allocation as it
won't be used to handle L2 TLB flush requests.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 3 +++
 arch/x86/kvm/hyperv.c           | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 058061621872..837c07e213de 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -619,6 +619,9 @@ struct kvm_vcpu_hv {
 	} cpuid_cache;
 
 	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
+
+	/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
+	u64 sparse_banks[64];
 };
 
 /* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 1cef2b8f7001..e9793d36acca 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1968,6 +1968,8 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 
 static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 {
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	u64 *sparse_banks = hv_vcpu->sparse_banks;
 	struct kvm *kvm = vcpu->kvm;
 	struct hv_tlb_flush_ex flush_ex;
 	struct hv_tlb_flush flush;
@@ -1982,7 +1984,6 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	u64 __tlb_flush_entries[KVM_HV_TLB_FLUSH_RING_SIZE - 2];
 	u64 *tlb_flush_entries;
 	u64 valid_bank_mask;
-	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
 	struct kvm_vcpu *v;
 	unsigned long i;
 	bool all_cpus;
@@ -2134,11 +2135,12 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
 
 static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 {
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	u64 *sparse_banks = hv_vcpu->sparse_banks;
 	struct kvm *kvm = vcpu->kvm;
 	struct hv_send_ipi_ex send_ipi_ex;
 	struct hv_send_ipi send_ipi;
 	unsigned long valid_bank_mask;
-	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
 	u32 vector;
 	bool all_cpus;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 12/34] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (10 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:25   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id Vitaly Kuznetsov
                   ` (22 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

To handle L2 TLB flush requests, KVM needs to keep track of the L2 VM_ID/
VP_IDs which are set by the L1 hypervisor. The 'Partition assist page'
address is also needed to handle the post-flush exit to L1 upon request.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/vmx/nested.c       | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 837c07e213de..8b2a52bf26c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -622,6 +622,12 @@ struct kvm_vcpu_hv {
 
 	/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
 	u64 sparse_banks[64];
+
+	struct {
+		u64 pa_page_gpa;
+		u64 vm_id;
+		u32 vp_id;
+	} nested;
 };
 
 /* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a6688663da4d..ee88921c6156 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -225,6 +225,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
 
 static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
@@ -233,6 +234,12 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
 	}
 
 	vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
+
+	if (hv_vcpu) {
+		hv_vcpu->nested.pa_page_gpa = INVALID_GPA;
+		hv_vcpu->nested.vm_id = 0;
+		hv_vcpu->nested.vp_id = 0;
+	}
 }
 
 static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
@@ -1591,11 +1598,19 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
 {
 	struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
 	struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(&vmx->vcpu);
 
 	/* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
 	vmcs12->tpr_threshold = evmcs->tpr_threshold;
 	vmcs12->guest_rip = evmcs->guest_rip;
 
+	if (unlikely(!(hv_clean_fields &
+		       HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL))) {
+		hv_vcpu->nested.pa_page_gpa = evmcs->partition_assist_page;
+		hv_vcpu->nested.vm_id = evmcs->hv_vm_id;
+		hv_vcpu->nested.vp_id = evmcs->hv_vp_id;
+	}
+
 	if (unlikely(!(hv_clean_fields &
 		       HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC))) {
 		vmcs12->guest_rsp = evmcs->guest_rsp;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (11 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 12/34] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:27   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook Vitaly Kuznetsov
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Similar to nVMX, KVM needs to know L2's VM_ID/VP_ID and the Partition
assist page address to handle L2 TLB flush requests.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
 arch/x86/kvm/svm/nested.c |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index 7d6d97968fb9..8cf702fed7e5 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -9,6 +9,7 @@
 #include <asm/mshyperv.h>
 
 #include "../hyperv.h"
+#include "svm.h"
 
 /*
  * Hyper-V uses the software reserved 32 bytes in VMCB
@@ -32,4 +33,19 @@ struct hv_enlightenments {
  */
 #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
 
+static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct hv_enlightenments *hve =
+		(struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+	if (!hv_vcpu)
+		return;
+
+	hv_vcpu->nested.pa_page_gpa = hve->partition_assist_page;
+	hv_vcpu->nested.vm_id = hve->hv_vm_id;
+	hv_vcpu->nested.vp_id = hve->hv_vp_id;
+}
+
 #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bed5e1692cef..2d1a76343404 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -826,6 +826,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 
 	svm->nested.nested_run_pending = 1;
 
+	nested_svm_hv_update_vm_vp_ids(vcpu);
+
 	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
 		goto out_exit_err;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (12 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:32   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall() Vitaly Kuznetsov
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Hyper-V supports injecting a synthetic L2->L1 exit after performing an
L2 TLB flush operation but the procedure is vendor specific.
Introduce a .post_hv_l2_tlb_flush() nested hook for it.
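
Both vendor implementations below are intentionally empty stubs for now. For
context, a simplified sketch of the call site this hook is being prepared
for (the actual wiring lands in the "KVM: x86: hyper-v: L2 TLB flush" patch
later in this series):

	/* In kvm_hv_hypercall_complete(), after a successful L2 TLB flush
	 * hypercall, trigger the vendor-specific synthetic exit when the
	 * partition assist page's TlbLockCount is non-zero: */
	if (hv_result_success(result) && is_guest_mode(vcpu) &&
	    kvm_hv_is_tlb_flush_hcall(vcpu)) {
		u32 tlb_lock_count;

		if (!kvm_read_guest(vcpu->kvm, hv_vcpu->nested.pa_page_gpa,
				    &tlb_lock_count, sizeof(tlb_lock_count)) &&
		    tlb_lock_count)
			kvm_x86_ops.nested_ops->post_hv_l2_tlb_flush(vcpu);
	}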

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Makefile           |  3 ++-
 arch/x86/kvm/svm/hyperv.c       | 11 +++++++++++
 arch/x86/kvm/svm/hyperv.h       |  2 ++
 arch/x86/kvm/svm/nested.c       |  1 +
 arch/x86/kvm/vmx/evmcs.c        |  4 ++++
 arch/x86/kvm/vmx/evmcs.h        |  1 +
 arch/x86/kvm/vmx/nested.c       |  1 +
 8 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/svm/hyperv.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8b2a52bf26c0..ce62fde5f4ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1558,6 +1558,7 @@ struct kvm_x86_nested_ops {
 	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
 			    uint16_t *vmcs_version);
 	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
+	void (*post_hv_l2_tlb_flush)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_init_ops {
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 30f244b64523..b6d53b045692 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -25,7 +25,8 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 			   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
 kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
 
-kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
+kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
+			   svm/sev.o svm/hyperv.o
 
 ifdef CONFIG_HYPERV
 kvm-amd-y		+= svm/svm_onhyperv.o
diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
new file mode 100644
index 000000000000..c0749fc282fe
--- /dev/null
+++ b/arch/x86/kvm/svm/hyperv.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM specific code for Hyper-V on KVM.
+ *
+ * Copyright 2022 Red Hat, Inc. and/or its affiliates.
+ */
+#include "hyperv.h"
+
+void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index 8cf702fed7e5..a2b0d7580b0d 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -48,4 +48,6 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
 	hv_vcpu->nested.vp_id = hve->hv_vp_id;
 }
 
+void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
+
 #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2d1a76343404..de3f27301b5c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1665,4 +1665,5 @@ struct kvm_x86_nested_ops svm_nested_ops = {
 	.get_nested_state_pages = svm_get_nested_state_pages,
 	.get_state = svm_get_nested_state,
 	.set_state = svm_set_nested_state,
+	.post_hv_l2_tlb_flush = svm_post_hv_l2_tlb_flush,
 };
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 87e3dc10edf4..e390e67496df 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -437,3 +437,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 
 	return 0;
 }
+
+void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 8d70f9aea94b..b120b0ead4f3 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -244,5 +244,6 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version);
 void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
 int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
+void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ee88921c6156..cc6c944b5815 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6850,4 +6850,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
 	.write_log_dirty = nested_vmx_write_pml_buffer,
 	.enable_evmcs = nested_enable_evmcs,
 	.get_evmcs_version = nested_get_evmcs_version,
+	.post_hv_l2_tlb_flush = vmx_post_hv_l2_tlb_flush,
 };
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (13 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:25   ` Maxim Levitsky
  2022-05-16 20:09   ` Sean Christopherson
  2022-04-14 13:19 ` [PATCH v3 16/34] KVM: x86: hyper-v: L2 TLB flush Vitaly Kuznetsov
                   ` (19 subsequent siblings)
  34 siblings, 2 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

The newly introduced helper checks whether a vCPU is performing a
Hyper-V TLB flush hypercall. This is required to identify the L2 TLB
flush hypercalls which need to be handled in L0.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index d59f96700104..ca67c18cef2c 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -170,6 +170,24 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
 	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
 	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
 }
+
+static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	u16 code;
+
+	if (!hv_vcpu)
+		return false;
+
+	code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read(vcpu) :
+		kvm_rax_read(vcpu);
+
+	return (code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
+		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
+		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
+		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX);
+}
+
 void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
 
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 16/34] KVM: x86: hyper-v: L2 TLB flush
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (14 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall() Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:29   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check Vitaly Kuznetsov
                   ` (18 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Handle L2 TLB flush requests by going through all vCPUs and checking
whether there are vCPUs running the same VM_ID with a VP_ID specified
in the request. Perform a synthetic exit to L1 upon finish, when requested.

Note, while checking VM_ID/VP_ID of running vCPUs seems to be a bit
racy, we count on the fact that KVM flushes the whole L2 VPID upon
transition. Also, a KVM_REQ_HV_TLB_FLUSH request needs to be made upon
transition between L1 and L2 to make sure all pending requests are
always processed.

For the reference, Hyper-V TLFS refers to the feature as "Direct
Virtual Flush".

Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 73 ++++++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/hyperv.h |  3 --
 arch/x86/kvm/trace.h  | 21 ++++++++-----
 3 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index e9793d36acca..79aabe0c33ec 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -34,6 +34,7 @@
 #include <linux/eventfd.h>
 
 #include <asm/apicdef.h>
+#include <asm/mshyperv.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
@@ -1842,9 +1843,10 @@ static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
 	return read_idx - write_idx - 1;
 }
 
-static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
+static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
+				      struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring,
+				      u64 *entries, int count)
 {
-	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
 	int ring_free, write_idx, read_idx;
 	unsigned long flags;
@@ -1853,9 +1855,6 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int c
 	if (!hv_vcpu)
 		return;
 
-	/* kvm_hv_flush_tlb() is not ready to handle requests for L2s yet */
-	tlb_flush_ring = &hv_vcpu->tlb_flush_ring[HV_L1_TLB_FLUSH_RING];
-
 	spin_lock_irqsave(&tlb_flush_ring->write_lock, flags);
 
 	/*
@@ -1974,6 +1973,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	struct hv_tlb_flush_ex flush_ex;
 	struct hv_tlb_flush flush;
 	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
+	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
 	/*
 	 * Normally, there can be no more than 'KVM_HV_TLB_FLUSH_RING_SIZE - 1'
 	 * entries on the TLB Flush ring as when 'read_idx == write_idx' the
@@ -2018,7 +2018,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 		}
 
 		trace_kvm_hv_flush_tlb(flush.processor_mask,
-				       flush.address_space, flush.flags);
+				       flush.address_space, flush.flags,
+				       is_guest_mode(vcpu));
 
 		valid_bank_mask = BIT_ULL(0);
 		sparse_banks[0] = flush.processor_mask;
@@ -2049,7 +2050,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 		trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
 					  flush_ex.hv_vp_set.format,
 					  flush_ex.address_space,
-					  flush_ex.flags);
+					  flush_ex.flags, is_guest_mode(vcpu));
 
 		valid_bank_mask = flush_ex.hv_vp_set.valid_bank_mask;
 		all_cpus = flush_ex.hv_vp_set.format !=
@@ -2083,23 +2084,54 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 		tlb_flush_entries = __tlb_flush_entries;
 	}
 
+	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
+
 	/*
 	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
-	if (all_cpus) {
+	if (all_cpus && !is_guest_mode(vcpu)) {
 		kvm_for_each_vcpu(i, v, kvm)
-			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
+			hv_tlb_flush_ring_enqueue(v, tlb_flush_ring,
+						  tlb_flush_entries, hc->rep_cnt);
 
 		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
-	} else {
+	} else if (!is_guest_mode(vcpu)) {
 		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
 
 		for_each_set_bit(i, vcpu_mask, KVM_MAX_VCPUS) {
 			v = kvm_get_vcpu(kvm, i);
 			if (!v)
 				continue;
-			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
+			hv_tlb_flush_ring_enqueue(v, tlb_flush_ring,
+						  tlb_flush_entries, hc->rep_cnt);
+		}
+
+		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
+	} else {
+		struct kvm_vcpu_hv *hv_v;
+
+		bitmap_zero(vcpu_mask, KVM_MAX_VCPUS);
+
+		kvm_for_each_vcpu(i, v, kvm) {
+			hv_v = to_hv_vcpu(v);
+
+			/*
+			 * TLB is fully flushed on L2 VM change: either by KVM
+			 * (on a eVMPTR switch) or by L1 hypervisor (in case it
+			 * re-purposes the active eVMCS for a different VM/VP).
+			 */
+			if (!hv_v || hv_v->nested.vm_id != hv_vcpu->nested.vm_id)
+				continue;
+
+			if (!all_cpus &&
+			    !hv_is_vp_in_sparse_set(hv_v->nested.vp_id, valid_bank_mask,
+						    sparse_banks))
+				continue;
+
+			__set_bit(i, vcpu_mask);
+			hv_tlb_flush_ring_enqueue(v, tlb_flush_ring,
+						  tlb_flush_entries, hc->rep_cnt);
 		}
 
 		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
@@ -2287,10 +2319,27 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
 
 static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
 {
+	int ret;
+
 	trace_kvm_hv_hypercall_done(result);
 	kvm_hv_hypercall_set_result(vcpu, result);
 	++vcpu->stat.hypercalls;
-	return kvm_skip_emulated_instruction(vcpu);
+	ret = kvm_skip_emulated_instruction(vcpu);
+
+	if (unlikely(hv_result_success(result) && is_guest_mode(vcpu)
+		     && kvm_hv_is_tlb_flush_hcall(vcpu))) {
+		struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+		u32 tlb_lock_count;
+
+		if (unlikely(kvm_read_guest(vcpu->kvm, hv_vcpu->nested.pa_page_gpa,
+					    &tlb_lock_count, sizeof(tlb_lock_count))))
+			kvm_inject_gp(vcpu, 0);
+
+		if (tlb_lock_count)
+			kvm_x86_ops.nested_ops->post_hv_l2_tlb_flush(vcpu);
+	}
+
+	return ret;
 }
 
 static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index ca67c18cef2c..f593c9fd1dee 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -154,9 +154,6 @@ static inline struct kvm_vcpu_hv_tlb_flush_ring *kvm_hv_get_tlb_flush_ring(struc
 	int i = !is_guest_mode(vcpu) ? HV_L1_TLB_FLUSH_RING :
 		HV_L2_TLB_FLUSH_RING;
 
-	/* KVM does not handle L2 TLB flush requests yet */
-	WARN_ON_ONCE(i != HV_L1_TLB_FLUSH_RING);
-
 	return &hv_vcpu->tlb_flush_ring[i];
 }
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e3a24b8f04be..af7896182935 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1479,38 +1479,41 @@ TRACE_EVENT(kvm_hv_timer_state,
  * Tracepoint for kvm_hv_flush_tlb.
  */
 TRACE_EVENT(kvm_hv_flush_tlb,
-	TP_PROTO(u64 processor_mask, u64 address_space, u64 flags),
-	TP_ARGS(processor_mask, address_space, flags),
+	TP_PROTO(u64 processor_mask, u64 address_space, u64 flags, bool guest_mode),
+	TP_ARGS(processor_mask, address_space, flags, guest_mode),
 
 	TP_STRUCT__entry(
 		__field(u64, processor_mask)
 		__field(u64, address_space)
 		__field(u64, flags)
+		__field(bool, guest_mode)
 	),
 
 	TP_fast_assign(
 		__entry->processor_mask = processor_mask;
 		__entry->address_space = address_space;
 		__entry->flags = flags;
+		__entry->guest_mode = guest_mode;
 	),
 
-	TP_printk("processor_mask 0x%llx address_space 0x%llx flags 0x%llx",
+	TP_printk("processor_mask 0x%llx address_space 0x%llx flags 0x%llx %s",
 		  __entry->processor_mask, __entry->address_space,
-		  __entry->flags)
+		  __entry->flags, __entry->guest_mode ? "(L2)" : "")
 );
 
 /*
  * Tracepoint for kvm_hv_flush_tlb_ex.
  */
 TRACE_EVENT(kvm_hv_flush_tlb_ex,
-	TP_PROTO(u64 valid_bank_mask, u64 format, u64 address_space, u64 flags),
-	TP_ARGS(valid_bank_mask, format, address_space, flags),
+	TP_PROTO(u64 valid_bank_mask, u64 format, u64 address_space, u64 flags, bool guest_mode),
+	TP_ARGS(valid_bank_mask, format, address_space, flags, guest_mode),
 
 	TP_STRUCT__entry(
 		__field(u64, valid_bank_mask)
 		__field(u64, format)
 		__field(u64, address_space)
 		__field(u64, flags)
+		__field(bool, guest_mode)
 	),
 
 	TP_fast_assign(
@@ -1518,12 +1521,14 @@ TRACE_EVENT(kvm_hv_flush_tlb_ex,
 		__entry->format = format;
 		__entry->address_space = address_space;
 		__entry->flags = flags;
+		__entry->guest_mode = guest_mode;
 	),
 
 	TP_printk("valid_bank_mask 0x%llx format 0x%llx "
-		  "address_space 0x%llx flags 0x%llx",
+		  "address_space 0x%llx flags 0x%llx %s",
 		  __entry->valid_bank_mask, __entry->format,
-		  __entry->address_space, __entry->flags)
+		  __entry->address_space, __entry->flags,
+		  __entry->guest_mode ? "(L2)" : "")
 );
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (15 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 16/34] KVM: x86: hyper-v: L2 TLB flush Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:30   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 18/34] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
                   ` (17 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
from L2 in L0 to process L2 TLB flush requests.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/hyperv.c           | 6 ++++++
 arch/x86/kvm/hyperv.h           | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ce62fde5f4ff..168600490bd1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
 		u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
 		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
 		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
+		u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
 	} cpuid_cache;
 
 	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 79aabe0c33ec..68a0df4e3f66 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
 		hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
 	else
 		hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
+
+	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
+	if (entry)
+		hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
+	else
+		hv_vcpu->cpuid_cache.nested_features_eax = 0;
 }
 
 int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index f593c9fd1dee..d8cb6d70dbc8 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
 	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
 }
 
+static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+	return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
+}
+
 static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 18/34] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (16 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:30   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
                   ` (16 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Section 1.9 of TLFS v6.0b says:

"All structures are padded in such a way that fields are aligned
naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
and so on)".

'struct hv_enlightened_vmcs' has a glitch:

...
        struct {
                u32                nested_flush_hypercall:1; /*   836: 0  4 */
                u32                msr_bitmap:1;         /*   836: 1  4 */
                u32                reserved:30;          /*   836: 2  4 */
        } hv_enlightenments_control;                     /*   836     4 */
        u32                        hv_vp_id;             /*   840     4 */
        u64                        hv_vm_id;             /*   844     8 */
        u64                        partition_assist_page; /*   852     8 */
...

And the observed values in 'partition_assist_page' make no sense at
all. Fix the layout by padding the structure properly.
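
For reference, this is the layout the added 'padding32_2' field is expected
to produce (same pahole-style notation as above; offsets derived from the
hunk below rather than from an actual pahole run):

...
        struct {
                u32                nested_flush_hypercall:1; /*   836: 0  4 */
                u32                msr_bitmap:1;         /*   836: 1  4 */
                u32                reserved:30;          /*   836: 2  4 */
        } hv_enlightenments_control;                     /*   836     4 */
        u32                        hv_vp_id;             /*   840     4 */
        u32                        padding32_2;          /*   844     4 */
        u64                        hv_vm_id;             /*   848     8 */
        u64                        partition_assist_page; /*   856     8 */
...

i.e. both 8-byte fields now start at 8-byte-aligned offsets.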

Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 5225a85c08c3..e7ddae8e02c6 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -548,7 +548,7 @@ struct hv_enlightened_vmcs {
 	u64 guest_rip;
 
 	u32 hv_clean_fields;
-	u32 hv_padding_32;
+	u32 padding32_1;
 	u32 hv_synthetic_controls;
 	struct {
 		u32 nested_flush_hypercall:1;
@@ -556,7 +556,7 @@ struct hv_enlightened_vmcs {
 		u32 reserved:30;
 	}  __packed hv_enlightenments_control;
 	u32 hv_vp_id;
-
+	u32 padding32_2;
 	u64 hv_vm_id;
 	u64 partition_assist_page;
 	u64 padding64_4[4];
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (17 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 18/34] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:31   ` Maxim Levitsky
  2022-04-14 13:19 ` [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too Vitaly Kuznetsov
                   ` (15 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Enable the L2 TLB flush feature on nVMX when:
- Enlightened VMCS is in use.
- The feature flag is enabled in the eVMCS.
- The feature flag is enabled in the partition assist page.

Perform a synthetic vmexit to L1 after processing the TLB flush call upon
request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c  | 20 ++++++++++++++++++++
 arch/x86/kvm/vmx/evmcs.h  | 10 ++++++++++
 arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index e390e67496df..e0cb2e223daa 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -6,6 +6,7 @@
 #include "../hyperv.h"
 #include "../cpuid.h"
 #include "evmcs.h"
+#include "nested.h"
 #include "vmcs.h"
 #include "vmx.h"
 #include "trace.h"
@@ -438,6 +439,25 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
+	struct hv_vp_assist_page assist_page;
+
+	if (!evmcs)
+		return false;
+
+	if (!evmcs->hv_enlightenments_control.nested_flush_hypercall)
+		return false;
+
+	if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
+		return false;
+
+	return assist_page.nested_control.features.directhypercall;
+}
+
 void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	nested_vmx_vmexit(vcpu, HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH, 0, 0);
 }
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index b120b0ead4f3..ddbdb557cc53 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -65,6 +65,15 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
 #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
 #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
 
+/*
+ * Note, Hyper-V isn't actually stealing bit 28 from Intel, just abusing it by
+ * pairing it with architecturally impossible exit reasons.  Bit 28 is set only
+ * on SMI exits to a SMI transfer monitor (STM) and if and only if a MTF VM-Exit
+ * is pending.  I.e. it will never be set by hardware for non-SMI exits (there
+ * are only three), nor will it ever be set unless the VMM is an STM.
+ */
+#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
+
 struct evmcs_field {
 	u16 offset;
 	u16 clean_field;
@@ -244,6 +253,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version);
 void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
 int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
+bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu);
 void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cc6c944b5815..3e2ef5edad4a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1170,6 +1170,17 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	/*
+	 * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
+	 * L2's VP_ID upon request from the guest. Make sure we check for
+	 * pending entries for the case when the request got misplaced (e.g.
+	 * a transition from L2->L1 happened while processing L2 TLB flush
+	 * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
+	 * anything if there are no requests in the corresponding buffer.
+	 */
+	if (to_hv_vcpu(vcpu))
+		kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+
 	/*
 	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
 	 * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
@@ -5997,6 +6008,11 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
 		 * Handle L2's bus locks in L0 directly.
 		 */
 		return true;
+	case EXIT_REASON_VMCALL:
+		/* Hyper-V L2 TLB flush hypercall is handled by L0 */
+		return kvm_hv_l2_tlb_flush_exposed(vcpu) &&
+			nested_evmcs_l2_tlb_flush_enabled(vcpu) &&
+			kvm_hv_is_tlb_flush_hcall(vcpu);
 	default:
 		break;
 	}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (18 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
@ 2022-04-14 13:19 ` Vitaly Kuznetsov
  2022-05-11 11:33   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 21/34] KVM: nSVM: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
                   ` (14 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:19 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

KVM_REQ_TLB_FLUSH_CURRENT is an even stronger operation than
KVM_REQ_TLB_FLUSH_GUEST, so KVM_REQ_HV_TLB_FLUSH does not need to be
processed after it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/x86.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5aec386d299..d3839e648ab3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3357,8 +3357,11 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
  */
 void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
 {
-	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
 		kvm_vcpu_flush_tlb_current(vcpu);
+		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+			kvm_hv_vcpu_empty_flush_tlb(vcpu);
+	}
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
 		kvm_vcpu_flush_tlb_guest(vcpu);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 21/34] KVM: nSVM: hyper-v: Enable L2 TLB flush
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (19 preceding siblings ...)
  2022-04-14 13:19 ` [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 11:33   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 22/34] KVM: x86: Expose Hyper-V L2 TLB flush feature Vitaly Kuznetsov
                   ` (13 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
both in the extended 'nested controls' in the VMCB and in the partition
assist page. According to the Hyper-V TLFS, the synthetic vmexit to L1 is
performed with:
- HV_SVM_EXITCODE_ENL exit_code.
- HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/svm/hyperv.c |  7 +++++++
 arch/x86/kvm/svm/hyperv.h | 19 +++++++++++++++++++
 arch/x86/kvm/svm/nested.c | 22 +++++++++++++++++++++-
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
index c0749fc282fe..3842548bb88c 100644
--- a/arch/x86/kvm/svm/hyperv.c
+++ b/arch/x86/kvm/svm/hyperv.c
@@ -8,4 +8,11 @@
 
 void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
+	svm->vmcb->control.exit_code_hi = 0;
+	svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
+	svm->vmcb->control.exit_info_2 = 0;
+	nested_svm_vmexit(svm);
 }
diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index a2b0d7580b0d..cd33e89f9f61 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -33,6 +33,9 @@ struct hv_enlightenments {
  */
 #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
 
+#define HV_SVM_EXITCODE_ENL 0xF0000000
+#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH   (1)
+
 static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -48,6 +51,22 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
 	hv_vcpu->nested.vp_id = hve->hv_vp_id;
 }
 
+static inline bool nested_svm_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct hv_enlightenments *hve =
+		(struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
+	struct hv_vp_assist_page assist_page;
+
+	if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
+		return false;
+
+	if (!hve->hv_enlightenments_control.nested_flush_hypercall)
+		return false;
+
+	return assist_page.nested_control.features.directhypercall;
+}
+
 void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
 
 #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de3f27301b5c..a6d9807c09b1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -172,7 +172,8 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	}
 
 	/* We don't want to see VMMCALLs from a nested guest */
-	vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
+	if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
+		vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
 
 	for (i = 0; i < MAX_INTERCEPT; i++)
 		c->intercepts[i] |= g->intercepts[i];
@@ -488,6 +489,17 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 
 static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
+	 * L2's VP_ID upon request from the guest. Make sure we check for
+	 * pending entries for the case when the request got misplaced (e.g.
+	 * a transition from L2->L1 happened while processing L2 TLB flush
+	 * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
+	 * anything if there are no requests in the corresponding buffer.
+	 */
+	if (to_hv_vcpu(vcpu))
+		kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+
 	/*
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
@@ -1357,6 +1369,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 int nested_svm_exit_special(struct vcpu_svm *svm)
 {
 	u32 exit_code = svm->vmcb->control.exit_code;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	switch (exit_code) {
 	case SVM_EXIT_INTR:
@@ -1375,6 +1388,13 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
 			return NESTED_EXIT_HOST;
 		break;
 	}
+	case SVM_EXIT_VMMCALL:
+		/* Hyper-V L2 TLB flush hypercall is handled by L0 */
+		if (kvm_hv_l2_tlb_flush_exposed(vcpu) &&
+		    nested_svm_l2_tlb_flush_enabled(vcpu) &&
+		    kvm_hv_is_tlb_flush_hcall(vcpu))
+			return NESTED_EXIT_HOST;
+		break;
 	default:
 		break;
 	}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 22/34] KVM: x86: Expose Hyper-V L2 TLB flush feature
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (20 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 21/34] KVM: nSVM: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 11:34   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 23/34] KVM: selftests: Better XMM read/write helpers Vitaly Kuznetsov
                   ` (12 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

With both nSVM and nVMX implementations in place, KVM can now expose the
Hyper-V L2 TLB flush feature to userspace.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 68a0df4e3f66..1d6927538bc7 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2826,6 +2826,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 		case HYPERV_CPUID_NESTED_FEATURES:
 			ent->eax = evmcs_ver;
+			ent->eax |= HV_X64_NESTED_DIRECT_FLUSH;
 			ent->eax |= HV_X64_NESTED_MSR_BITMAP;
 
 			break;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 23/34] KVM: selftests: Better XMM read/write helpers
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (21 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 22/34] KVM: x86: Expose Hyper-V L2 TLB flush feature Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 11:34   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 24/34] KVM: selftests: Hyper-V PV IPI selftest Vitaly Kuznetsov
                   ` (11 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

The set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
from the 128-bit registers. Moreover, these helpers are not used anywhere.
Borrow _kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM, limiting them to
XMM0-XMM7 for now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/include/x86_64/processor.h  | 70 ++++++++++---------
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 37db341d4cc5..9ad7602a257b 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -296,71 +296,73 @@ static inline void cpuid(uint32_t *eax, uint32_t *ebx,
 	    : "memory");
 }
 
-#define SET_XMM(__var, __xmm) \
-	asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
+typedef u32		__attribute__((vector_size(16))) sse128_t;
+#define __sse128_u	union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
+#define sse128_lo(x)	({ __sse128_u t; t.vec = x; t.as_u64[0]; })
+#define sse128_hi(x)	({ __sse128_u t; t.vec = x; t.as_u64[1]; })
 
-static inline void set_xmm(int n, unsigned long val)
+static inline void read_sse_reg(int reg, sse128_t *data)
 {
-	switch (n) {
+	switch (reg) {
 	case 0:
-		SET_XMM(val, xmm0);
+		asm("movdqa %%xmm0, %0" : "=m"(*data));
 		break;
 	case 1:
-		SET_XMM(val, xmm1);
+		asm("movdqa %%xmm1, %0" : "=m"(*data));
 		break;
 	case 2:
-		SET_XMM(val, xmm2);
+		asm("movdqa %%xmm2, %0" : "=m"(*data));
 		break;
 	case 3:
-		SET_XMM(val, xmm3);
+		asm("movdqa %%xmm3, %0" : "=m"(*data));
 		break;
 	case 4:
-		SET_XMM(val, xmm4);
+		asm("movdqa %%xmm4, %0" : "=m"(*data));
 		break;
 	case 5:
-		SET_XMM(val, xmm5);
+		asm("movdqa %%xmm5, %0" : "=m"(*data));
 		break;
 	case 6:
-		SET_XMM(val, xmm6);
+		asm("movdqa %%xmm6, %0" : "=m"(*data));
 		break;
 	case 7:
-		SET_XMM(val, xmm7);
+		asm("movdqa %%xmm7, %0" : "=m"(*data));
 		break;
+	default:
+		BUG();
 	}
 }
 
-#define GET_XMM(__xmm)							\
-({									\
-	unsigned long __val;						\
-	asm volatile("movq %%"#__xmm", %0" : "=r"(__val));		\
-	__val;								\
-})
-
-static inline unsigned long get_xmm(int n)
+static inline void write_sse_reg(int reg, const sse128_t *data)
 {
-	assert(n >= 0 && n <= 7);
-
-	switch (n) {
+	switch (reg) {
 	case 0:
-		return GET_XMM(xmm0);
+		asm("movdqa %0, %%xmm0" : : "m"(*data));
+		break;
 	case 1:
-		return GET_XMM(xmm1);
+		asm("movdqa %0, %%xmm1" : : "m"(*data));
+		break;
 	case 2:
-		return GET_XMM(xmm2);
+		asm("movdqa %0, %%xmm2" : : "m"(*data));
+		break;
 	case 3:
-		return GET_XMM(xmm3);
+		asm("movdqa %0, %%xmm3" : : "m"(*data));
+		break;
 	case 4:
-		return GET_XMM(xmm4);
+		asm("movdqa %0, %%xmm4" : : "m"(*data));
+		break;
 	case 5:
-		return GET_XMM(xmm5);
+		asm("movdqa %0, %%xmm5" : : "m"(*data));
+		break;
 	case 6:
-		return GET_XMM(xmm6);
+		asm("movdqa %0, %%xmm6" : : "m"(*data));
+		break;
 	case 7:
-		return GET_XMM(xmm7);
+		asm("movdqa %0, %%xmm7" : : "m"(*data));
+		break;
+	default:
+		BUG();
 	}
-
-	/* never reached */
-	return 0;
 }
 
 static inline void cpu_relax(void)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 24/34] KVM: selftests: Hyper-V PV IPI selftest
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (22 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 23/34] KVM: selftests: Better XMM read/write helpers Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 11:35   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 25/34] KVM: selftests: Make it possible to replace PTEs with __virt_pg_map() Vitaly Kuznetsov
                   ` (10 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Introduce a selftest for Hyper-V PV IPI hypercalls
(HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx).

The test creates one 'sender' vCPU and two 'receiver' vCPUs and then
issues various combinations of send-IPI hypercalls in both 'normal'
and 'fast' (with XMM input where necessary) modes. Later, the test
checks whether the IPIs were delivered to the expected destination vCPU(s).
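
As a side note on the encoding the test exercises: the hypercall control
word packs the call code into bits 15:0, the 'fast' flag into bit 16 and
the variable header size (in 8-byte chunks) starting at bit 17. A
stand-alone sketch for the 'XMM Fast' two-bank HvCallSendSyntheticClusterIpiEx
case (the two HV_HYPERCALL_* values correspond to the selftest header below;
the call code value is taken from the TLFS and is shown for illustration
only):

#include <stdint.h>
#include <stdio.h>

#define HVCALL_SEND_IPI_EX		0x0015		/* call code, bits 15:0 */
#define HV_HYPERCALL_FAST_BIT		(1ULL << 16)
#define HV_HYPERCALL_VARHEAD_OFFSET	17

int main(void)
{
	/* 'fast' hypercall passing two sparse bank_contents entries via XMM */
	uint64_t control = HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
			   (2ULL << HV_HYPERCALL_VARHEAD_OFFSET);

	printf("control = 0x%llx\n", (unsigned long long)control);	/* 0x50015 */
	return 0;
}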

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/hyperv.h     |   3 +
 .../selftests/kvm/x86_64/hyperv_features.c    |   5 +-
 .../testing/selftests/kvm/x86_64/hyperv_ipi.c | 374 ++++++++++++++++++
 5 files changed, 381 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_ipi.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 56140068b763..5d5fbb161d56 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -23,6 +23,7 @@
 /x86_64/hyperv_clock
 /x86_64/hyperv_cpuid
 /x86_64/hyperv_features
+/x86_64/hyperv_ipi
 /x86_64/hyperv_svm_test
 /x86_64/mmio_warning_test
 /x86_64/mmu_role_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index af582d168621..44889f897fe7 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -52,6 +52,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
+TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index b66910702c0a..f51d6fab8e93 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -184,5 +184,8 @@
 
 /* hypercall options */
 #define HV_HYPERCALL_FAST_BIT		BIT(16)
+#define HV_HYPERCALL_VARHEAD_OFFSET	17
+
+#define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
 
 #endif /* !SELFTEST_KVM_HYPERV_H */
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index 672915ce73d8..98c020356925 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -14,7 +14,6 @@
 #include "hyperv.h"
 
 #define VCPU_ID 0
-#define LINUX_OS_ID ((u64)0x8100 << 48)
 
 extern unsigned char rdmsr_start;
 extern unsigned char rdmsr_end;
@@ -127,7 +126,7 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
 	int i = 0;
 	u64 res, input, output;
 
-	wrmsr(HV_X64_MSR_GUEST_OS_ID, LINUX_OS_ID);
+	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
 	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
 
 	while (hcall->control) {
@@ -230,7 +229,7 @@ static void guest_test_msrs_access(void)
 			 */
 			msr->idx = HV_X64_MSR_GUEST_OS_ID;
 			msr->write = 1;
-			msr->write_val = LINUX_OS_ID;
+			msr->write_val = HYPERV_LINUX_OS_ID;
 			msr->available = 1;
 			break;
 		case 3:
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
new file mode 100644
index 000000000000..075963c32d45
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
@@ -0,0 +1,374 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V HvCallSendSyntheticClusterIpi{,Ex} tests
+ *
+ * Copyright (C) 2022, Red Hat, Inc.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+
+#include "kvm_util.h"
+#include "hyperv.h"
+#include "processor.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#define SENDER_VCPU_ID   1
+#define RECEIVER_VCPU_ID_1 2
+#define RECEIVER_VCPU_ID_2 65
+
+#define IPI_VECTOR	 0xfe
+
+static volatile uint64_t ipis_rcvd[RECEIVER_VCPU_ID_2 + 1];
+
+struct thread_params {
+	struct kvm_vm *vm;
+	uint32_t vcpu_id;
+};
+
+struct hv_vpset {
+	u64 format;
+	u64 valid_bank_mask;
+	u64 bank_contents[2];
+};
+
+enum HV_GENERIC_SET_FORMAT {
+	HV_GENERIC_SET_SPARSE_4K,
+	HV_GENERIC_SET_ALL,
+};
+
+/* HvCallSendSyntheticClusterIpi hypercall */
+struct hv_send_ipi {
+	u32 vector;
+	u32 reserved;
+	u64 cpu_mask;
+};
+
+/* HvCallSendSyntheticClusterIpiEx hypercall */
+struct hv_send_ipi_ex {
+	u32 vector;
+	u32 reserved;
+	struct hv_vpset vp_set;
+};
+
+static inline void hv_init(vm_vaddr_t pgs_gpa)
+{
+	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
+}
+
+static void receiver_code(void *hcall_page, vm_vaddr_t pgs_gpa)
+{
+	u32 vcpu_id;
+
+	x2apic_enable();
+	hv_init(pgs_gpa);
+
+	vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
+
+	/* Signal sender vCPU we're ready */
+	ipis_rcvd[vcpu_id] = (u64)-1;
+
+	for (;;)
+		asm volatile("sti; hlt; cli");
+}
+
+static void guest_ipi_handler(struct ex_regs *regs)
+{
+	u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
+
+	ipis_rcvd[vcpu_id]++;
+	wrmsr(HV_X64_MSR_EOI, 1);
+}
+
+static inline u64 hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
+{
+	u64 hv_status;
+
+	asm volatile("mov %3, %%r8\n"
+		     "vmcall"
+		     : "=a" (hv_status),
+		       "+c" (control), "+d" (arg1)
+		     :  "r" (arg2)
+		     : "cc", "memory", "r8", "r9", "r10", "r11");
+
+	return hv_status;
+}
+
+static inline void nop_loop(void)
+{
+	int i;
+
+	for (i = 0; i < 100000000; i++)
+		asm volatile("nop");
+}
+
+static inline void sync_to_xmm(void *data)
+{
+	int i;
+
+	for (i = 0; i < 8; i++)
+		write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
+}
+
+static void sender_guest_code(void *hcall_page, vm_vaddr_t pgs_gpa)
+{
+	struct hv_send_ipi *ipi = (struct hv_send_ipi *)hcall_page;
+	struct hv_send_ipi_ex *ipi_ex = (struct hv_send_ipi_ex *)hcall_page;
+	int stage = 1, ipis_expected[2] = {0};
+	u64 res;
+
+	hv_init(pgs_gpa);
+	GUEST_SYNC(stage++);
+
+	/* Wait for receiver vCPUs to come up */
+	while (!ipis_rcvd[RECEIVER_VCPU_ID_1] || !ipis_rcvd[RECEIVER_VCPU_ID_2])
+		nop_loop();
+	ipis_rcvd[RECEIVER_VCPU_ID_1] = ipis_rcvd[RECEIVER_VCPU_ID_2] = 0;
+
+	/* 'Slow' HvCallSendSyntheticClusterIpi to RECEIVER_VCPU_ID_1 */
+	ipi->vector = IPI_VECTOR;
+	ipi->cpu_mask = 1 << RECEIVER_VCPU_ID_1;
+	res = hypercall(HVCALL_SEND_IPI, pgs_gpa, pgs_gpa + 4096);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+	GUEST_SYNC(stage++);
+	/* 'Fast' HvCallSendSyntheticClusterIpi to RECEIVER_VCPU_ID_1 */
+	res = hypercall(HVCALL_SEND_IPI | HV_HYPERCALL_FAST_BIT,
+			IPI_VECTOR, 1 << RECEIVER_VCPU_ID_1);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+	GUEST_SYNC(stage++);
+
+	/* 'Slow' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_1 */
+	memset(hcall_page, 0, 4096);
+	ipi_ex->vector = IPI_VECTOR;
+	ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+	ipi_ex->vp_set.valid_bank_mask = 1 << 0;
+	ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_1);
+	res = hypercall(HVCALL_SEND_IPI_EX | (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+			pgs_gpa, pgs_gpa + 4096);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+	GUEST_SYNC(stage++);
+	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_1 */
+	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
+	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
+			(1 << HV_HYPERCALL_VARHEAD_OFFSET),
+			IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+	GUEST_SYNC(stage++);
+
+	/* 'Slow' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_2 */
+	memset(hcall_page, 0, 4096);
+	ipi_ex->vector = IPI_VECTOR;
+	ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+	ipi_ex->vp_set.valid_bank_mask = 1 << 1;
+	ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_2 - 64);
+	res = hypercall(HVCALL_SEND_IPI_EX | (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+			pgs_gpa, pgs_gpa + 4096);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+	GUEST_SYNC(stage++);
+	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_2 */
+	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
+	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
+			(1 << HV_HYPERCALL_VARHEAD_OFFSET),
+			IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+	GUEST_SYNC(stage++);
+
+	/* 'Slow' HvCallSendSyntheticClusterIpiEx to both RECEIVER_VCPU_ID_{1,2} */
+	memset(hcall_page, 0, 4096);
+	ipi_ex->vector = IPI_VECTOR;
+	ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+	ipi_ex->vp_set.valid_bank_mask = 1 << 1 | 1;
+	ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_1);
+	ipi_ex->vp_set.bank_contents[1] = BIT(RECEIVER_VCPU_ID_2 - 64);
+	res = hypercall(HVCALL_SEND_IPI_EX | (2 << HV_HYPERCALL_VARHEAD_OFFSET),
+			pgs_gpa, pgs_gpa + 4096);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+	GUEST_SYNC(stage++);
+	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to both RECEIVER_VCPU_ID_{1,2} */
+	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
+	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
+			(2 << HV_HYPERCALL_VARHEAD_OFFSET),
+			IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+	GUEST_SYNC(stage++);
+
+	/* 'Slow' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL */
+	memset(hcall_page, 0, 4096);
+	ipi_ex->vector = IPI_VECTOR;
+	ipi_ex->vp_set.format = HV_GENERIC_SET_ALL;
+	res = hypercall(HVCALL_SEND_IPI_EX,
+			pgs_gpa, pgs_gpa + 4096);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+	GUEST_SYNC(stage++);
+	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL */
+	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
+	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT,
+			IPI_VECTOR, HV_GENERIC_SET_ALL);
+	GUEST_ASSERT((res & 0xffff) == 0);
+	nop_loop();
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+	GUEST_SYNC(stage++);
+
+	GUEST_DONE();
+}
+
+static void *vcpu_thread(void *arg)
+{
+	struct thread_params *params = (struct thread_params *)arg;
+	struct ucall uc;
+	int old;
+	int r;
+	unsigned int exit_reason;
+
+	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+	TEST_ASSERT(r == 0,
+		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
+		    params->vcpu_id, r);
+
+	vcpu_run(params->vm, params->vcpu_id);
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+
+	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
+		TEST_ASSERT(false,
+			    "vCPU %u exited with error: %s.\n",
+			    params->vcpu_id, (const char *)uc.args[0]);
+	}
+
+	return NULL;
+}
+
+static void cancel_join_vcpu_thread(pthread_t thread, uint32_t vcpu_id)
+{
+	void *retval;
+	int r;
+
+	r = pthread_cancel(thread);
+	TEST_ASSERT(r == 0,
+		    "pthread_cancel on vcpu_id=%d failed with errno=%d",
+		    vcpu_id, r);
+
+	r = pthread_join(thread, &retval);
+	TEST_ASSERT(r == 0,
+		    "pthread_join on vcpu_id=%d failed with errno=%d",
+		    vcpu_id, r);
+	TEST_ASSERT(retval == PTHREAD_CANCELED,
+		    "expected retval=%p, got %p", PTHREAD_CANCELED,
+		    retval);
+}
+
+int main(int argc, char *argv[])
+{
+	int r;
+	pthread_t threads[2];
+	struct thread_params params[2];
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	vm_vaddr_t hcall_page;
+	struct ucall uc;
+	int stage = 1;
+
+	vm = vm_create_default(SENDER_VCPU_ID, 0, sender_guest_code);
+	params[0].vm = vm;
+	params[1].vm = vm;
+
+	/* Hypercall input/output */
+	hcall_page = vm_vaddr_alloc_pages(vm, 2);
+	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
+
+	vm_init_descriptor_tables(vm);
+
+	vm_vcpu_add_default(vm, RECEIVER_VCPU_ID_1, receiver_code);
+	vcpu_init_descriptor_tables(vm, RECEIVER_VCPU_ID_1);
+	vcpu_args_set(vm, RECEIVER_VCPU_ID_1, 2, hcall_page, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_msr(vm, RECEIVER_VCPU_ID_1, HV_X64_MSR_VP_INDEX, RECEIVER_VCPU_ID_1);
+	vcpu_set_hv_cpuid(vm, RECEIVER_VCPU_ID_1);
+
+	vm_vcpu_add_default(vm, RECEIVER_VCPU_ID_2, receiver_code);
+	vcpu_init_descriptor_tables(vm, RECEIVER_VCPU_ID_2);
+	vcpu_args_set(vm, RECEIVER_VCPU_ID_2, 2, hcall_page, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_msr(vm, RECEIVER_VCPU_ID_2, HV_X64_MSR_VP_INDEX, RECEIVER_VCPU_ID_2);
+	vcpu_set_hv_cpuid(vm, RECEIVER_VCPU_ID_2);
+
+	vm_install_exception_handler(vm, IPI_VECTOR, guest_ipi_handler);
+
+	vcpu_args_set(vm, SENDER_VCPU_ID, 2, hcall_page, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_hv_cpuid(vm, SENDER_VCPU_ID);
+
+	params[0].vcpu_id = RECEIVER_VCPU_ID_1;
+	r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
+	TEST_ASSERT(r == 0,
+		    "pthread_create receiver failed errno=%d", errno);
+
+	params[1].vcpu_id = RECEIVER_VCPU_ID_2;
+	r = pthread_create(&threads[1], NULL, vcpu_thread, &params[1]);
+	TEST_ASSERT(r == 0,
+		    "pthread_create receiver failed errno=%d", errno);
+
+	run = vcpu_state(vm, SENDER_VCPU_ID);
+
+	while (true) {
+		r = _vcpu_run(vm, SENDER_VCPU_ID);
+		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "unexpected exit reason: %u (%s)",
+			    run->exit_reason, exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, SENDER_VCPU_ID, &uc)) {
+		case UCALL_SYNC:
+			TEST_ASSERT(uc.args[1] == stage,
+				    "Unexpected stage: %ld (%d expected)\n",
+				    uc.args[1], stage);
+			break;
+		case UCALL_ABORT:
+			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+				  __FILE__, uc.args[1]);
+			return 1;
+		case UCALL_DONE:
+			return 0;
+		}
+
+		stage++;
+	}
+
+	cancel_join_vcpu_thread(threads[0], RECEIVER_VCPU_ID_1);
+	cancel_join_vcpu_thread(threads[1], RECEIVER_VCPU_ID_2);
+	kvm_vm_free(vm);
+
+	return 0;
+}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 25/34] KVM: selftests: Make it possible to replace PTEs with __virt_pg_map()
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (23 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 24/34] KVM: selftests: Hyper-V PV IPI selftest Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 11:34   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest Vitaly Kuznetsov
                   ` (9 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

__virt_pg_map() assumes that the leaf PTE is not present. This is not
suitable when a test wants to replace an already present PTE, which the
upcoming Hyper-V PV TLB flush test is going to need.
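
For illustration only (a minimal sketch; 'gva', 'gpa_a' and 'gpa_b' are
made-up names): the first call below creates a mapping and the second
re-points the same GVA at a different GPA without tripping the
"PTE already present" assertion:

	/* Initial mapping, the leaf PTE must not be present yet. */
	__virt_pg_map(vm, gva, gpa_a, X86_PAGE_SIZE_4K, false);
	/* Replace the mapping; the guest needs a TLB flush to observe it. */
	__virt_pg_map(vm, gva, gpa_b, X86_PAGE_SIZE_4K, true);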

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/processor.h | 2 +-
 tools/testing/selftests/kvm/lib/x86_64/processor.c     | 6 +++---
 tools/testing/selftests/kvm/max_guest_memory_test.c    | 2 +-
 tools/testing/selftests/kvm/x86_64/mmu_role_test.c     | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 9ad7602a257b..c20b18d05119 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -473,7 +473,7 @@ enum x86_page_size {
 	X86_PAGE_SIZE_1G,
 };
 void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
-		   enum x86_page_size page_size);
+		   enum x86_page_size page_size, bool replace);
 
 /*
  * Basic CPU control in CR0
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..20df3e84d777 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -229,7 +229,7 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
 }
 
 void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
-		   enum x86_page_size page_size)
+		   enum x86_page_size page_size, bool replace)
 {
 	const uint64_t pg_size = 1ull << ((page_size * 9) + 12);
 	struct pageUpperEntry *pml4e, *pdpe, *pde;
@@ -270,7 +270,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 
 	/* Fill in page table entry. */
 	pte = virt_get_pte(vm, pde->pfn, vaddr, 0);
-	TEST_ASSERT(!pte->present,
+	TEST_ASSERT(replace || !pte->present,
 		    "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
 	pte->pfn = paddr >> vm->page_shift;
 	pte->writable = true;
@@ -279,7 +279,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 
 void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
 {
-	__virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K);
+	__virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K, false);
 }
 
 static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid,
diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
index 3875c4b23a04..437f77633b0e 100644
--- a/tools/testing/selftests/kvm/max_guest_memory_test.c
+++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
@@ -244,7 +244,7 @@ int main(int argc, char *argv[])
 #ifdef __x86_64__
 		/* Identity map memory in the guest using 1gb pages. */
 		for (i = 0; i < slot_size; i += size_1gb)
-			__virt_pg_map(vm, gpa + i, gpa + i, X86_PAGE_SIZE_1G);
+			__virt_pg_map(vm, gpa + i, gpa + i, X86_PAGE_SIZE_1G, false);
 #else
 		for (i = 0; i < slot_size; i += vm_get_page_size(vm))
 			virt_pg_map(vm, gpa + i, gpa + i);
diff --git a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
index da2325fcad87..e3fdf320b9f4 100644
--- a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
+++ b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
@@ -35,7 +35,7 @@ static void mmu_role_test(u32 *cpuid_reg, u32 evil_cpuid_val)
 	run = vcpu_state(vm, VCPU_ID);
 
 	/* Map 1gb page without a backing memlot. */
-	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, X86_PAGE_SIZE_1G);
+	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, X86_PAGE_SIZE_1G, false);
 
 	r = _vcpu_run(vm, VCPU_ID);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (24 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 25/34] KVM: selftests: Make it possible to replace PTEs with __virt_pg_map() Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:17   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 27/34] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Introduce a selftest for Hyper-V PV TLB flush hypercalls
(HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).

The test creates one 'sender' vCPU and two 'worker' vCPUs which
busy-loop reading from a certain GVA, checking the observed value. The
sender vCPU drops to the host to swap the data page with another page
filled with a different value, and the expectation for the workers is
updated accordingly. Without a TLB flush on the worker vCPUs, they may
keep observing the old value. To guard against accidental TLB flushes
on the worker vCPUs, the test is repeated 100 times.

Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
fast' modes.
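
Schematically, each iteration of every sub-test follows the same
pattern (a simplified sketch of the code below; error checking and the
second worker omitted):

	/* Disarm the check while the host swaps the data page mapping. */
	set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
	GUEST_SYNC(stage++);	/* host remaps the page underneath us */
	flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
	flush->processor_mask = BIT(WORKER_VCPU_ID_1);
	res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
	/* Re-arm: the worker must now observe the new value. */
	set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
	nop_loop();	/* give the worker time to read and assert */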

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/hyperv.h     |   1 +
 .../selftests/kvm/x86_64/hyperv_tlb_flush.c   | 647 ++++++++++++++++++
 4 files changed, 650 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 5d5fbb161d56..1a1d09e414d5 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -25,6 +25,7 @@
 /x86_64/hyperv_features
 /x86_64/hyperv_ipi
 /x86_64/hyperv_svm_test
+/x86_64/hyperv_tlb_flush
 /x86_64/mmio_warning_test
 /x86_64/mmu_role_test
 /x86_64/platform_info_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 44889f897fe7..8b83abc09a1a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -54,6 +54,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
+TEST_GEN_PROGS_x86_64 += x86_64/hyperv_tlb_flush
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index f51d6fab8e93..1e34dd7c5075 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -185,6 +185,7 @@
 /* hypercall options */
 #define HV_HYPERCALL_FAST_BIT		BIT(16)
 #define HV_HYPERCALL_VARHEAD_OFFSET	17
+#define HV_HYPERCALL_REP_COMP_OFFSET	32
 
 #define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
 
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
new file mode 100644
index 000000000000..00bcae45ddd2
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
@@ -0,0 +1,647 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V HvFlushVirtualAddress{List,Space}{,Ex} tests
+ *
+ * Copyright (C) 2022, Red Hat, Inc.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+
+#include "kvm_util.h"
+#include "hyperv.h"
+#include "processor.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#define SENDER_VCPU_ID   1
+#define WORKER_VCPU_ID_1 2
+#define WORKER_VCPU_ID_2 65
+
+#define NTRY 100
+
+struct thread_params {
+	struct kvm_vm *vm;
+	uint32_t vcpu_id;
+};
+
+struct hv_vpset {
+	u64 format;
+	u64 valid_bank_mask;
+	u64 bank_contents[];
+};
+
+enum HV_GENERIC_SET_FORMAT {
+	HV_GENERIC_SET_SPARSE_4K,
+	HV_GENERIC_SET_ALL,
+};
+
+#define HV_FLUSH_ALL_PROCESSORS			BIT(0)
+#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	BIT(1)
+#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	BIT(2)
+#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	BIT(3)
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_tlb_flush {
+	u64 address_space;
+	u64 flags;
+	u64 processor_mask;
+	u64 gva_list[];
+} __packed;
+
+/* HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressListEx hypercalls */
+struct hv_tlb_flush_ex {
+	u64 address_space;
+	u64 flags;
+	struct hv_vpset hv_vp_set;
+	u64 gva_list[];
+} __packed;
+
+static inline void hv_init(vm_vaddr_t pgs_gpa)
+{
+	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
+}
+
+static void worker_code(void *test_pages, vm_vaddr_t pgs_gpa)
+{
+	u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
+	unsigned char chr;
+
+	x2apic_enable();
+	hv_init(pgs_gpa);
+
+	for (;;) {
+		chr = READ_ONCE(*(unsigned char *)(test_pages + 4096 * 2 + vcpu_id));
+		if (chr)
+			GUEST_ASSERT(*(unsigned char *)test_pages == chr);
+		asm volatile("nop");
+	}
+}
+
+static inline u64 hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
+{
+	u64 hv_status;
+
+	asm volatile("mov %3, %%r8\n"
+		     "vmcall"
+		     : "=a" (hv_status),
+		       "+c" (control), "+d" (arg1)
+		     :  "r" (arg2)
+		     : "cc", "memory", "r8", "r9", "r10", "r11");
+
+	return hv_status;
+}
+
+static inline void nop_loop(void)
+{
+	int i;
+
+	for (i = 0; i < 10000000; i++)
+		asm volatile("nop");
+}
+
+static inline void sync_to_xmm(void *data)
+{
+	int i;
+
+	for (i = 0; i < 8; i++)
+		write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
+}
+
+static void set_expected_char(void *addr, unsigned char chr, int vcpu_id)
+{
+	asm volatile("mfence");
+	*(unsigned char *)(addr + 2 * 4096 + vcpu_id) = chr;
+}
+
+static void sender_guest_code(void *hcall_page, void *test_pages, vm_vaddr_t pgs_gpa)
+{
+	struct hv_tlb_flush *flush = (struct hv_tlb_flush *)hcall_page;
+	struct hv_tlb_flush_ex *flush_ex = (struct hv_tlb_flush_ex *)hcall_page;
+	int stage = 1, i;
+	u64 res;
+
+	hv_init(pgs_gpa);
+
+	/* "Slow" hypercalls */
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+		flush->gva_list[0] = (u64)test_pages;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
+		flush->processor_mask = 0;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
+		flush->gva_list[0] = (u64)test_pages;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+				(1 << HV_HYPERCALL_VARHEAD_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		/* bank_contents and gva_list occupy the same space, thus [1] */
+		flush_ex->gva_list[1] = (u64)test_pages;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+				(1 << HV_HYPERCALL_VARHEAD_OFFSET) |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
+			BIT_ULL(WORKER_VCPU_ID_1 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+				(2 << HV_HYPERCALL_VARHEAD_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
+			BIT_ULL(WORKER_VCPU_ID_2 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		/* bank_contents and gva_list occupy the same space, thus [2] */
+		flush_ex->gva_list[2] = (u64)test_pages;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+				(2 << HV_HYPERCALL_VARHEAD_OFFSET) |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX,
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+		flush_ex->gva_list[0] = (u64)test_pages;
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				pgs_gpa, pgs_gpa + 4096);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* "Fast" hypercalls */
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+		sync_to_xmm(&flush->processor_mask);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
+				HV_HYPERCALL_FAST_BIT, 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+		flush->gva_list[0] = (u64)test_pages;
+		sync_to_xmm(&flush->processor_mask);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST | HV_HYPERCALL_FAST_BIT |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		sync_to_xmm(&flush->processor_mask);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+				HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush->gva_list[0] = (u64)test_pages;
+		sync_to_xmm(&flush->processor_mask);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST | HV_HYPERCALL_FAST_BIT |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET), 0x0,
+				HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		sync_to_xmm(&flush_ex->hv_vp_set);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT |
+				(1 << HV_HYPERCALL_VARHEAD_OFFSET),
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		/* bank_contents and gva_list occupy the same space, thus [1] */
+		flush_ex->gva_list[1] = (u64)test_pages;
+		sync_to_xmm(&flush_ex->hv_vp_set);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
+				(1 << HV_HYPERCALL_VARHEAD_OFFSET) |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
+			BIT_ULL(WORKER_VCPU_ID_1 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		sync_to_xmm(&flush_ex->hv_vp_set);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT |
+				(2 << HV_HYPERCALL_VARHEAD_OFFSET),
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
+			BIT_ULL(WORKER_VCPU_ID_2 / 64);
+		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+		/* bank_contents and gva_list occupy the same space, thus [2] */
+		flush_ex->gva_list[2] = (u64)test_pages;
+		sync_to_xmm(&flush_ex->hv_vp_set);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
+				(2 << HV_HYPERCALL_VARHEAD_OFFSET) |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+		sync_to_xmm(&flush_ex->hv_vp_set);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT,
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
+	for (i = 0; i < NTRY; i++) {
+		memset(hcall_page, 0, 4096);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
+		GUEST_SYNC(stage++);
+		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+		flush_ex->gva_list[0] = (u64)test_pages;
+		sync_to_xmm(&flush_ex->hv_vp_set);
+		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
+				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+		GUEST_ASSERT((res & 0xffff) == 0);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
+		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
+		nop_loop();
+	}
+
+	GUEST_DONE();
+}
+
+static void *vcpu_thread(void *arg)
+{
+	struct thread_params *params = (struct thread_params *)arg;
+	struct ucall uc;
+	int old;
+	int r;
+	unsigned int exit_reason;
+
+	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+	TEST_ASSERT(r == 0,
+		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
+		    params->vcpu_id, r);
+
+	vcpu_run(params->vm, params->vcpu_id);
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+
+	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
+		TEST_ASSERT(false,
+			    "vCPU %u exited with error: %s.\n",
+			    params->vcpu_id, (const char *)uc.args[0]);
+	}
+
+	return NULL;
+}
+
+static void cancel_join_vcpu_thread(pthread_t thread, uint32_t vcpu_id)
+{
+	void *retval;
+	int r;
+
+	r = pthread_cancel(thread);
+	TEST_ASSERT(r == 0,
+		    "pthread_cancel on vcpu_id=%d failed with errno=%d",
+		    vcpu_id, r);
+
+	r = pthread_join(thread, &retval);
+	TEST_ASSERT(r == 0,
+		    "pthread_join on vcpu_id=%d failed with errno=%d",
+		    vcpu_id, r);
+	TEST_ASSERT(retval == PTHREAD_CANCELED,
+		    "expected retval=%p, got %p", PTHREAD_CANCELED,
+		    retval);
+}
+
+int main(int argc, char *argv[])
+{
+	int r;
+	pthread_t threads[2];
+	struct thread_params params[2];
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	vm_vaddr_t hcall_page, test_pages;
+	struct ucall uc;
+	int stage = 1;
+
+	vm = vm_create_default(SENDER_VCPU_ID, 0, sender_guest_code);
+	params[0].vm = vm;
+	params[1].vm = vm;
+
+	/* Hypercall input/output */
+	hcall_page = vm_vaddr_alloc_pages(vm, 2);
+	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
+
+	/*
+	 * Test pages: the first one is filled with '0x1's, the second with '0x2's
+	 * and the test will swap their mappings. The third page holds the value
+	 * each worker vCPU is currently expected to observe.
+	 */
+	test_pages = vm_vaddr_alloc_pages(vm, 3);
+	memset(addr_gva2hva(vm, test_pages), 0x1, 4096);
+	memset(addr_gva2hva(vm, test_pages) + 4096, 0x2, 4096);
+	set_expected_char(addr_gva2hva(vm, test_pages), 0x0, WORKER_VCPU_ID_1);
+	set_expected_char(addr_gva2hva(vm, test_pages), 0x0, WORKER_VCPU_ID_2);
+
+	vm_vcpu_add_default(vm, WORKER_VCPU_ID_1, worker_code);
+	vcpu_args_set(vm, WORKER_VCPU_ID_1, 2, test_pages, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_msr(vm, WORKER_VCPU_ID_1, HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_1);
+	vcpu_set_hv_cpuid(vm, WORKER_VCPU_ID_1);
+
+	vm_vcpu_add_default(vm, WORKER_VCPU_ID_2, worker_code);
+	vcpu_args_set(vm, WORKER_VCPU_ID_2, 2, test_pages, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_msr(vm, WORKER_VCPU_ID_2, HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_2);
+	vcpu_set_hv_cpuid(vm, WORKER_VCPU_ID_2);
+
+	vcpu_args_set(vm, SENDER_VCPU_ID, 3, hcall_page, test_pages,
+		      addr_gva2gpa(vm, hcall_page));
+	vcpu_set_hv_cpuid(vm, SENDER_VCPU_ID);
+
+	params[0].vcpu_id = WORKER_VCPU_ID_1;
+	r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
+	TEST_ASSERT(r == 0,
+		    "pthread_create worker failed errno=%d", errno);
+
+	params[1].vcpu_id = WORKER_VCPU_ID_2;
+	r = pthread_create(&threads[1], NULL, vcpu_thread, &params[1]);
+	TEST_ASSERT(r == 0,
+		    "pthread_create worker failed errno=%d", errno);
+
+	run = vcpu_state(vm, SENDER_VCPU_ID);
+
+	while (true) {
+		r = _vcpu_run(vm, SENDER_VCPU_ID);
+		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "unexpected exit reason: %u (%s)",
+			    run->exit_reason, exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, SENDER_VCPU_ID, &uc)) {
+		case UCALL_SYNC:
+			TEST_ASSERT(uc.args[1] == stage,
+				    "Unexpected stage: %ld (%d expected)\n",
+				    uc.args[1], stage);
+			break;
+		case UCALL_ABORT:
+			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+				  __FILE__, uc.args[1]);
+			return 1;
+		case UCALL_DONE:
+			return 0;
+		}
+
+		/* Swap test pages */
+		if (stage % 2) {
+			__virt_pg_map(vm, test_pages, addr_gva2gpa(vm, test_pages) + 4096,
+				      X86_PAGE_SIZE_4K, true);
+			__virt_pg_map(vm, test_pages + 4096, addr_gva2gpa(vm, test_pages) - 4096,
+				      X86_PAGE_SIZE_4K, true);
+		} else {
+			__virt_pg_map(vm, test_pages, addr_gva2gpa(vm, test_pages) - 4096,
+				      X86_PAGE_SIZE_4K, true);
+			__virt_pg_map(vm, test_pages + 4096, addr_gva2gpa(vm, test_pages) + 4096,
+				      X86_PAGE_SIZE_4K, true);
+		}
+
+		stage++;
+	}
+
+	cancel_join_vcpu_thread(threads[0], WORKER_VCPU_ID_1);
+	cancel_join_vcpu_thread(threads[1], WORKER_VCPU_ID_2);
+	kvm_vm_free(vm);
+
+	return 0;
+}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 27/34] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (25 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:17   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 28/34] KVM: selftests: nVMX: Allocate Hyper-V partition assist page Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

The 'struct hv_enlightened_vmcs' definition in selftests is not
'__packed', so we rely on the compiler inserting the right padding
implicitly. This is not obvious, so it seems beneficial to use the same
definition as in the kernel.
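
As an illustration (the field names here are made up, not taken from
the real struct), without '__packed' the compiler is free to insert
invisible padding to satisfy natural alignment:

	struct example {
		u16 selector;	/* offset 0 */
		u64 pat;	/* placed at offset 8: 6 hidden padding bytes */
	};

With '__packed' the layout is exactly what is written, so every hole
mandated by the spec has to be spelled out explicitly (e.g. the new
'padding16_1' field below), and the structure matches the TLFS layout
by construction rather than by accident.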

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/evmcs.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index cc5d14a45702..b6067b555110 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -41,6 +41,8 @@ struct hv_enlightened_vmcs {
 	u16 host_gs_selector;
 	u16 host_tr_selector;
 
+	u16 padding16_1;
+
 	u64 host_ia32_pat;
 	u64 host_ia32_efer;
 
@@ -159,7 +161,7 @@ struct hv_enlightened_vmcs {
 	u64 ept_pointer;
 
 	u16 virtual_processor_id;
-	u16 padding16[3];
+	u16 padding16_2[3];
 
 	u64 padding64_2[5];
 	u64 guest_physical_address;
@@ -195,15 +197,15 @@ struct hv_enlightened_vmcs {
 	u64 guest_rip;
 
 	u32 hv_clean_fields;
-	u32 hv_padding_32;
+	u32 padding32_1;
 	u32 hv_synthetic_controls;
 	struct {
 		u32 nested_flush_hypercall:1;
 		u32 msr_bitmap:1;
 		u32 reserved:30;
-	} hv_enlightenments_control;
+	}  __packed hv_enlightenments_control;
 	u32 hv_vp_id;
-
+	u32 padding32_2;
 	u64 hv_vm_id;
 	u64 partition_assist_page;
 	u64 padding64_4[4];
@@ -211,7 +213,7 @@ struct hv_enlightened_vmcs {
 	u64 padding64_5[7];
 	u64 xss_exit_bitmap;
 	u64 padding64_6[7];
-};
+} __packed;
 
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE                     0
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP                BIT(0)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 28/34] KVM: selftests: nVMX: Allocate Hyper-V partition assist page
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (26 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 27/34] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:17   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 29/34] KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist pages Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

In preparation for testing Hyper-V L2 TLB flush hypercalls, allocate the
so-called Partition assist page and link it to 'struct vmx_pages'.
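
The page is consumed by a later patch in the series; the L2 TLB flush
test links it into the enlightened VMCS roughly like this:

	current_evmcs->partition_assist_page = vmx_pages->partition_assist_gpa;
	current_evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;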

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/vmx.h | 4 ++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 583ceb0d1457..f99922ca8259 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -567,6 +567,10 @@ struct vmx_pages {
 	uint64_t enlightened_vmcs_gpa;
 	void *enlightened_vmcs;
 
+	void *partition_assist_hva;
+	uint64_t partition_assist_gpa;
+	void *partition_assist;
+
 	void *eptp_hva;
 	uint64_t eptp_gpa;
 	void *eptp;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index d089d8b850b5..3db21e0e1a8f 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -124,6 +124,13 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
 	vmx->enlightened_vmcs_gpa =
 		addr_gva2gpa(vm, (uintptr_t)vmx->enlightened_vmcs);
 
+	/* Setup of a region of guest memory for the partition assist page. */
+	vmx->partition_assist = (void *)vm_vaddr_alloc_page(vm);
+	vmx->partition_assist_hva =
+		addr_gva2hva(vm, (uintptr_t)vmx->partition_assist);
+	vmx->partition_assist_gpa =
+		addr_gva2gpa(vm, (uintptr_t)vmx->partition_assist);
+
 	*p_vmx_gva = vmx_gva;
 	return vmx;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 29/34] KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist pages
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (27 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 28/34] KVM: selftests: nVMX: Allocate Hyper-V partition assist page Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:17   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 30/34] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

In preparation for testing Hyper-V L2 TLB flush hypercalls, allocate the VP
assist and Partition assist pages and link them to 'struct svm_test_data'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/svm_util.h | 10 ++++++++++
 tools/testing/selftests/kvm/lib/x86_64/svm.c          | 10 ++++++++++
 2 files changed, 20 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index a25aabd8f5e7..640859b58fd6 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -34,6 +34,16 @@ struct svm_test_data {
 	void *msr; /* gva */
 	void *msr_hva;
 	uint64_t msr_gpa;
+
+	/* Hyper-V VP assist page */
+	void *vp_assist; /* gva */
+	void *vp_assist_hva;
+	uint64_t vp_assist_gpa;
+
+	/* Hyper-V Partition assist page */
+	void *partition_assist; /* gva */
+	void *partition_assist_hva;
+	uint64_t partition_assist_gpa;
 };
 
 struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/svm.c b/tools/testing/selftests/kvm/lib/x86_64/svm.c
index 736ee4a23df6..c284e8f87f5c 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/svm.c
@@ -48,6 +48,16 @@ vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva)
 	svm->msr_gpa = addr_gva2gpa(vm, (uintptr_t)svm->msr);
 	memset(svm->msr_hva, 0, getpagesize());
 
+	svm->vp_assist = (void *)vm_vaddr_alloc_page(vm);
+	svm->vp_assist_hva = addr_gva2hva(vm, (uintptr_t)svm->vp_assist);
+	svm->vp_assist_gpa = addr_gva2gpa(vm, (uintptr_t)svm->vp_assist);
+	memset(svm->vp_assist_hva, 0, getpagesize());
+
+	svm->partition_assist = (void *)vm_vaddr_alloc_page(vm);
+	svm->partition_assist_hva = addr_gva2hva(vm, (uintptr_t)svm->partition_assist);
+	svm->partition_assist_gpa = addr_gva2gpa(vm, (uintptr_t)svm->partition_assist);
+	memset(svm->partition_assist_hva, 0, getpagesize());
+
 	*p_svm_gva = svm_gva;
 	return svm;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 30/34] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (28 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 29/34] KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist pages Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:18   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 31/34] KVM: selftests: evmcs_test: Introduce L2 TLB flush test Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

The 'struct hv_vp_assist_page' definition doesn't match the TLFS. Also,
define 'struct hv_nested_enlightenments_control' and use it instead of
an opaque '__u64'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/include/x86_64/evmcs.h      | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index b6067b555110..9c965ba73dec 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -20,14 +20,26 @@
 
 extern bool enable_evmcs;
 
+struct hv_nested_enlightenments_control {
+	struct {
+		__u32 directhypercall:1;
+		__u32 reserved:31;
+	} features;
+	struct {
+		__u32 reserved;
+	} hypercallControls;
+} __packed;
+
+/* Define virtual processor assist page structure. */
 struct hv_vp_assist_page {
 	__u32 apic_assist;
-	__u32 reserved;
-	__u64 vtl_control[2];
-	__u64 nested_enlightenments_control[2];
-	__u32 enlighten_vmentry;
+	__u32 reserved1;
+	__u64 vtl_control[3];
+	struct hv_nested_enlightenments_control nested_control;
+	__u8 enlighten_vmentry;
+	__u8 reserved2[7];
 	__u64 current_nested_vmcs;
-};
+} __packed;
 
 struct hv_enlightened_vmcs {
 	u32 revision_id;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 31/34] KVM: selftests: evmcs_test: Introduce L2 TLB flush test
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (29 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 30/34] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:18   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 32/34] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the
Partition assist page.
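
The core of the check in the L1 guest code (excerpted from the diff
below): while the first u32 of the Partition assist page
('TlbLockCount') is zero, the flush hypercall issued by L2 is handled
by L0 and L1 only sees the subsequent rdmsr exit; once it is set to a
non-zero value, L1 observes the synthetic exit instead:

	GUEST_ASSERT(!vmresume());
	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
	current_evmcs->guest_rip += 2; /* rdmsr */
	/* Enable synthetic vmexit */
	*(u32 *)(vmx_pages->partition_assist) = 1;
	GUEST_ASSERT(!vmresume());
	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH);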

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/include/x86_64/evmcs.h      |  2 +
 .../testing/selftests/kvm/x86_64/evmcs_test.c | 52 ++++++++++++++++++-
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 9c965ba73dec..36c0a67d8602 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -252,6 +252,8 @@ struct hv_enlightened_vmcs {
 #define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
 		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
 
+#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
+
 extern struct hv_enlightened_vmcs *current_evmcs;
 extern struct hv_vp_assist_page *current_vp_assist;
 
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index d12e043aa2ee..8d2aa7600d78 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -16,6 +16,7 @@
 
 #include "kvm_util.h"
 
+#include "hyperv.h"
 #include "vmx.h"
 
 #define VCPU_ID		5
@@ -49,6 +50,16 @@ static inline void rdmsr_gs_base(void)
 			      "r13", "r14", "r15");
 }
 
+static inline void hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
+{
+	asm volatile("mov %3, %%r8\n"
+		     "vmcall"
+		     : "+c" (control), "+d" (arg1)
+		     :  "r" (arg2)
+		     : "cc", "memory", "rax", "rbx", "r8", "r9", "r10",
+		       "r11", "r12", "r13", "r14", "r15");
+}
+
 void l2_guest_code(void)
 {
 	GUEST_SYNC(7);
@@ -67,15 +78,27 @@ void l2_guest_code(void)
 	vmcall();
 	rdmsr_gs_base(); /* intercepted */
 
+	/* L2 TLB flush tests */
+	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+	rdmsr_fs_base();
+	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+	/* Make sure we're not issuing a Hyper-V TLB flush call again */
+	__asm__ __volatile__ ("mov $0xdeadbeef, %rcx");
+
 	/* Done, exit to L1 and never come back.  */
 	vmcall();
 }
 
-void guest_code(struct vmx_pages *vmx_pages)
+void guest_code(struct vmx_pages *vmx_pages, vm_vaddr_t pgs_gpa)
 {
 #define L2_GUEST_STACK_SIZE 64
 	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
 
+	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
+
 	x2apic_enable();
 
 	GUEST_SYNC(1);
@@ -105,6 +128,14 @@ void guest_code(struct vmx_pages *vmx_pages)
 	vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmreadz(PIN_BASED_VM_EXEC_CONTROL) |
 		PIN_BASED_NMI_EXITING);
 
+	/* L2 TLB flush setup */
+	current_evmcs->partition_assist_page = vmx_pages->partition_assist_gpa;
+	current_evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;
+	current_evmcs->hv_vm_id = 1;
+	current_evmcs->hv_vp_id = 1;
+	current_vp_assist->nested_control.features.directhypercall = 1;
+	*(u32 *)(vmx_pages->partition_assist) = 0;
+
 	GUEST_ASSERT(!vmlaunch());
 	GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);
 
@@ -149,6 +180,18 @@ void guest_code(struct vmx_pages *vmx_pages)
 	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
 	current_evmcs->guest_rip += 2; /* rdmsr */
 
+	/*
+	 * L2 TLB flush test. First VMCALL should be handled directly by L0,
+	 * no VMCALL exit expected.
+	 */
+	GUEST_ASSERT(!vmresume());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
+	current_evmcs->guest_rip += 2; /* rdmsr */
+	/* Enable synthetic vmexit */
+	*(u32 *)(vmx_pages->partition_assist) = 1;
+	GUEST_ASSERT(!vmresume());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH);
+
 	GUEST_ASSERT(!vmresume());
 	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
 	GUEST_SYNC(11);
@@ -201,6 +244,7 @@ static void save_restore_vm(struct kvm_vm *vm)
 int main(int argc, char *argv[])
 {
 	vm_vaddr_t vmx_pages_gva = 0;
+	vm_vaddr_t hcall_page;
 
 	struct kvm_vm *vm;
 	struct kvm_run *run;
@@ -217,11 +261,15 @@ int main(int argc, char *argv[])
 		exit(KSFT_SKIP);
 	}
 
+	hcall_page = vm_vaddr_alloc_pages(vm, 1);
+	memset(addr_gva2hva(vm, hcall_page), 0x0,  getpagesize());
+
 	vcpu_set_hv_cpuid(vm, VCPU_ID);
 	vcpu_enable_evmcs(vm, VCPU_ID);
 
 	vcpu_alloc_vmx(vm, &vmx_pages_gva);
-	vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+	vcpu_args_set(vm, VCPU_ID, 2, vmx_pages_gva, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_msr(vm, VCPU_ID, HV_X64_MSR_VP_INDEX, VCPU_ID);
 
 	vm_init_descriptor_tables(vm);
 	vcpu_init_descriptor_tables(vm, VCPU_ID);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v3 32/34] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (30 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 31/34] KVM: selftests: evmcs_test: Introduce L2 TLB flush test Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:18   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 33/34] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

The Hyper-V VP assist page is not eVMCS specific; it is also used for
enlightened nSVM. Move the code to a vendor-neutral place.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/Makefile          |  2 +-
 .../selftests/kvm/include/x86_64/evmcs.h      | 40 +------------------
 .../selftests/kvm/include/x86_64/hyperv.h     | 31 ++++++++++++++
 .../testing/selftests/kvm/lib/x86_64/hyperv.c | 21 ++++++++++
 .../testing/selftests/kvm/x86_64/evmcs_test.c |  1 +
 5 files changed, 56 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/hyperv.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 8b83abc09a1a..ae13aa32f3ce 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -38,7 +38,7 @@ ifeq ($(ARCH),riscv)
 endif
 
 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
-LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
+LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/hyperv.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c lib/aarch64/gic_v3.c lib/aarch64/vgic.c
 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
 LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 36c0a67d8602..026586b53013 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -10,6 +10,7 @@
 #define SELFTEST_KVM_EVMCS_H
 
 #include <stdint.h>
+#include "hyperv.h"
 #include "vmx.h"
 
 #define u16 uint16_t
@@ -20,27 +21,6 @@
 
 extern bool enable_evmcs;
 
-struct hv_nested_enlightenments_control {
-	struct {
-		__u32 directhypercall:1;
-		__u32 reserved:31;
-	} features;
-	struct {
-		__u32 reserved;
-	} hypercallControls;
-} __packed;
-
-/* Define virtual processor assist page structure. */
-struct hv_vp_assist_page {
-	__u32 apic_assist;
-	__u32 reserved1;
-	__u64 vtl_control[3];
-	struct hv_nested_enlightenments_control nested_control;
-	__u8 enlighten_vmentry;
-	__u8 reserved2[7];
-	__u64 current_nested_vmcs;
-} __packed;
-
 struct hv_enlightened_vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -246,31 +226,15 @@ struct hv_enlightened_vmcs {
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL    BIT(15)
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL                      0xFFFF
 
-#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
-#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE	0x00000001
-#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT	12
-#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
-		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
-
 #define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
 
 extern struct hv_enlightened_vmcs *current_evmcs;
-extern struct hv_vp_assist_page *current_vp_assist;
 
 int vcpu_enable_evmcs(struct kvm_vm *vm, int vcpu_id);
 
-static inline int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
+static inline void evmcs_enable(void)
 {
-	u64 val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
-		HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
-
-	wrmsr(HV_X64_MSR_VP_ASSIST_PAGE, val);
-
-	current_vp_assist = vp_assist;
-
 	enable_evmcs = true;
-
-	return 0;
 }
 
 static inline int evmcs_vmptrld(uint64_t vmcs_pa, void *vmcs)
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index 1e34dd7c5075..095c15fc5381 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -189,4 +189,35 @@
 
 #define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
 
+#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
+#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE	0x00000001
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT	12
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
+		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
+
+struct hv_nested_enlightenments_control {
+	struct {
+		__u32 directhypercall:1;
+		__u32 reserved:31;
+	} features;
+	struct {
+		__u32 reserved;
+	} hypercallControls;
+} __packed;
+
+/* Define virtual processor assist page structure. */
+struct hv_vp_assist_page {
+	__u32 apic_assist;
+	__u32 reserved1;
+	__u64 vtl_control[3];
+	struct hv_nested_enlightenments_control nested_control;
+	__u8 enlighten_vmentry;
+	__u8 reserved2[7];
+	__u64 current_nested_vmcs;
+} __packed;
+
+extern struct hv_vp_assist_page *current_vp_assist;
+
+int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist);
+
 #endif /* !SELFTEST_KVM_HYPERV_H */
diff --git a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
new file mode 100644
index 000000000000..32dc0afd9e5b
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hyper-V specific functions.
+ *
+ * Copyright (C) 2021, Red Hat Inc.
+ */
+#include <stdint.h>
+#include "processor.h"
+#include "hyperv.h"
+
+int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
+{
+	uint64_t val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
+		HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+
+	wrmsr(HV_X64_MSR_VP_ASSIST_PAGE, val);
+
+	current_vp_assist = vp_assist;
+
+	return 0;
+}
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 8d2aa7600d78..8fa50e76d557 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -105,6 +105,7 @@ void guest_code(struct vmx_pages *vmx_pages, vm_vaddr_t pgs_gpa)
 	GUEST_SYNC(2);
 
 	enable_vp_assist(vmx_pages->vp_assist_gpa, vmx_pages->vp_assist);
+	evmcs_enable();
 
 	GUEST_ASSERT(vmx_pages->vmcs_gpa);
 	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
-- 
2.35.1



* [PATCH v3 33/34] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (31 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 32/34] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:19   ` Maxim Levitsky
  2022-04-14 13:20 ` [PATCH v3 34/34] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush' Vitaly Kuznetsov
  2022-05-03 15:01 ` [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the Partition
assist page.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/x86_64/hyperv_svm_test.c    | 60 +++++++++++++++++--
 1 file changed, 56 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c b/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
index 21f5ca9197da..99f0a2ead7df 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
@@ -42,11 +42,24 @@ struct hv_enlightenments {
  */
 #define VMCB_HV_NESTED_ENLIGHTENMENTS (1U << 31)
 
+#define HV_SVM_EXITCODE_ENL 0xF0000000
+#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH   (1)
+
 static inline void vmmcall(void)
 {
 	__asm__ __volatile__("vmmcall");
 }
 
+static inline void hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
+{
+	asm volatile("mov %3, %%r8\n"
+		     "vmmcall"
+		     : "+c" (control), "+d" (arg1)
+		     :  "r" (arg2)
+		     : "cc", "memory", "rax", "rbx", "r8", "r9", "r10",
+		       "r11", "r12", "r13", "r14", "r15");
+}
+
 void l2_guest_code(void)
 {
 	GUEST_SYNC(3);
@@ -62,11 +75,21 @@ void l2_guest_code(void)
 
 	GUEST_SYNC(5);
 
+	/* L2 TLB flush tests */
+	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+	rdmsr(MSR_FS_BASE);
+	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+	/* Make sure we're not issuing Hyper-V TLB flush call again */
+	__asm__ __volatile__ ("mov $0xdeadbeef, %rcx");
+
 	/* Done, exit to L1 and never come back.  */
 	vmmcall();
 }
 
-static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
+static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
+						    vm_vaddr_t pgs_gpa)
 {
 	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
 	struct vmcb *vmcb = svm->vmcb;
@@ -75,13 +98,23 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
 
 	GUEST_SYNC(1);
 
-	wrmsr(HV_X64_MSR_GUEST_OS_ID, (u64)0x8100 << 48);
+	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
+	enable_vp_assist(svm->vp_assist_gpa, svm->vp_assist);
 
 	GUEST_ASSERT(svm->vmcb_gpa);
 	/* Prepare for L2 execution. */
 	generic_svm_setup(svm, l2_guest_code,
 			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
 
+	/* L2 TLB flush setup */
+	hve->partition_assist_page = svm->partition_assist_gpa;
+	hve->hv_enlightenments_control.nested_flush_hypercall = 1;
+	hve->hv_vm_id = 1;
+	hve->hv_vp_id = 1;
+	current_vp_assist->nested_control.features.directhypercall = 1;
+	*(u32 *)(svm->partition_assist) = 0;
+
 	GUEST_SYNC(2);
 	run_guest(vmcb, svm->vmcb_gpa);
 	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
@@ -116,6 +149,20 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
 	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_MSR);
 	vmcb->save.rip += 2; /* rdmsr */
 
+
+	/*
+	 * L2 TLB flush test. First VMCALL should be handled directly by L0,
+	 * no VMCALL exit expected.
+	 */
+	run_guest(vmcb, svm->vmcb_gpa);
+	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_MSR);
+	vmcb->save.rip += 2; /* rdmsr */
+	/* Enable synthetic vmexit */
+	*(u32 *)(svm->partition_assist) = 1;
+	run_guest(vmcb, svm->vmcb_gpa);
+	GUEST_ASSERT(vmcb->control.exit_code == HV_SVM_EXITCODE_ENL);
+	GUEST_ASSERT(vmcb->control.exit_info_1 == HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH);
+
 	run_guest(vmcb, svm->vmcb_gpa);
 	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
 	GUEST_SYNC(6);
@@ -126,7 +173,7 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
 int main(int argc, char *argv[])
 {
 	vm_vaddr_t nested_gva = 0;
-
+	vm_vaddr_t hcall_page;
 	struct kvm_vm *vm;
 	struct kvm_run *run;
 	struct ucall uc;
@@ -141,7 +188,12 @@ int main(int argc, char *argv[])
 	vcpu_set_hv_cpuid(vm, VCPU_ID);
 	run = vcpu_state(vm, VCPU_ID);
 	vcpu_alloc_svm(vm, &nested_gva);
-	vcpu_args_set(vm, VCPU_ID, 1, nested_gva);
+
+	hcall_page = vm_vaddr_alloc_pages(vm, 1);
+	memset(addr_gva2hva(vm, hcall_page), 0x0,  getpagesize());
+
+	vcpu_args_set(vm, VCPU_ID, 2, nested_gva, addr_gva2gpa(vm, hcall_page));
+	vcpu_set_msr(vm, VCPU_ID, HV_X64_MSR_VP_INDEX, VCPU_ID);
 
 	for (stage = 1;; stage++) {
 		_vcpu_run(vm, VCPU_ID);
-- 
2.35.1



* [PATCH v3 34/34] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (32 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 33/34] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test Vitaly Kuznetsov
@ 2022-04-14 13:20 ` Vitaly Kuznetsov
  2022-05-11 12:18   ` Maxim Levitsky
  2022-05-03 15:01 ` [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
  34 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-04-14 13:20 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent,
rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change
eliminates the use of confusing 'direct' and adds the missing underscore.

No functional change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/svm/svm_onhyperv.c    | 2 +-
 arch/x86/kvm/svm/svm_onhyperv.h    | 6 +++---
 arch/x86/kvm/vmx/vmx.c             | 6 +++---
 arch/x86/kvm/x86.c                 | 6 +++---
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 96e4e9842dfc..1e13612a6446 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -121,7 +121,7 @@ KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
 KVM_X86_OP(get_msr_feature)
 KVM_X86_OP(can_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
-KVM_X86_OP_OPTIONAL(enable_direct_tlbflush)
+KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
 KVM_X86_OP_OPTIONAL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 168600490bd1..f4fd6da1f565 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1526,7 +1526,7 @@ struct kvm_x86_ops {
 					void *insn, int insn_len);
 
 	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
-	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
+	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
 	void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c
index 8cdc62c74a96..69a7014d1cef 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.c
+++ b/arch/x86/kvm/svm/svm_onhyperv.c
@@ -14,7 +14,7 @@
 #include "kvm_onhyperv.h"
 #include "svm_onhyperv.h"
 
-int svm_hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct hv_enlightenments *hve;
 	struct hv_partition_assist_pg **p_hv_pa_pg =
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index e2fc59380465..d6ec4aeebedb 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -13,7 +13,7 @@
 
 static struct kvm_x86_ops svm_x86_ops;
 
-int svm_hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu);
+int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
 
 static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
 {
@@ -51,8 +51,8 @@ static inline void svm_hv_hardware_setup(void)
 
 			vp_ap->nested_control.features.directhypercall = 1;
 		}
-		svm_x86_ops.enable_direct_tlbflush =
-				svm_hv_enable_direct_tlbflush;
+		svm_x86_ops.enable_l2_tlb_flush =
+				svm_hv_enable_l2_tlb_flush;
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a81e44852f54..2b3c73b49dcb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -461,7 +461,7 @@ static unsigned long host_idt_base;
 static bool __read_mostly enlightened_vmcs = true;
 module_param(enlightened_vmcs, bool, 0444);
 
-static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+static int hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct hv_enlightened_vmcs *evmcs;
 	struct hv_partition_assist_pg **p_hv_pa_pg =
@@ -8151,8 +8151,8 @@ static int __init vmx_init(void)
 		}
 
 		if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
-			vmx_x86_ops.enable_direct_tlbflush
-				= hv_enable_direct_tlbflush;
+			vmx_x86_ops.enable_l2_tlb_flush
+				= hv_enable_l2_tlb_flush;
 
 	} else {
 		enlightened_vmcs = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3839e648ab3..d620c56bc526 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4365,7 +4365,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 			kvm_x86_ops.nested_ops->get_state(NULL, NULL, 0) : 0;
 		break;
 	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
-		r = kvm_x86_ops.enable_direct_tlbflush != NULL;
+		r = kvm_x86_ops.enable_l2_tlb_flush != NULL;
 		break;
 	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
 		r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
@@ -5275,10 +5275,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		}
 		return r;
 	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
-		if (!kvm_x86_ops.enable_direct_tlbflush)
+		if (!kvm_x86_ops.enable_l2_tlb_flush)
 			return -ENOTTY;
 
-		return static_call(kvm_x86_enable_direct_tlbflush)(vcpu);
+		return static_call(kvm_x86_enable_l2_tlb_flush)(vcpu);
 
 	case KVM_CAP_HYPERV_ENFORCE_CPUID:
 		return kvm_hv_set_enforce_cpuid(vcpu, cap->args[0]);
-- 
2.35.1



* Re: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
@ 2022-04-25 15:47   ` Wei Liu
  2022-04-25 17:34     ` Michael Kelley (LINUX)
  2022-04-25 19:09   ` Christophe JAILLET
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 102+ messages in thread
From: Wei Liu @ 2022-04-25 15:47 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Michael Kelley
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson,
	Michael Kelley, Siddharth Chandrasekaran, linux-hyperv,
	linux-kernel, Wei Liu

On Thu, Apr 14, 2022 at 03:19:46PM +0200, Vitaly Kuznetsov wrote:
> It may not be clear where the magical '64' value used in
> __cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
> sparse bank number and the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
> 
> No functional change.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  include/asm-generic/hyperv-tlfs.h |  5 +++++
>  include/asm-generic/mshyperv.h    | 11 ++++++-----
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
>  	u64 bank_contents[];
>  } __packed;
>  
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)

I think replacing the magic number with a macro is a good thing.

Where did you get these names? Did you make them up yourself?

I'm trying to dig into internal code to find the most appropriate names,
but I haven't found any so far. Michael, do you have insight here?

Thanks,
Wei.


* RE: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-25 15:47   ` Wei Liu
@ 2022-04-25 17:34     ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Kelley (LINUX) @ 2022-04-25 17:34 UTC (permalink / raw)
  To: Wei Liu, vkuznets
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

From: Wei Liu <wei.liu@kernel.org> Sent: Monday, April 25, 2022 8:47 AM

> On Thu, Apr 14, 2022 at 03:19:46PM +0200, Vitaly Kuznetsov wrote:
> > It may not be clear where the magical '64' value used in
> > __cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
> > sparse bank number and the number of vCPUs per bank. Add defines
> > to make things clear. These defines are also going to be used by KVM.
> >
> > No functional change.
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> >  include/asm-generic/hyperv-tlfs.h |  5 +++++
> >  include/asm-generic/mshyperv.h    | 11 ++++++-----
> >  2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> > index fdce7a4cfc6f..020ca9bdbb79 100644
> > --- a/include/asm-generic/hyperv-tlfs.h
> > +++ b/include/asm-generic/hyperv-tlfs.h
> > @@ -399,6 +399,11 @@ struct hv_vpset {
> >  	u64 bank_contents[];
> >  } __packed;
> >
> > +/* The maximum number of sparse vCPU banks which can be encoded by 'struct
> hv_vpset' */
> > +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> > +/* The number of vCPUs in one sparse bank */
> > +#define HV_VCPUS_PER_SPARSE_BANK (64)
> 
> I think replacing the magic number with a macro is a good thing.
> 
> Where do you get these names? Did you make them up yourself?
> 
> I'm trying to dig into internal code to find the most appropriate names,
> but I couldn't find any so far. Michael, do you have insight here?
> 
> Thanks,
> Wei.

These names look good to me.  The "sparse" and "bank" terminology
comes from the Hyper-V TLFS, sections 7.8.7.3 through 7.8.7.5.  The TLFS
uses the constant "64", but for two different purposes, as Vitaly
points out.  In both cases the "64" derives from the use of
a uint64 value as a bitmap.
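
To illustrate (just a sketch, not code from this patch; the helper name
vpset_set_vp() is made up): both limits fall out of treating a u64 as a
bitmap, one bit per bank in 'valid_bank_mask' and one bit per vCPU in
each bank_contents[] entry.

	static void vpset_set_vp(struct hv_vpset *vpset, u32 vp_index)
	{
		u32 bank = vp_index / HV_VCPUS_PER_SPARSE_BANK;	/* index into bank_contents[] */
		u32 bit  = vp_index % HV_VCPUS_PER_SPARSE_BANK;	/* bit within that 64-bit bank */

		vpset->valid_bank_mask |= BIT_ULL(bank);	/* at most HV_MAX_SPARSE_VCPU_BANKS bits */
		vpset->bank_contents[bank] |= BIT_ULL(bit);	/* HV_VCPUS_PER_SPARSE_BANK vCPUs per bank */
	}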

Michael


* Re: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
  2022-04-25 15:47   ` Wei Liu
@ 2022-04-25 19:09   ` Christophe JAILLET
  2022-04-25 19:16   ` Christophe JAILLET
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 102+ messages in thread
From: Christophe JAILLET @ 2022-04-25 19:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Hi,

Le 14/04/2022 à 15:19, Vitaly Kuznetsov a écrit :
> It may not be clear where the magical '64' value used in
> __cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
> sparse bank number and the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
> 
> No functional change.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   include/asm-generic/hyperv-tlfs.h |  5 +++++
>   include/asm-generic/mshyperv.h    | 11 ++++++-----
>   2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
>   	u64 bank_contents[];
>   } __packed;
>   
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
> +
>   /* HvCallSendSyntheticClusterIpi hypercall */
>   struct hv_send_ipi {
>   	u32 vector;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c08758b6b364..0abe91df1ef6 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>   {
>   	int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
>   	int this_cpu = smp_processor_id();
> +	int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
>   
> -	/* valid_bank_mask can represent up to 64 banks */
> -	if (hv_max_vp_index / 64 >= 64)
> +	/* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
> +	if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
>   		return 0;
>   
>   	/*
> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>   	 * structs are not cleared between calls, we risk flushing unneeded
>   	 * vCPUs otherwise.
>   	 */
> -	for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
> +	for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
>   		vpset->bank_contents[vcpu_bank] = 0;
>   
>   	/*
> @@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>   		vcpu = hv_cpu_number_to_vp_number(cpu);
>   		if (vcpu == VP_INVAL)
>   			return -1;
> -		vcpu_bank = vcpu / 64;
> -		vcpu_offset = vcpu % 64;
> +		vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
> +		vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
>   		__set_bit(vcpu_offset, (unsigned long *)
>   			  &vpset->bank_contents[vcpu_bank]);

Here, we could also directly use:
	__set_bit(vcpu, vpset->bank_contents);

This is simpler, more readable (IMHO) and also makes 'vcpu_offset' unnecessary.
And in case gcc is not able to optimize this by itself, it should also
save a few cycles.
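
For illustration, the loop body could then be reduced to something like
this (only a sketch; it keeps the cast to 'unsigned long *' the existing
code already relies on and assumes a 64-bit host, and 'vcpu_bank' stays
because it is still needed for 'nr_bank'):

	vcpu = hv_cpu_number_to_vp_number(cpu);
	if (vcpu == VP_INVAL)
		return -1;
	vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
	__set_bit(vcpu, (unsigned long *)vpset->bank_contents);
	if (vcpu_bank >= nr_bank)
		nr_bank = vcpu_bank + 1;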

Just my 2c,
CJ

>   		if (vcpu_bank >= nr_bank)



* Re: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
  2022-04-25 15:47   ` Wei Liu
  2022-04-25 19:09   ` Christophe JAILLET
@ 2022-04-25 19:16   ` Christophe JAILLET
  2022-05-03 14:59     ` Vitaly Kuznetsov
  2022-05-03 11:11   ` Wei Liu
  2022-05-11 11:23   ` Maxim Levitsky
  4 siblings, 1 reply; 102+ messages in thread
From: Christophe JAILLET @ 2022-04-25 19:16 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Le 14/04/2022 à 15:19, Vitaly Kuznetsov a écrit :
> It may not be clear where the magical '64' value used in
> __cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
> sparse bank number and the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
> 
> No functional change.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   include/asm-generic/hyperv-tlfs.h |  5 +++++
>   include/asm-generic/mshyperv.h    | 11 ++++++-----
>   2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
>   	u64 bank_contents[];
>   } __packed;
>   
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
> +
>   /* HvCallSendSyntheticClusterIpi hypercall */
>   struct hv_send_ipi {
>   	u32 vector;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c08758b6b364..0abe91df1ef6 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>   {
>   	int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
>   	int this_cpu = smp_processor_id();
> +	int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
>   
> -	/* valid_bank_mask can represent up to 64 banks */
> -	if (hv_max_vp_index / 64 >= 64)
> +	/* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
> +	if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
>   		return 0;
>   
>   	/*
> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>   	 * structs are not cleared between calls, we risk flushing unneeded
>   	 * vCPUs otherwise.
>   	 */
> -	for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
> +	for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
>   		vpset->bank_contents[vcpu_bank] = 0;

and here:
	bitmap_clear(vpset->bank_contents, 0, hv_max_vp_index);
or maybe even, if it is safe to do so:
	bitmap_zero(vpset->bank_contents, hv_max_vp_index);

CJ

>   
>   	/*
> @@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>   		vcpu = hv_cpu_number_to_vp_number(cpu);
>   		if (vcpu == VP_INVAL)
>   			return -1;
> -		vcpu_bank = vcpu / 64;
> -		vcpu_offset = vcpu % 64;
> +		vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
> +		vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
>   		__set_bit(vcpu_offset, (unsigned long *)
>   			  &vpset->bank_contents[vcpu_bank]);
>   		if (vcpu_bank >= nr_bank)



* Re: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
                     ` (2 preceding siblings ...)
  2022-04-25 19:16   ` Christophe JAILLET
@ 2022-05-03 11:11   ` Wei Liu
  2022-05-11 11:23   ` Maxim Levitsky
  4 siblings, 0 replies; 102+ messages in thread
From: Wei Liu @ 2022-05-03 11:11 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson,
	Michael Kelley, Siddharth Chandrasekaran, linux-hyperv,
	linux-kernel, Wei Liu

On Thu, Apr 14, 2022 at 03:19:46PM +0200, Vitaly Kuznetsov wrote:
> It may not be clear where the magical '64' value used in
> __cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
> sparse bank number and the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
> 
> No functional change.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Acked-by: Wei Liu <wei.liu@kernel.org>


* Re: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-25 19:16   ` Christophe JAILLET
@ 2022-05-03 14:59     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-03 14:59 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Christophe JAILLET <christophe.jaillet@wanadoo.fr> writes:

> Le 14/04/2022 à 15:19, Vitaly Kuznetsov a écrit :

...

>> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>>   	 * structs are not cleared between calls, we risk flushing unneeded
>>   	 * vCPUs otherwise.
>>   	 */
>> -	for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
>> +	for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
>>   		vpset->bank_contents[vcpu_bank] = 0;
>
> and here:
> 	bitmap_clear(vpset->bank_contents, 0, hv_max_vp_index);
> or maybe even if it is safe to do so:
> 	bitmap_zero(vpset->bank_contents, hv_max_vp_index);

Both your suggestions (including the one for "[PATCH v3 07/34]") look
good to me, thanks! However, I'd want to send them to linux-hyperv@
separately once this series lands through the KVM tree, just to avoid
making this heavy series even heavier.

-- 
Vitaly



* Re: [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature
  2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
                   ` (33 preceding siblings ...)
  2022-04-14 13:20 ` [PATCH v3 34/34] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush' Vitaly Kuznetsov
@ 2022-05-03 15:01 ` Vitaly Kuznetsov
  34 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-03 15:01 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Changes since v1:

This should've been 'since v2', obviously.

...

>
> Currently, KVM handles HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests
> by flushing the whole VPID and this is sub-optimal. This series introduces
> the required mechanism to make handling of these requests more 
> fine-grained by flushing individual GVAs only (when requested). On this
> foundation, "Direct Virtual Flush" Hyper-V feature is implemented. The 
> feature allows L0 to handle Hyper-V TLB flush hypercalls directly at
> L0 without the need to reflect the exit to L1. This has at least two
> benefits: reflecting vmexit and the consequent vmenter are avoided + L0
> has precise information whether the target vCPU is actually running (and
> thus requires a kick).

FWIW, the patches still apply cleanly to kvm/queue, so there's probably no
need to resend.

-- 
Vitaly



* Re: [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
  2022-04-14 13:19 ` [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag Vitaly Kuznetsov
@ 2022-05-11 11:18   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> In preparation to implementing fine-grained Hyper-V TLB flush and
> L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
> KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH
> request in kvm_service_local_tlb_flush_requests() when
> KVM_REQ_TLB_FLUSH_GUEST was also requested.
> 
> No functional change intended.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 2 ++
>  arch/x86/kvm/hyperv.c           | 4 ++--
>  arch/x86/kvm/x86.c              | 6 +++++-
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c20f715f009..1de3ad9308d8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -105,6 +105,8 @@
>  	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
>  	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> +#define KVM_REQ_HV_TLB_FLUSH \
> +	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  
>  #define CR0_RESERVED_BITS                                               \
>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 46f9dfb60469..b402ad059eb9 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1876,11 +1876,11 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	 * analyze it here, flush TLB regardless of the specified address space.
>  	 */
>  	if (all_cpus) {
> -		kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
> +		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
>  	} else {
>  		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
>  
> -		kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST, vcpu_mask);
> +		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
>  	}
>  
>  ret_success:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ab336f7c82e4..f633cff8cd7f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3360,8 +3360,12 @@ void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>  	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>  		kvm_vcpu_flush_tlb_current(vcpu);
>  
> -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
> +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
>  		kvm_vcpu_flush_tlb_guest(vcpu);
> +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> +	} else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
> +		kvm_vcpu_flush_tlb_guest(vcpu);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring
  2022-04-14 13:19 ` [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring Vitaly Kuznetsov
@ 2022-05-11 11:19   ` Maxim Levitsky
  2022-05-16 14:29     ` Vitaly Kuznetsov
  2022-05-16 19:34   ` Sean Christopherson
  1 sibling, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To allow flushing individual GVAs instead of always flushing the whole
> VPID, a per-vCPU structure to pass the requests is needed. Introduce a
> simple ring write-locked structure to hold two types of entries:
> individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits)
> and 'flush all'.
> 
> The queuing rule is: if there's not enough space on the ring to put
> the request and leave at least 1 entry for 'flush all' - put 'flush
> all' entry.
> 
> The size of the ring is arbitrarily set to '16'.
> 
> Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now, so
> there's only a very small functional change, but the infrastructure is
> prepared to handle individual GVA flush requests.

As I see from this patch, the code also doesn't process the requests
from the ring buffer yet, but rather just ignores them completely
and resets the whole ring buffer (kvm_hv_vcpu_empty_flush_tlb).
Maybe you should mention that here.


> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 16 +++++++
>  arch/x86/kvm/hyperv.c           | 83 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/hyperv.h           | 13 ++++++
>  arch/x86/kvm/x86.c              |  5 +-
>  arch/x86/kvm/x86.h              |  1 +
>  5 files changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1de3ad9308d8..b4dd2ff61658 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -578,6 +578,20 @@ struct kvm_vcpu_hv_synic {
>  	bool dont_zero_synic_pages;
>  };
>  
> +#define KVM_HV_TLB_FLUSH_RING_SIZE (16)
> +
> +struct kvm_vcpu_hv_tlb_flush_entry {
> +	u64 addr;
> +	u64 flush_all:1;
> +	u64 pad:63;
> +};

Have you considered using the kfifo.h library instead?
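
Just for illustration, a kfifo-based variant might look roughly like the
sketch below (my assumption only, not something this patch implements;
'tlb_flush_fifo' and 'flush_one()' are made-up names and the entries stay
plain u64 "GVA + count" values):

	#include <linux/kfifo.h>

	struct kvm_vcpu_hv_tlb_flush_fifo {
		spinlock_t write_lock;
		DECLARE_KFIFO(entries, u64, KVM_HV_TLB_FLUSH_RING_SIZE);
	};

	/* at vCPU init time: INIT_KFIFO(hv_vcpu->tlb_flush_fifo.entries); */

	/* producer side, under write_lock: */
	kfifo_put(&hv_vcpu->tlb_flush_fifo.entries, entry);

	/* consumer side, on the target vCPU: */
	while (kfifo_get(&hv_vcpu->tlb_flush_fifo.entries, &entry))
		flush_one(entry);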

> +
> +struct kvm_vcpu_hv_tlb_flush_ring {
> +	int read_idx, write_idx;
> +	spinlock_t write_lock;
> +	struct kvm_vcpu_hv_tlb_flush_entry entries[KVM_HV_TLB_FLUSH_RING_SIZE];
> +};
> +
>  /* Hyper-V per vcpu emulation context */
>  struct kvm_vcpu_hv {
>  	struct kvm_vcpu *vcpu;
> @@ -597,6 +611,8 @@ struct kvm_vcpu_hv {
>  		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
>  		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
>  	} cpuid_cache;
> +
> +	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring;
>  };
>  
>  /* Xen HVM per vcpu emulation context */
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index b402ad059eb9..fb716cf919ed 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -29,6 +29,7 @@
>  #include <linux/kvm_host.h>
>  #include <linux/highmem.h>
>  #include <linux/sched/cputime.h>
> +#include <linux/spinlock.h>
>  #include <linux/eventfd.h>
>  
>  #include <asm/apicdef.h>
> @@ -954,6 +955,8 @@ static int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
>  
>  	hv_vcpu->vp_index = vcpu->vcpu_idx;
>  
> +	spin_lock_init(&hv_vcpu->tlb_flush_ring.write_lock);
> +
>  	return 0;
>  }
>  
> @@ -1789,6 +1792,74 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
>  			      var_cnt * sizeof(*sparse_banks));
>  }
>  
> +static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
> +					 int read_idx, int write_idx)
> +{
> +	if (write_idx >= read_idx)
> +		return KVM_HV_TLB_FLUSH_RING_SIZE - (write_idx - read_idx) - 1;
> +
> +	return read_idx - write_idx - 1;
> +}
> +
> +static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	int ring_free, write_idx, read_idx;
> +	unsigned long flags;
> +
> +	if (!hv_vcpu)
> +		return;
> +
> +	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
> +
> +	spin_lock_irqsave(&tlb_flush_ring->write_lock, flags);
> +
> +	/*
> +	 * 'read_idx' is updated by the vCPU which does the flush, this
> +	 * happens without 'tlb_flush_ring->write_lock' being held; make
> +	 * sure we read it once.
> +	 */
> +	read_idx = READ_ONCE(tlb_flush_ring->read_idx);
> +	/*
> +	 * 'write_idx' is only updated here, under 'tlb_flush_ring->write_lock'.
> +	 * allow the compiler to re-read it, it can't change.
> +	 */
> +	write_idx = tlb_flush_ring->write_idx;
> +
> +	ring_free = hv_tlb_flush_ring_free(hv_vcpu, read_idx, write_idx);
> +	/* Full ring always contains 'flush all' entry */
> +	if (!ring_free)
> +		goto out_unlock;
> +
> +	tlb_flush_ring->entries[write_idx].addr = 0;
> +	tlb_flush_ring->entries[write_idx].flush_all = 1;
> +	/*
> +	 * Advance write index only after filling in the entry to
> +	 * synchronize with lockless reader.
> +	 */
> +	smp_wmb();
> +	tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&tlb_flush_ring->write_lock, flags);
> +}
> +
> +void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> +	kvm_vcpu_flush_tlb_guest(vcpu);
> +
> +	if (!hv_vcpu)
> +		return;
> +
> +	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
> +
> +	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
> +}
> +
>  static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  {
>  	struct kvm *kvm = vcpu->kvm;
> @@ -1797,6 +1868,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
>  	u64 valid_bank_mask;
>  	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
> +	struct kvm_vcpu *v;
> +	unsigned long i;
>  	bool all_cpus;
>  
>  	/*
> @@ -1876,10 +1949,20 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	 * analyze it here, flush TLB regardless of the specified address space.
>  	 */
>  	if (all_cpus) {
> +		kvm_for_each_vcpu(i, v, kvm)
> +			hv_tlb_flush_ring_enqueue(v);
> +
>  		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
>  	} else {
>  		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
>  
> +		for_each_set_bit(i, vcpu_mask, KVM_MAX_VCPUS) {
> +			v = kvm_get_vcpu(kvm, i);
> +			if (!v)
> +				continue;
> +			hv_tlb_flush_ring_enqueue(v);
> +		}
> +
>  		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
>  	}
>  
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index da2737f2a956..6847caeaaf84 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -147,4 +147,17 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
>  int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>  		     struct kvm_cpuid_entry2 __user *entries);
>  
> +
> +static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> +	if (!hv_vcpu)
> +		return;
> +
> +	hv_vcpu->tlb_flush_ring.read_idx = hv_vcpu->tlb_flush_ring.write_idx;
> +}
> +void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
> +
> +
>  #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f633cff8cd7f..e5aec386d299 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3324,7 +3324,7 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
>  	static_call(kvm_x86_flush_tlb_all)(vcpu);
>  }
>  
> -static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.tlb_flush;
>  
> @@ -3362,7 +3362,8 @@ void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>  
>  	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
>  		kvm_vcpu_flush_tlb_guest(vcpu);
> -		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> +		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
> +			kvm_hv_vcpu_empty_flush_tlb(vcpu);
>  	} else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
>  		kvm_vcpu_flush_tlb_guest(vcpu);
>  	}
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 588792f00334..2324f496c500 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -58,6 +58,7 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
>  
>  #define MSR_IA32_CR_PAT_DEFAULT  0x0007040600070406ULL
>  
> +void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu);
>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
>  int kvm_check_nested_events(struct kvm_vcpu *vcpu);
>  


Overall it looks good to me. I might have missed something, though.

Best regards,
	Maxim Levitsky




* Re: [PATCH v3 03/34] KVM: x86: hyper-v: Add helper to read hypercall data for array
  2022-04-14 13:19 ` [PATCH v3 03/34] KVM: x86: hyper-v: Add helper to read hypercall data for array Vitaly Kuznetsov
@ 2022-05-11 11:20   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:20 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
> reading a guest-provided array can be reused in the future, e.g. for
> getting a list of virtual addresses whose TLB entries need to be flushed.
> 
> Opportunistically swap the order of the data and XMM adjustment so that
> the XMM/gpa offsets are bundled together.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 53 +++++++++++++++++++++++++++----------------
>  1 file changed, 33 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index fb716cf919ed..d66c27fd1e8a 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1758,38 +1758,51 @@ struct kvm_hv_hcall {
>  	sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
>  };
>  
> -static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
> -				 int consumed_xmm_halves,
> -				 u64 *sparse_banks, gpa_t offset)
> -{
> -	u16 var_cnt;
> -	int i;
>  
> -	if (hc->var_cnt > 64)
> -		return -EINVAL;
> -
> -	/* Ignore banks that cannot possibly contain a legal VP index. */
> -	var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
> +static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
> +			      u16 orig_cnt, u16 cnt_cap, u64 *data,
> +			      int consumed_xmm_halves, gpa_t offset)
> +{
> +	/*
> +	 * Preserve the original count when ignoring entries via a "cap", KVM
> +	 * still needs to validate the guest input (though the non-XMM path
> +	 * punts on the checks).
> +	 */
> +	u16 cnt = min(orig_cnt, cnt_cap);
> +	int i, j;
>  
>  	if (hc->fast) {
>  		/*
>  		 * Each XMM holds two sparse banks, but do not count halves that
>  		 * have already been consumed for hypercall parameters.
>  		 */
> -		if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
> +		if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
>  			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> -		for (i = 0; i < var_cnt; i++) {
> -			int j = i + consumed_xmm_halves;
> +
> +		for (i = 0; i < cnt; i++) {
> +			j = i + consumed_xmm_halves;
>  			if (j % 2)
> -				sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
> +				data[i] = sse128_hi(hc->xmm[j / 2]);
>  			else
> -				sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
> +				data[i] = sse128_lo(hc->xmm[j / 2]);
>  		}
>  		return 0;
>  	}
>  
> -	return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
> -			      var_cnt * sizeof(*sparse_banks));
> +	return kvm_read_guest(kvm, hc->ingpa + offset, data,
> +			      cnt * sizeof(*data));
> +}
> +
> +static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
> +				 u64 *sparse_banks, int consumed_xmm_halves,
> +				 gpa_t offset)
> +{
> +	if (hc->var_cnt > 64)
> +		return -EINVAL;
> +
> +	/* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
> +	return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS,
> +				  sparse_banks, consumed_xmm_halves, offset);
>  }
>  
>  static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
> @@ -1937,7 +1950,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  		if (!hc->var_cnt)
>  			goto ret_success;
>  
> -		if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
> +		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
>  					  offsetof(struct hv_tlb_flush_ex,
>  						   hv_vp_set.bank_contents)))
>  			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> @@ -2048,7 +2061,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  		if (!hc->var_cnt)
>  			goto ret_success;
>  
> -		if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
> +		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1,
>  					  offsetof(struct hv_send_ipi_ex,
>  						   vp_set.bank_contents)))
>  			return HV_STATUS_INVALID_HYPERCALL_INPUT;

I don't see anything wrong, but I don't know this area that well, so I might have
missed something.

Best regards,
	Maxim Levitsky




* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-04-14 13:19 ` [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Vitaly Kuznetsov
@ 2022-05-11 11:22   ` Maxim Levitsky
  2022-05-18  9:39     ` Vitaly Kuznetsov
  2022-05-16 19:41   ` Sean Christopherson
  1 sibling, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:22 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
> the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
> flushing the whole VPID and this is sub-optimal. Switch to handling
> these requests with 'flush_tlb_gva()' hooks instead. Use the newly
> introduced TLB flush ring to queue the requests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 132 ++++++++++++++++++++++++++++++++++++------
>  1 file changed, 115 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index d66c27fd1e8a..759e1a16e5c3 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1805,6 +1805,13 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
>  				  sparse_banks, consumed_xmm_halves, offset);
>  }
>  
> +static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[],
> +				       int consumed_xmm_halves, gpa_t offset)
> +{
> +	return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt,
> +				  entries, consumed_xmm_halves, offset);
> +}
> +
>  static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
>  					 int read_idx, int write_idx)
>  {
> @@ -1814,12 +1821,13 @@ static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
>  	return read_idx - write_idx - 1;
>  }
>  
> -static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
> +static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
>  {
>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>  	int ring_free, write_idx, read_idx;
>  	unsigned long flags;
> +	int i;
>  
>  	if (!hv_vcpu)
>  		return;
> @@ -1845,14 +1853,34 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
>  	if (!ring_free)
>  		goto out_unlock;
>  
> -	tlb_flush_ring->entries[write_idx].addr = 0;
> -	tlb_flush_ring->entries[write_idx].flush_all = 1;
>  	/*
> -	 * Advance write index only after filling in the entry to
> -	 * synchronize with lockless reader.
> +	 * All entries should fit on the ring leaving one free for 'flush all'
> +	 * entry in case another request comes in. In case there's not enough
> +	 * space, just put 'flush all' entry there.
> +	 */
> +	if (!count || count >= ring_free - 1 || !entries) {
> +		tlb_flush_ring->entries[write_idx].addr = 0;
> +		tlb_flush_ring->entries[write_idx].flush_all = 1;
> +		/*
> +		 * Advance write index only after filling in the entry to
> +		 * synchronize with lockless reader.
> +		 */
> +		smp_wmb();
> +		tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
> +		goto out_unlock;
> +	}
> +
> +	for (i = 0; i < count; i++) {
> +		tlb_flush_ring->entries[write_idx].addr = entries[i];
> +		tlb_flush_ring->entries[write_idx].flush_all = 0;
> +		write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
> +	}
> +	/*
> +	 * Advance write index only after filling in the entry to synchronize
> +	 * with lockless reader.
>  	 */
>  	smp_wmb();
> -	tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
> +	tlb_flush_ring->write_idx = write_idx;
>  
>  out_unlock:
>  	spin_unlock_irqrestore(&tlb_flush_ring->write_lock, flags);
> @@ -1862,15 +1890,58 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	struct kvm_vcpu_hv_tlb_flush_entry *entry;
> +	int read_idx, write_idx;
> +	u64 address;
> +	u32 count;
> +	int i, j;
>  
> -	kvm_vcpu_flush_tlb_guest(vcpu);
> -
> -	if (!hv_vcpu)
> +	if (!tdp_enabled || !hv_vcpu) {
> +		kvm_vcpu_flush_tlb_guest(vcpu);
>  		return;
> +	}
>  
>  	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
>  
> -	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
> +	/*
> +	 * TLB flush must be performed on the target vCPU so 'read_idx'
> +	 * (AKA 'tail') cannot change underneath, the compiler is free
> +	 * to re-read it.
> +	 */
> +	read_idx = tlb_flush_ring->read_idx;
> +
> +	/*
> +	 * 'write_idx' (AKA 'head') can be concurently updated by a different
> +	 * vCPU so we must be sure it's read once.
> +	 */
> +	write_idx = READ_ONCE(tlb_flush_ring->write_idx);
> +
> +	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
> +	smp_rmb();
> +
> +	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
> +		entry = &tlb_flush_ring->entries[i];
> +
> +		if (entry->flush_all)
> +			goto out_flush_all;

I have an idea: instead of a special 'flush all' entry in the ring,
just keep a boolean alongside the ring.

The ring buffer entries would also be half the size, since they wouldn't
need to carry the 'flush all' flag.

This would allow flushing everything and discarding the ring whenever that
boolean is set, skipping the enqueue entirely when the boolean is already
set, and there would be no need to reserve extra space in the ring for
that entry, etc.

Or, if kfifo is used, the ring could contain plain u64 items, which is even
more natural.
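
A rough sketch of what I mean (purely illustrative, not what this patch
does; the field and variable names are made up):

	struct kvm_vcpu_hv_tlb_flush_ring {
		int read_idx, write_idx;
		spinlock_t write_lock;
		bool flush_all;	/* replaces the special 'flush all' entry */
		u64 entries[KVM_HV_TLB_FLUSH_RING_SIZE]; /* GVA + extra page count in low 12 bits */
	};

	/* enqueue side, under write_lock: */
	if (ring->flush_all)
		goto out_unlock;	/* everything will be flushed anyway */
	if (ring_free < count) {
		ring->flush_all = true;	/* no room left, degrade to a full flush */
		goto out_unlock;
	}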


> +
> +		/*
> +		 * Lower 12 bits of 'address' encode the number of additional
> +		 * pages to flush.
> +		 */
> +		address = entry->addr & PAGE_MASK;
> +		count = (entry->addr & ~PAGE_MASK) + 1;
> +		for (j = 0; j < count; j++)
> +			static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
> +	}
> +	++vcpu->stat.tlb_flush;
> +	goto out_empty_ring;
> +
> +out_flush_all:
> +	kvm_vcpu_flush_tlb_guest(vcpu);
> +
> +out_empty_ring:
> +	tlb_flush_ring->read_idx = write_idx;
>  }
>  
>  static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> @@ -1879,11 +1950,22 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	struct hv_tlb_flush_ex flush_ex;
>  	struct hv_tlb_flush flush;
>  	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
> +	/*
> +	 * Normally, there can be no more than 'KVM_HV_TLB_FLUSH_RING_SIZE - 1'
> +	 * entries on the TLB Flush ring as when 'read_idx == write_idx' the
> +	 * ring is considered as empty. The last entry on the ring, however,
> +	 * needs to be always left free for 'flush all' entry which gets placed
> +	 * when there is not enough space to put all the requested entries.
> +	 */
> +	u64 __tlb_flush_entries[KVM_HV_TLB_FLUSH_RING_SIZE - 2];
> +	u64 *tlb_flush_entries;
>  	u64 valid_bank_mask;
>  	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
>  	struct kvm_vcpu *v;
>  	unsigned long i;
>  	bool all_cpus;
> +	int consumed_xmm_halves = 0;
> +	gpa_t data_offset;
>  
>  	/*
>  	 * The Hyper-V TLFS doesn't allow more than 64 sparse banks, e.g. the
> @@ -1899,10 +1981,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  			flush.address_space = hc->ingpa;
>  			flush.flags = hc->outgpa;
>  			flush.processor_mask = sse128_lo(hc->xmm[0]);
> +			consumed_xmm_halves = 1;
>  		} else {
>  			if (unlikely(kvm_read_guest(kvm, hc->ingpa,
>  						    &flush, sizeof(flush))))
>  				return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +			data_offset = sizeof(flush);
>  		}
>  
>  		trace_kvm_hv_flush_tlb(flush.processor_mask,
> @@ -1926,10 +2010,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  			flush_ex.flags = hc->outgpa;
>  			memcpy(&flush_ex.hv_vp_set,
>  			       &hc->xmm[0], sizeof(hc->xmm[0]));
> +			consumed_xmm_halves = 2;
>  		} else {
>  			if (unlikely(kvm_read_guest(kvm, hc->ingpa, &flush_ex,
>  						    sizeof(flush_ex))))
>  				return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +			data_offset = sizeof(flush_ex);
>  		}
>  
>  		trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
> @@ -1945,25 +2031,37 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  			return HV_STATUS_INVALID_HYPERCALL_INPUT;
>  
>  		if (all_cpus)
> -			goto do_flush;
> +			goto read_flush_entries;
>  
>  		if (!hc->var_cnt)
>  			goto ret_success;
>  
> -		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
> -					  offsetof(struct hv_tlb_flush_ex,
> -						   hv_vp_set.bank_contents)))
> +		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, consumed_xmm_halves,
> +					  data_offset))
> +			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +		data_offset += hc->var_cnt * sizeof(sparse_banks[0]);
> +		consumed_xmm_halves += hc->var_cnt;
> +	}
> +
> +read_flush_entries:
> +	if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
> +	    hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
> +	    hc->rep_cnt > ARRAY_SIZE(__tlb_flush_entries)) {
> +		tlb_flush_entries = NULL;
> +	} else {
> +		if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries,
> +						consumed_xmm_halves, data_offset))
>  			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +		tlb_flush_entries = __tlb_flush_entries;
>  	}
>  
> -do_flush:
>  	/*
>  	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
>  	 * analyze it here, flush TLB regardless of the specified address space.
>  	 */
>  	if (all_cpus) {
>  		kvm_for_each_vcpu(i, v, kvm)
> -			hv_tlb_flush_ring_enqueue(v);
> +			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
>  
>  		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
>  	} else {
> @@ -1973,7 +2071,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  			v = kvm_get_vcpu(kvm, i);
>  			if (!v)
>  				continue;
> -			hv_tlb_flush_ring_enqueue(v);
> +			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
>  		}
>  
>  		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);


Overall the code looks good to me, but I haven't checked it closely, so
I might have missed some simple bugs like off-by-one errors here and there.

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 05/34] KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls
  2022-04-14 13:19 ` [PATCH v3 05/34] KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls Vitaly Kuznetsov
@ 2022-05-11 11:23   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Extended GVA ranges support bit seems to indicate whether lower 12
> bits of GVA can be used to specify up to 4095 additional consequent
> GVAs to flush. This is somewhat described in TLFS.
> 
> Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
> requests by flushing the whole VPID so technically, extended GVA
> ranges were already supported. As such requests are handled more
> gently now, advertizing support for extended ranges starts making
> sense to reduce the size of TLB flush requests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 2 ++
>  arch/x86/kvm/hyperv.c              | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 0a9407dc0859..5225a85c08c3 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -61,6 +61,8 @@
>  #define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE		BIT(10)
>  /* Support for debug MSRs available */
>  #define HV_FEATURE_DEBUG_MSRS_AVAILABLE			BIT(11)
> +/* Support for extended gva ranges for flush hypercalls available */
> +#define HV_FEATURE_EXT_GVA_RANGES_FLUSH			BIT(14)
>  /*
>   * Support for returning hypercall output block via XMM
>   * registers is available
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 759e1a16e5c3..1a6f9628cee9 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2702,6 +2702,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>  			ent->ebx |= HV_DEBUGGING;
>  			ent->edx |= HV_X64_GUEST_DEBUGGING_AVAILABLE;
>  			ent->edx |= HV_FEATURE_DEBUG_MSRS_AVAILABLE;
> +			ent->edx |= HV_FEATURE_EXT_GVA_RANGES_FLUSH;
>  
>  			/*
>  			 * Direct Synthetic timers only make sense with in-kernel


I do think that we need to ask Microsoft to document this, since the
only mention of it in the spec (v6.0b) is

"Bit 14: ExtendedGvaRangesForFlushVirtualAddressListAvailable"


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
  2022-04-14 13:19 ` [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs Vitaly Kuznetsov
@ 2022-05-11 11:23   ` Maxim Levitsky
  2022-05-11 11:23   ` Maxim Levitsky
  1 sibling, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To handle L2 TLB flush requests, KVM needs to translate the specified
> L2 GPA to L1 GPA to read hypercall arguments from there.
> 
> No fucntional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.
   ^ typo
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1a6f9628cee9..fc4bb0ead9fa 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -23,6 +23,7 @@
>  #include "ioapic.h"
>  #include "cpuid.h"
>  #include "hyperv.h"
> +#include "mmu.h"
>  #include "xen.h"
>  
>  #include <linux/cpu.h>
> @@ -1975,6 +1976,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	 */
>  	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
>  
> +	if (!hc->fast && is_guest_mode(vcpu)) {
> +		hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);
> +		if (unlikely(hc->ingpa == UNMAPPED_GVA))
> +			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +	}
> +
>  	if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
>  	    hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE) {
>  		if (hc->fast) {


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
  2022-04-14 13:19 ` [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs Vitaly Kuznetsov
  2022-05-11 11:23   ` Maxim Levitsky
@ 2022-05-11 11:23   ` Maxim Levitsky
  1 sibling, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To handle L2 TLB flush requests, KVM needs to translate the specified
> L2 GPA to L1 GPA to read hypercall arguments from there.
> 
> No fucntional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.
   ^ typo
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1a6f9628cee9..fc4bb0ead9fa 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -23,6 +23,7 @@
>  #include "ioapic.h"
>  #include "cpuid.h"
>  #include "hyperv.h"
> +#include "mmu.h"
>  #include "xen.h"
>  
>  #include <linux/cpu.h>
> @@ -1975,6 +1976,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	 */
>  	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
>  
> +	if (!hc->fast && is_guest_mode(vcpu)) {
> +		hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);
> +		if (unlikely(hc->ingpa == UNMAPPED_GVA))
> +			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +	}
> +
>  	if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
>  	    hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE) {
>  		if (hc->fast) {


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
  2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
                     ` (3 preceding siblings ...)
  2022-05-03 11:11   ` Wei Liu
@ 2022-05-11 11:23   ` Maxim Levitsky
  4 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> It may not come clear from where the magical '64' value used in
> __cpumask_to_vpset() come from. Moreover, '64' means both the maximum
> sparse bank number as well as the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
> 
> No functional change.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  include/asm-generic/hyperv-tlfs.h |  5 +++++
>  include/asm-generic/mshyperv.h    | 11 ++++++-----
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
>  	u64 bank_contents[];
>  } __packed;
>  
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
> +
>  /* HvCallSendSyntheticClusterIpi hypercall */
>  struct hv_send_ipi {
>  	u32 vector;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c08758b6b364..0abe91df1ef6 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>  {
>  	int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
>  	int this_cpu = smp_processor_id();
> +	int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
>  
> -	/* valid_bank_mask can represent up to 64 banks */
> -	if (hv_max_vp_index / 64 >= 64)
> +	/* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
> +	if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
>  		return 0;
>  
>  	/*
> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>  	 * structs are not cleared between calls, we risk flushing unneeded
>  	 * vCPUs otherwise.
>  	 */
> -	for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
> +	for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
>  		vpset->bank_contents[vcpu_bank] = 0;
>  
>  	/*
> @@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>  		vcpu = hv_cpu_number_to_vp_number(cpu);
>  		if (vcpu == VP_INVAL)
>  			return -1;
> -		vcpu_bank = vcpu / 64;
> -		vcpu_offset = vcpu % 64;
> +		vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
> +		vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
>  		__set_bit(vcpu_offset, (unsigned long *)
>  			  &vpset->bank_contents[vcpu_bank]);
>  		if (vcpu_bank >= nr_bank)
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 08/34] KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64'
  2022-04-14 13:19 ` [PATCH v3 08/34] KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64' Vitaly Kuznetsov
@ 2022-05-11 11:24   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> It may not be clear from where the '64' limit for the maximum sparse
> bank number comes from, use HV_MAX_SPARSE_VCPU_BANKS define instead.
> Use HV_VCPUS_PER_SPARSE_BANK in KVM_HV_MAX_SPARSE_VCPU_SET_BITS's
> definition. Opportunistically adjust the comment around BUILD_BUG_ON().
> 
> No functional change.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index fc4bb0ead9fa..3cf68645a2e6 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -43,7 +43,7 @@
>  /* "Hv#1" signature */
>  #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
>  
> -#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
> +#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, HV_VCPUS_PER_SPARSE_BANK)
>  
>  static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
>  				bool vcpu_kick);
> @@ -1798,7 +1798,7 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
>  				 u64 *sparse_banks, int consumed_xmm_halves,
>  				 gpa_t offset)
>  {
> -	if (hc->var_cnt > 64)
> +	if (hc->var_cnt > HV_MAX_SPARSE_VCPU_BANKS)
>  		return -EINVAL;
>  
>  	/* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
> @@ -1969,12 +1969,11 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	gpa_t data_offset;
>  
>  	/*
> -	 * The Hyper-V TLFS doesn't allow more than 64 sparse banks, e.g. the
> -	 * valid mask is a u64.  Fail the build if KVM's max allowed number of
> -	 * vCPUs (>4096) would exceed this limit, KVM will additional changes
> -	 * for Hyper-V support to avoid setting the guest up to fail.
> +	 * The Hyper-V TLFS doesn't allow more than HV_MAX_SPARSE_VCPU_BANKS
> +	 * sparse banks. Fail the build if KVM's max allowed number of
> +	 * vCPUs (>4096) exceeds this limit.
>  	 */
> -	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
> +	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > HV_MAX_SPARSE_VCPU_BANKS);
>  
>  	if (!hc->fast && is_guest_mode(vcpu)) {
>  		hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()
  2022-04-14 13:19 ` [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi() Vitaly Kuznetsov
@ 2022-05-11 11:24   ` Maxim Levitsky
  2022-05-16 19:52     ` Sean Christopherson
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
> for a smaller number of vCPUs in the request. When Hyper-V TLB flush
> is in  use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
> send IPIs to a large number of vCPUs (and are rarely used in general).
> 
> Introduce hv_is_vp_in_sparse_set() to directly check if the specified
> VP_ID is present in sparse vCPU set.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 37 ++++++++++++++++++++++++++-----------
>  1 file changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 3cf68645a2e6..aebbb598ad1d 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1746,6 +1746,25 @@ static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
>  	}
>  }
>  
> +static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_banks[])
> +{
> +	int bank, sbank = 0;
> +
> +	if (!test_bit(vp_id / HV_VCPUS_PER_SPARSE_BANK,
> +		      (unsigned long *)&valid_bank_mask))
> +		return false;
> +
> +	for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
> +			 KVM_HV_MAX_SPARSE_VCPU_SET_BITS) {
> +		if (bank == vp_id / HV_VCPUS_PER_SPARSE_BANK)
> +			break;
> +		sbank++;
> +	}
> +
> +	return test_bit(vp_id % HV_VCPUS_PER_SPARSE_BANK,
> +			(unsigned long *)&sparse_banks[sbank]);
> +}
> +
>  struct kvm_hv_hcall {
>  	u64 param;
>  	u64 ingpa;
> @@ -2089,8 +2108,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  		((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
>  }
>  
> -static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
> -				 unsigned long *vcpu_bitmap)
> +static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
> +				    u64 *sparse_banks, u64 valid_bank_mask)
I think the indentation is wrong here (was wrong before as well)


>  {
>  	struct kvm_lapic_irq irq = {
>  		.delivery_mode = APIC_DM_FIXED,
> @@ -2100,7 +2119,10 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
>  	unsigned long i;
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> -		if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
> +		if (sparse_banks &&
> +		    !hv_is_vp_in_sparse_set(kvm_hv_get_vpindex(vcpu),
> +					    valid_bank_mask,
> +					    sparse_banks))
>  			continue;
>  
>  		/* We fail only when APIC is disabled */
> @@ -2113,7 +2135,6 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	struct kvm *kvm = vcpu->kvm;
>  	struct hv_send_ipi_ex send_ipi_ex;
>  	struct hv_send_ipi send_ipi;
> -	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
>  	unsigned long valid_bank_mask;
>  	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
>  	u32 vector;
> @@ -2175,13 +2196,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
>  		return HV_STATUS_INVALID_HYPERCALL_INPUT;
>  
> -	if (all_cpus) {
> -		kvm_send_ipi_to_many(kvm, vector, NULL);
> -	} else {
> -		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
> -
> -		kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
> -	}
> +	kvm_hv_send_ipi_to_many(kvm, vector, all_cpus ? NULL : sparse_banks, valid_bank_mask);
>  
>  ret_success:
>  	return HV_STATUS_SUCCESS;


Overall looks good to me, but I might have missed something.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 10/34] KVM: x86: hyper-v: Create a separate ring for L2 TLB flush
  2022-04-14 13:19 ` [PATCH v3 10/34] KVM: x86: hyper-v: Create a separate ring for L2 TLB flush Vitaly Kuznetsov
@ 2022-05-11 11:24   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To handle L2 TLB flush requests, KVM needs to use a separate ring from
> regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush
> something in L2 is made, the target vCPU can transition from L2 to L1,
> receive a request to flush a GVA for L1 and then try to enter L2 back.
> The first request needs to be processed at this point. Similarly,
> requests to flush GVAs in L1 must wait until L2 exits to L1.
> 
> No functional change as KVM doesn't handle L2 TLB flush requests from
> L2 yet.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  8 +++++++-
>  arch/x86/kvm/hyperv.c           |  8 +++++---
>  arch/x86/kvm/hyperv.h           | 19 ++++++++++++++++---
>  3 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b4dd2ff61658..058061621872 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -580,6 +580,12 @@ struct kvm_vcpu_hv_synic {
>  
>  #define KVM_HV_TLB_FLUSH_RING_SIZE (16)
>  
> +enum hv_tlb_flush_rings {
> +	HV_L1_TLB_FLUSH_RING,
> +	HV_L2_TLB_FLUSH_RING,
> +	HV_NR_TLB_FLUSH_RINGS,
> +};
> +
>  struct kvm_vcpu_hv_tlb_flush_entry {
>  	u64 addr;
>  	u64 flush_all:1;
> @@ -612,7 +618,7 @@ struct kvm_vcpu_hv {
>  		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
>  	} cpuid_cache;
>  
> -	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring;
> +	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
>  };
>  
>  /* Xen HVM per vcpu emulation context */
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index aebbb598ad1d..1cef2b8f7001 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -956,7 +956,8 @@ static int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
>  
>  	hv_vcpu->vp_index = vcpu->vcpu_idx;
>  
> -	spin_lock_init(&hv_vcpu->tlb_flush_ring.write_lock);
> +	for (i = 0; i < HV_NR_TLB_FLUSH_RINGS; i++)
> +		spin_lock_init(&hv_vcpu->tlb_flush_ring[i].write_lock);
>  
>  	return 0;
>  }
> @@ -1852,7 +1853,8 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int c
>  	if (!hv_vcpu)
>  		return;
>  
> -	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
> +	/* kvm_hv_flush_tlb() is not ready to handle requests for L2s yet */
> +	tlb_flush_ring = &hv_vcpu->tlb_flush_ring[HV_L1_TLB_FLUSH_RING];
>  
>  	spin_lock_irqsave(&tlb_flush_ring->write_lock, flags);
>  
> @@ -1921,7 +1923,7 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  		return;
>  	}
>  
> -	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
> +	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
>  
>  	/*
>  	 * TLB flush must be performed on the target vCPU so 'read_idx'
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 6847caeaaf84..d59f96700104 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -22,6 +22,7 @@
>  #define __ARCH_X86_KVM_HYPERV_H__
>  
>  #include <linux/kvm_host.h>
> +#include "x86.h"
>  
>  /*
>   * The #defines related to the synthetic debugger are required by KDNet, but
> @@ -147,15 +148,27 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
>  int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>  		     struct kvm_cpuid_entry2 __user *entries);
>  
> +static inline struct kvm_vcpu_hv_tlb_flush_ring *kvm_hv_get_tlb_flush_ring(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	int i = !is_guest_mode(vcpu) ? HV_L1_TLB_FLUSH_RING :
> +		HV_L2_TLB_FLUSH_RING;
> +
> +	/* KVM does not handle L2 TLB flush requests yet */
> +	WARN_ON_ONCE(i != HV_L1_TLB_FLUSH_RING);
> +
> +	return &hv_vcpu->tlb_flush_ring[i];
> +}
>  
>  static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  
> -	if (!hv_vcpu)
> +	if (!to_hv_vcpu(vcpu))
>  		return;
>  
> -	hv_vcpu->tlb_flush_ring.read_idx = hv_vcpu->tlb_flush_ring.write_idx;
> +	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
> +	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>  }
>  void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
>  


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
  2022-04-14 13:19 ` [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' Vitaly Kuznetsov
@ 2022-05-11 11:25   ` Maxim Levitsky
  2022-05-16 20:05   ` Sean Christopherson
  1 sibling, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
> to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
> may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
> 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
> dynamically.
> 
> Note: sparse_set_to_vcpu_mask() keeps using on-stack allocation as it
> won't be used to handle L2 TLB flush requests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 3 +++
>  arch/x86/kvm/hyperv.c           | 6 ++++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 058061621872..837c07e213de 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -619,6 +619,9 @@ struct kvm_vcpu_hv {
>  	} cpuid_cache;
>  
>  	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
> +
> +	/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
> +	u64 sparse_banks[64];
>  };
>  
>  /* Xen HVM per vcpu emulation context */
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1cef2b8f7001..e9793d36acca 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1968,6 +1968,8 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  
>  static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  {
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	u64 *sparse_banks = hv_vcpu->sparse_banks;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct hv_tlb_flush_ex flush_ex;
>  	struct hv_tlb_flush flush;
> @@ -1982,7 +1984,6 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	u64 __tlb_flush_entries[KVM_HV_TLB_FLUSH_RING_SIZE - 2];
>  	u64 *tlb_flush_entries;
>  	u64 valid_bank_mask;
> -	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
>  	struct kvm_vcpu *v;
>  	unsigned long i;
>  	bool all_cpus;
> @@ -2134,11 +2135,12 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
>  
>  static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  {
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	u64 *sparse_banks = hv_vcpu->sparse_banks;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct hv_send_ipi_ex send_ipi_ex;
>  	struct hv_send_ipi send_ipi;
>  	unsigned long valid_bank_mask;
> -	u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
>  	u32 vector;
>  	bool all_cpus;
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 12/34] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use
  2022-04-14 13:19 ` [PATCH v3 12/34] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use Vitaly Kuznetsov
@ 2022-05-11 11:25   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/
> VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is
> also needed to handle post-flush exit to L1 upon request.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/vmx/nested.c       | 15 +++++++++++++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 837c07e213de..8b2a52bf26c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -622,6 +622,12 @@ struct kvm_vcpu_hv {
>  
>  	/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
>  	u64 sparse_banks[64];
> +
> +	struct {
> +		u64 pa_page_gpa;
> +		u64 vm_id;
> +		u32 vp_id;
> +	} nested;
>  };
>  
>  /* Xen HVM per vcpu emulation context */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index a6688663da4d..ee88921c6156 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -225,6 +225,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
>  
>  static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>  	if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
> @@ -233,6 +234,12 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
>  	}
>  
>  	vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
> +
> +	if (hv_vcpu) {
> +		hv_vcpu->nested.pa_page_gpa = INVALID_GPA;
> +		hv_vcpu->nested.vm_id = 0;
> +		hv_vcpu->nested.vp_id = 0;
> +	}
>  }
>  
>  static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
> @@ -1591,11 +1598,19 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
>  {
>  	struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
>  	struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(&vmx->vcpu);
>  
>  	/* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
>  	vmcs12->tpr_threshold = evmcs->tpr_threshold;
>  	vmcs12->guest_rip = evmcs->guest_rip;
>  
> +	if (unlikely(!(hv_clean_fields &
> +		       HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL))) {
> +		hv_vcpu->nested.pa_page_gpa = evmcs->partition_assist_page;
> +		hv_vcpu->nested.vm_id = evmcs->hv_vm_id;
> +		hv_vcpu->nested.vp_id = evmcs->hv_vp_id;
> +	}
> +
>  	if (unlikely(!(hv_clean_fields &
>  		       HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC))) {
>  		vmcs12->guest_rsp = evmcs->guest_rsp;

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
  2022-04-14 13:19 ` [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall() Vitaly Kuznetsov
@ 2022-05-11 11:25   ` Maxim Levitsky
  2022-05-16 20:09   ` Sean Christopherson
  1 sibling, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> The newly introduced helper checks whether vCPU is performing a
> Hyper-V TLB flush hypercall. This is required to filter out L2 TLB
> flush hypercalls for processing.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index d59f96700104..ca67c18cef2c 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -170,6 +170,24 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
>  	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
>  	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>  }
> +
> +static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	u16 code;
> +
> +	if (!hv_vcpu)
> +		return false;
> +
> +	code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read(vcpu) :
> +		kvm_rax_read(vcpu);
> +
> +	return (code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
> +		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
> +		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
> +		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX);
> +}
> +
>  void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
>  
>  

Looks OK, but my knowledge of the HV spec is limited, so I might have missed something.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
  2022-04-14 13:19 ` [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id Vitaly Kuznetsov
@ 2022-05-11 11:27   ` Maxim Levitsky
  2022-05-18 12:25     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition
> assist page address to handle L2 TLB flush requests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
>  arch/x86/kvm/svm/nested.c |  2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> index 7d6d97968fb9..8cf702fed7e5 100644
> --- a/arch/x86/kvm/svm/hyperv.h
> +++ b/arch/x86/kvm/svm/hyperv.h
> @@ -9,6 +9,7 @@
>  #include <asm/mshyperv.h>
>  
>  #include "../hyperv.h"
> +#include "svm.h"
>  
>  /*
>   * Hyper-V uses the software reserved 32 bytes in VMCB
> @@ -32,4 +33,19 @@ struct hv_enlightenments {
>   */
>  #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>  
> +static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct hv_enlightenments *hve =
> +		(struct hv_enlightenments *)svm->nested.ctl.reserved_sw;

Small nitpick:

Can we use this as an opportunity to rename 'reserved_sw' to
'hv_enlightenments' or something, because that is what it is?

Also, 'reserved_sw' is an array, which is confusing, since at first
glance it looks like we have a pointer dereference here.
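
E.g. something like this (untested sketch; per the comment in svm/hyperv.h
these are the 32 software-reserved VMCB bytes, so the sizes should still
line up, and I'm ignoring where the struct definition would have to live
to be visible there):

	/* In the control area definition, instead of the opaque array: */
	struct hv_enlightenments hv_enlightenments;

	/* ... and then no cast is needed here: */
	struct hv_enlightenments *hve = &svm->nested.ctl.hv_enlightenments;

	BUILD_BUG_ON(sizeof(struct hv_enlightenments) != 32);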



> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> +	if (!hv_vcpu)
> +		return;
> +
> +	hv_vcpu->nested.pa_page_gpa = hve->partition_assist_page;
> +	hv_vcpu->nested.vm_id = hve->hv_vm_id;
> +	hv_vcpu->nested.vp_id = hve->hv_vp_id;
> +}
> +
>  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index bed5e1692cef..2d1a76343404 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -826,6 +826,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
>  
>  	svm->nested.nested_run_pending = 1;
>  
> +	nested_svm_hv_update_vm_vp_ids(vcpu);
> +
>  	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
>  		goto out_exit_err;
>  

That won't work after migration, since this won't be called
if we migrate with a nested guest running.


I think nested_svm_hv_update_vm_vp_ids() should instead be called
from enter_svm_guest_mode().
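
I.e. something like this (untested; the exact spot inside
enter_svm_guest_mode() doesn't matter much as long as it's common to the
VMRUN and nested-state-restore paths):

	/*
	 * Cache L2's VM_ID/VP_ID and the partition assist page here so the
	 * values are also repopulated when entering guest mode after
	 * migration, not only on VMRUN.
	 */
	nested_svm_hv_update_vm_vp_ids(vcpu);

with the call dropped from nested_svm_vmrun().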


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 16/34] KVM: x86: hyper-v: L2 TLB flush
  2022-04-14 13:19 ` [PATCH v3 16/34] KVM: x86: hyper-v: L2 TLB flush Vitaly Kuznetsov
@ 2022-05-11 11:29   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:29 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Handle L2 TLB flush requests by going through all vCPUs and checking
> whether there are vCPUs running the same VM_ID with a VP_ID specified
> in the requests. Perform synthetic exit to L2 upon finish.
> 
> Note, while checking VM_ID/VP_ID of running vCPUs seem to be a bit
> racy, we count on the fact that KVM flushes the whole L2 VPID upon
> transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon
> transition between L1 and L2 to make sure all pending requests are
> always processed.
> 
> For the reference, Hyper-V TLFS refers to the feature as "Direct
> Virtual Flush".
> 
> Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 73 ++++++++++++++++++++++++++++++++++++-------
>  arch/x86/kvm/hyperv.h |  3 --
>  arch/x86/kvm/trace.h  | 21 ++++++++-----
>  3 files changed, 74 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index e9793d36acca..79aabe0c33ec 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -34,6 +34,7 @@
>  #include <linux/eventfd.h>
>  
>  #include <asm/apicdef.h>
> +#include <asm/mshyperv.h>
>  #include <trace/events/kvm.h>
>  
>  #include "trace.h"
> @@ -1842,9 +1843,10 @@ static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
>  	return read_idx - write_idx - 1;
>  }
>  
> -static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
> +static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
> +				      struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring,
> +				      u64 *entries, int count)
>  {
> -	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>  	int ring_free, write_idx, read_idx;
>  	unsigned long flags;
> @@ -1853,9 +1855,6 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int c
>  	if (!hv_vcpu)
>  		return;
>  
> -	/* kvm_hv_flush_tlb() is not ready to handle requests for L2s yet */
> -	tlb_flush_ring = &hv_vcpu->tlb_flush_ring[HV_L1_TLB_FLUSH_RING];
> -
>  	spin_lock_irqsave(&tlb_flush_ring->write_lock, flags);
>  
>  	/*
> @@ -1974,6 +1973,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  	struct hv_tlb_flush_ex flush_ex;
>  	struct hv_tlb_flush flush;
>  	DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
> +	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  	/*
>  	 * Normally, there can be no more than 'KVM_HV_TLB_FLUSH_RING_SIZE - 1'
>  	 * entries on the TLB Flush ring as when 'read_idx == write_idx' the
> @@ -2018,7 +2018,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  		}
>  
>  		trace_kvm_hv_flush_tlb(flush.processor_mask,
> -				       flush.address_space, flush.flags);
> +				       flush.address_space, flush.flags,
> +				       is_guest_mode(vcpu));
>  
>  		valid_bank_mask = BIT_ULL(0);
>  		sparse_banks[0] = flush.processor_mask;
> @@ -2049,7 +2050,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  		trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
>  					  flush_ex.hv_vp_set.format,
>  					  flush_ex.address_space,
> -					  flush_ex.flags);
> +					  flush_ex.flags, is_guest_mode(vcpu));
>  
>  		valid_bank_mask = flush_ex.hv_vp_set.valid_bank_mask;
>  		all_cpus = flush_ex.hv_vp_set.format !=
> @@ -2083,23 +2084,54 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  		tlb_flush_entries = __tlb_flush_entries;
>  	}
>  
> +	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
> +
>  	/*
>  	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
>  	 * analyze it here, flush TLB regardless of the specified address space.
>  	 */
> -	if (all_cpus) {
> +	if (all_cpus && !is_guest_mode(vcpu)) {
>  		kvm_for_each_vcpu(i, v, kvm)
> -			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
> +			hv_tlb_flush_ring_enqueue(v, tlb_flush_ring,
> +						  tlb_flush_entries, hc->rep_cnt);
>  
>  		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
> -	} else {
> +	} else if (!is_guest_mode(vcpu)) {
>  		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
>  
>  		for_each_set_bit(i, vcpu_mask, KVM_MAX_VCPUS) {
>  			v = kvm_get_vcpu(kvm, i);
>  			if (!v)
>  				continue;
> -			hv_tlb_flush_ring_enqueue(v, tlb_flush_entries, hc->rep_cnt);
> +			hv_tlb_flush_ring_enqueue(v, tlb_flush_ring,
> +						  tlb_flush_entries, hc->rep_cnt);
> +		}
> +
> +		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
> +	} else {
> +		struct kvm_vcpu_hv *hv_v;
> +
> +		bitmap_zero(vcpu_mask, KVM_MAX_VCPUS);
> +
> +		kvm_for_each_vcpu(i, v, kvm) {
> +			hv_v = to_hv_vcpu(v);
> +
> +			/*
> +			 * TLB is fully flushed on L2 VM change: either by KVM
> +			 * (on a eVMPTR switch) or by L1 hypervisor (in case it
> +			 * re-purposes the active eVMCS for a different VM/VP).
> +			 */
> +			if (!hv_v || hv_v->nested.vm_id != hv_vcpu->nested.vm_id)
> +				continue;

This is indeed racy, but I think it is OK.

Nitpick:

I think that this does need a better comment on why the race is OK.
The current comment explains that we flush the TLB but doesn't explain
why that is sufficient.

I would probably write something like this:


"This races with nested vCPUs entering/exiting and/or migrating between
the L1's vCPUs.

However, the only case when we actually want to flush the TLB of the
target nested vCPU is when it has been running non-stop on the same L1
vCPU from the moment the flush request was created until now.

Otherwise, either the target nested vCPU is not running and it will flush
its TLB once it runs again, or it has already flushed its TLB by exiting
to L1 and entering itself again (possibly on a different L1 vCPU)."



> +
> +			if (!all_cpus &&
> +			    !hv_is_vp_in_sparse_set(hv_v->nested.vp_id, valid_bank_mask,
> +						    sparse_banks))
> +				continue;
> +
> +			__set_bit(i, vcpu_mask);
> +			hv_tlb_flush_ring_enqueue(v, tlb_flush_ring,
> +						  tlb_flush_entries, hc->rep_cnt);



>  		}
>  
>  		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
> @@ -2287,10 +2319,27 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
>  
>  static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
>  {
> +	int ret;
> +
>  	trace_kvm_hv_hypercall_done(result);
>  	kvm_hv_hypercall_set_result(vcpu, result);
>  	++vcpu->stat.hypercalls;
> -	return kvm_skip_emulated_instruction(vcpu);
> +	ret = kvm_skip_emulated_instruction(vcpu);
> +
> +	if (unlikely(hv_result_success(result) && is_guest_mode(vcpu)
> +		     && kvm_hv_is_tlb_flush_hcall(vcpu))) {
> +		struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +		u32 tlb_lock_count;
> +
> +		if (unlikely(kvm_read_guest(vcpu->kvm, hv_vcpu->nested.pa_page_gpa,
> +					    &tlb_lock_count, sizeof(tlb_lock_count))))
> +			kvm_inject_gp(vcpu, 0);
> +
> +		if (tlb_lock_count)
> +			kvm_x86_ops.nested_ops->post_hv_l2_tlb_flush(vcpu);
> +	}
> +
> +	return ret;
>  }
>  
>  static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index ca67c18cef2c..f593c9fd1dee 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -154,9 +154,6 @@ static inline struct kvm_vcpu_hv_tlb_flush_ring *kvm_hv_get_tlb_flush_ring(struc
>  	int i = !is_guest_mode(vcpu) ? HV_L1_TLB_FLUSH_RING :
>  		HV_L2_TLB_FLUSH_RING;
>  
> -	/* KVM does not handle L2 TLB flush requests yet */
> -	WARN_ON_ONCE(i != HV_L1_TLB_FLUSH_RING);
> -
>  	return &hv_vcpu->tlb_flush_ring[i];
>  }
>  
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index e3a24b8f04be..af7896182935 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -1479,38 +1479,41 @@ TRACE_EVENT(kvm_hv_timer_state,
>   * Tracepoint for kvm_hv_flush_tlb.
>   */
>  TRACE_EVENT(kvm_hv_flush_tlb,
> -	TP_PROTO(u64 processor_mask, u64 address_space, u64 flags),
> -	TP_ARGS(processor_mask, address_space, flags),
> +	TP_PROTO(u64 processor_mask, u64 address_space, u64 flags, bool guest_mode),
> +	TP_ARGS(processor_mask, address_space, flags, guest_mode),
>  
>  	TP_STRUCT__entry(
>  		__field(u64, processor_mask)
>  		__field(u64, address_space)
>  		__field(u64, flags)
> +		__field(bool, guest_mode)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->processor_mask = processor_mask;
>  		__entry->address_space = address_space;
>  		__entry->flags = flags;
> +		__entry->guest_mode = guest_mode;
>  	),
>  
> -	TP_printk("processor_mask 0x%llx address_space 0x%llx flags 0x%llx",
> +	TP_printk("processor_mask 0x%llx address_space 0x%llx flags 0x%llx %s",
>  		  __entry->processor_mask, __entry->address_space,
> -		  __entry->flags)
> +		  __entry->flags, __entry->guest_mode ? "(L2)" : "")
>  );
>  
>  /*
>   * Tracepoint for kvm_hv_flush_tlb_ex.
>   */
>  TRACE_EVENT(kvm_hv_flush_tlb_ex,
> -	TP_PROTO(u64 valid_bank_mask, u64 format, u64 address_space, u64 flags),
> -	TP_ARGS(valid_bank_mask, format, address_space, flags),
> +	TP_PROTO(u64 valid_bank_mask, u64 format, u64 address_space, u64 flags, bool guest_mode),
> +	TP_ARGS(valid_bank_mask, format, address_space, flags, guest_mode),
>  
>  	TP_STRUCT__entry(
>  		__field(u64, valid_bank_mask)
>  		__field(u64, format)
>  		__field(u64, address_space)
>  		__field(u64, flags)
> +		__field(bool, guest_mode)
>  	),
>  
>  	TP_fast_assign(
> @@ -1518,12 +1521,14 @@ TRACE_EVENT(kvm_hv_flush_tlb_ex,
>  		__entry->format = format;
>  		__entry->address_space = address_space;
>  		__entry->flags = flags;
> +		__entry->guest_mode = guest_mode;
>  	),
>  
>  	TP_printk("valid_bank_mask 0x%llx format 0x%llx "
> -		  "address_space 0x%llx flags 0x%llx",
> +		  "address_space 0x%llx flags 0x%llx %s",
>  		  __entry->valid_bank_mask, __entry->format,
> -		  __entry->address_space, __entry->flags)
> +		  __entry->address_space, __entry->flags,
> +		  __entry->guest_mode ? "(L2)" : "")
>  );
>  
>  /*


Looks OK, I might have missed something.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check
  2022-04-14 13:19 ` [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check Vitaly Kuznetsov
@ 2022-05-11 11:30   ` Maxim Levitsky
  2022-05-19 13:25     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:30 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
> from L2 in L0 to process L2 TLB flush requests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/hyperv.c           | 6 ++++++
>  arch/x86/kvm/hyperv.h           | 7 +++++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ce62fde5f4ff..168600490bd1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
>  		u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
>  		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
>  		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
> +		u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
>  	} cpuid_cache;
>  
>  	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 79aabe0c33ec..68a0df4e3f66 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
>  		hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
>  	else
>  		hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
> +
> +	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
> +	if (entry)
> +		hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
> +	else
> +		hv_vcpu->cpuid_cache.nested_features_eax = 0;
>  }
>  
>  int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index f593c9fd1dee..d8cb6d70dbc8 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
>  	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>  }
>  
> +static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> +	return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
> +}

Tiny nitpick (feel free to ignore): maybe use 'supported' instead of
'exposed', as we don't use the latter term in KVM very often.
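
I.e. just the rename, body unchanged from the patch:

	static inline bool kvm_hv_l2_tlb_flush_supported(struct kvm_vcpu *vcpu)
	{
		struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

		return hv_vcpu &&
		       (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
	}

(and the caller in nested_vmx_l0_wants_exit() adjusted to match).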

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


> +
>  static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);





^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 18/34] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
  2022-04-14 13:19 ` [PATCH v3 18/34] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
@ 2022-05-11 11:30   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:30 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Section 1.9 of TLFS v6.0b says:
> 
> "All structures are padded in such a way that fields are aligned
> naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
> and so on)".
> 
> 'struct enlightened_vmcs' has a glitch:
> 
> ...
>         struct {
>                 u32                nested_flush_hypercall:1; /*   836: 0  4 */
>                 u32                msr_bitmap:1;         /*   836: 1  4 */
>                 u32                reserved:30;          /*   836: 2  4 */
>         } hv_enlightenments_control;                     /*   836     4 */
>         u32                        hv_vp_id;             /*   840     4 */
>         u64                        hv_vm_id;             /*   844     8 */
>         u64                        partition_assist_page; /*   852     8 */
> ...
> 
> And the observed values in 'partition_assist_page' make no sense at
> all. Fix the layout by padding the structure properly.
> 
> Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 5225a85c08c3..e7ddae8e02c6 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -548,7 +548,7 @@ struct hv_enlightened_vmcs {
>  	u64 guest_rip;
>  
>  	u32 hv_clean_fields;
> -	u32 hv_padding_32;
> +	u32 padding32_1;
>  	u32 hv_synthetic_controls;
>  	struct {
>  		u32 nested_flush_hypercall:1;
> @@ -556,7 +556,7 @@ struct hv_enlightened_vmcs {
>  		u32 reserved:30;
>  	}  __packed hv_enlightenments_control;
>  	u32 hv_vp_id;
> -
> +	u32 padding32_2;
>  	u64 hv_vm_id;
>  	u64 partition_assist_page;
>  	u64 padding64_4[4];


Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush
  2022-04-14 13:19 ` [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
@ 2022-05-11 11:31   ` Maxim Levitsky
  2022-05-16 20:16     ` Sean Christopherson
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:31 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Enable L2 TLB flush feature on nVMX when:
> - Enlightened VMCS is in use.
> - The feature flag is enabled in eVMCS.
> - The feature flag is enabled in partition assist page.
> 
> Perform synthetic vmexit to L1 after processing TLB flush call upon
> request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH).
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/vmx/evmcs.c  | 20 ++++++++++++++++++++
>  arch/x86/kvm/vmx/evmcs.h  | 10 ++++++++++
>  arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index e390e67496df..e0cb2e223daa 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -6,6 +6,7 @@
>  #include "../hyperv.h"
>  #include "../cpuid.h"
>  #include "evmcs.h"
> +#include "nested.h"
>  #include "vmcs.h"
>  #include "vmx.h"
>  #include "trace.h"
> @@ -438,6 +439,25 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> +bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
> +	struct hv_vp_assist_page assist_page;
> +
> +	if (!evmcs)
> +		return false;
> +
> +	if (!evmcs->hv_enlightenments_control.nested_flush_hypercall)
> +		return false;
> +
> +	if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> +		return false;
> +
> +	return assist_page.nested_control.features.directhypercall;
> +}
> +
>  void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
>  {
> +	nested_vmx_vmexit(vcpu, HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH, 0, 0);
>  }
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index b120b0ead4f3..ddbdb557cc53 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -65,6 +65,15 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
>  #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
>  #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
>  
> +/*
> + * Note, Hyper-V isn't actually stealing bit 28 from Intel, just abusing it by
> + * pairing it with architecturally impossible exit reasons.  Bit 28 is set only
> + * on SMI exits to a SMI transfer monitor (STM) and if and only if a MTF VM-Exit
> + * is pending.  I.e. it will never be set by hardware for non-SMI exits (there
> + * are only three), nor will it ever be set unless the VMM is an STM.

I am sure that this will backfire one way or another. Their fault though...


I also wonder why they need that synthetic VM exit; it's in the spec,
but I don't fully understand why. Their fault as well though.

The flag that controls it is 'TlbLockCount', I wonder what it means...

> + */
> +#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
> +
>  struct evmcs_field {
>  	u16 offset;
>  	u16 clean_field;
> @@ -244,6 +253,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  			uint16_t *vmcs_version);
>  void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
>  int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
> +bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu);
>  void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
>  
>  #endif /* __KVM_X86_VMX_EVMCS_H */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index cc6c944b5815..3e2ef5edad4a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1170,6 +1170,17 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
> +	/*
> +	 * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
> +	 * L2's VP_ID upon request from the guest. Make sure we check for
> +	 * pending entries for the case when the request got misplaced (e.g.
> +	 * a transition from L2->L1 happened while processing L2 TLB flush
> +	 * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
> +	 * anything if there are no requests in the corresponding buffer.
> +	 */
> +	if (to_hv_vcpu(vcpu))
> +		kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> +
>  	/*
>  	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
>  	 * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
> @@ -5997,6 +6008,11 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
>  		 * Handle L2's bus locks in L0 directly.
>  		 */
>  		return true;
> +	case EXIT_REASON_VMCALL:
> +		/* Hyper-V L2 TLB flush hypercall is handled by L0 */
> +		return kvm_hv_l2_tlb_flush_exposed(vcpu) &&
> +			nested_evmcs_l2_tlb_flush_enabled(vcpu) &&
> +			kvm_hv_is_tlb_flush_hcall(vcpu);
>  	default:
>  		break;
>  	}



Looks good,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook
  2022-04-14 13:19 ` [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook Vitaly Kuznetsov
@ 2022-05-11 11:32   ` Maxim Levitsky
  2022-05-18 12:43     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Hyper-V supports injecting synthetic L2->L1 exit after performing
> L2 TLB flush operation but the procedure is vendor specific.
> Introduce .post_hv_l2_tlb_flush() nested hook for it.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/Makefile           |  3 ++-
>  arch/x86/kvm/svm/hyperv.c       | 11 +++++++++++
>  arch/x86/kvm/svm/hyperv.h       |  2 ++
>  arch/x86/kvm/svm/nested.c       |  1 +
>  arch/x86/kvm/vmx/evmcs.c        |  4 ++++
>  arch/x86/kvm/vmx/evmcs.h        |  1 +
>  arch/x86/kvm/vmx/nested.c       |  1 +
>  8 files changed, 23 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kvm/svm/hyperv.c
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8b2a52bf26c0..ce62fde5f4ff 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1558,6 +1558,7 @@ struct kvm_x86_nested_ops {
>  	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
>  			    uint16_t *vmcs_version);
>  	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
> +	void (*post_hv_l2_tlb_flush)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_x86_init_ops {
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 30f244b64523..b6d53b045692 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -25,7 +25,8 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
>  			   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
>  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
>  
> -kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
> +kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
> +			   svm/sev.o svm/hyperv.o
>  
>  ifdef CONFIG_HYPERV
>  kvm-amd-y		+= svm/svm_onhyperv.o
> diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
> new file mode 100644
> index 000000000000..c0749fc282fe
> --- /dev/null
> +++ b/arch/x86/kvm/svm/hyperv.c
> @@ -0,0 +1,11 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD SVM specific code for Hyper-V on KVM.
> + *
> + * Copyright 2022 Red Hat, Inc. and/or its affiliates.
> + */
> +#include "hyperv.h"
> +
> +void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
> +{
> +}
> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> index 8cf702fed7e5..a2b0d7580b0d 100644
> --- a/arch/x86/kvm/svm/hyperv.h
> +++ b/arch/x86/kvm/svm/hyperv.h
> @@ -48,4 +48,6 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
>  	hv_vcpu->nested.vp_id = hve->hv_vp_id;
>  }
>  
> +void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 2d1a76343404..de3f27301b5c 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1665,4 +1665,5 @@ struct kvm_x86_nested_ops svm_nested_ops = {
>  	.get_nested_state_pages = svm_get_nested_state_pages,
>  	.get_state = svm_get_nested_state,
>  	.set_state = svm_set_nested_state,
> +	.post_hv_l2_tlb_flush = svm_post_hv_l2_tlb_flush,
>  };
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 87e3dc10edf4..e390e67496df 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -437,3 +437,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  
>  	return 0;
>  }
> +
> +void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
> +{
> +}
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 8d70f9aea94b..b120b0ead4f3 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -244,5 +244,6 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  			uint16_t *vmcs_version);
>  void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
>  int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
> +void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
>  
>  #endif /* __KVM_X86_VMX_EVMCS_H */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index ee88921c6156..cc6c944b5815 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6850,4 +6850,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
>  	.write_log_dirty = nested_vmx_write_pml_buffer,
>  	.enable_evmcs = nested_enable_evmcs,
>  	.get_evmcs_version = nested_get_evmcs_version,
> +	.post_hv_l2_tlb_flush = vmx_post_hv_l2_tlb_flush,
>  };


I think that the name of the function is misleading, since it is not called
after each L2 Hyper-V TLB flush, but only after a flush that needs to inject
that synthetic VM exit.

I think something like 'inject_synthetic_l2_hv_tlb_flush_vmexit' 
(not a good name IMHO, but you get the idea) would be better.
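
E.g., just to illustrate the idea (the name below is only a placeholder,
same signature as the current hook):

	/* Inject the vendor specific synthetic L2->L1 vmexit after an L2 TLB flush */
	void (*hv_inject_synthetic_vmexit_post_tlb_flush)(struct kvm_vcpu *vcpu);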

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too
  2022-04-14 13:19 ` [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too Vitaly Kuznetsov
@ 2022-05-11 11:33   ` Maxim Levitsky
  2022-05-19  9:12     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> KVM_REQ_TLB_FLUSH_CURRENT is an even stronger operation than
> KVM_REQ_TLB_FLUSH_GUEST so KVM_REQ_HV_TLB_FLUSH needs not to be
> processed after it.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5aec386d299..d3839e648ab3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3357,8 +3357,11 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>   */
>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
> +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>  		kvm_vcpu_flush_tlb_current(vcpu);
> +		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
> +			kvm_hv_vcpu_empty_flush_tlb(vcpu);
> +	}
>  
>  	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
>  		kvm_vcpu_flush_tlb_guest(vcpu);


I think that this patch should be moved near patch 1 and/or even squashed into it.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 21/34] KVM: nSVM: hyper-v: Enable L2 TLB flush
  2022-04-14 13:20 ` [PATCH v3 21/34] KVM: nSVM: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
@ 2022-05-11 11:33   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
> both in extended 'nested controls' in VMCB and partition assist page.
> According to Hyper-V TLFS, synthetic vmexit to L1 is performed with
> - HV_SVM_EXITCODE_ENL exit_code.
> - HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/svm/hyperv.c |  7 +++++++
>  arch/x86/kvm/svm/hyperv.h | 19 +++++++++++++++++++
>  arch/x86/kvm/svm/nested.c | 22 +++++++++++++++++++++-
>  3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
> index c0749fc282fe..3842548bb88c 100644
> --- a/arch/x86/kvm/svm/hyperv.c
> +++ b/arch/x86/kvm/svm/hyperv.c
> @@ -8,4 +8,11 @@
>  
>  void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
> +	svm->vmcb->control.exit_code_hi = 0;
> +	svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
> +	svm->vmcb->control.exit_info_2 = 0;
> +	nested_svm_vmexit(svm);
>  }
> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> index a2b0d7580b0d..cd33e89f9f61 100644
> --- a/arch/x86/kvm/svm/hyperv.h
> +++ b/arch/x86/kvm/svm/hyperv.h
> @@ -33,6 +33,9 @@ struct hv_enlightenments {
>   */
>  #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>  
> +#define HV_SVM_EXITCODE_ENL 0xF0000000
> +#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH   (1)
> +
>  static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -48,6 +51,22 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
>  	hv_vcpu->nested.vp_id = hve->hv_vp_id;
>  }
>  
> +static inline bool nested_svm_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct hv_enlightenments *hve =
> +		(struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
> +	struct hv_vp_assist_page assist_page;
> +
> +	if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> +		return false;
> +
> +	if (!hve->hv_enlightenments_control.nested_flush_hypercall)
> +		return false;
> +
> +	return assist_page.nested_control.features.directhypercall;
> +}
> +
>  void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de3f27301b5c..a6d9807c09b1 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -172,7 +172,8 @@ void recalc_intercepts(struct vcpu_svm *svm)
>  	}
>  
>  	/* We don't want to see VMMCALLs from a nested guest */

Minor nitpick: Maybe update the comment?
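
Something along the lines of (just a suggestion, the wording is yours to pick):

	/*
	 * We don't want to see VMMCALLs from a nested guest, unless Hyper-V
	 * L2 TLB flush is enabled: in that case the hypercall has to be
	 * intercepted so that L0 can handle it.
	 */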


> -	vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> +	if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
> +		vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
>  
>  	for (i = 0; i < MAX_INTERCEPT; i++)
>  		c->intercepts[i] |= g->intercepts[i];
> @@ -488,6 +489,17 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
>  
>  static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
>  {
> +	/*
> +	 * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
> +	 * L2's VP_ID upon request from the guest. Make sure we check for
> +	 * pending entries for the case when the request got misplaced (e.g.
> +	 * a transition from L2->L1 happened while processing L2 TLB flush
> +	 * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
> +	 * anything if there are no requests in the corresponding buffer.
> +	 */
> +	if (to_hv_vcpu(vcpu))
> +		kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> +
>  	/*
>  	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
>  	 * things to fix before this can be conditional:
> @@ -1357,6 +1369,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
>  int nested_svm_exit_special(struct vcpu_svm *svm)
>  {
>  	u32 exit_code = svm->vmcb->control.exit_code;
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
>  
>  	switch (exit_code) {
>  	case SVM_EXIT_INTR:
> @@ -1375,6 +1388,13 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
>  			return NESTED_EXIT_HOST;
>  		break;
>  	}
> +	case SVM_EXIT_VMMCALL:
> +		/* Hyper-V L2 TLB flush hypercall is handled by L0 */
> +		if (kvm_hv_l2_tlb_flush_exposed(vcpu) &&
> +		    nested_svm_l2_tlb_flush_enabled(vcpu) &&
> +		    kvm_hv_is_tlb_flush_hcall(vcpu))
> +			return NESTED_EXIT_HOST;
> +		break;
>  	default:
>  		break;
>  	}


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 22/34] KVM: x86: Expose Hyper-V L2 TLB flush feature
  2022-04-14 13:20 ` [PATCH v3 22/34] KVM: x86: Expose Hyper-V L2 TLB flush feature Vitaly Kuznetsov
@ 2022-05-11 11:34   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> With both nSVM and nVMX implementations in place, KVM can now expose
> Hyper-V L2 TLB flush feature to userspace.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 68a0df4e3f66..1d6927538bc7 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2826,6 +2826,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>  
>  		case HYPERV_CPUID_NESTED_FEATURES:
>  			ent->eax = evmcs_ver;
> +			ent->eax |= HV_X64_NESTED_DIRECT_FLUSH;
>  			ent->eax |= HV_X64_NESTED_MSR_BITMAP;
>  
>  			break;

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 23/34] KVM: selftests: Better XMM read/write helpers
  2022-04-14 13:20 ` [PATCH v3 23/34] KVM: selftests: Better XMM read/write helpers Vitaly Kuznetsov
@ 2022-05-11 11:34   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
> from 128-bit registers. Moreover, these helpers are not used. Borrow
> _kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM limiting them to
> XMM0-XMM8 for now.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  .../selftests/kvm/include/x86_64/processor.h  | 70 ++++++++++---------
>  1 file changed, 36 insertions(+), 34 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 37db341d4cc5..9ad7602a257b 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -296,71 +296,73 @@ static inline void cpuid(uint32_t *eax, uint32_t *ebx,
>  	    : "memory");
>  }
>  
> -#define SET_XMM(__var, __xmm) \
> -	asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
> +typedef u32		__attribute__((vector_size(16))) sse128_t;
> +#define __sse128_u	union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
> +#define sse128_lo(x)	({ __sse128_u t; t.vec = x; t.as_u64[0]; })
> +#define sse128_hi(x)	({ __sse128_u t; t.vec = x; t.as_u64[1]; })
>  
> -static inline void set_xmm(int n, unsigned long val)
> +static inline void read_sse_reg(int reg, sse128_t *data)
>  {
> -	switch (n) {
> +	switch (reg) {
>  	case 0:
> -		SET_XMM(val, xmm0);
> +		asm("movdqa %%xmm0, %0" : "=m"(*data));
>  		break;
>  	case 1:
> -		SET_XMM(val, xmm1);
> +		asm("movdqa %%xmm1, %0" : "=m"(*data));
>  		break;
>  	case 2:
> -		SET_XMM(val, xmm2);
> +		asm("movdqa %%xmm2, %0" : "=m"(*data));
>  		break;
>  	case 3:
> -		SET_XMM(val, xmm3);
> +		asm("movdqa %%xmm3, %0" : "=m"(*data));
>  		break;
>  	case 4:
> -		SET_XMM(val, xmm4);
> +		asm("movdqa %%xmm4, %0" : "=m"(*data));
>  		break;
>  	case 5:
> -		SET_XMM(val, xmm5);
> +		asm("movdqa %%xmm5, %0" : "=m"(*data));
>  		break;
>  	case 6:
> -		SET_XMM(val, xmm6);
> +		asm("movdqa %%xmm6, %0" : "=m"(*data));
>  		break;
>  	case 7:
> -		SET_XMM(val, xmm7);
> +		asm("movdqa %%xmm7, %0" : "=m"(*data));
>  		break;
> +	default:
> +		BUG();
>  	}
>  }
>  
> -#define GET_XMM(__xmm)							\
> -({									\
> -	unsigned long __val;						\
> -	asm volatile("movq %%"#__xmm", %0" : "=r"(__val));		\
> -	__val;								\
> -})
> -
> -static inline unsigned long get_xmm(int n)
> +static inline void write_sse_reg(int reg, const sse128_t *data)
>  {
> -	assert(n >= 0 && n <= 7);
> -
> -	switch (n) {
> +	switch (reg) {
>  	case 0:
> -		return GET_XMM(xmm0);
> +		asm("movdqa %0, %%xmm0" : : "m"(*data));
> +		break;
>  	case 1:
> -		return GET_XMM(xmm1);
> +		asm("movdqa %0, %%xmm1" : : "m"(*data));
> +		break;
>  	case 2:
> -		return GET_XMM(xmm2);
> +		asm("movdqa %0, %%xmm2" : : "m"(*data));
> +		break;
>  	case 3:
> -		return GET_XMM(xmm3);
> +		asm("movdqa %0, %%xmm3" : : "m"(*data));
> +		break;
>  	case 4:
> -		return GET_XMM(xmm4);
> +		asm("movdqa %0, %%xmm4" : : "m"(*data));
> +		break;
>  	case 5:
> -		return GET_XMM(xmm5);
> +		asm("movdqa %0, %%xmm5" : : "m"(*data));
> +		break;
>  	case 6:
> -		return GET_XMM(xmm6);
> +		asm("movdqa %0, %%xmm6" : : "m"(*data));
> +		break;
>  	case 7:
> -		return GET_XMM(xmm7);
> +		asm("movdqa %0, %%xmm7" : : "m"(*data));
> +		break;
> +	default:
> +		BUG();
>  	}
> -
> -	/* never reached */
> -	return 0;
>  }
>  
>  static inline void cpu_relax(void)


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 25/34] KVM: selftests: Make it possible to replace PTEs with __virt_pg_map()
  2022-04-14 13:20 ` [PATCH v3 25/34] KVM: selftests: Make it possible to replace PTEs with __virt_pg_map() Vitaly Kuznetsov
@ 2022-05-11 11:34   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> __virt_pg_map() makes an assumption that leaf PTE is not present. This
> is not suitable if the test wants to replace an already present
> PTE. Hyper-V PV TLB flush test is going to need that.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/include/x86_64/processor.h | 2 +-
>  tools/testing/selftests/kvm/lib/x86_64/processor.c     | 6 +++---
>  tools/testing/selftests/kvm/max_guest_memory_test.c    | 2 +-
>  tools/testing/selftests/kvm/x86_64/mmu_role_test.c     | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 9ad7602a257b..c20b18d05119 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -473,7 +473,7 @@ enum x86_page_size {
>  	X86_PAGE_SIZE_1G,
>  };
>  void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> -		   enum x86_page_size page_size);
> +		   enum x86_page_size page_size, bool replace);
>  
>  /*
>   * Basic CPU control in CR0
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 9f000dfb5594..20df3e84d777 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -229,7 +229,7 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
>  }
>  
>  void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> -		   enum x86_page_size page_size)
> +		   enum x86_page_size page_size, bool replace)
>  {
>  	const uint64_t pg_size = 1ull << ((page_size * 9) + 12);
>  	struct pageUpperEntry *pml4e, *pdpe, *pde;
> @@ -270,7 +270,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
>  
>  	/* Fill in page table entry. */
>  	pte = virt_get_pte(vm, pde->pfn, vaddr, 0);
> -	TEST_ASSERT(!pte->present,
> +	TEST_ASSERT(replace || !pte->present,
>  		    "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
>  	pte->pfn = paddr >> vm->page_shift;
>  	pte->writable = true;
> @@ -279,7 +279,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
>  
>  void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
>  {
> -	__virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K);
> +	__virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K, false);
>  }
>  
>  static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid,
> diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
> index 3875c4b23a04..437f77633b0e 100644
> --- a/tools/testing/selftests/kvm/max_guest_memory_test.c
> +++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
> @@ -244,7 +244,7 @@ int main(int argc, char *argv[])
>  #ifdef __x86_64__
>  		/* Identity map memory in the guest using 1gb pages. */
>  		for (i = 0; i < slot_size; i += size_1gb)
> -			__virt_pg_map(vm, gpa + i, gpa + i, X86_PAGE_SIZE_1G);
> +			__virt_pg_map(vm, gpa + i, gpa + i, X86_PAGE_SIZE_1G, false);
>  #else
>  		for (i = 0; i < slot_size; i += vm_get_page_size(vm))
>  			virt_pg_map(vm, gpa + i, gpa + i);
> diff --git a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
> index da2325fcad87..e3fdf320b9f4 100644
> --- a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
> @@ -35,7 +35,7 @@ static void mmu_role_test(u32 *cpuid_reg, u32 evil_cpuid_val)
>  	run = vcpu_state(vm, VCPU_ID);
>  
>  	/* Map 1gb page without a backing memlot. */
> -	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, X86_PAGE_SIZE_1G);
> +	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, X86_PAGE_SIZE_1G, false);
>  
>  	r = _vcpu_run(vm, VCPU_ID);
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 24/34] KVM: selftests: Hyper-V PV IPI selftest
  2022-04-14 13:20 ` [PATCH v3 24/34] KVM: selftests: Hyper-V PV IPI selftest Vitaly Kuznetsov
@ 2022-05-11 11:35   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 11:35 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Introduce a selftest for Hyper-V PV IPI hypercalls
> (HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx).
> 
> The test creates one 'sender' vCPU and two 'receiver' vCPU and then
> issues various combinations of send IPI hypercalls in both 'normal'
> and 'fast' (with XMM input where necessary) mode. Later, the test
> checks whether IPIs were delivered to the expected destination vCPU[s].
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/include/x86_64/hyperv.h     |   3 +
>  .../selftests/kvm/x86_64/hyperv_features.c    |   5 +-
>  .../testing/selftests/kvm/x86_64/hyperv_ipi.c | 374 ++++++++++++++++++
>  5 files changed, 381 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 56140068b763..5d5fbb161d56 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -23,6 +23,7 @@
>  /x86_64/hyperv_clock
>  /x86_64/hyperv_cpuid
>  /x86_64/hyperv_features
> +/x86_64/hyperv_ipi
>  /x86_64/hyperv_svm_test
>  /x86_64/mmio_warning_test
>  /x86_64/mmu_role_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index af582d168621..44889f897fe7 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -52,6 +52,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
> +TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
> diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> index b66910702c0a..f51d6fab8e93 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> @@ -184,5 +184,8 @@
>  
>  /* hypercall options */
>  #define HV_HYPERCALL_FAST_BIT		BIT(16)
> +#define HV_HYPERCALL_VARHEAD_OFFSET	17
> +
> +#define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
>  
>  #endif /* !SELFTEST_KVM_HYPERV_H */
> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
> index 672915ce73d8..98c020356925 100644
> --- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
> @@ -14,7 +14,6 @@
>  #include "hyperv.h"
>  
>  #define VCPU_ID 0
> -#define LINUX_OS_ID ((u64)0x8100 << 48)
>  
>  extern unsigned char rdmsr_start;
>  extern unsigned char rdmsr_end;
> @@ -127,7 +126,7 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
>  	int i = 0;
>  	u64 res, input, output;
>  
> -	wrmsr(HV_X64_MSR_GUEST_OS_ID, LINUX_OS_ID);
> +	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
>  	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
>  
>  	while (hcall->control) {
> @@ -230,7 +229,7 @@ static void guest_test_msrs_access(void)
>  			 */
>  			msr->idx = HV_X64_MSR_GUEST_OS_ID;
>  			msr->write = 1;
> -			msr->write_val = LINUX_OS_ID;
> +			msr->write_val = HYPERV_LINUX_OS_ID;
>  			msr->available = 1;
>  			break;
>  		case 3:

Nitpick: I think that the HYPERV_LINUX_OS_ID change should be in a separate patch.


> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
> new file mode 100644
> index 000000000000..075963c32d45
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
> @@ -0,0 +1,374 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V HvCallSendSyntheticClusterIpi{,Ex} tests
> + *
> + * Copyright (C) 2022, Red Hat, Inc.
> + *
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <pthread.h>
> +#include <inttypes.h>
> +
> +#include "kvm_util.h"
> +#include "hyperv.h"
> +#include "processor.h"
> +#include "test_util.h"
> +#include "vmx.h"
> +
> +#define SENDER_VCPU_ID   1
> +#define RECEIVER_VCPU_ID_1 2
> +#define RECEIVER_VCPU_ID_2 65
> +
> +#define IPI_VECTOR	 0xfe
> +
> +static volatile uint64_t ipis_rcvd[RECEIVER_VCPU_ID_2 + 1];
> +
> +struct thread_params {
> +	struct kvm_vm *vm;
> +	uint32_t vcpu_id;
> +};
> +
> +struct hv_vpset {
> +	u64 format;
> +	u64 valid_bank_mask;
> +	u64 bank_contents[2];
> +};
> +
> +enum HV_GENERIC_SET_FORMAT {
> +	HV_GENERIC_SET_SPARSE_4K,
> +	HV_GENERIC_SET_ALL,
> +};
> +
> +/* HvCallSendSyntheticClusterIpi hypercall */
> +struct hv_send_ipi {
> +	u32 vector;
> +	u32 reserved;
> +	u64 cpu_mask;
> +};
> +
> +/* HvCallSendSyntheticClusterIpiEx hypercall */
> +struct hv_send_ipi_ex {
> +	u32 vector;
> +	u32 reserved;
> +	struct hv_vpset vp_set;
> +};
> +
> +static inline void hv_init(vm_vaddr_t pgs_gpa)
> +{
> +	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
> +	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
> +}
> +
> +static void receiver_code(void *hcall_page, vm_vaddr_t pgs_gpa)
> +{
> +	u32 vcpu_id;
> +
> +	x2apic_enable();
> +	hv_init(pgs_gpa);
> +
> +	vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
> +
> +	/* Signal sender vCPU we're ready */
> +	ipis_rcvd[vcpu_id] = (u64)-1;
> +
> +	for (;;)
> +		asm volatile("sti; hlt; cli");
> +}
> +
> +static void guest_ipi_handler(struct ex_regs *regs)
> +{
> +	u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
> +
> +	ipis_rcvd[vcpu_id]++;
> +	wrmsr(HV_X64_MSR_EOI, 1);
> +}
> +
> +static inline u64 hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
> +{
> +	u64 hv_status;
> +
> +	asm volatile("mov %3, %%r8\n"
> +		     "vmcall"
> +		     : "=a" (hv_status),
> +		       "+c" (control), "+d" (arg1)
> +		     :  "r" (arg2)
> +		     : "cc", "memory", "r8", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}
> +
> +static inline void nop_loop(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < 100000000; i++)
> +		asm volatile("nop");
> +}
> +
> +static inline void sync_to_xmm(void *data)
> +{
> +	int i;
> +
> +	for (i = 0; i < 8; i++)
> +		write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
> +}
> +
> +static void sender_guest_code(void *hcall_page, vm_vaddr_t pgs_gpa)
> +{
> +	struct hv_send_ipi *ipi = (struct hv_send_ipi *)hcall_page;
> +	struct hv_send_ipi_ex *ipi_ex = (struct hv_send_ipi_ex *)hcall_page;
> +	int stage = 1, ipis_expected[2] = {0};
> +	u64 res;
> +
> +	hv_init(pgs_gpa);
> +	GUEST_SYNC(stage++);
> +
> +	/* Wait for receiver vCPUs to come up */
> +	while (!ipis_rcvd[RECEIVER_VCPU_ID_1] || !ipis_rcvd[RECEIVER_VCPU_ID_2])
> +		nop_loop();
> +	ipis_rcvd[RECEIVER_VCPU_ID_1] = ipis_rcvd[RECEIVER_VCPU_ID_2] = 0;
> +
> +	/* 'Slow' HvCallSendSyntheticClusterIpi to RECEIVER_VCPU_ID_1 */
> +	ipi->vector = IPI_VECTOR;
> +	ipi->cpu_mask = 1 << RECEIVER_VCPU_ID_1;
> +	res = hypercall(HVCALL_SEND_IPI, pgs_gpa, pgs_gpa + 4096);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +	/* 'Fast' HvCallSendSyntheticClusterIpi to RECEIVER_VCPU_ID_1 */
> +	res = hypercall(HVCALL_SEND_IPI | HV_HYPERCALL_FAST_BIT,
> +			IPI_VECTOR, 1 << RECEIVER_VCPU_ID_1);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +
> +	/* 'Slow' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_1 */
> +	memset(hcall_page, 0, 4096);
> +	ipi_ex->vector = IPI_VECTOR;
> +	ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +	ipi_ex->vp_set.valid_bank_mask = 1 << 0;
> +	ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_1);
> +	res = hypercall(HVCALL_SEND_IPI_EX | (1 << HV_HYPERCALL_VARHEAD_OFFSET),
> +			pgs_gpa, pgs_gpa + 4096);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_1 */
> +	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
> +	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
> +			(1 << HV_HYPERCALL_VARHEAD_OFFSET),
> +			IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +
> +	/* 'Slow' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_2 */
> +	memset(hcall_page, 0, 4096);
> +	ipi_ex->vector = IPI_VECTOR;
> +	ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +	ipi_ex->vp_set.valid_bank_mask = 1 << 1;
> +	ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_2 - 64);
> +	res = hypercall(HVCALL_SEND_IPI_EX | (1 << HV_HYPERCALL_VARHEAD_OFFSET),
> +			pgs_gpa, pgs_gpa + 4096);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_2 */
> +	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
> +	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
> +			(1 << HV_HYPERCALL_VARHEAD_OFFSET),
> +			IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +
> +	/* 'Slow' HvCallSendSyntheticClusterIpiEx to both RECEIVER_VCPU_ID_{1,2} */
> +	memset(hcall_page, 0, 4096);
> +	ipi_ex->vector = IPI_VECTOR;
> +	ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +	ipi_ex->vp_set.valid_bank_mask = 1 << 1 | 1;
> +	ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_1);
> +	ipi_ex->vp_set.bank_contents[1] = BIT(RECEIVER_VCPU_ID_2 - 64);
> +	res = hypercall(HVCALL_SEND_IPI_EX | (2 << HV_HYPERCALL_VARHEAD_OFFSET),
> +			pgs_gpa, pgs_gpa + 4096);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to both RECEIVER_VCPU_ID_{1, 2} */
> +	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
> +	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
> +			(2 << HV_HYPERCALL_VARHEAD_OFFSET),
> +			IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +
> +	/* 'Slow' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL */
> +	memset(hcall_page, 0, 4096);
> +	ipi_ex->vector = IPI_VECTOR;
> +	ipi_ex->vp_set.format = HV_GENERIC_SET_ALL;
> +	res = hypercall(HVCALL_SEND_IPI_EX,
> +			pgs_gpa, pgs_gpa + 4096);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +	/* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL */
> +	sync_to_xmm(&ipi_ex->vp_set.valid_bank_mask);
> +	res = hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT,
> +			IPI_VECTOR, HV_GENERIC_SET_ALL);
> +	GUEST_ASSERT((res & 0xffff) == 0);
> +	nop_loop();
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
> +	GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
> +	GUEST_SYNC(stage++);
> +
> +	GUEST_DONE();
> +}
> +
> +static void *vcpu_thread(void *arg)
> +{
> +	struct thread_params *params = (struct thread_params *)arg;
> +	struct ucall uc;
> +	int old;
> +	int r;
> +	unsigned int exit_reason;
> +
> +	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
> +		    params->vcpu_id, r);
> +
> +	vcpu_run(params->vm, params->vcpu_id);
> +	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> +
> +	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> +		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> +		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> +
> +	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
> +		TEST_ASSERT(false,
> +			    "vCPU %u exited with error: %s.\n",
> +			    params->vcpu_id, (const char *)uc.args[0]);
> +	}
> +
> +	return NULL;
> +}
> +
> +static void cancel_join_vcpu_thread(pthread_t thread, uint32_t vcpu_id)
> +{
> +	void *retval;
> +	int r;
> +
> +	r = pthread_cancel(thread);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_cancel on vcpu_id=%d failed with errno=%d",
> +		    vcpu_id, r);
> +
> +	r = pthread_join(thread, &retval);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_join on vcpu_id=%d failed with errno=%d",
> +		    vcpu_id, r);
> +	TEST_ASSERT(retval == PTHREAD_CANCELED,
> +		    "expected retval=%p, got %p", PTHREAD_CANCELED,
> +		    retval);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int r;
> +	pthread_t threads[2];
> +	struct thread_params params[2];
> +	struct kvm_vm *vm;
> +	struct kvm_run *run;
> +	vm_vaddr_t hcall_page;
> +	struct ucall uc;
> +	int stage = 1;
> +
> +	vm = vm_create_default(SENDER_VCPU_ID, 0, sender_guest_code);
> +	params[0].vm = vm;
> +	params[1].vm = vm;
> +
> +	/* Hypercall input/output */
> +	hcall_page = vm_vaddr_alloc_pages(vm, 2);
> +	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
> +
> +	vm_init_descriptor_tables(vm);
> +
> +	vm_vcpu_add_default(vm, RECEIVER_VCPU_ID_1, receiver_code);
> +	vcpu_init_descriptor_tables(vm, RECEIVER_VCPU_ID_1);
> +	vcpu_args_set(vm, RECEIVER_VCPU_ID_1, 2, hcall_page, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_msr(vm, RECEIVER_VCPU_ID_1, HV_X64_MSR_VP_INDEX, RECEIVER_VCPU_ID_1);
> +	vcpu_set_hv_cpuid(vm, RECEIVER_VCPU_ID_1);
> +
> +	vm_vcpu_add_default(vm, RECEIVER_VCPU_ID_2, receiver_code);
> +	vcpu_init_descriptor_tables(vm, RECEIVER_VCPU_ID_2);
> +	vcpu_args_set(vm, RECEIVER_VCPU_ID_2, 2, hcall_page, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_msr(vm, RECEIVER_VCPU_ID_2, HV_X64_MSR_VP_INDEX, RECEIVER_VCPU_ID_2);
> +	vcpu_set_hv_cpuid(vm, RECEIVER_VCPU_ID_2);
> +
> +	vm_install_exception_handler(vm, IPI_VECTOR, guest_ipi_handler);
> +
> +	vcpu_args_set(vm, SENDER_VCPU_ID, 2, hcall_page, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_hv_cpuid(vm, SENDER_VCPU_ID);
> +
> +	params[0].vcpu_id = RECEIVER_VCPU_ID_1;
> +	r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_create halter failed errno=%d", errno);
> +
> +	params[1].vcpu_id = RECEIVER_VCPU_ID_2;
> +	r = pthread_create(&threads[1], NULL, vcpu_thread, &params[1]);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_create halter failed errno=%d", errno);
> +
> +	run = vcpu_state(vm, SENDER_VCPU_ID);
> +
> +	while (true) {
> +		r = _vcpu_run(vm, SENDER_VCPU_ID);
> +		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
> +		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
> +			    "unexpected exit reason: %u (%s)",
> +			    run->exit_reason, exit_reason_str(run->exit_reason));
> +
> +		switch (get_ucall(vm, SENDER_VCPU_ID, &uc)) {
> +		case UCALL_SYNC:
> +			TEST_ASSERT(uc.args[1] == stage,
> +				    "Unexpected stage: %ld (%d expected)\n",
> +				    uc.args[1], stage);
> +			break;
> +		case UCALL_ABORT:
> +			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
> +				  __FILE__, uc.args[1]);
> +			return 1;
> +		case UCALL_DONE:
> +			return 0;
> +		}
> +
> +		stage++;
> +	}
> +
> +	cancel_join_vcpu_thread(threads[0], RECEIVER_VCPU_ID_1);
> +	cancel_join_vcpu_thread(threads[1], RECEIVER_VCPU_ID_2);
> +	kvm_vm_free(vm);
> +
> +	return 0;
> +}


Looks overall good to me, but I might have missed something.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky





^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest
  2022-04-14 13:20 ` [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest Vitaly Kuznetsov
@ 2022-05-11 12:17   ` Maxim Levitsky
  2022-05-24 14:51     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Introduce a selftest for Hyper-V PV TLB flush hypercalls
> (HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
> HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).
> 
> The test creates one 'sender' vCPU and two 'worker' vCPU which do busy
> loop reading from a certain GVA checking the observed value. Sender
> vCPU drops to the host to swap the data page with another page filled
> with a different value. The expectation for workers is also
> altered. Without TLB flush on worker vCPUs, they may continue to
> observe old value. To guard against accidental TLB flushes for worker
> vCPUs the test is repeated 100 times.
> 
> Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
> fast' modes.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/include/x86_64/hyperv.h     |   1 +
>  .../selftests/kvm/x86_64/hyperv_tlb_flush.c   | 647 ++++++++++++++++++
>  4 files changed, 650 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 5d5fbb161d56..1a1d09e414d5 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -25,6 +25,7 @@
>  /x86_64/hyperv_features
>  /x86_64/hyperv_ipi
>  /x86_64/hyperv_svm_test
> +/x86_64/hyperv_tlb_flush
>  /x86_64/mmio_warning_test
>  /x86_64/mmu_role_test
>  /x86_64/platform_info_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 44889f897fe7..8b83abc09a1a 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -54,6 +54,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
> +TEST_GEN_PROGS_x86_64 += x86_64/hyperv_tlb_flush
>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
>  TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
> diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> index f51d6fab8e93..1e34dd7c5075 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> @@ -185,6 +185,7 @@
>  /* hypercall options */
>  #define HV_HYPERCALL_FAST_BIT		BIT(16)
>  #define HV_HYPERCALL_VARHEAD_OFFSET	17
> +#define HV_HYPERCALL_REP_COMP_OFFSET	32
>  
>  #define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
>  
> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
> new file mode 100644
> index 000000000000..00bcae45ddd2
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
> @@ -0,0 +1,647 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V HvFlushVirtualAddress{List,Space}{,Ex} tests
> + *
> + * Copyright (C) 2022, Red Hat, Inc.
> + *
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <pthread.h>
> +#include <inttypes.h>
> +
> +#include "kvm_util.h"
> +#include "hyperv.h"
> +#include "processor.h"
> +#include "test_util.h"
> +#include "vmx.h"
> +
> +#define SENDER_VCPU_ID   1
> +#define WORKER_VCPU_ID_1 2
> +#define WORKER_VCPU_ID_2 65
> +
> +#define NTRY 100
> +
> +struct thread_params {
> +	struct kvm_vm *vm;
> +	uint32_t vcpu_id;
> +};
> +
> +struct hv_vpset {
> +	u64 format;
> +	u64 valid_bank_mask;
> +	u64 bank_contents[];
> +};
> +
> +enum HV_GENERIC_SET_FORMAT {
> +	HV_GENERIC_SET_SPARSE_4K,
> +	HV_GENERIC_SET_ALL,
> +};
> +
> +#define HV_FLUSH_ALL_PROCESSORS			BIT(0)
> +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	BIT(1)
> +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	BIT(2)
> +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	BIT(3)
> +
> +/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
> +struct hv_tlb_flush {
> +	u64 address_space;
> +	u64 flags;
> +	u64 processor_mask;
> +	u64 gva_list[];
> +} __packed;
> +
> +/* HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressListEx hypercalls */
> +struct hv_tlb_flush_ex {
> +	u64 address_space;
> +	u64 flags;
> +	struct hv_vpset hv_vp_set;
> +	u64 gva_list[];
> +} __packed;
> +
> +static inline void hv_init(vm_vaddr_t pgs_gpa)
> +{
> +	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
> +	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
> +}
> +
> +static void worker_code(void *test_pages, vm_vaddr_t pgs_gpa)
> +{
> +	u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
> +	unsigned char chr;
> +
> +	x2apic_enable();
> +	hv_init(pgs_gpa);
> +
> +	for (;;) {
> +		chr = READ_ONCE(*(unsigned char *)(test_pages + 4096 * 2 + vcpu_id));
It would be nice to wrap this into a small helper, mirroring set_expected_char()
on the write side, to make the code easier to follow.
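
E.g. something like this (untested, the name is just a placeholder):

	static unsigned char get_expected_char(void *test_pages, int vcpu_id)
	{
		return READ_ONCE(*(unsigned char *)(test_pages + 4096 * 2 + vcpu_id));
	}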

> +		if (chr)
> +			GUEST_ASSERT(*(unsigned char *)test_pages == chr);
> +		asm volatile("nop");
> +	}
> +}
> +
> +static inline u64 hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
> +{
> +	u64 hv_status;
> +
> +	asm volatile("mov %3, %%r8\n"
> +		     "vmcall"
> +		     : "=a" (hv_status),
> +		       "+c" (control), "+d" (arg1)
> +		     :  "r" (arg2)
> +		     : "cc", "memory", "r8", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}
> +
> +static inline void nop_loop(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < 10000000; i++)
> +		asm volatile("nop");
> +}
> +
> +static inline void sync_to_xmm(void *data)
> +{
> +	int i;
> +
> +	for (i = 0; i < 8; i++)
> +		write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
> +}

Nitpick: I see duplicated code, so I have to complain ;-) - maybe put the above into some common file?
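
E.g. hypercall(), nop_loop() and sync_to_xmm() are copy-pasted between
hyperv_ipi.c and this test; a shared header could hold them (the location
below is just a suggestion):

	/* e.g. in include/x86_64/hyperv.h or a new shared header (sketch only) */
	static inline void sync_to_xmm(void *data)
	{
		int i;

		for (i = 0; i < 8; i++)
			write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
	}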

> +
> +static void set_expected_char(void *addr, unsigned char chr, int vcpu_id)
> +{
> +	asm volatile("mfence");

I remember that Paolo once told me (I might not remember it correctly though)
that on x86 actual hardware barriers like mfence are not really needed,
because the hardware already keeps memory accesses in order, unless fancy
(e.g. non-WB) memory types are used.

> +	*(unsigned char *)(addr + 2 * 4096 + vcpu_id) = chr;
> +}
> +
> +static void sender_guest_code(void *hcall_page, void *test_pages, vm_vaddr_t pgs_gpa)
> +{
> +	struct hv_tlb_flush *flush = (struct hv_tlb_flush *)hcall_page;
> +	struct hv_tlb_flush_ex *flush_ex = (struct hv_tlb_flush_ex *)hcall_page;
> +	int stage = 1, i;
> +	u64 res;
> +
> +	hv_init(pgs_gpa);
> +
> +	/* "Slow" hypercalls */

Hopefully I understand it correctly (see my comments below), but it might be
worthwhile to add something similar to my comments to the code itself, to make
it easier for someone reading the code to understand it.

> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);

Here we set the expected char to 0, meaning that the workers will not assert
even if there is a mismatch.

> +		GUEST_SYNC(stage++);
Now there is a mismatch: the host has swapped the pages for us.

> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);

Now that we have flushed the TLB, the guest should see the correct value.

> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);

Now we force the workers to check it.

Btw, an idea: it might be nice to use more than two test pages, say 100 test
pages, each filled with a different value. Memory is cheap, and this way a
'double error' could not hide the bug by chance.


Another thing, it might be nice to wrap this into a macro/function
to avoid *that* much duplication.
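
E.g. the common prologue of every loop could be factored out into something
like this (just a sketch, the helper name is made up):

	static void prepare_iteration(void *hcall_page, void *test_pages, int *stage)
	{
		memset(hcall_page, 0, 4096);
		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
		GUEST_SYNC((*stage)++);
	}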


> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
> +		flush->gva_list[0] = (u64)test_pages;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
> +		flush->processor_mask = 0;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
> +		flush->gva_list[0] = (u64)test_pages;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		/* bank_contents and gva_list occupy the same space, thus [1] */
> +		flush_ex->gva_list[1] = (u64)test_pages;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET) |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
> +			BIT_ULL(WORKER_VCPU_ID_1 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
> +			BIT_ULL(WORKER_VCPU_ID_2 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		/* bank_contents and gva_list occupy the same space, thus [2] */
> +		flush_ex->gva_list[2] = (u64)test_pages;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET) |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX,
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
> +		flush_ex->gva_list[0] = (u64)test_pages;
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				pgs_gpa, pgs_gpa + 4096);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* "Fast" hypercalls */
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
> +		sync_to_xmm(&flush->processor_mask);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
> +				HV_HYPERCALL_FAST_BIT, 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
> +		flush->gva_list[0] = (u64)test_pages;
> +		sync_to_xmm(&flush->processor_mask);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST | HV_HYPERCALL_FAST_BIT |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		sync_to_xmm(&flush->processor_mask);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
> +				HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush->gva_list[0] = (u64)test_pages;
> +		sync_to_xmm(&flush->processor_mask);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST | HV_HYPERCALL_FAST_BIT |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET), 0x0,
> +				HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		sync_to_xmm(&flush_ex->hv_vp_set);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT |
> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET),
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		/* bank_contents and gva_list occupy the same space, thus [1] */
> +		flush_ex->gva_list[1] = (u64)test_pages;
> +		sync_to_xmm(&flush_ex->hv_vp_set);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET) |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
> +			BIT_ULL(WORKER_VCPU_ID_1 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		sync_to_xmm(&flush_ex->hv_vp_set);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT |
> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET),
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
> +			BIT_ULL(WORKER_VCPU_ID_2 / 64);
> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
> +		/* bank_contents and gva_list occupy the same space, thus [2] */
> +		flush_ex->gva_list[2] = (u64)test_pages;
> +		sync_to_xmm(&flush_ex->hv_vp_set);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET) |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
> +		sync_to_xmm(&flush_ex->hv_vp_set);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT,
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
> +	for (i = 0; i < NTRY; i++) {
> +		memset(hcall_page, 0, 4096);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
> +		GUEST_SYNC(stage++);
> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
> +		flush_ex->gva_list[0] = (u64)test_pages;
> +		sync_to_xmm(&flush_ex->hv_vp_set);
> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
> +		GUEST_ASSERT((res & 0xffff) == 0);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
> +		nop_loop();
> +	}
> +
> +	GUEST_DONE();
> +}
> +
> +static void *vcpu_thread(void *arg)
> +{
> +	struct thread_params *params = (struct thread_params *)arg;
> +	struct ucall uc;
> +	int old;
> +	int r;
> +	unsigned int exit_reason;
> +
> +	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
> +		    params->vcpu_id, r);
> +
> +	vcpu_run(params->vm, params->vcpu_id);
> +	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> +
> +	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> +		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> +		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> +
> +	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
> +		TEST_ASSERT(false,
> +			    "vCPU %u exited with error: %s.\n",
> +			    params->vcpu_id, (const char *)uc.args[0]);
> +	}
> +
> +	return NULL;
> +}
> +
> +static void cancel_join_vcpu_thread(pthread_t thread, uint32_t vcpu_id)
> +{
> +	void *retval;
> +	int r;
> +
> +	r = pthread_cancel(thread);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_cancel on vcpu_id=%d failed with errno=%d",
> +		    vcpu_id, r);
> +
> +	r = pthread_join(thread, &retval);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_join on vcpu_id=%d failed with errno=%d",
> +		    vcpu_id, r);
> +	TEST_ASSERT(retval == PTHREAD_CANCELED,
> +		    "expected retval=%p, got %p", PTHREAD_CANCELED,
> +		    retval);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int r;
> +	pthread_t threads[2];
> +	struct thread_params params[2];
> +	struct kvm_vm *vm;
> +	struct kvm_run *run;
> +	vm_vaddr_t hcall_page, test_pages;
> +	struct ucall uc;
> +	int stage = 1;
> +
> +	vm = vm_create_default(SENDER_VCPU_ID, 0, sender_guest_code);
> +	params[0].vm = vm;
> +	params[1].vm = vm;
> +
> +	/* Hypercall input/output */
> +	hcall_page = vm_vaddr_alloc_pages(vm, 2);
> +	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
> +
> +	/*
> +	 * Test pages: the first one is filled with '0x1's, the second with '0x2's
> +	 * and the test will swap their mappings. The third page keeps the indication
> +	 * about the current state of mappings.
> +	 */
> +	test_pages = vm_vaddr_alloc_pages(vm, 3);
> +	memset(addr_gva2hva(vm, test_pages), 0x1, 4096);
> +	memset(addr_gva2hva(vm, test_pages) + 4096, 0x2, 4096);
> +	set_expected_char(addr_gva2hva(vm, test_pages), 0x0, WORKER_VCPU_ID_1);
> +	set_expected_char(addr_gva2hva(vm, test_pages), 0x0, WORKER_VCPU_ID_2);
> +
> +	vm_vcpu_add_default(vm, WORKER_VCPU_ID_1, worker_code);
> +	vcpu_args_set(vm, WORKER_VCPU_ID_1, 2, test_pages, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_msr(vm, WORKER_VCPU_ID_1, HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_1);
> +	vcpu_set_hv_cpuid(vm, WORKER_VCPU_ID_1);
> +
> +	vm_vcpu_add_default(vm, WORKER_VCPU_ID_2, worker_code);
> +	vcpu_args_set(vm, WORKER_VCPU_ID_2, 2, test_pages, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_msr(vm, WORKER_VCPU_ID_2, HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_2);
> +	vcpu_set_hv_cpuid(vm, WORKER_VCPU_ID_2);
> +
> +	vcpu_args_set(vm, SENDER_VCPU_ID, 3, hcall_page, test_pages,
> +		      addr_gva2gpa(vm, hcall_page));

It seems that all worker vCPUs get a pointer to the hypercall page,
which they don't need, and which, if they ever used it, would create a race
with the sender.
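
If the workers really don't need it, a minimal fix could look like this (just a
sketch, assuming worker_code() is adjusted to take a single argument):

	vcpu_args_set(vm, WORKER_VCPU_ID_1, 1, test_pages);
	vcpu_args_set(vm, WORKER_VCPU_ID_2, 1, test_pages);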


> +	vcpu_set_hv_cpuid(vm, SENDER_VCPU_ID);
> +
> +	params[0].vcpu_id = WORKER_VCPU_ID_1;
> +	r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_create halter failed errno=%d", errno);
> +
> +	params[1].vcpu_id = WORKER_VCPU_ID_2;
> +	r = pthread_create(&threads[1], NULL, vcpu_thread, &params[1]);
> +	TEST_ASSERT(r == 0,
> +		    "pthread_create halter failed errno=%d", errno);

Also, the worker threads here don't halt; the 'halter' wording in the assert
messages was not updated, I think.
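
Something like this would be clearer (a sketch, the exact wording is not important):

	TEST_ASSERT(r == 0,
		    "pthread_create for worker vCPU failed errno=%d", errno);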


> +
> +	run = vcpu_state(vm, SENDER_VCPU_ID);
> +
> +	while (true) {
> +		r = _vcpu_run(vm, SENDER_VCPU_ID);
> +		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
> +		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
> +			    "unexpected exit reason: %u (%s)",
> +			    run->exit_reason, exit_reason_str(run->exit_reason));
> +
> +		switch (get_ucall(vm, SENDER_VCPU_ID, &uc)) {
> +		case UCALL_SYNC:
> +			TEST_ASSERT(uc.args[1] == stage,
> +				    "Unexpected stage: %ld (%d expected)\n",
> +				    uc.args[1], stage);
> +			break;
> +		case UCALL_ABORT:
> +			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
> +				  __FILE__, uc.args[1]);
> +			return 1;
> +		case UCALL_DONE:
> +			return 0;
> +		}
> +
> +		/* Swap test pages */
> +		if (stage % 2) {
> +			__virt_pg_map(vm, test_pages, addr_gva2gpa(vm, test_pages) + 4096,
> +				      X86_PAGE_SIZE_4K, true);
> +			__virt_pg_map(vm, test_pages + 4096, addr_gva2gpa(vm, test_pages) - 4096,
> +				      X86_PAGE_SIZE_4K, true);
> +		} else {
> +			__virt_pg_map(vm, test_pages, addr_gva2gpa(vm, test_pages) - 4096,
> +				      X86_PAGE_SIZE_4K, true);
> +			__virt_pg_map(vm, test_pages + 4096, addr_gva2gpa(vm, test_pages) + 4096,
> +				      X86_PAGE_SIZE_4K, true);
> +		}

Another question: why is the host doing the swapping of the pages? Since exercising
the !EPT/!NPT case is, no doubt, not the goal of this test, why not let the guest
vCPU (the sender) do the swapping itself? That should eliminate the VM exits to the
host (which can even interfere with the TLB flush) and make the test closer to
real-world usage.
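
Something along these lines could work (a very rough sketch; 'pte1'/'pte2' are
hypothetical guest-virtual pointers to the two relevant PTEs that the host would
have to hand to sender_guest_code(); the mask covers the usual x86-64 PTE frame
bits):

	static void sender_swap_test_pages(uint64_t *pte1, uint64_t *pte2)
	{
		/* bits 51:12 of a 4-level PTE hold the physical frame number */
		const uint64_t pfn_mask = 0x000ffffffffff000ull;
		uint64_t tmp = *pte1;

		/* swap only the frame bits, keep each PTE's flag bits intact */
		*pte1 = (*pte1 & ~pfn_mask) | (*pte2 & pfn_mask);
		*pte2 = (*pte2 & ~pfn_mask) | (tmp & pfn_mask);
	}

The sender would then rely on the very TLB flush hypercalls under test to make the
new mappings visible on the worker vCPUs.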


> +
> +		stage++;
> +	}
> +
> +	cancel_join_vcpu_thread(threads[0], WORKER_VCPU_ID_1);
> +	cancel_join_vcpu_thread(threads[1], WORKER_VCPU_ID_2);
> +	kvm_vm_free(vm);
> +
> +	return 0;
> +}


Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 27/34] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h
  2022-04-14 13:20 ` [PATCH v3 27/34] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h Vitaly Kuznetsov
@ 2022-05-11 12:17   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> 'struct hv_enlightened_vmcs' definition in selftests is not '__packed'
> and so we rely on the compiler doing the right padding. This is not
> obvious so it seems beneficial to use the same definition as in kernel.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/include/x86_64/evmcs.h | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> index cc5d14a45702..b6067b555110 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> @@ -41,6 +41,8 @@ struct hv_enlightened_vmcs {
>  	u16 host_gs_selector;
>  	u16 host_tr_selector;
>  
> +	u16 padding16_1;
> +
>  	u64 host_ia32_pat;
>  	u64 host_ia32_efer;
>  
> @@ -159,7 +161,7 @@ struct hv_enlightened_vmcs {
>  	u64 ept_pointer;
>  
>  	u16 virtual_processor_id;
> -	u16 padding16[3];
> +	u16 padding16_2[3];
>  
>  	u64 padding64_2[5];
>  	u64 guest_physical_address;
> @@ -195,15 +197,15 @@ struct hv_enlightened_vmcs {
>  	u64 guest_rip;
>  
>  	u32 hv_clean_fields;
> -	u32 hv_padding_32;
> +	u32 padding32_1;
>  	u32 hv_synthetic_controls;
>  	struct {
>  		u32 nested_flush_hypercall:1;
>  		u32 msr_bitmap:1;
>  		u32 reserved:30;
> -	} hv_enlightenments_control;
> +	}  __packed hv_enlightenments_control;
>  	u32 hv_vp_id;
> -
> +	u32 padding32_2;
>  	u64 hv_vm_id;
>  	u64 partition_assist_page;
>  	u64 padding64_4[4];
> @@ -211,7 +213,7 @@ struct hv_enlightened_vmcs {
>  	u64 padding64_5[7];
>  	u64 xss_exit_bitmap;
>  	u64 padding64_6[7];
> -};
> +} __packed;
>  
>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE                     0
>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP                BIT(0)

Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 28/34] KVM: selftests: nVMX: Allocate Hyper-V partition assist page
  2022-04-14 13:20 ` [PATCH v3 28/34] KVM: selftests: nVMX: Allocate Hyper-V partition assist page Vitaly Kuznetsov
@ 2022-05-11 12:17   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> In preparation to testing Hyper-V L2 TLB flush hypercalls, allocate
> so-called Partition assist page and link it to 'struct vmx_pages'.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/include/x86_64/vmx.h | 4 ++++
>  tools/testing/selftests/kvm/lib/x86_64/vmx.c     | 7 +++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> index 583ceb0d1457..f99922ca8259 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> @@ -567,6 +567,10 @@ struct vmx_pages {
>  	uint64_t enlightened_vmcs_gpa;
>  	void *enlightened_vmcs;
>  
> +	void *partition_assist_hva;
> +	uint64_t partition_assist_gpa;
> +	void *partition_assist;
> +
>  	void *eptp_hva;
>  	uint64_t eptp_gpa;
>  	void *eptp;
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> index d089d8b850b5..3db21e0e1a8f 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> @@ -124,6 +124,13 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
>  	vmx->enlightened_vmcs_gpa =
>  		addr_gva2gpa(vm, (uintptr_t)vmx->enlightened_vmcs);
>  
> +	/* Setup of a region of guest memory for the partition assist page. */
> +	vmx->partition_assist = (void *)vm_vaddr_alloc_page(vm);
> +	vmx->partition_assist_hva =
> +		addr_gva2hva(vm, (uintptr_t)vmx->partition_assist);
> +	vmx->partition_assist_gpa =
> +		addr_gva2gpa(vm, (uintptr_t)vmx->partition_assist);
> +
>  	*p_vmx_gva = vmx_gva;
>  	return vmx;
>  }


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 29/34] KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist pages
  2022-04-14 13:20 ` [PATCH v3 29/34] KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist pages Vitaly Kuznetsov
@ 2022-05-11 12:17   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> In preparation to testing Hyper-V L2 TLB flush hypercalls, allocate VP
> assist and Partition assist pages and link them to 'struct svm_test_data'.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/include/x86_64/svm_util.h | 10 ++++++++++
>  tools/testing/selftests/kvm/lib/x86_64/svm.c          | 10 ++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
> index a25aabd8f5e7..640859b58fd6 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
> @@ -34,6 +34,16 @@ struct svm_test_data {
>  	void *msr; /* gva */
>  	void *msr_hva;
>  	uint64_t msr_gpa;
> +
> +	/* Hyper-V VP assist page */
> +	void *vp_assist; /* gva */
> +	void *vp_assist_hva;
> +	uint64_t vp_assist_gpa;
> +
> +	/* Hyper-V Partition assist page */
> +	void *partition_assist; /* gva */
> +	void *partition_assist_hva;
> +	uint64_t partition_assist_gpa;
>  };
>  
>  struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva);
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/svm.c b/tools/testing/selftests/kvm/lib/x86_64/svm.c
> index 736ee4a23df6..c284e8f87f5c 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/svm.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/svm.c
> @@ -48,6 +48,16 @@ vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva)
>  	svm->msr_gpa = addr_gva2gpa(vm, (uintptr_t)svm->msr);
>  	memset(svm->msr_hva, 0, getpagesize());
>  
> +	svm->vp_assist = (void *)vm_vaddr_alloc_page(vm);
> +	svm->vp_assist_hva = addr_gva2hva(vm, (uintptr_t)svm->vp_assist);
> +	svm->vp_assist_gpa = addr_gva2gpa(vm, (uintptr_t)svm->vp_assist);
> +	memset(svm->vp_assist_hva, 0, getpagesize());
> +
> +	svm->partition_assist = (void *)vm_vaddr_alloc_page(vm);
> +	svm->partition_assist_hva = addr_gva2hva(vm, (uintptr_t)svm->partition_assist);
> +	svm->partition_assist_gpa = addr_gva2gpa(vm, (uintptr_t)svm->partition_assist);
> +	memset(svm->partition_assist_hva, 0, getpagesize());
> +
>  	*p_svm_gva = svm_gva;
>  	return svm;
>  }

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 30/34] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h
  2022-04-14 13:20 ` [PATCH v3 30/34] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h Vitaly Kuznetsov
@ 2022-05-11 12:18   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> 'struct hv_vp_assist_page' definition doesn't match TLFS. Also, define
> 'struct hv_nested_enlightenments_control' and use it instead of opaque
> '__u64'.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  .../selftests/kvm/include/x86_64/evmcs.h      | 22 ++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> index b6067b555110..9c965ba73dec 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> @@ -20,14 +20,26 @@
>  
>  extern bool enable_evmcs;
>  
> +struct hv_nested_enlightenments_control {
> +	struct {
> +		__u32 directhypercall:1;
> +		__u32 reserved:31;
> +	} features;
> +	struct {
> +		__u32 reserved;
> +	} hypercallControls;
> +} __packed;
> +
> +/* Define virtual processor assist page structure. */
>  struct hv_vp_assist_page {
>  	__u32 apic_assist;
> -	__u32 reserved;
> -	__u64 vtl_control[2];
> -	__u64 nested_enlightenments_control[2];
> -	__u32 enlighten_vmentry;
> +	__u32 reserved1;
> +	__u64 vtl_control[3];
> +	struct hv_nested_enlightenments_control nested_control;
> +	__u8 enlighten_vmentry;
> +	__u8 reserved2[7];
>  	__u64 current_nested_vmcs;
> -};
> +} __packed;
>  
>  struct hv_enlightened_vmcs {
>  	u32 revision_id;

Seems to match the spec.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 31/34] KVM: selftests: evmcs_test: Introduce L2 TLB flush test
  2022-04-14 13:20 ` [PATCH v3 31/34] KVM: selftests: evmcs_test: Introduce L2 TLB flush test Vitaly Kuznetsov
@ 2022-05-11 12:18   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
> from L2 don't exit to L1 unless 'TlbLockCount' is set in the
> Partition assist page.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  .../selftests/kvm/include/x86_64/evmcs.h      |  2 +
>  .../testing/selftests/kvm/x86_64/evmcs_test.c | 52 ++++++++++++++++++-
>  2 files changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> index 9c965ba73dec..36c0a67d8602 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> @@ -252,6 +252,8 @@ struct hv_enlightened_vmcs {
>  #define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
>  		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
>  
> +#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
> +
>  extern struct hv_enlightened_vmcs *current_evmcs;
>  extern struct hv_vp_assist_page *current_vp_assist;
>  
> diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> index d12e043aa2ee..8d2aa7600d78 100644
> --- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> @@ -16,6 +16,7 @@
>  
>  #include "kvm_util.h"
>  
> +#include "hyperv.h"
>  #include "vmx.h"
>  
>  #define VCPU_ID		5
> @@ -49,6 +50,16 @@ static inline void rdmsr_gs_base(void)
>  			      "r13", "r14", "r15");
>  }
>  
> +static inline void hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
> +{
> +	asm volatile("mov %3, %%r8\n"
> +		     "vmcall"
> +		     : "+c" (control), "+d" (arg1)
> +		     :  "r" (arg2)
> +		     : "cc", "memory", "rax", "rbx", "r8", "r9", "r10",
> +		       "r11", "r12", "r13", "r14", "r15");
> +}

I see duplicated code, I complain ;-)
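
One option (just a sketch, the helper name is made up): move it to
include/x86_64/hyperv.h so all Hyper-V tests share it; the SVM flavour would
still need 'vmmcall' instead of 'vmcall', e.g. via a second helper or a
parameter:

	static inline void hyperv_hypercall(u64 control, vm_vaddr_t input,
					    vm_vaddr_t output)
	{
		/* Hyper-V hypercall ABI: RCX = control, RDX = input, R8 = output */
		asm volatile("mov %3, %%r8\n"
			     "vmcall"
			     : "+c" (control), "+d" (input)
			     : "r" (output)
			     : "cc", "memory", "rax", "rbx", "r8", "r9", "r10",
			       "r11", "r12", "r13", "r14", "r15");
	}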

> +
>  void l2_guest_code(void)
>  {
>  	GUEST_SYNC(7);
> @@ -67,15 +78,27 @@ void l2_guest_code(void)
>  	vmcall();
>  	rdmsr_gs_base(); /* intercepted */
>  
> +	/* L2 TLB flush tests */
> +	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
> +		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
> +	rdmsr_fs_base();
> +	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
> +		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
> +	/* Make sure we're not issuing Hyper-V TLB flush call again */
> +	__asm__ __volatile__ ("mov $0xdeadbeef, %rcx");
> +
>  	/* Done, exit to L1 and never come back.  */
>  	vmcall();
>  }
>  
> -void guest_code(struct vmx_pages *vmx_pages)
> +void guest_code(struct vmx_pages *vmx_pages, vm_vaddr_t pgs_gpa)
>  {
>  #define L2_GUEST_STACK_SIZE 64
>  	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
>  
> +	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
> +	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
> +
>  	x2apic_enable();
>  
>  	GUEST_SYNC(1);
> @@ -105,6 +128,14 @@ void guest_code(struct vmx_pages *vmx_pages)
>  	vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmreadz(PIN_BASED_VM_EXEC_CONTROL) |
>  		PIN_BASED_NMI_EXITING);
>  
> +	/* L2 TLB flush setup */
> +	current_evmcs->partition_assist_page = vmx_pages->partition_assist_gpa;
> +	current_evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;
> +	current_evmcs->hv_vm_id = 1;
> +	current_evmcs->hv_vp_id = 1;
> +	current_vp_assist->nested_control.features.directhypercall = 1;
> +	*(u32 *)(vmx_pages->partition_assist) = 0;
> +
>  	GUEST_ASSERT(!vmlaunch());
>  	GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);
>  
> @@ -149,6 +180,18 @@ void guest_code(struct vmx_pages *vmx_pages)
>  	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
>  	current_evmcs->guest_rip += 2; /* rdmsr */
>  
> +	/*
> +	 * L2 TLB flush test. First VMCALL should be handled directly by L0,
> +	 * no VMCALL exit expected.
> +	 */
> +	GUEST_ASSERT(!vmresume());
> +	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
> +	current_evmcs->guest_rip += 2; /* rdmsr */
> +	/* Enable synthetic vmexit */
> +	*(u32 *)(vmx_pages->partition_assist) = 1;
> +	GUEST_ASSERT(!vmresume());
> +	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH);
> +
>  	GUEST_ASSERT(!vmresume());
>  	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
>  	GUEST_SYNC(11);
> @@ -201,6 +244,7 @@ static void save_restore_vm(struct kvm_vm *vm)
>  int main(int argc, char *argv[])
>  {
>  	vm_vaddr_t vmx_pages_gva = 0;
> +	vm_vaddr_t hcall_page;
>  
>  	struct kvm_vm *vm;
>  	struct kvm_run *run;
> @@ -217,11 +261,15 @@ int main(int argc, char *argv[])
>  		exit(KSFT_SKIP);
>  	}
>  
> +	hcall_page = vm_vaddr_alloc_pages(vm, 1);
> +	memset(addr_gva2hva(vm, hcall_page), 0x0,  getpagesize());
> +
>  	vcpu_set_hv_cpuid(vm, VCPU_ID);
>  	vcpu_enable_evmcs(vm, VCPU_ID);
>  
>  	vcpu_alloc_vmx(vm, &vmx_pages_gva);
> -	vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
> +	vcpu_args_set(vm, VCPU_ID, 2, vmx_pages_gva, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_msr(vm, VCPU_ID, HV_X64_MSR_VP_INDEX, VCPU_ID);
>  
>  	vm_init_descriptor_tables(vm);
>  	vcpu_init_descriptor_tables(vm, VCPU_ID);

Looks good overall.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 32/34] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h
  2022-04-14 13:20 ` [PATCH v3 32/34] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h Vitaly Kuznetsov
@ 2022-05-11 12:18   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Hyper-V VP assist page is not eVMCS specific, it is also used for
> enlightened nSVM. Move the code to vendor neutral place.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/testing/selftests/kvm/Makefile          |  2 +-
>  .../selftests/kvm/include/x86_64/evmcs.h      | 40 +------------------
>  .../selftests/kvm/include/x86_64/hyperv.h     | 31 ++++++++++++++
>  .../testing/selftests/kvm/lib/x86_64/hyperv.c | 21 ++++++++++
>  .../testing/selftests/kvm/x86_64/evmcs_test.c |  1 +
>  5 files changed, 56 insertions(+), 39 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/lib/x86_64/hyperv.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 8b83abc09a1a..ae13aa32f3ce 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -38,7 +38,7 @@ ifeq ($(ARCH),riscv)
>  endif
>  
>  LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
> -LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
> +LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/hyperv.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
>  LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c lib/aarch64/gic_v3.c lib/aarch64/vgic.c
>  LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
>  LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
> diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> index 36c0a67d8602..026586b53013 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
> @@ -10,6 +10,7 @@
>  #define SELFTEST_KVM_EVMCS_H
>  
>  #include <stdint.h>
> +#include "hyperv.h"
>  #include "vmx.h"
>  
>  #define u16 uint16_t
> @@ -20,27 +21,6 @@
>  
>  extern bool enable_evmcs;
>  
> -struct hv_nested_enlightenments_control {
> -	struct {
> -		__u32 directhypercall:1;
> -		__u32 reserved:31;
> -	} features;
> -	struct {
> -		__u32 reserved;
> -	} hypercallControls;
> -} __packed;
> -
> -/* Define virtual processor assist page structure. */
> -struct hv_vp_assist_page {
> -	__u32 apic_assist;
> -	__u32 reserved1;
> -	__u64 vtl_control[3];
> -	struct hv_nested_enlightenments_control nested_control;
> -	__u8 enlighten_vmentry;
> -	__u8 reserved2[7];
> -	__u64 current_nested_vmcs;
> -} __packed;
> -
>  struct hv_enlightened_vmcs {
>  	u32 revision_id;
>  	u32 abort;
> @@ -246,31 +226,15 @@ struct hv_enlightened_vmcs {
>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL    BIT(15)
>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL                      0xFFFF
>  
> -#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
> -#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE	0x00000001
> -#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT	12
> -#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
> -		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
> -
>  #define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
>  
>  extern struct hv_enlightened_vmcs *current_evmcs;
> -extern struct hv_vp_assist_page *current_vp_assist;
>  
>  int vcpu_enable_evmcs(struct kvm_vm *vm, int vcpu_id);
>  
> -static inline int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
> +static inline void evmcs_enable(void)
>  {
> -	u64 val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
> -		HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
> -
> -	wrmsr(HV_X64_MSR_VP_ASSIST_PAGE, val);
> -
> -	current_vp_assist = vp_assist;
> -
>  	enable_evmcs = true;
> -
> -	return 0;
>  }
>  
>  static inline int evmcs_vmptrld(uint64_t vmcs_pa, void *vmcs)
> diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> index 1e34dd7c5075..095c15fc5381 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> @@ -189,4 +189,35 @@
>  
>  #define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
>  
> +#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
> +#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE	0x00000001
> +#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT	12
> +#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
> +		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
> +
> +struct hv_nested_enlightenments_control {
> +	struct {
> +		__u32 directhypercall:1;
> +		__u32 reserved:31;
> +	} features;
> +	struct {
> +		__u32 reserved;
> +	} hypercallControls;
> +} __packed;
> +
> +/* Define virtual processor assist page structure. */
> +struct hv_vp_assist_page {
> +	__u32 apic_assist;
> +	__u32 reserved1;
> +	__u64 vtl_control[3];
> +	struct hv_nested_enlightenments_control nested_control;
> +	__u8 enlighten_vmentry;
> +	__u8 reserved2[7];
> +	__u64 current_nested_vmcs;
> +} __packed;
> +
> +extern struct hv_vp_assist_page *current_vp_assist;
> +
> +int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist);
> +
>  #endif /* !SELFTEST_KVM_HYPERV_H */
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
> new file mode 100644
> index 000000000000..32dc0afd9e5b
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hyper-V specific functions.
> + *
> + * Copyright (C) 2021, Red Hat Inc.
> + */
> +#include <stdint.h>
> +#include "processor.h"
> +#include "hyperv.h"
> +
> +int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
> +{
> +	uint64_t val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
> +		HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
> +
> +	wrmsr(HV_X64_MSR_VP_ASSIST_PAGE, val);
> +
> +	current_vp_assist = vp_assist;
> +
> +	return 0;
> +}
> diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> index 8d2aa7600d78..8fa50e76d557 100644
> --- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
> @@ -105,6 +105,7 @@ void guest_code(struct vmx_pages *vmx_pages, vm_vaddr_t pgs_gpa)
>  	GUEST_SYNC(2);
>  
>  	enable_vp_assist(vmx_pages->vp_assist_gpa, vmx_pages->vp_assist);
> +	evmcs_enable();
>  
>  	GUEST_ASSERT(vmx_pages->vmcs_gpa);
>  	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 34/34] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'
  2022-04-14 13:20 ` [PATCH v3 34/34] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush' Vitaly Kuznetsov
@ 2022-05-11 12:18   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent,
> rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change
> eliminates the use of confusing 'direct' and adds the missing underscore.
> 
> No functional change.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h | 2 +-
>  arch/x86/include/asm/kvm_host.h    | 2 +-
>  arch/x86/kvm/svm/svm_onhyperv.c    | 2 +-
>  arch/x86/kvm/svm/svm_onhyperv.h    | 6 +++---
>  arch/x86/kvm/vmx/vmx.c             | 6 +++---
>  arch/x86/kvm/x86.c                 | 6 +++---
>  6 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 96e4e9842dfc..1e13612a6446 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -121,7 +121,7 @@ KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
>  KVM_X86_OP(get_msr_feature)
>  KVM_X86_OP(can_emulate_instruction)
>  KVM_X86_OP(apic_init_signal_blocked)
> -KVM_X86_OP_OPTIONAL(enable_direct_tlbflush)
> +KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
>  KVM_X86_OP_OPTIONAL(migrate_timers)
>  KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 168600490bd1..f4fd6da1f565 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1526,7 +1526,7 @@ struct kvm_x86_ops {
>  					void *insn, int insn_len);
>  
>  	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> -	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> +	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
>  
>  	void (*migrate_timers)(struct kvm_vcpu *vcpu);
>  	void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c
> index 8cdc62c74a96..69a7014d1cef 100644
> --- a/arch/x86/kvm/svm/svm_onhyperv.c
> +++ b/arch/x86/kvm/svm/svm_onhyperv.c
> @@ -14,7 +14,7 @@
>  #include "kvm_onhyperv.h"
>  #include "svm_onhyperv.h"
>  
> -int svm_hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
> +int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
>  {
>  	struct hv_enlightenments *hve;
>  	struct hv_partition_assist_pg **p_hv_pa_pg =
> diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
> index e2fc59380465..d6ec4aeebedb 100644
> --- a/arch/x86/kvm/svm/svm_onhyperv.h
> +++ b/arch/x86/kvm/svm/svm_onhyperv.h
> @@ -13,7 +13,7 @@
>  
>  static struct kvm_x86_ops svm_x86_ops;
>  
> -int svm_hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu);
> +int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
>  
>  static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
>  {
> @@ -51,8 +51,8 @@ static inline void svm_hv_hardware_setup(void)
>  
>  			vp_ap->nested_control.features.directhypercall = 1;
>  		}
> -		svm_x86_ops.enable_direct_tlbflush =
> -				svm_hv_enable_direct_tlbflush;
> +		svm_x86_ops.enable_l2_tlb_flush =
> +				svm_hv_enable_l2_tlb_flush;
>  	}
>  }
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a81e44852f54..2b3c73b49dcb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -461,7 +461,7 @@ static unsigned long host_idt_base;
>  static bool __read_mostly enlightened_vmcs = true;
>  module_param(enlightened_vmcs, bool, 0444);
>  
> -static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
> +static int hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
>  {
>  	struct hv_enlightened_vmcs *evmcs;
>  	struct hv_partition_assist_pg **p_hv_pa_pg =
> @@ -8151,8 +8151,8 @@ static int __init vmx_init(void)
>  		}
>  
>  		if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
> -			vmx_x86_ops.enable_direct_tlbflush
> -				= hv_enable_direct_tlbflush;
> +			vmx_x86_ops.enable_l2_tlb_flush
> +				= hv_enable_l2_tlb_flush;
>  
>  	} else {
>  		enlightened_vmcs = false;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d3839e648ab3..d620c56bc526 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4365,7 +4365,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  			kvm_x86_ops.nested_ops->get_state(NULL, NULL, 0) : 0;
>  		break;
>  	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
> -		r = kvm_x86_ops.enable_direct_tlbflush != NULL;
> +		r = kvm_x86_ops.enable_l2_tlb_flush != NULL;
>  		break;
>  	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
>  		r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
> @@ -5275,10 +5275,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  		}
>  		return r;
>  	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
> -		if (!kvm_x86_ops.enable_direct_tlbflush)
> +		if (!kvm_x86_ops.enable_l2_tlb_flush)
>  			return -ENOTTY;
>  
> -		return static_call(kvm_x86_enable_direct_tlbflush)(vcpu);
> +		return static_call(kvm_x86_enable_l2_tlb_flush)(vcpu);
>  
>  	case KVM_CAP_HYPERV_ENFORCE_CPUID:
>  		return kvm_hv_set_enforce_cpuid(vcpu, cap->args[0]);

Nitpick: you may want to put this patch at the start of the series, since it doesn't depend on the rest of the series.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 33/34] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test
  2022-04-14 13:20 ` [PATCH v3 33/34] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test Vitaly Kuznetsov
@ 2022-05-11 12:19   ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-11 12:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
> from L2 don't exit to L1 unless 'TlbLockCount' is set in the Partition
> assist page.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  .../selftests/kvm/x86_64/hyperv_svm_test.c    | 60 +++++++++++++++++--
>  1 file changed, 56 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c b/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
> index 21f5ca9197da..99f0a2ead7df 100644
> --- a/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
> @@ -42,11 +42,24 @@ struct hv_enlightenments {
>   */
>  #define VMCB_HV_NESTED_ENLIGHTENMENTS (1U << 31)
>  
> +#define HV_SVM_EXITCODE_ENL 0xF0000000
> +#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH   (1)
> +
>  static inline void vmmcall(void)
>  {
>  	__asm__ __volatile__("vmmcall");
>  }
>  
> +static inline void hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
> +{
> +	asm volatile("mov %3, %%r8\n"
> +		     "vmmcall"
> +		     : "+c" (control), "+d" (arg1)
> +		     :  "r" (arg2)
> +		     : "cc", "memory", "rax", "rbx", "r8", "r9", "r10",
> +		       "r11", "r12", "r13", "r14", "r15");
> +}

Yes, this code should really be put in a common file :)

> +
>  void l2_guest_code(void)
>  {
>  	GUEST_SYNC(3);
> @@ -62,11 +75,21 @@ void l2_guest_code(void)
>  
>  	GUEST_SYNC(5);
>  
> +	/* L2 TLB flush tests */
> +	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
> +		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
> +	rdmsr(MSR_FS_BASE);
> +	hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
> +		  HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
> +	/* Make sure we're not issuing Hyper-V TLB flush call again */
> +	__asm__ __volatile__ ("mov $0xdeadbeef, %rcx");
> +
>  	/* Done, exit to L1 and never come back.  */
>  	vmmcall();
>  }
>  
> -static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
> +static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
> +						    vm_vaddr_t pgs_gpa)
>  {
>  	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
>  	struct vmcb *vmcb = svm->vmcb;
> @@ -75,13 +98,23 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
>  
>  	GUEST_SYNC(1);
>  
> -	wrmsr(HV_X64_MSR_GUEST_OS_ID, (u64)0x8100 << 48);
> +	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
> +	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
> +	enable_vp_assist(svm->vp_assist_gpa, svm->vp_assist);
>  
>  	GUEST_ASSERT(svm->vmcb_gpa);
>  	/* Prepare for L2 execution. */
>  	generic_svm_setup(svm, l2_guest_code,
>  			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
>  
> +	/* L2 TLB flush setup */
> +	hve->partition_assist_page = svm->partition_assist_gpa;
> +	hve->hv_enlightenments_control.nested_flush_hypercall = 1;
> +	hve->hv_vm_id = 1;
> +	hve->hv_vp_id = 1;
> +	current_vp_assist->nested_control.features.directhypercall = 1;
> +	*(u32 *)(svm->partition_assist) = 0;
> +
>  	GUEST_SYNC(2);
>  	run_guest(vmcb, svm->vmcb_gpa);
>  	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
> @@ -116,6 +149,20 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
>  	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_MSR);
>  	vmcb->save.rip += 2; /* rdmsr */
>  
> +
> +	/*
> +	 * L2 TLB flush test. First VMCALL should be handled directly by L0,
> +	 * no VMCALL exit expected.
> +	 */
> +	run_guest(vmcb, svm->vmcb_gpa);
> +	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_MSR);
> +	vmcb->save.rip += 2; /* rdmsr */
> +	/* Enable synthetic vmexit */
> +	*(u32 *)(svm->partition_assist) = 1;
> +	run_guest(vmcb, svm->vmcb_gpa);
> +	GUEST_ASSERT(vmcb->control.exit_code == HV_SVM_EXITCODE_ENL);
> +	GUEST_ASSERT(vmcb->control.exit_info_1 == HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH);
> +
>  	run_guest(vmcb, svm->vmcb_gpa);
>  	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
>  	GUEST_SYNC(6);
> @@ -126,7 +173,7 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
>  int main(int argc, char *argv[])
>  {
>  	vm_vaddr_t nested_gva = 0;
> -
> +	vm_vaddr_t hcall_page;
>  	struct kvm_vm *vm;
>  	struct kvm_run *run;
>  	struct ucall uc;
> @@ -141,7 +188,12 @@ int main(int argc, char *argv[])
>  	vcpu_set_hv_cpuid(vm, VCPU_ID);
>  	run = vcpu_state(vm, VCPU_ID);
>  	vcpu_alloc_svm(vm, &nested_gva);
> -	vcpu_args_set(vm, VCPU_ID, 1, nested_gva);
> +
> +	hcall_page = vm_vaddr_alloc_pages(vm, 1);
> +	memset(addr_gva2hva(vm, hcall_page), 0x0,  getpagesize());
> +
> +	vcpu_args_set(vm, VCPU_ID, 2, nested_gva, addr_gva2gpa(vm, hcall_page));
> +	vcpu_set_msr(vm, VCPU_ID, HV_X64_MSR_VP_INDEX, VCPU_ID);
>  
>  	for (stage = 1;; stage++) {
>  		_vcpu_run(vm, VCPU_ID);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring
  2022-05-11 11:19   ` Maxim Levitsky
@ 2022-05-16 14:29     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-16 14:29 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> To allow flushing individual GVAs instead of always flushing the whole
>> VPID a per-vCPU structure to pass the requests is needed. Introduce a
>> simple ring write-locked structure to hold two types of entries:
>> individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits)
>> and 'flush all'.
>> 
>> The queuing rule is: if there's not enough space on the ring to put
>> the request and leave at least 1 entry for 'flush all' - put 'flush
>> all' entry.
>> 
>> The size of the ring is arbitrary set to '16'.
>> 
>> Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now so
>> there's very small functional change but the infrastructure is
>> prepared to handle individual GVA flush requests.
>
> As I see from this patch, the code also doesn't process the requests
> from the ring buffer yet, but rather just ignores them completely
> and resets the whole ring buffer (kvm_hv_vcpu_empty_flush_tlb()).
> Maybe you should mention that here.
>
>
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h | 16 +++++++
>>  arch/x86/kvm/hyperv.c           | 83 +++++++++++++++++++++++++++++++++
>>  arch/x86/kvm/hyperv.h           | 13 ++++++
>>  arch/x86/kvm/x86.c              |  5 +-
>>  arch/x86/kvm/x86.h              |  1 +
>>  5 files changed, 116 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 1de3ad9308d8..b4dd2ff61658 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -578,6 +578,20 @@ struct kvm_vcpu_hv_synic {
>>  	bool dont_zero_synic_pages;
>>  };
>>  
>> +#define KVM_HV_TLB_FLUSH_RING_SIZE (16)
>> +
>> +struct kvm_vcpu_hv_tlb_flush_entry {
>> +	u64 addr;
>> +	u64 flush_all:1;
>> +	u64 pad:63;
>> +};
>
> Have you considered using kfifo.h library instead?
>

As a matter of fact I have not and this is a good suggestion,
actually. Let me try to use it instead of my home-brewed ring. I'll
address your other comments after that. Thanks!
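
For the record, a very rough sketch of what this could look like (completely
untested fragments, the struct/field names are made up):

	#include <linux/kfifo.h>

	struct kvm_vcpu_hv_tlb_flush_fifo {
		spinlock_t write_lock;
		/* kfifo wants a power-of-two size, 16 qualifies */
		DECLARE_KFIFO(entries, u64, KVM_HV_TLB_FLUSH_RING_SIZE);
	};

	/* producer side, under write_lock */
	kfifo_in(&tlb_flush_fifo->entries, entries, count);

	/* consumer side, only the vCPU itself drains its own fifo */
	count = kfifo_out(&tlb_flush_fifo->entries, buf, ARRAY_SIZE(buf));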

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring
  2022-04-14 13:19 ` [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring Vitaly Kuznetsov
  2022-05-11 11:19   ` Maxim Levitsky
@ 2022-05-16 19:34   ` Sean Christopherson
  2022-05-17 13:31     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 102+ messages in thread
From: Sean Christopherson @ 2022-05-16 19:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
> To allow flushing individual GVAs instead of always flushing the whole
> VPID a per-vCPU structure to pass the requests is needed. Introduce a
> simple ring write-locked structure to hold two types of entries:
> individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits)
> and 'flush all'.
> 
> The queuing rule is: if there's not enough space on the ring to put
> the request and leave at least 1 entry for 'flush all' - put 'flush
> all' entry.
> 
> The size of the ring is arbitrary set to '16'.
> 
> Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now so
> there's very small functional change but the infrastructure is
> prepared to handle individual GVA flush requests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 16 +++++++
>  arch/x86/kvm/hyperv.c           | 83 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/hyperv.h           | 13 ++++++
>  arch/x86/kvm/x86.c              |  5 +-
>  arch/x86/kvm/x86.h              |  1 +
>  5 files changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1de3ad9308d8..b4dd2ff61658 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -578,6 +578,20 @@ struct kvm_vcpu_hv_synic {
>  	bool dont_zero_synic_pages;
>  };
>  
> +#define KVM_HV_TLB_FLUSH_RING_SIZE (16)
> +
> +struct kvm_vcpu_hv_tlb_flush_entry {
> +	u64 addr;

"addr" misleading, this is overloaded to be both the virtual address and the count.
I think we make it a moot point, but it led me astray in thinkin we could use the
lower 12 bits for flags... until I realized those bits are already in use.
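
For reference, a sketch of the encoding that already occupies those bits (it is
what the consumer below decodes; 'gva' and 'nr_extra_pages' are just illustrative
names):

	/* bits 63:12: GFN of the first page to flush,
	 * bits 11:0:  number of *additional* pages, so a single entry can
	 *             describe up to 4096 consecutive pages.
	 */
	u64 entry = (gva & PAGE_MASK) | (nr_extra_pages & ~PAGE_MASK);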

> +	u64 flush_all:1;
> +	u64 pad:63;

This is rather odd; why not just use a bool?  But why even have a "flush_all"
field at all; can't we just use a magic value for write_idx to indicate "flush_all"?
E.g. either an explicit #define or -1.

Writers set write_idx to -1 to indicate "flush all"; the vCPU/reader goes straight
to "flush all" if write_idx is -1/invalid.  That way, future writes can simply do
nothing until read_idx == write_idx, and the vCPU/reader avoids unnecessary flushes
if there's a "flush all" pending and other valid entries in the ring.

And it allows deferring the "flush all" until the ring is truly full (unless there's
an off-by-one / wraparound edge case I'm missing, which is likely...).

---
 arch/x86/include/asm/kvm_host.h |  8 +-----
 arch/x86/kvm/hyperv.c           | 47 +++++++++++++--------------------
 2 files changed, 19 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b6b9a71a4591..bb45cc383ce4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -605,16 +605,10 @@ enum hv_tlb_flush_rings {
 	HV_NR_TLB_FLUSH_RINGS,
 };

-struct kvm_vcpu_hv_tlb_flush_entry {
-	u64 addr;
-	u64 flush_all:1;
-	u64 pad:63;
-};
-
 struct kvm_vcpu_hv_tlb_flush_ring {
 	int read_idx, write_idx;
 	spinlock_t write_lock;
-	struct kvm_vcpu_hv_tlb_flush_entry entries[KVM_HV_TLB_FLUSH_RING_SIZE];
+	u64 entries[KVM_HV_TLB_FLUSH_RING_SIZE];
 };

 /* Hyper-V per vcpu emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 1d6927538bc7..56f06cf85282 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1837,10 +1837,13 @@ static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc
 static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
 					 int read_idx, int write_idx)
 {
+	if (write_idx < 0)
+		return 0;
+
 	if (write_idx >= read_idx)
-		return KVM_HV_TLB_FLUSH_RING_SIZE - (write_idx - read_idx) - 1;
+		return KVM_HV_TLB_FLUSH_RING_SIZE - (write_idx - read_idx);

-	return read_idx - write_idx - 1;
+	return read_idx - write_idx;
 }

 static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
@@ -1869,6 +1872,9 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
 	 */
 	write_idx = tlb_flush_ring->write_idx;

+	if (write_idx < 0 && read_idx == write_idx)
+		read_idx = write_idx = 0;
+
 	ring_free = hv_tlb_flush_ring_free(hv_vcpu, read_idx, write_idx);
 	/* Full ring always contains 'flush all' entry */
 	if (!ring_free)
@@ -1879,21 +1885,13 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
 	 * entry in case another request comes in. In case there's not enough
 	 * space, just put 'flush all' entry there.
 	 */
-	if (!count || count >= ring_free - 1 || !entries) {
-		tlb_flush_ring->entries[write_idx].addr = 0;
-		tlb_flush_ring->entries[write_idx].flush_all = 1;
-		/*
-		 * Advance write index only after filling in the entry to
-		 * synchronize with lockless reader.
-		 */
-		smp_wmb();
-		tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
+	if (!count || count > ring_free - 1 || !entries) {
+		tlb_flush_ring->write_idx = -1;
 		goto out_unlock;
 	}

 	for (i = 0; i < count; i++) {
-		tlb_flush_ring->entries[write_idx].addr = entries[i];
-		tlb_flush_ring->entries[write_idx].flush_all = 0;
+		tlb_flush_ring->entries[write_idx] = entries[i];
 		write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
 	}
 	/*
@@ -1911,7 +1909,6 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
-	struct kvm_vcpu_hv_tlb_flush_entry *entry;
 	int read_idx, write_idx;
 	u64 address;
 	u32 count;
@@ -1940,26 +1937,18 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
 	smp_rmb();

+	if (write_idx < 0) {
+		kvm_vcpu_flush_tlb_guest(vcpu);
+		goto out_empty_ring;
+	}
+
 	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
-		entry = &tlb_flush_ring->entries[i];
-
-		if (entry->flush_all)
-			goto out_flush_all;
-
-		/*
-		 * Lower 12 bits of 'address' encode the number of additional
-		 * pages to flush.
-		 */
-		address = entry->addr & PAGE_MASK;
-		count = (entry->addr & ~PAGE_MASK) + 1;
+		address = tlb_flush_ring->entries[i] & PAGE_MASK;
+		count = (tlb_flush_ring->entries[i] & ~PAGE_MASK) + 1;
 		for (j = 0; j < count; j++)
 			static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
 	}
 	++vcpu->stat.tlb_flush;
-	goto out_empty_ring;
-
-out_flush_all:
-	kvm_vcpu_flush_tlb_guest(vcpu);

 out_empty_ring:
 	tlb_flush_ring->read_idx = write_idx;

base-commit: 62592c7c742ae78eb1f1005a63965ece19e6effe
--


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-04-14 13:19 ` [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Vitaly Kuznetsov
  2022-05-11 11:22   ` Maxim Levitsky
@ 2022-05-16 19:41   ` Sean Christopherson
  2022-05-17 13:41     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 102+ messages in thread
From: Sean Christopherson @ 2022-05-16 19:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
> @@ -1862,15 +1890,58 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	struct kvm_vcpu_hv_tlb_flush_entry *entry;
> +	int read_idx, write_idx;
> +	u64 address;
> +	u32 count;
> +	int i, j;
>  
> -	kvm_vcpu_flush_tlb_guest(vcpu);
> -
> -	if (!hv_vcpu)
> +	if (!tdp_enabled || !hv_vcpu) {
> +		kvm_vcpu_flush_tlb_guest(vcpu);
>  		return;
> +	}
>  
>  	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
>  
> -	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
> +	/*
> +	 * TLB flush must be performed on the target vCPU so 'read_idx'
> +	 * (AKA 'tail') cannot change underneath, the compiler is free
> +	 * to re-read it.
> +	 */
> +	read_idx = tlb_flush_ring->read_idx;
> +
> +	/*
> +	 * 'write_idx' (AKA 'head') can be concurrently updated by a different
> +	 * vCPU so we must be sure it's read once.
> +	 */
> +	write_idx = READ_ONCE(tlb_flush_ring->write_idx);
> +
> +	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
> +	smp_rmb();
> +
> +	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
> +		entry = &tlb_flush_ring->entries[i];
> +
> +		if (entry->flush_all)
> +			goto out_flush_all;
> +
> +		/*
> +		 * Lower 12 bits of 'address' encode the number of additional
> +		 * pages to flush.
> +		 */
> +		address = entry->addr & PAGE_MASK;
> +		count = (entry->addr & ~PAGE_MASK) + 1;
> +		for (j = 0; j < count; j++)
> +			static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
> +	}
> +	++vcpu->stat.tlb_flush;

Bumping tlb_flush is inconsistent with how KVM handles INVLPG, and could be wrong
if the ring is empty (might be impossible without a bug?).  And if my math is right,
or at least in the ballpark, tlb_flush will be incremented once regardless of whether
the loop flushed 1 page or 64k pages (completely full ring, full count on every one).

I'd prefer to either drop the stat adjustment entirely, or bump invlpg in the loop, e.g.

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 56f06cf85282..5654c9d56289 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1945,10 +1945,11 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
        for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
                address = tlb_flush_ring->entries[i] & PAGE_MASK;
                count = (tlb_flush_ring->entries[i] & ~PAGE_MASK) + 1;
-               for (j = 0; j < count; j++)
+               for (j = 0; j < count; j++) {
                        static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
+                       ++vcpu->stat.invlpg;
+               }
        }
-       ++vcpu->stat.tlb_flush;

 out_empty_ring:
        tlb_flush_ring->read_idx = write_idx;


> +	goto out_empty_ring;
> +
> +out_flush_all:
> +	kvm_vcpu_flush_tlb_guest(vcpu);
> +
> +out_empty_ring:
> +	tlb_flush_ring->read_idx = write_idx;
>  }
>  

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()
  2022-05-11 11:24   ` Maxim Levitsky
@ 2022-05-16 19:52     ` Sean Christopherson
  0 siblings, 0 replies; 102+ messages in thread
From: Sean Christopherson @ 2022-05-16 19:52 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Vitaly Kuznetsov, kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson,
	Michael Kelley, Siddharth Chandrasekaran, linux-hyperv,
	linux-kernel

On Wed, May 11, 2022, Maxim Levitsky wrote:
> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > @@ -2089,8 +2108,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> >  		((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
> >  }
> >  
> > -static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
> > -				 unsigned long *vcpu_bitmap)
> > +static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
> > +				    u64 *sparse_banks, u64 valid_bank_mask)
> I think the indentation is wrong here (was wrong before as well)

It's correct; the "+" from the diff/patch misaligns the first line because there's
no tab to eat the extra character.  Amusingly, the misalignment just gets worse the
more ">" / quotes that get added to the front.

I usually end up applying a patch to double-check if I suspect indentation is
wrong; it's too hard for me to tell based on the raw patch alone unless it's super
bad/obvious.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
  2022-04-14 13:19 ` [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' Vitaly Kuznetsov
  2022-05-11 11:25   ` Maxim Levitsky
@ 2022-05-16 20:05   ` Sean Christopherson
  2022-05-17 13:51     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 102+ messages in thread
From: Sean Christopherson @ 2022-05-16 20:05 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
> To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
> to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
> may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
> 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
> dynamically.
> 
> Note: sparse_set_to_vcpu_mask() keeps using on-stack allocation as it
> won't be used to handle L2 TLB flush requests.

I think it's worth using stronger language; handling TLB flushes for L2 _can't_
use sparse_set_to_vcpu_mask() because KVM has no idea how to translate an L2
vCPU index to an L1 vCPU.  I found the above mildly confusing because it didn't
call out "vp_bitmap" and so I assumed the note referred to yet another sparse_banks
"allocation".  And while vp_bitmap is related to sparse_banks, it tracks something
entirely different.

Something like?

Note: sparse_set_to_vcpu_mask() can never be used to handle L2 requests as
KVM can't translate L2 vCPU indices to L1 vCPUs, i.e. its vp_bitmap array
is still bounded by the number of L1 vCPUs and so can remain an on-stack
allocation.

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 3 +++
>  arch/x86/kvm/hyperv.c           | 6 ++++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 058061621872..837c07e213de 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -619,6 +619,9 @@ struct kvm_vcpu_hv {
>  	} cpuid_cache;
>  
>  	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
> +
> +	/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
> +	u64 sparse_banks[64];

Shouldn't this be HV_MAX_SPARSE_VCPU_BANKS?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
  2022-04-14 13:19 ` [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall() Vitaly Kuznetsov
  2022-05-11 11:25   ` Maxim Levitsky
@ 2022-05-16 20:09   ` Sean Christopherson
  1 sibling, 0 replies; 102+ messages in thread
From: Sean Christopherson @ 2022-05-16 20:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
> The newly introduced helper checks whether vCPU is performing a
> Hyper-V TLB flush hypercall. This is required to filter out L2 TLB
> flush hypercalls for processing.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index d59f96700104..ca67c18cef2c 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -170,6 +170,24 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
>  	tlb_flush_ring = kvm_hv_get_tlb_flush_ring(vcpu);
>  	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>  }
> +
> +static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	u16 code;
> +
> +	if (!hv_vcpu)
> +		return false;
> +
> +	code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read(vcpu) :
> +		kvm_rax_read(vcpu);

Nit, can you align the two expressions?

	code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read(vcpu) :
					   kvm_rax_read(vcpu);

> +
> +	return (code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
> +		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
> +		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
> +		code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX);
> +}
> +
>  void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
>  
>  
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush
  2022-05-11 11:31   ` Maxim Levitsky
@ 2022-05-16 20:16     ` Sean Christopherson
  0 siblings, 0 replies; 102+ messages in thread
From: Sean Christopherson @ 2022-05-16 20:16 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Vitaly Kuznetsov, kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson,
	Michael Kelley, Siddharth Chandrasekaran, linux-hyperv,
	linux-kernel

On Wed, May 11, 2022, Maxim Levitsky wrote:
> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > +/*
> > + * Note, Hyper-V isn't actually stealing bit 28 from Intel, just abusing it by
> > + * pairing it with architecturally impossible exit reasons.  Bit 28 is set only
> > + * on SMI exits to a SMI transfer monitor (STM) and if and only if a MTF VM-Exit
> > + * is pending.  I.e. it will never be set by hardware for non-SMI exits (there
> > + * are only three), nor will it ever be set unless the VMM is an STM.
> 
> I am sure that this will backfire one way or another. Their fault though...

Heh, that was my initial reaction too, but after working through the architecture
I gotta hand it to the Hyper-V folks, it's very clever :-)  And if we ever need a
synthetic exit reason for PV KVM... :-)

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring
  2022-05-16 19:34   ` Sean Christopherson
@ 2022-05-17 13:31     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-17 13:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
>> To allow flushing individual GVAs instead of always flushing the whole
>> VPID a per-vCPU structure to pass the requests is needed. Introduce a
>> simple ring write-locked structure to hold two types of entries:
>> individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits)
>> and 'flush all'.
>> 
>> The queuing rule is: if there's not enough space on the ring to put
>> the request and leave at least 1 entry for 'flush all' - put 'flush
>> all' entry.
>> 
>> The size of the ring is arbitrarily set to '16'.
>> 
>> Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now so
>> there's only a small functional change but the infrastructure is
>> prepared to handle individual GVA flush requests.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h | 16 +++++++
>>  arch/x86/kvm/hyperv.c           | 83 +++++++++++++++++++++++++++++++++
>>  arch/x86/kvm/hyperv.h           | 13 ++++++
>>  arch/x86/kvm/x86.c              |  5 +-
>>  arch/x86/kvm/x86.h              |  1 +
>>  5 files changed, 116 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 1de3ad9308d8..b4dd2ff61658 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -578,6 +578,20 @@ struct kvm_vcpu_hv_synic {
>>  	bool dont_zero_synic_pages;
>>  };
>>  
>> +#define KVM_HV_TLB_FLUSH_RING_SIZE (16)
>> +
>> +struct kvm_vcpu_hv_tlb_flush_entry {
>> +	u64 addr;
>
> "addr" misleading, this is overloaded to be both the virtual address and the count.
> I think we make it a moot point, but it led me astray in thinking we could use the
> lower 12 bits for flags... until I realized those bits are already in use.
>
>> +	u64 flush_all:1;
>> +	u64 pad:63;
>
> This is rather odd, why not just use a bool?  

My initial plan was to eventually put more flags here, i.e. there are
two additional flags which we don't currently handle:

HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES (as we don't actually look at
 HV_ADDRESS_SPACE_ID)
HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY

> But why even have a "flush_all" field, can't we just use a magic value
> for write_idx to indicate "flush_all"? E.g. either an explicit #define
> or -1.

Sure, a magic value would do too and would allow us to make 'struct
kvm_vcpu_hv_tlb_flush_entry' 8 bytes instead of 16 (for the time being;
if we are to add HV_ADDRESS_SPACE_ID/additional flags, the net win is
going to be zero).
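
Purely for illustration (this is not from the posted series): if we were to
keep per-entry flags, the entry could grow back to 16 bytes, roughly:

struct kvm_vcpu_hv_tlb_flush_entry {
	u64 addr;		/* GVA, lower 12 bits = number of extra pages */
	u32 as_id;		/* HV_ADDRESS_SPACE_ID, currently ignored by KVM */
	u32 non_global_only:1;	/* HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY */
	u32 pad:31;
};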

>
> Writers set write_idx to -1 to indicate "flush all", vCPU/reader goes straight
> to "flush all" if write_idx is -1/invalid.  That way, future writes can simply do
> nothing until read_idx == write_idx, and the vCPU/reader avoids unnecessary flushes
> if there's a "flush all" pending and other valid entries in the ring.
>
> And it allows deferring the "flush all" until the ring is truly full (unless there's
> an off-by-one / wraparound edge case I'm missing, which is likely...).

Thanks for the patch! I am, however, going to look at Maxim's suggestion
to use 'kfifo' to avoid all these uncertainties, funky locking etc. At
first glance it has everything I need here.
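
To make the idea concrete, a kfifo-backed queue could look roughly like the
sketch below; the names and the overflow fallback here are illustrative only,
not the actual conversion:

#include <linux/kfifo.h>
#include <linux/spinlock.h>

/* Sketch only: kfifo needs a power-of-two number of elements. */
#define KVM_HV_TLB_FLUSH_FIFO_SIZE 16

/* INIT_KFIFO() on 'entries' is assumed to happen at vCPU creation. */
struct kvm_vcpu_hv_tlb_flush_fifo {
	spinlock_t write_lock;
	DECLARE_KFIFO(entries, u64, KVM_HV_TLB_FLUSH_FIFO_SIZE);
};

/* Writer side (any vCPU): concurrent producers are serialized by the lock. */
static void hv_tlb_flush_enqueue(struct kvm_vcpu_hv_tlb_flush_fifo *fifo,
				 u64 *entries, int count)
{
	if (kfifo_in_spinlocked(&fifo->entries, entries, count,
				&fifo->write_lock) != count) {
		/* Not enough room: fall back to a full guest TLB flush. */
	}
}

/* Reader side: only the target vCPU drains its own fifo, a single consumer. */
static void hv_tlb_flush_dequeue(struct kvm_vcpu_hv_tlb_flush_fifo *fifo)
{
	u64 entry;

	while (kfifo_get(&fifo->entries, &entry)) {
		/* flush the GVA range encoded in 'entry' */
	}
}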

>
> ---
>  arch/x86/include/asm/kvm_host.h |  8 +-----
>  arch/x86/kvm/hyperv.c           | 47 +++++++++++++--------------------
>  2 files changed, 19 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b6b9a71a4591..bb45cc383ce4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -605,16 +605,10 @@ enum hv_tlb_flush_rings {
>  	HV_NR_TLB_FLUSH_RINGS,
>  };
>
> -struct kvm_vcpu_hv_tlb_flush_entry {
> -	u64 addr;
> -	u64 flush_all:1;
> -	u64 pad:63;
> -};
> -
>  struct kvm_vcpu_hv_tlb_flush_ring {
>  	int read_idx, write_idx;
>  	spinlock_t write_lock;
> -	struct kvm_vcpu_hv_tlb_flush_entry entries[KVM_HV_TLB_FLUSH_RING_SIZE];
> +	u64 entries[KVM_HV_TLB_FLUSH_RING_SIZE];
>  };
>
>  /* Hyper-V per vcpu emulation context */
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1d6927538bc7..56f06cf85282 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1837,10 +1837,13 @@ static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc
>  static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
>  					 int read_idx, int write_idx)
>  {
> +	if (write_idx < 0)
> +		return 0;
> +
>  	if (write_idx >= read_idx)
> -		return KVM_HV_TLB_FLUSH_RING_SIZE - (write_idx - read_idx) - 1;
> +		return KVM_HV_TLB_FLUSH_RING_SIZE - (write_idx - read_idx);
>
> -	return read_idx - write_idx - 1;
> +	return read_idx - write_idx;
>  }
>
>  static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
> @@ -1869,6 +1872,9 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
>  	 */
>  	write_idx = tlb_flush_ring->write_idx;
>
> +	if (write_idx < 0 && read_idx == write_idx)
> +		read_idx = write_idx = 0;
> +
>  	ring_free = hv_tlb_flush_ring_free(hv_vcpu, read_idx, write_idx);
>  	/* Full ring always contains 'flush all' entry */
>  	if (!ring_free)
> @@ -1879,21 +1885,13 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu,
>  	 * entry in case another request comes in. In case there's not enough
>  	 * space, just put 'flush all' entry there.
>  	 */
> -	if (!count || count >= ring_free - 1 || !entries) {
> -		tlb_flush_ring->entries[write_idx].addr = 0;
> -		tlb_flush_ring->entries[write_idx].flush_all = 1;
> -		/*
> -		 * Advance write index only after filling in the entry to
> -		 * synchronize with lockless reader.
> -		 */
> -		smp_wmb();
> -		tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
> +	if (!count || count > ring_free - 1 || !entries) {
> +		tlb_flush_ring->write_idx = -1;
>  		goto out_unlock;
>  	}
>
>  	for (i = 0; i < count; i++) {
> -		tlb_flush_ring->entries[write_idx].addr = entries[i];
> -		tlb_flush_ring->entries[write_idx].flush_all = 0;
> +		tlb_flush_ring->entries[write_idx] = entries[i];
>  		write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
>  	}
>  	/*
> @@ -1911,7 +1909,6 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> -	struct kvm_vcpu_hv_tlb_flush_entry *entry;
>  	int read_idx, write_idx;
>  	u64 address;
>  	u32 count;
> @@ -1940,26 +1937,18 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
>  	smp_rmb();
>
> +	if (write_idx < 0) {
> +		kvm_vcpu_flush_tlb_guest(vcpu);
> +		goto out_empty_ring;
> +	}
> +
>  	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
> -		entry = &tlb_flush_ring->entries[i];
> -
> -		if (entry->flush_all)
> -			goto out_flush_all;
> -
> -		/*
> -		 * Lower 12 bits of 'address' encode the number of additional
> -		 * pages to flush.
> -		 */
> -		address = entry->addr & PAGE_MASK;
> -		count = (entry->addr & ~PAGE_MASK) + 1;
> +		address = tlb_flush_ring->entries[i] & PAGE_MASK;
> +		count = (tlb_flush_ring->entries[i] & ~PAGE_MASK) + 1;
>  		for (j = 0; j < count; j++)
>  			static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
>  	}
>  	++vcpu->stat.tlb_flush;
> -	goto out_empty_ring;
> -
> -out_flush_all:
> -	kvm_vcpu_flush_tlb_guest(vcpu);
>
>  out_empty_ring:
>  	tlb_flush_ring->read_idx = write_idx;
>
> base-commit: 62592c7c742ae78eb1f1005a63965ece19e6effe
> --
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-05-16 19:41   ` Sean Christopherson
@ 2022-05-17 13:41     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-17 13:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
>> @@ -1862,15 +1890,58 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +	struct kvm_vcpu_hv_tlb_flush_entry *entry;
>> +	int read_idx, write_idx;
>> +	u64 address;
>> +	u32 count;
>> +	int i, j;
>>  
>> -	kvm_vcpu_flush_tlb_guest(vcpu);
>> -
>> -	if (!hv_vcpu)
>> +	if (!tdp_enabled || !hv_vcpu) {
>> +		kvm_vcpu_flush_tlb_guest(vcpu);
>>  		return;
>> +	}
>>  
>>  	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
>>  
>> -	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>> +	/*
>> +	 * TLB flush must be performed on the target vCPU so 'read_idx'
>> +	 * (AKA 'tail') cannot change underneath, the compiler is free
>> +	 * to re-read it.
>> +	 */
>> +	read_idx = tlb_flush_ring->read_idx;
>> +
>> +	/*
>> +	 * 'write_idx' (AKA 'head') can be concurrently updated by a different
>> +	 * vCPU so we must be sure it's read once.
>> +	 */
>> +	write_idx = READ_ONCE(tlb_flush_ring->write_idx);
>> +
>> +	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
>> +	smp_rmb();
>> +
>> +	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
>> +		entry = &tlb_flush_ring->entries[i];
>> +
>> +		if (entry->flush_all)
>> +			goto out_flush_all;
>> +
>> +		/*
>> +		 * Lower 12 bits of 'address' encode the number of additional
>> +		 * pages to flush.
>> +		 */
>> +		address = entry->addr & PAGE_MASK;
>> +		count = (entry->addr & ~PAGE_MASK) + 1;
>> +		for (j = 0; j < count; j++)
>> +			static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
>> +	}
>> +	++vcpu->stat.tlb_flush;
>
> Bumping tlb_flush is inconsistent with how KVM handles INVLPG, and could be wrong
> if the ring is empty (might be impossible without a bug?).  And if my math is right,
> or at least in the ballpark, tlb_flush will be incremented once regardless of whether
> the loop flushed 1 page or 64k pages (completely full ring, full count on every one).
>
> I'd prefer to either drop the stat adjustment entirely, or bump invlpg in the loop, e.g.
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 56f06cf85282..5654c9d56289 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1945,10 +1945,11 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>         for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
>                 address = tlb_flush_ring->entries[i] & PAGE_MASK;
>                 count = (tlb_flush_ring->entries[i] & ~PAGE_MASK) + 1;
> -               for (j = 0; j < count; j++)
> +               for (j = 0; j < count; j++) {
>                         static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
> +                       ++vcpu->stat.invlpg;
> +               }
>         }
> -       ++vcpu->stat.tlb_flush;
>
>  out_empty_ring:
>         tlb_flush_ring->read_idx = write_idx;
>

My idea was that flushing individual GVAs is always 'less intrusive'
than flushing the whole address space, which counts as '1' in
'stat.tlb_flush'. Yes, 'flush 1 GVA' is equal to 'flush 64k', but on the
other hand, if we do the math your way we get:
- flush the whole address space: "stat.tlb_flush" is incremented by '1'.
- flush 100 individual GVAs: "stat.tlb_flush" is incremented by '100'.

What if we instead give 'stat.tlb_flush' the following meaning here:
"how many individual TLB flush requests were submitted", i.e.:

         for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
                 address = tlb_flush_ring->entries[i] & PAGE_MASK;
                 count = (tlb_flush_ring->entries[i] & ~PAGE_MASK) + 1;
                 for (j = 0; j < count; j++)
                         static_call(kvm_x86_flush_tlb_gva)(vcpu, address + j * PAGE_SIZE);
                 ++vcpu->stat.invlpg;
          }

(something in between what I have now and what you suggest). What do you think?

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
  2022-05-16 20:05   ` Sean Christopherson
@ 2022-05-17 13:51     ` Vitaly Kuznetsov
  2022-05-17 14:04       ` Sean Christopherson
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-17 13:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
>> To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
>> to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
>> may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
>> 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
>> dynamically.
>> 
>> Note: sparse_set_to_vcpu_mask() keeps using on-stack allocation as it
>> won't be used to handle L2 TLB flush requests.
>
> I think it's worth using stronger language; handling TLB flushes for L2 _can't_
> use sparse_set_to_vcpu_mask() because KVM has no idea how to translate an L2
> vCPU index to an L1 vCPU.  I found the above mildly confusing because it didn't
> call out "vp_bitmap" and so I assumed the note referred to yet another sparse_banks
> "allocation".  And while vp_bitmap is related to sparse_banks, it tracks something
> entirely different.
>
> Something like?
>
> Note: sparse_set_to_vcpu_mask() can never be used to handle L2 requests as
> KVM can't translate L2 vCPU indices to L1 vCPUs, i.e. its vp_bitmap array
> is still bounded by the number of L1 vCPUs and so can remain an on-stack
> allocation.

My brain is probably tainted by looking at all this for some time so I
really appreciate such improvements, thanks :)

I wouldn't, however, say "never" ('never say never' :-)): KVM could've
kept 2-level reverse mapping up-to-date:

KVM -> L2 VM list -> L2 vCPU ids -> L1 vCPUs which run them

making it possible for KVM to quickly translate between L2 VP IDs and L1
vCPUs. I don't do this in the series and just record L2 VM_ID/VP_ID for
each L1 vCPU so I have to go over them all for each request. The
optimization is, however, possible and we may get to it if really big
Windows VMs become a reality.
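
(Just to make the idea concrete, such a reverse mapping might be shaped
roughly like the hypothetical structure below; nothing like this exists in
the series:)

/* Hypothetical only: one node per L2 partition observed via VM_ID/VP_ID. */
struct kvm_hv_l2_vm {
	u64 vm_id;
	struct hlist_node node;
	/* indexed by L2 VP ID, points to the L1 vCPU currently backing it */
	struct xarray vp_to_vcpu;
};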

>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h | 3 +++
>>  arch/x86/kvm/hyperv.c           | 6 ++++--
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 058061621872..837c07e213de 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -619,6 +619,9 @@ struct kvm_vcpu_hv {
>>  	} cpuid_cache;
>>  
>>  	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
>> +
>> +	/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
>> +	u64 sparse_banks[64];
>
> Shouldn't this be HV_MAX_SPARSE_VCPU_BANKS?
>

It certainly should, thanks!

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
  2022-05-17 13:51     ` Vitaly Kuznetsov
@ 2022-05-17 14:04       ` Sean Christopherson
  2022-05-17 14:19         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Sean Christopherson @ 2022-05-17 14:04 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Tue, May 17, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
> >> To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
> >> to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
> >> may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
> >> 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
> >> dynamically.
> >> 
> >> Note: sparse_set_to_vcpu_mask() keeps using on-stack allocation as it
> >> won't be used to handle L2 TLB flush requests.
> >
> > I think it's worth using stronger language; handling TLB flushes for L2 _can't_
> > use sparse_set_to_vcpu_mask() because KVM has no idea how to translate an L2
> > vCPU index to an L1 vCPU.  I found the above mildly confusing because it didn't
> > call out "vp_bitmap" and so I assumed the note referred to yet another sparse_banks
> > "allocation".  And while vp_bitmap is related to sparse_banks, it tracks something
> > entirely different.
> >
> > Something like?
> >
> > Note: sparse_set_to_vcpu_mask() can never be used to handle L2 requests as
> > KVM can't translate L2 vCPU indices to L1 vCPUs, i.e. its vp_bitmap array
> > is still bounded by the number of L1 vCPUs and so can remain an on-stack
> > allocation.
> 
> My brain is probably tainted by looking at all this for some time so I
> really appreciate such improvements, thanks :)
> 
> I wouldn't, however, say "never" ('never say never' :-)): KVM could've
> kept 2-level reverse mapping up-to-date:
> 
> KVM -> L2 VM list -> L2 vCPU ids -> L1 vCPUs which run them
> 
> making it possible for KVM to quickly translate between L2 VP IDs and L1
> vCPUs. I don't do this in the series and just record L2 VM_ID/VP_ID for
> each L1 vCPU so I have to go over them all for each request. The
> optimization is, however, possible and we may get to it if really big
> Windows VMs become a reality.

Out of curiosity, is L1 "required" to provide the L2 => L1 translation/map?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'
  2022-05-17 14:04       ` Sean Christopherson
@ 2022-05-17 14:19         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-17 14:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, May 17, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Thu, Apr 14, 2022, Vitaly Kuznetsov wrote:
>> >> To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
>> >> to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
>> >> may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
>> >> 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
>> >> dynamically.
>> >> 
>> >> Note: sparse_set_to_vcpu_mask() keeps using on-stack allocation as it
>> >> won't be used to handle L2 TLB flush requests.
>> >
>> > I think it's worth using stronger language; handling TLB flushes for L2 _can't_
>> > use sparse_set_to_vcpu_mask() because KVM has no idea how to translate an L2
>> > vCPU index to an L1 vCPU.  I found the above mildly confusing because it didn't
>> > call out "vp_bitmap" and so I assumed the note referred to yet another sparse_banks
>> > "allocation".  And while vp_bitmap is related to sparse_banks, it tracks something
>> > entirely different.
>> >
>> > Something like?
>> >
>> > Note: sparse_set_to_vcpu_mask() can never be used to handle L2 requests as
>> > KVM can't translate L2 vCPU indices to L1 vCPUs, i.e. its vp_bitmap array
>> > is still bounded by the number of L1 vCPUs and so can remain an on-stack
>> > allocation.
>> 
>> My brain is probably tainted by looking at all this for some time so I
>> really appreciate such improvements, thanks :)
>> 
>> I wouldn't, however, say "never" ('never say never' :-)): KVM could've
>> kept 2-level reverse mapping up-to-date:
>> 
>> KVM -> L2 VM list -> L2 vCPU ids -> L1 vCPUs which run them
>> 
>> making it possible for KVM to quickly translate between L2 VP IDs and L1
>> vCPUs. I don't do this in the series and just record L2 VM_ID/VP_ID for
>> each L1 vCPU so I have to go over them all for each request. The
>> optimization is, however, possible and we may get to it if really big
>> Windows VMs become a reality.
>
> Out of curiosity, is L1 "required" to provide the L2 => L1 translation/map?
>

To make this "Direct Virtual Flush" feature work? Yes, it is:

...
"
Before enabling it, the L1 hypervisor must configure the following
additional fields of the enlightened VMCS:
- VpId: ID of the virtual processor that the enlightened VMCS controls.
- VmId: ID of the virtual machine that the enlightened VMCS belongs to.
- PartitionAssistPage: Guest physical address of the partition assist
page.
"

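For reference, on the KVM side this boils down to copying those three fields
out of the enlightened VMCS into the per-vCPU Hyper-V context, analogous to
the nSVM helper elsewhere in this series. A rough sketch (field names assumed
from 'struct hv_enlightened_vmcs', not a quote of the actual patch):

static void nested_vmx_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu,
					   struct hv_enlightened_vmcs *evmcs)
{
	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

	if (!hv_vcpu)
		return;

	hv_vcpu->nested.pa_page_gpa = evmcs->partition_assist_page;
	hv_vcpu->nested.vm_id = evmcs->hv_vm_id;
	hv_vcpu->nested.vp_id = evmcs->hv_vp_id;
}
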
-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-05-11 11:22   ` Maxim Levitsky
@ 2022-05-18  9:39     ` Vitaly Kuznetsov
  2022-05-18 14:18       ` Sean Christopherson
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-18  9:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
>> the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
>> flushing the whole VPID and this is sub-optimal. Switch to handling
>> these requests with 'flush_tlb_gva()' hooks instead. Use the newly
>> introduced TLB flush ring to queue the requests.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/hyperv.c | 132 ++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 115 insertions(+), 17 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index d66c27fd1e8a..759e1a16e5c3 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1805,6 +1805,13 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
>>  				  sparse_banks, consumed_xmm_halves, offset);
>>  }
>>  
>> +static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[],
>> +				       int consumed_xmm_halves, gpa_t offset)
>> +{
>> +	return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt,
>> +				  entries, consumed_xmm_halves, offset);
>> +}
>> +
>>  static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
>>  					 int read_idx, int write_idx)
>>  {
>> @@ -1814,12 +1821,13 @@ static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
>>  	return read_idx - write_idx - 1;
>>  }
>>  
>> -static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
>> +static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
>>  {
>>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>>  	int ring_free, write_idx, read_idx;
>>  	unsigned long flags;
>> +	int i;
>>  
>>  	if (!hv_vcpu)
>>  		return;
>> @@ -1845,14 +1853,34 @@ static void hv_tlb_flush_ring_enqueue(struct kvm_vcpu *vcpu)
>>  	if (!ring_free)
>>  		goto out_unlock;
>>  
>> -	tlb_flush_ring->entries[write_idx].addr = 0;
>> -	tlb_flush_ring->entries[write_idx].flush_all = 1;
>>  	/*
>> -	 * Advance write index only after filling in the entry to
>> -	 * synchronize with lockless reader.
>> +	 * All entries should fit on the ring leaving one free for 'flush all'
>> +	 * entry in case another request comes in. In case there's not enough
>> +	 * space, just put 'flush all' entry there.
>> +	 */
>> +	if (!count || count >= ring_free - 1 || !entries) {
>> +		tlb_flush_ring->entries[write_idx].addr = 0;
>> +		tlb_flush_ring->entries[write_idx].flush_all = 1;
>> +		/*
>> +		 * Advance write index only after filling in the entry to
>> +		 * synchronize with lockless reader.
>> +		 */
>> +		smp_wmb();
>> +		tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
>> +		goto out_unlock;
>> +	}
>> +
>> +	for (i = 0; i < count; i++) {
>> +		tlb_flush_ring->entries[write_idx].addr = entries[i];
>> +		tlb_flush_ring->entries[write_idx].flush_all = 0;
>> +		write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
>> +	}
>> +	/*
>> +	 * Advance write index only after filling in the entry to synchronize
>> +	 * with lockless reader.
>>  	 */
>>  	smp_wmb();
>> -	tlb_flush_ring->write_idx = (write_idx + 1) % KVM_HV_TLB_FLUSH_RING_SIZE;
>> +	tlb_flush_ring->write_idx = write_idx;
>>  
>>  out_unlock:
>>  	spin_unlock_irqrestore(&tlb_flush_ring->write_lock, flags);
>> @@ -1862,15 +1890,58 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvm_vcpu_hv_tlb_flush_ring *tlb_flush_ring;
>>  	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +	struct kvm_vcpu_hv_tlb_flush_entry *entry;
>> +	int read_idx, write_idx;
>> +	u64 address;
>> +	u32 count;
>> +	int i, j;
>>  
>> -	kvm_vcpu_flush_tlb_guest(vcpu);
>> -
>> -	if (!hv_vcpu)
>> +	if (!tdp_enabled || !hv_vcpu) {
>> +		kvm_vcpu_flush_tlb_guest(vcpu);
>>  		return;
>> +	}
>>  
>>  	tlb_flush_ring = &hv_vcpu->tlb_flush_ring;
>>  
>> -	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>> +	/*
>> +	 * TLB flush must be performed on the target vCPU so 'read_idx'
>> +	 * (AKA 'tail') cannot change underneath, the compiler is free
>> +	 * to re-read it.
>> +	 */
>> +	read_idx = tlb_flush_ring->read_idx;
>> +
>> +	/*
>> +	 * 'write_idx' (AKA 'head') can be concurrently updated by a different
>> +	 * vCPU so we must be sure it's read once.
>> +	 */
>> +	write_idx = READ_ONCE(tlb_flush_ring->write_idx);
>> +
>> +	/* Pairs with smp_wmb() in hv_tlb_flush_ring_enqueue() */
>> +	smp_rmb();
>> +
>> +	for (i = read_idx; i != write_idx; i = (i + 1) % KVM_HV_TLB_FLUSH_RING_SIZE) {
>> +		entry = &tlb_flush_ring->entries[i];
>> +
>> +		if (entry->flush_all)
>> +			goto out_flush_all;
>
> I have an idea: instead of special 'flush all entry' in the ring,
> just have a boolean in parallel to the ring.
>
> Also the ring buffer entries will be 2x smaller since they won't need
> to have the 'flush all' boolean.
>
> This would allow to just flush the whole thing and discard the ring if that boolean is set,
> allow to not enqueue anything to the ring also if the boolean is already set,
> also we won't need to have extra space in the ring for that entry, etc, etc.
>
> Or if using kfifo, then it can contain plain u64 items, which is even more natural.
>

In the next version I switch to a fifo and get rid of 'flush_all' entries;
instead of a boolean I use a 'magic' value of '-1' in the GVA. This way
we don't need to synchronize with the reader or add any special
handling for the flag.

Note, in the future we may get back to having flags as part of entries
as it is now possible to analyze the guest's CR3. We'll likely add
'AddressSpace' to each entry. The 'flush all' entry, however, will
always remain 'special' to handle the ring overflow case.
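
For illustration, the reader side could then look roughly like this; the
define name and the fifo type below are made up, only the idea matters:

#include <linux/kfifo.h>

/* Assumed shape of the per-vCPU queue: a kfifo of u64 entries. */
struct kvm_vcpu_hv_tlb_flush_fifo {
	DECLARE_KFIFO(entries, u64, 16);
};

/* A reserved value that can never be a valid GVA entry. */
#define KVM_HV_TLB_FLUSHALL_ENTRY	((u64)-1)

static void hv_tlb_flush_drain(struct kvm_vcpu *vcpu,
			       struct kvm_vcpu_hv_tlb_flush_fifo *fifo)
{
	u64 entry;

	while (kfifo_get(&fifo->entries, &entry)) {
		if (entry == KVM_HV_TLB_FLUSHALL_ENTRY) {
			/* Full flush covers everything else still queued. */
			kvm_vcpu_flush_tlb_guest(vcpu);
			kfifo_reset_out(&fifo->entries);
			break;
		}
		/* otherwise flush the GVA range encoded in 'entry' */
	}
}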

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
  2022-05-11 11:27   ` Maxim Levitsky
@ 2022-05-18 12:25     ` Vitaly Kuznetsov
  2022-05-18 12:45       ` Maxim Levitsky
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-18 12:25 UTC (permalink / raw)
  To: Maxim Levitsky, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition
>> assist page address to handle L2 TLB flush requests.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
>>  arch/x86/kvm/svm/nested.c |  2 ++
>>  2 files changed, 18 insertions(+)
>> 
>> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
>> index 7d6d97968fb9..8cf702fed7e5 100644
>> --- a/arch/x86/kvm/svm/hyperv.h
>> +++ b/arch/x86/kvm/svm/hyperv.h
>> @@ -9,6 +9,7 @@
>>  #include <asm/mshyperv.h>
>>  
>>  #include "../hyperv.h"
>> +#include "svm.h"
>>  
>>  /*
>>   * Hyper-V uses the software reserved 32 bytes in VMCB
>> @@ -32,4 +33,19 @@ struct hv_enlightenments {
>>   */
>>  #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>>  
>> +static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +	struct hv_enlightenments *hve =
>> +		(struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
>
> Small nitpick:
>
> Can we use this as an opportunity to rename the 'reserved_sw' to \
> 'hv_enlightenments' or something, because that is what it is?
>
> Also the reserved_sw is an array, which is confusing, since from first look,
> it looks like we have a pointer dereference here.
>

Well, that's what it is in Hyper-V world and so far we didn't give it
another meaning in KVM but in theory it is not impossible, e.g. we can
use this area to speed up nested KVM on KVM.

AMD calls this "Reserved for Host usage" so we can probably rename it to 
'reserved_host' but I'm not sure it's worth the hassle...

>
>
>> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +
>> +	if (!hv_vcpu)
>> +		return;
>> +
>> +	hv_vcpu->nested.pa_page_gpa = hve->partition_assist_page;
>> +	hv_vcpu->nested.vm_id = hve->hv_vm_id;
>> +	hv_vcpu->nested.vp_id = hve->hv_vp_id;
>> +}
>> +
>>  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index bed5e1692cef..2d1a76343404 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -826,6 +826,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
>>  
>>  	svm->nested.nested_run_pending = 1;
>>  
>> +	nested_svm_hv_update_vm_vp_ids(vcpu);
>> +
>>  	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
>>  		goto out_exit_err;
>>  
>
> That won't work after migration, since this won't be called
> if we migrate with nested guest running.
>
>
> I think that nested_svm_hv_update_vm_vp_ids should be called 
> from enter_svm_guest_mode.
>

Oh that's a good one, thanks! This could've been a hard-to-debug issue.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook
  2022-05-11 11:32   ` Maxim Levitsky
@ 2022-05-18 12:43     ` Vitaly Kuznetsov
  2022-05-18 12:49       ` Maxim Levitsky
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-18 12:43 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> Hyper-V supports injecting synthetic L2->L1 exit after performing
>> L2 TLB flush operation but the procedure is vendor specific.
>> Introduce .post_hv_l2_tlb_flush() nested hook for it.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  1 +
>>  arch/x86/kvm/Makefile           |  3 ++-
>>  arch/x86/kvm/svm/hyperv.c       | 11 +++++++++++
>>  arch/x86/kvm/svm/hyperv.h       |  2 ++
>>  arch/x86/kvm/svm/nested.c       |  1 +
>>  arch/x86/kvm/vmx/evmcs.c        |  4 ++++
>>  arch/x86/kvm/vmx/evmcs.h        |  1 +
>>  arch/x86/kvm/vmx/nested.c       |  1 +
>>  8 files changed, 23 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/x86/kvm/svm/hyperv.c
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 8b2a52bf26c0..ce62fde5f4ff 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1558,6 +1558,7 @@ struct kvm_x86_nested_ops {
>>  	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
>>  			    uint16_t *vmcs_version);
>>  	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
>> +	void (*post_hv_l2_tlb_flush)(struct kvm_vcpu *vcpu);
>>  };
>>  
>>  struct kvm_x86_init_ops {
>> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
>> index 30f244b64523..b6d53b045692 100644
>> --- a/arch/x86/kvm/Makefile
>> +++ b/arch/x86/kvm/Makefile
>> @@ -25,7 +25,8 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
>>  			   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
>>  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
>>  
>> -kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
>> +kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
>> +			   svm/sev.o svm/hyperv.o
>>  
>>  ifdef CONFIG_HYPERV
>>  kvm-amd-y		+= svm/svm_onhyperv.o
>> diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
>> new file mode 100644
>> index 000000000000..c0749fc282fe
>> --- /dev/null
>> +++ b/arch/x86/kvm/svm/hyperv.c
>> @@ -0,0 +1,11 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * AMD SVM specific code for Hyper-V on KVM.
>> + *
>> + * Copyright 2022 Red Hat, Inc. and/or its affiliates.
>> + */
>> +#include "hyperv.h"
>> +
>> +void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
>> index 8cf702fed7e5..a2b0d7580b0d 100644
>> --- a/arch/x86/kvm/svm/hyperv.h
>> +++ b/arch/x86/kvm/svm/hyperv.h
>> @@ -48,4 +48,6 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
>>  	hv_vcpu->nested.vp_id = hve->hv_vp_id;
>>  }
>>  
>> +void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
>> +
>>  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index 2d1a76343404..de3f27301b5c 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -1665,4 +1665,5 @@ struct kvm_x86_nested_ops svm_nested_ops = {
>>  	.get_nested_state_pages = svm_get_nested_state_pages,
>>  	.get_state = svm_get_nested_state,
>>  	.set_state = svm_set_nested_state,
>> +	.post_hv_l2_tlb_flush = svm_post_hv_l2_tlb_flush,
>>  };
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 87e3dc10edf4..e390e67496df 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -437,3 +437,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>>  
>>  	return 0;
>>  }
>> +
>> +void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>> index 8d70f9aea94b..b120b0ead4f3 100644
>> --- a/arch/x86/kvm/vmx/evmcs.h
>> +++ b/arch/x86/kvm/vmx/evmcs.h
>> @@ -244,5 +244,6 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>>  			uint16_t *vmcs_version);
>>  void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
>>  int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
>> +void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __KVM_X86_VMX_EVMCS_H */
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index ee88921c6156..cc6c944b5815 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -6850,4 +6850,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
>>  	.write_log_dirty = nested_vmx_write_pml_buffer,
>>  	.enable_evmcs = nested_enable_evmcs,
>>  	.get_evmcs_version = nested_get_evmcs_version,
>> +	.post_hv_l2_tlb_flush = vmx_post_hv_l2_tlb_flush,
>>  };
>
>
> I think that the name of the function is misleading, since it is not called
> after each L2 HV tlb flush, but only after a flush which needs to inject
> that synthetic VM exit.
>
> I think something like 'inject_synthetic_l2_hv_tlb_flush_vmexit' 
> (not a good name IMHO, but you get the idea) would be better.
>

Naming is hard indeed,

hv_inject_synthetic_vmexit_post_tlb_flush()

seems to be accurate.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
  2022-05-18 12:25     ` Vitaly Kuznetsov
@ 2022-05-18 12:45       ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-18 12:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Wed, 2022-05-18 at 14:25 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> 
> > On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > > Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition
> > > assist page address to handle L2 TLB flush requests.
> > > 
> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > ---
> > >  arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
> > >  arch/x86/kvm/svm/nested.c |  2 ++
> > >  2 files changed, 18 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> > > index 7d6d97968fb9..8cf702fed7e5 100644
> > > --- a/arch/x86/kvm/svm/hyperv.h
> > > +++ b/arch/x86/kvm/svm/hyperv.h
> > > @@ -9,6 +9,7 @@
> > >  #include <asm/mshyperv.h>
> > >  
> > >  #include "../hyperv.h"
> > > +#include "svm.h"
> > >  
> > >  /*
> > >   * Hyper-V uses the software reserved 32 bytes in VMCB
> > > @@ -32,4 +33,19 @@ struct hv_enlightenments {
> > >   */
> > >  #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
> > >  
> > > +static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct vcpu_svm *svm = to_svm(vcpu);
> > > +	struct hv_enlightenments *hve =
> > > +		(struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
> > 
> > Small nitpick:
> > 
> > Can we use this as an opportunity to rename the 'reserved_sw' to
> > 'hv_enlightenments' or something, because that is what it is?
> > 
> > Also the reserved_sw is an array, which is confusing, since from first look,
> > it looks like we have a pointer dereference here.
> > 
> 
> Well, that's what it is in Hyper-V world and so far we didn't give it
> another meaning in KVM but in theory it is not impossible, e.g. we can
> use this area to speed up nested KVM on KVM.
> 
> AMD calls this "Reserved for Host usage" so we can probably rename it to 
> 'reserved_host' but I'm not sure it's worth the hassle...

This is a very good piece of information. If AMD calls it like that,
then let it be.

It is probably not worth it to rename the field then, but I think it
might be worth it to add this info as a comment in KVM.

Also it might be worth it to add some wrapper function for
'struct hv_enlightenments *hve = (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;'
(+ check if this area is valid - currently it is copied only when 'kvm_hv_hypercall_enabled == true').

Both would be very low priority items, to be honest.
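
For what it's worth, such a wrapper could look roughly like this (a sketch,
not something from the series, with the validity check mentioned above):

static inline struct hv_enlightenments *
nested_svm_hv_enlightenments(struct vcpu_svm *svm)
{
	/* The area is only copied from vmcb12 when Hyper-V hypercalls are enabled. */
	if (!kvm_hv_hypercall_enabled(&svm->vcpu))
		return NULL;

	return (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
}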

Thanks,
Best regards,
	Maxim Levitsky


> 
> > 
> > > +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> > > +
> > > +	if (!hv_vcpu)
> > > +		return;
> > > +
> > > +	hv_vcpu->nested.pa_page_gpa = hve->partition_assist_page;
> > > +	hv_vcpu->nested.vm_id = hve->hv_vm_id;
> > > +	hv_vcpu->nested.vp_id = hve->hv_vp_id;
> > > +}
> > > +
> > >  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index bed5e1692cef..2d1a76343404 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -826,6 +826,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
> > >  
> > >  	svm->nested.nested_run_pending = 1;
> > >  
> > > +	nested_svm_hv_update_vm_vp_ids(vcpu);
> > > +
> > >  	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
> > >  		goto out_exit_err;
> > >  
> > 
> > That won't work after migration, since this won't be called
> > if we migrate with nested guest running.
> > 
> > 
> > I think that nested_svm_hv_update_vm_vp_ids should be called 
> > from enter_svm_guest_mode.
> > 
> 
> Oh that's a good one, thanks! This could've been a hard to debug issue.
> 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook
  2022-05-18 12:43     ` Vitaly Kuznetsov
@ 2022-05-18 12:49       ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-18 12:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

On Wed, 2022-05-18 at 14:43 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> 
> > On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > > Hyper-V supports injecting synthetic L2->L1 exit after performing
> > > L2 TLB flush operation but the procedure is vendor specific.
> > > Introduce .post_hv_l2_tlb_flush() nested hook for it.
> > > 
> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  1 +
> > >  arch/x86/kvm/Makefile           |  3 ++-
> > >  arch/x86/kvm/svm/hyperv.c       | 11 +++++++++++
> > >  arch/x86/kvm/svm/hyperv.h       |  2 ++
> > >  arch/x86/kvm/svm/nested.c       |  1 +
> > >  arch/x86/kvm/vmx/evmcs.c        |  4 ++++
> > >  arch/x86/kvm/vmx/evmcs.h        |  1 +
> > >  arch/x86/kvm/vmx/nested.c       |  1 +
> > >  8 files changed, 23 insertions(+), 1 deletion(-)
> > >  create mode 100644 arch/x86/kvm/svm/hyperv.c
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 8b2a52bf26c0..ce62fde5f4ff 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1558,6 +1558,7 @@ struct kvm_x86_nested_ops {
> > >  	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
> > >  			    uint16_t *vmcs_version);
> > >  	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
> > > +	void (*post_hv_l2_tlb_flush)(struct kvm_vcpu *vcpu);
> > >  };
> > >  
> > >  struct kvm_x86_init_ops {
> > > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > > index 30f244b64523..b6d53b045692 100644
> > > --- a/arch/x86/kvm/Makefile
> > > +++ b/arch/x86/kvm/Makefile
> > > @@ -25,7 +25,8 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
> > >  			   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
> > >  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
> > >  
> > > -kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
> > > +kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
> > > +			   svm/sev.o svm/hyperv.o
> > >  
> > >  ifdef CONFIG_HYPERV
> > >  kvm-amd-y		+= svm/svm_onhyperv.o
> > > diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
> > > new file mode 100644
> > > index 000000000000..c0749fc282fe
> > > --- /dev/null
> > > +++ b/arch/x86/kvm/svm/hyperv.c
> > > @@ -0,0 +1,11 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * AMD SVM specific code for Hyper-V on KVM.
> > > + *
> > > + * Copyright 2022 Red Hat, Inc. and/or its affiliates.
> > > + */
> > > +#include "hyperv.h"
> > > +
> > > +void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
> > > +{
> > > +}
> > > diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> > > index 8cf702fed7e5..a2b0d7580b0d 100644
> > > --- a/arch/x86/kvm/svm/hyperv.h
> > > +++ b/arch/x86/kvm/svm/hyperv.h
> > > @@ -48,4 +48,6 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> > >  	hv_vcpu->nested.vp_id = hve->hv_vp_id;
> > >  }
> > >  
> > > +void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
> > > +
> > >  #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index 2d1a76343404..de3f27301b5c 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -1665,4 +1665,5 @@ struct kvm_x86_nested_ops svm_nested_ops = {
> > >  	.get_nested_state_pages = svm_get_nested_state_pages,
> > >  	.get_state = svm_get_nested_state,
> > >  	.set_state = svm_set_nested_state,
> > > +	.post_hv_l2_tlb_flush = svm_post_hv_l2_tlb_flush,
> > >  };
> > > diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> > > index 87e3dc10edf4..e390e67496df 100644
> > > --- a/arch/x86/kvm/vmx/evmcs.c
> > > +++ b/arch/x86/kvm/vmx/evmcs.c
> > > @@ -437,3 +437,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> > >  
> > >  	return 0;
> > >  }
> > > +
> > > +void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
> > > +{
> > > +}
> > > diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> > > index 8d70f9aea94b..b120b0ead4f3 100644
> > > --- a/arch/x86/kvm/vmx/evmcs.h
> > > +++ b/arch/x86/kvm/vmx/evmcs.h
> > > @@ -244,5 +244,6 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> > >  			uint16_t *vmcs_version);
> > >  void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
> > >  int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
> > > +void vmx_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
> > >  
> > >  #endif /* __KVM_X86_VMX_EVMCS_H */
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index ee88921c6156..cc6c944b5815 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -6850,4 +6850,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
> > >  	.write_log_dirty = nested_vmx_write_pml_buffer,
> > >  	.enable_evmcs = nested_enable_evmcs,
> > >  	.get_evmcs_version = nested_get_evmcs_version,
> > > +	.post_hv_l2_tlb_flush = vmx_post_hv_l2_tlb_flush,
> > >  };
> > 
> > I think that the name of the function is misleading, since it is not called
> > after each L2 HV tlb flush, but only after a flush which needs to inject
> > that synthetic VM exit.
> > 
> > I think something like 'inject_synthetic_l2_hv_tlb_flush_vmexit' 
> > (not a good name IMHO, but you get the idea) would be better.
> > 
> 
> Naming is hard indeed,

Indeed :-)

https://www.monkeyuser.com/2019/_/


> 
> hv_inject_synthetic_vmexit_post_tlb_flush()

Looks great!

Best regards,
	Maxim Levitsky

> 
> seems to be accurate.
> 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-05-18  9:39     ` Vitaly Kuznetsov
@ 2022-05-18 14:18       ` Sean Christopherson
  2022-05-18 14:43         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 102+ messages in thread
From: Sean Christopherson @ 2022-05-18 14:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Maxim Levitsky, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

On Wed, May 18, 2022, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> > Or if using kfifo, then it can contain plain u64 items, which is even more natural.
> >
> 
> In the next version I switch to fifo and get rid of 'flush_all' entries
> but instead of a boolean I use a 'magic' value of '-1' in GVA. This way
> we don't need to synchronize with the reader and add any special
> handling for the flag.

Isn't -1 theoretically possible?  Or is wrapping not allowed?  E.g. requesting a
flush for address=0xfffffffffffff000, count = 0xfff will yield -1 and doesn't
create any illegal addresses in the process.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-05-18 14:18       ` Sean Christopherson
@ 2022-05-18 14:43         ` Vitaly Kuznetsov
  2022-05-18 14:55           ` Sean Christopherson
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-18 14:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Sean Christopherson <seanjc@google.com> writes:

> On Wed, May 18, 2022, Vitaly Kuznetsov wrote:
>> Maxim Levitsky <mlevitsk@redhat.com> writes:
>> > Or if using kfifo, then it can contain plain u64 items, which is even more natural.
>> >
>> 
>> In the next version I switch to fifo and get rid of 'flush_all' entries
>> but instead of a boolean I use a 'magic' value of '-1' in GVA. This way
>> we don't need to synchronize with the reader and add any special
>> handling for the flag.
>
> Isn't -1 theoretically possible?  Or is wrapping not allowed?  E.g. requesting a
> flush for address=0xfffffffffffff000, count = 0xfff will yield -1 and doesn't
> create any illegal addresses in the process.
>

Such an error would just lead to KVM flushing the whole guest address
space instead of flushing 4096 pages starting with 0xfffffffffffff000
but over-flushing is always architecturally correct, isn't it?

Personally, I'm not opposed to dropping the magic and enhancing flush
entries with 'flags' again, but I'd like to avoid keeping this info
somewhere on the side. Also, after we switch to kfifo, we can't play with
ring indexes to somehow indicate this special case. We could probably use
'fifo is full' as such an indication, but that is very, very non-obvious.
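
Roughly, the shape I have in mind (names, size and locking are simplified
here, just for illustration; the actual next version may differ):

#include <linux/kfifo.h>

#define KVM_HV_TLB_FLUSH_FIFO_SIZE	16
#define TLB_FLUSH_ALL_ENTRY		((u64)-1)

struct kvm_vcpu_hv_tlb_flush_fifo {
	DECLARE_KFIFO(entries, u64, KVM_HV_TLB_FLUSH_FIFO_SIZE);
};

/*
 * Writer side: only queue individual GVAs when at least one slot stays
 * free, so a full fifo is guaranteed to end with a 'flush all' entry and
 * the reader never loses information.  INIT_KFIFO() at vCPU creation and
 * locking against concurrent writers are elided.
 */
static void hv_tlb_flush_enqueue(struct kvm_vcpu_hv_tlb_flush_fifo *fifo,
				 u64 *gva_list, int count)
{
	u64 flush_all_entry = TLB_FLUSH_ALL_ENTRY;

	if (count && gva_list && count < kfifo_avail(&fifo->entries)) {
		kfifo_in(&fifo->entries, gva_list, count);
		return;
	}

	kfifo_in(&fifo->entries, &flush_all_entry, 1);
}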

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
  2022-05-18 14:43         ` Vitaly Kuznetsov
@ 2022-05-18 14:55           ` Sean Christopherson
  0 siblings, 0 replies; 102+ messages in thread
From: Sean Christopherson @ 2022-05-18 14:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Maxim Levitsky, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

On Wed, May 18, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Wed, May 18, 2022, Vitaly Kuznetsov wrote:
> >> Maxim Levitsky <mlevitsk@redhat.com> writes:
> >> > Or if using kfifo, then it can contain plain u64 items, which is even more natural.
> >> >
> >> 
> >> In the next version I switch to fifo and get rid of 'flush_all' entries
> >> but instead of a boolean I use a 'magic' value of '-1' in GVA. This way
> >> we don't need to synchronize with the reader and add any special
> >> handling for the flag.
> >
> > Isn't -1 theoretically possible?  Or is wrapping not allowed?  E.g. requesting a
> > flush for address=0xfffffffffffff000, count = 0xfff will yield -1 and doesn't
> > create any illegal addresses in the process.
> >
> 
> Such an error would just lead to KVM flushing the whole guest address
> space instead of flushing 4096 pages starting with 0xfffffffffffff000
> but over-flushing is always architecturally correct, isn't it?

Oh, duh.  Yeah, flushing everything is totally ok.  Maybe just add a comment above
the #define for the magic value calling out that corner case and why it's ok?
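
Something along these lines, maybe (macro name made up here):

/*
 * Pseudo-GVA queued on the TLB flush fifo to request a full address space
 * flush.  Note: a well-formed entry for GVA 0xfffffffffffff000 with 0xfff
 * additional pages also encodes to -1; treating it as 'flush all' merely
 * over-flushes, which is always architecturally correct.
 */
#define TLB_FLUSH_ALL_ENTRY	((u64)-1)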

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too
  2022-05-11 11:33   ` Maxim Levitsky
@ 2022-05-19  9:12     ` Vitaly Kuznetsov
  2022-05-19 23:44       ` Sean Christopherson
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-19  9:12 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> KVM_REQ_TLB_FLUSH_CURRENT is an even stronger operation than
>> KVM_REQ_TLB_FLUSH_GUEST so KVM_REQ_HV_TLB_FLUSH needs not to be
>> processed after it.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/x86.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e5aec386d299..d3839e648ab3 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3357,8 +3357,11 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>>   */
>>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>>  {
>> -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>> +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>>  		kvm_vcpu_flush_tlb_current(vcpu);
>> +		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
>> +			kvm_hv_vcpu_empty_flush_tlb(vcpu);
>> +	}
>>  
>>  	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
>>  		kvm_vcpu_flush_tlb_guest(vcpu);
>
>
> I think that this patch should be moved near patch 1 and/or even squashed into it.
>

Sure, will merge.

This, however, made me think there's room for optimization here. In some
cases, when both KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST
were requested, there's no need to flush twice, e.g. on SVM
.flush_tlb_current == .flush_tlb_guest. I'll probably not go into this
territory with this series as it's already fairly big, just something
for the future.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check
  2022-05-11 11:30   ` Maxim Levitsky
@ 2022-05-19 13:25     ` Vitaly Kuznetsov
  2022-05-19 13:28       ` Maxim Levitsky
  0 siblings, 1 reply; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-19 13:25 UTC (permalink / raw)
  To: Maxim Levitsky, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
>> from L2 in L0 to process L2 TLB flush requests.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h | 1 +
>>  arch/x86/kvm/hyperv.c           | 6 ++++++
>>  arch/x86/kvm/hyperv.h           | 7 +++++++
>>  3 files changed, 14 insertions(+)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ce62fde5f4ff..168600490bd1 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
>>  		u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
>>  		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
>>  		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
>> +		u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
>>  	} cpuid_cache;
>>  
>>  	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 79aabe0c33ec..68a0df4e3f66 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
>>  		hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
>>  	else
>>  		hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
>> +
>> +	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
>> +	if (entry)
>> +		hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
>> +	else
>> +		hv_vcpu->cpuid_cache.nested_features_eax = 0;
>>  }
>>  
>>  int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
>> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
>> index f593c9fd1dee..d8cb6d70dbc8 100644
>> --- a/arch/x86/kvm/hyperv.h
>> +++ b/arch/x86/kvm/hyperv.h
>> @@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
>>  	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>>  }
>>  
>> +static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +
>> +	return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
>> +}
>
> Tiny nitpick (feel free to ignore): maybe use 'supported' instead of 'exposed',
> as we don't use this term in KVM often.
>

Indeed we don't. Basically, this is guest_cpuid_has() but for a Hyper-V
bit. I don't quite like 'supported' because we don't actually check
whether KVM or even L1 guest 'support' this feature or not, we check
whether the feature was 'exposed' to L1 so it can actually use it. I'm
going to rename this to

 guest_hv_cpuid_has_l2_tlb_flush()

then.
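
i.e. keeping the body from the hunk above, roughly:

static inline bool guest_hv_cpuid_has_l2_tlb_flush(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

	return hv_vcpu &&
	       (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
}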

> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>

Thanks!

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check
  2022-05-19 13:25     ` Vitaly Kuznetsov
@ 2022-05-19 13:28       ` Maxim Levitsky
  0 siblings, 0 replies; 102+ messages in thread
From: Maxim Levitsky @ 2022-05-19 13:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel

On Thu, 2022-05-19 at 15:25 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> 
> > On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > > Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
> > > from L2 in L0 to process L2 TLB flush requests.
> > > 
> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h | 1 +
> > >  arch/x86/kvm/hyperv.c           | 6 ++++++
> > >  arch/x86/kvm/hyperv.h           | 7 +++++++
> > >  3 files changed, 14 insertions(+)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index ce62fde5f4ff..168600490bd1 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
> > >  		u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
> > >  		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
> > >  		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
> > > +		u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
> > >  	} cpuid_cache;
> > >  
> > >  	struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
> > > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > > index 79aabe0c33ec..68a0df4e3f66 100644
> > > --- a/arch/x86/kvm/hyperv.c
> > > +++ b/arch/x86/kvm/hyperv.c
> > > @@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
> > >  		hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
> > >  	else
> > >  		hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
> > > +
> > > +	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
> > > +	if (entry)
> > > +		hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
> > > +	else
> > > +		hv_vcpu->cpuid_cache.nested_features_eax = 0;
> > >  }
> > >  
> > >  int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
> > > diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> > > index f593c9fd1dee..d8cb6d70dbc8 100644
> > > --- a/arch/x86/kvm/hyperv.h
> > > +++ b/arch/x86/kvm/hyperv.h
> > > @@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
> > >  	tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
> > >  }
> > >  
> > > +static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> > > +
> > > +	return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
> > > +}
> > 
> > Tiny nitpick (feel free to ignore): maybe use 'supported' instead of 'exposed',
> > as we don't use this term in KVM often.
> > 
> 
> Indeed we don't. Basically, this is guest_cpuid_has() but for a Hyper-V
> bit. I don't quite like 'supported' because we don't actually check
> whether KVM or even L1 guest 'support' this feature or not, we check
> whether the feature was 'exposed' to L1 so it can actually use it. I'm
> going to rename this to
> 
>  guest_hv_cpuid_has_l2_tlb_flush()
Sounds perfect!

Best regards,
	Maxim Levitsky

> 
> then.
> 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> 
> Thanks!
> 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too
  2022-05-19  9:12     ` Vitaly Kuznetsov
@ 2022-05-19 23:44       ` Sean Christopherson
  0 siblings, 0 replies; 102+ messages in thread
From: Sean Christopherson @ 2022-05-19 23:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Maxim Levitsky, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

On Thu, May 19, 2022, Vitaly Kuznetsov wrote:
> This, however, made me think there's room for optimization here. In some
> cases, when both KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST
> were requested, there's no need to flush twice, e.g. on SVM
> .flush_tlb_current == .flush_tlb_guest. I'll probably not go into this
> territory with this series as it's already fairly big, just something
> for the future.

Definitely not worth your time.  On VMX, CURRENT isn't a superset of GUEST when
EPT is enabled.  And on SVM, the flush doesn't actually occur until VM-Enter, i.e.
the redundant flush is just an extra write to svm->vmcb->control.tlb_ctl (or an
extra decrement of asid_generation).

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest
  2022-05-11 12:17   ` Maxim Levitsky
@ 2022-05-24 14:51     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 102+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-24 14:51 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Michael Kelley,
	Siddharth Chandrasekaran, linux-hyperv, linux-kernel, kvm,
	Paolo Bonzini

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
>> Introduce a selftest for Hyper-V PV TLB flush hypercalls
>> (HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
>> HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).
>> 
>> The test creates one 'sender' vCPU and two 'worker' vCPU which do busy
>> loop reading from a certain GVA checking the observed value. Sender
>> vCPU drops to the host to swap the data page with another page filled
>> with a different value. The expectation for workers is also
>> altered. Without TLB flush on worker vCPUs, they may continue to
>> observe old value. To guard against accidental TLB flushes for worker
>> vCPUs the test is repeated 100 times.
>> 
>> Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
>> fast' modes.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  tools/testing/selftests/kvm/.gitignore        |   1 +
>>  tools/testing/selftests/kvm/Makefile          |   1 +
>>  .../selftests/kvm/include/x86_64/hyperv.h     |   1 +
>>  .../selftests/kvm/x86_64/hyperv_tlb_flush.c   | 647 ++++++++++++++++++
>>  4 files changed, 650 insertions(+)
>>  create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
>> 
>> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
>> index 5d5fbb161d56..1a1d09e414d5 100644
>> --- a/tools/testing/selftests/kvm/.gitignore
>> +++ b/tools/testing/selftests/kvm/.gitignore
>> @@ -25,6 +25,7 @@
>>  /x86_64/hyperv_features
>>  /x86_64/hyperv_ipi
>>  /x86_64/hyperv_svm_test
>> +/x86_64/hyperv_tlb_flush
>>  /x86_64/mmio_warning_test
>>  /x86_64/mmu_role_test
>>  /x86_64/platform_info_test
>> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
>> index 44889f897fe7..8b83abc09a1a 100644
>> --- a/tools/testing/selftests/kvm/Makefile
>> +++ b/tools/testing/selftests/kvm/Makefile
>> @@ -54,6 +54,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
>>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
>>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
>> +TEST_GEN_PROGS_x86_64 += x86_64/hyperv_tlb_flush
>>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
>>  TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
>>  TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
>> diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
>> index f51d6fab8e93..1e34dd7c5075 100644
>> --- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
>> +++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
>> @@ -185,6 +185,7 @@
>>  /* hypercall options */
>>  #define HV_HYPERCALL_FAST_BIT		BIT(16)
>>  #define HV_HYPERCALL_VARHEAD_OFFSET	17
>> +#define HV_HYPERCALL_REP_COMP_OFFSET	32
>>  
>>  #define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
>>  
>> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
>> new file mode 100644
>> index 000000000000..00bcae45ddd2
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
>> @@ -0,0 +1,647 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Hyper-V HvFlushVirtualAddress{List,Space}{,Ex} tests
>> + *
>> + * Copyright (C) 2022, Red Hat, Inc.
>> + *
>> + */
>> +
>> +#define _GNU_SOURCE /* for program_invocation_short_name */
>> +#include <pthread.h>
>> +#include <inttypes.h>
>> +
>> +#include "kvm_util.h"
>> +#include "hyperv.h"
>> +#include "processor.h"
>> +#include "test_util.h"
>> +#include "vmx.h"
>> +
>> +#define SENDER_VCPU_ID   1
>> +#define WORKER_VCPU_ID_1 2
>> +#define WORKER_VCPU_ID_2 65
>> +
>> +#define NTRY 100
>> +
>> +struct thread_params {
>> +	struct kvm_vm *vm;
>> +	uint32_t vcpu_id;
>> +};
>> +
>> +struct hv_vpset {
>> +	u64 format;
>> +	u64 valid_bank_mask;
>> +	u64 bank_contents[];
>> +};
>> +
>> +enum HV_GENERIC_SET_FORMAT {
>> +	HV_GENERIC_SET_SPARSE_4K,
>> +	HV_GENERIC_SET_ALL,
>> +};
>> +
>> +#define HV_FLUSH_ALL_PROCESSORS			BIT(0)
>> +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	BIT(1)
>> +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	BIT(2)
>> +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	BIT(3)
>> +
>> +/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
>> +struct hv_tlb_flush {
>> +	u64 address_space;
>> +	u64 flags;
>> +	u64 processor_mask;
>> +	u64 gva_list[];
>> +} __packed;
>> +
>> +/* HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressListEx hypercalls */
>> +struct hv_tlb_flush_ex {
>> +	u64 address_space;
>> +	u64 flags;
>> +	struct hv_vpset hv_vp_set;
>> +	u64 gva_list[];
>> +} __packed;
>> +
>> +static inline void hv_init(vm_vaddr_t pgs_gpa)
>> +{
>> +	wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
>> +	wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
>> +}
>> +
>> +static void worker_code(void *test_pages, vm_vaddr_t pgs_gpa)
>> +{
>> +	u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
>> +	unsigned char chr;
>> +
>> +	x2apic_enable();
>> +	hv_init(pgs_gpa);
>> +
>> +	for (;;) {
>> +		chr = READ_ONCE(*(unsigned char *)(test_pages + 4096 * 2 + vcpu_id));
> It would be nice to wrap this into a function, like set_expected_char does for ease
> of code understanding.
>
>> +		if (chr)
>> +			GUEST_ASSERT(*(unsigned char *)test_pages == chr);
>> +		asm volatile("nop");
>> +	}
>> +}
>> +
>> +static inline u64 hypercall(u64 control, vm_vaddr_t arg1, vm_vaddr_t arg2)
>> +{
>> +	u64 hv_status;
>> +
>> +	asm volatile("mov %3, %%r8\n"
>> +		     "vmcall"
>> +		     : "=a" (hv_status),
>> +		       "+c" (control), "+d" (arg1)
>> +		     :  "r" (arg2)
>> +		     : "cc", "memory", "r8", "r9", "r10", "r11");
>> +
>> +	return hv_status;
>> +}
>> +
>> +static inline void nop_loop(void)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < 10000000; i++)
>> +		asm volatile("nop");
>> +}
>> +
>> +static inline void sync_to_xmm(void *data)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < 8; i++)
>> +		write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
>> +}
>
> Nitpick: I see duplicated code, I complain ;-) - maybe put the above to some common file?
>

Gone now.

>> +
>> +static void set_expected_char(void *addr, unsigned char chr, int vcpu_id)
>> +{
>> +	asm volatile("mfence");
>
> I remember that Paolo once told me (I might not remember that correctly though),
> that on x86 the actual hardware barriers like mfence are not really
> needed, because hardware already does memory accesses in order,
> unless fancy (e.g. non-WB) memory types are used.

Even if it can be dropped, we still need a compiler barrier, so I prefer to
keep the explicit 'mfence'/'lfence'/... -- especially in tests where
performance doesn't matter much.
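
Side note: with a "memory" clobber the asm also acts as a compiler barrier,
so both effects can come from a single statement, e.g. (helper name made up):

static inline void full_mem_barrier(void)
{
	/* Serializes the CPU and, via the clobber, the compiler as well. */
	asm volatile("mfence" ::: "memory");
}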

>
>> +	*(unsigned char *)(addr + 2 * 4096 + vcpu_id) = chr;
>> +}
>> +
>> +static void sender_guest_code(void *hcall_page, void *test_pages, vm_vaddr_t pgs_gpa)
>> +{
>> +	struct hv_tlb_flush *flush = (struct hv_tlb_flush *)hcall_page;
>> +	struct hv_tlb_flush_ex *flush_ex = (struct hv_tlb_flush_ex *)hcall_page;
>> +	int stage = 1, i;
>> +	u64 res;
>> +
>> +	hv_init(pgs_gpa);
>> +
>> +	/* "Slow" hypercalls */
>
> I hopefully understand it correctly (see my comments below),
> but it might be worthwhile to add something similar to my comments
> to the code, to make it easier for someone reading the code to understand it.
>
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>
> Here we set the expected char to 0, meaning that the workers will not assert
> if there is a mismatch.
>
>> +		GUEST_SYNC(stage++);
> Now there is a mismatch: the host swapped the pages for us.
>
>> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>
> Now that we flushed the TLB, the guest should see the correct value.
>
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>
> Now we force the workers to check it.
>
> Btw, an idea: it might be nice to use more than two test pages,
> say 100 test pages, each filled with a different value. Memory is
> cheap, and this way there would be no way for something to cause a
> 'double error' which could hide a bug by chance.
>
>
> Another thing, it might be nice to wrap this into a macro/function
> to avoid *that* much duplication.

In the next version I still keep two pages and two workers for
simplicity, but I wrap all these pre- and post- guts into wrapper
functions.
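
Something along these lines, just to show the shape (helper names are made
up and may differ):

/* Reset expectations, then sync with the host which swaps the test pages. */
static void prepare_iteration(void *hcall_page, void *test_pages, int *stage)
{
	memset(hcall_page, 0, 4096);
	set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
	set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
	GUEST_SYNC((*stage)++);
}

/* After the flush, the targeted worker must observe the new value. */
static void expect_flushed(void *test_pages, int i, int worker_vcpu_id)
{
	set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, worker_vcpu_id);
}

Each loop body then boils down to prepare_iteration(), building and issuing
the hypercall, one expect_flushed() call per targeted worker, and nop_loop().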

>
>
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
>> +		flush->gva_list[0] = (u64)test_pages;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
>> +		flush->processor_mask = 0;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
>> +		flush->gva_list[0] = (u64)test_pages;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
>> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		/* bank_contents and gva_list occupy the same space, thus [1] */
>> +		flush_ex->gva_list[1] = (u64)test_pages;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
>> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET) |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
>> +			BIT_ULL(WORKER_VCPU_ID_1 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
>> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
>> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
>> +			BIT_ULL(WORKER_VCPU_ID_2 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
>> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		/* bank_contents and gva_list occupy the same space, thus [2] */
>> +		flush_ex->gva_list[2] = (u64)test_pages;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
>> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET) |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX,
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
>> +		flush_ex->gva_list[0] = (u64)test_pages;
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				pgs_gpa, pgs_gpa + 4096);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* "Fast" hypercalls */
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
>> +		sync_to_xmm(&flush->processor_mask);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
>> +				HV_HYPERCALL_FAST_BIT, 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush->processor_mask = BIT(WORKER_VCPU_ID_1);
>> +		flush->gva_list[0] = (u64)test_pages;
>> +		sync_to_xmm(&flush->processor_mask);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST | HV_HYPERCALL_FAST_BIT |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		sync_to_xmm(&flush->processor_mask);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
>> +				HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush->gva_list[0] = (u64)test_pages;
>> +		sync_to_xmm(&flush->processor_mask);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST | HV_HYPERCALL_FAST_BIT |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET), 0x0,
>> +				HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		sync_to_xmm(&flush_ex->hv_vp_set);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT |
>> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET),
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		/* bank_contents and gva_list occupy the same space, thus [1] */
>> +		flush_ex->gva_list[1] = (u64)test_pages;
>> +		sync_to_xmm(&flush_ex->hv_vp_set);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
>> +				(1 << HV_HYPERCALL_VARHEAD_OFFSET) |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
>> +			BIT_ULL(WORKER_VCPU_ID_1 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
>> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		sync_to_xmm(&flush_ex->hv_vp_set);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT |
>> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET),
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>> +		flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
>> +			BIT_ULL(WORKER_VCPU_ID_2 / 64);
>> +		flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
>> +		flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
>> +		/* bank_contents and gva_list occupy the same space, thus [2] */
>> +		flush_ex->gva_list[2] = (u64)test_pages;
>> +		sync_to_xmm(&flush_ex->hv_vp_set);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
>> +				(2 << HV_HYPERCALL_VARHEAD_OFFSET) |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
>> +		sync_to_xmm(&flush_ex->hv_vp_set);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX | HV_HYPERCALL_FAST_BIT,
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	/* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
>> +	for (i = 0; i < NTRY; i++) {
>> +		memset(hcall_page, 0, 4096);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, 0x0, WORKER_VCPU_ID_2);
>> +		GUEST_SYNC(stage++);
>> +		flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
>> +		flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
>> +		flush_ex->gva_list[0] = (u64)test_pages;
>> +		sync_to_xmm(&flush_ex->hv_vp_set);
>> +		res = hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX | HV_HYPERCALL_FAST_BIT |
>> +				(1UL << HV_HYPERCALL_REP_COMP_OFFSET),
>> +				0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
>> +		GUEST_ASSERT((res & 0xffff) == 0);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_1);
>> +		set_expected_char(test_pages, i % 2 ? 0x1 : 0x2, WORKER_VCPU_ID_2);
>> +		nop_loop();
>> +	}
>> +
>> +	GUEST_DONE();
>> +}
>> +
>> +static void *vcpu_thread(void *arg)
>> +{
>> +	struct thread_params *params = (struct thread_params *)arg;
>> +	struct ucall uc;
>> +	int old;
>> +	int r;
>> +	unsigned int exit_reason;
>> +
>> +	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
>> +	TEST_ASSERT(r == 0,
>> +		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
>> +		    params->vcpu_id, r);
>> +
>> +	vcpu_run(params->vm, params->vcpu_id);
>> +	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
>> +
>> +	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
>> +		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
>> +		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
>> +
>> +	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
>> +		TEST_ASSERT(false,
>> +			    "vCPU %u exited with error: %s.\n",
>> +			    params->vcpu_id, (const char *)uc.args[0]);
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static void cancel_join_vcpu_thread(pthread_t thread, uint32_t vcpu_id)
>> +{
>> +	void *retval;
>> +	int r;
>> +
>> +	r = pthread_cancel(thread);
>> +	TEST_ASSERT(r == 0,
>> +		    "pthread_cancel on vcpu_id=%d failed with errno=%d",
>> +		    vcpu_id, r);
>> +
>> +	r = pthread_join(thread, &retval);
>> +	TEST_ASSERT(r == 0,
>> +		    "pthread_join on vcpu_id=%d failed with errno=%d",
>> +		    vcpu_id, r);
>> +	TEST_ASSERT(retval == PTHREAD_CANCELED,
>> +		    "expected retval=%p, got %p", PTHREAD_CANCELED,
>> +		    retval);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +	int r;
>> +	pthread_t threads[2];
>> +	struct thread_params params[2];
>> +	struct kvm_vm *vm;
>> +	struct kvm_run *run;
>> +	vm_vaddr_t hcall_page, test_pages;
>> +	struct ucall uc;
>> +	int stage = 1;
>> +
>> +	vm = vm_create_default(SENDER_VCPU_ID, 0, sender_guest_code);
>> +	params[0].vm = vm;
>> +	params[1].vm = vm;
>> +
>> +	/* Hypercall input/output */
>> +	hcall_page = vm_vaddr_alloc_pages(vm, 2);
>> +	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
>> +
>> +	/*
>> +	 * Test pages: the first one is filled with '0x1's, the second with '0x2's
>> +	 * and the test will swap their mappings. The third page keeps the indication
>> +	 * about the current state of mappings.
>> +	 */
>> +	test_pages = vm_vaddr_alloc_pages(vm, 3);
>> +	memset(addr_gva2hva(vm, test_pages), 0x1, 4096);
>> +	memset(addr_gva2hva(vm, test_pages) + 4096, 0x2, 4096);
>> +	set_expected_char(addr_gva2hva(vm, test_pages), 0x0, WORKER_VCPU_ID_1);
>> +	set_expected_char(addr_gva2hva(vm, test_pages), 0x0, WORKER_VCPU_ID_2);
>> +
>> +	vm_vcpu_add_default(vm, WORKER_VCPU_ID_1, worker_code);
>> +	vcpu_args_set(vm, WORKER_VCPU_ID_1, 2, test_pages, addr_gva2gpa(vm, hcall_page));
>> +	vcpu_set_msr(vm, WORKER_VCPU_ID_1, HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_1);
>> +	vcpu_set_hv_cpuid(vm, WORKER_VCPU_ID_1);
>> +
>> +	vm_vcpu_add_default(vm, WORKER_VCPU_ID_2, worker_code);
>> +	vcpu_args_set(vm, WORKER_VCPU_ID_2, 2, test_pages, addr_gva2gpa(vm, hcall_page));
>> +	vcpu_set_msr(vm, WORKER_VCPU_ID_2, HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_2);
>> +	vcpu_set_hv_cpuid(vm, WORKER_VCPU_ID_2);
>> +
>> +	vcpu_args_set(vm, SENDER_VCPU_ID, 3, hcall_page, test_pages,
>> +		      addr_gva2gpa(vm, hcall_page));
>
> It seems that all worker vCPUs get a pointer to the hypercall page,
> which they don't need and which, if used, would create a race.
>

Dropped (actually, I've created a new 'test_data' structure which is
shared by workers and sender).
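
Roughly (field names are illustrative, not necessarily what the next
version ends up with):

struct test_data {
	vm_vaddr_t hcall_gva;	/* hypercall input/output pages, guest VA     */
	vm_paddr_t hcall_gpa;	/* same pages, guest PA for the hypercall     */
	vm_vaddr_t test_pages;	/* two data pages + the 'expected char' page  */
};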

>
>> +	vcpu_set_hv_cpuid(vm, SENDER_VCPU_ID);
>> +
>> +	params[0].vcpu_id = WORKER_VCPU_ID_1;
>> +	r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
>> +	TEST_ASSERT(r == 0,
>> +		    "pthread_create halter failed errno=%d", errno);
>> +
>> +	params[1].vcpu_id = WORKER_VCPU_ID_2;
>> +	r = pthread_create(&threads[1], NULL, vcpu_thread, &params[1]);
>> +	TEST_ASSERT(r == 0,
>> +		    "pthread_create halter failed errno=%d", errno);
>
> Also, the worker threads here don't halt; the assert message was not updated, I think.
>

Fixed!

>
>> +
>> +	run = vcpu_state(vm, SENDER_VCPU_ID);
>> +
>> +	while (true) {
>> +		r = _vcpu_run(vm, SENDER_VCPU_ID);
>> +		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
>> +		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
>> +			    "unexpected exit reason: %u (%s)",
>> +			    run->exit_reason, exit_reason_str(run->exit_reason));
>> +
>> +		switch (get_ucall(vm, SENDER_VCPU_ID, &uc)) {
>> +		case UCALL_SYNC:
>> +			TEST_ASSERT(uc.args[1] == stage,
>> +				    "Unexpected stage: %ld (%d expected)\n",
>> +				    uc.args[1], stage);
>> +			break;
>> +		case UCALL_ABORT:
>> +			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
>> +				  __FILE__, uc.args[1]);
>> +			return 1;
>> +		case UCALL_DONE:
>> +			return 0;
>> +		}
>> +
>> +		/* Swap test pages */
>> +		if (stage % 2) {
>> +			__virt_pg_map(vm, test_pages, addr_gva2gpa(vm, test_pages) + 4096,
>> +				      X86_PAGE_SIZE_4K, true);
>> +			__virt_pg_map(vm, test_pages + 4096, addr_gva2gpa(vm, test_pages) - 4096,
>> +				      X86_PAGE_SIZE_4K, true);
>> +		} else {
>> +			__virt_pg_map(vm, test_pages, addr_gva2gpa(vm, test_pages) - 4096,
>> +				      X86_PAGE_SIZE_4K, true);
>> +			__virt_pg_map(vm, test_pages + 4096, addr_gva2gpa(vm, test_pages) + 4096,
>> +				      X86_PAGE_SIZE_4K, true);
>> +		}
>
> Another question: why is the host doing the swapping of the pages? Since
> testing !EPT/!NPT is not the goal of this test, why not let the guest vCPU
> (the sender) do the swapping itself? That should eliminate the VM exits to
> the host (which can even interfere with the TLB flush) and make the test
> closer to real world usage.

This is actually a good idea. It required exporting some APIs and some
trickery so the guest can actually reach its PTEs, but I think it's
worth it; the next version will be doing all the updates from the guest
itself.

>
>
>> +
>> +		stage++;
>> +	}
>> +
>> +	cancel_join_vcpu_thread(threads[0], WORKER_VCPU_ID_1);
>> +	cancel_join_vcpu_thread(threads[1], WORKER_VCPU_ID_2);
>> +	kvm_vm_free(vm);
>> +
>> +	return 0;
>> +}
>
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread

Thread overview: 102+ messages
2022-04-14 13:19 [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
2022-04-14 13:19 ` [PATCH v3 01/34] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag Vitaly Kuznetsov
2022-05-11 11:18   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 02/34] KVM: x86: hyper-v: Introduce TLB flush ring Vitaly Kuznetsov
2022-05-11 11:19   ` Maxim Levitsky
2022-05-16 14:29     ` Vitaly Kuznetsov
2022-05-16 19:34   ` Sean Christopherson
2022-05-17 13:31     ` Vitaly Kuznetsov
2022-04-14 13:19 ` [PATCH v3 03/34] KVM: x86: hyper-v: Add helper to read hypercall data for array Vitaly Kuznetsov
2022-05-11 11:20   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 04/34] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Vitaly Kuznetsov
2022-05-11 11:22   ` Maxim Levitsky
2022-05-18  9:39     ` Vitaly Kuznetsov
2022-05-18 14:18       ` Sean Christopherson
2022-05-18 14:43         ` Vitaly Kuznetsov
2022-05-18 14:55           ` Sean Christopherson
2022-05-16 19:41   ` Sean Christopherson
2022-05-17 13:41     ` Vitaly Kuznetsov
2022-04-14 13:19 ` [PATCH v3 05/34] KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls Vitaly Kuznetsov
2022-05-11 11:23   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 06/34] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs Vitaly Kuznetsov
2022-05-11 11:23   ` Maxim Levitsky
2022-05-11 11:23   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 07/34] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants Vitaly Kuznetsov
2022-04-25 15:47   ` Wei Liu
2022-04-25 17:34     ` Michael Kelley (LINUX)
2022-04-25 19:09   ` Christophe JAILLET
2022-04-25 19:16   ` Christophe JAILLET
2022-05-03 14:59     ` Vitaly Kuznetsov
2022-05-03 11:11   ` Wei Liu
2022-05-11 11:23   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 08/34] KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64' Vitaly Kuznetsov
2022-05-11 11:24   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 09/34] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi() Vitaly Kuznetsov
2022-05-11 11:24   ` Maxim Levitsky
2022-05-16 19:52     ` Sean Christopherson
2022-04-14 13:19 ` [PATCH v3 10/34] KVM: x86: hyper-v: Create a separate ring for L2 TLB flush Vitaly Kuznetsov
2022-05-11 11:24   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 11/34] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' Vitaly Kuznetsov
2022-05-11 11:25   ` Maxim Levitsky
2022-05-16 20:05   ` Sean Christopherson
2022-05-17 13:51     ` Vitaly Kuznetsov
2022-05-17 14:04       ` Sean Christopherson
2022-05-17 14:19         ` Vitaly Kuznetsov
2022-04-14 13:19 ` [PATCH v3 12/34] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use Vitaly Kuznetsov
2022-05-11 11:25   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 13/34] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id Vitaly Kuznetsov
2022-05-11 11:27   ` Maxim Levitsky
2022-05-18 12:25     ` Vitaly Kuznetsov
2022-05-18 12:45       ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 14/34] KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook Vitaly Kuznetsov
2022-05-11 11:32   ` Maxim Levitsky
2022-05-18 12:43     ` Vitaly Kuznetsov
2022-05-18 12:49       ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 15/34] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall() Vitaly Kuznetsov
2022-05-11 11:25   ` Maxim Levitsky
2022-05-16 20:09   ` Sean Christopherson
2022-04-14 13:19 ` [PATCH v3 16/34] KVM: x86: hyper-v: L2 TLB flush Vitaly Kuznetsov
2022-05-11 11:29   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 17/34] KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check Vitaly Kuznetsov
2022-05-11 11:30   ` Maxim Levitsky
2022-05-19 13:25     ` Vitaly Kuznetsov
2022-05-19 13:28       ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 18/34] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
2022-05-11 11:30   ` Maxim Levitsky
2022-04-14 13:19 ` [PATCH v3 19/34] KVM: nVMX: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
2022-05-11 11:31   ` Maxim Levitsky
2022-05-16 20:16     ` Sean Christopherson
2022-04-14 13:19 ` [PATCH v3 20/34] KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too Vitaly Kuznetsov
2022-05-11 11:33   ` Maxim Levitsky
2022-05-19  9:12     ` Vitaly Kuznetsov
2022-05-19 23:44       ` Sean Christopherson
2022-04-14 13:20 ` [PATCH v3 21/34] KVM: nSVM: hyper-v: Enable L2 TLB flush Vitaly Kuznetsov
2022-05-11 11:33   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 22/34] KVM: x86: Expose Hyper-V L2 TLB flush feature Vitaly Kuznetsov
2022-05-11 11:34   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 23/34] KVM: selftests: Better XMM read/write helpers Vitaly Kuznetsov
2022-05-11 11:34   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 24/34] KVM: selftests: Hyper-V PV IPI selftest Vitaly Kuznetsov
2022-05-11 11:35   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 25/34] KVM: selftests: Make it possible to replace PTEs with __virt_pg_map() Vitaly Kuznetsov
2022-05-11 11:34   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 26/34] KVM: selftests: Hyper-V PV TLB flush selftest Vitaly Kuznetsov
2022-05-11 12:17   ` Maxim Levitsky
2022-05-24 14:51     ` Vitaly Kuznetsov
2022-04-14 13:20 ` [PATCH v3 27/34] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h Vitaly Kuznetsov
2022-05-11 12:17   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 28/34] KVM: selftests: nVMX: Allocate Hyper-V partition assist page Vitaly Kuznetsov
2022-05-11 12:17   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 29/34] KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist pages Vitaly Kuznetsov
2022-05-11 12:17   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 30/34] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h Vitaly Kuznetsov
2022-05-11 12:18   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 31/34] KVM: selftests: evmcs_test: Introduce L2 TLB flush test Vitaly Kuznetsov
2022-05-11 12:18   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 32/34] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h Vitaly Kuznetsov
2022-05-11 12:18   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 33/34] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test Vitaly Kuznetsov
2022-05-11 12:19   ` Maxim Levitsky
2022-04-14 13:20 ` [PATCH v3 34/34] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush' Vitaly Kuznetsov
2022-05-11 12:18   ` Maxim Levitsky
2022-05-03 15:01 ` [PATCH v3 00/34] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush feature Vitaly Kuznetsov
