All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests
@ 2020-12-14  8:38 David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn() David Woodhouse
                   ` (16 more replies)
  0 siblings, 17 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

Reviving the first section (so far) of a patch set that Joao posted as 
RFC last year:

https://lore.kernel.org/kvm/20190220201609.28290-1-joao.m.martins@oracle.com/

This is just enough to support Xen HVM guests. It allows hypercalls to
be trapped to userspace for handling, uses the existing KVM functions for
writing system clock and pvclock information to Xen shared pages, and
adds Xen runstate info and event channel upcall vector delivery.

I've updated and reworked the original a bit, including (in my v1):
 • Support for 32-bit guests
 • 64-bit second support in wallclock
 • Time counters for runnable/blocked states in runstate support
 • Self-tests
 • Fixed Viridian coexistence
 • No new KVM_CAP_XEN_xxx, just more bits returned by KVM_CAP_XEN_HVM

v2: 
 • Remember the RCU read-critical sections on using the shared info pages
 • Fix 32-bit build of compat structures (which we use there too)
 • Use RUNSTATE_blocked as initial state not RUNSTATE_runnable
 • Include documentation, add cosmetic KVM_XEN_HVM_CONFIG_HYPERCALL_MSR

v3:
 • Stop mapping the shared pages; use kvm_guest_write_cached() instead.
 • Use kvm_setup_pvclock_page() for Xen pvclock writes too.
 • Fix CPU numbering confusion and update documentation accordingly.
 • Support HVMIRQ_callback_vector delivery based on evtchn_upcall_pending.

With the addition in v3 of the callback vector support, we can now 
successfully boot Linux guests. Other callback types can be handled 
entirely from userspace, but the vector injection needs kernel support 
because it doesn't quite work to inject it as ExtINT.

We will work on a little bit more event channel offload in future patches,
as discussed, but those are purely optimisations. There's a bunch of work
for us to do in userspace before those get to the top of our list, and
this patch set should be functionally complete as it is.

We're working on pushing out rust-vmm support to make use of this, and
Joao's qemu patches from last year should still also work with minor
tweaks where I've "improved" the KVM←→userspace ABI.

 Documentation/virt/kvm/api.rst                       | 123 ++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h                      |  24 ++++
 arch/x86/include/asm/xen/interface.h                 |   3 +
 arch/x86/kvm/Makefile                                |   2 +-
 arch/x86/kvm/hyperv.c                                |  40 +++++--
 arch/x86/kvm/irq.c                                   |   7 ++
 arch/x86/kvm/trace.h                                 |  36 ++++++
 arch/x86/kvm/x86.c                                   | 142 ++++++++++++++---------
 arch/x86/kvm/x86.h                                   |   1 +
 arch/x86/kvm/xen.c                                   | 495 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/xen.h                                   |  68 +++++++++++
 include/linux/kvm_host.h                             |  34 +++---
 include/uapi/linux/kvm.h                             |  50 ++++++++
 include/xen/interface/xen.h                          |   4 +-
 tools/testing/selftests/kvm/Makefile                 |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c           |   1 +
 tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c | 194 +++++++++++++++++++++++++++++++
 tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c | 150 ++++++++++++++++++++++++
 virt/kvm/kvm_main.c                                  |   8 +-
 19 files changed, 1297 insertions(+), 87 deletions(-)

David Woodhouse (8):
      KVM: Fix arguments to kvm_{un,}map_gfn()
      KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls
      KVM: x86/xen: latch long_mode when hypercall page is set up
      KVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info
      xen: add wc_sec_hi to struct shared_info
      KVM: x86: declare Xen HVM shared info capability and add test case
      KVM: Add documentation for Xen hypercall and shared_info updates
      KVM: x86/xen: Add event channel interrupt vector upcall

Joao Martins (9):
      KVM: x86/xen: fix Xen hypercall page msr handling
      KVM: x86/xen: intercept xen hypercalls if enabled
      KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR
      KVM: x86/xen: register shared_info page
      KVM: x86/xen: update wallclock region
      KVM: x86/xen: register vcpu info
      KVM: x86/xen: setup pvclock updates
      KVM: x86/xen: register vcpu time info region
      KVM: x86/xen: register runstate info




^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14 21:13   ` Vitaly Kuznetsov
  2020-12-14  8:38 ` [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling David Woodhouse
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

It shouldn't take a vcpu.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c       | 8 ++++----
 include/linux/kvm_host.h | 4 ++--
 virt/kvm/kvm_main.c      | 8 ++++----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e545a8a613b1..c7f1ba21212e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2957,7 +2957,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		return;
 
 	/* -EAGAIN is returned in atomic context so we can just return. */
-	if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
+	if (kvm_map_gfn(vcpu->kvm, vcpu->arch.st.msr_val >> PAGE_SHIFT,
 			&map, &vcpu->arch.st.cache, false))
 		return;
 
@@ -2992,7 +2992,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 
 	st->version += 1;
 
-	kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
+	kvm_unmap_gfn(vcpu->kvm, &map, &vcpu->arch.st.cache, true, false);
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -3981,7 +3981,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.st.preempted)
 		return;
 
-	if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
+	if (kvm_map_gfn(vcpu->kvm, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
 			&vcpu->arch.st.cache, true))
 		return;
 
@@ -3990,7 +3990,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 
 	st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
 
-	kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
+	kvm_unmap_gfn(vcpu->kvm, &map, &vcpu->arch.st.cache, true, true);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f2e2a09ebbd..8eb5eb1399f5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -806,11 +806,11 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
 kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
 kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
-int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
+int kvm_map_gfn(struct kvm *kvm, gfn_t gfn, struct kvm_host_map *map,
 		struct gfn_to_pfn_cache *cache, bool atomic);
 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn);
 void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
-int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
+int kvm_unmap_gfn(struct kvm *kvm, struct kvm_host_map *map,
 		  struct gfn_to_pfn_cache *cache, bool dirty, bool atomic);
 unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2541a17ff1c4..f01a8df7806a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2181,10 +2181,10 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
 	return 0;
 }
 
-int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
+int kvm_map_gfn(struct kvm *kvm, gfn_t gfn, struct kvm_host_map *map,
 		struct gfn_to_pfn_cache *cache, bool atomic)
 {
-	return __kvm_map_gfn(kvm_memslots(vcpu->kvm), gfn, map,
+	return __kvm_map_gfn(kvm_memslots(kvm), gfn, map,
 			cache, atomic);
 }
 EXPORT_SYMBOL_GPL(kvm_map_gfn);
@@ -2232,10 +2232,10 @@ static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot,
 	map->page = NULL;
 }
 
-int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map, 
+int kvm_unmap_gfn(struct kvm *kvm, struct kvm_host_map *map,
 		  struct gfn_to_pfn_cache *cache, bool dirty, bool atomic)
 {
-	__kvm_unmap_gfn(gfn_to_memslot(vcpu->kvm, map->gfn), map,
+	__kvm_unmap_gfn(gfn_to_memslot(kvm, map->gfn), map,
 			cache, dirty, atomic);
 	return 0;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn() David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14 21:27   ` Vitaly Kuznetsov
  2020-12-23  8:35   ` Christoph Hellwig
  2020-12-14  8:38 ` [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled David Woodhouse
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

Xen usually places its MSR at 0x40000000 or 0x40000200 depending on
whether it is running in viridian mode or not. Note that this is not
ABI guaranteed, so it is possible for Xen to advertise the MSR some
place else.

Given the way xen_hvm_config() is handled, if the former address is
selected, this will conflict with Hyper-V's MSR
(HV_X64_MSR_GUEST_OS_ID) which unconditionally uses the same address.

Given that the MSR location is arbitrary, move the xen_hvm_config()
handling to the top of kvm_set_msr_common() before falling through.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c7f1ba21212e..13ba4a64f748 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3001,6 +3001,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
 
+	if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
+		return xen_hvm_config(vcpu, data);
+
 	switch (msr) {
 	case MSR_AMD64_NB_CFG:
 	case MSR_IA32_UCODE_WRITE:
@@ -3288,8 +3291,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.msr_misc_features_enables = data;
 		break;
 	default:
-		if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
-			return xen_hvm_config(vcpu, data);
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
 		return KVM_MSR_RET_INVALID;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn() David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-23  8:36   ` Christoph Hellwig
  2020-12-14  8:38 ` [PATCH v3 04/17] KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls David Woodhouse
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

Add a new exit reason for emulator to handle Xen hypercalls.

Since this means KVM owns the ABI, dispense with the facility for the
VMM to provide its own copy of the hypercall pages; just fill them in
directly using VMCALL/VMMCALL as we do for the Hyper-V hypercall page.

This behaviour is enabled by a new INTERCEPT_HCALL flag in the
KVM_XEN_HVM_CONFIG ioctl structure, and advertised by the same flag
being returned from the KVM_CAP_XEN_HVM check.

Add a test case and shift xen_hvm_config() to the nascent xen.c while
we're at it.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h               |   6 +
 arch/x86/kvm/Makefile                         |   2 +-
 arch/x86/kvm/trace.h                          |  36 +++++
 arch/x86/kvm/x86.c                            |  47 +++---
 arch/x86/kvm/xen.c                            | 140 ++++++++++++++++++
 arch/x86/kvm/xen.h                            |  21 +++
 include/uapi/linux/kvm.h                      |  20 +++
 tools/testing/selftests/kvm/Makefile          |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 .../selftests/kvm/x86_64/xen_vmcall_test.c    | 123 +++++++++++++++
 10 files changed, 367 insertions(+), 30 deletions(-)
 create mode 100644 arch/x86/kvm/xen.c
 create mode 100644 arch/x86/kvm/xen.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7e5f33a0d0e2..9de3229e91e1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -520,6 +520,11 @@ struct kvm_vcpu_hv {
 	cpumask_t tlb_flush;
 };
 
+/* Xen HVM per vcpu emulation context */
+struct kvm_vcpu_xen {
+	u64 hypercall_rip;
+};
+
 struct kvm_vcpu_arch {
 	/*
 	 * rip and regs accesses must go through
@@ -717,6 +722,7 @@ struct kvm_vcpu_arch {
 	unsigned long singlestep_rip;
 
 	struct kvm_vcpu_hv hyperv;
+	struct kvm_vcpu_xen xen;
 
 	cpumask_var_t wbinvd_dirty_mask;
 
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b804444e16d4..8bee4afc1fec 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -13,7 +13,7 @@ kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
-kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
+kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o xen.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
 			   mmu/spte.o mmu/tdp_iter.o mmu/tdp_mmu.o
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index aef960f90f26..d28ecb37b62c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -92,6 +92,42 @@ TRACE_EVENT(kvm_hv_hypercall,
 		  __entry->outgpa)
 );
 
+/*
+ * Tracepoint for Xen hypercall.
+ */
+TRACE_EVENT(kvm_xen_hypercall,
+	TP_PROTO(unsigned long nr, unsigned long a0, unsigned long a1,
+		 unsigned long a2, unsigned long a3, unsigned long a4,
+		 unsigned long a5),
+	    TP_ARGS(nr, a0, a1, a2, a3, a4, a5),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, nr)
+		__field(unsigned long, a0)
+		__field(unsigned long, a1)
+		__field(unsigned long, a2)
+		__field(unsigned long, a3)
+		__field(unsigned long, a4)
+		__field(unsigned long, a5)
+	),
+
+	TP_fast_assign(
+		__entry->nr = nr;
+		__entry->a0 = a0;
+		__entry->a1 = a1;
+		__entry->a2 = a2;
+		__entry->a3 = a3;
+		__entry->a4 = a4;
+		__entry->a4 = a5;
+	),
+
+	TP_printk("nr 0x%lx a0 0x%lx a1 0x%lx a2 0x%lx a3 0x%lx a4 0x%lx a5 %lx",
+		  __entry->nr, __entry->a0, __entry->a1,  __entry->a2,
+		  __entry->a3, __entry->a4, __entry->a5)
+);
+
+
+
 /*
  * Tracepoint for PIO.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 13ba4a64f748..5394dda3e3b0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -29,6 +29,7 @@
 #include "pmu.h"
 #include "hyperv.h"
 #include "lapic.h"
+#include "xen.h"
 
 #include <linux/clocksource.h>
 #include <linux/interrupt.h>
@@ -2842,32 +2843,6 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return 0;
 }
 
-static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
-{
-	struct kvm *kvm = vcpu->kvm;
-	int lm = is_long_mode(vcpu);
-	u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64
-		: (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32;
-	u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64
-		: kvm->arch.xen_hvm_config.blob_size_32;
-	u32 page_num = data & ~PAGE_MASK;
-	u64 page_addr = data & PAGE_MASK;
-	u8 *page;
-
-	if (page_num >= blob_size)
-		return 1;
-
-	page = memdup_user(blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE);
-	if (IS_ERR(page))
-		return PTR_ERR(page);
-
-	if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) {
-		kfree(page);
-		return 1;
-	}
-	return 0;
-}
-
 static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
 {
 	u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
@@ -3002,7 +2977,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u64 data = msr_info->data;
 
 	if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
-		return xen_hvm_config(vcpu, data);
+		return kvm_xen_hvm_config(vcpu, data);
 
 	switch (msr) {
 	case MSR_AMD64_NB_CFG:
@@ -3703,7 +3678,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PIT2:
 	case KVM_CAP_PIT_STATE2:
 	case KVM_CAP_SET_IDENTITY_MAP_ADDR:
-	case KVM_CAP_XEN_HVM:
 	case KVM_CAP_VCPU_EVENTS:
 	case KVM_CAP_HYPERV:
 	case KVM_CAP_HYPERV_VAPIC:
@@ -3742,6 +3716,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
 		r = 1;
 		break;
+	case KVM_CAP_XEN_HVM:
+		r = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
+		    KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;
+		break;
 	case KVM_CAP_SYNC_REGS:
 		r = KVM_SYNC_X86_VALID_FIELDS;
 		break;
@@ -5603,7 +5581,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		if (copy_from_user(&xhc, argp, sizeof(xhc)))
 			goto out;
 		r = -EINVAL;
-		if (xhc.flags)
+		if (xhc.flags & ~KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL)
+			goto out;
+		/*
+		 * With hypercall interception the kernel generates its own
+		 * hypercall page so it must not be provided.
+		 */
+		if ((xhc.flags & KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL) &&
+		    (xhc.blob_addr_32 || xhc.blob_addr_64 ||
+		     xhc.blob_size_32 || xhc.blob_size_64))
 			goto out;
 		memcpy(&kvm->arch.xen_hvm_config, &xhc, sizeof(xhc));
 		r = 0;
@@ -8066,6 +8052,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	unsigned long nr, a0, a1, a2, a3, ret;
 	int op_64_bit;
 
+	if (kvm_xen_hypercall_enabled(vcpu->kvm))
+		return kvm_xen_hypercall(vcpu);
+
 	if (kvm_hv_hypercall_enabled(vcpu->kvm))
 		return kvm_hv_hypercall(vcpu);
 
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
new file mode 100644
index 000000000000..b76d121a86c0
--- /dev/null
+++ b/arch/x86/kvm/xen.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright © 2019 Oracle and/or its affiliates. All rights reserved.
+ * Copyright © 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * KVM Xen emulation
+ */
+
+#include "x86.h"
+#include "xen.h"
+
+#include <linux/kvm_host.h>
+
+#include <trace/events/kvm.h>
+
+#include "trace.h"
+
+int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u32 page_num = data & ~PAGE_MASK;
+	u64 page_addr = data & PAGE_MASK;
+
+	/*
+	 * If Xen hypercall intercept is enabled, fill the hypercall
+	 * page with VMCALL/VMMCALL instructions since that's what
+	 * we catch. Else the VMM has provided the hypercall pages
+	 * with instructions of its own choosing, so use those.
+	 */
+	if (kvm_xen_hypercall_enabled(kvm)) {
+		u8 instructions[32];
+		int i;
+
+		if (page_num)
+			return 1;
+
+		/* mov imm32, %eax */
+		instructions[0] = 0xb8;
+
+		/* vmcall / vmmcall */
+		kvm_x86_ops.patch_hypercall(vcpu, instructions + 5);
+
+		/* ret */
+		instructions[8] = 0xc3;
+
+		/* int3 to pad */
+		memset(instructions + 9, 0xcc, sizeof(instructions) - 9);
+
+		for (i = 0; i < PAGE_SIZE / sizeof(instructions); i++) {
+			*(u32 *)&instructions[1] = i;
+			if (kvm_vcpu_write_guest(vcpu,
+						 page_addr + (i * sizeof(instructions)),
+						 instructions, sizeof(instructions)))
+				return 1;
+		}
+	} else {
+		int lm = is_long_mode(vcpu);
+		u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64
+				   : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32;
+		u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64
+				  : kvm->arch.xen_hvm_config.blob_size_32;
+		u8 *page;
+
+		if (page_num >= blob_size)
+			return 1;
+
+		page = memdup_user(blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+
+		if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) {
+			kfree(page);
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int kvm_xen_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
+{
+	kvm_rax_write(vcpu, result);
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int kvm_xen_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+
+	if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.xen.hypercall_rip)))
+		return 1;
+
+	return kvm_xen_hypercall_set_result(vcpu, run->xen.u.hcall.result);
+}
+
+int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
+{
+	bool longmode;
+	u64 input, params[6];
+
+	input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
+
+	longmode = is_64_bit_mode(vcpu);
+	if (!longmode) {
+		params[0] = (u32)kvm_rbx_read(vcpu);
+		params[1] = (u32)kvm_rcx_read(vcpu);
+		params[2] = (u32)kvm_rdx_read(vcpu);
+		params[3] = (u32)kvm_rsi_read(vcpu);
+		params[4] = (u32)kvm_rdi_read(vcpu);
+		params[5] = (u32)kvm_rbp_read(vcpu);
+	}
+#ifdef CONFIG_X86_64
+	else {
+		params[0] = (u64)kvm_rdi_read(vcpu);
+		params[1] = (u64)kvm_rsi_read(vcpu);
+		params[2] = (u64)kvm_rdx_read(vcpu);
+		params[3] = (u64)kvm_r10_read(vcpu);
+		params[4] = (u64)kvm_r8_read(vcpu);
+		params[5] = (u64)kvm_r9_read(vcpu);
+	}
+#endif
+	trace_kvm_xen_hypercall(input, params[0], params[1], params[2],
+				params[3], params[4], params[5]);
+
+	vcpu->run->exit_reason = KVM_EXIT_XEN;
+	vcpu->run->xen.type = KVM_EXIT_XEN_HCALL;
+	vcpu->run->xen.u.hcall.longmode = longmode;
+	vcpu->run->xen.u.hcall.cpl = kvm_x86_ops.get_cpl(vcpu);
+	vcpu->run->xen.u.hcall.input = input;
+	vcpu->run->xen.u.hcall.params[0] = params[0];
+	vcpu->run->xen.u.hcall.params[1] = params[1];
+	vcpu->run->xen.u.hcall.params[2] = params[2];
+	vcpu->run->xen.u.hcall.params[3] = params[3];
+	vcpu->run->xen.u.hcall.params[4] = params[4];
+	vcpu->run->xen.u.hcall.params[5] = params[5];
+	vcpu->arch.xen.hypercall_rip = kvm_get_linear_rip(vcpu);
+	vcpu->arch.complete_userspace_io =
+		kvm_xen_hypercall_complete_userspace;
+
+	return 0;
+}
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
new file mode 100644
index 000000000000..81e12f716d2e
--- /dev/null
+++ b/arch/x86/kvm/xen.h
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright © 2019 Oracle and/or its affiliates. All rights reserved.
+ * Copyright © 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * KVM Xen emulation
+ */
+
+#ifndef __ARCH_X86_KVM_XEN_H__
+#define __ARCH_X86_KVM_XEN_H__
+
+int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
+int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data);
+
+static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm)
+{
+	return kvm->arch.xen_hvm_config.flags &
+		KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;
+}
+
+#endif /* __ARCH_X86_KVM_XEN_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..6b593a23cd41 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -216,6 +216,20 @@ struct kvm_hyperv_exit {
 	} u;
 };
 
+struct kvm_xen_exit {
+#define KVM_EXIT_XEN_HCALL          1
+	__u32 type;
+	union {
+		struct {
+			__u32 longmode;
+			__u32 cpl;
+			__u64 input;
+			__u64 result;
+			__u64 params[6];
+		} hcall;
+	} u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -250,6 +264,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_ARM_NISV         28
 #define KVM_EXIT_X86_RDMSR        29
 #define KVM_EXIT_X86_WRMSR        30
+#define KVM_EXIT_XEN              31
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -426,6 +441,8 @@ struct kvm_run {
 			__u32 index; /* kernel -> user */
 			__u64 data; /* kernel <-> user */
 		} msr;
+		/* KVM_EXIT_XEN */
+		struct kvm_xen_exit xen;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1126,6 +1143,9 @@ struct kvm_x86_mce {
 #endif
 
 #ifdef KVM_CAP_XEN_HVM
+#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
+#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
+
 struct kvm_xen_hvm_config {
 	__u32 flags;
 	__u32 msr;
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 3d14ef77755e..d94abec627e6 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
 TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test
+TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 126c6727a6b0..6e96ae47d28c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1654,6 +1654,7 @@ static struct exit_reason {
 	{KVM_EXIT_INTERNAL_ERROR, "INTERNAL_ERROR"},
 	{KVM_EXIT_OSI, "OSI"},
 	{KVM_EXIT_PAPR_HCALL, "PAPR_HCALL"},
+	{KVM_EXIT_XEN, "XEN"},
 #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
 	{KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
 #endif
diff --git a/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c b/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c
new file mode 100644
index 000000000000..3f1dd93626e5
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * svm_vmcall_test
+ *
+ * Copyright © 2020 Amazon.com, Inc. or its affiliates.
+ *
+ * Userspace hypercall testing
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID		5
+
+#define HCALL_REGION_GPA	0xc0000000ULL
+#define HCALL_REGION_SLOT	10
+
+static struct kvm_vm *vm;
+
+#define INPUTVALUE 17
+#define ARGVALUE(x) (0xdeadbeef5a5a0000UL + x)
+#define RETVALUE 0xcafef00dfbfbffffUL
+
+#define XEN_HYPERCALL_MSR 0x40000000
+
+static void guest_code(void)
+{
+	unsigned long rax = INPUTVALUE;
+	unsigned long rdi = ARGVALUE(1);
+	unsigned long rsi = ARGVALUE(2);
+	unsigned long rdx = ARGVALUE(3);
+	register unsigned long r10 __asm__("r10") = ARGVALUE(4);
+	register unsigned long r8 __asm__("r8") = ARGVALUE(5);
+	register unsigned long r9 __asm__("r9") = ARGVALUE(6);
+
+	/* First a direct invocation of 'vmcall' */
+	__asm__ __volatile__("vmcall" :
+			     "=a"(rax) :
+			     "a"(rax), "D"(rdi), "S"(rsi), "d"(rdx),
+			     "r"(r10), "r"(r8), "r"(r9));
+	GUEST_ASSERT(rax == RETVALUE);
+
+	/* Now fill in the hypercall page */
+	__asm__ __volatile__("wrmsr" : : "c" (XEN_HYPERCALL_MSR),
+			     "a" (HCALL_REGION_GPA & 0xffffffff),
+			     "d" (HCALL_REGION_GPA >> 32));
+
+	/* And invoke the same hypercall that way */
+	__asm__ __volatile__("call *%1" : "=a"(rax) :
+			     "r"(HCALL_REGION_GPA + INPUTVALUE * 32),
+			     "a"(rax), "D"(rdi), "S"(rsi), "d"(rdx),
+			     "r"(r10), "r"(r8), "r"(r9));
+	GUEST_ASSERT(rax == RETVALUE);
+
+	GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+	if (!(kvm_check_cap(KVM_CAP_XEN_HVM) &
+	      KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL) ) {
+		print_skip("KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL not available");
+		exit(KSFT_SKIP);
+	}
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) guest_code);
+	vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+
+	struct kvm_xen_hvm_config hvmc = {
+		.flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
+		.msr = XEN_HYPERCALL_MSR,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_CONFIG, &hvmc);
+
+	/* Map a region for the hypercall page */
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+                                    HCALL_REGION_GPA, HCALL_REGION_SLOT,
+				    getpagesize(), 0);
+	virt_map(vm, HCALL_REGION_GPA, HCALL_REGION_GPA, 1, 0);
+
+	for (;;) {
+		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+		struct ucall uc;
+
+		vcpu_run(vm, VCPU_ID);
+
+		if (run->exit_reason == KVM_EXIT_XEN) {
+			ASSERT_EQ(run->xen.type, KVM_EXIT_XEN_HCALL);
+			ASSERT_EQ(run->xen.u.hcall.cpl, 0);
+			ASSERT_EQ(run->xen.u.hcall.longmode, 1);
+			ASSERT_EQ(run->xen.u.hcall.input, INPUTVALUE);
+			ASSERT_EQ(run->xen.u.hcall.params[0], ARGVALUE(1));
+			ASSERT_EQ(run->xen.u.hcall.params[1], ARGVALUE(2));
+			ASSERT_EQ(run->xen.u.hcall.params[2], ARGVALUE(3));
+			ASSERT_EQ(run->xen.u.hcall.params[3], ARGVALUE(4));
+			ASSERT_EQ(run->xen.u.hcall.params[4], ARGVALUE(5));
+			ASSERT_EQ(run->xen.u.hcall.params[5], ARGVALUE(6));
+			run->xen.u.hcall.result = RETVALUE;
+			continue;
+		}
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *)uc.args[0]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+		}
+	}
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 04/17] KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (2 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 05/17] KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR David Woodhouse
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

Disambiguate Xen vs. Hyper-V calls by adding 'orl $0x80000000, %eax'
at the start of the Hyper-V hypercall page when Xen hypercalls are
also enabled.

That bit is reserved in the Hyper-V ABI, and those hypercall numbers
will never be used by Xen (because it does precisely the same trick).

Switch to using kvm_vcpu_write_guest() while we're at it, instead of
open-coding it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/hyperv.c                         | 40 ++++++++++++++-----
 arch/x86/kvm/xen.c                            |  6 +++
 .../selftests/kvm/x86_64/xen_vmcall_test.c    | 39 +++++++++++++++---
 3 files changed, 68 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 5c7c4060b45c..347ff9ad70b3 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,6 +23,7 @@
 #include "ioapic.h"
 #include "cpuid.h"
 #include "hyperv.h"
+#include "xen.h"
 
 #include <linux/cpu.h>
 #include <linux/kvm_host.h>
@@ -1139,9 +1140,9 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 			hv->hv_hypercall &= ~HV_X64_MSR_HYPERCALL_ENABLE;
 		break;
 	case HV_X64_MSR_HYPERCALL: {
-		u64 gfn;
-		unsigned long addr;
-		u8 instructions[4];
+		u8 instructions[9];
+		int i = 0;
+		u64 addr;
 
 		/* if guest os id is not set hypercall should remain disabled */
 		if (!hv->hv_guest_os_id)
@@ -1150,16 +1151,33 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 			hv->hv_hypercall = data;
 			break;
 		}
-		gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT;
-		addr = gfn_to_hva(kvm, gfn);
-		if (kvm_is_error_hva(addr))
-			return 1;
-		kvm_x86_ops.patch_hypercall(vcpu, instructions);
-		((unsigned char *)instructions)[3] = 0xc3; /* ret */
-		if (__copy_to_user((void __user *)addr, instructions, 4))
+
+		/*
+		 * If Xen and Hyper-V hypercalls are both enabled, disambiguate
+		 * the same way Xen itself does, by setting the bit 31 of EAX
+		 * which is RsvdZ in the 32-bit Hyper-V hypercall ABI and just
+		 * going to be clobbered on 64-bit.
+		 */
+		if (kvm_xen_hypercall_enabled(kvm)) {
+			/* orl $0x80000000, %eax */
+			instructions[i++] = 0x0d;
+			instructions[i++] = 0x00;
+			instructions[i++] = 0x00;
+			instructions[i++] = 0x00;
+			instructions[i++] = 0x80;
+		}
+
+		/* vmcall/vmmcall */
+		kvm_x86_ops.patch_hypercall(vcpu, instructions + i);
+		i += 3;
+
+		/* ret */
+		((unsigned char *)instructions)[i++] = 0xc3;
+
+		addr = data & HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK;
+		if (kvm_vcpu_write_guest(vcpu, addr, instructions, i))
 			return 1;
 		hv->hv_hypercall = data;
-		mark_page_dirty(kvm, gfn);
 		break;
 	}
 	case HV_X64_MSR_REFERENCE_TSC:
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b76d121a86c0..503935d8212e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -8,6 +8,7 @@
 
 #include "x86.h"
 #include "xen.h"
+#include "hyperv.h"
 
 #include <linux/kvm_host.h>
 
@@ -99,6 +100,11 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
 
 	input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
 
+	/* Hyper-V hypercalls get bit 31 set in EAX */
+	if ((input & 0x80000000) &&
+	    kvm_hv_hypercall_enabled(vcpu->kvm))
+		return kvm_hv_hypercall(vcpu);
+
 	longmode = is_64_bit_mode(vcpu);
 	if (!longmode) {
 		params[0] = (u32)kvm_rbx_read(vcpu);
diff --git a/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c b/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c
index 3f1dd93626e5..24f279e1a66b 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_vmcall_test.c
@@ -15,6 +15,7 @@
 
 #define HCALL_REGION_GPA	0xc0000000ULL
 #define HCALL_REGION_SLOT	10
+#define PAGE_SIZE		4096
 
 static struct kvm_vm *vm;
 
@@ -22,7 +23,12 @@ static struct kvm_vm *vm;
 #define ARGVALUE(x) (0xdeadbeef5a5a0000UL + x)
 #define RETVALUE 0xcafef00dfbfbffffUL
 
-#define XEN_HYPERCALL_MSR 0x40000000
+#define XEN_HYPERCALL_MSR	0x40000200
+#define HV_GUEST_OS_ID_MSR	0x40000000
+#define HV_HYPERCALL_MSR	0x40000001
+
+#define HVCALL_SIGNAL_EVENT		0x005d
+#define HV_STATUS_INVALID_ALIGNMENT	4
 
 static void guest_code(void)
 {
@@ -30,6 +36,7 @@ static void guest_code(void)
 	unsigned long rdi = ARGVALUE(1);
 	unsigned long rsi = ARGVALUE(2);
 	unsigned long rdx = ARGVALUE(3);
+	unsigned long rcx;
 	register unsigned long r10 __asm__("r10") = ARGVALUE(4);
 	register unsigned long r8 __asm__("r8") = ARGVALUE(5);
 	register unsigned long r9 __asm__("r9") = ARGVALUE(6);
@@ -41,18 +48,38 @@ static void guest_code(void)
 			     "r"(r10), "r"(r8), "r"(r9));
 	GUEST_ASSERT(rax == RETVALUE);
 
-	/* Now fill in the hypercall page */
+	/* Fill in the Xen hypercall page */
 	__asm__ __volatile__("wrmsr" : : "c" (XEN_HYPERCALL_MSR),
 			     "a" (HCALL_REGION_GPA & 0xffffffff),
 			     "d" (HCALL_REGION_GPA >> 32));
 
-	/* And invoke the same hypercall that way */
+	/* Set Hyper-V Guest OS ID */
+	__asm__ __volatile__("wrmsr" : : "c" (HV_GUEST_OS_ID_MSR),
+			     "a" (0x5a), "d" (0));
+
+	/* Hyper-V hypercall page */
+	u64 msrval = HCALL_REGION_GPA + PAGE_SIZE + 1;
+	__asm__ __volatile__("wrmsr" : : "c" (HV_HYPERCALL_MSR),
+			     "a" (msrval & 0xffffffff),
+			     "d" (msrval >> 32));
+
+	/* Invoke a Xen hypercall */
 	__asm__ __volatile__("call *%1" : "=a"(rax) :
 			     "r"(HCALL_REGION_GPA + INPUTVALUE * 32),
 			     "a"(rax), "D"(rdi), "S"(rsi), "d"(rdx),
 			     "r"(r10), "r"(r8), "r"(r9));
 	GUEST_ASSERT(rax == RETVALUE);
 
+	/* Invoke a Hyper-V hypercall */
+	rax = 0;
+	rcx = HVCALL_SIGNAL_EVENT;	/* code */
+	rdx = 0x5a5a5a5a;		/* ingpa (badly aligned) */
+	__asm__ __volatile__("call *%1" : "=a"(rax) :
+			     "r"(HCALL_REGION_GPA + PAGE_SIZE),
+			     "a"(rax), "c"(rcx), "d"(rdx),
+			     "r"(r8));
+	GUEST_ASSERT(rax == HV_STATUS_INVALID_ALIGNMENT);
+
 	GUEST_DONE();
 }
 
@@ -73,11 +100,11 @@ int main(int argc, char *argv[])
 	};
 	vm_ioctl(vm, KVM_XEN_HVM_CONFIG, &hvmc);
 
-	/* Map a region for the hypercall page */
+	/* Map a region for the hypercall pages */
 	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
                                     HCALL_REGION_GPA, HCALL_REGION_SLOT,
-				    getpagesize(), 0);
-	virt_map(vm, HCALL_REGION_GPA, HCALL_REGION_GPA, 1, 0);
+				    2 * getpagesize(), 0);
+	virt_map(vm, HCALL_REGION_GPA, HCALL_REGION_GPA, 2, 0);
 
 	for (;;) {
 		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 05/17] KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (3 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 04/17] KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 06/17] KVM: x86/xen: latch long_mode when hypercall page is set up David Woodhouse
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

This will be used to set up shared info pages etc.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c       | 20 ++++++++++++++++++++
 arch/x86/kvm/xen.c       | 24 ++++++++++++++++++++++++
 arch/x86/kvm/xen.h       |  3 +++
 include/linux/kvm_host.h | 30 +++++++++++++++---------------
 include/uapi/linux/kvm.h | 11 +++++++++++
 5 files changed, 73 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5394dda3e3b0..975ef5d6dda1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5595,6 +5595,26 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_XEN_HVM_GET_ATTR: {
+		struct kvm_xen_hvm_attr xha;
+
+		r = -EFAULT;
+		if (copy_from_user(&xha, argp, sizeof(xha)))
+			goto out;
+		r = kvm_xen_hvm_get_attr(kvm, &xha);
+		if (copy_to_user(argp, &xha, sizeof(xha)))
+			goto out;
+		break;
+	}
+	case KVM_XEN_HVM_SET_ATTR: {
+		struct kvm_xen_hvm_attr xha;
+
+		r = -EFAULT;
+		if (copy_from_user(&xha, argp, sizeof(xha)))
+			goto out;
+		r = kvm_xen_hvm_set_attr(kvm, &xha);
+		break;
+	}
 	case KVM_SET_CLOCK: {
 		struct kvm_clock_data user_ns;
 		u64 now_ns;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 503935d8212e..c0b2c67e0235 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -16,6 +16,30 @@
 
 #include "trace.h"
 
+int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
+{
+	int r = -ENOENT;
+
+	switch (data->type) {
+	default:
+		break;
+	}
+
+	return r;
+}
+
+int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
+{
+	int r = -ENOENT;
+
+	switch (data->type) {
+	default:
+		break;
+	}
+
+	return r;
+}
+
 int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index 81e12f716d2e..afc6dad41fb5 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -9,8 +9,11 @@
 #ifndef __ARCH_X86_KVM_XEN_H__
 #define __ARCH_X86_KVM_XEN_H__
 
+int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
+int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
 int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data);
+void kvm_xen_destroy_vm(struct kvm *kvm);
 
 static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8eb5eb1399f5..846a010f5d5f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -33,6 +33,21 @@
 
 #include <linux/kvm_types.h>
 
+struct kvm_host_map {
+	/*
+	 * Only valid if the 'pfn' is managed by the host kernel (i.e. There is
+	 * a 'struct page' for it. When using mem= kernel parameter some memory
+	 * can be used as guest memory but they are not managed by host
+	 * kernel).
+	 * If 'pfn' is not managed by the host kernel, this field is
+	 * initialized to KVM_UNMAPPED_PAGE.
+	 */
+	struct page *page;
+	void *hva;
+	kvm_pfn_t pfn;
+	kvm_pfn_t gfn;
+};
+
 #include <asm/kvm_host.h>
 
 #ifndef KVM_MAX_VCPU_ID
@@ -225,21 +240,6 @@ enum {
 
 #define KVM_UNMAPPED_PAGE	((void *) 0x500 + POISON_POINTER_DELTA)
 
-struct kvm_host_map {
-	/*
-	 * Only valid if the 'pfn' is managed by the host kernel (i.e. There is
-	 * a 'struct page' for it. When using mem= kernel parameter some memory
-	 * can be used as guest memory but they are not managed by host
-	 * kernel).
-	 * If 'pfn' is not managed by the host kernel, this field is
-	 * initialized to KVM_UNMAPPED_PAGE.
-	 */
-	struct page *page;
-	void *hva;
-	kvm_pfn_t pfn;
-	kvm_pfn_t gfn;
-};
-
 /*
  * Used to check if the mapping is valid or not. Never use 'kvm_host_map'
  * directly to check for that.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6b593a23cd41..3d9ac2f4dc10 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1577,6 +1577,17 @@ struct kvm_pv_cmd {
 /* Available with KVM_CAP_X86_MSR_FILTER */
 #define KVM_X86_SET_MSR_FILTER	_IOW(KVMIO,  0xc6, struct kvm_msr_filter)
 
+#define KVM_XEN_HVM_GET_ATTR	_IOWR(KVMIO, 0xc7, struct kvm_xen_hvm_attr)
+#define KVM_XEN_HVM_SET_ATTR	_IOW(KVMIO,  0xc8, struct kvm_xen_hvm_attr)
+
+struct kvm_xen_hvm_attr {
+	__u16 type;
+
+	union {
+		__u64 pad[4];
+	} u;
+};
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 06/17] KVM: x86/xen: latch long_mode when hypercall page is set up
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (4 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 05/17] KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 07/17] KVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info David Woodhouse
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/xen.c              | 16 +++++++++++++++-
 include/uapi/linux/kvm.h        |  3 +++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9de3229e91e1..c9a4feaee2e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -890,6 +890,11 @@ struct msr_bitmap_range {
 	unsigned long *bitmap;
 };
 
+/* Xen emulation context */
+struct kvm_xen {
+	bool long_mode;
+};
+
 enum kvm_irqchip_mode {
 	KVM_IRQCHIP_NONE,
 	KVM_IRQCHIP_KERNEL,       /* created with KVM_CREATE_IRQCHIP */
@@ -969,6 +974,7 @@ struct kvm_arch {
 	struct hlist_head mask_notifier_list;
 
 	struct kvm_hv hyperv;
+	struct kvm_xen xen;
 
 	#ifdef CONFIG_KVM_MMU_AUDIT
 	int audit_point;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index c0b2c67e0235..52cb9e465542 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -21,6 +21,13 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 	int r = -ENOENT;
 
 	switch (data->type) {
+	case KVM_XEN_ATTR_TYPE_LONG_MODE:
+		if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode)
+			return -EINVAL;
+
+		kvm->arch.xen.long_mode = !!data->u.long_mode;
+		r = 0;
+		break;
 	default:
 		break;
 	}
@@ -33,6 +40,10 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 	int r = -ENOENT;
 
 	switch (data->type) {
+	case KVM_XEN_ATTR_TYPE_LONG_MODE:
+		data->u.long_mode = kvm->arch.xen.long_mode;
+		r = 0;
+		break;
 	default:
 		break;
 	}
@@ -45,6 +56,10 @@ int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
 	struct kvm *kvm = vcpu->kvm;
 	u32 page_num = data & ~PAGE_MASK;
 	u64 page_addr = data & PAGE_MASK;
+	bool lm = is_long_mode(vcpu);
+
+	/* Latch long_mode for shared_info pages etc. */
+	vcpu->kvm->arch.xen.long_mode = lm;
 
 	/*
 	 * If Xen hypercall intercept is enabled, fill the hypercall
@@ -79,7 +94,6 @@ int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
 				return 1;
 		}
 	} else {
-		int lm = is_long_mode(vcpu);
 		u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64
 				   : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32;
 		u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3d9ac2f4dc10..6b556ef98b76 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1584,10 +1584,13 @@ struct kvm_xen_hvm_attr {
 	__u16 type;
 
 	union {
+		__u8 long_mode;
 		__u64 pad[4];
 	} u;
 };
 
+#define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 07/17] KVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (5 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 06/17] KVM: x86/xen: latch long_mode when hypercall page is set up David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 08/17] KVM: x86/xen: register shared_info page David Woodhouse
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

There aren't a lot of differences for the things that the kernel needs
to care about, but there are a few.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/xen.h | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index afc6dad41fb5..cd3c52b62068 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -21,4 +21,41 @@ static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm)
 		KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;
 }
 
+
+/* 32-bit compatibility definitions, also used natively in 32-bit build */
+#include <asm/pvclock-abi.h>
+#include <asm/xen/interface.h>
+
+struct compat_arch_vcpu_info {
+	unsigned int cr2;
+	unsigned int pad[5];
+};
+
+struct compat_vcpu_info {
+        uint8_t evtchn_upcall_pending;
+        uint8_t evtchn_upcall_mask;
+        uint32_t evtchn_pending_sel;
+        struct compat_arch_vcpu_info arch;
+        struct pvclock_vcpu_time_info time;
+}; /* 64 bytes (x86) */
+
+struct compat_arch_shared_info {
+	unsigned int max_pfn;
+	unsigned int pfn_to_mfn_frame_list_list;
+	unsigned int nmi_reason;
+	unsigned int p2m_cr3;
+	unsigned int p2m_vaddr;
+	unsigned int p2m_generation;
+	uint32_t wc_sec_hi;
+};
+
+struct compat_shared_info {
+	struct compat_vcpu_info vcpu_info[MAX_VIRT_CPUS];
+	uint32_t evtchn_pending[32];
+	uint32_t evtchn_mask[32];
+	struct pvclock_wall_clock wc;
+	struct compat_arch_shared_info arch;
+
+};
+
 #endif /* __ARCH_X86_KVM_XEN_H__ */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (6 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 07/17] KVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14 10:45   ` Joao Martins
  2020-12-14  8:38 ` [PATCH v3 09/17] xen: add wc_sec_hi to struct shared_info David Woodhouse
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

We add a new ioctl, XEN_HVM_SHARED_INFO, to allow hypervisor
to know where the guest's shared info page is.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/xen.c              | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/xen.h              |  1 -
 include/uapi/linux/kvm.h        |  4 ++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c9a4feaee2e7..8bcd83dacf43 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -893,6 +893,8 @@ struct msr_bitmap_range {
 /* Xen emulation context */
 struct kvm_xen {
 	bool long_mode;
+	bool shinfo_set;
+	struct gfn_to_hva_cache shinfo_cache;
 };
 
 enum kvm_irqchip_mode {
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 52cb9e465542..9dd9c42842b8 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -13,9 +13,23 @@
 #include <linux/kvm_host.h>
 
 #include <trace/events/kvm.h>
+#include <xen/interface/xen.h>
 
 #include "trace.h"
 
+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+	int ret;
+
+	ret = kvm_gfn_to_hva_cache_init(kvm, &kvm->arch.xen.shinfo_cache,
+					gfn_to_gpa(gfn), PAGE_SIZE);
+	if (ret)
+		return ret;
+
+	kvm->arch.xen.shinfo_set = true;
+	return 0;
+}
+
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	int r = -ENOENT;
@@ -28,6 +42,11 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		kvm->arch.xen.long_mode = !!data->u.long_mode;
 		r = 0;
 		break;
+
+	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
+		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
+		break;
+
 	default:
 		break;
 	}
@@ -44,6 +63,14 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		data->u.long_mode = kvm->arch.xen.long_mode;
 		r = 0;
 		break;
+
+	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
+		if (kvm->arch.xen.shinfo_set) {
+			data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa);
+			r = 0;
+		}
+		break;
+
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index cd3c52b62068..120b7450252a 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -13,7 +13,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
 int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data);
-void kvm_xen_destroy_vm(struct kvm *kvm);
 
 static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6b556ef98b76..caa9faf3c5ad 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1585,11 +1585,15 @@ struct kvm_xen_hvm_attr {
 
 	union {
 		__u8 long_mode;
+		struct {
+			__u64 gfn;
+		} shared_info;
 		__u64 pad[4];
 	} u;
 };
 
 #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 09/17] xen: add wc_sec_hi to struct shared_info
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (7 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 08/17] KVM: x86/xen: register shared_info page David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 10/17] KVM: x86/xen: update wallclock region David Woodhouse
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

Xen added this in 2015 (Xen 4.6). On x86_64 and Arm it fills what was
previously a 32-bit hole in the generic shared_info structure; on
i386 it had to go at the end of struct arch_shared_info.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/xen/interface.h | 3 +++
 include/xen/interface/xen.h          | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 9139b3e86316..baca0b00ef76 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -182,6 +182,9 @@ struct arch_shared_info {
 	unsigned long p2m_cr3;		/* cr3 value of the p2m address space */
 	unsigned long p2m_vaddr;	/* virtual address of the p2m list */
 	unsigned long p2m_generation;	/* generation count of p2m mapping */
+#ifdef CONFIG_X86_32
+	uint32_t wc_sec_hi;
+#endif
 };
 #endif	/* !__ASSEMBLY__ */
 
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 8bfb242f433e..5ee37a296481 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -598,7 +598,9 @@ struct shared_info {
 	 * their gettimeofday() syscall on this wallclock-base value.
 	 */
 	struct pvclock_wall_clock wc;
-
+#ifndef CONFIG_X86_32
+	uint32_t wc_sec_hi;
+#endif
 	struct arch_shared_info arch;
 
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 10/17] KVM: x86/xen: update wallclock region
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (8 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 09/17] xen: add wc_sec_hi to struct shared_info David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14  8:38 ` [PATCH v3 11/17] KVM: x86/xen: register vcpu info David Woodhouse
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

Wallclock on Xen is written in the shared_info page.

To that purpose, export kvm_write_wall_clock() and pass on the GPA of
its location to populate the shared_info wall clock data.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c | 17 ++++++++++++-----
 arch/x86/kvm/x86.h |  1 +
 arch/x86/kvm/xen.c | 28 +++++++++++++++++++++++++++-
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 975ef5d6dda1..64016443159c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1921,15 +1921,14 @@ static s64 get_kvmclock_base_ns(void)
 }
 #endif
 
-static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
+void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
 {
 	int version;
 	int r;
 	struct pvclock_wall_clock wc;
+	u32 wc_sec_hi;
 	u64 wall_nsec;
 
-	kvm->arch.wall_clock = wall_clock;
-
 	if (!wall_clock)
 		return;
 
@@ -1958,6 +1957,12 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 
 	kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
 
+	if (sec_hi_ofs) {
+		wc_sec_hi = wall_nsec >> 32;
+		kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
+				&wc_sec_hi, sizeof(wc_sec_hi));
+	}
+
 	version++;
 	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
 }
@@ -3115,13 +3120,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
 			return 1;
 
-		kvm_write_wall_clock(vcpu->kvm, data);
+		vcpu->kvm->arch.wall_clock = data;
+		kvm_write_wall_clock(vcpu->kvm, data, 0);
 		break;
 	case MSR_KVM_WALL_CLOCK:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
 			return 1;
 
-		kvm_write_wall_clock(vcpu->kvm, data);
+		vcpu->kvm->arch.wall_clock = data;
+		kvm_write_wall_clock(vcpu->kvm, data, 0);
 		break;
 	case MSR_KVM_SYSTEM_TIME_NEW:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e7ca622a468f..cf8778410015 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -246,6 +246,7 @@ static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
 	return is_smm(vcpu) || kvm_x86_ops.apic_init_signal_blocked(vcpu);
 }
 
+void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs);
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 9dd9c42842b8..e5117a611737 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -19,14 +19,40 @@
 
 static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 {
+	gpa_t gpa = gfn_to_gpa(gfn);
+	int wc_ofs, sec_hi_ofs;
 	int ret;
 
 	ret = kvm_gfn_to_hva_cache_init(kvm, &kvm->arch.xen.shinfo_cache,
-					gfn_to_gpa(gfn), PAGE_SIZE);
+					gpa, PAGE_SIZE);
 	if (ret)
 		return ret;
 
 	kvm->arch.xen.shinfo_set = true;
+
+	/* Paranoia checks on the 32-bit struct layout */
+	BUILD_BUG_ON(offsetof(struct compat_shared_info, wc) != 0x900);
+	BUILD_BUG_ON(offsetof(struct compat_shared_info, arch.wc_sec_hi) != 0x924);
+	BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0);
+
+	/* 32-bit location by default */
+	wc_ofs = offsetof(struct compat_shared_info, wc);
+	sec_hi_ofs = offsetof(struct compat_shared_info, arch.wc_sec_hi);
+
+#ifdef CONFIG_X86_64
+	/* Paranoia checks on the 64-bit struct layout */
+	BUILD_BUG_ON(offsetof(struct shared_info, wc) != 0xc00);
+	BUILD_BUG_ON(offsetof(struct shared_info, wc_sec_hi) != 0xc0c);
+
+	if (kvm->arch.xen.long_mode) {
+		wc_ofs = offsetof(struct shared_info, wc);
+		sec_hi_ofs = offsetof(struct shared_info, wc_sec_hi);
+	}
+#endif
+
+	kvm_write_wall_clock(kvm, gpa + wc_ofs, sec_hi_ofs - wc_ofs);
+	kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
+
 	return 0;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 11/17] KVM: x86/xen: register vcpu info
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (9 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 10/17] KVM: x86/xen: update wallclock region David Woodhouse
@ 2020-12-14  8:38 ` David Woodhouse
  2020-12-14 10:48   ` Joao Martins
  2020-12-14  8:39 ` [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates David Woodhouse
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:38 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

The vcpu info supersedes the per vcpu area of the shared info page and
the guest vcpus will use this instead.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/xen.c              | 30 ++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h        |  5 +++++
 3 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8bcd83dacf43..56c00a9441a3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -523,6 +523,8 @@ struct kvm_vcpu_hv {
 /* Xen HVM per vcpu emulation context */
 struct kvm_vcpu_xen {
 	u64 hypercall_rip;
+	bool vcpu_info_set;
+	struct gfn_to_hva_cache vcpu_info_cache;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index e5117a611737..4bc72e0b9928 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -58,6 +58,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
+	struct kvm_vcpu *v;
 	int r = -ENOENT;
 
 	switch (data->type) {
@@ -73,6 +74,23 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_INFO:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		/* No compat necessary here. */
+		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
+			     sizeof(struct compat_vcpu_info));
+		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.vcpu_info_cache,
+					      data->u.vcpu_attr.gpa,
+					      sizeof(struct vcpu_info));
+		if (r)
+			return r;
+
+		v->arch.xen.vcpu_info_set = true;
+		break;
+
 	default:
 		break;
 	}
@@ -83,6 +101,7 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	int r = -ENOENT;
+	struct kvm_vcpu *v;
 
 	switch (data->type) {
 	case KVM_XEN_ATTR_TYPE_LONG_MODE:
@@ -97,6 +116,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_INFO:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		if (v->arch.xen.vcpu_info_set) {
+			data->u.vcpu_attr.gpa = v->arch.xen.vcpu_info_cache.gpa;
+			r = 0;
+		}
+		break;
+
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index caa9faf3c5ad..87d150992f48 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1588,12 +1588,17 @@ struct kvm_xen_hvm_attr {
 		struct {
 			__u64 gfn;
 		} shared_info;
+		struct {
+			__u32 vcpu_id;
+			__u64 gpa;
+		} vcpu_attr;
 		__u64 pad[4];
 	} u;
 };
 
 #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
+#define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (10 preceding siblings ...)
  2020-12-14  8:38 ` [PATCH v3 11/17] KVM: x86/xen: register vcpu info David Woodhouse
@ 2020-12-14  8:39 ` David Woodhouse
  2020-12-14 13:29   ` Joao Martins
  2020-12-14  8:39 ` [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region David Woodhouse
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:39 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

Parameterise kvm_setup_pvclock_page() a little bit so that it can be
invoked for different gfn_to_hva_cache structures, and with different
offsets. Then we can invoke it for the normal KVM pvclock and also for
the Xen one in the vcpu_info.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c | 31 ++++++++++++++++++-------------
 arch/x86/kvm/xen.c |  4 ++++
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64016443159c..cbdc05bb53bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2582,13 +2582,15 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 	return ret;
 }
 
-static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
+static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
+				   struct gfn_to_hva_cache *cache,
+				   unsigned int offset)
 {
 	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct pvclock_vcpu_time_info guest_hv_clock;
 
-	if (unlikely(kvm_read_guest_cached(v->kvm, &vcpu->pv_time,
-		&guest_hv_clock, sizeof(guest_hv_clock))))
+	if (unlikely(kvm_read_guest_offset_cached(v->kvm, cache,
+		&guest_hv_clock, offset, sizeof(guest_hv_clock))))
 		return;
 
 	/* This VCPU is paused, but it's legal for a guest to read another
@@ -2611,9 +2613,9 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
 		++guest_hv_clock.version;  /* first time write, random junk */
 
 	vcpu->hv_clock.version = guest_hv_clock.version + 1;
-	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
-				&vcpu->hv_clock,
-				sizeof(vcpu->hv_clock.version));
+	kvm_write_guest_offset_cached(v->kvm, cache,
+				      &vcpu->hv_clock, offset,
+				      sizeof(vcpu->hv_clock.version));
 
 	smp_wmb();
 
@@ -2627,16 +2629,16 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
 
 	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
 
-	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
-				&vcpu->hv_clock,
-				sizeof(vcpu->hv_clock));
+	kvm_write_guest_offset_cached(v->kvm, cache,
+				      &vcpu->hv_clock, offset,
+				      sizeof(vcpu->hv_clock));
 
 	smp_wmb();
 
 	vcpu->hv_clock.version++;
-	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
-				&vcpu->hv_clock,
-				sizeof(vcpu->hv_clock.version));
+	kvm_write_guest_offset_cached(v->kvm, cache,
+				     &vcpu->hv_clock, offset,
+				     sizeof(vcpu->hv_clock.version));
 }
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
@@ -2723,7 +2725,10 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	vcpu->hv_clock.flags = pvclock_flags;
 
 	if (vcpu->pv_time_enabled)
-		kvm_setup_pvclock_page(v);
+		kvm_setup_pvclock_page(v, &vcpu->pv_time, 0);
+	if (vcpu->xen.vcpu_info_set)
+		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_info_cache,
+				       offsetof(struct compat_vcpu_info, time));
 	if (v == kvm_get_vcpu(v->kvm, 0))
 		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
 	return 0;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 4bc72e0b9928..d2055b60fdc1 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -82,6 +82,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		/* No compat necessary here. */
 		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
 			     sizeof(struct compat_vcpu_info));
+		BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
+			     offsetof(struct compat_vcpu_info, time));
+
 		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.vcpu_info_cache,
 					      data->u.vcpu_attr.gpa,
 					      sizeof(struct vcpu_info));
@@ -89,6 +92,7 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 			return r;
 
 		v->arch.xen.vcpu_info_set = true;
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		break;
 
 	default:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (11 preceding siblings ...)
  2020-12-14  8:39 ` [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates David Woodhouse
@ 2020-12-14  8:39 ` David Woodhouse
  2020-12-14 10:55   ` Joao Martins
  2020-12-14  8:39 ` [PATCH v3 14/17] KVM: x86/xen: register runstate info David Woodhouse
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:39 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

Allow the Xen emulated guest the ability to register secondary
vcpu time information. On Xen guests this is used in order to be
mapped to userspace and hence allow vdso gettimeofday to work.

In doing so, move kvm_xen_set_pvclock_page() logic to
kvm_xen_update_vcpu_time() and have the former a top-level
function which updates primary vcpu time info (in struct vcpu_info)
and secondary one.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              |  2 ++
 arch/x86/kvm/xen.c              | 26 ++++++++++++++++++++++++++
 include/uapi/linux/kvm.h        |  1 +
 4 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 56c00a9441a3..b7dfcb4de92a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -524,7 +524,9 @@ struct kvm_vcpu_hv {
 struct kvm_vcpu_xen {
 	u64 hypercall_rip;
 	bool vcpu_info_set;
+	bool vcpu_time_info_set;
 	struct gfn_to_hva_cache vcpu_info_cache;
+	struct gfn_to_hva_cache vcpu_time_info_cache;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cbdc05bb53bd..2234fdf49d82 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2729,6 +2729,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	if (vcpu->xen.vcpu_info_set)
 		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_info_cache,
 				       offsetof(struct compat_vcpu_info, time));
+	if (vcpu->xen.vcpu_time_info_set)
+		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
 	if (v == kvm_get_vcpu(v->kvm, 0))
 		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
 	return 0;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index d2055b60fdc1..1cca46effec8 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -95,6 +95,21 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.vcpu_time_info_cache,
+					      data->u.vcpu_attr.gpa,
+					      sizeof(struct pvclock_vcpu_time_info));
+		if (r)
+			return r;
+
+		v->arch.xen.vcpu_time_info_set = true;
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
+		break;
+
 	default:
 		break;
 	}
@@ -131,6 +146,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		if (v->arch.xen.vcpu_time_info_set) {
+			data->u.vcpu_attr.gpa = v->arch.xen.vcpu_time_info_cache.gpa;
+			r = 0;
+		}
+		break;
+
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 87d150992f48..f60c5c61761c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1599,6 +1599,7 @@ struct kvm_xen_hvm_attr {
 #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
 #define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
+#define KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO	0x3
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 14/17] KVM: x86/xen: register runstate info
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (12 preceding siblings ...)
  2020-12-14  8:39 ` [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region David Woodhouse
@ 2020-12-14  8:39 ` David Woodhouse
  2020-12-14 11:10   ` Joao Martins
  2020-12-14  8:39 ` [PATCH v3 15/17] KVM: x86: declare Xen HVM shared info capability and add test case David Woodhouse
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:39 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: Joao Martins <joao.m.martins@oracle.com>

Allow emulator to register vcpu runstates which allow Xen guests
to use that for steal clock. The 'preempted' state of KVM steal clock
equates to 'runnable' state, 'running' has similar meanings for both and
'offline' is used when system admin needs to bring vcpu offline or
hotplug.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |   5 ++
 arch/x86/kvm/x86.c              |  10 +++
 arch/x86/kvm/xen.c              | 148 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/xen.h              |   9 +-
 include/uapi/linux/kvm.h        |   1 +
 5 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b7dfcb4de92a..4b345a8945ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -523,10 +523,15 @@ struct kvm_vcpu_hv {
 /* Xen HVM per vcpu emulation context */
 struct kvm_vcpu_xen {
 	u64 hypercall_rip;
+	u32 current_runstate;
 	bool vcpu_info_set;
 	bool vcpu_time_info_set;
+	bool runstate_set;
 	struct gfn_to_hva_cache vcpu_info_cache;
 	struct gfn_to_hva_cache vcpu_time_info_cache;
+	struct gfn_to_hva_cache runstate_cache;
+	u64 last_steal;
+	u64 last_state_ns;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2234fdf49d82..bd4bd9a818d8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2940,6 +2940,11 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	struct kvm_host_map map;
 	struct kvm_steal_time *st;
 
+	if (vcpu->arch.xen.runstate_set) {
+		kvm_xen_setup_runstate_page(vcpu);
+		return;
+	}
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -3968,6 +3973,11 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	struct kvm_host_map map;
 	struct kvm_steal_time *st;
 
+	if (vcpu->arch.xen.runstate_set) {
+		kvm_xen_runstate_set_preempted(vcpu);
+		return;
+	}
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 1cca46effec8..17cbb4462b7e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -11,9 +11,11 @@
 #include "hyperv.h"
 
 #include <linux/kvm_host.h>
+#include <linux/sched/stat.h>
 
 #include <trace/events/kvm.h>
 #include <xen/interface/xen.h>
+#include <xen/interface/vcpu.h>
 
 #include "trace.h"
 
@@ -56,6 +58,124 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 	return 0;
 }
 
+static void kvm_xen_update_runstate(struct kvm_vcpu *v, int state, u64 steal_ns)
+{
+	struct kvm_vcpu_xen *vcpu_xen = &v->arch.xen;
+	struct vcpu_runstate_info runstate;
+	unsigned int offset = offsetof(struct compat_vcpu_runstate_info, state_entry_time);
+	u64 now, delta;
+
+	BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c);
+
+#ifdef CONFIG_X86_64
+	/*
+	 * The only difference is alignment of uint64_t in 32-bit.
+	 * So the first field 'state' is accessed via *runstate_state
+	 * which is unmodified, while the other fields are accessed
+	 * through 'runstate->' which we tweak here by adding 4.
+	 */
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) !=
+		     offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4);
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) !=
+		     offsetof(struct compat_vcpu_runstate_info, time) + 4);
+
+	offset = offsetof(struct vcpu_runstate_info, state_entry_time);
+#endif
+	/*
+	 * Although it's called "state_entry_time" and explicitly documented
+	 * as being "the system time at which the VCPU was last scheduled to
+	 * run", Xen just treats it as a counter for HVM domains too.
+	 */
+	if (kvm_read_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					 &runstate.state_entry_time, offset,
+					 sizeof(u64) * 5))
+		return;
+
+	runstate.state_entry_time = XEN_RUNSTATE_UPDATE |
+		(runstate.state_entry_time + 1);
+
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &runstate.state_entry_time, offset,
+					  sizeof(u64)))
+		return;
+	smp_wmb();
+
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) !=
+		     offsetof(struct compat_vcpu_runstate_info, state));
+	BUILD_BUG_ON(sizeof(((struct vcpu_runstate_info *)0)->state) !=
+		     sizeof(((struct compat_vcpu_runstate_info *)0)->state));
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &state,
+					  offsetof(struct vcpu_runstate_info, state),
+					  sizeof(runstate.state)))
+		return;
+
+	now = ktime_get_ns();
+	delta = now - vcpu_xen->last_state_ns - steal_ns;
+	runstate.time[vcpu_xen->current_runstate] += delta;
+	if (steal_ns)
+		runstate.time[RUNSTATE_runnable] += steal_ns;
+
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) !=
+		     offsetof(struct vcpu_runstate_info, time) - sizeof(u64));
+	BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state_entry_time) !=
+		     offsetof(struct compat_vcpu_runstate_info, time) - sizeof(u64));
+	BUILD_BUG_ON(sizeof(((struct vcpu_runstate_info *)0)->time) !=
+		     sizeof(((struct compat_vcpu_runstate_info *)0)->time));
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &runstate.time[0],
+					  offset + sizeof(u64),
+					  sizeof(runstate.time)))
+		return;
+	smp_wmb();
+	vcpu_xen->current_runstate = state;
+	vcpu_xen->last_state_ns = now;
+
+	runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &runstate.state_entry_time, offset,
+					  sizeof(u64)))
+		return;
+}
+
+void kvm_xen_runstate_set_preempted(struct kvm_vcpu *v)
+{
+	struct kvm_vcpu_xen *vcpu_xen = &v->arch.xen;
+	int new_state;
+
+	BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c);
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) !=
+		     offsetof(struct compat_vcpu_runstate_info, state));
+	BUILD_BUG_ON(sizeof(((struct vcpu_runstate_info *)0)->state) !=
+		     sizeof(((struct compat_vcpu_runstate_info *)0)->state));
+
+	if (v->preempted) {
+		new_state = RUNSTATE_runnable;
+	} else {
+		new_state = RUNSTATE_blocked;
+		vcpu_xen->last_steal = current->sched_info.run_delay;
+	}
+
+	kvm_xen_update_runstate(v, new_state, 0);
+}
+
+void kvm_xen_setup_runstate_page(struct kvm_vcpu *v)
+{
+	struct kvm_vcpu_xen *vcpu_xen = &v->arch.xen;
+	u64 steal_time = 0;
+
+	/*
+	 * If the CPU was blocked when it last stopped, presumably
+	 * it became unblocked at some point because it's being run
+	 * again now. The scheduler run_delay is the runnable time,
+	 * to be subtracted from the blocked time.
+	 */
+	if (vcpu_xen->current_runstate == RUNSTATE_blocked)
+		steal_time = current->sched_info.run_delay - vcpu_xen->last_steal;
+
+	kvm_xen_update_runstate(v, RUNSTATE_running, steal_time);
+}
+
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	struct kvm_vcpu *v;
@@ -78,7 +198,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
 		if (!v)
 			return -EINVAL;
-
 		/* No compat necessary here. */
 		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
 			     sizeof(struct compat_vcpu_info));
@@ -110,6 +229,22 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.runstate_cache,
+					      data->u.vcpu_attr.gpa,
+					      sizeof(struct vcpu_runstate_info));
+		if (r)
+			return r;
+
+		v->arch.xen.runstate_set = true;
+		v->arch.xen.current_runstate = RUNSTATE_blocked;
+		v->arch.xen.last_state_ns = ktime_get_ns();
+		break;
+
 	default:
 		break;
 	}
@@ -157,6 +292,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		if (v->arch.xen.runstate_set) {
+			data->u.vcpu_attr.gpa = v->arch.xen.runstate_cache.gpa;
+			r = 0;
+		}
+		break;
+
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index 120b7450252a..407e717476d6 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -9,6 +9,8 @@
 #ifndef __ARCH_X86_KVM_XEN_H__
 #define __ARCH_X86_KVM_XEN_H__
 
+void kvm_xen_setup_runstate_page(struct kvm_vcpu *vcpu);
+void kvm_xen_runstate_set_preempted(struct kvm_vcpu *vcpu);
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
@@ -54,7 +56,12 @@ struct compat_shared_info {
 	uint32_t evtchn_mask[32];
 	struct pvclock_wall_clock wc;
 	struct compat_arch_shared_info arch;
-
 };
 
+struct compat_vcpu_runstate_info {
+    int state;
+    uint64_t state_entry_time;
+    uint64_t time[4];
+} __attribute__((packed));
+
 #endif /* __ARCH_X86_KVM_XEN_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f60c5c61761c..ab83f3588719 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1600,6 +1600,7 @@ struct kvm_xen_hvm_attr {
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
 #define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
 #define KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO	0x3
+#define KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE		0x4
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 15/17] KVM: x86: declare Xen HVM shared info capability and add test case
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (13 preceding siblings ...)
  2020-12-14  8:39 ` [PATCH v3 14/17] KVM: x86/xen: register runstate info David Woodhouse
@ 2020-12-14  8:39 ` David Woodhouse
  2020-12-14  8:39 ` [PATCH v3 16/17] KVM: Add documentation for Xen hypercall and shared_info updates David Woodhouse
  2020-12-14  8:39 ` [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall David Woodhouse
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:39 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

Instead of adding a plethora of new KVM_CAP_XEN_FOO capabilities, just
add bits to the return value of KVM_CAP_XEN_HVM.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c                            |   3 +-
 include/uapi/linux/kvm.h                      |   3 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/xen_shinfo_test.c    | 194 ++++++++++++++++++
 4 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd4bd9a818d8..df44d9e50adc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3737,7 +3737,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 	case KVM_CAP_XEN_HVM:
 		r = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
-		    KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;
+		    KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL |
+		    KVM_XEN_HVM_CONFIG_SHARED_INFO;
 		break;
 	case KVM_CAP_SYNC_REGS:
 		r = KVM_SYNC_X86_VALID_FIELDS;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ab83f3588719..749a7112df99 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1145,6 +1145,7 @@ struct kvm_x86_mce {
 #ifdef KVM_CAP_XEN_HVM
 #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
 #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO		(1 << 2)
 
 struct kvm_xen_hvm_config {
 	__u32 flags;
@@ -1577,6 +1578,7 @@ struct kvm_pv_cmd {
 /* Available with KVM_CAP_X86_MSR_FILTER */
 #define KVM_X86_SET_MSR_FILTER	_IOW(KVMIO,  0xc6, struct kvm_msr_filter)
 
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
 #define KVM_XEN_HVM_GET_ATTR	_IOWR(KVMIO, 0xc7, struct kvm_xen_hvm_attr)
 #define KVM_XEN_HVM_SET_ATTR	_IOW(KVMIO,  0xc8, struct kvm_xen_hvm_attr)
 
@@ -1596,6 +1598,7 @@ struct kvm_xen_hvm_attr {
 	} u;
 };
 
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
 #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
 #define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index d94abec627e6..3d1d93947bda 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
 TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test
+TEST_GEN_PROGS_x86_64 += x86_64/xen_shinfo_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
new file mode 100644
index 000000000000..c8c696a6d41c
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * svm_vmcall_test
+ *
+ * Copyright © 2020 Amazon.com, Inc. or its affiliates.
+ *
+ * Xen shared_info / pvclock testing
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#include <stdint.h>
+#include <time.h>
+
+#define VCPU_ID		5
+
+#define SHINFO_REGION_GPA	0xc0000000ULL
+#define SHINFO_REGION_SLOT	10
+#define PAGE_SIZE		4096
+
+#define PVTIME_ADDR	(SHINFO_REGION_GPA + PAGE_SIZE)
+#define RUNSTATE_ADDR	(SHINFO_REGION_GPA + PAGE_SIZE + 0x20)
+
+static struct kvm_vm *vm;
+
+#define XEN_HYPERCALL_MSR	0x40000000
+
+struct pvclock_vcpu_time_info {
+        u32   version;
+        u32   pad0;
+        u64   tsc_timestamp;
+        u64   system_time;
+        u32   tsc_to_system_mul;
+        s8    tsc_shift;
+        u8    flags;
+        u8    pad[2];
+} __attribute__((__packed__)); /* 32 bytes */
+
+struct pvclock_wall_clock {
+        u32   version;
+        u32   sec;
+        u32   nsec;
+} __attribute__((__packed__));
+
+struct vcpu_runstate_info {
+    uint32_t state;
+    uint64_t state_entry_time;
+    uint64_t time[4];
+};
+
+static void guest_code(void)
+{
+	struct vcpu_runstate_info *rs = (void *)RUNSTATE_ADDR;
+
+	/* Scribble on the runstate, just to make sure that... */
+	rs->state = 0x5a;
+
+	GUEST_SYNC(1);
+
+	/* ... it is being set to RUNSTATE_running */
+	GUEST_ASSERT(rs->state == 0);
+	GUEST_DONE();
+}
+
+static int cmp_timespec(struct timespec *a, struct timespec *b)
+{
+	if (a->tv_sec > b->tv_sec)
+		return 1;
+	else if (a->tv_sec < b->tv_sec)
+		return -1;
+	else if (a->tv_nsec > b->tv_nsec)
+		return 1;
+	else if (a->tv_nsec < b->tv_nsec)
+		return -1;
+	else
+		return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct timespec min_ts, max_ts, vm_ts;
+
+	if (!(kvm_check_cap(KVM_CAP_XEN_HVM) &
+	      KVM_XEN_HVM_CONFIG_SHARED_INFO) ) {
+		print_skip("KVM_XEN_HVM_CONFIG_SHARED_INFO not available");
+		exit(KSFT_SKIP);
+	}
+
+	clock_gettime(CLOCK_REALTIME, &min_ts);
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) guest_code);
+	vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+
+	/* Map a region for the shared_info page */
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+                                    SHINFO_REGION_GPA, SHINFO_REGION_SLOT,
+				    2 * getpagesize(), 0);
+	virt_map(vm, SHINFO_REGION_GPA, SHINFO_REGION_GPA, 2, 0);
+
+	struct kvm_xen_hvm_attr lm = {
+		.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+		.u.long_mode = 1,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &lm);
+
+	struct kvm_xen_hvm_attr ha = {
+		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+		.u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &ha);
+
+	struct kvm_xen_hvm_attr vi = {
+		.type = KVM_XEN_ATTR_TYPE_VCPU_INFO,
+		.u.vcpu_attr.vcpu_id = VCPU_ID,
+		.u.vcpu_attr.gpa = SHINFO_REGION_GPA + 40,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &vi);
+
+	struct kvm_xen_hvm_attr pvclock = {
+		.type = KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO,
+		.u.vcpu_attr.vcpu_id = VCPU_ID,
+		.u.vcpu_attr.gpa = PVTIME_ADDR,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &pvclock);
+
+	struct kvm_xen_hvm_attr st = {
+		.type = KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE,
+		.u.vcpu_attr.vcpu_id = VCPU_ID,
+		.u.vcpu_attr.gpa = RUNSTATE_ADDR,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &st);
+
+	for (;;) {
+		volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+		struct ucall uc;
+
+		vcpu_run(vm, VCPU_ID);
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+			    run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s", (const char *)uc.args[0]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+		}
+	}
+
+ done:
+	clock_gettime(CLOCK_REALTIME, &max_ts);
+
+	/*
+	 * Just a *really* basic check that things are being put in the
+	 * right place. The actual calculations are much the same for
+	 * Xen as they are for the KVM variants, so no need to check.
+	 */
+	struct pvclock_wall_clock *wc;
+	struct pvclock_vcpu_time_info *ti, *ti2;
+	struct vcpu_runstate_info *rs;
+
+	wc = addr_gva2hva(vm, SHINFO_REGION_GPA + 0xc00);
+	ti = addr_gva2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20);
+	ti2 = addr_gva2hva(vm, PVTIME_ADDR);
+	rs = addr_gva2hva(vm, RUNSTATE_ADDR);
+
+	vm_ts.tv_sec = wc->sec;
+	vm_ts.tv_nsec = wc->nsec;
+        TEST_ASSERT(wc->version && !(wc->version & 1),
+		    "Bad wallclock version %x", wc->version);
+	TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old");
+	TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new");
+
+	TEST_ASSERT(ti->version && !(ti->version & 1),
+		    "Bad time_info version %x", ti->version);
+	TEST_ASSERT(ti2->version && !(ti2->version & 1),
+		    "Bad time_info version %x", ti->version);
+
+	/* Check for RUNSTATE_blocked */
+	TEST_ASSERT(rs->state == 2, "Not RUNSTATE_blocked");
+	TEST_ASSERT(rs->time[0], "No RUNSTATE_running time");
+	TEST_ASSERT(rs->time[2], "No RUNSTATE_blocked time");
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 16/17] KVM: Add documentation for Xen hypercall and shared_info updates
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (14 preceding siblings ...)
  2020-12-14  8:39 ` [PATCH v3 15/17] KVM: x86: declare Xen HVM shared info capability and add test case David Woodhouse
@ 2020-12-14  8:39 ` David Woodhouse
  2020-12-14  8:39 ` [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall David Woodhouse
  16 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:39 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 Documentation/virt/kvm/api.rst | 123 +++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e00a66d72372..d1c30105e6fd 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -944,6 +944,13 @@ memory.
 	__u8 pad2[30];
   };
 
+If the KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag is returned from the
+KVM_CAP_XEN_HVM check, it may be set in the flags field of this ioctl.
+This requests KVM to generate the contents of the hypercall page
+automatically, and also to intercept hypercalls with KVM_EXIT_XEN.
+In this case, all of the blob size and address fields must be zero.
+
+No other flags are currently valid.
 
 4.29 KVM_GET_CLOCK
 ------------------
@@ -4807,6 +4814,71 @@ into user space.
 If a vCPU is in running state while this ioctl is invoked, the vCPU may
 experience inconsistent filtering behavior on MSR accesses.
 
+4.127 KVM_XEN_HVM_SET_ATTR
+--------------------------
+
+:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_xen_hvm_attr
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_xen_hvm_attr {
+	__u16 type;
+
+	union {
+		__u8 long_mode;
+		struct {
+			__u64 gfn;
+		} shared_info;
+		struct {
+			__u32 vcpu_id;
+			__u64 gpa;
+		} vcpu_attr;
+		__u64 pad[4];
+	} u;
+  };
+
+type values:
+
+KVM_XEN_ATTR_TYPE_LONG_MODE
+  Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
+  determines the layout of the shared info pages exposed to the VM.
+
+KVM_XEN_ATTR_TYPE_SHARED_INFO
+  Sets the guest physical frame number at which the Xen "shared info"
+  page resides. Note that although Xen places vcpu_info for the first
+  32 vCPUs in the shared_info page, KVM does not automatically do so
+  and requires that KVM_XEN_ATTR_TYPE_VCPU_INFO be used explicitly
+  even when the vcpu_info for a given vCPU resides at the "default"
+  location in the shared_info page. This is because KVM is not aware
+  of the Xen CPU id which is used as the index into the vcpu_info[]
+  array, so cannot know the correct default location.
+
+KVM_XEN_ATTR_TYPE_VCPU_INFO
+  Sets the guest physical address of the vcpu_info for a given vCPU.
+
+KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO
+  Sets the guest physical address of an additional pvclock structure
+  for a given vCPU. This is typically used for guest vsyscall support.
+
+KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE
+  Sets the guest physical address of the vcpu_runstate_info for a given
+  vCPU. This is how a Xen guest tracks CPU state such as steal time.
+
+4.128 KVM_XEN_HVM_GET_ATTR
+--------------------------
+
+:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_xen_hvm_attr
+:Returns: 0 on success, < 0 on error
+
+Allows Xen VM attributes to be read. For the structure and types,
+see KVM_XEN_HVM_SET_ATTR above.
 
 5. The kvm_run structure
 ========================
@@ -5303,6 +5375,34 @@ wants to write. Once finished processing the event, user space must continue
 vCPU execution. If the MSR write was unsuccessful, user space also sets the
 "error" field to "1".
 
+::
+
+
+		struct kvm_xen_exit {
+  #define KVM_EXIT_XEN_HCALL          1
+			__u32 type;
+			union {
+				struct {
+					__u32 longmode;
+					__u32 cpl;
+					__u64 input;
+					__u64 result;
+					__u64 params[6];
+				} hcall;
+			} u;
+		};
+		/* KVM_EXIT_XEN */
+                struct kvm_hyperv_exit xen;
+
+Indicates that the VCPU exits into userspace to process some tasks
+related to Xen emulation.
+
+Valid values for 'type' are:
+
+  - KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall.
+    Userspace is expected to place the hypercall result into the appropriate
+    field before invoking KVM_RUN again.
+
 ::
 
 		/* Fix the size of the union. */
@@ -6390,3 +6490,26 @@ When enabled, KVM will disable paravirtual features provided to the
 guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x40000001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
+
+8.29 KVM_CAP_XEN_HVM
+--------------------
+
+:Architectures: x86
+
+This capability indicates the features that Xen supports for hosting Xen
+PVHVM guests. Valid flags are::
+
+  #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
+  #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
+  #define KVM_XEN_HVM_CONFIG_SHARED_INFO	(1 << 2)
+
+The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
+ioctl is available, for the guest to set its hypercall page.
+
+If KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL is also set, the same flag may also be
+provided in the flags to KVM_XEN_HVM_CONFIG, without providing hypercall page
+contents, to request that KVM generate hypercall page content automatically
+and also enable interception of guest hypercalls with KVM_EXIT_XEN.
+
+The KVM_XEN_HVM_CONFIG_SHARED_INFO flag indicates the availability of the
+KVM_XEN_HVM_SET_ATTR and KVM_XEN_HVM_GET_ATTR ioctls.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall
  2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
                   ` (15 preceding siblings ...)
  2020-12-14  8:39 ` [PATCH v3 16/17] KVM: Add documentation for Xen hypercall and shared_info updates David Woodhouse
@ 2020-12-14  8:39 ` David Woodhouse
  2020-12-14 13:19   ` Joao Martins
  16 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14  8:39 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

From: David Woodhouse <dwmw@amazon.co.uk>

It turns out that we can't handle event channels *entirely* in userspace
by delivering them as ExtINT, because KVM is a bit picky about when it
accepts ExtINT interrupts from a legacy PIC. The in-kernel local APIC
has to have LVT0 configured in APIC_MODE_EXTINT and unmasked, which
isn't necessarily the case for Xen guests especially on secondary CPUs.

To cope with this, add kvm_xen_get_interrupt() which checks the
evtchn_pending_upcall field in the Xen vcpu_info, and delivers the Xen
upcall vector (configured by KVM_XEN_ATTR_TYPE_UPCALL_VECTOR) if it's
set regardless of LAPIC LVT0 configuration. This gives us the minimum
support we need for completely userspace-based implementation of event
channels.

This does mean that vcpu_enter_guest() needs to check for the
evtchn_pending_upcall flag being set, because it can't rely on someone
having set KVM_REQ_EVENT unless we were to add some way for userspace to
do so manually.

But actually, I don't quite see how that works reliably for interrupts
injected with KVM_INTERRUPT either. In kvm_vcpu_ioctl_interrupt() the
KVM_REQ_EVENT request is set once, but that'll get cleared the first time
through vcpu_enter_guest(). So if the first exit is for something *else*
without interrupts being enabled yet, won't the KVM_REQ_EVENT request
have been consumed already and just be lost?

I wonder if my addition of '|| kvm_xen_has_interrupt(vcpu)' should
actually be '|| kvm_has_injectable_intr(vcpu)' to fix that pre-existing
bug?

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/irq.c              |  7 +++++
 arch/x86/kvm/x86.c              |  3 +-
 arch/x86/kvm/xen.c              | 52 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/xen.h              |  1 +
 include/uapi/linux/kvm.h        |  2 ++
 6 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4b345a8945ea..68a66b98fd5f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -903,6 +903,7 @@ struct msr_bitmap_range {
 struct kvm_xen {
 	bool long_mode;
 	bool shinfo_set;
+	u8 upcall_vector;
 	struct gfn_to_hva_cache shinfo_cache;
 };
 
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 814698e5b152..24668b51b5c8 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -14,6 +14,7 @@
 #include "irq.h"
 #include "i8254.h"
 #include "x86.h"
+#include "xen.h"
 
 /*
  * check if there are pending timer events
@@ -56,6 +57,9 @@ int kvm_cpu_has_extint(struct kvm_vcpu *v)
 	if (!lapic_in_kernel(v))
 		return v->arch.interrupt.injected;
 
+	if (kvm_xen_has_interrupt(v))
+		return 1;
+
 	if (!kvm_apic_accept_pic_intr(v))
 		return 0;
 
@@ -110,6 +114,9 @@ static int kvm_cpu_get_extint(struct kvm_vcpu *v)
 	if (!lapic_in_kernel(v))
 		return v->arch.interrupt.nr;
 
+	if (kvm_xen_has_interrupt(v))
+		return v->kvm->arch.xen.upcall_vector;
+
 	if (irqchip_split(v->kvm)) {
 		int vector = v->arch.pending_external_vector;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index df44d9e50adc..e627139cf8cd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8896,7 +8896,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops.msr_filter_changed(vcpu);
 	}
 
-	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
+	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
+	    kvm_xen_has_interrupt(vcpu)) {
 		++vcpu->stat.req_event;
 		kvm_apic_accept_events(vcpu);
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 17cbb4462b7e..4bc9da9fcfb8 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -176,6 +176,45 @@ void kvm_xen_setup_runstate_page(struct kvm_vcpu *v)
 	kvm_xen_update_runstate(v, RUNSTATE_running, steal_time);
 }
 
+int kvm_xen_has_interrupt(struct kvm_vcpu *v)
+{
+	u8 rc = 0;
+
+	/*
+	 * If the global upcall vector (HVMIRQ_callback_vector) is set and
+	 * the vCPU's evtchn_upcall_pending flag is set, the IRQ is pending.
+	 */
+	if (v->arch.xen.vcpu_info_set && v->kvm->arch.xen.upcall_vector) {
+		struct gfn_to_hva_cache *ghc = &v->arch.xen.vcpu_info_cache;
+		struct kvm_memslots *slots = kvm_memslots(v->kvm);
+		unsigned int offset = offsetof(struct vcpu_info, evtchn_upcall_pending);
+
+		/* No need for compat handling here */
+		BUILD_BUG_ON(offsetof(struct vcpu_info, evtchn_upcall_pending) !=
+			     offsetof(struct compat_vcpu_info, evtchn_upcall_pending));
+		BUILD_BUG_ON(sizeof(rc) !=
+			     sizeof(((struct vcpu_info *)0)->evtchn_upcall_pending));
+		BUILD_BUG_ON(sizeof(rc) !=
+			     sizeof(((struct compat_vcpu_info *)0)->evtchn_upcall_pending));
+
+		/*
+		 * For efficiency, this mirrors the checks for using the valid
+		 * cache in kvm_read_guest_offset_cached(), but just uses
+		 * __get_user() instead. And falls back to the slow path.
+		 */
+		if (likely(slots->generation == ghc->generation &&
+			   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
+			/* Fast path */
+			__get_user(rc, (u8 __user *)ghc->hva + offset);
+		} else {
+			/* Slow path */
+			kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
+						      sizeof(rc));
+		}
+	}
+	return rc;
+}
+
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	struct kvm_vcpu *v;
@@ -245,6 +284,14 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		v->arch.xen.last_state_ns = ktime_get_ns();
 		break;
 
+	case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
+		if (data->u.vector < 0x10)
+			return -EINVAL;
+
+		kvm->arch.xen.upcall_vector = data->u.vector;
+		r = 0;
+		break;
+
 	default:
 		break;
 	}
@@ -303,6 +350,11 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
+	case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
+		data->u.vector = kvm->arch.xen.upcall_vector;
+		r = 0;
+		break;
+
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index 407e717476d6..d64916ac4a12 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -11,6 +11,7 @@
 
 void kvm_xen_setup_runstate_page(struct kvm_vcpu *vcpu);
 void kvm_xen_runstate_set_preempted(struct kvm_vcpu *vcpu);
+int kvm_xen_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 749a7112df99..33609bc25021 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1587,6 +1587,7 @@ struct kvm_xen_hvm_attr {
 
 	union {
 		__u8 long_mode;
+		__u8 vector;
 		struct {
 			__u64 gfn;
 		} shared_info;
@@ -1604,6 +1605,7 @@ struct kvm_xen_hvm_attr {
 #define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
 #define KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO	0x3
 #define KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE		0x4
+#define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR		0x5
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14  8:38 ` [PATCH v3 08/17] KVM: x86/xen: register shared_info page David Woodhouse
@ 2020-12-14 10:45   ` Joao Martins
  2020-12-14 11:30     ` Joao Martins
  2020-12-14 12:02     ` David Woodhouse
  0 siblings, 2 replies; 53+ messages in thread
From: Joao Martins @ 2020-12-14 10:45 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 8:38 AM, David Woodhouse wrote:
> From: Joao Martins <joao.m.martins@oracle.com>
> 
> We add a new ioctl, XEN_HVM_SHARED_INFO, to allow hypervisor
> to know where the guest's shared info page is.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/xen.c              | 27 +++++++++++++++++++++++++++
>  arch/x86/kvm/xen.h              |  1 -
>  include/uapi/linux/kvm.h        |  4 ++++
>  4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c9a4feaee2e7..8bcd83dacf43 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -893,6 +893,8 @@ struct msr_bitmap_range {
>  /* Xen emulation context */
>  struct kvm_xen {
>  	bool long_mode;
> +	bool shinfo_set;
> +	struct gfn_to_hva_cache shinfo_cache;
>  };
>  
>  enum kvm_irqchip_mode {
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 52cb9e465542..9dd9c42842b8 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -13,9 +13,23 @@
>  #include <linux/kvm_host.h>
>  
>  #include <trace/events/kvm.h>
> +#include <xen/interface/xen.h>
>  
>  #include "trace.h"
>  
> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> +{
> +	int ret;
> +
> +	ret = kvm_gfn_to_hva_cache_init(kvm, &kvm->arch.xen.shinfo_cache,
> +					gfn_to_gpa(gfn), PAGE_SIZE);
> +	if (ret)
> +		return ret;
> +
> +	kvm->arch.xen.shinfo_set = true;

Can't you just use:

	kvm->arch.xen.shinfo_cache.gpa

Rather than added a bool just to say you set a shinfo?

> +	return 0;
> +}

And then here you just return @ret while removing the other conditional.

> +
>  int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  {
>  	int r = -ENOENT;
> @@ -28,6 +42,11 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		kvm->arch.xen.long_mode = !!data->u.long_mode;
>  		r = 0;
>  		break;
> +
> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
> +		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
> +		break;
> +
>  	default:
>  		break;
>  	}
> @@ -44,6 +63,14 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		data->u.long_mode = kvm->arch.xen.long_mode;
>  		r = 0;
>  		break;
> +
> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
> +		if (kvm->arch.xen.shinfo_set) {
> +			data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa);
> +			r = 0;
> +		}
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
> index cd3c52b62068..120b7450252a 100644
> --- a/arch/x86/kvm/xen.h
> +++ b/arch/x86/kvm/xen.h
> @@ -13,7 +13,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
>  int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
>  int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
>  int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data);
> -void kvm_xen_destroy_vm(struct kvm *kvm);
>  
spurious deletion ?

>  static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm)
>  {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6b556ef98b76..caa9faf3c5ad 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1585,11 +1585,15 @@ struct kvm_xen_hvm_attr {
>  
>  	union {
>  		__u8 long_mode;
> +		struct {
> +			__u64 gfn;
> +		} shared_info;
>  		__u64 pad[4];
>  	} u;
>  };
>  
>  #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
> +#define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
>  
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 11/17] KVM: x86/xen: register vcpu info
  2020-12-14  8:38 ` [PATCH v3 11/17] KVM: x86/xen: register vcpu info David Woodhouse
@ 2020-12-14 10:48   ` Joao Martins
  0 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2020-12-14 10:48 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 8:38 AM, David Woodhouse wrote:
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index e5117a611737..4bc72e0b9928 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -58,6 +58,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>  
>  int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  {
> +	struct kvm_vcpu *v;
>  	int r = -ENOENT;
>  
>  	switch (data->type) {
> @@ -73,6 +74,23 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
>  		break;
>  
> +	case KVM_XEN_ATTR_TYPE_VCPU_INFO:
> +		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
> +		if (!v)
> +			return -EINVAL;
> +
> +		/* No compat necessary here. */
> +		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
> +			     sizeof(struct compat_vcpu_info));
> +		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.vcpu_info_cache,
> +					      data->u.vcpu_attr.gpa,
> +					      sizeof(struct vcpu_info));
> +		if (r)
> +			return r;
> +
> +		v->arch.xen.vcpu_info_set = true;

Same comment as the shared_info page.

> +		break;
> +
>  	default:
>  		break;
>  	}
> @@ -83,6 +101,7 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  {
>  	int r = -ENOENT;
> +	struct kvm_vcpu *v;
>  
>  	switch (data->type) {
>  	case KVM_XEN_ATTR_TYPE_LONG_MODE:
> @@ -97,6 +116,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		}
>  		break;
>  
> +	case KVM_XEN_ATTR_TYPE_VCPU_INFO:
> +		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
> +		if (!v)
> +			return -EINVAL;
> +
> +		if (v->arch.xen.vcpu_info_set) {
> +			data->u.vcpu_attr.gpa = v->arch.xen.vcpu_info_cache.gpa;
> +			r = 0;
> +		}
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index caa9faf3c5ad..87d150992f48 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1588,12 +1588,17 @@ struct kvm_xen_hvm_attr {
>  		struct {
>  			__u64 gfn;
>  		} shared_info;
> +		struct {
> +			__u32 vcpu_id;
> +			__u64 gpa;
> +		} vcpu_attr;
>  		__u64 pad[4];
>  	} u;
>  };
>  
>  #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
>  #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
> +#define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
>  
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region
  2020-12-14  8:39 ` [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region David Woodhouse
@ 2020-12-14 10:55   ` Joao Martins
  2020-12-14 12:03     ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2020-12-14 10:55 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 8:39 AM, David Woodhouse wrote:
> From: Joao Martins <joao.m.martins@oracle.com>
> 
> Allow the Xen emulated guest the ability to register secondary
> vcpu time information. On Xen guests this is used in order to be
> mapped to userspace and hence allow vdso gettimeofday to work.
> 
> In doing so, move kvm_xen_set_pvclock_page() logic to
> kvm_xen_update_vcpu_time() and have the former a top-level
> function which updates primary vcpu time info (in struct vcpu_info)
> and secondary one.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/x86.c              |  2 ++
>  arch/x86/kvm/xen.c              | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h        |  1 +
>  4 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 56c00a9441a3..b7dfcb4de92a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -524,7 +524,9 @@ struct kvm_vcpu_hv {
>  struct kvm_vcpu_xen {
>  	u64 hypercall_rip;
>  	bool vcpu_info_set;
> +	bool vcpu_time_info_set;
>  	struct gfn_to_hva_cache vcpu_info_cache;
> +	struct gfn_to_hva_cache vcpu_time_info_cache;
>  };
>  
>  struct kvm_vcpu_arch {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cbdc05bb53bd..2234fdf49d82 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2729,6 +2729,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  	if (vcpu->xen.vcpu_info_set)
>  		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_info_cache,
>  				       offsetof(struct compat_vcpu_info, time));
> +	if (vcpu->xen.vcpu_time_info_set)
> +		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
>  	if (v == kvm_get_vcpu(v->kvm, 0))
>  		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
>  	return 0;
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index d2055b60fdc1..1cca46effec8 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -95,6 +95,21 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>  		break;
>  
> +	case KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO:
> +		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
> +		if (!v)
> +			return -EINVAL;
> +
> +		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.vcpu_time_info_cache,
> +					      data->u.vcpu_attr.gpa,
> +					      sizeof(struct pvclock_vcpu_time_info));
> +		if (r)
> +			return r;
> +
> +		v->arch.xen.vcpu_time_info_set = true;

Same comment as shared_info: we probably don't need vcpu_time_info_set if we piggyback
on the gfn_to_hva cache setting its @gpa field.

> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> +		break;
> +
>  	default:
>  		break;
>  	}
> @@ -131,6 +146,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		}
>  		break;
>  
> +	case KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO:
> +		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
> +		if (!v)
> +			return -EINVAL;
> +
> +		if (v->arch.xen.vcpu_time_info_set) {
> +			data->u.vcpu_attr.gpa = v->arch.xen.vcpu_time_info_cache.gpa;
> +			r = 0;
> +		}
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 87d150992f48..f60c5c61761c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1599,6 +1599,7 @@ struct kvm_xen_hvm_attr {
>  #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
>  #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
>  #define KVM_XEN_ATTR_TYPE_VCPU_INFO		0x2
> +#define KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO	0x3
>  
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 14/17] KVM: x86/xen: register runstate info
  2020-12-14  8:39 ` [PATCH v3 14/17] KVM: x86/xen: register runstate info David Woodhouse
@ 2020-12-14 11:10   ` Joao Martins
  2020-12-14 15:47     ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2020-12-14 11:10 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm



On 12/14/20 8:39 AM, David Woodhouse wrote:

[...]

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b7dfcb4de92a..4b345a8945ea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -523,10 +523,15 @@ struct kvm_vcpu_hv {
>  /* Xen HVM per vcpu emulation context */
>  struct kvm_vcpu_xen {
>  	u64 hypercall_rip;
> +	u32 current_runstate;
>  	bool vcpu_info_set;
>  	bool vcpu_time_info_set;
> +	bool runstate_set;
>  	struct gfn_to_hva_cache vcpu_info_cache;
>  	struct gfn_to_hva_cache vcpu_time_info_cache;
> +	struct gfn_to_hva_cache runstate_cache;
> +	u64 last_steal;
> +	u64 last_state_ns;
>  };
>  
>  struct kvm_vcpu_arch {

[...]

> @@ -78,7 +198,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
>  		if (!v)
>  			return -EINVAL;
> -
>  		/* No compat necessary here. */
>  		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
>  			     sizeof(struct compat_vcpu_info));
> @@ -110,6 +229,22 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>  		break;
>  
> +	case KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE:
> +		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
> +		if (!v)
> +			return -EINVAL;
> +
> +		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.runstate_cache,
> +					      data->u.vcpu_attr.gpa,
> +					      sizeof(struct vcpu_runstate_info));
> +		if (r)
> +			return r;
> +
> +		v->arch.xen.runstate_set = true;

Same as shared_info comment.

But maybe {shared_info,vcpu_info,runstate}_set could be instead turned into helpers
if it helps the cosmetics.

> +		v->arch.xen.current_runstate = RUNSTATE_blocked;
> +		v->arch.xen.last_state_ns = ktime_get_ns();
> +		break;
> +
>  	default:
>  		break;
>  	}
> @@ -157,6 +292,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		}
>  		break;
>  
> +	case KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE:
> +		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
> +		if (!v)
> +			return -EINVAL;
> +
> +		if (v->arch.xen.runstate_set) {
> +			data->u.vcpu_attr.gpa = v->arch.xen.runstate_cache.gpa;
> +			r = 0;
> +		}
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
> index 120b7450252a..407e717476d6 100644
> --- a/arch/x86/kvm/xen.h
> +++ b/arch/x86/kvm/xen.h
> @@ -9,6 +9,8 @@
>  #ifndef __ARCH_X86_KVM_XEN_H__
>  #define __ARCH_X86_KVM_XEN_H__
>  
> +void kvm_xen_setup_runstate_page(struct kvm_vcpu *vcpu);
> +void kvm_xen_runstate_set_preempted(struct kvm_vcpu *vcpu);
>  int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
>  int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
>  int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
> @@ -54,7 +56,12 @@ struct compat_shared_info {
>  	uint32_t evtchn_mask[32];
>  	struct pvclock_wall_clock wc;
>  	struct compat_arch_shared_info arch;
> -
>  };
>  

This change belos in the seventh patch I believe.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14 10:45   ` Joao Martins
@ 2020-12-14 11:30     ` Joao Martins
  2020-12-14 12:04       ` David Woodhouse
  2020-12-14 12:02     ` David Woodhouse
  1 sibling, 1 reply; 53+ messages in thread
From: Joao Martins @ 2020-12-14 11:30 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 10:45 AM, Joao Martins wrote:
> On 12/14/20 8:38 AM, David Woodhouse wrote:
>> From: Joao Martins <joao.m.martins@oracle.com>
>>
>> We add a new ioctl, XEN_HVM_SHARED_INFO, to allow hypervisor
>> to know where the guest's shared info page is.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>  arch/x86/kvm/xen.c              | 27 +++++++++++++++++++++++++++
>>  arch/x86/kvm/xen.h              |  1 -
>>  include/uapi/linux/kvm.h        |  4 ++++
>>  4 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index c9a4feaee2e7..8bcd83dacf43 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -893,6 +893,8 @@ struct msr_bitmap_range {
>>  /* Xen emulation context */
>>  struct kvm_xen {
>>  	bool long_mode;
>> +	bool shinfo_set;
>> +	struct gfn_to_hva_cache shinfo_cache;
>>  };
>>  
>>  enum kvm_irqchip_mode {
>> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
>> index 52cb9e465542..9dd9c42842b8 100644
>> --- a/arch/x86/kvm/xen.c
>> +++ b/arch/x86/kvm/xen.c
>> @@ -13,9 +13,23 @@
>>  #include <linux/kvm_host.h>
>>  
>>  #include <trace/events/kvm.h>
>> +#include <xen/interface/xen.h>
>>  
>>  #include "trace.h"
>>  
>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>> +{
>> +	int ret;
>> +
>> +	ret = kvm_gfn_to_hva_cache_init(kvm, &kvm->arch.xen.shinfo_cache,
>> +					gfn_to_gpa(gfn), PAGE_SIZE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	kvm->arch.xen.shinfo_set = true;
> 
> Can't you just use:
> 
> 	kvm->arch.xen.shinfo_cache.gpa
> 
> Rather than added a bool just to say you set a shinfo?
> 
Or maybe @len in case you want to consider @gpa = 0 as valid.

>> +	return 0;
>> +}
> 
> And then here you just return @ret while removing the other conditional.
> 
>> +
>>  int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>>  {
>>  	int r = -ENOENT;
>> @@ -28,6 +42,11 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>>  		kvm->arch.xen.long_mode = !!data->u.long_mode;
>>  		r = 0;
>>  		break;
>> +
>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
>> +		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
>> +		break;
>> +
>>  	default:
>>  		break;
>>  	}
>> @@ -44,6 +63,14 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>>  		data->u.long_mode = kvm->arch.xen.long_mode;
>>  		r = 0;
>>  		break;
>> +
>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
>> +		if (kvm->arch.xen.shinfo_set) {
>> +			data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa);
>> +			r = 0;
>> +		}
>> +		break;
>> +
>>  	default:
>>  		break;
>>  	}
>> diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
>> index cd3c52b62068..120b7450252a 100644
>> --- a/arch/x86/kvm/xen.h
>> +++ b/arch/x86/kvm/xen.h
>> @@ -13,7 +13,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
>>  int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
>>  int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
>>  int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data);
>> -void kvm_xen_destroy_vm(struct kvm *kvm);
>>  
> spurious deletion ?
> 
>>  static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm)
>>  {
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6b556ef98b76..caa9faf3c5ad 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1585,11 +1585,15 @@ struct kvm_xen_hvm_attr {
>>  
>>  	union {
>>  		__u8 long_mode;
>> +		struct {
>> +			__u64 gfn;
>> +		} shared_info;
>>  		__u64 pad[4];
>>  	} u;
>>  };
>>  
>>  #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
>> +#define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
>>  
>>  /* Secure Encrypted Virtualization command */
>>  enum sev_cmd_id {
>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14 10:45   ` Joao Martins
  2020-12-14 11:30     ` Joao Martins
@ 2020-12-14 12:02     ` David Woodhouse
  2020-12-14 12:53       ` Joao Martins
  1 sibling, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 12:02 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 2694 bytes --]

On Mon, 2020-12-14 at 10:45 +0000, Joao Martins wrote:
> On 12/14/20 8:38 AM, David Woodhouse wrote:
> > From: Joao Martins <joao.m.martins@oracle.com>
> > 
> > We add a new ioctl, XEN_HVM_SHARED_INFO, to allow hypervisor
> > to know where the guest's shared info page is.
> > 
> > Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  2 ++
> >  arch/x86/kvm/xen.c              | 27 +++++++++++++++++++++++++++
> >  arch/x86/kvm/xen.h              |  1 -
> >  include/uapi/linux/kvm.h        |  4 ++++
> >  4 files changed, 33 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > index c9a4feaee2e7..8bcd83dacf43 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -893,6 +893,8 @@ struct msr_bitmap_range {
> >  /* Xen emulation context */
> >  struct kvm_xen {
> >  	bool long_mode;
> > +	bool shinfo_set;
> > +	struct gfn_to_hva_cache shinfo_cache;
> >  };
> >  
> >  enum kvm_irqchip_mode {
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 52cb9e465542..9dd9c42842b8 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -13,9 +13,23 @@
> >  #include <linux/kvm_host.h>
> >  
> >  #include <trace/events/kvm.h>
> > +#include <xen/interface/xen.h>
> >  
> >  #include "trace.h"
> >  
> > +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	int ret;
> > +
> > +	ret = kvm_gfn_to_hva_cache_init(kvm, &kvm-
> > >arch.xen.shinfo_cache,
> > +					gfn_to_gpa(gfn), PAGE_SIZE);
> > +	if (ret)
> > +		return ret;
> > +
> > +	kvm->arch.xen.shinfo_set = true;
> 
> Can't you just use:
> 
> 	kvm->arch.xen.shinfo_cache.gpa
> 
> Rather than added a bool just to say you set a shinfo?

I see no reason why a guest shouldn't be able to use GPA zero if it
really wanted to. Using a separate boolean matches what KVM does for
the wallclock info.


> > --- a/arch/x86/kvm/xen.h
> > +++ b/arch/x86/kvm/xen.h
> > @@ -13,7 +13,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct
> > kvm_xen_hvm_attr *data);
> >  int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr
> > *data);
> >  int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
> >  int kvm_xen_hvm_config(struct kvm_vcpu *vcpu, u64 data);
> > -void kvm_xen_destroy_vm(struct kvm *kvm);
> >  
> 
> spurious deletion ?

Now nothing is kept mapped, there's no need for a destroy function. It
must have been spuriously *added* in a previous patch and then deleted
here. Will fix.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region
  2020-12-14 10:55   ` Joao Martins
@ 2020-12-14 12:03     ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 12:03 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 320 bytes --]

On Mon, 2020-12-14 at 10:55 +0000, Joao Martins wrote:
> Same comment as shared_info: we probably don't need
> vcpu_time_info_set if we piggyback
> on the gfn_to_hva cache setting its @gpa field.

Now we *could* potentially use the @hva field there... but I didn't
want to. I'd rather keep it clean and simple.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14 11:30     ` Joao Martins
@ 2020-12-14 12:04       ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 12:04 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

On Mon, 2020-12-14 at 11:30 +0000, Joao Martins wrote:
> > Can't you just use:
> > 
> >        kvm->arch.xen.shinfo_cache.gpa
> > 
> > Rather than added a bool just to say you set a shinfo?
> > 
> 
> Or maybe @len in case you want to consider @gpa = 0 as valid.

Ah yes, it was @len not @hva that I was thinking of using, before
deciding I preferred an explicit boolean.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14 12:02     ` David Woodhouse
@ 2020-12-14 12:53       ` Joao Martins
  2020-12-14 15:05         ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2020-12-14 12:53 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 12:02 PM, David Woodhouse wrote:
> On Mon, 2020-12-14 at 10:45 +0000, Joao Martins wrote:
>> On 12/14/20 8:38 AM, David Woodhouse wrote:
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>
>>> We add a new ioctl, XEN_HVM_SHARED_INFO, to allow hypervisor
>>> to know where the guest's shared info page is.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>>> ---
>>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>>  arch/x86/kvm/xen.c              | 27 +++++++++++++++++++++++++++
>>>  arch/x86/kvm/xen.h              |  1 -
>>>  include/uapi/linux/kvm.h        |  4 ++++
>>>  4 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h
>>> b/arch/x86/include/asm/kvm_host.h
>>> index c9a4feaee2e7..8bcd83dacf43 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -893,6 +893,8 @@ struct msr_bitmap_range {
>>>  /* Xen emulation context */
>>>  struct kvm_xen {
>>>  	bool long_mode;
>>> +	bool shinfo_set;
>>> +	struct gfn_to_hva_cache shinfo_cache;
>>>  };
>>>  
>>>  enum kvm_irqchip_mode {
>>> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
>>> index 52cb9e465542..9dd9c42842b8 100644
>>> --- a/arch/x86/kvm/xen.c
>>> +++ b/arch/x86/kvm/xen.c
>>> @@ -13,9 +13,23 @@
>>>  #include <linux/kvm_host.h>
>>>  
>>>  #include <trace/events/kvm.h>
>>> +#include <xen/interface/xen.h>
>>>  
>>>  #include "trace.h"
>>>  
>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = kvm_gfn_to_hva_cache_init(kvm, &kvm-
>>>> arch.xen.shinfo_cache,
>>> +					gfn_to_gpa(gfn), PAGE_SIZE);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	kvm->arch.xen.shinfo_set = true;
>>
>> Can't you just use:
>>
>> 	kvm->arch.xen.shinfo_cache.gpa
>>
>> Rather than added a bool just to say you set a shinfo?
> 
> I see no reason why a guest shouldn't be able to use GPA zero if it
> really wanted to. Using a separate boolean matches what KVM does for
> the wallclock info.
> 
Ah OK -- I didn't notice @pv_time_enabled before suggesting this.

An helper would probably suffice without sacrificing footprint in
data structures (because you will add 3 of these). The helpers would be, say:

	kvm_xen_shared_info_set(vcpu->kvm)
	kvm_xen_vcpu_info_set(vcpu)
	kvm_xen_vcpu_runstate_set(vcpu)

And maybe backed by a:

	kvm_gfn_to_hva_cache_valid(&v->kvm->arch.xen.shinfo_cache)

Which would check whether a cache is initialized or valid.

But anyhow, I don't feel strongly about it, specially if there's an existing
one which sort of sets the precedent.

	Joao

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall
  2020-12-14  8:39 ` [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall David Woodhouse
@ 2020-12-14 13:19   ` Joao Martins
  2020-12-14 13:32     ` Durrant, Paul
  2020-12-14 14:57     ` David Woodhouse
  0 siblings, 2 replies; 53+ messages in thread
From: Joao Martins @ 2020-12-14 13:19 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 8:39 AM, David Woodhouse wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index df44d9e50adc..e627139cf8cd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8896,7 +8896,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_x86_ops.msr_filter_changed(vcpu);
>  	}
>  
> -	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> +	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
> +	    kvm_xen_has_interrupt(vcpu)) {
>  		++vcpu->stat.req_event;
>  		kvm_apic_accept_events(vcpu);
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 17cbb4462b7e..4bc9da9fcfb8 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -176,6 +176,45 @@ void kvm_xen_setup_runstate_page(struct kvm_vcpu *v)
>  	kvm_xen_update_runstate(v, RUNSTATE_running, steal_time);
>  }
>  
> +int kvm_xen_has_interrupt(struct kvm_vcpu *v)
> +{
> +	u8 rc = 0;
> +
> +	/*
> +	 * If the global upcall vector (HVMIRQ_callback_vector) is set and
> +	 * the vCPU's evtchn_upcall_pending flag is set, the IRQ is pending.
> +	 */
> +	if (v->arch.xen.vcpu_info_set && v->kvm->arch.xen.upcall_vector) {
> +		struct gfn_to_hva_cache *ghc = &v->arch.xen.vcpu_info_cache;
> +		struct kvm_memslots *slots = kvm_memslots(v->kvm);
> +		unsigned int offset = offsetof(struct vcpu_info, evtchn_upcall_pending);
> +

To have less nesting, wouldn't it be better to invert the logic?

say:

	u8 rc = 0;
	struct gfn_to_hva_cache *ghc
	struct kvm_memslots *slots;
	unsigned int offset;


	if (!v->arch.xen.vcpu_info_set || !v->kvm->arch.xen.upcall_vector)
		return 0;

	BUILD_BUG_ON(...)
	
	ghc = &v->arch.xen.vcpu_info_cache;
	slots = kvm_memslots(v->kvm);
	offset = offsetof(struct vcpu_info, evtchn_upcall_pending);

But I think there's a flaw here. That is handling the case where you don't have a
vcpu_info registered, and only shared info. The vcpu_info is then placed elsewhere, i.e.
another offset out of shared_info -- which is *I think* the case for PVHVM Windows guests.

Perhaps introducing a helper which adds xen_vcpu_info() and returns you the hva (picking
the right cache) similar to the RFC patch. Albeit that was with page pinning, but
borrowing an older version I had with hva_to_gfn_cache incantation would probably look like:


        if (v->arch.xen.vcpu_info_set) {
		ghc = &v->arch.xen.vcpu_info_cache;
        } else {
		ghc = &v->arch.xen.vcpu_info_cache;
                offset += offsetof(struct shared_info, vcpu_info);
                offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct vcpu_info);
        }

	if (likely(slots->generation == ghc->generation &&
		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
		/* Fast path */
		__get_user(rc, (u8 __user *)ghc->hva + offset);
	} else {
		/* Slow path */
		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
					     sizeof(rc));
	}

 ?

	Joao

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates
  2020-12-14  8:39 ` [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates David Woodhouse
@ 2020-12-14 13:29   ` Joao Martins
  2020-12-14 14:58     ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2020-12-14 13:29 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 8:39 AM, David Woodhouse wrote:
> From: Joao Martins <joao.m.martins@oracle.com>
> 
> Parameterise kvm_setup_pvclock_page() a little bit so that it can be
> invoked for different gfn_to_hva_cache structures, and with different
> offsets. Then we can invoke it for the normal KVM pvclock and also for
> the Xen one in the vcpu_info.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/x86.c | 31 ++++++++++++++++++-------------
>  arch/x86/kvm/xen.c |  4 ++++
>  2 files changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 64016443159c..cbdc05bb53bd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2582,13 +2582,15 @@ u64 get_kvmclock_ns(struct kvm *kvm)
>  	return ret;
>  }
>  
> -static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
> +static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
> +				   struct gfn_to_hva_cache *cache,
> +				   unsigned int offset)
>  {
>  	struct kvm_vcpu_arch *vcpu = &v->arch;
>  	struct pvclock_vcpu_time_info guest_hv_clock;
>  
> -	if (unlikely(kvm_read_guest_cached(v->kvm, &vcpu->pv_time,
> -		&guest_hv_clock, sizeof(guest_hv_clock))))
> +	if (unlikely(kvm_read_guest_offset_cached(v->kvm, cache,
> +		&guest_hv_clock, offset, sizeof(guest_hv_clock))))
>  		return;
>  
>  	/* This VCPU is paused, but it's legal for a guest to read another
> @@ -2611,9 +2613,9 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
>  		++guest_hv_clock.version;  /* first time write, random junk */
>  
>  	vcpu->hv_clock.version = guest_hv_clock.version + 1;
> -	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
> -				&vcpu->hv_clock,
> -				sizeof(vcpu->hv_clock.version));
> +	kvm_write_guest_offset_cached(v->kvm, cache,
> +				      &vcpu->hv_clock, offset,
> +				      sizeof(vcpu->hv_clock.version));
>  
>  	smp_wmb();
>  
> @@ -2627,16 +2629,16 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
>  
>  	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
>  
> -	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
> -				&vcpu->hv_clock,
> -				sizeof(vcpu->hv_clock));
> +	kvm_write_guest_offset_cached(v->kvm, cache,
> +				      &vcpu->hv_clock, offset,
> +				      sizeof(vcpu->hv_clock));
>  
>  	smp_wmb();
>  
>  	vcpu->hv_clock.version++;
> -	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
> -				&vcpu->hv_clock,
> -				sizeof(vcpu->hv_clock.version));
> +	kvm_write_guest_offset_cached(v->kvm, cache,
> +				     &vcpu->hv_clock, offset,
> +				     sizeof(vcpu->hv_clock.version));
>  }
>  
>  static int kvm_guest_time_update(struct kvm_vcpu *v)
> @@ -2723,7 +2725,10 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  	vcpu->hv_clock.flags = pvclock_flags;
>  
>  	if (vcpu->pv_time_enabled)
> -		kvm_setup_pvclock_page(v);
> +		kvm_setup_pvclock_page(v, &vcpu->pv_time, 0);
> +	if (vcpu->xen.vcpu_info_set)
> +		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_info_cache,
> +				       offsetof(struct compat_vcpu_info, time));

We might be missing the case where only shared_info is registered. Something like:

	if (vcpu->xen.shinfo_set && !vcpu->xen.vcpu_info_set) {
		offset = offsetof(struct compat_vcpu_info, time);
                offset += offsetof(struct shared_info, vcpu_info);
                offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct compat_vcpu_info);
		
		kvm_setup_pvclock_page(v, &vcpu->xen.shinfo_cache, offset);
	}

Part of the reason I had a kvm_xen_setup_pvclock_page() was to handle this all these
combinations i.e. 1) shared_info && !vcpu_info 2) vcpu_info and unilaterially updating
secondary time info.

But maybe introducing this xen_vcpu_info() helper to accommodate all this.

>  	if (v == kvm_get_vcpu(v->kvm, 0))
>  		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
>  	return 0;
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 4bc72e0b9928..d2055b60fdc1 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -82,6 +82,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		/* No compat necessary here. */
>  		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
>  			     sizeof(struct compat_vcpu_info));
> +		BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
> +			     offsetof(struct compat_vcpu_info, time));
> +
>  		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.vcpu_info_cache,
>  					      data->u.vcpu_attr.gpa,
>  					      sizeof(struct vcpu_info));
> @@ -89,6 +92,7 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  			return r;
>  
>  		v->arch.xen.vcpu_info_set = true;
> +		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>  		break;
>  
>  	default:
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall
  2020-12-14 13:19   ` Joao Martins
@ 2020-12-14 13:32     ` Durrant, Paul
  2020-12-14 14:57     ` David Woodhouse
  1 sibling, 0 replies; 53+ messages in thread
From: Durrant, Paul @ 2020-12-14 13:32 UTC (permalink / raw)
  To: Joao Martins, David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	Graf (AWS), Alexander, Aslanidis (AWS),
	Ioannis, Agache, Alexandru, Florescu, Andreea, kvm

> -----Original Message-----
> From: Joao Martins <joao.m.martins@oracle.com>
> Sent: 14 December 2020 13:20
> To: David Woodhouse <dwmw2@infradead.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Ankur Arora <ankur.a.arora@oracle.com>; Boris Ostrovsky
> <boris.ostrovsky@oracle.com>; Sean Christopherson <seanjc@google.com>; Graf (AWS), Alexander
> <graf@amazon.de>; Aslanidis (AWS), Ioannis <iaslan@amazon.de>; Durrant, Paul <pdurrant@amazon.co.uk>;
> Agache, Alexandru <aagch@amazon.com>; Florescu, Andreea <fandree@amazon.com>; kvm@vger.kernel.org
> Subject: RE: [EXTERNAL] [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 12/14/20 8:39 AM, David Woodhouse wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index df44d9e50adc..e627139cf8cd 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8896,7 +8896,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >                       kvm_x86_ops.msr_filter_changed(vcpu);
> >       }
> >
> > -     if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > +     if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
> > +         kvm_xen_has_interrupt(vcpu)) {
> >               ++vcpu->stat.req_event;
> >               kvm_apic_accept_events(vcpu);
> >               if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 17cbb4462b7e..4bc9da9fcfb8 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -176,6 +176,45 @@ void kvm_xen_setup_runstate_page(struct kvm_vcpu *v)
> >       kvm_xen_update_runstate(v, RUNSTATE_running, steal_time);
> >  }
> >
> > +int kvm_xen_has_interrupt(struct kvm_vcpu *v)
> > +{
> > +     u8 rc = 0;
> > +
> > +     /*
> > +      * If the global upcall vector (HVMIRQ_callback_vector) is set and
> > +      * the vCPU's evtchn_upcall_pending flag is set, the IRQ is pending.
> > +      */
> > +     if (v->arch.xen.vcpu_info_set && v->kvm->arch.xen.upcall_vector) {
> > +             struct gfn_to_hva_cache *ghc = &v->arch.xen.vcpu_info_cache;
> > +             struct kvm_memslots *slots = kvm_memslots(v->kvm);
> > +             unsigned int offset = offsetof(struct vcpu_info, evtchn_upcall_pending);
> > +
> 
> To have less nesting, wouldn't it be better to invert the logic?
> 
> say:
> 
>         u8 rc = 0;
>         struct gfn_to_hva_cache *ghc
>         struct kvm_memslots *slots;
>         unsigned int offset;
> 
> 
>         if (!v->arch.xen.vcpu_info_set || !v->kvm->arch.xen.upcall_vector)
>                 return 0;
> 
>         BUILD_BUG_ON(...)
> 
>         ghc = &v->arch.xen.vcpu_info_cache;
>         slots = kvm_memslots(v->kvm);
>         offset = offsetof(struct vcpu_info, evtchn_upcall_pending);
> 
> But I think there's a flaw here. That is handling the case where you don't have a
> vcpu_info registered, and only shared info. The vcpu_info is then placed elsewhere, i.e.
> another offset out of shared_info -- which is *I think* the case for PVHVM Windows guests.

Only been true for Windows for about a week ;-)

  Paul

> 
> Perhaps introducing a helper which adds xen_vcpu_info() and returns you the hva (picking
> the right cache) similar to the RFC patch. Albeit that was with page pinning, but
> borrowing an older version I had with hva_to_gfn_cache incantation would probably look like:
> 
> 
>         if (v->arch.xen.vcpu_info_set) {
>                 ghc = &v->arch.xen.vcpu_info_cache;
>         } else {
>                 ghc = &v->arch.xen.vcpu_info_cache;
>                 offset += offsetof(struct shared_info, vcpu_info);
>                 offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct vcpu_info);
>         }
> 
>         if (likely(slots->generation == ghc->generation &&
>                    !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
>                 /* Fast path */
>                 __get_user(rc, (u8 __user *)ghc->hva + offset);
>         } else {
>                 /* Slow path */
>                 kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
>                                              sizeof(rc));
>         }
> 
>  ?
> 
>         Joao

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall
  2020-12-14 13:19   ` Joao Martins
  2020-12-14 13:32     ` Durrant, Paul
@ 2020-12-14 14:57     ` David Woodhouse
  2020-12-14 15:13       ` Joao Martins
  1 sibling, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 14:57 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 3663 bytes --]

On Mon, 2020-12-14 at 13:19 +0000, Joao Martins wrote:
> But I think there's a flaw here. That is handling the case where you don't have a
> vcpu_info registered, and only shared info. The vcpu_info is then placed elsewhere, i.e.
> another offset out of shared_info -- which is *I think* the case for PVHVM Windows guests.

There is no such case any more. In my v3 patch set I *stopped* the
kernel from attempting to use the vcpu_info embedded in the shinfo, and
went to *requiring* that the VMM explicitly tell the kernel where it
is.

$ git diff xenpv-post-2..xenpv-post-3 -- Documentation
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d98c2ff90880..d1c30105e6fd 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4834,7 +4834,7 @@ experience inconsistent filtering behavior on MSR accesses.
                        __u64 gfn;
                } shared_info;
                struct {
-                       __u32 vcpu;
+                       __u32 vcpu_id;
                        __u64 gpa;
                } vcpu_attr;
                __u64 pad[4];
@@ -4849,9 +4849,13 @@ KVM_XEN_ATTR_TYPE_LONG_MODE
 
 KVM_XEN_ATTR_TYPE_SHARED_INFO
   Sets the guest physical frame number at which the Xen "shared info"
-  page resides. It is the default location for the vcpu_info for the
-  first 32 vCPUs, and contains other VM-wide data structures shared
-  between the VM and the host.
+  page resides. Note that although Xen places vcpu_info for the first
+  32 vCPUs in the shared_info page, KVM does not automatically do so
+  and requires that KVM_XEN_ATTR_TYPE_VCPU_INFO be used explicitly
+  even when the vcpu_info for a given vCPU resides at the "default"
+  location in the shared_info page. This is because KVM is not aware
+  of the Xen CPU id which is used as the index into the vcpu_info[]
+  array, so cannot know the correct default location.
 
 KVM_XEN_ATTR_TYPE_VCPU_INFO
   Sets the guest physical address of the vcpu_info for a given vCPU.

> Perhaps introducing a helper which adds xen_vcpu_info() and returns you the hva (picking
> the right cache) similar to the RFC patch. Albeit that was with page pinning, but
> borrowing an older version I had with hva_to_gfn_cache incantation would probably look like:
> 
> 
>         if (v->arch.xen.vcpu_info_set) {
>                 ghc = &v->arch.xen.vcpu_info_cache;
>         } else {
>                 ghc = &v->arch.xen.vcpu_info_cache;
>                 offset += offsetof(struct shared_info, vcpu_info);
>                 offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct vcpu_info);
>         }

The problem is that we *can't* just use shared_info->vcpu_info[N]
because we don't know what N is.

This is what I spent half of last week chasing down, because whatever I
tried, the original version just ended up delivering interrupts to the
wrong CPUs.

The N we want there is actually the *XEN* vcpu_id, which Xen puts into
CPUID leaf 0x40000004.ebx and which is also the ACPI ID. (Those two had
better match up since Linux guests use CPUID 0x40000004 for the BSP and
ACPI for the APs).

The kernel knows nothing of those numbers. In particular they do *not*
match up to the indices in kvm->vcpus[M] (which is vcpu->vcpu_idx and
means nothing except the chronological order in which each vCPU's
userspace thread happened to create it during startup), and they do not
match up to vcpu->vcpu_id which is the APIC (not ACPI) ID.

The kernel doesn't know. Let userspace tell it, since we already needed
that API for the separate vcpu_info case anyway.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates
  2020-12-14 13:29   ` Joao Martins
@ 2020-12-14 14:58     ` David Woodhouse
  2020-12-14 15:20       ` Joao Martins
  0 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 14:58 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

On Mon, 2020-12-14 at 13:29 +0000, Joao Martins wrote:
> We might be missing the case where only shared_info is registered. Something like:
> 
>         if (vcpu->xen.shinfo_set && !vcpu->xen.vcpu_info_set) {
>                 offset = offsetof(struct compat_vcpu_info, time);
>                 offset += offsetof(struct shared_info, vcpu_info);
>                 offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct compat_vcpu_info);
>                 
>                 kvm_setup_pvclock_page(v, &vcpu->xen.shinfo_cache, offset);
>         }
> 
> Part of the reason I had a kvm_xen_setup_pvclock_page() was to handle this all these
> combinations i.e. 1) shared_info && !vcpu_info 2) vcpu_info and unilaterially updating
> secondary time info.
> 
> But maybe introducing this xen_vcpu_info() helper to accommodate all this.

There was complexity.

I don't like complexity.

I made it go away.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/17] KVM: x86/xen: register shared_info page
  2020-12-14 12:53       ` Joao Martins
@ 2020-12-14 15:05         ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 15:05 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

On Mon, 2020-12-14 at 12:53 +0000, Joao Martins wrote:
> Ah OK -- I didn't notice @pv_time_enabled before suggesting this.
> 
> An helper would probably suffice without sacrificing footprint in
> data structures (because you will add 3 of these). The helpers would be, say:
> 
>         kvm_xen_shared_info_set(vcpu->kvm)
>         kvm_xen_vcpu_info_set(vcpu)
>         kvm_xen_vcpu_runstate_set(vcpu)
> 
> And maybe backed by a:
> 
>         kvm_gfn_to_hva_cache_valid(&v->kvm->arch.xen.shinfo_cache)
> 
> Which would check whether a cache is initialized or valid.
> 
> But anyhow, I don't feel strongly about it, specially if there's an existing
> one which sort of sets the precedent.

Maybe if I end up adding a fifth. But right now they aren't even taking
any space. They're just filling what would otherwise be a hole...

/* Xen HVM per vcpu emulation context */
struct kvm_vcpu_xen {
	u64 hypercall_rip;                                 // 0x00 
	u32 current_runstate;                              // 0x08
	bool vcpu_info_set;                                // 0x0c
	bool vcpu_time_info_set;                           // 0x0d
	bool runstate_set;                                 // 0x0e
        /* I can add one more before I care */             // 0x0f
	struct gfn_to_hva_cache vcpu_info_cache;           // 0x10
	struct gfn_to_hva_cache vcpu_time_info_cache;      // 0x38
	struct gfn_to_hva_cache runstate_cache;            // 0x60
	u64 last_steal;                                    // 0x88
	u64 last_state_ns;                                 // 0x90
};

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall
  2020-12-14 14:57     ` David Woodhouse
@ 2020-12-14 15:13       ` Joao Martins
  0 siblings, 0 replies; 53+ messages in thread
From: Joao Martins @ 2020-12-14 15:13 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 2:57 PM, David Woodhouse wrote:
> On Mon, 2020-12-14 at 13:19 +0000, Joao Martins wrote:
>> But I think there's a flaw here. That is handling the case where you don't have a
>> vcpu_info registered, and only shared info. The vcpu_info is then placed elsewhere, i.e.
>> another offset out of shared_info -- which is *I think* the case for PVHVM Windows guests.
> 
> There is no such case any more. In my v3 patch set I *stopped* the
> kernel from attempting to use the vcpu_info embedded in the shinfo, and
> went to *requiring* that the VMM explicitly tell the kernel where it
> is.
> 
Sigh yes, I forgot about that -- and you did mentioned it in earlier posts.

> $ git diff xenpv-post-2..xenpv-post-3 -- Documentation
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index d98c2ff90880..d1c30105e6fd 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4834,7 +4834,7 @@ experience inconsistent filtering behavior on MSR accesses.
>                         __u64 gfn;
>                 } shared_info;
>                 struct {
> -                       __u32 vcpu;
> +                       __u32 vcpu_id;
>                         __u64 gpa;
>                 } vcpu_attr;
>                 __u64 pad[4];
> @@ -4849,9 +4849,13 @@ KVM_XEN_ATTR_TYPE_LONG_MODE
>  
>  KVM_XEN_ATTR_TYPE_SHARED_INFO
>    Sets the guest physical frame number at which the Xen "shared info"
> -  page resides. It is the default location for the vcpu_info for the
> -  first 32 vCPUs, and contains other VM-wide data structures shared
> -  between the VM and the host.
> +  page resides. Note that although Xen places vcpu_info for the first
> +  32 vCPUs in the shared_info page, KVM does not automatically do so
> +  and requires that KVM_XEN_ATTR_TYPE_VCPU_INFO be used explicitly
> +  even when the vcpu_info for a given vCPU resides at the "default"
> +  location in the shared_info page. This is because KVM is not aware
> +  of the Xen CPU id which is used as the index into the vcpu_info[]
> +  array, so cannot know the correct default location.
>  
/me nods

>  KVM_XEN_ATTR_TYPE_VCPU_INFO
>    Sets the guest physical address of the vcpu_info for a given vCPU.
> 
>> Perhaps introducing a helper which adds xen_vcpu_info() and returns you the hva (picking
>> the right cache) similar to the RFC patch. Albeit that was with page pinning, but
>> borrowing an older version I had with hva_to_gfn_cache incantation would probably look like:
>>
>>
>>         if (v->arch.xen.vcpu_info_set) {
>>                 ghc = &v->arch.xen.vcpu_info_cache;
>>         } else {
>>                 ghc = &v->arch.xen.vcpu_info_cache;
>>                 offset += offsetof(struct shared_info, vcpu_info);
>>                 offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct vcpu_info);
>>         }
> 
> The problem is that we *can't* just use shared_info->vcpu_info[N]
> because we don't know what N is.
> 
> This is what I spent half of last week chasing down, because whatever I
> tried, the original version just ended up delivering interrupts to the
> wrong CPUs.
> 
> The N we want there is actually the *XEN* vcpu_id, which Xen puts into
> CPUID leaf 0x40000004.ebx and which is also the ACPI ID. (Those two had
> better match up since Linux guests use CPUID 0x40000004 for the BSP and
> ACPI for the APs).
> 
> The kernel knows nothing of those numbers. In particular they do *not*
> match up to the indices in kvm->vcpus[M] (which is vcpu->vcpu_idx and
> means nothing except the chronological order in which each vCPU's
> userspace thread happened to create it during startup), and they do not
> match up to vcpu->vcpu_id which is the APIC (not ACPI) ID.
> 
> The kernel doesn't know. Let userspace tell it, since we already needed
> that API for the separate vcpu_info case anyway.
> 

All this time, I was only considering the guest hypercalls in XENMEM_populate_physmap()
for shared_info, and register vcpu_info as the VMM usage for setting the corresponding
attributes. So you register the vcpu_info regardless of guest placement, and I didn't
correlate that and your earlier comment (and also forgot about it) that you used to remove
that complexity.

	Joao

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates
  2020-12-14 14:58     ` David Woodhouse
@ 2020-12-14 15:20       ` Joao Martins
  2020-12-14 15:40         ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Joao Martins @ 2020-12-14 15:20 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

On 12/14/20 2:58 PM, David Woodhouse wrote:
> On Mon, 2020-12-14 at 13:29 +0000, Joao Martins wrote:
>> We might be missing the case where only shared_info is registered. Something like:
>>
>>         if (vcpu->xen.shinfo_set && !vcpu->xen.vcpu_info_set) {
>>                 offset = offsetof(struct compat_vcpu_info, time);
>>                 offset += offsetof(struct shared_info, vcpu_info);
>>                 offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct compat_vcpu_info);
>>                 
>>                 kvm_setup_pvclock_page(v, &vcpu->xen.shinfo_cache, offset);
>>         }
>>
>> Part of the reason I had a kvm_xen_setup_pvclock_page() was to handle this all these
>> combinations i.e. 1) shared_info && !vcpu_info 2) vcpu_info and unilaterially updating
>> secondary time info.
>>
>> But maybe introducing this xen_vcpu_info() helper to accommodate all this.
> 
> There was complexity.
> 
> I don't like complexity.
> 
> I made it go away.
> 
Considering what you said earlier, yes, it would be unnecessary complexity.

	Joao

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates
  2020-12-14 15:20       ` Joao Martins
@ 2020-12-14 15:40         ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 15:40 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 1826 bytes --]

On Mon, 2020-12-14 at 15:20 +0000, Joao Martins wrote:
> On 12/14/20 2:58 PM, David Woodhouse wrote:
> > On Mon, 2020-12-14 at 13:29 +0000, Joao Martins wrote:
> > > We might be missing the case where only shared_info is registered. Something like:
> > > 
> > >          if (vcpu->xen.shinfo_set && !vcpu->xen.vcpu_info_set) {
> > >                  offset = offsetof(struct compat_vcpu_info, time);
> > >                  offset += offsetof(struct shared_info, vcpu_info);
> > >                  offset += (v - kvm_get_vcpu_by_id(0)) * sizeof(struct compat_vcpu_info);
> > >                  
> > >                  kvm_setup_pvclock_page(v, &vcpu->xen.shinfo_cache, offset);
> > >          }
> > > 
> > > Part of the reason I had a kvm_xen_setup_pvclock_page() was to handle this all these
> > > combinations i.e. 1) shared_info && !vcpu_info 2) vcpu_info and unilaterially updating
> > > secondary time info.
> > > 
> > > But maybe introducing this xen_vcpu_info() helper to accommodate all this.
> > 
> > There was complexity.
> > 
> > I don't like complexity.
> > 
> > I made it go away.
> > 
> 
> Considering what you said earlier, yes, it would be unnecessary complexity.

To be fair I don't really make it go away completely; I push it up into
userspace. Which now has to keep track of whether a given vCPU has an
explicitly-set vcpu_info page, or whether it has just used the default
one in the shared_info.

And if the shared_info *changes*, as it might on a kexec or just guest
weirdness, the VMM should ideally change only those vCPUs which were at
the default location.

When it was just v->vcpu_info?:shinfo->vcpu_info[N] in the kernel that
part was slightly simpler. It was just slightly hampered by the fact
that the kernel has no way of knowing what N should be :)



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 14/17] KVM: x86/xen: register runstate info
  2020-12-14 11:10   ` Joao Martins
@ 2020-12-14 15:47     ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 15:47 UTC (permalink / raw)
  To: Joao Martins
  Cc: Paolo Bonzini, Ankur Arora, Boris Ostrovsky, Sean Christopherson,
	graf, iaslan, pdurrant, aagch, fandree, kvm

[-- Attachment #1: Type: text/plain, Size: 608 bytes --]

On Mon, 2020-12-14 at 11:10 +0000, Joao Martins wrote:
> > @@ -54,7 +56,12 @@ struct compat_shared_info {
> >        uint32_t evtchn_mask[32];
> >        struct pvclock_wall_clock wc;
> >        struct compat_arch_shared_info arch;
> > -
> >   };
> >   
> 
> This change belos in the seventh patch I believe.

Ack. I've done that, as well as fixing the spurious addition and
removal of kvm_xen_destroy_vm() in xen.h, in my tree. I've also rebased
onto kvm/master since that now contains the ExtINT injection fixes.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xenpv


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-14  8:38 ` [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn() David Woodhouse
@ 2020-12-14 21:13   ` Vitaly Kuznetsov
  2020-12-14 21:21     ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-14 21:13 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

> From: David Woodhouse <dwmw@amazon.co.uk>
>
> It shouldn't take a vcpu.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/x86.c       | 8 ++++----
>  include/linux/kvm_host.h | 4 ++--
>  virt/kvm/kvm_main.c      | 8 ++++----
>  3 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e545a8a613b1..c7f1ba21212e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2957,7 +2957,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  		return;
>  
>  	/* -EAGAIN is returned in atomic context so we can just return. */
> -	if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
> +	if (kvm_map_gfn(vcpu->kvm, vcpu->arch.st.msr_val >> PAGE_SHIFT,
>  			&map, &vcpu->arch.st.cache, false))
>  		return;
>  
> @@ -2992,7 +2992,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  
>  	st->version += 1;
>  
> -	kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
> +	kvm_unmap_gfn(vcpu->kvm, &map, &vcpu->arch.st.cache, true, false);
>  }
>  
>  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> @@ -3981,7 +3981,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.st.preempted)
>  		return;
>  
> -	if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
> +	if (kvm_map_gfn(vcpu->kvm, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
>  			&vcpu->arch.st.cache, true))
>  		return;
>  
> @@ -3990,7 +3990,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
>  
>  	st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
>  
> -	kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
> +	kvm_unmap_gfn(vcpu->kvm, &map, &vcpu->arch.st.cache, true, true);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7f2e2a09ebbd..8eb5eb1399f5 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -806,11 +806,11 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
>  kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
>  kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
>  int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
> -int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
> +int kvm_map_gfn(struct kvm *kvm, gfn_t gfn, struct kvm_host_map *map,
>  		struct gfn_to_pfn_cache *cache, bool atomic);
>  struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn);
>  void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
> -int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
> +int kvm_unmap_gfn(struct kvm *kvm, struct kvm_host_map *map,
>  		  struct gfn_to_pfn_cache *cache, bool dirty, bool atomic);
>  unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
>  unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2541a17ff1c4..f01a8df7806a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2181,10 +2181,10 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
>  	return 0;
>  }
>  
> -int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
> +int kvm_map_gfn(struct kvm *kvm, gfn_t gfn, struct kvm_host_map *map,
>  		struct gfn_to_pfn_cache *cache, bool atomic)
>  {
> -	return __kvm_map_gfn(kvm_memslots(vcpu->kvm), gfn, map,
> +	return __kvm_map_gfn(kvm_memslots(kvm), gfn, map,
>  			cache, atomic);
>  }
>  EXPORT_SYMBOL_GPL(kvm_map_gfn);
> @@ -2232,10 +2232,10 @@ static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot,
>  	map->page = NULL;
>  }
>  
> -int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map, 
> +int kvm_unmap_gfn(struct kvm *kvm, struct kvm_host_map *map,
>  		  struct gfn_to_pfn_cache *cache, bool dirty, bool atomic)
>  {
> -	__kvm_unmap_gfn(gfn_to_memslot(vcpu->kvm, map->gfn), map,
> +	__kvm_unmap_gfn(gfn_to_memslot(kvm, map->gfn), map,
>  			cache, dirty, atomic);
>  	return 0;
>  }

What about different address space ids? 

gfn_to_memslot() now calls kvm_memslots() which gives memslots for
address space id = 0 but what if we want something different? Note,
different vCPUs can (in theory) be in different address spaces so we
actually need 'vcpu' and not 'kvm' then.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-14 21:13   ` Vitaly Kuznetsov
@ 2020-12-14 21:21     ` David Woodhouse
  2020-12-14 21:41       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 21:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree



On 14 December 2020 21:13:40 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>What about different address space ids? 
>
>gfn_to_memslot() now calls kvm_memslots() which gives memslots for
>address space id = 0 but what if we want something different? Note,
>different vCPUs can (in theory) be in different address spaces so we
>actually need 'vcpu' and not 'kvm' then.

Sure, but then you want a different function; this one is called 'kvm_map_gfn()' and it operates on kvm_memslots(). It *doesn't* operate on the vcpu at all.

Which is why it's so bizarre that its argument is a 'vcpu' which it only ever uses to get vcpu->kvm from it. It should just take the kvm pointer.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14  8:38 ` [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling David Woodhouse
@ 2020-12-14 21:27   ` Vitaly Kuznetsov
  2020-12-14 21:35     ` David Woodhouse
  2020-12-23  8:35   ` Christoph Hellwig
  1 sibling, 1 reply; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-14 21:27 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

> From: Joao Martins <joao.m.martins@oracle.com>
>
> Xen usually places its MSR at 0x40000000 or 0x40000200 depending on
> whether it is running in viridian mode or not. Note that this is not
> ABI guaranteed, so it is possible for Xen to advertise the MSR some
> place else.
>
> Given the way xen_hvm_config() is handled, if the former address is
> selected, this will conflict with Hyper-V's MSR
> (HV_X64_MSR_GUEST_OS_ID) which unconditionally uses the same address.
>
> Given that the MSR location is arbitrary, move the xen_hvm_config()
> handling to the top of kvm_set_msr_common() before falling through.
>

In case we're making MSR 0x40000000 something different from
HV_X64_MSR_GUEST_OS_ID we can and probably should disable Hyper-V
emulation in KVM completely -- or how else is it going to work?

> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/x86.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c7f1ba21212e..13ba4a64f748 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3001,6 +3001,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	u32 msr = msr_info->index;
>  	u64 data = msr_info->data;
>  
> +	if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
> +		return xen_hvm_config(vcpu, data);
> +

Can we generalize this maybe? E.g. before handling KVM and architectural
MSRs we check that the particular MSR is not overriden by an emulated
hypervisor, 

e.g.
	if (kvm_emulating_hyperv(kvm) && kvm_hyperv_msr_overriden(kvm,msr)
		return kvm_hyperv_handle_msr(kvm, msr);
	if (kvm_emulating_xen(kvm) && kvm_xen_msr_overriden(kvm,msr)
		return kvm_xen_handle_msr(kvm, msr);

	switch (msr) {
		...

>  	switch (msr) {
>  	case MSR_AMD64_NB_CFG:
>  	case MSR_IA32_UCODE_WRITE:
> @@ -3288,8 +3291,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		vcpu->arch.msr_misc_features_enables = data;
>  		break;
>  	default:
> -		if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
> -			return xen_hvm_config(vcpu, data);
>  		if (kvm_pmu_is_valid_msr(vcpu, msr))
>  			return kvm_pmu_set_msr(vcpu, msr_info);
>  		return KVM_MSR_RET_INVALID;

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14 21:27   ` Vitaly Kuznetsov
@ 2020-12-14 21:35     ` David Woodhouse
  2020-12-14 21:44       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 21:35 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree



On 14 December 2020 21:27:19 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>David Woodhouse <dwmw2@infradead.org> writes:
>
>> From: Joao Martins <joao.m.martins@oracle.com>
>>
>> Xen usually places its MSR at 0x40000000 or 0x40000200 depending on
>> whether it is running in viridian mode or not. Note that this is not
>> ABI guaranteed, so it is possible for Xen to advertise the MSR some
>> place else.
>>
>> Given the way xen_hvm_config() is handled, if the former address is
>> selected, this will conflict with Hyper-V's MSR
>> (HV_X64_MSR_GUEST_OS_ID) which unconditionally uses the same address.
>>
>> Given that the MSR location is arbitrary, move the xen_hvm_config()
>> handling to the top of kvm_set_msr_common() before falling through.
>>
>
>In case we're making MSR 0x40000000 something different from
>HV_X64_MSR_GUEST_OS_ID we can and probably should disable Hyper-V
>emulation in KVM completely -- or how else is it going to work?
 
The way Xen itself does this — and the way we have to do it if we want to faithfully emulate Xen and support live migration from it — is to shift the Xen MSRs up to (from memory) 0x40000200 if Hyper-V is enabled.

I did look at disabling Hyper-V entirely when it isn't enabled, but the only flag we have for it being enabled is the guest OS ID being set... which is done through that MSR :)

My minimal version ended up being so close to Joao's original that it was not longer worth the bikeshedding and so I gave up on it and stuck with the original.


>> @@ -3001,6 +3001,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu,
>struct msr_data *msr_info)
>>  	u32 msr = msr_info->index;
>>  	u64 data = msr_info->data;
>>  
>> +	if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
>> +		return xen_hvm_config(vcpu, data);
>> +
>
>Can we generalize this maybe? E.g. before handling KVM and
>architectural
>MSRs we check that the particular MSR is not overriden by an emulated
>hypervisor, 
>
>e.g.
>	if (kvm_emulating_hyperv(kvm) && kvm_hyperv_msr_overriden(kvm,msr)
>		return kvm_hyperv_handle_msr(kvm, msr);
>	if (kvm_emulating_xen(kvm) && kvm_xen_msr_overriden(kvm,msr)
>		return kvm_xen_handle_msr(kvm, msr);

That smells a bit like overengineering. As I said, I did have a play with "improving" Joao's original patch but nothing I tried actually made more sense to me than this once the details were ironed out.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-14 21:21     ` David Woodhouse
@ 2020-12-14 21:41       ` Vitaly Kuznetsov
  2020-12-14 21:45         ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-14 21:41 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

> On 14 December 2020 21:13:40 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>What about different address space ids? 
>>
>>gfn_to_memslot() now calls kvm_memslots() which gives memslots for
>>address space id = 0 but what if we want something different? Note,
>>different vCPUs can (in theory) be in different address spaces so we
>>actually need 'vcpu' and not 'kvm' then.
>
> Sure, but then you want a different function; this one is called
> 'kvm_map_gfn()' and it operates on kvm_memslots(). It *doesn't*
> operate on the vcpu at all.
>

Actually, we already have kvm_vcpu_map() which uses kvm_vcpu_memslots()
inside so no new function is needed.

> Which is why it's so bizarre that its argument is a 'vcpu' which it
> only ever uses to get vcpu->kvm from it. It should just take the kvm
> pointer.

Your change is correct but I'm not sure that it's entirely clear that
kvm_map_gfn() implicitly uses 'as_id=0' and I don't even see a comment
about the fact :-(

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14 21:35     ` David Woodhouse
@ 2020-12-14 21:44       ` Vitaly Kuznetsov
  2020-12-14 21:48         ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-14 21:44 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

>>> @@ -3001,6 +3001,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu,
>>struct msr_data *msr_info)
>>>  	u32 msr = msr_info->index;
>>>  	u64 data = msr_info->data;
>>>  
>>> +	if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
>>> +		return xen_hvm_config(vcpu, data);
>>> +
>>
>>Can we generalize this maybe? E.g. before handling KVM and
>>architectural
>>MSRs we check that the particular MSR is not overriden by an emulated
>>hypervisor, 
>>
>>e.g.
>>	if (kvm_emulating_hyperv(kvm) && kvm_hyperv_msr_overriden(kvm,msr)
>>		return kvm_hyperv_handle_msr(kvm, msr);
>>	if (kvm_emulating_xen(kvm) && kvm_xen_msr_overriden(kvm,msr)
>>		return kvm_xen_handle_msr(kvm, msr);
>
> That smells a bit like overengineering. As I said, I did have a play
> with "improving" Joao's original patch but nothing I tried actually
> made more sense to me than this once the details were ironed out.

This actually looks more or less like hypercall distinction from after PATCH3:

	if (kvm_xen_hypercall_enabled(vcpu->kvm))
		return kvm_xen_hypercall(vcpu);

        if (kvm_hv_hypercall_enabled(vcpu->kvm))
  	        return kvm_hv_hypercall(vcpu);

....

so my idea was why not do the same for MSRs?

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-14 21:41       ` Vitaly Kuznetsov
@ 2020-12-14 21:45         ` David Woodhouse
  2020-12-15 12:07           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 21:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree



On 14 December 2020 21:41:23 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>Your change is correct but I'm not sure that it's entirely clear that
>kvm_map_gfn() implicitly uses 'as_id=0' and I don't even see a comment
>about the fact :-(

Isn't that true of all the kvm_read_guest and kvm_write_guest functions and indeed of kvm_memslots() itself?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14 21:44       ` Vitaly Kuznetsov
@ 2020-12-14 21:48         ` David Woodhouse
  2020-12-14 22:22           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 21:48 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree



On 14 December 2020 21:44:47 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>This actually looks more or less like hypercall distinction from after
>PATCH3:
>
>	if (kvm_xen_hypercall_enabled(vcpu->kvm))
>		return kvm_xen_hypercall(vcpu);
>
>        if (kvm_hv_hypercall_enabled(vcpu->kvm))
>  	        return kvm_hv_hypercall(vcpu);
>
>....
>
>so my idea was why not do the same for MSRs?

Can you define kvm_hv_msr_enabled()?

Note kvm_hv_hypercall_enabled() is based on a value that gets written through the MSR, so it can't be that.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14 21:48         ` David Woodhouse
@ 2020-12-14 22:22           ` Vitaly Kuznetsov
  2020-12-14 22:41             ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-14 22:22 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

> On 14 December 2020 21:44:47 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>This actually looks more or less like hypercall distinction from after
>>PATCH3:
>>
>>	if (kvm_xen_hypercall_enabled(vcpu->kvm))
>>		return kvm_xen_hypercall(vcpu);
>>
>>        if (kvm_hv_hypercall_enabled(vcpu->kvm))
>>  	        return kvm_hv_hypercall(vcpu);
>>
>>....
>>
>>so my idea was why not do the same for MSRs?
>
> Can you define kvm_hv_msr_enabled()?
>
> Note kvm_hv_hypercall_enabled() is based on a value that gets written
> through the MSR, so it can't be that.

When Hyper-V emulation appeared in KVM we (unfortunately) didn't add a
capability to globaly enable and disable it so to be backwards
compatible we'll have to define kvm_emulating_hyperv() as 'true' for
now as that's how KVM behaves. This, however, doesn't mean we can't add
e.g. a module parameter to disable Hyper-V emulation. Also, we can
probably check guest CPUIDs and if Hyper-V's signature wasn't set we can
return 'false'.

<rant>
Having Hyper-V emulation in KVM 'always enabled' may not be a big deal
from functional point of view but may not be ideal from security
standpoint as bugs in arch/x86/kvm/hyperv.c become exploitable even from
Linux guests.
</rant>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14 22:22           ` Vitaly Kuznetsov
@ 2020-12-14 22:41             ` David Woodhouse
  2020-12-15 12:10               ` Vitaly Kuznetsov
  0 siblings, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2020-12-14 22:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

On Mon, 2020-12-14 at 23:22 +0100, Vitaly Kuznetsov wrote:
> > Can you define kvm_hv_msr_enabled()?
> > 
> > Note kvm_hv_hypercall_enabled() is based on a value that gets written
> > through the MSR, so it can't be that.
> 
> When Hyper-V emulation appeared in KVM we (unfortunately) didn't add a
> capability to globaly enable and disable it so to be backwards
> compatible we'll have to define kvm_emulating_hyperv() as 'true' for
> now as that's how KVM behaves. This, however, doesn't mean we can't add
> e.g. a module parameter to disable Hyper-V emulation. Also, we can
> probably check guest CPUIDs and if Hyper-V's signature wasn't set we can
> return 'false'.
> 
> <rant>
> Having Hyper-V emulation in KVM 'always enabled' may not be a big deal
> from functional point of view but may not be ideal from security
> standpoint as bugs in arch/x86/kvm/hyperv.c become exploitable even from
> Linux guests.
> </rant>

Indeed. And yet it can coexist with Xen support too, so it isn't even
as simple as turning it off when Xen is enabled.

Which is why I ended up just using Joao's patch unchanged. Short of
going back in time to make Hyper-V support conditional when it was
first introduced, I couldn't see a better answer.

And regardless of the Hyper-V mess, what this patch does for Xen is
precisely what you suggest: handle it first, before the switch(), *if*
the Xen MSR is enabled.


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-14 21:45         ` David Woodhouse
@ 2020-12-15 12:07           ` Vitaly Kuznetsov
  2020-12-15 12:45             ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-15 12:07 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

> On 14 December 2020 21:41:23 GMT, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>Your change is correct but I'm not sure that it's entirely clear that
>>kvm_map_gfn() implicitly uses 'as_id=0' and I don't even see a comment
>>about the fact :-(
>
> Isn't that true of all the kvm_read_guest and kvm_write_guest
> functions and indeed of kvm_memslots() itself?

Yes, sadly. Multiple address spaces support was added to KVM as a
generic feature but the only use-case for it at this moment is SMM on
x86 which is 'special', i.e. currently there's only one user for 
kvm_map_gfn()/kvm_unmap_gfn() which is steal time accounting and it's
not easy to come up with a use-case when this PV feature needs to be
used from SMM. On the other hand, if we try using multiple address
spaces in KVM for e.g. emulating something like Hyper-V VSM, it becomes
unclear which address space id needs to be used even for steal
time. To be entirely correct, we probably need to remember as_id which
was active when steal time was enabled and stick to it later when we
want to update the data. If we do that, kvm_map_gfn() will lose its only
user.

All the above doesn't make your patch incorrect of course, I just used
it to express my observation that we seem to be using as_id=0 blindly
and the API we have contributes to that.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14 22:41             ` David Woodhouse
@ 2020-12-15 12:10               ` Vitaly Kuznetsov
  0 siblings, 0 replies; 53+ messages in thread
From: Vitaly Kuznetsov @ 2020-12-15 12:10 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

David Woodhouse <dwmw2@infradead.org> writes:

> On Mon, 2020-12-14 at 23:22 +0100, Vitaly Kuznetsov wrote:
>> > Can you define kvm_hv_msr_enabled()?
>> > 
>> > Note kvm_hv_hypercall_enabled() is based on a value that gets written
>> > through the MSR, so it can't be that.
>> 
>> When Hyper-V emulation appeared in KVM we (unfortunately) didn't add a
>> capability to globaly enable and disable it so to be backwards
>> compatible we'll have to define kvm_emulating_hyperv() as 'true' for
>> now as that's how KVM behaves. This, however, doesn't mean we can't add
>> e.g. a module parameter to disable Hyper-V emulation. Also, we can
>> probably check guest CPUIDs and if Hyper-V's signature wasn't set we can
>> return 'false'.
>> 
>> <rant>
>> Having Hyper-V emulation in KVM 'always enabled' may not be a big deal
>> from functional point of view but may not be ideal from security
>> standpoint as bugs in arch/x86/kvm/hyperv.c become exploitable even from
>> Linux guests.
>> </rant>
>
> Indeed. And yet it can coexist with Xen support too, so it isn't even
> as simple as turning it off when Xen is enabled.
>
> Which is why I ended up just using Joao's patch unchanged. Short of
> going back in time to make Hyper-V support conditional when it was
> first introduced, I couldn't see a better answer.
>
> And regardless of the Hyper-V mess, what this patch does for Xen is
> precisely what you suggest: handle it first, before the switch(), *if*
> the Xen MSR is enabled.

Functionally I have no complaints, even with the suggested
'generalization' we'll be handling MSRs in the exact same sequence. You
are, however, right calling Hyper-V mess 'mess' and if we want to make
things cleaner we should probably start there (goes to my to-do
list...).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn()
  2020-12-15 12:07           ` Vitaly Kuznetsov
@ 2020-12-15 12:45             ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-15 12:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

On Tue, 2020-12-15 at 13:07 +0100, Vitaly Kuznetsov wrote:
> Yes, sadly. Multiple address spaces support was added to KVM as a
> generic feature but the only use-case for it at this moment is SMM on
> x86 which is 'special', i.e. currently there's only one user for 
> kvm_map_gfn()/kvm_unmap_gfn() which is steal time accounting and it's
> not easy to come up with a use-case when this PV feature needs to be
> used from SMM. On the other hand, if we try using multiple address
> spaces in KVM for e.g. emulating something like Hyper-V VSM, it becomes
> unclear which address space id needs to be used even for steal
> time. To be entirely correct, we probably need to remember as_id which
> was active when steal time was enabled and stick to it later when we
> want to update the data. If we do that, kvm_map_gfn() will lose its only
> user.

I'm also OK with just completely deleting kvm_map_gfn(). I'm not even
using it any more myself; I just kept this patch in my v3 series
because it's a sane cleanup.

The steal time accounting doesn't *need* to have a kernel mapping of
the page just so it can do atomic xchg on it; we can do that just fine
on userspace addresses, and do exactly that for futexes.

So I'm probably going to knock up a kvm_cmpxchg_guest_offset_cached()
and make the steal time code use that, on the *userspace* address.
There's already a futex_atomic_cmpxchg_inatomic() which does the actual
operation. (It returns -EFAULT for an absent page and lets the caller
worry about faulting it in, which isn't quite what I want, but it's a
good start and I can either wrap the existing arch-provided function
like futexes do, or use it as a basis for a version that waits.)

And by an amazing coincidence that's also the function I need for
accelerating Xen event channel support... :)


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling
  2020-12-14  8:38 ` [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling David Woodhouse
  2020-12-14 21:27   ` Vitaly Kuznetsov
@ 2020-12-23  8:35   ` Christoph Hellwig
  1 sibling, 0 replies; 53+ messages in thread
From: Christoph Hellwig @ 2020-12-23  8:35 UTC (permalink / raw)
  To: David Woodhouse
  Cc: kvm, Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

> +	if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
> +		return xen_hvm_config(vcpu, data);

Nit: no need for the inner braces.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled
  2020-12-14  8:38 ` [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled David Woodhouse
@ 2020-12-23  8:36   ` Christoph Hellwig
  2020-12-23 10:51     ` David Woodhouse
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2020-12-23  8:36 UTC (permalink / raw)
  To: David Woodhouse
  Cc: kvm, Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree

I think all the Xen support code should be conditional on a new config
option so that normal configs don't have to build the code (and open
potential attack vectors).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled
  2020-12-23  8:36   ` Christoph Hellwig
@ 2020-12-23 10:51     ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2020-12-23 10:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kvm, Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
	Sean Christopherson, graf, iaslan, pdurrant, aagch, fandree



On 23 December 2020 08:36:44 GMT, Christoph Hellwig <hch@infradead.org> wrote:
>I think all the Xen support code should be conditional on a new config
>option so that normal configs don't have to build the code (and open
>potential attack vectors).

None of this is usable by the guest (as an attack vector or otherwise) unless explicitly enabled by the VMM. And for clock stuff it's even using the *same* functions that are used for native KVM guests, just parameterised a little for the different locations.

I saw the previous discussion but didn't really think it was worth adding yet another build time option for the kernel.

Even when I add event channel support I plan to use the same user_cmpxchg functionality to clean up the KVM steal time handling and even that part won't be Xen-specific.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2020-12-23 10:52 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-14  8:38 [PATCH v3 00/17] KVM: Add minimal support for Xen HVM guests David Woodhouse
2020-12-14  8:38 ` [PATCH v3 01/17] KVM: Fix arguments to kvm_{un,}map_gfn() David Woodhouse
2020-12-14 21:13   ` Vitaly Kuznetsov
2020-12-14 21:21     ` David Woodhouse
2020-12-14 21:41       ` Vitaly Kuznetsov
2020-12-14 21:45         ` David Woodhouse
2020-12-15 12:07           ` Vitaly Kuznetsov
2020-12-15 12:45             ` David Woodhouse
2020-12-14  8:38 ` [PATCH v3 02/17] KVM: x86/xen: fix Xen hypercall page msr handling David Woodhouse
2020-12-14 21:27   ` Vitaly Kuznetsov
2020-12-14 21:35     ` David Woodhouse
2020-12-14 21:44       ` Vitaly Kuznetsov
2020-12-14 21:48         ` David Woodhouse
2020-12-14 22:22           ` Vitaly Kuznetsov
2020-12-14 22:41             ` David Woodhouse
2020-12-15 12:10               ` Vitaly Kuznetsov
2020-12-23  8:35   ` Christoph Hellwig
2020-12-14  8:38 ` [PATCH v3 03/17] KVM: x86/xen: intercept xen hypercalls if enabled David Woodhouse
2020-12-23  8:36   ` Christoph Hellwig
2020-12-23 10:51     ` David Woodhouse
2020-12-14  8:38 ` [PATCH v3 04/17] KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls David Woodhouse
2020-12-14  8:38 ` [PATCH v3 05/17] KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR David Woodhouse
2020-12-14  8:38 ` [PATCH v3 06/17] KVM: x86/xen: latch long_mode when hypercall page is set up David Woodhouse
2020-12-14  8:38 ` [PATCH v3 07/17] KVM: x86/xen: add definitions of compat_shared_info, compat_vcpu_info David Woodhouse
2020-12-14  8:38 ` [PATCH v3 08/17] KVM: x86/xen: register shared_info page David Woodhouse
2020-12-14 10:45   ` Joao Martins
2020-12-14 11:30     ` Joao Martins
2020-12-14 12:04       ` David Woodhouse
2020-12-14 12:02     ` David Woodhouse
2020-12-14 12:53       ` Joao Martins
2020-12-14 15:05         ` David Woodhouse
2020-12-14  8:38 ` [PATCH v3 09/17] xen: add wc_sec_hi to struct shared_info David Woodhouse
2020-12-14  8:38 ` [PATCH v3 10/17] KVM: x86/xen: update wallclock region David Woodhouse
2020-12-14  8:38 ` [PATCH v3 11/17] KVM: x86/xen: register vcpu info David Woodhouse
2020-12-14 10:48   ` Joao Martins
2020-12-14  8:39 ` [PATCH v3 12/17] KVM: x86/xen: setup pvclock updates David Woodhouse
2020-12-14 13:29   ` Joao Martins
2020-12-14 14:58     ` David Woodhouse
2020-12-14 15:20       ` Joao Martins
2020-12-14 15:40         ` David Woodhouse
2020-12-14  8:39 ` [PATCH v3 13/17] KVM: x86/xen: register vcpu time info region David Woodhouse
2020-12-14 10:55   ` Joao Martins
2020-12-14 12:03     ` David Woodhouse
2020-12-14  8:39 ` [PATCH v3 14/17] KVM: x86/xen: register runstate info David Woodhouse
2020-12-14 11:10   ` Joao Martins
2020-12-14 15:47     ` David Woodhouse
2020-12-14  8:39 ` [PATCH v3 15/17] KVM: x86: declare Xen HVM shared info capability and add test case David Woodhouse
2020-12-14  8:39 ` [PATCH v3 16/17] KVM: Add documentation for Xen hypercall and shared_info updates David Woodhouse
2020-12-14  8:39 ` [PATCH v3 17/17] KVM: x86/xen: Add event channel interrupt vector upcall David Woodhouse
2020-12-14 13:19   ` Joao Martins
2020-12-14 13:32     ` Durrant, Paul
2020-12-14 14:57     ` David Woodhouse
2020-12-14 15:13       ` Joao Martins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.