[PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements

* [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements
@ 2017-05-19 14:09 Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 01/10] x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set Vitaly Kuznetsov
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

Changes since v2:
- Rebased to the latest char-misc-next tree.
- Added Acked-By and Tested-By tags.

Original descriptions:

Hyper-V supports hypercalls for doing local and remote TLB flushing and
gives its guests hints about when using a hypercall is preferred. While
doing hypercalls for local TLB flushes is probably not practical (and is
not suggested by modern Hyper-V versions), a remote TLB flush with a
hypercall brings a significant improvement.
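
As a rough illustration of how a guest might act on those hints, here is a
minimal sketch. The bit positions are an assumption recalled from the
kernel's Hyper-V header and are not stated in this cover letter; the
function names are hypothetical. Check the values against the Hyper-V
specification before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Recommendation bits in the "hints" word reported by the hypervisor.
 * Assumption: bit 1 covers local flushes and bit 2 covers remote flushes.
 */
#define HINT_LOCAL_TLB_FLUSH_RECOMMENDED	(1u << 1)
#define HINT_REMOTE_TLB_FLUSH_RECOMMENDED	(1u << 2)

/* True when a hypercall is recommended for local TLB flushes. */
static bool local_flush_hypercall_recommended(uint32_t hints)
{
	return (hints & HINT_LOCAL_TLB_FLUSH_RECOMMENDED) != 0;
}

/* True when a hypercall is recommended for remote TLB flushes. */
static bool remote_flush_hypercall_recommended(uint32_t hints)
{
	return (hints & HINT_REMOTE_TLB_FLUSH_RECOMMENDED) != 0;
}

/* Example: a hints word with bits 0 and 2 set recommends only the remote case. */
void hints_example(void)
{
	assert(remote_flush_hypercall_recommended(0x5));
	assert(!local_flush_hypercall_recommended(0x5));
}
```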

To test the series I wrote a special 'TLB thrasher': on a 16 vCPU guest I
created 32 threads, each doing 100000 mmap/munmap calls on a big file.
Here are the results:

Before:
# time ./pthread_mmap ./randfile 
real	3m44.994s
user	0m3.829s
sys	3m36.323s

After:
# time ./pthread_mmap ./randfile 
real	2m57.145s
user	0m3.797s
sys	2m34.812s
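
For readers who want to reproduce a similar workload, here is a minimal
sketch of such a thrasher. It is not the original pthread_mmap program,
which is not part of this posting; run_thrasher(), the whole-file mapping
and the single-byte read are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical per-thread arguments; not taken from the original tool. */
struct thrash_args {
	int fd;
	size_t len;
	long iterations;
	int failed;
};

/* Map the file, touch it, unmap it; every munmap() forces a TLB shootdown. */
static void *thrash(void *p)
{
	struct thrash_args *a = p;
	long i;

	for (i = 0; i < a->iterations; i++) {
		volatile char *map = mmap(NULL, a->len, PROT_READ,
					  MAP_SHARED, a->fd, 0);

		if (map == MAP_FAILED) {
			a->failed = 1;
			break;
		}
		(void)map[0];	/* fault one page in before tearing it down */
		munmap((void *)map, a->len);
	}
	return NULL;
}

/* Run 'nthreads' thrashers over 'path'; returns 0 on success, -1 on error. */
int run_thrasher(const char *path, int nthreads, long iterations)
{
	struct thrash_args *args;
	pthread_t *tids;
	struct stat st;
	int fd, i, started = 0, ret = 0;

	assert(nthreads > 0 && iterations >= 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}

	tids = calloc(nthreads, sizeof(*tids));
	args = calloc(nthreads, sizeof(*args));
	if (!tids || !args) {
		ret = -1;
		goto out;
	}

	for (i = 0; i < nthreads; i++) {
		args[i] = (struct thrash_args){ fd, (size_t)st.st_size,
						iterations, 0 };
		if (pthread_create(&tids[i], NULL, thrash, &args[i]) != 0) {
			ret = -1;
			break;
		}
		started++;
	}
	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		if (args[i].failed)
			ret = -1;
	}
out:
	free(tids);
	free(args);
	close(fd);
	return ret;
}
```

Calling run_thrasher("./randfile", 32, 100000) would approximate the test
described above.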

This series also brings a number of small improvements along the way: a
fast hypercall implementation (used for event signaling), a rep hypercall
implementation, and a hyperv tracing subsystem (which only traces the newly
added remote TLB flush for now).

Vitaly Kuznetsov (10):
  x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set
  x86/hyper-v: stash the max number of virtual/logical processor
  x86/hyper-v: make hv_do_hypercall() inline
  x86/hyper-v: fast hypercall implementation
  hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT
  x86/hyper-v: implement rep hypercalls
  hyper-v: globalize vp_index
  x86/hyper-v: use hypercall for remote TLB flush
  x86/hyper-v: support extended CPU ranges for TLB flush hypercalls
  tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()

 MAINTAINERS                         |   1 +
 arch/x86/Kbuild                     |   4 +-
 arch/x86/hyperv/Makefile            |   2 +-
 arch/x86/hyperv/hv_init.c           |  90 ++++++------
 arch/x86/hyperv/mmu.c               | 270 ++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h     | 149 +++++++++++++++++++-
 arch/x86/include/asm/trace/hyperv.h |  34 +++++
 arch/x86/include/uapi/asm/hyperv.h  |  36 +++++
 arch/x86/kernel/cpu/mshyperv.c      |  14 +-
 drivers/hv/channel_mgmt.c           |  22 ++-
 drivers/hv/connection.c             |   8 +-
 drivers/hv/hv.c                     |   9 --
 drivers/hv/hyperv_vmbus.h           |  11 --
 drivers/hv/vmbus_drv.c              |  17 ---
 drivers/pci/host/pci-hyperv.c       |   4 +-
 include/linux/hyperv.h              |  21 ++-
 16 files changed, 568 insertions(+), 124 deletions(-)
 create mode 100644 arch/x86/hyperv/mmu.c
 create mode 100644 arch/x86/include/asm/trace/hyperv.h

-- 
2.9.3


* [PATCH v3 01/10] x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 02/10] x86/hyper-v: stash the max number of virtual/logical processor Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

Code in arch/x86/hyperv/ is only needed when CONFIG_HYPERV is set; the
'basic' support and detection live in arch/x86/kernel/cpu/mshyperv.c,
which is included when CONFIG_HYPERVISOR_GUEST is set.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/Kbuild                 |  4 +++-
 arch/x86/include/asm/mshyperv.h | 10 +++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 586b786..3fa0a3c 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -8,7 +8,9 @@ obj-$(CONFIG_KVM) += kvm/
 obj-$(CONFIG_XEN) += xen/
 
 # Hyper-V paravirtualization support
-obj-$(CONFIG_HYPERVISOR_GUEST) += hyperv/
+ifdef CONFIG_HYPERV
+obj-y += hyperv/
+endif
 
 # lguest paravirtualization support
 obj-$(CONFIG_LGUEST_GUEST) += lguest/
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index fba1007..91acec7 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -175,7 +175,15 @@ void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
-#endif
+#else /* CONFIG_HYPERV */
+static inline void hyperv_init(void) {}
+static inline bool hv_is_hypercall_page_setup(void)
+{
+	return false;
+}
+static inline void hyperv_cleanup(void) {}
+#endif /* CONFIG_HYPERV */
+
 #ifdef CONFIG_HYPERV_TSCPAGE
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
 static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-- 
2.9.3


* [PATCH v3 02/10] x86/hyper-v: stash the max number of virtual/logical processor
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 01/10] x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 03/10] x86/hyper-v: make hv_do_hypercall() inline Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

The maximum virtual processor index will be needed for 'extended'
hypercalls supporting more than 64 vCPUs. While at it, unify on 'Hyper-V'
in mshyperv.c, as we currently have a mix, and report acquired misc
features as well.
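
To illustrate why 64 is the threshold: a flat processor mask held in a
single 64-bit word can only name VP indices 0..63. The helper below is a
hypothetical sketch of that limit, not code from this series;
build_flat_vp_mask() and its return convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build a flat 64-bit processor mask from a list of VP indices.
 * Returns 0 on success, or -1 if some index does not fit in the mask,
 * in which case the caller would need an 'extended' format instead.
 */
int build_flat_vp_mask(const uint32_t *vps, size_t n, uint64_t *mask)
{
	size_t i;

	assert(vps != NULL && mask != NULL);

	*mask = 0;
	for (i = 0; i < n; i++) {
		if (vps[i] >= 64)
			return -1;
		*mask |= UINT64_C(1) << vps[i];
	}
	return 0;
}
```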

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/include/asm/mshyperv.h |  2 ++
 arch/x86/kernel/cpu/mshyperv.c  | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 91acec7..d42b6eb 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -29,6 +29,8 @@ struct ms_hyperv_info {
 	u32 features;
 	u32 misc_features;
 	u32 hints;
+	u32 max_vp_index;
+	u32 max_lp_index;
 };
 
 extern struct ms_hyperv_info ms_hyperv;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 04cb8d3..a8b4765 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -175,9 +175,16 @@ static void __init ms_hyperv_init_platform(void)
 	ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
 	ms_hyperv.hints    = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
 
-	pr_info("HyperV: features 0x%x, hints 0x%x\n",
+	pr_info("Hyper-V: features 0x%x, hints 0x%x\n",
 		ms_hyperv.features, ms_hyperv.hints);
 
+	ms_hyperv.max_vp_index = cpuid_eax(HVCPUID_IMPLEMENTATION_LIMITS);
+	ms_hyperv.max_lp_index = cpuid_ebx(HVCPUID_IMPLEMENTATION_LIMITS);
+
+	pr_info("Hyper-V: max %d virtual processors, %d logical processors\n",
+		ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
+
+
 	/*
 	 * Extract host information.
 	 */
@@ -203,7 +210,7 @@ static void __init ms_hyperv_init_platform(void)
 		rdmsrl(HV_X64_MSR_APIC_FREQUENCY, hv_lapic_frequency);
 		hv_lapic_frequency = div_u64(hv_lapic_frequency, HZ);
 		lapic_timer_frequency = hv_lapic_frequency;
-		pr_info("HyperV: LAPIC Timer Frequency: %#x\n",
+		pr_info("Hyper-V: LAPIC Timer Frequency: %#x\n",
 			lapic_timer_frequency);
 	}
 
@@ -237,7 +244,7 @@ static void __init ms_hyperv_init_platform(void)
 }
 
 const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
-	.name			= "Microsoft HyperV",
+	.name			= "Microsoft Hyper-V",
 	.detect			= ms_hyperv_platform,
 	.init_platform		= ms_hyperv_init_platform,
 };
-- 
2.9.3


* [PATCH v3 03/10] x86/hyper-v: make hv_do_hypercall() inline
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 01/10] x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 02/10] x86/hyper-v: stash the max number of virtual/logical processor Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

We have only three call sites for hv_do_hypercall() and we're going to
switch HVCALL_SIGNAL_EVENT to a fast hypercall, so we can inline this
function as an optimization.

The Hyper-V Top Level Functional Specification states that registers r9-r11
and the flags may be clobbered by the hypervisor during a hypercall. With
inlining this matters, since the compiler may otherwise keep live values in
those registers across the call, so add the clobbers.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/hyperv/hv_init.c       | 54 ++++-------------------------------------
 arch/x86/include/asm/mshyperv.h | 43 ++++++++++++++++++++++++++++++++
 drivers/hv/connection.c         |  2 ++
 include/linux/hyperv.h          |  1 -
 4 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 5b882cc..691603e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -75,7 +75,8 @@ static struct clocksource hyperv_cs_msr = {
 	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-static void *hypercall_pg;
+void *hv_hypercall_pg;
+EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 struct clocksource *hyperv_cs;
 EXPORT_SYMBOL_GPL(hyperv_cs);
 
@@ -102,15 +103,15 @@ void hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
-	if (hypercall_pg == NULL) {
+	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		return;
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
-	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hypercall_pg);
+	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
 	/*
@@ -170,51 +171,6 @@ void hyperv_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hyperv_cleanup);
 
-/*
- * hv_do_hypercall- Invoke the specified hypercall
- */
-u64 hv_do_hypercall(u64 control, void *input, void *output)
-{
-	u64 input_address = (input) ? virt_to_phys(input) : 0;
-	u64 output_address = (output) ? virt_to_phys(output) : 0;
-#ifdef CONFIG_X86_64
-	u64 hv_status = 0;
-
-	if (!hypercall_pg)
-		return (u64)ULLONG_MAX;
-
-	__asm__ __volatile__("mov %0, %%r8" : : "r" (output_address) : "r8");
-	__asm__ __volatile__("call *%3" : "=a" (hv_status) :
-			     "c" (control), "d" (input_address),
-			     "m" (hypercall_pg));
-
-	return hv_status;
-
-#else
-
-	u32 control_hi = control >> 32;
-	u32 control_lo = control & 0xFFFFFFFF;
-	u32 hv_status_hi = 1;
-	u32 hv_status_lo = 1;
-	u32 input_address_hi = input_address >> 32;
-	u32 input_address_lo = input_address & 0xFFFFFFFF;
-	u32 output_address_hi = output_address >> 32;
-	u32 output_address_lo = output_address & 0xFFFFFFFF;
-
-	if (!hypercall_pg)
-		return (u64)ULLONG_MAX;
-
-	__asm__ __volatile__ ("call *%8" : "=d"(hv_status_hi),
-			      "=a"(hv_status_lo) : "d" (control_hi),
-			      "a" (control_lo), "b" (input_address_hi),
-			      "c" (input_address_lo), "D"(output_address_hi),
-			      "S"(output_address_lo), "m" (hypercall_pg));
-
-	return hv_status_lo | ((u64)hv_status_hi << 32);
-#endif /* !x86_64 */
-}
-EXPORT_SYMBOL_GPL(hv_do_hypercall);
-
 void hyperv_report_panic(struct pt_regs *regs)
 {
 	static bool panic_reported;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index d42b6eb..e293937 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -172,6 +172,49 @@ void hv_remove_crash_handler(void);
 
 #if IS_ENABLED(CONFIG_HYPERV)
 extern struct clocksource *hyperv_cs;
+extern void *hv_hypercall_pg;
+
+static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
+{
+	u64 input_address = (input) ? virt_to_phys(input) : 0;
+	u64 output_address = (output) ? virt_to_phys(output) : 0;
+#ifdef CONFIG_X86_64
+	u64 hv_status;
+
+	if (!hv_hypercall_pg)
+		return (u64)ULLONG_MAX;
+
+	__asm__ __volatile__("mov %3, %%r8\n"
+			     "call *%4"
+			     : "=a" (hv_status),
+			       "+c" (control), "+d" (input_address)
+			     :  "r" (output_address), "m" (hv_hypercall_pg)
+			     : "cc", "memory", "r8", "r9", "r10", "r11");
+
+	return hv_status;
+
+#else
+	u32 control_hi = control >> 32;
+	u32 control_lo = control & 0xFFFFFFFF;
+	u32 input_address_hi = input_address >> 32;
+	u32 input_address_lo = input_address & 0xFFFFFFFF;
+	u32 output_address_hi = output_address >> 32;
+	u32 output_address_lo = output_address & 0xFFFFFFFF;
+
+	if (!hv_hypercall_pg)
+		return (u64)ULLONG_MAX;
+
+	__asm__ __volatile__("call *%6"
+			     : "+a" (control_lo), "+d" (control_hi),
+			       "+c" (input_address_lo)
+			     : "b" (input_address_hi),
+			       "D"(output_address_hi), "S"(output_address_lo),
+			       "m" (hv_hypercall_pg)
+			     : "cc", "memory");
+
+	return control_lo | ((u64)control_hi << 32);
+#endif /* !x86_64 */
+}
 
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c2d74ee..4a0a9f6 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -32,6 +32,8 @@
 #include <linux/hyperv.h>
 #include <linux/export.h>
 #include <asm/hyperv.h>
+#include <asm/mshyperv.h>
+
 #include "hyperv_vmbus.h"
 
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index e09fc82..d1ae02d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1188,7 +1188,6 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 			bool fb_overlap_ok);
 void vmbus_free_mmio(resource_size_t start, resource_size_t size);
 int vmbus_cpu_number_to_vp_number(int cpu_number);
-u64 hv_do_hypercall(u64 control, void *input, void *output);
 
 /*
  * GUID definitions of various offer types - services offered to the guest.
-- 
2.9.3


* [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 03/10] x86/hyper-v: make hv_do_hypercall() inline Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-21  3:18   ` Andy Lutomirski
  2017-05-19 14:09 ` [PATCH v3 05/10] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

Hyper-V supports 'fast' hypercalls, where all parameters are passed through
registers. Implement an inline version of the simplest of these calls: a
hypercall with one 8-byte input and no output.

A proper definition of the hypercall input interface (union
hv_hypercall_input) is added as well.
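
As an illustration of the control word layout, the sketch below builds the
same 64-bit value with explicit shifts instead of bit-fields. The field
positions are derived from the field order and widths of union
hv_hypercall_input in this patch, assuming the usual x86 convention that
the first bit-field occupies the least significant bits; hv_control_word()
itself is a hypothetical helper, not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a hypercall control word:
 *   bits  0-15  call code
 *   bit   16    fast flag
 *   bits 17-26  variable header size
 *   bits 32-43  rep count
 *   bits 48-59  rep start index
 * All remaining bits are reserved and left as zero.
 */
uint64_t hv_control_word(uint16_t code, int fast, uint16_t varhead_size,
			 uint16_t rep_count, uint16_t rep_start)
{
	assert(varhead_size < (1u << 10));
	assert(rep_count < (1u << 12));
	assert(rep_start < (1u << 12));

	return (uint64_t)code |
	       ((uint64_t)(fast ? 1 : 0) << 16) |
	       ((uint64_t)varhead_size << 17) |
	       ((uint64_t)rep_count << 32) |
	       ((uint64_t)rep_start << 48);
}
```

Explicit shifts avoid depending on how a particular compiler lays out
bit-fields, at the cost of repeating the layout in code.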

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/include/asm/mshyperv.h    | 39 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index e293937..028e29b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -216,6 +216,45 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 #endif /* !x86_64 */
 }
 
+/* Fast hypercall with 8 bytes of input and no output */
+static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
+{
+	union hv_hypercall_input control = {0};
+
+	control.code = code;
+	control.fast = 1;
+#ifdef CONFIG_X86_64
+	{
+		u64 hv_status;
+
+		__asm__ __volatile__("call *%3"
+				     : "=a" (hv_status),
+				       "+c" (control.as_uint64), "+d" (input1)
+				     : "m" (hv_hypercall_pg)
+				     : "cc", "r8", "r9", "r10", "r11");
+		return hv_status;
+	}
+#else
+	{
+		u32 hv_status_hi, hv_status_lo;
+		u32 input1_hi = (u32)(input1 >> 32);
+		u32 input1_lo = (u32)input1;
+
+		__asm__ __volatile__ ("call *%6"
+				      : "=d"(hv_status_hi),
+					"=a"(hv_status_lo),
+					"+c"(input1_lo)
+				      :	"d" (control.as_uint32_hi),
+					"a" (control.as_uint32_lo),
+					"b" (input1_hi),
+					"m" (hv_hypercall_pg)
+				      : "cc", "edi", "esi");
+
+		return hv_status_lo | ((u64)hv_status_hi << 32);
+	}
+#endif
+}
+
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 432df4b..c87e900 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -256,6 +256,25 @@
 #define HV_PROCESSOR_POWER_STATE_C2		2
 #define HV_PROCESSOR_POWER_STATE_C3		3
 
+/* Hypercall interface */
+union hv_hypercall_input {
+	u64 as_uint64;
+	struct {
+		__u32 as_uint32_lo;
+		__u32 as_uint32_hi;
+	};
+	struct {
+		__u64 code:16;
+		__u64 fast:1;
+		__u64 varhead_size:10;
+		__u64 reserved1:5;
+		__u64 rep_count:12;
+		__u64 reserved2:4;
+		__u64 rep_start:12;
+		__u64 reserved3:4;
+	};
+};
+
 /* hypercall status code */
 #define HV_STATUS_SUCCESS			0
 #define HV_STATUS_INVALID_HYPERCALL_CODE	2
-- 
2.9.3


* [PATCH v3 05/10] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 06/10] x86/hyper-v: implement rep hypercalls Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

We need to pass only 8 bytes of input for HvSignalEvent, which makes it a
perfect fit for a fast hypercall. hv_input_signal_event_buffer is no longer
needed, and hv_input_signal_event is converted to a union for convenience.
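
To show why the input fits a single register, here is a sketch of the same
8 bytes packed by hand. It assumes a little-endian layout (as on x86) with
the connection ID in the low 32 bits, flag_number in the next 16 and the
reserved field zero; the 24-bit limit on the ID is an assumption about
union hv_connection_id, which is not shown in this patch, and
pack_signal_event() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the HvSignalEvent input into the single 64-bit fast-call argument. */
uint64_t pack_signal_event(uint32_t connection_id, uint16_t flag_number)
{
	/* Assumption: the ID proper is a 24-bit field, upper 8 bits reserved. */
	assert(connection_id < (1u << 24));

	/* Bytes 0-3: connection ID, bytes 4-5: flag number, bytes 6-7: zero. */
	return (uint64_t)connection_id | ((uint64_t)flag_number << 32);
}
```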

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 15 +++++----------
 drivers/hv/connection.c   |  3 ++-
 include/linux/hyperv.h    | 19 ++++++++-----------
 3 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 0fabd41..ee2a8dd 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -806,20 +806,15 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	/*
 	 * Setup state for signalling the host.
 	 */
-	newchannel->sig_event = (struct hv_input_signal_event *)
-				(ALIGN((unsigned long)
-				&newchannel->sig_buf,
-				HV_HYPERCALL_PARAM_ALIGN));
-
-	newchannel->sig_event->connectionid.asu32 = 0;
-	newchannel->sig_event->connectionid.u.id = VMBUS_EVENT_CONNECTION_ID;
-	newchannel->sig_event->flag_number = 0;
-	newchannel->sig_event->rsvdz = 0;
+	newchannel->sig_event.connectionid.asu32 = 0;
+	newchannel->sig_event.connectionid.u.id = VMBUS_EVENT_CONNECTION_ID;
+	newchannel->sig_event.flag_number = 0;
+	newchannel->sig_event.rsvdz = 0;
 
 	if (vmbus_proto_version != VERSION_WS2008) {
 		newchannel->is_dedicated_interrupt =
 				(offer->is_dedicated_interrupt != 0);
-		newchannel->sig_event->connectionid.u.id =
+		newchannel->sig_event.connectionid.u.id =
 				offer->connection_id;
 	}
 
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 4a0a9f6..51f8cb2 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -408,6 +408,7 @@ void vmbus_set_event(struct vmbus_channel *channel)
 	if (!channel->is_dedicated_interrupt)
 		vmbus_send_interrupt(child_relid);
 
-	hv_do_hypercall(HVCALL_SIGNAL_EVENT, channel->sig_event, NULL);
+	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT,
+			      channel->sig_event.as_uint64);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index d1ae02d..68a5772 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -678,15 +678,13 @@ union hv_connection_id {
 };
 
 /* Definition of the hv_signal_event hypercall input structure. */
-struct hv_input_signal_event {
-	union hv_connection_id connectionid;
-	u16 flag_number;
-	u16 rsvdz;
-};
-
-struct hv_input_signal_event_buffer {
-	u64 align8;
-	struct hv_input_signal_event event;
+union hv_input_signal_event {
+	u64 as_uint64;
+	struct {
+		union hv_connection_id connectionid;
+		u16 flag_number;
+		u16 rsvdz;
+	};
 };
 
 enum hv_numa_policy {
@@ -771,8 +769,7 @@ struct vmbus_channel {
 	} callback_mode;
 
 	bool is_dedicated_interrupt;
-	struct hv_input_signal_event_buffer sig_buf;
-	struct hv_input_signal_event *sig_event;
+	union hv_input_signal_event sig_event;
 
 	/*
 	 * Starting with win8, this field will be used to specify
-- 
2.9.3


* [PATCH v3 06/10] x86/hyper-v: implement rep hypercalls
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 05/10] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 07/10] hyper-v: globalize vp_index Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

Rep hypercalls are normal hypercalls which perform multiple actions at
once. Hyper-V guarantees to return execution to the caller in no more
than 50us, so the caller needs to use hypercall continuation to finish the
remaining reps. Touch the NMI watchdog between hypercall invocations.

This is going to be used by the HvFlushVirtualAddressList hypercall for
remote TLB flushing.
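
The continuation scheme can be tried out in user space with a stand-in for
the hypervisor. In the sketch below, struct mock_hv and mock_hypercall()
are hypothetical: the mock completes at most 'batch' reps per call and
reports progress in bits 32-43 of the status, with the result code in
bits 0-15, mirroring how the loop in this patch reads the status.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_SUCCESS 0

/* Hypothetical hypervisor stand-in: finishes at most 'batch' reps per call. */
struct mock_hv {
	unsigned int batch;
	unsigned int calls;
};

static uint64_t mock_hypercall(struct mock_hv *hv, uint16_t rep_count,
			       uint16_t rep_start)
{
	unsigned int done = rep_start + hv->batch;

	if (done > rep_count)
		done = rep_count;
	hv->calls++;

	/* Result code in bits 0-15, reps completed in bits 32-43. */
	return ((uint64_t)done << 32) | STATUS_SUCCESS;
}

/* Re-issue the call until every rep is done; returns the last status. */
uint64_t do_rep_hypercall(struct mock_hv *hv, uint16_t rep_count)
{
	uint16_t rep_start = 0;
	uint64_t status;

	assert(rep_count > 0 && rep_count < (1u << 12));
	assert(hv->batch > 0);

	do {
		status = mock_hypercall(hv, rep_count, rep_start);
		if ((status & 0xffff) != STATUS_SUCCESS)
			return status;

		/* Resume from the first rep that has not been processed. */
		rep_start = (status >> 32) & 0xfff;
	} while (rep_start < rep_count);

	return status;
}
```

With a batch size of 3 and 10 reps, the loop issues four calls, resuming
at reps 0, 3, 6 and 9.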

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/include/asm/mshyperv.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 028e29b..74fe788 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 #include <linux/interrupt.h>
 #include <linux/clocksource.h>
+#include <linux/nmi.h>
 #include <asm/hyperv.h>
 
 /*
@@ -255,6 +256,31 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 #endif
 }
 
+/*
+ * Rep hypercalls. Callers of this function are supposed to ensure that
+ * rep_count and varhead_size comply with union hv_hypercall_input definition.
+ */
+static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
+				      void *input, void *output)
+{
+	union hv_hypercall_input hc_input = { .code = code,
+					      .varhead_size = varhead_size,
+					      .rep_count = rep_count};
+	u64 status;
+
+	do {
+		status = hv_do_hypercall(hc_input.as_uint64, input, output);
+		if ((status & 0xffff) != HV_STATUS_SUCCESS)
+			return status;
+
+		hc_input.rep_start = (status >> 32) & 0xfff;
+
+		touch_nmi_watchdog();
+	} while (hc_input.rep_start < hc_input.rep_count);
+
+	return status;
+}
+
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
-- 
2.9.3


* [PATCH v3 07/10] hyper-v: globalize vp_index
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 06/10] x86/hyper-v: implement rep hypercalls Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

To support implementing remote TLB flushing on Hyper-V with a hypercall,
we need to make vp_index available outside of the vmbus module. Rename and
globalize it.
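
The idea is a plain array indexed by Linux CPU number, where each CPU
records its own VP index as it comes online. The user-space sketch below
shows that shape only; struct vp_map, the helper names and the VP_INVAL
sentinel for CPUs that have not reported yet are assumptions, not the
code in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VP_INVAL UINT32_MAX	/* hypothetical "not yet known" marker */

struct vp_map {
	uint32_t *vp;		/* indexed by Linux CPU number */
	unsigned int ncpus;
};

/* Allocate the map and mark every CPU as unknown. Returns 0 or -1. */
int vp_map_init(struct vp_map *m, unsigned int ncpus)
{
	unsigned int i;

	m->vp = malloc(ncpus * sizeof(*m->vp));
	if (!m->vp)
		return -1;
	m->ncpus = ncpus;
	for (i = 0; i < ncpus; i++)
		m->vp[i] = VP_INVAL;
	return 0;
}

/* Called for each CPU as it comes online, with the value it read locally. */
void vp_map_set(struct vp_map *m, unsigned int cpu, uint32_t vp_index)
{
	assert(cpu < m->ncpus);
	m->vp[cpu] = vp_index;
}

/* Translate a Linux CPU number to a VP index; VP_INVAL if unknown. */
uint32_t vp_map_lookup(const struct vp_map *m, unsigned int cpu)
{
	assert(cpu < m->ncpus);
	return m->vp[cpu];
}
```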

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/hyperv/hv_init.c       | 34 +++++++++++++++++++++++++++++++++-
 arch/x86/include/asm/mshyperv.h | 26 ++++++++++++++++++++++++++
 drivers/hv/channel_mgmt.c       |  7 +++----
 drivers/hv/connection.c         |  3 ++-
 drivers/hv/hv.c                 |  9 ---------
 drivers/hv/hyperv_vmbus.h       | 11 -----------
 drivers/hv/vmbus_drv.c          | 17 -----------------
 drivers/pci/host/pci-hyperv.c   |  4 ++--
 include/linux/hyperv.h          |  1 -
 9 files changed, 66 insertions(+), 46 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 691603e..7fd9cd3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -26,6 +26,8 @@
 #include <linux/mm.h>
 #include <linux/clockchips.h>
 #include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <linux/cpuhotplug.h>
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
@@ -80,6 +82,20 @@ EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 struct clocksource *hyperv_cs;
 EXPORT_SYMBOL_GPL(hyperv_cs);
 
+u32 *hv_vp_index;
+EXPORT_SYMBOL_GPL(hv_vp_index);
+
+static int hv_cpu_init(unsigned int cpu)
+{
+	u64 msr_vp_index;
+
+	hv_get_vp_index(msr_vp_index);
+
+	hv_vp_index[smp_processor_id()] = (u32)msr_vp_index;
+
+	return 0;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -95,6 +111,16 @@ void hyperv_init(void)
 	if (x86_hyper != &x86_hyper_ms_hyperv)
 		return;
 
+	/* Allocate percpu VP index */
+	hv_vp_index = kcalloc(num_possible_cpus(), sizeof(*hv_vp_index),
+			      GFP_KERNEL);
+	if (!hv_vp_index)
+		return;
+
+	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
+			      hv_cpu_init, NULL) < 0)
+		goto free_vp_index;
+
 	/*
 	 * Setup the hypercall page and enable hypercalls.
 	 * 1. Register the guest ID
@@ -106,7 +132,7 @@ void hyperv_init(void)
 	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-		return;
+		goto free_vp_index;
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -149,6 +175,12 @@ void hyperv_init(void)
 	hyperv_cs = &hyperv_cs_msr;
 	if (ms_hyperv.features & HV_X64_MSR_TIME_REF_COUNT_AVAILABLE)
 		clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
+
+	return;
+
+free_vp_index:
+	kfree(hv_vp_index);
+	hv_vp_index = NULL;
 }
 
 /*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 74fe788..eb38da3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -281,6 +281,32 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
 	return status;
 }
 
+/*
+ * Hypervisor's notion of virtual processor ID is different from
+ * Linux' notion of CPU ID. This information can only be retrieved
+ * in the context of the calling CPU. Setup a map for easy access
+ * to this information.
+ */
+extern u32 *hv_vp_index;
+
+/**
+ * hv_cpu_number_to_vp_number() - Map CPU to VP.
+ * @cpu_number: CPU number in Linux terms
+ *
+ * This function returns the mapping between the Linux processor
+ * number and the hypervisor's virtual processor number, useful
+ * in making hypercalls and such that talk about specific
+ * processors.
+ *
+ * Return: Virtual processor number in Hyper-V terms
+ */
+static inline int hv_cpu_number_to_vp_number(int cpu_number)
+{
+	WARN_ON(hv_vp_index[cpu_number] == -1);
+
+	return hv_vp_index[cpu_number];
+}
+
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index ee2a8dd..a3e4dba 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -600,7 +600,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 		 */
 		channel->numa_node = 0;
 		channel->target_cpu = 0;
-		channel->target_vp = hv_context.vp_index[0];
+		channel->target_vp = hv_cpu_number_to_vp_number(0);
 		return;
 	}
 
@@ -684,7 +684,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 	}
 
 	channel->target_cpu = cur_cpu;
-	channel->target_vp = hv_context.vp_index[cur_cpu];
+	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);
 }
 
 static void vmbus_wait_for_unload(void)
@@ -1224,8 +1224,7 @@ struct vmbus_channel *vmbus_get_outgoing_channel(struct vmbus_channel *primary)
 		return outgoing_channel;
 	}
 
-	cur_cpu = hv_context.vp_index[get_cpu()];
-	put_cpu();
+	cur_cpu = hv_cpu_number_to_vp_number(smp_processor_id());
 	list_for_each_safe(cur, tmp, &primary->sc_list) {
 		cur_channel = list_entry(cur, struct vmbus_channel, sc_list);
 		if (cur_channel->state != CHANNEL_OPENED_STATE)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 51f8cb2..34b7d55 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -96,7 +96,8 @@ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
 	 * the CPU attempting to connect may not be CPU 0.
 	 */
 	if (version >= VERSION_WIN8_1) {
-		msg->target_vcpu = hv_context.vp_index[smp_processor_id()];
+		msg->target_vcpu =
+			hv_cpu_number_to_vp_number(smp_processor_id());
 		vmbus_connection.connect_cpu = smp_processor_id();
 	} else {
 		msg->target_vcpu = 0;
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 12e7bae..7e67ef4 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -229,7 +229,6 @@ int hv_synic_init(unsigned int cpu)
 	union hv_synic_siefp siefp;
 	union hv_synic_sint shared_sint;
 	union hv_synic_scontrol sctrl;
-	u64 vp_index;
 
 	/* Setup the Synic's message page */
 	hv_get_simp(simp.as_uint64);
@@ -271,14 +270,6 @@ int hv_synic_init(unsigned int cpu)
 	hv_context.synic_initialized = true;
 
 	/*
-	 * Setup the mapping between Hyper-V's notion
-	 * of cpuid and Linux' notion of cpuid.
-	 * This array will be indexed using Linux cpuid.
-	 */
-	hv_get_vp_index(vp_index);
-	hv_context.vp_index[cpu] = (u32)vp_index;
-
-	/*
 	 * Register the per-cpu clockevent source.
 	 */
 	if (ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 1b6a5e0..49569f8 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -229,17 +229,6 @@ struct hv_context {
 	struct hv_per_cpu_context __percpu *cpu_context;
 
 	/*
-	 * Hypervisor's notion of virtual processor ID is different from
-	 * Linux' notion of CPU ID. This information can only be retrieved
-	 * in the context of the calling CPU. Setup a map for easy access
-	 * to this information:
-	 *
-	 * vp_index[a] is the Hyper-V's processor ID corresponding to
-	 * Linux cpuid 'a'.
-	 */
-	u32 vp_index[NR_CPUS];
-
-	/*
 	 * To manage allocations in a NUMA node.
 	 * Array indexed by numa node ID.
 	 */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 59bb3ef..b37f4bb 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1482,23 +1482,6 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size)
 }
 EXPORT_SYMBOL_GPL(vmbus_free_mmio);
 
-/**
- * vmbus_cpu_number_to_vp_number() - Map CPU to VP.
- * @cpu_number: CPU number in Linux terms
- *
- * This function returns the mapping between the Linux processor
- * number and the hypervisor's virtual processor number, useful
- * in making hypercalls and such that talk about specific
- * processors.
- *
- * Return: Virtual processor number in Hyper-V terms
- */
-int vmbus_cpu_number_to_vp_number(int cpu_number)
-{
-	return hv_context.vp_index[cpu_number];
-}
-EXPORT_SYMBOL_GPL(vmbus_cpu_number_to_vp_number);
-
 static int vmbus_acpi_add(struct acpi_device *device)
 {
 	acpi_status result;
diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 8493638..786ae41 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -810,7 +810,7 @@ static void hv_irq_unmask(struct irq_data *data)
 	params->vector = cfg->vector;
 
 	for_each_cpu_and(cpu, dest, cpu_online_mask)
-		params->vp_mask |= (1ULL << vmbus_cpu_number_to_vp_number(cpu));
+		params->vp_mask |= (1ULL << hv_cpu_number_to_vp_number(cpu));
 
 	hv_do_hypercall(HVCALL_RETARGET_INTERRUPT, params, NULL);
 
@@ -905,7 +905,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	} else {
 		for_each_cpu_and(cpu, affinity, cpu_online_mask) {
 			int_pkt->int_desc.cpu_mask |=
-				(1ULL << vmbus_cpu_number_to_vp_number(cpu));
+				(1ULL << hv_cpu_number_to_vp_number(cpu));
 		}
 	}
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 68a5772..08824f5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1184,7 +1184,6 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 			resource_size_t size, resource_size_t align,
 			bool fb_overlap_ok);
 void vmbus_free_mmio(resource_size_t start, resource_size_t size);
-int vmbus_cpu_number_to_vp_number(int cpu_number);
 
 /*
  * GUID definitions of various offer types - services offered to the guest.
-- 
2.9.3
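
For illustration, the CPU-to-VP lookup that this patch moves into
mshyperv.h can be sketched in plain userspace C. This is a minimal sketch
under stated assumptions, not the kernel implementation: all sketch_* names
are hypothetical, a malloc'ed array stands in for hv_vp_index, and the
all-ones sentinel is assumed from the "== -1" check in
hv_cpu_number_to_vp_number().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel for "VP index not known yet" (all ones, i.e. (u32)-1). */
#define SKETCH_VP_UNKNOWN UINT32_MAX

struct sketch_vp_map {
	uint32_t *vp_index;	/* indexed by Linux CPU number */
	unsigned int nr_cpus;
};

/* Allocate the table and mark every CPU as "unknown". Returns 0 on success. */
static int sketch_vp_map_init(struct sketch_vp_map *map, unsigned int nr_cpus)
{
	unsigned int cpu;

	map->vp_index = malloc(nr_cpus * sizeof(*map->vp_index));
	if (!map->vp_index)
		return -1;
	map->nr_cpus = nr_cpus;
	for (cpu = 0; cpu < nr_cpus; cpu++)
		map->vp_index[cpu] = SKETCH_VP_UNKNOWN;
	return 0;
}

/* Record the VP number; in the kernel this value is read on the CPU itself. */
static void sketch_vp_map_set(struct sketch_vp_map *map, unsigned int cpu,
			      uint32_t vp)
{
	assert(cpu < map->nr_cpus);
	map->vp_index[cpu] = vp;
}

/* Look up the VP number for a CPU, or -1 if it was never recorded. */
static int sketch_cpu_to_vp(const struct sketch_vp_map *map, unsigned int cpu)
{
	assert(cpu < map->nr_cpus);
	if (map->vp_index[cpu] == SKETCH_VP_UNKNOWN)
		return -1;
	return (int)map->vp_index[cpu];
}
```

Callers that build processor masks, as pci-hyperv.c does above, would have
to treat a -1 result as "do not shift by this value".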

* [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 07/10] hyper-v: globalize vp_index Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-21  3:23   ` Andy Lutomirski
  2017-05-19 14:09 ` [PATCH v3 09/10] x86/hyper-v: support extended CPU ranges for TLB flush hypercalls Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 10/10] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
  9 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

A Hyper-V host can recommend using a hypercall for remote TLB flushes;
this is supposed to be faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory, and we don't really want to
allocate memory on each call, so we pre-allocate per-CPU memory areas on
boot. These areas are of fixed size; limit them to an arbitrary number of
16 gvas (16 gvas are able to specify 16 * 4096 pages).

pv_ops patching happens very early, so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too, and there is
even a hint for that. However, I don't see room for optimization on the
host side, as both the hypercall and a native TLB flush will result in a
vmexit. The hint is also not set on modern Hyper-V versions.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |   2 +
 arch/x86/hyperv/mmu.c              | 117 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h    |   3 +
 arch/x86/include/uapi/asm/hyperv.h |   7 +++
 arch/x86/kernel/cpu/mshyperv.c     |   1 +
 6 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/mmu.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 171ae09..367a820 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y		:= hv_init.o
+obj-y		:= hv_init.o mmu.o
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7fd9cd3..df3252f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -140,6 +140,8 @@ void hyperv_init(void)
 	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
+	hyper_alloc_mmu();
+
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
new file mode 100644
index 0000000..e3ab9b9
--- /dev/null
+++ b/arch/x86/hyperv/mmu.c
@@ -0,0 +1,117 @@
+#include <linux/types.h>
+#include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <linux/log2.h>
+#include <asm/mshyperv.h>
+#include <asm/tlbflush.h>
+#include <asm/msr.h>
+#include <asm/fpu/api.h>
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_flush_pcpu {
+	__u64 address_space;
+	__u64 flags;
+	__u64 processor_mask;
+	__u64 gva_list[];
+};
+
+static struct hv_flush_pcpu __percpu *pcpu_flush;
+
+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+				    struct mm_struct *mm, unsigned long start,
+				    unsigned long end)
+{
+	struct hv_flush_pcpu *flush;
+	unsigned long cur, flags;
+	u64 status = -1ULL;
+	int cpu, vcpu, gva_n, max_gvas;
+
+	if (!pcpu_flush || !hv_hypercall_pg)
+		goto do_native;
+
+	if (cpumask_empty(cpus))
+		return;
+
+	local_irq_save(flags);
+
+	flush = this_cpu_ptr(pcpu_flush);
+
+	if (mm) {
+		flush->address_space = virt_to_phys(mm->pgd);
+		flush->flags = 0;
+	} else {
+		flush->address_space = 0;
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+	}
+
+	flush->processor_mask = 0;
+	if (cpumask_equal(cpus, cpu_present_mask)) {
+		flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+	} else {
+		for_each_cpu(cpu, cpus) {
+			vcpu = hv_cpu_number_to_vp_number(cpu);
+			if (vcpu != -1 && vcpu < 64)
+				flush->processor_mask |= 1ULL << vcpu;
+			else
+				goto do_native;
+		}
+	}
+
+	/*
+	 * We can flush not more than max_gvas with one hypercall. Flush the
+	 * whole address space if we were asked to do more.
+	 */
+	max_gvas = (PAGE_SIZE - sizeof(*flush)) / 8;
+
+	if (end == TLB_FLUSH_ALL ||
+	    (end && ((end - start)/(PAGE_SIZE*PAGE_SIZE)) > max_gvas)) {
+		if (end == TLB_FLUSH_ALL)
+			flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+		status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+					 flush, NULL);
+	} else {
+		cur = start;
+		gva_n = 0;
+		do {
+			flush->gva_list[gva_n] = cur & PAGE_MASK;
+			/*
+			 * Lower 12 bits encode the number of additional
+			 * pages to flush (in addition to the 'cur' page).
+			 */
+			if (end >= cur + PAGE_SIZE * PAGE_SIZE)
+				flush->gva_list[gva_n] |= ~PAGE_MASK;
+			else if (end > cur)
+				flush->gva_list[gva_n] |=
+					(end - cur - 1) >> PAGE_SHIFT;
+
+			cur += PAGE_SIZE * PAGE_SIZE;
+			++gva_n;
+
+		} while (cur < end);
+
+		status = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
+					     gva_n, 0, flush, NULL);
+
+	}
+
+	local_irq_restore(flags);
+
+	if (!(status & 0xffff))
+		return;
+do_native:
+	native_flush_tlb_others(cpus, mm, start, end);
+}
+
+void hyperv_setup_mmu_ops(void)
+{
+	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
+		pr_info("Hyper-V: Using hypercall for remote TLB flush\n");
+		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
+	}
+}
+
+void hyper_alloc_mmu(void)
+{
+	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED)
+		pcpu_flush = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index eb38da3..359967f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -308,6 +308,8 @@ static inline int hv_cpu_number_to_vp_number(int cpu_number)
 }
 
 void hyperv_init(void);
+void hyperv_setup_mmu_ops(void);
+void hyper_alloc_mmu(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
@@ -318,6 +320,7 @@ static inline bool hv_is_hypercall_page_setup(void)
 	return false;
 }
 static inline hyperv_cleanup(void) {}
+static inline void hyperv_setup_mmu_ops(void) {}
 #endif /* CONFIG_HYPERV */
 
 #ifdef CONFIG_HYPERV_TSCPAGE
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index c87e900..3d44036 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -239,6 +239,8 @@
 		(~((1ull << HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
 
 /* Declare the various hypercall operations. */
+#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
+#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
 #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
 #define HVCALL_POST_MESSAGE			0x005c
 #define HVCALL_SIGNAL_EVENT			0x005d
@@ -256,6 +258,11 @@
 #define HV_PROCESSOR_POWER_STATE_C2		2
 #define HV_PROCESSOR_POWER_STATE_C3		3
 
+#define HV_FLUSH_ALL_PROCESSORS			0x00000001
+#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	0x00000002
+#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	0x00000004
+#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	0x00000008
+
 /* Hypercall interface */
 union hv_hypercall_input {
 	u64 as_uint64;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index a8b4765..16a9221 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -240,6 +240,7 @@ static void __init ms_hyperv_init_platform(void)
 	 * Setup the hook to get control post apic initialization.
 	 */
 	x86_platform.apic_post_init = hyperv_init;
+	hyperv_setup_mmu_ops();
 #endif
 }
 
-- 
2.9.3
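
The gva_list encoding used in hyperv_flush_tlb_others() can be tried out in
userspace. The sketch below assumes 4 KiB pages and follows the loop in the
patch: each 64-bit entry holds a page-aligned base address, and its low 12
bits hold the number of additional pages to flush, so one entry covers at
most 4096 pages. The sketch_* names are hypothetical, and the bounds check
inside the loop is an addition of this sketch (the patch checks the range
size before entering the loop instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
/* One entry covers its base page plus up to 4095 additional pages. */
#define SKETCH_ENTRY_SPAN	(SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE)

/*
 * Encode the range [start, end) into gva_list. Returns the number of
 * entries written, or 0 if the range needs more than 'max' entries (the
 * caller would then flush the whole address space instead).
 */
static size_t sketch_fill_gva_list(uint64_t *gva_list, size_t max,
				   uint64_t start, uint64_t end)
{
	uint64_t cur = start;
	size_t n = 0;

	assert(gva_list != NULL);

	do {
		if (n == max)
			return 0;

		gva_list[n] = cur & SKETCH_PAGE_MASK;
		if (end >= cur + SKETCH_ENTRY_SPAN)
			/* Full entry: 4095 additional pages (low bits 0xfff). */
			gva_list[n] |= ~SKETCH_PAGE_MASK;
		else if (end > cur)
			/* Last entry: pages remaining after the base page. */
			gva_list[n] |= (end - cur - 1) >> SKETCH_PAGE_SHIFT;

		cur += SKETCH_ENTRY_SPAN;
		n++;
	} while (cur < end);

	return n;
}
```

For example, the two-page range [0x1000, 0x3000) becomes the single entry
0x1001: base page 0x1000 plus one additional page.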

* [PATCH v3 09/10] x86/hyper-v: support extended CPU ranges for TLB flush hypercalls
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  2017-05-19 14:09 ` [PATCH v3 10/10] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

Hyper-V hosts may support more than 64 vCPUs; in this case we need to
use the HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX hypercalls.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 arch/x86/hyperv/mmu.c              | 149 ++++++++++++++++++++++++++++++++++++-
 arch/x86/include/uapi/asm/hyperv.h |  10 +++
 2 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index e3ab9b9..c9cecb3 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -15,8 +15,57 @@ struct hv_flush_pcpu {
 	__u64 gva_list[];
 };
 
+/* HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressListEx hypercalls */
+struct hv_flush_pcpu_ex {
+	__u64 address_space;
+	__u64 flags;
+	struct {
+		__u64 format;
+		__u64 valid_bank_mask;
+		__u64 bank_contents[];
+	} hv_vp_set;
+	__u64 gva_list[];
+};
+
 static struct hv_flush_pcpu __percpu *pcpu_flush;
 
+static struct hv_flush_pcpu_ex __percpu *pcpu_flush_ex;
+
+static inline int cpumask_to_vp_set(struct hv_flush_pcpu_ex *flush,
+				    const struct cpumask *cpus)
+{
+	int cur_bank, cpu, vcpu, nr_bank = 0;
+	bool has_cpus;
+
+	/*
+	 * We can't be sure that translated vcpu numbers will always be
+	 * in ascending order, so iterate over all possible banks and
+	 * check all vcpus in it instead.
+	 */
+	for (cur_bank = 0; cur_bank < ms_hyperv.max_vp_index/64; cur_bank++) {
+		has_cpus = false;
+		for_each_cpu(cpu, cpus) {
+			vcpu = hv_cpu_number_to_vp_number(cpu);
+			if (vcpu/64 != cur_bank)
+				continue;
+			if (!has_cpus) {
+				flush->hv_vp_set.valid_bank_mask |=
+					1ULL << vcpu / 64;
+				flush->hv_vp_set.bank_contents[nr_bank] =
+					1ULL << vcpu % 64;
+				has_cpus = true;
+			} else {
+				flush->hv_vp_set.bank_contents[nr_bank] |=
+					1ULL << vcpu % 64;
+			}
+		}
+		if (has_cpus)
+			nr_bank++;
+	}
+
+	return nr_bank;
+}
+
 static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 				    struct mm_struct *mm, unsigned long start,
 				    unsigned long end)
@@ -102,16 +151,112 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 	native_flush_tlb_others(cpus, mm, start, end);
 }
 
+static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
+				       struct mm_struct *mm,
+				       unsigned long start,
+				       unsigned long end)
+{
+	struct hv_flush_pcpu_ex *flush;
+	unsigned long cur, flags;
+	u64 status = -1ULL;
+	int nr_bank = 0, max_gvas, gva_n;
+
+	if (!pcpu_flush_ex || !hv_hypercall_pg)
+		goto do_native;
+
+	if (cpumask_empty(cpus))
+		return;
+
+	local_irq_save(flags);
+
+	flush = this_cpu_ptr(pcpu_flush_ex);
+
+	if (mm) {
+		flush->address_space = virt_to_phys(mm->pgd);
+		flush->flags = 0;
+	} else {
+		flush->address_space = 0;
+		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+	}
+
+	flush->hv_vp_set.valid_bank_mask = 0;
+
+	if (cpumask_equal(cpus, cpu_present_mask)) {
+		flush->hv_vp_set.format = HV_GENERIC_SET_ALL;
+		flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+	} else {
+		flush->hv_vp_set.format = HV_GENERIC_SET_SPARCE_4K;
+		nr_bank = cpumask_to_vp_set(flush, cpus);
+	}
+
+	/*
+	 * We can flush not more than max_gvas with one hypercall. Flush the
+	 * whole address space if we were asked to do more.
+	 */
+	max_gvas = (PAGE_SIZE - sizeof(*flush) - nr_bank*8) / 8;
+
+	if (end == TLB_FLUSH_ALL ||
+	    (end && ((end - start)/(PAGE_SIZE*PAGE_SIZE)) > max_gvas)) {
+		if (end == TLB_FLUSH_ALL)
+			flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+
+		status = hv_do_rep_hypercall(
+			HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX,
+			0, nr_bank + 2, flush, NULL);
+	} else {
+		cur = start;
+		gva_n = nr_bank;
+		do {
+			flush->gva_list[gva_n] = cur & PAGE_MASK;
+			/*
+			 * Lower 12 bits encode the number of additional
+			 * pages to flush (in addition to the 'cur' page).
+			 */
+			if (end >= cur + PAGE_SIZE * PAGE_SIZE)
+				flush->gva_list[gva_n] |= ~PAGE_MASK;
+			else if (end > cur)
+				flush->gva_list[gva_n] |=
+					(end - cur - 1) >> PAGE_SHIFT;
+
+			cur += PAGE_SIZE * PAGE_SIZE;
+			++gva_n;
+
+		} while (cur < end);
+
+		status = hv_do_rep_hypercall(
+			HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
+			gva_n, nr_bank + 2, flush, NULL);
+	}
+
+	local_irq_restore(flags);
+
+	if (!(status & 0xffff))
+		return;
+do_native:
+	native_flush_tlb_others(cpus, mm, start, end);
+}
+
 void hyperv_setup_mmu_ops(void)
 {
-	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
+	if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
+		return;
+
+	if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED)) {
 		pr_info("Hyper-V: Using hypercall for remote TLB flush\n");
 		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
+	} else {
+		pr_info("Hyper-V: Using ext hypercall for remote TLB flush\n");
+		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others_ex;
 	}
 }
 
 void hyper_alloc_mmu(void)
 {
-	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED)
+	if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
+		return;
+
+	if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
 		pcpu_flush = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+	else
+		pcpu_flush_ex = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
 }
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 3d44036..c697e20 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -152,6 +152,9 @@
  */
 #define HV_X64_DEPRECATING_AEOI_RECOMMENDED	(1 << 9)
 
+/* Recommend using the newer ExProcessorMasks interface */
+#define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED	(1 << 11)
+
 /*
  * Crash notification flag.
  */
@@ -242,6 +245,8 @@
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
 #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
+#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX  0x0013
+#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX   0x0014
 #define HVCALL_POST_MESSAGE			0x005c
 #define HVCALL_SIGNAL_EVENT			0x005d
 
@@ -263,6 +268,11 @@
 #define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	0x00000004
 #define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	0x00000008
 
+enum HV_GENERIC_SET_FORMAT {
+	HV_GENERIC_SET_SPARCE_4K,
+	HV_GENERIC_SET_ALL,
+};
+
 /* Hypercall interface */
 union hv_hypercall_input {
 	u64 as_uint64;
-- 
2.9.3
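
The sparse VP set layout that cpumask_to_vp_set() fills in can be sketched
in userspace as follows. VP numbers are grouped into banks of 64; bit N of
valid_bank_mask says that bank N is present, and bank_contents[] holds only
the present banks, packed in ascending bank order. This is a sketch under
assumptions, not the patch's code: the sketch_* names and the fixed
64-bank limit are hypothetical, and a two-pass approach replaces the nested
loop from the patch. The shifts use 1ULL so that bit positions of 32 and
above stay well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BANKS 64	/* hypothetical limit: 64 banks of 64 VPs */

struct sketch_vp_set {
	uint64_t valid_bank_mask;
	uint64_t bank_contents[SKETCH_MAX_BANKS];
};

/*
 * Build a sparse VP set from an unordered list of VP numbers.
 * Returns the number of banks in use.
 */
static int sketch_vps_to_set(struct sketch_vp_set *set,
			     const unsigned int *vps, size_t nr_vps)
{
	uint64_t banks[SKETCH_MAX_BANKS] = { 0 };
	int bank, nr_bank = 0;
	size_t i;

	memset(set, 0, sizeof(*set));

	/* Pass 1: collect each VP into its bank, in whatever order they come. */
	for (i = 0; i < nr_vps; i++) {
		assert(vps[i] < 64 * SKETCH_MAX_BANKS);
		banks[vps[i] / 64] |= 1ULL << (vps[i] % 64);
	}

	/* Pass 2: pack the non-empty banks in ascending bank order. */
	for (bank = 0; bank < SKETCH_MAX_BANKS; bank++) {
		if (!banks[bank])
			continue;
		set->valid_bank_mask |= 1ULL << bank;
		set->bank_contents[nr_bank++] = banks[bank];
	}

	return nr_bank;
}
```

In the patch, the returned bank count also sizes the variable header passed
to hv_do_rep_hypercall() (nr_bank + 2, covering format, valid_bank_mask and
the banks) and is the index at which gva_list entries start.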

* [PATCH v3 10/10] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()
  2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2017-05-19 14:09 ` [PATCH v3 09/10] x86/hyper-v: support extended CPU ranges for TLB flush hypercalls Vitaly Kuznetsov
@ 2017-05-19 14:09 ` Vitaly Kuznetsov
  9 siblings, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-19 14:09 UTC (permalink / raw)
  To: devel
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jork Loeser, Simon Xiao

Add a Hyper-V tracing subsystem and trace hyperv_mmu_flush_tlb_others().
Tracing is done the same way as for xen_mmu_flush_tlb_others().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Simon Xiao <sixiao@microsoft.com>
Tested-by: Srikanth Myakam <v-srm@microsoft.com>
---
 MAINTAINERS                         |  1 +
 arch/x86/hyperv/mmu.c               |  8 ++++++++
 arch/x86/include/asm/trace/hyperv.h | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
 create mode 100644 arch/x86/include/asm/trace/hyperv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f7d568b..0ee55dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6161,6 +6161,7 @@ M:	Stephen Hemminger <sthemmin@microsoft.com>
 L:	devel@linuxdriverproject.org
 S:	Maintained
 F:	arch/x86/include/asm/mshyperv.h
+F:	arch/x86/include/asm/trace/hyperv.h
 F:	arch/x86/include/uapi/asm/hyperv.h
 F:	arch/x86/kernel/cpu/mshyperv.c
 F:	arch/x86/hyperv
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index c9cecb3..f6b5211 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -6,6 +6,10 @@
 #include <asm/tlbflush.h>
 #include <asm/msr.h>
 #include <asm/fpu/api.h>
+#include <asm/trace/hyperv.h>
+
+#define CREATE_TRACE_POINTS
+DEFINE_TRACE(hyperv_mmu_flush_tlb_others);
 
 /* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
 struct hv_flush_pcpu {
@@ -75,6 +79,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 	u64 status = -1ULL;
 	int cpu, vcpu, gva_n, max_gvas;
 
+	trace_hyperv_mmu_flush_tlb_others(cpus, mm, start, end);
+
 	if (!pcpu_flush || !hv_hypercall_pg)
 		goto do_native;
 
@@ -161,6 +167,8 @@ static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	u64 status = -1ULL;
 	int nr_bank = 0, max_gvas, gva_n;
 
+	trace_hyperv_mmu_flush_tlb_others(cpus, mm, start, end);
+
 	if (!pcpu_flush_ex || !hv_hypercall_pg)
 		goto do_native;
 
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
new file mode 100644
index 0000000..e46a351
--- /dev/null
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -0,0 +1,34 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hyperv
+
+#if !defined(_TRACE_HYPERV_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HYPERV_H
+
+#include <linux/tracepoint.h>
+
+#if IS_ENABLED(CONFIG_HYPERV)
+
+TRACE_EVENT(hyperv_mmu_flush_tlb_others,
+	    TP_PROTO(const struct cpumask *cpus, struct mm_struct *mm,
+		     unsigned long addr, unsigned long end),
+	    TP_ARGS(cpus, mm, addr, end),
+	    TP_STRUCT__entry(
+		    __field(unsigned int, ncpus)
+		    __field(struct mm_struct *, mm)
+		    __field(unsigned long, addr)
+		    __field(unsigned long, end)
+		    ),
+	    TP_fast_assign(__entry->ncpus = cpumask_weight(cpus);
+			   __entry->mm = mm;
+			   __entry->addr = addr;
+			   __entry->end = end),
+	    TP_printk("ncpus %d mm %p addr %lx, end %lx",
+		      __entry->ncpus, __entry->mm, __entry->addr, __entry->end)
+	);
+
+#endif /* CONFIG_HYPERV */
+
+#endif /* _TRACE_HYPERV_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.9.3

* Re: [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation
  2017-05-19 14:09 ` [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
@ 2017-05-21  3:18   ` Andy Lutomirski
  2017-05-22 10:44     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2017-05-21  3:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel
  Cc: Stephen Hemminger, Jork Loeser, Haiyang Zhang, x86, linux-kernel,
	Steven Rostedt, Ingo Molnar, H. Peter Anvin, Thomas Gleixner

On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
> Hyper-V supports 'fast' hypercalls when all parameters are passed through
> registers. Implement an inline version of a simpliest of these calls:
> hypercall with one 8-byte input and no output.
> 
> Proper hypercall input interface (struct hv_hypercall_input) definition is
> added as well.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
> Tested-by: Simon Xiao <sixiao@microsoft.com>
> Tested-by: Srikanth Myakam <v-srm@microsoft.com>
> ---
>   arch/x86/include/asm/mshyperv.h    | 39 ++++++++++++++++++++++++++++++++++++++
>   arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
>   2 files changed, 58 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index e293937..028e29b 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -216,6 +216,45 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>   #endif /* !x86_64 */
>   }
>   
> +/* Fast hypercall with 8 bytes of input and no output */
> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> +{
> +	union hv_hypercall_input control = {0};
> +
> +	control.code = code;
> +	control.fast = 1;
> +#ifdef CONFIG_X86_64
> +	{
> +		u64 hv_status;
> +
> +		__asm__ __volatile__("call *%3"
> +				     : "=a" (hv_status),
> +				       "+c" (control.as_uint64), "+d" (input1)
> +				     : "m" (hv_hypercall_pg)
> +				     : "cc", "r8", "r9", "r10", "r11");
> +		return hv_status;
> +	}
> +#else
> +	{
> +		u32 hv_status_hi, hv_status_lo;
> +		u32 input1_hi = (u32)(input1 >> 32);
> +		u32 input1_lo = (u32)input1;
> +
> +		__asm__ __volatile__ ("call *%6"
> +				      : "=d"(hv_status_hi),
> +					"=a"(hv_status_lo),
> +					"+c"(input1_lo)
> +				      :	"d" (control.as_uint32_hi),
> +					"a" (control.as_uint32_lo),
> +					"b" (input1_hi),
> +					"m" (hv_hypercall_pg)
> +				      : "cc", "edi", "esi");
> +
> +		return hv_status_lo | ((u64)hv_status_hi << 32);
> +	}
> +#endif

This is going to need an explicit "sp" annotation to force a stack 
frame, I think.  Otherwise objtool is likely to get mad in a 
frame-pointer-omitted build.

* Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-05-19 14:09 ` [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush Vitaly Kuznetsov
@ 2017-05-21  3:23   ` Andy Lutomirski
       [not found]     ` <87zie5tbmm.fsf@vitty.brq.redhat.com>
  0 siblings, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2017-05-21  3:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel
  Cc: Stephen Hemminger, Jork Loeser, Haiyang Zhang, x86, linux-kernel,
	Steven Rostedt, Ingo Molnar, H. Peter Anvin, Thomas Gleixner

On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
> A Hyper-V host can recommend using a hypercall for remote TLB flushes;
> this is supposed to be faster than IPIs.
> 
> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> we need to put the input somewhere in memory, and we don't really want to
> allocate memory on each call, so we pre-allocate per-CPU memory areas on
> boot. These areas are of fixed size; limit them to an arbitrary number of
> 16 gvas (16 gvas are able to specify 16 * 4096 pages).
> 
> pv_ops patching happens very early, so we need to separate
> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> 
> It is possible and easy to implement local TLB flushing too, and there is
> even a hint for that. However, I don't see room for optimization on the
> host side, as both the hypercall and a native TLB flush will result in a
> vmexit. The hint is also not set on modern Hyper-V versions.

Why do local flushes exit?

> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> +				    struct mm_struct *mm, unsigned long start,
> +				    unsigned long end)
> +{

What tree will this go through?  I'm about to send a signature change 
for this function for tip:x86/mm.

Also, how would this interact with PCID?  I have PCID patches that I'm 
pretty happy with now, and I'm hoping to support PCID in 4.13.

* Re: [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation
  2017-05-21  3:18   ` Andy Lutomirski
@ 2017-05-22 10:44     ` Vitaly Kuznetsov
  2017-05-22 22:04       ` Andy Lutomirski
  0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-22 10:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: devel, Stephen Hemminger, Jork Loeser, Haiyang Zhang, x86,
	linux-kernel, Steven Rostedt, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner

Andy Lutomirski <luto@kernel.org> writes:

> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>> Hyper-V supports 'fast' hypercalls when all parameters are passed through
>> registers. Implement an inline version of a simpliest of these calls:
>> hypercall with one 8-byte input and no output.
>>
>> Proper hypercall input interface (struct hv_hypercall_input) definition is
>> added as well.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
>> Tested-by: Simon Xiao <sixiao@microsoft.com>
>> Tested-by: Srikanth Myakam <v-srm@microsoft.com>
>> ---
>>   arch/x86/include/asm/mshyperv.h    | 39 ++++++++++++++++++++++++++++++++++++++
>>   arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index e293937..028e29b 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -216,6 +216,45 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>>   #endif /* !x86_64 */
>>   }
>>   +/* Fast hypercall with 8 bytes of input and no output */
>> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>> +{
>> +	union hv_hypercall_input control = {0};
>> +
>> +	control.code = code;
>> +	control.fast = 1;
>> +#ifdef CONFIG_X86_64
>> +	{
>> +		u64 hv_status;
>> +
>> +		__asm__ __volatile__("call *%3"
>> +				     : "=a" (hv_status),
>> +				       "+c" (control.as_uint64), "+d" (input1)
>> +				     : "m" (hv_hypercall_pg)
>> +				     : "cc", "r8", "r9", "r10", "r11");
>> +		return hv_status;
>> +	}
>> +#else
>> +	{
>> +		u32 hv_status_hi, hv_status_lo;
>> +		u32 input1_hi = (u32)(input1 >> 32);
>> +		u32 input1_lo = (u32)input1;
>> +
>> +		__asm__ __volatile__ ("call *%6"
>> +				      : "=d"(hv_status_hi),
>> +					"=a"(hv_status_lo),
>> +					"+c"(input1_lo)
>> +				      :	"d" (control.as_uint32_hi),
>> +					"a" (control.as_uint32_lo),
>> +					"b" (input1_hi),
>> +					"m" (hv_hypercall_pg)
>> +				      : "cc", "edi", "esi");
>> +
>> +		return hv_status_lo | ((u64)hv_status_hi << 32);
>> +	}
>> +#endif
>
> This is going to need an explicit "sp" annotation to force a stack
> frame, I think.  Otherwise objtool is likely to get mad in a
> frame-pointer-omitted build.
>

You mean I should do something like 

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 359967f..f86c4ae 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -221,6 +221,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 {
        union hv_hypercall_input control = {0};
+       register void *__sp asm(_ASM_SP);
 
        control.code = code;
        control.fast = 1;
@@ -228,8 +229,8 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
        {
                u64 hv_status;
 
-               __asm__ __volatile__("call *%3"
-                                    : "=a" (hv_status),
+               __asm__ __volatile__("call *%4"
+                                    : "=a" (hv_status), "+r" (__sp),
                                       "+c" (control.as_uint64), "+d" (input1)
                                     : "m" (hv_hypercall_pg)
                                     : "cc", "r8", "r9", "r10", "r11");
@@ -241,10 +242,11 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
                u32 input1_hi = (u32)(input1 >> 32);
                u32 input1_lo = (u32)input1;
 
-               __asm__ __volatile__ ("call *%6"
+               __asm__ __volatile__ ("call *%7"
                                      : "=d"(hv_status_hi),
                                        "=a"(hv_status_lo),
-                                       "+c"(input1_lo)
+                                       "+c"(input1_lo),
+                                       "+r"(__sp)
                                      : "d" (control.as_uint32_hi),
                                        "a" (control.as_uint32_lo),
                                        "b" (input1_hi),

(stolen from 0e8e2238)? hv_do_hypercall() will need this adjustment
too, I think.

-- 
  Vitaly

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
       [not found]     ` <87zie5tbmm.fsf@vitty.brq.redhat.com>
@ 2017-05-22 14:39       ` KY Srinivasan
  2017-05-22 18:28       ` Andy Lutomirski
  1 sibling, 0 replies; 22+ messages in thread
From: KY Srinivasan @ 2017-05-22 14:39 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Andy Lutomirski
  Cc: Stephen Hemminger, Jork Loeser, Haiyang Zhang, x86, linux-kernel,
	Steven Rostedt, Ingo Molnar, H. Peter Anvin, devel,
	Thomas Gleixner



> -----Original Message-----
> From: devel [mailto:driverdev-devel-bounces@linuxdriverproject.org] On
> Behalf Of Vitaly Kuznetsov
> Sent: Monday, May 22, 2017 3:44 AM
> To: Andy Lutomirski <luto@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>; Jork Loeser
> <Jork.Loeser@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin
> <hpa@zytor.com>; devel@linuxdriverproject.org; Thomas Gleixner
> <tglx@linutronix.de>
> Subject: Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB
> flush
> 
> Andy Lutomirski <luto@kernel.org> writes:
> 
> > On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
> >> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> >> this is supposed to work faster than IPIs.
> >>
> >> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> >> we need to put the input somewhere in memory and we don't really want to
> >> have memory allocation on each call so we pre-allocate per cpu memory areas
> >> on boot. These areas are of fixed size, limit them with an arbitrary number
> >> of 16 (16 gvas are able to specify 16 * 4096 pages).
> >>
> >> pv_ops patching is happening very early so we need to separate
> >> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >>
> >> It is possible and easy to implement local TLB flushing too and there is
> >> even a hint for that. However, I don't see a room for optimization on the
> >> host side as both hypercall and native tlb flush will result in vmexit. The
> >> hint is also not set on modern Hyper-V versions.
> >
> > Why do local flushes exit?
> 
> "exist"? I don't know, to be honest. To me it makes no difference from
> hypervisor's point of view as intercepting tlb flushing instructions is
> not any different from implementing a hypercall.
> 
> Hyper-V gives its guests 'hints' to indicate if they need to use
> hypercalls for remote/local TLB flush and I don't remember seeing
> 'local' bit set.
> 
> Microsoft folks may probably shed some light on why this was added.

As Vitaly has indicated, these are based on hints from the hypervisor.
Not sure what the perf impact might be for the local flush enlightenment.
> 
> >
> >> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> >> +				    struct mm_struct *mm, unsigned long start,
> >> +				    unsigned long end)
> >> +{
> >
> > What tree will this go through?  I'm about to send a signature change
> > for this function for tip:x86/mm.
> 
> I think this was going to get through Greg's char-misc tree but if we
> need to synchronize I think we can push this through x86.

It would be good to take this through Greg's tree, as that would simplify coordination
with other changes.
> 
> >
> > Also, how would this interact with PCID?  I have PCID patches that I'm
> > pretty happy with now, and I'm hoping to support PCID in 4.13.
> >
> 
> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
> not going away from pv_mmu_ops, right? In this case we can have both in
> 4.13. Or do you see any other clashes?
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
       [not found]     ` <87zie5tbmm.fsf@vitty.brq.redhat.com>
  2017-05-22 14:39       ` KY Srinivasan
@ 2017-05-22 18:28       ` Andy Lutomirski
  2017-05-23 12:36         ` Vitaly Kuznetsov
  1 sibling, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2017-05-22 18:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andy Lutomirski, devel, Stephen Hemminger, Jork Loeser,
	Haiyang Zhang, X86 ML, linux-kernel, Steven Rostedt, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Andy Lutomirski <luto@kernel.org> writes:
>
>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>>> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
>>> this is supposed to work faster than IPIs.
>>>
>>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
>>> we need to put the input somewhere in memory and we don't really want to
>>> have memory allocation on each call so we pre-allocate per cpu memory areas
>>> on boot. These areas are of fixed size, limit them with an arbitrary number
>>> of 16 (16 gvas are able to specify 16 * 4096 pages).
>>>
>>> pv_ops patching is happening very early so we need to separate
>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>>>
>>> It is possible and easy to implement local TLB flushing too and there is
>>> even a hint for that. However, I don't see a room for optimization on the
>>> host side as both hypercall and native tlb flush will result in vmexit. The
>>> hint is also not set on modern Hyper-V versions.
>>
>> Why do local flushes exit?
>
> "exist"? I don't know, to be honest. To me it makes no difference from
> hypervisor's point of view as intercepting tlb flushing instructions is
> not any different from implementing a hypercall.
>
> Hyper-V gives its guests 'hints' to indicate if they need to use
> hypercalls for remote/local TLB flush and I don't remember seeing
> 'local' bit set.

What I meant was: why aren't local flushes handled directly in the
guest without exiting to the host?  Or are they?  In principle,
INVPCID should just work, right?  Even reading and writing CR3 back
should work if the hypervisor sets up the magic list of allowed CR3
values, right?

I guess on older CPUs there might not be any way to flush the local
TLB without exiting, but I'm not *that* familiar with the details of
the virtualization extensions.

>
>>
>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>>> +                                struct mm_struct *mm, unsigned long start,
>>> +                                unsigned long end)
>>> +{
>>
>> What tree will this go through?  I'm about to send a signature change
>> for this function for tip:x86/mm.
>
> I think this was going to get through Greg's char-misc tree but if we
> need to synchronize I think we can push this through x86.

Works for me.  Linus can probably resolve the trivial conflict.  But
going through the x86 tree might make sense here if that's okay with
you.

>
>>
>> Also, how would this interact with PCID?  I have PCID patches that I'm
>> pretty happy with now, and I'm hoping to support PCID in 4.13.
>>
>
> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
> not going away from pv_mmu_ops, right? In this case we can have both in
> 4.13. Or do you see any other clashes?
>

The issue is that I'm changing the whole flush algorithm.  The main
patch that affects this is here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1

The interactions between that patch and paravirt flush helpers may be
complex, and it'll need some thought.  PCID makes everything even more
subtle, so just turning off PCID when paravirt flush is involved seems
the safest for now.  Ideally we'd eventually support PCID and paravirt
flushes together (and even eventual native remote flushes assuming
they ever get added).

Also, can you share the benchmark you used for these patches?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation
  2017-05-22 10:44     ` Vitaly Kuznetsov
@ 2017-05-22 22:04       ` Andy Lutomirski
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-05-22 22:04 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andy Lutomirski, devel, Stephen Hemminger, Jork Loeser,
	Haiyang Zhang, X86 ML, linux-kernel, Steven Rostedt, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

On Mon, May 22, 2017 at 3:44 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Andy Lutomirski <luto@kernel.org> writes:
>
>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>>> Hyper-V supports 'fast' hypercalls when all parameters are passed through
>>> registers. Implement an inline version of the simplest of these calls:
>>> hypercall with one 8-byte input and no output.
>>>
>>> Proper hypercall input interface (struct hv_hypercall_input) definition is
>>> added as well.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
>>> Tested-by: Simon Xiao <sixiao@microsoft.com>
>>> Tested-by: Srikanth Myakam <v-srm@microsoft.com>
>>> ---
>>>   arch/x86/include/asm/mshyperv.h    | 39 ++++++++++++++++++++++++++++++++++++++
>>>   arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
>>>   2 files changed, 58 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>>> index e293937..028e29b 100644
>>> --- a/arch/x86/include/asm/mshyperv.h
>>> +++ b/arch/x86/include/asm/mshyperv.h
>>> @@ -216,6 +216,45 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>>>   #endif /* !x86_64 */
>>>   }
>>>   +/* Fast hypercall with 8 bytes of input and no output */
>>> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>>> +{
>>> +    union hv_hypercall_input control = {0};
>>> +
>>> +    control.code = code;
>>> +    control.fast = 1;
>>> +#ifdef CONFIG_X86_64
>>> +    {
>>> +            u64 hv_status;
>>> +
>>> +            __asm__ __volatile__("call *%3"
>>> +                                 : "=a" (hv_status),
>>> +                                   "+c" (control.as_uint64), "+d" (input1)
>>> +                                 : "m" (hv_hypercall_pg)
>>> +                                 : "cc", "r8", "r9", "r10", "r11");
>>> +            return hv_status;
>>> +    }
>>> +#else
>>> +    {
>>> +            u32 hv_status_hi, hv_status_lo;
>>> +            u32 input1_hi = (u32)(input1 >> 32);
>>> +            u32 input1_lo = (u32)input1;
>>> +
>>> +            __asm__ __volatile__ ("call *%6"
>>> +                                  : "=d"(hv_status_hi),
>>> +                                    "=a"(hv_status_lo),
>>> +                                    "+c"(input1_lo)
>>> +                                  : "d" (control.as_uint32_hi),
>>> +                                    "a" (control.as_uint32_lo),
>>> +                                    "b" (input1_hi),
>>> +                                    "m" (hv_hypercall_pg)
>>> +                                  : "cc", "edi", "esi");
>>> +
>>> +            return hv_status_lo | ((u64)hv_status_hi << 32);
>>> +    }
>>> +#endif
>>
>> This is going to need an explicit "sp" annotation to force a stack
>> frame, I think.  Otherwise objtool is likely to get mad in a
>> frame-pointer-omitted build.
>>
>
> You mean I should do something like
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 359967f..f86c4ae 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -221,6 +221,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>  {
>         union hv_hypercall_input control = {0};
> +       register void *__sp asm(_ASM_SP);

Exactly.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-05-22 18:28       ` Andy Lutomirski
@ 2017-05-23 12:36         ` Vitaly Kuznetsov
  2017-05-23 17:50           ` KY Srinivasan
  2017-06-27  1:36           ` Andy Lutomirski
  0 siblings, 2 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-23 12:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: devel, Stephen Hemminger, Jork Loeser, Haiyang Zhang, X86 ML,
	linux-kernel, Steven Rostedt, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 5498 bytes --]

Andy Lutomirski <luto@kernel.org> writes:

> On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> Andy Lutomirski <luto@kernel.org> writes:
>>
>>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>>>> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
>>>> this is supposed to work faster than IPIs.
>>>>
>>>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
>>>> we need to put the input somewhere in memory and we don't really want to
>>>> have memory allocation on each call so we pre-allocate per cpu memory areas
>>>> on boot. These areas are of fixed size, limit them with an arbitrary number
>>>> of 16 (16 gvas are able to specify 16 * 4096 pages).
>>>>
>>>> pv_ops patching is happening very early so we need to separate
>>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>>>>
>>>> It is possible and easy to implement local TLB flushing too and there is
>>>> even a hint for that. However, I don't see a room for optimization on the
>>>> host side as both hypercall and native tlb flush will result in vmexit. The
>>>> hint is also not set on modern Hyper-V versions.
>>>
>>> Why do local flushes exit?
>>
>> "exist"? I don't know, to be honest. To me it makes no difference from
>> hypervisor's point of view as intercepting tlb flushing instructions is
>> not any different from implementing a hypercall.
>>
>> Hyper-V gives its guests 'hints' to indicate if they need to use
>> hypercalls for remote/local TLB flush and I don't remember seeing
>> 'local' bit set.
>
> What I meant was: why aren't local flushes handled directly in the
> guest without exiting to the host?  Or are they?  In principle,
> INVPCID should just work, right?  Even reading and writing CR3 back
> should work if the hypervisor sets up the magic list of allowed CR3
> values, right?
>
> I guess on older CPUs there might not be any way to flush the local
> TLB without exiting, but I'm not *that* familiar with the details of
> the virtualization extensions.
>

Right, local flushes should 'just work'. If, for whatever reason, the
hypervisor decides to trap us, there's nothing we can do about it.

>>
>>>
>>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>>>> +                                struct mm_struct *mm, unsigned long start,
>>>> +                                unsigned long end)
>>>> +{
>>>
>>> What tree will this go through?  I'm about to send a signature change
>>> for this function for tip:x86/mm.
>>
>> I think this was going to get through Greg's char-misc tree but if we
>> need to synchronize I think we can push this through x86.
>
> Works for me.  Linus can probably resolve the trivial conflict.  But
> going through the x86 tree might make sense here if that's okay with
> you.
>

Definitely fine with me, I'll leave this decision up to x86 maintainers,
Hyper-V maintainers, and Greg.

>>
>>>
>>> Also, how would this interact with PCID?  I have PCID patches that I'm
>>> pretty happy with now, and I'm hoping to support PCID in 4.13.
>>>
>>
>> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
>> not going away from pv_mmu_ops, right? In this case we can have both in
>> 4.13. Or do you see any other clashes?
>>
>
> The issue is that I'm changing the whole flush algorithm.  The main
> patch that affects this is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1
>
> The interactions between that patch and paravirt flush helpers may be
> complex, and it'll need some thought.  PCID makes everything even more
> subtle, so just turning off PCID when paravirt flush is involved seems
> the safest for now.  Ideally we'd eventually support PCID and paravirt
> flushes together (and even eventual native remote flushes assuming
> they ever get added).

I see. On Hyper-V HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST hypercall's
interface is:
1) List of entries to flush. Each entry is a PFN and lower 12 bits are
used to encode the number of pages after this one (defined by the PFN)
we'd like to flush. We can flush up to 509 entries with one
hypercall (can be extended but requires a pre-allocated memory region).

2) Processor mask

3) Address space id (all 64 bits of CR3. Not sure how it's used within
the hypervisor).

HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX is more or less the same but we
need more space to specify > 64 vCPUs so we'll be able to pass fewer than
509 entries.
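
To make the entry encoding described above concrete, here is a minimal
user-space sketch. It is not the code from the patch: every sk_* name is
hypothetical, the 'flags' field and the three-u64 header in a 4096-byte input
page are assumptions (chosen because they would account for the 509 figure),
and each entry is assumed to be a page-aligned address with the count of
additional pages in its low 12 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  ((uint64_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))
/* 12 low bits hold "pages after this one": one entry covers 1 + 4095 pages. */
#define SK_MAX_PAGES_PER_ENTRY ((uint64_t)1 << 12)

/* Assumed input layout: three 8-byte header fields, then the entry list. */
struct sk_flush_input {
	uint64_t address_space;   /* all 64 bits of CR3 */
	uint64_t flags;           /* assumption, not mentioned in the mail */
	uint64_t processor_mask;
	uint64_t gva_list[];
};

/* Entries that fit in one page after the header: (4096 - 24) / 8. */
#define SK_MAX_ENTRIES \
	((SK_PAGE_SIZE - sizeof(struct sk_flush_input)) / sizeof(uint64_t))

/*
 * Encode the range [start, end) into 'list'. Returns the number of entries
 * written, or 0 if the range is empty or needs more than 'max_entries'.
 * Ranges ending at the very top of the address space are not handled.
 */
static size_t sk_fill_gva_list(uint64_t *list, size_t max_entries,
			       uint64_t start, uint64_t end)
{
	uint64_t cur = start & SK_PAGE_MASK;
	size_t n = 0;

	while (cur < end) {
		/* Pages still to cover, rounded up, capped per entry. */
		uint64_t pages = (end - cur + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;

		if (pages > SK_MAX_PAGES_PER_ENTRY)
			pages = SK_MAX_PAGES_PER_ENTRY;
		if (n == max_entries)
			return 0;

		/* Page address in the high bits, additional pages in the low 12. */
		list[n++] = cur | (pages - 1);
		cur += pages << SK_PAGE_SHIFT;
	}

	return n;
}
```

Under these assumptions a three-page range starting at 0x1000 becomes the
single entry 0x1002, and a range of 4097 pages needs two entries.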

The main advantage compared to sending IPIs, as far as I understand, is
that virtual CPUs which are not currently scheduled don't need flushing
and we can't know this from within the guest.

I agree that disabling PCID for paravirt flush users for now is a good
option, let's have it merged and tested without this additional
complexity and make another round after.

>
> Also, can you share the benchmark you used for these patches?

I didn't do much while writing the patchset, mostly I was running the
attached dumb trasher (32 pthreads doing mmap/munmap). On a 16 vCPU
Hyper-V 2016 guest I get the following (just re-did the test with
4.12-rc1):

Before the patchset:
# time ./pthread_mmap ./randfile 

real	3m33.118s
user	0m3.698s
sys	3m16.624s

After the patchset:
# time ./pthread_mmap ./randfile 

real	2m19.920s
user	0m2.662s
sys	2m9.948s
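
For reference, the relative improvement implied by the two runs above can be
worked out with a small helper. This is only an illustrative sketch; the
function names are made up for this note and are not part of the benchmark.

```c
#include <stdio.h>

/*
 * Convert a `time`-style "XmY.ZZZs" string to seconds.
 * Returns -1.0 if the string does not match that shape.
 */
static double time_to_seconds(const char *s)
{
	double min, sec;

	if (sscanf(s, "%lfm%lfs", &min, &sec) != 2)
		return -1.0;
	return min * 60.0 + sec;
}

/* Fraction of the 'before' time that the 'after' run saves. */
static double relative_reduction(double before, double after)
{
	return (before - after) / before;
}
```

With the numbers above, real time goes from 213.118s to 139.920s and sys time
from 196.624s to 129.948s, a reduction of roughly 34% in both cases.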

K. Y.'s guys at Microsoft did additional testing for the patchset on
different Hyper-V deployments including Azure, they may share their
findings too.

-- 
  Vitaly


[-- Attachment #2: pthread_mmap.c --]
[-- Type: text/plain, Size: 1195 bytes --]

#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define nthreads 32
#define pagecount 16384
#define nrounds 10000
#define nchunks 10
#define PAGE_SIZE 4096

int fd;
/* Sink for the reads below; the value itself is never used. */
unsigned long v;

/* Each thread does nrounds * nchunks mmap/munmap pairs on the shared file. */
void *threadf(void *ptr)
{
	unsigned long *addr;
	int i, j;

	for (j = 0; j < nrounds; j++) {
		for (i = 0; i < nchunks; i++) {
			addr = mmap(NULL, PAGE_SIZE * pagecount, PROT_READ, MAP_SHARED, fd, i * PAGE_SIZE);
			if (addr == MAP_FAILED) {
				fprintf(stderr, "mmap\n");
				exit(1);
			}
			/* Touch the first page so it is faulted in before unmapping. */
			v += *addr;
			munmap(addr, PAGE_SIZE * pagecount);
		}
	}

	return NULL;
}

int main(int argc, char *argv[]) {
	pthread_t thr[nthreads];
	int i;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <some-big-file>\n", argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "open\n");
		exit(1);
	}
	
	for (i = 0; i < nthreads; i++) {
		if(pthread_create(&thr[i], NULL, threadf, NULL)) {
			fprintf(stderr, "pthread_create\n");
			exit(1);
		}
	}

	for (i = 0; i < nthreads; i++) {
		if(pthread_join(thr[i], NULL)) {
			fprintf(stderr, "pthread_join\n");
			exit(1);
		}
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-05-23 12:36         ` Vitaly Kuznetsov
@ 2017-05-23 17:50           ` KY Srinivasan
  2017-06-27  1:36           ` Andy Lutomirski
  1 sibling, 0 replies; 22+ messages in thread
From: KY Srinivasan @ 2017-05-23 17:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Andy Lutomirski
  Cc: Stephen Hemminger, Jork Loeser, Haiyang Zhang, X86 ML,
	linux-kernel, Steven Rostedt, Ingo Molnar, H. Peter Anvin, devel,
	Thomas Gleixner



> -----Original Message-----
> From: devel [mailto:driverdev-devel-bounces@linuxdriverproject.org] On
> Behalf Of Vitaly Kuznetsov
> Sent: Tuesday, May 23, 2017 5:37 AM
> To: Andy Lutomirski <luto@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>; Jork Loeser
> <Jork.Loeser@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> X86 ML <x86@kernel.org>; linux-kernel@vger.kernel.org; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin
> <hpa@zytor.com>; devel@linuxdriverproject.org; Thomas Gleixner
> <tglx@linutronix.de>
> Subject: Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB
> flush
> 
> Andy Lutomirski <luto@kernel.org> writes:
> 
> > On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> Andy Lutomirski <luto@kernel.org> writes:
> >>
> >>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
> >>>> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> >>>> this is supposed to work faster than IPIs.
> >>>>
> >>>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> >>>> we need to put the input somewhere in memory and we don't really want to
> >>>> have memory allocation on each call so we pre-allocate per cpu memory areas
> >>>> on boot. These areas are of fixed size, limit them with an arbitrary number
> >>>> of 16 (16 gvas are able to specify 16 * 4096 pages).
> >>>>
> >>>> pv_ops patching is happening very early so we need to separate
> >>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >>>>
> >>>> It is possible and easy to implement local TLB flushing too and there is
> >>>> even a hint for that. However, I don't see a room for optimization on the
> >>>> host side as both hypercall and native tlb flush will result in vmexit. The
> >>>> hint is also not set on modern Hyper-V versions.
> >>>
> >>> Why do local flushes exit?
> >>
> >> "exist"? I don't know, to be honest. To me it makes no difference from
> >> hypervisor's point of view as intercepting tlb flushing instructions is
> >> not any different from implementing a hypercall.
> >>
> >> Hyper-V gives its guests 'hints' to indicate if they need to use
> >> hypercalls for remote/local TLB flush and I don't remember seeing
> >> 'local' bit set.
> >
> > What I meant was: why aren't local flushes handled directly in the
> > guest without exiting to the host?  Or are they?  In principle,
> > INVPCID should just work, right?  Even reading and writing CR3 back
> > should work if the hypervisor sets up the magic list of allowed CR3
> > values, right?
> >
> > I guess on older CPUs there might not be any way to flush the local
> > TLB without exiting, but I'm not *that* familiar with the details of
> > the virtualization extensions.
> >
> 
> Right, local flushes should 'just work'. If, for whatever reason, the
> hypervisor decides to trap us, there's nothing we can do about it.
> 
> >>
> >>>
> >>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> >>>> +                                struct mm_struct *mm, unsigned long start,
> >>>> +                                unsigned long end)
> >>>> +{
> >>>
> >>> What tree will this go through?  I'm about to send a signature change
> >>> for this function for tip:x86/mm.
> >>
> >> I think this was going to get through Greg's char-misc tree but if we
> >> need to synchronize I think we can push this through x86.
> >
> > Works for me.  Linus can probably resolve the trivial conflict.  But
> > going through the x86 tree might make sense here if that's okay with
> > you.
> >
> 
> Definitely fine with me, I'll leave this decision up to x86 maintainers,
> Hyper-V maintainers, and Greg.
> 
> >>
> >>>
> >>> Also, how would this interact with PCID?  I have PCID patches that I'm
> >>> pretty happy with now, and I'm hoping to support PCID in 4.13.
> >>>
> >>
> >> Sorry, I wasn't following this work closely. .flush_tlb_others() hook is
> >> not going away from pv_mmu_ops, right? In this case we can have both in
> >> 4.13. Or do you see any other clashes?
> >>
> >
> > The issue is that I'm changing the whole flush algorithm.  The main
> > patch that affects this is here:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1
> >
> > The interactions between that patch and paravirt flush helpers may be
> > complex, and it'll need some thought.  PCID makes everything even more
> > subtle, so just turning off PCID when paravirt flush is involved seems
> > the safest for now.  Ideally we'd eventually support PCID and paravirt
> > flushes together (and even eventual native remote flushes assuming
> > they ever get added).
> 
> I see. On Hyper-V HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST hypercall's
> interface is:
> 1) List of entries to flush. Each entry is a PFN and lower 12 bits are
> used to encode the number of pages after this one (defined by the PFN)
> we'd like to flush. We can flush up to 509 entries with one
> hypercall (can be extended but requires a pre-allocated memory region).
> 
> 2) Processor mask
> 
> 3) Address space id (all 64 bits of CR3. Not sure how it's used within
> the hypervisor).
> 
> HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX is more or less the same but we
> need more space to specify > 64 vCPUs so we'll be able to pass fewer than
> 509 entries.
> 
> The main advantage compared to sending IPIs, as far as I understand, is
> that virtual CPUs which are not currently scheduled don't need flushing
> and we can't know this from within the guest.

There are other potential advantages as well:
1. When we need to flush with a large CPU mask, the hypercall mechanism can obviously
minimize the number of intercepts.
2. There is no instruction emulation in the hypercall path. 
> 
> I agree that disabling PCID for paravirt flush users for now is a good
> option, let's have it merged and tested without this additional
> complexity and make another round after.
> 
> >
> > Also, can you share the benchmark you used for these patches?
> 
> I didn't do much while writing the patchset, mostly I was running the
> attached dumb trasher (32 pthreads doing mmap/munmap). On a 16 vCPU
> Hyper-V 2016 guest I get the following (just re-did the test with
> 4.12-rc1):
> 
> Before the patchset:
> # time ./pthread_mmap ./randfile
> 
> real	3m33.118s
> user	0m3.698s
> sys	3m16.624s
> 
> After the patchset:
> # time ./pthread_mmap ./randfile
> 
> real	2m19.920s
> user	0m2.662s
> sys	2m9.948s
> 
> K. Y.'s guys at Microsoft did additional testing for the patchset on
> different Hyper-V deployments including Azure, they may share their
> findings too.

Our testing was mostly focused on stability and correctness. For the benchmarks we ran
(micro benchmarks for storage and networking) we did see improvements across the board.

Regards,

K. Y

> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-05-23 12:36         ` Vitaly Kuznetsov
  2017-05-23 17:50           ` KY Srinivasan
@ 2017-06-27  1:36           ` Andy Lutomirski
  2017-07-13 12:46             ` Vitaly Kuznetsov
  1 sibling, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2017-06-27  1:36 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andy Lutomirski, devel, Stephen Hemminger, Jork Loeser,
	Haiyang Zhang, X86 ML, linux-kernel, Steven Rostedt, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Andy Lutomirski <luto@kernel.org> writes:
>
>>
>> Also, can you share the benchmark you used for these patches?
>
> I didn't do much while writing the patchset, mostly I was running the
> attached dumb trasher (32 pthreads doing mmap/munmap). On a 16 vCPU
> Hyper-V 2016 guest I get the following (just re-did the test with
> 4.12-rc1):
>
> Before the patchset:
> # time ./pthread_mmap ./randfile
>
> real    3m33.118s
> user    0m3.698s
> sys     3m16.624s
>
> After the patchset:
> # time ./pthread_mmap ./randfile
>
> real    2m19.920s
> user    0m2.662s
> sys     2m9.948s
>
> K. Y.'s guys at Microsoft did additional testing for the patchset on
> different Hyper-V deployments including Azure, they may share their
> findings too.

I ran this benchmark on my big TLB patchset, mainly to make sure I
didn't regress your test.  I seem to have sped it up by 30% or so
instead.  I need to study this a little bit to figure out why to make
sure that the reason isn't that I'm failing to do flushes I need to
do.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-06-27  1:36           ` Andy Lutomirski
@ 2017-07-13 12:46             ` Vitaly Kuznetsov
  2017-07-14 22:26               ` Andy Lutomirski
  0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-07-13 12:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: devel, Stephen Hemminger, Jork Loeser, Haiyang Zhang, X86 ML,
	linux-kernel, Steven Rostedt, Ingo Molnar, H. Peter Anvin,
	Thomas Gleixner

Andy Lutomirski <luto@kernel.org> writes:

> On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> Andy Lutomirski <luto@kernel.org> writes:
>>
>>>
>>> Also, can you share the benchmark you used for these patches?
>>
>> I didn't do much while writing the patchset, mostly I was running the
>> attached dumb trasher (32 pthreads doing mmap/munmap). On a 16 vCPU
>> Hyper-V 2016 guest I get the following (just re-did the test with
>> 4.12-rc1):
>>
>> Before the patchset:
>> # time ./pthread_mmap ./randfile
>>
>> real    3m33.118s
>> user    0m3.698s
>> sys     3m16.624s
>>
>> After the patchset:
>> # time ./pthread_mmap ./randfile
>>
>> real    2m19.920s
>> user    0m2.662s
>> sys     2m9.948s
>>
>> K. Y.'s guys at Microsoft did additional testing for the patchset on
>> different Hyper-V deployments including Azure, they may share their
>> findings too.
>
> I ran this benchmark on my big TLB patchset, mainly to make sure I
> didn't regress your test.  I seem to have sped it up by 30% or so
> instead.  I need to study this a little bit to figure out why to make
> sure that the reason isn't that I'm failing to do flushes I need to
> do.

Got back to this and tested everything on WS2016 Hyper-V guest (24
vCPUs) with my slightly modified benchmark. The numbers are:

1) pre-patch:

real	1m15.775s
user	0m0.850s
sys	1m31.515s

2) your 'x86/pcid' series (PCID feature is not passed to the guest so this
is mainly your lazy tlb optimization):

real	0m55.135s
user	0m1.168s
sys	1m3.810s

3) My 'pv tlb shootdown' patchset on top of your 'x86/pcid' series:

real	0m48.891s
user	0m1.052s
sys	0m52.591s
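
To put the three wall-clock readings side by side: relative to the pre-patch run, the 'x86/pcid' series cuts real time by roughly 27%, and adding the 'pv tlb shootdown' patchset brings the total reduction to roughly 35% (about 11% on top of 'x86/pcid' alone). A minimal sketch of that arithmetic follows; the helper names are hypothetical and only the timings come from the message above.

```c
#include <stdio.h>

/* Convert a `time`-style "XmY.ZZZs" reading into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return 60.0 * minutes + seconds;
}

/* Percentage by which the elapsed time dropped from `before_s` to `after_s`. */
static double pct_reduction(double before_s, double after_s)
{
    return 100.0 * (before_s - after_s) / before_s;
}

/* Print the reductions for the three "real" readings quoted in the thread. */
static void print_reductions(void)
{
    double pre  = to_seconds(1, 15.775); /* 1) pre-patch */
    double pcid = to_seconds(0, 55.135); /* 2) 'x86/pcid' series */
    double pv   = to_seconds(0, 48.891); /* 3) 'pv tlb shootdown' on top */

    printf("pcid vs pre-patch:    %.1f%%\n", pct_reduction(pre, pcid));
    printf("pv+pcid vs pre-patch: %.1f%%\n", pct_reduction(pre, pv));
    printf("pv+pcid vs pcid:      %.1f%%\n", pct_reduction(pcid, pv));
}
```

Calling print_reductions() from a main() prints the three percentages. Note that sys time exceeds real time in the pre-patch run because the benchmark is multi-threaded.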

As far as I understand I need to add
'setup_clear_cpu_cap(X86_FEATURE_PCID)' to my series to make things work
properly if this feature appears in the guest.

Other than that there is additional room for optimization:
tlb_single_page_flush_ceiling, I'm not sure that with Hyper-V's PV the
default value of 33 is optimal. But the investigation can be done
separately.
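
For anyone picking up that investigation, the ceiling is exposed as a debugfs knob. A minimal sketch for reading it from user space follows, assuming debugfs is mounted at /sys/kernel/debug and the caller is root; the helper name is hypothetical and the path is the one given in the kernel's x86 TLB documentation.

```c
#include <stdio.h>

/* Usual debugfs location of the knob (assumes debugfs at /sys/kernel/debug). */
#define CEILING_PATH "/sys/kernel/debug/x86/tlb_single_page_flush_ceiling"

/*
 * Read the current ceiling from `path` into `*out`.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_flush_ceiling(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int parsed = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return parsed ? 0 : -1;
}
```

A benchmark run could call read_flush_ceiling(CEILING_PATH, &value) before and after writing a candidate value to the same file, so that each timing is recorded together with the ceiling it was measured under.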

AFAIU with your TLB preparatory work which got into 4.13 our series
became untangled and can go through different trees. I'll rebase mine
and send it to K. Y. to push through Greg's char-misc tree.

Is there anything blocking your PCID series from going into 4.14? It
seems to be a huge improvement for some workloads.

-- 
  Vitaly

* Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
  2017-07-13 12:46             ` Vitaly Kuznetsov
@ 2017-07-14 22:26               ` Andy Lutomirski
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-07-14 22:26 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andy Lutomirski, devel, Stephen Hemminger, Jork Loeser,
	Haiyang Zhang, X86 ML, linux-kernel, Steven Rostedt, Ingo Molnar,
	H. Peter Anvin, Thomas Gleixner

On Thu, Jul 13, 2017 at 5:46 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Andy Lutomirski <luto@kernel.org> writes:
>
>> On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>> Andy Lutomirski <luto@kernel.org> writes:
>>>
>>>>
>>>> Also, can you share the benchmark you used for these patches?
>>>
>>> I didn't do much while writing the patchset, mostly I was running the
>>> attached dumb trasher (32 pthreads doing mmap/munmap). On a 16 vCPU
>>> Hyper-V 2016 guest I get the following (just re-did the test with
>>> 4.12-rc1):
>>>
>>> Before the patchset:
>>> # time ./pthread_mmap ./randfile
>>>
>>> real    3m33.118s
>>> user    0m3.698s
>>> sys     3m16.624s
>>>
>>> After the patchset:
>>> # time ./pthread_mmap ./randfile
>>>
>>> real    2m19.920s
>>> user    0m2.662s
>>> sys     2m9.948s
>>>
>>> K. Y.'s guys at Microsoft did additional testing for the patchset on
>>> different Hyper-V deployments including Azure, they may share their
>>> findings too.
>>
>> I ran this benchmark on my big TLB patchset, mainly to make sure I
>> didn't regress your test.  I seem to have sped it up by 30% or so
>> instead.  I need to study this a little bit to figure out why to make
>> sure that the reason isn't that I'm failing to do flushes I need to
>> do.
>
> Got back to this and tested everything on WS2016 Hyper-V guest (24
> vCPUs) with my slightly modified benchmark. The numbers are:
>
> 1) pre-patch:
>
> real    1m15.775s
> user    0m0.850s
> sys     1m31.515s
>
> 2) your 'x86/pcid' series (PCID feature is not passed to the guest so this
> is mainly your lazy tlb optimization):
>
> real    0m55.135s
> user    0m1.168s
> sys     1m3.810s
>
> 3) My 'pv tlb shootdown' patchset on top of your 'x86/pcid' series:
>
> real    0m48.891s
> user    0m1.052s
> sys     0m52.591s
>
> As far as I understand I need to add
> 'setup_clear_cpu_cap(X86_FEATURE_PCID)' to my series to make things work
> properly if this feature appears in the guest.
>
> Other than that there is additional room for optimization:
> tlb_single_page_flush_ceiling, I'm not sure that with Hyper-V's PV the
> default value of 33 is optimal. But the investigation can be done
> separately.
>
> AFAIU with your TLB preparatory work which got into 4.13 our series
> become untangled and can go through different trees. I'll rebase mine
> and send it to K. Y. to push through Greg's char-misc tree.
>
> Is there anything blocking your PCID series from going into 4.14? It
> seems to be a huge improvement for some workloads.

No.  All but one patch should land in 4.13.

It would also be nifty if someone were to augment my work to allow one
CPU to tell another CPU that it just flushed on that CPU's behalf.
Basically, a properly atomic and/or locked operation that finds a
given ctx_id in the remote cpu's cpu_tlbstate and, if tlb_gen <= x,
sets tlb_gen to x.  Some read operations might be useful, too.  This
*might* be doable with cmpxchg16b, but spinlocks would be easier.  The
idea would be for paravirt remote flushes to be able to see, for real,
which remote CPUs need flushes, do the flushes, and then update the
remote tlb_gen to record that they've been done.
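
A minimal user-space sketch of the "if tlb_gen <= x, set tlb_gen to x" step, using C11 atomics rather than kernel primitives. The struct layout, the slot count and the function name are all hypothetical and are not the kernel's actual cpu_tlbstate definition. The sketch also assumes ctx_id does not change while the update runs; closing that race is presumably why a 16-byte cmpxchg over the (ctx_id, tlb_gen) pair, or a spinlock, comes up above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CTX_SLOTS 6 /* hypothetical number of cached contexts per CPU */

/* Hypothetical per-context record: which mm it is and how far it is flushed. */
struct ctx_slot {
    uint64_t ctx_id;
    _Atomic uint64_t tlb_gen;
};

/* Hypothetical stand-in for one CPU's TLB state. */
struct tlb_state {
    struct ctx_slot ctxs[NR_CTX_SLOTS];
};

/*
 * Record that `ctx_id` on the remote CPU has been flushed up to generation
 * `x`. tlb_gen only ever moves forward: if it is already >= x, nothing is
 * written. Returns false if the remote CPU has no slot for `ctx_id`.
 */
static bool remote_note_flush(struct tlb_state *remote, uint64_t ctx_id,
                              uint64_t x)
{
    for (int i = 0; i < NR_CTX_SLOTS; i++) {
        struct ctx_slot *slot = &remote->ctxs[i];

        if (slot->ctx_id != ctx_id)
            continue;

        uint64_t cur = atomic_load(&slot->tlb_gen);
        /* Retry until we install x or observe a value that is already >= x. */
        while (cur < x &&
               !atomic_compare_exchange_weak(&slot->tlb_gen, &cur, x))
            ;
        return true;
    }
    return false;
}
```

The compare-exchange loop makes the update a monotonic maximum, so two flushers racing on the same slot cannot move tlb_gen backwards.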

FWIW, I read the HV TLB docs, and it's entirely unclear to me how it
interacts with PCID or whether PCID is supported at all.  It would be
real nice to get PCID *and* paravirt flush on the major hypervisor
platforms.

--Andy

end of thread, other threads:[~2017-07-14 22:27 UTC | newest]

Thread overview: 22+ messages
2017-05-19 14:09 [PATCH v3 00/10] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 01/10] x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 02/10] x86/hyper-v: stash the max number of virtual/logical processor Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 03/10] x86/hyper-v: make hv_do_hypercall() inline Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 04/10] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
2017-05-21  3:18   ` Andy Lutomirski
2017-05-22 10:44     ` Vitaly Kuznetsov
2017-05-22 22:04       ` Andy Lutomirski
2017-05-19 14:09 ` [PATCH v3 05/10] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 06/10] x86/hyper-v: implement rep hypercalls Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 07/10] hyper-v: globalize vp_index Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush Vitaly Kuznetsov
2017-05-21  3:23   ` Andy Lutomirski
     [not found]     ` <87zie5tbmm.fsf@vitty.brq.redhat.com>
2017-05-22 14:39       ` KY Srinivasan
2017-05-22 18:28       ` Andy Lutomirski
2017-05-23 12:36         ` Vitaly Kuznetsov
2017-05-23 17:50           ` KY Srinivasan
2017-06-27  1:36           ` Andy Lutomirski
2017-07-13 12:46             ` Vitaly Kuznetsov
2017-07-14 22:26               ` Andy Lutomirski
2017-05-19 14:09 ` [PATCH v3 09/10] x86/hyper-v: support extended CPU ranges for TLB flush hypercalls Vitaly Kuznetsov
2017-05-19 14:09 ` [PATCH v3 10/10] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
