* [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements
@ 2017-04-07 11:26 Vitaly Kuznetsov
  2017-04-07 11:26 ` [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline Vitaly Kuznetsov
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:26 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

Hi,

Hyper-V supports hypercalls for local and remote TLB flushing and gives
its guests hints about when using a hypercall is preferred. While doing
hypercalls for local TLB flushes is probably not practical (and is not
suggested by modern Hyper-V versions), remote TLB flushing with a
hypercall brings a significant improvement.

To test the series I wrote a special 'TLB thrasher': on a 16 vCPU guest I
created 32 threads, each doing 100000 mmap/munmap calls on some big file.
Here are the results:

Before:
# time ./pthread_mmap ./randfile 
real	3m44.994s
user	0m3.829s
sys	3m36.323s

After:
# time ./pthread_mmap ./randfile 
real	2m57.145s
user	0m3.797s
sys	2m34.812s
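
The test program itself is not part of this posting. A minimal sketch of
such a thrasher could look like the following; the function and struct
names (thrash_file(), thrash_worker(), struct thrash_args) are made up for
illustration, and the real pthread_mmap may differ in how it maps and
touches the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Per-thread state; all names here are illustrative, not from the series. */
struct thrash_args {
	int fd;
	size_t len;
	int iterations;
	int failed;
};

static void *thrash_worker(void *p)
{
	struct thrash_args *a = p;
	int i;

	for (i = 0; i < a->iterations; i++) {
		void *map = mmap(NULL, a->len, PROT_READ, MAP_SHARED, a->fd, 0);

		if (map == MAP_FAILED) {
			a->failed = 1;
			break;
		}
		/* Touch one page so a translation is actually installed. */
		(void)*(volatile unsigned char *)map;
		/*
		 * All threads share one address space, so unmapping is
		 * expected to trigger TLB flushes on the other CPUs.
		 */
		munmap(map, a->len);
	}
	return NULL;
}

/* Returns 0 on success, -1 on any setup or mapping failure. */
int thrash_file(const char *path, int nthreads, int iterations)
{
	struct thrash_args *args;
	pthread_t *tids;
	struct stat st;
	int fd, i, started = 0, ret = 0;

	assert(nthreads > 0 && iterations > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return -1;
	}

	args = calloc(nthreads, sizeof(*args));
	tids = calloc(nthreads, sizeof(*tids));
	if (!args || !tids) {
		ret = -1;
		goto out;
	}

	for (i = 0; i < nthreads; i++) {
		args[i].fd = fd;
		args[i].len = (size_t)st.st_size;
		args[i].iterations = iterations;
		if (pthread_create(&tids[i], NULL, thrash_worker, &args[i]) != 0) {
			ret = -1;
			break;
		}
		started++;
	}

	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		if (args[i].failed)
			ret = -1;
	}
out:
	free(args);
	free(tids);
	close(fd);
	return ret;
}
```

With this sketch, thrash_file("./randfile", 32, 100000) would roughly
correspond to the run timed above.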

This series also brings a number of small improvements along the way: a
fast hypercall implementation (used for event signaling), a rep hypercall
implementation, and a hyperv tracing subsystem (which only traces the newly
added remote TLB flush for now).

Vitaly Kuznetsov (7):
  x86/hyperv: make hv_do_hypercall() inline
  x86/hyper-v: fast hypercall implementation
  hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT
  x86/hyperv: implement rep hypercalls
  hyper-v: globalize vp_index
  x86/hyper-v: use hypercall for remote TLB flush
  tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()

 MAINTAINERS                        |   1 +
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  90 +++++++++++--------------
 arch/x86/hyperv/mmu.c              | 134 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h    | 131 ++++++++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/hyperv.h |  26 +++++++
 arch/x86/kernel/cpu/mshyperv.c     |   1 +
 drivers/hv/channel_mgmt.c          |  22 +++---
 drivers/hv/connection.c            |   8 ++-
 drivers/hv/hv.c                    |   9 ---
 drivers/hv/hyperv_vmbus.h          |  11 ---
 drivers/hv/vmbus_drv.c             |  17 -----
 include/linux/hyperv.h             |  21 +++---
 include/trace/events/hyperv.h      |  30 +++++++++
 14 files changed, 386 insertions(+), 117 deletions(-)
 create mode 100644 arch/x86/hyperv/mmu.c
 create mode 100644 include/trace/events/hyperv.h

-- 
2.9.3


* [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
@ 2017-04-07 11:26 ` Vitaly Kuznetsov
  2017-04-07 19:38   ` Jork Loeser
  2017-04-07 11:26 ` [PATCH 2/7] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:26 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

We have only three call sites for hv_do_hypercall() and we're going to
change HVCALL_SIGNAL_EVENT to use a fast hypercall, so we can inline this
function as an optimization.

The Hyper-V Top Level Functional Specification states that the r9-r11
registers and flags may be clobbered by the hypervisor during a hypercall.
With inlining this matters, because the compiler may otherwise keep live
values in those registers across the call, so add the clobbers.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       | 54 ++++-------------------------------------
 arch/x86/include/asm/mshyperv.h | 45 ++++++++++++++++++++++++++++++++++
 drivers/hv/connection.c         |  2 ++
 include/linux/hyperv.h          |  1 -
 4 files changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index ecaa3f3..7d961d4 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -98,7 +98,8 @@ static struct clocksource hyperv_cs_msr = {
 	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-static void *hypercall_pg;
+void *hv_hypercall_pg;
+EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 struct clocksource *hyperv_cs;
 EXPORT_SYMBOL_GPL(hyperv_cs);
 
@@ -125,15 +126,15 @@ void hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
-	if (hypercall_pg == NULL) {
+	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		return;
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
-	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hypercall_pg);
+	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
 	/*
@@ -190,51 +191,6 @@ void hyperv_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hyperv_cleanup);
 
-/*
- * hv_do_hypercall- Invoke the specified hypercall
- */
-u64 hv_do_hypercall(u64 control, void *input, void *output)
-{
-	u64 input_address = (input) ? virt_to_phys(input) : 0;
-	u64 output_address = (output) ? virt_to_phys(output) : 0;
-#ifdef CONFIG_X86_64
-	u64 hv_status = 0;
-
-	if (!hypercall_pg)
-		return (u64)ULLONG_MAX;
-
-	__asm__ __volatile__("mov %0, %%r8" : : "r" (output_address) : "r8");
-	__asm__ __volatile__("call *%3" : "=a" (hv_status) :
-			     "c" (control), "d" (input_address),
-			     "m" (hypercall_pg));
-
-	return hv_status;
-
-#else
-
-	u32 control_hi = control >> 32;
-	u32 control_lo = control & 0xFFFFFFFF;
-	u32 hv_status_hi = 1;
-	u32 hv_status_lo = 1;
-	u32 input_address_hi = input_address >> 32;
-	u32 input_address_lo = input_address & 0xFFFFFFFF;
-	u32 output_address_hi = output_address >> 32;
-	u32 output_address_lo = output_address & 0xFFFFFFFF;
-
-	if (!hypercall_pg)
-		return (u64)ULLONG_MAX;
-
-	__asm__ __volatile__ ("call *%8" : "=d"(hv_status_hi),
-			      "=a"(hv_status_lo) : "d" (control_hi),
-			      "a" (control_lo), "b" (input_address_hi),
-			      "c" (input_address_lo), "D"(output_address_hi),
-			      "S"(output_address_lo), "m" (hypercall_pg));
-
-	return hv_status_lo | ((u64)hv_status_hi << 32);
-#endif /* !x86_64 */
-}
-EXPORT_SYMBOL_GPL(hv_do_hypercall);
-
 void hyperv_report_panic(struct pt_regs *regs)
 {
 	static bool panic_reported;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 7c9c895..331e834 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -170,6 +170,51 @@ void hv_remove_crash_handler(void);
 
 #if IS_ENABLED(CONFIG_HYPERV)
 extern struct clocksource *hyperv_cs;
+extern void *hv_hypercall_pg;
+
+static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
+{
+	u64 input_address = (input) ? virt_to_phys(input) : 0;
+	u64 output_address = (output) ? virt_to_phys(output) : 0;
+#ifdef CONFIG_X86_64
+	u64 hv_status;
+
+	if (!hv_hypercall_pg)
+		return (u64)ULLONG_MAX;
+
+	__asm__ __volatile__("mov %3, %%r8\n"
+			     "call *%4"
+			     : "=a" (hv_status)
+			     : "c" (control), "d" (input_address),
+			       "r" (output_address), "m" (hv_hypercall_pg)
+			     : "cc", "r8", "%r9", "%r10", "%r11");
+
+	return hv_status;
+
+#else
+	u32 control_hi = control >> 32;
+	u32 control_lo = control & 0xFFFFFFFF;
+	u32 hv_status_hi;
+	u32 hv_status_lo;
+	u32 input_address_hi = input_address >> 32;
+	u32 input_address_lo = input_address & 0xFFFFFFFF;
+	u32 output_address_hi = output_address >> 32;
+	u32 output_address_lo = output_address & 0xFFFFFFFF;
+
+	if (!hv_hypercall_pg)
+		return (u64)ULLONG_MAX;
+
+	__asm__ __volatile__ ("call *%8"
+			      : "=d"(hv_status_hi), "=a"(hv_status_lo)
+			      : "d" (control_hi), "a" (control_lo),
+				"b" (input_address_hi), "c" (input_address_lo),
+				"D"(output_address_hi), "S"(output_address_lo),
+				"m" (hv_hypercall_pg)
+			      : "cc");
+
+	return hv_status_lo | ((u64)hv_status_hi << 32);
+#endif /* !x86_64 */
+}
 
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index fce27fb..cf77c7f 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -32,6 +32,8 @@
 #include <linux/hyperv.h>
 #include <linux/export.h>
 #include <asm/hyperv.h>
+#include <asm/mshyperv.h>
+
 #include "hyperv_vmbus.h"
 
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index f681f7b..677f084 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1188,7 +1188,6 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 			bool fb_overlap_ok);
 void vmbus_free_mmio(resource_size_t start, resource_size_t size);
 int vmbus_cpu_number_to_vp_number(int cpu_number);
-u64 hv_do_hypercall(u64 control, void *input, void *output);
 
 /*
  * GUID definitions of various offer types - services offered to the guest.
-- 
2.9.3


* [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
  2017-04-07 11:26 ` [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline Vitaly Kuznetsov
@ 2017-04-07 11:26 ` Vitaly Kuznetsov
  2017-04-07 19:42   ` Jork Loeser
  2017-04-08 15:18   ` KY Srinivasan
  2017-04-07 11:26 ` [PATCH 3/7] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:26 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

Hyper-V supports 'fast' hypercalls, in which all parameters are passed
through registers. Implement an inline version of the simplest of these
calls: a hypercall with one 8-byte input and no output.

A proper definition of the hypercall input interface (union
hv_hypercall_input) is added as well.
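
To illustrate the layout of the 64-bit hypercall control word, here is a
userspace sketch that builds the same value with explicit shifts instead
of bitfields. It assumes the bit positions implied by the bitfield order
in the diff below (code in bits 0-15, fast in bit 16, rep_count in bits
32-43, rep_start in bits 48-59); hv_control_word() is a hypothetical
helper, not something the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compose a hypercall control word by hand.
 * varhead_size and the reserved fields are left at zero.
 */
uint64_t hv_control_word(uint16_t code, int fast, unsigned int rep_count,
			 unsigned int rep_start)
{
	/* rep_count and rep_start are 12-bit fields. */
	assert(rep_count <= 0xfff && rep_start <= 0xfff);

	return (uint64_t)code |
	       ((uint64_t)(fast ? 1 : 0) << 16) |
	       ((uint64_t)rep_count << 32) |
	       ((uint64_t)rep_start << 48);
}
```

For a fast call only the code and the fast bit are set, so an example
call code of 0x5d yields 0x1005d.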

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/mshyperv.h    | 37 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 331e834..9a5f58b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -216,6 +216,43 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 #endif /* !x86_64 */
 }
 
+/* Fast hypercall with 8 bytes of input and no output */
+static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
+{
+	union hv_hypercall_input control = {0};
+
+	control.code = code;
+	control.fast = 1;
+#ifdef CONFIG_X86_64
+	{
+		u64 hv_status;
+
+		__asm__ __volatile__("call *%3"
+				     : "=a" (hv_status)
+				     : "c" (control.as_uint64), "d" (input1),
+				       "m" (hv_hypercall_pg)
+				     : "cc", "r8", "%r9", "%r10", "%r11");
+		return hv_status;
+	}
+#else
+	{
+		u32 hv_status_hi, hv_status_lo;
+
+		__asm__ __volatile__ ("call *%6"
+				      : "=d"(hv_status_hi),
+					"=a"(hv_status_lo) :
+					"d" (control.as_uint32_hi),
+					"a" (control.as_uint32_lo),
+					"c" ((u32)input1),
+					"b" ((u32)(input1 >> 32)),
+					"m" (hv_hypercall_pg)
+				      : "cc");
+
+		return hv_status_lo | ((u64)hv_status_hi << 32);
+	}
+#endif
+}
+
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 432df4b..c87e900 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -256,6 +256,25 @@
 #define HV_PROCESSOR_POWER_STATE_C2		2
 #define HV_PROCESSOR_POWER_STATE_C3		3
 
+/* Hypercall interface */
+union hv_hypercall_input {
+	__u64 as_uint64;
+	struct {
+		__u32 as_uint32_lo;
+		__u32 as_uint32_hi;
+	};
+	struct {
+		__u64 code:16;
+		__u64 fast:1;
+		__u64 varhead_size:10;
+		__u64 reserved1:5;
+		__u64 rep_count:12;
+		__u64 reserved2:4;
+		__u64 rep_start:12;
+		__u64 reserved3:4;
+	};
+};
+
 /* hypercall status code */
 #define HV_STATUS_SUCCESS			0
 #define HV_STATUS_INVALID_HYPERCALL_CODE	2
-- 
2.9.3


* [PATCH 3/7] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
  2017-04-07 11:26 ` [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline Vitaly Kuznetsov
  2017-04-07 11:26 ` [PATCH 2/7] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
@ 2017-04-07 11:26 ` Vitaly Kuznetsov
  2017-04-07 11:26 ` [PATCH 4/7] x86/hyperv: implement rep hypercalls Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:26 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

We need to pass only 8 bytes of input for HvSignalEvent, which makes it a
perfect fit for a fast hypercall. hv_input_signal_event_buffer is no longer
needed, and hv_input_signal_event is converted to a union for convenience.
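
As a rough illustration of why the union is convenient, the sketch below
packs the three fields into a single 64-bit value that can be handed to a
fast hypercall in a register. It is a simplified userspace stand-in: the
connection ID is modelled as a plain 32-bit integer rather than union
hv_connection_id, the names are hypothetical, and the resulting value
assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the union introduced by this patch. */
union signal_event_sketch {
	uint64_t as_uint64;
	struct {
		uint32_t connection_id;
		uint16_t flag_number;
		uint16_t rsvdz;
	};
};

/* The whole input must fit the single 8-byte register argument. */
_Static_assert(sizeof(union signal_event_sketch) == 8,
	       "signal event input must be exactly 8 bytes");

/* Hypothetical helper: fill in the fields and return the raw 64-bit view. */
uint64_t pack_signal_event(uint32_t connection_id, uint16_t flag_number)
{
	union signal_event_sketch ev = { .as_uint64 = 0 };

	ev.connection_id = connection_id;
	ev.flag_number = flag_number;
	ev.rsvdz = 0;

	return ev.as_uint64;
}
```

On little-endian hardware the connection ID ends up in the low 32 bits and
the flag number in bits 32-47.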

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/channel_mgmt.c | 15 +++++----------
 drivers/hv/connection.c   |  3 ++-
 include/linux/hyperv.h    | 19 ++++++++-----------
 3 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 735f936..6cfa297 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -804,20 +804,15 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	/*
 	 * Setup state for signalling the host.
 	 */
-	newchannel->sig_event = (struct hv_input_signal_event *)
-				(ALIGN((unsigned long)
-				&newchannel->sig_buf,
-				HV_HYPERCALL_PARAM_ALIGN));
-
-	newchannel->sig_event->connectionid.asu32 = 0;
-	newchannel->sig_event->connectionid.u.id = VMBUS_EVENT_CONNECTION_ID;
-	newchannel->sig_event->flag_number = 0;
-	newchannel->sig_event->rsvdz = 0;
+	newchannel->sig_event.connectionid.asu32 = 0;
+	newchannel->sig_event.connectionid.u.id = VMBUS_EVENT_CONNECTION_ID;
+	newchannel->sig_event.flag_number = 0;
+	newchannel->sig_event.rsvdz = 0;
 
 	if (vmbus_proto_version != VERSION_WS2008) {
 		newchannel->is_dedicated_interrupt =
 				(offer->is_dedicated_interrupt != 0);
-		newchannel->sig_event->connectionid.u.id =
+		newchannel->sig_event.connectionid.u.id =
 				offer->connection_id;
 	}
 
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index cf77c7f..545f2a4 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -405,6 +405,7 @@ void vmbus_set_event(struct vmbus_channel *channel)
 	if (!channel->is_dedicated_interrupt)
 		vmbus_send_interrupt(child_relid);
 
-	hv_do_hypercall(HVCALL_SIGNAL_EVENT, channel->sig_event, NULL);
+	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT,
+			      channel->sig_event.as_uint64);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 677f084..5d6777c 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -678,15 +678,13 @@ union hv_connection_id {
 };
 
 /* Definition of the hv_signal_event hypercall input structure. */
-struct hv_input_signal_event {
-	union hv_connection_id connectionid;
-	u16 flag_number;
-	u16 rsvdz;
-};
-
-struct hv_input_signal_event_buffer {
-	u64 align8;
-	struct hv_input_signal_event event;
+union hv_input_signal_event {
+	u64 as_uint64;
+	struct {
+		union hv_connection_id connectionid;
+		u16 flag_number;
+		u16 rsvdz;
+	};
 };
 
 enum hv_numa_policy {
@@ -771,8 +769,7 @@ struct vmbus_channel {
 	} callback_mode;
 
 	bool is_dedicated_interrupt;
-	struct hv_input_signal_event_buffer sig_buf;
-	struct hv_input_signal_event *sig_event;
+	union hv_input_signal_event sig_event;
 
 	/*
 	 * Starting with win8, this field will be used to specify
-- 
2.9.3


* [PATCH 4/7] x86/hyperv: implement rep hypercalls
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2017-04-07 11:26 ` [PATCH 3/7] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT Vitaly Kuznetsov
@ 2017-04-07 11:26 ` Vitaly Kuznetsov
  2017-04-07 19:48   ` Jork Loeser
  2017-04-07 11:26 ` [PATCH 5/7] hyper-v: globalize vp_index Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:26 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

Rep hypercalls are normal hypercalls which perform multiple actions at
once. Hyper-V guarantees to return execution to the caller in not more
than 50us, so the caller needs to use hypercall continuation to finish
the remaining repetitions. Touch the NMI watchdog between hypercall
invocations.

This is going to be used for the HvFlushVirtualAddressList hypercall for
remote TLB flushing.
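
The continuation logic can be tried out in userspace against a fake
hypervisor. In the sketch below, struct fake_hv and fake_rep_call() are
invented for illustration: the fake completes at most 'budget'
repetitions per call and reports the number of completed repetitions in
bits 32-43 of the status, which is the field the loop in this patch reads
back into rep_start.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STATUS_SUCCESS 0u	/* same value as HV_STATUS_SUCCESS */

/* Hypothetical stand-in for the hypervisor side of a rep hypercall. */
struct fake_hv {
	unsigned int budget;	/* max repetitions completed per call */
	unsigned int calls;	/* how many times we were entered */
};

static uint64_t fake_rep_call(struct fake_hv *hv, unsigned int rep_count,
			      unsigned int rep_start)
{
	unsigned int done = rep_start + hv->budget;

	if (done > rep_count)
		done = rep_count;
	hv->calls++;

	/* Result code in bits 0-15, repetitions completed in bits 32-43. */
	return ((uint64_t)done << 32) | SKETCH_STATUS_SUCCESS;
}

/* Re-issue the call until every repetition is done; returns reps completed. */
unsigned int run_rep_call(struct fake_hv *hv, unsigned int rep_count)
{
	unsigned int rep_start = 0;
	uint64_t status;

	assert(hv->budget > 0);
	assert(rep_count > 0 && rep_count <= 0xfff);

	do {
		status = fake_rep_call(hv, rep_count, rep_start);
		if ((status & 0xffff) != SKETCH_STATUS_SUCCESS)
			return rep_start;

		/* Continue from where the previous invocation stopped. */
		rep_start = (status >> 32) & 0xfff;
	} while (rep_start < rep_count);

	return rep_start;
}
```

With a budget of 5 and 12 repetitions requested, the loop enters the fake
hypervisor three times (5, 10, then 12 repetitions completed).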

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/mshyperv.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 9a5f58b..a2c996b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 #include <linux/interrupt.h>
 #include <linux/clocksource.h>
+#include <linux/nmi.h>
 #include <asm/hyperv.h>
 
 /*
@@ -253,6 +254,26 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 #endif
 }
 
+static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, void *input,
+				      void *output)
+{
+	union hv_hypercall_input hc_input = { .code = code,
+					      .rep_count = rep_count};
+	u64 status;
+
+	do {
+		status = hv_do_hypercall(hc_input.as_uint64, input, output);
+		if ((status & 0xffff) != HV_STATUS_SUCCESS)
+			return status;
+
+		hc_input.rep_start = (status >> 32) & 0xfff;
+
+		touch_nmi_watchdog();
+	} while (hc_input.rep_start < hc_input.rep_count);
+
+	return status;
+}
+
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
-- 
2.9.3


* [PATCH 5/7] hyper-v: globalize vp_index
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2017-04-07 11:26 ` [PATCH 4/7] x86/hyperv: implement rep hypercalls Vitaly Kuznetsov
@ 2017-04-07 11:26 ` Vitaly Kuznetsov
  2017-04-08 15:41   ` KY Srinivasan
  2017-04-07 11:27 ` [PATCH 6/7] x86/hyper-v: use hypercall for remote TLB flush Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:26 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

To support implementing remote TLB flushing on Hyper-V with a hypercall
we need to make vp_index available outside of the vmbus module. Rename
and globalize it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       | 34 +++++++++++++++++++++++++++++++++-
 arch/x86/include/asm/mshyperv.h | 26 ++++++++++++++++++++++++++
 drivers/hv/channel_mgmt.c       |  7 +++----
 drivers/hv/connection.c         |  3 ++-
 drivers/hv/hv.c                 |  9 ---------
 drivers/hv/hyperv_vmbus.h       | 11 -----------
 drivers/hv/vmbus_drv.c          | 17 -----------------
 include/linux/hyperv.h          |  1 -
 8 files changed, 64 insertions(+), 44 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7d961d4..1c14088 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -26,6 +26,8 @@
 #include <linux/mm.h>
 #include <linux/clockchips.h>
 #include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <linux/cpuhotplug.h>
 
 #ifdef CONFIG_X86_64
 
@@ -103,6 +105,20 @@ EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 struct clocksource *hyperv_cs;
 EXPORT_SYMBOL_GPL(hyperv_cs);
 
+u32 *hv_vp_index;
+EXPORT_SYMBOL_GPL(hv_vp_index);
+
+static int hv_cpu_init(unsigned int cpu)
+{
+	u64 msr_vp_index;
+
+	hv_get_vp_index(msr_vp_index);
+
+	hv_vp_index[smp_processor_id()] = (u32)msr_vp_index;
+
+	return 0;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -118,6 +134,16 @@ void hyperv_init(void)
 	if (x86_hyper != &x86_hyper_ms_hyperv)
 		return;
 
+	/* Allocate percpu VP index */
+	hv_vp_index = kcalloc(num_possible_cpus(), sizeof(*hv_vp_index),
+			      GFP_KERNEL);
+	if (!hv_vp_index)
+		return;
+
+	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
+			      hv_cpu_init, NULL) < 0)
+		goto free_vp_index;
+
 	/*
 	 * Setup the hypercall page and enable hypercalls.
 	 * 1. Register the guest ID
@@ -129,7 +155,7 @@ void hyperv_init(void)
 	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-		return;
+		goto free_vp_index;
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -169,6 +195,12 @@ void hyperv_init(void)
 	hyperv_cs = &hyperv_cs_msr;
 	if (ms_hyperv.features & HV_X64_MSR_TIME_REF_COUNT_AVAILABLE)
 		clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
+
+	return;
+
+free_vp_index:
+	kfree(hv_vp_index);
+	hv_vp_index = NULL;
 }
 
 /*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index a2c996b..1293c84 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -274,6 +274,32 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, void *input,
 	return status;
 }
 
+/*
+ * Hypervisor's notion of virtual processor ID is different from
+ * Linux' notion of CPU ID. This information can only be retrieved
+ * in the context of the calling CPU. Setup a map for easy access
+ * to this information.
+ */
+extern u32 *hv_vp_index;
+
+/**
+ * vmbus_cpu_number_to_vp_number() - Map CPU to VP.
+ * @cpu_number: CPU number in Linux terms
+ *
+ * This function returns the mapping between the Linux processor
+ * number and the hypervisor's virtual processor number, useful
+ * in making hypercalls and such that talk about specific
+ * processors.
+ *
+ * Return: Virtual processor number in Hyper-V terms
+ */
+static inline int vmbus_cpu_number_to_vp_number(int cpu_number)
+{
+	WARN_ON(hv_vp_index[cpu_number] == -1);
+
+	return hv_vp_index[cpu_number];
+}
+
 void hyperv_init(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 6cfa297..9969c82 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -599,7 +599,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 		 */
 		channel->numa_node = 0;
 		channel->target_cpu = 0;
-		channel->target_vp = hv_context.vp_index[0];
+		channel->target_vp = vmbus_cpu_number_to_vp_number(0);
 		return;
 	}
 
@@ -683,7 +683,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 	}
 
 	channel->target_cpu = cur_cpu;
-	channel->target_vp = hv_context.vp_index[cur_cpu];
+	channel->target_vp = vmbus_cpu_number_to_vp_number(cur_cpu);
 }
 
 static void vmbus_wait_for_unload(void)
@@ -1187,8 +1187,7 @@ struct vmbus_channel *vmbus_get_outgoing_channel(struct vmbus_channel *primary)
 		return outgoing_channel;
 	}
 
-	cur_cpu = hv_context.vp_index[get_cpu()];
-	put_cpu();
+	cur_cpu = vmbus_cpu_number_to_vp_number(smp_processor_id());
 	list_for_each_safe(cur, tmp, &primary->sc_list) {
 		cur_channel = list_entry(cur, struct vmbus_channel, sc_list);
 		if (cur_channel->state != CHANNEL_OPENED_STATE)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 545f2a4..7026d13 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -96,7 +96,8 @@ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
 	 * the CPU attempting to connect may not be CPU 0.
 	 */
 	if (version >= VERSION_WIN8_1)
-		msg->target_vcpu = hv_context.vp_index[smp_processor_id()];
+		msg->target_vcpu =
+			vmbus_cpu_number_to_vp_number(smp_processor_id());
 	else
 		msg->target_vcpu = 0;
 
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 12e7bae..7e67ef4 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -229,7 +229,6 @@ int hv_synic_init(unsigned int cpu)
 	union hv_synic_siefp siefp;
 	union hv_synic_sint shared_sint;
 	union hv_synic_scontrol sctrl;
-	u64 vp_index;
 
 	/* Setup the Synic's message page */
 	hv_get_simp(simp.as_uint64);
@@ -271,14 +270,6 @@ int hv_synic_init(unsigned int cpu)
 	hv_context.synic_initialized = true;
 
 	/*
-	 * Setup the mapping between Hyper-V's notion
-	 * of cpuid and Linux' notion of cpuid.
-	 * This array will be indexed using Linux cpuid.
-	 */
-	hv_get_vp_index(vp_index);
-	hv_context.vp_index[cpu] = (u32)vp_index;
-
-	/*
 	 * Register the per-cpu clockevent source.
 	 */
 	if (ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 6113e91..d624526 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -229,17 +229,6 @@ struct hv_context {
 	struct hv_per_cpu_context __percpu *cpu_context;
 
 	/*
-	 * Hypervisor's notion of virtual processor ID is different from
-	 * Linux' notion of CPU ID. This information can only be retrieved
-	 * in the context of the calling CPU. Setup a map for easy access
-	 * to this information:
-	 *
-	 * vp_index[a] is the Hyper-V's processor ID corresponding to
-	 * Linux cpuid 'a'.
-	 */
-	u32 vp_index[NR_CPUS];
-
-	/*
 	 * To manage allocations in a NUMA node.
 	 * Array indexed by numa node ID.
 	 */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0087b49..63e743d 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1455,23 +1455,6 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size)
 }
 EXPORT_SYMBOL_GPL(vmbus_free_mmio);
 
-/**
- * vmbus_cpu_number_to_vp_number() - Map CPU to VP.
- * @cpu_number: CPU number in Linux terms
- *
- * This function returns the mapping between the Linux processor
- * number and the hypervisor's virtual processor number, useful
- * in making hypercalls and such that talk about specific
- * processors.
- *
- * Return: Virtual processor number in Hyper-V terms
- */
-int vmbus_cpu_number_to_vp_number(int cpu_number)
-{
-	return hv_context.vp_index[cpu_number];
-}
-EXPORT_SYMBOL_GPL(vmbus_cpu_number_to_vp_number);
-
 static int vmbus_acpi_add(struct acpi_device *device)
 {
 	acpi_status result;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 5d6777c..2450e07 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1184,7 +1184,6 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 			resource_size_t size, resource_size_t align,
 			bool fb_overlap_ok);
 void vmbus_free_mmio(resource_size_t start, resource_size_t size);
-int vmbus_cpu_number_to_vp_number(int cpu_number);
 
 /*
  * GUID definitions of various offer types - services offered to the guest.
-- 
2.9.3


* [PATCH 6/7] x86/hyper-v: use hypercall for remote TLB flush
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2017-04-07 11:26 ` [PATCH 5/7] hyper-v: globalize vp_index Vitaly Kuznetsov
@ 2017-04-07 11:27 ` Vitaly Kuznetsov
  2017-04-07 20:46   ` Jork Loeser
  2017-04-08 16:47   ` KY Srinivasan
  2017-04-07 11:27 ` [PATCH 7/7] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
  2017-04-08 14:57 ` [PATCH 0/7] Hyper-V: paravirtualized remote TLB flushing and hypercall improvements KY Srinivasan
  7 siblings, 2 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:27 UTC
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

A Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
this is supposed to work faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory, and we don't really want to
allocate memory on each call, so we pre-allocate per-cpu memory areas on
boot. These areas are of fixed size; limit them to an arbitrary number of
16 gva entries (16 entries are able to specify 16 * 4096 pages).
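
To make the gva encoding concrete, here is a userspace sketch of how a
range is split into list entries. It mirrors the arithmetic used in
hyperv_flush_tlb_others() in this patch, assuming 4096-byte pages and a
page-aligned start; fill_gva_list() and the SK_* names are hypothetical.
Each entry holds a page-aligned address, and its lower 12 bits hold the
number of additional pages to flush, so one entry covers at most 4096
pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT		12
#define SK_PAGE_SIZE		(1ULL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK		(~(SK_PAGE_SIZE - 1))
#define SK_PAGES_PER_ENTRY	4096ULL	/* base page + 4095 additional */

/*
 * Hypothetical helper: fill up to 'max' entries describing [*cur, end).
 * On return *cur points past the range covered so far, so the caller
 * can issue another hypercall if *cur < end.
 */
size_t fill_gva_list(uint64_t *cur, uint64_t end, uint64_t *list, size_t max)
{
	const uint64_t span = SK_PAGE_SIZE * SK_PAGES_PER_ENTRY;
	size_t n = 0;

	assert(max > 0 && *cur < end);

	do {
		list[n] = *cur & SK_PAGE_MASK;
		if (end >= *cur + span)
			/* Full entry: 4095 additional pages. */
			list[n] |= ~SK_PAGE_MASK;
		else if (end > *cur)
			/* Partial entry: pages left after the base page. */
			list[n] |= (end - *cur - 1) >> SK_PAGE_SHIFT;

		*cur += span;
		n++;
	} while (*cur < end && n < max);

	return n;
}
```

Flushing the two pages at 0x1000-0x2fff produces the single entry 0x1001,
while 16 full entries cover 16 * 4096 pages (256MB), after which the
caller has to loop.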

pv_ops patching happens very early, so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too, and there is
even a hint for that. However, I don't see any room for optimization on
the host side, as both a hypercall and a native TLB flush will result in a
vmexit. The hint is also not set on modern Hyper-V versions.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |   2 +
 arch/x86/hyperv/mmu.c              | 128 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h    |   2 +
 arch/x86/include/uapi/asm/hyperv.h |   7 ++
 arch/x86/kernel/cpu/mshyperv.c     |   1 +
 6 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/mmu.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 171ae09..367a820 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y		:= hv_init.o
+obj-y		:= hv_init.o mmu.o
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1c14088..2cf8a98 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -163,6 +163,8 @@ void hyperv_init(void)
 	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
+	hyper_alloc_mmu();
+
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
new file mode 100644
index 0000000..fb487cb
--- /dev/null
+++ b/arch/x86/hyperv/mmu.c
@@ -0,0 +1,128 @@
+#include <linux/types.h>
+#include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <asm/mshyperv.h>
+#include <asm/tlbflush.h>
+#include <asm/msr.h>
+#include <asm/fpu/api.h>
+
+/*
+ * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
+ * flush hypercalls and we need to pick a size. '16' means we'll be able
+ * to flush 16 * 4096 pages (256MB) with one hypercall.
+ */
+#define HV_MMU_MAX_GVAS 16
+
+/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
+struct hv_flush_pcpu {
+	struct {
+		__u64 address_space;
+		__u64 flags;
+		__u64 processor_mask;
+		__u64 gva_list[HV_MMU_MAX_GVAS];
+	} flush;
+
+	spinlock_t lock;
+};
+
+static struct hv_flush_pcpu __percpu *pcpu_flush;
+
+static void hyperv_flush_tlb_others(const struct cpumask *cpus,
+				    struct mm_struct *mm, unsigned long start,
+				    unsigned long end)
+{
+	struct hv_flush_pcpu *flush;
+	unsigned long cur, flags;
+	u64 status = -1ULL;
+	int cpu, vcpu, gva_n;
+
+	if (!pcpu_flush || !hv_hypercall_pg)
+		goto do_native;
+
+	if (cpumask_empty(cpus))
+		return;
+
+	flush = this_cpu_ptr(pcpu_flush);
+	spin_lock_irqsave(&flush->lock, flags);
+
+	flush->flush.address_space = virt_to_phys(mm->pgd);
+	flush->flush.processor_mask = 0;
+	if (cpumask_equal(cpus, cpu_present_mask)) {
+		flush->flush.flags = HV_FLUSH_ALL_PROCESSORS;
+	} else {
+		flush->flush.flags = 0;
+		for_each_cpu(cpu, cpus) {
+			vcpu = vmbus_cpu_number_to_vp_number(cpu);
+			if (vcpu != -1 && vcpu < 64)
+				flush->flush.processor_mask |= 1ULL << vcpu;
+			else
+				goto unlock_do_native;
+		}
+	}
+
+	if (end == TLB_FLUSH_ALL) {
+		flush->flush.flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+		status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+					 &flush->flush, NULL);
+	} else {
+		cur = start;
+more_gvas:
+		gva_n = 0;
+
+		do {
+			flush->flush.gva_list[gva_n] = cur & PAGE_MASK;
+			/*
+			 * Lower 12 bits encode the number of additional
+			 * pages to flush (in addition to the 'cur' page).
+			 */
+			if (end >= cur + PAGE_SIZE * PAGE_SIZE)
+				flush->flush.gva_list[gva_n] |= ~PAGE_MASK;
+			else if (end > cur)
+				flush->flush.gva_list[gva_n] |=
+					(end - cur - 1) >> PAGE_SHIFT;
+
+			cur += PAGE_SIZE * PAGE_SIZE;
+			++gva_n;
+
+		} while (cur < end && gva_n < HV_MMU_MAX_GVAS);
+
+		status = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
+					     gva_n, &flush->flush, NULL);
+
+		if (!(status & 0xffff) && cur < end)
+			goto more_gvas;
+	}
+
+unlock_do_native:
+	spin_unlock_irqrestore(&flush->lock, flags);
+
+	if (!(status & 0xffff))
+		return;
+do_native:
+	native_flush_tlb_others(cpus, mm, start, end);
+}
+
+void hyperv_setup_mmu_ops(void)
+{
+	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
+		pr_info("Hyper-V: Using hypercall for remote TLB flush\n");
+		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
+	}
+}
+
+void hyper_alloc_mmu(void)
+{
+	int cpu;
+	struct hv_flush_pcpu *flush;
+
+	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
+		pcpu_flush = alloc_percpu(struct hv_flush_pcpu);
+		if (!pcpu_flush)
+			return;
+
+		for_each_possible_cpu(cpu) {
+			flush = per_cpu_ptr(pcpu_flush, cpu);
+			spin_lock_init(&flush->lock);
+		}
+	}
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 1293c84..a5041c3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -301,6 +301,8 @@ static inline int vmbus_cpu_number_to_vp_number(int cpu_number)
 }
 
 void hyperv_init(void);
+void hyperv_setup_mmu_ops(void);
+void hyper_alloc_mmu(void);
 void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index c87e900..3d44036 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -239,6 +239,8 @@
 		(~((1ull << HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
 
 /* Declare the various hypercall operations. */
+#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
+#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
 #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
 #define HVCALL_POST_MESSAGE			0x005c
 #define HVCALL_SIGNAL_EVENT			0x005d
@@ -256,6 +258,11 @@
 #define HV_PROCESSOR_POWER_STATE_C2		2
 #define HV_PROCESSOR_POWER_STATE_C3		3
 
+#define HV_FLUSH_ALL_PROCESSORS			0x00000001
+#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	0x00000002
+#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	0x00000004
+#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	0x00000008
+
 /* Hypercall interface */
 union hv_hypercall_input {
 	u64 as_uint64;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 04cb8d3..fc228d8 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -233,6 +233,7 @@ static void __init ms_hyperv_init_platform(void)
 	 * Setup the hook to get control post apic initialization.
 	 */
 	x86_platform.apic_post_init = hyperv_init;
+	hyperv_setup_mmu_ops();
 #endif
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/7] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2017-04-07 11:27 ` [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush Vitaly Kuznetsov
@ 2017-04-07 11:27 ` Vitaly Kuznetsov
  2017-04-07 14:38   ` Steven Rostedt
  2017-04-08 14:57 ` [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements KY Srinivasan
  7 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-07 11:27 UTC (permalink / raw)
  To: devel, x86
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

Add a Hyper-V tracing subsystem and trace hyperv_mmu_flush_tlb_others().
Tracing is done the same way as for xen_mmu_flush_tlb_others().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 MAINTAINERS                   |  1 +
 arch/x86/hyperv/mmu.c         |  6 ++++++
 include/trace/events/hyperv.h | 30 ++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)
 create mode 100644 include/trace/events/hyperv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 819d5e8..9785d98 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6057,6 +6057,7 @@ F:	drivers/scsi/storvsc_drv.c
 F:	drivers/uio/uio_hv_generic.c
 F:	drivers/video/fbdev/hyperv_fb.c
 F:	include/linux/hyperv.h
+F:	include/trace/events/hyperv.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index fb487cb..61f2a5b 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -5,6 +5,10 @@
 #include <asm/tlbflush.h>
 #include <asm/msr.h>
 #include <asm/fpu/api.h>
+#include <trace/events/hyperv.h>
+
+#define CREATE_TRACE_POINTS
+DEFINE_TRACE(hyperv_mmu_flush_tlb_others);
 
 /*
  * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
@@ -36,6 +40,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 	u64 status = -1ULL;
 	int cpu, vcpu, gva_n;
 
+	trace_hyperv_mmu_flush_tlb_others(cpus, mm, start, end);
+
 	if (!pcpu_flush || !hv_hypercall_pg)
 		goto do_native;
 
diff --git a/include/trace/events/hyperv.h b/include/trace/events/hyperv.h
new file mode 100644
index 0000000..e37e72d
--- /dev/null
+++ b/include/trace/events/hyperv.h
@@ -0,0 +1,30 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hyperv
+
+#if !defined(_TRACE_HYPERV_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HYPERV_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(hyperv_mmu_flush_tlb_others,
+	    TP_PROTO(const struct cpumask *cpus, struct mm_struct *mm,
+		     unsigned long addr, unsigned long end),
+	    TP_ARGS(cpus, mm, addr, end),
+	    TP_STRUCT__entry(
+		    __field(unsigned int, ncpus)
+		    __field(struct mm_struct *, mm)
+		    __field(unsigned long, addr)
+		    __field(unsigned long, end)
+		    ),
+	    TP_fast_assign(__entry->ncpus = cpumask_weight(cpus);
+			   __entry->mm = mm;
+			   __entry->addr = addr;
+			   __entry->end = end),
+	    TP_printk("ncpus %d mm %p addr %lx, end %lx",
+		      __entry->ncpus, __entry->mm, __entry->addr, __entry->end)
+	);
+
+#endif /* _TRACE_HYPERV_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()
  2017-04-07 11:27 ` [PATCH 7/7] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
@ 2017-04-07 14:38   ` Steven Rostedt
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2017-04-07 14:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: devel, x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jork Loeser

On Fri,  7 Apr 2017 13:27:01 +0200
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Add Hyper-V tracing subsystem and trace hyperv_mmu_flush_tlb_others().
> Tracing is done the same way we do xen_mmu_flush_tlb_others().
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  MAINTAINERS                   |  1 +
>  arch/x86/hyperv/mmu.c         |  6 ++++++
>  include/trace/events/hyperv.h | 30 ++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
>  create mode 100644 include/trace/events/hyperv.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 819d5e8..9785d98 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6057,6 +6057,7 @@ F:	drivers/scsi/storvsc_drv.c
>  F:	drivers/uio/uio_hv_generic.c
>  F:	drivers/video/fbdev/hyperv_fb.c
>  F:	include/linux/hyperv.h
> +F:	include/trace/events/hyperv.h
>  F:	tools/hv/
>  F:	Documentation/ABI/stable/sysfs-bus-vmbus
>  
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index fb487cb..61f2a5b 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -5,6 +5,10 @@
>  #include <asm/tlbflush.h>
>  #include <asm/msr.h>
>  #include <asm/fpu/api.h>
> +#include <trace/events/hyperv.h>
> +
> +#define CREATE_TRACE_POINTS
> +DEFINE_TRACE(hyperv_mmu_flush_tlb_others);
>  
>  /*
>   * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
> @@ -36,6 +40,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>  	u64 status = -1ULL;
>  	int cpu, vcpu, gva_n;
>  
> +	trace_hyperv_mmu_flush_tlb_others(cpus, mm, start, end);
> +
>  	if (!pcpu_flush || !hv_hypercall_pg)
>  		goto do_native;
>  
> diff --git a/include/trace/events/hyperv.h b/include/trace/events/hyperv.h
> new file mode 100644
> index 0000000..e37e72d
> --- /dev/null
> +++ b/include/trace/events/hyperv.h

Since this is architecture specific code, can you keep the header file
in arch/x86?

You can see how to do this in the samples/trace_events/ directory and
see other examples in x86 like arch/x86/kvm/trace.h and
arch/x86/include/asm/trace/mpx.h.

Thanks,

-- Steve

> @@ -0,0 +1,30 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM hyperv
> +
> +#if !defined(_TRACE_HYPERV_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HYPERV_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(hyperv_mmu_flush_tlb_others,
> +	    TP_PROTO(const struct cpumask *cpus, struct mm_struct *mm,
> +		     unsigned long addr, unsigned long end),
> +	    TP_ARGS(cpus, mm, addr, end),
> +	    TP_STRUCT__entry(
> +		    __field(unsigned int, ncpus)
> +		    __field(struct mm_struct *, mm)
> +		    __field(unsigned long, addr)
> +		    __field(unsigned long, end)
> +		    ),
> +	    TP_fast_assign(__entry->ncpus = cpumask_weight(cpus);
> +			   __entry->mm = mm;
> +			   __entry->addr = addr,
> +			   __entry->end = end),
> +	    TP_printk("ncpus %d mm %p addr %lx, end %lx",
> +		      __entry->ncpus, __entry->mm, __entry->addr, __entry->end)
> +	);
> +
> +#endif /* _TRACE_HYPERV_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline
  2017-04-07 11:26 ` [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline Vitaly Kuznetsov
@ 2017-04-07 19:38   ` Jork Loeser
  0 siblings, 0 replies; 26+ messages in thread
From: Jork Loeser @ 2017-04-07 19:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 04:27
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline

> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 7c9c895..331e834 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -170,6 +170,51 @@ void hv_remove_crash_handler(void);
> 
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern struct clocksource *hyperv_cs;
> +extern void *hv_hypercall_pg;
> +
> +static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> +{
> +	u64 input_address = (input) ? virt_to_phys(input) : 0;
> +	u64 output_address = (output) ? virt_to_phys(output) : 0;
> +#ifdef CONFIG_X86_64
> +	u64 hv_status;
> +
> +	if (!hv_hypercall_pg)
> +		return (u64)ULLONG_MAX;
> +
> +	__asm__ __volatile__("mov %3, %%r8\n"
> +			     "call *%4"
> +			     : "=a" (hv_status)
> +			     : "c" (control), "d" (input_address),
> +			       "r" (output_address), "m" (hv_hypercall_pg)
> +			     : "cc", "r8", "%r9", "%r10", "%r11");
Is clobbering memory required here?


> +
> +	return hv_status;
> +
> +#else
> +	u32 control_hi = control >> 32;
> +	u32 control_lo = control & 0xFFFFFFFF;
> +	u32 hv_status_hi;
> +	u32 hv_status_lo;
> +	u32 input_address_hi = input_address >> 32;
> +	u32 input_address_lo = input_address & 0xFFFFFFFF;
> +	u32 output_address_hi = output_address >> 32;
> +	u32 output_address_lo = output_address & 0xFFFFFFFF;
> +
> +	if (!hv_hypercall_pg)
> +		return (u64)ULLONG_MAX;
> +
> +	__asm__ __volatile__ ("call *%8"
> +			      : "=d"(hv_status_hi), "=a"(hv_status_lo)
> +			      : "d" (control_hi), "a" (control_lo),
> +				"b" (input_address_hi), "c" (input_address_lo),
> +				"D"(output_address_hi), "S"(output_address_lo),
> +				"m" (hv_hypercall_pg)
> +			      : "cc");

Please clobber the ECX register on the 32-bit x86 path as well, e.g. by passing it as an output with "+". Please also clobber memory.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-07 11:26 ` [PATCH 2/7] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
@ 2017-04-07 19:42   ` Jork Loeser
  2017-04-10  9:07     ` Vitaly Kuznetsov
  2017-04-08 15:18   ` KY Srinivasan
  1 sibling, 1 reply; 26+ messages in thread
From: Jork Loeser @ 2017-04-07 19:42 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 04:27
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 331e834..9a5f58b 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -216,6 +216,43 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  #endif /* !x86_64 */
>  }
> 
> +/* Fast hypercall with 8 bytes of input and no output */
> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> +{
> +	union hv_hypercall_input control = {0};
> +
> +	control.code = code;
> +	control.fast = 1;
> +#ifdef CONFIG_X86_64
> +	{
> +		u64 hv_status;
> +
> +		__asm__ __volatile__("call *%3"
> +				     : "=a" (hv_status)
> +				     : "c" (control.as_uint64), "d" (input1),
> +				       "m" (hv_hypercall_pg)
> +				     : "cc", "r8", "%r9", "%r10", "%r11");
> +		return hv_status;
Clobber memory as well (are there fast hypercalls that need it)?

> +	}
> +#else
> +	{
> +		u32 hv_status_hi, hv_status_lo;
> +
> +		__asm__ __volatile__ ("call *%6"
> +				      : "=d"(hv_status_hi),
> +					"=a"(hv_status_lo) :
> +					"d" (control.as_uint32_hi),
> +					"a" (control.as_uint32_lo),
> +					"c" ((u32)input1),
> +					"b" ((u32)(input1 >> 32)),
> +					"m" (hv_hypercall_pg)
> +				      : "cc");
> +
> +		return hv_status_lo | ((u64)hv_status_hi << 32);
> +	}
> +#endif
Please clobber ECX, EDI and ESI for x86. Clobber memory as well?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 4/7] x86/hyperv: implement rep hypercalls
  2017-04-07 11:26 ` [PATCH 4/7] x86/hyperv: implement rep hypercalls Vitaly Kuznetsov
@ 2017-04-07 19:48   ` Jork Loeser
  2017-04-10  9:00     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 26+ messages in thread
From: Jork Loeser @ 2017-04-07 19:48 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 04:27
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 4/7] x86/hyperv: implement rep hypercalls
 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 9a5f58b..a2c996b 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -4,6 +4,7 @@
>  #include <linux/types.h>
>  #include <linux/interrupt.h>
>  #include <linux/clocksource.h>
> +#include <linux/nmi.h>
>  #include <asm/hyperv.h>
> 
>  /*
> @@ -253,6 +254,26 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>  #endif
>  }
> 
> +static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, void *input,
> +				      void *output)
> +{
> +	union hv_hypercall_input hc_input = { .code = code,
> +					      .rep_count = rep_count};

Is there a way to statically verify that the rep count does not exceed 12 bits? Could a dynamic check be justified? Perhaps a function comment?
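
One way the question above could be answered, sketched in user-space C rather than kernel code: a compile-time assertion covers callers whose count is a constant (such as HV_MMU_MAX_GVAS in patch 6/7), and a small runtime helper rejects anything wider than the 12-bit rep_count field. HV_REP_COUNT_BITS, HV_REP_COUNT_MAX and hv_rep_count_valid() are hypothetical names and not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* rep_count is a 12-bit field in the hypercall input value. */
#define HV_REP_COUNT_BITS 12
#define HV_REP_COUNT_MAX  ((1u << HV_REP_COUNT_BITS) - 1)	/* 4095 */

/* From patch 6/7: the largest count the TLB flush code passes. */
#define HV_MMU_MAX_GVAS 16

/* Static check: a constant caller cannot overflow the field. */
_Static_assert(HV_MMU_MAX_GVAS <= HV_REP_COUNT_MAX,
	       "HV_MMU_MAX_GVAS does not fit in rep_count");

/* Dynamic check (hypothetical helper): true if the count fits in 12 bits. */
static inline bool hv_rep_count_valid(uint16_t rep_count)
{
	return rep_count <= HV_REP_COUNT_MAX;
}
```

In the kernel the static half would presumably be a BUILD_BUG_ON() at the call site, which only helps when the count is a compile-time constant; that is an assumption about how it might be wired up, not something this thread settles.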

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
  2017-04-07 11:27 ` [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush Vitaly Kuznetsov
@ 2017-04-07 20:46   ` Jork Loeser
  2017-04-10 17:21     ` Vitaly Kuznetsov
  2017-04-08 16:47   ` KY Srinivasan
  1 sibling, 1 reply; 26+ messages in thread
From: Jork Loeser @ 2017-04-07 20:46 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 04:27
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush

> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> new file mode 100644
> index 0000000..fb487cb
> --- /dev/null
> +++ b/arch/x86/hyperv/mmu.c
> @@ -0,0 +1,128 @@
> +#include <linux/types.h>
> +#include <linux/hyperv.h>
> +#include <linux/slab.h>
> +#include <asm/mshyperv.h>
> +#include <asm/tlbflush.h>
> +#include <asm/msr.h>
> +#include <asm/fpu/api.h>
> +
> +/*
> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
> + * flush hypercalls and we need to pick a size. '16' means we'll be able
> + * to flush 16 * 4096 pages (256MB) with one hypercall.
> + */
> +#define HV_MMU_MAX_GVAS 16
> +
> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
> +struct hv_flush_pcpu {
> +	struct {
> +		__u64 address_space;
> +		__u64 flags;
> +		__u64 processor_mask;
> +		__u64 gva_list[HV_MMU_MAX_GVAS];
> +	} flush;
> +
> +	spinlock_t lock;
> +};
Does this need an alignment declaration, so that the flush portion never crosses a page boundary when allocated with alloc_percpu()?
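
To make the concern concrete: the hypercall is given the guest physical address of the input, so the input has to sit inside a single page. Below is a user-space sketch of the check and of one possible remedy, aligning the input to a power of two no smaller than its size. crosses_page(), the struct names and the 256-byte alignment are illustrative assumptions, and the sketch assumes the allocator honours the type's alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define HV_MMU_MAX_GVAS 16

/* Mirrors the hypercall input from the patch: (3 + 16) u64s = 152 bytes. */
struct hv_flush_input {
	uint64_t address_space;
	uint64_t flags;
	uint64_t processor_mask;
	uint64_t gva_list[HV_MMU_MAX_GVAS];
};

/* Hypothetical variant: align to the next power of two above 152. */
struct hv_flush_input_aligned {
	_Alignas(256) struct hv_flush_input flush;
};

_Static_assert(sizeof(struct hv_flush_input) <= 256,
	       "input must fit inside its alignment");

/* True if [addr, addr + size) spans more than one page; size must be > 0. */
static inline bool crosses_page(uintptr_t addr, size_t size)
{
	return (addr / PAGE_SIZE) != ((addr + size - 1) / PAGE_SIZE);
}
```

Because 256 divides the page size and the input is smaller than 256 bytes, any 256-byte-aligned start address keeps the whole input within one page.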

> +
> +static struct hv_flush_pcpu __percpu *pcpu_flush;
> +
> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> +				    struct mm_struct *mm, unsigned long start,
> +				    unsigned long end)
> +{
> +	struct hv_flush_pcpu *flush;
> +	unsigned long cur, flags;
> +	u64 status = -1ULL;
> +	int cpu, vcpu, gva_n;
> +
> +	if (!pcpu_flush || !hv_hypercall_pg)
> +		goto do_native;
> +
> +	if (cpumask_empty(cpus))
> +		return;
> +
> +	flush = this_cpu_ptr(pcpu_flush);
> +	spin_lock_irqsave(&flush->lock, flags);

What purpose does the spinlock on the CPU-local struct serve? Would a local_irq_save() do?

Could this be called from NMI context, such as from the debugger?

Could this be a long-running loop, e.g. due to a large start/end range? If so, consider disabling interrupts only around the inner loop, or flushing the entire address space instead.
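
For a sense of scale on the last question, here is a user-space sketch that mirrors the batching arithmetic of the patch's loop, assuming 4 KiB pages and start < end. hv_encode_gva() and hv_flush_hypercalls() are hypothetical names used only for illustration; the encoding follows the comment in the patch (lower 12 bits hold the number of additional pages).

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define HV_MMU_MAX_GVAS 16

/* One gva_list entry covers the page at 'cur' plus up to 4095 more. */
#define GVA_SPAN (PAGE_SIZE * PAGE_SIZE)	/* 16 MiB */

/* Encode one entry the way the patch's inner loop does. */
static inline uint64_t hv_encode_gva(uint64_t cur, uint64_t end)
{
	uint64_t gva = cur & PAGE_MASK;

	if (end >= cur + GVA_SPAN)
		gva |= ~PAGE_MASK;			/* 4095 additional pages */
	else if (end > cur)
		gva |= (end - cur - 1) >> PAGE_SHIFT;	/* pages beyond 'cur' */
	return gva;
}

/* Number of rep hypercalls issued for [start, end), given start < end. */
static inline uint64_t hv_flush_hypercalls(uint64_t start, uint64_t end)
{
	uint64_t entries = 0;

	for (uint64_t cur = start; cur < end; cur += GVA_SPAN)
		entries++;
	return (entries + HV_MMU_MAX_GVAS - 1) / HV_MMU_MAX_GVAS;
}
```

Under this model a 256 MiB range fits in one rep hypercall and a 1 GiB range needs four, all issued with the lock held and interrupts disabled.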

Regards,
Jork

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements
  2017-04-07 11:26 [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2017-04-07 11:27 ` [PATCH 7/7] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
@ 2017-04-08 14:57 ` KY Srinivasan
  7 siblings, 0 replies; 26+ messages in thread
From: KY Srinivasan @ 2017-04-08 14:57 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Steven Rostedt, Jork Loeser



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 4:27 AM
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and
> hypercall improvements
> 
> Hi,
> 
> Hyper-V supports hypercalls for doing local and remote TLB flushing and
> gives its guests hints when using hypercall is preferred. While doing
> hypercalls for local TLB flushes is probably not practical (and is not
> being suggested by modern Hyper-V versions) remote TLB flush with a
> hypercall brings significant improvement.
> 
> To test the series I wrote a special 'TLB trasher': on a 16 vCPU guest I
> was creating 32 threads which were doing 100000 mmap/munmaps each on some
> big file. Here are the results:
> 
> Before:
> # time ./pthread_mmap ./randfile
> real	3m44.994s
> user	0m3.829s
> sys	3m36.323s
> 
> After:
> # time ./pthread_mmap ./randfile
> real	2m57.145s
> user	0m3.797s
> sys	2m34.812s
> 
> This series brings a number of small improvements along the way: fast
> hypercall implementation and using it for event signaling, rep hypercalls
> implementation, hyperv tracing subsystem (which only traces the newly added
> remote TLB flush for now).

Thanks Vitaly. We are currently testing these patches on Azure and other Hyper-V
platforms and will report back.

K. Y 
> 
> Vitaly Kuznetsov (7):
>   x86/hyperv: make hv_do_hypercall() inline
>   x86/hyper-v: fast hypercall implementation
>   hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT
>   x86/hyperv: implement rep hypercalls
>   hyper-v: globalize vp_index
>   x86/hyper-v: use hypercall for remove TLB flush
>   tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()
> 
>  MAINTAINERS                        |   1 +
>  arch/x86/hyperv/Makefile           |   2 +-
>  arch/x86/hyperv/hv_init.c          |  90 +++++++++++--------------
>  arch/x86/hyperv/mmu.c              | 134 +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h    | 131 ++++++++++++++++++++++++++++++++++++
>  arch/x86/include/uapi/asm/hyperv.h |  26 +++++++
>  arch/x86/kernel/cpu/mshyperv.c     |   1 +
>  drivers/hv/channel_mgmt.c          |  22 +++---
>  drivers/hv/connection.c            |   8 ++-
>  drivers/hv/hv.c                    |   9 ---
>  drivers/hv/hyperv_vmbus.h          |  11 ---
>  drivers/hv/vmbus_drv.c             |  17 -----
>  include/linux/hyperv.h             |  21 +++---
>  include/trace/events/hyperv.h      |  30 +++++++++
>  14 files changed, 386 insertions(+), 117 deletions(-)
>  create mode 100644 arch/x86/hyperv/mmu.c
>  create mode 100644 include/trace/events/hyperv.h
> 
> --
> 2.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-07 11:26 ` [PATCH 2/7] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
  2017-04-07 19:42   ` Jork Loeser
@ 2017-04-08 15:18   ` KY Srinivasan
  2017-04-10  8:46     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 26+ messages in thread
From: KY Srinivasan @ 2017-04-08 15:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Steven Rostedt, Jork Loeser



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 4:27 AM
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
> 
> Hyper-V supports 'fast' hypercalls when all parameters are passed through
> registers. Implement an inline version of a simpliest of these calls:
> hypercall with one 8-byte input and no output.
> 
> Proper hypercall input interface (struct hv_hypercall_input) definition is
> added as well.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/mshyperv.h    | 37 +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 331e834..9a5f58b 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -216,6 +216,43 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  #endif /* !x86_64 */
>  }
> 
> +/* Fast hypercall with 8 bytes of input and no output */
> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> +{
> +	union hv_hypercall_input control = {0};

Defining the hypercall arguments on the stack can be problematic if
CONFIG_VMAP_STACK is enabled: since we pass the guest physical address
to the hypervisor, the arguments must not straddle a page boundary.
We currently deal with this issue by making sure, via different means, that
the arguments are never on the stack. Perhaps we can allocate per-CPU memory
that can be used for this purpose. In fact, this is what I have done for the
hv_post_message hypercall. We could rename that page and use it in all
hypercalls, allocating two pages per CPU for this purpose (one for input and
one for output parameters).
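
A user-space sketch of the per-CPU page idea, under stated assumptions: each CPU owns one page-aligned input page, and hypercall arguments are copied into it instead of living on the stack, so they cannot straddle a page boundary. hv_input_page, hv_stage_input() and NR_CPUS_MODEL are hypothetical names; real kernel code would use a per-CPU allocation and would have to prevent the task from migrating between CPUs while the page is in use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_CPUS_MODEL 4		/* arbitrary, for the sketch only */

/* One page-aligned input page per CPU (models a per-CPU allocation). */
static _Alignas(PAGE_SIZE) unsigned char hv_input_page[NR_CPUS_MODEL][PAGE_SIZE];

/*
 * Copy hypercall arguments into the given CPU's input page.
 * Returns the page, or NULL if the CPU index is out of range or the
 * arguments would not fit in a single page.
 */
static inline void *hv_stage_input(unsigned int cpu, const void *args,
				   size_t len)
{
	if (cpu >= NR_CPUS_MODEL || len > PAGE_SIZE)
		return NULL;
	memcpy(hv_input_page[cpu], args, len);
	return hv_input_page[cpu];
}
```

The returned pointer is always page-aligned and the payload is at most one page, so a single physical address describes the whole argument block.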
> +
> +	control.code = code;
> +	control.fast = 1;
> +#ifdef CONFIG_X86_64
> +	{
> +		u64 hv_status;
> +
> +		__asm__ __volatile__("call *%3"
> +				     : "=a" (hv_status)
> +				     : "c" (control.as_uint64), "d" (input1),
> +				       "m" (hv_hypercall_pg)
> +				     : "cc", "r8", "%r9", "%r10", "%r11");
> +		return hv_status;
> +	}
> +#else
> +	{
> +		u32 hv_status_hi, hv_status_lo;
> +
> +		__asm__ __volatile__ ("call *%6"
> +				      : "=d"(hv_status_hi),
> +					"=a"(hv_status_lo) :
> +					"d" (control.as_uint32_hi),
> +					"a" (control.as_uint32_lo),
> +					"c" ((u32)input1),
> +					"b" ((u32)(input1 >> 32)),
> +					"m" (hv_hypercall_pg)
> +				      : "cc");
> +
> +		return hv_status_lo | ((u64)hv_status_hi << 32);
> +	}
> +#endif
> +}
> +
>  void hyperv_init(void);
>  void hyperv_report_panic(struct pt_regs *regs);
>  bool hv_is_hypercall_page_setup(void);
> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> index 432df4b..c87e900 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -256,6 +256,25 @@
>  #define HV_PROCESSOR_POWER_STATE_C2		2
>  #define HV_PROCESSOR_POWER_STATE_C3		3
> 
> +/* Hypercall interface */
> +union hv_hypercall_input {
> +	u64 as_uint64;
> +	struct {
> +		__u32 as_uint32_lo;
> +		__u32 as_uint32_hi;
> +	};
> +	struct {
> +		__u64 code:16;
> +		__u64 fast:1;
> +		__u64 varhead_size:10;
> +		__u64 reserved1:5;
> +		__u64 rep_count:12;
> +		__u64 reserved2:4;
> +		__u64 rep_start:12;
> +		__u64 reserved3:4;
> +	};
> +};
> +
>  /* hypercall status code */
>  #define HV_STATUS_SUCCESS			0
>  #define HV_STATUS_INVALID_HYPERCALL_CODE	2
> --
> 2.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 5/7] hyper-v: globalize vp_index
  2017-04-07 11:26 ` [PATCH 5/7] hyper-v: globalize vp_index Vitaly Kuznetsov
@ 2017-04-08 15:41   ` KY Srinivasan
  0 siblings, 0 replies; 26+ messages in thread
From: KY Srinivasan @ 2017-04-08 15:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Steven Rostedt, Jork Loeser



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 4:27 AM
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 5/7] hyper-v: globalize vp_index
> 
> To support implementing remote TLB flushing on Hyper-V with a hypercall
> we need to make vp_index available outside of vmbus module. Rename and
> globalize.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/hyperv/hv_init.c       | 34 +++++++++++++++++++++++++++++++++-
>  arch/x86/include/asm/mshyperv.h | 26 ++++++++++++++++++++++++++
>  drivers/hv/channel_mgmt.c       |  7 +++----
>  drivers/hv/connection.c         |  3 ++-
>  drivers/hv/hv.c                 |  9 ---------
>  drivers/hv/hyperv_vmbus.h       | 11 -----------
>  drivers/hv/vmbus_drv.c          | 17 -----------------
>  include/linux/hyperv.h          |  1 -
>  8 files changed, 64 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 7d961d4..1c14088 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -26,6 +26,8 @@
>  #include <linux/mm.h>
>  #include <linux/clockchips.h>
>  #include <linux/hyperv.h>
> +#include <linux/slab.h>
> +#include <linux/cpuhotplug.h>
> 
>  #ifdef CONFIG_X86_64
> 
> @@ -103,6 +105,20 @@ EXPORT_SYMBOL_GPL(hv_hypercall_pg);
>  struct clocksource *hyperv_cs;
>  EXPORT_SYMBOL_GPL(hyperv_cs);
> 
> +u32 *hv_vp_index;
> +EXPORT_SYMBOL_GPL(hv_vp_index);
> +
> +static int hv_cpu_init(unsigned int cpu)
> +{
> +	u64 msr_vp_index;
> +
> +	hv_get_vp_index(msr_vp_index);
> +
> +	hv_vp_index[smp_processor_id()] = (u32)msr_vp_index;
> +
> +	return 0;
> +}
> +
>  /*
>   * This function is to be invoked early in the boot sequence after the
>   * hypervisor has been detected.
> @@ -118,6 +134,16 @@ void hyperv_init(void)
>  	if (x86_hyper != &x86_hyper_ms_hyperv)
>  		return;
> 
> +	/* Allocate percpu VP index */
> +	hv_vp_index = kcalloc(num_possible_cpus(), sizeof(*hv_vp_index),
> +			      GFP_KERNEL);
> +	if (!hv_vp_index)
> +		return;
> +
> +	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> "x86/hyperv_init:online",
> +			      hv_cpu_init, NULL) < 0)
> +		goto free_vp_index;
> +
>  	/*
>  	 * Setup the hypercall page and enable hypercalls.
>  	 * 1. Register the guest ID
> @@ -129,7 +155,7 @@ void hyperv_init(void)
>  	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL,
> PAGE_KERNEL_RX);
>  	if (hv_hypercall_pg == NULL) {
>  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> -		return;
> +		goto free_vp_index;
>  	}
> 
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -169,6 +195,12 @@ void hyperv_init(void)
>  	hyperv_cs = &hyperv_cs_msr;
>  	if (ms_hyperv.features &
> HV_X64_MSR_TIME_REF_COUNT_AVAILABLE)
>  		clocksource_register_hz(&hyperv_cs_msr,
> NSEC_PER_SEC/100);
> +
> +	return;
> +
> +free_vp_index:
> +	kfree(hv_vp_index);
> +	hv_vp_index = NULL;
>  }
> 
>  /*
> diff --git a/arch/x86/include/asm/mshyperv.h
> b/arch/x86/include/asm/mshyperv.h
> index a2c996b..1293c84 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -274,6 +274,32 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16
> rep_count, void *input,
>  	return status;
>  }
> 
> +/*
> + * Hypervisor's notion of virtual processor ID is different from
> + * Linux' notion of CPU ID. This information can only be retrieved
> + * in the context of the calling CPU. Setup a map for easy access
> + * to this information.
> + */
> +extern u32 __percpu *hv_vp_index;
> +
> +/**
> + * vmbus_cpu_number_to_vp_number() - Map CPU to VP.
> + * @cpu_number: CPU number in Linux terms
> + *
> + * This function returns the mapping between the Linux processor
> + * number and the hypervisor's virtual processor number, useful
> + * in making hypercalls and such that talk about specific
> + * processors.
> + *
> + * Return: Virtual processor number in Hyper-V terms
> + */
> +static inline int vmbus_cpu_number_to_vp_number(int cpu_number)
> +{
> +	WARN_ON(hv_vp_index[cpu_number] == -1);
> +
> +	return hv_vp_index[cpu_number];
> +}

Now that we have moved this functionality into the Hyper-V specific
base kernel, an hv prefix may be more appropriate for the function.

> +
>  void hyperv_init(void);
>  void hyperv_report_panic(struct pt_regs *regs);
>  bool hv_is_hypercall_page_setup(void);
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 6cfa297..9969c82 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -599,7 +599,7 @@ static void init_vp_index(struct vmbus_channel
> *channel, u16 dev_type)
>  		 */
>  		channel->numa_node = 0;
>  		channel->target_cpu = 0;
> -		channel->target_vp = hv_context.vp_index[0];
> +		channel->target_vp =
> vmbus_cpu_number_to_vp_number(0);
>  		return;
>  	}
> 
> @@ -683,7 +683,7 @@ static void init_vp_index(struct vmbus_channel
> *channel, u16 dev_type)
>  	}
> 
>  	channel->target_cpu = cur_cpu;
> -	channel->target_vp = hv_context.vp_index[cur_cpu];
> +	channel->target_vp =
> vmbus_cpu_number_to_vp_number(cur_cpu);
>  }
> 
>  static void vmbus_wait_for_unload(void)
> @@ -1187,8 +1187,7 @@ struct vmbus_channel
> *vmbus_get_outgoing_channel(struct vmbus_channel *primary)
>  		return outgoing_channel;
>  	}
> 
> -	cur_cpu = hv_context.vp_index[get_cpu()];
> -	put_cpu();
> +	cur_cpu =
> vmbus_cpu_number_to_vp_number(smp_processor_id());
>  	list_for_each_safe(cur, tmp, &primary->sc_list) {
>  		cur_channel = list_entry(cur, struct vmbus_channel, sc_list);
>  		if (cur_channel->state != CHANNEL_OPENED_STATE)
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 545f2a4..7026d13 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -96,7 +96,8 @@ static int vmbus_negotiate_version(struct
> vmbus_channel_msginfo *msginfo,
>  	 * the CPU attempting to connect may not be CPU 0.
>  	 */
>  	if (version >= VERSION_WIN8_1)
> -		msg->target_vcpu =
> hv_context.vp_index[smp_processor_id()];
> +		msg->target_vcpu =
> +
> 	vmbus_cpu_number_to_vp_number(smp_processor_id());
>  	else
>  		msg->target_vcpu = 0;
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 12e7bae..7e67ef4 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -229,7 +229,6 @@ int hv_synic_init(unsigned int cpu)
>  	union hv_synic_siefp siefp;
>  	union hv_synic_sint shared_sint;
>  	union hv_synic_scontrol sctrl;
> -	u64 vp_index;
> 
>  	/* Setup the Synic's message page */
>  	hv_get_simp(simp.as_uint64);
> @@ -271,14 +270,6 @@ int hv_synic_init(unsigned int cpu)
>  	hv_context.synic_initialized = true;
> 
>  	/*
> -	 * Setup the mapping between Hyper-V's notion
> -	 * of cpuid and Linux' notion of cpuid.
> -	 * This array will be indexed using Linux cpuid.
> -	 */
> -	hv_get_vp_index(vp_index);
> -	hv_context.vp_index[cpu] = (u32)vp_index;
> -
> -	/*
>  	 * Register the per-cpu clockevent source.
>  	 */
>  	if (ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE)
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 6113e91..d624526 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -229,17 +229,6 @@ struct hv_context {
>  	struct hv_per_cpu_context __percpu *cpu_context;
> 
>  	/*
> -	 * Hypervisor's notion of virtual processor ID is different from
> -	 * Linux' notion of CPU ID. This information can only be retrieved
> -	 * in the context of the calling CPU. Setup a map for easy access
> -	 * to this information:
> -	 *
> -	 * vp_index[a] is the Hyper-V's processor ID corresponding to
> -	 * Linux cpuid 'a'.
> -	 */
> -	u32 vp_index[NR_CPUS];
> -
> -	/*
>  	 * To manage allocations in a NUMA node.
>  	 * Array indexed by numa node ID.
>  	 */
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 0087b49..63e743d 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1455,23 +1455,6 @@ void vmbus_free_mmio(resource_size_t start,
> resource_size_t size)
>  }
>  EXPORT_SYMBOL_GPL(vmbus_free_mmio);
> 
> -/**
> - * vmbus_cpu_number_to_vp_number() - Map CPU to VP.
> - * @cpu_number: CPU number in Linux terms
> - *
> - * This function returns the mapping between the Linux processor
> - * number and the hypervisor's virtual processor number, useful
> - * in making hypercalls and such that talk about specific
> - * processors.
> - *
> - * Return: Virtual processor number in Hyper-V terms
> - */
> -int vmbus_cpu_number_to_vp_number(int cpu_number)
> -{
> -	return hv_context.vp_index[cpu_number];
> -}
> -EXPORT_SYMBOL_GPL(vmbus_cpu_number_to_vp_number);
> -
>  static int vmbus_acpi_add(struct acpi_device *device)
>  {
>  	acpi_status result;
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 5d6777c..2450e07 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1184,7 +1184,6 @@ int vmbus_allocate_mmio(struct resource **new,
> struct hv_device *device_obj,
>  			resource_size_t size, resource_size_t align,
>  			bool fb_overlap_ok);
>  void vmbus_free_mmio(resource_size_t start, resource_size_t size);
> -int vmbus_cpu_number_to_vp_number(int cpu_number);
> 
>  /*
>   * GUID definitions of various offer types - services offered to the guest.
> --
> 2.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
  2017-04-07 11:27 ` [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush Vitaly Kuznetsov
  2017-04-07 20:46   ` Jork Loeser
@ 2017-04-08 16:47   ` KY Srinivasan
  2017-04-10 14:44     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 26+ messages in thread
From: KY Srinivasan @ 2017-04-08 16:47 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel, x86
  Cc: linux-kernel, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Steven Rostedt, Jork Loeser



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 4:27 AM
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
> 
> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> this is supposed to work faster than IPIs.
> 
> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> we need to put the input somewhere in memory and we don't really want to
> have memory allocation on each call so we pre-allocate per cpu memory
> areas
> on boot. These areas are of fixes size, limit them with an arbitrary number
> of 16 (16 gvas are able to specify 16 * 4096 pages).
> 
> pv_ops patching is happening very early so we need to separate
> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> 
> It is possible and easy to implement local TLB flushing too and there is
> even a hint for that. However, I don't see a room for optimization on the
> host side as both hypercall and native tlb flush will result in vmexit. The
> hint is also not set on modern Hyper-V versions.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/hyperv/Makefile           |   2 +-
>  arch/x86/hyperv/hv_init.c          |   2 +
>  arch/x86/hyperv/mmu.c              | 128
> +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h    |   2 +
>  arch/x86/include/uapi/asm/hyperv.h |   7 ++
>  arch/x86/kernel/cpu/mshyperv.c     |   1 +
>  6 files changed, 141 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/mmu.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 171ae09..367a820 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1 +1 @@
> -obj-y		:= hv_init.o
> +obj-y		:= hv_init.o mmu.o
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1c14088..2cf8a98 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -163,6 +163,8 @@ void hyperv_init(void)
>  	hypercall_msr.guest_physical_address =
> vmalloc_to_pfn(hv_hypercall_pg);
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> 
> +	hyper_alloc_mmu();
> +
>  	/*
>  	 * Register Hyper-V specific clocksource.
>  	 */
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> new file mode 100644
> index 0000000..fb487cb
> --- /dev/null
> +++ b/arch/x86/hyperv/mmu.c
> @@ -0,0 +1,128 @@
> +#include <linux/types.h>
> +#include <linux/hyperv.h>
> +#include <linux/slab.h>
> +#include <asm/mshyperv.h>
> +#include <asm/tlbflush.h>
> +#include <asm/msr.h>
> +#include <asm/fpu/api.h>
> +
> +/*
> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
> + * flush hypercalls and we need to pick a size. '16' means we'll be able
> + * to flush 16 * 4096 pages (256MB) with one hypercall.
> + */
> +#define HV_MMU_MAX_GVAS 16

Did you experiment with different sizes here?
> +
> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
> +struct hv_flush_pcpu {
> +	struct {
> +		__u64 address_space;
> +		__u64 flags;
> +		__u64 processor_mask;
> +		__u64 gva_list[HV_MMU_MAX_GVAS];
> +	} flush;
> +
> +	spinlock_t lock;
> +};
> +
We may be supporting more than 64 CPUs in this hypercall. I am going to inquire with
the Windows folks and get back to you.

> +static struct hv_flush_pcpu __percpu *pcpu_flush;
> +
> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> +				    struct mm_struct *mm, unsigned long
> start,
> +				    unsigned long end)
> +{
> +	struct hv_flush_pcpu *flush;
> +	unsigned long cur, flags;
> +	u64 status = -1ULL;
> +	int cpu, vcpu, gva_n;
> +
> +	if (!pcpu_flush || !hv_hypercall_pg)
> +		goto do_native;
> +
> +	if (cpumask_empty(cpus))
> +		return;
> +
> +	flush = this_cpu_ptr(pcpu_flush);
> +	spin_lock_irqsave(&flush->lock, flags);
> +
> +	flush->flush.address_space = virt_to_phys(mm->pgd);
> +	flush->flush.processor_mask = 0;
> +	if (cpumask_equal(cpus, cpu_present_mask)) {
> +		flush->flush.flags = HV_FLUSH_ALL_PROCESSORS;
> +	} else {
> +		flush->flush.flags = 0;
> +		for_each_cpu(cpu, cpus) {
> +			vcpu = vmbus_cpu_number_to_vp_number(cpu);
> +			if (vcpu != -1 && vcpu < 64)
> +				flush->flush.processor_mask |= 1 << vcpu;
> +			else
> +				goto unlock_do_native;
> +		}
> +	}
> +
> +	if (end == TLB_FLUSH_ALL) {
> +		flush->flush.flags =
> HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> +		status =
> hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> +					 &flush->flush, NULL);
> +	} else {
> +		cur = start;
> +more_gvas:
> +		gva_n = 0;
> +
> +		do {
> +			flush->flush.gva_list[gva_n] = cur & PAGE_MASK;
> +			/*
> +			 * Lower 12 bits encode the number of additional
> +			 * pages to flush (in addition to the 'cur' page).
> +			 */
> +			if (end >= cur + PAGE_SIZE * PAGE_SIZE)
> +				flush->flush.gva_list[gva_n] |=
> ~PAGE_MASK;
> +			else if (end > cur)
> +				flush->flush.gva_list[gva_n] |=
> +					(end - cur - 1) >> PAGE_SHIFT;
> +
> +			cur += PAGE_SIZE * PAGE_SIZE;
> +			++gva_n;
> +
> +		} while (cur < end && gva_n < HV_MMU_MAX_GVAS);
> +
> +		status =
> hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
> +					     gva_n, &flush->flush, NULL);
> +
> +		if (!(status & 0xffff) && cur < end)
> +			goto more_gvas;
> +	}
> +
> +unlock_do_native:
> +	spin_unlock_irqrestore(&flush->lock, flags);
> +
> +	if (!(status & 0xffff))
> +		return;
> +do_native:
> +	native_flush_tlb_others(cpus, mm, start, end);
> +}
> +
> +void hyperv_setup_mmu_ops(void)
> +{
> +	if (ms_hyperv.hints &
> HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
> +		pr_info("Hyper-V: Using hypercall for remote TLB flush\n");
> +		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
> +	}
> +}
> +
> +void hyper_alloc_mmu(void)
> +{
> +	int cpu;
> +	struct hv_flush_pcpu *flush;
> +
> +	if (ms_hyperv.hints &
> HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
> +		pcpu_flush = alloc_percpu(struct hv_flush_pcpu);
> +		if (!pcpu_flush)
> +			return;
> +
> +		for_each_possible_cpu(cpu) {
> +			flush = per_cpu_ptr(pcpu_flush, cpu);
> +			spin_lock_init(&flush->lock);
> +		}
> +	}
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h
> b/arch/x86/include/asm/mshyperv.h
> index 1293c84..a5041c3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -301,6 +301,8 @@ static inline int
> vmbus_cpu_number_to_vp_number(int cpu_number)
>  }
> 
>  void hyperv_init(void);
> +void hyperv_setup_mmu_ops(void);
> +void hyper_alloc_mmu(void);
>  void hyperv_report_panic(struct pt_regs *regs);
>  bool hv_is_hypercall_page_setup(void);
>  void hyperv_cleanup(void);
> diff --git a/arch/x86/include/uapi/asm/hyperv.h
> b/arch/x86/include/uapi/asm/hyperv.h
> index c87e900..3d44036 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -239,6 +239,8 @@
>  		(~((1ull <<
> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
> 
>  /* Declare the various hypercall operations. */
> +#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
> +#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
>  #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
>  #define HVCALL_POST_MESSAGE			0x005c
>  #define HVCALL_SIGNAL_EVENT			0x005d
> @@ -256,6 +258,11 @@
>  #define HV_PROCESSOR_POWER_STATE_C2		2
>  #define HV_PROCESSOR_POWER_STATE_C3		3
> 
> +#define HV_FLUSH_ALL_PROCESSORS			0x00000001
> +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	0x00000002
> +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	0x00000004
> +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	0x00000008
> +
>  /* Hypercall interface */
>  union hv_hypercall_input {
>  	u64 as_uint64;
> diff --git a/arch/x86/kernel/cpu/mshyperv.c
> b/arch/x86/kernel/cpu/mshyperv.c
> index 04cb8d3..fc228d8 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -233,6 +233,7 @@ static void __init ms_hyperv_init_platform(void)
>  	 * Setup the hook to get control post apic initialization.
>  	 */
>  	x86_platform.apic_post_init = hyperv_init;
> +	hyperv_setup_mmu_ops();
>  #endif
>  }
> 
> --
> 2.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-08 15:18   ` KY Srinivasan
@ 2017-04-10  8:46     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-10  8:46 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: devel, x86, linux-kernel, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

KY Srinivasan <kys@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, April 7, 2017 4:27 AM
>> To: devel@linuxdriverproject.org; x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
>> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
>> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
>> Subject: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
>> 
>> Hyper-V supports 'fast' hypercalls when all parameters are passed through
>> registers. Implement an inline version of a simpliest of these calls:
>> hypercall with one 8-byte input and no output.
>> 
>> Proper hypercall input interface (struct hv_hypercall_input) definition is
>> added as well.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/mshyperv.h    | 37
>> +++++++++++++++++++++++++++++++++++++
>>  arch/x86/include/uapi/asm/hyperv.h | 19 +++++++++++++++++++
>>  2 files changed, 56 insertions(+)
>> 
>> diff --git a/arch/x86/include/asm/mshyperv.h
>> b/arch/x86/include/asm/mshyperv.h
>> index 331e834..9a5f58b 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -216,6 +216,43 @@ static inline u64 hv_do_hypercall(u64 control, void
>> *input, void *output)
>>  #endif /* !x86_64 */
>>  }
>> 
>> +/* Fast hypercall with 8 bytes of input and no output */
>> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>> +{
>> +	union hv_hypercall_input control = {0};
>
> Defining the hyper-call arguments on the stack can be problematic
> if CONFIG_VMAP_STACK is defined - since we are passing the guest
> physical address to the hypervisor, we cannot have the arguments
> straddle a page boundary.
> We have dealt with this issue currently by making sure the arguments are
> never on the stack via different means. Perhaps, we can allocate memory
> on a per-cpu basis that can be used for this purpose. In fact, this is what I have done
> for the   hv_post_message hypercall. We can just rename that page and use it in all
> hypercalls - we can just allocate two pages on a per-CPU basis for this purpose
> (for input and output parameters).

This is a fast hypercall and not the normal one: we pass control in a
register, so we're fine here (and I'd actually expect the compiler to put
it in the proper register in the first place).
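
As an illustration only (not part of the patch): with the fast bit set,
the control word is a plain 64-bit value that travels in a register, so
no stack address is ever handed to the hypervisor. The sketch below
builds such a word by hand. hv_fast_control() and HV_HYPERCALL_FAST_BIT
are hypothetical names; the bit position (bit 16, right after the 16-bit
code) is taken from union hv_hypercall_input in this patch.

```c
#include <stdint.h>

/* Bit 16 follows the 16-bit 'code' field in union hv_hypercall_input. */
#define HV_HYPERCALL_FAST_BIT	(1ULL << 16)

/*
 * Hypothetical helper: build the control word for a 'fast' hypercall.
 * All other fields (varhead_size, rep_count, rep_start) stay zero.
 */
static inline uint64_t hv_fast_control(uint16_t code)
{
	return (uint64_t)code | HV_HYPERCALL_FAST_BIT;
}
```

For HVCALL_SIGNAL_EVENT (0x005d) this yields 0x1005d, which is the value
the x86_64 asm in the patch places in RCX via the "c" constraint.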


>> +
>> +	control.code = code;
>> +	control.fast = 1;
>> +#ifdef CONFIG_X86_64
>> +	{
>> +		u64 hv_status;
>> +
>> +		__asm__ __volatile__("call *%3"
>> +				     : "=a" (hv_status)
>> +				     : "c" (control.as_uint64), "d" (input1),
>> +				       "m" (hv_hypercall_pg)
>> +				     : "cc", "r8", "%r9", "%r10", "%r11");
>> +		return hv_status;
>> +	}
>> +#else
>> +	{
>> +		u32 hv_status_hi, hv_status_lo;
>> +
>> +		__asm__ __volatile__ ("call *%6"
>> +				      : "=d"(hv_status_hi),
>> +					"=a"(hv_status_lo) :
>> +					"d" (control.as_uint32_hi),
>> +					"a" (control.as_uint32_lo),
>> +					"c" ((u32)input1),
>> +					"b" ((u32)(input1 >> 32)),
>> +					"m" (hv_hypercall_pg)
>> +				      : "cc");
>> +
>> +		return hv_status_lo | ((u64)hv_status_hi << 32);
>> +	}
>> +#endif
>> +}
>> +
>>  void hyperv_init(void);
>>  void hyperv_report_panic(struct pt_regs *regs);
>>  bool hv_is_hypercall_page_setup(void);
>> diff --git a/arch/x86/include/uapi/asm/hyperv.h
>> b/arch/x86/include/uapi/asm/hyperv.h
>> index 432df4b..c87e900 100644
>> --- a/arch/x86/include/uapi/asm/hyperv.h
>> +++ b/arch/x86/include/uapi/asm/hyperv.h
>> @@ -256,6 +256,25 @@
>>  #define HV_PROCESSOR_POWER_STATE_C2		2
>>  #define HV_PROCESSOR_POWER_STATE_C3		3
>> 
>> +/* Hypercall interface */
>> +union hv_hypercall_input {
>> +	u64 as_uint64;
>> +	struct {
>> +		__u32 as_uint32_lo;
>> +		__u32 as_uint32_hi;
>> +	};
>> +	struct {
>> +		__u64 code:16;
>> +		__u64 fast:1;
>> +		__u64 varhead_size:10;
>> +		__u64 reserved1:5;
>> +		__u64 rep_count:12;
>> +		__u64 reserved2:4;
>> +		__u64 rep_start:12;
>> +		__u64 reserved3:4;
>> +	};
>> +};
>> +
>>  /* hypercall status code */
>>  #define HV_STATUS_SUCCESS			0
>>  #define HV_STATUS_INVALID_HYPERCALL_CODE	2
>> --
>> 2.9.3

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7] x86/hyperv: implement rep hypercalls
  2017-04-07 19:48   ` Jork Loeser
@ 2017-04-10  9:00     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-10  9:00 UTC (permalink / raw)
  To: Jork Loeser
  Cc: devel, x86, linux-kernel, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt

Jork Loeser <Jork.Loeser@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, April 7, 2017 04:27
>> To: devel@linuxdriverproject.org; x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
>> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
>> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
>> Subject: [PATCH 4/7] x86/hyperv: implement rep hypercalls
>
>> diff --git a/arch/x86/include/asm/mshyperv.h
>> b/arch/x86/include/asm/mshyperv.h index 9a5f58b..a2c996b 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -4,6 +4,7 @@
>>  #include <linux/types.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/clocksource.h>
>> +#include <linux/nmi.h>
>>  #include <asm/hyperv.h>
>> 
>>  /*
>> @@ -253,6 +254,26 @@ static inline u64 hv_do_fast_hypercall8(u16 code,
>> u64 input1)  #endif  }
>> 
>> +static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, void
>> *input,
>> +				      void *output)
>> +{
>> +	union hv_hypercall_input hc_input = { .code = code,
>> +					      .rep_count = rep_count};
>
> Is there a way to statically verify the re-count not to exceed 12 bits? Could a dynamic check be justified? Perhaps a function comment?

I'd like to avoid dynamic checks here to keep this as fast as
possible. Static check is probably not an option as even the only user
we have now calculates this parameter dynamically. I'll add a comment to
the function, thanks!
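
To make the constraint concrete, here is a small sketch (an assumption
about how one might document or test it, not what the patch does).
rep_count occupies bits 32..43 of the control word per union
hv_hypercall_input, so anything above 0xfff is silently truncated by the
bitfield assignment. hv_rep_count_fits() and hv_rep_control() are
hypothetical names; the mask mirrors the truncation explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

#define HV_REP_COUNT_SHIFT	32	/* code:16 + fast:1 + varhead:10 + reserved:5 */
#define HV_REP_COUNT_MAX	0xfffU	/* 12-bit field */

/* Hypothetical check a caller could use before issuing a rep hypercall. */
static inline bool hv_rep_count_fits(unsigned int rep_count)
{
	return rep_count <= HV_REP_COUNT_MAX;
}

/*
 * Hypothetical helper: pack code and rep_count into a control word.
 * The mask makes the 12-bit truncation visible instead of implicit.
 */
static inline uint64_t hv_rep_control(uint16_t code, uint16_t rep_count)
{
	return (uint64_t)code |
	       ((uint64_t)(rep_count & HV_REP_COUNT_MAX) << HV_REP_COUNT_SHIFT);
}
```

The only caller in this series passes at most HV_MMU_MAX_GVAS (16), which
is well inside the limit.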

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-07 19:42   ` Jork Loeser
@ 2017-04-10  9:07     ` Vitaly Kuznetsov
  2017-04-10 14:45       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-10  9:07 UTC (permalink / raw)
  To: Jork Loeser
  Cc: devel, x86, linux-kernel, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt

Jork Loeser <Jork.Loeser@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, April 7, 2017 04:27
>> To: devel@linuxdriverproject.org; x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
>> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
>> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
>> Subject: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
>
>> diff --git a/arch/x86/include/asm/mshyperv.h
>> b/arch/x86/include/asm/mshyperv.h index 331e834..9a5f58b 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -216,6 +216,43 @@ static inline u64 hv_do_hypercall(u64 control, void
>> *input, void *output)  #endif /* !x86_64 */  }
>> 
>> +/* Fast hypercall with 8 bytes of input and no output */ static inline
>> +u64 hv_do_fast_hypercall8(u16 code, u64 input1) {
>> +	union hv_hypercall_input control = {0};
>> +
>> +	control.code = code;
>> +	control.fast = 1;
>> +#ifdef CONFIG_X86_64
>> +	{
>> +		u64 hv_status;
>> +
>> +		__asm__ __volatile__("call *%3"
>> +				     : "=a" (hv_status)
>> +				     : "c" (control.as_uint64), "d" (input1),
>> +				       "m" (hv_hypercall_pg)
>> +				     : "cc", "r8", "%r9", "%r10", "%r11");
>> +		return hv_status;
> Clobber memory (are there such fast hypercalls)?
>

Hm, I was under the impression that fast hypercalls either have no output
or put the output in XMM* registers, and we don't use those for now. Why
clobber memory?

>> +	}
>> +#else
>> +	{
>> +		u32 hv_status_hi, hv_status_lo;
>> +
>> +		__asm__ __volatile__ ("call *%6"
>> +				      : "=d"(hv_status_hi),
>> +					"=a"(hv_status_lo) :
>> +					"d" (control.as_uint32_hi),
>> +					"a" (control.as_uint32_lo),
>> +					"c" ((u32)input1),
>> +					"b" ((u32)(input1 >> 32)),
>> +					"m" (hv_hypercall_pg)
>> +				      : "cc");
>> +
>> +		return hv_status_lo | ((u64)hv_status_hi << 32);
>> +	}
>> +#endif
> Please clobber ECX, EDI and ESI for x86. Clobber memory as well?

ECX is already listed in inputs (lower part of input1) so it's
automatically clobbered. I'll add EDI and ESI to clobbers here, thanks!

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
  2017-04-08 16:47   ` KY Srinivasan
@ 2017-04-10 14:44     ` Vitaly Kuznetsov
  2017-04-10 17:34       ` Jork Loeser
  2017-04-10 22:03       ` KY Srinivasan
  0 siblings, 2 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-10 14:44 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: devel, x86, linux-kernel, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser

KY Srinivasan <kys@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, April 7, 2017 4:27 AM
>> To: devel@linuxdriverproject.org; x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
>> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
>> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
>> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
>> 
>> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
>> this is supposed to work faster than IPIs.
>> 
>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
>> we need to put the input somewhere in memory and we don't really want to
>> have memory allocation on each call so we pre-allocate per cpu memory
>> areas
>> on boot. These areas are of fixes size, limit them with an arbitrary number
>> of 16 (16 gvas are able to specify 16 * 4096 pages).
>> 
>> pv_ops patching is happening very early so we need to separate
>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>> 
>> It is possible and easy to implement local TLB flushing too and there is
>> even a hint for that. However, I don't see a room for optimization on the
>> host side as both hypercall and native tlb flush will result in vmexit. The
>> hint is also not set on modern Hyper-V versions.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/hyperv/Makefile           |   2 +-
>>  arch/x86/hyperv/hv_init.c          |   2 +
>>  arch/x86/hyperv/mmu.c              | 128
>> +++++++++++++++++++++++++++++++++++++
>>  arch/x86/include/asm/mshyperv.h    |   2 +
>>  arch/x86/include/uapi/asm/hyperv.h |   7 ++
>>  arch/x86/kernel/cpu/mshyperv.c     |   1 +
>>  6 files changed, 141 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/x86/hyperv/mmu.c
>> 
>> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
>> index 171ae09..367a820 100644
>> --- a/arch/x86/hyperv/Makefile
>> +++ b/arch/x86/hyperv/Makefile
>> @@ -1 +1 @@
>> -obj-y		:= hv_init.o
>> +obj-y		:= hv_init.o mmu.o
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 1c14088..2cf8a98 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -163,6 +163,8 @@ void hyperv_init(void)
>>  	hypercall_msr.guest_physical_address =
>> vmalloc_to_pfn(hv_hypercall_pg);
>>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> 
>> +	hyper_alloc_mmu();
>> +
>>  	/*
>>  	 * Register Hyper-V specific clocksource.
>>  	 */
>> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
>> new file mode 100644
>> index 0000000..fb487cb
>> --- /dev/null
>> +++ b/arch/x86/hyperv/mmu.c
>> @@ -0,0 +1,128 @@
>> +#include <linux/types.h>
>> +#include <linux/hyperv.h>
>> +#include <linux/slab.h>
>> +#include <asm/mshyperv.h>
>> +#include <asm/tlbflush.h>
>> +#include <asm/msr.h>
>> +#include <asm/fpu/api.h>
>> +
>> +/*
>> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
>> + * flush hypercalls and we need to pick a size. '16' means we'll be able
>> + * to flush 16 * 4096 pages (256MB) with one hypercall.
>> + */
>> +#define HV_MMU_MAX_GVAS 16
>
> Did you experiment with different sizes here.

Actually, I was never able to see the kernel trying to flush more than 4096
pages, so we could get away with HV_MMU_MAX_GVAS=1. I went through the code
and didn't see any 'limit' on the number of pages we can ask to flush,
so it may be a coincidence. Each additional gva_list item requires only
8 bytes, so I put an arbitrary '16' here.
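
For reference, a userspace sketch of the gva_list encoding discussed here,
assuming 4K pages. Each entry holds a page-aligned address in the upper
bits and, in the lower 12 bits, the number of additional pages to flush,
so one entry covers at most 4096 pages (16MB). fill_gva_list() and the
SK_* macros are hypothetical names. Unlike the do/while loop in the patch,
this version emits nothing for an empty range.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12
#define SK_PAGE_SIZE	(1ULL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK	(~(SK_PAGE_SIZE - 1))
/* One entry covers the base page plus up to 4095 additional pages. */
#define SK_ENTRY_SPAN	(SK_PAGE_SIZE * SK_PAGE_SIZE)

/*
 * Encode [start, end) into at most 'max' entries. Returns the number of
 * entries written; *next is where a follow-up call would continue.
 */
static size_t fill_gva_list(uint64_t *gva_list, size_t max,
			    uint64_t start, uint64_t end, uint64_t *next)
{
	uint64_t cur = start;
	size_t n = 0;

	while (cur < end && n < max) {
		uint64_t entry = cur & SK_PAGE_MASK;

		if (end >= cur + SK_ENTRY_SPAN)
			entry |= ~SK_PAGE_MASK;	/* 4095 additional pages */
		else
			entry |= (end - cur - 1) >> SK_PAGE_SHIFT;

		gva_list[n++] = entry;
		cur += SK_ENTRY_SPAN;
	}

	*next = cur;
	return n;
}
```

With max = 16 a single call covers up to 16 * 4096 pages, matching the
256MB figure in the comment above HV_MMU_MAX_GVAS.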

>> +
>> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
>> +struct hv_flush_pcpu {
>> +	struct {
>> +		__u64 address_space;
>> +		__u64 flags;
>> +		__u64 processor_mask;
>> +		__u64 gva_list[HV_MMU_MAX_GVAS];
>> +	} flush;
>> +
>> +	spinlock_t lock;
>> +};
>> +
> We may be supporting more than 64 CPUs in this hypercall. I am going to inquire with
> the Windows folks and get back to you.

Thanks! It is even specified in the specification:
"Future versions of the hypervisor may support more than 64 virtual processors per partition. In that
case, a new field will be added to the flags value that allows the caller to define the “processor bank” to
which the processor mask applies."

We, however, need to know where to put this in flags.

>
>> +static struct hv_flush_pcpu __percpu *pcpu_flush;
>> +
>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>> +				    struct mm_struct *mm, unsigned long start,
>> +				    unsigned long end)
>> +{
>> +	struct hv_flush_pcpu *flush;
>> +	unsigned long cur, flags;
>> +	u64 status = -1ULL;
>> +	int cpu, vcpu, gva_n;
>> +
>> +	if (!pcpu_flush || !hv_hypercall_pg)
>> +		goto do_native;
>> +
>> +	if (cpumask_empty(cpus))
>> +		return;
>> +
>> +	flush = this_cpu_ptr(pcpu_flush);
>> +	spin_lock_irqsave(&flush->lock, flags);
>> +
>> +	flush->flush.address_space = virt_to_phys(mm->pgd);
>> +	flush->flush.processor_mask = 0;
>> +	if (cpumask_equal(cpus, cpu_present_mask)) {
>> +		flush->flush.flags = HV_FLUSH_ALL_PROCESSORS;
>> +	} else {
>> +		flush->flush.flags = 0;
>> +		for_each_cpu(cpu, cpus) {
>> +			vcpu = vmbus_cpu_number_to_vp_number(cpu);
>> +			if (vcpu != -1 && vcpu < 64)
>> +				flush->flush.processor_mask |= 1 << vcpu;
>> +			else
>> +				goto unlock_do_native;
>> +		}
>> +	}
>> +
>> +	if (end == TLB_FLUSH_ALL) {
>> +		flush->flush.flags = HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
>> +		status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
>> +					 &flush->flush, NULL);
>> +	} else {
>> +		cur = start;
>> +more_gvas:
>> +		gva_n = 0;
>> +
>> +		do {
>> +			flush->flush.gva_list[gva_n] = cur & PAGE_MASK;
>> +			/*
>> +			 * Lower 12 bits encode the number of additional
>> +			 * pages to flush (in addition to the 'cur' page).
>> +			 */
>> +			if (end >= cur + PAGE_SIZE * PAGE_SIZE)
>> +				flush->flush.gva_list[gva_n] |= ~PAGE_MASK;
>> +			else if (end > cur)
>> +				flush->flush.gva_list[gva_n] |=
>> +					(end - cur - 1) >> PAGE_SHIFT;
>> +
>> +			cur += PAGE_SIZE * PAGE_SIZE;
>> +			++gva_n;
>> +
>> +		} while (cur < end && gva_n < HV_MMU_MAX_GVAS);
>> +
>> +		status = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
>> +					     gva_n, &flush->flush, NULL);
>> +
>> +		if (!(status & 0xffff) && cur < end)
>> +			goto more_gvas;
>> +	}
>> +
>> +unlock_do_native:
>> +	spin_unlock_irqrestore(&flush->lock, flags);
>> +
>> +	if (!(status & 0xffff))
>> +		return;
>> +do_native:
>> +	native_flush_tlb_others(cpus, mm, start, end);
>> +}
>> +
>> +void hyperv_setup_mmu_ops(void)
>> +{
>> +	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
>> +		pr_info("Hyper-V: Using hypercall for remote TLB flush\n");
>> +		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
>> +	}
>> +}
>> +
>> +void hyper_alloc_mmu(void)
>> +{
>> +	int cpu;
>> +	struct hv_flush_pcpu *flush;
>> +
>> +	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
>> +		pcpu_flush = alloc_percpu(struct hv_flush_pcpu);
>> +		if (!pcpu_flush)
>> +			return;
>> +
>> +		for_each_possible_cpu(cpu) {
>> +			flush = per_cpu_ptr(pcpu_flush, cpu);
>> +			spin_lock_init(&flush->lock);
>> +		}
>> +	}
>> +}
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index 1293c84..a5041c3 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -301,6 +301,8 @@ static inline int vmbus_cpu_number_to_vp_number(int cpu_number)
>>  }
>> 
>>  void hyperv_init(void);
>> +void hyperv_setup_mmu_ops(void);
>> +void hyper_alloc_mmu(void);
>>  void hyperv_report_panic(struct pt_regs *regs);
>>  bool hv_is_hypercall_page_setup(void);
>>  void hyperv_cleanup(void);
>> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
>> index c87e900..3d44036 100644
>> --- a/arch/x86/include/uapi/asm/hyperv.h
>> +++ b/arch/x86/include/uapi/asm/hyperv.h
>> @@ -239,6 +239,8 @@
>>  		(~((1ull << HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
>> 
>>  /* Declare the various hypercall operations. */
>> +#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
>> +#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
>>  #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
>>  #define HVCALL_POST_MESSAGE			0x005c
>>  #define HVCALL_SIGNAL_EVENT			0x005d
>> @@ -256,6 +258,11 @@
>>  #define HV_PROCESSOR_POWER_STATE_C2		2
>>  #define HV_PROCESSOR_POWER_STATE_C3		3
>> 
>> +#define HV_FLUSH_ALL_PROCESSORS			0x00000001
>> +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	0x00000002
>> +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY	0x00000004
>> +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	0x00000008
>> +
>>  /* Hypercall interface */
>>  union hv_hypercall_input {
>>  	u64 as_uint64;
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index 04cb8d3..fc228d8 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -233,6 +233,7 @@ static void __init ms_hyperv_init_platform(void)
>>  	 * Setup the hook to get control post apic initialization.
>>  	 */
>>  	x86_platform.apic_post_init = hyperv_init;
>> +	hyperv_setup_mmu_ops();
>>  #endif
>>  }
>> 
>> --
>> 2.9.3

-- 
  Vitaly


* Re: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-10  9:07     ` Vitaly Kuznetsov
@ 2017-04-10 14:45       ` Vitaly Kuznetsov
  2017-04-10 17:14         ` Jork Loeser
  0 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-10 14:45 UTC (permalink / raw)
  To: Jork Loeser
  Cc: devel, x86, linux-kernel, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Jork Loeser <Jork.Loeser@microsoft.com> writes:
>

[snip]

>
>>> +	}
>>> +#else
>>> +	{
>>> +		u32 hv_status_hi, hv_status_lo;
>>> +
>>> +		__asm__ __volatile__ ("call *%6"
>>> +				      : "=d"(hv_status_hi),
>>> +					"=a"(hv_status_lo) :
>>> +					"d" (control.as_uint32_hi),
>>> +					"a" (control.as_uint32_lo),
>>> +					"c" ((u32)input1),
>>> +					"b" ((u32)(input1 >> 32)),
>>> +					"m" (hv_hypercall_pg)
>>> +				      : "cc");
>>> +
>>> +		return hv_status_lo | ((u64)hv_status_hi << 32);
>>> +	}
>>> +#endif
>> Please clobber ECX, EDI and ESI for x86. Clobber memory as well?
>
> ECX is already listed in the inputs (lower part of input1) so it's
> automatically clobbered. I'll add EDI and ESI to clobbers here, thanks!

Oh, I see what you mean - hypervisor is allowed to write to ecx too, we
need to pass it with '+'. Will do, thanks!

-- 
  Vitaly


* RE: [PATCH 2/7] x86/hyper-v: fast hypercall implementation
  2017-04-10 14:45       ` Vitaly Kuznetsov
@ 2017-04-10 17:14         ` Jork Loeser
  0 siblings, 0 replies; 26+ messages in thread
From: Jork Loeser @ 2017-04-10 17:14 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: devel, x86, linux-kernel, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > Jork Loeser <Jork.Loeser@microsoft.com> writes:
> >
> 
> [snip]
> 
> >
> >>> +	}
> >>> +#else
> >>> +	{
> >>> +		u32 hv_status_hi, hv_status_lo;
> >>> +
> >>> +		__asm__ __volatile__ ("call *%6"
> >>> +				      : "=d"(hv_status_hi),
> >>> +					"=a"(hv_status_lo) :
> >>> +					"d" (control.as_uint32_hi),
> >>> +					"a" (control.as_uint32_lo),
> >>> +					"c" ((u32)input1),
> >>> +					"b" ((u32)(input1 >> 32)),
> >>> +					"m" (hv_hypercall_pg)
> >>> +				      : "cc");
> >>> +
> >>> +		return hv_status_lo | ((u64)hv_status_hi << 32);
> >>> +	}
> >>> +#endif
> >> Please clobber ECX, EDI and ESI for x86. Clobber memory as well?
> >
> > ECX is already listed in the inputs (lower part of input1) so it's
> > automatically clobbered. I'll add EDI and ESI to clobbers here, thanks!
> 
> Oh, I see what you mean - hypervisor is allowed to write to ecx too, we need
> to pass it with '+'. Will do, thanks!

Yes, thank you Vitaly!

As for the memory clobber, we would want that if there were hypercalls that modify memory and can be issued via the "fast" ABI. If there are not, we are fine.
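
To make the two constraint points concrete, here is a small user-space sketch.
It is not the hypercall code: call_with_inout_ecx() is a hypothetical helper,
and the single add instruction merely stands in for a callee that rewrites
ECX. The "+c" operand tells the compiler that ECX is both read and written,
and the "memory" clobber tells it not to keep memory values cached in
registers across the asm statement.

```c
#include <stdint.h>

/*
 * Hypothetical helper: passes 'in' in ECX and returns whatever the
 * asm statement leaves there. The asm body doubles the register.
 */
static uint32_t call_with_inout_ecx(uint32_t in)
{
	uint32_t c = in;

#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("addl %0, %0"
			     : "+c" (c)		/* ECX: input and output */
			     :
			     : "cc", "memory");	/* flags and memory may change */
#else
	c += c;	/* portable fallback so the sketch builds on other targets */
#endif
	return c;
}
```

With a plain "c" input constraint the compiler would be entitled to assume
ECX still holds the original value after the asm statement.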

Regards,
Jork


* Re: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
  2017-04-07 20:46   ` Jork Loeser
@ 2017-04-10 17:21     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-10 17:21 UTC (permalink / raw)
  To: Jork Loeser
  Cc: devel, x86, linux-kernel, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt

Jork Loeser <Jork.Loeser@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, April 7, 2017 04:27
>> To: devel@linuxdriverproject.org; x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo
>> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
>> Rostedt <rostedt@goodmis.org>; Jork Loeser <Jork.Loeser@microsoft.com>
>> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
>
>> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
>> new file mode 100644
>> index 0000000..fb487cb
>> --- /dev/null
>> +++ b/arch/x86/hyperv/mmu.c
>> @@ -0,0 +1,128 @@
>> +#include <linux/types.h>
>> +#include <linux/hyperv.h>
>> +#include <linux/slab.h>
>> +#include <asm/mshyperv.h>
>> +#include <asm/tlbflush.h>
>> +#include <asm/msr.h>
>> +#include <asm/fpu/api.h>
>> +
>> +/*
>> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
>> + * flush hypercalls and we need to pick a size. '16' means we'll be able
>> + * to flush 16 * 4096 pages (256MB) with one hypercall.
>> + */
>> +#define HV_MMU_MAX_GVAS 16
>> +
>> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
>> +struct hv_flush_pcpu {
>> +	struct {
>> +		__u64 address_space;
>> +		__u64 flags;
>> +		__u64 processor_mask;
>> +		__u64 gva_list[HV_MMU_MAX_GVAS];
>> +	} flush;
>> +
>> +	spinlock_t lock;
>> +};
> Does this need an alignment declaration, so that the flush portion never crosses a page boundary when allocated with alloc_percpu()?
>

Thanks for pointing this out! I would slightly prefer we use
__alloc_percpu() and specify something like roundup_pow_of_two()
alignment.
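
A quick user-space check of why such an alignment would be enough. This is a
sketch under assumptions: roundup_pow2() and crosses_page() are hypothetical
helpers written for this illustration, a 4 KiB page is assumed, and the
152-byte size is my own count of the flush portion (3 + 16 u64 fields), not a
figure from the patch. An object whose alignment is a power of two no smaller
than its size, and no larger than a page, can never straddle a page boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Smallest power of two that is >= n (n >= 1). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* True if [addr, addr + size) touches more than one page. */
static bool crosses_page(uintptr_t addr, size_t size)
{
	return (addr / PAGE_SIZE) != ((addr + size - 1) / PAGE_SIZE);
}
```

For a 152-byte input block the rounded-up alignment is 256, and no
256-byte-aligned address lets 152 bytes spill into the next page.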

>> +
>> +static struct hv_flush_pcpu __percpu *pcpu_flush;
>> +
>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>> +				    struct mm_struct *mm, unsigned long start,
>> +				    unsigned long end)
>> +{
>> +	struct hv_flush_pcpu *flush;
>> +	unsigned long cur, flags;
>> +	u64 status = -1ULL;
>> +	int cpu, vcpu, gva_n;
>> +
>> +	if (!pcpu_flush || !hv_hypercall_pg)
>> +		goto do_native;
>> +
>> +	if (cpumask_empty(cpus))
>> +		return;
>> +
>> +	flush = this_cpu_ptr(pcpu_flush);
>> +	spin_lock_irqsave(&flush->lock, flags);
>
> What purpose does the spinlock on the CPU-local struct serve? Would a
> local_irq_save() do?

Now I'm not sure why I put it here in the first place :-) Yes, it would
probably do.

> Could this be called from NMI context, such as from the debugger?
>

NMI - I don't think so: the native function calls smp_call_function_many(),
which WARNs even if it's called with interrupts disabled.

> Could this be a long-running loop, e.g. due to a large start/end
> range? If so, consider disabling interrupts only in the inner loop /
> flush the entire space?

The decision for flushing the entire space should probably be done
elsewhere as it is not implementation-specific (and I think it's done
somewhere as I never see requests to flush more than 4096 pages in my
testing).

I can disable interrupts in the inner loop but we'll have to stash the flags
and the calculated processor mask in local variables. This is not supposed
to be expensive.

-- 
  Vitaly


* RE: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
  2017-04-10 14:44     ` Vitaly Kuznetsov
@ 2017-04-10 17:34       ` Jork Loeser
  2017-04-10 22:03       ` KY Srinivasan
  1 sibling, 0 replies; 26+ messages in thread
From: Jork Loeser @ 2017-04-10 17:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov, KY Srinivasan
  Cc: devel, x86, linux-kernel, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]

> > We may be supporting more than 64 CPUs in this hypercall. I am going
> > to inquire with the Windows folks and get back to you.
> 
> Thanks! It is even specified in the specification:
> "Future versions of the hypervisor may support more than 64 virtual
> processors per partition. In that case, a new field will be added to the flags
> value that allows the caller to define the “processor bank” to which the
> processor mask applies."
> 
> We, however, need to know where to put this in flags.

Would the HvFlushVirtualAddressListEx hypercall do? Is there a doc update/clarification needed?

Regards,
Jork


* RE: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
  2017-04-10 14:44     ` Vitaly Kuznetsov
  2017-04-10 17:34       ` Jork Loeser
@ 2017-04-10 22:03       ` KY Srinivasan
  1 sibling, 0 replies; 26+ messages in thread
From: KY Srinivasan @ 2017-04-10 22:03 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: devel, x86, linux-kernel, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Steven Rostedt,
	Jork Loeser



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Monday, April 10, 2017 7:44 AM
> To: KY Srinivasan <kys@microsoft.com>
> Cc: devel@linuxdriverproject.org; x86@kernel.org; linux-
> kernel@vger.kernel.org; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; Thomas Gleixner
> <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin
> <hpa@zytor.com>; Steven Rostedt <rostedt@goodmis.org>; Jork Loeser
> <Jork.Loeser@microsoft.com>
> Subject: Re: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
> 
> KY Srinivasan <kys@microsoft.com> writes:
> 
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Friday, April 7, 2017 4:27 AM
> >> To: devel@linuxdriverproject.org; x86@kernel.org
> >> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> >> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> >> <sthemmin@microsoft.com>; Thomas Gleixner <tglx@linutronix.de>;
> Ingo
> >> Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Steven
> >> Rostedt <rostedt@goodmis.org>; Jork Loeser
> <Jork.Loeser@microsoft.com>
> >> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
> >>
> >> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> >> this is supposed to work faster than IPIs.
> >>
> >> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> >> we need to put the input somewhere in memory and we don't really want to
> >> have memory allocation on each call so we pre-allocate per cpu memory
> >> areas on boot. These areas are of fixed size; limit them with an arbitrary
> >> number of 16 (16 gvas are able to specify 16 * 4096 pages).
> >>
> >> pv_ops patching is happening very early so we need to separate
> >> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >>
> >> It is possible and easy to implement local TLB flushing too and there is
> >> even a hint for that. However, I don't see a room for optimization on the
> >> host side as both hypercall and native tlb flush will result in vmexit. The
> >> hint is also not set on modern Hyper-V versions.
> >>
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/hyperv/Makefile           |   2 +-
> >>  arch/x86/hyperv/hv_init.c          |   2 +
> >>  arch/x86/hyperv/mmu.c              | 128
> >> +++++++++++++++++++++++++++++++++++++
> >>  arch/x86/include/asm/mshyperv.h    |   2 +
> >>  arch/x86/include/uapi/asm/hyperv.h |   7 ++
> >>  arch/x86/kernel/cpu/mshyperv.c     |   1 +
> >>  6 files changed, 141 insertions(+), 1 deletion(-)
> >>  create mode 100644 arch/x86/hyperv/mmu.c
> >>
> >> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> >> index 171ae09..367a820 100644
> >> --- a/arch/x86/hyperv/Makefile
> >> +++ b/arch/x86/hyperv/Makefile
> >> @@ -1 +1 @@
> >> -obj-y		:= hv_init.o
> >> +obj-y		:= hv_init.o mmu.o
> >> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> >> index 1c14088..2cf8a98 100644
> >> --- a/arch/x86/hyperv/hv_init.c
> >> +++ b/arch/x86/hyperv/hv_init.c
> >> @@ -163,6 +163,8 @@ void hyperv_init(void)
> >>  	hypercall_msr.guest_physical_address =
> >> vmalloc_to_pfn(hv_hypercall_pg);
> >>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> >>
> >> +	hyper_alloc_mmu();
> >> +
> >>  	/*
> >>  	 * Register Hyper-V specific clocksource.
> >>  	 */
> >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> >> new file mode 100644
> >> index 0000000..fb487cb
> >> --- /dev/null
> >> +++ b/arch/x86/hyperv/mmu.c
> >> @@ -0,0 +1,128 @@
> >> +#include <linux/types.h>
> >> +#include <linux/hyperv.h>
> >> +#include <linux/slab.h>
> >> +#include <asm/mshyperv.h>
> >> +#include <asm/tlbflush.h>
> >> +#include <asm/msr.h>
> >> +#include <asm/fpu/api.h>
> >> +
> >> +/*
> >> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing
> TLB
> >> + * flush hypercalls and we need to pick a size. '16' means we'll be able
> >> + * to flush 16 * 4096 pages (256MB) with one hypercall.
> >> + */
> >> +#define HV_MMU_MAX_GVAS 16
> >
> > Did you experiment with different sizes here?
> 
> Actually, I was never able to see the kernel trying to flush more than 4096
> pages, so we could get away with HV_MMU_MAX_GVAS=1. I went through the code
> and didn't see any limit on the number of pages we can ask to flush,
> so it may be a coincidence. Each additional gva_list item requires only 8
> bytes, so I put an arbitrary '16' here.
> 
> >> +
> >> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls
> */
> >> +struct hv_flush_pcpu {
> >> +	struct {
> >> +		__u64 address_space;
> >> +		__u64 flags;
> >> +		__u64 processor_mask;
> >> +		__u64 gva_list[HV_MMU_MAX_GVAS];
> >> +	} flush;
> >> +
> >> +	spinlock_t lock;
> >> +};
> >> +
> > We may be supporting more than 64 CPUs in this hypercall. I am going to
> > inquire with the Windows folks and get back to you.
> 
> Thanks! It is even specified in the specification:
> "Future versions of the hypervisor may support more than 64 virtual
> processors per partition. In that case, a new field will be added to the
> flags value that allows the caller to define the “processor bank” to
> which the processor mask applies."
> 
> We, however, need to know where to put this in flags.

There is a new hypercall for targeting more than 64 VCPUs. For now, we can check whether the CPU mask
specifies more than 64 CPUs and use the native call if that is the case.
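
The check suggested here could look roughly like the following user-space
sketch. build_processor_mask() is a hypothetical helper, not a function from
the patch; it assumes the caller falls back to the native flush when it
returns -1. Note the 1ULL in the shift: a plain int constant would not be
wide enough for VP numbers of 31 and above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build a 64-bit processor mask from a list of VP numbers.
 * Returns 0 on success, or -1 if any VP number cannot be represented
 * in the mask, in which case '*mask' is left untouched.
 */
static int build_processor_mask(const int *vcpus, size_t n, uint64_t *mask)
{
	uint64_t m = 0;

	for (size_t i = 0; i < n; i++) {
		if (vcpus[i] < 0 || vcpus[i] >= 64)
			return -1;	/* caller should use the native path */
		m |= 1ULL << vcpus[i];
	}

	*mask = m;
	return 0;
}
```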

K. Y


end of thread, other threads:[~2017-04-10 22:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
2017-04-07 11:26 [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements Vitaly Kuznetsov
2017-04-07 11:26 ` [PATCH 1/7] x86/hyperv: make hv_do_hypercall() inline Vitaly Kuznetsov
2017-04-07 19:38   ` Jork Loeser
2017-04-07 11:26 ` [PATCH 2/7] x86/hyper-v: fast hypercall implementation Vitaly Kuznetsov
2017-04-07 19:42   ` Jork Loeser
2017-04-10  9:07     ` Vitaly Kuznetsov
2017-04-10 14:45       ` Vitaly Kuznetsov
2017-04-10 17:14         ` Jork Loeser
2017-04-08 15:18   ` KY Srinivasan
2017-04-10  8:46     ` Vitaly Kuznetsov
2017-04-07 11:26 ` [PATCH 3/7] hyper-v: use fast hypercall for HVCALL_SIGNAL_EVENT Vitaly Kuznetsov
2017-04-07 11:26 ` [PATCH 4/7] x86/hyperv: implement rep hypercalls Vitaly Kuznetsov
2017-04-07 19:48   ` Jork Loeser
2017-04-10  9:00     ` Vitaly Kuznetsov
2017-04-07 11:26 ` [PATCH 5/7] hyper-v: globalize vp_index Vitaly Kuznetsov
2017-04-08 15:41   ` KY Srinivasan
2017-04-07 11:27 ` [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush Vitaly Kuznetsov
2017-04-07 20:46   ` Jork Loeser
2017-04-10 17:21     ` Vitaly Kuznetsov
2017-04-08 16:47   ` KY Srinivasan
2017-04-10 14:44     ` Vitaly Kuznetsov
2017-04-10 17:34       ` Jork Loeser
2017-04-10 22:03       ` KY Srinivasan
2017-04-07 11:27 ` [PATCH 7/7] tracing/hyper-v: trace hyperv_mmu_flush_tlb_others() Vitaly Kuznetsov
2017-04-07 14:38   ` Steven Rostedt
2017-04-08 14:57 ` [PATCH 0/7] Hyper-V: praravirtualized remote TLB flushing and hypercall improvements KY Srinivasan
