linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
@ 2017-10-31 22:18 mikelley
  2017-11-01 12:00 ` Vitaly Kuznetsov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: mikelley @ 2017-10-31 22:18 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, marcelo.cerri, sthemmin, kys
  Cc: Michael Kelley

From: Michael Kelley <mikelley@microsoft.com>

The 2016 version of Hyper-V offers the option to operate the guest VM
per-vcpu stimer's in Direct Mode, which means the timer interupts on its
own vector rather than queueing a VMbus message.  Direct Mode reduces
timer processing overhead in both the hypervisor and the guest, and
avoids having timer interrupts pollute the VMbus interrupt stream for
the synthetic NIC and storage.  This patch enables Direct Mode by
default on stimer0 (which is the only stimer used in Linux on Hyper-V)
when running on a version of Hyper-V that supports it, with a hv_vmbus
module parameter for disabling Direct Mode and reverting to the old
behavior.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 arch/x86/include/asm/mshyperv.h    | 14 ++++++++
 arch/x86/include/uapi/asm/hyperv.h | 26 ++++++++++++++
 arch/x86/kernel/cpu/mshyperv.c     | 64 +++++++++++++++++++++++++++++++++-
 drivers/hv/hv.c                    | 71 ++++++++++++++++++++++++++++++++++++--
 drivers/hv/hyperv_vmbus.h          |  4 ++-
 5 files changed, 175 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 740dc97..1bba1d7 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -4,6 +4,8 @@
 #include <linux/types.h>
 #include <linux/atomic.h>
 #include <linux/nmi.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
 #include <asm/io.h>
 #include <asm/hyperv.h>
 
@@ -374,4 +376,16 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 	return NULL;
 }
 #endif
+
+/* Per architecture routines for stimer0 Direct Mode handling.  On x86/x64
+ * there are no percpu actions to take.
+ */
+#if IS_ENABLED(CONFIG_HYPERV)
+static inline void hv_enable_stimer0_percpu_irq(int irq) { }
+static inline void hv_disable_stimer0_percpu_irq(int irq) { }
+extern int hv_allocate_stimer0_irq(int *irq, int *vector);
+extern void hv_deallocate_stimer0_irq(int irq);
+extern void hv_ack_stimer0_interrupt(struct irq_desc *desc);
+#endif
+
 #endif
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index f65d125..408cf3e 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -112,6 +112,22 @@
 #define HV_X64_GUEST_IDLE_STATE_AVAILABLE		(1 << 5)
 /* Guest crash data handler available */
 #define HV_X64_GUEST_CRASH_MSR_AVAILABLE		(1 << 10)
+/* Debug MSRs available */
+#define HV_X64_DEBUG_MSR_AVAILABLE			(1 << 11)
+/* Support for Non-Privileged Instruction Execution Prevention is available */
+#define HV_X64_NPIEP_AVAILABLE				(1 << 12)
+/* Support for DisableHypervisor is available */
+#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE		(1 << 13)
+/* Extended GVA Ranges for Flush Virtual Address list is available */
+#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE		(1 << 14)
+/* Return Hypercall output via XMM registers is available */
+#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE		(1 << 15)
+/* SINT polling mode available */
+#define HV_X64_SINT_POLLING_MODE_AVAILABLE		(1 << 17)
+/* Hypercall MSR lock is available */
+#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE		(1 << 18)
+/* stimer direct mode is available */
+#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE		(1 << 19)
 
 /*
  * Implementation recommendations. Indicates which behaviors the hypervisor
@@ -300,6 +316,16 @@ enum HV_GENERIC_SET_FORMAT {
 
 #define HV_SYNIC_STIMER_COUNT		(4)
 
+/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ is a fake
+ * because stimer's in Direct Mode simply interrupt on the specified vector,
+ * without using a particular IOAPIC pin. But we use the IRQ allocation
+ * machinery, so we need a hardware IRQ #.  This value is somewhat arbitrary,
+ * but it should not be a legacy IRQ (0 to 15), and should fit within the
+ * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So any value
+ * between 16 and 23 should be good.
+ */
+#define HV_STIMER0_IRQNR		18
+
 /* Define synthetic interrupt controller message constants. */
 #define HV_MESSAGE_SIZE			(256)
 #define HV_MESSAGE_PAYLOAD_BYTE_COUNT	(240)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 236324e8..88dc243 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -19,7 +19,10 @@
 #include <linux/efi.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/irqdesc.h>
 #include <linux/kexec.h>
+#include <linux/acpi.h>
 #include <asm/processor.h>
 #include <asm/hypervisor.h>
 #include <asm/hyperv.h>
@@ -27,6 +30,7 @@
 #include <asm/desc.h>
 #include <asm/irq_regs.h>
 #include <asm/i8259.h>
+#include <asm/irqdomain.h>
 #include <asm/apic.h>
 #include <asm/timer.h>
 #include <asm/reboot.h>
@@ -69,6 +73,64 @@ void hv_remove_vmbus_irq(void)
 EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
 EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
 
+
+/* Routines to do per-architecture handling of stimer0 when in Direct Mode */
+
+void hv_ack_stimer0_interrupt(struct irq_desc *desc)
+{
+	ack_APIC_irq();
+}
+
+static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
+				const struct cpumask *mask)
+{
+	cpumask_copy(retmask, cpu_online_mask);
+}
+
+int hv_allocate_stimer0_irq(int *irq, int *vector)
+{
+	int		localirq;
+	int		result;
+	struct irq_data *irq_data;
+
+	/* The normal APIC vector allocation domain allows allocation of vectors
+	 * only for the calling CPU.  So we change the allocation domain to one
+	 * that allows vectors to be allocated in all online CPUs.  This
+	 * change is fine in a Hyper-V VM because VMs don't have the usual
+	 * complement of interrupting devices.
+	 */
+	apic->vector_allocation_domain = allonline_vector_allocation_domain;
+	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
+				ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
+	if (localirq < 0) {
+		pr_err("Cannot register stimer0 gsi. Error %d", localirq);
+		return -1;
+	}
+
+	/* We pass in a dummy IRQ handler because architecture independent code
+	 * will later override the IRQ domain interrupt handler and set it to a
+	 * Hyper-V specific handler.
+	 */
+	result = request_irq(localirq, (irq_handler_t)(-1), 0,
+					"Hyper-V stimer0", NULL);
+	if (result) {
+		pr_err("Cannot request stimer0 irq. Error %d", result);
+		acpi_unregister_gsi(localirq);
+		return -1;
+	}
+	irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);
+	*vector = irqd_cfg(irq_data)->vector;
+	*irq = localirq;
+	return 0;
+}
+
+void hv_deallocate_stimer0_irq(int irq)
+{
+	free_irq(irq, NULL);
+	acpi_unregister_gsi(irq);
+}
+
+
 void hv_setup_kexec_handler(void (*handler)(void))
 {
 	hv_kexec_handler = handler;
@@ -195,7 +257,7 @@ static void __init ms_hyperv_init_platform(void)
 		hv_host_info_ecx = cpuid_ecx(HVCPUID_VERSION);
 		hv_host_info_edx = cpuid_edx(HVCPUID_VERSION);
 
-		pr_info("Hyper-V Host Build:%d-%d.%d-%d-%d.%d\n",
+		pr_info("Hyper-V: Host Build %d-%d.%d-%d-%d.%d\n",
 			hv_host_info_eax, hv_host_info_ebx >> 16,
 			hv_host_info_ebx & 0xFFFF, hv_host_info_ecx,
 			hv_host_info_edx >> 24, hv_host_info_edx & 0xFFFFFF);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index fe96aab..68ac474 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -27,8 +27,12 @@
 #include <linux/vmalloc.h>
 #include <linux/hyperv.h>
 #include <linux/version.h>
-#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
+#include <linux/random.h>
+#include <linux/kernel_stat.h>
 #include <linux/clockchips.h>
+#include <linux/module.h>
 #include <asm/hyperv.h>
 #include <asm/mshyperv.h>
 #include "hyperv_vmbus.h"
@@ -38,6 +42,21 @@ struct hv_context hv_context = {
 	.synic_initialized	= false,
 };
 
+/* If true, we're using Direct Mode for stimer0, and the timer will do it own
+ * interrupt when it expires. If false, stimer0 is not using Direct Mode and
+ * will send a VMbus message when it expires. We prefer to use Direct Mode,
+ * but not all versions of Hyper-V support Direct Mode.
+ *
+ * While Hyper-V provides a total of four stimer's per CPU, Linux use only
+ * stimer0.
+ */
+static bool	stimer_direct_mode;
+static int	stimer0_irq;
+static int	stimer0_vector;
+static bool	direct_mode_disable;
+module_param(direct_mode_disable, bool, 0444);
+MODULE_PARM_DESC(direct_mode_disable, "Set to Y to disable Direct Mode.");
+
 #define HV_TIMER_FREQUENCY (10 * 1000 * 1000) /* 100ns period */
 #define HV_MAX_MAX_DELTA_TICKS 0xffffffff
 #define HV_MIN_DELTA_TICKS 1
@@ -52,7 +71,12 @@ int hv_init(void)
 	hv_context.cpu_context = alloc_percpu(struct hv_per_cpu_context);
 	if (!hv_context.cpu_context)
 		return -ENOMEM;
+	stimer_direct_mode = (ms_hyperv.misc_features &
+		HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? true : false;
 
+	/* Apply boot command line override to the Direct Mode setting */
+	if (direct_mode_disable)
+		stimer_direct_mode = false;
 	return 0;
 }
 
@@ -91,6 +115,23 @@ int hv_post_message(union hv_connection_id connection_id,
 	return status & 0xFFFF;
 }
 
+/* ISR for when stimer0 is operating in Direct Mode.  Direct Mode does
+ * not use VMBus or any VMBus messages, so process here and not in the
+ * VMBus driver code.
+ */
+
+static void hv_stimer0_isr(struct irq_desc *desc)
+{
+	struct hv_per_cpu_context *hv_cpu;
+
+	__this_cpu_inc(*desc->kstat_irqs);
+	__this_cpu_inc(kstat.irqs_sum);
+	hv_ack_stimer0_interrupt(desc);
+	hv_cpu = this_cpu_ptr(hv_context.cpu_context);
+	hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
+	add_interrupt_randomness(desc->irq_data.irq, 0);
+}
+
 static int hv_ce_set_next_event(unsigned long delta,
 				struct clock_event_device *evt)
 {
@@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct clock_event_device *evt)
 {
 	hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
 	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
+	if (stimer_direct_mode)
+		hv_disable_stimer0_percpu_irq(stimer0_irq);
 
 	return 0;
 }
@@ -116,9 +159,24 @@ static int hv_ce_set_oneshot(struct clock_event_device *evt)
 {
 	union hv_timer_config timer_cfg;
 
+	timer_cfg.as_uint64 = 0; /* Zero everything */
 	timer_cfg.enable = 1;
 	timer_cfg.auto_enable = 1;
-	timer_cfg.sintx = VMBUS_MESSAGE_SINT;
+	if (stimer_direct_mode) {
+
+		/* When it expires, the timer will directly interrupt
+		 * on the specific hardware vector.
+		 */
+		timer_cfg.direct_mode = 1;
+		timer_cfg.apic_vector = stimer0_vector;
+		hv_enable_stimer0_percpu_irq(stimer0_irq);
+	} else {
+		/* When it expires, the timer will generate a VMbus message,
+		 * to be handled by the normal VMbus interrupt handler.
+		 */
+		timer_cfg.direct_mode = 0;
+		timer_cfg.sintx = VMBUS_MESSAGE_SINT;
+	}
 	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64);
 
 	return 0;
@@ -191,6 +249,12 @@ int hv_synic_alloc(void)
 		INIT_LIST_HEAD(&hv_cpu->chan_list);
 	}
 
+	if (stimer_direct_mode) {
+		if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
+			goto err;
+		irq_set_handler(stimer0_irq, hv_stimer0_isr);
+	}
+
 	return 0;
 err:
 	return -ENOMEM;
@@ -292,6 +356,9 @@ void hv_synic_clockevents_cleanup(void)
 	if (!(ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE))
 		return;
 
+	if (stimer_direct_mode)
+		hv_deallocate_stimer0_irq(stimer0_irq);
+
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index de6f01d..ee8c89b 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -55,7 +55,9 @@
 		u64 periodic:1;
 		u64 lazy:1;
 		u64 auto_enable:1;
-		u64 reserved_z0:12;
+		u64 apic_vector:8;
+		u64 direct_mode:1;
+		u64 reserved_z0:3;
 		u64 sintx:4;
 		u64 reserved_z1:44;
 	};
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-10-31 22:18 [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0 mikelley
@ 2017-11-01 12:00 ` Vitaly Kuznetsov
  2017-11-01 13:41   ` Vitaly Kuznetsov
  2017-11-01 14:51 ` Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2017-11-01 12:00 UTC (permalink / raw)
  To: mikelley
  Cc: gregkh, linux-kernel, devel, olaf, apw, jasowang,
	leann.ogasawara, marcelo.cerri, sthemmin, kys, Michael Kelley

mikelley@exchange.microsoft.com writes:

> From: Michael Kelley <mikelley@microsoft.com>
>
> The 2016 version of Hyper-V offers the option to operate the guest VM
> per-vcpu stimer's in Direct Mode, which means the timer interupts on its
> own vector rather than queueing a VMbus message.  Direct Mode reduces
> timer processing overhead in both the hypervisor and the guest, and
> avoids having timer interrupts pollute the VMbus interrupt stream for
> the synthetic NIC and storage.  This patch enables Direct Mode by
> default on stimer0 (which is the only stimer used in Linux on Hyper-V)
> when running on a version of Hyper-V that supports it, with a hv_vmbus
> module parameter for disabling Direct Mode and reverting to the old
> behavior.
>
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---
>  arch/x86/include/asm/mshyperv.h    | 14 ++++++++
>  arch/x86/include/uapi/asm/hyperv.h | 26 ++++++++++++++
>  arch/x86/kernel/cpu/mshyperv.c     | 64 +++++++++++++++++++++++++++++++++-
>  drivers/hv/hv.c                    | 71 ++++++++++++++++++++++++++++++++++++--
>  drivers/hv/hyperv_vmbus.h          |  4 ++-
>  5 files changed, 175 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 740dc97..1bba1d7 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -4,6 +4,8 @@
>  #include <linux/types.h>
>  #include <linux/atomic.h>
>  #include <linux/nmi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
>  #include <asm/io.h>
>  #include <asm/hyperv.h>
>
> @@ -374,4 +376,16 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
>  	return NULL;
>  }
>  #endif
> +
> +/* Per architecture routines for stimer0 Direct Mode handling.  On x86/x64
> + * there are no percpu actions to take.
> + */
> +#if IS_ENABLED(CONFIG_HYPERV)
> +static inline void hv_enable_stimer0_percpu_irq(int irq) { }
> +static inline void hv_disable_stimer0_percpu_irq(int irq) { }
> +extern int hv_allocate_stimer0_irq(int *irq, int *vector);
> +extern void hv_deallocate_stimer0_irq(int irq);
> +extern void hv_ack_stimer0_interrupt(struct irq_desc *desc);
> +#endif
> +
>  #endif
> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> index f65d125..408cf3e 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -112,6 +112,22 @@
>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE		(1 << 5)
>  /* Guest crash data handler available */
>  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE		(1 << 10)
> +/* Debug MSRs available */
> +#define HV_X64_DEBUG_MSR_AVAILABLE			(1 << 11)
> +/* Support for Non-Privileged Instruction Execution Prevention is available */
> +#define HV_X64_NPIEP_AVAILABLE				(1 << 12)
> +/* Support for DisableHypervisor is available */
> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE		(1 << 13)
> +/* Extended GVA Ranges for Flush Virtual Address list is available */
> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE		(1 << 14)
> +/* Return Hypercall output via XMM registers is available */
> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE		(1 << 15)
> +/* SINT polling mode available */
> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE		(1 << 17)
> +/* Hypercall MSR lock is available */
> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE		(1 << 18)
> +/* stimer direct mode is available */
> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE		(1 << 19)
>
>  /*
>   * Implementation recommendations. Indicates which behaviors the hypervisor
> @@ -300,6 +316,16 @@ enum HV_GENERIC_SET_FORMAT {
>
>  #define HV_SYNIC_STIMER_COUNT		(4)
>
> +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ is a fake
> + * because stimer's in Direct Mode simply interrupt on the specified vector,
> + * without using a particular IOAPIC pin. But we use the IRQ allocation
> + * machinery, so we need a hardware IRQ #.  This value is somewhat arbitrary,
> + * but it should not be a legacy IRQ (0 to 15), and should fit within the
> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So any value
> + * between 16 and 23 should be good.
> + */

(nitpick): all comments in the patch are in 'net' style:

 /* This is a
  * comment.
  */

While in Hyper-V files (and x86 in general) the 'normal' style is used:

 /*
  * This is a
  * comment.
  */

But checkpatch.pl doesn't complain.

> +#define HV_STIMER0_IRQNR		18
> +
>  /* Define synthetic interrupt controller message constants. */
>  #define HV_MESSAGE_SIZE			(256)
>  #define HV_MESSAGE_PAYLOAD_BYTE_COUNT	(240)
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 236324e8..88dc243 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -19,7 +19,10 @@
>  #include <linux/efi.h>
>  #include <linux/interrupt.h>
>  #include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/irqdesc.h>
>  #include <linux/kexec.h>
> +#include <linux/acpi.h>
>  #include <asm/processor.h>
>  #include <asm/hypervisor.h>
>  #include <asm/hyperv.h>
> @@ -27,6 +30,7 @@
>  #include <asm/desc.h>
>  #include <asm/irq_regs.h>
>  #include <asm/i8259.h>
> +#include <asm/irqdomain.h>
>  #include <asm/apic.h>
>  #include <asm/timer.h>
>  #include <asm/reboot.h>
> @@ -69,6 +73,64 @@ void hv_remove_vmbus_irq(void)
>  EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
>  EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
>
> +
> +/* Routines to do per-architecture handling of stimer0 when in Direct Mode */
> +
> +void hv_ack_stimer0_interrupt(struct irq_desc *desc)
> +{
> +	ack_APIC_irq();
> +}
> +
> +static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
> +				const struct cpumask *mask)
> +{
> +	cpumask_copy(retmask, cpu_online_mask);
> +}
> +
> +int hv_allocate_stimer0_irq(int *irq, int *vector)
> +{
> +	int		localirq;
> +	int		result;
> +	struct irq_data *irq_data;
> +
> +	/* The normal APIC vector allocation domain allows allocation of vectors
> +	 * only for the calling CPU.  So we change the allocation domain to one
> +	 * that allows vectors to be allocated in all online CPUs.  This
> +	 * change is fine in a Hyper-V VM because VMs don't have the usual
> +	 * complement of interrupting devices.
> +	 */
> +	apic->vector_allocation_domain = allonline_vector_allocation_domain;
> +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> +				ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> +	if (localirq < 0) {
> +		pr_err("Cannot register stimer0 gsi. Error %d", localirq);
> +		return -1;
> +	}
> +
> +	/* We pass in a dummy IRQ handler because architecture independent code
> +	 * will later override the IRQ domain interrupt handler and set it to a
> +	 * Hyper-V specific handler.
> +	 */
> +	result = request_irq(localirq, (irq_handler_t)(-1), 0,
> +					"Hyper-V stimer0", NULL);

I remember K. Y. told me he has patches in his stash to implement
Hyper-V-specific APIC. Is this still in works or there is a reason to
not do it? Just curious.

> +	if (result) {
> +		pr_err("Cannot request stimer0 irq. Error %d", result);
> +		acpi_unregister_gsi(localirq);
> +		return -1;
> +	}
> +	irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);

In theory irq_domain_get_irq_data() can return NULL.

> +	*vector = irqd_cfg(irq_data)->vector;
> +	*irq = localirq;
> +	return 0;
> +}
> +
> +void hv_deallocate_stimer0_irq(int irq)
> +{
> +	free_irq(irq, NULL);
> +	acpi_unregister_gsi(irq);
> +}
> +
> +
>  void hv_setup_kexec_handler(void (*handler)(void))
>  {
>  	hv_kexec_handler = handler;
> @@ -195,7 +257,7 @@ static void __init ms_hyperv_init_platform(void)
>  		hv_host_info_ecx = cpuid_ecx(HVCPUID_VERSION);
>  		hv_host_info_edx = cpuid_edx(HVCPUID_VERSION);
>
> -		pr_info("Hyper-V Host Build:%d-%d.%d-%d-%d.%d\n",
> +		pr_info("Hyper-V: Host Build %d-%d.%d-%d-%d.%d\n",

While this definitely makes logs nicer uhe change is probably not needed
in this particular patch.

>  			hv_host_info_eax, hv_host_info_ebx >> 16,
>  			hv_host_info_ebx & 0xFFFF, hv_host_info_ecx,
>  			hv_host_info_edx >> 24, hv_host_info_edx & 0xFFFFFF);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index fe96aab..68ac474 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -27,8 +27,12 @@
>  #include <linux/vmalloc.h>
>  #include <linux/hyperv.h>
>  #include <linux/version.h>
> -#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
> +#include <linux/random.h>
> +#include <linux/kernel_stat.h>
>  #include <linux/clockchips.h>
> +#include <linux/module.h>
>  #include <asm/hyperv.h>
>  #include <asm/mshyperv.h>
>  #include "hyperv_vmbus.h"
> @@ -38,6 +42,21 @@ struct hv_context hv_context = {
>  	.synic_initialized	= false,
>  };
>
> +/* If true, we're using Direct Mode for stimer0, and the timer will do it own
> + * interrupt when it expires. If false, stimer0 is not using Direct Mode and
> + * will send a VMbus message when it expires. We prefer to use Direct Mode,
> + * but not all versions of Hyper-V support Direct Mode.
> + *
> + * While Hyper-V provides a total of four stimer's per CPU, Linux use only
> + * stimer0.
> + */
> +static bool	stimer_direct_mode;
> +static int	stimer0_irq;
> +static int	stimer0_vector;
> +static bool	direct_mode_disable;
> +module_param(direct_mode_disable, bool, 0444);
> +MODULE_PARM_DESC(direct_mode_disable, "Set to Y to disable Direct Mode.");
> +
>  #define HV_TIMER_FREQUENCY (10 * 1000 * 1000) /* 100ns period */
>  #define HV_MAX_MAX_DELTA_TICKS 0xffffffff
>  #define HV_MIN_DELTA_TICKS 1
> @@ -52,7 +71,12 @@ int hv_init(void)
>  	hv_context.cpu_context = alloc_percpu(struct hv_per_cpu_context);
>  	if (!hv_context.cpu_context)
>  		return -ENOMEM;
> +	stimer_direct_mode = (ms_hyperv.misc_features &
> +		HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? true : false;
>
> +	/* Apply boot command line override to the Direct Mode setting */
> +	if (direct_mode_disable)
> +		stimer_direct_mode = false;

Having both direct_mode_disable and stimer_direct_mode seems redundant.

we may write something like

     if (!direct_mode_disable)
         direct_mode_disable = (ms_hyperv.misc_features & HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? false : true;

and use it everywhere.

>  	return 0;
>  }
>
> @@ -91,6 +115,23 @@ int hv_post_message(union hv_connection_id connection_id,
>  	return status & 0xFFFF;
>  }
>
> +/* ISR for when stimer0 is operating in Direct Mode.  Direct Mode does
> + * not use VMBus or any VMBus messages, so process here and not in the
> + * VMBus driver code.
> + */
> +
> +static void hv_stimer0_isr(struct irq_desc *desc)
> +{
> +	struct hv_per_cpu_context *hv_cpu;
> +
> +	__this_cpu_inc(*desc->kstat_irqs);
> +	__this_cpu_inc(kstat.irqs_sum);
> +	hv_ack_stimer0_interrupt(desc);
> +	hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> +	hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
> +	add_interrupt_randomness(desc->irq_data.irq, 0);

AFAIU implementing Hyper-V specific IRQ chip would allow us to drop this
as we would do this on 'normal' Linux IRQ path.

> +}
> +
>  static int hv_ce_set_next_event(unsigned long delta,
>  				struct clock_event_device *evt)
>  {
> @@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct clock_event_device *evt)
>  {
>  	hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
> +	if (stimer_direct_mode)
> +		hv_disable_stimer0_percpu_irq(stimer0_irq);
>

Both hv_disable_stimer0_percpu_irq() and hv_enable_stimer0_percpu_irq()
are no-ops, right? Why don't we mask/unmask the IRQ? I may have missed
something...

>  	return 0;
>  }
> @@ -116,9 +159,24 @@ static int hv_ce_set_oneshot(struct clock_event_device *evt)
>  {
>  	union hv_timer_config timer_cfg;
>
> +	timer_cfg.as_uint64 = 0; /* Zero everything */
>  	timer_cfg.enable = 1;
>  	timer_cfg.auto_enable = 1;
> -	timer_cfg.sintx = VMBUS_MESSAGE_SINT;
> +	if (stimer_direct_mode) {
> +
> +		/* When it expires, the timer will directly interrupt
> +		 * on the specific hardware vector.
> +		 */
> +		timer_cfg.direct_mode = 1;
> +		timer_cfg.apic_vector = stimer0_vector;
> +		hv_enable_stimer0_percpu_irq(stimer0_irq);
> +	} else {
> +		/* When it expires, the timer will generate a VMbus message,
> +		 * to be handled by the normal VMbus interrupt handler.
> +		 */
> +		timer_cfg.direct_mode = 0;
> +		timer_cfg.sintx = VMBUS_MESSAGE_SINT;
> +	}
>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64);
>
>  	return 0;
> @@ -191,6 +249,12 @@ int hv_synic_alloc(void)
>  		INIT_LIST_HEAD(&hv_cpu->chan_list);
>  	}
>
> +	if (stimer_direct_mode) {
> +		if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
> +			goto err;
> +		irq_set_handler(stimer0_irq, hv_stimer0_isr);
> +	}
> +
>  	return 0;
>  err:
>  	return -ENOMEM;
> @@ -292,6 +356,9 @@ void hv_synic_clockevents_cleanup(void)
>  	if (!(ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE))
>  		return;
>
> +	if (stimer_direct_mode)
> +		hv_deallocate_stimer0_irq(stimer0_irq);
> +
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index de6f01d..ee8c89b 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -55,7 +55,9 @@
>  		u64 periodic:1;
>  		u64 lazy:1;
>  		u64 auto_enable:1;
> -		u64 reserved_z0:12;
> +		u64 apic_vector:8;
> +		u64 direct_mode:1;
> +		u64 reserved_z0:3;
>  		u64 sintx:4;
>  		u64 reserved_z1:44;
>  	};

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-11-01 12:00 ` Vitaly Kuznetsov
@ 2017-11-01 13:41   ` Vitaly Kuznetsov
  2017-11-30 18:08     ` Michael Kelley (EOSG)
  0 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2017-11-01 13:41 UTC (permalink / raw)
  To: mikelley
  Cc: gregkh, linux-kernel, devel, olaf, apw, jasowang,
	leann.ogasawara, marcelo.cerri, sthemmin, kys, Michael Kelley,
	x86

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> mikelley@exchange.microsoft.com writes:
>
>> From: Michael Kelley <mikelley@microsoft.com>
>>
>> The 2016 version of Hyper-V offers the option to operate the guest VM
>> per-vcpu stimer's in Direct Mode, which means the timer interupts on its
>> own vector rather than queueing a VMbus message.  Direct Mode reduces
>> timer processing overhead in both the hypervisor and the guest, and
>> avoids having timer interrupts pollute the VMbus interrupt stream for
>> the synthetic NIC and storage.  This patch enables Direct Mode by
>> default on stimer0 (which is the only stimer used in Linux on Hyper-V)
>> when running on a version of Hyper-V that supports it, with a hv_vmbus
>> module parameter for disabling Direct Mode and reverting to the old
>> behavior.
>>
>> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
>> ---
>>  arch/x86/include/asm/mshyperv.h    | 14 ++++++++
>>  arch/x86/include/uapi/asm/hyperv.h | 26 ++++++++++++++
>>  arch/x86/kernel/cpu/mshyperv.c     | 64 +++++++++++++++++++++++++++++++++-
>>  drivers/hv/hv.c                    | 71 ++++++++++++++++++++++++++++++++++++--
>>  drivers/hv/hyperv_vmbus.h          |  4 ++-
>>  5 files changed, 175 insertions(+), 4 deletions(-)
>>

(in addition to my previous comment)

This patch is also x86-heavy so it probably makes sense to make x86
maintainers (Thomas, Peter, Ingo) aware no matter which tree it will go
through. 

CC: x86@kernel.org

>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index 740dc97..1bba1d7 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -4,6 +4,8 @@
>>  #include <linux/types.h>
>>  #include <linux/atomic.h>
>>  #include <linux/nmi.h>
>> +#include <linux/irq.h>
>> +#include <linux/irqdesc.h>
>>  #include <asm/io.h>
>>  #include <asm/hyperv.h>
>>
>> @@ -374,4 +376,16 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
>>  	return NULL;
>>  }
>>  #endif
>> +
>> +/* Per architecture routines for stimer0 Direct Mode handling.  On x86/x64
>> + * there are no percpu actions to take.
>> + */
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> +static inline void hv_enable_stimer0_percpu_irq(int irq) { }
>> +static inline void hv_disable_stimer0_percpu_irq(int irq) { }
>> +extern int hv_allocate_stimer0_irq(int *irq, int *vector);
>> +extern void hv_deallocate_stimer0_irq(int irq);
>> +extern void hv_ack_stimer0_interrupt(struct irq_desc *desc);
>> +#endif
>> +
>>  #endif
>> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
>> index f65d125..408cf3e 100644
>> --- a/arch/x86/include/uapi/asm/hyperv.h
>> +++ b/arch/x86/include/uapi/asm/hyperv.h
>> @@ -112,6 +112,22 @@
>>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE		(1 << 5)
>>  /* Guest crash data handler available */
>>  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE		(1 << 10)
>> +/* Debug MSRs available */
>> +#define HV_X64_DEBUG_MSR_AVAILABLE			(1 << 11)
>> +/* Support for Non-Privileged Instruction Execution Prevention is available */
>> +#define HV_X64_NPIEP_AVAILABLE				(1 << 12)
>> +/* Support for DisableHypervisor is available */
>> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE		(1 << 13)
>> +/* Extended GVA Ranges for Flush Virtual Address list is available */
>> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE		(1 << 14)
>> +/* Return Hypercall output via XMM registers is available */
>> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE		(1 << 15)
>> +/* SINT polling mode available */
>> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE		(1 << 17)
>> +/* Hypercall MSR lock is available */
>> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE		(1 << 18)
>> +/* stimer direct mode is available */
>> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE		(1 << 19)
>>
>>  /*
>>   * Implementation recommendations. Indicates which behaviors the hypervisor
>> @@ -300,6 +316,16 @@ enum HV_GENERIC_SET_FORMAT {
>>
>>  #define HV_SYNIC_STIMER_COUNT		(4)
>>
>> +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ is a fake
>> + * because stimer's in Direct Mode simply interrupt on the specified vector,
>> + * without using a particular IOAPIC pin. But we use the IRQ allocation
>> + * machinery, so we need a hardware IRQ #.  This value is somewhat arbitrary,
>> + * but it should not be a legacy IRQ (0 to 15), and should fit within the
>> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So any value
>> + * between 16 and 23 should be good.
>> + */
>
> (nitpick): all comments in the patch are in 'net' style:
>
>  /* This is a
>   * comment.
>   */
>
> While in Hyper-V files (and x86 in general) the 'normal' style is used:
>
>  /*
>   * This is a
>   * comment.
>   */
>
> But checkpatch.pl doesn't complain.
>
>> +#define HV_STIMER0_IRQNR		18
>> +
>>  /* Define synthetic interrupt controller message constants. */
>>  #define HV_MESSAGE_SIZE			(256)
>>  #define HV_MESSAGE_PAYLOAD_BYTE_COUNT	(240)
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index 236324e8..88dc243 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -19,7 +19,10 @@
>>  #include <linux/efi.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/irq.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/irqdesc.h>
>>  #include <linux/kexec.h>
>> +#include <linux/acpi.h>
>>  #include <asm/processor.h>
>>  #include <asm/hypervisor.h>
>>  #include <asm/hyperv.h>
>> @@ -27,6 +30,7 @@
>>  #include <asm/desc.h>
>>  #include <asm/irq_regs.h>
>>  #include <asm/i8259.h>
>> +#include <asm/irqdomain.h>
>>  #include <asm/apic.h>
>>  #include <asm/timer.h>
>>  #include <asm/reboot.h>
>> @@ -69,6 +73,64 @@ void hv_remove_vmbus_irq(void)
>>  EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
>>  EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
>>
>> +
>> +/* Routines to do per-architecture handling of stimer0 when in Direct Mode */
>> +
>> +void hv_ack_stimer0_interrupt(struct irq_desc *desc)
>> +{
>> +	ack_APIC_irq();
>> +}
>> +
>> +static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
>> +				const struct cpumask *mask)
>> +{
>> +	cpumask_copy(retmask, cpu_online_mask);
>> +}
>> +
>> +int hv_allocate_stimer0_irq(int *irq, int *vector)
>> +{
>> +	int		localirq;
>> +	int		result;
>> +	struct irq_data *irq_data;
>> +
>> +	/* The normal APIC vector allocation domain allows allocation of vectors
>> +	 * only for the calling CPU.  So we change the allocation domain to one
>> +	 * that allows vectors to be allocated in all online CPUs.  This
>> +	 * change is fine in a Hyper-V VM because VMs don't have the usual
>> +	 * complement of interrupting devices.
>> +	 */
>> +	apic->vector_allocation_domain = allonline_vector_allocation_domain;
>> +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
>> +				ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
>> +	if (localirq < 0) {
>> +		pr_err("Cannot register stimer0 gsi. Error %d", localirq);
>> +		return -1;
>> +	}
>> +
>> +	/* We pass in a dummy IRQ handler because architecture independent code
>> +	 * will later override the IRQ domain interrupt handler and set it to a
>> +	 * Hyper-V specific handler.
>> +	 */
>> +	result = request_irq(localirq, (irq_handler_t)(-1), 0,
>> +					"Hyper-V stimer0", NULL);
>
> I remember K. Y. told me he has patches in his stash to implement
> Hyper-V-specific APIC. Is this still in works or there is a reason to
> not do it? Just curious.
>
>> +	if (result) {
>> +		pr_err("Cannot request stimer0 irq. Error %d", result);
>> +		acpi_unregister_gsi(localirq);
>> +		return -1;
>> +	}
>> +	irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);
>
> In theory irq_domain_get_irq_data() can return NULL.
>
>> +	*vector = irqd_cfg(irq_data)->vector;
>> +	*irq = localirq;
>> +	return 0;
>> +}
>> +
>> +void hv_deallocate_stimer0_irq(int irq)
>> +{
>> +	free_irq(irq, NULL);
>> +	acpi_unregister_gsi(irq);
>> +}
>> +
>> +
>>  void hv_setup_kexec_handler(void (*handler)(void))
>>  {
>>  	hv_kexec_handler = handler;
>> @@ -195,7 +257,7 @@ static void __init ms_hyperv_init_platform(void)
>>  		hv_host_info_ecx = cpuid_ecx(HVCPUID_VERSION);
>>  		hv_host_info_edx = cpuid_edx(HVCPUID_VERSION);
>>
>> -		pr_info("Hyper-V Host Build:%d-%d.%d-%d-%d.%d\n",
>> +		pr_info("Hyper-V: Host Build %d-%d.%d-%d-%d.%d\n",
>
> While this definitely makes logs nicer uhe change is probably not needed
> in this particular patch.
>
>>  			hv_host_info_eax, hv_host_info_ebx >> 16,
>>  			hv_host_info_ebx & 0xFFFF, hv_host_info_ecx,
>>  			hv_host_info_edx >> 24, hv_host_info_edx & 0xFFFFFF);
>> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
>> index fe96aab..68ac474 100644
>> --- a/drivers/hv/hv.c
>> +++ b/drivers/hv/hv.c
>> @@ -27,8 +27,12 @@
>>  #include <linux/vmalloc.h>
>>  #include <linux/hyperv.h>
>>  #include <linux/version.h>
>> -#include <linux/interrupt.h>
>> +#include <linux/irq.h>
>> +#include <linux/irqdesc.h>
>> +#include <linux/random.h>
>> +#include <linux/kernel_stat.h>
>>  #include <linux/clockchips.h>
>> +#include <linux/module.h>
>>  #include <asm/hyperv.h>
>>  #include <asm/mshyperv.h>
>>  #include "hyperv_vmbus.h"
>> @@ -38,6 +42,21 @@ struct hv_context hv_context = {
>>  	.synic_initialized	= false,
>>  };
>>
>> +/* If true, we're using Direct Mode for stimer0, and the timer will do it own
>> + * interrupt when it expires. If false, stimer0 is not using Direct Mode and
>> + * will send a VMbus message when it expires. We prefer to use Direct Mode,
>> + * but not all versions of Hyper-V support Direct Mode.
>> + *
>> + * While Hyper-V provides a total of four stimer's per CPU, Linux use only
>> + * stimer0.
>> + */
>> +static bool	stimer_direct_mode;
>> +static int	stimer0_irq;
>> +static int	stimer0_vector;
>> +static bool	direct_mode_disable;
>> +module_param(direct_mode_disable, bool, 0444);
>> +MODULE_PARM_DESC(direct_mode_disable, "Set to Y to disable Direct Mode.");
>> +
>>  #define HV_TIMER_FREQUENCY (10 * 1000 * 1000) /* 100ns period */
>>  #define HV_MAX_MAX_DELTA_TICKS 0xffffffff
>>  #define HV_MIN_DELTA_TICKS 1
>> @@ -52,7 +71,12 @@ int hv_init(void)
>>  	hv_context.cpu_context = alloc_percpu(struct hv_per_cpu_context);
>>  	if (!hv_context.cpu_context)
>>  		return -ENOMEM;
>> +	stimer_direct_mode = (ms_hyperv.misc_features &
>> +		HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? true : false;
>>
>> +	/* Apply boot command line override to the Direct Mode setting */
>> +	if (direct_mode_disable)
>> +		stimer_direct_mode = false;
>
> Having both direct_mode_disable and stimer_direct_mode seems redundant.
>
> we may write something like
>
>      if (!direct_mode_disable)
>          direct_mode_disable = (ms_hyperv.misc_features & HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? false : true;
>
> and use it everywhere.
>
>>  	return 0;
>>  }
>>
>> @@ -91,6 +115,23 @@ int hv_post_message(union hv_connection_id connection_id,
>>  	return status & 0xFFFF;
>>  }
>>
>> +/* ISR for when stimer0 is operating in Direct Mode.  Direct Mode does
>> + * not use VMBus or any VMBus messages, so process here and not in the
>> + * VMBus driver code.
>> + */
>> +
>> +static void hv_stimer0_isr(struct irq_desc *desc)
>> +{
>> +	struct hv_per_cpu_context *hv_cpu;
>> +
>> +	__this_cpu_inc(*desc->kstat_irqs);
>> +	__this_cpu_inc(kstat.irqs_sum);
>> +	hv_ack_stimer0_interrupt(desc);
>> +	hv_cpu = this_cpu_ptr(hv_context.cpu_context);
>> +	hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
>> +	add_interrupt_randomness(desc->irq_data.irq, 0);
>
> AFAIU implementing Hyper-V specific IRQ chip would allow us to drop this
> as we would do this on 'normal' Linux IRQ path.
>
>> +}
>> +
>>  static int hv_ce_set_next_event(unsigned long delta,
>>  				struct clock_event_device *evt)
>>  {
>> @@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct clock_event_device *evt)
>>  {
>>  	hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
>>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
>> +	if (stimer_direct_mode)
>> +		hv_disable_stimer0_percpu_irq(stimer0_irq);
>>
>
> Both hv_disable_stimer0_percpu_irq() and hv_enable_stimer0_percpu_irq()
> are no-ops, right? Why don't we mask/unmask the IRQ? I may have missed
> something...
>
>>  	return 0;
>>  }
>> @@ -116,9 +159,24 @@ static int hv_ce_set_oneshot(struct clock_event_device *evt)
>>  {
>>  	union hv_timer_config timer_cfg;
>>
>> +	timer_cfg.as_uint64 = 0; /* Zero everything */
>>  	timer_cfg.enable = 1;
>>  	timer_cfg.auto_enable = 1;
>> -	timer_cfg.sintx = VMBUS_MESSAGE_SINT;
>> +	if (stimer_direct_mode) {
>> +
>> +		/* When it expires, the timer will directly interrupt
>> +		 * on the specific hardware vector.
>> +		 */
>> +		timer_cfg.direct_mode = 1;
>> +		timer_cfg.apic_vector = stimer0_vector;
>> +		hv_enable_stimer0_percpu_irq(stimer0_irq);
>> +	} else {
>> +		/* When it expires, the timer will generate a VMbus message,
>> +		 * to be handled by the normal VMbus interrupt handler.
>> +		 */
>> +		timer_cfg.direct_mode = 0;
>> +		timer_cfg.sintx = VMBUS_MESSAGE_SINT;
>> +	}
>>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64);
>>
>>  	return 0;
>> @@ -191,6 +249,12 @@ int hv_synic_alloc(void)
>>  		INIT_LIST_HEAD(&hv_cpu->chan_list);
>>  	}
>>
>> +	if (stimer_direct_mode) {
>> +		if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
>> +			goto err;
>> +		irq_set_handler(stimer0_irq, hv_stimer0_isr);
>> +	}
>> +
>>  	return 0;
>>  err:
>>  	return -ENOMEM;
>> @@ -292,6 +356,9 @@ void hv_synic_clockevents_cleanup(void)
>>  	if (!(ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE))
>>  		return;
>>
>> +	if (stimer_direct_mode)
>> +		hv_deallocate_stimer0_irq(stimer0_irq);
>> +
>>  	for_each_present_cpu(cpu) {
>>  		struct hv_per_cpu_context *hv_cpu
>>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index de6f01d..ee8c89b 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -55,7 +55,9 @@
>>  		u64 periodic:1;
>>  		u64 lazy:1;
>>  		u64 auto_enable:1;
>> -		u64 reserved_z0:12;
>> +		u64 apic_vector:8;
>> +		u64 direct_mode:1;
>> +		u64 reserved_z0:3;
>>  		u64 sintx:4;
>>  		u64 reserved_z1:44;
>>  	};

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-10-31 22:18 [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0 mikelley
  2017-11-01 12:00 ` Vitaly Kuznetsov
@ 2017-11-01 14:51 ` Thomas Gleixner
  2017-11-01 16:19   ` Thomas Gleixner
  2017-11-30 18:08   ` Michael Kelley (EOSG)
  2017-11-03 16:01 ` kbuild test robot
  2017-11-03 22:55 ` kbuild test robot
  3 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2017-11-01 14:51 UTC (permalink / raw)
  To: mikelley
  Cc: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, marcelo.cerri, sthemmin, kys, Michael Kelley

On Tue, 31 Oct 2017, mikelley@exchange.microsoft.com wrote:
> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> index f65d125..408cf3e 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -112,6 +112,22 @@
>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE		(1 << 5)
>  /* Guest crash data handler available */
>  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE		(1 << 10)
> +/* Debug MSRs available */
> +#define HV_X64_DEBUG_MSR_AVAILABLE			(1 << 11)
> +/* Support for Non-Privileged Instruction Execution Prevention is available */
> +#define HV_X64_NPIEP_AVAILABLE				(1 << 12)
> +/* Support for DisableHypervisor is available */
> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE		(1 << 13)
> +/* Extended GVA Ranges for Flush Virtual Address list is available */
> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE		(1 << 14)
> +/* Return Hypercall output via XMM registers is available */
> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE		(1 << 15)
> +/* SINT polling mode available */
> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE		(1 << 17)
> +/* Hypercall MSR lock is available */
> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE		(1 << 18)
> +/* stimer direct mode is available */
> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE		(1 << 19)

How are all these defines (except the last one related to that patch?

> +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ is a fake
> + * because stimer's in Direct Mode simply interrupt on the specified vector,
> + * without using a particular IOAPIC pin. But we use the IRQ allocation
> + * machinery, so we need a hardware IRQ #.  This value is somewhat arbitrary,
> + * but it should not be a legacy IRQ (0 to 15), and should fit within the
> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So any value
> + * between 16 and 23 should be good.
> + */
> +#define HV_STIMER0_IRQNR		18

Why would you want abuse an IOAPIC interrupt if all you need is a vector?

> +/* Routines to do per-architecture handling of stimer0 when in Direct Mode */
> +
> +void hv_ack_stimer0_interrupt(struct irq_desc *desc)
> +{
> +	ack_APIC_irq();
> +}
> +
> +static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
> +				const struct cpumask *mask)
> +{
> +	cpumask_copy(retmask, cpu_online_mask);
> +}
> +
> +int hv_allocate_stimer0_irq(int *irq, int *vector)
> +{
> +	int		localirq;
> +	int		result;
> +	struct irq_data *irq_data;
> +
> +	/* The normal APIC vector allocation domain allows allocation of vectors

Please fix you comment style. Multi line comments are:

       /*
        * Bla....
	* foo...
	*/      

> +	 * only for the calling CPU.  So we change the allocation domain to one
> +	 * that allows vectors to be allocated in all online CPUs.  This
> +	 * change is fine in a Hyper-V VM because VMs don't have the usual
> +	 * complement of interrupting devices.
> +	 */
> +	apic->vector_allocation_domain = allonline_vector_allocation_domain;

This does not work anymore. vector_allocation_domain is gone as of
4.15. Please check the vector rework in

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip x86/apic

Aside of that what guarantees that all cpus are online at the point where
you allocate that interrupt? Nothing, so the vector will be not reserved or
allocated on offline CPUs. Now guess what happens if you bring the offline
CPUs online later, it will drown in spurious interrupts. Worse, the vector
can also be reused for a device interrupt. Great plan.

> +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> +				ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> +	if (localirq < 0) {
> +		pr_err("Cannot register stimer0 gsi. Error %d", localirq);
> +		return -1;
> +	}
> +
> +	/* We pass in a dummy IRQ handler because architecture independent code
> +	 * will later override the IRQ domain interrupt handler and set it to a
> +	 * Hyper-V specific handler.
> +	 */
> +	result = request_irq(localirq, (irq_handler_t)(-1), 0,
					"Hyper-V stimer0", NULL);

That's a crude hack. Really. 

> +	if (result) {
> +		pr_err("Cannot request stimer0 irq. Error %d", result);
> +		acpi_unregister_gsi(localirq);
> +		return -1;
> +	}
> +	irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);
> +	*vector = irqd_cfg(irq_data)->vector;
> +	*irq = localirq;

Uurgh, no. This is even more of a layering violation. Grab random data from
wherever it comes and then expect that it works. This will simply fall
apart the moment someone changes the affinity of this interrupt. It will
move to some random other vector and the system drowns in spurious
interrupts on the old vector.

> +/* ISR for when stimer0 is operating in Direct Mode.  Direct Mode does
> + * not use VMBus or any VMBus messages, so process here and not in the
> + * VMBus driver code.
> + */
> +
> +static void hv_stimer0_isr(struct irq_desc *desc)
> +{
> +	struct hv_per_cpu_context *hv_cpu;
> +
> +	__this_cpu_inc(*desc->kstat_irqs);
> +	__this_cpu_inc(kstat.irqs_sum);
> +	hv_ack_stimer0_interrupt(desc);
> +	hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> +	hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
> +	add_interrupt_randomness(desc->irq_data.irq, 0);
> +}
> +
>  static int hv_ce_set_next_event(unsigned long delta,
>  				struct clock_event_device *evt)
>  {
> @@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct clock_event_device *evt)
>  {
>  	hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
> +	if (stimer_direct_mode)
> +		hv_disable_stimer0_percpu_irq(stimer0_irq);

Whats the point of that? It's an empty inline:

> +#if IS_ENABLED(CONFIG_HYPERV)
> +static inline void hv_enable_stimer0_percpu_irq(int irq) { }
> +static inline void hv_disable_stimer0_percpu_irq(int irq) { }

> +	if (stimer_direct_mode) {
> +
> +		/* When it expires, the timer will directly interrupt
> +		 * on the specific hardware vector.
> +		 */
> +		timer_cfg.direct_mode = 1;
> +		timer_cfg.apic_vector = stimer0_vector;
> +		hv_enable_stimer0_percpu_irq(stimer0_irq);

Ditto. 

> +	if (stimer_direct_mode) {
> +		if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
> +			goto err;
> +		irq_set_handler(stimer0_irq, hv_stimer0_isr);

What you really want to do here is to allocate a fixed vector like we do
for the other percpu interrupts (local apic timer, IPIs etc.) and use
that. This must be done at boot time, when you detect that the kernel runs
on HyperV. This makes sure, that all CPUs will have the vector populated,
even those which come online late.

Though you can't do that from a module. You have to do that setup during
early boot from ms_hyperv_init_platform(). That's the only sane way to deal
with that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-11-01 14:51 ` Thomas Gleixner
@ 2017-11-01 16:19   ` Thomas Gleixner
  2017-11-01 16:21     ` Michael Kelley (EOSG)
  2017-11-30 18:08   ` Michael Kelley (EOSG)
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2017-11-01 16:19 UTC (permalink / raw)
  To: Michael Kelley
  Cc: gregkh, LKML, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, marcelo.cerri, sthemmin, kys

On Wed, 1 Nov 2017, Thomas Gleixner wrote:

> On Tue, 31 Oct 2017, mikelley@exchange.microsoft.com wrote:

     	     	       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please fix your mail process. mikelley@exchange.microsoft.com is not your
real mail address, but shows up as From and obviously the reply bounces:

  mikelley@exchange.microsoft.com
  Remote Server returned '550 5.1.10 RESOLVER.ADR.RecipientNotFound;
  Recipient not found by SMTP address lookup'

Sigh,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-11-01 16:19   ` Thomas Gleixner
@ 2017-11-01 16:21     ` Michael Kelley (EOSG)
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Kelley (EOSG) @ 2017-11-01 16:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: gregkh, LKML, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, marcelo.cerri, Stephen Hemminger, KY Srinivasan

It's fixed.

Michael

-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de] 
Sent: Wednesday, November 1, 2017 9:20 AM
To: Michael Kelley (EOSG) <Michael.H.Kelley@microsoft.com>
Cc: gregkh@linuxfoundation.org; LKML <linux-kernel@vger.kernel.org>; devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com; vkuznets@redhat.com; jasowang@redhat.com; leann.ogasawara@canonical.com; marcelo.cerri@canonical.com; Stephen Hemminger <sthemmin@microsoft.com>; KY Srinivasan <kys@microsoft.com>
Subject: Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

On Wed, 1 Nov 2017, Thomas Gleixner wrote:

> On Tue, 31 Oct 2017, mikelley@exchange.microsoft.com wrote:

     	     	       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please fix your mail process. mikelley@exchange.microsoft.com is not your real mail address, but shows up as From and obviously the reply bounces:

  mikelley@exchange.microsoft.com
  Remote Server returned '550 5.1.10 RESOLVER.ADR.RecipientNotFound;
  Recipient not found by SMTP address lookup'

Sigh,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-10-31 22:18 [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0 mikelley
  2017-11-01 12:00 ` Vitaly Kuznetsov
  2017-11-01 14:51 ` Thomas Gleixner
@ 2017-11-03 16:01 ` kbuild test robot
  2017-11-03 22:55 ` kbuild test robot
  3 siblings, 0 replies; 10+ messages in thread
From: kbuild test robot @ 2017-11-03 16:01 UTC (permalink / raw)
  To: mikelley
  Cc: kbuild-all, gregkh, linux-kernel, devel, olaf, apw, vkuznets,
	jasowang, leann.ogasawara, marcelo.cerri, sthemmin, kys,
	Michael Kelley

[-- Attachment #1: Type: text/plain, Size: 861 bytes --]

Hi Michael,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on char-misc/master]

url:    https://github.com/0day-ci/linux/commits/mikelley-exchange-microsoft-com/Drivers-hv-vmbus-Implement-Direct-Mode-for-stimer0/20171103-214519
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "hv_deallocate_stimer0_irq" [drivers/hv/hv_vmbus.ko] undefined!
>> ERROR: "hv_allocate_stimer0_irq" [drivers/hv/hv_vmbus.ko] undefined!
>> ERROR: "hv_ack_stimer0_interrupt" [drivers/hv/hv_vmbus.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40097 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-10-31 22:18 [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0 mikelley
                   ` (2 preceding siblings ...)
  2017-11-03 16:01 ` kbuild test robot
@ 2017-11-03 22:55 ` kbuild test robot
  3 siblings, 0 replies; 10+ messages in thread
From: kbuild test robot @ 2017-11-03 22:55 UTC (permalink / raw)
  To: mikelley
  Cc: kbuild-all, gregkh, linux-kernel, devel, olaf, apw, vkuznets,
	jasowang, leann.ogasawara, marcelo.cerri, sthemmin, kys,
	Michael Kelley

[-- Attachment #1: Type: text/plain, Size: 2581 bytes --]

Hi Michael,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on char-misc/master]

url:    https://github.com/0day-ci/linux/commits/mikelley-exchange-microsoft-com/Drivers-hv-vmbus-Implement-Direct-Mode-for-stimer0/20171103-214519
config: x86_64-randconfig-h0-11040619 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kasan.h:16:0,
                    from include/linux/slab.h:120,
                    from include/linux/irq.h:25,
                    from arch/x86/include/asm/mshyperv.h:7,
                    from arch/x86/entry/vdso/vdso32/../vclock_gettime.c:20,
                    from arch/x86/entry/vdso/vdso32/vclock_gettime.c:32:
   arch/x86/include/asm/pgtable.h: In function 'pte_flags_pkey':
>> arch/x86/include/asm/pgtable.h:1199:2: warning: left shift count >= width of type
     return (pte_flags & _PAGE_PKEY_MASK) >> _PAGE_BIT_PKEY_BIT0;
     ^
>> arch/x86/include/asm/pgtable.h:1199:2: warning: left shift count >= width of type
>> arch/x86/include/asm/pgtable.h:1199:2: warning: left shift count >= width of type
>> arch/x86/include/asm/pgtable.h:1199:2: warning: left shift count >= width of type
   arch/x86/include/asm/pgtable.h:1199:2: warning: right shift count >= width of type

vim +1199 arch/x86/include/asm/pgtable.h

33a709b2 Dave Hansen 2016-02-12  1194  
33a709b2 Dave Hansen 2016-02-12  1195  static inline u16 pte_flags_pkey(unsigned long pte_flags)
33a709b2 Dave Hansen 2016-02-12  1196  {
33a709b2 Dave Hansen 2016-02-12  1197  #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
33a709b2 Dave Hansen 2016-02-12  1198  	/* ifdef to avoid doing 59-bit shift on 32-bit values */
33a709b2 Dave Hansen 2016-02-12 @1199  	return (pte_flags & _PAGE_PKEY_MASK) >> _PAGE_BIT_PKEY_BIT0;
33a709b2 Dave Hansen 2016-02-12  1200  #else
33a709b2 Dave Hansen 2016-02-12  1201  	return 0;
33a709b2 Dave Hansen 2016-02-12  1202  #endif
33a709b2 Dave Hansen 2016-02-12  1203  }
33a709b2 Dave Hansen 2016-02-12  1204  

:::::: The code at line 1199 was first introduced by commit
:::::: 33a709b25a760b91184bb335cf7d7c32b8123013 mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys

:::::: TO: Dave Hansen <dave.hansen@linux.intel.com>
:::::: CC: Ingo Molnar <mingo@kernel.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26523 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-11-01 13:41   ` Vitaly Kuznetsov
@ 2017-11-30 18:08     ` Michael Kelley (EOSG)
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Kelley (EOSG) @ 2017-11-30 18:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: gregkh, linux-kernel, devel, olaf, apw, jasowang,
	leann.ogasawara, marcelo.cerri, Stephen Hemminger, KY Srinivasan,
	x86

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > mikelley@exchange.microsoft.com writes:
> >
> >> From: Michael Kelley <mikelley@microsoft.com>
> >>
> >> The 2016 version of Hyper-V offers the option to operate the guest VM
> >> per-vcpu stimer's in Direct Mode, which means the timer interupts on
> >> its own vector rather than queueing a VMbus message.  Direct Mode
> >> reduces timer processing overhead in both the hypervisor and the
> >> guest, and avoids having timer interrupts pollute the VMbus interrupt
> >> stream for the synthetic NIC and storage.  This patch enables Direct
> >> Mode by default on stimer0 (which is the only stimer used in Linux on
> >> Hyper-V) when running on a version of Hyper-V that supports it, with
> >> a hv_vmbus module parameter for disabling Direct Mode and reverting
> >> to the old behavior.
> >>
> >> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> >> ---
> >>  arch/x86/include/asm/mshyperv.h    | 14 ++++++++
> >>  arch/x86/include/uapi/asm/hyperv.h | 26 ++++++++++++++
> >>  arch/x86/kernel/cpu/mshyperv.c     | 64 +++++++++++++++++++++++++++++++++-
> >>  drivers/hv/hv.c                    | 71 ++++++++++++++++++++++++++++++++++++--
> >>  drivers/hv/hyperv_vmbus.h          |  4 ++-
> >>  5 files changed, 175 insertions(+), 4 deletions(-)
> >>
> 
> (in addition to my previous comment)
> 
> This patch is also x86-heavy so it probably makes sense to make x86 maintainers (Thomas,
> Peter, Ingo) aware no matter which tree it will go through.
> 
> CC: x86@kernel.org
> 
> >> diff --git a/arch/x86/include/asm/mshyperv.h
> >> b/arch/x86/include/asm/mshyperv.h index 740dc97..1bba1d7 100644
> >> --- a/arch/x86/include/asm/mshyperv.h
> >> +++ b/arch/x86/include/asm/mshyperv.h
> >> @@ -4,6 +4,8 @@
> >>  #include <linux/types.h>
> >>  #include <linux/atomic.h>
> >>  #include <linux/nmi.h>
> >> +#include <linux/irq.h>
> >> +#include <linux/irqdesc.h>
> >>  #include <asm/io.h>
> >>  #include <asm/hyperv.h>
> >>
> >> @@ -374,4 +376,16 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> >>  	return NULL;
> >>  }
> >>  #endif
> >> +
> >> +/* Per architecture routines for stimer0 Direct Mode handling.  On
> >> +x86/x64
> >> + * there are no percpu actions to take.
> >> + */
> >> +#if IS_ENABLED(CONFIG_HYPERV)
> >> +static inline void hv_enable_stimer0_percpu_irq(int irq) { } static
> >> +inline void hv_disable_stimer0_percpu_irq(int irq) { } extern int
> >> +hv_allocate_stimer0_irq(int *irq, int *vector); extern void
> >> +hv_deallocate_stimer0_irq(int irq); extern void
> >> +hv_ack_stimer0_interrupt(struct irq_desc *desc); #endif
> >> +
> >>  #endif
> >> diff --git a/arch/x86/include/uapi/asm/hyperv.h
> >> b/arch/x86/include/uapi/asm/hyperv.h
> >> index f65d125..408cf3e 100644
> >> --- a/arch/x86/include/uapi/asm/hyperv.h
> >> +++ b/arch/x86/include/uapi/asm/hyperv.h
> >> @@ -112,6 +112,22 @@
> >>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE		(1 << 5)
> >>  /* Guest crash data handler available */
> >>  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE		(1 << 10)
> >> +/* Debug MSRs available */
> >> +#define HV_X64_DEBUG_MSR_AVAILABLE			(1 << 11)
> >> +/* Support for Non-Privileged Instruction Execution Prevention is available */
> >> +#define HV_X64_NPIEP_AVAILABLE				(1 << 12)
> >> +/* Support for DisableHypervisor is available */
> >> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE		(1 << 13)
> >> +/* Extended GVA Ranges for Flush Virtual Address list is available */
> >> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE		(1 << 14)
> >> +/* Return Hypercall output via XMM registers is available */
> >> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE		(1 << 15)
> >> +/* SINT polling mode available */
> >> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE		(1 << 17)
> >> +/* Hypercall MSR lock is available */
> >> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE		(1 << 18)
> >> +/* stimer direct mode is available */
> >> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE		(1 << 19)
> >>
> >>  /*
> >>   * Implementation recommendations. Indicates which behaviors the
> >> hypervisor @@ -300,6 +316,16 @@ enum HV_GENERIC_SET_FORMAT {
> >>
> >>  #define HV_SYNIC_STIMER_COUNT		(4)
> >>
> >> +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ
> >> +is a fake
> >> + * because stimer's in Direct Mode simply interrupt on the specified
> >> +vector,
> >> + * without using a particular IOAPIC pin. But we use the IRQ
> >> +allocation
> >> + * machinery, so we need a hardware IRQ #.  This value is somewhat
> >> +arbitrary,
> >> + * but it should not be a legacy IRQ (0 to 15), and should fit
> >> +within the
> >> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So
> >> +any value
> >> + * between 16 and 23 should be good.
> >> + */
> >
> > (nitpick): all comments in the patch are in 'net' style:
> >
> >  /* This is a
> >   * comment.
> >   */
> >
> > While in Hyper-V files (and x86 in general) the 'normal' style is used:
> >
> >  /*
> >   * This is a
> >   * comment.
> >   */
> >
> > But checkpatch.pl doesn't complain.

Will fix in v2.

> >
> >> +#define HV_STIMER0_IRQNR		18
> >> +
> >>  /* Define synthetic interrupt controller message constants. */
> >>  #define HV_MESSAGE_SIZE			(256)
> >>  #define HV_MESSAGE_PAYLOAD_BYTE_COUNT	(240)
> >> diff --git a/arch/x86/kernel/cpu/mshyperv.c
> >> b/arch/x86/kernel/cpu/mshyperv.c index 236324e8..88dc243 100644
> >> --- a/arch/x86/kernel/cpu/mshyperv.c
> >> +++ b/arch/x86/kernel/cpu/mshyperv.c
> >> @@ -19,7 +19,10 @@
> >>  #include <linux/efi.h>
> >>  #include <linux/interrupt.h>
> >>  #include <linux/irq.h>
> >> +#include <linux/irqdomain.h>
> >> +#include <linux/irqdesc.h>
> >>  #include <linux/kexec.h>
> >> +#include <linux/acpi.h>
> >>  #include <asm/processor.h>
> >>  #include <asm/hypervisor.h>
> >>  #include <asm/hyperv.h>
> >> @@ -27,6 +30,7 @@
> >>  #include <asm/desc.h>
> >>  #include <asm/irq_regs.h>
> >>  #include <asm/i8259.h>
> >> +#include <asm/irqdomain.h>
> >>  #include <asm/apic.h>
> >>  #include <asm/timer.h>
> >>  #include <asm/reboot.h>
> >> @@ -69,6 +73,64 @@ void hv_remove_vmbus_irq(void)
> >> EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
> >>  EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
> >>
> >> +
> >> +/* Routines to do per-architecture handling of stimer0 when in
> >> +Direct Mode */
> >> +
> >> +void hv_ack_stimer0_interrupt(struct irq_desc *desc) {
> >> +	ack_APIC_irq();
> >> +}
> >> +
> >> +static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
> >> +				const struct cpumask *mask)
> >> +{
> >> +	cpumask_copy(retmask, cpu_online_mask); }
> >> +
> >> +int hv_allocate_stimer0_irq(int *irq, int *vector) {
> >> +	int		localirq;
> >> +	int		result;
> >> +	struct irq_data *irq_data;
> >> +
> >> +	/* The normal APIC vector allocation domain allows allocation of vectors
> >> +	 * only for the calling CPU.  So we change the allocation domain to one
> >> +	 * that allows vectors to be allocated in all online CPUs.  This
> >> +	 * change is fine in a Hyper-V VM because VMs don't have the usual
> >> +	 * complement of interrupting devices.
> >> +	 */
> >> +	apic->vector_allocation_domain = allonline_vector_allocation_domain;
> >> +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> >> +				ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> >> +	if (localirq < 0) {
> >> +		pr_err("Cannot register stimer0 gsi. Error %d", localirq);
> >> +		return -1;
> >> +	}
> >> +
> >> +	/* We pass in a dummy IRQ handler because architecture independent code
> >> +	 * will later override the IRQ domain interrupt handler and set it to a
> >> +	 * Hyper-V specific handler.
> >> +	 */
> >> +	result = request_irq(localirq, (irq_handler_t)(-1), 0,
> >> +					"Hyper-V stimer0", NULL);
> >
> > I remember K. Y. told me he has patches in his stash to implement
> > Hyper-V-specific APIC. Is this still in works or there is a reason to
> > not do it? Just curious.
> >

I've looked at KY's pending patch.  It's only a partial APIC implementation that overrides certain performance sensitive functions and doesn't really help what's needed here.   v2 of the patch will take a different approach that makes this moot.

> >> +	if (result) {
> >> +		pr_err("Cannot request stimer0 irq. Error %d", result);
> >> +		acpi_unregister_gsi(localirq);
> >> +		return -1;
> >> +	}
> >> +	irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);
> >
> > In theory irq_domain_get_irq_data() can return NULL.
> >

Per separate comments from Thomas Gleixner, shouldn't be doing this anyway.  Will take a different approach in v2 of the patch. 

> >> +	*vector = irqd_cfg(irq_data)->vector;
> >> +	*irq = localirq;
> >> +	return 0;
> >> +}
> >> +
> >> +void hv_deallocate_stimer0_irq(int irq) {
> >> +	free_irq(irq, NULL);
> >> +	acpi_unregister_gsi(irq);
> >> +}
> >> +
> >> +
> >>  void hv_setup_kexec_handler(void (*handler)(void))  {
> >>  	hv_kexec_handler = handler;
> >> @@ -195,7 +257,7 @@ static void __init ms_hyperv_init_platform(void)
> >>  		hv_host_info_ecx = cpuid_ecx(HVCPUID_VERSION);
> >>  		hv_host_info_edx = cpuid_edx(HVCPUID_VERSION);
> >>
> >> -		pr_info("Hyper-V Host Build:%d-%d.%d-%d-%d.%d\n",
> >> +		pr_info("Hyper-V: Host Build %d-%d.%d-%d-%d.%d\n",
> >
> > While this definitely makes logs nicer uhe change is probably not
> > needed in this particular patch.

True.  Will move this to a separate patch.

> >
> >>  			hv_host_info_eax, hv_host_info_ebx >> 16,
> >>  			hv_host_info_ebx & 0xFFFF, hv_host_info_ecx,
> >>  			hv_host_info_edx >> 24, hv_host_info_edx & 0xFFFFFF); diff --git
> >> a/drivers/hv/hv.c b/drivers/hv/hv.c index fe96aab..68ac474 100644
> >> --- a/drivers/hv/hv.c
> >> +++ b/drivers/hv/hv.c
> >> @@ -27,8 +27,12 @@
> >>  #include <linux/vmalloc.h>
> >>  #include <linux/hyperv.h>
> >>  #include <linux/version.h>
> >> -#include <linux/interrupt.h>
> >> +#include <linux/irq.h>
> >> +#include <linux/irqdesc.h>
> >> +#include <linux/random.h>
> >> +#include <linux/kernel_stat.h>
> >>  #include <linux/clockchips.h>
> >> +#include <linux/module.h>
> >>  #include <asm/hyperv.h>
> >>  #include <asm/mshyperv.h>
> >>  #include "hyperv_vmbus.h"
> >> @@ -38,6 +42,21 @@ struct hv_context hv_context = {
> >>  	.synic_initialized	= false,
> >>  };
> >>
> >> +/* If true, we're using Direct Mode for stimer0, and the timer will
> >> +do it own
> >> + * interrupt when it expires. If false, stimer0 is not using Direct
> >> +Mode and
> >> + * will send a VMbus message when it expires. We prefer to use
> >> +Direct Mode,
> >> + * but not all versions of Hyper-V support Direct Mode.
> >> + *
> >> + * While Hyper-V provides a total of four stimer's per CPU, Linux
> >> +use only
> >> + * stimer0.
> >> + */
> >> +static bool	stimer_direct_mode;
> >> +static int	stimer0_irq;
> >> +static int	stimer0_vector;
> >> +static bool	direct_mode_disable;
> >> +module_param(direct_mode_disable, bool, 0444);
> >> +MODULE_PARM_DESC(direct_mode_disable, "Set to Y to disable Direct
> >> +Mode.");
> >> +
> >>  #define HV_TIMER_FREQUENCY (10 * 1000 * 1000) /* 100ns period */
> >> #define HV_MAX_MAX_DELTA_TICKS 0xffffffff  #define HV_MIN_DELTA_TICKS
> >> 1 @@ -52,7 +71,12 @@ int hv_init(void)
> >>  	hv_context.cpu_context = alloc_percpu(struct hv_per_cpu_context);
> >>  	if (!hv_context.cpu_context)
> >>  		return -ENOMEM;
> >> +	stimer_direct_mode = (ms_hyperv.misc_features &
> >> +		HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? true : false;
> >>
> >> +	/* Apply boot command line override to the Direct Mode setting */
> >> +	if (direct_mode_disable)
> >> +		stimer_direct_mode = false;
> >
> > Having both direct_mode_disable and stimer_direct_mode seems redundant.
> >
> > we may write something like
> >
> >      if (!direct_mode_disable)
> >          direct_mode_disable = (ms_hyperv.misc_features &
> > HV_X64_STIMER_DIRECT_MODE_AVAILABLE) ? false : true;
> >
> > and use it everywhere.

Got it.  Makes sense.

> >
> >>  	return 0;
> >>  }
> >>
> >> @@ -91,6 +115,23 @@ int hv_post_message(union hv_connection_id connection_id,
> >>  	return status & 0xFFFF;
> >>  }
> >>
> >> +/* ISR for when stimer0 is operating in Direct Mode.  Direct Mode
> >> +does
> >> + * not use VMBus or any VMBus messages, so process here and not in
> >> +the
> >> + * VMBus driver code.
> >> + */
> >> +
> >> +static void hv_stimer0_isr(struct irq_desc *desc) {
> >> +	struct hv_per_cpu_context *hv_cpu;
> >> +
> >> +	__this_cpu_inc(*desc->kstat_irqs);
> >> +	__this_cpu_inc(kstat.irqs_sum);
> >> +	hv_ack_stimer0_interrupt(desc);
> >> +	hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> >> +	hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
> >> +	add_interrupt_randomness(desc->irq_data.irq, 0);
> >
> > AFAIU implementing Hyper-V specific IRQ chip would allow us to drop
> > this as we would do this on 'normal' Linux IRQ path.
> >
> >> +}
> >> +
> >>  static int hv_ce_set_next_event(unsigned long delta,
> >>  				struct clock_event_device *evt)
> >>  {
> >> @@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct
> >> clock_event_device *evt)  {
> >>  	hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
> >>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
> >> +	if (stimer_direct_mode)
> >> +		hv_disable_stimer0_percpu_irq(stimer0_irq);
> >>
> >
> > Both hv_disable_stimer0_percpu_irq() and
> > hv_enable_stimer0_percpu_irq() are no-ops, right? Why don't we
> > mask/unmask the IRQ? I may have missed something...

Yes, these functions are no-ops on x86 as there's really no need to mask/unmask.  But we have code in the works to enable Linux guests on Hyper-V on ARM64 hardware.  On the ARM64 side, stimer Direct Mode fits nicely with percpu IRQs, and these functions are needed to call enable_percpu_irq() and disable_percpu_irq().

> >
> >>  	return 0;
> >>  }
> >> @@ -116,9 +159,24 @@ static int hv_ce_set_oneshot(struct
> >> clock_event_device *evt)  {
> >>  	union hv_timer_config timer_cfg;
> >>
> >> +	timer_cfg.as_uint64 = 0; /* Zero everything */
> >>  	timer_cfg.enable = 1;
> >>  	timer_cfg.auto_enable = 1;
> >> -	timer_cfg.sintx = VMBUS_MESSAGE_SINT;
> >> +	if (stimer_direct_mode) {
> >> +
> >> +		/* When it expires, the timer will directly interrupt
> >> +		 * on the specific hardware vector.
> >> +		 */
> >> +		timer_cfg.direct_mode = 1;
> >> +		timer_cfg.apic_vector = stimer0_vector;
> >> +		hv_enable_stimer0_percpu_irq(stimer0_irq);
> >> +	} else {
> >> +		/* When it expires, the timer will generate a VMbus message,
> >> +		 * to be handled by the normal VMbus interrupt handler.
> >> +		 */
> >> +		timer_cfg.direct_mode = 0;
> >> +		timer_cfg.sintx = VMBUS_MESSAGE_SINT;
> >> +	}
> >>  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG,
> >> timer_cfg.as_uint64);
> >>
> >>  	return 0;
> >> @@ -191,6 +249,12 @@ int hv_synic_alloc(void)
> >>  		INIT_LIST_HEAD(&hv_cpu->chan_list);
> >>  	}
> >>
> >> +	if (stimer_direct_mode) {
> >> +		if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
> >> +			goto err;
> >> +		irq_set_handler(stimer0_irq, hv_stimer0_isr);
> >> +	}
> >> +
> >>  	return 0;
> >>  err:
> >>  	return -ENOMEM;
> >> @@ -292,6 +356,9 @@ void hv_synic_clockevents_cleanup(void)
> >>  	if (!(ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE))
> >>  		return;
> >>
> >> +	if (stimer_direct_mode)
> >> +		hv_deallocate_stimer0_irq(stimer0_irq);
> >> +
> >>  	for_each_present_cpu(cpu) {
> >>  		struct hv_per_cpu_context *hv_cpu
> >>  			= per_cpu_ptr(hv_context.cpu_context, cpu); diff --git
> >> a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h index
> >> de6f01d..ee8c89b 100644
> >> --- a/drivers/hv/hyperv_vmbus.h
> >> +++ b/drivers/hv/hyperv_vmbus.h
> >> @@ -55,7 +55,9 @@
> >>  		u64 periodic:1;
> >>  		u64 lazy:1;
> >>  		u64 auto_enable:1;
> >> -		u64 reserved_z0:12;
> >> +		u64 apic_vector:8;
> >> +		u64 direct_mode:1;
> >> +		u64 reserved_z0:3;
> >>  		u64 sintx:4;
> >>  		u64 reserved_z1:44;
> >>  	};
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
  2017-11-01 14:51 ` Thomas Gleixner
  2017-11-01 16:19   ` Thomas Gleixner
@ 2017-11-30 18:08   ` Michael Kelley (EOSG)
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Kelley (EOSG) @ 2017-11-30 18:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, marcelo.cerri, Stephen Hemminger, KY Srinivasan

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, 31 Oct 2017, mikelley@exchange.microsoft.com wrote:
> > diff --git a/arch/x86/include/uapi/asm/hyperv.h
> > b/arch/x86/include/uapi/asm/hyperv.h
> > index f65d125..408cf3e 100644
> > --- a/arch/x86/include/uapi/asm/hyperv.h
> > +++ b/arch/x86/include/uapi/asm/hyperv.h
> > @@ -112,6 +112,22 @@
> >  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE		(1 << 5)
> >  /* Guest crash data handler available */
> >  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE		(1 << 10)
> > +/* Debug MSRs available */
> > +#define HV_X64_DEBUG_MSR_AVAILABLE			(1 << 11)
> > +/* Support for Non-Privileged Instruction Execution Prevention is available */
> > +#define HV_X64_NPIEP_AVAILABLE				(1 << 12)
> > +/* Support for DisableHypervisor is available */
> > +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE		(1 << 13)
> > +/* Extended GVA Ranges for Flush Virtual Address list is available */
> > +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE		(1 << 14)
> > +/* Return Hypercall output via XMM registers is available */
> > +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE		(1 << 15)
> > +/* SINT polling mode available */
> > +#define HV_X64_SINT_POLLING_MODE_AVAILABLE		(1 << 17)
> > +/* Hypercall MSR lock is available */
> > +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE		(1 << 18)
> > +/* stimer direct mode is available */
> > +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE		(1 << 19)
> 
> How are all these defines (except the last one related to that patch?

Will move to a separate patch.

> 
> > +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ
> > +is a fake
> > + * because stimer's in Direct Mode simply interrupt on the specified
> > +vector,
> > + * without using a particular IOAPIC pin. But we use the IRQ
> > +allocation
> > + * machinery, so we need a hardware IRQ #.  This value is somewhat
> > +arbitrary,
> > + * but it should not be a legacy IRQ (0 to 15), and should fit within
> > +the
> > + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So
> > +any value
> > + * between 16 and 23 should be good.
> > + */
> > +#define HV_STIMER0_IRQNR		18
> 
> Why would you want abuse an IOAPIC interrupt if all you need is a vector?

Allocating a vector up-front like the existing HYPERVISOR_CALLBACK_VECTOR would certainly be more straightforward, and in fact, my original internal testing of stimer Direct Mode used that approach.   Vectors are a limited resource, so I wanted to find a way to allocate one on-the-fly rather than use fixed pre-allocation, even under CONFIG_HYPERV.   But I've got no problem with allocating a vector up-front and skipping all the irq/APIC manipulation and related issues.

Any guidance on which vector to use?

> 
> > +/* Routines to do per-architecture handling of stimer0 when in Direct
> > +Mode */
> > +
> > +void hv_ack_stimer0_interrupt(struct irq_desc *desc) {
> > +	ack_APIC_irq();
> > +}
> > +
> > +static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
> > +				const struct cpumask *mask)
> > +{
> > +	cpumask_copy(retmask, cpu_online_mask); }
> > +
> > +int hv_allocate_stimer0_irq(int *irq, int *vector) {
> > +	int		localirq;
> > +	int		result;
> > +	struct irq_data *irq_data;
> > +
> > +	/* The normal APIC vector allocation domain allows allocation of
> > +vectors
> 
> Please fix you comment style. Multi line comments are:
> 
>        /*
>         * Bla....
> 	* foo...
> 	*/
> 

Will do.

> > +	 * only for the calling CPU.  So we change the allocation domain to one
> > +	 * that allows vectors to be allocated in all online CPUs.  This
> > +	 * change is fine in a Hyper-V VM because VMs don't have the usual
> > +	 * complement of interrupting devices.
> > +	 */
> > +	apic->vector_allocation_domain = allonline_vector_allocation_domain;
> 
> This does not work anymore. vector_allocation_domain is gone as of 4.15. Please check the
> vector rework in
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip x86/apic
> 
> Aside of that what guarantees that all cpus are online at the point where you allocate that
> interrupt? Nothing, so the vector will be not reserved or allocated on offline CPUs. Now guess
> what happens if you bring the offline CPUs online later, it will drown in spurious interrupts.
> Worse, the vector can also be reused for a device interrupt. Great plan.
> 
> > +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> > +				ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> > +	if (localirq < 0) {
> > +		pr_err("Cannot register stimer0 gsi. Error %d", localirq);
> > +		return -1;
> > +	}
> > +
> > +	/* We pass in a dummy IRQ handler because architecture independent code
> > +	 * will later override the IRQ domain interrupt handler and set it to a
> > +	 * Hyper-V specific handler.
> > +	 */
> > +	result = request_irq(localirq, (irq_handler_t)(-1), 0,
> 					"Hyper-V stimer0", NULL);
> 
> That's a crude hack. Really.
> 
> > +	if (result) {
> > +		pr_err("Cannot request stimer0 irq. Error %d", result);
> > +		acpi_unregister_gsi(localirq);
> > +		return -1;
> > +	}
> > +	irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);
> > +	*vector = irqd_cfg(irq_data)->vector;
> > +	*irq = localirq;
> 
> Uurgh, no. This is even more of a layering violation. Grab random data from wherever it
> comes and then expect that it works. This will simply fall apart the moment someone changes
> the affinity of this interrupt. It will move to some random other vector and the system drowns
> in spurious interrupts on the old vector.
> 
> > +/* ISR for when stimer0 is operating in Direct Mode.  Direct Mode
> > +does
> > + * not use VMBus or any VMBus messages, so process here and not in
> > +the
> > + * VMBus driver code.
> > + */
> > +
> > +static void hv_stimer0_isr(struct irq_desc *desc) {
> > +	struct hv_per_cpu_context *hv_cpu;
> > +
> > +	__this_cpu_inc(*desc->kstat_irqs);
> > +	__this_cpu_inc(kstat.irqs_sum);
> > +	hv_ack_stimer0_interrupt(desc);
> > +	hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> > +	hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
> > +	add_interrupt_randomness(desc->irq_data.irq, 0); }
> > +
> >  static int hv_ce_set_next_event(unsigned long delta,
> >  				struct clock_event_device *evt)
> >  {
> > @@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct
> > clock_event_device *evt)  {
> >  	hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
> >  	hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
> > +	if (stimer_direct_mode)
> > +		hv_disable_stimer0_percpu_irq(stimer0_irq);
> 
> Whats the point of that? It's an empty inline:
> 
> > +#if IS_ENABLED(CONFIG_HYPERV)
> > +static inline void hv_enable_stimer0_percpu_irq(int irq) { } static
> > +inline void hv_disable_stimer0_percpu_irq(int irq) { }

We've got code in the works for Hyper-V on ARM64.  This is architecture independent code that caters to the ARM64 side where stimer Direct Mode will use percpu IRQs, and hooks are needed to call enable_percpu_irq() and disable_percpu_irq().

> 
> > +	if (stimer_direct_mode) {
> > +
> > +		/* When it expires, the timer will directly interrupt
> > +		 * on the specific hardware vector.
> > +		 */
> > +		timer_cfg.direct_mode = 1;
> > +		timer_cfg.apic_vector = stimer0_vector;
> > +		hv_enable_stimer0_percpu_irq(stimer0_irq);
> 
> Ditto.
> 
> > +	if (stimer_direct_mode) {
> > +		if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
> > +			goto err;
> > +		irq_set_handler(stimer0_irq, hv_stimer0_isr);
> 
> What you really want to do here is to allocate a fixed vector like we do for the other percpu
> interrupts (local apic timer, IPIs etc.) and use that. This must be done at boot time, when you
> detect that the kernel runs on HyperV. This makes sure, that all CPUs will have the vector
> populated, even those which come online late.

Will submit a v2 with that approach.

> 
> Though you can't do that from a module. You have to do that setup during early boot from
> ms_hyperv_init_platform(). That's the only sane way to deal with that.
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-11-30 18:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-31 22:18 [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0 mikelley
2017-11-01 12:00 ` Vitaly Kuznetsov
2017-11-01 13:41   ` Vitaly Kuznetsov
2017-11-30 18:08     ` Michael Kelley (EOSG)
2017-11-01 14:51 ` Thomas Gleixner
2017-11-01 16:19   ` Thomas Gleixner
2017-11-01 16:21     ` Michael Kelley (EOSG)
2017-11-30 18:08   ` Michael Kelley (EOSG)
2017-11-03 16:01 ` kbuild test robot
2017-11-03 22:55 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).