linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 1/5] x86/hyperv: Add support for detecting nested hypervisor
       [not found] <cover.1672639707.git.jinankjain@linux.microsoft.com>
@ 2023-01-02  7:12 ` Jinank Jain
  2023-01-03 19:09   ` Michael Kelley (LINUX)
  2023-01-02  7:12 ` [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition Jinank Jain
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jinank Jain @ 2023-01-02  7:12 UTC (permalink / raw)
  To: jinankjain
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, arnd, peterz, jpoimboe, jinankjain, seanjc, kirill.shutemov,
	ak, sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

Detect if Linux is running as a nested hypervisor in the root
partition for Microsoft Hypervisor, using flags provided by MSHV.
Expose a new variable hv_nested that is used later for decisions
specific to the nested use case.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 3 +++
 arch/x86/kernel/cpu/mshyperv.c     | 7 +++++++
 drivers/hv/hv_common.c             | 9 ++++++---
 include/asm-generic/mshyperv.h     | 1 +
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 6d9368ea3701..58c03d18c235 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -114,6 +114,9 @@
 /* Recommend using the newer ExProcessorMasks interface */
 #define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED		BIT(11)
 
+/* Indicates that the hypervisor is nested within a Hyper-V partition. */
+#define HV_X64_HYPERV_NESTED				BIT(12)
+
 /* Recommend using enlightened VMCS */
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
 
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 46668e255421..f9b78d4829e3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,6 +37,8 @@
 
 /* Is Linux running as the root partition? */
 bool hv_root_partition;
+/* Is Linux running on nested Microsoft Hypervisor */
+bool hv_nested;
 struct ms_hyperv_info ms_hyperv;
 
 #if IS_ENABLED(CONFIG_HYPERV)
@@ -301,6 +303,11 @@ static void __init ms_hyperv_init_platform(void)
 		pr_info("Hyper-V: running as root partition\n");
 	}
 
+	if (ms_hyperv.hints & HV_X64_HYPERV_NESTED) {
+		hv_nested = true;
+		pr_info("Hyper-V: running on a nested hypervisor\n");
+	}
+
 	/*
 	 * Extract host information.
 	 */
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..52a6f89ccdbd 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -25,17 +25,20 @@
 #include <asm/mshyperv.h>
 
 /*
- * hv_root_partition and ms_hyperv are defined here with other Hyper-V
- * specific globals so they are shared across all architectures and are
+ * hv_root_partition, ms_hyperv and hv_nested are defined here with other
+ * Hyper-V specific globals so they are shared across all architectures and are
  * built only when CONFIG_HYPERV is defined.  But on x86,
  * ms_hyperv_init_platform() is built even when CONFIG_HYPERV is not
- * defined, and it uses these two variables.  So mark them as __weak
+ * defined, and it uses these three variables.  So mark them as __weak
  * here, allowing for an overriding definition in the module containing
  * ms_hyperv_init_platform().
  */
 bool __weak hv_root_partition;
 EXPORT_SYMBOL_GPL(hv_root_partition);
 
+bool __weak hv_nested;
+EXPORT_SYMBOL_GPL(hv_nested);
+
 struct ms_hyperv_info __weak ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index bfb9eb9d7215..f131027830c3 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -48,6 +48,7 @@ struct ms_hyperv_info {
 	u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
+extern bool hv_nested;
 
 extern void * __percpu *hyperv_pcpu_input_arg;
 extern void * __percpu *hyperv_pcpu_output_arg;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition
       [not found] <cover.1672639707.git.jinankjain@linux.microsoft.com>
  2023-01-02  7:12 ` [PATCH v10 1/5] x86/hyperv: Add support for detecting nested hypervisor Jinank Jain
@ 2023-01-02  7:12 ` Jinank Jain
  2023-01-03 19:11   ` Michael Kelley (LINUX)
  2023-01-09 17:03   ` Nuno Das Neves
  2023-01-02  7:12 ` [PATCH v10 3/5] x86/hyperv: Add an interface to do nested hypercalls Jinank Jain
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Jinank Jain @ 2023-01-02  7:12 UTC (permalink / raw)
  To: jinankjain
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, arnd, peterz, jpoimboe, jinankjain, seanjc, kirill.shutemov,
	ak, sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

Child partitions are free to allocate SynIC message and event page but in
case of root partition it must use the pages allocated by Microsoft
Hypervisor (MSHV). Base address for these pages can be found using
synthetic MSRs exposed by MSHV. There is a slight difference in those MSRs
for nested vs non-nested root partition.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 11 +++++
 arch/x86/include/asm/mshyperv.h    | 30 +++-----------
 arch/x86/kernel/cpu/mshyperv.c     | 65 ++++++++++++++++++++++++++++++
 drivers/hv/hv.c                    | 18 +++++----
 4 files changed, 93 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 58c03d18c235..b5019becb618 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -225,6 +225,17 @@ enum hv_isolation_type {
 #define HV_REGISTER_SINT14			0x4000009E
 #define HV_REGISTER_SINT15			0x4000009F
 
+/*
+ * Define synthetic interrupt controller model specific registers for
+ * nested hypervisor.
+ */
+#define HV_REGISTER_NESTED_SCONTROL            0x40001080
+#define HV_REGISTER_NESTED_SVERSION            0x40001081
+#define HV_REGISTER_NESTED_SIEFP               0x40001082
+#define HV_REGISTER_NESTED_SIMP                0x40001083
+#define HV_REGISTER_NESTED_EOM                 0x40001084
+#define HV_REGISTER_NESTED_SINT0               0x40001090
+
 /*
  * Synthetic Timer MSRs. Four timers per vcpu.
  */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 61f0c206bff0..c38e4c66a3ac 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -198,30 +198,10 @@ static inline bool hv_is_synic_reg(unsigned int reg)
 	return false;
 }
 
-static inline u64 hv_get_register(unsigned int reg)
-{
-	u64 value;
-
-	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
-		hv_ghcb_msr_read(reg, &value);
-	else
-		rdmsrl(reg, value);
-	return value;
-}
-
-static inline void hv_set_register(unsigned int reg, u64 value)
-{
-	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
-		hv_ghcb_msr_write(reg, value);
-
-		/* Write proxy bit via wrmsl instruction */
-		if (reg >= HV_REGISTER_SINT0 &&
-		    reg <= HV_REGISTER_SINT15)
-			wrmsrl(reg, value | 1 << 20);
-	} else {
-		wrmsrl(reg, value);
-	}
-}
+u64 hv_get_register(unsigned int reg);
+void hv_set_register(unsigned int reg, u64 value);
+u64 hv_get_non_nested_register(unsigned int reg);
+void hv_set_non_nested_register(unsigned int reg, u64 value);
 
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
@@ -241,6 +221,8 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 }
 static inline void hv_set_register(unsigned int reg, u64 value) { }
 static inline u64 hv_get_register(unsigned int reg) { return 0; }
+static inline void hv_set_non_nested_register(unsigned int reg, u64 value) { }
+static inline u64 hv_get_non_nested_register(unsigned int reg) { return 0; }
 static inline int hv_set_mem_host_visibility(unsigned long addr, int numpages,
 					     bool visible)
 {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f9b78d4829e3..938fc82edf05 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -41,7 +41,72 @@ bool hv_root_partition;
 bool hv_nested;
 struct ms_hyperv_info ms_hyperv;
 
+static inline unsigned int hv_get_nested_reg(unsigned int reg)
+{
+	switch (reg) {
+	case HV_REGISTER_SIMP:
+		return HV_REGISTER_NESTED_SIMP;
+	case HV_REGISTER_SIEFP:
+		return HV_REGISTER_NESTED_SIEFP;
+	case HV_REGISTER_SVERSION:
+		return HV_REGISTER_NESTED_SVERSION;
+	case HV_REGISTER_SCONTROL:
+		return HV_REGISTER_NESTED_SCONTROL;
+	case HV_REGISTER_SINT0:
+		return HV_REGISTER_NESTED_SINT0;
+	case HV_REGISTER_EOM:
+		return HV_REGISTER_NESTED_EOM;
+	default:
+		return reg;
+	}
+}
+
 #if IS_ENABLED(CONFIG_HYPERV)
+u64 hv_get_non_nested_register(unsigned int reg)
+{
+	u64 value;
+
+	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
+		hv_ghcb_msr_read(reg, &value);
+	else
+		rdmsrl(reg, value);
+	return value;
+}
+EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
+
+void hv_set_non_nested_register(unsigned int reg, u64 value)
+{
+	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
+		hv_ghcb_msr_write(reg, value);
+
+		/* Write proxy bit via wrmsl instruction */
+		if (reg >= HV_REGISTER_SINT0 &&
+		    reg <= HV_REGISTER_SINT15)
+			wrmsrl(reg, value | 1 << 20);
+	} else {
+		wrmsrl(reg, value);
+	}
+}
+EXPORT_SYMBOL_GPL(hv_set_non_nested_register);
+
+u64 hv_get_register(unsigned int reg)
+{
+	if (hv_nested)
+		reg = hv_get_nested_reg(reg);
+
+	return hv_get_non_nested_register(reg);
+}
+EXPORT_SYMBOL_GPL(hv_get_register);
+
+void hv_set_register(unsigned int reg, u64 value)
+{
+	if (hv_nested)
+		reg = hv_get_nested_reg(reg);
+
+	hv_set_non_nested_register(reg, value);
+}
+EXPORT_SYMBOL_GPL(hv_set_register);
+
 static void (*vmbus_handler)(void);
 static void (*hv_stimer0_handler)(void);
 static void (*hv_kexec_handler)(void);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 4d6480d57546..8b0dd8e5244d 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -147,7 +147,7 @@ int hv_synic_alloc(void)
 		 * Synic message and event pages are allocated by paravisor.
 		 * Skip these pages allocation here.
 		 */
-		if (!hv_isolation_type_snp()) {
+		if (!hv_isolation_type_snp() && !hv_root_partition) {
 			hv_cpu->synic_message_page =
 				(void *)get_zeroed_page(GFP_ATOMIC);
 			if (hv_cpu->synic_message_page == NULL) {
@@ -216,7 +216,7 @@ void hv_synic_enable_regs(unsigned int cpu)
 	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
 	simp.simp_enabled = 1;
 
-	if (hv_isolation_type_snp()) {
+	if (hv_isolation_type_snp() || hv_root_partition) {
 		hv_cpu->synic_message_page
 			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
 				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
@@ -233,7 +233,7 @@ void hv_synic_enable_regs(unsigned int cpu)
 	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
 	siefp.siefp_enabled = 1;
 
-	if (hv_isolation_type_snp()) {
+	if (hv_isolation_type_snp() || hv_root_partition) {
 		hv_cpu->synic_event_page =
 			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
 				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
@@ -315,20 +315,24 @@ void hv_synic_disable_regs(unsigned int cpu)
 	 * addresses.
 	 */
 	simp.simp_enabled = 0;
-	if (hv_isolation_type_snp())
+	if (hv_isolation_type_snp() || hv_root_partition) {
 		memunmap(hv_cpu->synic_message_page);
-	else
+		hv_cpu->synic_message_page = NULL;
+	} else {
 		simp.base_simp_gpa = 0;
+	}
 
 	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
 
 	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
 	siefp.siefp_enabled = 0;
 
-	if (hv_isolation_type_snp())
+	if (hv_isolation_type_snp() || hv_root_partition) {
 		memunmap(hv_cpu->synic_event_page);
-	else
+		hv_cpu->synic_event_page = NULL;
+	} else {
 		siefp.base_siefp_gpa = 0;
+	}
 
 	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 3/5] x86/hyperv: Add an interface to do nested hypercalls
       [not found] <cover.1672639707.git.jinankjain@linux.microsoft.com>
  2023-01-02  7:12 ` [PATCH v10 1/5] x86/hyperv: Add support for detecting nested hypervisor Jinank Jain
  2023-01-02  7:12 ` [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition Jinank Jain
@ 2023-01-02  7:12 ` Jinank Jain
  2023-01-03 19:12   ` Michael Kelley (LINUX)
  2023-01-02  7:12 ` [PATCH v10 4/5] Drivers: hv: Enable vmbus driver for nested root partition Jinank Jain
  2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
  4 siblings, 1 reply; 14+ messages in thread
From: Jinank Jain @ 2023-01-02  7:12 UTC (permalink / raw)
  To: jinankjain
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, arnd, peterz, jpoimboe, jinankjain, seanjc, kirill.shutemov,
	ak, sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

According to TLFS, in order to communicate to L0 hypervisor there needs
to be an additional bit set in the control register. This communication
is required to perform privileged instructions which can only be
performed by L0 hypervisor. An example of that could be setting up the
VMBus infrastructure.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h |  3 ++-
 arch/x86/include/asm/mshyperv.h    | 42 +++++++++++++++++++++++++++---
 include/asm-generic/hyperv-tlfs.h  |  1 +
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index b5019becb618..7758c495541d 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -380,7 +380,8 @@ struct hv_nested_enlightenments_control {
 		__u32 reserved:31;
 	} features;
 	struct {
-		__u32 reserved;
+		__u32 inter_partition_comm:1;
+		__u32 reserved:31;
 	} hypercallControls;
 } __packed;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index c38e4c66a3ac..9e5535044ed0 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -74,10 +74,16 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	return hv_status;
 }
 
+/* Hypercall to the L0 hypervisor */
+static inline u64 hv_do_nested_hypercall(u64 control, void *input, void *output)
+{
+	return hv_do_hypercall(control | HV_HYPERCALL_NESTED, input, output);
+}
+
 /* Fast hypercall with 8 bytes of input and no output */
-static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
+static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 {
-	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
+	u64 hv_status;
 
 #ifdef CONFIG_X86_64
 	{
@@ -105,10 +111,24 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 		return hv_status;
 }
 
+static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
+{
+	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT;
+
+	return _hv_do_fast_hypercall8(control, input1);
+}
+
+static inline u64 hv_do_fast_nested_hypercall8(u16 code, u64 input1)
+{
+	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT | HV_HYPERCALL_NESTED;
+
+	return _hv_do_fast_hypercall8(control, input1);
+}
+
 /* Fast hypercall with 16 bytes of input */
-static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
+static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 {
-	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
+	u64 hv_status;
 
 #ifdef CONFIG_X86_64
 	{
@@ -139,6 +159,20 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
 	return hv_status;
 }
 
+static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
+{
+	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT;
+
+	return _hv_do_fast_hypercall16(control, input1, input2);
+}
+
+static inline u64 hv_do_fast_nested_hypercall16(u16 code, u64 input1, u64 input2)
+{
+	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT | HV_HYPERCALL_NESTED;
+
+	return _hv_do_fast_hypercall16(control, input1, input2);
+}
+
 extern struct hv_vp_assist_page **hv_vp_assist_page;
 
 static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index b17c6eeb9afa..e61ee461c4fc 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -194,6 +194,7 @@ enum HV_GENERIC_SET_FORMAT {
 #define HV_HYPERCALL_VARHEAD_OFFSET	17
 #define HV_HYPERCALL_VARHEAD_MASK	GENMASK_ULL(26, 17)
 #define HV_HYPERCALL_RSVD0_MASK		GENMASK_ULL(31, 27)
+#define HV_HYPERCALL_NESTED		BIT_ULL(31)
 #define HV_HYPERCALL_REP_COMP_OFFSET	32
 #define HV_HYPERCALL_REP_COMP_1		BIT_ULL(32)
 #define HV_HYPERCALL_REP_COMP_MASK	GENMASK_ULL(43, 32)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 4/5] Drivers: hv: Enable vmbus driver for nested root partition
       [not found] <cover.1672639707.git.jinankjain@linux.microsoft.com>
                   ` (2 preceding siblings ...)
  2023-01-02  7:12 ` [PATCH v10 3/5] x86/hyperv: Add an interface to do nested hypercalls Jinank Jain
@ 2023-01-02  7:12 ` Jinank Jain
  2023-01-03 19:12   ` Michael Kelley (LINUX)
  2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
  4 siblings, 1 reply; 14+ messages in thread
From: Jinank Jain @ 2023-01-02  7:12 UTC (permalink / raw)
  To: jinankjain
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, arnd, peterz, jpoimboe, jinankjain, seanjc, kirill.shutemov,
	ak, sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

Currently VMBus driver is not initialized for root partition but we need
to enable the VMBus driver for nested root partition. This is required,
so that L2 root can use the VMBus devices.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
---
 drivers/hv/vmbus_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0f00d57b7c25..6324e01d5eec 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2745,7 +2745,7 @@ static int __init hv_acpi_init(void)
 	if (!hv_is_hyperv_initialized())
 		return -ENODEV;
 
-	if (hv_root_partition)
+	if (hv_root_partition && !hv_nested)
 		return 0;
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 5/5] x86/hyperv: Change interrupt vector for nested root partition
       [not found] <cover.1672639707.git.jinankjain@linux.microsoft.com>
                   ` (3 preceding siblings ...)
  2023-01-02  7:12 ` [PATCH v10 4/5] Drivers: hv: Enable vmbus driver for nested root partition Jinank Jain
@ 2023-01-02  7:12 ` Jinank Jain
  2023-01-03 19:15   ` Michael Kelley (LINUX)
                     ` (3 more replies)
  4 siblings, 4 replies; 14+ messages in thread
From: Jinank Jain @ 2023-01-02  7:12 UTC (permalink / raw)
  To: jinankjain
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, arnd, peterz, jpoimboe, jinankjain, seanjc, kirill.shutemov,
	ak, sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

Traditionally we have been using the HYPERVISOR_CALLBACK_VECTOR to relay
the VMBus interrupt. But this does not work in case of nested
hypervisor. Microsoft Hypervisor reserves 0x31 to 0x34 as the interrupt
vector range for VMBus and thus we have to use one of the vectors from
that range and setup the IDT accordingly.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
---
 arch/x86/include/asm/idtentry.h    |  2 ++
 arch/x86/include/asm/irq_vectors.h |  6 ++++++
 arch/x86/kernel/cpu/mshyperv.c     | 15 +++++++++++++++
 arch/x86/kernel/idt.c              | 10 ++++++++++
 drivers/hv/vmbus_drv.c             |  3 ++-
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 72184b0b2219..c0648e3e4d4a 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -686,6 +686,8 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR,	sysvec_kvm_posted_intr_nested
 DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR,	sysvec_hyperv_callback);
 DECLARE_IDTENTRY_SYSVEC(HYPERV_REENLIGHTENMENT_VECTOR,	sysvec_hyperv_reenlightenment);
 DECLARE_IDTENTRY_SYSVEC(HYPERV_STIMER0_VECTOR,	sysvec_hyperv_stimer0);
+DECLARE_IDTENTRY_SYSVEC(HYPERV_INTR_NESTED_VMBUS_VECTOR,
+			sysvec_hyperv_nested_vmbus_intr);
 #endif
 
 #if IS_ENABLED(CONFIG_ACRN_GUEST)
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 43dcb9284208..729d19eab7f5 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -102,6 +102,12 @@
 #if IS_ENABLED(CONFIG_HYPERV)
 #define HYPERV_REENLIGHTENMENT_VECTOR	0xee
 #define HYPERV_STIMER0_VECTOR		0xed
+/*
+ * FIXME: Change this, once Microsoft Hypervisor changes its assumption
+ * around VMBus interrupt vector allocation for nested root partition.
+ * Or provides a better interface to detect this instead of hardcoding.
+ */
+#define HYPERV_INTR_NESTED_VMBUS_VECTOR	0x31
 #endif
 
 #define LOCAL_TIMER_VECTOR		0xec
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 938fc82edf05..4dfe0f9d7be3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -126,6 +126,21 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
 	set_irq_regs(old_regs);
 }
 
+DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_nested_vmbus_intr)
+{
+	struct pt_regs *old_regs = set_irq_regs(regs);
+
+	inc_irq_stat(irq_hv_callback_count);
+
+	if (vmbus_handler)
+		vmbus_handler();
+
+	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
+		ack_APIC_irq();
+
+	set_irq_regs(old_regs);
+}
+
 void hv_setup_vmbus_handler(void (*handler)(void))
 {
 	vmbus_handler = handler;
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..3536935cea39 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -160,6 +160,16 @@ static const __initconst struct idt_data apic_idts[] = {
 # endif
 	INTG(SPURIOUS_APIC_VECTOR,		asm_sysvec_spurious_apic_interrupt),
 	INTG(ERROR_APIC_VECTOR,			asm_sysvec_error_interrupt),
+#ifdef CONFIG_HYPERV
+	/*
+	 * This is a hack because we cannot install this interrupt handler
+	 * via alloc_intr_gate as it does not allow interrupt vector less
+	 * than FIRST_SYSTEM_VECTORS. And hyperv does not want anything other
+	 * than 0x31-0x34 as the interrupt vector for vmbus interrupt in case
+	 * of nested setup.
+	 */
+	INTG(HYPERV_INTR_NESTED_VMBUS_VECTOR, asm_sysvec_hyperv_nested_vmbus_intr),
+#endif
 #endif
 };
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 6324e01d5eec..740878367426 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2768,7 +2768,8 @@ static int __init hv_acpi_init(void)
 	 * normal Linux IRQ mechanism is not used in this case.
 	 */
 #ifdef HYPERVISOR_CALLBACK_VECTOR
-	vmbus_interrupt = HYPERVISOR_CALLBACK_VECTOR;
+	vmbus_interrupt = hv_nested ? HYPERV_INTR_NESTED_VMBUS_VECTOR :
+				      HYPERVISOR_CALLBACK_VECTOR;
 	vmbus_irq = -1;
 #endif
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RE: [PATCH v10 1/5] x86/hyperv: Add support for detecting nested hypervisor
  2023-01-02  7:12 ` [PATCH v10 1/5] x86/hyperv: Add support for detecting nested hypervisor Jinank Jain
@ 2023-01-03 19:09   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley (LINUX) @ 2023-01-03 19:09 UTC (permalink / raw)
  To: Jinank Jain, Jinank Jain
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu, Dexuan Cui, tglx, mingo,
	bp, dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh

From: Jinank Jain <jinankjain@linux.microsoft.com> Sent: Sunday, January 1, 2023 11:13 PM
> 
> Detect if Linux is running as a nested hypervisor in the root
> partition for Microsoft Hypervisor, using flags provided by MSHV.
> Expose a new variable hv_nested that is used later for decisions
> specific to the nested use case.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 3 +++
>  arch/x86/kernel/cpu/mshyperv.c     | 7 +++++++
>  drivers/hv/hv_common.c             | 9 ++++++---
>  include/asm-generic/mshyperv.h     | 1 +
>  4 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6d9368ea3701..58c03d18c235 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -114,6 +114,9 @@
>  /* Recommend using the newer ExProcessorMasks interface */
>  #define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED		BIT(11)
> 
> +/* Indicates that the hypervisor is nested within a Hyper-V partition. */
> +#define HV_X64_HYPERV_NESTED				BIT(12)
> +
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 46668e255421..f9b78d4829e3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -37,6 +37,8 @@
> 
>  /* Is Linux running as the root partition? */
>  bool hv_root_partition;
> +/* Is Linux running on nested Microsoft Hypervisor */
> +bool hv_nested;
>  struct ms_hyperv_info ms_hyperv;
> 
>  #if IS_ENABLED(CONFIG_HYPERV)
> @@ -301,6 +303,11 @@ static void __init ms_hyperv_init_platform(void)
>  		pr_info("Hyper-V: running as root partition\n");
>  	}
> 
> +	if (ms_hyperv.hints & HV_X64_HYPERV_NESTED) {
> +		hv_nested = true;
> +		pr_info("Hyper-V: running on a nested hypervisor\n");
> +	}
> +
>  	/*
>  	 * Extract host information.
>  	 */
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index ae68298c0dca..52a6f89ccdbd 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -25,17 +25,20 @@
>  #include <asm/mshyperv.h>
> 
>  /*
> - * hv_root_partition and ms_hyperv are defined here with other Hyper-V
> - * specific globals so they are shared across all architectures and are
> + * hv_root_partition, ms_hyperv and hv_nested are defined here with other
> + * Hyper-V specific globals so they are shared across all architectures and are
>   * built only when CONFIG_HYPERV is defined.  But on x86,
>   * ms_hyperv_init_platform() is built even when CONFIG_HYPERV is not
> - * defined, and it uses these two variables.  So mark them as __weak
> + * defined, and it uses these three variables.  So mark them as __weak
>   * here, allowing for an overriding definition in the module containing
>   * ms_hyperv_init_platform().
>   */
>  bool __weak hv_root_partition;
>  EXPORT_SYMBOL_GPL(hv_root_partition);
> 
> +bool __weak hv_nested;
> +EXPORT_SYMBOL_GPL(hv_nested);
> +
>  struct ms_hyperv_info __weak ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
> 
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index bfb9eb9d7215..f131027830c3 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -48,6 +48,7 @@ struct ms_hyperv_info {
>  	u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> +extern bool hv_nested;
> 
>  extern void * __percpu *hyperv_pcpu_input_arg;
>  extern void * __percpu *hyperv_pcpu_output_arg;
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition
  2023-01-02  7:12 ` [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition Jinank Jain
@ 2023-01-03 19:11   ` Michael Kelley (LINUX)
  2023-01-09 17:03   ` Nuno Das Neves
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Kelley (LINUX) @ 2023-01-03 19:11 UTC (permalink / raw)
  To: Jinank Jain, Jinank Jain
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu, Dexuan Cui, tglx, mingo,
	bp, dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh

From: Jinank Jain <jinankjain@linux.microsoft.com> Sent: Sunday, January 1, 2023 11:13 PM
> 
> Child partitions are free to allocate SynIC message and event page but in
> case of root partition it must use the pages allocated by Microsoft
> Hypervisor (MSHV). Base address for these pages can be found using
> synthetic MSRs exposed by MSHV. There is a slight difference in those MSRs
> for nested vs non-nested root partition.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 11 +++++
>  arch/x86/include/asm/mshyperv.h    | 30 +++-----------
>  arch/x86/kernel/cpu/mshyperv.c     | 65 ++++++++++++++++++++++++++++++
>  drivers/hv/hv.c                    | 18 +++++----
>  4 files changed, 93 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 58c03d18c235..b5019becb618 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -225,6 +225,17 @@ enum hv_isolation_type {
>  #define HV_REGISTER_SINT14			0x4000009E
>  #define HV_REGISTER_SINT15			0x4000009F
> 
> +/*
> + * Define synthetic interrupt controller model specific registers for
> + * nested hypervisor.
> + */
> +#define HV_REGISTER_NESTED_SCONTROL            0x40001080
> +#define HV_REGISTER_NESTED_SVERSION            0x40001081
> +#define HV_REGISTER_NESTED_SIEFP               0x40001082
> +#define HV_REGISTER_NESTED_SIMP                0x40001083
> +#define HV_REGISTER_NESTED_EOM                 0x40001084
> +#define HV_REGISTER_NESTED_SINT0               0x40001090
> +
>  /*
>   * Synthetic Timer MSRs. Four timers per vcpu.
>   */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 61f0c206bff0..c38e4c66a3ac 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -198,30 +198,10 @@ static inline bool hv_is_synic_reg(unsigned int reg)
>  	return false;
>  }
> 
> -static inline u64 hv_get_register(unsigned int reg)
> -{
> -	u64 value;
> -
> -	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
> -		hv_ghcb_msr_read(reg, &value);
> -	else
> -		rdmsrl(reg, value);
> -	return value;
> -}
> -
> -static inline void hv_set_register(unsigned int reg, u64 value)
> -{
> -	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
> -		hv_ghcb_msr_write(reg, value);
> -
> -		/* Write proxy bit via wrmsl instruction */
> -		if (reg >= HV_REGISTER_SINT0 &&
> -		    reg <= HV_REGISTER_SINT15)
> -			wrmsrl(reg, value | 1 << 20);
> -	} else {
> -		wrmsrl(reg, value);
> -	}
> -}
> +u64 hv_get_register(unsigned int reg);
> +void hv_set_register(unsigned int reg, u64 value);
> +u64 hv_get_non_nested_register(unsigned int reg);
> +void hv_set_non_nested_register(unsigned int reg, u64 value);
> 
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
> @@ -241,6 +221,8 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
>  }
>  static inline void hv_set_register(unsigned int reg, u64 value) { }
>  static inline u64 hv_get_register(unsigned int reg) { return 0; }
> +static inline void hv_set_non_nested_register(unsigned int reg, u64 value) { }
> +static inline u64 hv_get_non_nested_register(unsigned int reg) { return 0; }
>  static inline int hv_set_mem_host_visibility(unsigned long addr, int numpages,
>  					     bool visible)
>  {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f9b78d4829e3..938fc82edf05 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -41,7 +41,72 @@ bool hv_root_partition;
>  bool hv_nested;
>  struct ms_hyperv_info ms_hyperv;
> 
> +static inline unsigned int hv_get_nested_reg(unsigned int reg)
> +{
> +	switch (reg) {
> +	case HV_REGISTER_SIMP:
> +		return HV_REGISTER_NESTED_SIMP;
> +	case HV_REGISTER_SIEFP:
> +		return HV_REGISTER_NESTED_SIEFP;
> +	case HV_REGISTER_SVERSION:
> +		return HV_REGISTER_NESTED_SVERSION;
> +	case HV_REGISTER_SCONTROL:
> +		return HV_REGISTER_NESTED_SCONTROL;
> +	case HV_REGISTER_SINT0:
> +		return HV_REGISTER_NESTED_SINT0;
> +	case HV_REGISTER_EOM:
> +		return HV_REGISTER_NESTED_EOM;
> +	default:
> +		return reg;
> +	}
> +}
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
> +u64 hv_get_non_nested_register(unsigned int reg)
> +{
> +	u64 value;
> +
> +	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
> +		hv_ghcb_msr_read(reg, &value);
> +	else
> +		rdmsrl(reg, value);
> +	return value;
> +}
> +EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
> +
> +void hv_set_non_nested_register(unsigned int reg, u64 value)
> +{
> +	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
> +		hv_ghcb_msr_write(reg, value);
> +
> +		/* Write proxy bit via wrmsl instruction */
> +		if (reg >= HV_REGISTER_SINT0 &&
> +		    reg <= HV_REGISTER_SINT15)
> +			wrmsrl(reg, value | 1 << 20);
> +	} else {
> +		wrmsrl(reg, value);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(hv_set_non_nested_register);
> +
> +u64 hv_get_register(unsigned int reg)
> +{
> +	if (hv_nested)
> +		reg = hv_get_nested_reg(reg);
> +
> +	return hv_get_non_nested_register(reg);
> +}
> +EXPORT_SYMBOL_GPL(hv_get_register);
> +
> +void hv_set_register(unsigned int reg, u64 value)
> +{
> +	if (hv_nested)
> +		reg = hv_get_nested_reg(reg);
> +
> +	hv_set_non_nested_register(reg, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_set_register);
> +
>  static void (*vmbus_handler)(void);
>  static void (*hv_stimer0_handler)(void);
>  static void (*hv_kexec_handler)(void);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 4d6480d57546..8b0dd8e5244d 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -147,7 +147,7 @@ int hv_synic_alloc(void)
>  		 * Synic message and event pages are allocated by paravisor.
>  		 * Skip these pages allocation here.
>  		 */
> -		if (!hv_isolation_type_snp()) {
> +		if (!hv_isolation_type_snp() && !hv_root_partition) {
>  			hv_cpu->synic_message_page =
>  				(void *)get_zeroed_page(GFP_ATOMIC);
>  			if (hv_cpu->synic_message_page == NULL) {
> @@ -216,7 +216,7 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
>  	simp.simp_enabled = 1;
> 
> -	if (hv_isolation_type_snp()) {
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		hv_cpu->synic_message_page
>  			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
>  				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> @@ -233,7 +233,7 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
>  	siefp.siefp_enabled = 1;
> 
> -	if (hv_isolation_type_snp()) {
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		hv_cpu->synic_event_page =
>  			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
>  				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> @@ -315,20 +315,24 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	 * addresses.
>  	 */
>  	simp.simp_enabled = 0;
> -	if (hv_isolation_type_snp())
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		memunmap(hv_cpu->synic_message_page);
> -	else
> +		hv_cpu->synic_message_page = NULL;
> +	} else {
>  		simp.base_simp_gpa = 0;
> +	}
> 
>  	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> 
>  	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
>  	siefp.siefp_enabled = 0;
> 
> -	if (hv_isolation_type_snp())
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		memunmap(hv_cpu->synic_event_page);
> -	else
> +		hv_cpu->synic_event_page = NULL;
> +	} else {
>  		siefp.base_siefp_gpa = 0;
> +	}
> 
>  	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> 
> --
> 2.25.1

Looks good!  Thanks for persisting and addressing my concerns.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v10 3/5] x86/hyperv: Add an interface to do nested hypercalls
  2023-01-02  7:12 ` [PATCH v10 3/5] x86/hyperv: Add an interface to do nested hypercalls Jinank Jain
@ 2023-01-03 19:12   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley (LINUX) @ 2023-01-03 19:12 UTC (permalink / raw)
  To: Jinank Jain, Jinank Jain
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu, Dexuan Cui, tglx, mingo,
	bp, dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh

From: Jinank Jain <jinankjain@linux.microsoft.com> Sent: Sunday, January 1, 2023 11:13 PM
> 
> According to TLFS, in order to communicate to L0 hypervisor there needs
> to be an additional bit set in the control register. This communication
> is required to perform privileged instructions which can only be
> performed by L0 hypervisor. An example of that could be setting up the
> VMBus infrastructure.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h |  3 ++-
>  arch/x86/include/asm/mshyperv.h    | 42 +++++++++++++++++++++++++++---
>  include/asm-generic/hyperv-tlfs.h  |  1 +
>  3 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index b5019becb618..7758c495541d 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -380,7 +380,8 @@ struct hv_nested_enlightenments_control {
>  		__u32 reserved:31;
>  	} features;
>  	struct {
> -		__u32 reserved;
> +		__u32 inter_partition_comm:1;
> +		__u32 reserved:31;
>  	} hypercallControls;
>  } __packed;
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index c38e4c66a3ac..9e5535044ed0 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -74,10 +74,16 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void
> *output)
>  	return hv_status;
>  }
> 
> +/* Hypercall to the L0 hypervisor */
> +static inline u64 hv_do_nested_hypercall(u64 control, void *input, void *output)
> +{
> +	return hv_do_hypercall(control | HV_HYPERCALL_NESTED, input, output);
> +}
> +
>  /* Fast hypercall with 8 bytes of input and no output */
> -static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> +static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
>  {
> -	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
> +	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
>  	{
> @@ -105,10 +111,24 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64
> input1)
>  		return hv_status;
>  }
> 
> +static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> +{
> +	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT;
> +
> +	return _hv_do_fast_hypercall8(control, input1);
> +}
> +
> +static inline u64 hv_do_fast_nested_hypercall8(u16 code, u64 input1)
> +{
> +	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT | HV_HYPERCALL_NESTED;
> +
> +	return _hv_do_fast_hypercall8(control, input1);
> +}
> +
>  /* Fast hypercall with 16 bytes of input */
> -static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
> +static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
>  {
> -	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
> +	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
>  	{
> @@ -139,6 +159,20 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64
> input1, u64 input2)
>  	return hv_status;
>  }
> 
> +static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
> +{
> +	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT;
> +
> +	return _hv_do_fast_hypercall16(control, input1, input2);
> +}
> +
> +static inline u64 hv_do_fast_nested_hypercall16(u16 code, u64 input1, u64 input2)
> +{
> +	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT | HV_HYPERCALL_NESTED;
> +
> +	return _hv_do_fast_hypercall16(control, input1, input2);
> +}
> +
>  extern struct hv_vp_assist_page **hv_vp_assist_page;
> 
>  static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index b17c6eeb9afa..e61ee461c4fc 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -194,6 +194,7 @@ enum HV_GENERIC_SET_FORMAT {
>  #define HV_HYPERCALL_VARHEAD_OFFSET	17
>  #define HV_HYPERCALL_VARHEAD_MASK	GENMASK_ULL(26, 17)
>  #define HV_HYPERCALL_RSVD0_MASK		GENMASK_ULL(31, 27)
> +#define HV_HYPERCALL_NESTED		BIT_ULL(31)
>  #define HV_HYPERCALL_REP_COMP_OFFSET	32
>  #define HV_HYPERCALL_REP_COMP_1		BIT_ULL(32)
>  #define HV_HYPERCALL_REP_COMP_MASK	GENMASK_ULL(43, 32)
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v10 4/5] Drivers: hv: Enable vmbus driver for nested root partition
  2023-01-02  7:12 ` [PATCH v10 4/5] Drivers: hv: Enable vmbus driver for nested root partition Jinank Jain
@ 2023-01-03 19:12   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley (LINUX) @ 2023-01-03 19:12 UTC (permalink / raw)
  To: Jinank Jain, Jinank Jain
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu, Dexuan Cui, tglx, mingo,
	bp, dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh

From: Jinank Jain <jinankjain@linux.microsoft.com> Sent: Sunday, January 1, 2023 11:13 PM
> 
> Currently VMBus driver is not initialized for root partition but we need
> to enable the VMBus driver for nested root partition. This is required,
> so that L2 root can use the VMBus devices.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 0f00d57b7c25..6324e01d5eec 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2745,7 +2745,7 @@ static int __init hv_acpi_init(void)
>  	if (!hv_is_hyperv_initialized())
>  		return -ENODEV;
> 
> -	if (hv_root_partition)
> +	if (hv_root_partition && !hv_nested)
>  		return 0;
> 
>  	/*
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v10 5/5] x86/hyperv: Change interrupt vector for nested root partition
  2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
@ 2023-01-03 19:15   ` Michael Kelley (LINUX)
  2023-01-12 15:27   ` Wei Liu
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley (LINUX) @ 2023-01-03 19:15 UTC (permalink / raw)
  To: Jinank Jain, Jinank Jain
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu, Dexuan Cui, tglx, mingo,
	bp, dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh

From: Jinank Jain <jinankjain@linux.microsoft.com> Sent: Sunday, January 1, 2023 11:13 PM
> 
> Traditionally we have been using the HYPERVISOR_CALLBACK_VECTOR to relay
> the VMBus interrupt. But this does not work in case of nested
> hypervisor. Microsoft Hypervisor reserves 0x31 to 0x34 as the interrupt
> vector range for VMBus and thus we have to use one of the vectors from
> that range and setup the IDT accordingly.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  arch/x86/include/asm/idtentry.h    |  2 ++
>  arch/x86/include/asm/irq_vectors.h |  6 ++++++
>  arch/x86/kernel/cpu/mshyperv.c     | 15 +++++++++++++++
>  arch/x86/kernel/idt.c              | 10 ++++++++++
>  drivers/hv/vmbus_drv.c             |  3 ++-
>  5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 72184b0b2219..c0648e3e4d4a 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -686,6 +686,8 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR,
> 	sysvec_kvm_posted_intr_nested
>  DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR,
> 	sysvec_hyperv_callback);
>  DECLARE_IDTENTRY_SYSVEC(HYPERV_REENLIGHTENMENT_VECTOR,
> 	sysvec_hyperv_reenlightenment);
>  DECLARE_IDTENTRY_SYSVEC(HYPERV_STIMER0_VECTOR,
> 	sysvec_hyperv_stimer0);
> +DECLARE_IDTENTRY_SYSVEC(HYPERV_INTR_NESTED_VMBUS_VECTOR,
> +			sysvec_hyperv_nested_vmbus_intr);
>  #endif
> 
>  #if IS_ENABLED(CONFIG_ACRN_GUEST)
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 43dcb9284208..729d19eab7f5 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -102,6 +102,12 @@
>  #if IS_ENABLED(CONFIG_HYPERV)
>  #define HYPERV_REENLIGHTENMENT_VECTOR	0xee
>  #define HYPERV_STIMER0_VECTOR		0xed
> +/*
> + * FIXME: Change this, once Microsoft Hypervisor changes its assumption
> + * around VMBus interrupt vector allocation for nested root partition.
> + * Or provides a better interface to detect this instead of hardcoding.
> + */
> +#define HYPERV_INTR_NESTED_VMBUS_VECTOR	0x31
>  #endif
> 
>  #define LOCAL_TIMER_VECTOR		0xec
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 938fc82edf05..4dfe0f9d7be3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -126,6 +126,21 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
>  	set_irq_regs(old_regs);
>  }
> 
> +DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_nested_vmbus_intr)
> +{
> +	struct pt_regs *old_regs = set_irq_regs(regs);
> +
> +	inc_irq_stat(irq_hv_callback_count);
> +
> +	if (vmbus_handler)
> +		vmbus_handler();
> +
> +	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
> +		ack_APIC_irq();
> +
> +	set_irq_regs(old_regs);
> +}
> +
>  void hv_setup_vmbus_handler(void (*handler)(void))
>  {
>  	vmbus_handler = handler;
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index a58c6bc1cd68..3536935cea39 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -160,6 +160,16 @@ static const __initconst struct idt_data apic_idts[] = {
>  # endif
>  	INTG(SPURIOUS_APIC_VECTOR,
> 	asm_sysvec_spurious_apic_interrupt),
>  	INTG(ERROR_APIC_VECTOR,			asm_sysvec_error_interrupt),
> +#ifdef CONFIG_HYPERV
> +	/*
> +	 * This is a hack because we cannot install this interrupt handler
> +	 * via alloc_intr_gate as it does not allow interrupt vector less
> +	 * than FIRST_SYSTEM_VECTORS. And hyperv does not want anything other
> +	 * than 0x31-0x34 as the interrupt vector for vmbus interrupt in case
> +	 * of nested setup.
> +	 */
> +	INTG(HYPERV_INTR_NESTED_VMBUS_VECTOR,
> asm_sysvec_hyperv_nested_vmbus_intr),
> +#endif
>  #endif
>  };
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 6324e01d5eec..740878367426 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2768,7 +2768,8 @@ static int __init hv_acpi_init(void)
>  	 * normal Linux IRQ mechanism is not used in this case.
>  	 */
>  #ifdef HYPERVISOR_CALLBACK_VECTOR
> -	vmbus_interrupt = HYPERVISOR_CALLBACK_VECTOR;
> +	vmbus_interrupt = hv_nested ? HYPERV_INTR_NESTED_VMBUS_VECTOR :
> +				      HYPERVISOR_CALLBACK_VECTOR;
>  	vmbus_irq = -1;
>  #endif
> 
> --
> 2.25.1

I'm giving my "Reviewed-by" based on what I know, but I'm unsure
about the validity of grabbing vector 0x31 out of the middle of the
range versus at the end like all the other fixed vectors.  Getting
this changed on the MSHV side would really be a better solution.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition
  2023-01-02  7:12 ` [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition Jinank Jain
  2023-01-03 19:11   ` Michael Kelley (LINUX)
@ 2023-01-09 17:03   ` Nuno Das Neves
  1 sibling, 0 replies; 14+ messages in thread
From: Nuno Das Neves @ 2023-01-09 17:03 UTC (permalink / raw)
  To: Jinank Jain, jinankjain
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, arnd, peterz, jpoimboe, seanjc, kirill.shutemov, ak,
	sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

On 1/1/2023 11:12 PM, Jinank Jain wrote:
> Child partitions are free to allocate SynIC message and event page but in
> case of root partition it must use the pages allocated by Microsoft
> Hypervisor (MSHV). Base address for these pages can be found using
> synthetic MSRs exposed by MSHV. There is a slight difference in those MSRs
> for nested vs non-nested root partition.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 11 +++++
>  arch/x86/include/asm/mshyperv.h    | 30 +++-----------
>  arch/x86/kernel/cpu/mshyperv.c     | 65 ++++++++++++++++++++++++++++++
>  drivers/hv/hv.c                    | 18 +++++----
>  4 files changed, 93 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 58c03d18c235..b5019becb618 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -225,6 +225,17 @@ enum hv_isolation_type {
>  #define HV_REGISTER_SINT14			0x4000009E
>  #define HV_REGISTER_SINT15			0x4000009F
>  
> +/*
> + * Define synthetic interrupt controller model specific registers for
> + * nested hypervisor.
> + */
> +#define HV_REGISTER_NESTED_SCONTROL            0x40001080
> +#define HV_REGISTER_NESTED_SVERSION            0x40001081
> +#define HV_REGISTER_NESTED_SIEFP               0x40001082
> +#define HV_REGISTER_NESTED_SIMP                0x40001083
> +#define HV_REGISTER_NESTED_EOM                 0x40001084
> +#define HV_REGISTER_NESTED_SINT0               0x40001090
> +
>  /*
>   * Synthetic Timer MSRs. Four timers per vcpu.
>   */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 61f0c206bff0..c38e4c66a3ac 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -198,30 +198,10 @@ static inline bool hv_is_synic_reg(unsigned int reg)
>  	return false;
>  }
>  
> -static inline u64 hv_get_register(unsigned int reg)
> -{
> -	u64 value;
> -
> -	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
> -		hv_ghcb_msr_read(reg, &value);
> -	else
> -		rdmsrl(reg, value);
> -	return value;
> -}
> -
> -static inline void hv_set_register(unsigned int reg, u64 value)
> -{
> -	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
> -		hv_ghcb_msr_write(reg, value);
> -
> -		/* Write proxy bit via wrmsl instruction */
> -		if (reg >= HV_REGISTER_SINT0 &&
> -		    reg <= HV_REGISTER_SINT15)
> -			wrmsrl(reg, value | 1 << 20);
> -	} else {
> -		wrmsrl(reg, value);
> -	}
> -}
> +u64 hv_get_register(unsigned int reg);
> +void hv_set_register(unsigned int reg, u64 value);
> +u64 hv_get_non_nested_register(unsigned int reg);
> +void hv_set_non_nested_register(unsigned int reg, u64 value);
>  
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
> @@ -241,6 +221,8 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
>  }
>  static inline void hv_set_register(unsigned int reg, u64 value) { }
>  static inline u64 hv_get_register(unsigned int reg) { return 0; }
> +static inline void hv_set_non_nested_register(unsigned int reg, u64 value) { }
> +static inline u64 hv_get_non_nested_register(unsigned int reg) { return 0; }
>  static inline int hv_set_mem_host_visibility(unsigned long addr, int numpages,
>  					     bool visible)
>  {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f9b78d4829e3..938fc82edf05 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -41,7 +41,72 @@ bool hv_root_partition;
>  bool hv_nested;
>  struct ms_hyperv_info ms_hyperv;
>  
> +static inline unsigned int hv_get_nested_reg(unsigned int reg)
> +{
> +	switch (reg) {
> +	case HV_REGISTER_SIMP:
> +		return HV_REGISTER_NESTED_SIMP;
> +	case HV_REGISTER_SIEFP:
> +		return HV_REGISTER_NESTED_SIEFP;
> +	case HV_REGISTER_SVERSION:
> +		return HV_REGISTER_NESTED_SVERSION;
> +	case HV_REGISTER_SCONTROL:
> +		return HV_REGISTER_NESTED_SCONTROL;
> +	case HV_REGISTER_SINT0:
> +		return HV_REGISTER_NESTED_SINT0;
> +	case HV_REGISTER_EOM:
> +		return HV_REGISTER_NESTED_EOM;
> +	default:
> +		return reg;
> +	}
> +}
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
> +u64 hv_get_non_nested_register(unsigned int reg)
> +{
> +	u64 value;
> +
> +	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
> +		hv_ghcb_msr_read(reg, &value);
> +	else
> +		rdmsrl(reg, value);
> +	return value;
> +}
> +EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
> +
> +void hv_set_non_nested_register(unsigned int reg, u64 value)
> +{
> +	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
> +		hv_ghcb_msr_write(reg, value);
> +
> +		/* Write proxy bit via wrmsl instruction */
> +		if (reg >= HV_REGISTER_SINT0 &&
> +		    reg <= HV_REGISTER_SINT15)
> +			wrmsrl(reg, value | 1 << 20);
> +	} else {
> +		wrmsrl(reg, value);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(hv_set_non_nested_register);
> +
> +u64 hv_get_register(unsigned int reg)
> +{
> +	if (hv_nested)
> +		reg = hv_get_nested_reg(reg);
> +
> +	return hv_get_non_nested_register(reg);
> +}
> +EXPORT_SYMBOL_GPL(hv_get_register);
> +
> +void hv_set_register(unsigned int reg, u64 value)
> +{
> +	if (hv_nested)
> +		reg = hv_get_nested_reg(reg);
> +
> +	hv_set_non_nested_register(reg, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_set_register);
> +
>  static void (*vmbus_handler)(void);
>  static void (*hv_stimer0_handler)(void);
>  static void (*hv_kexec_handler)(void);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 4d6480d57546..8b0dd8e5244d 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -147,7 +147,7 @@ int hv_synic_alloc(void)
>  		 * Synic message and event pages are allocated by paravisor.
>  		 * Skip these pages allocation here.
>  		 */
> -		if (!hv_isolation_type_snp()) {
> +		if (!hv_isolation_type_snp() && !hv_root_partition) {
>  			hv_cpu->synic_message_page =
>  				(void *)get_zeroed_page(GFP_ATOMIC);
>  			if (hv_cpu->synic_message_page == NULL) {
> @@ -216,7 +216,7 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
>  	simp.simp_enabled = 1;
>  
> -	if (hv_isolation_type_snp()) {
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		hv_cpu->synic_message_page
>  			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
>  				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> @@ -233,7 +233,7 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
>  	siefp.siefp_enabled = 1;
>  
> -	if (hv_isolation_type_snp()) {
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		hv_cpu->synic_event_page =
>  			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
>  				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> @@ -315,20 +315,24 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	 * addresses.
>  	 */
>  	simp.simp_enabled = 0;
> -	if (hv_isolation_type_snp())
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		memunmap(hv_cpu->synic_message_page);
> -	else
> +		hv_cpu->synic_message_page = NULL;
> +	} else {
>  		simp.base_simp_gpa = 0;
> +	}
>  
>  	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
>  
>  	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
>  	siefp.siefp_enabled = 0;
>  
> -	if (hv_isolation_type_snp())
> +	if (hv_isolation_type_snp() || hv_root_partition) {
>  		memunmap(hv_cpu->synic_event_page);
> -	else
> +		hv_cpu->synic_event_page = NULL;
> +	} else {
>  		siefp.base_siefp_gpa = 0;
> +	}
>  
>  	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
>  

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 5/5] x86/hyperv: Change interrupt vector for nested root partition
  2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
  2023-01-03 19:15   ` Michael Kelley (LINUX)
@ 2023-01-12 15:27   ` Wei Liu
  2023-01-12 16:47   ` Borislav Petkov
  2023-01-12 20:00   ` Thomas Gleixner
  3 siblings, 0 replies; 14+ messages in thread
From: Wei Liu @ 2023-01-12 15:27 UTC (permalink / raw)
  To: Jinank Jain
  Cc: jinankjain, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh, mikelley

On Mon, Jan 02, 2023 at 07:12:55AM +0000, Jinank Jain wrote:
> Traditionally we have been using the HYPERVISOR_CALLBACK_VECTOR to relay
> the VMBus interrupt. But this does not work in case of nested
> hypervisor. Microsoft Hypervisor reserves 0x31 to 0x34 as the interrupt
> vector range for VMBus and thus we have to use one of the vectors from
> that range and setup the IDT accordingly.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>

I've applied all but this patch to hyperv-next.

This patch still needs an ack or nack from x86 maintainers to proceed.

Thanks,
Wei.

> ---
>  arch/x86/include/asm/idtentry.h    |  2 ++
>  arch/x86/include/asm/irq_vectors.h |  6 ++++++
>  arch/x86/kernel/cpu/mshyperv.c     | 15 +++++++++++++++
>  arch/x86/kernel/idt.c              | 10 ++++++++++
>  drivers/hv/vmbus_drv.c             |  3 ++-
>  5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 72184b0b2219..c0648e3e4d4a 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -686,6 +686,8 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR,	sysvec_kvm_posted_intr_nested
>  DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR,	sysvec_hyperv_callback);
>  DECLARE_IDTENTRY_SYSVEC(HYPERV_REENLIGHTENMENT_VECTOR,	sysvec_hyperv_reenlightenment);
>  DECLARE_IDTENTRY_SYSVEC(HYPERV_STIMER0_VECTOR,	sysvec_hyperv_stimer0);
> +DECLARE_IDTENTRY_SYSVEC(HYPERV_INTR_NESTED_VMBUS_VECTOR,
> +			sysvec_hyperv_nested_vmbus_intr);
>  #endif
>  
>  #if IS_ENABLED(CONFIG_ACRN_GUEST)
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 43dcb9284208..729d19eab7f5 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -102,6 +102,12 @@
>  #if IS_ENABLED(CONFIG_HYPERV)
>  #define HYPERV_REENLIGHTENMENT_VECTOR	0xee
>  #define HYPERV_STIMER0_VECTOR		0xed
> +/*
> + * FIXME: Change this, once Microsoft Hypervisor changes its assumption
> + * around VMBus interrupt vector allocation for nested root partition.
> + * Or provides a better interface to detect this instead of hardcoding.
> + */
> +#define HYPERV_INTR_NESTED_VMBUS_VECTOR	0x31
>  #endif
>  
>  #define LOCAL_TIMER_VECTOR		0xec
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 938fc82edf05..4dfe0f9d7be3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -126,6 +126,21 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
>  	set_irq_regs(old_regs);
>  }
>  
> +DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_nested_vmbus_intr)
> +{
> +	struct pt_regs *old_regs = set_irq_regs(regs);
> +
> +	inc_irq_stat(irq_hv_callback_count);
> +
> +	if (vmbus_handler)
> +		vmbus_handler();
> +
> +	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
> +		ack_APIC_irq();
> +
> +	set_irq_regs(old_regs);
> +}
> +
>  void hv_setup_vmbus_handler(void (*handler)(void))
>  {
>  	vmbus_handler = handler;
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index a58c6bc1cd68..3536935cea39 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -160,6 +160,16 @@ static const __initconst struct idt_data apic_idts[] = {
>  # endif
>  	INTG(SPURIOUS_APIC_VECTOR,		asm_sysvec_spurious_apic_interrupt),
>  	INTG(ERROR_APIC_VECTOR,			asm_sysvec_error_interrupt),
> +#ifdef CONFIG_HYPERV
> +	/*
> +	 * This is a hack because we cannot install this interrupt handler
> +	 * via alloc_intr_gate as it does not allow interrupt vector less
> +	 * than FIRST_SYSTEM_VECTORS. And hyperv does not want anything other
> +	 * than 0x31-0x34 as the interrupt vector for vmbus interrupt in case
> +	 * of nested setup.
> +	 */
> +	INTG(HYPERV_INTR_NESTED_VMBUS_VECTOR, asm_sysvec_hyperv_nested_vmbus_intr),
> +#endif
>  #endif
>  };
>  
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 6324e01d5eec..740878367426 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2768,7 +2768,8 @@ static int __init hv_acpi_init(void)
>  	 * normal Linux IRQ mechanism is not used in this case.
>  	 */
>  #ifdef HYPERVISOR_CALLBACK_VECTOR
> -	vmbus_interrupt = HYPERVISOR_CALLBACK_VECTOR;
> +	vmbus_interrupt = hv_nested ? HYPERV_INTR_NESTED_VMBUS_VECTOR :
> +				      HYPERVISOR_CALLBACK_VECTOR;
>  	vmbus_irq = -1;
>  #endif
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 5/5] x86/hyperv: Change interrupt vector for nested root partition
  2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
  2023-01-03 19:15   ` Michael Kelley (LINUX)
  2023-01-12 15:27   ` Wei Liu
@ 2023-01-12 16:47   ` Borislav Petkov
  2023-01-12 20:00   ` Thomas Gleixner
  3 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2023-01-12 16:47 UTC (permalink / raw)
  To: Jinank Jain
  Cc: jinankjain, kys, haiyangz, wei.liu, decui, tglx, mingo,
	dave.hansen, x86, hpa, arnd, peterz, jpoimboe, seanjc,
	kirill.shutemov, ak, sathyanarayanan.kuppuswamy, linux-hyperv,
	linux-kernel, linux-arch, anrayabh, mikelley

On Mon, Jan 02, 2023 at 07:12:55AM +0000, Jinank Jain wrote:
> Traditionally we have been using the HYPERVISOR_CALLBACK_VECTOR to relay

Who's "we"?

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

Also, pls read section "2) Describe your changes" in
Documentation/process/submitting-patches.rst for more details.

Also, see section "Changelog" in
Documentation/process/maintainer-tip.rst

Bottom line is: personal pronouns are ambiguous in text, especially with
so many parties/companies/etc developing the kernel so let's avoid them
please.

> the VMBus interrupt. But this does not work in case of nested
> hypervisor. Microsoft Hypervisor reserves 0x31 to 0x34 as the interrupt
> vector range for VMBus and thus we have to use one of the vectors from
> that range and setup the IDT accordingly.
> 
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> ---
>  arch/x86/include/asm/idtentry.h    |  2 ++
>  arch/x86/include/asm/irq_vectors.h |  6 ++++++
>  arch/x86/kernel/cpu/mshyperv.c     | 15 +++++++++++++++
>  arch/x86/kernel/idt.c              | 10 ++++++++++
>  drivers/hv/vmbus_drv.c             |  3 ++-
>  5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 72184b0b2219..c0648e3e4d4a 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -686,6 +686,8 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR,	sysvec_kvm_posted_intr_nested
>  DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR,	sysvec_hyperv_callback);
>  DECLARE_IDTENTRY_SYSVEC(HYPERV_REENLIGHTENMENT_VECTOR,	sysvec_hyperv_reenlightenment);
>  DECLARE_IDTENTRY_SYSVEC(HYPERV_STIMER0_VECTOR,	sysvec_hyperv_stimer0);
> +DECLARE_IDTENTRY_SYSVEC(HYPERV_INTR_NESTED_VMBUS_VECTOR,
> +			sysvec_hyperv_nested_vmbus_intr);
>  #endif
>  
>  #if IS_ENABLED(CONFIG_ACRN_GUEST)
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 43dcb9284208..729d19eab7f5 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -102,6 +102,12 @@
>  #if IS_ENABLED(CONFIG_HYPERV)
>  #define HYPERV_REENLIGHTENMENT_VECTOR	0xee
>  #define HYPERV_STIMER0_VECTOR		0xed
> +/*
> + * FIXME: Change this, once Microsoft Hypervisor changes its assumption
      ^^^^^^

This patch looks like it is not ready to go anywhere yet...

> + * around VMBus interrupt vector allocation for nested root partition.

When is that going to happen? If at all...

> + * Or provides a better interface to detect this instead of hardcoding.
> + */
> +#define HYPERV_INTR_NESTED_VMBUS_VECTOR	0x31
>  #endif
>  
>  #define LOCAL_TIMER_VECTOR		0xec
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 938fc82edf05..4dfe0f9d7be3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -126,6 +126,21 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
>  	set_irq_regs(old_regs);
>  }
>  
> +DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_nested_vmbus_intr)
> +{
> +	struct pt_regs *old_regs = set_irq_regs(regs);
> +
> +	inc_irq_stat(irq_hv_callback_count);
> +
> +	if (vmbus_handler)
> +		vmbus_handler();
> +
> +	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
> +		ack_APIC_irq();
> +
> +	set_irq_regs(old_regs);
> +}
> +
>  void hv_setup_vmbus_handler(void (*handler)(void))
>  {
>  	vmbus_handler = handler;
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index a58c6bc1cd68..3536935cea39 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -160,6 +160,16 @@ static const __initconst struct idt_data apic_idts[] = {
>  # endif
>  	INTG(SPURIOUS_APIC_VECTOR,		asm_sysvec_spurious_apic_interrupt),
>  	INTG(ERROR_APIC_VECTOR,			asm_sysvec_error_interrupt),
> +#ifdef CONFIG_HYPERV
> +	/*
> +	 * This is a hack because we cannot install this interrupt handler
> +	 * via alloc_intr_gate as it does not allow interrupt vector less
> +	 * than FIRST_SYSTEM_VECTORS. And hyperv does not want anything other
> +	 * than 0x31-0x34 as the interrupt vector for vmbus interrupt in case

Well:

/*
 * IDT vectors usable for external interrupt sources start at 0x20.
 * (0x80 is the syscall vector, 0x30-0x3f are for ISA)
				^^^^^^^^^^^^^^^^^^^^^^

 */
#define FIRST_EXTERNAL_VECTOR		0x20

I guess HyperV decided to reuse those...?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 5/5] x86/hyperv: Change interrupt vector for nested root partition
  2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
                     ` (2 preceding siblings ...)
  2023-01-12 16:47   ` Borislav Petkov
@ 2023-01-12 20:00   ` Thomas Gleixner
  3 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2023-01-12 20:00 UTC (permalink / raw)
  To: Jinank Jain
  Cc: kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, hpa,
	arnd, peterz, jpoimboe, jinankjain, seanjc, kirill.shutemov, ak,
	sathyanarayanan.kuppuswamy, linux-hyperv, linux-kernel,
	linux-arch, anrayabh, mikelley

Jinank!

On Mon, Jan 02 2023 at 07:12, Jinank Jain wrote:
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> +/*
> + * FIXME: Change this, once Microsoft Hypervisor changes its assumption
> + * around VMBus interrupt vector allocation for nested root partition.
> + * Or provides a better interface to detect this instead of hardcoding.
> + */
> +#define HYPERV_INTR_NESTED_VMBUS_VECTOR	0x31

arch/x86/include/asm/irq_vectors.h line 47:

/*
 * Vectors 0x30-0x3f are used for ISA interrupts.
 *   round up to the next 16-vector boundary
 */
#define ISA_IRQ_VECTOR(irq)		(((FIRST_EXTERNAL_VECTOR + 16) & ~15) + irq)

So this overlaps with the legacy interrupt vector space.

> +#ifdef CONFIG_HYPERV
> +	/*
> +	 * This is a hack because we cannot install this interrupt handler
> +	 * via alloc_intr_gate as it does not allow interrupt vector less
> +	 * than FIRST_SYSTEM_VECTORS. And hyperv does not want anything other
> +	 * than 0x31-0x34 as the interrupt vector for vmbus interrupt in case
> +	 * of nested setup.
> +	 */
> +	INTG(HYPERV_INTR_NESTED_VMBUS_VECTOR, asm_sysvec_hyperv_nested_vmbus_intr),
> +#endif

I agree, that this is a hack, but that puts it mildly: It's a completely
broken hack.

> +DECLARE_IDTENTRY_SYSVEC(HYPERV_INTR_NESTED_VMBUS_VECTOR, sysvec_hyperv_nested_vmbus_intr);

This generates the low level entry stub for vector 0x31 at compile time,
which competes with the interrupt stub for external interrupts generated
by:

      SYM_CODE_START(irq_entries_start)


Now the above INTG() hard-codes the IDT entry for vector 0x31 into the
apic_idts table. That marks it as system vector which in turn prevents
idt_setup_apic_and_irq_gates() to install the IDT entry for the external
vector on _ALL_ systems unconditionally.

IOW, you broke world except for systems which do not use the legacy
interrupt space. Congrats!

That legacy space is hardcoded and that's clearly documented so.

0x31 becomes IRQ1 - usually the i8042 - which makes it pretty much
guaranteed that this collides and fails. The worst case consequence is a
fully uncontrolled interrupt storm which is not even detectable.

So this patch is /dev/null material and either the hypervisor side makes
it possible to use a different vector space or this needs some very
careful modifications to the legacy ISA vector assignment.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-01-12 20:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1672639707.git.jinankjain@linux.microsoft.com>
2023-01-02  7:12 ` [PATCH v10 1/5] x86/hyperv: Add support for detecting nested hypervisor Jinank Jain
2023-01-03 19:09   ` Michael Kelley (LINUX)
2023-01-02  7:12 ` [PATCH v10 2/5] Drivers: hv: Setup synic registers in case of nested root partition Jinank Jain
2023-01-03 19:11   ` Michael Kelley (LINUX)
2023-01-09 17:03   ` Nuno Das Neves
2023-01-02  7:12 ` [PATCH v10 3/5] x86/hyperv: Add an interface to do nested hypercalls Jinank Jain
2023-01-03 19:12   ` Michael Kelley (LINUX)
2023-01-02  7:12 ` [PATCH v10 4/5] Drivers: hv: Enable vmbus driver for nested root partition Jinank Jain
2023-01-03 19:12   ` Michael Kelley (LINUX)
2023-01-02  7:12 ` [PATCH v10 5/5] x86/hyperv: Change interrupt vector " Jinank Jain
2023-01-03 19:15   ` Michael Kelley (LINUX)
2023-01-12 15:27   ` Wei Liu
2023-01-12 16:47   ` Borislav Petkov
2023-01-12 20:00   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).