linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor
@ 2021-01-20 12:00 Wei Liu
  2021-01-20 12:00 ` [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT Wei Liu
                   ` (15 more replies)
  0 siblings, 16 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, sameo, robert.bradford, sebastien.boeuf

Hi all

This patch series makes Linux run as the root partition [0] on Microsoft
Hypervisor [1]. A subsequent patch series will provide a device node
(/dev/mshv) so that userspace programs can create and run virtual machines.
We've also ported Cloud Hypervisor [3] over and have been able to boot a
Linux guest with Virtio devices since late July 2020.

This series implements only the absolutely necessary components to get
things running.  A large portion of this series consists of patches that
augment hyperv-tlfs.h.  They should be rather uncontroversial and can be
applied right away.

A few key things other than the changes to hyperv-tlfs.h:

1. Linux needs to set up existing Hyper-V facilities differently.
2. Linux needs to make a few hypercalls to bring up APs.
3. Interrupts are remapped by IOMMU, which is controlled by the hypervisor.
   Linux needs to make hypercalls to map and unmap interrupts. This is
   done by introducing a new MSI irqdomain and extending the remapping
   domain in hyperv-iommu.

This series is now based on 5.11-rc2.

Posting v5 with the latest changes to get some testing from various
kernel test bots, with the intention to merge this series soon.

Comments and suggestions are welcome.

Thanks,
Wei.

[0] Just think of it like Xen's Dom0.
[1] Hyper-V is the better-known name, but it really refers to the whole stack,
    including the hypervisor and other components that run in the Windows
    kernel and userspace.
[3] https://github.com/cloud-hypervisor/

Cc: sameo@linux.intel.com
Cc: robert.bradford@intel.com
Cc: sebastien.boeuf@intel.com

Changes since v4:
1. Rework IO-APIC handling.

Changes since v3:
1. Fix compilation errors.
2. Adapt to upstream changes.

Changes since v2:
1. Address more comments from Vitaly.
2. Fix and test 32bit build.

Changes since v1:
1. Simplify MSI IRQ domain implementation.
2. Address Vitaly's comments.

Wei Liu (16):
  asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to
    HV_CPU_MANAGEMENT
  x86/hyperv: detect if Linux is the root partition
  Drivers: hv: vmbus: skip VMBus initialization if Linux is root
  iommu/hyperv: don't setup IRQ remapping when running as root
  clocksource/hyperv: use MSR-based access if running as root
  x86/hyperv: allocate output arg pages if required
  x86/hyperv: extract partition ID from Microsoft Hypervisor if
    necessary
  x86/hyperv: handling hypercall page setup for root
  x86/hyperv: provide a bunch of helper functions
  x86/hyperv: implement and use hv_smp_prepare_cpus
  asm-generic/hyperv: update hv_msi_entry
  asm-generic/hyperv: update hv_interrupt_entry
  asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  asm-generic/hyperv: import data structures for mapping device
    interrupts
  x86/hyperv: implement an MSI domain for root partition
  iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

 arch/x86/hyperv/Makefile            |   4 +-
 arch/x86/hyperv/hv_init.c           | 108 +++++++-
 arch/x86/hyperv/hv_proc.c           | 225 ++++++++++++++++
 arch/x86/hyperv/irqdomain.c         | 386 ++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h  |  23 ++
 arch/x86/include/asm/mshyperv.h     |  19 +-
 arch/x86/kernel/cpu/mshyperv.c      |  49 ++++
 drivers/clocksource/hyperv_timer.c  |   3 +
 drivers/hv/vmbus_drv.c              |   3 +
 drivers/iommu/hyperv-iommu.c        | 178 ++++++++++++-
 drivers/pci/controller/pci-hyperv.c |   2 +-
 include/asm-generic/hyperv-tlfs.h   | 254 +++++++++++++++++-
 12 files changed, 1233 insertions(+), 21 deletions(-)
 create mode 100644 arch/x86/hyperv/hv_proc.c
 create mode 100644 arch/x86/hyperv/irqdomain.c


base-commit: e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
-- 
2.20.1



* [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 15:57   ` Pavel Tatashin
  2021-01-26  0:25   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition Wei Liu
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Vitaly Kuznetsov, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

This makes the name match Hyper-V TLFS.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/asm-generic/hyperv-tlfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index e73a11850055..e6903589a82a 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -88,7 +88,7 @@
 #define HV_CONNECT_PORT				BIT(7)
 #define HV_ACCESS_STATS				BIT(8)
 #define HV_DEBUGGING				BIT(11)
-#define HV_CPU_POWER_MANAGEMENT			BIT(12)
+#define HV_CPU_MANAGEMENT			BIT(12)
 
 
 /*
-- 
2.20.1



* [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
  2021-01-20 12:00 ` [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 16:03   ` Pavel Tatashin
  2021-01-26  0:31   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root Wei Liu
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

For now, we can use the CPU management privilege flag to detect this.
Stash the result for later use.

Put in a bunch of defines for future use when we want to have more
fine-grained detection.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v3: move hv_root_partition to mshyperv.c
---
 arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
 arch/x86/include/asm/mshyperv.h    |  2 ++
 arch/x86/kernel/cpu/mshyperv.c     | 20 ++++++++++++++++++++
 3 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 6bf42aed387e..204010350604 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -21,6 +21,7 @@
 #define HYPERV_CPUID_FEATURES			0x40000003
 #define HYPERV_CPUID_ENLIGHTMENT_INFO		0x40000004
 #define HYPERV_CPUID_IMPLEMENT_LIMITS		0x40000005
+#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES	0x40000007
 #define HYPERV_CPUID_NESTED_FEATURES		0x4000000A
 
 #define HYPERV_CPUID_VIRT_STACK_INTERFACE	0x40000081
@@ -110,6 +111,15 @@
 /* Recommend using enlightened VMCS */
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
 
+/*
+ * CPU management features identification.
+ * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
+ */
+#define HV_X64_START_LOGICAL_PROCESSOR			BIT(0)
+#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR		BIT(1)
+#define HV_X64_PERFORMANCE_COUNTER_SYNC			BIT(2)
+#define HV_X64_RESERVED_IDENTITY_BIT			BIT(31)
+
 /*
  * Virtual processor will never share a physical core with another virtual
  * processor, except for virtual processors that are reported as sibling SMT
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..ac2b0d110f03 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
 		struct hv_guest_mapping_flush_list *flush,
 		u64 start_gfn, u64 end_gfn);
 
+extern bool hv_root_partition;
+
 #ifdef CONFIG_X86_64
 void hv_apic_init(void);
 void __init hv_init_spinlocks(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f628e3dc150f..c376d191a260 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,10 @@
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
 
+/* Is Linux running as the root partition? */
+bool hv_root_partition;
+EXPORT_SYMBOL_GPL(hv_root_partition);
+
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
@@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
 	pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
 		 ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
 
+	/*
+	 * Check CPU management privilege.
+	 *
+	 * To mirror what Windows does we should extract CPU management
+	 * features and use the ReservedIdentityBit to detect if Linux is the
+	 * root partition. But that requires negotiating CPU management
+	 * interface (a process to be finalized).
+	 *
+	 * For now, use the privilege flag as the indicator for running as
+	 * root.
+	 */
+	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
+		hv_root_partition = true;
+		pr_info("Hyper-V: running as root partition\n");
+	}
+
 	/*
 	 * Extract host information.
 	 */
-- 
2.20.1
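
A note on the detection in this patch: the check reduces to testing bit 12
of the EBX value returned by CPUID leaf 0x40000003 (HYPERV_CPUID_FEATURES).
The following is a minimal userspace sketch of that test, not the kernel
code itself. The helper name hv_ebx_indicates_root() is hypothetical, and
the EBX value is passed in as an argument so the sketch runs without a
hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as HV_CPU_MANAGEMENT, i.e. BIT(12), in this series. */
#define HV_CPU_MANAGEMENT (UINT32_C(1) << 12)

static_assert(HV_CPU_MANAGEMENT == 0x1000, "HV_CPU_MANAGEMENT is bit 12");

/*
 * Hypothetical helper: decide from the EBX value of the Hyper-V features
 * CPUID leaf whether the CPU management privilege is granted. The patch
 * treats that privilege as "Linux is the root partition" for now.
 */
static bool hv_ebx_indicates_root(uint32_t features_ebx)
{
	return (features_ebx & HV_CPU_MANAGEMENT) != 0;
}
```

In the kernel the EBX value comes from cpuid_ebx(HYPERV_CPUID_FEATURES); the
sketch only isolates the bit test so that it can be checked on its own.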



* [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
  2021-01-20 12:00 ` [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT Wei Liu
  2021-01-20 12:00 ` [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 16:06   ` Pavel Tatashin
  2021-01-26  0:32   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger

When Linux runs as the root partition, neither VMBus nor the other
infrastructure that hv_acpi_init initializes is available.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v3: Return 0 instead of -ENODEV.
---
 drivers/hv/vmbus_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 502f8cd95f6d..ee27b3670a51 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
 	if (!hv_is_hyperv_initialized())
 		return -ENODEV;
 
+	if (hv_root_partition)
+		return 0;
+
 	init_completion(&probe_event);
 
 	/*
-- 
2.20.1



* [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (2 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 16:08   ` Pavel Tatashin
  2021-01-26  0:33   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if " Wei Liu
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Joerg Roedel, Vitaly Kuznetsov, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

The IOMMU code needs more work. For now, we are sure that the IRQ
remapping hooks are not applicable when Linux is the root partition.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1d21a0b5f724..b7db6024e65c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include <asm/io_apic.h>
 #include <asm/irq_remapping.h>
 #include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
 
 #include "irq_remapping.h"
 
@@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
 
 	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
 	    x86_init.hyper.msi_ext_dest_id() ||
-	    !x2apic_supported())
+	    !x2apic_supported() || hv_root_partition)
 		return -ENODEV;
 
 	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1



* [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if running as root
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (3 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 16:13   ` Pavel Tatashin
  2021-01-26  0:34   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required Wei Liu
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Daniel Lezcano, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner

When Linux runs as the root partition, the setup required for the TSC page
is different. Luckily, Linux also has access to the MSR-based
clocksource, so we can simply disable the TSC page clocksource if Linux is
the root partition.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/clocksource/hyperv_timer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba04cb381cd3..269a691bd2c4 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
 	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
 		return false;
 
+	if (hv_root_partition)
+		return false;
+
 	hv_read_reference_counter = read_hv_clock_tsc;
 	phys_addr = virt_to_phys(hv_get_tsc_page());
 
-- 
2.20.1



* [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (4 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if " Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-20 15:12   ` kernel test robot
                     ` (2 more replies)
  2021-01-20 12:00 ` [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary Wei Liu
                   ` (9 subsequent siblings)
  15 siblings, 3 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Lillian Grassin-Drake, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

When Linux runs as the root partition, it will need to make hypercalls
which return data from the hypervisor.

Allocate pages for storing results when Linux runs as the root
partition.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v3: Fix hv_cpu_die to use free_pages.
v2: Address Vitaly's comments
---
 arch/x86/hyperv/hv_init.c       | 35 ++++++++++++++++++++++++++++-----
 arch/x86/include/asm/mshyperv.h |  1 +
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e04d90af4c27..6f4cb40e53fe 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 void  __percpu **hyperv_pcpu_input_arg;
 EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
 
+void  __percpu **hyperv_pcpu_output_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
@@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
 	void **input_arg;
 	struct page *pg;
 
-	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
-	pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
+	pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
 	if (unlikely(!pg))
 		return -ENOMEM;
+
+	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
 	*input_arg = page_address(pg);
+	if (hv_root_partition) {
+		void **output_arg;
+
+		output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+		*output_arg = page_address(pg + 1);
+	}
 
 	hv_get_vp_index(msr_vp_index);
 
@@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
 	unsigned int new_cpu;
 	unsigned long flags;
 	void **input_arg;
-	void *input_pg = NULL;
+	void *pg;
 
 	local_irq_save(flags);
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	input_pg = *input_arg;
+	pg = *input_arg;
 	*input_arg = NULL;
+
+	if (hv_root_partition) {
+		void **output_arg;
+
+		output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+		*output_arg = NULL;
+	}
+
 	local_irq_restore(flags);
-	free_page((unsigned long)input_pg);
+
+	free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
 
 	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
@@ -346,6 +365,12 @@ void __init hyperv_init(void)
 
 	BUG_ON(hyperv_pcpu_input_arg == NULL);
 
+	/* Allocate the per-CPU state for output arg for root */
+	if (hv_root_partition) {
+		hyperv_pcpu_output_arg = alloc_percpu(void *);
+		BUG_ON(hyperv_pcpu_output_arg == NULL);
+	}
+
 	/* Allocate percpu VP index */
 	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
 				    GFP_KERNEL);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ac2b0d110f03..62d9390f1ddf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
 #if IS_ENABLED(CONFIG_HYPERV)
 extern void *hv_hypercall_pg;
 extern void  __percpu  **hyperv_pcpu_input_arg;
+extern void  __percpu  **hyperv_pcpu_output_arg;
 
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
-- 
2.20.1



* [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (5 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  0:48   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root Wei Liu
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Lillian Grassin-Drake, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

We will need the partition ID for executing some hypercalls later.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v3:
1. Make hv_get_partition_id static.
2. Change code structure a bit.
---
 arch/x86/hyperv/hv_init.c         | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h   |  2 ++
 include/asm-generic/hyperv-tlfs.h |  6 ++++++
 3 files changed, 35 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6f4cb40e53fe..fc9941bd8653 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -26,6 +26,9 @@
 #include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
 
+u64 hv_current_partition_id = ~0ull;
+EXPORT_SYMBOL_GPL(hv_current_partition_id);
+
 void *hv_hypercall_pg;
 EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 
@@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = {
 	.resume		= hv_resume,
 };
 
+static void __init hv_get_partition_id(void)
+{
+	struct hv_get_partition_id *output_page;
+	u16 status;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
+	status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
+		HV_HYPERCALL_RESULT_MASK;
+	if (status != HV_STATUS_SUCCESS) {
+		/* No point in proceeding if this failed */
+		pr_err("Failed to get partition ID: %d\n", status);
+		BUG();
+	}
+	hv_current_partition_id = output_page->partition_id;
+	local_irq_restore(flags);
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -426,6 +448,11 @@ void __init hyperv_init(void)
 
 	register_syscore_ops(&hv_syscore_ops);
 
+	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
+		hv_get_partition_id();
+
+	BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
+
 	return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 62d9390f1ddf..67f5d35a73d3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
 extern void  __percpu  **hyperv_pcpu_input_arg;
 extern void  __percpu  **hyperv_pcpu_output_arg;
 
+extern u64 hv_current_partition_id;
+
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
 	u64 input_address = input ? virt_to_phys(input) : 0;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index e6903589a82a..87b1a79b19eb 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX	0x0013
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX	0x0014
 #define HVCALL_SEND_IPI_EX			0x0015
+#define HVCALL_GET_PARTITION_ID			0x0046
 #define HVCALL_GET_VP_REGISTERS			0x0050
 #define HVCALL_SET_VP_REGISTERS			0x0051
 #define HVCALL_POST_MESSAGE			0x005c
@@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
 	u64 gva_list[];
 } __packed;
 
+/* HvGetPartitionId hypercall (output only) */
+struct hv_get_partition_id {
+	u64 partition_id;
+} __packed;
+
 /* HvRetargetDeviceInterrupt hypercall */
 union hv_msi_entry {
 	u64 as_uint64;
-- 
2.20.1



* [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (6 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  0:49   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions Wei Liu
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Lillian Grassin-Drake, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

When Linux is running as the root partition, the hypercall page will
have already been set up by Hyper-V. Copy its content over to the
allocated page.

Add checks to hv_suspend & co to bail early because they are not
supported in this setup yet.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v3:
1. Use HV_HYP_PAGE_SIZE.
2. Add checks to hv_suspend & co.
---
 arch/x86/hyperv/hv_init.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index fc9941bd8653..ad8e77859b32 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -25,6 +25,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
+#include <linux/highmem.h>
 
 u64 hv_current_partition_id = ~0ull;
 EXPORT_SYMBOL_GPL(hv_current_partition_id);
@@ -283,6 +284,9 @@ static int hv_suspend(void)
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 	int ret;
 
+	if (hv_root_partition)
+		return -EPERM;
+
 	/*
 	 * Reset the hypercall page as it is going to be invalidated
 	 * accross hibernation. Setting hv_hypercall_pg to NULL ensures
@@ -433,8 +437,35 @@ void __init hyperv_init(void)
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
-	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
-	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+
+	if (hv_root_partition) {
+		struct page *pg;
+		void *src, *dst;
+
+		/*
+		 * For the root partition, the hypervisor will set up its
+		 * hypercall page. The hypervisor guarantees it will not show
+		 * up in the root's address space. The root can't change the
+		 * location of the hypercall page.
+		 *
+		 * Order is important here. We must enable the hypercall page
+		 * so it is populated with code, then copy the code to an
+		 * executable page.
+		 */
+		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+
+		pg = vmalloc_to_page(hv_hypercall_pg);
+		dst = kmap(pg);
+		src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
+				MEMREMAP_WB);
+		BUG_ON(!(src && dst));
+		memcpy(dst, src, HV_HYP_PAGE_SIZE);
+		memunmap(src);
+		kunmap(pg);
+	} else {
+		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
+		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+	}
 
 	/*
 	 * Ignore any errors in setting up stimer clockevents
@@ -577,6 +608,6 @@ EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
 
 bool hv_is_hibernation_supported(void)
 {
-	return acpi_sleep_state_supported(ACPI_STATE_S4);
+	return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
 }
 EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-- 
2.20.1



* [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (7 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  1:20   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus Wei Liu
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Lillian Grassin-Drake, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

These helpers are used to deposit pages into Microsoft Hypervisor and to
bring up logical and virtual processors.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.

v3:
1. Add __packed to structures.
2. Drop unnecessary exports.

v2:
1. Adapt to hypervisor side changes
2. Address Vitaly's comments
---
 arch/x86/hyperv/Makefile          |   2 +-
 arch/x86/hyperv/hv_proc.c         | 225 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h   |   4 +
 include/asm-generic/hyperv-tlfs.h |  67 +++++++++
 4 files changed, 297 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/hv_proc.c
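
A note on the allocation loop in hv_call_deposit_pages(): it breaks the
requested page count into power-of-two chunks, never using an order above
HV_DEPOSIT_MAX_ORDER and never raising the order again once it has dropped.
The following is a minimal userspace sketch of that decomposition. It assumes
every allocation succeeds, whereas the patch also falls back to smaller
orders when alloc_pages_node() fails. The function names are hypothetical,
and __builtin_clz() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_DEPOSIT_MAX_ORDER 8
#define HV_DEPOSIT_MAX (1u << HV_DEPOSIT_MAX_ORDER)

/* Largest order whose chunk (1 << order pages) still fits into n; n > 0. */
static int highest_order(uint32_t n)
{
	return 31 - __builtin_clz(n);
}

/*
 * Split num_pages into power-of-two chunks and record the order of each
 * chunk in orders[]. Returns the number of chunks written (at most cap).
 */
static size_t split_into_orders(uint32_t num_pages, int orders[], size_t cap)
{
	size_t n = 0;
	int order = HV_DEPOSIT_MAX_ORDER;

	assert(num_pages <= HV_DEPOSIT_MAX);

	while (num_pages && n < cap) {
		int desired = highest_order(num_pages);

		/* The order only ever goes down, as in the patch. */
		if (desired < order)
			order = desired;

		orders[n++] = order;
		num_pages -= 1u << order;
	}
	return n;
}
```

For example, a request for 11 pages becomes chunks of 8, 2 and 1 pages, that
is, orders 3, 1 and 0.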

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 89b1f74d3225..565358020921 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y			:= hv_init.o mmu.o nested.o
-obj-$(CONFIG_X86_64)	+= hv_apic.o
+obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)	+= hv_spinlock.o
diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
new file mode 100644
index 000000000000..706097160e2f
--- /dev/null
+++ b/arch/x86/hyperv/hv_proc.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/types.h>
+#include <linux/version.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/clockchips.h>
+#include <linux/acpi.h>
+#include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <linux/cpuhotplug.h>
+#include <linux/minmax.h>
+#include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
+#include <asm/apic.h>
+
+#include <asm/trace/hyperv.h>
+
+#define HV_DEPOSIT_MAX_ORDER (8)
+#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER)
+
+/*
+ * Deposits exact number of pages
+ * Must be called with interrupts enabled
+ * Max 256 pages
+ */
+int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
+{
+	struct page **pages;
+	int *counts;
+	int num_allocations;
+	int i, j, page_count;
+	int order;
+	int desired_order;
+	u16 status;
+	int ret;
+	u64 base_pfn;
+	struct hv_deposit_memory *input_page;
+	unsigned long flags;
+
+	if (num_pages > HV_DEPOSIT_MAX)
+		return -E2BIG;
+	if (!num_pages)
+		return 0;
+
+	/* One page to hold the page pointers; counts are allocated below */
+	pages = (struct page **)__get_free_page(GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
+	if (!counts) {
+		free_page((unsigned long)pages);
+		return -ENOMEM;
+	}
+
+	/* Allocate all the pages before disabling interrupts */
+	num_allocations = 0;
+	i = 0;
+	order = HV_DEPOSIT_MAX_ORDER;
+
+	while (num_pages) {
+		/* Find highest order we can actually allocate */
+		desired_order = 31 - __builtin_clz(num_pages);
+		order = min(desired_order, order);
+		do {
+			pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
+			if (!pages[i]) {
+				if (!order) {
+					ret = -ENOMEM;
+					goto err_free_allocations;
+				}
+				--order;
+			}
+		} while (!pages[i]);
+
+		split_page(pages[i], order);
+		counts[i] = 1 << order;
+		num_pages -= counts[i];
+		i++;
+		num_allocations++;
+	}
+
+	local_irq_save(flags);
+
+	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+	input_page->partition_id = partition_id;
+
+	/* Populate gpa_page_list - these will fit on the input page */
+	for (i = 0, page_count = 0; i < num_allocations; ++i) {
+		base_pfn = page_to_pfn(pages[i]);
+		for (j = 0; j < counts[i]; ++j, ++page_count)
+			input_page->gpa_page_list[page_count] = base_pfn + j;
+	}
+	status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
+				     page_count, 0, input_page,
+				     NULL) & HV_HYPERCALL_RESULT_MASK;
+	local_irq_restore(flags);
+
+	if (status != HV_STATUS_SUCCESS) {
+		pr_err("Failed to deposit pages: %d\n", status);
+		ret = status;
+		goto err_free_allocations;
+	}
+
+	ret = 0;
+	goto free_buf;
+
+err_free_allocations:
+	for (i = 0; i < num_allocations; ++i) {
+		base_pfn = page_to_pfn(pages[i]);
+		for (j = 0; j < counts[i]; ++j)
+			__free_page(pfn_to_page(base_pfn + j));
+	}
+
+free_buf:
+	free_page((unsigned long)pages);
+	kfree(counts);
+	return ret;
+}
+
+int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
+{
+	struct hv_add_logical_processor_in *input;
+	struct hv_add_logical_processor_out *output;
+	int status;
+	unsigned long flags;
+	int ret = 0;
+#ifdef CONFIG_ACPI_NUMA
+	int pxm = node_to_pxm(node);
+#else
+	int pxm = 0;
+#endif
+
+	/*
+	 * When adding a logical processor, the hypervisor may return
+	 * HV_STATUS_INSUFFICIENT_MEMORY. When that happens, we deposit more
+	 * pages and retry.
+	 */
+	do {
+		local_irq_save(flags);
+
+		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		/* We don't do anything with the output right now */
+		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+		input->lp_index = lp_index;
+		input->apic_id = apic_id;
+		input->flags = 0;
+		input->proximity_domain_info.domain_id = pxm;
+		input->proximity_domain_info.flags.reserved = 0;
+		input->proximity_domain_info.flags.proximity_info_valid = 1;
+		input->proximity_domain_info.flags.proximity_preferred = 1;
+		status = hv_do_hypercall(HVCALL_ADD_LOGICAL_PROCESSOR,
+					 input, output);
+		local_irq_restore(flags);
+
+		if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
+			if (status != HV_STATUS_SUCCESS) {
+				pr_err("%s: cpu %u apic ID %u, %d\n", __func__,
+				       lp_index, apic_id, status);
+				ret = status;
+			}
+			break;
+		}
+		ret = hv_call_deposit_pages(node, hv_current_partition_id, 1);
+	} while (!ret);
+
+	return ret;
+}
+
+int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
+{
+	struct hv_create_vp *input;
+	u16 status;
+	unsigned long irq_flags;
+	int ret = 0;
+#ifdef CONFIG_ACPI_NUMA
+	int pxm = node_to_pxm(node);
+#else
+	int pxm = 0;
+#endif
+
+	/* Root VPs don't seem to need pages deposited */
+	if (partition_id != hv_current_partition_id) {
+		ret = hv_call_deposit_pages(node, partition_id, 90);
+		if (ret)
+			return ret;
+	}
+
+	do {
+		local_irq_save(irq_flags);
+
+		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+		input->partition_id = partition_id;
+		input->vp_index = vp_index;
+		input->flags = flags;
+		input->subnode_type = HvSubnodeAny;
+		if (node != NUMA_NO_NODE) {
+			input->proximity_domain_info.domain_id = pxm;
+			input->proximity_domain_info.flags.reserved = 0;
+			input->proximity_domain_info.flags.proximity_info_valid = 1;
+			input->proximity_domain_info.flags.proximity_preferred = 1;
+		} else {
+			input->proximity_domain_info.as_uint64 = 0;
+		}
+		status = hv_do_hypercall(HVCALL_CREATE_VP, input, NULL);
+		local_irq_restore(irq_flags);
+
+		if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
+			if (status != HV_STATUS_SUCCESS) {
+				pr_err("%s: vcpu %u, lp %u, %d\n", __func__,
+				       vp_index, flags, status);
+				ret = status;
+			}
+			break;
+		}
+		ret = hv_call_deposit_pages(node, partition_id, 1);
+
+	} while (!ret);
+
+	return ret;
+}
+
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 67f5d35a73d3..4e590a167160 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -80,6 +80,10 @@ extern void  __percpu  **hyperv_pcpu_output_arg;
 
 extern u64 hv_current_partition_id;
 
+int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
+int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id);
+int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
+
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
 	u64 input_address = input ? virt_to_phys(input) : 0;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 87b1a79b19eb..ec53570102f0 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -142,6 +142,8 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX	0x0014
 #define HVCALL_SEND_IPI_EX			0x0015
 #define HVCALL_GET_PARTITION_ID			0x0046
+#define HVCALL_DEPOSIT_MEMORY			0x0048
+#define HVCALL_CREATE_VP			0x004e
 #define HVCALL_GET_VP_REGISTERS			0x0050
 #define HVCALL_SET_VP_REGISTERS			0x0051
 #define HVCALL_POST_MESSAGE			0x005c
@@ -149,6 +151,7 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_POST_DEBUG_DATA			0x0069
 #define HVCALL_RETRIEVE_DEBUG_DATA		0x006a
 #define HVCALL_RESET_DEBUG_SESSION		0x006b
+#define HVCALL_ADD_LOGICAL_PROCESSOR		0x0076
 #define HVCALL_RETARGET_INTERRUPT		0x007e
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
@@ -413,6 +416,70 @@ struct hv_get_partition_id {
 	u64 partition_id;
 } __packed;
 
+/* HvDepositMemory hypercall */
+struct hv_deposit_memory {
+	u64 partition_id;
+	u64 gpa_page_list[];
+} __packed;
+
+struct hv_proximity_domain_flags {
+	u32 proximity_preferred : 1;
+	u32 reserved : 30;
+	u32 proximity_info_valid : 1;
+} __packed;
+
+/* Not a union in Windows but useful for zeroing */
+union hv_proximity_domain_info {
+	struct {
+		u32 domain_id;
+		struct hv_proximity_domain_flags flags;
+	};
+	u64 as_uint64;
+} __packed;
+
+struct hv_lp_startup_status {
+	u64 hv_status;
+	u64 substatus1;
+	u64 substatus2;
+	u64 substatus3;
+	u64 substatus4;
+	u64 substatus5;
+	u64 substatus6;
+} __packed;
+
+/* HvAddLogicalProcessor hypercall */
+struct hv_add_logical_processor_in {
+	u32 lp_index;
+	u32 apic_id;
+	union hv_proximity_domain_info proximity_domain_info;
+	u64 flags;
+};
+
+struct hv_add_logical_processor_out {
+	struct hv_lp_startup_status startup_status;
+} __packed;
+
+enum HV_SUBNODE_TYPE
+{
+	HvSubnodeAny = 0,
+	HvSubnodeSocket,
+	HvSubnodeAmdNode,
+	HvSubnodeL3,
+	HvSubnodeCount,
+	HvSubnodeInvalid = -1
+};
+
+/* HvCreateVp hypercall */
+struct hv_create_vp {
+	u64 partition_id;
+	u32 vp_index;
+	u8 padding[3];
+	u8 subnode_type;
+	u64 subnode_id;
+	union hv_proximity_domain_info proximity_domain_info;
+	u64 flags;
+} __packed;
+
 /* HvRetargetDeviceInterrupt hypercall */
 union hv_msi_entry {
 	u64 as_uint64;
-- 
2.20.1



* [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (8 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  1:21   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry Wei Liu
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Lillian Grassin-Drake, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

Microsoft Hypervisor requires the root partition to make a few
hypercalls to set up application processors before they can be used.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
CPU hotplug and unplug are not yet supported in this setup, so those
paths remain untouched.

v3: Always call native SMP preparation function.
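
For illustration only (not part of the patch): the two helpers called
from hv_smp_prepare_cpus() share one pattern. They issue the hypercall,
and if the hypervisor reports insufficient memory they deposit one page
and try again. Below is a userspace sketch of that loop against a mock
hypervisor. Every mock_* function, the counters and the status values
are placeholders invented for this example.

```c
#include <assert.h>

#define STATUS_SUCCESS			0
#define STATUS_INSUFFICIENT_MEMORY	11	/* placeholder for this sketch */

static int pool_pages;		/* pages the mock hypervisor currently holds */
static int deposits_made;	/* number of deposit calls issued so far */

/* Mock hypercall: consumes `need` pages or reports insufficient memory. */
static int mock_hypercall(int need)
{
	if (pool_pages < need)
		return STATUS_INSUFFICIENT_MEMORY;
	pool_pages -= need;
	return STATUS_SUCCESS;
}

/* Mock deposit: always succeeds and grows the pool. */
static int mock_deposit_pages(int num_pages)
{
	pool_pages += num_pages;
	deposits_made++;
	return 0;
}

/* Retry the call, depositing one page per round, until it stops asking. */
static int call_with_deposit_retry(int need)
{
	int status;
	int ret = 0;

	do {
		status = mock_hypercall(need);
		if (status != STATUS_INSUFFICIENT_MEMORY) {
			if (status != STATUS_SUCCESS)
				ret = status;
			break;
		}
		ret = mock_deposit_pages(1);
	} while (!ret);

	return ret;
}
```

The loop ends either when the call returns something other than
"insufficient memory" or when a deposit itself fails, so a failed
deposit is reported to the caller instead of spinning.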
---
 arch/x86/kernel/cpu/mshyperv.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index c376d191a260..13d3b6dd21a3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -31,6 +31,7 @@
 #include <asm/reboot.h>
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
+#include <asm/numa.h>
 
 /* Is Linux running as the root partition? */
 bool hv_root_partition;
@@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void)
 	hv_init_spinlocks();
 #endif
 }
+
+static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
+{
+#ifdef CONFIG_X86_64
+	int i;
+	int ret;
+#endif
+
+	native_smp_prepare_cpus(max_cpus);
+
+#ifdef CONFIG_X86_64
+	for_each_present_cpu(i) {
+		if (i == 0)
+			continue;
+		ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
+		BUG_ON(ret);
+	}
+
+	for_each_present_cpu(i) {
+		if (i == 0)
+			continue;
+		ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i);
+		BUG_ON(ret);
+	}
+#endif
+}
 #endif
 
 static void __init ms_hyperv_init_platform(void)
@@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void)
 
 # ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
+	if (hv_root_partition)
+		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
 # endif
 
 	/*
-- 
2.20.1



* [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (9 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  1:22   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry Wei Liu
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

We will soon need to access fields inside the MSI address and MSI data
fields. Introduce hv_msi_address_register and hv_msi_data_register.

Fix up one user of hv_msi_entry in mshyperv.h.

No functional change expected.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
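
For illustration only (not part of the patch): the bit positions implied
by hv_msi_address_register and hv_msi_data_register, written as explicit
shifts and masks. This assumes bitfields are allocated from the least
significant bit, as GCC does on x86. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct msi_addr_fields {
	unsigned int destination_mode;	/* bit 2 */
	unsigned int redirection_hint;	/* bit 3 */
	unsigned int destination_id;	/* bits 12-19 */
	unsigned int msi_base;		/* bits 20-31 */
};

struct msi_data_fields {
	unsigned int vector;		/* bits 0-7 */
	unsigned int delivery_mode;	/* bits 8-10 */
	unsigned int level_assert;	/* bit 14 */
	unsigned int trigger_mode;	/* bit 15 */
};

/* Decode a 32-bit MSI address following the layout in the patch. */
static struct msi_addr_fields decode_msi_address(uint32_t addr)
{
	struct msi_addr_fields f = {
		.destination_mode = (addr >> 2) & 0x1,
		.redirection_hint = (addr >> 3) & 0x1,
		.destination_id   = (addr >> 12) & 0xff,
		.msi_base         = (addr >> 20) & 0xfff,
	};
	return f;
}

/* Decode a 32-bit MSI data word following the layout in the patch. */
static struct msi_data_fields decode_msi_data(uint32_t data)
{
	struct msi_data_fields f = {
		.vector        = data & 0xff,
		.delivery_mode = (data >> 8) & 0x7,
		.level_assert  = (data >> 14) & 0x1,
		.trigger_mode  = (data >> 15) & 0x1,
	};
	return f;
}
```
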
 arch/x86/include/asm/mshyperv.h   |  4 ++--
 include/asm-generic/hyperv-tlfs.h | 28 ++++++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 4e590a167160..cbee72550a12 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -257,8 +257,8 @@ static inline void hv_apic_init(void) {}
 static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
 					      struct msi_desc *msi_desc)
 {
-	msi_entry->address = msi_desc->msg.address_lo;
-	msi_entry->data = msi_desc->msg.data;
+	msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
+	msi_entry->data.as_uint32 = msi_desc->msg.data;
 }
 
 #else /* CONFIG_HYPERV */
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index ec53570102f0..7e103be42799 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -480,12 +480,36 @@ struct hv_create_vp {
 	u64 flags;
 } __packed;
 
+union hv_msi_address_register {
+	u32 as_uint32;
+	struct {
+		u32 reserved1:2;
+		u32 destination_mode:1;
+		u32 redirection_hint:1;
+		u32 reserved2:8;
+		u32 destination_id:8;
+		u32 msi_base:12;
+	};
+} __packed;
+
+union hv_msi_data_register {
+	u32 as_uint32;
+	struct {
+		u32 vector:8;
+		u32 delivery_mode:3;
+		u32 reserved1:3;
+		u32 level_assert:1;
+		u32 trigger_mode:1;
+		u32 reserved2:16;
+	};
+} __packed;
+
 /* HvRetargetDeviceInterrupt hypercall */
 union hv_msi_entry {
 	u64 as_uint64;
 	struct {
-		u32 address;
-		u32 data;
+		union hv_msi_address_register address;
+		union hv_msi_data_register data;
 	} __packed;
 };
 
-- 
2.20.1



* [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (10 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  1:23   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures Wei Liu
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Rob Herring, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Lorenzo Pieralisi, Bjorn Helgaas,
	Arnd Bergmann,
	open list:PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS,
	open list:GENERIC INCLUDE/ASM HEADER FILES

We will soon use the same structure to handle IO-APIC interrupts as
well. Introduce an enum to identify the source and a data structure for
IO-APIC RTE.

While at it, update pci-hyperv.c to use the enum.

No functional change.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
---
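
For illustration only (not part of the patch): how a 64-bit value
matching the hv_ioapic_rte layout could be assembled with explicit
shifts. This assumes LSB-first bitfield allocation, as GCC does on x86,
and fills in only a subset of the fields. pack_ioapic_rte() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack selected IO-APIC RTE fields into a 64-bit value:
 *   vector          bits 0-7
 *   delivery_mode   bits 8-10
 *   trigger_mode    bit 15
 *   interrupt_mask  bit 16
 *   destination_id  bits 56-63
 * All other bits are left as zero.
 */
static uint64_t pack_ioapic_rte(uint8_t vector, unsigned int delivery_mode,
				unsigned int trigger_mode, unsigned int masked,
				uint8_t destination_id)
{
	uint64_t rte = 0;

	rte |= (uint64_t)vector;
	rte |= (uint64_t)(delivery_mode & 0x7) << 8;
	rte |= (uint64_t)(trigger_mode & 0x1) << 15;
	rte |= (uint64_t)(masked & 0x1) << 16;
	rte |= (uint64_t)destination_id << 56;

	return rte;
}
```

The low and high halves of the result correspond to low_uint32 and
high_uint32 in the union.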
 drivers/pci/controller/pci-hyperv.c |  2 +-
 include/asm-generic/hyperv-tlfs.h   | 36 +++++++++++++++++++++++++++--
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6db8d96a78eb..87aa62ee0368 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data)
 	params = &hbus->retarget_msi_interrupt_params;
 	memset(params, 0, sizeof(*params));
 	params->partition_id = HV_PARTITION_ID_SELF;
-	params->int_entry.source = 1; /* MSI(-X) */
+	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
 	hv_set_msi_entry_from_desc(&params->int_entry.msi_entry, msi_desc);
 	params->device_id = (hbus->hdev->dev_instance.b[5] << 24) |
 			   (hbus->hdev->dev_instance.b[4] << 16) |
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 7e103be42799..8423bf53c237 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -480,6 +480,11 @@ struct hv_create_vp {
 	u64 flags;
 } __packed;
 
+enum hv_interrupt_source {
+	HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */
+	HV_INTERRUPT_SOURCE_IOAPIC,
+};
+
 union hv_msi_address_register {
 	u32 as_uint32;
 	struct {
@@ -513,10 +518,37 @@ union hv_msi_entry {
 	} __packed;
 };
 
+union hv_ioapic_rte {
+	u64 as_uint64;
+
+	struct {
+		u32 vector:8;
+		u32 delivery_mode:3;
+		u32 destination_mode:1;
+		u32 delivery_status:1;
+		u32 interrupt_polarity:1;
+		u32 remote_irr:1;
+		u32 trigger_mode:1;
+		u32 interrupt_mask:1;
+		u32 reserved1:15;
+
+		u32 reserved2:24;
+		u32 destination_id:8;
+	};
+
+	struct {
+		u32 low_uint32;
+		u32 high_uint32;
+	};
+} __packed;
+
 struct hv_interrupt_entry {
-	u32 source;			/* 1 for MSI(-X) */
+	u32 source;
 	u32 reserved1;
-	union hv_msi_entry msi_entry;
+	union {
+		union hv_msi_entry msi_entry;
+		union hv_ioapic_rte ioapic_rte;
+	};
 } __packed;
 
 /*
-- 
2.20.1



* [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (11 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  1:26   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts Wei Liu
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES

We will need to identify the device we want Microsoft Hypervisor to
manipulate.  Introduce the data structures for that purpose.

They will be used in a later patch.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
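
For illustration only (not part of the patch): the PCI variant of
hv_device_id expressed as explicit shifts, assuming LSB-first bitfield
allocation as GCC does on x86. The shadow bus range, phantom function
bits and source_shadow are left at zero. build_pci_device_id() and
DEVICE_TYPE_PCI are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEVICE_TYPE_PCI 1ULL	/* mirrors HV_DEVICE_TYPE_PCI in the patch */

/*
 * Layout assumed from the patch:
 *   function     bits 0-2
 *   device       bits 3-7
 *   bus          bits 8-15
 *   segment      bits 16-31
 *   device_type  bits 62-63
 */
static uint64_t build_pci_device_id(uint16_t segment, uint8_t bus,
				    uint8_t device, uint8_t function)
{
	uint64_t rid = ((uint64_t)bus << 8) |
		       ((uint64_t)(device & 0x1f) << 3) |
		       (uint64_t)(function & 0x7);

	return rid | ((uint64_t)segment << 16) | (DEVICE_TYPE_PCI << 62);
}
```
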
 include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 8423bf53c237..42ff1326c6bd 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
 	} element[];
 } __packed;
 
+enum hv_device_type {
+	HV_DEVICE_TYPE_LOGICAL = 0,
+	HV_DEVICE_TYPE_PCI = 1,
+	HV_DEVICE_TYPE_IOAPIC = 2,
+	HV_DEVICE_TYPE_ACPI = 3,
+};
+
+typedef u16 hv_pci_rid;
+typedef u16 hv_pci_segment;
+typedef u64 hv_logical_device_id;
+union hv_pci_bdf {
+	u16 as_uint16;
+
+	struct {
+		u8 function:3;
+		u8 device:5;
+		u8 bus;
+	};
+} __packed;
+
+union hv_pci_bus_range {
+	u16 as_uint16;
+
+	struct {
+		u8 subordinate_bus;
+		u8 secondary_bus;
+	};
+} __packed;
+
+union hv_device_id {
+	u64 as_uint64;
+
+	struct {
+		u64 :62;
+		u64 device_type:2;
+	};
+
+	/* HV_DEVICE_TYPE_LOGICAL */
+	struct {
+		u64 id:62;
+		u64 device_type:2;
+	} logical;
+
+	/* HV_DEVICE_TYPE_PCI */
+	struct {
+		union {
+			hv_pci_rid rid;
+			union hv_pci_bdf bdf;
+		};
+
+		hv_pci_segment segment;
+		union hv_pci_bus_range shadow_bus_range;
+
+		u16 phantom_function_bits:2;
+		u16 source_shadow:1;
+
+		u16 rsvdz0:11;
+		u16 device_type:2;
+	} pci;
+
+	/* HV_DEVICE_TYPE_IOAPIC */
+	struct {
+		u8 ioapic_id;
+		u8 rsvdz0;
+		u16 rsvdz1;
+		u16 rsvdz2;
+
+		u16 rsvdz3:14;
+		u16 device_type:2;
+	} ioapic;
+
+	/* HV_DEVICE_TYPE_ACPI */
+	struct {
+		u32 input_mapping_base;
+		u32 input_mapping_count:30;
+		u32 device_type:2;
+	} acpi;
+} __packed;
+
 #endif
-- 
2.20.1



* [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (12 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-26  1:27   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition Wei Liu
  2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping " Wei Liu
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
 arch/x86/include/asm/hyperv-tlfs.h | 13 +++++++++++
 include/asm-generic/hyperv-tlfs.h  | 36 ++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 204010350604..ab7d6cde548d 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -533,6 +533,19 @@ struct hv_partition_assist_pg {
 	u32 tlb_lock_count;
 };
 
+enum hv_interrupt_type {
+	HV_X64_INTERRUPT_TYPE_FIXED             = 0x0000,
+	HV_X64_INTERRUPT_TYPE_LOWESTPRIORITY    = 0x0001,
+	HV_X64_INTERRUPT_TYPE_SMI               = 0x0002,
+	HV_X64_INTERRUPT_TYPE_REMOTEREAD        = 0x0003,
+	HV_X64_INTERRUPT_TYPE_NMI               = 0x0004,
+	HV_X64_INTERRUPT_TYPE_INIT              = 0x0005,
+	HV_X64_INTERRUPT_TYPE_SIPI              = 0x0006,
+	HV_X64_INTERRUPT_TYPE_EXTINT            = 0x0007,
+	HV_X64_INTERRUPT_TYPE_LOCALINT0         = 0x0008,
+	HV_X64_INTERRUPT_TYPE_LOCALINT1         = 0x0009,
+	HV_X64_INTERRUPT_TYPE_MAXIMUM           = 0x000A,
+};
 
 #include <asm-generic/hyperv-tlfs.h>
 
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 42ff1326c6bd..07efe0131fe3 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -152,6 +152,8 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_RETRIEVE_DEBUG_DATA		0x006a
 #define HVCALL_RESET_DEBUG_SESSION		0x006b
 #define HVCALL_ADD_LOGICAL_PROCESSOR		0x0076
+#define HVCALL_MAP_DEVICE_INTERRUPT		0x007c
+#define HVCALL_UNMAP_DEVICE_INTERRUPT		0x007d
 #define HVCALL_RETARGET_INTERRUPT		0x007e
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
@@ -702,4 +704,38 @@ union hv_device_id {
 	} acpi;
 } __packed;
 
+enum hv_interrupt_trigger_mode {
+	HV_INTERRUPT_TRIGGER_MODE_EDGE = 0,
+	HV_INTERRUPT_TRIGGER_MODE_LEVEL = 1,
+};
+
+struct hv_device_interrupt_descriptor {
+	u32 interrupt_type;
+	u32 trigger_mode;
+	u32 vector_count;
+	u32 reserved;
+	struct hv_device_interrupt_target target;
+} __packed;
+
+struct hv_input_map_device_interrupt {
+	u64 partition_id;
+	u64 device_id;
+	u64 flags;
+	struct hv_interrupt_entry logical_interrupt_entry;
+	struct hv_device_interrupt_descriptor interrupt_descriptor;
+} __packed;
+
+struct hv_output_map_device_interrupt {
+	struct hv_interrupt_entry interrupt_entry;
+} __packed;
+
+struct hv_input_unmap_device_interrupt {
+	u64 partition_id;
+	u64 device_id;
+	struct hv_interrupt_entry interrupt_entry;
+} __packed;
+
+#define HV_SOURCE_SHADOW_NONE               0x0
+#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE   0x1
+
 #endif
-- 
2.20.1



* [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (13 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-27  5:47   ` Michael Kelley
  2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping " Wei Liu
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

When Linux runs as the root partition on Microsoft Hypervisor, its
interrupts are remapped.  Linux will need to explicitly map and unmap
interrupts for hardware.

Implement an MSI domain to issue the correct hypercalls, and initialize
this irqdomain as the default MSI irq domain.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
v3: build irqdomain.o for 32bit as well.
v2: This patch is simplified due to upstream changes.
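
For illustration only (not part of the patch): when an interrupt that is
already mapped needs a new target, the code unmaps the stored entry
first and then asks for a fresh mapping instead of retargeting. The
userspace sketch below models that flow with mock hypercalls. The
mock_* functions, struct entry and the counters are invented for this
example, and the behaviour after a successful map is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	int vcpu;
	int vector;
};

static int maps, unmaps;	/* how many mock hypercalls were issued */

/* Mock of the map hypercall: records the target in *out. */
static int mock_map(int vcpu, int vector, struct entry *out)
{
	out->vcpu = vcpu;
	out->vector = vector;
	maps++;
	return 0;
}

/* Mock of the unmap hypercall: always succeeds. */
static int mock_unmap(struct entry *old)
{
	(void)old;
	unmaps++;
	return 0;
}

/* Unmap the stored entry, if any, then create and store a new mapping. */
static int remap(struct entry **stored, int vcpu, int vector)
{
	struct entry *fresh;
	int status;

	if (*stored) {
		struct entry *old = *stored;

		*stored = NULL;
		status = mock_unmap(old);
		free(old);
		if (status)
			return status;
	}

	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return -1;

	status = mock_map(vcpu, vector, fresh);
	if (status) {
		free(fresh);
		return status;
	}

	*stored = fresh;
	return 0;
}
```

Clearing the stored pointer before unmapping means a failed unmap
leaves no stale entry behind for a later call to reuse.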
---
 arch/x86/hyperv/Makefile        |   2 +-
 arch/x86/hyperv/hv_init.c       |   9 +
 arch/x86/hyperv/irqdomain.c     | 332 ++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h |   2 +
 4 files changed, 344 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/irqdomain.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 565358020921..48e2c51464e8 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y			:= hv_init.o mmu.o nested.o
+obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
 obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index ad8e77859b32..1cb2f7d1850a 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -484,6 +484,15 @@ void __init hyperv_init(void)
 
 	BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
 
+#ifdef CONFIG_PCI_MSI
+	/*
+	 * If we're running as root, we want to create our own PCI MSI domain.
+	 * We can't set this in hv_pci_init because that would be too late.
+	 */
+	if (hv_root_partition)
+		x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
+#endif
+
 	return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
new file mode 100644
index 000000000000..19637cd60231
--- /dev/null
+++ b/arch/x86/hyperv/irqdomain.c
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
+//
+// Authors:
+//   Sunil Muthuswamy <sunilmut@microsoft.com>
+//   Wei Liu <wei.liu@kernel.org>
+
+#include <linux/pci.h>
+#include <linux/irq.h>
+#include <asm/mshyperv.h>
+
+static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
+{
+	unsigned long flags;
+	struct hv_input_unmap_device_interrupt *input;
+	struct hv_interrupt_entry *intr_entry;
+	u16 status;
+
+	local_irq_save(flags);
+	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+	memset(input, 0, sizeof(*input));
+	intr_entry = &input->interrupt_entry;
+	input->partition_id = hv_current_partition_id;
+	input->device_id = id;
+	*intr_entry = *old_entry;
+
+	status = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, 0, 0, input, NULL) &
+			 HV_HYPERCALL_RESULT_MASK;
+	local_irq_restore(flags);
+
+	return status;
+}
+
+#ifdef CONFIG_PCI_MSI
+struct rid_data {
+	struct pci_dev *bridge;
+	u32 rid;
+};
+
+static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
+{
+	struct rid_data *rd = data;
+	u8 bus = PCI_BUS_NUM(rd->rid);
+
+	if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
+		rd->bridge = pdev;
+		rd->rid = alias;
+	}
+
+	return 0;
+}
+
+static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
+{
+	union hv_device_id dev_id;
+	struct rid_data data = {
+		.bridge = NULL,
+		.rid = PCI_DEVID(dev->bus->number, dev->devfn)
+	};
+
+	pci_for_each_dma_alias(dev, get_rid_cb, &data);
+
+	dev_id.as_uint64 = 0;
+	dev_id.device_type = HV_DEVICE_TYPE_PCI;
+	dev_id.pci.segment = pci_domain_nr(dev->bus);
+
+	dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
+	dev_id.pci.bdf.device = PCI_SLOT(data.rid);
+	dev_id.pci.bdf.function = PCI_FUNC(data.rid);
+	dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
+
+	if (data.bridge) {
+		int pos;
+
+		/*
+		 * Microsoft Hypervisor requires a bus range when the bridge is
+		 * running in PCI-X mode.
+		 *
+		 * To distinguish conventional vs PCI-X bridge, we can check
+		 * the bridge's PCI-X Secondary Status Register, Secondary Bus
+		 * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
+		 * Specification Revision 1.0 5.2.2.1.3.
+		 *
+		 * Value zero means it is in conventional mode, otherwise it is
+		 * in PCI-X mode.
+		 */
+
+		pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
+		if (pos) {
+			u16 status;
+
+			pci_read_config_word(data.bridge, pos +
+					PCI_X_BRIDGE_SSTATUS, &status);
+
+			if (status & PCI_X_SSTATUS_FREQ) {
+				/* Non-zero, PCI-X mode */
+				u8 sec_bus, sub_bus;
+
+				dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
+
+				pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
+				dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
+				pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
+				dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
+			}
+		}
+	}
+
+	return dev_id;
+}
+
+static int hv_map_msi_interrupt(struct pci_dev *dev, int vcpu, int vector,
+				struct hv_interrupt_entry *entry)
+{
+	struct hv_input_map_device_interrupt *input;
+	struct hv_output_map_device_interrupt *output;
+	struct hv_device_interrupt_descriptor *intr_desc;
+	unsigned long flags;
+	u16 status;
+
+	local_irq_save(flags);
+
+	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+	intr_desc = &input->interrupt_descriptor;
+	memset(input, 0, sizeof(*input));
+	input->partition_id = hv_current_partition_id;
+	input->device_id = hv_build_pci_dev_id(dev).as_uint64;
+	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
+	intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
+	intr_desc->vector_count = 1;
+	intr_desc->target.vector = vector;
+	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
+
+	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
+			 HV_HYPERCALL_RESULT_MASK;
+	*entry = output->interrupt_entry;
+
+	local_irq_restore(flags);
+
+	if (status != HV_STATUS_SUCCESS)
+		pr_err("%s: hypercall failed, status %d\n", __func__, status);
+
+	return status;
+}
+
+static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
+{
+	/* High address is always 0 */
+	msg->address_hi = 0;
+	msg->address_lo = entry->msi_entry.address.as_uint32;
+	msg->data = entry->msi_entry.data.as_uint32;
+}
+
+static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
+static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *msidesc;
+	struct pci_dev *dev;
+	struct hv_interrupt_entry out_entry, *stored_entry;
+	struct irq_cfg *cfg = irqd_cfg(data);
+	struct cpumask *affinity;
+	int cpu, vcpu;
+	u16 status;
+
+	msidesc = irq_data_get_msi_desc(data);
+	dev = msi_desc_to_pci_dev(msidesc);
+
+	if (!cfg) {
+		pr_debug("%s: cfg is NULL\n", __func__);
+		return;
+	}
+
+	affinity = irq_data_get_effective_affinity_mask(data);
+	cpu = cpumask_first_and(affinity, cpu_online_mask);
+	vcpu = hv_cpu_number_to_vp_number(cpu);
+
+	if (data->chip_data) {
+		/*
+		 * This interrupt is already mapped. Let's unmap first.
+		 *
+		 * We don't use retarget interrupt hypercalls here because
+		 * Microsoft Hypervisor doesn't allow root to change the vector
+		 * or specify VPs outside of the set that is initially used
+		 * during mapping.
+		 */
+		stored_entry = data->chip_data;
+		data->chip_data = NULL;
+
+		status = hv_unmap_msi_interrupt(dev, stored_entry);
+
+		kfree(stored_entry);
+
+		if (status != HV_STATUS_SUCCESS) {
+			pr_debug("%s: failed to unmap, status %d\n", __func__, status);
+			return;
+		}
+	}
+
+	stored_entry = kzalloc(sizeof(*stored_entry), GFP_ATOMIC);
+	if (!stored_entry) {
+		pr_debug("%s: failed to allocate chip data\n", __func__);
+		return;
+	}
+
+	status = hv_map_msi_interrupt(dev, vcpu, cfg->vector, &out_entry);
+	if (status != HV_STATUS_SUCCESS) {
+		kfree(stored_entry);
+		return;
+	}
+
+	*stored_entry = out_entry;
+	data->chip_data = stored_entry;
+	entry_to_msi_msg(&out_entry, msg);
+
+	return;
+}
+
+static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
+{
+	return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry)
+		& HV_HYPERCALL_RESULT_MASK;
+}
+
+static void hv_teardown_msi_irq_common(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
+{
+	u16 status;
+	struct hv_interrupt_entry old_entry;
+	struct irq_desc *desc;
+	struct irq_data *data;
+	struct msi_msg msg;
+
+	desc = irq_to_desc(irq);
+	if (!desc) {
+		pr_debug("%s: no irq desc\n", __func__);
+		return;
+	}
+
+	data = &desc->irq_data;
+	if (!data) {
+		pr_debug("%s: no irq data\n", __func__);
+		return;
+	}
+
+	if (!data->chip_data) {
+		pr_debug("%s: no chip data\n", __func__);
+		return;
+	}
+
+	old_entry = *(struct hv_interrupt_entry *)data->chip_data;
+	entry_to_msi_msg(&old_entry, &msg);
+
+	kfree(data->chip_data);
+	data->chip_data = NULL;
+
+	status = hv_unmap_msi_interrupt(dev, &old_entry);
+
+	if (status != HV_STATUS_SUCCESS) {
+		pr_err("%s: hypercall failed, status %d\n", __func__, status);
+		return;
+	}
+}
+
+static void hv_msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+	int i;
+	struct msi_desc *entry;
+	struct pci_dev *pdev;
+
+	if (WARN_ON_ONCE(!dev_is_pci(dev)))
+		return;
+
+	pdev = to_pci_dev(dev);
+
+	for_each_pci_msi_entry(entry, pdev) {
+		if (entry->irq) {
+			for (i = 0; i < entry->nvec_used; i++) {
+				hv_teardown_msi_irq_common(pdev, entry, entry->irq + i);
+				irq_domain_free_irqs(entry->irq + i, 1);
+			}
+		}
+	}
+}
+
+/*
+ * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI or MSI-X Capability Structure.
+ */
+static struct irq_chip hv_pci_msi_controller = {
+	.name			= "HV-PCI-MSI",
+	.irq_unmask		= pci_msi_unmask_irq,
+	.irq_mask		= pci_msi_mask_irq,
+	.irq_ack		= irq_chip_ack_parent,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= hv_irq_compose_msi_msg,
+	.irq_set_affinity	= msi_domain_set_affinity,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static struct msi_domain_ops pci_msi_domain_ops = {
+	.domain_free_irqs	= hv_msi_domain_free_irqs,
+	.msi_prepare		= pci_msi_prepare,
+};
+
+static struct msi_domain_info hv_pci_msi_domain_info = {
+	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+			  MSI_FLAG_PCI_MSIX,
+	.ops		= &pci_msi_domain_ops,
+	.chip		= &hv_pci_msi_controller,
+	.handler	= handle_edge_irq,
+	.handler_name	= "edge",
+};
+
+struct irq_domain * __init hv_create_pci_msi_domain(void)
+{
+	struct irq_domain *d = NULL;
+	struct fwnode_handle *fn;
+
+	fn = irq_domain_alloc_named_fwnode("HV-PCI-MSI");
+	if (fn)
+		d = pci_msi_create_irq_domain(fn, &hv_pci_msi_domain_info, x86_vector_domain);
+
+	/* No point in going further if we can't get an irq domain */
+	BUG_ON(!d);
+
+	return d;
+}
+
+#endif /* CONFIG_PCI_MSI */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index cbee72550a12..ccc849e25d5e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -261,6 +261,8 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
 	msi_entry->data.as_uint32 = msi_desc->msg.data;
 }
 
+struct irq_domain *hv_create_pci_msi_domain(void);
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
-- 
2.20.1
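
Note on the PCI-X check in hv_build_pci_dev_id() in the patch above: the
test reduces to masking the bridge's PCI-X Secondary Status word with the
Secondary Bus Mode and Frequency field and comparing the result against
zero. A minimal user-space sketch follows. It assumes the field mask is
0x03c0 (the value PCI_X_SSTATUS_FREQ has in include/uapi/linux/pci_regs.h,
as far as I know); the macro and helper names are hypothetical and are not
part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed mask for the "Secondary Bus Mode and Frequency" field of the
 * PCI-X Secondary Status register (bits 9:6). Hypothetical local name.
 */
#define SSTATUS_MODE_FREQ_MASK 0x03c0u

/*
 * Hypothetical helper: returns true when the secondary bus behind the
 * bridge runs in PCI-X mode, false when it runs in conventional mode.
 * A zero field value means conventional PCI; any other value is PCI-X.
 */
static bool bridge_secondary_bus_is_pcix(uint16_t secondary_status)
{
	return (secondary_status & SSTATUS_MODE_FREQ_MASK) != 0;
}
```

Only bits inside the field matter: other status bits, such as the 64-bit
or 133 MHz capability flags, do not change the outcome.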


* [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
                   ` (14 preceding siblings ...)
  2021-01-20 12:00 ` [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition Wei Liu
@ 2021-01-20 12:00 ` Wei Liu
  2021-01-27  5:47   ` Michael Kelley
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-20 12:00 UTC (permalink / raw)
  To: Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
Hypervisor when Linux runs as the root partition. Implement an IRQ
domain to handle mapping and unmapping of IO-APIC interrupts.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
---
 arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
 arch/x86/include/asm/mshyperv.h |   4 +
 drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
 3 files changed, 233 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 19637cd60231..8e2b4e478b70 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
 }
 
 #endif /* CONFIG_PCI_MSI */
+
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
+{
+	union hv_device_id device_id;
+
+	device_id.as_uint64 = 0;
+	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+	device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
+
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+		struct hv_interrupt_entry *entry)
+{
+	unsigned long flags;
+	struct hv_input_map_device_interrupt *input;
+	struct hv_output_map_device_interrupt *output;
+	union hv_device_id device_id;
+	struct hv_device_interrupt_descriptor *intr_desc;
+	u16 status;
+
+	device_id.as_uint64 = 0;
+	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+	device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+	local_irq_save(flags);
+	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+	memset(input, 0, sizeof(*input));
+	intr_desc = &input->interrupt_descriptor;
+	input->partition_id = hv_current_partition_id;
+	input->device_id = device_id.as_uint64;
+	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
+	intr_desc->target.vector = vector;
+	intr_desc->vector_count = 1;
+
+	if (level)
+		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
+	else
+		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
+
+	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
+
+	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
+			 HV_HYPERCALL_RESULT_MASK;
+	*entry = output->interrupt_entry;
+
+	local_irq_restore(flags);
+
+	return status;
+}
+EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccc849e25d5e..345d7c6f8c37 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
 
 struct irq_domain *hv_create_pci_msi_domain(void);
 
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+		struct hv_interrupt_entry *entry);
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index b7db6024e65c..6d35e4c303c6 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
 	.free = hyperv_irq_remapping_free,
 };
 
+static const struct irq_domain_ops hyperv_root_ir_domain_ops;
 static int __init hyperv_prepare_irq_remapping(void)
 {
 	struct fwnode_handle *fn;
 	int i;
+	const char *name;
+	const struct irq_domain_ops *ops;
 
 	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
 	    x86_init.hyper.msi_ext_dest_id() ||
-	    !x2apic_supported() || hv_root_partition)
+	    !x2apic_supported())
 		return -ENODEV;
 
-	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+	if (hv_root_partition) {
+		name = "HYPERV-ROOT-IR";
+		ops = &hyperv_root_ir_domain_ops;
+	} else {
+		name = "HYPERV-IR";
+		ops = &hyperv_ir_domain_ops;
+	}
+
+	fn = irq_domain_alloc_named_id_fwnode(name, 0);
 	if (!fn)
 		return -ENOMEM;
 
 	ioapic_ir_domain =
 		irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
-				0, IOAPIC_REMAPPING_ENTRY, fn,
-				&hyperv_ir_domain_ops, NULL);
+				0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
 
 	if (!ioapic_ir_domain) {
 		irq_domain_free_fwnode(fn);
 		return -ENOMEM;
 	}
 
+	if (hv_root_partition)
+		return 0; /* The rest is only relevant to guests */
+
 	/*
 	 * Hyper-V doesn't provide irq remapping function for
 	 * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
@@ -167,4 +180,162 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
 	.enable			= hyperv_enable_irq_remapping,
 };
 
+/* IRQ remapping domain when Linux runs as the root partition */
+struct hyperv_root_ir_data {
+	u8 ioapic_id;
+	bool is_level;
+	struct hv_interrupt_entry entry;
+};
+
+static void
+hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
+{
+	u16 status;
+	u32 vector;
+	struct irq_cfg *cfg;
+	int ioapic_id;
+	struct cpumask *affinity;
+	int cpu, vcpu;
+	struct hv_interrupt_entry entry;
+	struct hyperv_root_ir_data *data = irq_data->chip_data;
+	struct IO_APIC_route_entry e;
+
+	cfg = irqd_cfg(irq_data);
+	affinity = irq_data_get_effective_affinity_mask(irq_data);
+	cpu = cpumask_first_and(affinity, cpu_online_mask);
+	vcpu = hv_cpu_number_to_vp_number(cpu);
+
+	vector = cfg->vector;
+	ioapic_id = data->ioapic_id;
+
+	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
+	    && data->entry.ioapic_rte.as_uint64) {
+		entry = data->entry;
+
+		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
+
+		if (status != HV_STATUS_SUCCESS)
+			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
+
+		data->entry.ioapic_rte.as_uint64 = 0;
+		data->entry.source = 0; /* Invalid source */
+	}
+
+
+	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
+					vector, &entry);
+
+	if (status != HV_STATUS_SUCCESS) {
+		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
+		return;
+	}
+
+	data->entry = entry;
+
+	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
+	e.w1 = entry.ioapic_rte.low_uint32;
+	e.w2 = entry.ioapic_rte.high_uint32;
+
+	memset(msg, 0, sizeof(*msg));
+	msg->arch_data.vector = e.vector;
+	msg->arch_data.delivery_mode = e.delivery_mode;
+	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
+	msg->arch_addr_lo.dmar_format = e.ir_format;
+	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
+}
+
+static int hyperv_root_ir_set_affinity(struct irq_data *data,
+		const struct cpumask *mask, bool force)
+{
+	struct irq_data *parent = data->parent_data;
+	struct irq_cfg *cfg = irqd_cfg(data);
+	int ret;
+
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	send_cleanup_vector(cfg);
+
+	return 0;
+}
+
+static struct irq_chip hyperv_root_ir_chip = {
+	.name			= "HYPERV-ROOT-IR",
+	.irq_ack		= apic_ack_irq,
+	.irq_set_affinity	= hyperv_root_ir_set_affinity,
+	.irq_compose_msi_msg	= hyperv_root_ir_compose_msi_msg,
+};
+
+static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
+				     unsigned int virq, unsigned int nr_irqs,
+				     void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	struct irq_data *irq_data;
+	struct hyperv_root_ir_data *data;
+	int ret = 0;
+
+	if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
+		return -EINVAL;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		irq_domain_free_irqs_common(domain, virq, nr_irqs);
+		return -ENOMEM;
+	}
+
+	irq_data = irq_domain_get_irq_data(domain, virq);
+	if (!irq_data) {
+		kfree(data);
+		irq_domain_free_irqs_common(domain, virq, nr_irqs);
+		return -EINVAL;
+	}
+
+	data->ioapic_id = info->devid;
+	data->is_level = info->ioapic.is_level;
+
+	irq_data->chip = &hyperv_root_ir_chip;
+	irq_data->chip_data = data;
+
+	return 0;
+}
+
+static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
+				 unsigned int virq, unsigned int nr_irqs)
+{
+	struct irq_data *irq_data;
+	struct hyperv_root_ir_data *data;
+	struct hv_interrupt_entry *e;
+	int i;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+
+		if (irq_data && irq_data->chip_data) {
+			data = irq_data->chip_data;
+			e = &data->entry;
+
+			if (e->source == HV_DEVICE_TYPE_IOAPIC
+			      && e->ioapic_rte.as_uint64)
+				hv_unmap_ioapic_interrupt(data->ioapic_id,
+							&data->entry);
+
+			kfree(data);
+		}
+	}
+
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
+	.select = hyperv_irq_remapping_select,
+	.alloc = hyperv_root_irq_remapping_alloc,
+	.free = hyperv_root_irq_remapping_free,
+};
+
 #endif
-- 
2.20.1
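
Note on the "& HV_HYPERCALL_RESULT_MASK" pattern used by the map and unmap
helpers in this patch: the hypercall returns a 64-bit value, and only its
low 16 bits carry the status code, which is why the callers keep the result
in a u16. A minimal user-space sketch follows. It assumes the mask covers
bits 15:0, as HV_HYPERCALL_RESULT_MASK does in asm-generic/hyperv-tlfs.h,
and that success is status 0; all names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: status code in bits 15:0 of the raw hypercall result. */
#define RESULT_MASK    0xffffull
/* Assumed value of the "success" status code. */
#define STATUS_SUCCESS 0u

/* Hypothetical helper: extract the 16-bit status from the raw result. */
static uint16_t hypercall_status(uint64_t raw_result)
{
	return (uint16_t)(raw_result & RESULT_MASK);
}

/* Hypothetical helper: true when the hypercall reported success. */
static bool hypercall_succeeded(uint64_t raw_result)
{
	return hypercall_status(raw_result) == STATUS_SUCCESS;
}
```

Comparing the raw 64-bit value against zero would be wrong whenever the
upper bits are populated, so the mask has to be applied before the
comparison with HV_STATUS_SUCCESS.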


* Re: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required
  2021-01-20 12:00 ` [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required Wei Liu
@ 2021-01-20 15:12   ` kernel test robot
  2021-01-20 19:44   ` Pavel Tatashin
  2021-01-26  0:41   ` Michael Kelley
  2 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2021-01-20 15:12 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: kbuild-all, virtualization, Linux Kernel List, Michael Kelley,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Wei Liu, Lillian Grassin-Drake

[-- Attachment #1: Type: text/plain, Size: 11380 bytes --]

Hi Wei,

I love your patch! Perhaps something to improve:

[auto build test WARNING on e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62]

url:    https://github.com/0day-ci/linux/commits/Wei-Liu/Introducing-Linux-root-partition-support-for-Microsoft-Hypervisor/20210120-215640
base:    e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
config: x86_64-randconfig-s021-20210120 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-208-g46a52ca4-dirty
        # https://github.com/0day-ci/linux/commit/f93337fc44e13a1506633f5d308bf74a8311dada
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Wei-Liu/Introducing-Linux-root-partition-support-for-Microsoft-Hypervisor/20210120-215640
        git checkout f93337fc44e13a1506633f5d308bf74a8311dada
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
   arch/x86/hyperv/hv_init.c:84:30: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected void const [noderef] __percpu *__vpp_verify @@     got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:84:30: sparse:     expected void const [noderef] __percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:84:30: sparse:     got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:89:39: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected void const [noderef] __percpu *__vpp_verify @@     got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:89:39: sparse:     expected void const [noderef] __percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:89:39: sparse:     got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:221:30: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected void const [noderef] __percpu *__vpp_verify @@     got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:221:30: sparse:     expected void const [noderef] __percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:221:30: sparse:     got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:228:39: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected void const [noderef] __percpu *__vpp_verify @@     got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:228:39: sparse:     expected void const [noderef] __percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:228:39: sparse:     got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:364:31: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected void [noderef] __percpu **extern [addressable] [toplevel] hyperv_pcpu_input_arg @@     got void *[noderef] __percpu * @@
   arch/x86/hyperv/hv_init.c:364:31: sparse:     expected void [noderef] __percpu **extern [addressable] [toplevel] hyperv_pcpu_input_arg
   arch/x86/hyperv/hv_init.c:364:31: sparse:     got void *[noderef] __percpu *
>> arch/x86/hyperv/hv_init.c:370:40: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected void [noderef] __percpu **extern [addressable] [toplevel] hyperv_pcpu_output_arg @@     got void *[noderef] __percpu * @@
   arch/x86/hyperv/hv_init.c:370:40: sparse:     expected void [noderef] __percpu **extern [addressable] [toplevel] hyperv_pcpu_output_arg
   arch/x86/hyperv/hv_init.c:370:40: sparse:     got void *[noderef] __percpu *

vim +370 arch/x86/hyperv/hv_init.c

   211	
   212	static int hv_cpu_die(unsigned int cpu)
   213	{
   214		struct hv_reenlightenment_control re_ctrl;
   215		unsigned int new_cpu;
   216		unsigned long flags;
   217		void **input_arg;
   218		void *pg;
   219	
   220		local_irq_save(flags);
 > 221		input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
   222		pg = *input_arg;
   223		*input_arg = NULL;
   224	
   225		if (hv_root_partition) {
   226			void **output_arg;
   227	
   228			output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
   229			*output_arg = NULL;
   230		}
   231	
   232		local_irq_restore(flags);
   233	
   234		free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
   235	
   236		if (hv_vp_assist_page && hv_vp_assist_page[cpu])
   237			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
   238	
   239		if (hv_reenlightenment_cb == NULL)
   240			return 0;
   241	
   242		rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
   243		if (re_ctrl.target_vp == hv_vp_index[cpu]) {
   244			/*
   245			 * Reassign reenlightenment notifications to some other online
   246			 * CPU or just disable the feature if there are no online CPUs
   247			 * left (happens on hibernation).
   248			 */
   249			new_cpu = cpumask_any_but(cpu_online_mask, cpu);
   250	
   251			if (new_cpu < nr_cpu_ids)
   252				re_ctrl.target_vp = hv_vp_index[new_cpu];
   253			else
   254				re_ctrl.enabled = 0;
   255	
   256			wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
   257		}
   258	
   259		return 0;
   260	}
   261	
   262	static int __init hv_pci_init(void)
   263	{
   264		int gen2vm = efi_enabled(EFI_BOOT);
   265	
   266		/*
   267		 * For Generation-2 VM, we exit from pci_arch_init() by returning 0.
   268		 * The purpose is to suppress the harmless warning:
   269		 * "PCI: Fatal: No config space access function found"
   270		 */
   271		if (gen2vm)
   272			return 0;
   273	
   274		/* For Generation-1 VM, we'll proceed in pci_arch_init().  */
   275		return 1;
   276	}
   277	
   278	static int hv_suspend(void)
   279	{
   280		union hv_x64_msr_hypercall_contents hypercall_msr;
   281		int ret;
   282	
   283		/*
   284		 * Reset the hypercall page as it is going to be invalidated
   285		 * across hibernation. Setting hv_hypercall_pg to NULL ensures
   286		 * that any subsequent hypercall operation fails safely instead of
   287		 * crashing due to an access of an invalid page. The hypercall page
   288		 * pointer is restored on resume.
   289		 */
   290		hv_hypercall_pg_saved = hv_hypercall_pg;
   291		hv_hypercall_pg = NULL;
   292	
   293		/* Disable the hypercall page in the hypervisor */
   294		rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   295		hypercall_msr.enable = 0;
   296		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   297	
   298		ret = hv_cpu_die(0);
   299		return ret;
   300	}
   301	
   302	static void hv_resume(void)
   303	{
   304		union hv_x64_msr_hypercall_contents hypercall_msr;
   305		int ret;
   306	
   307		ret = hv_cpu_init(0);
   308		WARN_ON(ret);
   309	
   310		/* Re-enable the hypercall page */
   311		rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   312		hypercall_msr.enable = 1;
   313		hypercall_msr.guest_physical_address =
   314			vmalloc_to_pfn(hv_hypercall_pg_saved);
   315		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   316	
   317		hv_hypercall_pg = hv_hypercall_pg_saved;
   318		hv_hypercall_pg_saved = NULL;
   319	
   320		/*
   321		 * Reenlightenment notifications are disabled by hv_cpu_die(0),
   322		 * reenable them here if hv_reenlightenment_cb was previously set.
   323		 */
   324		if (hv_reenlightenment_cb)
   325			set_hv_tscchange_cb(hv_reenlightenment_cb);
   326	}
   327	
   328	/* Note: when the ops are called, only CPU0 is online and IRQs are disabled. */
   329	static struct syscore_ops hv_syscore_ops = {
   330		.suspend	= hv_suspend,
   331		.resume		= hv_resume,
   332	};
   333	
   334	/*
   335	 * This function is to be invoked early in the boot sequence after the
   336	 * hypervisor has been detected.
   337	 *
   338	 * 1. Setup the hypercall page.
   339	 * 2. Register Hyper-V specific clocksource.
   340	 * 3. Setup Hyper-V specific APIC entry points.
   341	 */
   342	void __init hyperv_init(void)
   343	{
   344		u64 guest_id, required_msrs;
   345		union hv_x64_msr_hypercall_contents hypercall_msr;
   346		int cpuhp, i;
   347	
   348		if (x86_hyper_type != X86_HYPER_MS_HYPERV)
   349			return;
   350	
   351		/* Absolutely required MSRs */
   352		required_msrs = HV_MSR_HYPERCALL_AVAILABLE |
   353			HV_MSR_VP_INDEX_AVAILABLE;
   354	
   355		if ((ms_hyperv.features & required_msrs) != required_msrs)
   356			return;
   357	
   358		/*
   359		 * Allocate the per-CPU state for the hypercall input arg.
   360		 * If this allocation fails, we will not be able to setup
   361		 * (per-CPU) hypercall input page and thus this failure is
   362		 * fatal on Hyper-V.
   363		 */
   364		hyperv_pcpu_input_arg = alloc_percpu(void  *);
   365	
   366		BUG_ON(hyperv_pcpu_input_arg == NULL);
   367	
   368		/* Allocate the per-CPU state for output arg for root */
   369		if (hv_root_partition) {
 > 370			hyperv_pcpu_output_arg = alloc_percpu(void *);
   371			BUG_ON(hyperv_pcpu_output_arg == NULL);
   372		}
   373	
   374		/* Allocate percpu VP index */
   375		hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
   376					    GFP_KERNEL);
   377		if (!hv_vp_index)
   378			return;
   379	
   380		for (i = 0; i < num_possible_cpus(); i++)
   381			hv_vp_index[i] = VP_INVAL;
   382	
   383		hv_vp_assist_page = kcalloc(num_possible_cpus(),
   384					    sizeof(*hv_vp_assist_page), GFP_KERNEL);
   385		if (!hv_vp_assist_page) {
   386			ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
   387			goto free_vp_index;
   388		}
   389	
   390		cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
   391					  hv_cpu_init, hv_cpu_die);
   392		if (cpuhp < 0)
   393			goto free_vp_assist_page;
   394	
   395		/*
   396		 * Setup the hypercall page and enable hypercalls.
   397		 * 1. Register the guest ID
   398		 * 2. Enable the hypercall and register the hypercall page
   399		 */
   400		guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
   401		wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
   402	
   403		hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
   404				VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
   405				VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
   406				__builtin_return_address(0));
   407		if (hv_hypercall_pg == NULL) {
   408			wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
   409			goto remove_cpuhp_state;
   410		}
   411	
   412		rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   413		hypercall_msr.enable = 1;
   414		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
   415		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   416	
   417		/*
   418		 * Ignore any errors in setting up stimer clockevents
   419		 * as we can run with the LAPIC timer as a fallback.
   420		 */
   421		(void)hv_stimer_alloc();
   422	
   423		hv_apic_init();
   424	
   425		x86_init.pci.arch_init = hv_pci_init;
   426	
   427		register_syscore_ops(&hv_syscore_ops);
   428	
   429		return;
   430	
   431	remove_cpuhp_state:
   432		cpuhp_remove_state(cpuhp);
   433	free_vp_assist_page:
   434		kfree(hv_vp_assist_page);
   435		hv_vp_assist_page = NULL;
   436	free_vp_index:
   437		kfree(hv_vp_index);
   438		hv_vp_index = NULL;
   439	}
   440	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36939 bytes --]

* Re: [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT
  2021-01-20 12:00 ` [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT Wei Liu
@ 2021-01-20 15:57   ` Pavel Tatashin
  2021-01-26  0:25   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Pavel Tatashin @ 2021-01-20 15:57 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Michael Kelley, Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves,
	Vitaly Kuznetsov, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> This makes the name match Hyper-V TLFS.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  include/asm-generic/hyperv-tlfs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e73a11850055..e6903589a82a 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -88,7 +88,7 @@
>  #define HV_CONNECT_PORT                                BIT(7)
>  #define HV_ACCESS_STATS                                BIT(8)
>  #define HV_DEBUGGING                           BIT(11)
> -#define HV_CPU_POWER_MANAGEMENT                        BIT(12)
> +#define HV_CPU_MANAGEMENT                      BIT(12)

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

* Re: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition
  2021-01-20 12:00 ` [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition Wei Liu
@ 2021-01-20 16:03   ` Pavel Tatashin
  2021-01-26 15:06     ` Wei Liu
  2021-01-26  0:31   ` Michael Kelley
  1 sibling, 1 reply; 59+ messages in thread
From: Pavel Tatashin @ 2021-01-20 16:03 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Michael Kelley, Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> For now we can use the privilege flag to check. Stash the value to be
> used later.
>
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
>  arch/x86/include/asm/mshyperv.h    |  2 ++
>  arch/x86/kernel/cpu/mshyperv.c     | 20 ++++++++++++++++++++
>  3 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
>  #define HYPERV_CPUID_FEATURES                  0x40000003
>  #define HYPERV_CPUID_ENLIGHTMENT_INFO          0x40000004
>  #define HYPERV_CPUID_IMPLEMENT_LIMITS          0x40000005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES   0x40000007
>  #define HYPERV_CPUID_NESTED_FEATURES           0x4000000A
>
>  #define HYPERV_CPUID_VIRT_STACK_INTERFACE      0x40000081
> @@ -110,6 +111,15 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED            BIT(14)
>
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR                 BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR           BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNC                        BIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT                   BIT(31)
> +
>  /*
>   * Virtual processor will never share a physical core with another virtual
>   * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
>                 struct hv_guest_mapping_flush_list *flush,
>                 u64 start_gfn, u64 end_gfn);
>
> +extern bool hv_root_partition;
> +
>  #ifdef CONFIG_X86_64
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
>  #include <asm/nmi.h>
>  #include <clocksource/hyperv_timer.h>
>
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
>
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
>         pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
>                  ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
>
> +       /*
> +        * Check CPU management privilege.
> +        *
> +        * To mirror what Windows does we should extract CPU management
> +        * features and use the ReservedIdentityBit to detect if Linux is the
> +        * root partition. But that requires negotiating CPU management
> +        * interface (a process to be finalized).

Is this comment relevant? Do we have to mirror what Windows does?

> +        *
> +        * For now, use the privilege flag as the indicator for running as
> +        * root.
> +        */
> +       if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> +               hv_root_partition = true;
> +               pr_info("Hyper-V: running as root partition\n");
> +       }
> +

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

>         /*
>          * Extract host information.
>          */
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root
  2021-01-20 12:00 ` [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root Wei Liu
@ 2021-01-20 16:06   ` Pavel Tatashin
  2021-01-26  0:32   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Pavel Tatashin @ 2021-01-20 16:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Michael Kelley, Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> There is no VMBus and the other infrastructures initialized in
> hv_acpi_init when Linux is running as the root partition.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3: Return 0 instead of -ENODEV.
> ---
>  drivers/hv/vmbus_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 502f8cd95f6d..ee27b3670a51 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
>         if (!hv_is_hyperv_initialized())
>                 return -ENODEV;
>
> +       if (hv_root_partition)
> +               return 0;
> +

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
@ 2021-01-20 16:08   ` Pavel Tatashin
  2021-01-26  0:33   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Pavel Tatashin @ 2021-01-20 16:08 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Michael Kelley, Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves,
	Joerg Roedel, Vitaly Kuznetsov, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Joerg Roedel <jroedel@suse.de>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include <asm/io_apic.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
>
>  #include "irq_remapping.h"
>
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
>
>         if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>             x86_init.hyper.msi_ext_dest_id() ||
> -           !x2apic_supported())
> +           !x2apic_supported() || hv_root_partition)
>                 return -ENODEV;

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if running as root
  2021-01-20 12:00 ` [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if " Wei Liu
@ 2021-01-20 16:13   ` Pavel Tatashin
  2021-01-26 15:19     ` Wei Liu
  2021-01-26  0:34   ` Michael Kelley
  1 sibling, 1 reply; 59+ messages in thread
From: Pavel Tatashin @ 2021-01-20 16:13 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Michael Kelley, Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves,
	Daniel Lezcano, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> When Linux runs as the root partition, the setup required for TSC page
> is different.

Why would we need a TSC page as a clock source for the root partition
at all? I think the sentence above can be removed.

> Luckily Linux also has access to the MSR based
> clocksource. We can just disable the TSC page clocksource if Linux is
> the root partition.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/clocksource/hyperv_timer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..269a691bd2c4 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
>         if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>                 return false;
>
> +       if (hv_root_partition)
> +               return false;
> +

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required
  2021-01-20 12:00 ` [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required Wei Liu
  2021-01-20 15:12   ` kernel test robot
@ 2021-01-20 19:44   ` Pavel Tatashin
  2021-01-26  0:41   ` Michael Kelley
  2 siblings, 0 replies; 59+ messages in thread
From: Pavel Tatashin @ 2021-01-20 19:44 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Michael Kelley, Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves,
	Lillian Grassin-Drake, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
>
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

The new warnings reported by the robot are the same as those reported for
the input argument.

Pasha
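
For readers following along, the layout this patch sets up can be
sketched in userspace. This is only an illustration, not the kernel
code: aligned_alloc() stands in for alloc_pages(), the 4096-byte page
size is an assumption, and every name prefixed with hv_sketch_ or
SKETCH_ is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical per-CPU pair of hypercall argument pointers. */
struct hv_sketch_cpu_args {
	void *input;
	void *output;
};

/*
 * Allocate one page for a guest, or two contiguous pages for root
 * (the userspace analogue of order 0 vs. order 1). The output page
 * is the second page of the same block, like page_address(pg + 1).
 */
static int hv_sketch_cpu_init(struct hv_sketch_cpu_args *args, bool is_root)
{
	size_t npages = is_root ? 2 : 1;
	void *base = aligned_alloc(SKETCH_PAGE_SIZE, npages * SKETCH_PAGE_SIZE);

	if (!base)
		return -1;

	args->input = base;
	args->output = is_root ? (char *)base + SKETCH_PAGE_SIZE : NULL;
	return 0;
}

/* Clear both pointers and release the block with a single free. */
static void hv_sketch_cpu_die(struct hv_sketch_cpu_args *args)
{
	void *base = args->input;

	args->input = NULL;
	args->output = NULL;
	free(base);
}
```

Because both pages come from one allocation, only the input pointer is
needed to free them, which is why the patch frees with the same order
it allocated with.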

> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
>  arch/x86/hyperv/hv_init.c       | 35 ++++++++++++++++++++++++++++-----
>  arch/x86/include/asm/mshyperv.h |  1 +
>  2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>  void  __percpu **hyperv_pcpu_input_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>
> +void  __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
>         void **input_arg;
>         struct page *pg;
>
> -       input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>         /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> -       pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> +       pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
>         if (unlikely(!pg))
>                 return -ENOMEM;
> +
> +       input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>         *input_arg = page_address(pg);
> +       if (hv_root_partition) {
> +               void **output_arg;
> +
> +               output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +               *output_arg = page_address(pg + 1);
> +       }
>
>         hv_get_vp_index(msr_vp_index);
>
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
>         unsigned int new_cpu;
>         unsigned long flags;
>         void **input_arg;
> -       void *input_pg = NULL;
> +       void *pg;
>
>         local_irq_save(flags);
>         input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -       input_pg = *input_arg;
> +       pg = *input_arg;
>         *input_arg = NULL;
> +
> +       if (hv_root_partition) {
> +               void **output_arg;
> +
> +               output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +               *output_arg = NULL;
> +       }
> +
>         local_irq_restore(flags);
> -       free_page((unsigned long)input_pg);
> +
> +       free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
>
>         if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>                 wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
>
>         BUG_ON(hyperv_pcpu_input_arg == NULL);
>
> +       /* Allocate the per-CPU state for output arg for root */
> +       if (hv_root_partition) {
> +               hyperv_pcpu_output_arg = alloc_percpu(void *);
> +               BUG_ON(hyperv_pcpu_output_arg == NULL);
> +       }
> +
>         /* Allocate percpu VP index */
>         hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>                                     GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
> +extern void  __percpu  **hyperv_pcpu_output_arg;
>
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT
  2021-01-20 12:00 ` [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT Wei Liu
  2021-01-20 15:57   ` Pavel Tatashin
@ 2021-01-26  0:25   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:25 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, vkuznets,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> This makes the name match Hyper-V TLFS.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  include/asm-generic/hyperv-tlfs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e73a11850055..e6903589a82a 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -88,7 +88,7 @@
>  #define HV_CONNECT_PORT				BIT(7)
>  #define HV_ACCESS_STATS				BIT(8)
>  #define HV_DEBUGGING				BIT(11)
> -#define HV_CPU_POWER_MANAGEMENT			BIT(12)
> +#define HV_CPU_MANAGEMENT			BIT(12)
> 
> 
>  /*
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition
  2021-01-20 12:00 ` [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition Wei Liu
  2021-01-20 16:03   ` Pavel Tatashin
@ 2021-01-26  0:31   ` Michael Kelley
  2021-01-26 15:15     ` Wei Liu
  1 sibling, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:31 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> For now we can use the privilege flag to check. Stash the value to be
> used later.
> 
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
>  arch/x86/include/asm/mshyperv.h    |  2 ++
>  arch/x86/kernel/cpu/mshyperv.c     | 20 ++++++++++++++++++++
>  3 files changed, 32 insertions(+)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
>  #define HYPERV_CPUID_FEATURES			0x40000003
>  #define HYPERV_CPUID_ENLIGHTMENT_INFO		0x40000004
>  #define HYPERV_CPUID_IMPLEMENT_LIMITS		0x40000005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES	0x40000007
>  #define HYPERV_CPUID_NESTED_FEATURES		0x4000000A
> 
>  #define HYPERV_CPUID_VIRT_STACK_INTERFACE	0x40000081
> @@ -110,6 +111,15 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
> 
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR			BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR		BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNC			BIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT			BIT(31)
> +

I wonder if these bit definitions should go in the asm-generic part of
hyperv-tlfs.h instead of the X64 specific part.  They look very architecture
neutral (in which case the X64 should be dropped from the name
as well).  Of course, they can be moved later when/if we get to that point
and have a firmer understanding of what is and isn't arch neutral.

>  /*
>   * Virtual processor will never share a physical core with another virtual
>   * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
>  		struct hv_guest_mapping_flush_list *flush,
>  		u64 start_gfn, u64 end_gfn);
> 
> +extern bool hv_root_partition;
> +
>  #ifdef CONFIG_X86_64
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
>  #include <asm/nmi.h>
>  #include <clocksource/hyperv_timer.h>
> 
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
> 
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
>  	pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
>  		 ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
> 
> +	/*
> +	 * Check CPU management privilege.
> +	 *
> +	 * To mirror what Windows does we should extract CPU management
> +	 * features and use the ReservedIdentityBit to detect if Linux is the
> +	 * root partition. But that requires negotiating CPU management
> +	 * interface (a process to be finalized).
> +	 *
> +	 * For now, use the privilege flag as the indicator for running as
> +	 * root.
> +	 */
> +	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {

Should the EBX value be captured in the ms_hyperv structure with the
other similar values, and then used from there?

Michael
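
To make the suggestion concrete, here is a minimal userspace sketch
(not kernel code) of capturing the privilege bits once and testing
them from a cached copy. The struct and the priv_high field name are
assumptions for illustration only; HV_CPU_MANAGEMENT as BIT(12) is
taken from patch 01/16.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))
#define HV_CPU_MANAGEMENT	BIT(12)	/* as defined in patch 01/16 */

/*
 * Hypothetical stand-in for struct ms_hyperv_info. The priv_high
 * field name is an assumption, not something this series defines.
 */
struct hv_sketch_info {
	uint32_t features;	/* HYPERV_CPUID_FEATURES.EAX */
	uint32_t priv_high;	/* HYPERV_CPUID_FEATURES.EBX */
};

/* Record the CPUID output once, at platform init time. */
static void hv_sketch_capture(struct hv_sketch_info *info,
			      uint32_t eax, uint32_t ebx)
{
	info->features = eax;
	info->priv_high = ebx;
}

/* Later checks read the cached value instead of re-issuing CPUID. */
static bool hv_sketch_is_root(const struct hv_sketch_info *info)
{
	return (info->priv_high & HV_CPU_MANAGEMENT) != 0;
}
```

Other callers that need an EBX privilege bit could then test the same
cached field.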

> +		hv_root_partition = true;
> +		pr_info("Hyper-V: running as root partition\n");
> +	}
> +
>  	/*
>  	 * Extract host information.
>  	 */
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root
  2021-01-20 12:00 ` [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root Wei Liu
  2021-01-20 16:06   ` Pavel Tatashin
@ 2021-01-26  0:32   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:32 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> There is no VMBus and the other infrastructures initialized in
> hv_acpi_init when Linux is running as the root partition.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3: Return 0 instead of -ENODEV.
> ---
>  drivers/hv/vmbus_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 502f8cd95f6d..ee27b3670a51 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
>  	if (!hv_is_hyperv_initialized())
>  		return -ENODEV;
> 
> +	if (hv_root_partition)
> +		return 0;
> +
>  	init_completion(&probe_event);
> 
>  	/*
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root
  2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
  2021-01-20 16:08   ` Pavel Tatashin
@ 2021-01-26  0:33   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:33 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, Joerg Roedel,
	vkuznets, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Joerg Roedel, Will Deacon, open list:IOMMU DRIVERS

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Joerg Roedel <jroedel@suse.de>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include <asm/io_apic.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
> 
>  #include "irq_remapping.h"
> 
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
> 
>  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>  	    x86_init.hyper.msi_ext_dest_id() ||
> -	    !x2apic_supported())
> +	    !x2apic_supported() || hv_root_partition)
>  		return -ENODEV;
> 
>  	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if running as root
  2021-01-20 12:00 ` [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if " Wei Liu
  2021-01-20 16:13   ` Pavel Tatashin
@ 2021-01-26  0:34   ` Michael Kelley
  1 sibling, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:34 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, Daniel Lezcano,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition, the setup required for TSC page
> is different. Luckily Linux also has access to the MSR based
> clocksource. We can just disable the TSC page clocksource if Linux is
> the root partition.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/clocksource/hyperv_timer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..269a691bd2c4 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
>  	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>  		return false;
> 
> +	if (hv_root_partition)
> +		return false;
> +
>  	hv_read_reference_counter = read_hv_clock_tsc;
>  	phys_addr = virt_to_phys(hv_get_tsc_page());
> 
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required
  2021-01-20 12:00 ` [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required Wei Liu
  2021-01-20 15:12   ` kernel test robot
  2021-01-20 19:44   ` Pavel Tatashin
@ 2021-01-26  0:41   ` Michael Kelley
  2021-01-26 18:09     ` Wei Liu
  2 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:41 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Lillian Grassin-Drake, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
> 
> Allocate pages for storing results when Linux runs as the root
> partition.
> 
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
>  arch/x86/hyperv/hv_init.c       | 35 ++++++++++++++++++++++++++++-----
>  arch/x86/include/asm/mshyperv.h |  1 +
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>  void  __percpu **hyperv_pcpu_input_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> 
> +void  __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> 
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
>  	void **input_arg;
>  	struct page *pg;
> 
> -	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>  	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> -	pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> +	pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
>  	if (unlikely(!pg))
>  		return -ENOMEM;
> +
> +	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>  	*input_arg = page_address(pg);
> +	if (hv_root_partition) {
> +		void **output_arg;
> +
> +		output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +		*output_arg = page_address(pg + 1);
> +	}
> 
>  	hv_get_vp_index(msr_vp_index);
> 
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
>  	unsigned int new_cpu;
>  	unsigned long flags;
>  	void **input_arg;
> -	void *input_pg = NULL;
> +	void *pg;
> 
>  	local_irq_save(flags);
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	input_pg = *input_arg;
> +	pg = *input_arg;
>  	*input_arg = NULL;
> +
> +	if (hv_root_partition) {
> +		void **output_arg;
> +
> +		output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +		*output_arg = NULL;
> +	}
> +
>  	local_irq_restore(flags);
> -	free_page((unsigned long)input_pg);
> +
> +	free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
> 
>  	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
> 
>  	BUG_ON(hyperv_pcpu_input_arg == NULL);
> 
> +	/* Allocate the per-CPU state for output arg for root */
> +	if (hv_root_partition) {
> +		hyperv_pcpu_output_arg = alloc_percpu(void *);
> +		BUG_ON(hyperv_pcpu_output_arg == NULL);
> +	}
> +
>  	/* Allocate percpu VP index */
>  	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>  				    GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
> +extern void  __percpu  **hyperv_pcpu_output_arg;
> 
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
> --
> 2.20.1

I think this all works OK.  But a meta question:  Do we need a separate
per-cpu output argument page?  From the Hyper-V hypercall standpoint, I
don't think input and output args need to be in separate pages.  They both
just need to not cross a page boundary.  As long as we don't have a hypercall
where the sum of the sizes of the input and output args exceeds a page,
we could just have a single page, and split it up in any manner that works
for the particular hypercall.
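
A rough userspace sketch of what such a split could look like is
below. It is only an illustration under assumptions: the page size is
taken to be 4096 bytes, the output is placed at the next 8-byte
boundary after the input (assuming that is the alignment hypercall
arguments need), and the helper and struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HV_SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical result of carving one page into two regions. */
struct hv_sketch_split {
	void *input;
	void *output;
};

/*
 * Place the input at the start of the page and the output at the next
 * 8-byte boundary after it. Returns 0 on success, or -1 when the two
 * regions would not both fit inside the single page.
 */
static int hv_sketch_split_page(void *page, size_t in_size, size_t out_size,
				struct hv_sketch_split *split)
{
	size_t out_off;

	if (in_size > HV_SKETCH_PAGE_SIZE)
		return -1;

	out_off = (in_size + 7u) & ~(size_t)7u;
	if (out_size > HV_SKETCH_PAGE_SIZE - out_off)
		return -1;

	split->input = page;
	split->output = (uint8_t *)page + out_off;
	return 0;
}
```

Neither region can cross a page boundary here because both live
inside the same page by construction.
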

Thoughts?

Michael



^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
  2021-01-20 12:00 ` [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary Wei Liu
@ 2021-01-26  0:48   ` Michael Kelley
  2021-02-02 15:03     ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:48 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Lillian Grassin-Drake, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will need the partition ID for executing some hypercalls later.
> 
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3:
> 1. Make hv_get_partition_id static.
> 2. Change code structure a bit.
> ---
>  arch/x86/hyperv/hv_init.c         | 27 +++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h   |  2 ++
>  include/asm-generic/hyperv-tlfs.h |  6 ++++++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6f4cb40e53fe..fc9941bd8653 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -26,6 +26,9 @@
>  #include <linux/syscore_ops.h>
>  #include <clocksource/hyperv_timer.h>
> 
> +u64 hv_current_partition_id = ~0ull;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> @@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = {
>  	.resume		= hv_resume,
>  };
> 
> +static void __init hv_get_partition_id(void)
> +{
> +	struct hv_get_partition_id *output_page;
> +	u16 status;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +	status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> +		HV_HYPERCALL_RESULT_MASK;
> +	if (status != HV_STATUS_SUCCESS) {

Across the Hyper-V code in Linux, the way we check the hypercall result
is very inconsistent.  IMHO, the and'ing of hv_do_hypercall() with 
HV_HYPERCALL_RESULT_MASK so that status can be a u16 is stylistically
a bit unusual.

I'd like to see the hypercall result being stored into a u64 local variable.
Then the subsequent test for the status should 'and' the u64 with
HV_HYPERCALL_RESULT_MASK to determine the result code.
I've made a note to go fix the places that aren't doing it that way.
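
Something along these lines, as a standalone sketch rather than a
proposed patch. The hv_sketch_* helper names are hypothetical, and the
mask and status values are assumed to match the existing definitions
(status code in the low 16 bits, 0 meaning success).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the existing header definitions. */
#define HV_HYPERCALL_RESULT_MASK	0xffffull
#define HV_STATUS_SUCCESS		0

/* Extract the status code from the full 64-bit hypercall return value. */
static inline uint16_t hv_sketch_result(uint64_t hypercall_ret)
{
	return (uint16_t)(hypercall_ret & HV_HYPERCALL_RESULT_MASK);
}

/* True when the status code says the hypercall succeeded. */
static inline bool hv_sketch_result_success(uint64_t hypercall_ret)
{
	return hv_sketch_result(hypercall_ret) == HV_STATUS_SUCCESS;
}
```

The caller keeps the raw u64 around, so any bits above the status
field are still available after the check.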

> +		/* No point in proceeding if this failed */
> +		pr_err("Failed to get partition ID: %d\n", status);
> +		BUG();
> +	}
> +	hv_current_partition_id = output_page->partition_id;
> +	local_irq_restore(flags);
> +}
> +
>  /*
>   * This function is to be invoked early in the boot sequence after the
>   * hypervisor has been detected.
> @@ -426,6 +448,11 @@ void __init hyperv_init(void)
> 
>  	register_syscore_ops(&hv_syscore_ops);
> 
> +	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> +		hv_get_partition_id();

Another place where the EBX value saved into the ms_hyperv structure
could be used.

> +
> +	BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> +
>  	return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 62d9390f1ddf..67f5d35a73d3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
>  extern void  __percpu  **hyperv_pcpu_output_arg;
> 
> +extern u64 hv_current_partition_id;
> +
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>  	u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX	0x0013
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX	0x0014
>  #define HVCALL_SEND_IPI_EX			0x0015
> +#define HVCALL_GET_PARTITION_ID			0x0046
>  #define HVCALL_GET_VP_REGISTERS			0x0050
>  #define HVCALL_SET_VP_REGISTERS			0x0051
>  #define HVCALL_POST_MESSAGE			0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
>  	u64 gva_list[];
>  } __packed;
> 
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> +	u64 partition_id;
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>  	u64 as_uint64;
> --
> 2.20.1


* RE: [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root
  2021-01-20 12:00 ` [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root Wei Liu
@ 2021-01-26  0:49   ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  0:49 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Lillian Grassin-Drake, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux is running as the root partition, the hypercall page will
> have already been setup by Hyper-V. Copy the content over to the
> allocated page.
> 
> Add checks to hv_suspend & co to bail early because they are not
> supported in this setup yet.
> 
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v3:
> 1. Use HV_HYP_PAGE_SIZE.
> 2. Add checks to hv_suspend & co.
> ---
>  arch/x86/hyperv/hv_init.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index fc9941bd8653..ad8e77859b32 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -25,6 +25,7 @@
>  #include <linux/cpuhotplug.h>
>  #include <linux/syscore_ops.h>
>  #include <clocksource/hyperv_timer.h>
> +#include <linux/highmem.h>
> 
>  u64 hv_current_partition_id = ~0ull;
>  EXPORT_SYMBOL_GPL(hv_current_partition_id);
> @@ -283,6 +284,9 @@ static int hv_suspend(void)
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
>  	int ret;
> 
> +	if (hv_root_partition)
> +		return -EPERM;
> +
>  	/*
>  	 * Reset the hypercall page as it is going to be invalidated
>  	 * accross hibernation. Setting hv_hypercall_pg to NULL ensures
> @@ -433,8 +437,35 @@ void __init hyperv_init(void)
> 
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	hypercall_msr.enable = 1;
> -	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> -	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> +	if (hv_root_partition) {
> +		struct page *pg;
> +		void *src, *dst;
> +
> +		/*
> +		 * For the root partition, the hypervisor will set up its
> +		 * hypercall page. The hypervisor guarantees it will not show
> +		 * up in the root's address space. The root can't change the
> +		 * location of the hypercall page.
> +		 *
> +		 * Order is important here. We must enable the hypercall page
> +		 * so it is populated with code, then copy the code to an
> +		 * executable page.
> +		 */
> +		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> +		pg = vmalloc_to_page(hv_hypercall_pg);
> +		dst = kmap(pg);
> +		src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
> +				MEMREMAP_WB);
> +		BUG_ON(!(src && dst));
> +		memcpy(dst, src, HV_HYP_PAGE_SIZE);
> +		memunmap(src);
> +		kunmap(pg);
> +	} else {
> +		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> +		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +	}
> 
>  	/*
>  	 * Ignore any errors in setting up stimer clockevents
> @@ -577,6 +608,6 @@ EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> 
>  bool hv_is_hibernation_supported(void)
>  {
> -	return acpi_sleep_state_supported(ACPI_STATE_S4);
> +	return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


* RE: [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions
  2021-01-20 12:00 ` [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions Wei Liu
@ 2021-01-26  1:20   ` Michael Kelley
  2021-02-02 16:19     ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  1:20 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Lillian Grassin-Drake, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
> 
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.
> 
> v3:
> 1. Add __packed to structures.
> 2. Drop unnecessary exports.
> 
> v2:
> 1. Adapt to hypervisor side changes
> 2. Address Vitaly's comments
> ---
>  arch/x86/hyperv/Makefile          |   2 +-
>  arch/x86/hyperv/hv_proc.c         | 225 ++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h   |   4 +
>  include/asm-generic/hyperv-tlfs.h |  67 +++++++++
>  4 files changed, 297 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/hv_proc.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 89b1f74d3225..565358020921 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y			:= hv_init.o mmu.o nested.o
> -obj-$(CONFIG_X86_64)	+= hv_apic.o
> +obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS)	+= hv_spinlock.o
> diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
> new file mode 100644
> index 000000000000..706097160e2f
> --- /dev/null
> +++ b/arch/x86/hyperv/hv_proc.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/types.h>
> +#include <linux/version.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
> +#include <linux/clockchips.h>
> +#include <linux/acpi.h>
> +#include <linux/hyperv.h>
> +#include <linux/slab.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/minmax.h>
> +#include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
> +#include <asm/apic.h>
> +
> +#include <asm/trace/hyperv.h>
> +
> +#define HV_DEPOSIT_MAX_ORDER (8)
> +#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER)

Is there any reason not to let the maximum be 511, which is
how many entries will fit on the hypercall input page?  The
max could be defined in terms of HV_HYP_PAGE_SIZE so that
the logical dependency is fully expressed.
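
Something along these lines, as a rough user-space sketch.  It assumes
HV_HYP_PAGE_SIZE is 4096 and reuses the struct layout from this patch;
the macro body is only a suggestion.

```c
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SIZE 4096 /* assumed: Hyper-V page size is 4 KiB */

/* Mirrors the layout of struct hv_deposit_memory from the patch. */
struct hv_deposit_memory {
	uint64_t partition_id;
	uint64_t gpa_page_list[];
} __attribute__((packed));

/* Number of list entries that fit on one input page after the fixed header. */
#define HV_DEPOSIT_MAX \
	((HV_HYP_PAGE_SIZE - sizeof(struct hv_deposit_memory)) / sizeof(uint64_t))
```

With an 8-byte header and 8-byte entries this works out to
(4096 - 8) / 8 = 511.  Since 511 is not a power of two, the cap on the
allocation order would have to be derived separately.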

> +
> +/*
> + * Deposits exact number of pages
> + * Must be called with interrupts enabled
> + * Max 256 pages
> + */
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +{
> +	struct page **pages;
> +	int *counts;
> +	int num_allocations;
> +	int i, j, page_count;
> +	int order;
> +	int desired_order;
> +	u16 status;
> +	int ret;
> +	u64 base_pfn;
> +	struct hv_deposit_memory *input_page;
> +	unsigned long flags;
> +
> +	if (num_pages > HV_DEPOSIT_MAX)
> +		return -E2BIG;
> +	if (!num_pages)
> +		return 0;
> +
> +	/* One buffer for page pointers and counts */
> +	pages = page_address(alloc_page(GFP_KERNEL));
> +	if (!pages)

Does the above check work?  If alloc_page() returns NULL, it looks like
page_address() might fault.

> +		return -ENOMEM;
> +
> +	counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
> +	if (!counts) {
> +		free_page((unsigned long)pages);
> +		return -ENOMEM;
> +	}
> +
> +	/* Allocate all the pages before disabling interrupts */
> +	num_allocations = 0;
> +	i = 0;
> +	order = HV_DEPOSIT_MAX_ORDER;
> +
> +	while (num_pages) {
> +		/* Find highest order we can actually allocate */
> +		desired_order = 31 - __builtin_clz(num_pages);
> +		order = min(desired_order, order);

The above seems redundant since request sizes larger than the
max have already been rejected.
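
To make the arithmetic concrete, here is a small user-space model of how
the loop carves up a request.  It assumes every allocation succeeds at
the desired order (no fallback to a lower order), and split_into_orders()
is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

#define HV_DEPOSIT_MAX_ORDER 8

/* floor(log2(n)) for n > 0, the same expression as in the patch. */
static int highest_order(uint32_t n)
{
	return 31 - __builtin_clz(n);
}

/*
 * Split num_pages into power-of-two chunks, largest first, assuming every
 * allocation succeeds at the desired order.  Stores the chosen orders in
 * orders[] (at most max_chunks entries) and returns the number of chunks.
 */
static int split_into_orders(uint32_t num_pages, int *orders, int max_chunks)
{
	int n = 0;

	while (num_pages && n < max_chunks) {
		int order = highest_order(num_pages);

		orders[n++] = order;
		num_pages -= 1u << order;
	}
	return n;
}
```

A request for 90 pages comes out as orders 6, 4, 3 and 1 (64 + 16 + 8 + 2),
and the largest accepted request, 256 pages, yields a single order-8
chunk, so the desired order never exceeds HV_DEPOSIT_MAX_ORDER.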

> +		do {
> +			pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
> +			if (!pages[i]) {
> +				if (!order) {
> +					ret = -ENOMEM;
> +					goto err_free_allocations;
> +				}
> +				--order;
> +			}
> +		} while (!pages[i]);

The duplicative test of !pages[i] is somewhat annoying.  How about
this:

		while (!(pages[i] = alloc_pages_node(node, GFP_KERNEL, order))) {
			if (!order) {
				ret = -ENOMEM;
				goto err_free_allocations;
			}
			--order;
		}

or if you don't like doing an assignment in the while test:

		while (1) {
			pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
			if (pages[i])
				break;
			if (!order) {
				ret = -ENOMEM;
				goto err_free_allocations;
			}
			--order;
		}

> +
> +		split_page(pages[i], order);
> +		counts[i] = 1 << order;
> +		num_pages -= counts[i];
> +		i++;
> +		num_allocations++;

Incrementing both i and num_allocations in the loop seems
redundant, especially since num_allocations isn't used in the loop.
Could num_allocations be assigned the value of i once the loop
is exited?  (And num_allocations would not need to be initialized to 0.)
The assignment would also have to be done in the error case.

> +	}
> +
> +	local_irq_save(flags);
> +
> +	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> +	input_page->partition_id = partition_id;
> +
> +	/* Populate gpa_page_list - these will fit on the input page */
> +	for (i = 0, page_count = 0; i < num_allocations; ++i) {
> +		base_pfn = page_to_pfn(pages[i]);
> +		for (j = 0; j < counts[i]; ++j, ++page_count)
> +			input_page->gpa_page_list[page_count] = base_pfn + j;
> +	}
> +	status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
> +				     page_count, 0, input_page,
> +				     NULL) & HV_HYPERCALL_RESULT_MASK;

Similar comment about how hypercall status is checked.

> +	local_irq_restore(flags);
> +
> +	if (status != HV_STATUS_SUCCESS) {
> +		pr_err("Failed to deposit pages: %d\n", status);
> +		ret = status;
> +		goto err_free_allocations;
> +	}
> +
> +	ret = 0;
> +	goto free_buf;
> +
> +err_free_allocations:
> +	for (i = 0; i < num_allocations; ++i) {
> +		base_pfn = page_to_pfn(pages[i]);
> +		for (j = 0; j < counts[i]; ++j)
> +			__free_page(pfn_to_page(base_pfn + j));
> +	}
> +
> +free_buf:
> +	free_page((unsigned long)pages);
> +	kfree(counts);
> +	return ret;
> +}
> +
> +int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
> +{
> +	struct hv_add_logical_processor_in *input;
> +	struct hv_add_logical_processor_out *output;
> +	int status;
> +	unsigned long flags;
> +	int ret = 0;
> +#ifdef CONFIG_ACPI_NUMA
> +	int pxm = node_to_pxm(node);
> +#else
> +	int pxm = 0;
> +#endif

It seems like the above #ifdef'ery might be better fixed in
include/acpi/acpi_numa.h, where there's already a null definition
of pxm_to_node() in case CONFIG_ACPI_NUMA isn't defined.  There
should also be a null definition of node_to_pxm() in that file.
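
Roughly what I'm picturing, shown as a standalone sketch rather than the
actual header contents.  Returning 0 from the stub is an assumption on my
part, chosen to match the fallback value this patch uses; pick_pxm() is a
hypothetical caller.

```c
/* CONFIG_ACPI_NUMA is deliberately left undefined here to exercise the stub. */
#ifdef CONFIG_ACPI_NUMA
extern int node_to_pxm(int node);
#else
/* Proposed null definition: with no ACPI NUMA info, report proximity domain 0. */
static inline int node_to_pxm(int node)
{
	(void)node;
	return 0;
}
#endif

/* Callers then need no #ifdef of their own. */
static int pick_pxm(int node)
{
	return node_to_pxm(node);
}
```

With that in place, both hv_call_add_logical_proc() and
hv_call_create_vp() could simply do "int pxm = node_to_pxm(node);".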

> +
> +	/*
> +	 * When adding a logical processor, the hypervisor may return
> +	 * HV_STATUS_INSUFFICIENT_MEMORY. When that happens, we deposit more
> +	 * pages and retry.
> +	 */
> +	do {
> +		local_irq_save(flags);
> +
> +		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +		/* We don't do anything with the output right now */
> +		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> +		input->lp_index = lp_index;
> +		input->apic_id = apic_id;
> +		input->flags = 0;
> +		input->proximity_domain_info.domain_id = pxm;
> +		input->proximity_domain_info.flags.reserved = 0;
> +		input->proximity_domain_info.flags.proximity_info_valid = 1;
> +		input->proximity_domain_info.flags.proximity_preferred = 1;
> +		status = hv_do_hypercall(HVCALL_ADD_LOGICAL_PROCESSOR,
> +					 input, output);
> +		local_irq_restore(flags);
> +
> +		if (status != HV_STATUS_INSUFFICIENT_MEMORY) {

The 'and' with HV_HYPERCALL_RESULT_MASK isn't coded anywhere for this
hypercall, and 'status' is declared as 'int'.

> +			if (status != HV_STATUS_SUCCESS) {
> +				pr_err("%s: cpu %u apic ID %u, %d\n", __func__,
> +				       lp_index, apic_id, status);
> +				ret = status;
> +			}
> +			break;
> +		}
> +		ret = hv_call_deposit_pages(node, hv_current_partition_id, 1);
> +	} while (!ret);
> +
> +	return ret;
> +}
> +
> +int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
> +{
> +	struct hv_create_vp *input;
> +	u16 status;
> +	unsigned long irq_flags;
> +	int ret = 0;
> +#ifdef CONFIG_ACPI_NUMA
> +	int pxm = node_to_pxm(node);
> +#else
> +	int pxm = 0;
> +#endif

Same comment.

> +
> +	/* Root VPs don't seem to need pages deposited */
> +	if (partition_id != hv_current_partition_id) {
> +		ret = hv_call_deposit_pages(node, partition_id, 90);

Perhaps add a comment about the value "90".  Was it
empirically determined?

> +		if (ret)
> +			return ret;
> +	}
> +
> +	do {
> +		local_irq_save(irq_flags);
> +
> +		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> +		input->partition_id = partition_id;
> +		input->vp_index = vp_index;
> +		input->flags = flags;
> +		input->subnode_type = HvSubnodeAny;
> +		if (node != NUMA_NO_NODE) {
> +			input->proximity_domain_info.domain_id = pxm;
> +			input->proximity_domain_info.flags.reserved = 0;
> +			input->proximity_domain_info.flags.proximity_info_valid = 1;
> +			input->proximity_domain_info.flags.proximity_preferred = 1;
> +		} else {
> +			input->proximity_domain_info.as_uint64 = 0;
> +		}
> +		status = hv_do_hypercall(HVCALL_CREATE_VP, input, NULL);
> +		local_irq_restore(irq_flags);
> +
> +		if (status != HV_STATUS_INSUFFICIENT_MEMORY) {

Same problems with the status check.

> +			if (status != HV_STATUS_SUCCESS) {
> +				pr_err("%s: vcpu %u, lp %u, %d\n", __func__,
> +				       vp_index, flags, status);
> +				ret = status;
> +			}
> +			break;
> +		}
> +		ret = hv_call_deposit_pages(node, partition_id, 1);
> +
> +	} while (!ret);
> +
> +	return ret;
> +}
> +
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 67f5d35a73d3..4e590a167160 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -80,6 +80,10 @@ extern void  __percpu  **hyperv_pcpu_output_arg;
> 
>  extern u64 hv_current_partition_id;
> 
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> +int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> +int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> +
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>  	u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 87b1a79b19eb..ec53570102f0 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -142,6 +142,8 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX	0x0014
>  #define HVCALL_SEND_IPI_EX			0x0015
>  #define HVCALL_GET_PARTITION_ID			0x0046
> +#define HVCALL_DEPOSIT_MEMORY			0x0048
> +#define HVCALL_CREATE_VP			0x004e
>  #define HVCALL_GET_VP_REGISTERS			0x0050
>  #define HVCALL_SET_VP_REGISTERS			0x0051
>  #define HVCALL_POST_MESSAGE			0x005c
> @@ -149,6 +151,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_POST_DEBUG_DATA			0x0069
>  #define HVCALL_RETRIEVE_DEBUG_DATA		0x006a
>  #define HVCALL_RESET_DEBUG_SESSION		0x006b
> +#define HVCALL_ADD_LOGICAL_PROCESSOR		0x0076
>  #define HVCALL_RETARGET_INTERRUPT		0x007e
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> @@ -413,6 +416,70 @@ struct hv_get_partition_id {
>  	u64 partition_id;
>  } __packed;
> 
> +/* HvDepositMemory hypercall */
> +struct hv_deposit_memory {
> +	u64 partition_id;
> +	u64 gpa_page_list[];
> +} __packed;
> +
> +struct hv_proximity_domain_flags {
> +	u32 proximity_preferred : 1;
> +	u32 reserved : 30;
> +	u32 proximity_info_valid : 1;
> +} __packed;
> +
> +/* Not a union in windows but useful for zeroing */
> +union hv_proximity_domain_info {
> +	struct {
> +		u32 domain_id;
> +		struct hv_proximity_domain_flags flags;
> +	};
> +	u64 as_uint64;
> +} __packed;
> +
> +struct hv_lp_startup_status {
> +	u64 hv_status;
> +	u64 substatus1;
> +	u64 substatus2;
> +	u64 substatus3;
> +	u64 substatus4;
> +	u64 substatus5;
> +	u64 substatus6;
> +} __packed;
> +
> +/* HvAddLogicalProcessor hypercall */
> +struct hv_add_logical_processor_in {
> +	u32 lp_index;
> +	u32 apic_id;
> +	union hv_proximity_domain_info proximity_domain_info;
> +	u64 flags;
> +};

__packed is missing from this struct definition

> +
> +struct hv_add_logical_processor_out {
> +	struct hv_lp_startup_status startup_status;
> +} __packed;
> +
> +enum HV_SUBNODE_TYPE
> +{
> +    HvSubnodeAny = 0,
> +    HvSubnodeSocket,
> +    HvSubnodeAmdNode,
> +    HvSubnodeL3,
> +    HvSubnodeCount,
> +    HvSubnodeInvalid = -1
> +};

Are these values defined by Hyper-V?  If so, explicitly coding the
value of each enum member might be better.
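
For illustration, the explicit form would look like the following.  The
numbers are simply the ones the compiler assigns implicitly to the
definition in this patch; whether they match what the hypervisor defines
is exactly the question above.

```c
enum HV_SUBNODE_TYPE {
	HvSubnodeAny     = 0,
	HvSubnodeSocket  = 1,
	HvSubnodeAmdNode = 2,
	HvSubnodeL3      = 3,
	HvSubnodeCount   = 4,
	HvSubnodeInvalid = -1
};
```

Spelling the values out means that inserting or reordering a member
later cannot silently change what is passed to the hypervisor.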

> +
> +/* HvCreateVp hypercall */
> +struct hv_create_vp {
> +	u64 partition_id;
> +	u32 vp_index;
> +	u8 padding[3];
> +	u8 subnode_type;
> +	u64 subnode_id;
> +	union hv_proximity_domain_info proximity_domain_info;
> +	u64 flags;
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>  	u64 as_uint64;
> --
> 2.20.1


* RE: [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus
  2021-01-20 12:00 ` [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus Wei Liu
@ 2021-01-26  1:21   ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  1:21 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Lillian Grassin-Drake, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Microsoft Hypervisor requires the root partition to make a few
> hypercalls to setup application processors before they can be used.
> 
> Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> CPU hotplug and unplug is not yet supported in this setup, so those
> paths remain untouched.
> 
> v3: Always call native SMP preparation function.
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index c376d191a260..13d3b6dd21a3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -31,6 +31,7 @@
>  #include <asm/reboot.h>
>  #include <asm/nmi.h>
>  #include <clocksource/hyperv_timer.h>
> +#include <asm/numa.h>
> 
>  /* Is Linux running as the root partition? */
>  bool hv_root_partition;
> @@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void)
>  	hv_init_spinlocks();
>  #endif
>  }
> +
> +static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> +{
> +#ifdef CONFIG_X86_64
> +	int i;
> +	int ret;
> +#endif
> +
> +	native_smp_prepare_cpus(max_cpus);
> +
> +#ifdef CONFIG_X86_64
> +	for_each_present_cpu(i) {
> +		if (i == 0)
> +			continue;
> +		ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
> +		BUG_ON(ret);
> +	}
> +
> +	for_each_present_cpu(i) {
> +		if (i == 0)
> +			continue;
> +		ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i);
> +		BUG_ON(ret);
> +	}
> +#endif
> +}
>  #endif
> 
>  static void __init ms_hyperv_init_platform(void)
> @@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void)
> 
>  # ifdef CONFIG_SMP
>  	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> +	if (hv_root_partition)
> +		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
>  # endif
> 
>  	/*
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


* RE: [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry
  2021-01-20 12:00 ` [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry Wei Liu
@ 2021-01-26  1:22   ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  1:22 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will soon need to access fields inside the MSI address and MSI data
> fields. Introduce hv_msi_address_register and hv_msi_data_register.
> 
> Fix up one user of hv_msi_entry in mshyperv.h.
> 
> No functional change expected.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
>  arch/x86/include/asm/mshyperv.h   |  4 ++--
>  include/asm-generic/hyperv-tlfs.h | 28 ++++++++++++++++++++++++++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 4e590a167160..cbee72550a12 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -257,8 +257,8 @@ static inline void hv_apic_init(void) {}
>  static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>  					      struct msi_desc *msi_desc)
>  {
> -	msi_entry->address = msi_desc->msg.address_lo;
> -	msi_entry->data = msi_desc->msg.data;
> +	msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
> +	msi_entry->data.as_uint32 = msi_desc->msg.data;
>  }
> 
>  #else /* CONFIG_HYPERV */
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index ec53570102f0..7e103be42799 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -480,12 +480,36 @@ struct hv_create_vp {
>  	u64 flags;
>  } __packed;
> 
> +union hv_msi_address_register {
> +	u32 as_uint32;
> +	struct {
> +		u32 reserved1:2;
> +		u32 destination_mode:1;
> +		u32 redirection_hint:1;
> +		u32 reserved2:8;
> +		u32 destination_id:8;
> +		u32 msi_base:12;
> +	};
> +} __packed;
> +
> +union hv_msi_data_register {
> +	u32 as_uint32;
> +	struct {
> +		u32 vector:8;
> +		u32 delivery_mode:3;
> +		u32 reserved1:3;
> +		u32 level_assert:1;
> +		u32 trigger_mode:1;
> +		u32 reserved2:16;
> +	};
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>  	u64 as_uint64;
>  	struct {
> -		u32 address;
> -		u32 data;
> +		union hv_msi_address_register address;
> +		union hv_msi_data_register data;
>  	} __packed;
>  };
> 
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

* RE: [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry
  2021-01-20 12:00 ` [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry Wei Liu
@ 2021-01-26  1:23   ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  1:23 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, Rob Herring,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Lorenzo Pieralisi, Bjorn Helgaas, Arnd Bergmann,
	open list:PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will soon use the same structure to handle IO-APIC interrupts as
> well. Introduce an enum to identify the source and a data structure for
> IO-APIC RTE.
> 
> While at it, update pci-hyperv.c to use the enum.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Acked-by: Rob Herring <robh@kernel.org>
> ---
>  drivers/pci/controller/pci-hyperv.c |  2 +-
>  include/asm-generic/hyperv-tlfs.h   | 36 +++++++++++++++++++++++++++--
>  2 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6db8d96a78eb..87aa62ee0368 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data)
>  	params = &hbus->retarget_msi_interrupt_params;
>  	memset(params, 0, sizeof(*params));
>  	params->partition_id = HV_PARTITION_ID_SELF;
> -	params->int_entry.source = 1; /* MSI(-X) */
> +	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
>  	hv_set_msi_entry_from_desc(&params->int_entry.msi_entry, msi_desc);
>  	params->device_id = (hbus->hdev->dev_instance.b[5] << 24) |
>  			   (hbus->hdev->dev_instance.b[4] << 16) |
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 7e103be42799..8423bf53c237 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -480,6 +480,11 @@ struct hv_create_vp {
>  	u64 flags;
>  } __packed;
> 
> +enum hv_interrupt_source {
> +	HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */
> +	HV_INTERRUPT_SOURCE_IOAPIC,
> +};
> +
>  union hv_msi_address_register {
>  	u32 as_uint32;
>  	struct {
> @@ -513,10 +518,37 @@ union hv_msi_entry {
>  	} __packed;
>  };
> 
> +union hv_ioapic_rte {
> +	u64 as_uint64;
> +
> +	struct {
> +		u32 vector:8;
> +		u32 delivery_mode:3;
> +		u32 destination_mode:1;
> +		u32 delivery_status:1;
> +		u32 interrupt_polarity:1;
> +		u32 remote_irr:1;
> +		u32 trigger_mode:1;
> +		u32 interrupt_mask:1;
> +		u32 reserved1:15;
> +
> +		u32 reserved2:24;
> +		u32 destination_id:8;
> +	};
> +
> +	struct {
> +		u32 low_uint32;
> +		u32 high_uint32;
> +	};
> +} __packed;
> +
>  struct hv_interrupt_entry {
> -	u32 source;			/* 1 for MSI(-X) */
> +	u32 source;
>  	u32 reserved1;
> -	union hv_msi_entry msi_entry;
> +	union {
> +		union hv_msi_entry msi_entry;
> +		union hv_ioapic_rte ioapic_rte;
> +	};
>  } __packed;
> 
>  /*
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


* RE: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-01-20 12:00 ` [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures Wei Liu
@ 2021-01-26  1:26   ` Michael Kelley
  2021-02-02 17:02     ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  1:26 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will need to identify the device we want Microsoft Hypervisor to
> manipulate.  Introduce the data structures for that purpose.
> 
> They will be used in a later patch.
> 
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
>  include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
>  1 file changed, 79 insertions(+)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 8423bf53c237..42ff1326c6bd 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
>  	} element[];
>  } __packed;
> 
> +enum hv_device_type {
> +	HV_DEVICE_TYPE_LOGICAL = 0,
> +	HV_DEVICE_TYPE_PCI = 1,
> +	HV_DEVICE_TYPE_IOAPIC = 2,
> +	HV_DEVICE_TYPE_ACPI = 3,
> +};
> +
> +typedef u16 hv_pci_rid;
> +typedef u16 hv_pci_segment;
> +typedef u64 hv_logical_device_id;
> +union hv_pci_bdf {
> +	u16 as_uint16;
> +
> +	struct {
> +		u8 function:3;
> +		u8 device:5;
> +		u8 bus;
> +	};
> +} __packed;
> +
> +union hv_pci_bus_range {
> +	u16 as_uint16;
> +
> +	struct {
> +		u8 subordinate_bus;
> +		u8 secondary_bus;
> +	};
> +} __packed;
> +
> +union hv_device_id {
> +	u64 as_uint64;
> +
> +	struct {
> +		u64 :62;
> +		u64 device_type:2;
> +	};

Are the above 4 lines extraneous junk?  If not, a comment would be
helpful.  And we would normally label the 62-bit field as "reserved0"
or something similar.

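If the anonymous struct is meant as a type-independent view of the
device_type bits, something like this would make that clear.  It is a
cut-down user-space model (the union name is hypothetical), and the bit
positions assume GCC/Clang bit-field layout on a little-endian target.

```c
#include <stdint.h>

union hv_device_id_model {
	uint64_t as_uint64;

	/* Common view: device_type sits in the top two bits for every device type. */
	struct {
		uint64_t reserved0:62;
		uint64_t device_type:2;
	};
} __attribute__((packed));
```

Under those assumptions, setting device_type to HV_DEVICE_TYPE_PCI (1)
on a zeroed union gives as_uint64 == 1ULL << 62.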
> +
> +	/* HV_DEVICE_TYPE_LOGICAL */
> +	struct {
> +		u64 id:62;
> +		u64 device_type:2;
> +	} logical;
> +
> +	/* HV_DEVICE_TYPE_PCI */
> +	struct {
> +		union {
> +			hv_pci_rid rid;
> +			union hv_pci_bdf bdf;
> +		};
> +
> +		hv_pci_segment segment;
> +		union hv_pci_bus_range shadow_bus_range;
> +
> +		u16 phantom_function_bits:2;
> +		u16 source_shadow:1;
> +
> +		u16 rsvdz0:11;
> +		u16 device_type:2;
> +	} pci;
> +
> +	/* HV_DEVICE_TYPE_IOAPIC */
> +	struct {
> +		u8 ioapic_id;
> +		u8 rsvdz0;
> +		u16 rsvdz1;
> +		u16 rsvdz2;
> +
> +		u16 rsvdz3:14;
> +		u16 device_type:2;
> +	} ioapic;
> +
> +	/* HV_DEVICE_TYPE_ACPI */
> +	struct {
> +		u32 input_mapping_base;
> +		u32 input_mapping_count:30;
> +		u32 device_type:2;
> +	} acpi;
> +} __packed;
> +
>  #endif
> --
> 2.20.1
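
For readers following the question about the unnamed 62-bit field: the
snippet below is a minimal userspace sketch, not the kernel definition.
It assumes GCC or Clang on a little-endian target (bit-field layout is
implementation-defined), and the demo_* names are hypothetical.  It shows
why the anonymous struct is useful: every variant keeps its type tag in
the top two bits, so the tag can be read or written without knowing which
variant is in use.

```c
#include <stdint.h>

/* Mirrors enum hv_device_type from the patch. */
enum hv_device_type {
	HV_DEVICE_TYPE_LOGICAL = 0,
	HV_DEVICE_TYPE_PCI = 1,
	HV_DEVICE_TYPE_IOAPIC = 2,
	HV_DEVICE_TYPE_ACPI = 3,
};

/* Simplified, hypothetical stand-in for union hv_device_id. */
union demo_device_id {
	uint64_t as_uint64;

	/* Common view: only the type tag is meaningful here. */
	struct {
		uint64_t reserved0:62;
		uint64_t device_type:2;
	};

	/* HV_DEVICE_TYPE_LOGICAL */
	struct {
		uint64_t id:62;
		uint64_t device_type:2;
	} logical;
};

_Static_assert(sizeof(union demo_device_id) == sizeof(uint64_t),
	       "device ID must stay 64 bits wide");

/* Build a logical device ID; the tag is written through the common view. */
static inline union demo_device_id demo_make_logical_id(uint64_t id)
{
	union demo_device_id dev_id;

	dev_id.as_uint64 = 0;
	dev_id.logical.id = id;
	dev_id.device_type = HV_DEVICE_TYPE_LOGICAL;
	return dev_id;
}

/* Read the type tag without caring which variant was stored. */
static inline unsigned int demo_type_of(union demo_device_id dev_id)
{
	return (unsigned int)dev_id.device_type;
}
```

Naming the field reserved0, as suggested in the review, costs nothing
and makes the intent of the common view explicit.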


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts
  2021-01-20 12:00 ` [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts Wei Liu
@ 2021-01-26  1:27   ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-01-26  1:27 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 13 +++++++++++
>  include/asm-generic/hyperv-tlfs.h  | 36 ++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 204010350604..ab7d6cde548d 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -533,6 +533,19 @@ struct hv_partition_assist_pg {
>  	u32 tlb_lock_count;
>  };
> 
> +enum hv_interrupt_type {
> +	HV_X64_INTERRUPT_TYPE_FIXED             = 0x0000,
> +	HV_X64_INTERRUPT_TYPE_LOWESTPRIORITY    = 0x0001,
> +	HV_X64_INTERRUPT_TYPE_SMI               = 0x0002,
> +	HV_X64_INTERRUPT_TYPE_REMOTEREAD        = 0x0003,
> +	HV_X64_INTERRUPT_TYPE_NMI               = 0x0004,
> +	HV_X64_INTERRUPT_TYPE_INIT              = 0x0005,
> +	HV_X64_INTERRUPT_TYPE_SIPI              = 0x0006,
> +	HV_X64_INTERRUPT_TYPE_EXTINT            = 0x0007,
> +	HV_X64_INTERRUPT_TYPE_LOCALINT0         = 0x0008,
> +	HV_X64_INTERRUPT_TYPE_LOCALINT1         = 0x0009,
> +	HV_X64_INTERRUPT_TYPE_MAXIMUM           = 0x000A,
> +};
> 
>  #include <asm-generic/hyperv-tlfs.h>
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 42ff1326c6bd..07efe0131fe3 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -152,6 +152,8 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_RETRIEVE_DEBUG_DATA		0x006a
>  #define HVCALL_RESET_DEBUG_SESSION		0x006b
>  #define HVCALL_ADD_LOGICAL_PROCESSOR		0x0076
> +#define HVCALL_MAP_DEVICE_INTERRUPT		0x007c
> +#define HVCALL_UNMAP_DEVICE_INTERRUPT		0x007d
>  #define HVCALL_RETARGET_INTERRUPT		0x007e
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> @@ -702,4 +704,38 @@ union hv_device_id {
>  	} acpi;
>  } __packed;
> 
> +enum hv_interrupt_trigger_mode {
> +	HV_INTERRUPT_TRIGGER_MODE_EDGE = 0,
> +	HV_INTERRUPT_TRIGGER_MODE_LEVEL = 1,
> +};
> +
> +struct hv_device_interrupt_descriptor {
> +	u32 interrupt_type;
> +	u32 trigger_mode;
> +	u32 vector_count;
> +	u32 reserved;
> +	struct hv_device_interrupt_target target;
> +} __packed;
> +
> +struct hv_input_map_device_interrupt {
> +	u64 partition_id;
> +	u64 device_id;
> +	u64 flags;
> +	struct hv_interrupt_entry logical_interrupt_entry;
> +	struct hv_device_interrupt_descriptor interrupt_descriptor;
> +} __packed;
> +
> +struct hv_output_map_device_interrupt {
> +	struct hv_interrupt_entry interrupt_entry;
> +} __packed;
> +
> +struct hv_input_unmap_device_interrupt {
> +	u64 partition_id;
> +	u64 device_id;
> +	struct hv_interrupt_entry interrupt_entry;
> +} __packed;
> +
> +#define HV_SOURCE_SHADOW_NONE               0x0
> +#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE   0x1
> +
>  #endif
> --
> 2.20.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition
  2021-01-20 16:03   ` Pavel Tatashin
@ 2021-01-26 15:06     ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-26 15:06 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Michael Kelley, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Wed, Jan 20, 2021 at 11:03:18AM -0500, Pavel Tatashin wrote:
> On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
> >
> > For now we can use the privilege flag to check. Stash the value to be
> > used later.
> >
> > Put in a bunch of defines for future use when we want to have more
> > fine-grained detection.
> >
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> > v3: move hv_root_partition to mshyperv.c
> > ---
> >  arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
> >  arch/x86/include/asm/mshyperv.h    |  2 ++
> >  arch/x86/kernel/cpu/mshyperv.c     | 20 ++++++++++++++++++++
> >  3 files changed, 32 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> > index 6bf42aed387e..204010350604 100644
> > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > @@ -21,6 +21,7 @@
> >  #define HYPERV_CPUID_FEATURES                  0x40000003
> >  #define HYPERV_CPUID_ENLIGHTMENT_INFO          0x40000004
> >  #define HYPERV_CPUID_IMPLEMENT_LIMITS          0x40000005
> > +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES   0x40000007
> >  #define HYPERV_CPUID_NESTED_FEATURES           0x4000000A
> >
> >  #define HYPERV_CPUID_VIRT_STACK_INTERFACE      0x40000081
> > @@ -110,6 +111,15 @@
> >  /* Recommend using enlightened VMCS */
> >  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED            BIT(14)
> >
> > +/*
> > + * CPU management features identification.
> > + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> > + */
> > +#define HV_X64_START_LOGICAL_PROCESSOR                 BIT(0)
> > +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR           BIT(1)
> > +#define HV_X64_PERFORMANCE_COUNTER_SYNC                        BIT(2)
> > +#define HV_X64_RESERVED_IDENTITY_BIT                   BIT(31)
> > +
> >  /*
> >   * Virtual processor will never share a physical core with another virtual
> >   * processor, except for virtual processors that are reported as sibling SMT
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index ffc289992d1b..ac2b0d110f03 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
> >                 struct hv_guest_mapping_flush_list *flush,
> >                 u64 start_gfn, u64 end_gfn);
> >
> > +extern bool hv_root_partition;
> > +
> >  #ifdef CONFIG_X86_64
> >  void hv_apic_init(void);
> >  void __init hv_init_spinlocks(void);
> > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> > index f628e3dc150f..c376d191a260 100644
> > --- a/arch/x86/kernel/cpu/mshyperv.c
> > +++ b/arch/x86/kernel/cpu/mshyperv.c
> > @@ -32,6 +32,10 @@
> >  #include <asm/nmi.h>
> >  #include <clocksource/hyperv_timer.h>
> >
> > +/* Is Linux running as the root partition? */
> > +bool hv_root_partition;
> > +EXPORT_SYMBOL_GPL(hv_root_partition);
> > +
> >  struct ms_hyperv_info ms_hyperv;
> >  EXPORT_SYMBOL_GPL(ms_hyperv);
> >
> > @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
> >         pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
> >                  ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
> >
> > +       /*
> > +        * Check CPU management privilege.
> > +        *
> > +        * To mirror what Windows does we should extract CPU management
> > +        * features and use the ReservedIdentityBit to detect if Linux is the
> > +        * root partition. But that requires negotiating CPU management
> > +        * interface (a process to be finalized).
> 
> Is this comment relevant? Do we have to mirror what Windows does?
> 

We should do that in the future when the process for negotiating CPU
management features is stabilized / finalized.

> > +        *
> > +        * For now, use the privilege flag as the indicator for running as
> > +        * root.
> > +        */
> > +       if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> > +               hv_root_partition = true;
> > +               pr_info("Hyper-V: running as root partition\n");
> > +       }
> > +
> 
> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

Thanks for reviewing these patches.

Wei.
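
A minimal sketch of the privilege-flag check discussed above, written as
a pure function so it can be exercised outside the kernel.  The bit
position used for the CPU management privilege (bit 12 of EBX) is an
assumption, since the quoted hunks do not show the definition of
HV_CPU_MANAGEMENT; the demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf from the patch: hypervisor feature identification. */
#define HYPERV_CPUID_FEATURES		0x40000003u

/* Assumed position of the CPU management privilege flag in EBX. */
#define DEMO_HV_CPU_MANAGEMENT		(UINT32_C(1) << 12)

/*
 * features_ebx is the EBX value returned by CPUID leaf
 * HYPERV_CPUID_FEATURES.  For now, holding the CPU management
 * privilege is treated as "running as the root partition".
 */
static inline bool demo_is_root_partition(uint32_t features_ebx)
{
	return (features_ebx & DEMO_HV_CPU_MANAGEMENT) != 0;
}
```

Once the CPU management interface negotiation is finalized, a check
based on HYPERV_CPUID_CPU_MANAGEMENT_FEATURES could replace the body of
this helper without touching its callers.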

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition
  2021-01-26  0:31   ` Michael Kelley
@ 2021-01-26 15:15     ` Wei Liu
  2021-01-26 15:24       ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-01-26 15:15 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Tue, Jan 26, 2021 at 12:31:31AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > For now we can use the privilege flag to check. Stash the value to be
> > used later.
> > 
> > Put in a bunch of defines for future use when we want to have more
> > fine-grained detection.
> > 
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> > v3: move hv_root_partition to mshyperv.c
> > ---
> >  arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
> >  arch/x86/include/asm/mshyperv.h    |  2 ++
> >  arch/x86/kernel/cpu/mshyperv.c     | 20 ++++++++++++++++++++
> >  3 files changed, 32 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> > index 6bf42aed387e..204010350604 100644
> > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > @@ -21,6 +21,7 @@
> >  #define HYPERV_CPUID_FEATURES			0x40000003
> >  #define HYPERV_CPUID_ENLIGHTMENT_INFO		0x40000004
> >  #define HYPERV_CPUID_IMPLEMENT_LIMITS		0x40000005
> > +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES	0x40000007
> >  #define HYPERV_CPUID_NESTED_FEATURES		0x4000000A
> > 
> >  #define HYPERV_CPUID_VIRT_STACK_INTERFACE	0x40000081
> > @@ -110,6 +111,15 @@
> >  /* Recommend using enlightened VMCS */
> >  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
> > 
> > +/*
> > + * CPU management features identification.
> > + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> > + */
> > +#define HV_X64_START_LOGICAL_PROCESSOR			BIT(0)
> > +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR		BIT(1)
> > +#define HV_X64_PERFORMANCE_COUNTER_SYNC			BIT(2)
> > +#define HV_X64_RESERVED_IDENTITY_BIT			BIT(31)
> > +
> 
> I wonder if these bit definitions should go in the asm-generic part of
> hyperv-tlfs.h instead of the X64 specific part.  They look very architecture
> neutral (in which case the X64 should be dropped from the name
> as well).  Of course, they can be moved later when/if we get to that point
> and have a firmer understanding of what is and isn't arch neutral.

Yes. This is the approach I'm taking here. They can be easily moved in
the future if there is a need.

> 
> >  /*
> >   * Virtual processor will never share a physical core with another virtual
> >   * processor, except for virtual processors that are reported as sibling SMT
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index ffc289992d1b..ac2b0d110f03 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
> >  		struct hv_guest_mapping_flush_list *flush,
> >  		u64 start_gfn, u64 end_gfn);
> > 
> > +extern bool hv_root_partition;
> > +
> >  #ifdef CONFIG_X86_64
> >  void hv_apic_init(void);
> >  void __init hv_init_spinlocks(void);
> > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> > index f628e3dc150f..c376d191a260 100644
> > --- a/arch/x86/kernel/cpu/mshyperv.c
> > +++ b/arch/x86/kernel/cpu/mshyperv.c
> > @@ -32,6 +32,10 @@
> >  #include <asm/nmi.h>
> >  #include <clocksource/hyperv_timer.h>
> > 
> > +/* Is Linux running as the root partition? */
> > +bool hv_root_partition;
> > +EXPORT_SYMBOL_GPL(hv_root_partition);
> > +
> >  struct ms_hyperv_info ms_hyperv;
> >  EXPORT_SYMBOL_GPL(ms_hyperv);
> > 
> > @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
> >  	pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
> >  		 ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
> > 
> > +	/*
> > +	 * Check CPU management privilege.
> > +	 *
> > +	 * To mirror what Windows does we should extract CPU management
> > +	 * features and use the ReservedIdentityBit to detect if Linux is the
> > +	 * root partition. But that requires negotiating CPU management
> > +	 * interface (a process to be finalized).
> > +	 *
> > +	 * For now, use the privilege flag as the indicator for running as
> > +	 * root.
> > +	 */
> > +	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> 
> Should the EBX value be captured in the ms_hyperv structure with the
> other similar values, and then used from there?
> 

There is only one usage of this in this whole series so I didn't bother
capturing. I would also like to clean up ms_hyperv_info's fields a bit.

Given there are quite a few patches pending which change the
ms_hyperv_info struct, I would like to avoid creating more conflicts
than necessary.

My plan is to implement my idea from the thread "Field names inside
ms_hyperv_info" once all patches that touch ms_hyperv_info are merged.

Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if running as root
  2021-01-20 16:13   ` Pavel Tatashin
@ 2021-01-26 15:19     ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-26 15:19 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Michael Kelley, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, Daniel Lezcano,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Thomas Gleixner

On Wed, Jan 20, 2021 at 11:13:28AM -0500, Pavel Tatashin wrote:
> On Wed, Jan 20, 2021 at 7:01 AM Wei Liu <wei.liu@kernel.org> wrote:
> >
> > When Linux runs as the root partition, the setup required for TSC page
> > is different.
> 
> Why would we need a TSC page as a clock source for root partition at
> all? I think the above can be removed.
> 

The TSC page is considered superior to the MSR-based clock. In the
future we may want to switch back to the TSC page instead.

I think it provides more context than without.

Wei.
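
Some background on why the TSC page is usually preferred: reading the
reference counter MSR involves the hypervisor on every read, while the
TSC page lets the reader compute the same value locally from the TSC.
The sketch below shows that computation under stated assumptions: the
struct is a simplified, hypothetical stand-in for the real page layout,
unsigned __int128 is a GCC/Clang extension, and the sequence re-check a
real reader needs to detect concurrent updates is left out.

```c
#include <stdint.h>

/* Simplified, hypothetical view of the reference TSC page. */
struct demo_tsc_page {
	uint32_t tsc_sequence;	/* 0 means the page is not usable */
	uint64_t tsc_scale;	/* 64.64 fixed-point multiplier */
	int64_t tsc_offset;	/* added after scaling */
};

/*
 * Compute the reference time for a raw TSC value.
 * Returns 1 on success, or 0 when the page is invalid and the caller
 * should fall back to the MSR-based clock.
 */
static inline int demo_tsc_page_time(const struct demo_tsc_page *pg,
				     uint64_t tsc, uint64_t *time)
{
	unsigned __int128 scaled;

	if (pg->tsc_sequence == 0)
		return 0;

	/* time = ((tsc * scale) >> 64) + offset */
	scaled = (unsigned __int128)tsc * pg->tsc_scale;
	*time = (uint64_t)(scaled >> 64) + (uint64_t)pg->tsc_offset;
	return 1;
}
```

The fallback path is the part this patch relies on: when Linux runs as
root, the page is simply not used and the MSR is read instead.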

>  Luckily Linux also has access to the MSR based
> > clocksource. We can just disable the TSC page clocksource if Linux is
> > the root partition.
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition
  2021-01-26 15:15     ` Wei Liu
@ 2021-01-26 15:24       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-26 15:24 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Tue, Jan 26, 2021 at 03:15:12PM +0000, Wei Liu wrote:
> On Tue, Jan 26, 2021 at 12:31:31AM +0000, Michael Kelley wrote:
> > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > > 
> > > For now we can use the privilege flag to check. Stash the value to be
> > > used later.
> > > 
> > > Put in a bunch of defines for future use when we want to have more
> > > fine-grained detection.
> > > 
> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > ---
> > > v3: move hv_root_partition to mshyperv.c
> > > ---
> > >  arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
> > >  arch/x86/include/asm/mshyperv.h    |  2 ++
> > >  arch/x86/kernel/cpu/mshyperv.c     | 20 ++++++++++++++++++++
> > >  3 files changed, 32 insertions(+)
> > > 
> > > diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> > > index 6bf42aed387e..204010350604 100644
> > > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > > @@ -21,6 +21,7 @@
> > >  #define HYPERV_CPUID_FEATURES			0x40000003
> > >  #define HYPERV_CPUID_ENLIGHTMENT_INFO		0x40000004
> > >  #define HYPERV_CPUID_IMPLEMENT_LIMITS		0x40000005
> > > +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES	0x40000007
> > >  #define HYPERV_CPUID_NESTED_FEATURES		0x4000000A
> > > 
> > >  #define HYPERV_CPUID_VIRT_STACK_INTERFACE	0x40000081
> > > @@ -110,6 +111,15 @@
> > >  /* Recommend using enlightened VMCS */
> > >  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
> > > 
> > > +/*
> > > + * CPU management features identification.
> > > + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> > > + */
> > > +#define HV_X64_START_LOGICAL_PROCESSOR			BIT(0)
> > > +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR		BIT(1)
> > > +#define HV_X64_PERFORMANCE_COUNTER_SYNC			BIT(2)
> > > +#define HV_X64_RESERVED_IDENTITY_BIT			BIT(31)
> > > +
> > 
> > I wonder if these bit definitions should go in the asm-generic part of
> > hyperv-tlfs.h instead of the X64 specific part.  They look very architecture
> > neutral (in which case the X64 should be dropped from the name
> > as well).  Of course, they can be moved later when/if we get to that point
> > and have a firmer understanding of what is and isn't arch neutral.
> 
> Yes. This is the approach I'm taking here. They can be easily moved in
> the future if there is a need.
> 
> > 
> > >  /*
> > >   * Virtual processor will never share a physical core with another virtual
> > >   * processor, except for virtual processors that are reported as sibling SMT
> > > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > > index ffc289992d1b..ac2b0d110f03 100644
> > > --- a/arch/x86/include/asm/mshyperv.h
> > > +++ b/arch/x86/include/asm/mshyperv.h
> > > @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
> > >  		struct hv_guest_mapping_flush_list *flush,
> > >  		u64 start_gfn, u64 end_gfn);
> > > 
> > > +extern bool hv_root_partition;
> > > +
> > >  #ifdef CONFIG_X86_64
> > >  void hv_apic_init(void);
> > >  void __init hv_init_spinlocks(void);
> > > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> > > index f628e3dc150f..c376d191a260 100644
> > > --- a/arch/x86/kernel/cpu/mshyperv.c
> > > +++ b/arch/x86/kernel/cpu/mshyperv.c
> > > @@ -32,6 +32,10 @@
> > >  #include <asm/nmi.h>
> > >  #include <clocksource/hyperv_timer.h>
> > > 
> > > +/* Is Linux running as the root partition? */
> > > +bool hv_root_partition;
> > > +EXPORT_SYMBOL_GPL(hv_root_partition);
> > > +
> > >  struct ms_hyperv_info ms_hyperv;
> > >  EXPORT_SYMBOL_GPL(ms_hyperv);
> > > 
> > > @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
> > >  	pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
> > >  		 ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
> > > 
> > > +	/*
> > > +	 * Check CPU management privilege.
> > > +	 *
> > > +	 * To mirror what Windows does we should extract CPU management
> > > +	 * features and use the ReservedIdentityBit to detect if Linux is the
> > > +	 * root partition. But that requires negotiating CPU management
> > > +	 * interface (a process to be finalized).
> > > +	 *
> > > +	 * For now, use the privilege flag as the indicator for running as
> > > +	 * root.
> > > +	 */
> > > +	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> > 
> > Should the EBX value be captured in the ms_hyperv structure with the
> > other similar values, and then used from there?
> > 
> 
> There is only one usage of this in this whole series so I didn't bother
> capturing. I would also like to clean up ms_hyperv_info's fields a bit.

Correction: there are two patches that use this. But the rest of my
argument stands.

> 
> Given there are quite some patches pending which change ms_hyperv_info
> struct, I would like to avoid creating more conflicts than necessary.
> 
> My plan is to implement my idea from the thread "Field names inside
> ms_hyperv_info" once all patches that touch ms_hyperv_info are merged.
> 
> Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required
  2021-01-26  0:41   ` Michael Kelley
@ 2021-01-26 18:09     ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-01-26 18:09 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, Lillian Grassin-Drake,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Tue, Jan 26, 2021 at 12:41:05AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > When Linux runs as the root partition, it will need to make hypercalls
> > which return data from the hypervisor.
> > 
> > Allocate pages for storing results when Linux runs as the root
> > partition.
> > 
> > Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> > Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> > v3: Fix hv_cpu_die to use free_pages.
> > v2: Address Vitaly's comments
> > ---
> >  arch/x86/hyperv/hv_init.c       | 35 ++++++++++++++++++++++++++++-----
> >  arch/x86/include/asm/mshyperv.h |  1 +
> >  2 files changed, 31 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index e04d90af4c27..6f4cb40e53fe 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> >  void  __percpu **hyperv_pcpu_input_arg;
> >  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> > 
> > +void  __percpu **hyperv_pcpu_output_arg;
> > +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> > +
> >  u32 hv_max_vp_index;
> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> > 
> > @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
> >  	void **input_arg;
> >  	struct page *pg;
> > 
> > -	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> >  	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> > -	pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> > +	pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
> >  	if (unlikely(!pg))
> >  		return -ENOMEM;
> > +
> > +	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> >  	*input_arg = page_address(pg);
> > +	if (hv_root_partition) {
> > +		void **output_arg;
> > +
> > +		output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> > +		*output_arg = page_address(pg + 1);
> > +	}
> > 
> >  	hv_get_vp_index(msr_vp_index);
> > 
> > @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
> >  	unsigned int new_cpu;
> >  	unsigned long flags;
> >  	void **input_arg;
> > -	void *input_pg = NULL;
> > +	void *pg;
> > 
> >  	local_irq_save(flags);
> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	input_pg = *input_arg;
> > +	pg = *input_arg;
> >  	*input_arg = NULL;
> > +
> > +	if (hv_root_partition) {
> > +		void **output_arg;
> > +
> > +		output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> > +		*output_arg = NULL;
> > +	}
> > +
> >  	local_irq_restore(flags);
> > -	free_page((unsigned long)input_pg);
> > +
> > +	free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
> > 
> >  	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
> >  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> > @@ -346,6 +365,12 @@ void __init hyperv_init(void)
> > 
> >  	BUG_ON(hyperv_pcpu_input_arg == NULL);
> > 
> > +	/* Allocate the per-CPU state for output arg for root */
> > +	if (hv_root_partition) {
> > +		hyperv_pcpu_output_arg = alloc_percpu(void *);
> > +		BUG_ON(hyperv_pcpu_output_arg == NULL);
> > +	}
> > +
> >  	/* Allocate percpu VP index */
> >  	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> >  				    GFP_KERNEL);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index ac2b0d110f03..62d9390f1ddf 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
> >  #if IS_ENABLED(CONFIG_HYPERV)
> >  extern void *hv_hypercall_pg;
> >  extern void  __percpu  **hyperv_pcpu_input_arg;
> > +extern void  __percpu  **hyperv_pcpu_output_arg;
> > 
> >  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> >  {
> > --
> > 2.20.1
> 
> I think this all works OK.  But a meta question:  Do we need a separate
> per-cpu output argument page?  From the Hyper-V hypercall standpoint, I
> don't think input and output args need to be in separate pages.  They both

That's correct. They don't have to be in separate pages.

> just need to not cross a page boundary.  As long as we don't have a hypercall
> where the sum of the sizes of the input and output args exceeds a page,
> we could just have a single page, and split it up in any manner that works
> for the particular hypercall.
> 

There is one more requirement: The pointers must be 8-byte aligned. That
means we may need to explicitly pad things a bit. That quickly becomes
tedious if we do it in every call site; or we will need to provide a
macro to do the calculation correctly.

Another consideration is hypercalls that take variable-length input /
output. Admittedly I haven't seen one that takes variable-length
arguments and needs to do input and output at the same time, but I
wouldn't want to paint ourselves into a corner now, because sizing
variable-length input and output at the same time can be non-trivial.

Wei.
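
To make the single-page alternative concrete, here is a minimal sketch
of the kind of helper every call site would need.  It is not part of the
series; the demo_* names are hypothetical, and a 4096-byte hypervisor
page is assumed.  The helper places the output area right after the
input, rounded up to an 8-byte boundary, and refuses the split if the
two areas do not fit in one page.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HV_PAGE_SIZE	4096u	/* assumed hypervisor page size */
#define DEMO_HV_ARG_ALIGN	8u	/* hypercall arguments are 8-byte aligned */

/* Round n up to the next multiple of align (align must be a power of two). */
static inline size_t demo_align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Carve an output area out of the page that already holds in_size
 * bytes of input at offset 0.  Returns a pointer to the output area,
 * or NULL if input plus output would cross the page boundary.
 */
static inline void *demo_output_in_page(void *page, size_t in_size,
					size_t out_size)
{
	size_t out_off = demo_align_up(in_size, DEMO_HV_ARG_ALIGN);

	if (out_off > DEMO_HV_PAGE_SIZE ||
	    out_size > DEMO_HV_PAGE_SIZE - out_off)
		return NULL;

	return (uint8_t *)page + out_off;
}
```

With fixed-size arguments this is simple enough, but both sizes must be
known before the call, which is exactly what gets awkward for
variable-length input and output.  A dedicated output page avoids the
calculation entirely.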

> Thoughts?
> 
> Michael
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
  2021-01-20 12:00 ` [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition Wei Liu
@ 2021-01-27  5:47   ` Michael Kelley
  2021-02-02 17:31     ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-27  5:47 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped.  Linux will need to explicitly map and unmap
> interrupts for hardware.
> 
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
> 
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.

I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is built
only for 64-bit, I'm assuming running Linux in the root partition
is only functional for 64-bit builds.  So is the goal simply that 32-bit
builds will compile correctly?  Seems like maybe there should be
a CONFIG option for running Linux in the root partition, and that
option would force 64-bit.

> v2: This patch is simplified due to upstream changes.
> ---
>  arch/x86/hyperv/Makefile        |   2 +-
>  arch/x86/hyperv/hv_init.c       |   9 +
>  arch/x86/hyperv/irqdomain.c     | 332 ++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h |   2 +
>  4 files changed, 344 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/irqdomain.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y			:= hv_init.o mmu.o nested.o
> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
>  obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ad8e77859b32..1cb2f7d1850a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -484,6 +484,15 @@ void __init hyperv_init(void)
> 
>  	BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> 
> +#ifdef CONFIG_PCI_MSI
> +	/*
> +	 * If we're running as root, we want to create our own PCI MSI domain.
> +	 * We can't set this in hv_pci_init because that would be too late.
> +	 */
> +	if (hv_root_partition)
> +		x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
>  	return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index 000000000000..19637cd60231
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> +//
> +// Authors:
> +//   Sunil Muthuswamy <sunilmut@microsoft.com>
> +//   Wei Liu <wei.liu@kernel.org>

I think the // comment style should only be used for the SPDX line.

> +
> +#include <linux/pci.h>
> +#include <linux/irq.h>
> +#include <asm/mshyperv.h>
> +
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +{
> +	unsigned long flags;
> +	struct hv_input_unmap_device_interrupt *input;
> +	struct hv_interrupt_entry *intr_entry;
> +	u16 status;
> +
> +	local_irq_save(flags);
> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> +	memset(input, 0, sizeof(*input));
> +	intr_entry = &input->interrupt_entry;
> +	input->partition_id = hv_current_partition_id;
> +	input->device_id = id;
> +	*intr_entry = *old_entry;
> +
> +	status = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, 0, 0, input, NULL) &
> +			 HV_HYPERCALL_RESULT_MASK;
> +	local_irq_restore(flags);
> +
> +	return status;
> +}
> +
> +#ifdef CONFIG_PCI_MSI
> +struct rid_data {
> +	struct pci_dev *bridge;
> +	u32 rid;
> +};
> +
> +static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
> +{
> +	struct rid_data *rd = data;
> +	u8 bus = PCI_BUS_NUM(rd->rid);
> +
> +	if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
> +		rd->bridge = pdev;
> +		rd->rid = alias;
> +	}
> +
> +	return 0;
> +}
> +
> +static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +{
> +	union hv_device_id dev_id;
> +	struct rid_data data = {
> +		.bridge = NULL,
> +		.rid = PCI_DEVID(dev->bus->number, dev->devfn)
> +	};
> +
> +	pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +
> +	dev_id.as_uint64 = 0;
> +	dev_id.device_type = HV_DEVICE_TYPE_PCI;
> +	dev_id.pci.segment = pci_domain_nr(dev->bus);
> +
> +	dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> +	dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> +	dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> +	dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +
> +	if (data.bridge) {
> +		int pos;
> +
> +		/*
> +		 * Microsoft Hypervisor requires a bus range when the bridge is
> +		 * running in PCI-X mode.
> +		 *
> +		 * To distinguish conventional vs PCI-X bridge, we can check
> +		 * the bridge's PCI-X Secondary Status Register, Secondary Bus
> +		 * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
> +		 * Specification Revision 1.0 5.2.2.1.3.
> +		 *
> +		 * Value zero means it is in conventional mode, otherwise it is
> +		 * in PCI-X mode.
> +		 */
> +
> +		pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
> +		if (pos) {
> +			u16 status;
> +
> +			pci_read_config_word(data.bridge, pos +
> +					PCI_X_BRIDGE_SSTATUS, &status);
> +
> +			if (status & PCI_X_SSTATUS_FREQ) {
> +				/* Non-zero, PCI-X mode */
> +				u8 sec_bus, sub_bus;
> +
> +				dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
> +
> +				pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
> +				dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
> +				pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
> +				dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
> +			}
> +		}
> +	}
> +
> +	return dev_id;
> +}
> +
> +static int hv_map_msi_interrupt(struct pci_dev *dev, int vcpu, int vector,
> +				struct hv_interrupt_entry *entry)
> +{
> +	struct hv_input_map_device_interrupt *input;
> +	struct hv_output_map_device_interrupt *output;
> +	struct hv_device_interrupt_descriptor *intr_desc;
> +	unsigned long flags;
> +	u16 status;
> +
> +	local_irq_save(flags);
> +
> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> +	intr_desc = &input->interrupt_descriptor;
> +	memset(input, 0, sizeof(*input));
> +	input->partition_id = hv_current_partition_id;
> +	input->device_id = hv_build_pci_dev_id(dev).as_uint64;
> +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> +	intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +	intr_desc->vector_count = 1;
> +	intr_desc->target.vector = vector;
> +	__set_bit(vcpu, (unsigned long*)&intr_desc->target.vp_mask);

This is using the CPU bitmap format that supports up to 64 vCPUs.  Any reason not
to use the format that supports a larger number of CPUs?   In either case, perhaps
a check for the value of vcpu against the max of 64 (or the larger number if you
change the bitmap format) would be appropriate.
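A minimal userspace sketch of the range check suggested above, assuming the
single 64-bit mask format is kept. VP_MASK_BITS and vp_mask_set() are
hypothetical names used only for illustration; they are not part of the patch
or of any kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: one 64-bit mask, so only VP indices 0..63 are addressable. */
#define VP_MASK_BITS 64

/*
 * Set the bit for vp_index in *mask. Returns 0 on success, or -EINVAL if
 * the index does not fit; *mask is left untouched in that case.
 */
static int vp_mask_set(uint64_t *mask, int vp_index)
{
	if (vp_index < 0 || vp_index >= VP_MASK_BITS)
		return -EINVAL;

	*mask |= UINT64_C(1) << vp_index;
	return 0;
}
```

With a check like this, the caller can fail the mapping cleanly instead of
setting a bit outside the mask.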

> +
> +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> +			 HV_HYPERCALL_RESULT_MASK;
> +	*entry = output->interrupt_entry;
> +
> +	local_irq_restore(flags);
> +
> +	if (status != HV_STATUS_SUCCESS)
> +		pr_err("%s: hypercall failed, status %d\n", __func__, status);
> +
> +	return status;
> +}
> +
> +static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
> +{
> +	/* High address is always 0 */
> +	msg->address_hi = 0;
> +	msg->address_lo = entry->msi_entry.address.as_uint32;
> +	msg->data = entry->msi_entry.data.as_uint32;
> +}
> +
> +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
> +static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +	struct msi_desc *msidesc;
> +	struct pci_dev *dev;
> +	struct hv_interrupt_entry out_entry, *stored_entry;
> +	struct irq_cfg *cfg = irqd_cfg(data);
> +	struct cpumask *affinity;
> +	int cpu, vcpu;
> +	u16 status;
> +
> +	msidesc = irq_data_get_msi_desc(data);
> +	dev = msi_desc_to_pci_dev(msidesc);
> +
> +	if (!cfg) {
> +		pr_debug("%s: cfg is NULL", __func__);
> +		return;
> +	}
> +
> +	affinity = irq_data_get_effective_affinity_mask(data);
> +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> +	vcpu = hv_cpu_number_to_vp_number(cpu);
> +
> +	if (data->chip_data) {
> +		/*
> +		 * This interrupt is already mapped. Let's unmap first.
> +		 *
> +		 * We don't use retarget interrupt hypercalls here because
> +		 * Microsoft Hypervisor doesn't allow root to change the vector
> +		 * or specify VPs outside of the set that is initially used
> +		 * during mapping.
> +		 */
> +		stored_entry = data->chip_data;
> +		data->chip_data = NULL;
> +
> +		status = hv_unmap_msi_interrupt(dev, stored_entry);
> +
> +		kfree(stored_entry);
> +
> +		if (status != HV_STATUS_SUCCESS) {
> +			pr_debug("%s: failed to unmap, status %d", __func__, status);
> +			return;
> +		}
> +	}
> +
> +	stored_entry = kzalloc(sizeof(*stored_entry), GFP_ATOMIC);
> +	if (!stored_entry) {
> +		pr_debug("%s: failed to allocate chip data\n", __func__);
> +		return;
> +	}
> +
> +	status = hv_map_msi_interrupt(dev, vcpu, cfg->vector, &out_entry);
> +	if (status != HV_STATUS_SUCCESS) {
> +		kfree(stored_entry);
> +		return;
> +	}
> +
> +	*stored_entry = out_entry;
> +	data->chip_data = stored_entry;
> +	entry_to_msi_msg(&out_entry, msg);
> +
> +	return;
> +}
> +
> +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
> +{
> +	return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry)
> +		& HV_HYPERCALL_RESULT_MASK;

The masking with HV_HYPERCALL_RESULT_MASK is already done in
hv_unmap_interrupt().

> +}
> +
> +static void hv_teardown_msi_irq_common(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
> +{
> +	u16 status;
> +	struct hv_interrupt_entry old_entry;
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	struct msi_msg msg;
> +
> +	desc = irq_to_desc(irq);
> +	if (!desc) {
> +		pr_debug("%s: no irq desc\n", __func__);
> +		return;
> +	}
> +
> +	data = &desc->irq_data;
> +	if (!data) {
> +		pr_debug("%s: no irq data\n", __func__);
> +		return;
> +	}
> +
> +	if (!data->chip_data) {
> +		pr_debug("%s: no chip data!\n", __func__);
> +		return;
> +	}
> +
> +	old_entry = *(struct hv_interrupt_entry *)data->chip_data;
> +	entry_to_msi_msg(&old_entry, &msg);
> +
> +	kfree(data->chip_data);
> +	data->chip_data = NULL;
> +
> +	status = hv_unmap_msi_interrupt(dev, &old_entry);
> +
> +	if (status != HV_STATUS_SUCCESS) {
> +		pr_err("%s: hypercall failed, status %d\n", __func__, status);
> +		return;
> +	}
> +}
> +
> +static void hv_msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +{
> +	int i;
> +	struct msi_desc *entry;
> +	struct pci_dev *pdev;
> +
> +	if (WARN_ON_ONCE(!dev_is_pci(dev)))
> +		return;
> +
> +	pdev = to_pci_dev(dev);
> +
> +	for_each_pci_msi_entry(entry, pdev) {
> +		if (entry->irq) {
> +			for (i = 0; i < entry->nvec_used; i++) {
> +				hv_teardown_msi_irq_common(pdev, entry, entry->irq + i);
> +				irq_domain_free_irqs(entry->irq + i, 1);
> +			}
> +		}
> +	}
> +}
> +
> +/*
> + * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
> + * which implement the MSI or MSI-X Capability Structure.
> + */
> +static struct irq_chip hv_pci_msi_controller = {
> +	.name			= "HV-PCI-MSI",
> +	.irq_unmask		= pci_msi_unmask_irq,
> +	.irq_mask		= pci_msi_mask_irq,
> +	.irq_ack		= irq_chip_ack_parent,
> +	.irq_retrigger		= irq_chip_retrigger_hierarchy,
> +	.irq_compose_msi_msg	= hv_irq_compose_msi_msg,
> +	.irq_set_affinity	= msi_domain_set_affinity,
> +	.flags			= IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static struct msi_domain_ops pci_msi_domain_ops = {
> +	.domain_free_irqs	= hv_msi_domain_free_irqs,
> +	.msi_prepare		= pci_msi_prepare,
> +};
> +
> +static struct msi_domain_info hv_pci_msi_domain_info = {
> +	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> +			  MSI_FLAG_PCI_MSIX,
> +	.ops		= &pci_msi_domain_ops,
> +	.chip		= &hv_pci_msi_controller,
> +	.handler	= handle_edge_irq,
> +	.handler_name	= "edge",
> +};
> +
> +struct irq_domain * __init hv_create_pci_msi_domain(void)
> +{
> +	struct irq_domain *d = NULL;
> +	struct fwnode_handle *fn;
> +
> +	fn = irq_domain_alloc_named_fwnode("HV-PCI-MSI");
> +	if (fn)
> +		d = pci_msi_create_irq_domain(fn, &hv_pci_msi_domain_info, x86_vector_domain);
> +
> +	/* No point in going further if we can't get an irq domain */
> +	BUG_ON(!d);
> +
> +	return d;
> +}
> +
> +#endif /* CONFIG_PCI_MSI */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index cbee72550a12..ccc849e25d5e 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -261,6 +261,8 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>  	msi_entry->data.as_uint32 = msi_desc->msg.data;
>  }
> 
> +struct irq_domain *hv_create_pci_msi_domain(void);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> --
> 2.20.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping " Wei Liu
@ 2021-01-27  5:47   ` Michael Kelley
  2021-02-03 12:47     ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-01-27  5:47 UTC (permalink / raw)
  To: Wei Liu, Linux on Hyper-V List
  Cc: virtualization, Linux Kernel List, Vineeth Pillai,
	Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> ---
>  arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
>  3 files changed, 233 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 19637cd60231..8e2b4e478b70 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
> 
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> +{
> +	union hv_device_id device_id;
> +
> +	device_id.as_uint64 = 0;
> +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> +	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;

The masking is already done in hv_unmap_interrupt.

> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> +		struct hv_interrupt_entry *entry)
> +{
> +	unsigned long flags;
> +	struct hv_input_map_device_interrupt *input;
> +	struct hv_output_map_device_interrupt *output;
> +	union hv_device_id device_id;
> +	struct hv_device_interrupt_descriptor *intr_desc;
> +	u16 status;
> +
> +	device_id.as_uint64 = 0;
> +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> +	local_irq_save(flags);
> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +	memset(input, 0, sizeof(*input));
> +	intr_desc = &input->interrupt_descriptor;
> +	input->partition_id = hv_current_partition_id;
> +	input->device_id = device_id.as_uint64;
> +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> +	intr_desc->target.vector = vector;
> +	intr_desc->vector_count = 1;
> +
> +	if (level)
> +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> +	else
> +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> +
> +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> +			 HV_HYPERCALL_RESULT_MASK;
> +	local_irq_restore(flags);
> +
> +	*entry = output->interrupt_entry;
> +
> +	return status;

As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
mostly parallel, though some of the assignments are done in a different order.  It's a nit,
but making them as parallel as possible would be nice. :-)

Same 64 vCPU comment applies here as well.


> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
> 
>  struct irq_domain *hv_create_pci_msi_domain(void);
> 
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> +		struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index b7db6024e65c..6d35e4c303c6 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
>  	.free = hyperv_irq_remapping_free,
>  };
> 
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>  	struct fwnode_handle *fn;
>  	int i;
> +	const char *name;
> +	const struct irq_domain_ops *ops;
> 
>  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>  	    x86_init.hyper.msi_ext_dest_id() ||
> -	    !x2apic_supported() || hv_root_partition)
> +	    !x2apic_supported())

Any reason that the check for hv_root_partition was added
in patch #4  of this series, and then removed here?  Could
patch #4 just be dropped?

>  		return -ENODEV;
> 
> -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> +	if (hv_root_partition) {
> +		name = "HYPERV-ROOT-IR";
> +		ops = &hyperv_root_ir_domain_ops;
> +	} else {
> +		name = "HYPERV-IR";
> +		ops = &hyperv_ir_domain_ops;
> +	}
> +
> +	fn = irq_domain_alloc_named_id_fwnode(name, 0);
>  	if (!fn)
>  		return -ENOMEM;
> 
>  	ioapic_ir_domain =
>  		irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> -				0, IOAPIC_REMAPPING_ENTRY, fn,
> -				&hyperv_ir_domain_ops, NULL);
> +				0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
> 
>  	if (!ioapic_ir_domain) {
>  		irq_domain_free_fwnode(fn);
>  		return -ENOMEM;
>  	}
> 
> +	if (hv_root_partition)
> +		return 0; /* The rest is only relevant to guests */
> +
>  	/*
>  	 * Hyper-V doesn't provide irq remapping function for
>  	 * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> @@ -167,4 +180,162 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
>  	.enable			= hyperv_enable_irq_remapping,
>  };
> 
> +/* IRQ remapping domain when Linux runs as the root partition */
> +struct hyperv_root_ir_data {
> +	u8 ioapic_id;
> +	bool is_level;
> +	struct hv_interrupt_entry entry;
> +};
> +
> +static void
> +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> +{
> +	u16 status;
> +	u32 vector;
> +	struct irq_cfg *cfg;
> +	int ioapic_id;
> +	struct cpumask *affinity;
> +	int cpu, vcpu;
> +	struct hv_interrupt_entry entry;
> +	struct hyperv_root_ir_data *data = irq_data->chip_data;
> +	struct IO_APIC_route_entry e;
> +
> +	cfg = irqd_cfg(irq_data);
> +	affinity = irq_data_get_effective_affinity_mask(irq_data);
> +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> +	vcpu = hv_cpu_number_to_vp_number(cpu);
> +
> +	vector = cfg->vector;
> +	ioapic_id = data->ioapic_id;
> +
> +	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC

Does 'data' need to be checked to be non-NULL?  The parallel code in
hv_irq_compose_msi_msg() makes such a check.

> +	    && data->entry.ioapic_rte.as_uint64) {
> +		entry = data->entry;
> +
> +		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> +
> +		if (status != HV_STATUS_SUCCESS)
> +			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
> +
> +		data->entry.ioapic_rte.as_uint64 = 0;
> +		data->entry.source = 0; /* Invalid source */

Again comparing, hv_irq_compose_msi_msg() frees the old
entry, and then allocates a new one.   This code reuses the old entry. 
Any reason for the difference?

> +	}
> +
> +
> +	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
> +					vector, &entry);
> +
> +	if (status != HV_STATUS_SUCCESS) {
> +		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
> +		return;
> +	}
> +
> +	data->entry = entry;
> +
> +	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> +	e.w1 = entry.ioapic_rte.low_uint32;
> +	e.w2 = entry.ioapic_rte.high_uint32;
> +
> +	memset(msg, 0, sizeof(*msg));
> +	msg->arch_data.vector = e.vector;
> +	msg->arch_data.delivery_mode = e.delivery_mode;
> +	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> +	msg->arch_addr_lo.dmar_format = e.ir_format;
> +	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> +}

Having this whole function be more parallel to hv_irq_compose_msi_msg()
would be nice. :-)

> +
> +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> +		const struct cpumask *mask, bool force)
> +{
> +	struct irq_data *parent = data->parent_data;
> +	struct irq_cfg *cfg = irqd_cfg(data);
> +	int ret;
> +
> +	ret = parent->chip->irq_set_affinity(parent, mask, force);
> +	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> +		return ret;
> +
> +	send_cleanup_vector(cfg);
> +
> +	return 0;
> +}
> +
> +static struct irq_chip hyperv_root_ir_chip = {
> +	.name			= "HYPERV-ROOT-IR",
> +	.irq_ack		= apic_ack_irq,
> +	.irq_set_affinity	= hyperv_root_ir_set_affinity,
> +	.irq_compose_msi_msg	= hyperv_root_ir_compose_msi_msg,
> +};
> +
> +static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
> +				     unsigned int virq, unsigned int nr_irqs,
> +				     void *arg)
> +{
> +	struct irq_alloc_info *info = arg;
> +	struct irq_data *irq_data;
> +	struct hyperv_root_ir_data *data;
> +	int ret = 0;
> +
> +	if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> +		return -EINVAL;
> +
> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> +	if (ret < 0)
> +		return ret;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data) {
> +		irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +		return -ENOMEM;
> +	}
> +
> +	irq_data = irq_domain_get_irq_data(domain, virq);
> +	if (!irq_data) {
> +		kfree(data);
> +		irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +		return -EINVAL;
> +	}
> +
> +	data->ioapic_id = info->devid;
> +	data->is_level = info->ioapic.is_level;
> +
> +	irq_data->chip = &hyperv_root_ir_chip;
> +	irq_data->chip_data = data;
> +
> +	return 0;
> +}
> +
> +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> +				 unsigned int virq, unsigned int nr_irqs)
> +{
> +	struct irq_data *irq_data;
> +	struct hyperv_root_ir_data *data;
> +	struct hv_interrupt_entry *e;
> +	int i;
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		irq_data = irq_domain_get_irq_data(domain, virq + i);
> +
> +		if (irq_data && irq_data->chip_data) {
> +			data = irq_data->chip_data;

Set irq_data->chip_data to NULL?  That seems to be done in other
similar places in your code.

> +			e = &data->entry;
> +
> +			if (e->source == HV_DEVICE_TYPE_IOAPIC
> +			      && e->ioapic_rte.as_uint64)
> +				hv_unmap_ioapic_interrupt(data->ioapic_id,
> +							&data->entry);
> +
> +			kfree(data);
> +		}
> +	}
> +
> +	irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +}
> +
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> +	.select = hyperv_irq_remapping_select,
> +	.alloc = hyperv_root_irq_remapping_alloc,
> +	.free = hyperv_root_irq_remapping_free,
> +};
> +
>  #endif
> --
> 2.20.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
  2021-01-26  0:48   ` Michael Kelley
@ 2021-02-02 15:03     ` Wei Liu
  2021-02-04 16:33       ` Michael Kelley
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-02-02 15:03 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, Lillian Grassin-Drake,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Tue, Jan 26, 2021 at 12:48:37AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > We will need the partition ID for executing some hypercalls later.
> > 
> > Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> > v3:
> > 1. Make hv_get_partition_id static.
> > 2. Change code structure a bit.
> > ---
> >  arch/x86/hyperv/hv_init.c         | 27 +++++++++++++++++++++++++++
> >  arch/x86/include/asm/mshyperv.h   |  2 ++
> >  include/asm-generic/hyperv-tlfs.h |  6 ++++++
> >  3 files changed, 35 insertions(+)
> > 
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 6f4cb40e53fe..fc9941bd8653 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -26,6 +26,9 @@
> >  #include <linux/syscore_ops.h>
> >  #include <clocksource/hyperv_timer.h>
> > 
> > +u64 hv_current_partition_id = ~0ull;
> > +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> > +
> >  void *hv_hypercall_pg;
> >  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> > 
> > @@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = {
> >  	.resume		= hv_resume,
> >  };
> > 
> > +static void __init hv_get_partition_id(void)
> > +{
> > +	struct hv_get_partition_id *output_page;
> > +	u16 status;
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +	output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +	status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> > +		HV_HYPERCALL_RESULT_MASK;
> > +	if (status != HV_STATUS_SUCCESS) {
> 
> Across the Hyper-V code in Linux, the way we check the hypercall result
> is very inconsistent.  IMHO, the and'ing of hv_do_hypercall() with 
> HV_HYPERCALL_RESULT_MASK so that status can be a u16 is stylistically
> a bit unusual.
> 
> I'd like to see the hypercall result being stored into a u64 local variable.
> Then the subsequent test for the status should 'and' the u64 with
> HV_HYPERCALL_RESULT_MASK to determine the result code.
> I've made a note to go fix the places that aren't doing it that way.
> 

I will fold in the following diff in the next version. I will also check
if there are other instances in this patch series that need fixing.
Pretty sure there are a few.

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index fc9941bd8653..6064f64a1295 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -337,14 +337,13 @@ static struct syscore_ops hv_syscore_ops = {
 static void __init hv_get_partition_id(void)
 {
        struct hv_get_partition_id *output_page;
-       u16 status;
+       u64 status;
        unsigned long flags;

        local_irq_save(flags);
        output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
-       status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
-               HV_HYPERCALL_RESULT_MASK;
-       if (status != HV_STATUS_SUCCESS) {
+       status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
+       if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
                /* No point in proceeding if this failed */
                pr_err("Failed to get partition ID: %d\n", status);
                BUG();
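
A small userspace sketch of the pattern discussed above: keep the raw 64-bit
hypercall return value and apply the mask only where the result code is
tested. RESULT_MASK, STATUS_SUCCESS and both helper names are assumptions
made for this illustration; the sketch assumes the result code lives in the
low 16 bits and that other fields may occupy the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this sketch: the low 16 bits carry the result code. */
#define RESULT_MASK	UINT64_C(0xffff)
#define STATUS_SUCCESS	0

/* Test the result code without discarding the rest of the raw value. */
static int hypercall_succeeded(uint64_t raw_status)
{
	return (raw_status & RESULT_MASK) == STATUS_SUCCESS;
}

/* Extract just the result code, e.g. for logging. */
static unsigned int hypercall_result(uint64_t raw_status)
{
	return (unsigned int)(raw_status & RESULT_MASK);
}
```

Once the variable is 64 bits wide, a log message probably wants either the
masked result code or a 64-bit format specifier.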
> > +		/* No point in proceeding if this failed */
> > +		pr_err("Failed to get partition ID: %d\n", status);
> > +		BUG();
> > +	}
> > +	hv_current_partition_id = output_page->partition_id;
> > +	local_irq_restore(flags);
> > +}
> > +
> >  /*
> >   * This function is to be invoked early in the boot sequence after the
> >   * hypervisor has been detected.
> > @@ -426,6 +448,11 @@ void __init hyperv_init(void)
> > 
> >  	register_syscore_ops(&hv_syscore_ops);
> > 
> > +	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> > +		hv_get_partition_id();
> 
> Another place where the EBX value saved into the ms_hyperv structure
> could be used.

If you're okay with my response earlier, this will be handled later in
another patch (series).

Wei.

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions
  2021-01-26  1:20   ` Michael Kelley
@ 2021-02-02 16:19     ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-02-02 16:19 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, Lillian Grassin-Drake,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Tue, Jan 26, 2021 at 01:20:36AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
[...]
> > +#include <asm/trace/hyperv.h>
> > +
> > +#define HV_DEPOSIT_MAX_ORDER (8)
> > +#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER)
> 
> > Is there any reason to not let the maximum be 511, which is
> > how many entries will fit on the hypercall input page?  The
> > max could be defined in terms of HV_HYP_PAGE_SIZE so that
> > the logical dependency is fully expressed.

Let me try changing this. This file is largely authored by Lillian and
Nuno. I don't see a particular reason why the value can't be larger.

I've updated the value to the following.

/*
 * See struct hv_deposit_memory. The first u64 is partition ID, the rest
 * are GPAs.
 */
#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)

Let's see how that goes. I will test it once I fix other places.
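
As a sanity check on the arithmetic, here is a userspace sketch under the
assumption of a 4 KiB hypervisor page. The SKETCH_* names and the struct
layout are illustrative stand-ins, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: hypervisor pages are 4 KiB. */
#define SKETCH_HYP_PAGE_SIZE 4096

/* Illustrative layout only: one u64 header followed by page numbers. */
struct sketch_deposit_memory {
	uint64_t partition_id;
	uint64_t gpa_page_list[];
};

#define SKETCH_DEPOSIT_MAX (SKETCH_HYP_PAGE_SIZE / sizeof(uint64_t) - 1)

/* The header plus the maximum-length list must exactly fill one page. */
_Static_assert(sizeof(struct sketch_deposit_memory) +
	       SKETCH_DEPOSIT_MAX * sizeof(uint64_t) == SKETCH_HYP_PAGE_SIZE,
	       "deposit list must fit the input page");
```

Under these assumptions the limit works out to 511 entries, matching the
number mentioned in the review.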

> 
> > +
> > +/*
> > + * Deposits exact number of pages
> > + * Must be called with interrupts enabled
> > + * Max 256 pages
> > + */
> > +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> > +{
> > +	struct page **pages;
> > +	int *counts;
> > +	int num_allocations;
> > +	int i, j, page_count;
> > +	int order;
> > +	int desired_order;
> > +	u16 status;
> > +	int ret;
> > +	u64 base_pfn;
> > +	struct hv_deposit_memory *input_page;
> > +	unsigned long flags;
> > +
> > +	if (num_pages > HV_DEPOSIT_MAX)
> > +		return -E2BIG;
> > +	if (!num_pages)
> > +		return 0;
> > +
> > +	/* One buffer for page pointers and counts */
> > +	pages = page_address(alloc_page(GFP_KERNEL));
> > +	if (!pages)
> 
> > Does the above check work?  If alloc_page() returns NULL, it looks like
> > page_address() might fault.
> 

Good catch. Fixed.

> > +		return -ENOMEM;
> > +
> > +	counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
> > +	if (!counts) {
> > +		free_page((unsigned long)pages);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	/* Allocate all the pages before disabling interrupts */
> > +	num_allocations = 0;
> > +	i = 0;
> > +	order = HV_DEPOSIT_MAX_ORDER;
> > +
> > +	while (num_pages) {
> > +		/* Find highest order we can actually allocate */
> > +		desired_order = 31 - __builtin_clz(num_pages);
> > +		order = min(desired_order, order);
> 
> The above seems redundant since request sizes larger than the
> max have already been rejected.
> 

min(...) can be dropped.
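
For reference, a userspace sketch of how the order computation splits a page
count into power-of-two chunks, largest first. It relies on the GCC/Clang
builtin __builtin_clz, as the quoted code does; highest_order() and
split_into_orders() are hypothetical helper names, and allocation failure is
not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Largest order such that (1 << order) <= n; n must be non-zero. */
static int highest_order(uint32_t n)
{
	assert(n != 0);
	return 31 - __builtin_clz(n);
}

/*
 * Split num_pages into power-of-two chunks, largest first. Stores each
 * chunk's order in orders[] and returns the number of chunks (at most 32).
 */
static int split_into_orders(uint32_t num_pages, int orders[32])
{
	int n = 0;

	while (num_pages) {
		int order = highest_order(num_pages);

		orders[n++] = order;
		num_pages -= UINT32_C(1) << order;
	}
	return n;
}
```

For example, a request for 90 pages splits into chunks of 64, 16, 8 and 2
pages, i.e. orders 6, 4, 3 and 1.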

> > +		do {
> > +			pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
> > +			if (!pages[i]) {
> > +				if (!order) {
> > +					ret = -ENOMEM;
> > +					goto err_free_allocations;
> > +				}
> > +				--order;
> > +			}
> > +		} while (!pages[i]);
> 
> The duplicative test of !pages[i] is somewhat annoying.  How about
> this:
> 
> > 		while (!(pages[i] = alloc_pages_node(node, GFP_KERNEL, order))) {
> 			if (!order) {
> 				ret = -ENOMEM;
> 				goto err_free_allocations;
> 			}
> 			--order;
> 		}
> 
> or if you don't like doing an assignment in the while test:
> 
> 		while(1) {
> 			pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
> > 			if (pages[i])
> 				break;
> 			if (!order) {
> 				ret = -ENOMEM;
> 				goto err_free_allocations;
> 			}
> 			--order;
> 		}
> 

I will use this variant.
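
A userspace sketch of that loop shape, to show the single test per iteration.
mock_alloc_order() and mock_max_order are hypothetical stand-ins for an
allocator that cannot satisfy large orders, and alloc_with_fallback() is an
illustrative wrapper, not code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator limit: orders above this always fail. */
static int mock_max_order = 2;

static void *mock_alloc_order(int order)
{
	if (order > mock_max_order)
		return NULL;
	return malloc((size_t)4096 << order);
}

/*
 * Try the requested order first and step down on failure. On success,
 * *order holds the order that was actually allocated. Returns NULL only
 * if even an order-0 allocation fails.
 */
static void *alloc_with_fallback(int *order)
{
	void *p;

	while (1) {
		p = mock_alloc_order(*order);
		if (p)
			break;
		if (!*order)
			return NULL;
		--*order;
	}
	return p;
}
```

The lowered order is handed back to the caller, much as the quoted loop keeps
the reduced order for later iterations.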

> > +
> > +		split_page(pages[i], order);
> > +		counts[i] = 1 << order;
> > +		num_pages -= counts[i];
> > +		i++;
> > +		num_allocations++;
> 
> > Incrementing both i and num_allocations in the loop seems
> > redundant, especially since num_allocations isn't used in the loop.
> > Could num_allocations be assigned the value of i once the loop
> > is exited?  (and num_allocations would not need to be initialized to 0.)
> > Would also have to do the assignment in the error case.
> 

Yes. That can be done.

> > +	}
> > +
> > +	local_irq_save(flags);
> > +
> > +	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +
> > +	input_page->partition_id = partition_id;
> > +
> > +	/* Populate gpa_page_list - these will fit on the input page */
> > +	for (i = 0, page_count = 0; i < num_allocations; ++i) {
> > +		base_pfn = page_to_pfn(pages[i]);
> > +		for (j = 0; j < counts[i]; ++j, ++page_count)
> > +			input_page->gpa_page_list[page_count] = base_pfn + j;
> > +	}
> > +	status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
> > +				     page_count, 0, input_page,
> > +				     NULL) & HV_HYPERCALL_RESULT_MASK;
> 
> Similar comment about how hypercall status is checked.
> 

Fixed.

> > +	local_irq_restore(flags);
> > +
> > +	if (status != HV_STATUS_SUCCESS) {
> > +		pr_err("Failed to deposit pages: %d\n", status);
> > +		ret = status;
> > +		goto err_free_allocations;
> > +	}
> > +
> > +	ret = 0;
> > +	goto free_buf;
> > +
> > +err_free_allocations:
> > +	for (i = 0; i < num_allocations; ++i) {
> > +		base_pfn = page_to_pfn(pages[i]);
> > +		for (j = 0; j < counts[i]; ++j)
> > +			__free_page(pfn_to_page(base_pfn + j));
> > +	}
> > +
> > +free_buf:
> > +	free_page((unsigned long)pages);
> > +	kfree(counts);
> > +	return ret;
> > +}
> > +
> > +int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
> > +{
> > +	struct hv_add_logical_processor_in *input;
> > +	struct hv_add_logical_processor_out *output;
> > +	int status;
> > +	unsigned long flags;
> > +	int ret = 0;
> > +#ifdef CONFIG_ACPI_NUMA
> > +	int pxm = node_to_pxm(node);
> > +#else
> > +	int pxm = 0;
> > +#endif
> 
> It seems like the above #ifdef'ery might be better fixed in
> include/acpi/acpi_numa.h, where there's already a null definition
> of pxm_to_node() in case CONFIG_ACPI_NUMA isn't defined.  There
> should also be a null definition of node_to_pxm() in that file.
> 

Sure.
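
A sketch of the header-stub pattern being suggested, written as a standalone
example. SKETCH_CONFIG_NUMA stands in for the Kconfig symbol, and the
sketch_* functions are hypothetical; returning 0 from the stub is an
assumption that mirrors the fallback value in the quoted code.

```c
#include <assert.h>

/*
 * SKETCH_CONFIG_NUMA stands in for a Kconfig symbol. It is left undefined
 * here, so the stub below is what gets compiled.
 */
#ifdef SKETCH_CONFIG_NUMA
int sketch_node_to_pxm(int node);
#else
static inline int sketch_node_to_pxm(int node)
{
	(void)node;
	return 0;
}
#endif

/* Callers then need no #ifdef of their own. */
static int sketch_pick_domain(int node)
{
	return sketch_node_to_pxm(node);
}
```

The conditional compilation lives in one place, and each call site reduces to
a plain function call.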

> > +
> > +	/*
> > +	 * When adding a logical processor, the hypervisor may return
> > +	 * HV_STATUS_INSUFFICIENT_MEMORY. When that happens, we deposit more
> > +	 * pages and retry.
> > +	 */
> > +	do {
> > +		local_irq_save(flags);
> > +
> > +		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +		/* We don't do anything with the output right now */
> > +		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +
> > +		input->lp_index = lp_index;
> > +		input->apic_id = apic_id;
> > +		input->flags = 0;
> > +		input->proximity_domain_info.domain_id = pxm;
> > +		input->proximity_domain_info.flags.reserved = 0;
> > +		input->proximity_domain_info.flags.proximity_info_valid = 1;
> > +		input->proximity_domain_info.flags.proximity_preferred = 1;
> > +		status = hv_do_hypercall(HVCALL_ADD_LOGICAL_PROCESSOR,
> > +					 input, output);
> > +		local_irq_restore(flags);
> > +
> > +		if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
> 
> The 'and' with HV_HYPERCALL_RESULT_MASK isn't coded anywhere for this
> hypercall, and 'status' is declared as 'int'.
> 

Fixed.
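
As a rough illustration of the status check being discussed, here is a
user-space sketch. It assumes the status lives in the low 16 bits of the
64-bit hypercall return value (which is what HV_HYPERCALL_RESULT_MASK
selects) and that the upper bits may carry other information. The EX_
names, the helper functions and the numeric status values are stand-ins
for this example, not the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative stand-ins; not the definitions from hyperv-tlfs.h. */
#define EX_HYPERCALL_RESULT_MASK	0xffffULL	/* low 16 bits */
#define EX_STATUS_SUCCESS		0
#define EX_STATUS_INSUFFICIENT_MEMORY	11

/* Extract the 16-bit status from a raw 64-bit hypercall return value. */
static inline uint16_t ex_hypercall_status(uint64_t raw)
{
	return (uint16_t)(raw & EX_HYPERCALL_RESULT_MASK);
}

/* Returns 1 when the caller should deposit more pages and retry. */
static inline int ex_should_retry(uint64_t raw)
{
	return ex_hypercall_status(raw) == EX_STATUS_INSUFFICIENT_MEMORY;
}
```

Masking first and storing the result in a 16-bit type keeps the
comparisons against the status constants independent of whatever the
upper bits contain.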

> > +			if (status != HV_STATUS_SUCCESS) {
> > +				pr_err("%s: cpu %u apic ID %u, %d\n", __func__,
> > +				       lp_index, apic_id, status);
> > +				ret = status;
> > +			}
> > +			break;
> > +		}
> > +		ret = hv_call_deposit_pages(node, hv_current_partition_id, 1);
> > +	} while (!ret);
> > +
> > +	return ret;
> > +}
> > +
> > +int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
> > +{
> > +	struct hv_create_vp *input;
> > +	u16 status;
> > +	unsigned long irq_flags;
> > +	int ret = 0;
> > +#ifdef CONFIG_ACPI_NUMA
> > +	int pxm = node_to_pxm(node);
> > +#else
> > +	int pxm = 0;
> > +#endif
> 
> Same comment.
> 
> > +
> > +	/* Root VPs don't seem to need pages deposited */
> > +	if (partition_id != hv_current_partition_id) {
> > +		ret = hv_call_deposit_pages(node, partition_id, 90);
> 
> Perhaps add a comment about the value "90".  Was it
> empirically determined?

I think so. I will add a comment.

> 
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	do {
> > +		local_irq_save(irq_flags);
> > +
> > +		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +
> > +		input->partition_id = partition_id;
> > +		input->vp_index = vp_index;
> > +		input->flags = flags;
> > +		input->subnode_type = HvSubnodeAny;
> > +		if (node != NUMA_NO_NODE) {
> > +			input->proximity_domain_info.domain_id = pxm;
> > +			input->proximity_domain_info.flags.reserved = 0;
> > +			input->proximity_domain_info.flags.proximity_info_valid = 1;
> > +			input->proximity_domain_info.flags.proximity_preferred = 1;
> > +		} else {
> > +			input->proximity_domain_info.as_uint64 = 0;
> > +		}
> > +		status = hv_do_hypercall(HVCALL_CREATE_VP, input, NULL);
> > +		local_irq_restore(irq_flags);
> > +
> > +		if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
> 
> Same problems with the status check.

Fixed.

> 
> > +			if (status != HV_STATUS_SUCCESS) {
> > +				pr_err("%s: vcpu %u, lp %u, %d\n", __func__,
> > +				       vp_index, flags, status);
> > +				ret = status;
> > +			}
> > +			break;
> > +		}
> > +		ret = hv_call_deposit_pages(node, partition_id, 1);
> > +
> > +	} while (!ret);
> > +
> > +	return ret;
> > +}
> > +
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index 67f5d35a73d3..4e590a167160 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
[...]
> > +/* HvAddLogicalProcessor hypercall */
> > +struct hv_add_logical_processor_in {
> > +	u32 lp_index;
> > +	u32 apic_id;
> > +	union hv_proximity_domain_info proximity_domain_info;
> > +	u64 flags;
> > +};
> 
> __packed is missing from this struct definition
> 

Fixed.

> > +
> > +struct hv_add_logical_processor_out {
> > +	struct hv_lp_startup_status startup_status;
> > +} __packed;
> > +
> > +enum HV_SUBNODE_TYPE
> > +{
> > +    HvSubnodeAny = 0,
> > +    HvSubnodeSocket,
> > +    HvSubnodeAmdNode,
> > +    HvSubnodeL3,
> > +    HvSubnodeCount,
> > +    HvSubnodeInvalid = -1
> > +};
> 
> Are these values defined by Hyper-V?  If so, explicitly coding the
> value of each enum member might be better.

Fixed.
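
A sketch of what explicit coding could look like is below. The numbers
are just the values the C compiler already assigns implicitly to the
enum as posted; whether they match Hyper-V's own definition is not
verified here.

```c
/*
 * Sketch: same enumerators as in the patch, with every value spelled
 * out. The values are the ones C would assign implicitly.
 */
enum HV_SUBNODE_TYPE {
	HvSubnodeAny = 0,
	HvSubnodeSocket = 1,
	HvSubnodeAmdNode = 2,
	HvSubnodeL3 = 3,
	HvSubnodeCount = 4,
	HvSubnodeInvalid = -1
};
```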

Wei.


* Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-01-26  1:26   ` Michael Kelley
@ 2021-02-02 17:02     ` Wei Liu
  2021-02-03 13:26       ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-02-02 17:02 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Tue, Jan 26, 2021 at 01:26:52AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > We will need to identify the device we want Microsoft Hypervisor to
> > manipulate.  Introduce the data structures for that purpose.
> > 
> > They will be used in a later patch.
> > 
> > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> >  include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
> >  1 file changed, 79 insertions(+)
> > 
> > diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> > index 8423bf53c237..42ff1326c6bd 100644
> > --- a/include/asm-generic/hyperv-tlfs.h
> > +++ b/include/asm-generic/hyperv-tlfs.h
> > @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
> >  	} element[];
> >  } __packed;
> > 
> > +enum hv_device_type {
> > +	HV_DEVICE_TYPE_LOGICAL = 0,
> > +	HV_DEVICE_TYPE_PCI = 1,
> > +	HV_DEVICE_TYPE_IOAPIC = 2,
> > +	HV_DEVICE_TYPE_ACPI = 3,
> > +};
> > +
> > +typedef u16 hv_pci_rid;
> > +typedef u16 hv_pci_segment;
> > +typedef u64 hv_logical_device_id;
> > +union hv_pci_bdf {
> > +	u16 as_uint16;
> > +
> > +	struct {
> > +		u8 function:3;
> > +		u8 device:5;
> > +		u8 bus;
> > +	};
> > +} __packed;
> > +
> > +union hv_pci_bus_range {
> > +	u16 as_uint16;
> > +
> > +	struct {
> > +		u8 subordinate_bus;
> > +		u8 secondary_bus;
> > +	};
> > +} __packed;
> > +
> > +union hv_device_id {
> > +	u64 as_uint64;
> > +
> > +	struct {
> > +		u64 :62;
> > +		u64 device_type:2;
> > +	};
> 
> Are the above 4 lines extraneous junk? 
> If not, a comment would be helpful.  And we
> would normally label the 62 bit field as 
> "reserved0" or something similar.
> 

No. It is not junk. I got this from a header in tree.

I am inclined to just drop this hunk. If that breaks things, I will use
"reserved0".

Wei.


* Re: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
  2021-01-27  5:47   ` Michael Kelley
@ 2021-02-02 17:31     ` Wei Liu
  2021-02-02 18:15       ` Michael Kelley
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-02-02 17:31 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Wed, Jan 27, 2021 at 05:47:04AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > When Linux runs as the root partition on Microsoft Hypervisor, its
> > interrupts are remapped.  Linux will need to explicitly map and unmap
> > interrupts for hardware.
> > 
> > Implement an MSI domain to issue the correct hypercalls. And initialize
> > this irqdomain as the default MSI irq domain.
> > 
> > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> > v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> > v3: build irqdomain.o for 32bit as well.
> 
> I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is built
> only for 64-bit, I'm assuming running Linux in the root partition
> is only functional for 64-bit builds.  So is the goal simply that 32-bit
> builds will compile correctly?  Seems like maybe there should be
> a CONFIG option for running Linux in the root partition, and that
> option would force 64-bit.

To ensure 32 bit kernel builds and 32 bit guests still work.

The config option ROOT_API is to be introduced by Nuno's /dev/mshv
series. We can use that option to gate some objects when that's
available.

> 
[...]
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > new file mode 100644
> > index 000000000000..19637cd60231
> > --- /dev/null
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -0,0 +1,332 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +//
> > +// Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> > +//
> > +// Authors:
> > +//   Sunil Muthuswamy <sunilmut@microsoft.com>
> > +//   Wei Liu <wei.liu@kernel.org>
> 
> I think the // comment style should only be used for the SPDX line.

Fixed.

> 
> > +
> > +#include <linux/pci.h>
> > +#include <linux/irq.h>
> > +#include <asm/mshyperv.h>
> > +
[...]
> > +static int hv_map_msi_interrupt(struct pci_dev *dev, int vcpu, int vector,
> > +				struct hv_interrupt_entry *entry)
> > +{
> > +	struct hv_input_map_device_interrupt *input;
> > +	struct hv_output_map_device_interrupt *output;
> > +	struct hv_device_interrupt_descriptor *intr_desc;
> > +	unsigned long flags;
> > +	u16 status;
> > +
> > +	local_irq_save(flags);
> > +
> > +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +
> > +	intr_desc = &input->interrupt_descriptor;
> > +	memset(input, 0, sizeof(*input));
> > +	input->partition_id = hv_current_partition_id;
> > +	input->device_id = hv_build_pci_dev_id(dev).as_uint64;
> > +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > +	intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > +	intr_desc->vector_count = 1;
> > +	intr_desc->target.vector = vector;
> > +	__set_bit(vcpu, (unsigned long*)&intr_desc->target.vp_mask);
> 
> This is using the CPU bitmap format that supports up to 64 vCPUs.  Any reason not
> to use the format that supports a larger number of CPUs?   In either case, perhaps
> a check for the value of vcpu against the max of 64 (or the larger number if you
> change the bitmap format) would be appropriate.
> 

This is mostly because we didn't have a suitably large machine during
development.

I will see if this can use vpset instead.
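
For reference, the range check suggested for the 64-bit mask format
could be sketched as follows. This is a self-contained user-space
example; ex_set_vp_mask_bit(), EX_VP_MASK_BITS and the -1 error return
are hypothetical and not part of the patch.

```c
#include <stdint.h>

#define EX_VP_MASK_BITS	64	/* the simple bitmap format covers 64 vCPUs */

/*
 * Set the bit for 'vcpu' in a 64-bit VP mask.
 * Returns 0 on success, -1 if 'vcpu' does not fit in the mask.
 */
static inline int ex_set_vp_mask_bit(uint64_t *vp_mask, int vcpu)
{
	if (vcpu < 0 || vcpu >= EX_VP_MASK_BITS)
		return -1;

	*vp_mask |= (uint64_t)1 << vcpu;
	return 0;
}
```

Rejecting out-of-range values up front avoids writing past the 64-bit
vp_mask field when a VP index of 64 or higher shows up.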

> > +
> > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> > +			 HV_HYPERCALL_RESULT_MASK;
> > +	*entry = output->interrupt_entry;
> > +
> > +	local_irq_restore(flags);
> > +
> > +	if (status != HV_STATUS_SUCCESS)
> > +		pr_err("%s: hypercall failed, status %d\n", __func__, status);
> > +
> > +	return status;
> > +}
> > +
[...]
> > +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry
> > *old_entry)
> > +{
> > +	return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry)
> > +		& HV_HYPERCALL_RESULT_MASK;
> 
> The masking with HV_HYPERCALL_RESULT_MASK is already done in
> hv_unmap_interrupt().
> 

Fixed.

Wei.


* RE: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
  2021-02-02 17:31     ` Wei Liu
@ 2021-02-02 18:15       ` Michael Kelley
  2021-02-02 18:16         ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-02-02 18:15 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, February 2, 2021 9:32 AM
> 
> On Wed, Jan 27, 2021 at 05:47:04AM +0000, Michael Kelley wrote:
> > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > When Linux runs as the root partition on Microsoft Hypervisor, its
> > > interrupts are remapped.  Linux will need to explicitly map and unmap
> > > interrupts for hardware.
> > >
> > > Implement an MSI domain to issue the correct hypercalls. And initialize
> > > this irqdomain as the default MSI irq domain.
> > >
> > > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > ---
> > > v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> > > v3: build irqdomain.o for 32bit as well.
> >
> > I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is built
> > only for 64-bit, I'm assuming running Linux in the root partition
> > is only functional for 64-bit builds.  So is the goal simply that 32-bit
> > builds will compile correctly?  Seems like maybe there should be
> > a CONFIG option for running Linux in the root partition, and that
> > option would force 64-bit.
> 
> To ensure 32 bit kernel builds and 32 bit guests still work.
> 
> The config option ROOT_API is to be introduced by Nuno's /dev/mshv
> series. We can use that option to gate some objects when that's
> available.
> 

But just so I'm 100% clear, is there intent to run 32-bit Linux in the root
partition?  I'm assuming not.

Michael


* Re: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
  2021-02-02 18:15       ` Michael Kelley
@ 2021-02-02 18:16         ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-02-02 18:16 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin

On Tue, Feb 02, 2021 at 06:15:23PM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, February 2, 2021 9:32 AM
> > 
> > On Wed, Jan 27, 2021 at 05:47:04AM +0000, Michael Kelley wrote:
> > > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > > >
> > > > When Linux runs as the root partition on Microsoft Hypervisor, its
> > > > interrupts are remapped.  Linux will need to explicitly map and unmap
> > > > interrupts for hardware.
> > > >
> > > > Implement an MSI domain to issue the correct hypercalls. And initialize
> > > > this irqdomain as the default MSI irq domain.
> > > >
> > > > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > > ---
> > > > v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> > > > v3: build irqdomain.o for 32bit as well.
> > >
> > > I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is built
> > > only for 64-bit, I'm assuming running Linux in the root partition
> > > is only functional for 64-bit builds.  So is the goal simply that 32-bit
> > > builds will compile correctly?  Seems like maybe there should be
> > > a CONFIG option for running Linux in the root partition, and that
> > > option would force 64-bit.
> > 
> > To ensure 32 bit kernel builds and 32 bit guests still work.
> > 
> > The config option ROOT_API is to be introduced by Nuno's /dev/mshv
> > series. We can use that option to gate some objects when that's
> > available.
> > 
> 
> But just so I'm 100% clear, is there intent to run 32-bit Linux in the root
> partition?  I'm assuming not.

That's correct. There is no intent to run 32-bit Linux as the root
partition.

Wei.

> 
> Michael


* Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-01-27  5:47   ` Michael Kelley
@ 2021-02-03 12:47     ` Wei Liu
  2021-02-04 16:41       ` Michael Kelley
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-02-03 12:47 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

On Wed, Jan 27, 2021 at 05:47:08AM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > domain to handle mapping and unmapping of IO-APIC interrupts.
> > 
> > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > ---
> >  arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
> >  arch/x86/include/asm/mshyperv.h |   4 +
> >  drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
> >  3 files changed, 233 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 19637cd60231..8e2b4e478b70 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
> >  }
> > 
> >  #endif /* CONFIG_PCI_MSI */
> > +
> > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> > +{
> > +	union hv_device_id device_id;
> > +
> > +	device_id.as_uint64 = 0;
> > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > +
> > +	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;
> 
> The masking is already done in hv_unmap_interrupt.

Fixed.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > +
> > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > +		struct hv_interrupt_entry *entry)
> > +{
> > +	unsigned long flags;
> > +	struct hv_input_map_device_interrupt *input;
> > +	struct hv_output_map_device_interrupt *output;
> > +	union hv_device_id device_id;
> > +	struct hv_device_interrupt_descriptor *intr_desc;
> > +	u16 status;
> > +
> > +	device_id.as_uint64 = 0;
> > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > +
> > +	local_irq_save(flags);
> > +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +	memset(input, 0, sizeof(*input));
> > +	intr_desc = &input->interrupt_descriptor;
> > +	input->partition_id = hv_current_partition_id;
> > +	input->device_id = device_id.as_uint64;
> > +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > +	intr_desc->target.vector = vector;
> > +	intr_desc->vector_count = 1;
> > +
> > +	if (level)
> > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > +	else
> > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > +
> > +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > +
> > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> > +			 HV_HYPERCALL_RESULT_MASK;
> > +	local_irq_restore(flags);
> > +
> > +	*entry = output->interrupt_entry;
> > +
> > +	return status;
> 
> As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
> mostly parallel, though some of the assignments are done in a different order.  It's a nit,
> but making them as parallel as possible would be nice. :-)
> 

Indeed. I will see about factoring out a function.

> Same 64 vCPU comment applies here as well.
> 

This has been changed to use vpset instead. It took me a bit of time to
get it working because the documentation is a bit lacking.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index ccc849e25d5e..345d7c6f8c37 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > hv_msi_entry *msi_entry,
> > 
> >  struct irq_domain *hv_create_pci_msi_domain(void);
> > 
> > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > +		struct hv_interrupt_entry *entry);
> > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> > +
> >  #else /* CONFIG_HYPERV */
> >  static inline void hyperv_init(void) {}
> >  static inline void hyperv_setup_mmu_ops(void) {}
> > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > index b7db6024e65c..6d35e4c303c6 100644
> > --- a/drivers/iommu/hyperv-iommu.c
> > +++ b/drivers/iommu/hyperv-iommu.c
> > @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
> >  	.free = hyperv_irq_remapping_free,
> >  };
> > 
> > +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
> >  static int __init hyperv_prepare_irq_remapping(void)
> >  {
> >  	struct fwnode_handle *fn;
> >  	int i;
> > +	const char *name;
> > +	const struct irq_domain_ops *ops;
> > 
> >  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> >  	    x86_init.hyper.msi_ext_dest_id() ||
> > -	    !x2apic_supported() || hv_root_partition)
> > +	    !x2apic_supported())
> 
> Any reason that the check for hv_root_partition was added
> in patch #4  of this series, and then removed here?  Could
> patch #4 just be dropped?
> 

Before v5 (or v4?) IO-APIC was not handled via Hyper-V IOMMU. Now it is.

Patch 4 has become redundant with that change. I already dropped patch 4
in the v6 branch I have locally.

> >  		return -ENODEV;
> > 
> > -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > +	if (hv_root_partition) {
> > +		name = "HYPERV-ROOT-IR";
> > +		ops = &hyperv_root_ir_domain_ops;
> > +	} else {
> > +		name = "HYPERV-IR";
> > +		ops = &hyperv_ir_domain_ops;
> > +	}
> > +
[...]
> > +static void
> > +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> > +{
> > +	u16 status;
> > +	u32 vector;
> > +	struct irq_cfg *cfg;
> > +	int ioapic_id;
> > +	struct cpumask *affinity;
> > +	int cpu, vcpu;
> > +	struct hv_interrupt_entry entry;
> > +	struct hyperv_root_ir_data *data = irq_data->chip_data;
> > +	struct IO_APIC_route_entry e;
> > +
> > +	cfg = irqd_cfg(irq_data);
> > +	affinity = irq_data_get_effective_affinity_mask(irq_data);
> > +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> > +	vcpu = hv_cpu_number_to_vp_number(cpu);
> > +
> > +	vector = cfg->vector;
> > +	ioapic_id = data->ioapic_id;
> > +
> > +	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
> 
> Does 'data' need to be checked to be non-NULL?  The parallel code in
> hv_irq_compose_msi_msg() makes such a check.

The usage of irq_data->chip_data is different in these two functions.

In this function, we're sure it is correctly allocated by
hyperv_root_irq_remapping_alloc at some point before.

In hv_irq_compose_msi_msg, irq_data->chip_data is instead used as a
temporary place to stash some state that is controlled solely by the
said function.

Once we get to the point of introducing a paravirtualized IOMMU for the
root partition, we can then unify these two paths.

> 
> > +	    && data->entry.ioapic_rte.as_uint64) {
> > +		entry = data->entry;
> > +
> > +		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> > +
> > +		if (status != HV_STATUS_SUCCESS)
> > +			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
> > +
> > +		data->entry.ioapic_rte.as_uint64 = 0;
> > +		data->entry.source = 0; /* Invalid source */
> 
> Again comparing, hv_irq_compose_msi_msg() frees the old
> entry, and then allocates a new one.   This code reuses the old entry. 
> Any reason for the difference?
> 

See above.

I can perhaps tweak the logic a bit to reuse the same entry, but the
overall design won't change. I opted to always reallocate because that
looked more straightforward to me.

Let me know if you feel strongly about reusing.

> > +	}
> > +
> > +
> > +	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
> > +					vector, &entry);
> > +
> > +	if (status != HV_STATUS_SUCCESS) {
> > +		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
> > +		return;
> > +	}
> > +
> > +	data->entry = entry;
> > +
> > +	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> > +	e.w1 = entry.ioapic_rte.low_uint32;
> > +	e.w2 = entry.ioapic_rte.high_uint32;
> > +
> > +	memset(msg, 0, sizeof(*msg));
> > +	msg->arch_data.vector = e.vector;
> > +	msg->arch_data.delivery_mode = e.delivery_mode;
> > +	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> > +	msg->arch_addr_lo.dmar_format = e.ir_format;
> > +	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> > +}
> 
> Having this whole function be more parallel to hv_irq_compose_msi_msg()
> would be nice. :-)
> 

Unlike hv_map_ioapic_interrupt and hv_map_msi_interrupt, which can
benefit from unifying now, this and hv_irq_compose_msi_msg will need to
wait till we have an IOMMU for the reason I stated above.

> > +
> > +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> > +		const struct cpumask *mask, bool force)
> > +{
> > +	struct irq_data *parent = data->parent_data;
> > +	struct irq_cfg *cfg = irqd_cfg(data);
> > +	int ret;
> > +
> > +	ret = parent->chip->irq_set_affinity(parent, mask, force);
> > +	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> > +		return ret;
> > +
> > +	send_cleanup_vector(cfg);
> > +
> > +	return 0;
> > +}
> > +
[...]
> > +
> > +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> > +				 unsigned int virq, unsigned int nr_irqs)
> > +{
> > +	struct irq_data *irq_data;
> > +	struct hyperv_root_ir_data *data;
> > +	struct hv_interrupt_entry *e;
> > +	int i;
> > +
> > +	for (i = 0; i < nr_irqs; i++) {
> > +		irq_data = irq_domain_get_irq_data(domain, virq + i);
> > +
> > +		if (irq_data && irq_data->chip_data) {
> > +			data = irq_data->chip_data;
> 
> Set irq_data->chip_data to NULL?  That seems to be done in other
> similar places in your code.

There is no need to do that. By the time this function returns, irq_data
will be gone too -- freed by irq_domain_free_irqs_common.

> 
> > +			e = &data->entry;
> > +
> > +			if (e->source == HV_DEVICE_TYPE_IOAPIC
> > +			      && e->ioapic_rte.as_uint64)
> > +				hv_unmap_ioapic_interrupt(data->ioapic_id,
> > +							&data->entry);
> > +
> > +			kfree(data);
> > +		}
> > +	}
> > +
> > +	irq_domain_free_irqs_common(domain, virq, nr_irqs);
> > +}
> > +
> > +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> > +	.select = hyperv_irq_remapping_select,
> > +	.alloc = hyperv_root_irq_remapping_alloc,
> > +	.free = hyperv_root_irq_remapping_free,
> > +};
> > +
> >  #endif
> > --
> > 2.20.1
> 


* Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-02-02 17:02     ` Wei Liu
@ 2021-02-03 13:26       ` Wei Liu
  2021-02-03 13:49         ` Arnd Bergmann
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-02-03 13:26 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Tue, Feb 02, 2021 at 05:02:48PM +0000, Wei Liu wrote:
> On Tue, Jan 26, 2021 at 01:26:52AM +0000, Michael Kelley wrote:
> > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > > 
> > > We will need to identify the device we want Microsoft Hypervisor to
> > > manipulate.  Introduce the data structures for that purpose.
> > > 
> > > They will be used in a later patch.
> > > 
> > > Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > ---
> > >  include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
> > >  1 file changed, 79 insertions(+)
> > > 
> > > diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> > > index 8423bf53c237..42ff1326c6bd 100644
> > > --- a/include/asm-generic/hyperv-tlfs.h
> > > +++ b/include/asm-generic/hyperv-tlfs.h
> > > @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
> > >  	} element[];
> > >  } __packed;
> > > 
> > > +enum hv_device_type {
> > > +	HV_DEVICE_TYPE_LOGICAL = 0,
> > > +	HV_DEVICE_TYPE_PCI = 1,
> > > +	HV_DEVICE_TYPE_IOAPIC = 2,
> > > +	HV_DEVICE_TYPE_ACPI = 3,
> > > +};
> > > +
> > > +typedef u16 hv_pci_rid;
> > > +typedef u16 hv_pci_segment;
> > > +typedef u64 hv_logical_device_id;
> > > +union hv_pci_bdf {
> > > +	u16 as_uint16;
> > > +
> > > +	struct {
> > > +		u8 function:3;
> > > +		u8 device:5;
> > > +		u8 bus;
> > > +	};
> > > +} __packed;
> > > +
> > > +union hv_pci_bus_range {
> > > +	u16 as_uint16;
> > > +
> > > +	struct {
> > > +		u8 subordinate_bus;
> > > +		u8 secondary_bus;
> > > +	};
> > > +} __packed;
> > > +
> > > +union hv_device_id {
> > > +	u64 as_uint64;
> > > +
> > > +	struct {
> > > +		u64 :62;
> > > +		u64 device_type:2;
> > > +	};
> > 
> > Are the above 4 lines extraneous junk? 
> > If not, a comment would be helpful.  And we
> > would normally label the 62 bit field as 
> > "reserved0" or something similar.
> > 
> 
> No. It is not junk. I got this from a header in tree.
> 
> I am inclined to just drop this hunk. If that breaks things, I will use
> "reserved0".
> 

It turns out adding reserved0 is required. Dropping this hunk does not
work.

Wei.


* Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-02-03 13:26       ` Wei Liu
@ 2021-02-03 13:49         ` Arnd Bergmann
  2021-02-03 14:09           ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Arnd Bergmann @ 2021-02-03 13:49 UTC (permalink / raw)
  To: Wei Liu
  Cc: Michael Kelley, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Wed, Feb 3, 2021 at 2:26 PM Wei Liu <wei.liu@kernel.org> wrote:
> On Tue, Feb 02, 2021 at 05:02:48PM +0000, Wei Liu wrote:
> > On Tue, Jan 26, 2021 at 01:26:52AM +0000, Michael Kelley wrote:
> > > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > > > +union hv_device_id {
> > > > + u64 as_uint64;
> > > > +
> > > > + struct {
> > > > +         u64 :62;
> > > > +         u64 device_type:2;
> > > > + };
> > >
> > > Are the above 4 lines extraneous junk?
> > > If not, a comment would be helpful.  And we
> > > would normally label the 62 bit field as
> > > "reserved0" or something similar.
> > >
> >
> > No. It is not junk. I got this from a header in tree.
> >
> > I am inclined to just drop this hunk. If that breaks things, I will use
> > "reserved0".
> >
>
> It turns out adding reserved0 is required. Dropping this hunk does not
> work.

Generally speaking, bitfields are not great for specifying binary interfaces,
as the actual bit order can differ by architecture. The normal way we get
around it in the kernel is to use basic integer types and define macros
for bit masks. Ideally, each such field should also be marked with a
particular endianness as __le64 or __be64, in case this is ever used with
an Arm guest running a big-endian kernel.

That said, if you do not care about the specific order of the bits, having
anonymous bitfields for the reserved members is fine, I don't see a
reason to name it as reserved.
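
As a rough illustration of the mask-based approach described above, here
is a minimal userspace sketch. All EX_/ex_ names are hypothetical and not
from the patch. It assumes device_type occupies bits 63:62 (which is where
the quoted bitfield lands with the usual x86-64 GCC layout), that the
IO-APIC ID sits in the low byte, and that the IO-APIC device type has the
value 2; treat all three as assumptions.

```c
#include <stdint.h>

/* Hypothetical names. Layout assumption: device_type in bits 63:62. */
#define EX_DEVID_TYPE_SHIFT	62
#define EX_DEVID_TYPE_MASK	(UINT64_C(0x3) << EX_DEVID_TYPE_SHIFT)
#define EX_DEVID_IOAPIC_ID_MASK	UINT64_C(0xff)

/* Assumed numeric value for the IO-APIC device type. */
#define EX_DEVICE_TYPE_IOAPIC	UINT64_C(2)

/* Build a 64-bit device ID for an IO-APIC from plain integers. */
static inline uint64_t ex_devid_make_ioapic(uint8_t ioapic_id)
{
	return (EX_DEVICE_TYPE_IOAPIC << EX_DEVID_TYPE_SHIFT) |
	       ((uint64_t)ioapic_id & EX_DEVID_IOAPIC_ID_MASK);
}

/* Extract the 2-bit device type without relying on bitfield order. */
static inline unsigned int ex_devid_type(uint64_t id)
{
	return (unsigned int)((id & EX_DEVID_TYPE_MASK) >> EX_DEVID_TYPE_SHIFT);
}
```

Because the value is assembled with shifts and masks, the bit positions
are spelled out in the source instead of being left to the compiler's
bitfield allocation.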

      Arnd

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-02-03 13:49         ` Arnd Bergmann
@ 2021-02-03 14:09           ` Wei Liu
  2021-02-04 16:46             ` Michael Kelley
  0 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2021-02-03 14:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Wei Liu, Michael Kelley, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

On Wed, Feb 03, 2021 at 02:49:53PM +0100, Arnd Bergmann wrote:
> On Wed, Feb 3, 2021 at 2:26 PM Wei Liu <wei.liu@kernel.org> wrote:
> > On Tue, Feb 02, 2021 at 05:02:48PM +0000, Wei Liu wrote:
> > > On Tue, Jan 26, 2021 at 01:26:52AM +0000, Michael Kelley wrote:
> > > > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > > > > +union hv_device_id {
> > > > > + u64 as_uint64;
> > > > > +
> > > > > + struct {
> > > > > +         u64 :62;
> > > > > +         u64 device_type:2;
> > > > > + };
> > > >
> > > > Are the above 4 lines extraneous junk?
> > > > If not, a comment would be helpful.  And we
> > > > would normally label the 62 bit field as
> > > > "reserved0" or something similar.
> > > >
> > >
> > > No. It is not junk. I got this from a header in tree.
> > >
> > > I am inclined to just drop this hunk. If that breaks things, I will use
> > > "reserved0".
> > >
> >
> > It turns out adding reserved0 is required. Dropping this hunk does not
> > work.
> 
> Generally speaking, bitfields are not great for specifying binary interfaces,
> as the actual bit order can differ by architecture. The normal way we get
> around it in the kernel is to use basic integer types and define macros
> for bit masks. Ideally, each such field should also be marked with a
> particular endianness as __le64 or __be64, in case this is ever used with
> an Arm guest running a big-endian kernel.

Thanks for the information.

I think we will need to wait until Microsoft Hypervisor clearly defines
the endianness in its header(s) before we can make changes to the copy in
Linux.

> 
> That said, if you do not care about the specific order of the bits, having
> anonymous bitfields for the reserved members is fine, I don't see a
> reason to name it as reserved.

Michael, let me know what you think. I'm not too fussed either way.

Wei.

> 
>       Arnd

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
  2021-02-02 15:03     ` Wei Liu
@ 2021-02-04 16:33       ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-02-04 16:33 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	Lillian Grassin-Drake, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, February 2, 2021 7:04 AM
> 
> On Tue, Jan 26, 2021 at 12:48:37AM +0000, Michael Kelley wrote:
> > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > We will need the partition ID for executing some hypercalls later.
> > >
> > > Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
> > > Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > ---
> > > v3:
> > > 1. Make hv_get_partition_id static.
> > > 2. Change code structure a bit.
> > > ---
> > >  arch/x86/hyperv/hv_init.c         | 27 +++++++++++++++++++++++++++
> > >  arch/x86/include/asm/mshyperv.h   |  2 ++
> > >  include/asm-generic/hyperv-tlfs.h |  6 ++++++
> > >  3 files changed, 35 insertions(+)
> > >
> > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > > index 6f4cb40e53fe..fc9941bd8653 100644
> > > --- a/arch/x86/hyperv/hv_init.c
> > > +++ b/arch/x86/hyperv/hv_init.c
> > > @@ -26,6 +26,9 @@
> > >  #include <linux/syscore_ops.h>
> > >  #include <clocksource/hyperv_timer.h>
> > >
> > > +u64 hv_current_partition_id = ~0ull;
> > > +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> > > +
> > >  void *hv_hypercall_pg;
> > >  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> > >
> > > @@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = {
> > >  	.resume		= hv_resume,
> > >  };
> > >
> > > +static void __init hv_get_partition_id(void)
> > > +{
> > > +	struct hv_get_partition_id *output_page;
> > > +	u16 status;
> > > +	unsigned long flags;
> > > +
> > > +	local_irq_save(flags);
> > > +	output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > +	status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> > > +		HV_HYPERCALL_RESULT_MASK;
> > > +	if (status != HV_STATUS_SUCCESS) {
> >
> > Across the Hyper-V code in Linux, the way we check the hypercall result
> > is very inconsistent.  IMHO, the and'ing of hv_do_hypercall() with
> > HV_HYPERCALL_RESULT_MASK so that status can be a u16 is stylistically
> > a bit unusual.
> >
> > I'd like to see the hypercall result being stored into a u64 local variable.
> > Then the subsequent test for the status should 'and' the u64 with
> > HV_HYPERCALL_RESULT_MASK to determine the result code.
> > I've made a note to go fix the places that aren't doing it that way.
> >
> 
> I will fold in the following diff in the next version. I will also check
> if there are other instances in this patch series that need fixing.
> Pretty sure there are a few.
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index fc9941bd8653..6064f64a1295 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -337,14 +337,13 @@ static struct syscore_ops hv_syscore_ops = {
>  static void __init hv_get_partition_id(void)
>  {
>         struct hv_get_partition_id *output_page;
> -       u16 status;
> +       u64 status;
>         unsigned long flags;
> 
>         local_irq_save(flags);
>         output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> -       status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> -               HV_HYPERCALL_RESULT_MASK;
> -       if (status != HV_STATUS_SUCCESS) {
> +       status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
> +       if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
>                 /* No point in proceeding if this failed */
>                 pr_err("Failed to get partition ID: %d\n", status);
>                 BUG();
> > > +		/* No point in proceeding if this failed */
> > > +		pr_err("Failed to get partition ID: %d\n", status);
> > > +		BUG();
> > > +	}
> > > +	hv_current_partition_id = output_page->partition_id;
> > > +	local_irq_restore(flags);
> > > +}
> > > +
> > >  /*
> > >   * This function is to be invoked early in the boot sequence after the
> > >   * hypervisor has been detected.
> > > @@ -426,6 +448,11 @@ void __init hyperv_init(void)
> > >
> > >  	register_syscore_ops(&hv_syscore_ops);
> > >
> > > +	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> > > +		hv_get_partition_id();
> >
> > Another place where the EBX value saved into the ms_hyperv structure
> > could be used.
> 
> If you're okay with my response earlier, this will be handled later in
> another patch (series).
> 

Yes, that's OK.  Andrea Parri's patch series for Isolated VMs is capturing the
EBX value as well.
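
On the status-checking convention discussed earlier in this message (keep
the full 64-bit hypercall result and apply the mask only when testing it),
a minimal userspace sketch follows. The EX_ constants and the
ex_do_hypercall() stand-in are hypothetical; the mask width and success
value are assumptions mirroring what hyperv-tlfs.h is expected to define.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones live in hyperv-tlfs.h. */
#define EX_HYPERCALL_RESULT_MASK	UINT64_C(0xffff)
#define EX_STATUS_SUCCESS		UINT64_C(0)

/* Stand-in for hv_do_hypercall(): hands back a canned 64-bit result. */
static uint64_t ex_do_hypercall(uint64_t canned_result)
{
	return canned_result;
}

/* Test only the result-code bits; the caller keeps the whole u64. */
static int ex_hypercall_succeeded(uint64_t status)
{
	return (status & EX_HYPERCALL_RESULT_MASK) == EX_STATUS_SUCCESS;
}
```

Keeping the unmasked value in a u64 means any information carried in the
upper bits is still available to the caller after the success check.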

Michael

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-02-03 12:47     ` Wei Liu
@ 2021-02-04 16:41       ` Michael Kelley
  2021-02-04 16:48         ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Michael Kelley @ 2021-02-04 16:41 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 4:47 AM
> 
> On Wed, Jan 27, 2021 at 05:47:08AM +0000, Michael Kelley wrote:
> > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > > domain to handle mapping and unmapping of IO-APIC interrupts.
> > >
> > > Signed-off-by: Wei Liu <wei.liu@kernel.org>
> > > ---
> > >  arch/x86/hyperv/irqdomain.c     |  54 ++++++++++
> > >  arch/x86/include/asm/mshyperv.h |   4 +
> > >  drivers/iommu/hyperv-iommu.c    | 179 +++++++++++++++++++++++++++++++-
> > >  3 files changed, 233 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > index 19637cd60231..8e2b4e478b70 100644
> > > --- a/arch/x86/hyperv/irqdomain.c
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
> > >  }
> > >
> > >  #endif /* CONFIG_PCI_MSI */
> > > +
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> > > +{
> > > +	union hv_device_id device_id;
> > > +
> > > +	device_id.as_uint64 = 0;
> > > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > > +	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;
> >
> > The masking is already done in hv_unmap_interrupt.
> 
> Fixed.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > > +
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > > +		struct hv_interrupt_entry *entry)
> > > +{
> > > +	unsigned long flags;
> > > +	struct hv_input_map_device_interrupt *input;
> > > +	struct hv_output_map_device_interrupt *output;
> > > +	union hv_device_id device_id;
> > > +	struct hv_device_interrupt_descriptor *intr_desc;
> > > +	u16 status;
> > > +
> > > +	device_id.as_uint64 = 0;
> > > +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > +	local_irq_save(flags);
> > > +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > +	memset(input, 0, sizeof(*input));
> > > +	intr_desc = &input->interrupt_descriptor;
> > > +	input->partition_id = hv_current_partition_id;
> > > +	input->device_id = device_id.as_uint64;
> > > +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > +	intr_desc->target.vector = vector;
> > > +	intr_desc->vector_count = 1;
> > > +
> > > +	if (level)
> > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > +	else
> > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > +
> > > > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> > > +			 HV_HYPERCALL_RESULT_MASK;
> > > +	local_irq_restore(flags);
> > > +
> > > +	*entry = output->interrupt_entry;
> > > +
> > > +	return status;
> >
> > As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
> > mostly parallel, though some of the assignments are done in a different order.  It's a nit,
> > but making them as parallel as possible would be nice. :-)
> >
> 
> Indeed. I will see about factoring out a function.

If factoring out a separate helper function is clumsy, just having the parallel code
in the two functions be as similar as possible makes it easier to see what's the
same and what's different.

> 
> > Same 64 vCPU comment applies here as well.
> >
> 
> This is changed to use vpset instead. Took me a bit of time to get it
> working because the documentation is a bit lacking.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > > index ccc849e25d5e..345d7c6f8c37 100644
> > > --- a/arch/x86/include/asm/mshyperv.h
> > > +++ b/arch/x86/include/asm/mshyperv.h
> > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > > hv_msi_entry *msi_entry,
> > >
> > >  struct irq_domain *hv_create_pci_msi_domain(void);
> > >
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> > > +		struct hv_interrupt_entry *entry);
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> > > +
> > >  #else /* CONFIG_HYPERV */
> > >  static inline void hyperv_init(void) {}
> > >  static inline void hyperv_setup_mmu_ops(void) {}
> > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > > index b7db6024e65c..6d35e4c303c6 100644
> > > --- a/drivers/iommu/hyperv-iommu.c
> > > +++ b/drivers/iommu/hyperv-iommu.c
> > > @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
> > >  	.free = hyperv_irq_remapping_free,
> > >  };
> > >
> > > +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
> > >  static int __init hyperv_prepare_irq_remapping(void)
> > >  {
> > >  	struct fwnode_handle *fn;
> > >  	int i;
> > > +	const char *name;
> > > +	const struct irq_domain_ops *ops;
> > >
> > >  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> > >  	    x86_init.hyper.msi_ext_dest_id() ||
> > > -	    !x2apic_supported() || hv_root_partition)
> > > +	    !x2apic_supported())
> >
> > Any reason that the check for hv_root_partition was added
> > in patch #4  of this series, and then removed here?  Could
> > patch #4 just be dropped?
> >
> 
> Before v5 (or v4?) IO-APIC was not handled via Hyper-V IOMMU. Now it is.
> 
> Patch 4 has become redundant with that change. I already dropped patch 4
> in the v6 branch I have locally.
> 
> > >  		return -ENODEV;
> > >
> > > -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > > +	if (hv_root_partition) {
> > > +		name = "HYPERV-ROOT-IR";
> > > +		ops = &hyperv_root_ir_domain_ops;
> > > +	} else {
> > > +		name = "HYPERV-IR";
> > > +		ops = &hyperv_ir_domain_ops;
> > > +	}
> > > +
> [...]
> > > +static void
> > > +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> > > +{
> > > +	u16 status;
> > > +	u32 vector;
> > > +	struct irq_cfg *cfg;
> > > +	int ioapic_id;
> > > +	struct cpumask *affinity;
> > > +	int cpu, vcpu;
> > > +	struct hv_interrupt_entry entry;
> > > +	struct hyperv_root_ir_data *data = irq_data->chip_data;
> > > +	struct IO_APIC_route_entry e;
> > > +
> > > +	cfg = irqd_cfg(irq_data);
> > > +	affinity = irq_data_get_effective_affinity_mask(irq_data);
> > > +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> > > +	vcpu = hv_cpu_number_to_vp_number(cpu);
> > > +
> > > +	vector = cfg->vector;
> > > +	ioapic_id = data->ioapic_id;
> > > +
> > > +	if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
> >
> > Does 'data' need to be checked to be non-NULL?  The parallel code in
> > hv_irq_compose_msi_msg() makes such a check.
> 
> The usage of irq_data->chip_data is different in these two functions.
> 
> In this function, we're sure it is correctly allocated by
> hyperv_root_ir_remapping_alloc at some point before.
> 
> In hv_irq_compose_msi_msg, irq_data->chip_data is instead used as a
> temporary place to stash some state that is controlled solely by the
> said function.
> 
> Once we get to the point of introducing a paravirtualized IOMMU for the
> root partition, we can then unify these two paths.

OK, thanks for the explanation.

> 
> >
> > > +	    && data->entry.ioapic_rte.as_uint64) {
> > > +		entry = data->entry;
> > > +
> > > +		status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> > > +
> > > +		if (status != HV_STATUS_SUCCESS)
> > > > +			pr_debug("%s: unexpected unmap status %d\n", __func__, status);
> > > +
> > > +		data->entry.ioapic_rte.as_uint64 = 0;
> > > +		data->entry.source = 0; /* Invalid source */
> >
> > Again comparing, hv_irq_compose_msi_msg() frees the old
> > entry, and then allocates a new one.   This code reuses the old entry.
> > Any reason for the difference?
> >
> 
> See above.
> 
> I can perhaps tweak the logic a bit to reuse the same entry, but the
> overall design won't change. I opted to always reallocate because that
> looked more straight-forward to me.
> 
> Let me know if you feel strongly about reusing.

I don't feel strongly about reusing.  I was just comparing/contrasting
the two functions.

> 
> > > +	}
> > > +
> > > +
> > > +	status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, vcpu,
> > > +					vector, &entry);
> > > +
> > > +	if (status != HV_STATUS_SUCCESS) {
> > > +		pr_err("%s: map hypercall failed, status %d\n", __func__, status);
> > > +		return;
> > > +	}
> > > +
> > > +	data->entry = entry;
> > > +
> > > +	/* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> > > +	e.w1 = entry.ioapic_rte.low_uint32;
> > > +	e.w2 = entry.ioapic_rte.high_uint32;
> > > +
> > > +	memset(msg, 0, sizeof(*msg));
> > > +	msg->arch_data.vector = e.vector;
> > > +	msg->arch_data.delivery_mode = e.delivery_mode;
> > > +	msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> > > +	msg->arch_addr_lo.dmar_format = e.ir_format;
> > > +	msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> > > +}
> >
> > Having this whole function be more parallel to hv_irq_compose_msi_msg()
> > would be nice. :-)
> >
> 
> Unlike hv_map_ioapic_interrupt and hv_map_msi_interrupt, which can
> benefit from unifying now, this and hv_irq_compose_msi_msg will need to
> wait till we have an IOMMU for the reason I stated above.

OK.  Just having the code in the two functions be more parallel where
possible would make it easier to see similarities and differences.  But
it's not a big deal.

> 
> > > +
> > > +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> > > +		const struct cpumask *mask, bool force)
> > > +{
> > > +	struct irq_data *parent = data->parent_data;
> > > +	struct irq_cfg *cfg = irqd_cfg(data);
> > > +	int ret;
> > > +
> > > +	ret = parent->chip->irq_set_affinity(parent, mask, force);
> > > +	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> > > +		return ret;
> > > +
> > > +	send_cleanup_vector(cfg);
> > > +
> > > +	return 0;
> > > +}
> > > +
> [...]
> > > +
> > > +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> > > +				 unsigned int virq, unsigned int nr_irqs)
> > > +{
> > > +	struct irq_data *irq_data;
> > > +	struct hyperv_root_ir_data *data;
> > > +	struct hv_interrupt_entry *e;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < nr_irqs; i++) {
> > > +		irq_data = irq_domain_get_irq_data(domain, virq + i);
> > > +
> > > +		if (irq_data && irq_data->chip_data) {
> > > +			data = irq_data->chip_data;
> >
> > Set irq_data->chip_data to NULL?  That seems to be done in other
> > similar places in your code.
> 
> There is no need to do that. By the time this function returns, irq_data
> will be gone too -- freed by irq_domain_free_irqs_common.

OK

> 
> >
> > > +			e = &data->entry;
> > > +
> > > +			if (e->source == HV_DEVICE_TYPE_IOAPIC
> > > +			      && e->ioapic_rte.as_uint64)
> > > +				hv_unmap_ioapic_interrupt(data->ioapic_id,
> > > +							&data->entry);
> > > +
> > > +			kfree(data);
> > > +		}
> > > +	}
> > > +
> > > +	irq_domain_free_irqs_common(domain, virq, nr_irqs);
> > > +}
> > > +
> > > +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> > > +	.select = hyperv_irq_remapping_select,
> > > +	.alloc = hyperv_root_irq_remapping_alloc,
> > > +	.free = hyperv_root_irq_remapping_free,
> > > +};
> > > +
> > >  #endif
> > > --
> > > 2.20.1
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  2021-02-03 14:09           ` Wei Liu
@ 2021-02-04 16:46             ` Michael Kelley
  0 siblings, 0 replies; 59+ messages in thread
From: Michael Kelley @ 2021-02-04 16:46 UTC (permalink / raw)
  To: Wei Liu, Arnd Bergmann
  Cc: Linux on Hyper-V List, virtualization, Linux Kernel List,
	Vineeth Pillai, Sunil Muthuswamy, Nuno Das Neves, pasha.tatashin,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Arnd Bergmann,
	open list:GENERIC INCLUDE/ASM HEADER FILES

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 6:09 AM
> 
> On Wed, Feb 03, 2021 at 02:49:53PM +0100, Arnd Bergmann wrote:
> > On Wed, Feb 3, 2021 at 2:26 PM Wei Liu <wei.liu@kernel.org> wrote:
> > > On Tue, Feb 02, 2021 at 05:02:48PM +0000, Wei Liu wrote:
> > > > On Tue, Jan 26, 2021 at 01:26:52AM +0000, Michael Kelley wrote:
> > > > > From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, January 20, 2021 4:01 AM
> > > > > > +union hv_device_id {
> > > > > > + u64 as_uint64;
> > > > > > +
> > > > > > + struct {
> > > > > > +         u64 :62;
> > > > > > +         u64 device_type:2;
> > > > > > + };
> > > > >
> > > > > Are the above 4 lines extraneous junk?
> > > > > If not, a comment would be helpful.  And we
> > > > > would normally label the 62 bit field as
> > > > > "reserved0" or something similar.
> > > > >
> > > >
> > > > No. It is not junk. I got this from a header in tree.
> > > >
> > > > I am inclined to just drop this hunk. If that breaks things, I will use
> > > > "reserved0".
> > > >
> > >
> > > It turns out adding reserved0 is required. Dropping this hunk does not
> > > work.
> >
> > Generally speaking, bitfields are not great for specifying binary interfaces,
> > as the actual bit order can differ by architecture. The normal way we get
> > around it in the kernel is to use basic integer types and define macros
> > for bit masks. Ideally, each such field should also be marked with a
> > > particular endianness as __le64 or __be64, in case this is ever used with
> > an Arm guest running a big-endian kernel.
> 
> Thanks for the information.
> 
> I think we will need to wait until Microsoft Hypervisor clearly defines
> the endianness in its header(s) before we can make changes to the copy in
> Linux.
> 
> >
> > That said, if you do not care about the specific order of the bits, having
> > anonymous bitfields for the reserved members is fine, I don't see a
> > reason to name it as reserved.
> 
> Michael, let me know what you think. I'm not too fussed either way.
> 
> Wei.

I'm OK either way.  In the Hyper-V code we've typically given such
fields a name rather than leave them anonymous, which is why it stuck
out.

Michael

> 
> >
> >       Arnd

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  2021-02-04 16:41       ` Michael Kelley
@ 2021-02-04 16:48         ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2021-02-04 16:48 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Linux on Hyper-V List, virtualization,
	Linux Kernel List, Vineeth Pillai, Sunil Muthuswamy,
	Nuno Das Neves, pasha.tatashin, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Joerg Roedel, Will Deacon,
	open list:IOMMU DRIVERS

On Thu, Feb 04, 2021 at 04:41:47PM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, February 3, 2021 4:47 AM
[...]
> > > > +
> > > > +	if (level)
> > > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > > +	else
> > > > +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > > +
> > > > +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > > +
> > > > > +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> > > > +			 HV_HYPERCALL_RESULT_MASK;
> > > > +	local_irq_restore(flags);
> > > > +
> > > > +	*entry = output->interrupt_entry;
> > > > +
> > > > +	return status;
> > >
> > > As a cross-check, I was comparing this code against hv_map_msi_interrupt().  They are
> > > mostly parallel, though some of the assignments are done in a different order.  It's a nit,
> > > but making them as parallel as possible would be nice. :-)
> > >
> > 
> > Indeed. I will see about factoring out a function.
> 
> If factoring out a separate helper function is clumsy, just having the parallel code
> in the two functions be as similar as possible makes it easier to see what's the
> same and what's different.
> 

No. It is not clumsy at all. I've done it in the newly posted v6.

I was baffled why I wrote hv_unmap_interrupt helper to be used by both
hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous
patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough
coffee that day. :-/
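
For readers following along, the shape of such a refactoring can be
sketched in plain userspace C. This is not the v6 code: every ex_ name,
the request structure, and the device ID encoding are hypothetical, and
the real helper issues a hypercall rather than just filling in a struct.
The sketch only shows one common routine with two thin wrappers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout, for illustration only. */
struct ex_map_request {
	uint64_t device_id;
	uint32_t vector;
	int level_triggered;
	int vcpu;
};

/* Common part: everything both callers did in parallel lives here once. */
static void ex_map_interrupt(struct ex_map_request *req, uint64_t device_id,
			     int level, int vcpu, uint32_t vector)
{
	memset(req, 0, sizeof(*req));
	req->device_id = device_id;
	req->level_triggered = level;
	req->vcpu = vcpu;
	req->vector = vector;
}

/* IO-APIC wrapper: only the device ID construction is specific. */
static void ex_map_ioapic_interrupt(struct ex_map_request *req,
				    uint8_t ioapic_id, int level, int vcpu,
				    uint32_t vector)
{
	/* Assumed encoding: type 2 in bits 63:62, ID in the low byte. */
	uint64_t device_id = (UINT64_C(2) << 62) | ioapic_id;

	ex_map_interrupt(req, device_id, level, vcpu, vector);
}

/* MSI wrapper: MSIs are edge-triggered, so level is always 0. */
static void ex_map_msi_interrupt(struct ex_map_request *req,
				 uint64_t pci_device_id, int vcpu,
				 uint32_t vector)
{
	ex_map_interrupt(req, pci_device_id, 0, vcpu, vector);
}
```

With this split, the two wrappers differ only where the devices really
differ, which is the property asked for in the review.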

Thanks for pointing out that issue. It definitely helped improve the
quality of this series.

Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2021-02-04 19:29 UTC | newest]

Thread overview: 59+ messages
2021-01-20 12:00 [PATCH v5 00/16] Introducing Linux root partition support for Microsoft Hypervisor Wei Liu
2021-01-20 12:00 ` [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT Wei Liu
2021-01-20 15:57   ` Pavel Tatashin
2021-01-26  0:25   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition Wei Liu
2021-01-20 16:03   ` Pavel Tatashin
2021-01-26 15:06     ` Wei Liu
2021-01-26  0:31   ` Michael Kelley
2021-01-26 15:15     ` Wei Liu
2021-01-26 15:24       ` Wei Liu
2021-01-20 12:00 ` [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root Wei Liu
2021-01-20 16:06   ` Pavel Tatashin
2021-01-26  0:32   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root Wei Liu
2021-01-20 16:08   ` Pavel Tatashin
2021-01-26  0:33   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if " Wei Liu
2021-01-20 16:13   ` Pavel Tatashin
2021-01-26 15:19     ` Wei Liu
2021-01-26  0:34   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required Wei Liu
2021-01-20 15:12   ` kernel test robot
2021-01-20 19:44   ` Pavel Tatashin
2021-01-26  0:41   ` Michael Kelley
2021-01-26 18:09     ` Wei Liu
2021-01-20 12:00 ` [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary Wei Liu
2021-01-26  0:48   ` Michael Kelley
2021-02-02 15:03     ` Wei Liu
2021-02-04 16:33       ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root Wei Liu
2021-01-26  0:49   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions Wei Liu
2021-01-26  1:20   ` Michael Kelley
2021-02-02 16:19     ` Wei Liu
2021-01-20 12:00 ` [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus Wei Liu
2021-01-26  1:21   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry Wei Liu
2021-01-26  1:22   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry Wei Liu
2021-01-26  1:23   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures Wei Liu
2021-01-26  1:26   ` Michael Kelley
2021-02-02 17:02     ` Wei Liu
2021-02-03 13:26       ` Wei Liu
2021-02-03 13:49         ` Arnd Bergmann
2021-02-03 14:09           ` Wei Liu
2021-02-04 16:46             ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts Wei Liu
2021-01-26  1:27   ` Michael Kelley
2021-01-20 12:00 ` [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition Wei Liu
2021-01-27  5:47   ` Michael Kelley
2021-02-02 17:31     ` Wei Liu
2021-02-02 18:15       ` Michael Kelley
2021-02-02 18:16         ` Wei Liu
2021-01-20 12:00 ` [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping " Wei Liu
2021-01-27  5:47   ` Michael Kelley
2021-02-03 12:47     ` Wei Liu
2021-02-04 16:41       ` Michael Kelley
2021-02-04 16:48         ` Wei Liu
