* [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
@ 2021-08-27 17:20 Tianyu Lan
  2021-08-27 17:20 ` [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM Tianyu Lan
                   ` (13 more replies)
  0 siblings, 14 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:20 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and
the guest needs to call it to mark memory visible to the host
before sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce
buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role because
all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions: packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory
addresses, and the host will access that memory. So add bounce buffer
allocation support in vmbus for these packets.

For SNP Isolation VMs, the guest needs to access the shared memory via
an extra address space, which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory is the bounce buffer memory GPA plus the shared_gpa_boundary
reported by CPUID.
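
As a rough illustration of the address arithmetic described above, the
following userspace sketch computes the address through which a shared
page would be accessed. The helper name shared_access_gpa() is an
assumption made for illustration and is not part of this series.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of this series): return the physical
 * address used to access a shared page, i.e. the bounce buffer GPA
 * plus the shared_gpa_boundary reported by CPUID.
 */
static uint64_t shared_access_gpa(uint64_t bounce_gpa,
				  uint64_t shared_gpa_boundary)
{
	return bounce_gpa + shared_gpa_boundary;
}
```

For example, with a boundary of 1ULL << 39 (an arbitrary value chosen
here), a bounce buffer at GPA 0x1000 would be accessed at 0x8000001000.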

This patchset is based on the Hyper-V next branch.

Change since V3:
	- Initialize GHCB page in the cpu init callback.
	- Change vmbus_teardown_gpadl() parameter in order to
	  mark the memory back to non-visible to host.
	- Merge hv_ringbuffer_post_init() into hv_ringbuffer_init().
	- Keep Hyper-V bounce buffer size the same as AMD SEV VM
	- Use dma_map_sg() instead of dma_map_page() in the storvsc driver.

Change since V2:
       - Drop x86_set_memory_enc static call and use platform check
         in the __set_memory_enc_dec() to run platform callback of
	 set memory encrypted or decrypted.

Change since V1:
       - Introduce x86_set_memory_enc static call and so platforms can
         override __set_memory_enc_dec() with their implementation
       - Introduce sev_es_ghcb_hv_call_simple() and share code
         between SEV and Hyper-V code.
       - Do not remap monitor pages in the non-SNP isolation VM
       - Make swiotlb_init_io_tlb_mem() return error code and return
         error when dma_map_decrypted() fails.

Change since RFC V4:
       - Introduce dma map decrypted function to remap bounce buffer
          and provide dma map decrypted ops for platform to hook callback.        
       - Split swiotlb and dma map decrypted change into two patches
       - Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
       - Add interface set_memory_decrypted_map() to decrypt memory and
         map bounce buffer in extra address space
       - Remove swiotlb remap function and store the remap address
         returned by set_memory_decrypted_map() in swiotlb mem data structure.
       - Introduce hv_set_mem_enc() to make code more readable in the __set_memory_enc_dec().

Change since RFC v2:
       - Remove not UIO driver in Isolation VM patch
       - Use vmap_pfn() to replace ioremap_page_range function in
       order to avoid exposing symbol ioremap_page_range() and
       ioremap_page_range()
       - Call hv set mem host visibility hvcall in set_memory_encrypted/decrypted()
       - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
       - Fix code style


Tianyu Lan (13):
  x86/hyperv: Initialize GHCB page in Isolation VM
  x86/hyperv: Initialize shared memory boundary in the Isolation VM.
  x86/hyperv: Add new hvcall guest address host visibility support
  hyperv: Mark vmbus ring buffer visible to host in Isolation VM
  hyperv: Add Write/Read MSR registers via ghcb page
  hyperv: Add ghcb hvcall support for SNP VM
  hyperv/Vmbus: Add SNP support for VMbus channel initiate  message
  hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  hv_netvsc: Add Isolation VM support for netvsc driver
  hv_storvsc: Add Isolation VM support for storvsc driver

 arch/arm64/include/asm/mshyperv.h  |  23 ++
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  78 +++++--
 arch/x86/hyperv/ivm.c              | 325 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  17 ++
 arch/x86/include/asm/mshyperv.h    |  88 +++++++-
 arch/x86/include/asm/sev.h         |   3 +
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/sev-shared.c       |  63 +++---
 arch/x86/mm/mem_encrypt.c          |   3 +-
 arch/x86/mm/pat/set_memory.c       |  19 +-
 arch/x86/xen/pci-swiotlb-xen.c     |   3 +-
 drivers/hv/Kconfig                 |   1 +
 drivers/hv/channel.c               |  55 +++--
 drivers/hv/connection.c            |  81 ++++++-
 drivers/hv/hv.c                    | 120 +++++++----
 drivers/hv/hv_common.c             |  12 ++
 drivers/hv/hyperv_vmbus.h          |   1 +
 drivers/hv/ring_buffer.c           |  56 +++--
 drivers/hv/vmbus_drv.c             |   4 +
 drivers/iommu/hyperv-iommu.c       |  61 ++++++
 drivers/net/hyperv/hyperv_net.h    |   6 +
 drivers/net/hyperv/netvsc.c        | 151 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c         |  41 ++--
 drivers/uio/uio_hv_generic.c       |  14 +-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  19 +-
 include/linux/dma-map-ops.h        |   9 +
 include/linux/hyperv.h             |  15 +-
 include/linux/swiotlb.h            |   4 +
 kernel/dma/mapping.c               |  22 ++
 kernel/dma/swiotlb.c               |  32 ++-
 33 files changed, 1166 insertions(+), 170 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

-- 
2.25.1



* [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
@ 2021-08-27 17:20 ` Tianyu Lan
  2021-09-02  0:15   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the " Tianyu Lan
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:20 UTC (permalink / raw)

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for SNP guests
to communicate with the hypervisor. Map the GHCB page on all
CPUs to read/write MSR registers and submit hvcall requests
via the GHCB page.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
        * Rename ghcb_base to hv_ghcb_pg and move it out of
	  struct ms_hyperv_info.
	* Allocate hv_ghcb_pg before cpuhp_setup_state() and leverage
	  hv_cpu_init() to initialize ghcb page.
---
 arch/x86/hyperv/hv_init.c       | 68 +++++++++++++++++++++++++++++----
 arch/x86/include/asm/mshyperv.h |  4 ++
 arch/x86/kernel/cpu/mshyperv.c  |  3 ++
 include/asm-generic/mshyperv.h  |  1 +
 4 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 708a2712a516..eba10ed4f73e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include <linux/kexec.h>
 #include <linux/version.h>
 #include <linux/vmalloc.h>
+#include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/hyperv.h>
 #include <linux/slab.h>
@@ -36,12 +37,42 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
 void *hv_hypercall_pg;
 EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 
+void __percpu **hv_ghcb_pg;
+
 /* Storage to save the hypercall page temporarily for hibernation */
 static void *hv_hypercall_pg_saved;
 
 struct hv_vp_assist_page **hv_vp_assist_page;
 EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 
+static int hyperv_init_ghcb(void)
+{
+	u64 ghcb_gpa;
+	void *ghcb_va;
+	void **ghcb_base;
+
+	if (!hv_isolation_type_snp())
+		return 0;
+
+	if (!hv_ghcb_pg)
+		return -EINVAL;
+
+	/*
+	 * GHCB page is allocated by paravisor. The address
+	 * returned by MSR_AMD64_SEV_ES_GHCB is above shared
+	 * ghcb boundary and map it here.
+	 */
+	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+	if (!ghcb_va)
+		return -ENOMEM;
+
+	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
+	*ghcb_base = ghcb_va;
+
+	return 0;
+}
+
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
@@ -85,7 +116,7 @@ static int hv_cpu_init(unsigned int cpu)
 		}
 	}
 
-	return 0;
+	return hyperv_init_ghcb();
 }
 
 static void (*hv_reenlightenment_cb)(void);
@@ -177,6 +208,14 @@ static int hv_cpu_die(unsigned int cpu)
 {
 	struct hv_reenlightenment_control re_ctrl;
 	unsigned int new_cpu;
+	void **ghcb_va;
+
+	if (hv_ghcb_pg) {
+		ghcb_va = (void **)this_cpu_ptr(hv_ghcb_pg);
+		if (*ghcb_va)
+			memunmap(*ghcb_va);
+		*ghcb_va = NULL;
+	}
 
 	hv_common_cpu_die(cpu);
 
@@ -366,10 +405,16 @@ void __init hyperv_init(void)
 		goto common_free;
 	}
 
+	if (hv_isolation_type_snp()) {
+		hv_ghcb_pg = alloc_percpu(void *);
+		if (!hv_ghcb_pg)
+			goto free_vp_assist_page;
+	}
+
 	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
 				  hv_cpu_init, hv_cpu_die);
 	if (cpuhp < 0)
-		goto free_vp_assist_page;
+		goto free_ghcb_page;
 
 	/*
 	 * Setup the hypercall page and enable hypercalls.
@@ -383,10 +428,8 @@ void __init hyperv_init(void)
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 			__builtin_return_address(0));
-	if (hv_hypercall_pg == NULL) {
-		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-		goto remove_cpuhp_state;
-	}
+	if (hv_hypercall_pg == NULL)
+		goto clean_guest_os_id;
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
@@ -456,8 +499,11 @@ void __init hyperv_init(void)
 	hv_query_ext_cap(0);
 	return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 	cpuhp_remove_state(cpuhp);
+free_ghcb_page:
+	free_percpu(hv_ghcb_pg);
 free_vp_assist_page:
 	kfree(hv_vp_assist_page);
 	hv_vp_assist_page = NULL;
@@ -559,3 +605,11 @@ bool hv_is_isolation_supported(void)
 {
 	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+bool hv_isolation_type_snp(void)
+{
+	return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..37739a277ac6 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include <asm/paravirt.h>
 #include <asm/mshyperv.h>
 
+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
 		void *data);
@@ -39,6 +41,8 @@ extern void *hv_hypercall_pg;
 
 extern u64 hv_current_partition_id;
 
+extern void __percpu **hv_ghcb_pg;
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6b5835a087a3..20557a9d6e25 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -316,6 +316,9 @@ static void __init ms_hyperv_init_platform(void)
 
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
+
+		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+			static_branch_enable(&isolation_type_snp);
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..0924bbd8458e 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -237,6 +237,7 @@ bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool hv_is_isolation_supported(void);
+bool hv_isolation_type_snp(void);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
 #else /* CONFIG_HYPERV */
-- 
2.25.1



* [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the Isolation VM.
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
  2021-08-27 17:20 ` [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  0:15   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support Tianyu Lan
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V exposes the shared memory boundary via the CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. Store it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
for sharing memory with the host in SNP guests.
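
The decoding this patch performs with bit-fields and BIT_ULL() can be
sketched in userspace with explicit shifts. The struct and function
names below are hypothetical; only the bit positions (cvm_type in bits
0-3, shared_gpa_boundary_active in bit 5, shared_gpa_boundary_bits in
bits 6-11) are taken from the layout added to struct ms_hyperv_info.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mirror of the fields unpacked from isolation_config_b. */
struct isolation_config_b {
	unsigned int cvm_type;			/* bits 0-3 */
	bool shared_gpa_boundary_active;	/* bit 5 */
	unsigned int shared_gpa_boundary_bits;	/* bits 6-11 */
};

/* Hypothetical decoder: unpack the raw EBX value with explicit shifts. */
static struct isolation_config_b decode_isolation_config_b(uint32_t ebx)
{
	struct isolation_config_b cfg;

	cfg.cvm_type = ebx & 0xf;
	cfg.shared_gpa_boundary_active = (ebx >> 5) & 0x1;
	cfg.shared_gpa_boundary_bits = (ebx >> 6) & 0x3f;
	return cfg;
}

/* Equivalent of BIT_ULL(bits): the boundary is a single set bit. */
static uint64_t shared_gpa_boundary(const struct isolation_config_b *cfg)
{
	return 1ULL << cfg->shared_gpa_boundary_bits;
}
```

Explicit shifts avoid relying on compiler-specific bit-field ordering,
which is one reason a portable tool might prefer them over the union
used in the kernel header.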

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Use BIT_ULL to get shared_gpa_boundary
	* Rename field Reserved* to reserved
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 20557a9d6e25..8bb001198316 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
 	if (ms_hyperv.priv_high & HV_ISOLATION) {
 		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
 		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+		ms_hyperv.shared_gpa_boundary =
+			BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
 
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0924bbd8458e..7537ae1db828 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,7 +35,17 @@ struct ms_hyperv_info {
 	u32 max_vp_index;
 	u32 max_lp_index;
 	u32 isolation_config_a;
-	u32 isolation_config_b;
+	union {
+		u32 isolation_config_b;
+		struct {
+			u32 cvm_type : 4;
+			u32 reserved11 : 1;
+			u32 shared_gpa_boundary_active : 1;
+			u32 shared_gpa_boundary_bits : 6;
+			u32 reserved12 : 20;
+		};
+	};
+	u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



* [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
  2021-08-27 17:20 ` [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM Tianyu Lan
  2021-08-27 17:21 ` [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the " Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  0:16   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add support for the new guest address host visibility hvcall, which marks
memory visible to the host. Call it inside set_memory_decrypted()/
set_memory_encrypted(). Add a HYPERVISOR feature check to
hv_is_isolation_supported() so that it returns early in a
non-virtualized environment.
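
The batching that __hv_set_mem_host_visibility() applies to the page
list can be sketched in userspace as follows. This is a simplified
model under stated assumptions: a 4 KiB page size, physically
contiguous PFNs, and a callback standing in for
hv_mark_gpa_visibility(). All SKETCH_* and *_batch names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
/* Mirrors HV_MAX_MODIFY_GPA_REP_COUNT: a page of u64 slots minus two. */
#define SKETCH_MAX_REP_COUNT ((SKETCH_PAGE_SIZE / sizeof(uint64_t)) - 2)

/* Hypothetical callback standing in for hv_mark_gpa_visibility(). */
typedef int (*mark_batch_fn)(size_t count, const uint64_t *pfns, void *ctx);

struct batch_stats {
	size_t calls;	/* number of batches submitted */
	size_t pages;	/* total pages across all batches */
};

/* Example callback: counts batches instead of issuing a hypercall. */
static int count_batch(size_t count, const uint64_t *pfns, void *ctx)
{
	struct batch_stats *stats = ctx;

	(void)pfns;
	stats->calls++;
	stats->pages += count;
	return 0;
}

static int mark_pages_in_batches(uint64_t first_pfn, size_t pagecount,
				 mark_batch_fn mark, void *ctx)
{
	uint64_t batch[SKETCH_MAX_REP_COUNT];
	size_t i, n = 0;
	int ret;

	for (i = 0; i < pagecount; i++) {
		batch[n++] = first_pfn + i;

		/* Flush when the batch is full or the last page is queued. */
		if (n == SKETCH_MAX_REP_COUNT || i == pagecount - 1) {
			ret = mark(n, batch, ctx);
			if (ret)
				return ret;
			n = 0;
		}
	}
	return 0;
}
```

With these assumptions each batch holds at most 510 PFNs, so marking
1021 pages takes three calls (510 + 510 + 1).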

Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Fix error code handle in the __hv_set_mem_host_visibility().
	* Move HvCallModifySparseGpaPageHostVisibility near to enum
	  hv_mem_host_visibility.

Change since v2:
       * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
         according to platform check.

Change since v1:
       * Use new static call x86_set_memory_enc to avoid adding Hyper-V
         specific check in the set_memory code.
---
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |   6 ++
 arch/x86/hyperv/ivm.c              | 113 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  17 +++++
 arch/x86/include/asm/mshyperv.h    |   4 +-
 arch/x86/mm/pat/set_memory.c       |  19 +++--
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |   1 +
 8 files changed, 156 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
+obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index eba10ed4f73e..b1aa42f60faa 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -603,6 +603,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
 
 bool hv_is_isolation_supported(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
+		return false;
+
 	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
 
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index 000000000000..a069c788ce3c
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan <Tianyu.Lan@microsoft.com>
+ */
+
+#include <linux/hyperv.h>
+#include <linux/types.h>
+#include <linux/bitfield.h>
+#include <linux/slab.h>
+#include <asm/io.h>
+#include <asm/mshyperv.h>
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+			   enum hv_mem_host_visibility visibility)
+{
+	struct hv_gpa_range_for_visibility **input_pcpu, *input;
+	u16 pages_processed;
+	u64 hv_status;
+	unsigned long flags;
+
+	/* no-op if partition isolation is not enabled */
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+			HV_MAX_MODIFY_GPA_REP_COUNT);
+		return -EINVAL;
+	}
+
+	local_irq_save(flags);
+	input_pcpu = (struct hv_gpa_range_for_visibility **)
+			this_cpu_ptr(hyperv_pcpu_input_arg);
+	input = *input_pcpu;
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
+		return -EINVAL;
+	}
+
+	input->partition_id = HV_PARTITION_ID_SELF;
+	input->host_visibility = visibility;
+	input->reserved0 = 0;
+	input->reserved1 = 0;
+	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+	hv_status = hv_do_rep_hypercall(
+			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+			0, input, &pages_processed);
+	local_irq_restore(flags);
+
+	if (hv_result_success(hv_status))
+		return 0;
+	else
+		return -EFAULT;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount,
+				      enum hv_mem_host_visibility visibility)
+{
+	u64 *pfn_array;
+	int ret = 0;
+	int i, pfn;
+
+	if (!hv_is_isolation_supported() || !hv_hypercall_pg)
+		return 0;
+
+	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+	if (!pfn_array)
+		return -ENOMEM;
+
+	for (i = 0, pfn = 0; i < pagecount; i++) {
+		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+		pfn++;
+
+		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+			ret = hv_mark_gpa_visibility(pfn, pfn_array,
+						     visibility);
+			if (ret)
+				goto err_free_pfn_array;
+			pfn = 0;
+		}
+	}
+
+ err_free_pfn_array:
+	kfree(pfn_array);
+	return ret;
+}
+
+/*
+ * hv_set_mem_host_visibility - Set specified memory visible to host.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host. This function works as wrap of hv_mark_gpa_visibility()
+ * with memory base and size.
+ */
+int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
+{
+	enum hv_mem_host_visibility visibility = visible ?
+			VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE;
+
+	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 2322d6bd5883..381e88122a5f 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -276,6 +276,23 @@ enum hv_isolation_type {
 #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
 #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
 
+/* Hyper-V memory host visibility */
+enum hv_mem_host_visibility {
+	VMBUS_PAGE_NOT_VISIBLE		= 0,
+	VMBUS_PAGE_VISIBLE_READ_ONLY	= 1,
+	VMBUS_PAGE_VISIBLE_READ_WRITE	= 3
+};
+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+#define HV_MAX_MODIFY_GPA_REP_COUNT	((PAGE_SIZE / sizeof(u64)) - 2)
+struct hv_gpa_range_for_visibility {
+	u64 partition_id;
+	u32 host_visibility:2;
+	u32 reserved0:30;
+	u32 reserved1;
+	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 37739a277ac6..ffb2af079c6b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -192,7 +192,9 @@ struct irq_domain *hv_create_pci_msi_domain(void);
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 		struct hv_interrupt_entry *entry);
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+			   enum hv_mem_host_visibility visibility);
+int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..1e4a0882820a 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -29,6 +29,8 @@
 #include <asm/proto.h>
 #include <asm/memtype.h>
 #include <asm/set_memory.h>
+#include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv.h>
 
 #include "../mm_internal.h"
 
@@ -1980,15 +1982,11 @@ int set_memory_global(unsigned long addr, int numpages)
 				    __pgprot(_PAGE_GLOBAL), 0);
 }
 
-static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
 	int ret;
 
-	/* Nothing to do if memory encryption is not active */
-	if (!mem_encrypt_active())
-		return 0;
-
 	/* Should not be working on unaligned addresses */
 	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
 		addr &= PAGE_MASK;
@@ -2023,6 +2021,17 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	return ret;
 }
 
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+	if (hv_is_isolation_supported())
+		return hv_set_mem_host_visibility(addr, numpages, !enc);
+
+	if (mem_encrypt_active())
+		return __set_memory_enc_pgtable(addr, numpages, enc);
+
+	return 0;
+}
+
 int set_memory_encrypted(unsigned long addr, int numpages)
 {
 	return __set_memory_enc_dec(addr, numpages, true);
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 56348a541c50..8ed6733d5146 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_RETARGET_INTERRUPT		0x007e
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
+#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
 
 /* Extended hypercalls */
 #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 7537ae1db828..aa55447b9700 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -254,6 +254,7 @@ bool hv_query_ext_cap(u64 cap_query);
 static inline bool hv_is_hyperv_initialized(void) { return false; }
 static inline bool hv_is_hibernation_supported(void) { return false; }
 static inline void hyperv_cleanup(void) {}
+static inline bool hv_is_isolation_supported(void) { return false; }
 #endif /* CONFIG_HYPERV */
 
 #endif
-- 
2.25.1



* [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (2 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-08-27 17:41   ` Greg KH
  2021-09-02  0:17   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page Tianyu Lan
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Mark the vmbus ring buffer visible to the host with set_memory_decrypted()
when establishing the gpadl handle.
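
The pairing this patch aims for (decrypt on establish, re-encrypt on
failure or teardown) can be modelled with a small userspace sketch. The
sketch_* names and the host_visible flag are hypothetical stand-ins for
struct vmbus_gpadl and the set_memory_decrypted()/set_memory_encrypted()
calls; no hypervisor interaction is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPADL buffer and its visibility state. */
struct sketch_gpadl {
	void *buffer;
	size_t size;
	bool host_visible;
};

/* Model of establish: make the buffer visible, roll back on failure. */
static int sketch_establish(struct sketch_gpadl *g, void *buf, size_t size,
			    bool host_accepts)
{
	g->buffer = buf;
	g->size = size;
	g->host_visible = true;		/* stands in for set_memory_decrypted() */

	if (!host_accepts) {
		g->host_visible = false; /* stands in for set_memory_encrypted() */
		return -1;
	}
	return 0;
}

/* Model of teardown: the buffer is hidden from the host again. */
static void sketch_teardown(struct sketch_gpadl *g)
{
	g->host_visible = false;
}
```

The property being illustrated is that a buffer stays host-visible only
while an established GPADL refers to it.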

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
       * Change vmbus_teardown_gpadl() parameter and put gpadl handle,
       buffer and buffer size in the struct vmbus_gpadl.
---
 drivers/hv/channel.c            | 36 ++++++++++++++++++++++++++++-----
 drivers/net/hyperv/hyperv_net.h |  1 +
 drivers/net/hyperv/netvsc.c     | 16 +++++++++++----
 drivers/uio/uio_hv_generic.c    | 14 +++++++++++--
 include/linux/hyperv.h          |  8 +++++++-
 5 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..82650beb3af0 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include <linux/hyperv.h>
 #include <linux/uio.h>
 #include <linux/interrupt.h>
+#include <linux/set_memory.h>
 #include <asm/page.h>
 #include <asm/mshyperv.h>
 
@@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	if (ret)
 		return ret;
 
+	ret = set_memory_decrypted((unsigned long)kbuffer,
+				   HVPFN_UP(size));
+	if (ret) {
+		pr_warn("Failed to set host visibility for new GPADL %d.\n", ret);
+		return ret;
+	}
+
 	init_completion(&msginfo->waitevent);
 	msginfo->waiting_channel = channel;
 
@@ -549,6 +557,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	}
 
 	kfree(msginfo);
+
+	if (ret)
+		set_memory_encrypted((unsigned long)kbuffer,
+				     HVPFN_UP(size));
+
 	return ret;
 }
 
@@ -639,6 +652,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	struct vmbus_channel_open_channel *open_msg;
 	struct vmbus_channel_msginfo *open_info = NULL;
 	struct page *page = newchannel->ringbuffer_page;
+	struct vmbus_gpadl gpadl;
 	u32 send_pages, recv_pages;
 	unsigned long flags;
 	int err;
@@ -759,7 +773,10 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 error_free_info:
 	kfree(open_info);
 error_free_gpadl:
-	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
+	gpadl.gpadl_handle = newchannel->ringbuffer_gpadlhandle;
+	gpadl.buffer = page_address(newchannel->ringbuffer_page);
+	gpadl.size = (send_pages + recv_pages) << PAGE_SHIFT;
+	vmbus_teardown_gpadl(newchannel, &gpadl);
 	newchannel->ringbuffer_gpadlhandle = 0;
 error_clean_ring:
 	hv_ringbuffer_cleanup(&newchannel->outbound);
@@ -806,7 +823,7 @@ EXPORT_SYMBOL_GPL(vmbus_open);
 /*
  * vmbus_teardown_gpadl -Teardown the specified GPADL handle
  */
-int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
+int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpadl)
 {
 	struct vmbus_channel_gpadl_teardown *msg;
 	struct vmbus_channel_msginfo *info;
@@ -825,7 +842,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 
 	msg->header.msgtype = CHANNELMSG_GPADL_TEARDOWN;
 	msg->child_relid = channel->offermsg.child_relid;
-	msg->gpadl = gpadl_handle;
+	msg->gpadl = gpadl->gpadl_handle;
 
 	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
 	list_add_tail(&info->msglistentry,
@@ -859,6 +876,12 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
 	kfree(info);
+
+	ret = set_memory_encrypted((unsigned long)gpadl->buffer,
+				   HVPFN_UP(gpadl->size));
+	if (ret)
+		pr_warn("Failed to set host visibility in GPADL teardown %d.\n", ret);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
@@ -896,6 +919,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 static int vmbus_close_internal(struct vmbus_channel *channel)
 {
 	struct vmbus_channel_close_channel *msg;
+	struct vmbus_gpadl gpadl;
 	int ret;
 
 	vmbus_reset_channel_cb(channel);
@@ -934,8 +958,10 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 
 	/* Tear down the gpadl for the channel's ring buffer */
 	else if (channel->ringbuffer_gpadlhandle) {
-		ret = vmbus_teardown_gpadl(channel,
-					   channel->ringbuffer_gpadlhandle);
+		gpadl.gpadl_handle = channel->ringbuffer_gpadlhandle;
+		gpadl.buffer = page_address(channel->ringbuffer_page);
+		gpadl.size = channel->ringbuffer_pagecount << PAGE_SHIFT;
+		ret = vmbus_teardown_gpadl(channel, &gpadl);
 		if (ret) {
 			pr_err("Close failed: teardown gpadl return %d\n", ret);
 			/*
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..aa7c9962dbd8 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1082,6 +1082,7 @@ struct netvsc_device {
 
 	/* Send buffer allocated by us */
 	void *send_buf;
+	u32 send_buf_size;
 	u32 send_buf_gpadl_handle;
 	u32 send_section_cnt;
 	u32 send_section_size;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..f19bffff6a63 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -276,11 +276,14 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
 				       struct netvsc_device *net_device,
 				       struct net_device *ndev)
 {
+	struct vmbus_gpadl gpadl;
 	int ret;
 
 	if (net_device->recv_buf_gpadl_handle) {
-		ret = vmbus_teardown_gpadl(device->channel,
-					   net_device->recv_buf_gpadl_handle);
+		gpadl.gpadl_handle = net_device->recv_buf_gpadl_handle;
+		gpadl.buffer = net_device->recv_buf;
+		gpadl.size = net_device->recv_buf_size;
+		ret = vmbus_teardown_gpadl(device->channel, &gpadl);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -298,11 +301,15 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
 				       struct netvsc_device *net_device,
 				       struct net_device *ndev)
 {
+	struct vmbus_gpadl gpadl;
 	int ret;
 
 	if (net_device->send_buf_gpadl_handle) {
-		ret = vmbus_teardown_gpadl(device->channel,
-					   net_device->send_buf_gpadl_handle);
+		gpadl.gpadl_handle = net_device->send_buf_gpadl_handle;
+		gpadl.buffer = net_device->send_buf;
+		gpadl.size = net_device->send_buf_size;
+
+		ret = vmbus_teardown_gpadl(device->channel, &gpadl);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -463,6 +470,7 @@ static int netvsc_init_buf(struct hv_device *device,
 		ret = -ENOMEM;
 		goto cleanup;
 	}
+	net_device->send_buf_size = buf_size;
 
 	/* Establish the gpadl handle for this buffer on this
 	 * channel.  Note: This call uses the vmbus connection rather
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 652fe2547587..13c5df8dd11d 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -179,14 +179,24 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
 static void
 hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
 {
+	struct vmbus_gpadl gpadl;
+
 	if (pdata->send_gpadl) {
-		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl);
+		gpadl.gpadl_handle = pdata->send_gpadl;
+		gpadl.buffer = pdata->send_buf;
+		gpadl.size = SEND_BUFFER_SIZE;
+
+		vmbus_teardown_gpadl(dev->channel, &gpadl);
 		pdata->send_gpadl = 0;
 		vfree(pdata->send_buf);
 	}
 
 	if (pdata->recv_gpadl) {
-		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl);
+		gpadl.gpadl_handle = pdata->recv_gpadl;
+		gpadl.buffer = pdata->recv_buf;
+		gpadl.size = RECV_BUFFER_SIZE;
+
+		vmbus_teardown_gpadl(dev->channel, &gpadl);
 		pdata->recv_gpadl = 0;
 		vfree(pdata->recv_buf);
 	}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index ddc8713ce57b..757e09606fd3 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -803,6 +803,12 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl {
+	u32 gpadl_handle;
+	u32 size;
+	void *buffer;
+};
+
 struct vmbus_channel {
 	struct list_head listentry;
 
@@ -1195,7 +1201,7 @@ extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
 				      u32 *gpadl_handle);
 
 extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
-				     u32 gpadl_handle);
+				     struct vmbus_gpadl *gpadl);
 
 void vmbus_reset_channel_cb(struct vmbus_channel *channel);
 
-- 
2.25.1



* [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (3 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-08-27 17:41   ` Greg KH
  2021-09-02  3:32   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM Tianyu Lan
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides a GHCB protocol for writing Synthetic Interrupt
Controller (SynIC) MSRs in an Isolation VM with AMD SEV-SNP; the
hypervisor emulates these registers directly. Hyper-V requires
the SINTx MSRs to be written twice: first via the GHCB page to
communicate with the hypervisor, and then with the wrmsr
instruction to talk to the paravisor, which runs in VMPL0. The
Guest OS ID MSR also needs to be set via the GHCB page.
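
The two-step SINTx write described above can be modelled outside the
kernel. The following is only a user-space sketch, not the kernel
implementation: the sketch_* names and the trace structure are
hypothetical, the MSR numbers are assumed from the Hyper-V TLFS, and
the proxy bit position (bit 20) is taken from hv_sint_wrmsrl_ghcb()
in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* MSR numbers assumed from the Hyper-V TLFS, for illustration only. */
#define SKETCH_MSR_SIMP		0x40000083u
#define SKETCH_MSR_SINT0	0x40000090u
#define SKETCH_MSR_SINT15	0x4000009Fu
/* Proxy bit that the patch ORs in as "1 << 20". */
#define SKETCH_SINT_PROXY	(1ULL << 20)

/* Hypothetical recorder standing in for the two hardware paths. */
struct sketch_trace {
	unsigned int ghcb_writes;
	unsigned int wrmsr_writes;
	uint64_t last_ghcb_value;
	uint64_t last_wrmsr_value;
};

/* Models the GHCB path: the value travels as RAX (low) and RDX (high). */
void sketch_ghcb_write(struct sketch_trace *t, uint32_t msr, uint64_t value)
{
	uint32_t rax = (uint32_t)value;
	uint32_t rdx = (uint32_t)(value >> 32);

	(void)msr;
	t->last_ghcb_value = ((uint64_t)rdx << 32) | rax;
	t->ghcb_writes++;
}

/* Models the plain wrmsr path that reaches the paravisor. */
void sketch_wrmsr(struct sketch_trace *t, uint32_t msr, uint64_t value)
{
	(void)msr;
	t->last_wrmsr_value = value;
	t->wrmsr_writes++;
}

/* Every SynIC MSR goes through the GHCB; only SINTx is written twice. */
void sketch_sint_write(struct sketch_trace *t, uint32_t msr, uint64_t value)
{
	sketch_ghcb_write(t, msr, value);
	if (msr >= SKETCH_MSR_SINT0 && msr <= SKETCH_MSR_SINT15)
		sketch_wrmsr(t, msr, value | SKETCH_SINT_PROXY);
}
```

In this model a SINTx write leaves one GHCB write carrying the original
value and one direct write with bit 20 set, while a SIMP write leaves
only the GHCB write.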

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v1:
	* Introduce sev_es_ghcb_hv_call_simple() and share code
	  between SEV and Hyper-V code.
Change since v3:
	* Pass old_msg_type to hv_signal_eom() as parameter.
	* Use HV_REGISTER_* macros instead of HV_X64_MSR_*.
	* Add hv_isolation_type_snp() weak function.
	* Add macros to set synic registers in ARM code.
---
 arch/arm64/include/asm/mshyperv.h |  23 ++++++
 arch/x86/hyperv/hv_init.c         |  36 ++--------
 arch/x86/hyperv/ivm.c             | 112 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h   |  80 ++++++++++++++++++++-
 arch/x86/include/asm/sev.h        |   3 +
 arch/x86/kernel/sev-shared.c      |  63 ++++++++++-------
 drivers/hv/hv.c                   | 112 ++++++++++++++++++++----------
 drivers/hv/hv_common.c            |   6 ++
 include/asm-generic/mshyperv.h    |   4 +-
 9 files changed, 345 insertions(+), 94 deletions(-)

diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index 20070a847304..ced83297e009 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -41,6 +41,29 @@ static inline u64 hv_get_register(unsigned int reg)
 	return hv_get_vpreg(reg);
 }
 
+#define hv_get_simp(val)	{ val = hv_get_register(HV_REGISTER_SIMP); }
+#define hv_set_simp(val)	hv_set_register(HV_REGISTER_SIMP, val)
+
+#define hv_get_siefp(val)	{ val = hv_get_register(HV_REGISTER_SIEFP); }
+#define hv_set_siefp(val)	hv_set_register(HV_REGISTER_SIEFP, val)
+
+#define hv_get_synint_state(int_num, val) {			\
+	val = hv_get_register(HV_REGISTER_SINT0 + int_num);	\
+	}
+
+#define hv_set_synint_state(int_num, val)			\
+	hv_set_register(HV_REGISTER_SINT0 + int_num, val)
+
+#define hv_get_synic_state(val) {			\
+	val = hv_get_register(HV_REGISTER_SCONTROL);	\
+	}
+
+#define hv_set_synic_state(val)			\
+	hv_set_register(HV_REGISTER_SCONTROL, val)
+
+#define hv_signal_eom(old_msg_type)		 \
+	hv_set_register(HV_REGISTER_EOM, 0)
+
 /* SMCCC hypercall parameters */
 #define HV_SMCCC_FUNC_NUMBER	1
 #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b1aa42f60faa..be6210a3fd2f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
 void *hv_hypercall_pg;
 EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 
-void __percpu **hv_ghcb_pg;
+union hv_ghcb __percpu **hv_ghcb_pg;
 
 /* Storage to save the hypercall page temporarily for hibernation */
 static void *hv_hypercall_pg_saved;
@@ -406,7 +406,7 @@ void __init hyperv_init(void)
 	}
 
 	if (hv_isolation_type_snp()) {
-		hv_ghcb_pg = alloc_percpu(void *);
+		hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
 		if (!hv_ghcb_pg)
 			goto free_vp_assist_page;
 	}
@@ -424,6 +424,9 @@ void __init hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
+	/* In an SNP IVM, the guest OS ID must also be written via the GHCB. */
+	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
+
 	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -501,6 +504,7 @@ void __init hyperv_init(void)
 
 clean_guest_os_id:
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 	cpuhp_remove_state(cpuhp);
 free_ghcb_page:
 	free_percpu(hv_ghcb_pg);
@@ -522,6 +526,7 @@ void hyperv_cleanup(void)
 
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
 	/*
 	 * Reset hypercall page reference before reset the page,
@@ -592,30 +597,3 @@ bool hv_is_hyperv_initialized(void)
 	return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-	if (!(ms_hyperv.priv_high & HV_ISOLATION))
-		return HV_ISOLATION_TYPE_NONE;
-	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
-		return 0;
-
-	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
-		return 0;
-
-	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
-
-bool hv_isolation_type_snp(void)
-{
-	return static_branch_unlikely(&isolation_type_snp);
-}
-EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index a069c788ce3c..f56fe4f73000 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,13 +6,125 @@
  *  Tianyu Lan <Tianyu.Lan@microsoft.com>
  */
 
+#include <linux/types.h>
+#include <linux/bitfield.h>
 #include <linux/hyperv.h>
 #include <linux/types.h>
 #include <linux/bitfield.h>
 #include <linux/slab.h>
+#include <asm/svm.h>
+#include <asm/sev.h>
 #include <asm/io.h>
 #include <asm/mshyperv.h>
 
+union hv_ghcb {
+	struct ghcb ghcb;
+} __packed __aligned(HV_HYP_PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!hv_ghcb_pg)
+		return;
+
+	WARN_ON(in_nmi());
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+	ghcb_set_rdx(&hv_ghcb->ghcb, upper_32_bits(value));
+
+	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
+		pr_warn("Failed to write MSR via GHCB: %llx.\n", msr);
+
+	local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	/* Check size of union hv_ghcb here. */
+	BUILD_BUG_ON(sizeof(union hv_ghcb) != HV_HYP_PAGE_SIZE);
+
+	if (!hv_ghcb_pg)
+		return;
+
+	WARN_ON(in_nmi());
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
+		pr_warn("Failed to read MSR via GHCB: %llx.\n", msr);
+	else
+		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+	local_irq_restore(flags);
+}
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+	hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
+{
+	hv_ghcb_msr_write(msr, value);
+
+	/* Write the proxy bit via the wrmsrl instruction. */
+	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
+		wrmsrl(msr, value | 1 << 20);
+}
+EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
+
+enum hv_isolation_type hv_get_isolation_type(void)
+{
+	if (!(ms_hyperv.priv_high & HV_ISOLATION))
+		return HV_ISOLATION_TYPE_NONE;
+	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
+}
+EXPORT_SYMBOL_GPL(hv_get_isolation_type);
+
+/*
+ * hv_is_isolation_supported - Check whether the system runs in a
+ * Hyper-V isolation VM.
+ */
+bool hv_is_isolation_supported(void)
+{
+	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
+}
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+/*
+ * hv_isolation_type_snp - Check whether the system runs in an AMD
+ * SEV-SNP based isolation VM.
+ */
+bool hv_isolation_type_snp(void)
+{
+	return static_branch_unlikely(&isolation_type_snp);
+}
+
 /*
  * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
  *
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffb2af079c6b..b77f4caee3ee 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include <asm/paravirt.h>
 #include <asm/mshyperv.h>
 
+union hv_ghcb;
+
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
 
 typedef int (*hyperv_fill_flush_list_func)(
@@ -30,6 +32,61 @@ static inline u64 hv_get_register(unsigned int reg)
 	return value;
 }
 
+#define hv_get_sint_reg(val, reg) {		\
+	if (hv_isolation_type_snp())		\
+		hv_get_##reg##_ghcb(&val);	\
+	else					\
+		rdmsrl(HV_REGISTER_##reg, val);	\
+	}
+
+#define hv_set_sint_reg(val, reg) {		\
+	if (hv_isolation_type_snp())		\
+		hv_set_##reg##_ghcb(val);	\
+	else					\
+		wrmsrl(HV_REGISTER_##reg, val);	\
+	}
+
+
+#define hv_get_simp(val) hv_get_sint_reg(val, SIMP)
+#define hv_get_siefp(val) hv_get_sint_reg(val, SIEFP)
+
+#define hv_set_simp(val) hv_set_sint_reg(val, SIMP)
+#define hv_set_siefp(val) hv_set_sint_reg(val, SIEFP)
+
+#define hv_get_synic_state(val) {			\
+	if (hv_isolation_type_snp())			\
+		hv_get_synic_state_ghcb(&val);		\
+	else						\
+		rdmsrl(HV_REGISTER_SCONTROL, val);	\
+	}
+#define hv_set_synic_state(val) {			\
+	if (hv_isolation_type_snp())			\
+		hv_set_synic_state_ghcb(val);		\
+	else						\
+		wrmsrl(HV_REGISTER_SCONTROL, val);	\
+	}
+
+#define hv_signal_eom(old_msg_type) {		 \
+	if (hv_isolation_type_snp() &&		 \
+	    old_msg_type != HVMSG_TIMER_EXPIRED) \
+		hv_sint_wrmsrl_ghcb(HV_REGISTER_EOM, 0); \
+	else						\
+		wrmsrl(HV_REGISTER_EOM, 0);		\
+	}
+
+#define hv_get_synint_state(int_num, val) {		\
+	if (hv_isolation_type_snp())			\
+		hv_get_synint_state_ghcb(int_num, &val);\
+	else						\
+		rdmsrl(HV_REGISTER_SINT0 + int_num, val);\
+	}
+#define hv_set_synint_state(int_num, val) {		\
+	if (hv_isolation_type_snp())			\
+		hv_set_synint_state_ghcb(int_num, val);	\
+	else						\
+		wrmsrl(HV_REGISTER_SINT0 + int_num, val);\
+	}
+
 #define hv_get_raw_timer() rdtsc_ordered()
 
 void hyperv_vector_handler(struct pt_regs *regs);
@@ -41,7 +98,7 @@ extern void *hv_hypercall_pg;
 
 extern u64 hv_current_partition_id;
 
-extern void __percpu **hv_ghcb_pg;
+extern union hv_ghcb  __percpu **hv_ghcb_pg;
 
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
@@ -195,6 +252,25 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 			   enum hv_mem_host_visibility visibility);
 int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
+void hv_signal_eom_ghcb(void);
+void hv_ghcb_msr_write(u64 msr, u64 value);
+void hv_ghcb_msr_read(u64 msr, u64 *value);
+
+#define hv_get_synint_state_ghcb(int_num, val)			\
+	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+#define hv_set_synint_state_ghcb(int_num, val) \
+	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+
+#define hv_get_SIMP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
+#define hv_set_SIMP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
+
+#define hv_get_SIEFP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+#define hv_set_SIEFP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+
+#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
+#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
@@ -211,9 +287,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 {
 	return -1;
 }
+static inline void hv_signal_eom_ghcb(void) { }
 #endif /* CONFIG_HYPERV */
 
-
 #include <asm-generic/mshyperv.h>
 
 #endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..81beb2a8031b 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -81,6 +81,9 @@ static __always_inline void sev_es_nmi_complete(void)
 		__sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 9f90f460a28c..dd7f37de640b 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -94,10 +94,9 @@ static void vc_finish_insn(struct es_em_ctxt *ctxt)
 	ctxt->regs->ip += ctxt->insn.length;
 }
 
-static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
-					  struct es_em_ctxt *ctxt,
-					  u64 exit_code, u64 exit_info_1,
-					  u64 exit_info_2)
+enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2)
 {
 	enum es_result ret;
 
@@ -109,29 +108,45 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
 	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
 	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
 
-	sev_es_wr_ghcb_msr(__pa(ghcb));
 	VMGEXIT();
 
-	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
-		u64 info = ghcb->save.sw_exit_info_2;
-		unsigned long v;
-
-		info = ghcb->save.sw_exit_info_2;
-		v = info & SVM_EVTINJ_VEC_MASK;
-
-		/* Check if exception information from hypervisor is sane. */
-		if ((info & SVM_EVTINJ_VALID) &&
-		    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
-		    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
-			ctxt->fi.vector = v;
-			if (info & SVM_EVTINJ_VALID_ERR)
-				ctxt->fi.error_code = info >> 32;
-			ret = ES_EXCEPTION;
-		} else {
-			ret = ES_VMM_ERROR;
-		}
-	} else {
+	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1)
+		ret = ES_VMM_ERROR;
+	else
 		ret = ES_OK;
+
+	return ret;
+}
+
+static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+				   struct es_em_ctxt *ctxt,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2)
+{
+	unsigned long v;
+	enum es_result ret;
+	u64 info;
+
+	sev_es_wr_ghcb_msr(__pa(ghcb));
+
+	ret = sev_es_ghcb_hv_call_simple(ghcb, exit_code, exit_info_1,
+					 exit_info_2);
+	if (ret == ES_OK)
+		return ret;
+
+	info = ghcb->save.sw_exit_info_2;
+	v = info & SVM_EVTINJ_VEC_MASK;
+
+	/* Check if exception information from hypervisor is sane. */
+	if ((info & SVM_EVTINJ_VALID) &&
+	    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
+	    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
+		ctxt->fi.vector = v;
+		if (info & SVM_EVTINJ_VALID_ERR)
+			ctxt->fi.error_code = info >> 32;
+		ret = ES_EXCEPTION;
+	} else {
+		ret = ES_VMM_ERROR;
 	}
 
 	return ret;
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index e83507f49676..97b21256a9db 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -8,6 +8,7 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
@@ -136,17 +137,24 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
 
-		hv_cpu->synic_message_page =
-			(void *)get_zeroed_page(GFP_ATOMIC);
-		if (hv_cpu->synic_message_page == NULL) {
-			pr_err("Unable to allocate SYNIC message page\n");
-			goto err;
-		}
+		/*
+		 * The SynIC message and event pages are allocated by the
+		 * paravisor, so skip allocating them here.
+		 */
+		if (!hv_isolation_type_snp()) {
+			hv_cpu->synic_message_page =
+				(void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->synic_message_page == NULL) {
+				pr_err("Unable to allocate SYNIC message page\n");
+				goto err;
+			}
 
-		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
-		if (hv_cpu->synic_event_page == NULL) {
-			pr_err("Unable to allocate SYNIC event page\n");
-			goto err;
+			hv_cpu->synic_event_page =
+				(void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->synic_event_page == NULL) {
+				pr_err("Unable to allocate SYNIC event page\n");
+				goto err;
+			}
 		}
 
 		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
@@ -199,26 +207,43 @@ void hv_synic_enable_regs(unsigned int cpu)
 	union hv_synic_scontrol sctrl;
 
 	/* Setup the Synic's message page */
-	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
+	hv_get_simp(simp.as_uint64);
 	simp.simp_enabled = 1;
-	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
-		>> HV_HYP_PAGE_SHIFT;
 
-	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
+	if (hv_isolation_type_snp()) {
+		hv_cpu->synic_message_page
+			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
+				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+		if (!hv_cpu->synic_message_page)
+			pr_err("Failed to map synic message page.\n");
+	} else {
+		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+			>> HV_HYP_PAGE_SHIFT;
+	}
+
+	hv_set_simp(simp.as_uint64);
 
 	/* Setup the Synic's event page */
-	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
+	hv_get_siefp(siefp.as_uint64);
 	siefp.siefp_enabled = 1;
-	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
-		>> HV_HYP_PAGE_SHIFT;
 
-	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
+	if (hv_isolation_type_snp()) {
+		hv_cpu->synic_event_page =
+			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
+				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+
+		if (!hv_cpu->synic_event_page)
+			pr_err("Failed to map synic event page.\n");
+	} else {
+		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+			>> HV_HYP_PAGE_SHIFT;
+	}
+	hv_set_siefp(siefp.as_uint64);
 
 	/* Setup the shared SINT. */
 	if (vmbus_irq != -1)
 		enable_percpu_irq(vmbus_irq, 0);
-	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
-					VMBUS_MESSAGE_SINT);
+	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 
 	shared_sint.vector = vmbus_interrupt;
 	shared_sint.masked = false;
@@ -233,14 +258,12 @@ void hv_synic_enable_regs(unsigned int cpu)
 #else
 	shared_sint.auto_eoi = 0;
 #endif
-	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
-				shared_sint.as_uint64);
+	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 
 	/* Enable the global synic bit */
-	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
+	hv_get_synic_state(sctrl.as_uint64);
 	sctrl.enable = 1;
-
-	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
+	hv_set_synic_state(sctrl.as_uint64);
 }
 
 int hv_synic_init(unsigned int cpu)
@@ -257,37 +280,50 @@ int hv_synic_init(unsigned int cpu)
  */
 void hv_synic_disable_regs(unsigned int cpu)
 {
+	struct hv_per_cpu_context *hv_cpu
+		= per_cpu_ptr(hv_context.cpu_context, cpu);
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_scontrol sctrl;
 
-	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
-					VMBUS_MESSAGE_SINT);
-
+	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 	shared_sint.masked = 1;
+	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
 
 	/* Need to correctly cleanup in the case of SMP!!! */
 	/* Disable the interrupt */
-	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
-				shared_sint.as_uint64);
+	hv_get_simp(simp.as_uint64);
 
-	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
+	/*
+	 * In an Isolation VM, the SIM and SIEF pages are allocated
+	 * by the paravisor and will also be used by the kdump
+	 * kernel. So just reset the enable bit here and keep the
+	 * page addresses.
+	 */
 	simp.simp_enabled = 0;
-	simp.base_simp_gpa = 0;
+	if (hv_isolation_type_snp())
+		memunmap(hv_cpu->synic_message_page);
+	else
+		simp.base_simp_gpa = 0;
 
-	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
+	hv_set_simp(simp.as_uint64);
 
-	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
+	hv_get_siefp(siefp.as_uint64);
 	siefp.siefp_enabled = 0;
-	siefp.base_siefp_gpa = 0;
 
-	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
+	if (hv_isolation_type_snp())
+		memunmap(hv_cpu->synic_event_page);
+	else
+		siefp.base_siefp_gpa = 0;
+
+	hv_set_siefp(siefp.as_uint64);
 
 	/* Disable the global synic bit */
-	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
+	hv_get_synic_state(sctrl.as_uint64);
 	sctrl.enable = 0;
-	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
+	hv_set_synic_state(sctrl.as_uint64);
 
 	if (vmbus_irq != -1)
 		disable_percpu_irq(vmbus_irq);
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c0d9048a4112..1fc82d237161 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -249,6 +249,12 @@ bool __weak hv_is_isolation_supported(void)
 }
 EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
 
+bool __weak hv_isolation_type_snp(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
+
 void __weak hv_setup_vmbus_handler(void (*handler)(void))
 {
 }
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index aa55447b9700..04a687d95eac 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -24,6 +24,7 @@
 #include <linux/cpumask.h>
 #include <linux/nmi.h>
 #include <asm/ptrace.h>
+#include <asm/mshyperv.h>
 #include <asm/hyperv-tlfs.h>
 
 struct ms_hyperv_info {
@@ -54,6 +55,7 @@ extern void  __percpu  **hyperv_pcpu_output_arg;
 
 extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
+extern bool hv_isolation_type_snp(void);
 
 /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
 static inline int hv_result(u64 status)
@@ -148,7 +150,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 		 * possibly deliver another msg from the
 		 * hypervisor
 		 */
-		hv_set_register(HV_REGISTER_EOM, 0);
+		hv_signal_eom(old_msg_type);
 	}
 }
 
-- 
2.25.1



* [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (4 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  0:20   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides a GHCB hypercall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE messages in an
SNP Isolation VM. Add support for it.
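
As a rough illustration of the hypercall input and output words that
this patch declares as bitfields inside union hv_ghcb, the same layout
can be written with explicit shifts. This is a user-space sketch under
assumptions: the sketch_* helpers are hypothetical, the call codes
0x005c and 0x005d are assumed from the Hyper-V TLFS, and the bit
positions are read off the bitfield declaration (call code in bits
0-15, fast flag in bit 16, element count in bits 32-43, start index in
bits 48-59). hv_ghcb_hypercall() itself only fills in the call code.

```c
#include <assert.h>
#include <stdint.h>

/* Call codes assumed from the Hyper-V TLFS, for illustration only. */
#define SKETCH_HVCALL_POST_MESSAGE	0x005cu
#define SKETCH_HVCALL_SIGNAL_EVENT	0x005du

/* Build the 64-bit hypercall input word from its fields. */
uint64_t sketch_hvcall_input(uint16_t callcode, int is_fast,
			     uint16_t rep_count, uint16_t rep_start)
{
	uint64_t v = 0;

	v |= (uint64_t)callcode;			/* bits 0-15  */
	v |= (uint64_t)(is_fast ? 1 : 0) << 16;		/* bit 16     */
	v |= (uint64_t)(rep_count & 0xfff) << 32;	/* bits 32-43 */
	v |= (uint64_t)(rep_start & 0xfff) << 48;	/* bits 48-59 */
	return v;
}

/* Extract the call status (bits 0-15) from the output word. */
uint16_t sketch_hvcall_status(uint64_t output)
{
	return (uint16_t)(output & 0xffff);
}

/* Extract the processed element count (bits 32-43) from the output word. */
uint16_t sketch_hvcall_elements_processed(uint64_t output)
{
	return (uint16_t)((output >> 32) & 0xfff);
}
```

With only the call code set, as in the patch, the input word for
HVCALL_SIGNAL_EVENT is simply 0x5d in this model.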

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Add hv_ghcb_hypercall() stub function to avoid
	  compile error for ARM.
---
 arch/x86/hyperv/ivm.c          | 71 ++++++++++++++++++++++++++++++++++
 drivers/hv/connection.c        |  6 ++-
 drivers/hv/hv.c                |  8 +++-
 drivers/hv/hv_common.c         |  6 +++
 include/asm-generic/mshyperv.h |  1 +
 5 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index f56fe4f73000..e761c67e2218 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -17,10 +17,81 @@
 #include <asm/io.h>
 #include <asm/mshyperv.h>
 
+#define GHCB_USAGE_HYPERV_CALL	1
+
 union hv_ghcb {
 	struct ghcb ghcb;
+	struct {
+		u64 hypercalldata[509];
+		u64 outputgpa;
+		union {
+			union {
+				struct {
+					u32 callcode        : 16;
+					u32 isfast          : 1;
+					u32 reserved1       : 14;
+					u32 isnested        : 1;
+					u32 countofelements : 12;
+					u32 reserved2       : 4;
+					u32 repstartindex   : 12;
+					u32 reserved3       : 4;
+				};
+				u64 asuint64;
+			} hypercallinput;
+			union {
+				struct {
+					u16 callstatus;
+					u16 reserved1;
+					u32 elementsprocessed : 12;
+					u32 reserved2         : 20;
+				};
+				u64 asunit64;
+			} hypercalloutput;
+		};
+		u64 reserved2;
+	} hypercall;
 } __packed __aligned(HV_HYP_PAGE_SIZE);
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!hv_ghcb_pg)
+		return -EFAULT;
+
+	WARN_ON(in_nmi());
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return -EFAULT;
+	}
+
+	hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
+	hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
+
+	hv_ghcb->hypercall.outputgpa = (u64)output;
+	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+	hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+	if (input_size)
+		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+	VMGEXIT();
+
+	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+	local_irq_restore(flags);
+
+	return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
 	union hv_ghcb *hv_ghcb;
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
 	++channel->sig_events;
 
-	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+	if (hv_isolation_type_snp())
+		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+				NULL, sizeof(u64));
+	else
+		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 97b21256a9db..d4531c64d9d3 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
 	aligned_msg->payload_size = payload_size;
 	memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-	status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+	if (hv_isolation_type_snp())
+		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+				(void *)aligned_msg, NULL,
+				sizeof(struct hv_input_post_message));
+	else
+		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+				aligned_msg, NULL);
 
 	/* Preemption must remain disabled until after the hypercall
 	 * so some other thread can't get scheduled onto this cpu and
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 1fc82d237161..7be173a99f27 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -289,3 +289,9 @@ void __weak hyperv_cleanup(void)
 {
 }
 EXPORT_SYMBOL_GPL(hyperv_cleanup);
+
+u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+	return HV_STATUS_INVALID_PARAMETER;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 04a687d95eac..0da45807c36a 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -250,6 +250,7 @@ bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool hv_is_isolation_supported(void);
 bool hv_isolation_type_snp(void);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
 #else /* CONFIG_HYPERV */
-- 
2.25.1



* [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (5 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  0:21   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
with the host in an Isolation VM, so it is necessary to use the hvcall
to make them visible to the host. In an Isolation VM with AMD SEV-SNP,
the access address must be in the extra address space above the shared
GPA boundary. So remap these pages to the extra address (pa +
shared_gpa_boundary).

Introduce monitor_pages_original[] in struct vmbus_connection to
store the monitor page virtual addresses returned by
hv_alloc_hyperv_zeroed_page(), and free the monitor pages via
monitor_pages_original[] in vmbus_disconnect(). monitor_pages[] is
used to access the monitor pages and is initialized to be equal to
monitor_pages_original[]. In an SNP Isolation VM, monitor_pages[] is
overridden with the va of the extra address.
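
For illustration, the "pa + shared_gpa_boundary" arithmetic can be sketched
in user space as below. The helper names are hypothetical, and the check
that the private address lies below the boundary is an assumption of this
sketch rather than something the patch enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address of the host-visible alias of a page. */
static uint64_t shared_alias(uint64_t pa, uint64_t shared_gpa_boundary)
{
	/* Assumption: the private address lies below the boundary. */
	assert(pa < shared_gpa_boundary);
	return pa + shared_gpa_boundary;
}

/* Hypothetical inverse: recover the original address from the alias. */
static uint64_t private_addr(uint64_t alias, uint64_t shared_gpa_boundary)
{
	assert(alias >= shared_gpa_boundary);
	return alias - shared_gpa_boundary;
}
```

With a boundary at bit 39, a page at 0x1000 would be accessed through
0x8000001000.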

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Rename monitor_pages_va to monitor_pages_original
	* Free monitor pages via monitor_pages_original;
	  monitor_pages is used to access the monitor pages.

Change since v1:
        * Do not remap monitor pages in the non-SNP Isolation VM.
---
 drivers/hv/connection.c   | 75 ++++++++++++++++++++++++++++++++++++---
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..9a48d8115c87 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
 #include <linux/vmalloc.h>
 #include <linux/hyperv.h>
 #include <linux/export.h>
+#include <linux/io.h>
 #include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
 	msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
 	msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+	if (hv_isolation_type_snp()) {
+		msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+		msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+	}
+
 	msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
 	/*
@@ -148,6 +155,35 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 		return -ECONNREFUSED;
 	}
 
+
+	if (hv_is_isolation_supported()) {
+		if (hv_isolation_type_snp()) {
+			vmbus_connection.monitor_pages[0]
+				= memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+					   MEMREMAP_WB);
+			if (!vmbus_connection.monitor_pages[0])
+				return -ENOMEM;
+
+			vmbus_connection.monitor_pages[1]
+				= memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+					   MEMREMAP_WB);
+			if (!vmbus_connection.monitor_pages[1]) {
+				memunmap(vmbus_connection.monitor_pages[0]);
+				return -ENOMEM;
+			}
+		}
+
+		/*
+		 * Set memory host visibility hvcall smears memory
+		 * and so zero monitor pages here.
+		 */
+		memset(vmbus_connection.monitor_pages[0], 0x00,
+		       HV_HYP_PAGE_SIZE);
+		memset(vmbus_connection.monitor_pages[1], 0x00,
+		       HV_HYP_PAGE_SIZE);
+
+	}
+
 	return ret;
 }
 
@@ -159,6 +195,7 @@ int vmbus_connect(void)
 	struct vmbus_channel_msginfo *msginfo = NULL;
 	int i, ret = 0;
 	__u32 version;
+	u64 pfn[2];
 
 	/* Initialize the vmbus connection */
 	vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +253,21 @@ int vmbus_connect(void)
 		goto cleanup;
 	}
 
+	vmbus_connection.monitor_pages_original[0]
+		= vmbus_connection.monitor_pages[0];
+	vmbus_connection.monitor_pages_original[1]
+		= vmbus_connection.monitor_pages[1];
+
+	if (hv_is_isolation_supported()) {
+		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+		if (hv_mark_gpa_visibility(2, pfn,
+				VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+			ret = -EFAULT;
+			goto cleanup;
+		}
+	}
+
 	msginfo = kzalloc(sizeof(*msginfo) +
 			  sizeof(struct vmbus_channel_initiate_contact),
 			  GFP_KERNEL);
@@ -284,6 +336,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+	u64 pfn[2];
+
 	/*
 	 * First send the unload request to the host.
 	 */
@@ -303,10 +357,23 @@ void vmbus_disconnect(void)
 		vmbus_connection.int_page = NULL;
 	}
 
-	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
-	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
-	vmbus_connection.monitor_pages[0] = NULL;
-	vmbus_connection.monitor_pages[1] = NULL;
+	if (hv_is_isolation_supported()) {
+		memunmap(vmbus_connection.monitor_pages[0]);
+		memunmap(vmbus_connection.monitor_pages[1]);
+
+		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+		hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+	}
+
+	hv_free_hyperv_page((unsigned long)
+		vmbus_connection.monitor_pages_original[0]);
+	hv_free_hyperv_page((unsigned long)
+		vmbus_connection.monitor_pages_original[1]);
+	vmbus_connection.monitor_pages_original[0] =
+		vmbus_connection.monitor_pages[0] = NULL;
+	vmbus_connection.monitor_pages_original[1] =
+		vmbus_connection.monitor_pages[1] = NULL;
 }
 
 /*
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 42f3d9d123a1..7cb11ef694da 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
 	 * is child->parent notification
 	 */
 	struct hv_monitor_page *monitor_pages[2];
+	void *monitor_pages_original[2];
 	struct list_head chn_msg_list;
 	spinlock_t channelmsg_lock;
 
-- 
2.25.1



* [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (6 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  0:23   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

VMbus ring buffers are shared with the host and need to be
accessed via the extra address space of an Isolation VM with
AMD SNP support. This patch maps the ring buffer address in
the extra address space via vmap_pfn(). The Hyper-V set memory
host visibility hvcall smears data in the ring buffer, so reset
the ring buffer memory to zero after mapping.
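
The wraparound PFN list that is handed to vmap_pfn() can be sketched in
user space as follows. build_wraparound() is a hypothetical helper that
mirrors the loop in hv_ringbuffer_init(): the header page appears once and
the data pages appear twice in a row.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill out[] (2 * page_cnt - 1 entries) with the wraparound PFN list:
 * the header page once, then the data pages twice in a row.
 * Returns the number of entries written.
 */
static size_t build_wraparound(unsigned long base_pfn, unsigned int page_cnt,
			       unsigned long *out)
{
	unsigned int data_pages, i;

	assert(page_cnt >= 2);	/* one header page plus at least one data page */
	data_pages = page_cnt - 1;

	out[0] = base_pfn;
	for (i = 0; i < 2 * data_pages; i++)
		out[i + 1] = base_pfn + i % data_pages + 1;

	return 2 * (size_t)data_pages + 1;
}
```

Mapping the data pages twice back to back lets a packet that wraps past the
end of the ring be read or written as one contiguous range.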

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Remove hv_ringbuffer_post_init(), merge map
	operation for Isolation VM into hv_ringbuffer_init()
	* Call hv_ringbuffer_init() after __vmbus_establish_gpadl().
---
 drivers/hv/Kconfig       |  1 +
 drivers/hv/channel.c     | 19 +++++++-------
 drivers/hv/ring_buffer.c | 56 ++++++++++++++++++++++++++++++----------
 3 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index d1123ceb38f3..dd12af20e467 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -8,6 +8,7 @@ config HYPERV
 		|| (ARM64 && !CPU_BIG_ENDIAN))
 	select PARAVIRT
 	select X86_HV_CALLBACK_VECTOR if X86
+	select VMAP_PFN
 	help
 	  Select this option to run Linux as a Hyper-V client operating
 	  system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 82650beb3af0..81f8629e4491 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -679,15 +679,6 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	if (!newchannel->max_pkt_size)
 		newchannel->max_pkt_size = VMBUS_DEFAULT_MAX_PKT_SIZE;
 
-	err = hv_ringbuffer_init(&newchannel->outbound, page, send_pages, 0);
-	if (err)
-		goto error_clean_ring;
-
-	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
-				 recv_pages, newchannel->max_pkt_size);
-	if (err)
-		goto error_clean_ring;
-
 	/* Establish the gpadl for the ring buffer */
 	newchannel->ringbuffer_gpadlhandle = 0;
 
@@ -699,6 +690,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	if (err)
 		goto error_clean_ring;
 
+	err = hv_ringbuffer_init(&newchannel->outbound,
+				 page, send_pages, 0);
+	if (err)
+		goto error_free_gpadl;
+
+	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
+				 recv_pages, newchannel->max_pkt_size);
+	if (err)
+		goto error_free_gpadl;
+
 	/* Create and init the channel open message */
 	open_info = kzalloc(sizeof(*open_info) +
 			   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..24d64d18eb65 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
 #include <linux/prefetch.h>
+#include <linux/io.h>
+#include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
 
@@ -183,8 +185,10 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
 {
-	int i;
 	struct page **pages_wraparound;
+	unsigned long *pfns_wraparound;
+	u64 pfn;
+	int i;
 
 	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
 
@@ -192,23 +196,49 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 	 * First page holds struct hv_ring_buffer, do wraparound mapping for
 	 * the rest.
 	 */
-	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
-				   GFP_KERNEL);
-	if (!pages_wraparound)
-		return -ENOMEM;
+	if (hv_isolation_type_snp()) {
+		pfn = page_to_pfn(pages) +
+			HVPFN_DOWN(ms_hyperv.shared_gpa_boundary);
 
-	pages_wraparound[0] = pages;
-	for (i = 0; i < 2 * (page_cnt - 1); i++)
-		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+		pfns_wraparound = kcalloc(page_cnt * 2 - 1,
+			sizeof(unsigned long), GFP_KERNEL);
+		if (!pfns_wraparound)
+			return -ENOMEM;
 
-	ring_info->ring_buffer = (struct hv_ring_buffer *)
-		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
+		pfns_wraparound[0] = pfn;
+		for (i = 0; i < 2 * (page_cnt - 1); i++)
+			pfns_wraparound[i + 1] = pfn + i % (page_cnt - 1) + 1;
 
-	kfree(pages_wraparound);
+		ring_info->ring_buffer = (struct hv_ring_buffer *)
+			vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
+				 PAGE_KERNEL);
+		kfree(pfns_wraparound);
 
+		if (!ring_info->ring_buffer)
+			return -ENOMEM;
+
+		/* Zero ring buffer after setting memory host visibility. */
+		memset(ring_info->ring_buffer, 0x00,
+			HV_HYP_PAGE_SIZE * page_cnt);
+	} else {
+		pages_wraparound = kcalloc(page_cnt * 2 - 1,
+					   sizeof(struct page *),
+					   GFP_KERNEL);
+
+		pages_wraparound[0] = pages;
+		for (i = 0; i < 2 * (page_cnt - 1); i++)
+			pages_wraparound[i + 1] =
+				&pages[i % (page_cnt - 1) + 1];
+
+		ring_info->ring_buffer = (struct hv_ring_buffer *)
+			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
+				PAGE_KERNEL);
+
+		kfree(pages_wraparound);
+		if (!ring_info->ring_buffer)
+			return -ENOMEM;
+	}
 
-	if (!ring_info->ring_buffer)
-		return -ENOMEM;
 
 	ring_info->ring_buffer->read_index =
 		ring_info->ring_buffer->write_index = 0;
-- 
2.25.1



* [PATCH V4 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (7 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-08-27 17:21 ` [PATCH V4 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
needs to be mapped into the address space above vTOM, so introduce
dma_map_decrypted()/dma_unmap_encrypted() to map/unmap bounce
buffer memory. The platform can populate the map/unmap callbacks
in the dma memory decrypted ops. The swiotlb bounce buffer
PA will be returned to the driver and used as the DMA address. The
newly mapped virtual address is only used to access the bounce
buffer in the swiotlb code. PAs passed to the DMA API still have a
backing struct page.
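
The optional-callback pattern can be modelled in user space as below. This
is a sketch only: the set_memory_decrypted() step is left out, and
model_map_decrypted(), demo_map() and alias_buf are hypothetical names that
do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the ops table added by this patch. */
struct decrypted_ops {
	void *(*map)(void *addr, unsigned long size);
	void (*unmap)(void *addr);
};

static struct decrypted_ops ops;	/* zeroed: no platform callbacks */

static void *model_map_decrypted(void *addr, unsigned long size)
{
	assert(addr != NULL);
	/* The real function first calls set_memory_decrypted(); omitted here. */
	if (ops.map)
		return ops.map(addr, size);
	return addr;	/* no platform remap: use the original mapping */
}

/* Hypothetical platform callback: hand back a fixed alias buffer. */
static char alias_buf[64];

static void *demo_map(void *addr, unsigned long size)
{
	(void)addr;
	return size <= sizeof(alias_buf) ? (void *)alias_buf : NULL;
}
```

Without a registered callback the caller keeps using the original address,
which matches the behaviour on platforms that do not need a remap.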

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 include/linux/dma-map-ops.h |  9 +++++++++
 kernel/dma/mapping.c        | 22 ++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
 	unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
+struct dma_memory_decrypted_ops {
+	void *(*map)(void *addr, unsigned long size);
+	void (*unmap)(void *addr);
+};
+
 #ifdef CONFIG_DMA_OPS
 #include <asm/dma-mapping.h>
 
@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
 }
 #endif /* CONFIG_DMA_API_DEBUG */
 
+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_encrypted(void *addr, unsigned long size);
+
 extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
 #include <linux/of_device.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <asm/set_memory.h>
 #include "debug.h"
 #include "direct.h"
 
 bool dma_default_coherent;
 
+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 /*
  * Managed DMA API
  */
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
 	return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+	if (set_memory_decrypted((unsigned long)addr,
+				 size / PAGE_SIZE))
+		return NULL;
+
+	if (dma_memory_generic_decrypted_ops.map)
+		return dma_memory_generic_decrypted_ops.map(addr, size);
+	else
+		return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+	if (dma_memory_generic_decrypted_ops.unmap)
+		dma_memory_generic_decrypted_ops.unmap(addr);
+
+	return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
-- 
2.25.1



* [PATCH V4 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (8 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-08-27 17:21 ` [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via the extra address space above shared_gpa_boundary (e.g. a 39-bit
address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The physical
address used for access is the original physical address +
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called virtual top of memory (vTOM). Memory addresses below vTOM are
automatically treated as private, while memory above vTOM is treated
as shared.

Use dma_map_decrypted() in the swiotlb code, store the remapped address
it returns, and use that address to copy data from/to the swiotlb bounce
buffer.
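
The address translation that swiotlb_bounce() performs with the stored
mapping can be sketched as below. struct pool and pool_vaddr() are
hypothetical user-space stand-ins for struct io_tlb_mem and the
"mem->vaddr + tlb_addr - mem->start" expression in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct io_tlb_mem. */
struct pool {
	uint64_t start;		/* physical start of the pool */
	uint64_t end;		/* physical end, exclusive in this model */
	unsigned char *vaddr;	/* where the pool is mapped for CPU access */
};

/* Translate a bounce buffer physical address into the remapped pointer. */
static unsigned char *pool_vaddr(const struct pool *p, uint64_t tlb_addr)
{
	assert(tlb_addr >= p->start && tlb_addr < p->end);
	return p->vaddr + (tlb_addr - p->start);
}
```

Because only the offset into the pool is used, the same code works whether
vaddr is the direct mapping or a remapped alias.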

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v1:
       * Make swiotlb_init_io_tlb_mem() return error code and return
         error when dma_map_decrypted() fails.
---
 include/linux/swiotlb.h |  4 ++++
 kernel/dma/swiotlb.c    | 32 ++++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:	The end address of the swiotlb memory pool. Used to do a quick
  *		range check to see if the memory was in fact allocated by this
  *		API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb
+ *		memory pool may be remapped in the memory encrypted case and store
+ *		virtual address for bounce buffer operation.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
+	void *vaddr;
 	unsigned long nslabs;
 	unsigned long used;
 	unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..29b6d888ef3b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -176,7 +176,7 @@ void __init swiotlb_update_mem_attributes(void)
 	memset(vaddr, 0, bytes);
 }
 
-static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 				    unsigned long nslabs, bool late_alloc)
 {
 	void *vaddr = phys_to_virt(start);
@@ -194,14 +194,21 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->slots[i].alloc_size = 0;
 	}
 
-	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
+	mem->vaddr = dma_map_decrypted(vaddr, bytes);
+	if (!mem->vaddr) {
+		pr_err("Failed to decrypt memory.\n");
+		return -ENOMEM;
+	}
+
+	memset(mem->vaddr, 0, bytes);
+	return 0;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	struct io_tlb_mem *mem;
 	size_t alloc_size;
+	int ret;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
 		return 0;
@@ -216,7 +223,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+	ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+	if (ret) {
+		memblock_free(__pa(mem), alloc_size);
+		return ret;
+	}
 
 	io_tlb_default_mem = mem;
 	if (verbose)
@@ -304,6 +315,8 @@ int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
 	struct io_tlb_mem *mem;
+	int size = get_order(struct_size(mem, slots, nslabs));
+	int ret;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
 		return 0;
@@ -312,12 +325,15 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	if (WARN_ON_ONCE(io_tlb_default_mem))
 		return -ENOMEM;
 
-	mem = (void *)__get_free_pages(GFP_KERNEL,
-		get_order(struct_size(mem, slots, nslabs)));
+	mem = (void *)__get_free_pages(GFP_KERNEL, size);
 	if (!mem)
 		return -ENOMEM;
 
-	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+	ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+	if (ret) {
+		free_pages((unsigned long)mem, size);
+		return ret;
+	}
 
 	io_tlb_default_mem = mem;
 	swiotlb_print_info();
@@ -360,7 +376,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
-	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
 	unsigned int tlb_offset;
 
 	if (orig_addr == INVALID_PHYS_ADDR)
-- 
2.25.1



* [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (9 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  1:27   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

A Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force
mode to use the swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via the extra address space above shared_gpa_boundary
(e.g. a 39-bit address line) reported by Hyper-V CPUID
ISOLATION_CONFIG. The physical address used for access is the
original physical address + shared_gpa_boundary. In the AMD SEV-SNP
spec, shared_gpa_boundary is called virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private,
while memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls dma_map_decrypted()
to mark the bounce buffer visible to the host and map it in the
extra address space. Populate the dma memory decrypted ops with
the hv map/unmap functions.

Hyper-V initializes the swiotlb bounce buffer, so the default swiotlb
needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override
the setting, hyperv_swiotlb_detect() needs to run before
these detect functions, which depend on pci_xen_swiotlb_init().
Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect()
to keep the order.

The map function vmap_pfn() can't work as early as
hyperv_iommu_swiotlb_init(), so initialize the swiotlb bounce
buffer in hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
       * Get the Hyper-V bounce buffer size via the default swiotlb
       bounce buffer size function and keep the default size the
       same as the one in the AMD SEV VM.
---
 arch/x86/hyperv/ivm.c           | 28 +++++++++++++++
 arch/x86/include/asm/mshyperv.h |  2 ++
 arch/x86/mm/mem_encrypt.c       |  3 +-
 arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
 drivers/hv/vmbus_drv.c          |  3 ++
 drivers/iommu/hyperv-iommu.c    | 61 +++++++++++++++++++++++++++++++++
 include/linux/hyperv.h          |  1 +
 7 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index e761c67e2218..84563b3c9f3a 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -294,3 +294,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
 
 	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+				      sizeof(unsigned long), GFP_KERNEL);
+	void *vaddr;
+	int i;
+
+	if (!pfns)
+		return NULL;
+
+	for (i = 0; i < size / PAGE_SIZE; i++)
+		pfns[i] = virt_to_hvpfn(addr + i * PAGE_SIZE) +
+			(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+	vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+	kfree(pfns);
+
+	return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+	vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index b77f4caee3ee..627fcf8d443c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -252,6 +252,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 			   enum hv_mem_host_visibility visibility);
 int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..e2db0b8ed938 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -30,6 +30,7 @@
 #include <asm/processor-flags.h>
 #include <asm/msr.h>
 #include <asm/cmdline.h>
+#include <asm/mshyperv.h>
 
 #include "mm_internal.h"
 
@@ -202,7 +203,7 @@ void __init sev_setup_arch(void)
 	phys_addr_t total_mem = memblock_phys_mem_size();
 	unsigned long size;
 
-	if (!sev_active())
+	if (!sev_active() && !hv_is_isolation_supported())
 		return;
 
 	/*
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include <linux/dma-map-ops.h>
 #include <linux/pci.h>
+#include <linux/hyperv.h>
 #include <xen/swiotlb-xen.h>
 
 #include <asm/xen/hypervisor.h>
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
-		  NULL,
+		  hyperv_swiotlb_detect,
 		  pci_xen_swiotlb_init,
 		  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include <linux/cpu.h>
 #include <linux/sched/task_stack.h>
 
+#include <linux/dma-map-ops.h>
 #include <linux/delay.h>
 #include <linux/notifier.h>
 #include <linux/panic_notifier.h>
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
 	return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 	}
 	hv_debug_add_dev_dir(child_device_obj);
 
+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
 	return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..899563551574 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,22 @@
 #include <linux/irq.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
+#include <linux/hyperv.h>
+#include <linux/io.h>
 
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hw_irq.h>
 #include <asm/io_apic.h>
+#include <asm/iommu.h>
+#include <asm/iommu_table.h>
 #include <asm/irq_remapping.h>
 #include <asm/hypervisor.h>
 #include <asm/mshyperv.h>
+#include <asm/swiotlb.h>
+#include <linux/dma-map-ops.h>
+#include <linux/dma-direct.h>
+#include <linux/set_memory.h>
 
 #include "irq_remapping.h"
 
@@ -36,6 +44,9 @@
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_size;
+static void *hyperv_io_tlb_start;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
 		const struct cpumask *mask, bool force)
 {
@@ -337,4 +348,54 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
 	.free = hyperv_root_irq_remapping_free,
 };
 
+void __init hyperv_iommu_swiotlb_init(void)
+{
+	/*
+	 * Allocate the Hyper-V swiotlb bounce buffer early in boot
+	 * so that a large contiguous region can be reserved.
+	 */
+	hyperv_io_tlb_size = swiotlb_size_or_default();
+	hyperv_io_tlb_start = memblock_alloc(
+		hyperv_io_tlb_size, HV_HYP_PAGE_SIZE);
+
+	if (!hyperv_io_tlb_start) {
+		pr_warn("Failed to allocate Hyper-V swiotlb buffer.\n");
+		return;
+	}
+}
+
+int __init hyperv_swiotlb_detect(void)
+{
+	if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
+	    && hv_is_isolation_supported()) {
+		/*
+		 * Enable swiotlb force mode in an Isolation VM so that
+		 * all DMA transactions go through the swiotlb bounce buffer.
+		 */
+		swiotlb_force = SWIOTLB_FORCE;
+
+		dma_memory_generic_decrypted_ops.map = hv_map_memory;
+		dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;
+		return 1;
+	}
+
+	return 0;
+}
+
+void __init hyperv_iommu_swiotlb_later_init(void)
+{
+	/*
+	 * The swiotlb bounce buffer needs to be mapped in the extra
+	 * address space. The map function does not work during early
+	 * boot, so call swiotlb_late_init_with_tbl() here.
+	 */
+	if (swiotlb_late_init_with_tbl(hyperv_io_tlb_start,
+				       hyperv_io_tlb_size >> IO_TLB_SHIFT))
+		panic("Failed to initialize Hyper-V swiotlb.\n");
+}
+
+IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
+		  NULL, hyperv_iommu_swiotlb_init,
+		  hyperv_iommu_swiotlb_later_init);
+
 #endif
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 757e09606fd3..724a735d722a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1739,6 +1739,7 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
 int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
 				void (*block_invalidate)(void *context,
 							 u64 block_mask));
+int __init hyperv_swiotlb_detect(void);
 
 struct hyperv_pci_block_ops {
 	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
-- 
2.25.1


* [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (10 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  2:34   ` Michael Kelley
  2021-08-27 17:21 ` [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
  2021-08-30 12:00 ` [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Christoph Hellwig
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
this for the netvsc rx/tx ring buffer. The page buffer used by
vmbus_sendpacket_pagebuffer() still needs to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets; the Hyper-V
swiotlb bounce buffer DMA address will be returned. The swiotlb bounce
buffer has been marked visible to the host during boot.
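
The pfn/offset rewrite that the mapping step performs on each page
buffer entry can be sketched in plain userspace C. This is only an
illustration, not the driver code: struct page_buffer is a simplified
stand-in for the kernel's struct hv_page_buffer, page_buffer_retarget()
is a hypothetical helper name, and a 4 KiB Hyper-V page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Hyper-V pages are 4 KiB. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1ULL << HV_HYP_PAGE_SHIFT)

/* Simplified stand-in for the kernel's struct hv_page_buffer. */
struct page_buffer {
	uint32_t len;
	uint32_t offset;
	uint64_t pfn;
};

/*
 * Point one entry at a bounce-buffer DMA address: the high bits become
 * the Hyper-V pfn, the low bits become the in-page offset. The length
 * of the entry is left unchanged.
 */
static void page_buffer_retarget(struct page_buffer *pb, uint64_t dma)
{
	assert(pb != NULL);
	pb->pfn = dma >> HV_HYP_PAGE_SHIFT;
	pb->offset = (uint32_t)(dma & (HV_HYP_PAGE_SIZE - 1));
}
```

Each entry is handled on its own because every entry carries its own
offset and length, so the entries cannot be treated as one contiguous
region.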

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Add comment to explain why not to use dma_map_sg()
	* Fix some error handling.
---
 arch/x86/hyperv/ivm.c             |   1 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 135 +++++++++++++++++++++++++++++-
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h            |   5 ++
 5 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 84563b3c9f3a..08d8e01de017 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -317,6 +317,7 @@ void *hv_map_memory(void *addr, unsigned long size)
 
 	return vaddr;
 }
+EXPORT_SYMBOL_GPL(hv_map_memory);
 
 void hv_unmap_memory(void *addr)
 {
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index aa7c9962dbd8..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
 	u32 total_bytes;
 	u32 send_buf_index;
 	u32 total_data_buflen;
+	struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
 	/* Receive buffer allocated by us but manages by NetVSP */
 	void *recv_buf;
+	void *recv_original_buf;
 	u32 recv_buf_size; /* allocated bytes */
 	u32 recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
 	/* Send buffer allocated by us */
 	void *send_buf;
+	void *send_original_buf;
 	u32 send_buf_size;
 	u32 send_buf_gpadl_handle;
 	u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI	10000
 #define RETRY_MAX	2000	/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+		      struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index f19bffff6a63..edd336b08c2c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
 	int i;
 
 	kfree(nvdev->extension);
-	vfree(nvdev->recv_buf);
-	vfree(nvdev->send_buf);
+
+	if (nvdev->recv_original_buf) {
+		vunmap(nvdev->recv_buf);
+		vfree(nvdev->recv_original_buf);
+	} else {
+		vfree(nvdev->recv_buf);
+	}
+
+	if (nvdev->send_original_buf) {
+		vunmap(nvdev->send_buf);
+		vfree(nvdev->send_original_buf);
+	} else {
+		vfree(nvdev->send_buf);
+	}
+
 	kfree(nvdev->send_section_map);
 
 	for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -347,6 +360,7 @@ static int netvsc_init_buf(struct hv_device *device,
 	unsigned int buf_size;
 	size_t map_words;
 	int i, ret = 0;
+	void *vaddr;
 
 	/* Get receive buffer area. */
 	buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -382,6 +396,17 @@ static int netvsc_init_buf(struct hv_device *device,
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		vaddr = hv_map_memory(net_device->recv_buf, buf_size);
+		if (!vaddr) {
+			ret = -ENOMEM;
+			goto cleanup;
+		}
+
+		net_device->recv_original_buf = net_device->recv_buf;
+		net_device->recv_buf = vaddr;
+	}
+
 	/* Notify the NetVsp of the gpadl handle */
 	init_packet = &net_device->channel_init_pkt;
 	memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -485,6 +510,17 @@ static int netvsc_init_buf(struct hv_device *device,
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		vaddr = hv_map_memory(net_device->send_buf, buf_size);
+		if (!vaddr) {
+			ret = -ENOMEM;
+			goto cleanup;
+		}
+
+		net_device->send_original_buf = net_device->send_buf;
+		net_device->send_buf = vaddr;
+	}
+
 	/* Notify the NetVsp of the gpadl handle */
 	init_packet = &net_device->channel_init_pkt;
 	memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -775,7 +811,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
 
 	/* Notify the layer above us */
 	if (likely(skb)) {
-		const struct hv_netvsc_packet *packet
+		struct hv_netvsc_packet *packet
 			= (struct hv_netvsc_packet *)skb->cb;
 		u32 send_index = packet->send_buf_index;
 		struct netvsc_stats *tx_stats;
@@ -791,6 +827,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
 		tx_stats->bytes += packet->total_bytes;
 		u64_stats_update_end(&tx_stats->syncp);
 
+		netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
 		napi_consume_skb(skb, budget);
 	}
 
@@ -955,6 +992,87 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 		memset(dest, 0, padding);
 }
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+		      struct hv_netvsc_packet *packet)
+{
+	u32 page_count = packet->cp_partial ?
+		packet->page_buf_cnt - packet->rmsg_pgcnt :
+		packet->page_buf_cnt;
+	int i;
+
+	if (!hv_is_isolation_supported())
+		return;
+
+	if (!packet->dma_range)
+		return;
+
+	for (i = 0; i < page_count; i++)
+		dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
+				 packet->dma_range[i].mapping_size,
+				 DMA_TO_DEVICE);
+
+	kfree(packet->dma_range);
+}
+
+/* netvsc_dma_map - Map swiotlb bounce buffer with data page of
+ * packet sent by vmbus_sendpacket_pagebuffer() in the Isolation
+ * VM.
+ *
+ * In isolation VM, netvsc send buffer has been marked visible to
+ * host and so the data copied to send buffer doesn't need to use
+ * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
+ * may not be copied to send buffer and so these pages need to be
+ * mapped with swiotlb bounce buffer. netvsc_dma_map() is to do
+ * that. The pfns in the struct hv_page_buffer need to be converted
+ * to bounce buffer's pfn. The loop here is necessary because the
+ * entries in the page buffer array are not necessarily full
+ * pages of data.  Each entry in the array has a separate offset and
+ * len that may be non-zero, even for entries in the middle of the
+ * array.  And the entries are not physically contiguous.  So each
+ * entry must be individually mapped rather than as a contiguous unit.
+ * So dma_map_sg() is not used here.
+ */
+int netvsc_dma_map(struct hv_device *hv_dev,
+		   struct hv_netvsc_packet *packet,
+		   struct hv_page_buffer *pb)
+{
+	u32 page_count = packet->cp_partial ?
+		packet->page_buf_cnt - packet->rmsg_pgcnt :
+		packet->page_buf_cnt;
+	dma_addr_t dma;
+	int i;
+
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	packet->dma_range = kcalloc(page_count,
+				    sizeof(*packet->dma_range),
+				    GFP_KERNEL);
+	if (!packet->dma_range)
+		return -ENOMEM;
+
+	for (i = 0; i < page_count; i++) {
+		char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
+					 + pb[i].offset);
+		u32 len = pb[i].len;
+
+		dma = dma_map_single(&hv_dev->device, src, len,
+				     DMA_TO_DEVICE);
+		if (dma_mapping_error(&hv_dev->device, dma)) {
+			kfree(packet->dma_range);
+			return -ENOMEM;
+		}
+
+		packet->dma_range[i].dma = dma;
+		packet->dma_range[i].mapping_size = len;
+		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
+		pb[i].offset = offset_in_hvpage(dma);
+		pb[i].len = len;
+	}
+
+	return 0;
+}
+
 static inline int netvsc_send_pkt(
 	struct hv_device *device,
 	struct hv_netvsc_packet *packet,
@@ -995,14 +1113,24 @@ static inline int netvsc_send_pkt(
 
 	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
 
+	packet->dma_range = NULL;
 	if (packet->page_buf_cnt) {
 		if (packet->cp_partial)
 			pb += packet->rmsg_pgcnt;
 
+		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
+		if (ret) {
+			ret = -EAGAIN;
+			goto exit;
+		}
+
 		ret = vmbus_sendpacket_pagebuffer(out_channel,
 						  pb, packet->page_buf_cnt,
 						  &nvmsg, sizeof(nvmsg),
 						  req_id);
+
+		if (ret)
+			netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
 	} else {
 		ret = vmbus_sendpacket(out_channel,
 				       &nvmsg, sizeof(nvmsg),
@@ -1010,6 +1138,7 @@ static inline int netvsc_send_pkt(
 				       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
 	}
 
+exit:
 	if (ret == 0) {
 		atomic_inc_return(&nvchan->queue_sends);
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index f6c9c2a670f9..448fcc325ed7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
 			}
 		}
 
+		netvsc_dma_unmap(((struct net_device_context *)
+			netdev_priv(ndev))->device_ctx, &request->pkt);
 		complete(&request->wait_event);
 	} else {
 		netdev_err(ndev,
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 724a735d722a..139a43ad65a1 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1596,6 +1596,11 @@ struct hyperv_service_callback {
 	void (*callback)(void *context);
 };
 
+struct hv_dma_range {
+	dma_addr_t dma;
+	u32 mapping_size;
+};
+
 #define MAX_SRV_VER	0x7ffffff
 extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf, u32 buflen,
 				const int *fw_version, int fw_vercnt,
-- 
2.25.1


* [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (11 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
@ 2021-08-27 17:21 ` Tianyu Lan
  2021-09-02  2:08   ` Michael Kelley
  2021-08-30 12:00 ` [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Christoph Hellwig
  13 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:21 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
this for the storvsc rx/tx ring buffer. The page buffer used by
vmbus_sendpacket_mpb_desc() still needs to be handled. Use the DMA API
(dma_map_sg()) to map this memory when sending/receiving packets and
return the swiotlb bounce buffer DMA address. In an Isolation VM, the
swiotlb bounce buffer is marked visible to the host and swiotlb force
mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.
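
What "keeping the original data offset" means, and how a mapped segment
translates into Hyper-V PFNs, can be sketched in userspace C. This is
an illustration under stated assumptions, not the storvsc code: the
helper names offset_preserved() and fill_hvpfns() are hypothetical, a
4 KiB Hyper-V page size is assumed, and the page count here is derived
from the in-page offset plus the length, which is one way to do it and
not necessarily how the patch computes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: Hyper-V pages are 4 KiB. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1ULL << HV_HYP_PAGE_SHIFT)
#define HV_HYP_PAGE_MASK  (HV_HYP_PAGE_SIZE - 1)

/* Nonzero when the bounce address keeps the original in-page offset. */
static int offset_preserved(uint64_t orig, uint64_t bounce)
{
	return (orig & HV_HYP_PAGE_MASK) == (bounce & HV_HYP_PAGE_MASK);
}

/*
 * Write the Hyper-V PFNs covered by [addr, addr + len) into pfns[].
 * Returns the number of PFNs written, or 0 if len is 0 or the range
 * needs more than max entries.
 */
static size_t fill_hvpfns(uint64_t addr, uint64_t len,
			  uint64_t *pfns, size_t max)
{
	uint64_t first, last;
	size_t n, i;

	if (len == 0)
		return 0;

	first = addr >> HV_HYP_PAGE_SHIFT;
	last = (addr + len - 1) >> HV_HYP_PAGE_SHIFT;
	n = (size_t)(last - first + 1);
	if (n > max)
		return 0;

	for (i = 0; i < n; i++)
		pfns[i] = first + i;

	return n;
}
```

A segment that starts in the middle of a page can span one more page
than its length alone suggests, which is why the offset is part of the
page count in this sketch.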

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Replace dma_map_page() with dma_map_sg()
	* Use for_each_sg() to populate payload->range.pfn_array.
	* Remove storvsc_dma_map macro
---
 drivers/hv/vmbus_drv.c     |  1 +
 drivers/scsi/storvsc_drv.c | 41 +++++++++++++++-----------------------
 include/linux/hyperv.h     |  1 +
 3 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f068e22a5636..270d526fd9de 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2124,6 +2124,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 	hv_debug_add_dev_dir(child_device_obj);
 
 	child_device_obj->device.dma_mask = &vmbus_dma_mask;
+	child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
 	return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..4f1793be1fdc 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include <linux/device.h>
 #include <linux/hyperv.h>
 #include <linux/blkdev.h>
+#include <linux/dma-mapping.h>
+
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_host.h>
@@ -1312,6 +1314,9 @@ static void storvsc_on_channel_callback(void *context)
 					continue;
 				}
 				request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+				if (scsi_sg_count(scmnd))
+					dma_unmap_sg(&device->device, scsi_sglist(scmnd),
+						     scsi_sg_count(scmnd), scmnd->sc_data_direction);
 			}
 
 			storvsc_on_receive(stor_device, packet, request);
@@ -1725,7 +1730,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	struct hv_host_device *host_dev = shost_priv(host);
 	struct hv_device *dev = host_dev->dev;
 	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-	int i;
 	struct scatterlist *sgl;
 	unsigned int sg_count;
 	struct vmscsi_request *vm_srb;
@@ -1807,10 +1811,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	payload_sz = sizeof(cmd_request->mpb);
 
 	if (sg_count) {
-		unsigned int hvpgoff, hvpfns_to_add;
 		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
 		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-		u64 hvpfn;
+		struct scatterlist *sg;
+		unsigned long hvpfn, hvpfns_to_add;
+		int j, i = 0;
 
 		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1824,31 +1829,16 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 		payload->range.len = length;
 		payload->range.offset = offset_in_hvpg;
 
+		if (dma_map_sg(&dev->device, sgl, sg_count,
+		    scmnd->sc_data_direction) == 0)
+			return SCSI_MLQUEUE_DEVICE_BUSY;
 
-		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
-			/*
-			 * Init values for the current sgl entry. hvpgoff
-			 * and hvpfns_to_add are in units of Hyper-V size
-			 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-			 * case also handles values of sgl->offset that are
-			 * larger than PAGE_SIZE. Such offsets are handled
-			 * even on other than the first sgl entry, provided
-			 * they are a multiple of PAGE_SIZE.
-			 */
-			hvpgoff = HVPFN_DOWN(sgl->offset);
-			hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-			hvpfns_to_add =	HVPFN_UP(sgl->offset + sgl->length) -
-						hvpgoff;
+		for_each_sg(sgl, sg, sg_count, j) {
+			hvpfns_to_add = HVPFN_UP(sg_dma_len(sg));
+			hvpfn = HVPFN_DOWN(sg_dma_address(sg));
 
-			/*
-			 * Fill the next portion of the PFN array with
-			 * sequential Hyper-V PFNs for the continguous physical
-			 * memory described by the sgl entry. The end of the
-			 * last sgl should be reached at the same time that
-			 * the PFN array is filled.
-			 */
 			while (hvpfns_to_add--)
-				payload->range.pfn_array[i++] =	hvpfn++;
+				payload->range.pfn_array[i++] = hvpfn++;
 		}
 	}
 
@@ -1992,6 +1982,7 @@ static int storvsc_probe(struct hv_device *device,
 	stor_device->vmscsi_size_delta = sizeof(struct vmscsi_win8_extension);
 	spin_lock_init(&stor_device->lock);
 	hv_set_drvdata(device, stor_device);
+	dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
 
 	stor_device->port_number = host->host_no;
 	ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size, is_fc);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 139a43ad65a1..8f39893f8ccf 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1274,6 +1274,7 @@ struct hv_device {
 
 	struct vmbus_channel *channel;
 	struct kset	     *channels_kset;
+	struct device_dma_parameters dma_parms;
 
 	/* place holder to keep track of the dir for hv device in debugfs */
 	struct dentry *debug_dir;
-- 
2.25.1


* Re: [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-27 17:21 ` [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
@ 2021-08-27 17:41   ` Greg KH
  2021-08-27 17:44     ` Tianyu Lan
  2021-09-02  0:17   ` Michael Kelley
  1 sibling, 1 reply; 41+ messages in thread
From: Greg KH @ 2021-08-27 17:41 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

On Fri, Aug 27, 2021 at 01:21:02PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Mark vmbus ring buffer visible with set_memory_decrypted() when
> establish gpadl handle.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
>        * Change vmbus_teardown_gpadl() parameter and put gpadl handle,
>        buffer and buffer size in the struct vmbus_gpadl.
> ---
>  drivers/hv/channel.c            | 36 ++++++++++++++++++++++++++++-----
>  drivers/net/hyperv/hyperv_net.h |  1 +
>  drivers/net/hyperv/netvsc.c     | 16 +++++++++++----
>  drivers/uio/uio_hv_generic.c    | 14 +++++++++++--
>  include/linux/hyperv.h          |  8 +++++++-
>  5 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..82650beb3af0 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/uio.h>
>  #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
>  #include <asm/page.h>
>  #include <asm/mshyperv.h>
>  
> @@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	if (ret)
>  		return ret;
>  
> +	ret = set_memory_decrypted((unsigned long)kbuffer,
> +				   HVPFN_UP(size));
> +	if (ret) {
> +		pr_warn("Failed to set host visibility for new GPADL %d.\n", ret);

dev_warn()?  You have access to a struct device, why not use it?

same for all other instances here.

thanks,

greg k-h

* Re: [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page
  2021-08-27 17:21 ` [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page Tianyu Lan
@ 2021-08-27 17:41   ` Greg KH
  2021-08-27 17:46     ` Tianyu Lan
  2021-09-02  3:32   ` Michael Kelley
  1 sibling, 1 reply; 41+ messages in thread
From: Greg KH @ 2021-08-27 17:41 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

On Fri, Aug 27, 2021 at 01:21:03PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Hyperv provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VM with AMD SEV SNP
> and these registers are emulated by hypervisor directly.
> Hyperv requires to write SINTx MSR registers twice. First
> writes MSR via GHCB page to communicate with hypervisor
> and then writes wrmsr instruction to talk with paravisor
> which runs in VMPL0. Guest OS ID MSR also needs to be set
> via GHCB page.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v1:
>          * Introduce sev_es_ghcb_hv_call_simple() and share code
>            between SEV and Hyper-V code.
> Change since v3:
>          * Pass old_msg_type to hv_signal_eom() as parameter.
> 	 * Use HV_REGISTER_* macros instead of HV_X64_MSR_*
> 	 * Add hv_isolation_type_snp() weak function.
> 	 * Add macros to set synic registers in ARM code.
> ---
>  arch/arm64/include/asm/mshyperv.h |  23 ++++++
>  arch/x86/hyperv/hv_init.c         |  36 ++--------
>  arch/x86/hyperv/ivm.c             | 112 ++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h   |  80 ++++++++++++++++++++-
>  arch/x86/include/asm/sev.h        |   3 +
>  arch/x86/kernel/sev-shared.c      |  63 ++++++++++-------
>  drivers/hv/hv.c                   | 112 ++++++++++++++++++++----------
>  drivers/hv/hv_common.c            |   6 ++
>  include/asm-generic/mshyperv.h    |   4 +-
>  9 files changed, 345 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> index 20070a847304..ced83297e009 100644
> --- a/arch/arm64/include/asm/mshyperv.h
> +++ b/arch/arm64/include/asm/mshyperv.h
> @@ -41,6 +41,29 @@ static inline u64 hv_get_register(unsigned int reg)
>  	return hv_get_vpreg(reg);
>  }
>  
> +#define hv_get_simp(val)	{ val = hv_get_register(HV_REGISTER_SIMP); }
> +#define hv_set_simp(val)	hv_set_register(HV_REGISTER_SIMP, val)
> +
> +#define hv_get_siefp(val)	{ val = hv_get_register(HV_REGISTER_SIEFP); }
> +#define hv_set_siefp(val)	hv_set_register(HV_REGISTER_SIEFP, val)
> +
> +#define hv_get_synint_state(int_num, val) {			\
> +	val = hv_get_register(HV_REGISTER_SINT0 + int_num);	\
> +	}
> +
> +#define hv_set_synint_state(int_num, val)			\
> +	hv_set_register(HV_REGISTER_SINT0 + int_num, val)
> +
> +#define hv_get_synic_state(val) {			\
> +	val = hv_get_register(HV_REGISTER_SCONTROL);	\
> +	}
> +
> +#define hv_set_synic_state(val)			\
> +	hv_set_register(HV_REGISTER_SCONTROL, val)
> +
> +#define hv_signal_eom(old_msg_type)		 \
> +	hv_set_register(HV_REGISTER_EOM, 0)

Please just use real inline functions and not #defines if you really
need it.
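
For illustration, the difference between the two styles can be sketched
in userspace C. Everything here is hypothetical: the register indices,
the fake_regs backing array, and the reg_read()/reg_write() helpers
stand in for hv_get_register()/hv_set_register() and are not the real
kernel definitions.

```c
#include <stdint.h>

/* Hypothetical register indices and backing store, illustration only. */
enum { REG_SIMP = 0, REG_SINT0 = 4, REG_COUNT = 20 };
static uint64_t fake_regs[REG_COUNT];

static inline uint64_t reg_read(unsigned int reg)
{
	return fake_regs[reg];
}

static inline void reg_write(unsigned int reg, uint64_t val)
{
	fake_regs[reg] = val;
}

/*
 * Inline accessors: arguments are type-checked and evaluated exactly
 * once, and the getters return a value instead of assigning to a
 * macro parameter.
 */
static inline uint64_t get_simp(void)
{
	return reg_read(REG_SIMP);
}

static inline void set_simp(uint64_t val)
{
	reg_write(REG_SIMP, val);
}

static inline uint64_t get_synint_state(unsigned int int_num)
{
	return reg_read(REG_SINT0 + int_num);
}

static inline void set_synint_state(unsigned int int_num, uint64_t val)
{
	reg_write(REG_SINT0 + int_num, val);
}
```

A caller would then write "val = get_simp();" rather than
"hv_get_simp(val);", which makes the assignment visible at the call
site.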

thanks,

greg k-h

* Re: [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-27 17:41   ` Greg KH
@ 2021-08-27 17:44     ` Tianyu Lan
  0 siblings, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:44 UTC (permalink / raw)
  To: Greg KH
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

Hi Greg:
      Thanks for your review.

On 8/28/2021 1:41 AM, Greg KH wrote:
> On Fri, Aug 27, 2021 at 01:21:02PM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Mark vmbus ring buffer visible with set_memory_decrypted() when
>> establish gpadl handle.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>> Change since v3:
>>         * Change vmbus_teardown_gpadl() parameter and put gpadl handle,
>>         buffer and buffer size in the struct vmbus_gpadl.
>> ---
>>   drivers/hv/channel.c            | 36 ++++++++++++++++++++++++++++-----
>>   drivers/net/hyperv/hyperv_net.h |  1 +
>>   drivers/net/hyperv/netvsc.c     | 16 +++++++++++----
>>   drivers/uio/uio_hv_generic.c    | 14 +++++++++++--
>>   include/linux/hyperv.h          |  8 +++++++-
>>   5 files changed, 63 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
>> index f3761c73b074..82650beb3af0 100644
>> --- a/drivers/hv/channel.c
>> +++ b/drivers/hv/channel.c
>> @@ -17,6 +17,7 @@
>>   #include <linux/hyperv.h>
>>   #include <linux/uio.h>
>>   #include <linux/interrupt.h>
>> +#include <linux/set_memory.h>
>>   #include <asm/page.h>
>>   #include <asm/mshyperv.h>
>>   
>> @@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   	if (ret)
>>   		return ret;
>>   
>> +	ret = set_memory_decrypted((unsigned long)kbuffer,
>> +				   HVPFN_UP(size));
>> +	if (ret) {
>> +		pr_warn("Failed to set host visibility for new GPADL %d.\n", ret);
> 
> dev_warn()?  You have access to a struct device, why not use it?
> 
> same for all other instances here.
> 
>

Yes, dev_warn() is better. Will update in the next version. Thanks.


* Re: [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page
  2021-08-27 17:41   ` Greg KH
@ 2021-08-27 17:46     ` Tianyu Lan
  0 siblings, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-08-27 17:46 UTC (permalink / raw)
  To: Greg KH
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

On 8/28/2021 1:41 AM, Greg KH wrote:
> On Fri, Aug 27, 2021 at 01:21:03PM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Hyperv provides GHCB protocol to write Synthetic Interrupt
>> Controller MSR registers in Isolation VM with AMD SEV SNP
>> and these registers are emulated by hypervisor directly.
>> Hyperv requires to write SINTx MSR registers twice. First
>> writes MSR via GHCB page to communicate with hypervisor
>> and then writes wrmsr instruction to talk with paravisor
>> which runs in VMPL0. Guest OS ID MSR also needs to be set
>> via GHCB page.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>> Change since v1:
>>           * Introduce sev_es_ghcb_hv_call_simple() and share code
>>             between SEV and Hyper-V code.
>> Change since v3:
>>           * Pass old_msg_type to hv_signal_eom() as parameter.
>> 	 * Use HV_REGISTER_* macro instead of HV_X64_MSR_*
>> 	 * Add hv_isolation_type_snp() weak function.
>> 	 * Add macros to set synic register in ARM code.
>> ---
>>   arch/arm64/include/asm/mshyperv.h |  23 ++++++
>>   arch/x86/hyperv/hv_init.c         |  36 ++--------
>>   arch/x86/hyperv/ivm.c             | 112 ++++++++++++++++++++++++++++++
>>   arch/x86/include/asm/mshyperv.h   |  80 ++++++++++++++++++++-
>>   arch/x86/include/asm/sev.h        |   3 +
>>   arch/x86/kernel/sev-shared.c      |  63 ++++++++++-------
>>   drivers/hv/hv.c                   | 112 ++++++++++++++++++++----------
>>   drivers/hv/hv_common.c            |   6 ++
>>   include/asm-generic/mshyperv.h    |   4 +-
>>   9 files changed, 345 insertions(+), 94 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
>> index 20070a847304..ced83297e009 100644
>> --- a/arch/arm64/include/asm/mshyperv.h
>> +++ b/arch/arm64/include/asm/mshyperv.h
>> @@ -41,6 +41,29 @@ static inline u64 hv_get_register(unsigned int reg)
>>   	return hv_get_vpreg(reg);
>>   }
>>   
>> +#define hv_get_simp(val)	{ val = hv_get_register(HV_REGISTER_SIMP); }
>> +#define hv_set_simp(val)	hv_set_register(HV_REGISTER_SIMP, val)
>> +
>> +#define hv_get_siefp(val)	{ val = hv_get_register(HV_REGISTER_SIEFP); }
>> +#define hv_set_siefp(val)	hv_set_register(HV_REGISTER_SIEFP, val)
>> +
>> +#define hv_get_synint_state(int_num, val) {			\
>> +	val = hv_get_register(HV_REGISTER_SINT0 + int_num);	\
>> +	}
>> +
>> +#define hv_set_synint_state(int_num, val)			\
>> +	hv_set_register(HV_REGISTER_SINT0 + int_num, val)
>> +
>> +#define hv_get_synic_state(val) {			\
>> +	val = hv_get_register(HV_REGISTER_SCONTROL);	\
>> +	}
>> +
>> +#define hv_set_synic_state(val)			\
>> +	hv_set_register(HV_REGISTER_SCONTROL, val)
>> +
>> +#define hv_signal_eom(old_msg_type)		 \
>> +	hv_set_register(HV_REGISTER_EOM, 0)
> 
> Please just use real inline functions and not #defines if you really
> need it.
> 

OK. Will update. Thanks.
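
As an illustration of the request above, here is a minimal sketch of how the
accessor macros might look as inline functions.  It is not taken from any later
revision of the series: the register numbers and the hv_get_register() /
hv_set_register() bodies below are stand-ins so that the fragment builds on its
own outside the kernel.

```c
#include <stdint.h>

/* Placeholder register IDs; the real values come from the Hyper-V headers. */
enum { HV_REGISTER_SIMP = 0, HV_REGISTER_SINT0 = 16, HV_REG_COUNT = 32 };

/* Stand-in register file so that the sketch is self-contained. */
static uint64_t fake_regs[HV_REG_COUNT];

static inline uint64_t hv_get_register(unsigned int reg)
{
	return fake_regs[reg];
}

static inline void hv_set_register(unsigned int reg, uint64_t val)
{
	fake_regs[reg] = val;
}

/*
 * Inline replacements for the macros: arguments are type-checked and the
 * getter returns its value instead of assigning to a macro parameter.
 */
static inline uint64_t hv_get_simp(void)
{
	return hv_get_register(HV_REGISTER_SIMP);
}

static inline void hv_set_simp(uint64_t val)
{
	hv_set_register(HV_REGISTER_SIMP, val);
}

static inline uint64_t hv_get_synint_state(unsigned int int_num)
{
	return hv_get_register(HV_REGISTER_SINT0 + int_num);
}

static inline void hv_set_synint_state(unsigned int int_num, uint64_t val)
{
	hv_set_register(HV_REGISTER_SINT0 + int_num, val);
}
```

Callers would then write "val = hv_get_simp();" rather than "hv_get_simp(val);",
which also avoids the brace-block macro bodies.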

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (12 preceding siblings ...)
  2021-08-27 17:21 ` [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
@ 2021-08-30 12:00 ` Christoph Hellwig
  2021-08-31 15:20   ` Tianyu Lan
  2021-08-31 17:16   ` Michael Kelley
  13 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2021-08-30 12:00 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, hch, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
previous version and tried to come up with a better version.  This
mostly untested branch:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap

gets us there for swiotlb and the channel infrastructure.  I've started
looking at the network driver and didn't get anywhere due to other work.

As far as I can tell the network driver does gigantic multi-megabyte
vmalloc allocation for the send and receive buffers, which are then
passed to the hardware, but always copied to/from when interacting
with the networking stack.  Did I see that right?  Are these big
buffers actually required unlike the normal buffer management schemes
in other Linux network drivers?

If so I suspect the best way to allocate them is by not using vmalloc
but just discontiguous pages, and then use kmap_local_pfn where the
PFN includes the share_gpa offset when actually copying from/to the
skbs.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-08-30 12:00 ` [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Christoph Hellwig
@ 2021-08-31 15:20   ` Tianyu Lan
  2021-09-02  7:51     ` Christoph Hellwig
  2021-08-31 17:16   ` Michael Kelley
  1 sibling, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-08-31 15:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, michael.h.kelley, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

Hi Christoph:

On 8/30/2021 8:00 PM, Christoph Hellwig wrote:
> Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
> previous version and tried to come up with a better version.  This
> mostly untested branch:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap

No problem. Thank you very much for the suggested patches; they are
very helpful.


> 
> gets us there for swiotlb and the channel infrastructure.  I've started
> looking at the network driver and didn't get anywhere due to other work.
> 
> As far as I can tell the network driver does gigantic multi-megabyte
> vmalloc allocation for the send and receive buffers, which are then
> passed to the hardware, but always copied to/from when interacting
> with the networking stack.  Did I see that right?  Are these big
> buffers actually required unlike the normal buffer management schemes
> in other Linux network drivers?


For sending, netvsc tries to batch packets in the send buffer when
possible. It passes the original skb pages directly to the hypervisor
when the send buffer has no room or the packet is larger than the
section size. Those packets are ultimately sent via
vmbus_sendpacket_pagebuffer(). Please see netvsc_send() for details.
The following code checks whether the packet can be copied into the
send buffer. If not, the packet is sent with the original skb pages.

	/* batch packets in send buffer if possible */
	msdp = &nvchan->msd;
	if (msdp->pkt)
		msd_len = msdp->pkt->total_data_buflen;

	try_batch =  msd_len > 0 && msdp->count < net_device->max_pkt;
	if (try_batch && msd_len + pktlen + net_device->pkt_align <
	    net_device->send_section_size) {
		section_index = msdp->pkt->send_buf_index;

	} else if (try_batch && msd_len + packet->rmsg_size <
		   net_device->send_section_size) {
		section_index = msdp->pkt->send_buf_index;
		packet->cp_partial = true;

	} else if (pktlen + net_device->pkt_align <
		   net_device->send_section_size) {
		section_index = netvsc_get_next_send_section(net_device);
		if (unlikely(section_index == NETVSC_INVALID_INDEX)) {
			++ndev_ctx->eth_stats.tx_send_full;
		} else {
			move_pkt_msd(&msd_send, &msd_skb, msdp);
			msd_len = 0;
		}
	}
1264



For received packets, the data is always copied out of the receive buffer.

> 
> If so I suspect the best way to allocate them is by not using vmalloc
> but just discontiguous pages, and then use kmap_local_pfn where the
> PFN includes the share_gpa offset when actually copying from/to the
> skbs.
> 
When netvsc copies packet data into the send buffer, it calculates the
position from section_index and send_section_size. Please see
netvsc_copy_to_send_buf() for details. So a contiguous virtual address
range for the send buffer is necessary to copy data and batch packets.
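
To make the dependency on a contiguous mapping concrete, here is a rough
userspace sketch of the offset arithmetic described above.  copy_to_section()
and its parameters are hypothetical names for illustration only; this is not
the netvsc_copy_to_send_buf() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy one packet into its section of a virtually
 * contiguous send buffer.  Returns the byte offset written to, or
 * (size_t)-1 if the data would not fit in the section.
 */
static size_t copy_to_section(uint8_t *send_buf, uint32_t section_size,
			      uint32_t section_index,
			      uint32_t offset_in_section,
			      const void *data, uint32_t len)
{
	/* The destination is plain pointer arithmetic on one base address. */
	size_t start = (size_t)section_index * section_size + offset_in_section;

	if ((uint64_t)offset_in_section + len > section_size)
		return (size_t)-1;

	memcpy(send_buf + start, data, len);
	return start;
}
```

With discontiguous pages, a copy that crosses a page boundary inside a section
would have to be split per page, which is the extra work a single virtual
mapping avoids.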

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-08-30 12:00 ` [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Christoph Hellwig
  2021-08-31 15:20   ` Tianyu Lan
@ 2021-08-31 17:16   ` Michael Kelley
  2021-09-02  7:59     ` Christoph Hellwig
  1 sibling, 1 reply; 41+ messages in thread
From: Michael Kelley @ 2021-08-31 17:16 UTC (permalink / raw)
  To: Christoph Hellwig, Tianyu Lan
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu,
	Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86, hpa,
	dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky, jgross,
	sstabellini, joro, davem, kuba, jejb, martin.petersen, gregkh,
	arnd, m.szyprowski, robin.murphy, brijesh.singh, thomas.lendacky,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	hannes, aneesh.kumar, krish.sadhukhan, saravanand,
	linux-arm-kernel, xen-devel, rientjes, ardb, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

From: Christoph Hellwig <hch@lst.de> Sent: Monday, August 30, 2021 5:01 AM
> 
> Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
> previous version and tried to come up with a better version.  This
> mostly untested branch:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap
> 
> gets us there for swiotlb and the channel infrastructure.  I've started
> looking at the network driver and didn't get anywhere due to other work.
> 
> As far as I can tell the network driver does gigantic multi-megabyte
> vmalloc allocation for the send and receive buffers, which are then
> passed to the hardware, but always copied to/from when interacting
> with the networking stack.  Did I see that right?  Are these big
> buffers actually required unlike the normal buffer management schemes
> in other Linux network drivers?
> 
> If so I suspect the best way to allocate them is by not using vmalloc
> but just discontiguous pages, and then use kmap_local_pfn where the
> PFN includes the share_gpa offset when actually copying from/to the
> skbs.

As a quick overview, I think there are four places where the
shared_gpa_boundary must be applied to adjust the guest physical
address that is used.  Each requires mapping a corresponding
virtual address range.  Here are the four places:

1)  The so-called "monitor pages" that are a core communication
mechanism between the guest and Hyper-V.  These are two single
pages, and the mapping is handled by calling memremap() for
each of the two pages.  See Patch 7 of Tianyu's series.

2)  The VMbus channel ring buffers.  You have proposed using
your new  vmap_phys_range() helper, but I don't think that works
here.  More details below.

3)  The network driver send and receive buffers.  vmap_phys_range()
should work here.

4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
should work here as well.

Case #2 above does unusual mapping.  The ring buffer consists of a ring
buffer header page, followed by one or more pages that are the actual
ring buffer.  The pages making up the actual ring buffer are mapped
twice in succession.  For example, if the ring buffer has 4 pages
(one header page and three ring buffer pages), the contiguous
virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
The duplicate contiguous mapping allows the code that is reading
or writing the actual ring buffer to not be concerned about wrap-around
because writing off the end of the ring buffer is automatically
wrapped-around by the mapping.  The amount of data read or
written in one batch never exceeds the size of the ring buffer, and
after a batch is read or written, the read or write indices are adjusted
to put them back into the range of the first mapping of the actual
ring buffer pages.  So there's method to the madness, and the
technique works pretty well.  But this kind of mapping is not
amenable to using vmap_phys_range().
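
The page ordering described for case #2 can be sketched as follows.  This is a
standalone illustration, not the VMbus code: build_wraparound_map() is a
hypothetical helper that only produces page indices, whereas the kernel would
build an array of pages in this order and map it with a single call.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill 'out' with the page order for a ring buffer of
 * 'total_pages' pages (one header page plus data pages), with the data
 * pages listed twice in succession.  Returns the number of entries written,
 * or 0 if the input is invalid or 'out' is too small.
 */
static size_t build_wraparound_map(size_t total_pages, size_t *out,
				   size_t out_len)
{
	size_t data_pages, n = 0, i;

	if (total_pages < 2)
		return 0;

	data_pages = total_pages - 1;
	if (out_len < 1 + 2 * data_pages)
		return 0;

	out[n++] = 0;				/* header page, mapped once */
	for (i = 0; i < 2 * data_pages; i++)	/* data pages, mapped twice */
		out[n++] = 1 + (i % data_pages);

	return n;
}
```

For total_pages = 4 this yields the seven entries 0, 1, 2, 3, 1, 2, 3 from the
example above.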

Michael



^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM
  2021-08-27 17:20 ` [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM Tianyu Lan
@ 2021-09-02  0:15   ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:15 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest
> to communicate with hypervisor. Map GHCB page for all
> cpus to read/write MSR register and submit hvcall request
> via ghcb page.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
>         * Rename ghcb_base to hv_ghcb_pg and move it out of
> 	  struct ms_hyperv_info.
> 	* Allocate hv_ghcb_pg before cpuhp_setup_state() and leverage
> 	  hv_cpu_init() to initialize ghcb page.
> ---
>  arch/x86/hyperv/hv_init.c       | 68 +++++++++++++++++++++++++++++----
>  arch/x86/include/asm/mshyperv.h |  4 ++
>  arch/x86/kernel/cpu/mshyperv.c  |  3 ++
>  include/asm-generic/mshyperv.h  |  1 +
>  4 files changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 708a2712a516..eba10ed4f73e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -20,6 +20,7 @@
>  #include <linux/kexec.h>
>  #include <linux/version.h>
>  #include <linux/vmalloc.h>
> +#include <linux/io.h>
>  #include <linux/mm.h>
>  #include <linux/hyperv.h>
>  #include <linux/slab.h>
> @@ -36,12 +37,42 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> +void __percpu **hv_ghcb_pg;
> +
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> 
>  struct hv_vp_assist_page **hv_vp_assist_page;
>  EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> 
> +static int hyperv_init_ghcb(void)
> +{
> +	u64 ghcb_gpa;
> +	void *ghcb_va;
> +	void **ghcb_base;
> +
> +	if (!hv_isolation_type_snp())
> +		return 0;
> +
> +	if (!hv_ghcb_pg)
> +		return -EINVAL;
> +
> +	/*
> +	 * GHCB page is allocated by paravisor. The address
> +	 * returned by MSR_AMD64_SEV_ES_GHCB is above shared
> +	 * ghcb boundary and map it here.

I'm not sure what the "shared ghcb boundary" is.  Did you
mean "shared_gpa_boundary"?

> +	 */
> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> +	if (!ghcb_va)
> +		return -ENOMEM;
> +
> +	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> +	*ghcb_base = ghcb_va;
> +
> +	return 0;
> +}
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,7 +116,7 @@ static int hv_cpu_init(unsigned int cpu)
>  		}
>  	}
> 
> -	return 0;
> +	return hyperv_init_ghcb();
>  }
> 
>  static void (*hv_reenlightenment_cb)(void);
> @@ -177,6 +208,14 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>  	struct hv_reenlightenment_control re_ctrl;
>  	unsigned int new_cpu;
> +	void **ghcb_va;
> +
> +	if (hv_ghcb_pg) {
> +		ghcb_va = (void **)this_cpu_ptr(hv_ghcb_pg);
> +		if (*ghcb_va)
> +			memunmap(*ghcb_va);
> +		*ghcb_va = NULL;
> +	}
> 
>  	hv_common_cpu_die(cpu);
> 
> @@ -366,10 +405,16 @@ void __init hyperv_init(void)
>  		goto common_free;
>  	}
> 
> +	if (hv_isolation_type_snp()) {
> +		hv_ghcb_pg = alloc_percpu(void *);
> +		if (!hv_ghcb_pg)
> +			goto free_vp_assist_page;
> +	}
> +
>  	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>  				  hv_cpu_init, hv_cpu_die);
>  	if (cpuhp < 0)
> -		goto free_vp_assist_page;
> +		goto free_ghcb_page;
> 
>  	/*
>  	 * Setup the hypercall page and enable hypercalls.
> @@ -383,10 +428,8 @@ void __init hyperv_init(void)
>  			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>  			__builtin_return_address(0));
> -	if (hv_hypercall_pg == NULL) {
> -		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> -		goto remove_cpuhp_state;
> -	}
> +	if (hv_hypercall_pg == NULL)
> +		goto clean_guest_os_id;
> 
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	hypercall_msr.enable = 1;
> @@ -456,8 +499,11 @@ void __init hyperv_init(void)
>  	hv_query_ext_cap(0);
>  	return;
> 
> -remove_cpuhp_state:
> +clean_guest_os_id:
> +	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  	cpuhp_remove_state(cpuhp);
> +free_ghcb_page:
> +	free_percpu(hv_ghcb_pg);
>  free_vp_assist_page:
>  	kfree(hv_vp_assist_page);
>  	hv_vp_assist_page = NULL;
> @@ -559,3 +605,11 @@ bool hv_is_isolation_supported(void)
>  {
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +bool hv_isolation_type_snp(void)
> +{
> +	return static_branch_unlikely(&isolation_type_snp);
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index adccbc209169..37739a277ac6 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -11,6 +11,8 @@
>  #include <asm/paravirt.h>
>  #include <asm/mshyperv.h>
> 
> +DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +
>  typedef int (*hyperv_fill_flush_list_func)(
>  		struct hv_guest_mapping_flush_list *flush,
>  		void *data);
> @@ -39,6 +41,8 @@ extern void *hv_hypercall_pg;
> 
>  extern u64 hv_current_partition_id;
> 
> +extern void __percpu **hv_ghcb_pg;
> +
>  int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
>  int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
>  int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 6b5835a087a3..20557a9d6e25 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -316,6 +316,9 @@ static void __init ms_hyperv_init_platform(void)
> 
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> +
> +		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> +			static_branch_enable(&isolation_type_snp);
>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c1ab6a6e72b5..0924bbd8458e 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -237,6 +237,7 @@ bool hv_is_hyperv_initialized(void);
>  bool hv_is_hibernation_supported(void);
>  enum hv_isolation_type hv_get_isolation_type(void);
>  bool hv_is_isolation_supported(void);
> +bool hv_isolation_type_snp(void);
>  void hyperv_cleanup(void);
>  bool hv_query_ext_cap(u64 cap_query);
>  #else /* CONFIG_HYPERV */
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the Isolation VM.
  2021-08-27 17:21 ` [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the " Tianyu Lan
@ 2021-09-02  0:15   ` Michael Kelley
  2021-09-02  6:35     ` Tianyu Lan
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:15 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyper-V exposes shared memory boundary via cpuid
> HYPERV_CPUID_ISOLATION_CONFIG and store it in the
> shared_gpa_boundary of ms_hyperv struct. This prepares
> to share memory with host for SNP guest.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* use BIT_ULL to get shared_gpa_boundary
> 	* Rename field Reserved* to reserved
> ---
>  arch/x86/kernel/cpu/mshyperv.c |  2 ++
>  include/asm-generic/mshyperv.h | 12 +++++++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 20557a9d6e25..8bb001198316 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>  	if (ms_hyperv.priv_high & HV_ISOLATION) {
>  		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>  		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
> +		ms_hyperv.shared_gpa_boundary =
> +			BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> 
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 0924bbd8458e..7537ae1db828 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -35,7 +35,17 @@ struct ms_hyperv_info {
>  	u32 max_vp_index;
>  	u32 max_lp_index;
>  	u32 isolation_config_a;
> -	u32 isolation_config_b;
> +	union {
> +		u32 isolation_config_b;
> +		struct {
> +			u32 cvm_type : 4;
> +			u32 reserved11 : 1;
> +			u32 shared_gpa_boundary_active : 1;
> +			u32 shared_gpa_boundary_bits : 6;
> +			u32 reserved12 : 20;

I'm still curious about the "11" and "12" in the reserved
field names.  Why not just "reserved1" and "reserved2"?
Having the "11" and "12" isn't wrong, but it makes one
wonder why since it's not usual. :-)

> +		};
> +	};
> +	u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> 
> --
> 2.25.1
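
For readers following the bitfield layout in the hunk above, here is a small
sketch of how the boundary is derived from isolation_config_b.  It assumes the
fields are packed starting from the least significant bit, as is the case for
x86 compilers; the helper names are made up for illustration and use explicit
shifts instead of the union.

```c
#include <stdint.h>

/*
 * Assumed layout of isolation_config_b (LSB first), per the patch:
 *   bits 0-3   cvm_type
 *   bit  4     reserved
 *   bit  5     shared_gpa_boundary_active
 *   bits 6-11  shared_gpa_boundary_bits
 *   bits 12-31 reserved
 */
static inline int cfg_b_boundary_active(uint32_t config_b)
{
	return (config_b >> 5) & 0x1;
}

static inline uint32_t cfg_b_boundary_bits(uint32_t config_b)
{
	return (config_b >> 6) & 0x3f;
}

/* Equivalent of BIT_ULL(shared_gpa_boundary_bits). */
static inline uint64_t cfg_b_shared_gpa_boundary(uint32_t config_b)
{
	return 1ULL << cfg_b_boundary_bits(config_b);
}
```

For example, a boundary_bits value of 39 gives a shared_gpa_boundary of
0x8000000000.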


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support
  2021-08-27 17:21 ` [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support Tianyu Lan
@ 2021-09-02  0:16   ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:16 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Fix error code handle in the __hv_set_mem_host_visibility().
> 	* Move HvCallModifySparseGpaPageHostVisibility near to enum
> 	  hv_mem_host_visibility.
> 
> Change since v2:
>        * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>          according to platform check.
> 
> Change since v1:
>        * Use new static call x86_set_memory_enc to avoid adding Hyper-V
>          specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile           |   2 +-
>  arch/x86/hyperv/hv_init.c          |   6 ++
>  arch/x86/hyperv/ivm.c              | 113 +++++++++++++++++++++++++++++
>  arch/x86/include/asm/hyperv-tlfs.h |  17 +++++
>  arch/x86/include/asm/mshyperv.h    |   4 +-
>  arch/x86/mm/pat/set_memory.c       |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h     |   1 +
>  8 files changed, 156 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index eba10ed4f73e..b1aa42f60faa 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -603,6 +603,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> 
>  bool hv_is_isolation_supported(void)
>  {
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return 0;

Use "return false" per previous comment from Wei Liu.

> +
> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> +		return 0;

Use "return false".

> +
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> new file mode 100644
> index 000000000000..a069c788ce3c
> --- /dev/null
> +++ b/arch/x86/hyperv/ivm.c
> @@ -0,0 +1,113 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V Isolation VM interface with paravisor and hypervisor
> + *
> + * Author:
> + *  Tianyu Lan <Tianyu.Lan@microsoft.com>
> + */
> +
> +#include <linux/hyperv.h>
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
> +#include <linux/slab.h>
> +#include <asm/io.h>
> +#include <asm/mshyperv.h>
> +
> +/*
> + * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
> + *
> + * In Isolation VM, all guest memory is encripted from host and guest

s/encripted/encrypted/

> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host.
> + */
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +			   enum hv_mem_host_visibility visibility)
> +{
> +	struct hv_gpa_range_for_visibility **input_pcpu, *input;
> +	u16 pages_processed;
> +	u64 hv_status;
> +	unsigned long flags;
> +
> +	/* no-op if partition isolation is not enabled */
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> +			HV_MAX_MODIFY_GPA_REP_COUNT);
> +		return -EINVAL;
> +	}
> +
> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_gpa_range_for_visibility **)
> +			this_cpu_ptr(hyperv_pcpu_input_arg);
> +	input = *input_pcpu;
> +	if (unlikely(!input)) {
> +		local_irq_restore(flags);
> +		return -EINVAL;
> +	}
> +
> +	input->partition_id = HV_PARTITION_ID_SELF;
> +	input->host_visibility = visibility;
> +	input->reserved0 = 0;
> +	input->reserved1 = 0;
> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> +	hv_status = hv_do_rep_hypercall(
> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> +			0, input, &pages_processed);
> +	local_irq_restore(flags);
> +
> +	if (hv_result_success(hv_status))
> +		return 0;
> +	else
> +		return -EFAULT;
> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);

In later comments on Patch 7 of this series, I have suggested that
code in that patch should not call hv_mark_gpa_visibility() directly,
but instead should call set_memory_encrypted() and
set_memory_decrypted().  I'm thinking that those functions should
be the standard way to change the visibility of pages in the Isolated
VM case.  Then hv_mark_gpa_visibility() could be static and it would
not need a stub version for ARM64.  It would only be called by
__hv_set_mem_host_visibility below, and in turn by
set_memory_encrypted()/decrypted().  

> +
> +static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount,
> +				      enum hv_mem_host_visibility visibility)
> +{
> +	u64 *pfn_array;
> +	int ret = 0;
> +	int i, pfn;
> +
> +	if (!hv_is_isolation_supported() || !hv_hypercall_pg)
> +		return 0;
> +
> +	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
> +	if (!pfn_array)
> +		return -ENOMEM;
> +
> +	for (i = 0, pfn = 0; i < pagecount; i++) {
> +		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
> +		pfn++;
> +
> +		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
> +			ret = hv_mark_gpa_visibility(pfn, pfn_array,
> +						     visibility);
> +			if (ret)
> +				goto err_free_pfn_array;
> +			pfn = 0;
> +		}
> +	}
> +
> + err_free_pfn_array:
> +	kfree(pfn_array);
> +	return ret;
> +}
> +
> +/*
> + * hv_set_mem_host_visibility - Set specified memory visible to host.
> + *
> + * In Isolation VM, all guest memory is encrypted from host and guest
> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host. This function works as wrap of hv_mark_gpa_visibility()
> + * with memory base and size.
> + */
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
> +{
> +	enum hv_mem_host_visibility visibility = visible ?
> +			VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE;
> +
> +	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
> +}

Is there a need for this wrapper function?  Couldn't the handling of the host
visibility enum be folded into __hv_set_mem_host_visibility() and the initial
double underscore dropped?  Maybe I missed it, but I don't see that
__hv_set_mem_host_visibility() is called anyplace else.   Just trying to avoid
complexity if it isn't really needed.
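
A rough sketch of what the folded version might look like is below.  It is
only an illustration under simplifying assumptions: mark_batch() is a stub
standing in for the hypercall wrapper, the enum and MAX_BATCH names are
placeholders, 4K pages are assumed, and the PFNs are taken to be consecutive
rather than looked up page by page as the patch does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* (4096 / sizeof(u64)) - 2, mirroring HV_MAX_MODIFY_GPA_REP_COUNT for 4K pages. */
#define MAX_BATCH 510

enum visibility { NOT_VISIBLE = 0, VISIBLE_READ_WRITE = 3 };

static unsigned int calls_made;

/* Stub for the hypercall wrapper; only counts calls and checks the batch size. */
static int mark_batch(size_t count, const uint64_t *pfns, enum visibility vis)
{
	(void)pfns;
	(void)vis;
	if (count == 0 || count > MAX_BATCH)
		return -1;
	calls_made++;
	return 0;
}

/* Single entry point: maps the bool to the enum and batches the PFNs. */
static int set_mem_host_visibility(uint64_t first_pfn, size_t numpages,
				   bool visible)
{
	enum visibility vis = visible ? VISIBLE_READ_WRITE : NOT_VISIBLE;
	uint64_t batch[MAX_BATCH];
	size_t done = 0, n, i;
	int ret;

	while (done < numpages) {
		n = numpages - done;
		if (n > MAX_BATCH)
			n = MAX_BATCH;

		for (i = 0; i < n; i++)
			batch[i] = first_pfn + done + i;

		ret = mark_batch(n, batch, vis);
		if (ret)
			return ret;
		done += n;
	}

	return 0;
}
```

The point is only that the bool-to-enum mapping fits at the top of the one
function, so no separate double-underscore helper is needed.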

> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 2322d6bd5883..381e88122a5f 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -276,6 +276,23 @@ enum hv_isolation_type {
>  #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
>  #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
> 
> +/* Hyper-V memory host visibility */
> +enum hv_mem_host_visibility {
> +	VMBUS_PAGE_NOT_VISIBLE		= 0,
> +	VMBUS_PAGE_VISIBLE_READ_ONLY	= 1,
> +	VMBUS_PAGE_VISIBLE_READ_WRITE	= 3
> +};
> +
> +/* HvCallModifySparseGpaPageHostVisibility hypercall */
> +#define HV_MAX_MODIFY_GPA_REP_COUNT	((PAGE_SIZE / sizeof(u64)) - 2)
> +struct hv_gpa_range_for_visibility {
> +	u64 partition_id;
> +	u32 host_visibility:2;
> +	u32 reserved0:30;
> +	u32 reserved1;
> +	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
> +} __packed;
> +
>  /*
>   * Declare the MSR used to setup pages used to communicate with the hypervisor.
>   */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 37739a277ac6..ffb2af079c6b 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -192,7 +192,9 @@ struct irq_domain *hv_create_pci_msi_domain(void);
>  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>  		struct hv_interrupt_entry *entry);
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> -
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +			   enum hv_mem_host_visibility visibility);
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index ad8a5c586a35..1e4a0882820a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -29,6 +29,8 @@
>  #include <asm/proto.h>
>  #include <asm/memtype.h>
>  #include <asm/set_memory.h>
> +#include <asm/hyperv-tlfs.h>
> +#include <asm/mshyperv.h>
> 
>  #include "../mm_internal.h"
> 
> @@ -1980,15 +1982,11 @@ int set_memory_global(unsigned long addr, int numpages)
>  				    __pgprot(_PAGE_GLOBAL), 0);
>  }
> 
> -static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>  {
>  	struct cpa_data cpa;
>  	int ret;
> 
> -	/* Nothing to do if memory encryption is not active */
> -	if (!mem_encrypt_active())
> -		return 0;
> -
>  	/* Should not be working on unaligned addresses */
>  	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>  		addr &= PAGE_MASK;
> @@ -2023,6 +2021,17 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	return ret;
>  }
> 
> +static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> +	if (hv_is_isolation_supported())
> +		return hv_set_mem_host_visibility(addr, numpages, !enc);
> +
> +	if (mem_encrypt_active())
> +		return __set_memory_enc_pgtable(addr, numpages, enc);
> +
> +	return 0;
> +}
> +
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
>  	return __set_memory_enc_dec(addr, numpages, true);
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 56348a541c50..8ed6733d5146 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_RETARGET_INTERRUPT		0x007e
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> +#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
> 
>  /* Extended hypercalls */
>  #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 7537ae1db828..aa55447b9700 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -254,6 +254,7 @@ bool hv_query_ext_cap(u64 cap_query);
>  static inline bool hv_is_hyperv_initialized(void) { return false; }
>  static inline bool hv_is_hibernation_supported(void) { return false; }
>  static inline void hyperv_cleanup(void) {}
> +static inline hv_is_isolation_supported(void);
>  #endif /* CONFIG_HYPERV */
> 
>  #endif
> --
> 2.25.1



* RE: [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-27 17:21 ` [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
  2021-08-27 17:41   ` Greg KH
@ 2021-09-02  0:17   ` Michael Kelley
  1 sibling, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:17 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> Mark the vmbus ring buffer visible to the host with
> set_memory_decrypted() when establishing the gpadl handle.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
>        * Change vmbus_teardown_gpadl() parameter and put gpadl handle,
>        buffer and buffer size in the struct vmbus_gpadl.
> ---
>  drivers/hv/channel.c            | 36 ++++++++++++++++++++++++++++-----
>  drivers/net/hyperv/hyperv_net.h |  1 +
>  drivers/net/hyperv/netvsc.c     | 16 +++++++++++----
>  drivers/uio/uio_hv_generic.c    | 14 +++++++++++--
>  include/linux/hyperv.h          |  8 +++++++-
>  5 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..82650beb3af0 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/uio.h>
>  #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
>  #include <asm/page.h>
>  #include <asm/mshyperv.h>
> 
> @@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	if (ret)
>  		return ret;
> 
> +	ret = set_memory_decrypted((unsigned long)kbuffer,
> +				   HVPFN_UP(size));
> +	if (ret) {
> +		pr_warn("Failed to set host visibility for new GPADL %d.\n", ret);
> +		return ret;
> +	}
> +
>  	init_completion(&msginfo->waitevent);
>  	msginfo->waiting_channel = channel;
> 
> @@ -549,6 +557,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	}
> 
>  	kfree(msginfo);
> +
> +	if (ret)
> +		set_memory_encrypted((unsigned long)kbuffer,
> +				     HVPFN_UP(size));
> +
>  	return ret;
>  }
> 
> @@ -639,6 +652,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	struct vmbus_channel_open_channel *open_msg;
>  	struct vmbus_channel_msginfo *open_info = NULL;
>  	struct page *page = newchannel->ringbuffer_page;
> +	struct vmbus_gpadl gpadl;
>  	u32 send_pages, recv_pages;
>  	unsigned long flags;
>  	int err;
> @@ -759,7 +773,10 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  error_free_info:
>  	kfree(open_info);
>  error_free_gpadl:
> -	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
> +	gpadl.gpadl_handle = newchannel->ringbuffer_gpadlhandle;
> +	gpadl.buffer = page_address(newchannel->ringbuffer_page);
> +	gpadl.size = (send_pages + recv_pages) << PAGE_SHIFT;
> +	vmbus_teardown_gpadl(newchannel, &gpadl);
>  	newchannel->ringbuffer_gpadlhandle = 0;
>  error_clean_ring:
>  	hv_ringbuffer_cleanup(&newchannel->outbound);
> @@ -806,7 +823,7 @@ EXPORT_SYMBOL_GPL(vmbus_open);
>  /*
>   * vmbus_teardown_gpadl -Teardown the specified GPADL handle
>   */
> -int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
> +int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpadl)
>  {
>  	struct vmbus_channel_gpadl_teardown *msg;
>  	struct vmbus_channel_msginfo *info;
> @@ -825,7 +842,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
> 
>  	msg->header.msgtype = CHANNELMSG_GPADL_TEARDOWN;
>  	msg->child_relid = channel->offermsg.child_relid;
> -	msg->gpadl = gpadl_handle;
> +	msg->gpadl = gpadl->gpadl_handle;
> 
>  	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
>  	list_add_tail(&info->msglistentry,
> @@ -859,6 +876,12 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>  	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
> 
>  	kfree(info);
> +
> +	ret = set_memory_encrypted((unsigned long)gpadl->buffer,
> +				   HVPFN_UP(gpadl->size));
> +	if (ret)
> +		pr_warn("Fail to set mem host visibility in GPADL teardown %d.\n", ret);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
> @@ -896,6 +919,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
>  static int vmbus_close_internal(struct vmbus_channel *channel)
>  {
>  	struct vmbus_channel_close_channel *msg;
> +	struct vmbus_gpadl gpadl;
>  	int ret;
> 
>  	vmbus_reset_channel_cb(channel);
> @@ -934,8 +958,10 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
> 
>  	/* Tear down the gpadl for the channel's ring buffer */
>  	else if (channel->ringbuffer_gpadlhandle) {
> -		ret = vmbus_teardown_gpadl(channel,
> -					   channel->ringbuffer_gpadlhandle);
> +		gpadl.gpadl_handle = channel->ringbuffer_gpadlhandle;
> +		gpadl.buffer = page_address(channel->ringbuffer_page);
> +		gpadl.size = channel->ringbuffer_pagecount;
> +		ret = vmbus_teardown_gpadl(channel, &gpadl);
>  		if (ret) {
>  			pr_err("Close failed: teardown gpadl return %d\n", ret);
>  			/*
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index bc48855dff10..aa7c9962dbd8 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -1082,6 +1082,7 @@ struct netvsc_device {
> 
>  	/* Send buffer allocated by us */
>  	void *send_buf;
> +	u32 send_buf_size;
>  	u32 send_buf_gpadl_handle;
>  	u32 send_section_cnt;
>  	u32 send_section_size;
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 7bd935412853..f19bffff6a63 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -276,11 +276,14 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
>  				       struct netvsc_device *net_device,
>  				       struct net_device *ndev)
>  {
> +	struct vmbus_gpadl gpadl;
>  	int ret;
> 
>  	if (net_device->recv_buf_gpadl_handle) {
> -		ret = vmbus_teardown_gpadl(device->channel,
> -					   net_device->recv_buf_gpadl_handle);
> +		gpadl.gpadl_handle = net_device->recv_buf_gpadl_handle;
> +		gpadl.buffer = net_device->recv_buf;
> +		gpadl.size = net_device->recv_buf_size;
> +		ret = vmbus_teardown_gpadl(device->channel, &gpadl);
> 
>  		/* If we failed here, we might as well return and have a leak
>  		 * rather than continue and a bugchk
> @@ -298,11 +301,15 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
>  				       struct netvsc_device *net_device,
>  				       struct net_device *ndev)
>  {
> +	struct vmbus_gpadl gpadl;
>  	int ret;
> 
>  	if (net_device->send_buf_gpadl_handle) {
> -		ret = vmbus_teardown_gpadl(device->channel,
> -					   net_device->send_buf_gpadl_handle);
> +		gpadl.gpadl_handle = net_device->send_buf_gpadl_handle;
> +		gpadl.buffer = net_device->send_buf;
> +		gpadl.size = net_device->send_buf_size;
> +
> +		ret = vmbus_teardown_gpadl(device->channel, &gpadl);
> 
>  		/* If we failed here, we might as well return and have a leak
>  		 * rather than continue and a bugchk
> @@ -463,6 +470,7 @@ static int netvsc_init_buf(struct hv_device *device,
>  		ret = -ENOMEM;
>  		goto cleanup;
>  	}
> +	net_device->send_buf_size = buf_size;
> 
>  	/* Establish the gpadl handle for this buffer on this
>  	 * channel.  Note: This call uses the vmbus connection rather
> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
> index 652fe2547587..13c5df8dd11d 100644
> --- a/drivers/uio/uio_hv_generic.c
> +++ b/drivers/uio/uio_hv_generic.c
> @@ -179,14 +179,24 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
>  static void
>  hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
>  {
> +	struct vmbus_gpadl gpadl;
> +
>  	if (pdata->send_gpadl) {
> -		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl);
> +		gpadl.gpadl_handle = pdata->send_gpadl;
> +		gpadl.buffer = pdata->send_buf;
> +		gpadl.size = SEND_BUFFER_SIZE;
> +
> +		vmbus_teardown_gpadl(dev->channel, &gpadl);
>  		pdata->send_gpadl = 0;
>  		vfree(pdata->send_buf);
>  	}
> 
>  	if (pdata->recv_gpadl) {
> -		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl);
> +		gpadl.gpadl_handle = pdata->recv_gpadl;
> +		gpadl.buffer = pdata->recv_buf;
> +		gpadl.size = RECV_BUFFER_SIZE;
> +
> +		vmbus_teardown_gpadl(dev->channel, &gpadl);
>  		pdata->recv_gpadl = 0;
>  		vfree(pdata->recv_buf);
>  	}
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index ddc8713ce57b..757e09606fd3 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -803,6 +803,12 @@ struct vmbus_device {
> 
>  #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
> 
> +struct vmbus_gpadl {
> +	u32 gpadl_handle;
> +	u32 size;
> +	void *buffer;
> +};
> +
>  struct vmbus_channel {
>  	struct list_head listentry;
> 
> @@ -1195,7 +1201,7 @@ extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
>  				      u32 *gpadl_handle);
> 
>  extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
> -				     u32 gpadl_handle);
> +				     struct vmbus_gpadl *gpadl);
> 
>  void vmbus_reset_channel_cb(struct vmbus_channel *channel);
> 
> --
> 2.25.1

This isn't quite what I had in mind in my comments on v3 of this
patch series.  My idea is to store the full struct vmbus_gpadl
data structure in places where previously just the
u32 gpadl_handle was stored.  Then pass around a pointer to the
struct vmbus_gpadl where previously just the gpadl_handle (or a
pointer to it) was passed.  This lets __vmbus_establish_gpadl()
fill in the actual handle value as well as the other info (buffer pointer
and size) that vmbus_teardown_gpadl() needs.  Callers of the
gpadl functions don't need to worry about saving or finding
the right info.  Most of the changes are just tweaking the references
to what is now a struct instead of a u32.
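
The ownership pattern described above can be sketched in plain C as follows. This is a userspace mock under stated assumptions, not the kernel code: establish_gpadl_sketch(), teardown_gpadl_sketch(), and the next_handle counter are hypothetical names, and the host messaging and set_memory_encrypted()/decrypted() calls are reduced to comments.

```c
#include <stddef.h>
#include <stdint.h>

struct vmbus_gpadl {
	uint32_t handle;
	uint32_t size;
	void *buffer;
};

/* Mock establish: records everything teardown will need later. */
int establish_gpadl_sketch(void *kbuffer, uint32_t size,
			   struct vmbus_gpadl *gpadl)
{
	static uint32_t next_handle = 1;	/* hypothetical handle source */

	/* The kernel version would mark the buffer host-visible first. */
	gpadl->handle = next_handle++;
	gpadl->buffer = kbuffer;
	gpadl->size = size;
	return 0;
}

/* Mock teardown: takes buffer and size from the struct, clears the handle. */
int teardown_gpadl_sketch(struct vmbus_gpadl *gpadl)
{
	/*
	 * The kernel version would re-encrypt gpadl->buffer for
	 * HVPFN_UP(gpadl->size) pages at this point.
	 */
	gpadl->handle = 0;
	return 0;
}
```

A caller only embeds a struct vmbus_gpadl where it used to keep a u32, passes its address to both functions, and tests the handle field to decide whether a teardown is still pending.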

Here's a diff of what I had in mind.  My version also has
vmbus_teardown_gpadl() set the handle field to zero, rather than
each caller having to do it.  The code compiles, but I
have not done a runtime test.  This diff is a net +21 lines of code,
whereas your v3 and v4 patches were both +51 lines of code.

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c7..fc041ae 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include <linux/hyperv.h>
 #include <linux/uio.h>
 #include <linux/interrupt.h>
+#include <linux/set_memory.h>
 #include <asm/page.h>
 #include <asm/mshyperv.h>
 
@@ -456,7 +457,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
 static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 				   enum hv_gpadl_type type, void *kbuffer,
 				   u32 size, u32 send_offset,
-				   u32 *gpadl_handle)
+				   struct vmbus_gpadl *gpadl_handle)
 {
 	struct vmbus_channel_gpadl_header *gpadlmsg;
 	struct vmbus_channel_gpadl_body *gpadl_body;
@@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	if (ret)
 		return ret;
 
+	ret = set_memory_decrypted((unsigned long)kbuffer,
+				   HVPFN_UP(size));
+	if (ret) {
+		pr_warn("Failed to set host visibility for new GPADL %d.\n", ret);
+		return ret;
+	}
+
 	init_completion(&msginfo->waitevent);
 	msginfo->waiting_channel = channel;
 
@@ -537,7 +545,9 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	}
 
 	/* At this point, we received the gpadl created msg */
-	*gpadl_handle = gpadlmsg->gpadl;
+	gpadl_handle->handle = gpadlmsg->gpadl;
+	gpadl_handle->buffer = kbuffer;
+	gpadl_handle->size = size;
 
 cleanup:
 	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
@@ -549,6 +559,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	}
 
 	kfree(msginfo);
+
+	if (ret)
+		set_memory_encrypted((unsigned long)kbuffer,
+				     HVPFN_UP(size));
+
 	return ret;
 }
 
@@ -561,7 +576,7 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
  * @gpadl_handle: some funky thing
  */
 int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
-			  u32 size, u32 *gpadl_handle)
+			  u32 size, struct vmbus_gpadl *gpadl_handle)
 {
 	return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
 				       0U, gpadl_handle);
@@ -675,7 +690,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 		goto error_clean_ring;
 
 	/* Establish the gpadl for the ring buffer */
-	newchannel->ringbuffer_gpadlhandle = 0;
+	newchannel->ringbuffer_gpadlhandle.handle = 0;
 
 	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
 				      page_address(newchannel->ringbuffer_page),
@@ -701,7 +716,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	open_msg->header.msgtype = CHANNELMSG_OPENCHANNEL;
 	open_msg->openid = newchannel->offermsg.child_relid;
 	open_msg->child_relid = newchannel->offermsg.child_relid;
-	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
+	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle.handle;
 	/*
 	 * The unit of ->downstream_ringbuffer_pageoffset is HV_HYP_PAGE and
 	 * the unit of ->ringbuffer_send_offset (i.e. send_pages) is PAGE, so
@@ -759,8 +774,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 error_free_info:
 	kfree(open_info);
 error_free_gpadl:
-	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
-	newchannel->ringbuffer_gpadlhandle = 0;
+	vmbus_teardown_gpadl(newchannel, &newchannel->ringbuffer_gpadlhandle);
 error_clean_ring:
 	hv_ringbuffer_cleanup(&newchannel->outbound);
 	hv_ringbuffer_cleanup(&newchannel->inbound);
@@ -806,7 +820,7 @@ int vmbus_open(struct vmbus_channel *newchannel,
 /*
  * vmbus_teardown_gpadl -Teardown the specified GPADL handle
  */
-int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
+int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpadl)
 {
 	struct vmbus_channel_gpadl_teardown *msg;
 	struct vmbus_channel_msginfo *info;
@@ -825,7 +839,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 
 	msg->header.msgtype = CHANNELMSG_GPADL_TEARDOWN;
 	msg->child_relid = channel->offermsg.child_relid;
-	msg->gpadl = gpadl_handle;
+	msg->gpadl = gpadl->handle;
 
 	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
 	list_add_tail(&info->msglistentry,
@@ -844,6 +858,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 		goto post_msg_err;
 
 	wait_for_completion(&info->waitevent);
+	gpadl->handle = 0;
 
 post_msg_err:
 	/*
@@ -859,6 +874,12 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
 	kfree(info);
+
+	ret = set_memory_encrypted((unsigned long)gpadl->buffer,
+				   HVPFN_UP(gpadl->size));
+	if (ret)
+		pr_warn("Fail to set mem host visibility in GPADL teardown %d.\n", ret);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
@@ -933,9 +954,9 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 	}
 
 	/* Tear down the gpadl for the channel's ring buffer */
-	else if (channel->ringbuffer_gpadlhandle) {
+	else if (channel->ringbuffer_gpadlhandle.handle) {
 		ret = vmbus_teardown_gpadl(channel,
-					   channel->ringbuffer_gpadlhandle);
+					   &channel->ringbuffer_gpadlhandle);
 		if (ret) {
 			pr_err("Close failed: teardown gpadl return %d\n", ret);
 			/*
@@ -943,8 +964,6 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 			 * it is perhaps better to leak memory.
 			 */
 		}
-
-		channel->ringbuffer_gpadlhandle = 0;
 	}
 
 	if (!ret)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855..54cbce1 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1075,14 +1075,14 @@ struct netvsc_device {
 	/* Receive buffer allocated by us but manages by NetVSP */
 	void *recv_buf;
 	u32 recv_buf_size; /* allocated bytes */
-	u32 recv_buf_gpadl_handle;
+	struct vmbus_gpadl recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
 	u32 recv_section_size;
 	u32 recv_completion_cnt;
 
 	/* Send buffer allocated by us */
 	void *send_buf;
-	u32 send_buf_gpadl_handle;
+	struct vmbus_gpadl send_buf_gpadl_handle;
 	u32 send_section_cnt;
 	u32 send_section_size;
 	unsigned long *send_section_map;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd9354..585974c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -278,9 +278,9 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
 {
 	int ret;
 
-	if (net_device->recv_buf_gpadl_handle) {
+	if (net_device->recv_buf_gpadl_handle.handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
-					   net_device->recv_buf_gpadl_handle);
+					   &net_device->recv_buf_gpadl_handle);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -290,7 +290,6 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
 				   "unable to teardown receive buffer's gpadl\n");
 			return;
 		}
-		net_device->recv_buf_gpadl_handle = 0;
 	}
 }
 
@@ -300,9 +299,9 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
 {
 	int ret;
 
-	if (net_device->send_buf_gpadl_handle) {
+	if (net_device->send_buf_gpadl_handle.handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
-					   net_device->send_buf_gpadl_handle);
+					   &net_device->send_buf_gpadl_handle);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -312,7 +311,6 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
 				   "unable to teardown send buffer's gpadl\n");
 			return;
 		}
-		net_device->send_buf_gpadl_handle = 0;
 	}
 }
 
@@ -380,7 +378,7 @@ static int netvsc_init_buf(struct hv_device *device,
 	memset(init_packet, 0, sizeof(struct nvsp_message));
 	init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_RECV_BUF;
 	init_packet->msg.v1_msg.send_recv_buf.
-		gpadl_handle = net_device->recv_buf_gpadl_handle;
+		gpadl_handle = net_device->recv_buf_gpadl_handle.handle;
 	init_packet->msg.v1_msg.
 		send_recv_buf.id = NETVSC_RECEIVE_BUFFER_ID;
 
@@ -482,7 +480,7 @@ static int netvsc_init_buf(struct hv_device *device,
 	memset(init_packet, 0, sizeof(struct nvsp_message));
 	init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_SEND_BUF;
 	init_packet->msg.v1_msg.send_send_buf.gpadl_handle =
-		net_device->send_buf_gpadl_handle;
+		net_device->send_buf_gpadl_handle.handle;
 	init_packet->msg.v1_msg.send_send_buf.id = NETVSC_SEND_BUFFER_ID;
 
 	trace_nvsp_send(ndev, init_packet);
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 652fe25..97e08e7 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -58,11 +58,11 @@ struct hv_uio_private_data {
 	atomic_t refcnt;
 
 	void	*recv_buf;
-	u32	recv_gpadl;
+	struct vmbus_gpadl recv_gpadl;
 	char	recv_name[32];	/* "recv_4294967295" */
 
 	void	*send_buf;
-	u32	send_gpadl;
+	struct vmbus_gpadl send_gpadl;
 	char	send_name[32];
 };
 
@@ -179,15 +179,13 @@ static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
 static void
 hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
 {
-	if (pdata->send_gpadl) {
-		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl);
-		pdata->send_gpadl = 0;
+	if (pdata->send_gpadl.handle) {
+		vmbus_teardown_gpadl(dev->channel, &pdata->send_gpadl);
 		vfree(pdata->send_buf);
 	}
 
-	if (pdata->recv_gpadl) {
-		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl);
-		pdata->recv_gpadl = 0;
+	if (pdata->recv_gpadl.handle) {
+		vmbus_teardown_gpadl(dev->channel, &pdata->recv_gpadl);
 		vfree(pdata->recv_buf);
 	}
 }
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2e859d2..a0d64c3 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -809,6 +809,12 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl {
+	u32 handle;
+	u32 size;
+	void *buffer;
+};
+
 struct vmbus_channel {
 	struct list_head listentry;
 
@@ -828,7 +834,7 @@ struct vmbus_channel {
 	bool rescind_ref; /* got rescind msg, got channel reference */
 	struct completion rescind_event;
 
-	u32 ringbuffer_gpadlhandle;
+	struct vmbus_gpadl ringbuffer_gpadlhandle;
 
 	/* Allocated memory for ring buffer */
 	struct page *ringbuffer_page;
@@ -1208,10 +1214,10 @@ extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
 				      void *kbuffer,
 				      u32 size,
-				      u32 *gpadl_handle);
+				      struct vmbus_gpadl *gpadl_handle);
 
 extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
-				     u32 gpadl_handle);
+				     struct vmbus_gpadl *gpadl);
 
 void vmbus_reset_channel_cb(struct vmbus_channel *channel);



* RE: [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM
  2021-08-27 17:21 ` [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM Tianyu Lan
@ 2021-09-02  0:20   ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:20 UTC (permalink / raw)

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 

Subject line tag should probably be "x86/hyperv:" since the majority
of the code added is under arch/x86.

> Hyper-V provides a ghcb hvcall to handle the VMBus
> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
> messages in an SNP Isolation VM. Add support for it.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Add hv_ghcb_hypercall() stub function to avoid
> 	  compile error for ARM.
> ---
>  arch/x86/hyperv/ivm.c          | 71 ++++++++++++++++++++++++++++++++++
>  drivers/hv/connection.c        |  6 ++-
>  drivers/hv/hv.c                |  8 +++-
>  drivers/hv/hv_common.c         |  6 +++
>  include/asm-generic/mshyperv.h |  1 +
>  5 files changed, 90 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index f56fe4f73000..e761c67e2218 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -17,10 +17,81 @@
>  #include <asm/io.h>
>  #include <asm/mshyperv.h>
> 
> +#define GHCB_USAGE_HYPERV_CALL	1
> +
>  union hv_ghcb {
>  	struct ghcb ghcb;
> +	struct {
> +		u64 hypercalldata[509];
> +		u64 outputgpa;
> +		union {
> +			union {
> +				struct {
> +					u32 callcode        : 16;
> +					u32 isfast          : 1;
> +					u32 reserved1       : 14;
> +					u32 isnested        : 1;
> +					u32 countofelements : 12;
> +					u32 reserved2       : 4;
> +					u32 repstartindex   : 12;
> +					u32 reserved3       : 4;
> +				};
> +				u64 asuint64;
> +			} hypercallinput;
> +			union {
> +				struct {
> +					u16 callstatus;
> +					u16 reserved1;
> +					u32 elementsprocessed : 12;
> +					u32 reserved2         : 20;
> +				};
> +				u64 asunit64;
> +			} hypercalloutput;
> +		};
> +		u64 reserved2;
> +	} hypercall;
>  } __packed __aligned(HV_HYP_PAGE_SIZE);
> 
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!hv_ghcb_pg)
> +		return -EFAULT;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return -EFAULT;
> +	}
> +
> +	hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
> +	hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
> +
> +	hv_ghcb->hypercall.outputgpa = (u64)output;
> +	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> +	hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> +	if (input_size)
> +		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> +	VMGEXIT();
> +
> +	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
> +	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
> +	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
> +
> +	local_irq_restore(flags);
> +
> +	return hv_ghcb->hypercall.hypercalloutput.callstatus;

The hypercall.hypercalloutput.callstatus value must be saved
in a local variable *before* the call to local_irq_restore().  Then
the local variable is the return value.  Once local_irq_restore()
is called, the GHCB page could get reused.
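
A minimal userspace sketch of the ordering being asked for, assuming a mocked GHCB page. The mock_ghcb struct and the mock_irq_save(), mock_irq_restore(), and mock_vmgexit() helpers are hypothetical stand-ins; mock_irq_restore() deliberately scribbles on the page to simulate reuse by an interrupt handler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-CPU GHCB page. */
struct mock_ghcb {
	uint16_t callstatus;
	uint8_t valid_bitmap[16];
};

static struct mock_ghcb ghcb_page;
static int irqs_disabled;

static void mock_irq_save(void)
{
	irqs_disabled = 1;
}

/* Once interrupts are back on, another user may overwrite the page. */
static void mock_irq_restore(void)
{
	irqs_disabled = 0;
	ghcb_page.callstatus = 0xdead;	/* simulate reuse of the page */
}

/* Simulated VMGEXIT: the hypervisor writes the status into the page. */
static void mock_vmgexit(uint16_t result)
{
	ghcb_page.callstatus = result;
}

uint64_t ghcb_hypercall_sketch(uint16_t hypervisor_result)
{
	uint64_t status;

	mock_irq_save();
	mock_vmgexit(hypervisor_result);

	/* Copy the result out while the page is still exclusively ours. */
	status = ghcb_page.callstatus;

	memset(ghcb_page.valid_bitmap, 0, sizeof(ghcb_page.valid_bitmap));
	mock_irq_restore();

	/* Return the saved copy, never a fresh read of the shared page. */
	return status;
}
```

Reading ghcb_page.callstatus after mock_irq_restore() would return the scribbled value in this mock, which is the failure mode the comment above warns about.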

> +}
> +
>  void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>  	union hv_ghcb *hv_ghcb;
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 5e479d54918c..6d315c1465e0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
> 
>  	++channel->sig_events;
> 
> -	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> +	if (hv_isolation_type_snp())
> +		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> +				NULL, sizeof(u64));

Better to use "sizeof(channel->sig_event)" instead of explicitly coding
the type.

> +	else
> +		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_set_event);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 97b21256a9db..d4531c64d9d3 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
>  	aligned_msg->payload_size = payload_size;
>  	memcpy((void *)aligned_msg->payload, payload, payload_size);
> 
> -	status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
> +	if (hv_isolation_type_snp())
> +		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
> +				(void *)aligned_msg, NULL,
> +				sizeof(struct hv_input_post_message));

As above, use "sizeof(*aligned_msg)".
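
Both sizeof suggestions follow the same idiom, sketched here in standalone C. The channel_sketch and post_message_sketch structs are simplified stand-ins invented for the example; the real kernel structures have different layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real structs have many more fields. */
struct channel_sketch {
	uint64_t sig_event;
};

struct post_message_sketch {
	uint32_t message_type;
	uint32_t payload_size;
	uint64_t payload[30];
};

/* Size follows the field's declared type, whatever it becomes later. */
size_t sig_event_input_size(const struct channel_sketch *channel)
{
	return sizeof(channel->sig_event);
}

/* Same idea for a pointer: sizeof(*ptr) instead of sizeof(struct ...). */
size_t post_message_input_size(const struct post_message_sketch *aligned_msg)
{
	return sizeof(*aligned_msg);
}
```

Because sizeof does not evaluate its operand, neither form dereferences the pointer at run time, and the size stays correct if the field or struct type is changed later.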

> +	else
> +		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
> +				aligned_msg, NULL);
> 
>  	/* Preemption must remain disabled until after the hypercall
>  	 * so some other thread can't get scheduled onto this cpu and
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 1fc82d237161..7be173a99f27 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -289,3 +289,9 @@ void __weak hyperv_cleanup(void)
>  {
>  }
>  EXPORT_SYMBOL_GPL(hyperv_cleanup);
> +
> +u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> +	return HV_STATUS_INVALID_PARAMETER;
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 04a687d95eac..0da45807c36a 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -250,6 +250,7 @@ bool hv_is_hibernation_supported(void);
>  enum hv_isolation_type hv_get_isolation_type(void);
>  bool hv_is_isolation_supported(void);
>  bool hv_isolation_type_snp(void);
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
>  void hyperv_cleanup(void);
>  bool hv_query_ext_cap(u64 cap_query);
>  #else /* CONFIG_HYPERV */
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate  message
  2021-08-27 17:21 ` [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
@ 2021-09-02  0:21   ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:21 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 

Subject line tag should be "Drivers: hv: vmbus:"

> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with host in Isolation VM and so it's necessary to use hvcall to set
> them visible to host. In Isolation VM with AMD SEV SNP, the access
> address should be in the extra space which is above shared gpa
> boundary. So remap these pages into the extra address(pa +
> shared_gpa_boundary).
> 
> Introduce monitor_pages_original[] in the struct vmbus_connection
> to store monitor page virtual address returned by hv_alloc_hyperv_
> zeroed_page() and free monitor page via monitor_pages_original in
> the vmbus_disconnect(). The monitor_pages[] is to used to access
> monitor page and it is initialized to be equal with monitor_pages_
> original. The monitor_pages[] will be overridden in the isolation VM
> with va of extra address.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Rename monitor_pages_va with monitor_pages_original
> 	* free monitor page via monitor_pages_original and
> 	  monitor_pages is used to access monitor page.
> 
> Change since v1:
>         * Not remap monitor pages in the non-SNP isolation VM.
> ---
>  drivers/hv/connection.c   | 75 ++++++++++++++++++++++++++++++++++++---
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 72 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6d315c1465e0..9a48d8115c87 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/hyperv.h>
>  #include <linux/export.h>
> +#include <linux/io.h>
>  #include <asm/mshyperv.h>
> 
>  #include "hyperv_vmbus.h"
> @@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
> 
>  	msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>  	msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> +	if (hv_isolation_type_snp()) {
> +		msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> +		msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> +	}
> +
>  	msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>  	/*
> @@ -148,6 +155,35 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>  		return -ECONNREFUSED;
>  	}
> 
> +
> +	if (hv_is_isolation_supported()) {
> +		if (hv_isolation_type_snp()) {
> +			vmbus_connection.monitor_pages[0]
> +				= memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +					   MEMREMAP_WB);
> +			if (!vmbus_connection.monitor_pages[0])
> +				return -ENOMEM;
> +
> +			vmbus_connection.monitor_pages[1]
> +				= memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +					   MEMREMAP_WB);
> +			if (!vmbus_connection.monitor_pages[1]) {
> +				memunmap(vmbus_connection.monitor_pages[0]);
> +				return -ENOMEM;
> +			}
> +		}
> +
> +		/*
> +		 * Set memory host visibility hvcall smears memory
> +		 * and so zero monitor pages here.
> +		 */
> +		memset(vmbus_connection.monitor_pages[0], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +		memset(vmbus_connection.monitor_pages[1], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +
> +	}

I still find it somewhat confusing to have the handling of the
shared_gpa_boundary and memory mapping in the function for
negotiating the VMbus version.  I think the code works as written,
but it would seem cleaner and easier to understand to precompute
the physical addresses and do all the mapping and memory zeroing
in a single place in vmbus_connect().  Then the negotiate version
function can focus on doing only the version negotiation.
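As a rough illustration of the precompute step, the following user-space sketch models only the address arithmetic. The structure and function names are hypothetical, and the real code would also do the memremap() and zeroing at the same place in vmbus_connect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the precomputed monitor page addresses. */
struct demo_monitor_setup {
	uint64_t gpa[2];	/* values to place in the initiate-contact message */
};

/*
 * Compute both monitor page addresses once. In an SNP isolation VM the
 * host-visible alias is at pa + shared_gpa_boundary; otherwise pa is used
 * unchanged.
 */
static void demo_precompute_monitor_gpas(struct demo_monitor_setup *s,
					 const uint64_t pa[2], bool is_snp,
					 uint64_t shared_gpa_boundary)
{
	for (int i = 0; i < 2; i++)
		s->gpa[i] = pa[i] + (is_snp ? shared_gpa_boundary : 0);
}
```

With the addresses computed up front, the version negotiation function would only copy gpa[0] and gpa[1] into the message.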

> +
>  	return ret;
>  }
> 
> @@ -159,6 +195,7 @@ int vmbus_connect(void)
>  	struct vmbus_channel_msginfo *msginfo = NULL;
>  	int i, ret = 0;
>  	__u32 version;
> +	u64 pfn[2];
> 
>  	/* Initialize the vmbus connection */
>  	vmbus_connection.conn_state = CONNECTING;
> @@ -216,6 +253,21 @@ int vmbus_connect(void)
>  		goto cleanup;
>  	}
> 
> +	vmbus_connection.monitor_pages_original[0]
> +		= vmbus_connection.monitor_pages[0];
> +	vmbus_connection.monitor_pages_original[1]
> +		= vmbus_connection.monitor_pages[1];
> +
> +	if (hv_is_isolation_supported()) {
> +		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> +		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> +		if (hv_mark_gpa_visibility(2, pfn,
> +				VMBUS_PAGE_VISIBLE_READ_WRITE)) {

In Patch 4 of this series, host visibility for the specified buffer is done
by calling set_memory_decrypted()/set_memory_encrypted().  Could
the same be done here?   The code would be more consistent overall
with better encapsulation.  hv_mark_gpa_visibility() would not need to
be exported or need an ARM64 stub.

set_memory_decrypted()/encrypted() seem to be the primary functions
that should be used for this purpose, and they already have the
appropriate stubs for architectures that don't support memory encryption.

> +			ret = -EFAULT;
> +			goto cleanup;
> +		}
> +	}
> +
>  	msginfo = kzalloc(sizeof(*msginfo) +
>  			  sizeof(struct vmbus_channel_initiate_contact),
>  			  GFP_KERNEL);
> @@ -284,6 +336,8 @@ int vmbus_connect(void)
> 
>  void vmbus_disconnect(void)
>  {
> +	u64 pfn[2];
> +
>  	/*
>  	 * First send the unload request to the host.
>  	 */
> @@ -303,10 +357,23 @@ void vmbus_disconnect(void)
>  		vmbus_connection.int_page = NULL;
>  	}
> 
> -	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
> -	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
> -	vmbus_connection.monitor_pages[0] = NULL;
> -	vmbus_connection.monitor_pages[1] = NULL;
> +	if (hv_is_isolation_supported()) {
> +		memunmap(vmbus_connection.monitor_pages[0]);
> +		memunmap(vmbus_connection.monitor_pages[1]);
> +
> +		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> +		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> +		hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);

Same comment about using set_memory_encrypted() instead.

> +	}
> +
> +	hv_free_hyperv_page((unsigned long)
> +		vmbus_connection.monitor_pages_original[0]);
> +	hv_free_hyperv_page((unsigned long)
> +		vmbus_connection.monitor_pages_original[1]);
> +	vmbus_connection.monitor_pages_original[0] =
> +		vmbus_connection.monitor_pages[0] = NULL;
> +	vmbus_connection.monitor_pages_original[1] =
> +		vmbus_connection.monitor_pages[1] = NULL;
>  }
> 
>  /*
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 42f3d9d123a1..7cb11ef694da 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -240,6 +240,7 @@ struct vmbus_connection {
>  	 * is child->parent notification
>  	 */
>  	struct hv_monitor_page *monitor_pages[2];
> +	void *monitor_pages_original[2];
>  	struct list_head chn_msg_list;
>  	spinlock_t channelmsg_lock;
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-08-27 17:21 ` [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
@ 2021-09-02  0:23   ` Michael Kelley
  2021-09-02 13:35     ` Tianyu Lan
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  0:23 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 

Subject tag should be "Drivers: hv: vmbus: "

> VMbus ring buffer are shared with host and it's need to
> be accessed via extra address space of Isolation VM with
> AMD SNP support. This patch is to map the ring buffer
> address in extra address space via vmap_pfn(). Hyperv set
> memory host visibility hvcall smears data in the ring buffer
> and so reset the ring buffer memory to zero after mapping.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Remove hv_ringbuffer_post_init(), merge map
> 	operation for Isolation VM into hv_ringbuffer_init()
> 	* Call hv_ringbuffer_init() after __vmbus_establish_gpadl().
> ---
>  drivers/hv/Kconfig       |  1 +
>  drivers/hv/channel.c     | 19 +++++++-------
>  drivers/hv/ring_buffer.c | 56 ++++++++++++++++++++++++++++++----------
>  3 files changed, 54 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index d1123ceb38f3..dd12af20e467 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -8,6 +8,7 @@ config HYPERV
>  		|| (ARM64 && !CPU_BIG_ENDIAN))
>  	select PARAVIRT
>  	select X86_HV_CALLBACK_VECTOR if X86
> +	select VMAP_PFN
>  	help
>  	  Select this option to run Linux as a Hyper-V client operating
>  	  system.
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 82650beb3af0..81f8629e4491 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -679,15 +679,6 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	if (!newchannel->max_pkt_size)
>  		newchannel->max_pkt_size = VMBUS_DEFAULT_MAX_PKT_SIZE;
> 
> -	err = hv_ringbuffer_init(&newchannel->outbound, page, send_pages, 0);
> -	if (err)
> -		goto error_clean_ring;
> -
> -	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
> -				 recv_pages, newchannel->max_pkt_size);
> -	if (err)
> -		goto error_clean_ring;
> -
>  	/* Establish the gpadl for the ring buffer */
>  	newchannel->ringbuffer_gpadlhandle = 0;
> 
> @@ -699,6 +690,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	if (err)
>  		goto error_clean_ring;
> 
> +	err = hv_ringbuffer_init(&newchannel->outbound,
> +				 page, send_pages, 0);
> +	if (err)
> +		goto error_free_gpadl;
> +
> +	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
> +				 recv_pages, newchannel->max_pkt_size);
> +	if (err)
> +		goto error_free_gpadl;
> +
>  	/* Create and init the channel open message */
>  	open_info = kzalloc(sizeof(*open_info) +
>  			   sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 2aee356840a2..24d64d18eb65 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include <linux/vmalloc.h>
>  #include <linux/slab.h>
>  #include <linux/prefetch.h>
> +#include <linux/io.h>
> +#include <asm/mshyperv.h>
> 
>  #include "hyperv_vmbus.h"
> 
> @@ -183,8 +185,10 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
>  {
> -	int i;
>  	struct page **pages_wraparound;
> +	unsigned long *pfns_wraparound;
> +	u64 pfn;
> +	int i;
> 
>  	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
> 
> @@ -192,23 +196,49 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  	 * First page holds struct hv_ring_buffer, do wraparound mapping for
>  	 * the rest.
>  	 */
> -	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -				   GFP_KERNEL);
> -	if (!pages_wraparound)
> -		return -ENOMEM;
> +	if (hv_isolation_type_snp()) {
> +		pfn = page_to_pfn(pages) +
> +			HVPFN_DOWN(ms_hyperv.shared_gpa_boundary);

Use PFN_DOWN, not HVPFN_DOWN.  This is all done in units of guest page
size, not Hyper-V page size.
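A small user-space sketch of why the unit matters. The helper names are hypothetical, and the 64 KiB guest page size is an assumption chosen to make the difference visible; with 4 KiB guest pages the two results are the same, which is why the problem does not show up on x86.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HV_HYP_PAGE_SHIFT 12	/* Hyper-V page size is 4 KiB */

/* Convert an address to a page frame number for a given page shift. */
static uint64_t demo_pfn_down(uint64_t addr, unsigned int page_shift)
{
	return addr >> page_shift;
}

/*
 * PFN of the shared alias of a guest page. Both terms must be in units of
 * the guest page size, so the boundary is shifted by the guest page shift.
 */
static uint64_t demo_shared_pfn(uint64_t guest_pfn, uint64_t boundary,
				unsigned int guest_page_shift)
{
	return guest_pfn + demo_pfn_down(boundary, guest_page_shift);
}
```

For a boundary of 1 << 39 and 64 KiB guest pages, the guest-unit offset is 1 << 23, while the Hyper-V-unit offset would be 1 << 27, sixteen times too large.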

> 
> -	pages_wraparound[0] = pages;
> -	for (i = 0; i < 2 * (page_cnt - 1); i++)
> -		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> +		pfns_wraparound = kcalloc(page_cnt * 2 - 1,
> +			sizeof(unsigned long), GFP_KERNEL);
> +		if (!pfns_wraparound)
> +			return -ENOMEM;
> 
> -	ring_info->ring_buffer = (struct hv_ring_buffer *)
> -		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> +		pfns_wraparound[0] = pfn;
> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
> +			pfns_wraparound[i + 1] = pfn + i % (page_cnt - 1) + 1;
> 
> -	kfree(pages_wraparound);
> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
> +			vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
> +				 PAGE_KERNEL);
> +		kfree(pfns_wraparound);
> 
> +		if (!ring_info->ring_buffer)
> +			return -ENOMEM;
> +
> +		/* Zero ring buffer after setting memory host visibility. */
> +		memset(ring_info->ring_buffer, 0x00,
> +			HV_HYP_PAGE_SIZE * page_cnt);

The page_cnt parameter is in units of the guest page size.  So this
should use PAGE_SIZE, not HV_HYP_PAGE_SIZE.

> +	} else {
> +		pages_wraparound = kcalloc(page_cnt * 2 - 1,
> +					   sizeof(struct page *),
> +					   GFP_KERNEL);
> +
> +		pages_wraparound[0] = pages;
> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
> +			pages_wraparound[i + 1] =
> +				&pages[i % (page_cnt - 1) + 1];
> +
> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
> +			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
> +				PAGE_KERNEL);
> +
> +		kfree(pages_wraparound);
> +		if (!ring_info->ring_buffer)
> +			return -ENOMEM;
> +	}

With this patch, the code is a big "if" statement with two halves -- one
when SNP isolation is in effect, and the other when not.  The SNP isolation
case does the work using PFNs with the shared_gpa_boundary added,
while the other case does the same work but using struct page.  Perhaps
I'm missing something, but can both halves be combined and always
do the work using PFNs?  The only difference is whether to add the
shared_gpa_boundary, and whether to zero the memory when done.
So get the starting PFN, then have an "if" statement for whether to
add the shared_gpa_boundary.  Then everything else is the same.
At the end, use an "if" statement to decide whether to zero the
memory.  It would really be better to have the logic in this algorithm
coded only once.
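Something along these lines is what I have in mind, shown as a user-space sketch rather than kernel code. The function name and the calloc() allocation are stand-ins; in the driver the resulting array would be passed to vmap_pfn(), and the caller would zero the memory only in the SNP case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Build the wraparound PFN list: the header page once, then the data pages
 * twice. Returns a heap array of 2 * page_cnt - 1 entries, or NULL.
 * boundary_pfn is added only when use_shared_alias is true; everything
 * else is common to both cases.
 */
static uint64_t *demo_build_wraparound_pfns(uint64_t start_pfn,
					    uint32_t page_cnt,
					    bool use_shared_alias,
					    uint64_t boundary_pfn)
{
	uint64_t pfn = start_pfn + (use_shared_alias ? boundary_pfn : 0);
	uint64_t *pfns;
	uint32_t i;

	if (page_cnt == 0)
		return NULL;

	pfns = calloc(2 * (size_t)page_cnt - 1, sizeof(*pfns));
	if (!pfns)
		return NULL;

	pfns[0] = pfn;
	for (i = 0; i < 2 * (page_cnt - 1); i++)
		pfns[i + 1] = pfn + i % (page_cnt - 1) + 1;

	return pfns;
}
```

This relies on the ring buffer pages being physically contiguous, which the SNP half of the patch already assumes when it computes pfn + i.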

> 
> -	if (!ring_info->ring_buffer)
> -		return -ENOMEM;
> 
>  	ring_info->ring_buffer->read_index =
>  		ring_info->ring_buffer->write_index = 0;
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  2021-08-27 17:21 ` [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
@ 2021-09-02  1:27   ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  1:27 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> hyperv Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Swiotlb bounce buffer code calls dma_map_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space. Populate dma memory decrypted ops with hv
> map/unmap function.
> 
> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions which depends on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
> _detect() to keep the order.
> 
> The map function vmap_pfn() can't work in the early place
> hyperv_iommu_swiotlb_init() and so initialize swiotlb bounce
> buffer in the hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
>        * Get hyperv bounce bufffer size via default swiotlb
>        bounce buffer size function and keep default size as
>        same as the one in the AMD SEV VM.
> ---
>  arch/x86/hyperv/ivm.c           | 28 +++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  arch/x86/mm/mem_encrypt.c       |  3 +-
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c          |  3 ++
>  drivers/iommu/hyperv-iommu.c    | 61 +++++++++++++++++++++++++++++++++
>  include/linux/hyperv.h          |  1 +
>  7 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index e761c67e2218..84563b3c9f3a 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -294,3 +294,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
> 
>  	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
>  }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +void *hv_map_memory(void *addr, unsigned long size)
> +{
> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +				      sizeof(unsigned long), GFP_KERNEL);

Should be PAGE_SIZE, not HV_HYP_PAGE_SIZE, since this code
only manipulates guest page tables.  There's no communication with
Hyper-V that requires HV_HYP_PAGE_SIZE.

> +	void *vaddr;
> +	int i;
> +
> +	if (!pfns)
> +		return NULL;
> +
> +	for (i = 0; i < size / PAGE_SIZE; i++)
> +		pfns[i] = virt_to_hvpfn(addr + i * PAGE_SIZE) +

Use virt_to_pfn(), not virt_to_hvpfn(), for the same reason.

> +			(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
> +
> +	vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
> +	kfree(pfns);
> +
> +	return vaddr;
> +}
> +
> +void hv_unmap_memory(void *addr)
> +{
> +	vunmap(addr);
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index b77f4caee3ee..627fcf8d443c 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -252,6 +252,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  			   enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);
>  void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
>  void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..e2db0b8ed938 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -30,6 +30,7 @@
>  #include <asm/processor-flags.h>
>  #include <asm/msr.h>
>  #include <asm/cmdline.h>
> +#include <asm/mshyperv.h>
> 
>  #include "mm_internal.h"
> 
> @@ -202,7 +203,7 @@ void __init sev_setup_arch(void)
>  	phys_addr_t total_mem = memblock_phys_mem_size();
>  	unsigned long size;
> 
> -	if (!sev_active())
> +	if (!sev_active() && !hv_is_isolation_supported())
>  		return;
> 
>  	/*
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include <linux/dma-map-ops.h>
>  #include <linux/pci.h>
> +#include <linux/hyperv.h>
>  #include <xen/swiotlb-xen.h>
> 
>  #include <asm/xen/hypervisor.h>
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -		  NULL,
> +		  hyperv_swiotlb_detect,
>  		  pci_xen_swiotlb_init,
>  		  NULL);
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 57bbbaa4e8f7..f068e22a5636 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -23,6 +23,7 @@
>  #include <linux/cpu.h>
>  #include <linux/sched/task_stack.h>
> 
> +#include <linux/dma-map-ops.h>
>  #include <linux/delay.h>
>  #include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
> @@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
>  	return child_device_obj;
>  }
> 
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>  	}
>  	hv_debug_add_dev_dir(child_device_obj);
> 
> +	child_device_obj->device.dma_mask = &vmbus_dma_mask;
>  	return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index e285a220c913..899563551574 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -13,14 +13,22 @@
>  #include <linux/irq.h>
>  #include <linux/iommu.h>
>  #include <linux/module.h>
> +#include <linux/hyperv.h>
> +#include <linux/io.h>
> 
>  #include <asm/apic.h>
>  #include <asm/cpu.h>
>  #include <asm/hw_irq.h>
>  #include <asm/io_apic.h>
> +#include <asm/iommu.h>
> +#include <asm/iommu_table.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/hypervisor.h>
>  #include <asm/mshyperv.h>
> +#include <asm/swiotlb.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-direct.h>
> +#include <linux/set_memory.h>
> 
>  #include "irq_remapping.h"
> 
> @@ -36,6 +44,9 @@
>  static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
>  static struct irq_domain *ioapic_ir_domain;
> 
> +static unsigned long hyperv_io_tlb_size;
> +static void *hyperv_io_tlb_start;
> +
>  static int hyperv_ir_set_affinity(struct irq_data *data,
>  		const struct cpumask *mask, bool force)
>  {
> @@ -337,4 +348,54 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
>  	.free = hyperv_root_irq_remapping_free,
>  };
> 
> +void __init hyperv_iommu_swiotlb_init(void)
> +{
> +	/*
> +	 * Allocate Hyper-V swiotlb bounce buffer at early place
> +	 * to reserve large contiguous memory.
> +	 */
> +	hyperv_io_tlb_size = swiotlb_size_or_default();
> +	hyperv_io_tlb_start = memblock_alloc(
> +		hyperv_io_tlb_size, HV_HYP_PAGE_SIZE);

Could the alignment be specified as just PAGE_SIZE?  I don't
see any particular relationship here to the Hyper-V page size.

> +
> +	if (!hyperv_io_tlb_start) {
> +		pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
> +		return;
> +	}
> +}
> +
> +int __init hyperv_swiotlb_detect(void)
> +{
> +	if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
> +	    && hv_is_isolation_supported()) {
> +		/*
> +		 * Enable swiotlb force mode in Isolation VM to
> +		 * use swiotlb bounce buffer for dma transaction.
> +		 */
> +		swiotlb_force = SWIOTLB_FORCE;
> +
> +		dma_memory_generic_decrypted_ops.map = hv_map_memory;
> +		dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +void __init hyperv_iommu_swiotlb_later_init(void)
> +{
> +	/*
> +	 * Swiotlb bounce buffer needs to be mapped in extra address
> +	 * space. Map function doesn't work in the early place and so
> +	 * call swiotlb_late_init_with_tbl() here.
> +	 */
> +	if (swiotlb_late_init_with_tbl(hyperv_io_tlb_start,
> +				       hyperv_io_tlb_size >> IO_TLB_SHIFT))
> +		panic("Fail to initialize hyperv swiotlb.\n");
> +}
> +
> +IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
> +		  NULL, hyperv_iommu_swiotlb_init,
> +		  hyperv_iommu_swiotlb_later_init);
> +
>  #endif
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 757e09606fd3..724a735d722a 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1739,6 +1739,7 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
>  int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
>  				void (*block_invalidate)(void *context,
>  							 u64 block_mask));
> +int __init hyperv_swiotlb_detect(void);
> 
>  struct hyperv_pci_block_ops {
>  	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver
  2021-08-27 17:21 ` [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
@ 2021-09-02  2:08   ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  2:08 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 

Per previous comment, the Subject line tag should be "scsi: storvsc: "

> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still needs to be handled. Use DMA API(dma_map_sg) to map
> these memory during sending/receiving packet and return swiotlb bounce
> buffer dma address. In Isolation VM, swiotlb  bounce buffer is marked
> to be visible to host and the swiotlb force mode is enabled.
> 
> Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
> keep the original data offset in the bounce buffer.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Rplace dma_map_page with dma_map_sg()
> 	* Use for_each_sg() to populate payload->range.pfn_array.
> 	* Remove storvsc_dma_map macro
> ---
>  drivers/hv/vmbus_drv.c     |  1 +
>  drivers/scsi/storvsc_drv.c | 41 +++++++++++++++-----------------------
>  include/linux/hyperv.h     |  1 +
>  3 files changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f068e22a5636..270d526fd9de 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2124,6 +2124,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>  	hv_debug_add_dev_dir(child_device_obj);
> 
>  	child_device_obj->device.dma_mask = &vmbus_dma_mask;
> +	child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
>  	return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 328bb961c281..4f1793be1fdc 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include <linux/device.h>
>  #include <linux/hyperv.h>
>  #include <linux/blkdev.h>
> +#include <linux/dma-mapping.h>
> +
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
>  #include <scsi/scsi_host.h>
> @@ -1312,6 +1314,9 @@ static void storvsc_on_channel_callback(void *context)
>  					continue;
>  				}
>  				request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
> +				if (scsi_sg_count(scmnd))
> +					dma_unmap_sg(&device->device, scsi_sglist(scmnd),
> +						     scsi_sg_count(scmnd), scmnd->sc_data_direction);

Use scsi_dma_unmap(), which does exactly what you have written
above. :-)

>  			}
> 
>  			storvsc_on_receive(stor_device, packet, request);
> @@ -1725,7 +1730,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  	struct hv_host_device *host_dev = shost_priv(host);
>  	struct hv_device *dev = host_dev->dev;
>  	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> -	int i;
>  	struct scatterlist *sgl;
>  	unsigned int sg_count;
>  	struct vmscsi_request *vm_srb;
> @@ -1807,10 +1811,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  	payload_sz = sizeof(cmd_request->mpb);
> 
>  	if (sg_count) {
> -		unsigned int hvpgoff, hvpfns_to_add;
>  		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>  		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> -		u64 hvpfn;
> +		struct scatterlist *sg;
> +		unsigned long hvpfn, hvpfns_to_add;
> +		int j, i = 0;
> 
>  		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1824,31 +1829,16 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  		payload->range.len = length;
>  		payload->range.offset = offset_in_hvpg;
> 
> +		if (dma_map_sg(&dev->device, sgl, sg_count,
> +		    scmnd->sc_data_direction) == 0)
> +			return SCSI_MLQUEUE_DEVICE_BUSY;
> 
> -		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
> -			/*
> -			 * Init values for the current sgl entry. hvpgoff
> -			 * and hvpfns_to_add are in units of Hyper-V size
> -			 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
> -			 * case also handles values of sgl->offset that are
> -			 * larger than PAGE_SIZE. Such offsets are handled
> -			 * even on other than the first sgl entry, provided
> -			 * they are a multiple of PAGE_SIZE.
> -			 */

Any reason not to keep this comment?  It's still correct and
mentions important cases that must be handled.

> -			hvpgoff = HVPFN_DOWN(sgl->offset);
> -			hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
> -			hvpfns_to_add =	HVPFN_UP(sgl->offset + sgl->length) -
> -						hvpgoff;
> +		for_each_sg(sgl, sg, sg_count, j) {

There's a subtle issue here in that the number of entries in the
mapped sgl might not be the same as the number of entries prior
to the mapping.  A change in the count probably never happens for
the direct DMA mapping being done here, but let's code to be
correct in the general case.  Either refetch the value of sg_count
from the return value of dma_map_sg(), or arrange to use something
like for_each_sgtable_dma_sg().
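
To illustrate the count issue, here is a rough user-space model (not
kernel code).  struct toy_sg, toy_map_sg() and toy_total_dma_len() are
made-up stand-ins, and the merging rule is only an assumption about
what a mapping implementation is allowed to do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a scatterlist entry; NOT the kernel's struct scatterlist. */
struct toy_sg {
	uint64_t addr;      /* CPU-side address before mapping */
	uint32_t len;       /* CPU-side length before mapping */
	uint64_t dma_addr;  /* meaningful only for the first "mapped" entries */
	uint32_t dma_len;
};

/*
 * Toy mapping routine: merges an entry into the previous DMA segment
 * when the two are contiguous. Returns the number of DMA segments,
 * which may be smaller than nents.
 */
static int toy_map_sg(struct toy_sg *sg, int nents)
{
	int mapped = 0;

	assert(sg != NULL && nents >= 0);
	for (int i = 0; i < nents; i++) {
		if (mapped > 0 &&
		    sg[mapped - 1].dma_addr + sg[mapped - 1].dma_len == sg[i].addr) {
			/* Contiguous with the previous segment: extend it. */
			sg[mapped - 1].dma_len += sg[i].len;
		} else {
			/* Start a new DMA segment. */
			sg[mapped].dma_addr = sg[i].addr;
			sg[mapped].dma_len = sg[i].len;
			mapped++;
		}
	}
	return mapped;
}

/* Walk only the first "count" DMA segments and sum their lengths. */
static uint64_t toy_total_dma_len(const struct toy_sg *sg, int count)
{
	uint64_t total = 0;

	for (int i = 0; i < count; i++)
		total += sg[i].dma_len;
	return total;
}
```

With three input entries where the first two are contiguous, the model
returns two segments, so a loop bounded by the original count of three
would read one entry whose DMA fields were never filled in.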

> +			hvpfns_to_add = HVPFN_UP(sg_dma_len(sg));

This simplification in calculating hvpfns_to_add is not correct.  Consider
the case of one sgl entry specifying a buffer of 3 Kbytes that starts at a
2K offset in the first page and runs over into the second page.  This case
can happen when the physical memory for the two pages is contiguous
due to random happenstance, due to huge pages, or due to being on an
architecture like ARM64 where the guest page size may be larger than
the Hyper-V page size.

In this case, we need two Hyper-V PFNs because the buffer crosses a
Hyper-V page boundary.   But the above will calculate only one PFN.
The original algorithm handles this case correctly.
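
As a sanity check on the arithmetic, here is a small user-space sketch
(not kernel code).  The macros are local stand-ins that mirror
HVPFN_UP() and HVPFN_DOWN() for a 4 Kbyte Hyper-V page, and the two
helper names are made up for illustration.  The second helper is just
one possible way to account for the starting offset:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; Hyper-V pages are assumed to be 4 Kbytes here. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)
#define HVPFN_UP(x)   (((x) + HV_HYP_PAGE_SIZE - 1) >> HV_HYP_PAGE_SHIFT)
#define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT)

/* Count as computed in the patch: looks at the length only. */
static unsigned long hvpfns_from_len_only(unsigned long len)
{
	return HVPFN_UP(len);
}

/* Count that also accounts for where the buffer starts in its first page. */
static unsigned long hvpfns_from_addr_and_len(uint64_t dma_addr,
					      unsigned long len)
{
	assert(len > 0);
	return HVPFN_UP(dma_addr + len) - HVPFN_DOWN(dma_addr);
}
```

For the 3 Kbyte buffer starting at a 2K offset, the length-only helper
yields 1 while the address-aware helper yields 2.  For a page-aligned
4 Kbyte buffer both yield 1.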

> +			hvpfn = HVPFN_DOWN(sg_dma_address(sg));
> 
> -			/*
> -			 * Fill the next portion of the PFN array with
> -			 * sequential Hyper-V PFNs for the continguous physical
> -			 * memory described by the sgl entry. The end of the
> -			 * last sgl should be reached at the same time that
> -			 * the PFN array is filled.
> -			 */

Any reason not to keep this comment?  It's still correct.

>  			while (hvpfns_to_add--)
> -				payload->range.pfn_array[i++] =	hvpfn++;
> +				payload->range.pfn_array[i++] = hvpfn++;
>  		}
>  	}
> 
> @@ -1992,6 +1982,7 @@ static int storvsc_probe(struct hv_device *device,
>  	stor_device->vmscsi_size_delta = sizeof(struct vmscsi_win8_extension);
>  	spin_lock_init(&stor_device->lock);
>  	hv_set_drvdata(device, stor_device);
> +	dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
> 
>  	stor_device->port_number = host->host_no;
>  	ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size, is_fc);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 139a43ad65a1..8f39893f8ccf 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1274,6 +1274,7 @@ struct hv_device {
> 
>  	struct vmbus_channel *channel;
>  	struct kset	     *channels_kset;
> +	struct device_dma_parameters dma_parms;
> 
>  	/* place holder to keep track of the dir for hv device in debugfs */
>  	struct dentry *debug_dir;
> --
> 2.25.1



* RE: [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver
  2021-08-27 17:21 ` [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
@ 2021-09-02  2:34   ` Michael Kelley
  2021-09-02  4:56     ` Michael Kelley
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  2:34 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() stills need to be handled. Use DMA API to map/umap
> these memory during sending/receiving packet and Hyper-V swiotlb
> bounce buffer dma adress will be returned. The swiotlb bounce buffer
> has been masked to be visible to host during boot up.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Add comment to explain why not to use dma_map_sg()
> 	* Fix some error handle.
> ---
>  arch/x86/hyperv/ivm.c             |   1 +
>  drivers/net/hyperv/hyperv_net.h   |   5 ++
>  drivers/net/hyperv/netvsc.c       | 135 +++++++++++++++++++++++++++++-
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h            |   5 ++
>  5 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 84563b3c9f3a..08d8e01de017 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -317,6 +317,7 @@ void *hv_map_memory(void *addr, unsigned long size)
> 
>  	return vaddr;
>  }
> +EXPORT_SYMBOL_GPL(hv_map_memory);
> 
>  void hv_unmap_memory(void *addr)
>  {
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index aa7c9962dbd8..862419912bfb 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>  	u32 total_bytes;
>  	u32 send_buf_index;
>  	u32 total_data_buflen;
> +	struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>  	/* Receive buffer allocated by us but manages by NetVSP */
>  	void *recv_buf;
> +	void *recv_original_buf;
>  	u32 recv_buf_size; /* allocated bytes */
>  	u32 recv_buf_gpadl_handle;
>  	u32 recv_section_cnt;
> @@ -1082,6 +1084,7 @@ struct netvsc_device {
> 
>  	/* Send buffer allocated by us */
>  	void *send_buf;
> +	void *send_original_buf;
>  	u32 send_buf_size;
>  	u32 send_buf_gpadl_handle;
>  	u32 send_section_cnt;
> @@ -1731,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI	10000
>  #define RETRY_MAX	2000	/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +		      struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index f19bffff6a63..edd336b08c2c 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
>  	int i;
> 
>  	kfree(nvdev->extension);
> -	vfree(nvdev->recv_buf);
> -	vfree(nvdev->send_buf);
> +
> +	if (nvdev->recv_original_buf) {
> +		vunmap(nvdev->recv_buf);

In patch 11, you have added a hv_unmap_memory()
function as the inverse of hv_map_memory().  Since this
buffer was mapped with hv_map_memory() and you have
added that function, the cleanup should use
hv_unmap_memory() rather than calling vunmap() directly.

> +		vfree(nvdev->recv_original_buf);
> +	} else {
> +		vfree(nvdev->recv_buf);
> +	}
> +
> +	if (nvdev->send_original_buf) {
> +		vunmap(nvdev->send_buf);

Same here.

> +		vfree(nvdev->send_original_buf);
> +	} else {
> +		vfree(nvdev->send_buf);
> +	}
> +
>  	kfree(nvdev->send_section_map);
> 
>  	for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -347,6 +360,7 @@ static int netvsc_init_buf(struct hv_device *device,
>  	unsigned int buf_size;
>  	size_t map_words;
>  	int i, ret = 0;
> +	void *vaddr;
> 
>  	/* Get receive buffer area. */
>  	buf_size = device_info->recv_sections * device_info->recv_section_size;
> @@ -382,6 +396,17 @@ static int netvsc_init_buf(struct hv_device *device,
>  		goto cleanup;
>  	}
> 
> +	if (hv_isolation_type_snp()) {
> +		vaddr = hv_map_memory(net_device->recv_buf, buf_size);

Since the netvsc driver is architecture neutral, this code also needs
to compile for ARM64.  A stub will be needed for hv_map_memory()
on the ARM64 side.  Same for hv_unmap_memory() as suggested
above.  Or better, move hv_map_memory() and hv_unmap_memory()
to an architecture neutral module such as hv_common.c.

Or if Christoph's approach of creating the vmap_phys_addr() helper
comes to fruition, that's an even better approach since it will already
handle multiple architectures.

> +		if (!vaddr) {
> +			ret = -ENOMEM;
> +			goto cleanup;
> +		}
> +
> +		net_device->recv_original_buf = net_device->recv_buf;
> +		net_device->recv_buf = vaddr;
> +	}
> +
>  	/* Notify the NetVsp of the gpadl handle */
>  	init_packet = &net_device->channel_init_pkt;
>  	memset(init_packet, 0, sizeof(struct nvsp_message));
> @@ -485,6 +510,17 @@ static int netvsc_init_buf(struct hv_device *device,
>  		goto cleanup;
>  	}
> 
> +	if (hv_isolation_type_snp()) {
> +		vaddr = hv_map_memory(net_device->send_buf, buf_size);
> +		if (!vaddr) {
> +			ret = -ENOMEM;
> +			goto cleanup;
> +		}
> +
> +		net_device->send_original_buf = net_device->send_buf;
> +		net_device->send_buf = vaddr;
> +	}
> +
>  	/* Notify the NetVsp of the gpadl handle */
>  	init_packet = &net_device->channel_init_pkt;
>  	memset(init_packet, 0, sizeof(struct nvsp_message));
> @@ -775,7 +811,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
> 
>  	/* Notify the layer above us */
>  	if (likely(skb)) {
> -		const struct hv_netvsc_packet *packet
> +		struct hv_netvsc_packet *packet
>  			= (struct hv_netvsc_packet *)skb->cb;
>  		u32 send_index = packet->send_buf_index;
>  		struct netvsc_stats *tx_stats;
> @@ -791,6 +827,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
>  		tx_stats->bytes += packet->total_bytes;
>  		u64_stats_update_end(&tx_stats->syncp);
> 
> +		netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>  		napi_consume_skb(skb, budget);
>  	}
> 
> @@ -955,6 +992,87 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
>  		memset(dest, 0, padding);
>  }
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +		      struct hv_netvsc_packet *packet)
> +{
> +	u32 page_count = packet->cp_partial ?
> +		packet->page_buf_cnt - packet->rmsg_pgcnt :
> +		packet->page_buf_cnt;
> +	int i;
> +
> +	if (!hv_is_isolation_supported())
> +		return;
> +
> +	if (!packet->dma_range)
> +		return;
> +
> +	for (i = 0; i < page_count; i++)
> +		dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
> +				 packet->dma_range[i].mapping_size,
> +				 DMA_TO_DEVICE);
> +
> +	kfree(packet->dma_range);
> +}
> +
> +/* netvsc_dma_map - Map swiotlb bounce buffer with data page of
> + * packet sent by vmbus_sendpacket_pagebuffer() in the Isolation
> + * VM.
> + *
> + * In isolation VM, netvsc send buffer has been marked visible to
> + * host and so the data copied to send buffer doesn't need to use
> + * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
> + * may not be copied to send buffer and so these pages need to be
> + * mapped with swiotlb bounce buffer. netvsc_dma_map() is to do
> + * that. The pfns in the struct hv_page_buffer need to be converted
> + * to bounce buffer's pfn. The loop here is necessary becuase the

s/becuase/because/

> + * entries in the page buffer array are not necessarily full
> + * pages of data.  Each entry in the array has a separate offset and
> + * len that may be non-zero, even for entries in the middle of the
> + * array.  And the entries are not physically contiguous.  So each
> + * entry must be individually mapped rather than as a contiguous unit.
> + * So not use dma_map_sg() here.
> + */
> +int netvsc_dma_map(struct hv_device *hv_dev,
> +		   struct hv_netvsc_packet *packet,
> +		   struct hv_page_buffer *pb)
> +{
> +	u32 page_count =  packet->cp_partial ?
> +		packet->page_buf_cnt - packet->rmsg_pgcnt :
> +		packet->page_buf_cnt;
> +	dma_addr_t dma;
> +	int i;
> +
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	packet->dma_range = kcalloc(page_count,
> +				    sizeof(*packet->dma_range),
> +				    GFP_KERNEL);
> +	if (!packet->dma_range)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < page_count; i++) {
> +		char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
> +					 + pb[i].offset);
> +		u32 len = pb[i].len;
> +
> +		dma = dma_map_single(&hv_dev->device, src, len,
> +				     DMA_TO_DEVICE);
> +		if (dma_mapping_error(&hv_dev->device, dma)) {
> +			kfree(packet->dma_range);
> +			return -ENOMEM;
> +		}
> +
> +		packet->dma_range[i].dma = dma;
> +		packet->dma_range[i].mapping_size = len;
> +		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
> +		pb[i].offset = offset_in_hvpage(dma);
> +		pb[i].len = len;
> +	}

Just to confirm, this driver does *not* set the DMA min_align_mask
like storvsc does.  So after the call to dma_map_single(), the offset
in the page could be different.  That's why you are updating
the pb[i].offset value.  Alternatively, you could set the DMA
min_align_mask, which would ensure the offset is unchanged.
I'm OK with either approach, though perhaps a comment is
warranted to explain, as this is a subtle issue.
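
A small user-space sketch of why the offset has to be re-derived (not
kernel code).  struct toy_page_buffer and both helpers are made-up
names, the struct only mirrors the len/offset/pfn fields used above,
and the example addresses in the note below are arbitrary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1ULL << HV_HYP_PAGE_SHIFT)

/* Hypothetical mirror of the fields touched in netvsc_dma_map(). */
struct toy_page_buffer {
	uint32_t len;
	uint32_t offset;
	uint64_t pfn;
};

/* Offset of an address within its Hyper-V page. */
static uint32_t toy_offset_in_hvpage(uint64_t addr)
{
	return (uint32_t)(addr & (HV_HYP_PAGE_SIZE - 1));
}

/* Re-derive pfn and offset from the address returned by the mapping. */
static void toy_pb_from_dma(struct toy_page_buffer *pb, uint64_t dma,
			    uint32_t len)
{
	assert(pb != NULL);
	pb->pfn = dma >> HV_HYP_PAGE_SHIFT;
	pb->offset = toy_offset_in_hvpage(dma);
	pb->len = len;
}
```

If the original data sat at offset 0x678 in its page and the bounce
buffer address comes back as 0x80004200, the entry must now carry
offset 0x200.  With a min_align_mask of HV_HYP_PAGE_SIZE - 1, the
returned address would keep the low 12 bits (for example 0x80004678)
and the offset would be unchanged.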

> +
> +	return 0;
> +}
> +
>  static inline int netvsc_send_pkt(
>  	struct hv_device *device,
>  	struct hv_netvsc_packet *packet,
> @@ -995,14 +1113,24 @@ static inline int netvsc_send_pkt(
> 
>  	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
> 
> +	packet->dma_range = NULL;
>  	if (packet->page_buf_cnt) {
>  		if (packet->cp_partial)
>  			pb += packet->rmsg_pgcnt;
> 
> +		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> +		if (ret) {
> +			ret = -EAGAIN;
> +			goto exit;
> +		}
> +
>  		ret = vmbus_sendpacket_pagebuffer(out_channel,
>  						  pb, packet->page_buf_cnt,
>  						  &nvmsg, sizeof(nvmsg),
>  						  req_id);
> +
> +		if (ret)
> +			netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>  	} else {
>  		ret = vmbus_sendpacket(out_channel,
>  				       &nvmsg, sizeof(nvmsg),
> @@ -1010,6 +1138,7 @@ static inline int netvsc_send_pkt(
>  				       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
>  	}
> 
> +exit:
>  	if (ret == 0) {
>  		atomic_inc_return(&nvchan->queue_sends);
> 
> diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
> index f6c9c2a670f9..448fcc325ed7 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
>  			}
>  		}
> 
> +		netvsc_dma_unmap(((struct net_device_context *)
> +			netdev_priv(ndev))->device_ctx, &request->pkt);
>  		complete(&request->wait_event);
>  	} else {
>  		netdev_err(ndev,
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 724a735d722a..139a43ad65a1 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1596,6 +1596,11 @@ struct hyperv_service_callback {
>  	void (*callback)(void *context);
>  };
> 
> +struct hv_dma_range {
> +	dma_addr_t dma;
> +	u32 mapping_size;
> +};
> +
>  #define MAX_SRV_VER	0x7ffffff
>  extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf, u32 buflen,
>  				const int *fw_version, int fw_vercnt,
> --
> 2.25.1



* RE: [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page
  2021-08-27 17:21 ` [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page Tianyu Lan
  2021-08-27 17:41   ` Greg KH
@ 2021-09-02  3:32   ` Michael Kelley
  1 sibling, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  3:32 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyperv provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VM with AMD SEV SNP
> and these registers are emulated by hypervisor directly.
> Hyperv requires to write SINTx MSR registers twice. First
> writes MSR via GHCB page to communicate with hypervisor
> and then writes wrmsr instruction to talk with paravisor
> which runs in VMPL0. Guest OS ID MSR also needs to be set
> via GHCB page.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v1:
>          * Introduce sev_es_ghcb_hv_call_simple() and share code
>            between SEV and Hyper-V code.
> Change since v3:
>          * Pass old_msg_type to hv_signal_eom() as parameter.
> 	 * Use HV_REGISTER_* marcro instead of HV_X64_MSR_*
> 	 * Add hv_isolation_type_snp() weak function.
> 	 * Add maros to set syinc register in ARM code.
> ---
>  arch/arm64/include/asm/mshyperv.h |  23 ++++++
>  arch/x86/hyperv/hv_init.c         |  36 ++--------
>  arch/x86/hyperv/ivm.c             | 112 ++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h   |  80 ++++++++++++++++++++-
>  arch/x86/include/asm/sev.h        |   3 +
>  arch/x86/kernel/sev-shared.c      |  63 ++++++++++-------
>  drivers/hv/hv.c                   | 112 ++++++++++++++++++++----------
>  drivers/hv/hv_common.c            |   6 ++
>  include/asm-generic/mshyperv.h    |   4 +-
>  9 files changed, 345 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> index 20070a847304..ced83297e009 100644
> --- a/arch/arm64/include/asm/mshyperv.h
> +++ b/arch/arm64/include/asm/mshyperv.h
> @@ -41,6 +41,29 @@ static inline u64 hv_get_register(unsigned int reg)
>  	return hv_get_vpreg(reg);
>  }
> 
> +#define hv_get_simp(val)	{ val = hv_get_register(HV_REGISTER_SIMP); }
> +#define hv_set_simp(val)	hv_set_register(HV_REGISTER_SIMP, val)
> +
> +#define hv_get_siefp(val)	{ val = hv_get_register(HV_REGISTER_SIEFP); }
> +#define hv_set_siefp(val)	hv_set_register(HV_REGISTER_SIEFP, val)
> +
> +#define hv_get_synint_state(int_num, val) {			\
> +	val = hv_get_register(HV_REGISTER_SINT0 + int_num);	\
> +	}
> +
> +#define hv_set_synint_state(int_num, val)			\
> +	hv_set_register(HV_REGISTER_SINT0 + int_num, val)
> +
> +#define hv_get_synic_state(val) {			\
> +	val = hv_get_register(HV_REGISTER_SCONTROL);	\
> +	}
> +
> +#define hv_set_synic_state(val)			\
> +	hv_set_register(HV_REGISTER_SCONTROL, val)
> +
> +#define hv_signal_eom(old_msg_type)		 \
> +	hv_set_register(HV_REGISTER_EOM, 0)
> +
>  /* SMCCC hypercall parameters */
>  #define HV_SMCCC_FUNC_NUMBER	1
>  #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b1aa42f60faa..be6210a3fd2f 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> -void __percpu **hv_ghcb_pg;
> +union hv_ghcb __percpu **hv_ghcb_pg;
> 
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> @@ -406,7 +406,7 @@ void __init hyperv_init(void)
>  	}
> 
>  	if (hv_isolation_type_snp()) {
> -		hv_ghcb_pg = alloc_percpu(void *);
> +		hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
>  		if (!hv_ghcb_pg)
>  			goto free_vp_assist_page;
>  	}
> @@ -424,6 +424,9 @@ void __init hyperv_init(void)
>  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> +	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> +
>  	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>  			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -501,6 +504,7 @@ void __init hyperv_init(void)
> 
>  clean_guest_os_id:
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>  	cpuhp_remove_state(cpuhp);
>  free_ghcb_page:
>  	free_percpu(hv_ghcb_pg);
> @@ -522,6 +526,7 @@ void hyperv_cleanup(void)
> 
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>  	/*
>  	 * Reset hypercall page reference before reset the page,
> @@ -592,30 +597,3 @@ bool hv_is_hyperv_initialized(void)
>  	return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> -	if (!(ms_hyperv.priv_high & HV_ISOLATION))
> -		return HV_ISOLATION_TYPE_NONE;
> -	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> -	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> -		return 0;
> -
> -	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> -		return 0;
> -
> -	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -
> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> -
> -bool hv_isolation_type_snp(void)
> -{
> -	return static_branch_unlikely(&isolation_type_snp);
> -}
> -EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index a069c788ce3c..f56fe4f73000 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,13 +6,125 @@
>   *  Tianyu Lan <Tianyu.Lan@microsoft.com>
>   */
> 
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
>  #include <linux/hyperv.h>
>  #include <linux/types.h>
>  #include <linux/bitfield.h>
>  #include <linux/slab.h>
> +#include <asm/svm.h>
> +#include <asm/sev.h>
>  #include <asm/io.h>
>  #include <asm/mshyperv.h>
> 
> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(HV_HYP_PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!hv_ghcb_pg)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> +	ghcb_set_rdx(&hv_ghcb->ghcb, upper_32_bits(value));
> +
> +	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
> +		pr_warn("Fail to write msr via ghcb %llx.\n", msr);
> +
> +	local_irq_restore(flags);
> +}
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	/* Check size of union hv_ghcb here. */
> +	BUILD_BUG_ON(sizeof(union hv_ghcb) != HV_HYP_PAGE_SIZE);
> +
> +	if (!hv_ghcb_pg)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
> +		pr_warn("Fail to read msr via ghcb %llx.\n", msr);
> +	else
> +		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
> +			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
> +	local_irq_restore(flags);
> +}
> +
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
> +{
> +	hv_ghcb_msr_read(msr, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
> +
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
> +{
> +	hv_ghcb_msr_write(msr, value);
> +
> +	/* Write proxy bit vua wrmsrl instruction. */

s/vua/via/

> +	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
> +		wrmsrl(msr, value | 1 << 20);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
> +
> +enum hv_isolation_type hv_get_isolation_type(void)
> +{
> +	if (!(ms_hyperv.priv_high & HV_ISOLATION))
> +		return HV_ISOLATION_TYPE_NONE;
> +	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> +}
> +EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> +
> +/*
> + * hv_is_isolation_supported - Check system runs in the Hyper-V
> + * isolation VM.
> + */
> +bool hv_is_isolation_supported(void)
> +{
> +	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> +}
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +/*
> + * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
> + * isolation VM.
> + */
> +bool hv_isolation_type_snp(void)
> +{
> +	return static_branch_unlikely(&isolation_type_snp);
> +}
> +
>  /*
>   * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
>   *
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffb2af079c6b..b77f4caee3ee 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -11,6 +11,8 @@
>  #include <asm/paravirt.h>
>  #include <asm/mshyperv.h>
> 
> +union hv_ghcb;
> +
>  DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> 
>  typedef int (*hyperv_fill_flush_list_func)(
> @@ -30,6 +32,61 @@ static inline u64 hv_get_register(unsigned int reg)
>  	return value;
>  }
> 
> +#define hv_get_sint_reg(val, reg) {		\
> +	if (hv_isolation_type_snp())		\
> +		hv_get_##reg##_ghcb(&val);	\
> +	else					\
> +		rdmsrl(HV_REGISTER_##reg, val);	\
> +	}
> +
> +#define hv_set_sint_reg(val, reg) {		\
> +	if (hv_isolation_type_snp())		\
> +		hv_set_##reg##_ghcb(val);	\
> +	else					\
> +		wrmsrl(HV_REGISTER_##reg, val);	\
> +	}
> +
> +
> +#define hv_get_simp(val) hv_get_sint_reg(val, SIMP)
> +#define hv_get_siefp(val) hv_get_sint_reg(val, SIEFP)
> +
> +#define hv_set_simp(val) hv_set_sint_reg(val, SIMP)
> +#define hv_set_siefp(val) hv_set_sint_reg(val, SIEFP)
> +
> +#define hv_get_synic_state(val) {			\
> +	if (hv_isolation_type_snp())			\
> +		hv_get_synic_state_ghcb(&val);		\
> +	else						\
> +		rdmsrl(HV_REGISTER_SCONTROL, val);	\
> +	}
> +#define hv_set_synic_state(val) {			\
> +	if (hv_isolation_type_snp())			\
> +		hv_set_synic_state_ghcb(val);		\
> +	else						\
> +		wrmsrl(HV_REGISTER_SCONTROL, val);	\
> +	}
> +
> +#define hv_signal_eom(old_msg_type) {		 \
> +	if (hv_isolation_type_snp() &&		 \
> +	    old_msg_type != HVMSG_TIMER_EXPIRED) \
> +		hv_sint_wrmsrl_ghcb(HV_REGISTER_EOM, 0); \
> +	else						\
> +		wrmsrl(HV_REGISTER_EOM, 0);		\
> +	}
> +
> +#define hv_get_synint_state(int_num, val) {		\
> +	if (hv_isolation_type_snp())			\
> +		hv_get_synint_state_ghcb(int_num, &val);\
> +	else						\
> +		rdmsrl(HV_REGISTER_SINT0 + int_num, val);\
> +	}
> +#define hv_set_synint_state(int_num, val) {		\
> +	if (hv_isolation_type_snp())			\
> +		hv_set_synint_state_ghcb(int_num, val);	\
> +	else						\
> +		wrmsrl(HV_REGISTER_SINT0 + int_num, val);\
> +	}
> +
>  #define hv_get_raw_timer() rdtsc_ordered()
> 
>  void hyperv_vector_handler(struct pt_regs *regs);
> @@ -41,7 +98,7 @@ extern void *hv_hypercall_pg;
> 
>  extern u64 hv_current_partition_id;
> 
> -extern void __percpu **hv_ghcb_pg;
> +extern union hv_ghcb  __percpu **hv_ghcb_pg;
> 
>  int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
>  int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> @@ -195,6 +252,25 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  			   enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
> +void hv_signal_eom_ghcb(void);
> +void hv_ghcb_msr_write(u64 msr, u64 value);
> +void hv_ghcb_msr_read(u64 msr, u64 *value);
> +
> +#define hv_get_synint_state_ghcb(int_num, val)			\
> +	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +#define hv_set_synint_state_ghcb(int_num, val) \
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +
> +#define hv_get_SIMP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +#define hv_set_SIMP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +
> +#define hv_get_SIEFP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +#define hv_set_SIEFP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +
> +#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
> +#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)


I'm not seeing the value in the multiple layers of #define to get and set the
various synic registers.  My thought was a completely different approach, which is
to simply implement the hv_get_register() and hv_set_register() functions with 
a little bit more logic.  Here's my proposal.  This code is not even compile tested,
but you get the idea:

static bool hv_is_synic_reg(unsigned int reg)
{
	if ((reg >= HV_REGISTER_SCONTROL) &&
	    (reg <= HV_REGISTER_SINT15))
		return true;
	return false;
}

u64 hv_get_register(unsigned int reg)
{
	u64 value;

	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
		hv_ghcb_msr_read(reg, &value);
	else
		rdmsrl(reg, value);
	return value;
}

void hv_set_register(unsigned int reg, u64 value)
{
	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
		hv_ghcb_msr_write(reg, value);

		/* Write proxy bit via wrmsrl instruction */
		if (reg >= HV_REGISTER_SINT0 &&
		    reg <= HV_REGISTER_SINT15)
			wrmsrl(reg, value | 1 << 20);
	} else {
		wrmsrl(reg, value);
	}
}

If the above code is implemented in one of the modules under arch/x86
to replace the existing implementations in arch/x86/include/asm/mshyperv.h,
then it will only be built for x86/x64, and the existing code will just work
for ARM64.   Architecture neutral code in hv_synic_enable_regs() and
hv_synic_disable_regs() will still need to check for hv_isolation_type_snp()
and take some special actions, but the calls to hv_get_register() and
hv_set_register() can remain unchanged.

This approach seems a lot simpler to me, but maybe I'm missing
something that your current patch is doing.

Your code does have a special case for HV_REGISTER_EOM.  Is
there a reason it needs to do an additional check of the old_msg_type?
I'm just not understanding why an SNP isolated VM requires
special treatment of this register.

A key point:  Getting/setting any of the synthetic MSRs requires
a trap to the hypervisor.  So they are already not super-fast.  The
only code path in Linux that is performance sensitive is setting
HV_REGISTER_STIMER0_COUNT in hv_ce_set_next_event().
That's not a synic register, so the only additional burden with the
above implementation is checking the MSR value to see if it is
in the synic range.  The cost of that check is reasonable for
something that has to trap to the hypervisor anyway.
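
The range check and the proxy-bit write can be exercised in user
space with a sketch like the following.  The numeric MSR values are
assumptions that should be checked against the TLFS definitions, and
hv_is_sint_reg() and hv_sint_proxy_value() are hypothetical helpers
added only for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR numbers; verify against the Hyper-V TLFS headers. */
#define HV_REGISTER_SCONTROL      0x40000080
#define HV_REGISTER_SINT0         0x40000090
#define HV_REGISTER_SINT15        0x4000009F
#define HV_REGISTER_STIMER0_COUNT 0x400000B1

/* SINT0..SINT15 must be 16 consecutive registers for the range test. */
static_assert(HV_REGISTER_SINT15 - HV_REGISTER_SINT0 == 15,
	      "expected 16 consecutive SINT registers");

/* Same range test as in the proposal above. */
static bool hv_is_synic_reg(unsigned int reg)
{
	return reg >= HV_REGISTER_SCONTROL && reg <= HV_REGISTER_SINT15;
}

/* Hypothetical helper: true only for the SINTx registers. */
static bool hv_is_sint_reg(unsigned int reg)
{
	return reg >= HV_REGISTER_SINT0 && reg <= HV_REGISTER_SINT15;
}

/*
 * Hypothetical helper: value written with wrmsrl for SINTx, with the
 * proxy bit (bit 20) set. Equivalent to "value | 1 << 20", since <<
 * binds tighter than | in C.
 */
static uint64_t hv_sint_proxy_value(uint64_t value)
{
	return value | (1ULL << 20);
}
```

In this model HV_REGISTER_STIMER0_COUNT falls outside the synic range,
so the only extra work on that path is the two comparisons.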


>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> @@ -211,9 +287,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
>  {
>  	return -1;
>  }
> +static inline void hv_signal_eom_ghcb(void) { };
>  #endif /* CONFIG_HYPERV */
> 
> -
>  #include <asm-generic/mshyperv.h>
> 
>  #endif
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index fa5cd05d3b5b..81beb2a8031b 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -81,6 +81,9 @@ static __always_inline void sev_es_nmi_complete(void)
>  		__sev_es_nmi_complete();
>  }
>  extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
> +extern enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2);
>  #else
>  static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>  static inline void sev_es_ist_exit(void) { }
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 9f90f460a28c..dd7f37de640b 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -94,10 +94,9 @@ static void vc_finish_insn(struct es_em_ctxt *ctxt)
>  	ctxt->regs->ip += ctxt->insn.length;
>  }
> 
> -static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> -					  struct es_em_ctxt *ctxt,
> -					  u64 exit_code, u64 exit_info_1,
> -					  u64 exit_info_2)
> +enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
>  {
>  	enum es_result ret;
> 
> @@ -109,29 +108,45 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>  	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>  	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
> 
> -	sev_es_wr_ghcb_msr(__pa(ghcb));
>  	VMGEXIT();
> 
> -	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
> -		u64 info = ghcb->save.sw_exit_info_2;
> -		unsigned long v;
> -
> -		info = ghcb->save.sw_exit_info_2;
> -		v = info & SVM_EVTINJ_VEC_MASK;
> -
> -		/* Check if exception information from hypervisor is sane. */
> -		if ((info & SVM_EVTINJ_VALID) &&
> -		    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
> -		    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
> -			ctxt->fi.vector = v;
> -			if (info & SVM_EVTINJ_VALID_ERR)
> -				ctxt->fi.error_code = info >> 32;
> -			ret = ES_EXCEPTION;
> -		} else {
> -			ret = ES_VMM_ERROR;
> -		}
> -	} else {
> +	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1)
> +		ret = ES_VMM_ERROR;
> +	else
>  		ret = ES_OK;
> +
> +	return ret;
> +}
> +
> +static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> +				   struct es_em_ctxt *ctxt,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
> +{
> +	unsigned long v;
> +	enum es_result ret;
> +	u64 info;
> +
> +	sev_es_wr_ghcb_msr(__pa(ghcb));
> +
> +	ret = sev_es_ghcb_hv_call_simple(ghcb, exit_code, exit_info_1,
> +					 exit_info_2);
> +	if (ret == ES_OK)
> +		return ret;
> +
> +	info = ghcb->save.sw_exit_info_2;
> +	v = info & SVM_EVTINJ_VEC_MASK;
> +
> +	/* Check if exception information from hypervisor is sane. */
> +	if ((info & SVM_EVTINJ_VALID) &&
> +	    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
> +	    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
> +		ctxt->fi.vector = v;
> +		if (info & SVM_EVTINJ_VALID_ERR)
> +			ctxt->fi.error_code = info >> 32;
> +		ret = ES_EXCEPTION;
> +	} else {
> +		ret = ES_VMM_ERROR;
>  	}
> 
>  	return ret;
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index e83507f49676..97b21256a9db 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -8,6 +8,7 @@
>   */
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> 
> +#include <linux/io.h>
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/slab.h>
> @@ -136,17 +137,24 @@ int hv_synic_alloc(void)
>  		tasklet_init(&hv_cpu->msg_dpc,
>  			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
> 
> -		hv_cpu->synic_message_page =
> -			(void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_message_page == NULL) {
> -			pr_err("Unable to allocate SYNIC message page\n");
> -			goto err;
> -		}
> +		/*
> +		 * Synic message and event pages are allocated by paravisor.
> +		 * Skip these pages allocation here.
> +		 */
> +		if (!hv_isolation_type_snp()) {
> +			hv_cpu->synic_message_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_message_page == NULL) {
> +				pr_err("Unable to allocate SYNIC message page\n");
> +				goto err;
> +			}
> 
> -		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_event_page == NULL) {
> -			pr_err("Unable to allocate SYNIC event page\n");
> -			goto err;
> +			hv_cpu->synic_event_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_event_page == NULL) {
> +				pr_err("Unable to allocate SYNIC event page\n");
> +				goto err;
> +			}
>  		}
> 
>  		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> @@ -199,26 +207,43 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	union hv_synic_scontrol sctrl;
> 
>  	/* Setup the Synic's message page */
> -	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
> +	hv_get_simp(simp.as_uint64);
>  	simp.simp_enabled = 1;
> -	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> -		>> HV_HYP_PAGE_SHIFT;
> 
> -	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> +	if (hv_isolation_type_snp()) {
> +		hv_cpu->synic_message_page
> +			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
> +				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> +		if (!hv_cpu->synic_message_page)
> +			pr_err("Fail to map syinc message page.\n");
> +	} else {
> +		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	}
> +
> +	hv_set_simp(simp.as_uint64);
> 
>  	/* Setup the Synic's event page */
> -	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
> +	hv_get_siefp(siefp.as_uint64);
>  	siefp.siefp_enabled = 1;
> -	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> -		>> HV_HYP_PAGE_SHIFT;
> 
> -	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> +	if (hv_isolation_type_snp()) {
> +		hv_cpu->synic_event_page =
> +			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
> +				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> +
> +		if (!hv_cpu->synic_event_page)
> +			pr_err("Fail to map syinc event page.\n");
> +	} else {
> +		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	}
> +	hv_set_siefp(siefp.as_uint64);
> 
>  	/* Setup the shared SINT. */
>  	if (vmbus_irq != -1)
>  		enable_percpu_irq(vmbus_irq, 0);
> -	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
> -					VMBUS_MESSAGE_SINT);
> +	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> 
>  	shared_sint.vector = vmbus_interrupt;
>  	shared_sint.masked = false;
> @@ -233,14 +258,12 @@ void hv_synic_enable_regs(unsigned int cpu)
>  #else
>  	shared_sint.auto_eoi = 0;
>  #endif
> -	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
> -				shared_sint.as_uint64);
> +	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> 
>  	/* Enable the global synic bit */
> -	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
> +	hv_get_synic_state(sctrl.as_uint64);
>  	sctrl.enable = 1;
> -
> -	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
> +	hv_set_synic_state(sctrl.as_uint64);
>  }
> 
>  int hv_synic_init(unsigned int cpu)
> @@ -257,37 +280,50 @@ int hv_synic_init(unsigned int cpu)
>   */
>  void hv_synic_disable_regs(unsigned int cpu)
>  {
> +	struct hv_per_cpu_context *hv_cpu
> +		= per_cpu_ptr(hv_context.cpu_context, cpu);
>  	union hv_synic_sint shared_sint;
>  	union hv_synic_simp simp;
>  	union hv_synic_siefp siefp;
>  	union hv_synic_scontrol sctrl;
> 
> -	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
> -					VMBUS_MESSAGE_SINT);
> -
> +	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>  	shared_sint.masked = 1;
> +	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +
> 
>  	/* Need to correctly cleanup in the case of SMP!!! */
>  	/* Disable the interrupt */
> -	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
> -				shared_sint.as_uint64);
> +	hv_get_simp(simp.as_uint64);
> 
> -	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
> +	/*
> +	 * In Isolation VM, sim and sief pages are allocated by
> +	 * paravisor. These pages also will be used by kdump
> +	 * kernel. So just reset enable bit here and keep page
> +	 * addresses.
> +	 */
>  	simp.simp_enabled = 0;
> -	simp.base_simp_gpa = 0;
> +	if (hv_isolation_type_snp())
> +		memunmap(hv_cpu->synic_message_page);
> +	else
> +		simp.base_simp_gpa = 0;
> 
> -	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> +	hv_set_simp(simp.as_uint64);
> 
> -	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
> +	hv_get_siefp(siefp.as_uint64);
>  	siefp.siefp_enabled = 0;
> -	siefp.base_siefp_gpa = 0;
> 
> -	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> +	if (hv_isolation_type_snp())
> +		memunmap(hv_cpu->synic_event_page);
> +	else
> +		siefp.base_siefp_gpa = 0;
> +
> +	hv_set_siefp(siefp.as_uint64);
> 
>  	/* Disable the global synic bit */
> -	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
> +	hv_get_synic_state(sctrl.as_uint64);
>  	sctrl.enable = 0;
> -	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
> +	hv_set_synic_state(sctrl.as_uint64);
> 
>  	if (vmbus_irq != -1)
>  		disable_percpu_irq(vmbus_irq);
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index c0d9048a4112..1fc82d237161 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -249,6 +249,12 @@ bool __weak hv_is_isolation_supported(void)
>  }
>  EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> 
> +bool __weak hv_isolation_type_snp(void)
> +{
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> +
>  void __weak hv_setup_vmbus_handler(void (*handler)(void))
>  {
>  }
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index aa55447b9700..04a687d95eac 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -24,6 +24,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/nmi.h>
>  #include <asm/ptrace.h>
> +#include <asm/mshyperv.h>
>  #include <asm/hyperv-tlfs.h>
> 
>  struct ms_hyperv_info {
> @@ -54,6 +55,7 @@ extern void  __percpu  **hyperv_pcpu_output_arg;
> 
>  extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
>  extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> +extern bool hv_isolation_type_snp(void);
> 
>  /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
>  static inline int hv_result(u64 status)
> @@ -148,7 +150,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
>  		 * possibly deliver another msg from the
>  		 * hypervisor
>  		 */
> -		hv_set_register(HV_REGISTER_EOM, 0);
> +		hv_signal_eom(old_msg_type);
>  	}
>  }
> 
> --
> 2.25.1


* RE: [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver
  2021-09-02  2:34   ` Michael Kelley
@ 2021-09-02  4:56     ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02  4:56 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Michael Kelley <mikelley@microsoft.com> Sent: Wednesday, September 1, 2021 7:34 PM

[snip]

> > +int netvsc_dma_map(struct hv_device *hv_dev,
> > +		   struct hv_netvsc_packet *packet,
> > +		   struct hv_page_buffer *pb)
> > +{
> > +	u32 page_count =  packet->cp_partial ?
> > +		packet->page_buf_cnt - packet->rmsg_pgcnt :
> > +		packet->page_buf_cnt;
> > +	dma_addr_t dma;
> > +	int i;
> > +
> > +	if (!hv_is_isolation_supported())
> > +		return 0;
> > +
> > +	packet->dma_range = kcalloc(page_count,
> > +				    sizeof(*packet->dma_range),
> > +				    GFP_KERNEL);
> > +	if (!packet->dma_range)
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < page_count; i++) {
> > +		char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
> > +					 + pb[i].offset);
> > +		u32 len = pb[i].len;
> > +
> > +		dma = dma_map_single(&hv_dev->device, src, len,
> > +				     DMA_TO_DEVICE);
> > +		if (dma_mapping_error(&hv_dev->device, dma)) {
> > +			kfree(packet->dma_range);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		packet->dma_range[i].dma = dma;
> > +		packet->dma_range[i].mapping_size = len;
> > +		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
> > +		pb[i].offset = offset_in_hvpage(dma);
> > +		pb[i].len = len;
> > +	}
> 
> Just to confirm, this driver does *not* set the DMA min_align_mask
> like storvsc does.  So after the call to dma_map_single(), the offset
> in the page could be different.  That's why you are updating
> the pb[i].offset value.  Alternatively, you could set the DMA
> min_align_mask, which would ensure the offset is unchanged.
> I'm OK with either approach, though perhaps a comment is
> warranted to explain, as this is a subtle issue.
> 

On second thought, I don't think either approach is OK.  The default
alignment in the swiotlb is 2K, and if the length of the data in the
buffer was 3K, the data could cross a page boundary in the bounce
buffer when it originally did not.  This would break the above code
which can only deal with one page at a time.  So I think the netvsc
driver also must set the DMA min_align_mask to 4K, which will
preserve the offset.
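
A worked example of the boundary problem, as a small user-space sketch.
The addresses are made up; only the page offsets matter.  A 3K buffer
that starts at page offset 0 fits in one 4K page, but the same 3K placed
at a bounce slot that is only 2K-aligned can start at offset 2K and then
spill into the next page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SIZE 4096u

/* True if [addr, addr + len) spans more than one 4K Hyper-V page. */
static bool crosses_hvpage(uint64_t addr, size_t len)
{
	uint64_t offset = addr & (HV_HYP_PAGE_SIZE - 1);

	return len != 0 && offset + len > HV_HYP_PAGE_SIZE;
}
```

Calling crosses_hvpage(0x10000, 3072) models the original buffer and
crosses_hvpage(0x20800, 3072) models the 2K-aligned bounce slot.  If I
read the DMA API correctly, dma_set_min_align_mask() with a mask of
HV_HYP_PAGE_SIZE - 1 is the call that would keep the low 12 bits of the
address unchanged across the mapping.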

Michael

* Re: [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the Isolation VM.
  2021-09-02  0:15   ` Michael Kelley
@ 2021-09-02  6:35     ` Tianyu Lan
  0 siblings, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-09-02  6:35 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen


Hi Michael:
       Thanks for your review.

On 9/2/2021 8:15 AM, Michael Kelley wrote:
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 27, 2021 10:21 AM
>>
>> Hyper-V exposes shared memory boundary via cpuid
>> HYPERV_CPUID_ISOLATION_CONFIG and store it in the
>> shared_gpa_boundary of ms_hyperv struct. This prepares
>> to share memory with host for SNP guest.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>> Change since v3:
>> 	* use BIT_ULL to get shared_gpa_boundary
>> 	* Rename field Reserved* to reserved
>> ---
>>   arch/x86/kernel/cpu/mshyperv.c |  2 ++
>>   include/asm-generic/mshyperv.h | 12 +++++++++++-
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index 20557a9d6e25..8bb001198316 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>>   	if (ms_hyperv.priv_high & HV_ISOLATION) {
>>   		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>>   		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
>> +		ms_hyperv.shared_gpa_boundary =
>> +			BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
>>
>>   		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>>   			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
>> index 0924bbd8458e..7537ae1db828 100644
>> --- a/include/asm-generic/mshyperv.h
>> +++ b/include/asm-generic/mshyperv.h
>> @@ -35,7 +35,17 @@ struct ms_hyperv_info {
>>   	u32 max_vp_index;
>>   	u32 max_lp_index;
>>   	u32 isolation_config_a;
>> -	u32 isolation_config_b;
>> +	union {
>> +		u32 isolation_config_b;
>> +		struct {
>> +			u32 cvm_type : 4;
>> +			u32 reserved11 : 1;
>> +			u32 shared_gpa_boundary_active : 1;
>> +			u32 shared_gpa_boundary_bits : 6;
>> +			u32 reserved12 : 20;
> 
> I'm still curious about the "11" and "12" in the reserved
> field names.  Why not just "reserved1" and "reserved2"?
> Having the "11" and "12" isn't wrong, but it makes one
> wonder why since it's not usual. :-)
> 

Yes, will update. Thanks.
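
For readers following the field layout, here is a minimal sketch that
decodes the same EBX value with explicit shifts instead of bitfields.
It assumes the LSB-first bitfield allocation that GCC uses on x86, and
struct hv_isolation_cfg_b and hv_decode_cfg_b() are hypothetical names
that do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of isolation_config_b (field widths from the patch). */
struct hv_isolation_cfg_b {
	uint32_t cvm_type;			/* bits 0-3 */
	bool shared_gpa_boundary_active;	/* bit 5 */
	uint32_t shared_gpa_boundary_bits;	/* bits 6-11 */
	uint64_t shared_gpa_boundary;		/* 1 << boundary_bits */
};

static struct hv_isolation_cfg_b hv_decode_cfg_b(uint32_t ebx)
{
	struct hv_isolation_cfg_b cfg;

	cfg.cvm_type = ebx & 0xf;
	cfg.shared_gpa_boundary_active = (ebx >> 5) & 0x1;
	cfg.shared_gpa_boundary_bits = (ebx >> 6) & 0x3f;
	/* Same computation as BIT_ULL(shared_gpa_boundary_bits). */
	cfg.shared_gpa_boundary = 1ULL << cfg.shared_gpa_boundary_bits;
	return cfg;
}
```

For example, a boundary_bits value of 39 yields a shared_gpa_boundary of
1ULL << 39, i.e. 512 GiB.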

* Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-08-31 15:20   ` Tianyu Lan
@ 2021-09-02  7:51     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2021-09-02  7:51 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Christoph Hellwig, kys, haiyangz, sthemmin, wei.liu, decui,
	catalin.marinas, will, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, davem, kuba, jejb, martin.petersen, gregkh, arnd,
	m.szyprowski, robin.murphy, brijesh.singh, thomas.lendacky,
	Tianyu.Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	hannes, aneesh.kumar, krish.sadhukhan, saravanand,
	linux-arm-kernel, xen-devel, rientjes, ardb, michael.h.kelley,
	iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

On Tue, Aug 31, 2021 at 11:20:06PM +0800, Tianyu Lan wrote:
>> If so I suspect the best way to allocate them is by not using vmalloc
>> but just discontiguous pages, and then use kmap_local_pfn where the
>> PFN includes the share_gpa offset when actually copying from/to the
>> skbs.
>>
> When netvsc needs to copy packet data to the send buffer, it needs to
> calculate the position from section_index and send_section_size.
> Please see netvsc_copy_to_send_buf() for details. So a contiguous virtual
> address range for the send buffer is necessary to copy data and batch packets.

Actually that makes the kmap approach much easier.  The phys_to_virt
can just be replaced with a kmap_local_pfn and the unmap needs to
be added.  I've been mostly focussing on the receive path, which
would need a similar treatment.

* Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-08-31 17:16   ` Michael Kelley
@ 2021-09-02  7:59     ` Christoph Hellwig
  2021-09-02 11:21       ` Tianyu Lan
  2021-09-02 15:57       ` Michael Kelley
  0 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2021-09-02  7:59 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Christoph Hellwig, Tianyu Lan, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, wei.liu, Dexuan Cui, catalin.marinas, will,
	tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, davem,
	kuba, jejb, martin.petersen, gregkh, arnd, m.szyprowski,
	robin.murphy, brijesh.singh, thomas.lendacky, Tianyu Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, hannes,
	aneesh.kumar, krish.sadhukhan, saravanand, linux-arm-kernel,
	xen-devel, rientjes, ardb, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen

On Tue, Aug 31, 2021 at 05:16:19PM +0000, Michael Kelley wrote:
> As a quick overview, I think there are four places where the
> shared_gpa_boundary must be applied to adjust the guest physical
> address that is used.  Each requires mapping a corresponding
> virtual address range.  Here are the four places:
> 
> 1)  The so-called "monitor pages" that are a core communication
> mechanism between the guest and Hyper-V.  These are two single
> pages, and the mapping is handled by calling memremap() for
> each of the two pages.  See Patch 7 of Tianyu's series.

Ah, interesting.

> 3)  The network driver send and receive buffers.  vmap_phys_range()
> should work here.

Actually it won't.  The problem with these buffers is that they are
physically non-contiguous allocations.  We really have two sensible
options:

 1) use vmap_pfn as in the current series.  But in that case I think
    we should get rid of the other mapping created by vmalloc.  I
    thought a bit about finding a way to apply the offset in vmalloc
    itself, but I think it would be too invasive to the normal fast
    path.  So the other sub-option would be to allocate the pages
    manually (maybe even using high order allocations to reduce TLB
    pressure) and then remap them
 2) do away with the contiguous kernel mapping entirely.  This means
    the simple memcpy calls become loops over kmap_local_pfn.  As
    I just found out for the send side that would be pretty easy,
    but the receive side would be more work.  We'd also need to check
    the performance implications.
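
To illustrate what "loops over kmap_local_pfn" would look like for
option 2, here is a user-space sketch of a copy that is chunked at page
boundaries.  The pages[] array stands in for the per-page addresses a
mapping call would return; copy_to_pages() and SKETCH_PAGE_SIZE are
hypothetical names, and this is not the netvsc code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/*
 * Copy len bytes from src into a buffer made of separately allocated
 * pages, starting at byte offset off.  pages[i] stands in for the
 * address that a per-page mapping would return in the kernel.
 */
static void copy_to_pages(uint8_t *const *pages, size_t off,
			  const void *src, size_t len)
{
	const uint8_t *p = src;

	while (len) {
		size_t idx = off / SKETCH_PAGE_SIZE;
		size_t in_page = off % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - in_page;

		if (chunk > len)
			chunk = len;

		/* In the kernel, page idx would be mapped here ... */
		memcpy(pages[idx] + in_page, p, chunk);
		/* ... and unmapped here. */

		p += chunk;
		off += chunk;
		len -= chunk;
	}
}
```

Each memcpy() touches exactly one page, which is the property needed
when no contiguous virtual mapping of the whole buffer exists.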

> 4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
> should work here as well.

Or memremap if it works for 1.

> Case #2 above does unusual mapping.  The ring buffer consists of a ring
> buffer header page, followed by one or more pages that are the actual
> ring buffer.  The pages making up the actual ring buffer are mapped
> twice in succession.  For example, if the ring buffer has 4 pages
> (one header page and three ring buffer pages), the contiguous
> virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
> The duplicate contiguous mapping allows the code that is reading
> or writing the actual ring buffer to not be concerned about wrap-around
> because writing off the end of the ring buffer is automatically
> wrapped-around by the mapping.  The amount of data read or
> written in one batch never exceeds the size of the ring buffer, and
> after a batch is read or written, the read or write indices are adjusted
> to put them back into the range of the first mapping of the actual
> ring buffer pages.  So there's method to the madness, and the
> technique works pretty well.  But this kind of mapping is not
> amenable to using vmap_phys_range().

Hmm.  Can you point me to where this is mapped?  Especially for the
classic non-isolated case where no vmap/vmalloc mapping is involved
at all?

* Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-09-02  7:59     ` Christoph Hellwig
@ 2021-09-02 11:21       ` Tianyu Lan
  2021-09-02 15:57       ` Michael Kelley
  1 sibling, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-09-02 11:21 UTC (permalink / raw)
  To: Christoph Hellwig, Michael Kelley
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu,
	Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86, hpa,
	dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky, jgross,
	sstabellini, joro, davem, kuba, jejb, martin.petersen, gregkh,
	arnd, m.szyprowski, robin.murphy, brijesh.singh, thomas.lendacky,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	hannes, aneesh.kumar, krish.sadhukhan, saravanand,
	linux-arm-kernel, xen-devel, rientjes, ardb, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen



On 9/2/2021 3:59 PM, Christoph Hellwig wrote:
> On Tue, Aug 31, 2021 at 05:16:19PM +0000, Michael Kelley wrote:
>> As a quick overview, I think there are four places where the
>> shared_gpa_boundary must be applied to adjust the guest physical
>> address that is used.  Each requires mapping a corresponding
>> virtual address range.  Here are the four places:
>>
>> 1)  The so-called "monitor pages" that are a core communication
>> mechanism between the guest and Hyper-V.  These are two single
>> pages, and the mapping is handled by calling memremap() for
>> each of the two pages.  See Patch 7 of Tianyu's series.
> 
> Ah, interesting.
> 
>> 3)  The network driver send and receive buffers.  vmap_phys_range()
>> should work here.
> 
> Actually it won't.  The problem with these buffers is that they are
> physically non-contiguous allocations.  We really have two sensible
> options:
> 
>   1) use vmap_pfn as in the current series.  But in that case I think
>      we should get rid of the other mapping created by vmalloc.  I
>      thought a bit about finding a way to apply the offset in vmalloc
>      itself, but I think it would be too invasive to the normal fast
>      path.  So the other sub-option would be to allocate the pages
>      manually (maybe even using high order allocations to reduce TLB
>      pressure) and then remap them

Agreed. In that case, the mapping for memory below shared_gpa_boundary is
not necessary. alloc_pages() is limited by MAX_ORDER, so it needs to be
called repeatedly to get enough memory.
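
As a rough sizing sketch: assuming 4 KiB pages and a largest usable
order of 10 (MAX_ORDER - 1 with what I believe is the default MAX_ORDER
of 11), one allocation covers 4 MiB. chunks_needed() is a hypothetical
helper, shown only to make the arithmetic explicit.

```c
#include <stddef.h>

/* Number of order-`order` allocations needed to cover `bytes`. */
static size_t chunks_needed(size_t bytes, unsigned int order,
			    size_t page_size)
{
	size_t chunk = page_size << order;

	return (bytes + chunk - 1) / chunk;
}
```

Under those assumptions a 16 MiB receive buffer would need four
maximum-order allocations, and each of them can fail on a fragmented
system.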

>   2) do away with the contiguous kernel mapping entirely.  This means
>      the simple memcpy calls become loops over kmap_local_pfn.  As
>      I just found out for the send side that would be pretty easy,
>      but the receive side would be more work.  We'd also need to check
>      the performance implications.

kmap_local_pfn() requires a PFN with a backing struct page, so it doesn't
work for PFNs above shared_gpa_boundary.
> 
>> 4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
>> should work here as well.
> 
> Or memremap if it works for 1.

The series now uses vmap_pfn() here, and the hv map function is reused
in the netvsc driver.

> 
>> Case #2 above does unusual mapping.  The ring buffer consists of a ring
>> buffer header page, followed by one or more pages that are the actual
>> ring buffer.  The pages making up the actual ring buffer are mapped
>> twice in succession.  For example, if the ring buffer has 4 pages
>> (one header page and three ring buffer pages), the contiguous
>> virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
>> The duplicate contiguous mapping allows the code that is reading
>> or writing the actual ring buffer to not be concerned about wrap-around
>> because writing off the end of the ring buffer is automatically
>> wrapped-around by the mapping.  The amount of data read or
>> written in one batch never exceeds the size of the ring buffer, and
>> after a batch is read or written, the read or write indices are adjusted
>> to put them back into the range of the first mapping of the actual
>> ring buffer pages.  So there's method to the madness, and the
>> technique works pretty well.  But this kind of mapping is not
>> amenable to using vmap_phys_range().
> 
> Hmm.  Can you point me to where this is mapped?  Especially for the
> classic non-isolated case where no vmap/vmalloc mapping is involved
> at all?
> 

This is done via vmap() in hv_ringbuffer_init():

/* Initialize the ring buffer. */
int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
                       struct page *pages, u32 page_cnt, u32 max_pkt_size)
{
        int i;
        struct page **pages_wraparound;

        BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));

        /*
         * First page holds struct hv_ring_buffer, do wraparound mapping for
         * the rest.
         */
        pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
                                   GFP_KERNEL);
        if (!pages_wraparound)
                return -ENOMEM;

        /* prepare to wrap page array */
        pages_wraparound[0] = pages;
        for (i = 0; i < 2 * (page_cnt - 1); i++)
                pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];

        /* map */
        ring_info->ring_buffer = (struct hv_ring_buffer *)
                vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);

        kfree(pages_wraparound);

        if (!ring_info->ring_buffer)
                return -ENOMEM;

        ring_info->ring_buffer->read_index =
                ring_info->ring_buffer->write_index = 0;


* Re: [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-09-02  0:23   ` Michael Kelley
@ 2021-09-02 13:35     ` Tianyu Lan
  2021-09-02 16:14       ` Michael Kelley
  0 siblings, 1 reply; 41+ messages in thread
From: Tianyu Lan @ 2021-09-02 13:35 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

On 9/2/2021 8:23 AM, Michael Kelley wrote:
>> +	} else {
>> +		pages_wraparound = kcalloc(page_cnt * 2 - 1,
>> +					   sizeof(struct page *),
>> +					   GFP_KERNEL);
>> +
>> +		pages_wraparound[0] = pages;
>> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
>> +			pages_wraparound[i + 1] =
>> +				&pages[i % (page_cnt - 1) + 1];
>> +
>> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
>> +			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
>> +				PAGE_KERNEL);
>> +
>> +		kfree(pages_wraparound);
>> +		if (!ring_info->ring_buffer)
>> +			return -ENOMEM;
>> +	}
> With this patch, the code is a big "if" statement with two halves -- one
> when SNP isolation is in effect, and the other when not.  The SNP isolation
> case does the work using PFNs with the shared_gpa_boundary added,
> while the other case does the same work but using struct page.  Perhaps
> I'm missing something, but can both halves be combined and always
> do the work using PFNs?  The only difference is whether to add the
> shared_gpa_boundary, and whether to zero the memory when done.
> So get the starting PFN, then have an "if" statement for whether to
> add the shared_gpa_boundary.  Then everything else is the same.
> At the end, use an "if" statement to decide whether to zero the
> memory.  It would really be better to have the logic in this algorithm
> coded only once.
> 

Hi Michael:
	I have tried this before, but vmap_pfn() only works for PFNs outside
normal memory. Please see vmap_pfn_apply() for details: it returns an
error when the PFN is valid.



* RE: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-09-02  7:59     ` Christoph Hellwig
  2021-09-02 11:21       ` Tianyu Lan
@ 2021-09-02 15:57       ` Michael Kelley
  2021-09-14 14:41         ` Tianyu Lan
  1 sibling, 1 reply; 41+ messages in thread
From: Michael Kelley @ 2021-09-02 15:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb, iommu,
	linux-arch, linux-hyperv, linux-kernel, linux-scsi, netdev,
	vkuznets, parri.andrea, dave.hansen

From: Christoph Hellwig <hch@lst.de> Sent: Thursday, September 2, 2021 1:00 AM
> 
> On Tue, Aug 31, 2021 at 05:16:19PM +0000, Michael Kelley wrote:
> > As a quick overview, I think there are four places where the
> > shared_gpa_boundary must be applied to adjust the guest physical
> > address that is used.  Each requires mapping a corresponding
> > virtual address range.  Here are the four places:
> >
> > 1)  The so-called "monitor pages" that are a core communication
> > mechanism between the guest and Hyper-V.  These are two single
> > pages, and the mapping is handled by calling memremap() for
> > each of the two pages.  See Patch 7 of Tianyu's series.
> 
> Ah, interesting.
> 
> > 3)  The network driver send and receive buffers.  vmap_phys_range()
> > should work here.
> 
> Actually it won't.  The problem with these buffers is that they are
> physically non-contiguous allocations.  

Indeed you are right.  These buffers are allocated with vzalloc().

> We really have two sensible options:
> 
>  1) use vmap_pfn as in the current series.  But in that case I think
>     we should get rid of the other mapping created by vmalloc.  I
>     thought a bit about finding a way to apply the offset in vmalloc
>     itself, but I think it would be too invasive to the normal fast
>     path.  So the other sub-option would be to allocate the pages
>     manually (maybe even using high order allocations to reduce TLB
>     pressure) and then remap them

What's the benefit of getting rid of the other mapping created by
vmalloc if it isn't referenced?  Just page table space?  The default sizes
are a 16 Meg receive buffer and a 1 Meg send buffer for each VMbus
channel used by netvsc, and usually the max number of channels
is 8.  So there's 128 Meg of virtual space to be saved on the receive
buffers,  which could be worth it.

Allocating the pages manually is also an option, but we have to
be careful about high order allocations.  While typically these buffers
are allocated during system boot, these synthetic NICs can be hot
added and removed while the VM is running.   The channel count
can also be changed while the VM is running.  So multiple 16 Meg
receive buffer allocations may need to be done after the system has
been running a long time.

>  2) do away with the contiguous kernel mapping entirely.  This means
>     the simple memcpy calls become loops over kmap_local_pfn.  As
>     I just found out for the send side that would be pretty easy,
>     but the receive side would be more work.  We'd also need to check
>     the performance implications.

Doing away with the contiguous kernel mapping entirely seems like
it would result in fairly messy code to access the buffer.  What's the
benefit of doing away with the mapping?  I'm not an expert on the
netvsc driver, but decoding the incoming packets is already fraught
with complexities because of the nature of the protocol with Hyper-V.
The contiguous kernel mapping at least keeps the basics sane.
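
For illustration only, the per-page loop that option 2 would put in
place of a single memcpy() might look roughly like the user-space sketch
below.  The names copy_to_pages() and SKETCH_PAGE_SIZE are hypothetical,
and each pages[i] pointer merely stands in for the address that
kmap_local_pfn() would return for one page in the kernel.  This is not
code from the series.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Copy len bytes from src into a buffer made of separately mapped pages,
 * starting at byte offset off.  The caller must ensure that pages[] covers
 * the whole range [off, off + len).
 */
void copy_to_pages(unsigned char **pages, size_t off,
		   const void *src, size_t len)
{
	const unsigned char *from = src;

	while (len > 0) {
		size_t page = off / SKETCH_PAGE_SIZE;
		size_t in_page = off % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - in_page;

		/* Never copy past the end of the current page. */
		if (chunk > len)
			chunk = len;

		memcpy(pages[page] + in_page, from, chunk);
		from += chunk;
		off += chunk;
		len -= chunk;
	}
}
```

Every access that may cross a page boundary needs a loop of this shape,
which is the extra complexity the contiguous mapping avoids.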

> 
> > 4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
> > should work here as well.
> 
> Or memremap if it works for 1.
> 
> > Case #2 above does unusual mapping.  The ring buffer consists of a ring
> > buffer header page, followed by one or more pages that are the actual
> > ring buffer.  The pages making up the actual ring buffer are mapped
> > twice in succession.  For example, if the ring buffer has 4 pages
> > (one header page and three ring buffer pages), the contiguous
> > virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
> > The duplicate contiguous mapping allows the code that is reading
> > or writing the actual ring buffer to not be concerned about wrap-around
> > because writing off the end of the ring buffer is automatically
> > wrapped-around by the mapping.  The amount of data read or
> > written in one batch never exceeds the size of the ring buffer, and
> > after a batch is read or written, the read or write indices are adjusted
> > to put them back into the range of the first mapping of the actual
> > ring buffer pages.  So there's method to the madness, and the
> > technique works pretty well.  But this kind of mapping is not
> > amenable to using vmap_phys_range().
> 
> Hmm.  Can you point me to where this is mapped?  Especially for the
> classic non-isolated case where no vmap/vmalloc mapping is involved
> at all?

The existing code is in hv_ringbuffer_init() in drivers/hv/ring_buffer.c.
The code hasn't changed in a while, so any recent upstream code tree
is valid to look at.  The memory pages are typically allocated
in vmbus_alloc_ring() in drivers/hv/channel.c.

Michael
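
To illustrate the double mapping described for case #2, here is a small
user-space sketch of how the page order can be computed.  It mirrors the
index arithmetic in the quoted patch hunk for hv_ringbuffer_init(), but
the function name ring_wraparound_indices() is hypothetical and the
sketch only fills an index array; it does not call vmap().

```c
#include <stddef.h>

/*
 * Fill idx[] with the page index that backs each slot of the double
 * mapping.  page_cnt counts the header page plus the data pages, and
 * idx must have room for 2 * page_cnt - 1 entries.  Returns the number
 * of entries written, or 0 if page_cnt is too small to hold a header
 * page and at least one data page.
 */
size_t ring_wraparound_indices(size_t page_cnt, size_t *idx)
{
	size_t data_pages, i;

	if (page_cnt < 2)
		return 0;

	data_pages = page_cnt - 1;
	idx[0] = 0;				/* header page, mapped once */
	for (i = 0; i < 2 * data_pages; i++)	/* data pages, mapped twice */
		idx[i + 1] = i % data_pages + 1;

	return 2 * page_cnt - 1;
}
```

For page_cnt = 4 this produces the seven-entry order 0, 1, 2, 3, 1, 2, 3
given in the example above.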

* RE: [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-09-02 13:35     ` Tianyu Lan
@ 2021-09-02 16:14       ` Michael Kelley
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kelley @ 2021-09-02 16:14 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, davem, kuba, jejb, martin.petersen,
	gregkh, arnd, hch, m.szyprowski, robin.murphy, brijesh.singh,
	thomas.lendacky, Tianyu Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, hannes, aneesh.kumar, krish.sadhukhan,
	saravanand, linux-arm-kernel, xen-devel, rientjes, ardb
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Thursday, September 2, 2021 6:36 AM
> 
> On 9/2/2021 8:23 AM, Michael Kelley wrote:
> >> +	} else {
> >> +		pages_wraparound = kcalloc(page_cnt * 2 - 1,
> >> +					   sizeof(struct page *),
> >> +					   GFP_KERNEL);
> >> +
> >> +		pages_wraparound[0] = pages;
> >> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
> >> +			pages_wraparound[i + 1] =
> >> +				&pages[i % (page_cnt - 1) + 1];
> >> +
> >> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
> >> +			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
> >> +				PAGE_KERNEL);
> >> +
> >> +		kfree(pages_wraparound);
> >> +		if (!ring_info->ring_buffer)
> >> +			return -ENOMEM;
> >> +	}
> > With this patch, the code is a big "if" statement with two halves -- one
> > when SNP isolation is in effect, and the other when not.  The SNP isolation
> > case does the work using PFNs with the shared_gpa_boundary added,
> > while the other case does the same work but using struct page.  Perhaps
> > I'm missing something, but can both halves be combined and always
> > do the work using PFNs?  The only difference is whether to add the
> > shared_gpa_boundary, and whether to zero the memory when done.
> > So get the starting PFN, then have an "if" statement for whether to
> > add the shared_gpa_boundary.  Then everything else is the same.
> > At the end, use an "if" statement to decide whether to zero the
> > memory.  It would really be better to have the logic in this algorithm
> > coded only once.
> >
> 
> Hi Michael:
> 	I have tried this before, but vmap_pfn() only works for PFNs outside
> of normal memory. Please see vmap_pfn_apply() for details: it returns
> an error when the PFN is valid.
> 

Indeed.  This ties into the discussion with Christoph about coming up
with generalized helper functions to assist in handling the
shared_gpa_boundary.   Having a single implementation here in
hv_ringbuffer_init() would be a good goal as well.

Michael
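
As a rough sketch of what "coded only once" could mean for the index
logic, the user-space function below builds the wraparound PFN list with
an optional offset.  It assumes the ring buffer pages are physically
contiguous starting at start_pfn, and that the caller passes
shared_gpa_boundary converted to a page frame count, or 0 when no
isolation is in effect.  The function name and parameters are
hypothetical.  The sketch does not address the vmap_pfn() restriction
raised above, nor the conditional zeroing of the memory.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill pfns[] with the PFN for each slot of the double mapping.
 * boundary_pfn_offset is added to every PFN; pass 0 for the
 * non-isolated case.  pfns must have room for 2 * page_cnt - 1
 * entries.  Returns the number of entries written, or 0 if page_cnt
 * is less than 2.
 */
size_t ring_wraparound_pfns(uint64_t start_pfn, size_t page_cnt,
			    uint64_t boundary_pfn_offset, uint64_t *pfns)
{
	uint64_t base = start_pfn + boundary_pfn_offset;
	size_t data_pages, i;

	if (page_cnt < 2)
		return 0;

	data_pages = page_cnt - 1;
	pfns[0] = base;				/* header page */
	for (i = 0; i < 2 * data_pages; i++)	/* data pages, twice */
		pfns[i + 1] = base + i % data_pages + 1;

	return 2 * page_cnt - 1;
}
```

The only difference between the two cases is then the value passed as
boundary_pfn_offset; how the resulting list gets mapped is a separate
question.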


* Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-09-02 15:57       ` Michael Kelley
@ 2021-09-14 14:41         ` Tianyu Lan
  0 siblings, 0 replies; 41+ messages in thread
From: Tianyu Lan @ 2021-09-14 14:41 UTC (permalink / raw)
  To: Michael Kelley, Christoph Hellwig
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu,
	Dexuan Cui, catalin.marinas, will, tglx, mingo, bp, x86, hpa,
	dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky, jgross,
	sstabellini, joro, davem, kuba, jejb, martin.petersen, gregkh,
	arnd, m.szyprowski, robin.murphy, brijesh.singh, thomas.lendacky,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	hannes, aneesh.kumar, krish.sadhukhan, saravanand,
	linux-arm-kernel, xen-devel, rientjes, ardb, iommu, linux-arch,
	linux-hyperv, linux-kernel, linux-scsi, netdev, vkuznets,
	parri.andrea, dave.hansen

Hi Michael and Christoph:
       I just sent out the V5 patchset. In an Isolation VM, I use
alloc_pages() to allocate the rx/tx ring buffers and first map them with
vmap(), because vmbus_establish_gpadl() still needs the VA of the
low-end memory to initialize the GPADL buffer. After calling
vmbus_establish_gpadl(), the VA returned by vmap() is unmapped to
release virtual address space that the following code does not use, and
the pages are then mapped in the extra address space above
shared_gpa_boundary via vmap_pfn(). Please have a look.

https://lkml.org/lkml/2021/9/14/672

Thanks.

On 9/2/2021 11:57 PM, Michael Kelley wrote:
> From: Christoph Hellwig <hch@lst.de> Sent: Thursday, September 2, 2021 1:00 AM
>>
>> On Tue, Aug 31, 2021 at 05:16:19PM +0000, Michael Kelley wrote:
>>> As a quick overview, I think there are four places where the
>>> shared_gpa_boundary must be applied to adjust the guest physical
>>> address that is used.  Each requires mapping a corresponding
>>> virtual address range.  Here are the four places:
>>>
>>> 1)  The so-called "monitor pages" that are a core communication
>>> mechanism between the guest and Hyper-V.  These are two single
>>> pages, and the mapping is handled by calling memremap() for
>>> each of the two pages.  See Patch 7 of Tianyu's series.
>>
>> Ah, interesting.
>>
>>> 3)  The network driver send and receive buffers.  vmap_phys_range()
>>> should work here.
>>
>> Actually it won't.  The problem with these buffers is that they are
>> physically non-contiguous allocations.
> 
> Indeed you are right.  These buffers are allocated with vzalloc().
> 
>> We really have two sensible options:
>>
>>   1) use vmap_pfn as in the current series.  But in that case I think
>>      we should get rid of the other mapping created by vmalloc.  I
>>      thought a bit about finding a way to apply the offset in vmalloc
>>      itself, but I think it would be too invasive to the normal fast
>>      path.  So the other sub-option would be to allocate the pages
>>      manually (maybe even using high order allocations to reduce TLB
>>      pressure) and then remap them
> 
> What's the benefit of getting rid of the other mapping created by
> vmalloc if it isn't referenced?  Just page table space?  The default sizes
> are a 16 Meg receive buffer and a 1 Meg send buffer for each VMbus
> channel used by netvsc, and usually the max number of channels
> is 8.  So there's 128 Meg of virtual space to be saved on the receive
> buffers,  which could be worth it.
> 
> Allocating the pages manually is also an option, but we have to
> be careful about high order allocations.  While typically these buffers
> are allocated during system boot, these synthetic NICs can be hot
> added and removed while the VM is running.   The channel count
> can also be changed while the VM is running.  So multiple 16 Meg
> receive buffer allocations may need to be done after the system has
> been running a long time.
> 
>>   2) do away with the contiguous kernel mapping entirely.  This means
>>      the simple memcpy calls become loops over kmap_local_pfn.  As
>>      I just found out for the send side that would be pretty easy,
>>      but the receive side would be more work.  We'd also need to check
>>      the performance implications.
> 
> Doing away with the contiguous kernel mapping entirely seems like
> it would result in fairly messy code to access the buffer.  What's the
> benefit of doing away with the mapping?  I'm not an expert on the
> netvsc driver, but decoding the incoming packets is already fraught
> with complexities because of the nature of the protocol with Hyper-V.
> The contiguous kernel mapping at least keeps the basics sane.
> 
>>
>>> 4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
>>> should work here as well.
>>
>> Or memremap if it works for 1.
>>
>>> Case #2 above does unusual mapping.  The ring buffer consists of a ring
>>> buffer header page, followed by one or more pages that are the actual
>>> ring buffer.  The pages making up the actual ring buffer are mapped
>>> twice in succession.  For example, if the ring buffer has 4 pages
>>> (one header page and three ring buffer pages), the contiguous
>>> virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
>>> The duplicate contiguous mapping allows the code that is reading
>>> or writing the actual ring buffer to not be concerned about wrap-around
>>> because writing off the end of the ring buffer is automatically
>>> wrapped-around by the mapping.  The amount of data read or
>>> written in one batch never exceeds the size of the ring buffer, and
>>> after a batch is read or written, the read or write indices are adjusted
>>> to put them back into the range of the first mapping of the actual
>>> ring buffer pages.  So there's method to the madness, and the
>>> technique works pretty well.  But this kind of mapping is not
>>> amenable to using vmap_phys_range().
>>
>> Hmm.  Can you point me to where this is mapped?  Especially for the
>> classic non-isolated case where no vmap/vmalloc mapping is involved
>> at all?
> 
> The existing code is in hv_ringbuffer_init() in drivers/hv/ring_buffer.c.
> The code hasn't changed in a while, so any recent upstream code tree
> is valid to look at.  The memory pages are typically allocated
> in vmbus_alloc_ring() in drivers/hv/channel.c.
> 
> Michael
> 

Thread overview: 41+ messages
2021-08-27 17:20 [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
2021-08-27 17:20 ` [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM Tianyu Lan
2021-09-02  0:15   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the " Tianyu Lan
2021-09-02  0:15   ` Michael Kelley
2021-09-02  6:35     ` Tianyu Lan
2021-08-27 17:21 ` [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support Tianyu Lan
2021-09-02  0:16   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
2021-08-27 17:41   ` Greg KH
2021-08-27 17:44     ` Tianyu Lan
2021-09-02  0:17   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page Tianyu Lan
2021-08-27 17:41   ` Greg KH
2021-08-27 17:46     ` Tianyu Lan
2021-09-02  3:32   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM Tianyu Lan
2021-09-02  0:20   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
2021-09-02  0:21   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
2021-09-02  0:23   ` Michael Kelley
2021-09-02 13:35     ` Tianyu Lan
2021-09-02 16:14       ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
2021-08-27 17:21 ` [PATCH V4 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
2021-08-27 17:21 ` [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
2021-09-02  1:27   ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
2021-09-02  2:34   ` Michael Kelley
2021-09-02  4:56     ` Michael Kelley
2021-08-27 17:21 ` [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
2021-09-02  2:08   ` Michael Kelley
2021-08-30 12:00 ` [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Christoph Hellwig
2021-08-31 15:20   ` Tianyu Lan
2021-09-02  7:51     ` Christoph Hellwig
2021-08-31 17:16   ` Michael Kelley
2021-09-02  7:59     ` Christoph Hellwig
2021-09-02 11:21       ` Tianyu Lan
2021-09-02 15:57       ` Michael Kelley
2021-09-14 14:41         ` Tianyu Lan
