linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
@ 2021-08-09 17:56 Tianyu Lan
  2021-08-09 17:56 ` [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM Tianyu Lan
                   ` (13 more replies)
  0 siblings, 14 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VMs. VBS(Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
is to add support for these Isolation VM support in Linux.

The memory of these vms are encrypted and host can't access guest
memory directly. Hyper-V provides new host visibility hvcall and
the guest needs to call new hvcall to mark memory visible to host
before sharing memory with host. For security, all network/storage
stack memory should not be shared with host and so there is bounce
buffer requests.

Vmbus channel ring buffer already plays bounce buffer role because
all data from/to host needs to copy from/to between the ring buffer
and IO stack memory. So mark vmbus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contains IO stack memory address and host will access these memory.
So add allocation bounce buffer support in vmbus for these packets.

For SNP isolation VM, guest needs to access the shared memory via
extra address space which is specified by Hyper-V CPUID HYPERV_CPUID_
ISOLATION_CONFIG. The access physical address of the shared memory
should be bounce buffer memory GPA plus with shared_gpa_boundary
reported by CPUID.


Change since V2:
       - Drop x86_set_memory_enc static call and use platform check
         in the __set_memory_enc_dec() to run platform callback of
	 set memory encrypted or decrypted.

Change since V1:
       - Introduce x86_set_memory_enc static call and so platforms can
         override __set_memory_enc_dec() with their implementation
       - Introduce sev_es_ghcb_hv_call_simple() and share code
         between SEV and Hyper-V code.
       - Not remap monitor pages in the non-SNP isolation VM
       - Make swiotlb_init_io_tlb_mem() return error code and return
         error when dma_map_decrypted() fails.

Change since RFC V4:
       - Introduce dma map decrypted function to remap bounce buffer
          and provide dma map decrypted ops for platform to hook callback.        
       - Split swiotlb and dma map decrypted change into two patches
       - Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
       - Add interface set_memory_decrypted_map() to decrypt memory and
         map bounce buffer in extra address space
       - Remove swiotlb remap function and store the remap address
         returned by set_memory_decrypted_map() in swiotlb mem data structure.
       - Introduce hv_set_mem_enc() to make code more readable in the __set_memory_enc_dec().

Change since RFC v2:
       - Remove not UIO driver in Isolation VM patch
       - Use vmap_pfn() to replace ioremap_page_range function in
       order to avoid exposing symbol ioremap_page_range() and
       ioremap_page_range()
       - Call hv set mem host visibility hvcall in set_memory_encrypted/decrypted()
       - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
       - Fix code style


Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  75 ++++++--
 arch/x86/hyperv/ivm.c              | 295 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  20 ++
 arch/x86/include/asm/mshyperv.h    |  87 ++++++++-
 arch/x86/include/asm/sev.h         |   3 +
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/sev-shared.c       |  63 +++---
 arch/x86/mm/pat/set_memory.c       |  19 +-
 arch/x86/xen/pci-swiotlb-xen.c     |   3 +-
 drivers/hv/Kconfig                 |   1 +
 drivers/hv/channel.c               |  54 +++++-
 drivers/hv/connection.c            |  71 ++++++-
 drivers/hv/hv.c                    | 129 +++++++++----
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  84 ++++++--
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       |  65 +++++++
 drivers/net/hyperv/hyperv_net.h    |   6 +
 drivers/net/hyperv/netvsc.c        | 144 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c         |  68 ++++++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  54 +++++-
 include/linux/dma-map-ops.h        |   9 +
 include/linux/hyperv.h             |  17 ++
 include/linux/swiotlb.h            |   4 +
 kernel/dma/mapping.c               |  22 +++
 kernel/dma/swiotlb.c               |  32 +++-
 29 files changed, 1212 insertions(+), 129 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-10 10:56   ` Wei Liu
  2021-08-12 19:14   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the " Tianyu Lan
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V exposes GHCB page via SEV ES GHCB MSR for SNP guest
to communicate with hypervisor. Map GHCB page for all
cpus to read/write MSR register and submit hvcall request
via GHCB.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/hv_init.c       | 66 +++++++++++++++++++++++++++++++--
 arch/x86/include/asm/mshyperv.h |  2 +
 include/asm-generic/mshyperv.h  |  2 +
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 708a2712a516..0bb4d9ca7a55 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include <linux/kexec.h>
 #include <linux/version.h>
 #include <linux/vmalloc.h>
+#include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/hyperv.h>
 #include <linux/slab.h>
@@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
 struct hv_vp_assist_page **hv_vp_assist_page;
 EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 
+static int hyperv_init_ghcb(void)
+{
+	u64 ghcb_gpa;
+	void *ghcb_va;
+	void **ghcb_base;
+
+	if (!ms_hyperv.ghcb_base)
+		return -EINVAL;
+
+	/*
+	 * GHCB page is allocated by paravisor. The address
+	 * returned by MSR_AMD64_SEV_ES_GHCB is above shared
+	 * ghcb boundary and map it here.
+	 */
+	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+	if (!ghcb_va)
+		return -ENOMEM;
+
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	*ghcb_base = ghcb_va;
+
+	return 0;
+}
+
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
@@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
 		}
 	}
 
+	hyperv_init_ghcb();
+
 	return 0;
 }
 
@@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu)
 {
 	struct hv_reenlightenment_control re_ctrl;
 	unsigned int new_cpu;
+	void **ghcb_va = NULL;
+
+	if (ms_hyperv.ghcb_base) {
+		ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+		if (*ghcb_va)
+			memunmap(*ghcb_va);
+		*ghcb_va = NULL;
+	}
 
 	hv_common_cpu_die(cpu);
 
@@ -383,9 +419,19 @@ void __init hyperv_init(void)
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 			__builtin_return_address(0));
-	if (hv_hypercall_pg == NULL) {
-		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-		goto remove_cpuhp_state;
+	if (hv_hypercall_pg == NULL)
+		goto clean_guest_os_id;
+
+	if (hv_isolation_type_snp()) {
+		ms_hyperv.ghcb_base = alloc_percpu(void *);
+		if (!ms_hyperv.ghcb_base)
+			goto clean_guest_os_id;
+
+		if (hyperv_init_ghcb()) {
+			free_percpu(ms_hyperv.ghcb_base);
+			ms_hyperv.ghcb_base = NULL;
+			goto clean_guest_os_id;
+		}
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -456,7 +502,8 @@ void __init hyperv_init(void)
 	hv_query_ext_cap(0);
 	return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 	cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
 	kfree(hv_vp_assist_page);
@@ -484,6 +531,9 @@ void hyperv_cleanup(void)
 	 */
 	hv_hypercall_pg = NULL;
 
+	if (ms_hyperv.ghcb_base)
+		free_percpu(ms_hyperv.ghcb_base);
+
 	/* Reset the hypercall page */
 	hypercall_msr.as_uint64 = 0;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -559,3 +609,11 @@ bool hv_is_isolation_supported(void)
 {
 	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+bool hv_isolation_type_snp(void)
+{
+	return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..6627cfd2bfba 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include <asm/paravirt.h>
 #include <asm/mshyperv.h>
 
+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
 		void *data);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..4269f3174e58 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,6 +36,7 @@ struct ms_hyperv_info {
 	u32 max_lp_index;
 	u32 isolation_config_a;
 	u32 isolation_config_b;
+	void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
@@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool hv_is_isolation_supported(void);
+bool hv_isolation_type_snp(void);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
 #else /* CONFIG_HYPERV */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
  2021-08-09 17:56 ` [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-12 19:18   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V exposes shared memory boundary via cpuid
HYPERV_CPUID_ISOLATION_CONFIG and store it in the
shared_gpa_boundary of ms_hyperv struct. This prepares
to share memory with host for SNP guest.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6b5835a087a3..2b7f396ef1a5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
 	if (ms_hyperv.priv_high & HV_ISOLATION) {
 		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
 		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+		ms_hyperv.shared_gpa_boundary =
+			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
 	u32 max_vp_index;
 	u32 max_lp_index;
 	u32 isolation_config_a;
-	u32 isolation_config_b;
+	union {
+		u32 isolation_config_b;
+		struct {
+			u32 cvm_type : 4;
+			u32 Reserved11 : 1;
+			u32 shared_gpa_boundary_active : 1;
+			u32 shared_gpa_boundary_bits : 6;
+			u32 Reserved12 : 20;
+		};
+	};
 	void  __percpu **ghcb_base;
+	u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
  2021-08-09 17:56 ` [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM Tianyu Lan
  2021-08-09 17:56 ` [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the " Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-09 22:12   ` Dave Hansen
                     ` (3 more replies)
  2021-08-09 17:56 ` [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
                   ` (10 subsequent siblings)
  13 siblings, 4 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add new hvcall guest address host visibility support to mark
memory visible to host. Call it inside set_memory_decrypted
/encrypted(). Add HYPERVISOR feature check in the
hv_is_isolation_supported() to optimize in non-virtualization
environment.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v2:
       * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
         according to platform check.

Change since v1:
       * Use new staic call x86_set_memory_enc to avoid add Hyper-V
         specific check in the set_memory code.
---
 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |   6 ++
 arch/x86/hyperv/ivm.c              | 114 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  20 +++++
 arch/x86/include/asm/mshyperv.h    |   4 +-
 arch/x86/mm/pat/set_memory.c       |  19 +++--
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |   1 +
 8 files changed, 160 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
+obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0bb4d9ca7a55..b3683083208a 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
 
 bool hv_is_isolation_supported(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return 0;
+
+	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
+		return 0;
+
 	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
 
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index 000000000000..8c905ffdba7f
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan <Tianyu.Lan@microsoft.com>
+ */
+
+#include <linux/hyperv.h>
+#include <linux/types.h>
+#include <linux/bitfield.h>
+#include <linux/slab.h>
+#include <asm/io.h>
+#include <asm/mshyperv.h>
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In Isolation VM, all guest memory is encripted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+			   enum hv_mem_host_visibility visibility)
+{
+	struct hv_gpa_range_for_visibility **input_pcpu, *input;
+	u16 pages_processed;
+	u64 hv_status;
+	unsigned long flags;
+
+	/* no-op if partition isolation is not enabled */
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+			HV_MAX_MODIFY_GPA_REP_COUNT);
+		return -EINVAL;
+	}
+
+	local_irq_save(flags);
+	input_pcpu = (struct hv_gpa_range_for_visibility **)
+			this_cpu_ptr(hyperv_pcpu_input_arg);
+	input = *input_pcpu;
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
+		return -EINVAL;
+	}
+
+	input->partition_id = HV_PARTITION_ID_SELF;
+	input->host_visibility = visibility;
+	input->reserved0 = 0;
+	input->reserved1 = 0;
+	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+	hv_status = hv_do_rep_hypercall(
+			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+			0, input, &pages_processed);
+	local_irq_restore(flags);
+
+	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+		return 0;
+
+	return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount,
+				      enum hv_mem_host_visibility visibility)
+{
+	u64 *pfn_array;
+	int ret = 0;
+	int i, pfn;
+
+	if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+		return 0;
+
+	pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+	if (!pfn_array)
+		return -ENOMEM;
+
+	for (i = 0, pfn = 0; i < pagecount; i++) {
+		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+		pfn++;
+
+		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+			ret |= hv_mark_gpa_visibility(pfn, pfn_array,
+					visibility);
+			pfn = 0;
+
+			if (ret)
+				goto err_free_pfn_array;
+		}
+	}
+
+ err_free_pfn_array:
+	kfree(pfn_array);
+	return ret;
+}
+
+/*
+ * hv_set_mem_host_visibility - Set specified memory visible to host.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host. This function works as wrap of hv_mark_gpa_visibility()
+ * with memory base and size.
+ */
+int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
+{
+	enum hv_mem_host_visibility visibility = visible ?
+			VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE;
+
+	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 2322d6bd5883..1691d2bce0b7 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -276,6 +276,13 @@ enum hv_isolation_type {
 #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
 #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
 
+/* Hyper-V memory host visibility */
+enum hv_mem_host_visibility {
+	VMBUS_PAGE_NOT_VISIBLE		= 0,
+	VMBUS_PAGE_VISIBLE_READ_ONLY	= 1,
+	VMBUS_PAGE_VISIBLE_READ_WRITE	= 3
+};
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
@@ -587,4 +594,17 @@ enum hv_interrupt_type {
 
 #include <asm-generic/hyperv-tlfs.h>
 
+/* All input parameters should be in single page. */
+#define HV_MAX_MODIFY_GPA_REP_COUNT		\
+	((PAGE_SIZE / sizeof(u64)) - 2)
+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+struct hv_gpa_range_for_visibility {
+	u64 partition_id;
+	u32 host_visibility:2;
+	u32 reserved0:30;
+	u32 reserved1;
+	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
 #endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6627cfd2bfba..87a386fa97f7 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -190,7 +190,9 @@ struct irq_domain *hv_create_pci_msi_domain(void);
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 		struct hv_interrupt_entry *entry);
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+			   enum hv_mem_host_visibility visibility);
+int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..1e4a0882820a 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -29,6 +29,8 @@
 #include <asm/proto.h>
 #include <asm/memtype.h>
 #include <asm/set_memory.h>
+#include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv.h>
 
 #include "../mm_internal.h"
 
@@ -1980,15 +1982,11 @@ int set_memory_global(unsigned long addr, int numpages)
 				    __pgprot(_PAGE_GLOBAL), 0);
 }
 
-static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
 	int ret;
 
-	/* Nothing to do if memory encryption is not active */
-	if (!mem_encrypt_active())
-		return 0;
-
 	/* Should not be working on unaligned addresses */
 	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
 		addr &= PAGE_MASK;
@@ -2023,6 +2021,17 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	return ret;
 }
 
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+	if (hv_is_isolation_supported())
+		return hv_set_mem_host_visibility(addr, numpages, !enc);
+
+	if (mem_encrypt_active())
+		return __set_memory_enc_pgtable(addr, numpages, enc);
+
+	return 0;
+}
+
 int set_memory_encrypted(unsigned long addr, int numpages)
 {
 	return __set_memory_enc_dec(addr, numpages, true);
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 56348a541c50..8ed6733d5146 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_RETARGET_INTERRUPT		0x007e
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
+#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
 
 /* Extended hypercalls */
 #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index aa26d24a5ca9..079988ed45b9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -255,6 +255,7 @@ bool hv_query_ext_cap(u64 cap_query);
 static inline bool hv_is_hyperv_initialized(void) { return false; }
 static inline bool hv_is_hibernation_supported(void) { return false; }
 static inline void hyperv_cleanup(void) {}
+static inline hv_is_isolation_supported(void);
 #endif /* CONFIG_HYPERV */
 
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (2 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-12 22:20   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page Tianyu Lan
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Mark vmbus ring buffer visible with set_memory_decrypted() when
establish gpadl handle.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/channel.c   | 44 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/hyperv.h | 11 +++++++++++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..4c4717c26240 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include <linux/hyperv.h>
 #include <linux/uio.h>
 #include <linux/interrupt.h>
+#include <linux/set_memory.h>
 #include <asm/page.h>
 #include <asm/mshyperv.h>
 
@@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	struct list_head *curr;
 	u32 next_gpadl_handle;
 	unsigned long flags;
-	int ret = 0;
+	int ret = 0, index;
+
+	index = atomic_inc_return(&channel->gpadl_index) - 1;
+
+	if (index > VMBUS_GPADL_RANGE_COUNT - 1) {
+		pr_err("Gpadl handle position(%d) has been occupied.\n", index);
+		return -ENOSPC;
+	}
 
 	next_gpadl_handle =
 		(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
@@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	if (ret)
 		return ret;
 
+	ret = set_memory_decrypted((unsigned long)kbuffer,
+				   HVPFN_UP(size));
+	if (ret) {
+		pr_warn("Failed to set host visibility.\n");
+		return ret;
+	}
+
 	init_completion(&msginfo->waitevent);
 	msginfo->waiting_channel = channel;
 
@@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	/* At this point, we received the gpadl created msg */
 	*gpadl_handle = gpadlmsg->gpadl;
 
+	channel->gpadl_array[index].size = size;
+	channel->gpadl_array[index].buffer = kbuffer;
+	channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
+
 cleanup:
 	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
 	list_del(&msginfo->msglistentry);
@@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	}
 
 	kfree(msginfo);
+
+	if (ret) {
+		set_memory_encrypted((unsigned long)kbuffer,
+				     HVPFN_UP(size));
+		atomic_dec(&channel->gpadl_index);
+	}
+
 	return ret;
 }
 
@@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 
 	/* Establish the gpadl for the ring buffer */
 	newchannel->ringbuffer_gpadlhandle = 0;
+	atomic_set(&newchannel->gpadl_index, 0);
 
 	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
 				      page_address(newchannel->ringbuffer_page),
@@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 	struct vmbus_channel_gpadl_teardown *msg;
 	struct vmbus_channel_msginfo *info;
 	unsigned long flags;
-	int ret;
+	int ret, i;
 
 	info = kzalloc(sizeof(*info) +
 		       sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
@@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
 	kfree(info);
+
+	/* Find gpadl buffer virtual address and size. */
+	for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
+		if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
+			break;
+
+	if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
+			HVPFN_UP(channel->gpadl_array[i].size)))
+		pr_warn("Fail to set mem host visibility.\n");
+
+	channel->gpadl_array[i].gpadlhandle = 0;
+	atomic_dec(&channel->gpadl_index);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index ddc8713ce57b..90b542597143 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -803,6 +803,14 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl {
+	u32 gpadlhandle;
+	u32 size;
+	void *buffer;
+};
+
+#define VMBUS_GPADL_RANGE_COUNT		3
+
 struct vmbus_channel {
 	struct list_head listentry;
 
@@ -823,6 +831,9 @@ struct vmbus_channel {
 	struct completion rescind_event;
 
 	u32 ringbuffer_gpadlhandle;
+	/* GPADL_RING and Send/Receive GPADL_BUFFER. */
+	struct vmbus_gpadl gpadl_array[VMBUS_GPADL_RANGE_COUNT];
+	atomic_t gpadl_index;
 
 	/* Allocated memory for ring buffer */
 	struct page *ringbuffer_page;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (3 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-13 19:31   ` Michael Kelley
  2021-08-24  8:45   ` Christoph Hellwig
  2021-08-09 17:56 ` [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides GHCB protocol to write Synthetic Interrupt
Controller MSR registers in Isolation VM with AMD SEV SNP
and these registers are emulated by hypervisor directly.
Hyper-V requires to write SINTx MSR registers twice. First
writes MSR via GHCB page to communicate with hypervisor
and then writes wrmsr instruction to talk with paravisor
which runs in VMPL0. Guest OS ID MSR also needs to be set
via GHCB.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v1:
         * Introduce sev_es_ghcb_hv_call_simple() and share code
           between SEV and Hyper-V code.
---
 arch/x86/hyperv/hv_init.c       |  33 ++-------
 arch/x86/hyperv/ivm.c           | 110 +++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h |  78 +++++++++++++++++++-
 arch/x86/include/asm/sev.h      |   3 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 arch/x86/kernel/sev-shared.c    |  63 ++++++++++-------
 drivers/hv/hv.c                 | 121 ++++++++++++++++++++++----------
 include/asm-generic/mshyperv.h  |  12 +++-
 8 files changed, 329 insertions(+), 94 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b3683083208a..ab0b33f621e7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -423,7 +423,7 @@ void __init hyperv_init(void)
 		goto clean_guest_os_id;
 
 	if (hv_isolation_type_snp()) {
-		ms_hyperv.ghcb_base = alloc_percpu(void *);
+		ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
 		if (!ms_hyperv.ghcb_base)
 			goto clean_guest_os_id;
 
@@ -432,6 +432,9 @@ void __init hyperv_init(void)
 			ms_hyperv.ghcb_base = NULL;
 			goto clean_guest_os_id;
 		}
+
+		/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+		hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -523,6 +526,7 @@ void hyperv_cleanup(void)
 
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
 	/*
 	 * Reset hypercall page reference before reset the page,
@@ -596,30 +600,3 @@ bool hv_is_hyperv_initialized(void)
 	return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-	if (!(ms_hyperv.priv_high & HV_ISOLATION))
-		return HV_ISOLATION_TYPE_NONE;
-	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
-		return 0;
-
-	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
-		return 0;
-
-	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
-
-bool hv_isolation_type_snp(void)
-{
-	return static_branch_unlikely(&isolation_type_snp);
-}
-EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 8c905ffdba7f..ec0e5c259740 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,6 +6,8 @@
  *  Tianyu Lan <Tianyu.Lan@microsoft.com>
  */
 
+#include <linux/types.h>
+#include <linux/bitfield.h>
 #include <linux/hyperv.h>
 #include <linux/types.h>
 #include <linux/bitfield.h>
@@ -13,6 +15,114 @@
 #include <asm/io.h>
 #include <asm/mshyperv.h>
 
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!ms_hyperv.ghcb_base)
+		return;
+
+	WARN_ON(in_nmi());
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+	ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+
+	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
+		pr_warn("Fail to write msr via ghcb %llx.\n", msr);
+
+	local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!ms_hyperv.ghcb_base)
+		return;
+
+	WARN_ON(in_nmi());
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
+		pr_warn("Fail to read msr via ghcb %llx.\n", msr);
+	else
+		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+	local_irq_restore(flags);
+}
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+	hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
+{
+	hv_ghcb_msr_write(msr, value);
+
+	/* Write proxy bit vua wrmsrl instruction. */
+	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
+		wrmsrl(msr, value | 1 << 20);
+}
+EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
+
+void hv_signal_eom_ghcb(void)
+{
+	hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
+}
+EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
+
+enum hv_isolation_type hv_get_isolation_type(void)
+{
+	if (!(ms_hyperv.priv_high & HV_ISOLATION))
+		return HV_ISOLATION_TYPE_NONE;
+	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
+}
+EXPORT_SYMBOL_GPL(hv_get_isolation_type);
+
+/*
+ * hv_is_isolation_supported - Check system runs in the Hyper-V
+ * isolation VM.
+ */
+bool hv_is_isolation_supported(void)
+{
+	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
+}
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+/*
+ * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
+ * isolation VM.
+ */
+bool hv_isolation_type_snp(void)
+{
+	return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
+
 /*
  * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
  *
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 87a386fa97f7..730985676ea3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -30,6 +30,63 @@ static inline u64 hv_get_register(unsigned int reg)
 	return value;
 }
 
+#define hv_get_sint_reg(val, reg) {		\
+	if (hv_isolation_type_snp())		\
+		hv_get_##reg##_ghcb(&val);	\
+	else					\
+		rdmsrl(HV_X64_MSR_##reg, val);	\
+	}
+
+#define hv_set_sint_reg(val, reg) {		\
+	if (hv_isolation_type_snp())		\
+		hv_set_##reg##_ghcb(val);	\
+	else					\
+		wrmsrl(HV_X64_MSR_##reg, val);	\
+	}
+
+
+#define hv_get_simp(val) hv_get_sint_reg(val, SIMP)
+#define hv_get_siefp(val) hv_get_sint_reg(val, SIEFP)
+
+#define hv_set_simp(val) hv_set_sint_reg(val, SIMP)
+#define hv_set_siefp(val) hv_set_sint_reg(val, SIEFP)
+
+#define hv_get_synic_state(val) {			\
+	if (hv_isolation_type_snp())			\
+		hv_get_synic_state_ghcb(&val);		\
+	else						\
+		rdmsrl(HV_X64_MSR_SCONTROL, val);	\
+	}
+#define hv_set_synic_state(val) {			\
+	if (hv_isolation_type_snp())			\
+		hv_set_synic_state_ghcb(val);		\
+	else						\
+		wrmsrl(HV_X64_MSR_SCONTROL, val);	\
+	}
+
+#define hv_get_vp_index(index) rdmsrl(HV_X64_MSR_VP_INDEX, index)
+
+#define hv_signal_eom() {			 \
+	if (hv_isolation_type_snp() &&		 \
+	    old_msg_type != HVMSG_TIMER_EXPIRED) \
+		hv_signal_eom_ghcb();		 \
+	else					 \
+		wrmsrl(HV_X64_MSR_EOM, 0);	 \
+	}
+
+#define hv_get_synint_state(int_num, val) {		\
+	if (hv_isolation_type_snp())			\
+		hv_get_synint_state_ghcb(int_num, &val);\
+	else						\
+		rdmsrl(HV_X64_MSR_SINT0 + int_num, val);\
+	}
+#define hv_set_synint_state(int_num, val) {		\
+	if (hv_isolation_type_snp())			\
+		hv_set_synint_state_ghcb(int_num, val);	\
+	else						\
+		wrmsrl(HV_X64_MSR_SINT0 + int_num, val);\
+	}
+
 #define hv_get_raw_timer() rdtsc_ordered()
 
 void hyperv_vector_handler(struct pt_regs *regs);
@@ -193,6 +250,25 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 			   enum hv_mem_host_visibility visibility);
 int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
+void hv_signal_eom_ghcb(void);
+void hv_ghcb_msr_write(u64 msr, u64 value);
+void hv_ghcb_msr_read(u64 msr, u64 *value);
+
+#define hv_get_synint_state_ghcb(int_num, val)			\
+	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+#define hv_set_synint_state_ghcb(int_num, val) \
+	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+
+#define hv_get_SIMP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
+#define hv_set_SIMP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
+
+#define hv_get_SIEFP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+#define hv_set_SIEFP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+
+#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
+#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
@@ -209,9 +285,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 {
 	return -1;
 }
+static inline void hv_signal_eom_ghcb(void) { };
 #endif /* CONFIG_HYPERV */
 
-
 #include <asm-generic/mshyperv.h>
 
 #endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..81beb2a8031b 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -81,6 +81,9 @@ static __always_inline void sev_es_nmi_complete(void)
 		__sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 2b7f396ef1a5..3633f871ac1e 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -318,6 +318,9 @@ static void __init ms_hyperv_init_platform(void)
 
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
+
+		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+			static_branch_enable(&isolation_type_snp);
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 9f90f460a28c..dd7f37de640b 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -94,10 +94,9 @@ static void vc_finish_insn(struct es_em_ctxt *ctxt)
 	ctxt->regs->ip += ctxt->insn.length;
 }
 
-static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
-					  struct es_em_ctxt *ctxt,
-					  u64 exit_code, u64 exit_info_1,
-					  u64 exit_info_2)
+enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2)
 {
 	enum es_result ret;
 
@@ -109,29 +108,45 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
 	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
 	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
 
-	sev_es_wr_ghcb_msr(__pa(ghcb));
 	VMGEXIT();
 
-	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
-		u64 info = ghcb->save.sw_exit_info_2;
-		unsigned long v;
-
-		info = ghcb->save.sw_exit_info_2;
-		v = info & SVM_EVTINJ_VEC_MASK;
-
-		/* Check if exception information from hypervisor is sane. */
-		if ((info & SVM_EVTINJ_VALID) &&
-		    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
-		    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
-			ctxt->fi.vector = v;
-			if (info & SVM_EVTINJ_VALID_ERR)
-				ctxt->fi.error_code = info >> 32;
-			ret = ES_EXCEPTION;
-		} else {
-			ret = ES_VMM_ERROR;
-		}
-	} else {
+	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1)
+		ret = ES_VMM_ERROR;
+	else
 		ret = ES_OK;
+
+	return ret;
+}
+
+static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+				   struct es_em_ctxt *ctxt,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2)
+{
+	unsigned long v;
+	enum es_result ret;
+	u64 info;
+
+	sev_es_wr_ghcb_msr(__pa(ghcb));
+
+	ret = sev_es_ghcb_hv_call_simple(ghcb, exit_code, exit_info_1,
+					 exit_info_2);
+	if (ret == ES_OK)
+		return ret;
+
+	info = ghcb->save.sw_exit_info_2;
+	v = info & SVM_EVTINJ_VEC_MASK;
+
+	/* Check if exception information from hypervisor is sane. */
+	if ((info & SVM_EVTINJ_VALID) &&
+	    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
+	    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
+		ctxt->fi.vector = v;
+		if (info & SVM_EVTINJ_VALID_ERR)
+			ctxt->fi.error_code = info >> 32;
+		ret = ES_EXCEPTION;
+	} else {
+		ret = ES_VMM_ERROR;
 	}
 
 	return ret;
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index e83507f49676..59f7173c4d9f 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -8,6 +8,7 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
@@ -136,17 +137,24 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
 
-		hv_cpu->synic_message_page =
-			(void *)get_zeroed_page(GFP_ATOMIC);
-		if (hv_cpu->synic_message_page == NULL) {
-			pr_err("Unable to allocate SYNIC message page\n");
-			goto err;
-		}
+		/*
+		 * Synic message and event pages are allocated by paravisor.
+		 * Skip these pages allocation here.
+		 */
+		if (!hv_isolation_type_snp()) {
+			hv_cpu->synic_message_page =
+				(void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->synic_message_page == NULL) {
+				pr_err("Unable to allocate SYNIC message page\n");
+				goto err;
+			}
 
-		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
-		if (hv_cpu->synic_event_page == NULL) {
-			pr_err("Unable to allocate SYNIC event page\n");
-			goto err;
+			hv_cpu->synic_event_page =
+				(void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->synic_event_page == NULL) {
+				pr_err("Unable to allocate SYNIC event page\n");
+				goto err;
+			}
 		}
 
 		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
@@ -173,10 +181,17 @@ void hv_synic_free(void)
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
+		free_page((unsigned long)hv_cpu->post_msg_page);
+
+		/*
+		 * Synic message and event pages are allocated by paravisor.
+		 * Skip free these pages here.
+		 */
+		if (hv_isolation_type_snp())
+			continue;
 
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
-		free_page((unsigned long)hv_cpu->post_msg_page);
 	}
 
 	kfree(hv_context.hv_numa_map);
@@ -199,26 +214,43 @@ void hv_synic_enable_regs(unsigned int cpu)
 	union hv_synic_scontrol sctrl;
 
 	/* Setup the Synic's message page */
-	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
+	hv_get_simp(simp.as_uint64);
 	simp.simp_enabled = 1;
-	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
-		>> HV_HYP_PAGE_SHIFT;
 
-	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
+	if (hv_isolation_type_snp()) {
+		hv_cpu->synic_message_page
+			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
+				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+		if (!hv_cpu->synic_message_page)
+			pr_err("Fail to map syinc message page.\n");
+	} else {
+		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+			>> HV_HYP_PAGE_SHIFT;
+	}
+
+	hv_set_simp(simp.as_uint64);
 
 	/* Setup the Synic's event page */
-	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
+	hv_get_siefp(siefp.as_uint64);
 	siefp.siefp_enabled = 1;
-	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
-		>> HV_HYP_PAGE_SHIFT;
 
-	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
+	if (hv_isolation_type_snp()) {
+		hv_cpu->synic_event_page =
+			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
+				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+
+		if (!hv_cpu->synic_event_page)
+			pr_err("Fail to map syinc event page.\n");
+	} else {
+		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+			>> HV_HYP_PAGE_SHIFT;
+	}
+	hv_set_siefp(siefp.as_uint64);
 
 	/* Setup the shared SINT. */
 	if (vmbus_irq != -1)
 		enable_percpu_irq(vmbus_irq, 0);
-	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
-					VMBUS_MESSAGE_SINT);
+	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 
 	shared_sint.vector = vmbus_interrupt;
 	shared_sint.masked = false;
@@ -233,14 +265,12 @@ void hv_synic_enable_regs(unsigned int cpu)
 #else
 	shared_sint.auto_eoi = 0;
 #endif
-	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
-				shared_sint.as_uint64);
+	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 
 	/* Enable the global synic bit */
-	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
+	hv_get_synic_state(sctrl.as_uint64);
 	sctrl.enable = 1;
-
-	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
+	hv_set_synic_state(sctrl.as_uint64);
 }
 
 int hv_synic_init(unsigned int cpu)
@@ -257,37 +287,50 @@ int hv_synic_init(unsigned int cpu)
  */
 void hv_synic_disable_regs(unsigned int cpu)
 {
+	struct hv_per_cpu_context *hv_cpu
+		= per_cpu_ptr(hv_context.cpu_context, cpu);
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_scontrol sctrl;
 
-	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
-					VMBUS_MESSAGE_SINT);
-
+	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 	shared_sint.masked = 1;
+	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
 
 	/* Need to correctly cleanup in the case of SMP!!! */
 	/* Disable the interrupt */
-	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
-				shared_sint.as_uint64);
+	hv_get_simp(simp.as_uint64);
 
-	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
+	/*
+	 * In Isolation VM, sim and sief pages are allocated by
+	 * paravisor. These pages also will be used by kdump
+	 * kernel. So just reset enable bit here and keep page
+	 * addresses.
+	 */
 	simp.simp_enabled = 0;
-	simp.base_simp_gpa = 0;
+	if (hv_isolation_type_snp())
+		memunmap(hv_cpu->synic_message_page);
+	else
+		simp.base_simp_gpa = 0;
 
-	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
+	hv_set_simp(simp.as_uint64);
 
-	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
+	hv_get_siefp(siefp.as_uint64);
 	siefp.siefp_enabled = 0;
-	siefp.base_siefp_gpa = 0;
 
-	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
+	if (hv_isolation_type_snp())
+		memunmap(hv_cpu->synic_event_page);
+	else
+		siefp.base_siefp_gpa = 0;
+
+	hv_set_siefp(siefp.as_uint64);
 
 	/* Disable the global synic bit */
-	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
+	hv_get_synic_state(sctrl.as_uint64);
 	sctrl.enable = 0;
-	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
+	hv_set_synic_state(sctrl.as_uint64);
 
 	if (vmbus_irq != -1)
 		disable_percpu_irq(vmbus_irq);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 079988ed45b9..90dac369a2dc 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -23,9 +23,16 @@
 #include <linux/bitops.h>
 #include <linux/cpumask.h>
 #include <linux/nmi.h>
+#include <asm/svm.h>
+#include <asm/sev.h>
 #include <asm/ptrace.h>
+#include <asm/mshyperv.h>
 #include <asm/hyperv-tlfs.h>
 
+union hv_ghcb {
+	struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
 struct ms_hyperv_info {
 	u32 features;
 	u32 priv_high;
@@ -45,7 +52,7 @@ struct ms_hyperv_info {
 			u32 Reserved12 : 20;
 		};
 	};
-	void  __percpu **ghcb_base;
+	union hv_ghcb __percpu **ghcb_base;
 	u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
@@ -55,6 +62,7 @@ extern void  __percpu  **hyperv_pcpu_output_arg;
 
 extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
+extern bool hv_isolation_type_snp(void);
 
 /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
 static inline int hv_result(u64 status)
@@ -149,7 +157,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 		 * possibly deliver another msg from the
 		 * hypervisor
 		 */
-		hv_set_register(HV_REGISTER_EOM, 0);
+		hv_signal_eom();
 	}
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (4 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-13 20:42   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides ghcb hvcall to handle VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
msg in SNP Isolation VM. Add such support.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/ivm.c           | 43 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c         |  6 ++++-
 drivers/hv/hv.c                 |  8 +++++-
 include/asm-generic/mshyperv.h  | 29 ++++++++++++++++++++++
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index ec0e5c259740..c13ec5560d73 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,49 @@
 #include <asm/io.h>
 #include <asm/mshyperv.h>
 
+#define GHCB_USAGE_HYPERV_CALL	1
+
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!ms_hyperv.ghcb_base)
+		return -EFAULT;
+
+	WARN_ON(in_nmi());
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return -EFAULT;
+	}
+
+	hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
+	hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
+
+	hv_ghcb->hypercall.outputgpa = (u64)output;
+	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+	hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+	if (input_size)
+		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+	VMGEXIT();
+
+	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+	local_irq_restore(flags);
+
+	return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
 	union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 730985676ea3..a30c60f189a3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val)			\
 	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
 	++channel->sig_events;
 
-	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+	if (hv_isolation_type_snp())
+		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+				NULL, sizeof(u64));
+	else
+		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 59f7173c4d9f..e5c9fc467893 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
 	aligned_msg->payload_size = payload_size;
 	memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-	status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+	if (hv_isolation_type_snp())
+		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+				(void *)aligned_msg, NULL,
+				sizeof(struct hv_input_post_message));
+	else
+		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+				aligned_msg, NULL);
 
 	/* Preemption must remain disabled until after the hypercall
 	 * so some other thread can't get scheduled onto this cpu and
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 90dac369a2dc..400181b855c1 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -31,6 +31,35 @@
 
 union hv_ghcb {
 	struct ghcb ghcb;
+	struct {
+		u64 hypercalldata[509];
+		u64 outputgpa;
+		union {
+			union {
+				struct {
+					u32 callcode        : 16;
+					u32 isfast          : 1;
+					u32 reserved1       : 14;
+					u32 isnested        : 1;
+					u32 countofelements : 12;
+					u32 reserved2       : 4;
+					u32 repstartindex   : 12;
+					u32 reserved3       : 4;
+				};
+				u64 asuint64;
+			} hypercallinput;
+			union {
+				struct {
+					u16 callstatus;
+					u16 reserved1;
+					u32 elementsprocessed : 12;
+					u32 reserved2         : 20;
+				};
+				u64 asunit64;
+			} hypercalloutput;
+		};
+		u64 reserved2;
+	} hypercall;
 } __packed __aligned(PAGE_SIZE);
 
 struct ms_hyperv_info {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (5 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-13 21:28   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
with host in Isolation VM and so it's necessary to use hvcall to set
them visible to host. In Isolation VM with AMD SEV SNP, the access
address should be in the extra space which is above shared gpa
boundary. So remap these pages into the extra address(pa +
shared_gpa_boundary). Introduce monitor_pages_va to store
the remap address and unmap these va when disconnect vmbus.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v1:
        * Not remap monitor pages in the non-SNP isolation VM.
---
 drivers/hv/connection.c   | 65 +++++++++++++++++++++++++++++++++++++++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 66 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..bf0ac3167bd2 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
 #include <linux/vmalloc.h>
 #include <linux/hyperv.h>
 #include <linux/export.h>
+#include <linux/io.h>
 #include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
 	msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
 	msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+	if (hv_isolation_type_snp()) {
+		msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+		msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+	}
+
 	msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
 	/*
@@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 		return -ECONNREFUSED;
 	}
 
+	if (hv_isolation_type_snp()) {
+		vmbus_connection.monitor_pages_va[0]
+			= vmbus_connection.monitor_pages[0];
+		vmbus_connection.monitor_pages[0]
+			= memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+				   MEMREMAP_WB);
+		if (!vmbus_connection.monitor_pages[0])
+			return -ENOMEM;
+
+		vmbus_connection.monitor_pages_va[1]
+			= vmbus_connection.monitor_pages[1];
+		vmbus_connection.monitor_pages[1]
+			= memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+				   MEMREMAP_WB);
+		if (!vmbus_connection.monitor_pages[1]) {
+			memunmap(vmbus_connection.monitor_pages[0]);
+			return -ENOMEM;
+		}
+
+		memset(vmbus_connection.monitor_pages[0], 0x00,
+		       HV_HYP_PAGE_SIZE);
+		memset(vmbus_connection.monitor_pages[1], 0x00,
+		       HV_HYP_PAGE_SIZE);
+	}
+
 	return ret;
 }
 
@@ -159,6 +191,7 @@ int vmbus_connect(void)
 	struct vmbus_channel_msginfo *msginfo = NULL;
 	int i, ret = 0;
 	__u32 version;
+	u64 pfn[2];
 
 	/* Initialize the vmbus connection */
 	vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +249,16 @@ int vmbus_connect(void)
 		goto cleanup;
 	}
 
+	if (hv_is_isolation_supported()) {
+		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+		if (hv_mark_gpa_visibility(2, pfn,
+				VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+			ret = -EFAULT;
+			goto cleanup;
+		}
+	}
+
 	msginfo = kzalloc(sizeof(*msginfo) +
 			  sizeof(struct vmbus_channel_initiate_contact),
 			  GFP_KERNEL);
@@ -284,6 +327,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+	u64 pfn[2];
+
 	/*
 	 * First send the unload request to the host.
 	 */
@@ -303,6 +348,26 @@ void vmbus_disconnect(void)
 		vmbus_connection.int_page = NULL;
 	}
 
+	if (hv_is_isolation_supported()) {
+		if (vmbus_connection.monitor_pages_va[0]) {
+			memunmap(vmbus_connection.monitor_pages[0]);
+			vmbus_connection.monitor_pages[0]
+				= vmbus_connection.monitor_pages_va[0];
+			vmbus_connection.monitor_pages_va[0] = NULL;
+		}
+
+		if (vmbus_connection.monitor_pages_va[1]) {
+			memunmap(vmbus_connection.monitor_pages[1]);
+			vmbus_connection.monitor_pages[1]
+				= vmbus_connection.monitor_pages_va[1];
+			vmbus_connection.monitor_pages_va[1] = NULL;
+		}
+
+		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+		hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+	}
+
 	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
 	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
 	vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 42f3d9d123a1..40bc0eff6665 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
 	 * is child->parent notification
 	 */
 	struct hv_monitor_page *monitor_pages[2];
+	void *monitor_pages_va[2];
 	struct list_head chn_msg_list;
 	spinlock_t channelmsg_lock;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (6 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-16 17:28   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

VMbus ring buffer are shared with host and it's need to
be accessed via extra address space of Isolation VM with
SNP support. This patch is to map the ring buffer
address in extra address space via ioremap(). HV host
visibility hvcall smears data in the ring buffer and
so reset the ring buffer memory to zero after calling
visibility hvcall.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/Kconfig        |  1 +
 drivers/hv/channel.c      | 10 +++++
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 84 ++++++++++++++++++++++++++++++---------
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index d1123ceb38f3..dd12af20e467 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -8,6 +8,7 @@ config HYPERV
 		|| (ARM64 && !CPU_BIG_ENDIAN))
 	select PARAVIRT
 	select X86_HV_CALLBACK_VECTOR if X86
+	select VMAP_PFN
 	help
 	  Select this option to run Linux as a Hyper-V client operating
 	  system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 4c4717c26240..60ef881a700c 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	if (err)
 		goto error_clean_ring;
 
+	err = hv_ringbuffer_post_init(&newchannel->outbound,
+				      page, send_pages);
+	if (err)
+		goto error_free_gpadl;
+
+	err = hv_ringbuffer_post_init(&newchannel->inbound,
+				      &page[send_pages], recv_pages);
+	if (err)
+		goto error_free_gpadl;
+
 	/* Create and init the channel open message */
 	open_info = kzalloc(sizeof(*open_info) +
 			   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 40bc0eff6665..15cd23a561f3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+		struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 		       struct page *pages, u32 pagecnt, u32 max_pkt_size);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..d4f93fca1108 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
 #include <linux/prefetch.h>
+#include <linux/io.h>
+#include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
 
@@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
 	mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
-/* Initialize the ring buffer. */
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+		       struct page *pages, u32 page_cnt)
 {
+	u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+	unsigned long *pfns_wraparound;
+	void *vaddr;
 	int i;
-	struct page **pages_wraparound;
 
-	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+	if (!hv_isolation_type_snp())
+		return 0;
+
+	physic_addr += ms_hyperv.shared_gpa_boundary;
 
 	/*
 	 * First page holds struct hv_ring_buffer, do wraparound mapping for
 	 * the rest.
 	 */
-	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+	pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
 				   GFP_KERNEL);
-	if (!pages_wraparound)
+	if (!pfns_wraparound)
 		return -ENOMEM;
 
-	pages_wraparound[0] = pages;
+	pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
 	for (i = 0; i < 2 * (page_cnt - 1); i++)
-		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
-
-	ring_info->ring_buffer = (struct hv_ring_buffer *)
-		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
-
-	kfree(pages_wraparound);
+		pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
+			i % (page_cnt - 1) + 1;
 
-
-	if (!ring_info->ring_buffer)
+	vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
+	kfree(pfns_wraparound);
+	if (!vaddr)
 		return -ENOMEM;
 
-	ring_info->ring_buffer->read_index =
-		ring_info->ring_buffer->write_index = 0;
+	/* Clean memory after setting host visibility. */
+	memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+	ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+	ring_info->ring_buffer->read_index = 0;
+	ring_info->ring_buffer->write_index = 0;
 
 	/* Set the feature bit for enabling flow control. */
 	ring_info->ring_buffer->feature_bits.value = 1;
 
+	return 0;
+}
+
+/* Initialize the ring buffer. */
+int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
+		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
+{
+	int i;
+	struct page **pages_wraparound;
+
+	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+
+	if (!hv_isolation_type_snp()) {
+		/*
+		 * First page holds struct hv_ring_buffer, do wraparound mapping for
+		 * the rest.
+		 */
+		pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+					   GFP_KERNEL);
+		if (!pages_wraparound)
+			return -ENOMEM;
+
+		pages_wraparound[0] = pages;
+		for (i = 0; i < 2 * (page_cnt - 1); i++)
+			pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+
+		ring_info->ring_buffer = (struct hv_ring_buffer *)
+			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
+
+		kfree(pages_wraparound);
+
+		if (!ring_info->ring_buffer)
+			return -ENOMEM;
+
+		ring_info->ring_buffer->read_index =
+			ring_info->ring_buffer->write_index = 0;
+
+		/* Set the feature bit for enabling flow control. */
+		ring_info->ring_buffer->feature_bits.value = 1;
+	}
+
 	ring_info->ring_size = page_cnt << PAGE_SHIFT;
 	ring_info->ring_size_div10_reciprocal =
 		reciprocal_value(ring_info->ring_size / 10);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (7 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-12 12:26   ` Christoph Hellwig
  2021-08-09 17:56 ` [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In Hyper-V Isolation VM with AMD SEV, swiotlb boucne buffer
needs to be mapped into address space above vTOM and so
introduce dma_map_decrypted/dma_unmap_encrypted() to map/unmap
bounce buffer memory. The platform can populate man/unmap callback
in the dma memory decrypted ops.
---
 include/linux/dma-map-ops.h |  9 +++++++++
 kernel/dma/mapping.c        | 22 ++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
 	unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
+struct dma_memory_decrypted_ops {
+	void *(*map)(void *addr, unsigned long size);
+	void (*unmap)(void *addr);
+};
+
 #ifdef CONFIG_DMA_OPS
 #include <asm/dma-mapping.h>
 
@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
 }
 #endif /* CONFIG_DMA_API_DEBUG */
 
+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_decrypted(void *addr, unsigned long size);
+
 extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
 #include <linux/of_device.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <asm/set_memory.h>
 #include "debug.h"
 #include "direct.h"
 
 bool dma_default_coherent;
 
+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 /*
  * Managed DMA API
  */
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
 	return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+	if (set_memory_decrypted((unsigned long)addr,
+				 size / PAGE_SIZE))
+		return NULL;
+
+	if (dma_memory_generic_decrypted_ops.map)
+		return dma_memory_generic_decrypted_ops.map(addr, size);
+	else
+		return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+	if (dma_memory_generic_decrypted_ops.unmap)
+		dma_memory_generic_decrypted_ops.unmap(addr);
+
+	return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (8 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-12 12:27   ` Christoph Hellwig
  2021-08-09 17:56 ` [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Use dma_map_decrypted() in the swiotlb code, store remap address returned
and use the remap address to copy data from/to swiotlb bounce buffer.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v1:
       * Make swiotlb_init_io_tlb_mem() return error code and return
         error when dma_map_decrypted() fails.
---
 include/linux/swiotlb.h |  4 ++++
 kernel/dma/swiotlb.c    | 32 ++++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:	The end address of the swiotlb memory pool. Used to do a quick
  *		range check to see if the memory was in fact allocated by this
  *		API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb
+ *		memory pool may be remapped in the memory encrypted case and store
+ *		virtual address for bounce buffer operation.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
+	void *vaddr;
 	unsigned long nslabs;
 	unsigned long used;
 	unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..29b6d888ef3b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -176,7 +176,7 @@ void __init swiotlb_update_mem_attributes(void)
 	memset(vaddr, 0, bytes);
 }
 
-static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 				    unsigned long nslabs, bool late_alloc)
 {
 	void *vaddr = phys_to_virt(start);
@@ -194,14 +194,21 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->slots[i].alloc_size = 0;
 	}
 
-	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
+	mem->vaddr = dma_map_decrypted(vaddr, bytes);
+	if (!mem->vaddr) {
+		pr_err("Failed to decrypt memory.\n");
+		return -ENOMEM;
+	}
+
+	memset(mem->vaddr, 0, bytes);
+	return 0;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	struct io_tlb_mem *mem;
 	size_t alloc_size;
+	int ret;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
 		return 0;
@@ -216,7 +223,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
 
-	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+	ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+	if (ret) {
+		memblock_free(__pa(mem), alloc_size);
+		return ret;
+	}
 
 	io_tlb_default_mem = mem;
 	if (verbose)
@@ -304,6 +315,8 @@ int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
 	struct io_tlb_mem *mem;
+	int size = get_order(struct_size(mem, slots, nslabs));
+	int ret;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
 		return 0;
@@ -312,12 +325,15 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	if (WARN_ON_ONCE(io_tlb_default_mem))
 		return -ENOMEM;
 
-	mem = (void *)__get_free_pages(GFP_KERNEL,
-		get_order(struct_size(mem, slots, nslabs)));
+	mem = (void *)__get_free_pages(GFP_KERNEL, size);
 	if (!mem)
 		return -ENOMEM;
 
-	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+	ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+	if (ret) {
+		free_pages((unsigned long)mem, size);
+		return ret;
+	}
 
 	io_tlb_default_mem = mem;
 	swiotlb_print_info();
@@ -360,7 +376,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
-	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
 	unsigned int tlb_offset;
 
 	if (orig_addr == INVALID_PHYS_ADDR)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (9 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-19 18:11   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory and so enable swiotlb force
mode to use swiotlb bounce buffer for DMA transaction.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Swiotlb bounce buffer code calls dma_map_decrypted()
to mark bounce buffer visible to host and map it in extra
address space. Populate dma memory decrypted ops with hv
map/unmap function.

Hyper-V initalizes swiotlb bounce buffer and default swiotlb
needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override
the setting, hyperv_swiotlb_detect() needs to run before
these detect functions which depends on the pci_xen_swiotlb_
init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
_detect() to keep the order.

The map function vmap_pfn() can't work in the early place
hyperv_iommu_swiotlb_init() and so initialize swiotlb bounce
buffer in the hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/ivm.c           | 28 ++++++++++++++
 arch/x86/include/asm/mshyperv.h |  2 +
 arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
 drivers/hv/vmbus_drv.c          |  3 ++
 drivers/iommu/hyperv-iommu.c    | 65 +++++++++++++++++++++++++++++++++
 include/linux/hyperv.h          |  1 +
 6 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index c13ec5560d73..0f05e4d6fc62 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -265,3 +265,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
 
 	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+				      sizeof(unsigned long), GFP_KERNEL);
+	void *vaddr;
+	int i;
+
+	if (!pfns)
+		return NULL;
+
+	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+		pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
+			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,	PAGE_KERNEL_IO);
+	kfree(pfns);
+
+	return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+	vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index a30c60f189a3..b247739f57ac 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 			   enum hv_mem_host_visibility visibility);
 int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include <linux/dma-map-ops.h>
 #include <linux/pci.h>
+#include <linux/hyperv.h>
 #include <xen/swiotlb-xen.h>
 
 #include <asm/xen/hypervisor.h>
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
-		  NULL,
+		  hyperv_swiotlb_detect,
 		  pci_xen_swiotlb_init,
 		  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include <linux/cpu.h>
 #include <linux/sched/task_stack.h>
 
+#include <linux/dma-map-ops.h>
 #include <linux/delay.h>
 #include <linux/notifier.h>
 #include <linux/panic_notifier.h>
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
 	return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 	}
 	hv_debug_add_dev_dir(child_device_obj);
 
+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
 	return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..01e874b3b43a 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,22 @@
 #include <linux/irq.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
+#include <linux/hyperv.h>
+#include <linux/io.h>
 
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hw_irq.h>
 #include <asm/io_apic.h>
+#include <asm/iommu.h>
+#include <asm/iommu_table.h>
 #include <asm/irq_remapping.h>
 #include <asm/hypervisor.h>
 #include <asm/mshyperv.h>
+#include <asm/swiotlb.h>
+#include <linux/dma-map-ops.h>
+#include <linux/dma-direct.h>
+#include <linux/set_memory.h>
 
 #include "irq_remapping.h"
 
@@ -36,6 +44,9 @@
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_size;
+static void *hyperv_io_tlb_start;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
 		const struct cpumask *mask, bool force)
 {
@@ -337,4 +348,58 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
 	.free = hyperv_root_irq_remapping_free,
 };
 
+void __init hyperv_iommu_swiotlb_init(void)
+{
+	unsigned long bytes;
+
+	/*
+	 * Allocate Hyper-V swiotlb bounce buffer at early place
+	 * to reserve large contiguous memory.
+	 */
+	hyperv_io_tlb_size = 256 * 1024 * 1024;
+	hyperv_io_tlb_start =
+			memblock_alloc_low(
+				  PAGE_ALIGN(hyperv_io_tlb_size),
+				  HV_HYP_PAGE_SIZE);
+
+	if (!hyperv_io_tlb_start) {
+		pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
+		return;
+	}
+}
+
+int __init hyperv_swiotlb_detect(void)
+{
+	if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
+	    && hv_is_isolation_supported()) {
+		/*
+		 * Enable swiotlb force mode in Isolation VM to
+		 * use swiotlb bounce buffer for dma transaction.
+		 */
+		swiotlb_force = SWIOTLB_FORCE;
+
+		dma_memory_generic_decrypted_ops.map = hv_map_memory;
+		dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;
+		return 1;
+	}
+
+	return 0;
+}
+
+void __init hyperv_iommu_swiotlb_later_init(void)
+{
+	/*
+	 * Swiotlb bounce buffer needs to be mapped in extra address
+	 * space. Map function doesn't work in the early place and so
+	 * call swiotlb_late_init_with_tbl() here.
+	 */
+	if (swiotlb_late_init_with_tbl(hyperv_io_tlb_start,
+				       hyperv_io_tlb_size >> IO_TLB_SHIFT))
+		panic("Fail to initialize hyperv swiotlb.\n");
+}
+
+IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
+		  NULL, hyperv_iommu_swiotlb_init,
+		  hyperv_iommu_swiotlb_later_init);
+
 #endif
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 90b542597143..83fa567ad594 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1744,6 +1744,7 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
 int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
 				void (*block_invalidate)(void *context,
 							 u64 block_mask));
+int __init hyperv_swiotlb_detect(void);
 
 struct hyperv_pci_block_ops {
 	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (10 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-19 18:14   ` Michael Kelley
  2021-08-09 17:56 ` [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
  2021-08-16 14:55 ` [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Michael Kelley
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
pagebuffer() still need to handle. Use DMA API to map/umap these
memory during sending/receiving packet and Hyper-V DMA ops callback
will use swiotlb function to allocate bounce buffer and copy data
from/to bounce buffer.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h   |   6 ++
 drivers/net/hyperv/netvsc.c       | 144 +++++++++++++++++++++++++++++-
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h            |   5 ++
 4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
 	u32 total_bytes;
 	u32 send_buf_index;
 	u32 total_data_buflen;
+	struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
 	/* Receive buffer allocated by us but manages by NetVSP */
 	void *recv_buf;
+	void *recv_original_buf;
 	u32 recv_buf_size; /* allocated bytes */
 	u32 recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
@@ -1082,6 +1084,8 @@ struct netvsc_device {
 
 	/* Send buffer allocated by us */
 	void *send_buf;
+	void *send_original_buf;
+	u32 send_buf_size;
 	u32 send_buf_gpadl_handle;
 	u32 send_section_cnt;
 	u32 send_section_size;
@@ -1730,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI	10000
 #define RETRY_MAX	2000	/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+		      struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..fc312e5db4d5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
 	int i;
 
 	kfree(nvdev->extension);
-	vfree(nvdev->recv_buf);
-	vfree(nvdev->send_buf);
+
+	if (nvdev->recv_original_buf) {
+		vunmap(nvdev->recv_buf);
+		vfree(nvdev->recv_original_buf);
+	} else {
+		vfree(nvdev->recv_buf);
+	}
+
+	if (nvdev->send_original_buf) {
+		vunmap(nvdev->send_buf);
+		vfree(nvdev->send_original_buf);
+	} else {
+		vfree(nvdev->send_buf);
+	}
+
 	kfree(nvdev->send_section_map);
 
 	for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
 	return nvchan->mrc.slots ? 0 : -ENOMEM;
 }
 
+static void *netvsc_remap_buf(void *buf, unsigned long size)
+{
+	unsigned long *pfns;
+	void *vaddr;
+	int i;
+
+	pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
+		       GFP_KERNEL);
+	if (!pfns)
+		return NULL;
+
+	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+		pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
+			+ (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+	kfree(pfns);
+
+	return vaddr;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
 			   struct netvsc_device *net_device,
 			   const struct netvsc_device_info *device_info)
@@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
 	unsigned int buf_size;
 	size_t map_words;
 	int i, ret = 0;
+	void *vaddr;
 
 	/* Get receive buffer area. */
 	buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
+		if (!vaddr)
+			goto cleanup;
+
+		net_device->recv_original_buf = net_device->recv_buf;
+		net_device->recv_buf = vaddr;
+	}
+
 	/* Notify the NetVsp of the gpadl handle */
 	init_packet = &net_device->channel_init_pkt;
 	memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
+		if (!vaddr)
+			goto cleanup;
+
+		net_device->send_original_buf = net_device->send_buf;
+		net_device->send_buf = vaddr;
+	}
+
 	/* Notify the NetVsp of the gpadl handle */
 	init_packet = &net_device->channel_init_pkt;
 	memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -767,7 +820,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
 
 	/* Notify the layer above us */
 	if (likely(skb)) {
-		const struct hv_netvsc_packet *packet
+		struct hv_netvsc_packet *packet
 			= (struct hv_netvsc_packet *)skb->cb;
 		u32 send_index = packet->send_buf_index;
 		struct netvsc_stats *tx_stats;
@@ -783,6 +836,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
 		tx_stats->bytes += packet->total_bytes;
 		u64_stats_update_end(&tx_stats->syncp);
 
+		netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
 		napi_consume_skb(skb, budget);
 	}
 
@@ -947,6 +1001,82 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 		memset(dest, 0, padding);
 }
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+		      struct hv_netvsc_packet *packet)
+{
+	u32 page_count = packet->cp_partial ?
+		packet->page_buf_cnt - packet->rmsg_pgcnt :
+		packet->page_buf_cnt;
+	int i;
+
+	if (!hv_is_isolation_supported())
+		return;
+
+	if (!packet->dma_range)
+		return;
+
+	for (i = 0; i < page_count; i++)
+		dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
+				 packet->dma_range[i].mapping_size,
+				 DMA_TO_DEVICE);
+
+	kfree(packet->dma_range);
+}
+
+/* netvsc_dma_map - Map swiotlb bounce buffer with data page of
+ * packet sent by vmbus_sendpacket_pagebuffer() in the Isolation
+ * VM.
+ *
+ * In isolation VM, netvsc send buffer has been marked visible to
+ * host and so the data copied to send buffer doesn't need to use
+ * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
+ * may not be copied to send buffer and so these pages need to be
+ * mapped with swiotlb bounce buffer. netvsc_dma_map() is to do
+ * that. The pfns in the struct hv_page_buffer need to be converted
+ * to bounce buffer's pfn. The loop here is necessary and so not
+ * use dma_map_sg() here.
+ */
+int netvsc_dma_map(struct hv_device *hv_dev,
+		   struct hv_netvsc_packet *packet,
+		   struct hv_page_buffer *pb)
+{
+	u32 page_count =  packet->cp_partial ?
+		packet->page_buf_cnt - packet->rmsg_pgcnt :
+		packet->page_buf_cnt;
+	dma_addr_t dma;
+	int i;
+
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	packet->dma_range = kcalloc(page_count,
+				    sizeof(*packet->dma_range),
+				    GFP_KERNEL);
+	if (!packet->dma_range)
+		return -ENOMEM;
+
+	for (i = 0; i < page_count; i++) {
+		char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
+					 + pb[i].offset);
+		u32 len = pb[i].len;
+
+		dma = dma_map_single(&hv_dev->device, src, len,
+				     DMA_TO_DEVICE);
+		if (dma_mapping_error(&hv_dev->device, dma)) {
+			kfree(packet->dma_range);
+			return -ENOMEM;
+		}
+
+		packet->dma_range[i].dma = dma;
+		packet->dma_range[i].mapping_size = len;
+		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
+		pb[i].offset = offset_in_hvpage(dma);
+		pb[i].len = len;
+	}
+
+	return 0;
+}
+
 static inline int netvsc_send_pkt(
 	struct hv_device *device,
 	struct hv_netvsc_packet *packet,
@@ -987,14 +1117,22 @@ static inline int netvsc_send_pkt(
 
 	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
 
+	packet->dma_range = NULL;
 	if (packet->page_buf_cnt) {
 		if (packet->cp_partial)
 			pb += packet->rmsg_pgcnt;
 
+		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
+		if (ret)
+			return ret;
+
 		ret = vmbus_sendpacket_pagebuffer(out_channel,
 						  pb, packet->page_buf_cnt,
 						  &nvmsg, sizeof(nvmsg),
 						  req_id);
+
+		if (ret)
+			netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
 	} else {
 		ret = vmbus_sendpacket(out_channel,
 				       &nvmsg, sizeof(nvmsg),
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index f6c9c2a670f9..448fcc325ed7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
 			}
 		}
 
+		netvsc_dma_unmap(((struct net_device_context *)
+			netdev_priv(ndev))->device_ctx, &request->pkt);
 		complete(&request->wait_event);
 	} else {
 		netdev_err(ndev,
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 83fa567ad594..2ea638101645 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1601,6 +1601,11 @@ struct hyperv_service_callback {
 	void (*callback)(void *context);
 };
 
+struct hv_dma_range {
+	dma_addr_t dma;
+	u32 mapping_size;
+};
+
 #define MAX_SRV_VER	0x7ffffff
 extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf, u32 buflen,
 				const int *fw_version, int fw_vercnt,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (11 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
@ 2021-08-09 17:56 ` Tianyu Lan
  2021-08-19 18:17   ` Michael Kelley
  2021-08-16 14:55 ` [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Michael Kelley
  13 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-09 17:56 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
mpb_desc() still need to handle. Use DMA API to map/umap these
memory during sending/receiving packet and Hyper-V DMA ops callback
will use swiotlb function to allocate bounce buffer and copy data
from/to bounce buffer.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 68 +++++++++++++++++++++++++++++++++++---
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..78320719bdd8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include <linux/device.h>
 #include <linux/hyperv.h>
 #include <linux/blkdev.h>
+#include <linux/io.h>
+#include <linux/dma-mapping.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_host.h>
@@ -427,6 +429,8 @@ struct storvsc_cmd_request {
 	u32 payload_sz;
 
 	struct vstor_packet vstor_packet;
+	u32 hvpg_count;
+	struct hv_dma_range *dma_range;
 };
 
 
@@ -509,6 +513,14 @@ struct storvsc_scan_work {
 	u8 tgt_id;
 };
 
+#define storvsc_dma_map(dev, page, offset, size, dir) \
+	dma_map_page(dev, page, offset, size, dir)
+
+#define storvsc_dma_unmap(dev, dma_range, dir)		\
+		dma_unmap_page(dev, dma_range.dma,	\
+			       dma_range.mapping_size,	\
+			       dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
+
 static void storvsc_device_scan(struct work_struct *work)
 {
 	struct storvsc_scan_work *wrk;
@@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
 	struct hv_device *device;
 	struct storvsc_device *stor_device;
 	struct Scsi_Host *shost;
+	int i;
 
 	if (channel->primary_channel != NULL)
 		device = channel->primary_channel->device_obj;
@@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
 				request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
 			}
 
+			if (request->dma_range) {
+				for (i = 0; i < request->hvpg_count; i++)
+					storvsc_dma_unmap(&device->device,
+						request->dma_range[i],
+						request->vstor_packet.vm_srb.data_in == READ_TYPE);
+
+				kfree(request->dma_range);
+			}
+
 			storvsc_on_receive(stor_device, packet, request);
 			continue;
 		}
@@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 		unsigned int hvpgoff, hvpfns_to_add;
 		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
 		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
+		dma_addr_t dma;
 		u64 hvpfn;
+		u32 size;
 
 		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 		payload->range.len = length;
 		payload->range.offset = offset_in_hvpg;
 
+		cmd_request->dma_range = kcalloc(hvpg_count,
+				 sizeof(*cmd_request->dma_range),
+				 GFP_ATOMIC);
+		if (!cmd_request->dma_range) {
+			ret = -ENOMEM;
+			goto free_payload;
+		}
 
 		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
 			/*
@@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 			 * last sgl should be reached at the same time that
 			 * the PFN array is filled.
 			 */
-			while (hvpfns_to_add--)
-				payload->range.pfn_array[i++] =	hvpfn++;
+			while (hvpfns_to_add--) {
+				size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
+					   (unsigned long)length);
+				dma = storvsc_dma_map(&dev->device, pfn_to_page(hvpfn++),
+						      offset_in_hvpg, size,
+						      scmnd->sc_data_direction);
+				if (dma_mapping_error(&dev->device, dma)) {
+					ret = -ENOMEM;
+					goto free_dma_range;
+				}
+
+				if (offset_in_hvpg) {
+					payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
+					offset_in_hvpg = 0;
+				}
+
+				cmd_request->dma_range[i].dma = dma;
+				cmd_request->dma_range[i].mapping_size = size;
+				payload->range.pfn_array[i++] = dma >> HV_HYP_PAGE_SHIFT;
+				length -= size;
+			}
 		}
+		cmd_request->hvpg_count = hvpg_count;
 	}
 
 	cmd_request->payload = payload;
@@ -1860,13 +1911,20 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	put_cpu();
 
 	if (ret == -EAGAIN) {
-		if (payload_sz > sizeof(cmd_request->mpb))
-			kfree(payload);
 		/* no more space */
-		return SCSI_MLQUEUE_DEVICE_BUSY;
+		ret = SCSI_MLQUEUE_DEVICE_BUSY;
+		goto free_dma_range;
 	}
 
 	return 0;
+
+free_dma_range:
+	kfree(cmd_request->dma_range);
+
+free_payload:
+	if (payload_sz > sizeof(cmd_request->mpb))
+		kfree(payload);
+	return ret;
 }
 
 static struct scsi_host_template scsi_driver = {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
@ 2021-08-09 22:12   ` Dave Hansen
  2021-08-10 13:09     ` Tianyu Lan
  2021-08-10 11:03   ` Wei Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 64+ messages in thread
From: Dave Hansen @ 2021-08-09 22:12 UTC (permalink / raw)
  To: Tianyu Lan, kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo,
	bp, x86, hpa, dave.hansen, luto, peterz, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, joro, will, davem, kuba,
	jejb, martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea

On 8/9/21 10:56 AM, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.

From an x86/mm perspective:

Acked-by: Dave Hansen <dave.hansen@intel.com>

A tiny nit:

> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return 0;
> +
> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> +		return 0;
> +
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
This might be worthwhile to move to a header.  That ensures that
hv_is_isolation_supported() use can avoid even a function call.  But, I
see this is used in modules and its use here is also in a slow path, so
it's not a big deal

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
  2021-08-09 17:56 ` [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM Tianyu Lan
@ 2021-08-10 10:56   ` Wei Liu
  2021-08-10 12:17     ` Tianyu Lan
  2021-08-12 19:14   ` Michael Kelley
  1 sibling, 1 reply; 64+ messages in thread
From: Wei Liu @ 2021-08-10 10:56 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On Mon, Aug 09, 2021 at 01:56:05PM -0400, Tianyu Lan wrote:
[...]
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>  		}
>  	}
>  
> +	hyperv_init_ghcb();
> +

Why is the return value not checked here? If that's not required, can
you leave a comment?

Wei.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
  2021-08-09 22:12   ` Dave Hansen
@ 2021-08-10 11:03   ` Wei Liu
  2021-08-10 12:25     ` Tianyu Lan
  2021-08-12 19:36   ` Michael Kelley
  2021-08-12 21:10   ` Michael Kelley
  3 siblings, 1 reply; 64+ messages in thread
From: Wei Liu @ 2021-08-10 11:03 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On Mon, Aug 09, 2021 at 01:56:07PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v2:
>        * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>          according to platform check.
> 
> Change since v1:
>        * Use new staic call x86_set_memory_enc to avoid add Hyper-V
>          specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile           |   2 +-
>  arch/x86/hyperv/hv_init.c          |   6 ++
>  arch/x86/hyperv/ivm.c              | 114 +++++++++++++++++++++++++++++
>  arch/x86/include/asm/hyperv-tlfs.h |  20 +++++
>  arch/x86/include/asm/mshyperv.h    |   4 +-
>  arch/x86/mm/pat/set_memory.c       |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h     |   1 +
>  8 files changed, 160 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
>  
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return 0;

Nit: false instead of 0.

> +
> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> +		return 0;
> +
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
>  
[...]
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +			   enum hv_mem_host_visibility visibility)
> +{
> +	struct hv_gpa_range_for_visibility **input_pcpu, *input;
> +	u16 pages_processed;
> +	u64 hv_status;
> +	unsigned long flags;
> +
> +	/* no-op if partition isolation is not enabled */
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> +			HV_MAX_MODIFY_GPA_REP_COUNT);
> +		return -EINVAL;
> +	}
> +
> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_gpa_range_for_visibility **)
> +			this_cpu_ptr(hyperv_pcpu_input_arg);
> +	input = *input_pcpu;
> +	if (unlikely(!input)) {
> +		local_irq_restore(flags);
> +		return -EINVAL;
> +	}
> +
> +	input->partition_id = HV_PARTITION_ID_SELF;
> +	input->host_visibility = visibility;
> +	input->reserved0 = 0;
> +	input->reserved1 = 0;
> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> +	hv_status = hv_do_rep_hypercall(
> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> +			0, input, &pages_processed);
> +	local_irq_restore(flags);
> +
> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> +		return 0;
> +
> +	return hv_status & HV_HYPERCALL_RESULT_MASK;

Joseph introduced a few helper functions in 753ed9c95c37d. They will
make the code simpler.

Wei.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
  2021-08-10 10:56   ` Wei Liu
@ 2021-08-10 12:17     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-10 12:17 UTC (permalink / raw)
  To: Wei Liu
  Cc: kys, haiyangz, sthemmin, decui, tglx, mingo, bp, x86, hpa,
	dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky, jgross,
	sstabellini, joro, will, davem, kuba, jejb, martin.petersen,
	arnd, hch, m.szyprowski, robin.murphy, thomas.lendacky,
	brijesh.singh, ardb, Tianyu.Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, sfr, saravanand, krish.sadhukhan,
	aneesh.kumar, xen-devel, rientjes, hannes, tj, michael.h.kelley,
	iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

Hi Wei:
       Thanks for review.

On 8/10/2021 6:56 PM, Wei Liu wrote:
> On Mon, Aug 09, 2021 at 01:56:05PM -0400, Tianyu Lan wrote:
> [...]
>>   static int hv_cpu_init(unsigned int cpu)
>>   {
>>   	union hv_vp_assist_msr_contents msr = { 0 };
>> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>>   		}
>>   	}
>>   
>> +	hyperv_init_ghcb();
>> +
> 
> Why is the return value not checked here? If that's not required, can
> you leave a comment?
> 

The check is necessary here. Will update in the next version.

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-10 11:03   ` Wei Liu
@ 2021-08-10 12:25     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-10 12:25 UTC (permalink / raw)
  To: Wei Liu
  Cc: kys, haiyangz, sthemmin, decui, tglx, mingo, bp, x86, hpa,
	dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky, jgross,
	sstabellini, joro, will, davem, kuba, jejb, martin.petersen,
	arnd, hch, m.szyprowski, robin.murphy, thomas.lendacky,
	brijesh.singh, ardb, Tianyu.Lan, pgonda, martin.b.radev, akpm,
	kirill.shutemov, rppt, sfr, saravanand, krish.sadhukhan,
	aneesh.kumar, xen-devel, rientjes, hannes, tj, michael.h.kelley,
	iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

On 8/10/2021 7:03 PM, Wei Liu wrote:
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 0bb4d9ca7a55..b3683083208a 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>>   
>>   bool hv_is_isolation_supported(void)
>>   {
>> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>> +		return 0;
> Nit: false instead of 0.
> 

OK. Will fix in the next version.

>> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>> +			   enum hv_mem_host_visibility visibility)
>> +{
>> +	struct hv_gpa_range_for_visibility **input_pcpu, *input;
>> +	u16 pages_processed;
>> +	u64 hv_status;
>> +	unsigned long flags;
>> +
>> +	/* no-op if partition isolation is not enabled */
>> +	if (!hv_is_isolation_supported())
>> +		return 0;
>> +
>> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
>> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
>> +			HV_MAX_MODIFY_GPA_REP_COUNT);
>> +		return -EINVAL;
>> +	}
>> +
>> +	local_irq_save(flags);
>> +	input_pcpu = (struct hv_gpa_range_for_visibility **)
>> +			this_cpu_ptr(hyperv_pcpu_input_arg);
>> +	input = *input_pcpu;
>> +	if (unlikely(!input)) {
>> +		local_irq_restore(flags);
>> +		return -EINVAL;
>> +	}
>> +
>> +	input->partition_id = HV_PARTITION_ID_SELF;
>> +	input->host_visibility = visibility;
>> +	input->reserved0 = 0;
>> +	input->reserved1 = 0;
>> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
>> +	hv_status = hv_do_rep_hypercall(
>> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
>> +			0, input, &pages_processed);
>> +	local_irq_restore(flags);
>> +
>> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
>> +		return 0;
>> +
>> +	return hv_status & HV_HYPERCALL_RESULT_MASK;
> Joseph introduced a few helper functions in 753ed9c95c37d. They will
> make the code simpler.

OK. Will update in the next version.

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-09 22:12   ` Dave Hansen
@ 2021-08-10 13:09     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-10 13:09 UTC (permalink / raw)
  To: Dave Hansen, kys, haiyangz, sthemmin, wei.liu, decui, tglx,
	mingo, bp, x86, hpa, dave.hansen, luto, peterz, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, joro, will, davem, kuba,
	jejb, martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea

On 8/10/2021 6:12 AM, Dave Hansen wrote:
> On 8/9/21 10:56 AM, Tianyu Lan wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Add new hvcall guest address host visibility support to mark
>> memory visible to host. Call it inside set_memory_decrypted
>> /encrypted(). Add HYPERVISOR feature check in the
>> hv_is_isolation_supported() to optimize in non-virtualization
>> environment.
> 
>  From an x86/mm perspective:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> 

Thanks for your ACK.


> A tiny nit:
> 
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 0bb4d9ca7a55..b3683083208a 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>>   
>>   bool hv_is_isolation_supported(void)
>>   {
>> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>> +		return 0;
>> +
>> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
>> +		return 0;
>> +
>>   	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>>   }
> This might be worthwhile to move to a header.  That ensures that
> hv_is_isolation_supported() use can avoid even a function call.  But, I
> see this is used in modules and its use here is also in a slow path, so
> it's not a big deal
> 

I will move it to header in the following version.


Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  2021-08-09 17:56 ` [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
@ 2021-08-12 12:26   ` Christoph Hellwig
  2021-08-12 15:38     ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Hellwig @ 2021-08-12 12:26 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On Mon, Aug 09, 2021 at 01:56:13PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> In Hyper-V Isolation VM with AMD SEV, swiotlb boucne buffer
> needs to be mapped into address space above vTOM and so
> introduce dma_map_decrypted/dma_unmap_encrypted() to map/unmap
> bounce buffer memory. The platform can populate man/unmap callback
> in the dma memory decrypted ops.

Nothing here looks actually DMA related, and the names are horribly
confusing vs the actual dma_map_* calls.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-09 17:56 ` [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
@ 2021-08-12 12:27   ` Christoph Hellwig
  2021-08-13 17:58     ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Hellwig @ 2021-08-12 12:27 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

This is still broken.  You need to make sure the actual DMA allocations
do have struct page backing.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  2021-08-12 12:26   ` Christoph Hellwig
@ 2021-08-12 15:38     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-12 15:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen



On 8/12/2021 8:26 PM, Christoph Hellwig wrote:
> On Mon, Aug 09, 2021 at 01:56:13PM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> In Hyper-V Isolation VM with AMD SEV, swiotlb boucne buffer
>> needs to be mapped into address space above vTOM and so
>> introduce dma_map_decrypted/dma_unmap_encrypted() to map/unmap
>> bounce buffer memory. The platform can populate man/unmap callback
>> in the dma memory decrypted ops.
> 
> Nothing here looks actually DMA related, and the names are horribly
> confusing vs the actual dma_map_* calls.
> 

OK. So this still need to keep in the set_memory.c file.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
  2021-08-09 17:56 ` [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM Tianyu Lan
  2021-08-10 10:56   ` Wei Liu
@ 2021-08-12 19:14   ` Michael Kelley
  2021-08-13 15:46     ` Tianyu Lan
  1 sibling, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-12 19:14 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

The subject line tag on patches under arch/x86/hyperv is generally "x86/hyperv:".
There's some variation in the spelling of "hyperv", but let's go with the all
lowercase "hyperv".

> 
> Hyper-V exposes GHCB page via SEV ES GHCB MSR for SNP guest
> to communicate with hypervisor. Map GHCB page for all
> cpus to read/write MSR register and submit hvcall request
> via GHCB.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c       | 66 +++++++++++++++++++++++++++++++--
>  arch/x86/include/asm/mshyperv.h |  2 +
>  include/asm-generic/mshyperv.h  |  2 +
>  3 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 708a2712a516..0bb4d9ca7a55 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -20,6 +20,7 @@
>  #include <linux/kexec.h>
>  #include <linux/version.h>
>  #include <linux/vmalloc.h>
> +#include <linux/io.h>
>  #include <linux/mm.h>
>  #include <linux/hyperv.h>
>  #include <linux/slab.h>
> @@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
>  struct hv_vp_assist_page **hv_vp_assist_page;
>  EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> 
> +static int hyperv_init_ghcb(void)
> +{
> +	u64 ghcb_gpa;
> +	void *ghcb_va;
> +	void **ghcb_base;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return -EINVAL;
> +
> +	/*
> +	 * GHCB page is allocated by paravisor. The address
> +	 * returned by MSR_AMD64_SEV_ES_GHCB is above shared
> +	 * ghcb boundary and map it here.
> +	 */
> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> +	if (!ghcb_va)
> +		return -ENOMEM;
> +
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	*ghcb_base = ghcb_va;
> +
> +	return 0;
> +}
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>  		}
>  	}
> 
> +	hyperv_init_ghcb();
> +
>  	return 0;
>  }
> 
> @@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>  	struct hv_reenlightenment_control re_ctrl;
>  	unsigned int new_cpu;
> +	void **ghcb_va = NULL;

I'm not seeing any reason why this needs to be initialized.

> +
> +	if (ms_hyperv.ghcb_base) {
> +		ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +		if (*ghcb_va)
> +			memunmap(*ghcb_va);
> +		*ghcb_va = NULL;
> +	}
> 
>  	hv_common_cpu_die(cpu);
> 
> @@ -383,9 +419,19 @@ void __init hyperv_init(void)
>  			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>  			__builtin_return_address(0));
> -	if (hv_hypercall_pg == NULL) {
> -		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> -		goto remove_cpuhp_state;
> +	if (hv_hypercall_pg == NULL)
> +		goto clean_guest_os_id;
> +
> +	if (hv_isolation_type_snp()) {
> +		ms_hyperv.ghcb_base = alloc_percpu(void *);
> +		if (!ms_hyperv.ghcb_base)
> +			goto clean_guest_os_id;
> +
> +		if (hyperv_init_ghcb()) {
> +			free_percpu(ms_hyperv.ghcb_base);
> +			ms_hyperv.ghcb_base = NULL;
> +			goto clean_guest_os_id;
> +		}

Having the GHCB setup code here splits the hypercall page setup into
two parts, which is unexpected.  First the memory is allocated
for the hypercall page, then the GHCB stuff is done, then the hypercall
MSR is setup.  Is there a need to do this split?  Also, if the GHCB stuff
fails and you goto clean_guest_os_id, the memory allocated for the
hypercall page is never freed.

It's also unexpected to have hyperv_init_ghcb() called here and called
in hv_cpu_init().  Wouldn't it be possible to setup ghcb_base *before*
cpu_setup_state() is called, so that hv_cpu_init() would take care of
calling hyperv_init_ghcb() for the boot CPU?  That's the pattern used
by the VP assist page, the percpu input page, etc.

>  	}
> 
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -456,7 +502,8 @@ void __init hyperv_init(void)
>  	hv_query_ext_cap(0);
>  	return;
> 
> -remove_cpuhp_state:
> +clean_guest_os_id:
> +	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  	cpuhp_remove_state(cpuhp);
>  free_vp_assist_page:
>  	kfree(hv_vp_assist_page);
> @@ -484,6 +531,9 @@ void hyperv_cleanup(void)
>  	 */
>  	hv_hypercall_pg = NULL;
> 
> +	if (ms_hyperv.ghcb_base)
> +		free_percpu(ms_hyperv.ghcb_base);
> +

I don't think this cleanup is necessary.  The primary purpose of
hyperv_cleanup() is to ensure that things like overlay pages are
properly reset in Hyper-V before doing a kexec(), or before
panic'ing and running the kdump kernel.  There's no need to do
general memory free'ing in Linux.  Doing so just adds to the risk
that the panic path could itself fail.

>  	/* Reset the hypercall page */
>  	hypercall_msr.as_uint64 = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -559,3 +609,11 @@ bool hv_is_isolation_supported(void)
>  {
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +bool hv_isolation_type_snp(void)
> +{
> +	return static_branch_unlikely(&isolation_type_snp);
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index adccbc209169..6627cfd2bfba 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -11,6 +11,8 @@
>  #include <asm/paravirt.h>
>  #include <asm/mshyperv.h>
> 
> +DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +
>  typedef int (*hyperv_fill_flush_list_func)(
>  		struct hv_guest_mapping_flush_list *flush,
>  		void *data);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c1ab6a6e72b5..4269f3174e58 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -36,6 +36,7 @@ struct ms_hyperv_info {
>  	u32 max_lp_index;
>  	u32 isolation_config_a;
>  	u32 isolation_config_b;
> +	void  __percpu **ghcb_base;

This doesn't feel like the right place to put this pointer.  The other
fields in the ms_hyperv_info structure are just fixed values obtained
from the CPUID instruction.   The existing patterns similar to ghcb_base
are the VP assist page and the percpu input and output args.  They are
all based on standalone global variables.  It would be more consistent
to do the same with the ghcb_base.

>  };
>  extern struct ms_hyperv_info ms_hyperv;
> 
> @@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void);
>  bool hv_is_hibernation_supported(void);
>  enum hv_isolation_type hv_get_isolation_type(void);
>  bool hv_is_isolation_supported(void);
> +bool hv_isolation_type_snp(void);
>  void hyperv_cleanup(void);
>  bool hv_query_ext_cap(u64 cap_query);
>  #else /* CONFIG_HYPERV */
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.
  2021-08-09 17:56 ` [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the " Tianyu Lan
@ 2021-08-12 19:18   ` Michael Kelley
  2021-08-14 13:32     ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-12 19:18 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.

As with Patch 1, use the "x86/hyperv:" tag in the Subject line.

> 
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Hyper-V exposes shared memory boundary via cpuid
> HYPERV_CPUID_ISOLATION_CONFIG and store it in the
> shared_gpa_boundary of ms_hyperv struct. This prepares
> to share memory with host for SNP guest.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/kernel/cpu/mshyperv.c |  2 ++
>  include/asm-generic/mshyperv.h | 12 +++++++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 6b5835a087a3..2b7f396ef1a5 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>  	if (ms_hyperv.priv_high & HV_ISOLATION) {
>  		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>  		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
> +		ms_hyperv.shared_gpa_boundary =
> +			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;

You could use BIT_ULL() here, but it's kind of a shrug.

> 
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 4269f3174e58..aa26d24a5ca9 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -35,8 +35,18 @@ struct ms_hyperv_info {
>  	u32 max_vp_index;
>  	u32 max_lp_index;
>  	u32 isolation_config_a;
> -	u32 isolation_config_b;
> +	union {
> +		u32 isolation_config_b;
> +		struct {
> +			u32 cvm_type : 4;
> +			u32 Reserved11 : 1;
> +			u32 shared_gpa_boundary_active : 1;
> +			u32 shared_gpa_boundary_bits : 6;
> +			u32 Reserved12 : 20;

Any reason to name the reserved fields as "11" and "12"?  It
just looks a bit unusual.  And I'd suggest lowercase "r".

> +		};
> +	};
>  	void  __percpu **ghcb_base;
> +	u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
  2021-08-09 22:12   ` Dave Hansen
  2021-08-10 11:03   ` Wei Liu
@ 2021-08-12 19:36   ` Michael Kelley
  2021-08-12 21:10   ` Michael Kelley
  3 siblings, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-12 19:36 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

Use "x86/hyperv:" tag in the Subject line.

> 
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v2:
>        * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>          according to platform check.
> 
> Change since v1:
>        * Use new staic call x86_set_memory_enc to avoid add Hyper-V
>          specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile           |   2 +-
>  arch/x86/hyperv/hv_init.c          |   6 ++
>  arch/x86/hyperv/ivm.c              | 114 +++++++++++++++++++++++++++++
>  arch/x86/include/asm/hyperv-tlfs.h |  20 +++++
>  arch/x86/include/asm/mshyperv.h    |   4 +-
>  arch/x86/mm/pat/set_memory.c       |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h     |   1 +
>  8 files changed, 160 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> 
>  bool hv_is_isolation_supported(void)
>  {
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return 0;
> +
> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> +		return 0;
> +
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;

Could all of the tests in this function be run at initialization time, and
a single Boolean value pre-computed that this function returns?  I don't
think any of tests would change during the lifetime of the Linux instance,
so running the tests every time is slower than it needs to be.

>  }
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> new file mode 100644
> index 000000000000..8c905ffdba7f
> --- /dev/null
> +++ b/arch/x86/hyperv/ivm.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V Isolation VM interface with paravisor and hypervisor
> + *
> + * Author:
> + *  Tianyu Lan <Tianyu.Lan@microsoft.com>
> + */
> +
> +#include <linux/hyperv.h>
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
> +#include <linux/slab.h>
> +#include <asm/io.h>
> +#include <asm/mshyperv.h>
> +
> +/*
> + * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
> + *
> + * In Isolation VM, all guest memory is encripted from host and guest
> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host.
> + */
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +			   enum hv_mem_host_visibility visibility)
> +{
> +	struct hv_gpa_range_for_visibility **input_pcpu, *input;
> +	u16 pages_processed;
> +	u64 hv_status;
> +	unsigned long flags;
> +
> +	/* no-op if partition isolation is not enabled */
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> +			HV_MAX_MODIFY_GPA_REP_COUNT);
> +		return -EINVAL;
> +	}
> +
> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_gpa_range_for_visibility **)
> +			this_cpu_ptr(hyperv_pcpu_input_arg);
> +	input = *input_pcpu;
> +	if (unlikely(!input)) {
> +		local_irq_restore(flags);
> +		return -EINVAL;
> +	}
> +
> +	input->partition_id = HV_PARTITION_ID_SELF;
> +	input->host_visibility = visibility;
> +	input->reserved0 = 0;
> +	input->reserved1 = 0;
> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> +	hv_status = hv_do_rep_hypercall(
> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> +			0, input, &pages_processed);
> +	local_irq_restore(flags);
> +
> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> +		return 0;

pages_processed should also be checked to ensure that it equals count.
If not, something has gone wrong in the hypercall.

> +
> +	return hv_status & HV_HYPERCALL_RESULT_MASK;
> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);
> +
> +static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount,
> +				      enum hv_mem_host_visibility visibility)
> +{
> +	u64 *pfn_array;
> +	int ret = 0;
> +	int i, pfn;
> +
> +	if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
> +		return 0;
> +
> +	pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);

Does the page need to be zero'ed?  All bytes that are used will
be explicitly written in the loop below.

> +	if (!pfn_array)
> +		return -ENOMEM;
> +
> +	for (i = 0, pfn = 0; i < pagecount; i++) {
> +		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
> +		pfn++;
> +
> +		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
> +			ret |= hv_mark_gpa_visibility(pfn, pfn_array,
> +					visibility);

I don't see why value of "ret" is OR'ed.   If the result of hv_mark_gpa_visibility()
is ever non-zero, we'll exit immediately.  There's no need to accumulate the
results of multiple calls to hv_mark_gpa_visibility().

> +			pfn = 0;
> +
> +			if (ret)
> +				goto err_free_pfn_array;
> +		}
> +	}
> +
> + err_free_pfn_array:
> +	kfree(pfn_array);
> +	return ret;
> +}
> +
> +/*
> + * hv_set_mem_host_visibility - Set specified memory visible to host.
> + *
> + * In Isolation VM, all guest memory is encrypted from host and guest
> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host. This function works as wrap of hv_mark_gpa_visibility()
> + * with memory base and size.
> + */
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
> +{
> +	enum hv_mem_host_visibility visibility = visible ?
> +			VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE;
> +
> +	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
> +}
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 2322d6bd5883..1691d2bce0b7 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -276,6 +276,13 @@ enum hv_isolation_type {
>  #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
>  #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
> 
> +/* Hyper-V memory host visibility */
> +enum hv_mem_host_visibility {
> +	VMBUS_PAGE_NOT_VISIBLE		= 0,
> +	VMBUS_PAGE_VISIBLE_READ_ONLY	= 1,
> +	VMBUS_PAGE_VISIBLE_READ_WRITE	= 3
> +};
> +
>  /*
>   * Declare the MSR used to setup pages used to communicate with the hypervisor.
>   */
> @@ -587,4 +594,17 @@ enum hv_interrupt_type {
> 
>  #include <asm-generic/hyperv-tlfs.h>
> 
> +/* All input parameters should be in single page. */
> +#define HV_MAX_MODIFY_GPA_REP_COUNT		\
> +	((PAGE_SIZE / sizeof(u64)) - 2)
> +
> +/* HvCallModifySparseGpaPageHostVisibility hypercall */
> +struct hv_gpa_range_for_visibility {
> +	u64 partition_id;
> +	u32 host_visibility:2;
> +	u32 reserved0:30;
> +	u32 reserved1;
> +	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
> +} __packed;
> +

We should avoid adding definitions *after* the #include of
<asm-generic/hyperv-tlfs.h>.   That #include should be last.  Any
reason these can't go earlier?  And they really go together with
enum hv_mem_host_visibility.

Separately, take a look at how the structure hv_memory_hint
and HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES is handled.
It's a close parallel to what you are doing above, and is a slightly
cleaner approach.

>  #endif
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 6627cfd2bfba..87a386fa97f7 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -190,7 +190,9 @@ struct irq_domain *hv_create_pci_msi_domain(void);
>  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>  		struct hv_interrupt_entry *entry);
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> -
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +			   enum hv_mem_host_visibility visibility);
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index ad8a5c586a35..1e4a0882820a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -29,6 +29,8 @@
>  #include <asm/proto.h>
>  #include <asm/memtype.h>
>  #include <asm/set_memory.h>
> +#include <asm/hyperv-tlfs.h>
> +#include <asm/mshyperv.h>
> 
>  #include "../mm_internal.h"
> 
> @@ -1980,15 +1982,11 @@ int set_memory_global(unsigned long addr, int numpages)
>  				    __pgprot(_PAGE_GLOBAL), 0);
>  }
> 
> -static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>  {
>  	struct cpa_data cpa;
>  	int ret;
> 
> -	/* Nothing to do if memory encryption is not active */
> -	if (!mem_encrypt_active())
> -		return 0;
> -
>  	/* Should not be working on unaligned addresses */
>  	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>  		addr &= PAGE_MASK;
> @@ -2023,6 +2021,17 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	return ret;
>  }
> 
> +static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> +	if (hv_is_isolation_supported())
> +		return hv_set_mem_host_visibility(addr, numpages, !enc);
> +
> +	if (mem_encrypt_active())
> +		return __set_memory_enc_pgtable(addr, numpages, enc);
> +
> +	return 0;
> +}
> +
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
>  	return __set_memory_enc_dec(addr, numpages, true);
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 56348a541c50..8ed6733d5146 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_RETARGET_INTERRUPT		0x007e
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> +#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
> 
>  /* Extended hypercalls */
>  #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index aa26d24a5ca9..079988ed45b9 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -255,6 +255,7 @@ bool hv_query_ext_cap(u64 cap_query);
>  static inline bool hv_is_hyperv_initialized(void) { return false; }
>  static inline bool hv_is_hibernation_supported(void) { return false; }
>  static inline void hyperv_cleanup(void) {}
> +static inline hv_is_isolation_supported(void);
>  #endif /* CONFIG_HYPERV */
> 
>  #endif
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support
  2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
                     ` (2 preceding siblings ...)
  2021-08-12 19:36   ` Michael Kelley
@ 2021-08-12 21:10   ` Michael Kelley
  3 siblings, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-12 21:10 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM

[snip]

> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index ad8a5c586a35..1e4a0882820a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -29,6 +29,8 @@
>  #include <asm/proto.h>
>  #include <asm/memtype.h>
>  #include <asm/set_memory.h>
> +#include <asm/hyperv-tlfs.h>
> +#include <asm/mshyperv.h>
> 
>  #include "../mm_internal.h"
> 
> @@ -1980,15 +1982,11 @@ int set_memory_global(unsigned long addr, int numpages)
>  				    __pgprot(_PAGE_GLOBAL), 0);
>  }
> 
> -static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>  {
>  	struct cpa_data cpa;
>  	int ret;
> 
> -	/* Nothing to do if memory encryption is not active */
> -	if (!mem_encrypt_active())
> -		return 0;
> -
>  	/* Should not be working on unaligned addresses */
>  	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>  		addr &= PAGE_MASK;
> @@ -2023,6 +2021,17 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	return ret;
>  }
> 
> +static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> +	if (hv_is_isolation_supported())
> +		return hv_set_mem_host_visibility(addr, numpages, !enc);
> +
> +	if (mem_encrypt_active())
> +		return __set_memory_enc_pgtable(addr, numpages, enc);
> +
> +	return 0;
> +}
> +

FYI, this not-yet-accepted patch
https://lore.kernel.org/lkml/ab5a7a983a943e7ca0a7ad28275a2d094c62c371.1623421410.git.ashish.kalra@amd.com/
looks to be providing a generic hook to notify the hypervisor when the
encryption status of a memory range changes.

Michael

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-09 17:56 ` [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
@ 2021-08-12 22:20   ` Michael Kelley
  2021-08-15 15:21     ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-12 22:20 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
> 

Use tag "Drivers: hv: vmbus:" in the Subject line.

> Mark vmbus ring buffer visible with set_memory_decrypted() when
> establish gpadl handle.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/hv/channel.c   | 44 ++++++++++++++++++++++++++++++++++++++++--
>  include/linux/hyperv.h | 11 +++++++++++
>  2 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..4c4717c26240 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/uio.h>
>  #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
>  #include <asm/page.h>
>  #include <asm/mshyperv.h>
> 
> @@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	struct list_head *curr;
>  	u32 next_gpadl_handle;
>  	unsigned long flags;
> -	int ret = 0;
> +	int ret = 0, index;
> +
> +	index = atomic_inc_return(&channel->gpadl_index) - 1;
> +
> +	if (index > VMBUS_GPADL_RANGE_COUNT - 1) {
> +		pr_err("Gpadl handle position(%d) has been occupied.\n", index);
> +		return -ENOSPC;
> +	}
> 
>  	next_gpadl_handle =
>  		(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
> @@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	if (ret)
>  		return ret;
> 
> +	ret = set_memory_decrypted((unsigned long)kbuffer,
> +				   HVPFN_UP(size));
> +	if (ret) {
> +		pr_warn("Failed to set host visibility.\n");

Enhance this message a bit.  "Failed to set host visibility for new GPADL\n"
and also output the value of ret.

> +		return ret;
> +	}
> +
>  	init_completion(&msginfo->waitevent);
>  	msginfo->waiting_channel = channel;
> 
> @@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	/* At this point, we received the gpadl created msg */
>  	*gpadl_handle = gpadlmsg->gpadl;
> 
> +	channel->gpadl_array[index].size = size;
> +	channel->gpadl_array[index].buffer = kbuffer;
> +	channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
> +

I can see the merits of transparently stashing the memory address and size
that will be needed by vmbus_teardown_gpadl(), so that the callers of
__vmbus_establish_gpadl() don't have to worry about it.  But doing the
stashing transparently is somewhat messy.

Given that the callers are already have memory allocated to save the
GPADL handle, a little refactoring would make for a much cleaner solution.
Instead of having memory allocated for the 32-bit GPADL handle, callers
should allocate the slightly larger struct vmbus_gpadl that you've
defined below.  The calling interfaces can be updated to take a pointer
to this structure instead of a pointer to the 32-bit GPADL handle, and
you can save the memory address and size right along with the GPADL
handle.  This approach touches a few more files, but I think there are
only two callers outside of the channel management code -- netvsc
and hv_uio -- so it's not a big change.

>  cleanup:
>  	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
>  	list_del(&msginfo->msglistentry);
> @@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	}
> 
>  	kfree(msginfo);
> +
> +	if (ret) {
> +		set_memory_encrypted((unsigned long)kbuffer,
> +				     HVPFN_UP(size));
> +		atomic_dec(&channel->gpadl_index);
> +	}
> +
>  	return ret;
>  }
> 
> @@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> 
>  	/* Establish the gpadl for the ring buffer */
>  	newchannel->ringbuffer_gpadlhandle = 0;
> +	atomic_set(&newchannel->gpadl_index, 0);
> 
>  	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
>  				      page_address(newchannel->ringbuffer_page),
> @@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>  	struct vmbus_channel_gpadl_teardown *msg;
>  	struct vmbus_channel_msginfo *info;
>  	unsigned long flags;
> -	int ret;
> +	int ret, i;
> 
>  	info = kzalloc(sizeof(*info) +
>  		       sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
> @@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>  	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
> 
>  	kfree(info);
> +
> +	/* Find gpadl buffer virtual address and size. */
> +	for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
> +		if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
> +			break;
> +
> +	if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
> +			HVPFN_UP(channel->gpadl_array[i].size)))
> +		pr_warn("Fail to set mem host visibility.\n");

Enhance this message a bit: "Failed to set host visibility in GPADL teardown\n".
Also output the returned error code to help in debugging any occurrences of
a failure.

> +
> +	channel->gpadl_array[i].gpadlhandle = 0;
> +	atomic_dec(&channel->gpadl_index);

Note that the array entry being cleared (index "i") may not
be the last used entry in the array, so decrementing the gpadl_index
might not have the right behavior.  But I'm pretty sure that all
GPADL's for a channel are freed in sequence with no intervening
allocations, so nothing actually breaks.  But this is one of the 
messy areas with stashing the memory address and size transparently
to the caller.

> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index ddc8713ce57b..90b542597143 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -803,6 +803,14 @@ struct vmbus_device {
> 
>  #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
> 
> +struct vmbus_gpadl {
> +	u32 gpadlhandle;
> +	u32 size;
> +	void *buffer;
> +};
> +
> +#define VMBUS_GPADL_RANGE_COUNT		3
> +
>  struct vmbus_channel {
>  	struct list_head listentry;
> 
> @@ -823,6 +831,9 @@ struct vmbus_channel {
>  	struct completion rescind_event;
> 
>  	u32 ringbuffer_gpadlhandle;
> +	/* GPADL_RING and Send/Receive GPADL_BUFFER. */
> +	struct vmbus_gpadl gpadl_array[VMBUS_GPADL_RANGE_COUNT];
> +	atomic_t gpadl_index;
> 
>  	/* Allocated memory for ring buffer */
>  	struct page *ringbuffer_page;
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
  2021-08-12 19:14   ` Michael Kelley
@ 2021-08-13 15:46     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-13 15:46 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

Hi Michael:
      Thanks for your review.

On 8/13/2021 3:14 AM, Michael Kelley wrote:
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
>> Subject: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM
> 
> The subject line tag on patches under arch/x86/hyperv is generally "x86/hyperv:".
> There's some variation in the spelling of "hyperv", but let's go with the all
> lowercase "hyperv".

OK. Will update.

> 
>>
>> Hyper-V exposes GHCB page via SEV ES GHCB MSR for SNP guest
>> to communicate with hypervisor. Map GHCB page for all
>> cpus to read/write MSR register and submit hvcall request
>> via GHCB.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>>   arch/x86/hyperv/hv_init.c       | 66 +++++++++++++++++++++++++++++++--
>>   arch/x86/include/asm/mshyperv.h |  2 +
>>   include/asm-generic/mshyperv.h  |  2 +
>>   3 files changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 708a2712a516..0bb4d9ca7a55 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -20,6 +20,7 @@
>>   #include <linux/kexec.h>
>>   #include <linux/version.h>
>>   #include <linux/vmalloc.h>
>> +#include <linux/io.h>
>>   #include <linux/mm.h>
>>   #include <linux/hyperv.h>
>>   #include <linux/slab.h>
>> @@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
>>   struct hv_vp_assist_page **hv_vp_assist_page;
>>   EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>>
>> +static int hyperv_init_ghcb(void)
>> +{
>> +	u64 ghcb_gpa;
>> +	void *ghcb_va;
>> +	void **ghcb_base;
>> +
>> +	if (!ms_hyperv.ghcb_base)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * GHCB page is allocated by paravisor. The address
>> +	 * returned by MSR_AMD64_SEV_ES_GHCB is above shared
>> +	 * ghcb boundary and map it here.
>> +	 */
>> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
>> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
>> +	if (!ghcb_va)
>> +		return -ENOMEM;
>> +
>> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> +	*ghcb_base = ghcb_va;
>> +
>> +	return 0;
>> +}
>> +
>>   static int hv_cpu_init(unsigned int cpu)
>>   {
>>   	union hv_vp_assist_msr_contents msr = { 0 };
>> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>>   		}
>>   	}
>>
>> +	hyperv_init_ghcb();
>> +
>>   	return 0;
>>   }
>>
>> @@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu)
>>   {
>>   	struct hv_reenlightenment_control re_ctrl;
>>   	unsigned int new_cpu;
>> +	void **ghcb_va = NULL;
> 
> I'm not seeing any reason why this needs to be initialized.
> 
>> +
>> +	if (ms_hyperv.ghcb_base) {
>> +		ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> +		if (*ghcb_va)
>> +			memunmap(*ghcb_va);
>> +		*ghcb_va = NULL;
>> +	}
>>
>>   	hv_common_cpu_die(cpu);
>>
>> @@ -383,9 +419,19 @@ void __init hyperv_init(void)
>>   			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>>   			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>>   			__builtin_return_address(0));
>> -	if (hv_hypercall_pg == NULL) {
>> -		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>> -		goto remove_cpuhp_state;
>> +	if (hv_hypercall_pg == NULL)
>> +		goto clean_guest_os_id;
>> +
>> +	if (hv_isolation_type_snp()) {
>> +		ms_hyperv.ghcb_base = alloc_percpu(void *);
>> +		if (!ms_hyperv.ghcb_base)
>> +			goto clean_guest_os_id;
>> +
>> +		if (hyperv_init_ghcb()) {
>> +			free_percpu(ms_hyperv.ghcb_base);
>> +			ms_hyperv.ghcb_base = NULL;
>> +			goto clean_guest_os_id;
>> +		}
> 
> Having the GHCB setup code here splits the hypercall page setup into
> two parts, which is unexpected.  First the memory is allocated
> for the hypercall page, then the GHCB stuff is done, then the hypercall
> MSR is setup.  Is there a need to do this split?  Also, if the GHCB stuff
> fails and you goto clean_guest_os_id, the memory allocated for the
> hypercall page is never freed.


Just not enable hypercall when fails to setup ghcb. Otherwise, we need 
to disable hypercall in the failure code path.

Yes,hypercall page should be freed in the clean_guest_os_id path.

> 
> It's also unexpected to have hyperv_init_ghcb() called here and called
> in hv_cpu_init().  Wouldn't it be possible to setup ghcb_base *before*
> cpu_setup_state() is called, so that hv_cpu_init() would take care of
> calling hyperv_init_ghcb() for the boot CPU?  That's the pattern used
> by the VP assist page, the percpu input page, etc.

I will have a try and report back. Thanks for suggestion.

> 
>>   	}
>>
>>   	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> @@ -456,7 +502,8 @@ void __init hyperv_init(void)
>>   	hv_query_ext_cap(0);
>>   	return;
>>
>> -remove_cpuhp_state:
>> +clean_guest_os_id:
>> +	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>>   	cpuhp_remove_state(cpuhp);
>>   free_vp_assist_page:
>>   	kfree(hv_vp_assist_page);
>> @@ -484,6 +531,9 @@ void hyperv_cleanup(void)
>>   	 */
>>   	hv_hypercall_pg = NULL;
>>
>> +	if (ms_hyperv.ghcb_base)
>> +		free_percpu(ms_hyperv.ghcb_base);
>> +
> 
> I don't think this cleanup is necessary.  The primary purpose of
> hyperv_cleanup() is to ensure that things like overlay pages are
> properly reset in Hyper-V before doing a kexec(), or before
> panic'ing and running the kdump kernel.  There's no need to do
> general memory free'ing in Linux.  Doing so just adds to the risk
> that the panic path could itself fail.

Nice catch. I will remove this.

> 
>>   	/* Reset the hypercall page */
>>   	hypercall_msr.as_uint64 = 0;
>>   	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> @@ -559,3 +609,11 @@ bool hv_is_isolation_supported(void)
>>   {
>>   	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>>   }
>> +
>> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
>> +
>> +bool hv_isolation_type_snp(void)
>> +{
>> +	return static_branch_unlikely(&isolation_type_snp);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index adccbc209169..6627cfd2bfba 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -11,6 +11,8 @@
>>   #include <asm/paravirt.h>
>>   #include <asm/mshyperv.h>
>>
>> +DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
>> +
>>   typedef int (*hyperv_fill_flush_list_func)(
>>   		struct hv_guest_mapping_flush_list *flush,
>>   		void *data);
>> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
>> index c1ab6a6e72b5..4269f3174e58 100644
>> --- a/include/asm-generic/mshyperv.h
>> +++ b/include/asm-generic/mshyperv.h
>> @@ -36,6 +36,7 @@ struct ms_hyperv_info {
>>   	u32 max_lp_index;
>>   	u32 isolation_config_a;
>>   	u32 isolation_config_b;
>> +	void  __percpu **ghcb_base;
> 
> This doesn't feel like the right place to put this pointer.  The other
> fields in the ms_hyperv_info structure are just fixed values obtained
> from the CPUID instruction.   The existing patterns similar to ghcb_base
> are the VP assist page and the percpu input and output args.  They are
> all based on standalone global variables.  It would be more consistent
> to do the same with the ghcb_base.

OK. I will update in the next version.

> 
>>   };
>>   extern struct ms_hyperv_info ms_hyperv;
>>
>> @@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void);
>>   bool hv_is_hibernation_supported(void);
>>   enum hv_isolation_type hv_get_isolation_type(void);
>>   bool hv_is_isolation_supported(void);
>> +bool hv_isolation_type_snp(void);
>>   void hyperv_cleanup(void);
>>   bool hv_query_ext_cap(u64 cap_query);
>>   #else /* CONFIG_HYPERV */
>> --
>> 2.25.1
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-12 12:27   ` Christoph Hellwig
@ 2021-08-13 17:58     ` Tianyu Lan
  2021-08-16 14:50       ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-13 17:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On 8/12/2021 8:27 PM, Christoph Hellwig wrote:
> This is still broken.  You need to make sure the actual DMA allocations
> do have struct page backing.
> 

Hi Christoph:
      swiotlb_tbl_map_single() still returns PA below vTOM/share_gpa_
boundary. These PAs has backing pages and belong to system memory.
In other word, all PAs passed to DMA API have backing pages and these is 
no difference between Isolation guest and traditional guest for DMA API.
The new mapped VA for PA above vTOM here is just to access the bounce 
buffer in the swiotlb code and isn't exposed to outside.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
  2021-08-09 17:56 ` [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page Tianyu Lan
@ 2021-08-13 19:31   ` Michael Kelley
  2021-08-13 20:26     ` Michael Kelley
  2021-08-24  8:45   ` Christoph Hellwig
  1 sibling, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-13 19:31 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page

See previous comments about tag in the Subject line.

> Hyper-V provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VM with AMD SEV SNP
> and these registers are emulated by hypervisor directly.
> Hyper-V requires to write SINTx MSR registers twice. First
> writes MSR via GHCB page to communicate with hypervisor
> and then writes wrmsr instruction to talk with paravisor
> which runs in VMPL0. Guest OS ID MSR also needs to be set
> via GHCB.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v1:
>          * Introduce sev_es_ghcb_hv_call_simple() and share code
>            between SEV and Hyper-V code.
> ---
>  arch/x86/hyperv/hv_init.c       |  33 ++-------
>  arch/x86/hyperv/ivm.c           | 110 +++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  78 +++++++++++++++++++-
>  arch/x86/include/asm/sev.h      |   3 +
>  arch/x86/kernel/cpu/mshyperv.c  |   3 +
>  arch/x86/kernel/sev-shared.c    |  63 ++++++++++-------
>  drivers/hv/hv.c                 | 121 ++++++++++++++++++++++----------
>  include/asm-generic/mshyperv.h  |  12 +++-
>  8 files changed, 329 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b3683083208a..ab0b33f621e7 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -423,7 +423,7 @@ void __init hyperv_init(void)
>  		goto clean_guest_os_id;
> 
>  	if (hv_isolation_type_snp()) {
> -		ms_hyperv.ghcb_base = alloc_percpu(void *);
> +		ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);

union hv_ghcb isn't defined.  It is not added until patch 6 of the series.

>  		if (!ms_hyperv.ghcb_base)
>  			goto clean_guest_os_id;
> 
> @@ -432,6 +432,9 @@ void __init hyperv_init(void)
>  			ms_hyperv.ghcb_base = NULL;
>  			goto clean_guest_os_id;
>  		}
> +
> +		/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> +		hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>  	}
> 
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -523,6 +526,7 @@ void hyperv_cleanup(void)
> 
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>  	/*
>  	 * Reset hypercall page reference before reset the page,
> @@ -596,30 +600,3 @@ bool hv_is_hyperv_initialized(void)
>  	return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> -	if (!(ms_hyperv.priv_high & HV_ISOLATION))
> -		return HV_ISOLATION_TYPE_NONE;
> -	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> -	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> -		return 0;
> -
> -	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> -		return 0;
> -
> -	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -
> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> -
> -bool hv_isolation_type_snp(void)
> -{
> -	return static_branch_unlikely(&isolation_type_snp);
> -}
> -EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 8c905ffdba7f..ec0e5c259740 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,6 +6,8 @@
>   *  Tianyu Lan <Tianyu.Lan@microsoft.com>
>   */
> 
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
>  #include <linux/hyperv.h>
>  #include <linux/types.h>
>  #include <linux/bitfield.h>
> @@ -13,6 +15,114 @@
>  #include <asm/io.h>
>  #include <asm/mshyperv.h>
> 
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> +	ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);

Having used lower_32_bits() in the previous line, perhaps use
upper_32_bits() here?

> +
> +	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
> +		pr_warn("Fail to write msr via ghcb %llx.\n", msr);
> +
> +	local_irq_restore(flags);
> +}
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
> +		pr_warn("Fail to read msr via ghcb %llx.\n", msr);
> +	else
> +		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
> +			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
> +	local_irq_restore(flags);
> +}
> +
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
> +{
> +	hv_ghcb_msr_read(msr, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
> +
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
> +{
> +	hv_ghcb_msr_write(msr, value);
> +
> +	/* Write proxy bit vua wrmsrl instruction. */

s/vua/via/

> +	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
> +		wrmsrl(msr, value | 1 << 20);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
> +
> +void hv_signal_eom_ghcb(void)
> +{
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
> +}
> +EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);

Is there a reason for this to be a function instead of a #define?
Seemingly parallel calls such as  hv_set_synic_state_ghcb()
are #defines.

> +
> +enum hv_isolation_type hv_get_isolation_type(void)
> +{
> +	if (!(ms_hyperv.priv_high & HV_ISOLATION))
> +		return HV_ISOLATION_TYPE_NONE;
> +	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> +}
> +EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> +
> +/*
> + * hv_is_isolation_supported - Check system runs in the Hyper-V
> + * isolation VM.
> + */
> +bool hv_is_isolation_supported(void)
> +{
> +	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> +}
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +/*
> + * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
> + * isolation VM.
> + */
> +bool hv_isolation_type_snp(void)
> +{
> +	return static_branch_unlikely(&isolation_type_snp);
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> +

hv_isolation_type_snp() is implemented here in a file under arch/x86,
but it is called from architecture independent code in drivers/hv, so it
needs to do the right thing on ARM64 as well as x86.  For an example,
see the handling of hv_is_isolation_supported() in the latest linux-next
tree.

>  /*
>   * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
>   *
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 87a386fa97f7..730985676ea3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -30,6 +30,63 @@ static inline u64 hv_get_register(unsigned int reg)
>  	return value;
>  }
> 
> +#define hv_get_sint_reg(val, reg) {		\
> +	if (hv_isolation_type_snp())		\
> +		hv_get_##reg##_ghcb(&val);	\
> +	else					\
> +		rdmsrl(HV_X64_MSR_##reg, val);	\
> +	}
> +
> +#define hv_set_sint_reg(val, reg) {		\
> +	if (hv_isolation_type_snp())		\
> +		hv_set_##reg##_ghcb(val);	\
> +	else					\
> +		wrmsrl(HV_X64_MSR_##reg, val);	\
> +	}
> +
> +
> +#define hv_get_simp(val) hv_get_sint_reg(val, SIMP)
> +#define hv_get_siefp(val) hv_get_sint_reg(val, SIEFP)
> +
> +#define hv_set_simp(val) hv_set_sint_reg(val, SIMP)
> +#define hv_set_siefp(val) hv_set_sint_reg(val, SIEFP)
> +
> +#define hv_get_synic_state(val) {			\
> +	if (hv_isolation_type_snp())			\
> +		hv_get_synic_state_ghcb(&val);		\
> +	else						\
> +		rdmsrl(HV_X64_MSR_SCONTROL, val);	\

Many of these registers that exist on x86 and ARM64 architectures
have new names without the "X64_MSR" portion.  For
example, include/asm-generic/hyperv-tlfs.h defines
HV_REGISTER_SCONTROL.  The x86-specific version of 
hyperv-tlfs.h currently has a #define for HV_X64_MSR_SCONTROL,
but we would like to get rid of these temporary aliases.
So prefer to use HV_REGISTER_SCONTROL.

Same comment applies several places in this code for other
similar registers.

> +	}
> +#define hv_set_synic_state(val) {			\
> +	if (hv_isolation_type_snp())			\
> +		hv_set_synic_state_ghcb(val);		\
> +	else						\
> +		wrmsrl(HV_X64_MSR_SCONTROL, val);	\
> +	}
> +
> +#define hv_get_vp_index(index) rdmsrl(HV_X64_MSR_VP_INDEX, index)
> +
> +#define hv_signal_eom() {			 \
> +	if (hv_isolation_type_snp() &&		 \
> +	    old_msg_type != HVMSG_TIMER_EXPIRED) \

Why is a test of the old_msg_type embedded in this macro?
And given that old_msg_type isn't a parameter of the
macro, its use is really weird.

> +		hv_signal_eom_ghcb();		 \
> +	else					 \
> +		wrmsrl(HV_X64_MSR_EOM, 0);	 \
> +	}
> +
> +#define hv_get_synint_state(int_num, val) {		\
> +	if (hv_isolation_type_snp())			\0
> +		hv_get_synint_state_ghcb(int_num, &val);\
> +	else						\
> +		rdmsrl(HV_X64_MSR_SINT0 + int_num, val);\
> +	}
> +#define hv_set_synint_state(int_num, val) {		\
> +	if (hv_isolation_type_snp())			\
> +		hv_set_synint_state_ghcb(int_num, val);	\
> +	else						\
> +		wrmsrl(HV_X64_MSR_SINT0 + int_num, val);\
> +	}
> +
>  #define hv_get_raw_timer() rdtsc_ordered()
> 
>  void hyperv_vector_handler(struct pt_regs *regs);
> @@ -193,6 +250,25 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  			   enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
> +void hv_signal_eom_ghcb(void);
> +void hv_ghcb_msr_write(u64 msr, u64 value);
> +void hv_ghcb_msr_read(u64 msr, u64 *value);
> +
> +#define hv_get_synint_state_ghcb(int_num, val)			\
> +	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +#define hv_set_synint_state_ghcb(int_num, val) \
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +
> +#define hv_get_SIMP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +#define hv_set_SIMP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +
> +#define hv_get_SIEFP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +#define hv_set_SIEFP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +
> +#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
> +#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> @@ -209,9 +285,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
>  {
>  	return -1;
>  }
> +static inline void hv_signal_eom_ghcb(void) { };
>  #endif /* CONFIG_HYPERV */
> 
> -
>  #include <asm-generic/mshyperv.h>
> 
>  #endif
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index fa5cd05d3b5b..81beb2a8031b 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -81,6 +81,9 @@ static __always_inline void sev_es_nmi_complete(void)
>  		__sev_es_nmi_complete();
>  }
>  extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
> +extern enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2);
>  #else
>  static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>  static inline void sev_es_ist_exit(void) { }
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 2b7f396ef1a5..3633f871ac1e 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -318,6 +318,9 @@ static void __init ms_hyperv_init_platform(void)
> 
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> +
> +		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> +			static_branch_enable(&isolation_type_snp);
>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 9f90f460a28c..dd7f37de640b 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -94,10 +94,9 @@ static void vc_finish_insn(struct es_em_ctxt *ctxt)
>  	ctxt->regs->ip += ctxt->insn.length;
>  }
> 
> -static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> -					  struct es_em_ctxt *ctxt,
> -					  u64 exit_code, u64 exit_info_1,
> -					  u64 exit_info_2)
> +enum es_result sev_es_ghcb_hv_call_simple(struct ghcb *ghcb,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
>  {
>  	enum es_result ret;
> 
> @@ -109,29 +108,45 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>  	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>  	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
> 
> -	sev_es_wr_ghcb_msr(__pa(ghcb));
>  	VMGEXIT();
> 
> -	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
> -		u64 info = ghcb->save.sw_exit_info_2;
> -		unsigned long v;
> -
> -		info = ghcb->save.sw_exit_info_2;
> -		v = info & SVM_EVTINJ_VEC_MASK;
> -
> -		/* Check if exception information from hypervisor is sane. */
> -		if ((info & SVM_EVTINJ_VALID) &&
> -		    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
> -		    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
> -			ctxt->fi.vector = v;
> -			if (info & SVM_EVTINJ_VALID_ERR)
> -				ctxt->fi.error_code = info >> 32;
> -			ret = ES_EXCEPTION;
> -		} else {
> -			ret = ES_VMM_ERROR;
> -		}
> -	} else {
> +	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1)
> +		ret = ES_VMM_ERROR;
> +	else
>  		ret = ES_OK;
> +
> +	return ret;
> +}
> +
> +static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> +				   struct es_em_ctxt *ctxt,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
> +{
> +	unsigned long v;
> +	enum es_result ret;
> +	u64 info;
> +
> +	sev_es_wr_ghcb_msr(__pa(ghcb));
> +
> +	ret = sev_es_ghcb_hv_call_simple(ghcb, exit_code, exit_info_1,
> +					 exit_info_2);
> +	if (ret == ES_OK)
> +		return ret;
> +
> +	info = ghcb->save.sw_exit_info_2;
> +	v = info & SVM_EVTINJ_VEC_MASK;
> +
> +	/* Check if exception information from hypervisor is sane. */
> +	if ((info & SVM_EVTINJ_VALID) &&
> +	    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
> +	    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
> +		ctxt->fi.vector = v;
> +		if (info & SVM_EVTINJ_VALID_ERR)
> +			ctxt->fi.error_code = info >> 32;
> +		ret = ES_EXCEPTION;
> +	} else {
> +		ret = ES_VMM_ERROR;
>  	}
> 
>  	return ret;
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index e83507f49676..59f7173c4d9f 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -8,6 +8,7 @@
>   */
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> 
> +#include <linux/io.h>
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/slab.h>
> @@ -136,17 +137,24 @@ int hv_synic_alloc(void)
>  		tasklet_init(&hv_cpu->msg_dpc,
>  			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
> 
> -		hv_cpu->synic_message_page =
> -			(void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_message_page == NULL) {
> -			pr_err("Unable to allocate SYNIC message page\n");
> -			goto err;
> -		}
> +		/*
> +		 * Synic message and event pages are allocated by paravisor.
> +		 * Skip these pages allocation here.
> +		 */
> +		if (!hv_isolation_type_snp()) {
> +			hv_cpu->synic_message_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_message_page == NULL) {
> +				pr_err("Unable to allocate SYNIC message page\n");
> +				goto err;
> +			}
> 
> -		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_event_page == NULL) {
> -			pr_err("Unable to allocate SYNIC event page\n");
> -			goto err;
> +			hv_cpu->synic_event_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_event_page == NULL) {
> +				pr_err("Unable to allocate SYNIC event page\n");
> +				goto err;
> +			}
>  		}
> 
>  		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> @@ -173,10 +181,17 @@ void hv_synic_free(void)
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> +		free_page((unsigned long)hv_cpu->post_msg_page);
> +
> +		/*
> +		 * Synic message and event pages are allocated by paravisor.
> +		 * Skip free these pages here.
> +		 */
> +		if (hv_isolation_type_snp())
> +			continue;
> 
>  		free_page((unsigned long)hv_cpu->synic_event_page);
>  		free_page((unsigned long)hv_cpu->synic_message_page);
> -		free_page((unsigned long)hv_cpu->post_msg_page);

You could skip making these changes to hv_synic_free().  If the message
and event pages aren't allocated, the pointers will be NULL and
free_page() will happily do nothing.

>  	}
> 
>  	kfree(hv_context.hv_numa_map);
> @@ -199,26 +214,43 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	union hv_synic_scontrol sctrl;
> 
>  	/* Setup the Synic's message page */
> -	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
> +	hv_get_simp(simp.as_uint64);

This code is intended to be architecture independent and builds for
x86 and for ARM64.  Changing the use of hv_get_register() and hv_set_register()
will fail badly when built for ARM64.   I haven't completely thought through
what the best solution might be, but the current set of mappings from
hv_get_simp() down to hv_ghcb_msr_read() isn't going to work on ARM64.

Is it possible to hide all the x86-side complexity in the implementation of
hv_get_register()?   Certain MSRs would have to be special-cased when
SNP isolation is enabled, but that might be easier than trying to no-op out
the ghcb machinery on the ARM64 side. 

>  	simp.simp_enabled = 1;
> -	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> -		>> HV_HYP_PAGE_SHIFT;
> 
> -	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> +	if (hv_isolation_type_snp()) {
> +		hv_cpu->synic_message_page
> +			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
> +				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> +		if (!hv_cpu->synic_message_page)
> +			pr_err("Fail to map syinc message page.\n");
> +	} else {
> +		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	}
> +
> +	hv_set_simp(simp.as_uint64);
> 
>  	/* Setup the Synic's event page */
> -	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
> +	hv_get_siefp(siefp.as_uint64);
>  	siefp.siefp_enabled = 1;
> -	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> -		>> HV_HYP_PAGE_SHIFT;
> 
> -	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> +	if (hv_isolation_type_snp()) {
> +		hv_cpu->synic_event_page =
> +			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
> +				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> +
> +		if (!hv_cpu->synic_event_page)
> +			pr_err("Fail to map syinc event page.\n");
> +	} else {
> +		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	}
> +	hv_set_siefp(siefp.as_uint64);
> 
>  	/* Setup the shared SINT. */
>  	if (vmbus_irq != -1)
>  		enable_percpu_irq(vmbus_irq, 0);
> -	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
> -					VMBUS_MESSAGE_SINT);
> +	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> 
>  	shared_sint.vector = vmbus_interrupt;
>  	shared_sint.masked = false;
> @@ -233,14 +265,12 @@ void hv_synic_enable_regs(unsigned int cpu)
>  #else
>  	shared_sint.auto_eoi = 0;
>  #endif
> -	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
> -				shared_sint.as_uint64);
> +	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> 
>  	/* Enable the global synic bit */
> -	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
> +	hv_get_synic_state(sctrl.as_uint64);
>  	sctrl.enable = 1;
> -
> -	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
> +	hv_set_synic_state(sctrl.as_uint64);
>  }
> 
>  int hv_synic_init(unsigned int cpu)
> @@ -257,37 +287,50 @@ int hv_synic_init(unsigned int cpu)
>   */
>  void hv_synic_disable_regs(unsigned int cpu)
>  {
> +	struct hv_per_cpu_context *hv_cpu
> +		= per_cpu_ptr(hv_context.cpu_context, cpu);
>  	union hv_synic_sint shared_sint;
>  	union hv_synic_simp simp;
>  	union hv_synic_siefp siefp;
>  	union hv_synic_scontrol sctrl;
> 
> -	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
> -					VMBUS_MESSAGE_SINT);
> -
> +	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>  	shared_sint.masked = 1;
> +	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +
> 
>  	/* Need to correctly cleanup in the case of SMP!!! */
>  	/* Disable the interrupt */
> -	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
> -				shared_sint.as_uint64);
> +	hv_get_simp(simp.as_uint64);
> 
> -	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
> +	/*
> +	 * In Isolation VM, sim and sief pages are allocated by
> +	 * paravisor. These pages also will be used by kdump
> +	 * kernel. So just reset enable bit here and keep page
> +	 * addresses.
> +	 */
>  	simp.simp_enabled = 0;
> -	simp.base_simp_gpa = 0;
> +	if (hv_isolation_type_snp())
> +		memunmap(hv_cpu->synic_message_page);
> +	else
> +		simp.base_simp_gpa = 0;
> 
> -	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> +	hv_set_simp(simp.as_uint64);
> 
> -	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
> +	hv_get_siefp(siefp.as_uint64);
>  	siefp.siefp_enabled = 0;
> -	siefp.base_siefp_gpa = 0;
> 
> -	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> +	if (hv_isolation_type_snp())
> +		memunmap(hv_cpu->synic_event_page);
> +	else
> +		siefp.base_siefp_gpa = 0;
> +
> +	hv_set_siefp(siefp.as_uint64);
> 
>  	/* Disable the global synic bit */
> -	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
> +	hv_get_synic_state(sctrl.as_uint64);
>  	sctrl.enable = 0;
> -	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
> +	hv_set_synic_state(sctrl.as_uint64);
> 
>  	if (vmbus_irq != -1)
>  		disable_percpu_irq(vmbus_irq);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 079988ed45b9..90dac369a2dc 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -23,9 +23,16 @@
>  #include <linux/bitops.h>
>  #include <linux/cpumask.h>
>  #include <linux/nmi.h>
> +#include <asm/svm.h>
> +#include <asm/sev.h>
>  #include <asm/ptrace.h>
> +#include <asm/mshyperv.h>
>  #include <asm/hyperv-tlfs.h>
> 
> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);
> +
>  struct ms_hyperv_info {
>  	u32 features;
>  	u32 priv_high;
> @@ -45,7 +52,7 @@ struct ms_hyperv_info {
>  			u32 Reserved12 : 20;
>  		};
>  	};
> -	void  __percpu **ghcb_base;
> +	union hv_ghcb __percpu **ghcb_base;
>  	u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> @@ -55,6 +62,7 @@ extern void  __percpu  **hyperv_pcpu_output_arg;
> 
>  extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
>  extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> +extern bool hv_isolation_type_snp(void);
> 
>  /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
>  static inline int hv_result(u64 status)
> @@ -149,7 +157,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
>  		 * possibly deliver another msg from the
>  		 * hypervisor
>  		 */
> -		hv_set_register(HV_REGISTER_EOM, 0);
> +		hv_signal_eom();
>  	}
>  }
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
  2021-08-13 19:31   ` Michael Kelley
@ 2021-08-13 20:26     ` Michael Kelley
  0 siblings, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-13 20:26 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Michael Kelley <mikelley@microsoft.com> Sent: Friday, August 13, 2021 12:31 PM
> To: Tianyu Lan <ltykernel@gmail.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de; x86@kernel.org; hpa@zytor.com; dave.hansen@linux.intel.com;
> luto@kernel.org; peterz@infradead.org; konrad.wilk@oracle.com; boris.ostrovsky@oracle.com; jgross@suse.com;
> sstabellini@kernel.org; joro@8bytes.org; will@kernel.org; davem@davemloft.net; kuba@kernel.org; jejb@linux.ibm.com;
> martin.petersen@oracle.com; arnd@arndb.de; hch@lst.de; m.szyprowski@samsung.com; robin.murphy@arm.com;
> thomas.lendacky@amd.com; brijesh.singh@amd.com; ardb@kernel.org; Tianyu Lan <Tianyu.Lan@microsoft.com>;
> pgonda@google.com; martin.b.radev@gmail.com; akpm@linux-foundation.org; kirill.shutemov@linux.intel.com;
> rppt@kernel.org; sfr@canb.auug.org.au; saravanand@fb.com; krish.sadhukhan@oracle.com;
> aneesh.kumar@linux.ibm.com; xen-devel@lists.xenproject.org; rientjes@google.com; hannes@cmpxchg.org;
> tj@kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-arch@vger.kernel.org; linux-hyperv@vger.kernel.org; linux-
> kernel@vger.kernel.org; linux-scsi@vger.kernel.org; netdev@vger.kernel.org; vkuznets <vkuznets@redhat.com>;
> parri.andrea@gmail.com; dave.hansen@intel.com
> Subject: RE: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
> 
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> > Subject: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
> 
> See previous comments about tag in the Subject line.
> 
> > Hyper-V provides GHCB protocol to write Synthetic Interrupt
> > Controller MSR registers in Isolation VM with AMD SEV SNP
> > and these registers are emulated by hypervisor directly.
> > Hyper-V requires to write SINTx MSR registers twice. First
> > writes MSR via GHCB page to communicate with hypervisor
> > and then writes wrmsr instruction to talk with paravisor
> > which runs in VMPL0. Guest OS ID MSR also needs to be set
> > via GHCB.
> >
> > Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> > ---
> > Change since v1:
> >          * Introduce sev_es_ghcb_hv_call_simple() and share code
> >            between SEV and Hyper-V code.
> > ---
> >  arch/x86/hyperv/hv_init.c       |  33 ++-------
> >  arch/x86/hyperv/ivm.c           | 110 +++++++++++++++++++++++++++++
> >  arch/x86/include/asm/mshyperv.h |  78 +++++++++++++++++++-
> >  arch/x86/include/asm/sev.h      |   3 +
> >  arch/x86/kernel/cpu/mshyperv.c  |   3 +
> >  arch/x86/kernel/sev-shared.c    |  63 ++++++++++-------
> >  drivers/hv/hv.c                 | 121 ++++++++++++++++++++++----------
> >  include/asm-generic/mshyperv.h  |  12 +++-
> >  8 files changed, 329 insertions(+), 94 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index b3683083208a..ab0b33f621e7 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -423,7 +423,7 @@ void __init hyperv_init(void)
> >  		goto clean_guest_os_id;
> >
> >  	if (hv_isolation_type_snp()) {
> > -		ms_hyperv.ghcb_base = alloc_percpu(void *);
> > +		ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
> 
> union hv_ghcb isn't defined.  It is not added until patch 6 of the series.
> 

Ignore this comment.  My mistake.

Michael

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM
  2021-08-09 17:56 ` [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
@ 2021-08-13 20:42   ` Michael Kelley
  0 siblings, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-13 20:42 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 
> Hyper-V provides ghcb hvcall to handle VMBus
> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
> msg in SNP Isolation VM. Add such support.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/ivm.c           | 43 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  1 +
>  drivers/hv/connection.c         |  6 ++++-
>  drivers/hv/hv.c                 |  8 +++++-
>  include/asm-generic/mshyperv.h  | 29 ++++++++++++++++++++++
>  5 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index ec0e5c259740..c13ec5560d73 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -15,6 +15,49 @@
>  #include <asm/io.h>
>  #include <asm/mshyperv.h>
> 
> +#define GHCB_USAGE_HYPERV_CALL	1
> +
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return -EFAULT;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return -EFAULT;
> +	}
> +
> +	hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
> +	hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
> +
> +	hv_ghcb->hypercall.outputgpa = (u64)output;
> +	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> +	hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> +	if (input_size)
> +		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> +	VMGEXIT();
> +
> +	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
> +	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
> +	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
> +
> +	local_irq_restore(flags);
> +
> +	return hv_ghcb->hypercall.hypercalloutput.callstatus;
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);

This function is called from architecture independent code, so it needs a
default no-op stub to enable the code to compile on ARM64.  The stub should
always return failure.

> +
>  void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>  	union hv_ghcb *hv_ghcb;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 730985676ea3..a30c60f189a3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
>  void hv_ghcb_msr_write(u64 msr, u64 value);
>  void hv_ghcb_msr_read(u64 msr, u64 *value);
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
> 
>  #define hv_get_synint_state_ghcb(int_num, val)			\
>  	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 5e479d54918c..6d315c1465e0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
> 
>  	++channel->sig_events;
> 
> -	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> +	if (hv_isolation_type_snp())
> +		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> +				NULL, sizeof(u64));
> +	else
> +		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_set_event);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 59f7173c4d9f..e5c9fc467893 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
>  	aligned_msg->payload_size = payload_size;
>  	memcpy((void *)aligned_msg->payload, payload, payload_size);
> 
> -	status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
> +	if (hv_isolation_type_snp())
> +		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
> +				(void *)aligned_msg, NULL,
> +				sizeof(struct hv_input_post_message));
> +	else
> +		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
> +				aligned_msg, NULL);
> 
>  	/* Preemption must remain disabled until after the hypercall
>  	 * so some other thread can't get scheduled onto this cpu and
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 90dac369a2dc..400181b855c1 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -31,6 +31,35 @@
> 
>  union hv_ghcb {
>  	struct ghcb ghcb;
> +	struct {
> +		u64 hypercalldata[509];
> +		u64 outputgpa;
> +		union {
> +			union {
> +				struct {
> +					u32 callcode        : 16;
> +					u32 isfast          : 1;
> +					u32 reserved1       : 14;
> +					u32 isnested        : 1;
> +					u32 countofelements : 12;
> +					u32 reserved2       : 4;
> +					u32 repstartindex   : 12;
> +					u32 reserved3       : 4;
> +				};
> +				u64 asuint64;
> +			} hypercallinput;
> +			union {
> +				struct {
> +					u16 callstatus;
> +					u16 reserved1;
> +					u32 elementsprocessed : 12;
> +					u32 reserved2         : 20;
> +				};
> +				u64 asunit64;
> +			} hypercalloutput;
> +		};
> +		u64 reserved2;
> +	} hypercall;
>  } __packed __aligned(PAGE_SIZE);

Alignment should be to HV_HYP_PAGE_SIZE.  And it would be a good safety
play to have a BUILD_BUG_ON() somewhere asserting that
sizeof(union hv_ghcb) == HV_HYP_PAGE_SIZE.

> 
>  struct ms_hyperv_info {
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message
  2021-08-09 17:56 ` [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
@ 2021-08-13 21:28   ` Michael Kelley
  0 siblings, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-13 21:28 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 
> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with host in Isolation VM and so it's necessary to use hvcall to set
> them visible to host. In Isolation VM with AMD SEV SNP, the access
> address should be in the extra space which is above shared gpa
> boundary. So remap these pages into the extra address(pa +
> shared_gpa_boundary). Introduce monitor_pages_va to store
> the remap address and unmap these va when disconnect vmbus.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v1:
>         * Not remap monitor pages in the non-SNP isolation VM.
> ---
>  drivers/hv/connection.c   | 65 +++++++++++++++++++++++++++++++++++++++
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 66 insertions(+)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6d315c1465e0..bf0ac3167bd2 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/hyperv.h>
>  #include <linux/export.h>
> +#include <linux/io.h>
>  #include <asm/mshyperv.h>
> 
>  #include "hyperv_vmbus.h"
> @@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
> 
>  	msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>  	msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> +	if (hv_isolation_type_snp()) {
> +		msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> +		msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> +	}
> +
>  	msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>  	/*
> @@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>  		return -ECONNREFUSED;
>  	}
> 
> +	if (hv_isolation_type_snp()) {
> +		vmbus_connection.monitor_pages_va[0]
> +			= vmbus_connection.monitor_pages[0];
> +		vmbus_connection.monitor_pages[0]
> +			= memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +				   MEMREMAP_WB);
> +		if (!vmbus_connection.monitor_pages[0])
> +			return -ENOMEM;

This error case causes vmbus_negotiate_version() to return with
vmbus_connection.con_state set to CONNECTED.  But the caller never checks the
returned error code except for ETIMEDOUT.  So the caller will think that
vmbus_negotiate_version() succeeded when it didn't.  There may be some
existing bugs in that error handling code. :-(

> +
> +		vmbus_connection.monitor_pages_va[1]
> +			= vmbus_connection.monitor_pages[1];
> +		vmbus_connection.monitor_pages[1]
> +			= memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +				   MEMREMAP_WB);
> +		if (!vmbus_connection.monitor_pages[1]) {
> +			memunmap(vmbus_connection.monitor_pages[0]);
> +			return -ENOMEM;
> +		}
> +
> +		memset(vmbus_connection.monitor_pages[0], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +		memset(vmbus_connection.monitor_pages[1], 0x00,
> +		       HV_HYP_PAGE_SIZE);
> +	}
> +

I don't think the memset() calls are needed.  The memory was originally
allocated with hv_alloc_hyperv_zeroed_page(), so it should already be zeroed.

>  	return ret;
>  }
> 
> @@ -159,6 +191,7 @@ int vmbus_connect(void)
>  	struct vmbus_channel_msginfo *msginfo = NULL;
>  	int i, ret = 0;
>  	__u32 version;
> +	u64 pfn[2];
> 
>  	/* Initialize the vmbus connection */
>  	vmbus_connection.conn_state = CONNECTING;
> @@ -216,6 +249,16 @@ int vmbus_connect(void)
>  		goto cleanup;
>  	}
> 
> +	if (hv_is_isolation_supported()) {
> +		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> +		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> +		if (hv_mark_gpa_visibility(2, pfn,
> +				VMBUS_PAGE_VISIBLE_READ_WRITE)) {

Note that hv_mark_gpa_visibility() will need an appropriate no-op stub so
that this architecture independent code will compile for ARM64.

> +			ret = -EFAULT;
> +			goto cleanup;
> +		}
> +	}
> +
>  	msginfo = kzalloc(sizeof(*msginfo) +
>  			  sizeof(struct vmbus_channel_initiate_contact),
>  			  GFP_KERNEL);
> @@ -284,6 +327,8 @@ int vmbus_connect(void)
> 
>  void vmbus_disconnect(void)
>  {
> +	u64 pfn[2];
> +
>  	/*
>  	 * First send the unload request to the host.
>  	 */
> @@ -303,6 +348,26 @@ void vmbus_disconnect(void)
>  		vmbus_connection.int_page = NULL;
>  	}
> 
> +	if (hv_is_isolation_supported()) {
> +		if (vmbus_connection.monitor_pages_va[0]) {
> +			memunmap(vmbus_connection.monitor_pages[0]);
> +			vmbus_connection.monitor_pages[0]
> +				= vmbus_connection.monitor_pages_va[0];
> +			vmbus_connection.monitor_pages_va[0] = NULL;
> +		}
> +
> +		if (vmbus_connection.monitor_pages_va[1]) {
> +			memunmap(vmbus_connection.monitor_pages[1]);
> +			vmbus_connection.monitor_pages[1]
> +				= vmbus_connection.monitor_pages_va[1];
> +			vmbus_connection.monitor_pages_va[1] = NULL;
> +		}
> +
> +		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> +		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> +		hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
> +	}
> +

The code in this patch feels a bit more complicated than it needs to be.  Altogether,
there are two different virtual addresses and one physical address for each monitor
page.  The two virtual addresses are the one obtained from the original memory
allocation, and which will be used to free the memory.  The second virtual address
is the one used to actually access the data, which is the same as the first virtual
address for a non-isolated VM.  The second VA is the result of memremap() call for an
isolated VM.  The vmbus_connection data structure should save all three values for
each monitor page so they don't need to recomputed or moved around.  Then:

1) For isolated and for non-isolated VMs, setup the virtual and physical addresses
of the monitor pages in vmbus_connect(), and store them in the vmbus_connection
data structure.  The physical address should include the shared_gpa_boundary offset
in the case of an isolated VM.  At this point the two virtual addresses are the same.

2) vmbus_negotiate_version() just grabs the physical address from the
vmbus_connection data structure.  It doesn't make any changes to the virtual
or physical addresses, which keeps it focused just on version negotiation.

3) Once vmbus_negotiate_version() is done, vmbus_connect() can determine
the remapped virtual address, and store that.  It can also change the visibility
of the two pages using the previously stored physical address.

4) vmbus_disconnect() can do the memunmaps() and change the visibility if needed,
and then free the memory using the address from the original allocation in Step 1.

>  	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
>  	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
>  	vmbus_connection.monitor_pages[0] = NULL;
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 42f3d9d123a1..40bc0eff6665 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -240,6 +240,7 @@ struct vmbus_connection {
>  	 * is child->parent notification
>  	 */
>  	struct hv_monitor_page *monitor_pages[2];
> +	void *monitor_pages_va[2];
>  	struct list_head chn_msg_list;
>  	spinlock_t channelmsg_lock;
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.
  2021-08-12 19:18   ` Michael Kelley
@ 2021-08-14 13:32     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-14 13:32 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen


On 8/13/2021 3:18 AM, Michael Kelley wrote:
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
>> Subject: [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.
> 
> As with Patch 1, use the "x86/hyperv:" tag in the Subject line.
> 
>>
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Hyper-V exposes shared memory boundary via cpuid
>> HYPERV_CPUID_ISOLATION_CONFIG and store it in the
>> shared_gpa_boundary of ms_hyperv struct. This prepares
>> to share memory with host for SNP guest.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>>   arch/x86/kernel/cpu/mshyperv.c |  2 ++
>>   include/asm-generic/mshyperv.h | 12 +++++++++++-
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index 6b5835a087a3..2b7f396ef1a5 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>>   	if (ms_hyperv.priv_high & HV_ISOLATION) {
>>   		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>>   		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
>> +		ms_hyperv.shared_gpa_boundary =
>> +			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;
> 
> You could use BIT_ULL() here, but it's kind of a shrug.


Good suggestion. Thanks.

> 
>>
>>   		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>>   			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
>> index 4269f3174e58..aa26d24a5ca9 100644
>> --- a/include/asm-generic/mshyperv.h
>> +++ b/include/asm-generic/mshyperv.h
>> @@ -35,8 +35,18 @@ struct ms_hyperv_info {
>>   	u32 max_vp_index;
>>   	u32 max_lp_index;
>>   	u32 isolation_config_a;
>> -	u32 isolation_config_b;
>> +	union {
>> +		u32 isolation_config_b;
>> +		struct {
>> +			u32 cvm_type : 4;
>> +			u32 Reserved11 : 1;
>> +			u32 shared_gpa_boundary_active : 1;
>> +			u32 shared_gpa_boundary_bits : 6;
>> +			u32 Reserved12 : 20;
> 
> Any reason to name the reserved fields as "11" and "12"?  It
> just looks a bit unusual.  And I'd suggest lowercase "r".
> 

Yes, will update in the next version.

>> +		};
>> +	};
>>   	void  __percpu **ghcb_base;
>> +	u64 shared_gpa_boundary;
>>   };
>>   extern struct ms_hyperv_info ms_hyperv;
>>
>> --
>> 2.25.1
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
  2021-08-12 22:20   ` Michael Kelley
@ 2021-08-15 15:21     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-15 15:21 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen



On 8/13/2021 6:20 AM, Michael Kelley wrote:
>> @@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   	if (ret)
>>   		return ret;
>>
>> +	ret = set_memory_decrypted((unsigned long)kbuffer,
>> +				   HVPFN_UP(size));
>> +	if (ret) {
>> +		pr_warn("Failed to set host visibility.\n");
> Enhance this message a bit.  "Failed to set host visibility for new GPADL\n"
> and also output the value of ret.

OK. This looks better. Thanks.

> 
>> @@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   	/* At this point, we received the gpadl created msg */
>>   	*gpadl_handle = gpadlmsg->gpadl;
>>
>> +	channel->gpadl_array[index].size = size;
>> +	channel->gpadl_array[index].buffer = kbuffer;
>> +	channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
>> +
> I can see the merits of transparently stashing the memory address and size
> that will be needed by vmbus_teardown_gpadl(), so that the callers of
> __vmbus_establish_gpadl() don't have to worry about it.  But doing the
> stashing transparently is somewhat messy.
> 
> Given that the callers are already have memory allocated to save the
> GPADL handle, a little refactoring would make for a much cleaner solution.
> Instead of having memory allocated for the 32-bit GPADL handle, callers
> should allocate the slightly larger struct vmbus_gpadl that you've
> defined below.  The calling interfaces can be updated to take a pointer
> to this structure instead of a pointer to the 32-bit GPADL handle, and
> you can save the memory address and size right along with the GPADL
> handle.  This approach touches a few more files, but I think there are
> only two callers outside of the channel management code -- netvsc
> and hv_uio -- so it's not a big change.

Yes, this is a good suggestion and Will update in the next version.

> 
>> @@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>>   	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
>>
>>   	kfree(info);
>> +
>> +	/* Find gpadl buffer virtual address and size. */
>> +	for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
>> +		if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
>> +			break;
>> +
>> +	if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
>> +			HVPFN_UP(channel->gpadl_array[i].size)))
>> +		pr_warn("Fail to set mem host visibility.\n");
> Enhance this message a bit: "Failed to set host visibility in GPADL teardown\n".
> Also output the returned error code to help in debugging any occurrences of
> a failure
Yes, agree. Will update.

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-13 17:58     ` Tianyu Lan
@ 2021-08-16 14:50       ` Tianyu Lan
  2021-08-19  8:49         ` Christoph Hellwig
  0 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-16 14:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On 8/14/2021 1:58 AM, Tianyu Lan wrote:
> On 8/12/2021 8:27 PM, Christoph Hellwig wrote:
>> This is still broken.  You need to make sure the actual DMA allocations
>> do have struct page backing.
>>
> 
> Hi Christoph:
>       swiotlb_tbl_map_single() still returns PA below vTOM/share_gpa_ > boundary. These PAs has backing pages and belong to system memory.
> In other word, all PAs passed to DMA API have backing pages and these is 
> no difference between Isolation guest and traditional guest for DMA API.
> The new mapped VA for PA above vTOM here is just to access the bounce 
> buffer in the swiotlb code and isn't exposed to outside.

Hi Christoph:
       Sorry to bother you.Please double check with these two patches
" [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function 
for HV IVM" and "[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_
unmap_encrypted() function".
       The swiotlb bounce buffer in the isolation VM are allocated in the
low end memory and these memory has struct page backing. All dma address
returned by swiotlb/DMA API are low end memory and this is as same as 
what happen in the traditional VM.So this means all PAs passed to DMA 
API have struct page backing. The difference in Isolation VM is to 
access bounce buffer via address space above vTOM/shared_guest_memory
_boundary. To access bounce buffer shared with host, the guest needs to
mark the memory visible to host via hypercall and map bounce buffer in 
the extra address space(PA + shared_guest_memory_boundary). The vstart
introduced in this patch is to store va of extra address space and it's 
only used to access bounce buffer in the swiotlb_bounce(). The PA in 
extra space is only in the Hyper-V map function and won't be passed to 
DMA API or other components.
       The API dma_map_decrypted() introduced in the patch 9 is to map 
the bounce buffer in the extra space and these memory in the low end 
space are used as DMA memory in the driver. Do you prefer these APIs
still in the set_memory.c? I move the API to dma/mapping.c due to the
suggested name arch_dma_map_decrypted() in the previous mail
(https://lore.kernel.org/netdev/20210720135437.GA13554@lst.de/).
       If there are something unclear, please let me know. Hope this
still can catch the merge window.

Thanks.





^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
  2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (12 preceding siblings ...)
  2021-08-09 17:56 ` [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
@ 2021-08-16 14:55 ` Michael Kelley
  13 siblings, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-16 14:55 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 
> Hyper-V provides two kinds of Isolation VMs. VBS(Virtualization-based
> security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
> is to add support for these Isolation VM support in Linux.
> 

A general comment about this series:  I have not seen any statements
made about whether either type of Isolated VM is supported for 32-bit
Linux guests.   arch/x86/Kconfig has CONFIG_AMD_MEM_ENCRYPT as
64-bit only, so evidently SEV-SNP Isolated VMs would be 64-bit only.
But I don't know if VBS VMs are any different.

I didn't track down what happens if a 32-bit Linux is booted in
a VM that supports SEV-SNP.  Presumably some kind of message
is output that no encryption is being done.  But at a slightly
higher level, the Hyper-V initialization path should probably
also check for 32-bit and output a clear message that no isolation
is being provided.  At that point, I don't know if it is possible to
continue in non-isolated mode or whether the only choice is to
panic.  Continuing in non-isolated mode might be a bad idea
anyway since presumably the user has explicitly requested an
Isolated VM.

Related, I noticed usage of "unsigned long" for holding physical
addresses, which works when running 64-bit, but not when running
32-bit.  But even if Isolated VMs are always 64-bit, it would be still be
better to clean this up and use phys_addr_t instead.  Unfortunately,
more generic functions like set_memory_encrypted() and
set_memory_decrypted() have physical address arguments that
are of type unsigned long.

Michael

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-08-09 17:56 ` [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
@ 2021-08-16 17:28   ` Michael Kelley
  2021-08-17 15:36     ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-16 17:28 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 
> VMbus ring buffer are shared with host and it's need to

s/it's need/it needs/

> be accessed via extra address space of Isolation VM with
> SNP support. This patch is to map the ring buffer
> address in extra address space via ioremap(). HV host

It's actually using vmap_pfn(), not ioremap().

> visibility hvcall smears data in the ring buffer and
> so reset the ring buffer memory to zero after calling
> visibility hvcall.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/hv/Kconfig        |  1 +
>  drivers/hv/channel.c      | 10 +++++
>  drivers/hv/hyperv_vmbus.h |  2 +
>  drivers/hv/ring_buffer.c  | 84 ++++++++++++++++++++++++++++++---------
>  4 files changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index d1123ceb38f3..dd12af20e467 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -8,6 +8,7 @@ config HYPERV
>  		|| (ARM64 && !CPU_BIG_ENDIAN))
>  	select PARAVIRT
>  	select X86_HV_CALLBACK_VECTOR if X86
> +	select VMAP_PFN
>  	help
>  	  Select this option to run Linux as a Hyper-V client operating
>  	  system.
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 4c4717c26240..60ef881a700c 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	if (err)
>  		goto error_clean_ring;
> 
> +	err = hv_ringbuffer_post_init(&newchannel->outbound,
> +				      page, send_pages);
> +	if (err)
> +		goto error_free_gpadl;
> +
> +	err = hv_ringbuffer_post_init(&newchannel->inbound,
> +				      &page[send_pages], recv_pages);
> +	if (err)
> +		goto error_free_gpadl;
> +
>  	/* Create and init the channel open message */
>  	open_info = kzalloc(sizeof(*open_info) +
>  			   sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 40bc0eff6665..15cd23a561f3 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
>  /* Interface */
> 
>  void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +		struct page *pages, u32 page_cnt);
> 
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  		       struct page *pages, u32 pagecnt, u32 max_pkt_size);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 2aee356840a2..d4f93fca1108 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include <linux/vmalloc.h>
>  #include <linux/slab.h>
>  #include <linux/prefetch.h>
> +#include <linux/io.h>
> +#include <asm/mshyperv.h>
> 
>  #include "hyperv_vmbus.h"
> 
> @@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
>  	mutex_init(&channel->outbound.ring_buffer_mutex);
>  }
> 
> -/* Initialize the ring buffer. */
> -int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> -		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +		       struct page *pages, u32 page_cnt)
>  {
> +	u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
> +	unsigned long *pfns_wraparound;
> +	void *vaddr;
>  	int i;
> -	struct page **pages_wraparound;
> 
> -	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
> +	if (!hv_isolation_type_snp())
> +		return 0;
> +
> +	physic_addr += ms_hyperv.shared_gpa_boundary;
> 
>  	/*
>  	 * First page holds struct hv_ring_buffer, do wraparound mapping for
>  	 * the rest.
>  	 */
> -	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> +	pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
>  				   GFP_KERNEL);
> -	if (!pages_wraparound)
> +	if (!pfns_wraparound)
>  		return -ENOMEM;
> 
> -	pages_wraparound[0] = pages;
> +	pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
>  	for (i = 0; i < 2 * (page_cnt - 1); i++)
> -		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> -
> -	ring_info->ring_buffer = (struct hv_ring_buffer *)
> -		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> -
> -	kfree(pages_wraparound);
> +		pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
> +			i % (page_cnt - 1) + 1;
> 
> -
> -	if (!ring_info->ring_buffer)
> +	vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
> +	kfree(pfns_wraparound);
> +	if (!vaddr)
>  		return -ENOMEM;
> 
> -	ring_info->ring_buffer->read_index =
> -		ring_info->ring_buffer->write_index = 0;
> +	/* Clean memory after setting host visibility. */
> +	memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
> +
> +	ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
> +	ring_info->ring_buffer->read_index = 0;
> +	ring_info->ring_buffer->write_index = 0;
> 
>  	/* Set the feature bit for enabling flow control. */
>  	ring_info->ring_buffer->feature_bits.value = 1;
> 
> +	return 0;
> +}
> +
> +/* Initialize the ring buffer. */
> +int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> +		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
> +{
> +	int i;
> +	struct page **pages_wraparound;
> +
> +	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
> +
> +	if (!hv_isolation_type_snp()) {
> +		/*
> +		 * First page holds struct hv_ring_buffer, do wraparound mapping for
> +		 * the rest.
> +		 */
> +		pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> +					   GFP_KERNEL);
> +		if (!pages_wraparound)
> +			return -ENOMEM;
> +
> +		pages_wraparound[0] = pages;
> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
> +			pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> +
> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
> +			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> +
> +		kfree(pages_wraparound);
> +
> +		if (!ring_info->ring_buffer)
> +			return -ENOMEM;
> +
> +		ring_info->ring_buffer->read_index =
> +			ring_info->ring_buffer->write_index = 0;
> +
> +		/* Set the feature bit for enabling flow control. */
> +		ring_info->ring_buffer->feature_bits.value = 1;
> +	}
> +
>  	ring_info->ring_size = page_cnt << PAGE_SHIFT;
>  	ring_info->ring_size_div10_reciprocal =
>  		reciprocal_value(ring_info->ring_size / 10);
> --
> 2.25.1

This patch does the following:

1) The existing ring buffer wrap-around mapping functionality is still
executed in hv_ringbuffer_init() when not doing SNP isolation.
This mapping is based on an array of struct page's that describe the
contiguous physical memory.

2) New ring buffer wrap-around mapping functionality is added in
hv_ringbuffer_post_init() for the SNP isolation case.  The case is
handled in hv_ringbuffer_post_init() because it must be done after
the GPADL is established, since that's where the host visibility
is set.  What's interesting is that this case is exactly the same
as #1 above, except that the mapping is based on physical
memory addresses instead of struct page's.  We have to use physical
addresses because of applying the GPA boundary, and there are no
struct page's for those physical addresses.

Unfortunately, this duplicates a lot of logic in #1 and #2, except
for the struct page vs. physical address difference.

Proposal:  Couldn't we always do #2, even for the normal case
where SNP isolation is not being used?   The difference would
only be in whether the GPA boundary is added.  And it looks like
the normal case could be done after the GPADL is established,
as setting up the GPADL doesn't have any dependencies on
having the ring buffer mapped.  This approach would remove
a lot of duplication.  Just move the calls to hv_ringbuffer_init()
to after the GPADL is established, and do all the work there for
both cases.

Michael

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-08-16 17:28   ` Michael Kelley
@ 2021-08-17 15:36     ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-17 15:36 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen



On 8/17/2021 1:28 AM, Michael Kelley wrote:
> This patch does the following:
> 
> 1) The existing ring buffer wrap-around mapping functionality is still
> executed in hv_ringbuffer_init() when not doing SNP isolation.
> This mapping is based on an array of struct page's that describe the
> contiguous physical memory.
> 
> 2) New ring buffer wrap-around mapping functionality is added in
> hv_ringbuffer_post_init() for the SNP isolation case.  The case is
> handled in hv_ringbuffer_post_init() because it must be done after
> the GPADL is established, since that's where the host visibility
> is set.  What's interesting is that this case is exactly the same
> as #1 above, except that the mapping is based on physical
> memory addresses instead of struct page's.  We have to use physical
> addresses because of applying the GPA boundary, and there are no
> struct page's for those physical addresses.
> 
> Unfortunately, this duplicates a lot of logic in #1 and #2, except
> for the struct page vs. physical address difference.
> 
> Proposal:  Couldn't we always do #2, even for the normal case
> where SNP isolation is not being used?   The difference would
> only be in whether the GPA boundary is added.  And it looks like
> the normal case could be done after the GPADL is established,
> as setting up the GPADL doesn't have any dependencies on
> having the ring buffer mapped.  This approach would remove
> a lot of duplication.  Just move the calls to hv_ringbuffer_init()
> to after the GPADL is established, and do all the work there for
> both cases.
> 

Hi Michael:
     Thanks for suggestion. I just keep the original logic in current
code. I will try combining these two functions and report back.

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-16 14:50       ` Tianyu Lan
@ 2021-08-19  8:49         ` Christoph Hellwig
  2021-08-19  9:59           ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Hellwig @ 2021-08-19  8:49 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Christoph Hellwig, kys, haiyangz, sthemmin, wei.liu, decui, tglx,
	mingo, bp, x86, hpa, dave.hansen, luto, peterz, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, joro, will, davem, kuba,
	jejb, martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On Mon, Aug 16, 2021 at 10:50:26PM +0800, Tianyu Lan wrote:
> Hi Christoph:
>       Sorry to bother you.Please double check with these two patches
> " [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function 
> for HV IVM" and "[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_
> unmap_encrypted() function".

Do you have a git tree somewhere to look at the whole tree?

>       The swiotlb bounce buffer in the isolation VM are allocated in the
> low end memory and these memory has struct page backing. All dma address
> returned by swiotlb/DMA API are low end memory and this is as same as what 
> happen in the traditional VM.

Indeed.

>       The API dma_map_decrypted() introduced in the patch 9 is to map the 
> bounce buffer in the extra space and these memory in the low end space are 
> used as DMA memory in the driver. Do you prefer these APIs
> still in the set_memory.c? I move the API to dma/mapping.c due to the
> suggested name arch_dma_map_decrypted() in the previous mail
> (https://lore.kernel.org/netdev/20210720135437.GA13554@lst.de/).

Well, what would help is a clear description of the semantics.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-19  8:49         ` Christoph Hellwig
@ 2021-08-19  9:59           ` Tianyu Lan
  2021-08-19 10:02             ` Christoph Hellwig
  0 siblings, 1 reply; 64+ messages in thread
From: Tianyu Lan @ 2021-08-19  9:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen



On 8/19/2021 4:49 PM, Christoph Hellwig wrote:
> On Mon, Aug 16, 2021 at 10:50:26PM +0800, Tianyu Lan wrote:
>> Hi Christoph:
>>        Sorry to bother you.Please double check with these two patches
>> " [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function
>> for HV IVM" and "[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_
>> unmap_encrypted() function".
> 
> Do you have a git tree somewhere to look at the whole tree?

Yes, here is my github link for these two patches.

https://github.com/lantianyu/linux/commit/462f7e4e44644fe7e182f7a5fb043a75acb90ee5

https://github.com/lantianyu/linux/commit/c8de236bf4366d39e8b98e5a091c39df29b03e0b

> 
>>        The swiotlb bounce buffer in the isolation VM are allocated in the
>> low end memory and these memory has struct page backing. All dma address
>> returned by swiotlb/DMA API are low end memory and this is as same as what
>> happen in the traditional VM.
> 
> Indeed.
> 
>>        The API dma_map_decrypted() introduced in the patch 9 is to map the
>> bounce buffer in the extra space and these memory in the low end space are
>> used as DMA memory in the driver. Do you prefer these APIs
>> still in the set_memory.c? I move the API to dma/mapping.c due to the
>> suggested name arch_dma_map_decrypted() in the previous mail
>> (https://lore.kernel.org/netdev/20210720135437.GA13554@lst.de/).
> 
> Well, what would help is a clear description of the semantics.
> 

Yes, I will improve description.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-19  9:59           ` Tianyu Lan
@ 2021-08-19 10:02             ` Christoph Hellwig
  2021-08-19 10:03               ` Tianyu Lan
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Hellwig @ 2021-08-19 10:02 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Christoph Hellwig, kys, haiyangz, sthemmin, wei.liu, decui, tglx,
	mingo, bp, x86, hpa, dave.hansen, luto, peterz, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, joro, will, davem, kuba,
	jejb, martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On Thu, Aug 19, 2021 at 05:59:02PM +0800, Tianyu Lan wrote:
>
>
> On 8/19/2021 4:49 PM, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 10:50:26PM +0800, Tianyu Lan wrote:
>>> Hi Christoph:
>>>        Sorry to bother you.Please double check with these two patches
>>> " [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function
>>> for HV IVM" and "[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_
>>> unmap_encrypted() function".
>>
>> Do you have a git tree somewhere to look at the whole tree?
>
> Yes, here is my github link for these two patches.
>
> https://github.com/lantianyu/linux/commit/462f7e4e44644fe7e182f7a5fb043a75acb90ee5
>
> https://github.com/lantianyu/linux/commit/c8de236bf4366d39e8b98e5a091c39df29b03e0b

Which branch is this?


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  2021-08-19 10:02             ` Christoph Hellwig
@ 2021-08-19 10:03               ` Tianyu Lan
  0 siblings, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-19 10:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On 8/19/2021 6:02 PM, Christoph Hellwig wrote:
> On Thu, Aug 19, 2021 at 05:59:02PM +0800, Tianyu Lan wrote:
>>
>>
>> On 8/19/2021 4:49 PM, Christoph Hellwig wrote:
>>> On Mon, Aug 16, 2021 at 10:50:26PM +0800, Tianyu Lan wrote:
>>>> Hi Christoph:
>>>>         Sorry to bother you.Please double check with these two patches
>>>> " [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function
>>>> for HV IVM" and "[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_
>>>> unmap_encrypted() function".
>>>
>>> Do you have a git tree somewhere to look at the whole tree?
>>
>> Yes, here is my github link for these two patches.
>>
>> https://github.com/lantianyu/linux/commit/462f7e4e44644fe7e182f7a5fb043a75acb90ee5
>>
>> https://github.com/lantianyu/linux/commit/c8de236bf4366d39e8b98e5a091c39df29b03e0b
> 
> Which branch is this?
> 

https://github.com/lantianyu/linux/tree/isolationv3

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  2021-08-09 17:56 ` [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
@ 2021-08-19 18:11   ` Michael Kelley
  2021-08-20  4:13     ` hch
  2021-08-20  9:32     ` Tianyu Lan
  0 siblings, 2 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-19 18:11 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 
> Hyper-V Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Swiotlb bounce buffer code calls dma_map_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space. Populate dma memory decrypted ops with hv
> map/unmap function.
> 
> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions which depends on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
> _detect() to keep the order.
> 
> The map function vmap_pfn() can't work in the early place
> hyperv_iommu_swiotlb_init() and so initialize swiotlb bounce
> buffer in the hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/ivm.c           | 28 ++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  2 +
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c          |  3 ++
>  drivers/iommu/hyperv-iommu.c    | 65 +++++++++++++++++++++++++++++++++
>  include/linux/hyperv.h          |  1 +
>  6 files changed, 101 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index c13ec5560d73..0f05e4d6fc62 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -265,3 +265,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
> 
>  	return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
>  }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +void *hv_map_memory(void *addr, unsigned long size)
> +{
> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +				      sizeof(unsigned long), GFP_KERNEL);
> +	void *vaddr;
> +	int i;
> +
> +	if (!pfns)
> +		return NULL;
> +
> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> +		pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
> +			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> +
> +	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,	PAGE_KERNEL_IO);
> +	kfree(pfns);
> +
> +	return vaddr;
> +}

This function is manipulating page tables in the guest VM.  It is not involved
in communicating with Hyper-V, or passing PFNs to Hyper-V.  The pfn array
contains guest PFNs, not Hyper-V PFNs.  So it should use PAGE_SIZE
instead of HV_HYP_PAGE_SIZE, and similarly PAGE_SHIFT and virt_to_pfn().
If this code were ever to run on ARM64 in the future with PAGE_SIZE other
than 4 Kbytes, the use of PAGE_SIZE is correct choice.

> +
> +void hv_unmap_memory(void *addr)
> +{
> +	vunmap(addr);
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index a30c60f189a3..b247739f57ac 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  			   enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);
>  void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
>  void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include <linux/dma-map-ops.h>
>  #include <linux/pci.h>
> +#include <linux/hyperv.h>
>  #include <xen/swiotlb-xen.h>
> 
>  #include <asm/xen/hypervisor.h>
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -		  NULL,
> +		  hyperv_swiotlb_detect,
>  		  pci_xen_swiotlb_init,
>  		  NULL);
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 57bbbaa4e8f7..f068e22a5636 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -23,6 +23,7 @@
>  #include <linux/cpu.h>
>  #include <linux/sched/task_stack.h>
> 
> +#include <linux/dma-map-ops.h>
>  #include <linux/delay.h>
>  #include <linux/notifier.h>
>  #include <linux/panic_notifier.h>
> @@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
>  	return child_device_obj;
>  }
> 
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>  	}
>  	hv_debug_add_dev_dir(child_device_obj);
> 
> +	child_device_obj->device.dma_mask = &vmbus_dma_mask;
>  	return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index e285a220c913..01e874b3b43a 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -13,14 +13,22 @@
>  #include <linux/irq.h>
>  #include <linux/iommu.h>
>  #include <linux/module.h>
> +#include <linux/hyperv.h>
> +#include <linux/io.h>
> 
>  #include <asm/apic.h>
>  #include <asm/cpu.h>
>  #include <asm/hw_irq.h>
>  #include <asm/io_apic.h>
> +#include <asm/iommu.h>
> +#include <asm/iommu_table.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/hypervisor.h>
>  #include <asm/mshyperv.h>
> +#include <asm/swiotlb.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-direct.h>
> +#include <linux/set_memory.h>
> 
>  #include "irq_remapping.h"
> 
> @@ -36,6 +44,9 @@
>  static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
>  static struct irq_domain *ioapic_ir_domain;
> 
> +static unsigned long hyperv_io_tlb_size;
> +static void *hyperv_io_tlb_start;
> +
>  static int hyperv_ir_set_affinity(struct irq_data *data,
>  		const struct cpumask *mask, bool force)
>  {
> @@ -337,4 +348,58 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
>  	.free = hyperv_root_irq_remapping_free,
>  };
> 
> +void __init hyperv_iommu_swiotlb_init(void)
> +{
> +	unsigned long bytes;
> +
> +	/*
> +	 * Allocate Hyper-V swiotlb bounce buffer at early place
> +	 * to reserve large contiguous memory.
> +	 */
> +	hyperv_io_tlb_size = 256 * 1024 * 1024;

A hard coded size here seems problematic.   The memory size of
Isolated VMs can vary by orders of magnitude.  I see that
xen_swiotlb_init() uses swiotlb_size_or_default(), which at least
pays attention to the value specified on the kernel boot line.

Another example is sev_setup_arch(), which in the native case sets
the size to 6% of main memory, with a max of 1 Gbyte.  This is
the case that's closer to Isolated VMs, so doing something
similar could be a good approach.

> +	hyperv_io_tlb_start =
> +			memblock_alloc_low(
> +				  PAGE_ALIGN(hyperv_io_tlb_size),
> +				  HV_HYP_PAGE_SIZE);
> +
> +	if (!hyperv_io_tlb_start) {
> +		pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
> +		return;
> +	}
> +}
> +
> +int __init hyperv_swiotlb_detect(void)
> +{
> +	if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
> +	    && hv_is_isolation_supported()) {
> +		/*
> +		 * Enable swiotlb force mode in Isolation VM to
> +		 * use swiotlb bounce buffer for dma transaction.
> +		 */
> +		swiotlb_force = SWIOTLB_FORCE;
> +
> +		dma_memory_generic_decrypted_ops.map = hv_map_memory;
> +		dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +void __init hyperv_iommu_swiotlb_later_init(void)
> +{
> +	/*
> +	 * Swiotlb bounce buffer needs to be mapped in extra address
> +	 * space. Map function doesn't work in the early place and so
> +	 * call swiotlb_late_init_with_tbl() here.
> +	 */
> +	if (swiotlb_late_init_with_tbl(hyperv_io_tlb_start,
> +				       hyperv_io_tlb_size >> IO_TLB_SHIFT))
> +		panic("Fail to initialize hyperv swiotlb.\n");
> +}
> +
> +IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
> +		  NULL, hyperv_iommu_swiotlb_init,
> +		  hyperv_iommu_swiotlb_later_init);
> +
>  #endif
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 90b542597143..83fa567ad594 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1744,6 +1744,7 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
>  int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
>  				void (*block_invalidate)(void *context,
>  							 u64 block_mask));
> +int __init hyperv_swiotlb_detect(void);
> 
>  struct hyperv_pci_block_ops {
>  	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-08-09 17:56 ` [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
@ 2021-08-19 18:14   ` Michael Kelley
  2021-08-20  4:21     ` hch
  2021-08-20 18:20     ` Tianyu Lan
  0 siblings, 2 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-19 18:14 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 

The Subject line tag should be "hv_netvsc:".

> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() still need to handle. Use DMA API to map/umap these
> memory during sending/receiving packet and Hyper-V DMA ops callback
> will use swiotlb function to allocate bounce buffer and copy data
> from/to bounce buffer.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/net/hyperv/hyperv_net.h   |   6 ++
>  drivers/net/hyperv/netvsc.c       | 144 +++++++++++++++++++++++++++++-
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h            |   5 ++
>  4 files changed, 154 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index bc48855dff10..862419912bfb 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>  	u32 total_bytes;
>  	u32 send_buf_index;
>  	u32 total_data_buflen;
> +	struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>  	/* Receive buffer allocated by us but manages by NetVSP */
>  	void *recv_buf;
> +	void *recv_original_buf;
>  	u32 recv_buf_size; /* allocated bytes */
>  	u32 recv_buf_gpadl_handle;
>  	u32 recv_section_cnt;
> @@ -1082,6 +1084,8 @@ struct netvsc_device {
> 
>  	/* Send buffer allocated by us */
>  	void *send_buf;
> +	void *send_original_buf;
> +	u32 send_buf_size;
>  	u32 send_buf_gpadl_handle;
>  	u32 send_section_cnt;
>  	u32 send_section_size;
> @@ -1730,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI	10000
>  #define RETRY_MAX	2000	/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +		      struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 7bd935412853..fc312e5db4d5 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
>  	int i;
> 
>  	kfree(nvdev->extension);
> -	vfree(nvdev->recv_buf);
> -	vfree(nvdev->send_buf);
> +
> +	if (nvdev->recv_original_buf) {
> +		vunmap(nvdev->recv_buf);
> +		vfree(nvdev->recv_original_buf);
> +	} else {
> +		vfree(nvdev->recv_buf);
> +	}
> +
> +	if (nvdev->send_original_buf) {
> +		vunmap(nvdev->send_buf);
> +		vfree(nvdev->send_original_buf);
> +	} else {
> +		vfree(nvdev->send_buf);
> +	}
> +
>  	kfree(nvdev->send_section_map);
> 
>  	for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
>  	return nvchan->mrc.slots ? 0 : -ENOMEM;
>  }
> 
> +static void *netvsc_remap_buf(void *buf, unsigned long size)
> +{
> +	unsigned long *pfns;
> +	void *vaddr;
> +	int i;
> +
> +	pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
> +		       GFP_KERNEL);

This assumes that the "size" argument is a multiple of PAGE_SIZE.  I think
that's true in all the use cases, but it would be safer to check.

> +	if (!pfns)
> +		return NULL;
> +
> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> +		pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
> +			+ (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> +
> +	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
> +	kfree(pfns);
> +
> +	return vaddr;
> +}

This function appears to be a duplicate of hv_map_memory() in Patch 11 of this
series.  Is it possible to structure things so there is only one implementation?  In
any case, see the comment in hv_map_memory() about PAGE_SIZE vs
HV_HYP_PAGE_SIZE and similar.

> +
>  static int netvsc_init_buf(struct hv_device *device,
>  			   struct netvsc_device *net_device,
>  			   const struct netvsc_device_info *device_info)
> @@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
>  	unsigned int buf_size;
>  	size_t map_words;
>  	int i, ret = 0;
> +	void *vaddr;
> 
>  	/* Get receive buffer area. */
>  	buf_size = device_info->recv_sections * device_info->recv_section_size;
> @@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
>  		goto cleanup;
>  	}
> 
> +	if (hv_isolation_type_snp()) {
> +		vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
> +		if (!vaddr)
> +			goto cleanup;
> +
> +		net_device->recv_original_buf = net_device->recv_buf;
> +		net_device->recv_buf = vaddr;
> +	}
> +
>  	/* Notify the NetVsp of the gpadl handle */
>  	init_packet = &net_device->channel_init_pkt;
>  	memset(init_packet, 0, sizeof(struct nvsp_message));
> @@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
>  		goto cleanup;
>  	}
> 
> +	if (hv_isolation_type_snp()) {
> +		vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
> +		if (!vaddr)
> +			goto cleanup;

I don't think this error case is handled correctly.  Doesn't the remapping
of the recv buf need to be undone?

> +
> +		net_device->send_original_buf = net_device->send_buf;
> +		net_device->send_buf = vaddr;
> +	}
> +
>  	/* Notify the NetVsp of the gpadl handle */
>  	init_packet = &net_device->channel_init_pkt;
>  	memset(init_packet, 0, sizeof(struct nvsp_message));
> @@ -767,7 +820,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
> 
>  	/* Notify the layer above us */
>  	if (likely(skb)) {
> -		const struct hv_netvsc_packet *packet
> +		struct hv_netvsc_packet *packet
>  			= (struct hv_netvsc_packet *)skb->cb;
>  		u32 send_index = packet->send_buf_index;
>  		struct netvsc_stats *tx_stats;
> @@ -783,6 +836,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
>  		tx_stats->bytes += packet->total_bytes;
>  		u64_stats_update_end(&tx_stats->syncp);
> 
> +		netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>  		napi_consume_skb(skb, budget);
>  	}
> 
> @@ -947,6 +1001,82 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
>  		memset(dest, 0, padding);
>  }
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +		      struct hv_netvsc_packet *packet)
> +{
> +	u32 page_count = packet->cp_partial ?
> +		packet->page_buf_cnt - packet->rmsg_pgcnt :
> +		packet->page_buf_cnt;
> +	int i;
> +
> +	if (!hv_is_isolation_supported())
> +		return;
> +
> +	if (!packet->dma_range)
> +		return;
> +
> +	for (i = 0; i < page_count; i++)
> +		dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
> +				 packet->dma_range[i].mapping_size,
> +				 DMA_TO_DEVICE);
> +
> +	kfree(packet->dma_range);
> +}
> +
> +/* netvsc_dma_map - Map swiotlb bounce buffer with data page of
> + * packet sent by vmbus_sendpacket_pagebuffer() in the Isolation
> + * VM.
> + *
> + * In isolation VM, netvsc send buffer has been marked visible to
> + * host and so the data copied to send buffer doesn't need to use
> + * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
> + * may not be copied to send buffer and so these pages need to be
> + * mapped with swiotlb bounce buffer. netvsc_dma_map() is to do
> + * that. The pfns in the struct hv_page_buffer need to be converted
> + * to bounce buffer's pfn. The loop here is necessary and so not
> + * use dma_map_sg() here.

I think I understand why the loop is necessary, but it would be
nice to add a bit more comment text to explain.  The reason is
that the entries in the page buffer array are not necessarily full
pages of data.  Each entry in the array has a separate offset and
len that may be non-zero, even for entries in the middle of the
array.   And the entries are not physically contiguous.  So each
entry must be individually mapped rather than as a contiguous unit.

> + */
> +int netvsc_dma_map(struct hv_device *hv_dev,
> +		   struct hv_netvsc_packet *packet,
> +		   struct hv_page_buffer *pb)
> +{
> +	u32 page_count =  packet->cp_partial ?
> +		packet->page_buf_cnt - packet->rmsg_pgcnt :
> +		packet->page_buf_cnt;
> +	dma_addr_t dma;
> +	int i;
> +
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	packet->dma_range = kcalloc(page_count,
> +				    sizeof(*packet->dma_range),
> +				    GFP_KERNEL);
> +	if (!packet->dma_range)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < page_count; i++) {
> +		char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
> +					 + pb[i].offset);
> +		u32 len = pb[i].len;
> +
> +		dma = dma_map_single(&hv_dev->device, src, len,
> +				     DMA_TO_DEVICE);
> +		if (dma_mapping_error(&hv_dev->device, dma)) {
> +			kfree(packet->dma_range);
> +			return -ENOMEM;
> +		}
> +
> +		packet->dma_range[i].dma = dma;
> +		packet->dma_range[i].mapping_size = len;
> +		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
> +		pb[i].offset = offset_in_hvpage(dma);
> +		pb[i].len = len;
> +	}
> +
> +	return 0;
> +}
> +
>  static inline int netvsc_send_pkt(
>  	struct hv_device *device,
>  	struct hv_netvsc_packet *packet,
> @@ -987,14 +1117,22 @@ static inline int netvsc_send_pkt(
> 
>  	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
> 
> +	packet->dma_range = NULL;
>  	if (packet->page_buf_cnt) {
>  		if (packet->cp_partial)
>  			pb += packet->rmsg_pgcnt;
> 
> +		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> +		if (ret)
> +			return ret;

I think this error case needs to set things up so sending the packet
can be retried at the higher levels.  The typical error is that
swiotlb is out of bounce buffer memory.  That's a transient
condition.  There's already code in this function to retry when
the vmbus_sendpacket functions fails because the ring buffer
is full, and running out of bounce buffer memory should probably
take the same path.

> +
>  		ret = vmbus_sendpacket_pagebuffer(out_channel,
>  						  pb, packet->page_buf_cnt,
>  						  &nvmsg, sizeof(nvmsg),
>  						  req_id);
> +
> +		if (ret)
> +			netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>  	} else {
>  		ret = vmbus_sendpacket(out_channel,
>  				       &nvmsg, sizeof(nvmsg),
> diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
> index f6c9c2a670f9..448fcc325ed7 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
>  			}
>  		}
> 
> +		netvsc_dma_unmap(((struct net_device_context *)
> +			netdev_priv(ndev))->device_ctx, &request->pkt);
>  		complete(&request->wait_event);
>  	} else {
>  		netdev_err(ndev,
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 83fa567ad594..2ea638101645 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1601,6 +1601,11 @@ struct hyperv_service_callback {
>  	void (*callback)(void *context);
>  };
> 
> +struct hv_dma_range {
> +	dma_addr_t dma;
> +	u32 mapping_size;
> +};
> +
>  #define MAX_SRV_VER	0x7ffffff
>  extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf, u32 buflen,
>  				const int *fw_version, int fw_vercnt,
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-09 17:56 ` [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
@ 2021-08-19 18:17   ` Michael Kelley
  2021-08-20  4:32     ` hch
  2021-08-20 15:20     ` Tianyu Lan
  0 siblings, 2 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-19 18:17 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> 

Subject line tag should be "scsi: storvsc:"

> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still need to handle. Use DMA API to map/umap these

s/need to handle/needs to be handled/

> memory during sending/receiving packet and Hyper-V DMA ops callback
> will use swiotlb function to allocate bounce buffer and copy data
> from/to bounce buffer.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 68 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 328bb961c281..78320719bdd8 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include <linux/device.h>
>  #include <linux/hyperv.h>
>  #include <linux/blkdev.h>
> +#include <linux/io.h>
> +#include <linux/dma-mapping.h>
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
>  #include <scsi/scsi_host.h>
> @@ -427,6 +429,8 @@ struct storvsc_cmd_request {
>  	u32 payload_sz;
> 
>  	struct vstor_packet vstor_packet;
> +	u32 hvpg_count;

This count is really the number of entries in the dma_range
array, right?  If so, perhaps "dma_range_count" would be
a better name so that it is more tightly associated.

> +	struct hv_dma_range *dma_range;
>  };
> 
> 
> @@ -509,6 +513,14 @@ struct storvsc_scan_work {
>  	u8 tgt_id;
>  };
> 
> +#define storvsc_dma_map(dev, page, offset, size, dir) \
> +	dma_map_page(dev, page, offset, size, dir)
> +
> +#define storvsc_dma_unmap(dev, dma_range, dir)		\
> +		dma_unmap_page(dev, dma_range.dma,	\
> +			       dma_range.mapping_size,	\
> +			       dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
> +

Each of these macros is used only once.  IMHO, they don't
add a lot of value.  Just coding dma_map/unmap_page()
inline would be fine and eliminate these lines of code.

>  static void storvsc_device_scan(struct work_struct *work)
>  {
>  	struct storvsc_scan_work *wrk;
> @@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
>  	struct hv_device *device;
>  	struct storvsc_device *stor_device;
>  	struct Scsi_Host *shost;
> +	int i;
> 
>  	if (channel->primary_channel != NULL)
>  		device = channel->primary_channel->device_obj;
> @@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
>  				request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
>  			}
> 
> +			if (request->dma_range) {
> +				for (i = 0; i < request->hvpg_count; i++)
> +					storvsc_dma_unmap(&device->device,
> +						request->dma_range[i],
> +						request->vstor_packet.vm_srb.data_in == READ_TYPE);

I think you can directly get the DMA direction as request->cmd->sc_data_direction.

> +
> +				kfree(request->dma_range);
> +			}
> +
>  			storvsc_on_receive(stor_device, packet, request);
>  			continue;
>  		}
> @@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  		unsigned int hvpgoff, hvpfns_to_add;
>  		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>  		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> +		dma_addr_t dma;
>  		u64 hvpfn;
> +		u32 size;
> 
>  		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  		payload->range.len = length;
>  		payload->range.offset = offset_in_hvpg;
> 
> +		cmd_request->dma_range = kcalloc(hvpg_count,
> +				 sizeof(*cmd_request->dma_range),
> +				 GFP_ATOMIC);

With this patch, it appears that storvsc_queuecommand() is always
doing bounce buffering, even when running in a non-isolated VM.
The dma_range is always allocated, and the inner loop below does
the dma mapping for every I/O page.  The corresponding code in
storvsc_on_channel_callback() that does the dma unmap allows for
the dma_range to be NULL, but that never happens.

> +		if (!cmd_request->dma_range) {
> +			ret = -ENOMEM;

The other memory allocation failure in this function returns
SCSI_MLQUEUE_DEVICE_BUSY.   It may be debatable as to whether
that's the best approach, but that's a topic for a different patch.  I
would suggest being consistent and using the same return code
here.

> +			goto free_payload;
> +		}
> 
>  		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
>  			/*
> @@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  			 * last sgl should be reached at the same time that
>  			 * the PFN array is filled.
>  			 */
> -			while (hvpfns_to_add--)
> -				payload->range.pfn_array[i++] =	hvpfn++;
> +			while (hvpfns_to_add--) {
> +				size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
> +					   (unsigned long)length);
> +				dma = storvsc_dma_map(&dev->device, pfn_to_page(hvpfn++),
> +						      offset_in_hvpg, size,
> +						      scmnd->sc_data_direction);
> +				if (dma_mapping_error(&dev->device, dma)) {
> +					ret = -ENOMEM;

The typical error from dma_map_page() will be running out of
bounce buffer memory.   This is a transient condition that should be
retried at the higher levels.  So make sure to return an error code
that indicates the I/O should be resubmitted.

> +					goto free_dma_range;
> +				}
> +
> +				if (offset_in_hvpg) {
> +					payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
> +					offset_in_hvpg = 0;
> +				}

I'm not clear on why payload->range.offset needs to be set again.
Even after the dma mapping is done, doesn't the offset in the first
page have to be the same?  If it wasn't the same, Hyper-V wouldn't
be able to process the PFN list correctly.  In fact, couldn't the above
code just always set offset_in_hvpg = 0?

> +
> +				cmd_request->dma_range[i].dma = dma;
> +				cmd_request->dma_range[i].mapping_size = size;
> +				payload->range.pfn_array[i++] = dma >> HV_HYP_PAGE_SHIFT;
> +				length -= size;
> +			}
>  		}
> +		cmd_request->hvpg_count = hvpg_count;

This line just saves the size of the dma_range array.  Could
it be moved up with the code that allocates the dma_range
array?  To me, it would make more sense to have all that
code together in one place.

>  	}

The whole approach here is to do dma remapping on each individual page
of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
each scatterlist entry as a unit?  Each scatterlist entry describes a range of
physically contiguous memory.  After dma_map_sg(), the resulting dma
address must also refer to a physically contiguous range in the swiotlb
bounce buffer memory.   So at the top of the "for" loop over the scatterlist
entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
hvpfn value based on the dma address instead of sg_page().  But everything
else is the same, and the inner loop for populating the pfn_arry is unmodified.
Furthermore, the dma_range array that you've added is not needed, since
scatterlist entries already have a dma_address field for saving the mapped
address, and dma_unmap_sg() uses that field.

One thing:  There's a maximum swiotlb mapping size, which I think works
out to be 256 Kbytes.  See swiotlb_max_mapping_size().  We need to make
sure that we don't get a scatterlist entry bigger than this size.  But I think
this already happens because you set the device->dma_mask field in
Patch 11 of this series.  __scsi_init_queue checks for this setting and
sets max_sectors to limits transfers to the max mapping size.

> 
>  	cmd_request->payload = payload;
> @@ -1860,13 +1911,20 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  	put_cpu();
> 
>  	if (ret == -EAGAIN) {
> -		if (payload_sz > sizeof(cmd_request->mpb))
> -			kfree(payload);
>  		/* no more space */
> -		return SCSI_MLQUEUE_DEVICE_BUSY;
> +		ret = SCSI_MLQUEUE_DEVICE_BUSY;
> +		goto free_dma_range;
>  	}
> 
>  	return 0;
> +
> +free_dma_range:
> +	kfree(cmd_request->dma_range);
> +
> +free_payload:
> +	if (payload_sz > sizeof(cmd_request->mpb))
> +		kfree(payload);
> +	return ret;
>  }
> 
>  static struct scsi_host_template scsi_driver = {
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  2021-08-19 18:11   ` Michael Kelley
@ 2021-08-20  4:13     ` hch
  2021-08-20  9:32     ` Tianyu Lan
  1 sibling, 0 replies; 64+ messages in thread
From: hch @ 2021-08-20  4:13 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen

On Thu, Aug 19, 2021 at 06:11:30PM +0000, Michael Kelley wrote:
> This function is manipulating page tables in the guest VM.  It is not involved
> in communicating with Hyper-V, or passing PFNs to Hyper-V.  The pfn array
> contains guest PFNs, not Hyper-V PFNs.  So it should use PAGE_SIZE
> instead of HV_HYP_PAGE_SIZE, and similarly PAGE_SHIFT and virt_to_pfn().
> If this code were ever to run on ARM64 in the future with PAGE_SIZE other
> than 4 Kbytes, the use of PAGE_SIZE is correct choice.

Yes.  I just stumled over this yesterday.  I think we can actually use a
nicer helper ased around vmap_range() to simplify this exactly because it
always uses the host page size.  I hope I can draft up a RFC today.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-08-19 18:14   ` Michael Kelley
@ 2021-08-20  4:21     ` hch
  2021-08-20 13:11       ` Tianyu Lan
  2021-08-20 13:30       ` Tom Lendacky
  2021-08-20 18:20     ` Tianyu Lan
  1 sibling, 2 replies; 64+ messages in thread
From: hch @ 2021-08-20  4:21 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen

On Thu, Aug 19, 2021 at 06:14:51PM +0000, Michael Kelley wrote:
> > +	if (!pfns)
> > +		return NULL;
> > +
> > +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> > +		pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
> > +			+ (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> > +
> > +	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
> > +	kfree(pfns);
> > +
> > +	return vaddr;
> > +}
> 
> This function appears to be a duplicate of hv_map_memory() in Patch 11 of this
> series.  Is it possible to structure things so there is only one implementation?  In

So right now it it identical, but there is an important difference:
the swiotlb memory is physically contiguous to start with, so we can
do the simple remap using vmap_range as suggested in the last mail.
The cases here are pretty weird in that netvsc_remap_buf is called right
after vzalloc.  That is we create _two_ mappings in vmalloc space right
after another, where the original one is just used for establishing the
"GPADL handle" and freeing the memory.  In other words, the obvious thing
to do here would be to use a vmalloc variant that allows to take the
shared_gpa_boundary into account when setting up the PTEs.

And here is somthing I need help from the x86 experts:  does the CPU
actually care about this shared_gpa_boundary?  Or does it just matter
for the generated DMA address?  Does somehow have a good pointer to
how this mechanism works?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-19 18:17   ` Michael Kelley
@ 2021-08-20  4:32     ` hch
  2021-08-20 15:40       ` Michael Kelley
  2021-08-20 16:01       ` Tianyu Lan
  2021-08-20 15:20     ` Tianyu Lan
  1 sibling, 2 replies; 64+ messages in thread
From: hch @ 2021-08-20  4:32 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen

On Thu, Aug 19, 2021 at 06:17:40PM +0000, Michael Kelley wrote:
> > +#define storvsc_dma_map(dev, page, offset, size, dir) \
> > +	dma_map_page(dev, page, offset, size, dir)
> > +
> > +#define storvsc_dma_unmap(dev, dma_range, dir)		\
> > +		dma_unmap_page(dev, dma_range.dma,	\
> > +			       dma_range.mapping_size,	\
> > +			       dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
> > +
> 
> Each of these macros is used only once.  IMHO, they don't
> add a lot of value.  Just coding dma_map/unmap_page()
> inline would be fine and eliminate these lines of code.

Yes, I had the same thought when looking over the code.  Especially
as macros tend to further obsfucate the code (compared to actual helper
functions).

> > +				for (i = 0; i < request->hvpg_count; i++)
> > +					storvsc_dma_unmap(&device->device,
> > +						request->dma_range[i],
> > +						request->vstor_packet.vm_srb.data_in == READ_TYPE);
> 
> I think you can directly get the DMA direction as request->cmd->sc_data_direction.

Yes.

> > 
> > @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> >  		payload->range.len = length;
> >  		payload->range.offset = offset_in_hvpg;
> > 
> > +		cmd_request->dma_range = kcalloc(hvpg_count,
> > +				 sizeof(*cmd_request->dma_range),
> > +				 GFP_ATOMIC);
> 
> With this patch, it appears that storvsc_queuecommand() is always
> doing bounce buffering, even when running in a non-isolated VM.
> The dma_range is always allocated, and the inner loop below does
> the dma mapping for every I/O page.  The corresponding code in
> storvsc_on_channel_callback() that does the dma unmap allows for
> the dma_range to be NULL, but that never happens.

Maybe I'm missing something in the hyperv code, but I don't think
dma_map_page would bounce buffer for the non-isolated case.  It
will just return the physical address.

> > +		if (!cmd_request->dma_range) {
> > +			ret = -ENOMEM;
> 
> The other memory allocation failure in this function returns
> SCSI_MLQUEUE_DEVICE_BUSY.   It may be debatable as to whether
> that's the best approach, but that's a topic for a different patch.  I
> would suggest being consistent and using the same return code
> here.

Independent of if SCSI_MLQUEUE_DEVICE_BUSY is good (it it a common
pattern in SCSI drivers), ->queuecommand can't return normal
negative errnos.  It must return the SCSI_MLQUEUE_* codes or 0.
We should probably change the return type of the method definition
to a suitable enum to make this more clear.

> > +				if (offset_in_hvpg) {
> > +					payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
> > +					offset_in_hvpg = 0;
> > +				}
> 
> I'm not clear on why payload->range.offset needs to be set again.
> Even after the dma mapping is done, doesn't the offset in the first
> page have to be the same?  If it wasn't the same, Hyper-V wouldn't
> be able to process the PFN list correctly.  In fact, couldn't the above
> code just always set offset_in_hvpg = 0?

Careful.  DMA mapping is supposed to keep the offset in the page, but
for that the DMA mapping code needs to know what the device considers a
"page".  For that the driver needs to set the min_align_mask field in
struct device_dma_parameters.

> 
> The whole approach here is to do dma remapping on each individual page
> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> each scatterlist entry as a unit?  Each scatterlist entry describes a range of
> physically contiguous memory.  After dma_map_sg(), the resulting dma
> address must also refer to a physically contiguous range in the swiotlb
> bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> hvpfn value based on the dma address instead of sg_page().  But everything
> else is the same, and the inner loop for populating the pfn_arry is unmodified.
> Furthermore, the dma_range array that you've added is not needed, since
> scatterlist entries already have a dma_address field for saving the mapped
> address, and dma_unmap_sg() uses that field.

Yes, I think dma_map_sg is the right thing to use here, probably even
for the non-isolated case so that we can get the hv drivers out of their
little corner and into being more like a normal kernel driver.  That
is, use the scsi_dma_map/scsi_dma_unmap helpers, and then iterate over
the dma addresses one page at a time using for_each_sg_dma_page.

> 
> One thing:  There's a maximum swiotlb mapping size, which I think works
> out to be 256 Kbytes.  See swiotlb_max_mapping_size().  We need to make
> sure that we don't get a scatterlist entry bigger than this size.  But I think
> this already happens because you set the device->dma_mask field in
> Patch 11 of this series.  __scsi_init_queue checks for this setting and
> sets max_sectors to limits transfers to the max mapping size.

Indeed.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  2021-08-19 18:11   ` Michael Kelley
  2021-08-20  4:13     ` hch
@ 2021-08-20  9:32     ` Tianyu Lan
  1 sibling, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20  9:32 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen



On 8/20/2021 2:11 AM, Michael Kelley wrote:
>>   }
>> +
>> +/*
>> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
>> + */
>> +void *hv_map_memory(void *addr, unsigned long size)
>> +{
>> +	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
>> +				      sizeof(unsigned long), GFP_KERNEL);
>> +	void *vaddr;
>> +	int i;
>> +
>> +	if (!pfns)
>> +		return NULL;
>> +
>> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
>> +		pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
>> +			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
>> +
>> +	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,	PAGE_KERNEL_IO);
>> +	kfree(pfns);
>> +
>> +	return vaddr;
>> +}
> This function is manipulating page tables in the guest VM.  It is not involved
> in communicating with Hyper-V, or passing PFNs to Hyper-V.  The pfn array
> contains guest PFNs, not Hyper-V PFNs.  So it should use PAGE_SIZE
> instead of HV_HYP_PAGE_SIZE, and similarly PAGE_SHIFT and virt_to_pfn().
> If this code were ever to run on ARM64 in the future with PAGE_SIZE other
> than 4 Kbytes, the use of PAGE_SIZE is correct choice.

OK. Will update with PAGE_SIZE.


> 
>> +void __init hyperv_iommu_swiotlb_init(void)
>> +{
>> +	unsigned long bytes;
>> +
>> +	/*
>> +	 * Allocate Hyper-V swiotlb bounce buffer at early place
>> +	 * to reserve large contiguous memory.
>> +	 */
>> +	hyperv_io_tlb_size = 256 * 1024 * 1024;
> A hard coded size here seems problematic.   The memory size of
> Isolated VMs can vary by orders of magnitude.  I see that
> xen_swiotlb_init() uses swiotlb_size_or_default(), which at least
> pays attention to the value specified on the kernel boot line.
> 
> Another example is sev_setup_arch(), which in the native case sets
> the size to 6% of main memory, with a max of 1 Gbyte.  This is
> the case that's closer to Isolated VMs, so doing something
> similar could be a good approach.
> 

Yes, agree. It's better to keep bounce buffer size with AMD SEV.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-08-20  4:21     ` hch
@ 2021-08-20 13:11       ` Tianyu Lan
  2021-08-20 13:30       ` Tom Lendacky
  1 sibling, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20 13:11 UTC (permalink / raw)
  To: hch, Michael Kelley
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu,
	Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, will,
	davem, kuba, jejb, martin.petersen, arnd, m.szyprowski,
	robin.murphy, thomas.lendacky, brijesh.singh, ardb, Tianyu Lan,
	pgonda, martin.b.radev, akpm, kirill.shutemov, rppt, sfr,
	saravanand, krish.sadhukhan, aneesh.kumar, xen-devel, rientjes,
	hannes, tj, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen



On 8/20/2021 12:21 PM, hch@lst.de wrote:
> On Thu, Aug 19, 2021 at 06:14:51PM +0000, Michael Kelley wrote:
>>> +	if (!pfns)
>>> +		return NULL;
>>> +
>>> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
>>> +		pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
>>> +			+ (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
>>> +
>>> +	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
>>> +	kfree(pfns);
>>> +
>>> +	return vaddr;
>>> +}
>>
>> This function appears to be a duplicate of hv_map_memory() in Patch 11 of this
>> series.  Is it possible to structure things so there is only one implementation?  In
> 
> So right now it it identical, but there is an important difference:
> the swiotlb memory is physically contiguous to start with, so we can
> do the simple remap using vmap_range as suggested in the last mail.
> The cases here are pretty weird in that netvsc_remap_buf is called right
> after vzalloc.  That is we create _two_ mappings in vmalloc space right
> after another, where the original one is just used for establishing the
> "GPADL handle" and freeing the memory.  In other words, the obvious thing
> to do here would be to use a vmalloc variant that allows to take the
> shared_gpa_boundary into account when setting up the PTEs.

The buffer is allocated via vmalloc(). It needs to be marked as host
visible via hyperv hvcall before being accessed via address space above
shared_gpa_boundary. The hvcall is called in the vmbus_establish_gpadl().

> 
> And here is somthing I need help from the x86 experts:  does the CPU
> actually care about this shared_gpa_boundary?  Or does it just matter
> for the generated DMA address?  Does somehow have a good pointer to
> how this mechanism works?
> 

The shared_gpa_boundary is vTOM feature of AMD SEV-SNP. Tom Lendacky
introduced the feature in previous mail. I copy it here and please have
a look.

 From Tom Lendacky:
IIUC, this is using the vTOM feature of SEV-SNP. When this feature is
enabled for a VMPL level, any physical memory addresses below vTOM are
considered private/encrypted and any physical memory addresses above vTOM
are considered shared/unencrypted. With this option, you don't need a
fully enlightened guest that sets and clears page table encryption bits.
You just need the DMA buffers to be allocated in the proper range above 
vTOM.

See the section on "Virtual Machine Privilege Levels" in
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-08-20  4:21     ` hch
  2021-08-20 13:11       ` Tianyu Lan
@ 2021-08-20 13:30       ` Tom Lendacky
  1 sibling, 0 replies; 64+ messages in thread
From: Tom Lendacky @ 2021-08-20 13:30 UTC (permalink / raw)
  To: hch, Michael Kelley
  Cc: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd,
	m.szyprowski, robin.murphy, brijesh.singh, ardb, Tianyu Lan,
	pgonda, martin.b.radev, akpm, kirill.shutemov, rppt, sfr,
	saravanand, krish.sadhukhan, aneesh.kumar, xen-devel, rientjes,
	hannes, tj, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

On 8/19/21 11:21 PM, hch@lst.de wrote:
> On Thu, Aug 19, 2021 at 06:14:51PM +0000, Michael Kelley wrote:
>>> +	if (!pfns)
>>> +		return NULL;
>>> +
>>> +	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
>>> +		pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
>>> +			+ (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
>>> +
>>> +	vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
>>> +	kfree(pfns);
>>> +
>>> +	return vaddr;
>>> +}
>>
>> This function appears to be a duplicate of hv_map_memory() in Patch 11 of this
>> series.  Is it possible to structure things so there is only one implementation?  In
> 
> So right now it it identical, but there is an important difference:
> the swiotlb memory is physically contiguous to start with, so we can
> do the simple remap using vmap_range as suggested in the last mail.
> The cases here are pretty weird in that netvsc_remap_buf is called right
> after vzalloc.  That is we create _two_ mappings in vmalloc space right
> after another, where the original one is just used for establishing the
> "GPADL handle" and freeing the memory.  In other words, the obvious thing
> to do here would be to use a vmalloc variant that allows to take the
> shared_gpa_boundary into account when setting up the PTEs.
> 
> And here is somthing I need help from the x86 experts:  does the CPU
> actually care about this shared_gpa_boundary?  Or does it just matter
> for the generated DMA address?  Does somehow have a good pointer to
> how this mechanism works?

The CPU does care. Here's some info:

APM Volume 2, Section 15.36.8:
https://www.amd.com/system/files/TechDocs/24593.pdf

AMD SEV-SNP Whitepaper, Virtual Machine Privilege Levels (~page 14):
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-19 18:17   ` Michael Kelley
  2021-08-20  4:32     ` hch
@ 2021-08-20 15:20     ` Tianyu Lan
  2021-08-20 15:37       ` Tianyu Lan
  2021-08-20 16:08       ` Michael Kelley
  1 sibling, 2 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20 15:20 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen



On 8/20/2021 2:17 AM, Michael Kelley wrote:
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
>>
> 
> Subject line tag should be "scsi: storvsc:"
> 
>> In Isolation VM, all shared memory with host needs to mark visible
>> to host via hvcall. vmbus_establish_gpadl() has already done it for
>> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
>> mpb_desc() still need to handle. Use DMA API to map/umap these
> 
> s/need to handle/needs to be handled/
> 
>> memory during sending/receiving packet and Hyper-V DMA ops callback
>> will use swiotlb function to allocate bounce buffer and copy data
>> from/to bounce buffer.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>>   drivers/scsi/storvsc_drv.c | 68 +++++++++++++++++++++++++++++++++++---
>>   1 file changed, 63 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>> index 328bb961c281..78320719bdd8 100644
>> --- a/drivers/scsi/storvsc_drv.c
>> +++ b/drivers/scsi/storvsc_drv.c
>> @@ -21,6 +21,8 @@
>>   #include <linux/device.h>
>>   #include <linux/hyperv.h>
>>   #include <linux/blkdev.h>
>> +#include <linux/io.h>
>> +#include <linux/dma-mapping.h>
>>   #include <scsi/scsi.h>
>>   #include <scsi/scsi_cmnd.h>
>>   #include <scsi/scsi_host.h>
>> @@ -427,6 +429,8 @@ struct storvsc_cmd_request {
>>   	u32 payload_sz;
>>
>>   	struct vstor_packet vstor_packet;
>> +	u32 hvpg_count;
> 
> This count is really the number of entries in the dma_range
> array, right?  If so, perhaps "dma_range_count" would be
> a better name so that it is more tightly associated.

Yes, will update.

> 
>> +	struct hv_dma_range *dma_range;
>>   };
>>
>>
>> @@ -509,6 +513,14 @@ struct storvsc_scan_work {
>>   	u8 tgt_id;
>>   };
>>
>> +#define storvsc_dma_map(dev, page, offset, size, dir) \
>> +	dma_map_page(dev, page, offset, size, dir)
>> +
>> +#define storvsc_dma_unmap(dev, dma_range, dir)		\
>> +		dma_unmap_page(dev, dma_range.dma,	\
>> +			       dma_range.mapping_size,	\
>> +			       dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
>> +
> 
> Each of these macros is used only once.  IMHO, they don't
> add a lot of value.  Just coding dma_map/unmap_page()
> inline would be fine and eliminate these lines of code.

OK. Will update.

> 
>>   static void storvsc_device_scan(struct work_struct *work)
>>   {
>>   	struct storvsc_scan_work *wrk;
>> @@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
>>   	struct hv_device *device;
>>   	struct storvsc_device *stor_device;
>>   	struct Scsi_Host *shost;
>> +	int i;
>>
>>   	if (channel->primary_channel != NULL)
>>   		device = channel->primary_channel->device_obj;
>> @@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
>>   				request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
>>   			}
>>
>> +			if (request->dma_range) {
>> +				for (i = 0; i < request->hvpg_count; i++)
>> +					storvsc_dma_unmap(&device->device,
>> +						request->dma_range[i],
>> +						request->vstor_packet.vm_srb.data_in == READ_TYPE);
> 
> I think you can directly get the DMA direction as request->cmd->sc_data_direction.
> 
>> +
>> +				kfree(request->dma_range);
>> +			}
>> +
>>   			storvsc_on_receive(stor_device, packet, request);
>>   			continue;
>>   		}
>> @@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>>   		unsigned int hvpgoff, hvpfns_to_add;
>>   		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>>   		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
>> +		dma_addr_t dma;
>>   		u64 hvpfn;
>> +		u32 size;
>>
>>   		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
>>
>> @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>>   		payload->range.len = length;
>>   		payload->range.offset = offset_in_hvpg;
>>
>> +		cmd_request->dma_range = kcalloc(hvpg_count,
>> +				 sizeof(*cmd_request->dma_range),
>> +				 GFP_ATOMIC);
> 
> With this patch, it appears that storvsc_queuecommand() is always
> doing bounce buffering, even when running in a non-isolated VM.

In the non-isolated VM, SWIOTLB_FORCE mode isn't enabled and so
the swiotlb bounce buffer will not work.

> The dma_range is always allocated, and the inner loop below does
> the dma mapping for every I/O page.  The corresponding code in
> storvsc_on_channel_callback() that does the dma unmap allows for
> the dma_range to be NULL, but that never happens.

Yes, dma mapping function will return PA directly in non-isolated VM.

> 
>> +		if (!cmd_request->dma_range) {
>> +			ret = -ENOMEM;
> 
> The other memory allocation failure in this function returns
> SCSI_MLQUEUE_DEVICE_BUSY.   It may be debatable as to whether
> that's the best approach, but that's a topic for a different patch.  I
> would suggest being consistent and using the same return code
> here.

OK. I will keep to return SCSI_MLQUEUE_DEVICE_BUSY here.

> 
>> +			goto free_payload;
>> +		}
>>
>>   		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
>>   			/*
>> @@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>>   			 * last sgl should be reached at the same time that
>>   			 * the PFN array is filled.
>>   			 */
>> -			while (hvpfns_to_add--)
>> -				payload->range.pfn_array[i++] =	hvpfn++;
>> +			while (hvpfns_to_add--) {
>> +				size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
>> +					   (unsigned long)length);
>> +				dma = storvsc_dma_map(&dev->device, pfn_to_page(hvpfn++),
>> +						      offset_in_hvpg, size,
>> +						      scmnd->sc_data_direction);
>> +				if (dma_mapping_error(&dev->device, dma)) {
>> +					ret = -ENOMEM;
> 
> The typical error from dma_map_page() will be running out of
> bounce buffer memory.   This is a transient condition that should be
> retried at the higher levels.  So make sure to return an error code
> that indicates the I/O should be resubmitted.

OK. It looks like error code should be SCSI_MLQUEUE_DEVICE_BUSY here.

> 
>> +					goto free_dma_range;
>> +				}
>> +
>> +				if (offset_in_hvpg) {
>> +					payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
>> +					offset_in_hvpg = 0;
>> +				}
> 
> I'm not clear on why payload->range.offset needs to be set again.
> Even after the dma mapping is done, doesn't the offset in the first
> page have to be the same?  If it wasn't the same, Hyper-V wouldn't
> be able to process the PFN list correctly.  In fact, couldn't the above
> code just always set offset_in_hvpg = 0?

The offset will be changed. The swiotlb bounce buffer is allocated with 
IO_TLB_SIZE(2K) as unit. So the offset here may be changed.

> 
>> +
>> +				cmd_request->dma_range[i].dma = dma;
>> +				cmd_request->dma_range[i].mapping_size = size;
>> +				payload->range.pfn_array[i++] = dma >> HV_HYP_PAGE_SHIFT;
>> +				length -= size;
>> +			}
>>   		}
>> +		cmd_request->hvpg_count = hvpg_count;
> 
> This line just saves the size of the dma_range array.  Could
> it be moved up with the code that allocates the dma_range
> array?  To me, it would make more sense to have all that
> code together in one place.

Sure. Will update.

> 
>>   	}
> 
> The whole approach here is to do dma remapping on each individual page
> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> each scatterlist entry as a unit?  Each scatterlist entry describes a range of
> physically contiguous memory.  After dma_map_sg(), the resulting dma
> address must also refer to a physically contiguous range in the swiotlb
> bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> hvpfn value based on the dma address instead of sg_page().  But everything
> else is the same, and the inner loop for populating the pfn_arry is unmodified.
> Furthermore, the dma_range array that you've added is not needed, since
> scatterlist entries already have a dma_address field for saving the mapped
> address, and dma_unmap_sg() uses that field.

I don't use dma_map_sg() here in order to avoid introducing one more 
loop(e,g dma_map_sg()). We already have a loop to populate 
cmd_request->dma_range[] and so do the dma map in the same loop.

> 
> One thing:  There's a maximum swiotlb mapping size, which I think works
> out to be 256 Kbytes.  See swiotlb_max_mapping_size().  We need to make
> sure that we don't get a scatterlist entry bigger than this size.  But I think
> this already happens because you set the device->dma_mask field in
> Patch 11 of this series.  __scsi_init_queue checks for this setting and
> sets max_sectors to limits transfers to the max mapping size.

I will double check.

> 
>>
>>   	cmd_request->payload = payload;
>> @@ -1860,13 +1911,20 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>>   	put_cpu();
>>
>>   	if (ret == -EAGAIN) {
>> -		if (payload_sz > sizeof(cmd_request->mpb))
>> -			kfree(payload);
>>   		/* no more space */
>> -		return SCSI_MLQUEUE_DEVICE_BUSY;
>> +		ret = SCSI_MLQUEUE_DEVICE_BUSY;
>> +		goto free_dma_range;
>>   	}
>>
>>   	return 0;
>> +
>> +free_dma_range:
>> +	kfree(cmd_request->dma_range);
>> +
>> +free_payload:
>> +	if (payload_sz > sizeof(cmd_request->mpb))
>> +		kfree(payload);
>> +	return ret;
>>   }
>>
>>   static struct scsi_host_template scsi_driver = {
>> --
>> 2.25.1
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20 15:20     ` Tianyu Lan
@ 2021-08-20 15:37       ` Tianyu Lan
  2021-08-20 16:08       ` Michael Kelley
  1 sibling, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20 15:37 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

On 8/20/2021 11:20 PM, Tianyu Lan wrote:
>> The whole approach here is to do dma remapping on each individual page
>> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to 
>> map
>> each scatterlist entry as a unit?  Each scatterlist entry describes a 
>> range of
>> physically contiguous memory.  After dma_map_sg(), the resulting dma
>> address must also refer to a physically contiguous range in the swiotlb
>> bounce buffer memory.   So at the top of the "for" loop over the 
>> scatterlist
>> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
>> hvpfn value based on the dma address instead of sg_page().  But 
>> everything
>> else is the same, and the inner loop for populating the pfn_arry is 
>> unmodified.
>> Furthermore, the dma_range array that you've added is not needed, since
>> scatterlist entries already have a dma_address field for saving the 
>> mapped
>> address, and dma_unmap_sg() uses that field.
> 
> I don't use dma_map_sg() here in order to avoid introducing one more 
> loop(e,g dma_map_sg()). We already have a loop to populate 
> cmd_request->dma_range[] and so do the dma map in the same loop.

Sorry for a typo. s/cmd_request->dma_range[]/payload->range.pfn_array[]/

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20  4:32     ` hch
@ 2021-08-20 15:40       ` Michael Kelley
  2021-08-24  8:49         ` min_align_mask " hch
  2021-08-20 16:01       ` Tianyu Lan
  1 sibling, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-20 15:40 UTC (permalink / raw)
  To: hch
  Cc: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen

From: hch@lst.de <hch@lst.de> Sent: Thursday, August 19, 2021 9:33 PM
> 
> On Thu, Aug 19, 2021 at 06:17:40PM +0000, Michael Kelley wrote:
> > >
> > > @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> > >  		payload->range.len = length;
> > >  		payload->range.offset = offset_in_hvpg;
> > >
> > > +		cmd_request->dma_range = kcalloc(hvpg_count,
> > > +				 sizeof(*cmd_request->dma_range),
> > > +				 GFP_ATOMIC);
> >
> > With this patch, it appears that storvsc_queuecommand() is always
> > doing bounce buffering, even when running in a non-isolated VM.
> > The dma_range is always allocated, and the inner loop below does
> > the dma mapping for every I/O page.  The corresponding code in
> > storvsc_on_channel_callback() that does the dma unmap allows for
> > the dma_range to be NULL, but that never happens.
> 
> Maybe I'm missing something in the hyperv code, but I don't think
> dma_map_page would bounce buffer for the non-isolated case.  It
> will just return the physical address.

OK, right.  In the isolated VM case, the swiotlb is in force mode
and will do bounce buffering.  In the non-isolated case,
dma_map_page_attrs() -> dma_direct_map_page() does a lot of
checking but eventually just returns the physical address.  As this
patch is currently coded, it adds a fair amount of overhead
here in storvsc_queuecommand(), plus the overhead of the dma
mapping function deciding to use the identity mapping.  But if
dma_map_sg() is used and the code is simplified a bit, the overhead
will be less in general and will be per sgl entry instead of per page.

> 
> > > +				if (offset_in_hvpg) {
> > > +					payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
> > > +					offset_in_hvpg = 0;
> > > +				}
> >
> > I'm not clear on why payload->range.offset needs to be set again.
> > Even after the dma mapping is done, doesn't the offset in the first
> > page have to be the same?  If it wasn't the same, Hyper-V wouldn't
> > be able to process the PFN list correctly.  In fact, couldn't the above
> > code just always set offset_in_hvpg = 0?
> 
> Careful.  DMA mapping is supposed to keep the offset in the page, but
> for that the DMA mapping code needs to know what the device considers a
> "page".  For that the driver needs to set the min_align_mask field in
> struct device_dma_parameters.
> 

I see that the swiotlb code gets and uses the min_align_mask field.  But
the NVME driver is the only driver that ever sets it, so the value is zero
in all other cases.  Does swiotlb just use PAGE_SIZE in that that case?  I
couldn't tell from a quick glance at the swiotlb code.

> >
> > The whole approach here is to do dma remapping on each individual page
> > of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> > each scatterlist entry as a unit?  Each scatterlist entry describes a range of
> > physically contiguous memory.  After dma_map_sg(), the resulting dma
> > address must also refer to a physically contiguous range in the swiotlb
> > bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> > entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> > hvpfn value based on the dma address instead of sg_page().  But everything
> > else is the same, and the inner loop for populating the pfn_arry is unmodified.
> > Furthermore, the dma_range array that you've added is not needed, since
> > scatterlist entries already have a dma_address field for saving the mapped
> > address, and dma_unmap_sg() uses that field.
> 
> Yes, I think dma_map_sg is the right thing to use here, probably even
> for the non-isolated case so that we can get the hv drivers out of their
> little corner and into being more like a normal kernel driver.  That
> is, use the scsi_dma_map/scsi_dma_unmap helpers, and then iterate over
> the dma addresses one page at a time using for_each_sg_dma_page.
> 

Doing some broader revisions to the Hyper-V storvsc driver is up next on
my to-do list.  Rather than significantly modifying the non-isolated case in
this patch set, I'd suggest factoring it into my broader revisions.

Michael

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20  4:32     ` hch
  2021-08-20 15:40       ` Michael Kelley
@ 2021-08-20 16:01       ` Tianyu Lan
  1 sibling, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20 16:01 UTC (permalink / raw)
  To: hch, Michael Kelley
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu,
	Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz,
	konrad.wilk, boris.ostrovsky, jgross, sstabellini, joro, will,
	davem, kuba, jejb, martin.petersen, arnd, m.szyprowski,
	robin.murphy, thomas.lendacky, brijesh.singh, ardb, Tianyu Lan,
	pgonda, martin.b.radev, akpm, kirill.shutemov, rppt, sfr,
	saravanand, krish.sadhukhan, aneesh.kumar, xen-devel, rientjes,
	hannes, tj, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen



On 8/20/2021 12:32 PM, hch@lst.de wrote:
> On Thu, Aug 19, 2021 at 06:17:40PM +0000, Michael Kelley wrote:
>>> +#define storvsc_dma_map(dev, page, offset, size, dir) \
>>> +	dma_map_page(dev, page, offset, size, dir)
>>> +
>>> +#define storvsc_dma_unmap(dev, dma_range, dir)		\
>>> +		dma_unmap_page(dev, dma_range.dma,	\
>>> +			       dma_range.mapping_size,	\
>>> +			       dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
>>> +
>>
>> Each of these macros is used only once.  IMHO, they don't
>> add a lot of value.  Just coding dma_map/unmap_page()
>> inline would be fine and eliminate these lines of code.
> 
> Yes, I had the same thought when looking over the code.  Especially
> as macros tend to further obsfucate the code (compared to actual helper
> functions).
> 
>>> +				for (i = 0; i < request->hvpg_count; i++)
>>> +					storvsc_dma_unmap(&device->device,
>>> +						request->dma_range[i],
>>> +						request->vstor_packet.vm_srb.data_in == READ_TYPE);
>>
>> I think you can directly get the DMA direction as request->cmd->sc_data_direction.
> 
> Yes.
> 
>>>
>>> @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>>>   		payload->range.len = length;
>>>   		payload->range.offset = offset_in_hvpg;
>>>
>>> +		cmd_request->dma_range = kcalloc(hvpg_count,
>>> +				 sizeof(*cmd_request->dma_range),
>>> +				 GFP_ATOMIC);
>>
>> With this patch, it appears that storvsc_queuecommand() is always
>> doing bounce buffering, even when running in a non-isolated VM.
>> The dma_range is always allocated, and the inner loop below does
>> the dma mapping for every I/O page.  The corresponding code in
>> storvsc_on_channel_callback() that does the dma unmap allows for
>> the dma_range to be NULL, but that never happens.
> 
> Maybe I'm missing something in the hyperv code, but I don't think
> dma_map_page would bounce buffer for the non-isolated case.  It
> will just return the physical address.

Yes, the swiotlb_force mode isn't enabled in non-isolated VM and so
dma_page_page() returns the physical address directly.

> 
>>> +		if (!cmd_request->dma_range) {
>>> +			ret = -ENOMEM;
>>
>> The other memory allocation failure in this function returns
>> SCSI_MLQUEUE_DEVICE_BUSY.   It may be debatable as to whether
>> that's the best approach, but that's a topic for a different patch.  I
>> would suggest being consistent and using the same return code
>> here.
> 
> Independent of if SCSI_MLQUEUE_DEVICE_BUSY is good (it it a common
> pattern in SCSI drivers), ->queuecommand can't return normal
> negative errnos.  It must return the SCSI_MLQUEUE_* codes or 0.
> We should probably change the return type of the method definition
> to a suitable enum to make this more clear.

Yes, will update. Thanks.

> 
>>> +				if (offset_in_hvpg) {
>>> +					payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
>>> +					offset_in_hvpg = 0;
>>> +				}
>>
>> I'm not clear on why payload->range.offset needs to be set again.
>> Even after the dma mapping is done, doesn't the offset in the first
>> page have to be the same?  If it wasn't the same, Hyper-V wouldn't
>> be able to process the PFN list correctly.  In fact, couldn't the above
>> code just always set offset_in_hvpg = 0?
> 
> Careful.  DMA mapping is supposed to keep the offset in the page, but
> for that the DMA mapping code needs to know what the device considers a
> "page".  For that the driver needs to set the min_align_mask field in
> struct device_dma_parameters.

The default allocate unit of swiotlb bounce is IO_TLB_SIZE(2k).
Otherwise, I find some scsi request cmd's length is less than 100byte.
Keep a small unit can avoid wasting bounce buffer and just need to
update the offset.


> 
>>
>> The whole approach here is to do dma remapping on each individual page
>> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
>> each scatterlist entry as a unit?  Each scatterlist entry describes a range of
>> physically contiguous memory.  After dma_map_sg(), the resulting dma
>> address must also refer to a physically contiguous range in the swiotlb
>> bounce buffer memory.   So at the top of the "for" loop over the scatterlist
>> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
>> hvpfn value based on the dma address instead of sg_page().  But everything
>> else is the same, and the inner loop for populating the pfn_arry is unmodified.
>> Furthermore, the dma_range array that you've added is not needed, since
>> scatterlist entries already have a dma_address field for saving the mapped
>> address, and dma_unmap_sg() uses that field.
> 
> Yes, I think dma_map_sg is the right thing to use here, probably even
> for the non-isolated case so that we can get the hv drivers out of their
> little corner and into being more like a normal kernel driver.  That
> is, use the scsi_dma_map/scsi_dma_unmap helpers, and then iterate over
> the dma addresses one page at a time using for_each_sg_dma_page.
> 

I wonder whether we may introduce a new API scsi_dma_map_with_callback. 
Caller provides a callback and run callback in sg loop of 
dma_direct_map_sg(). Caller need to update some data structure in the sg 
loop. Here is such case that driver needs to populate 
payload->range.pfn_array[]. This is why I don't use dma_map_sg() here.


>>
>> One thing:  There's a maximum swiotlb mapping size, which I think works
>> out to be 256 Kbytes.  See swiotlb_max_mapping_size().  We need to make
>> sure that we don't get a scatterlist entry bigger than this size.  But I think
>> this already happens because you set the device->dma_mask field in
>> Patch 11 of this series.  __scsi_init_queue checks for this setting and
>> sets max_sectors to limits transfers to the max mapping size.
> 
> Indeed.
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20 15:20     ` Tianyu Lan
  2021-08-20 15:37       ` Tianyu Lan
@ 2021-08-20 16:08       ` Michael Kelley
  2021-08-20 18:04         ` Tianyu Lan
  1 sibling, 1 reply; 64+ messages in thread
From: Michael Kelley @ 2021-08-20 16:08 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 20, 2021 8:20 AM
> 
> On 8/20/2021 2:17 AM, Michael Kelley wrote:
> > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, August 9, 2021 10:56 AM
> >
> > I'm not clear on why payload->range.offset needs to be set again.
> > Even after the dma mapping is done, doesn't the offset in the first
> > page have to be the same?  If it wasn't the same, Hyper-V wouldn't
> > be able to process the PFN list correctly.  In fact, couldn't the above
> > code just always set offset_in_hvpg = 0?
> 
> The offset will be changed. The swiotlb bounce buffer is allocated with
> IO_TLB_SIZE(2K) as unit. So the offset here may be changed.
> 

We need to prevent the offset from changing.  The storvsc driver passes
just a PFN list to Hyper-V, plus an overall starting offset and length.  Unlike
the netvsc driver, each entry in the PFN list does *not* have its own offset
and length.  Hyper-V assumes that the list is "dense" and that there are
no holes (i.e., unused memory areas).

For example, consider an original buffer passed into storvsc_queuecommand()
of 8 Kbytes, but aligned with 1 Kbytes at the end of the first page, then
4 Kbytes in the second page, and 3 Kbytes in the beginning of the third page.
The offset of that first 1 Kbytes has to remain as 3 Kbytes.  If bounce buffering
moves it to a different offset, there's no way to tell Hyper-V to ignore the
remaining bytes in the first page (at least not without using a different
method to communicate with Hyper-V).   In such a case, the wrong
data will get transferred.  Presumably the easier solution is to set the
min_align_mask field as Christop suggested.

> 
> >
> >>   	}
> >
> > The whole approach here is to do dma remapping on each individual page
> > of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> > each scatterlist entry as a unit?  Each scatterlist entry describes a range of
> > physically contiguous memory.  After dma_map_sg(), the resulting dma
> > address must also refer to a physically contiguous range in the swiotlb
> > bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> > entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> > hvpfn value based on the dma address instead of sg_page().  But everything
> > else is the same, and the inner loop for populating the pfn_arry is unmodified.
> > Furthermore, the dma_range array that you've added is not needed, since
> > scatterlist entries already have a dma_address field for saving the mapped
> > address, and dma_unmap_sg() uses that field.
> 
> I don't use dma_map_sg() here in order to avoid introducing one more
> loop(e,g dma_map_sg()). We already have a loop to populate
> cmd_request->dma_range[] and so do the dma map in the same loop.
> 

I'm not seeing where the additional loop comes from.  Storvsc
already has a loop through the sgl entries.  Retain that loop and call
dma_map_sg() with nents set to 1.  Then the sequence is
dma_map_sg() --> dma_map_sg_attrs() --> dma_direct_map_sg() ->
dma_direct_map_page().  The latter function will call swiotlb_map()
to map all pages of the sgl entry as a single operation.

Michael



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20 16:08       ` Michael Kelley
@ 2021-08-20 18:04         ` Tianyu Lan
  2021-08-20 19:22           ` Michael Kelley
  2021-08-24  8:46           ` hch
  0 siblings, 2 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20 18:04 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen



On 8/21/2021 12:08 AM, Michael Kelley wrote:
>>>>    	}
>>> The whole approach here is to do dma remapping on each individual page
>>> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
>>> each scatterlist entry as a unit?  Each scatterlist entry describes a range of
>>> physically contiguous memory.  After dma_map_sg(), the resulting dma
>>> address must also refer to a physically contiguous range in the swiotlb
>>> bounce buffer memory.   So at the top of the "for" loop over the scatterlist
>>> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
>>> hvpfn value based on the dma address instead of sg_page().  But everything
>>> else is the same, and the inner loop for populating the pfn_arry is unmodified.
>>> Furthermore, the dma_range array that you've added is not needed, since
>>> scatterlist entries already have a dma_address field for saving the mapped
>>> address, and dma_unmap_sg() uses that field.
>> I don't use dma_map_sg() here in order to avoid introducing one more
>> loop(e,g dma_map_sg()). We already have a loop to populate
>> cmd_request->dma_range[] and so do the dma map in the same loop.
>>
> I'm not seeing where the additional loop comes from.  Storvsc
> already has a loop through the sgl entries.  Retain that loop and call
> dma_map_sg() with nents set to 1.  Then the sequence is
> dma_map_sg() --> dma_map_sg_attrs() --> dma_direct_map_sg() ->
> dma_direct_map_page().  The latter function will call swiotlb_map()
> to map all pages of the sgl entry as a single operation.

After dma_map_sg(), we still need to go through scatter list again to 
populate payload->rrange.pfn_array. We may just go through the scatter 
list just once if dma_map_sg() accepts a callback and run it during go
through scatter list.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-08-19 18:14   ` Michael Kelley
  2021-08-20  4:21     ` hch
@ 2021-08-20 18:20     ` Tianyu Lan
  1 sibling, 0 replies; 64+ messages in thread
From: Tianyu Lan @ 2021-08-20 18:20 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

On 8/20/2021 2:14 AM, Michael Kelley wrote:
>> @@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
>>   		goto cleanup;
>>   	}
>>
>> +	if (hv_isolation_type_snp()) {
>> +		vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
>> +		if (!vaddr)
>> +			goto cleanup;
> I don't think this error case is handled correctly.  Doesn't the remapping
> of the recv buf need to be undone?
>

Yes, actually I thought to return error here and free_netvsc_device() 
will help to unmap recv_buffer finally. But I forget to set ret = 
-ENOMEM when add netvsc_remap_buf() helper.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20 18:04         ` Tianyu Lan
@ 2021-08-20 19:22           ` Michael Kelley
  2021-08-24  8:46           ` hch
  1 sibling, 0 replies; 64+ messages in thread
From: Michael Kelley @ 2021-08-20 19:22 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj
  Cc: iommu, linux-arch, linux-hyperv, linux-kernel, linux-scsi,
	netdev, vkuznets, parri.andrea, dave.hansen

From: Tianyu Lan <ltykernel@gmail.com> Sent: Friday, August 20, 2021 11:04 AM
> 
> On 8/21/2021 12:08 AM, Michael Kelley wrote:
> >>>>    	}
> >>> The whole approach here is to do dma remapping on each individual page
> >>> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> >>> each scatterlist entry as a unit?  Each scatterlist entry describes a range of
> >>> physically contiguous memory.  After dma_map_sg(), the resulting dma
> >>> address must also refer to a physically contiguous range in the swiotlb
> >>> bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> >>> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> >>> hvpfn value based on the dma address instead of sg_page().  But everything
> >>> else is the same, and the inner loop for populating the pfn_arry is unmodified.
> >>> Furthermore, the dma_range array that you've added is not needed, since
> >>> scatterlist entries already have a dma_address field for saving the mapped
> >>> address, and dma_unmap_sg() uses that field.
> >> I don't use dma_map_sg() here in order to avoid introducing one more
> >> loop(e,g dma_map_sg()). We already have a loop to populate
> >> cmd_request->dma_range[] and so do the dma map in the same loop.
> >>
> > I'm not seeing where the additional loop comes from.  Storvsc
> > already has a loop through the sgl entries.  Retain that loop and call
> > dma_map_sg() with nents set to 1.  Then the sequence is
> > dma_map_sg() --> dma_map_sg_attrs() --> dma_direct_map_sg() ->
> > dma_direct_map_page().  The latter function will call swiotlb_map()
> > to map all pages of the sgl entry as a single operation.
> 
> After dma_map_sg(), we still need to go through scatter list again to
> populate payload->rrange.pfn_array. We may just go through the scatter
> list just once if dma_map_sg() accepts a callback and run it during go
> through scatter list.

Here's some code for what I'm suggesting (not even compile tested).
The only change is what's in the "if" clause of the SNP test.  dma_map_sg()
is called with the nents parameter set to one so that it only
processes one sgl entry each time it is called, and doesn't walk the
entire sgl.  Arguably, we don't even need the SNP test and the else
clause -- just always do what's in the if clause.

The corresponding code in storvsc_on_channel_callback would also
have to be changed.   And we still have to set the min_align_mask
so swiotlb will preserve any offset.  Storsvsc already has things set up
so that higher levels ensure there are no holes between sgl entries,
and that needs to stay true.

	if (sg_count) {
		unsigned int hvpgoff, hvpfns_to_add;
		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
		u64 hvpfn;
		int nents;

		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {

			payload_sz = (hvpg_count * sizeof(u64) +
				      sizeof(struct vmbus_packet_mpb_array));
			payload = kzalloc(payload_sz, GFP_ATOMIC);
			if (!payload)
				return SCSI_MLQUEUE_DEVICE_BUSY;
		}

		payload->range.len = length;
		payload->range.offset = offset_in_hvpg;


		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
			/*
			 * Init values for the current sgl entry. hvpgoff
			 * and hvpfns_to_add are in units of Hyper-V size
			 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
			 * case also handles values of sgl->offset that are
			 * larger than PAGE_SIZE. Such offsets are handled
			 * even on other than the first sgl entry, provided
			 * they are a multiple of PAGE_SIZE.
			 */
			hvpgoff = HVPFN_DOWN(sgl->offset);

			if (hv_isolation_type_snp()) {
				nents = dma_map_sg(dev->device, sgl, 1, scmnd->sc_data_direction);
				if (nents != 1)
					<return error code so higher levels will retry>
				hvpfn = HVPFN_DOWN(sg_dma_address(sgl)) + hvpgoff;
			} else {
				hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
			}

			hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
						hvpgoff;

			/*
			 * Fill the next portion of the PFN array with
			 * sequential Hyper-V PFNs for the contiguous physical
			 * memory described by the sgl entry. The end of the
			 * last sgl should be reached at the same time that
			 * the PFN array is filled.
			 */
			while (hvpfns_to_add--)
				payload->range.pfn_array[i++] = hvpfn++;
		}
	}

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
  2021-08-09 17:56 ` [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page Tianyu Lan
  2021-08-13 19:31   ` Michael Kelley
@ 2021-08-24  8:45   ` Christoph Hellwig
  1 sibling, 0 replies; 64+ messages in thread
From: Christoph Hellwig @ 2021-08-24  8:45 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, tglx, mingo, bp, x86,
	hpa, dave.hansen, luto, peterz, konrad.wilk, boris.ostrovsky,
	jgross, sstabellini, joro, will, davem, kuba, jejb,
	martin.petersen, arnd, hch, m.szyprowski, robin.murphy,
	thomas.lendacky, brijesh.singh, ardb, Tianyu.Lan, pgonda,
	martin.b.radev, akpm, kirill.shutemov, rppt, sfr, saravanand,
	krish.sadhukhan, aneesh.kumar, xen-devel, rientjes, hannes, tj,
	michael.h.kelley, iommu, linux-arch, linux-hyperv, linux-kernel,
	linux-scsi, netdev, vkuznets, parri.andrea, dave.hansen

This patch fails to compile when CONFIG_AMD_MEM_ENCRYPT is not enabled,
in which case there is sev_es_ghcb_hv_call_simple is not defined.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20 18:04         ` Tianyu Lan
  2021-08-20 19:22           ` Michael Kelley
@ 2021-08-24  8:46           ` hch
  1 sibling, 0 replies; 64+ messages in thread
From: hch @ 2021-08-24  8:46 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd, hch,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen

On Sat, Aug 21, 2021 at 02:04:11AM +0800, Tianyu Lan wrote:
> After dma_map_sg(), we still need to go through scatter list again to 
> populate payload->rrange.pfn_array. We may just go through the scatter list 
> just once if dma_map_sg() accepts a callback and run it during go
> through scatter list.

Iterating a cache hot array is way faster than doing lots of indirect
calls.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* min_align_mask Re: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
  2021-08-20 15:40       ` Michael Kelley
@ 2021-08-24  8:49         ` hch
  0 siblings, 0 replies; 64+ messages in thread
From: hch @ 2021-08-24  8:49 UTC (permalink / raw)
  To: Michael Kelley
  Cc: hch, Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, dave.hansen,
	luto, peterz, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
	joro, will, davem, kuba, jejb, martin.petersen, arnd,
	m.szyprowski, robin.murphy, thomas.lendacky, brijesh.singh, ardb,
	Tianyu Lan, pgonda, martin.b.radev, akpm, kirill.shutemov, rppt,
	sfr, saravanand, krish.sadhukhan, aneesh.kumar, xen-devel,
	rientjes, hannes, tj, iommu, linux-arch, linux-hyperv,
	linux-kernel, linux-scsi, netdev, vkuznets, parri.andrea,
	dave.hansen, linux-rdma, mpi3mr-linuxdrv.pdl

On Fri, Aug 20, 2021 at 03:40:08PM +0000, Michael Kelley wrote:
> I see that the swiotlb code gets and uses the min_align_mask field.  But
> the NVME driver is the only driver that ever sets it, so the value is zero
> in all other cases.  Does swiotlb just use PAGE_SIZE in that that case?  I
> couldn't tell from a quick glance at the swiotlb code.

The encoding isn't all that common.  I only know it for the RDMA memory
registration format, and RDMA in general does not interact very well
with swiotlb (although the in-kernel drivers should work fine, it is
userspace RDMA that is the problem).  It seems recently a new driver
using the format (mpi3mr) also showed up.  All these should probably set
the min_align_mask.

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2021-08-24  8:49 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09 17:56 [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM Tianyu Lan
2021-08-10 10:56   ` Wei Liu
2021-08-10 12:17     ` Tianyu Lan
2021-08-12 19:14   ` Michael Kelley
2021-08-13 15:46     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the " Tianyu Lan
2021-08-12 19:18   ` Michael Kelley
2021-08-14 13:32     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support Tianyu Lan
2021-08-09 22:12   ` Dave Hansen
2021-08-10 13:09     ` Tianyu Lan
2021-08-10 11:03   ` Wei Liu
2021-08-10 12:25     ` Tianyu Lan
2021-08-12 19:36   ` Michael Kelley
2021-08-12 21:10   ` Michael Kelley
2021-08-09 17:56 ` [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
2021-08-12 22:20   ` Michael Kelley
2021-08-15 15:21     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page Tianyu Lan
2021-08-13 19:31   ` Michael Kelley
2021-08-13 20:26     ` Michael Kelley
2021-08-24  8:45   ` Christoph Hellwig
2021-08-09 17:56 ` [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
2021-08-13 20:42   ` Michael Kelley
2021-08-09 17:56 ` [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
2021-08-13 21:28   ` Michael Kelley
2021-08-09 17:56 ` [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
2021-08-16 17:28   ` Michael Kelley
2021-08-17 15:36     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function Tianyu Lan
2021-08-12 12:26   ` Christoph Hellwig
2021-08-12 15:38     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
2021-08-12 12:27   ` Christoph Hellwig
2021-08-13 17:58     ` Tianyu Lan
2021-08-16 14:50       ` Tianyu Lan
2021-08-19  8:49         ` Christoph Hellwig
2021-08-19  9:59           ` Tianyu Lan
2021-08-19 10:02             ` Christoph Hellwig
2021-08-19 10:03               ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
2021-08-19 18:11   ` Michael Kelley
2021-08-20  4:13     ` hch
2021-08-20  9:32     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
2021-08-19 18:14   ` Michael Kelley
2021-08-20  4:21     ` hch
2021-08-20 13:11       ` Tianyu Lan
2021-08-20 13:30       ` Tom Lendacky
2021-08-20 18:20     ` Tianyu Lan
2021-08-09 17:56 ` [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
2021-08-19 18:17   ` Michael Kelley
2021-08-20  4:32     ` hch
2021-08-20 15:40       ` Michael Kelley
2021-08-24  8:49         ` min_align_mask " hch
2021-08-20 16:01       ` Tianyu Lan
2021-08-20 15:20     ` Tianyu Lan
2021-08-20 15:37       ` Tianyu Lan
2021-08-20 16:08       ` Michael Kelley
2021-08-20 18:04         ` Tianyu Lan
2021-08-20 19:22           ` Michael Kelley
2021-08-24  8:46           ` hch
2021-08-16 14:55 ` [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support Michael Kelley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).