* [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support
@ 2021-02-28 15:03 Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl() Tianyu Lan
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp, x86, hpa,
	davem, kuba, gregkh, arnd, akpm, jejb, martin.petersen
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, linux-mm,
	linux-scsi, netdev, vkuznets, thomas.lendacky, brijesh.singh,
	sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VM: VBS (Virtualization-Based
Security) VMs and AMD SEV-SNP unenlightened Isolation VMs. This
patchset adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce buffers
are required.

The VMBus channel ring buffer already plays the bounce-buffer role,
because all data to/from the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible
to the host.

There are two exceptions - packets sent by
vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc(). These
packets contain IO stack memory addresses that the host accesses
directly. So add bounce buffer allocation support in VMBus for these
packets.

For an SNP Isolation VM, the guest needs to access shared memory via
an extra address space, which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access
the shared memory is the bounce buffer memory's GPA plus the
shared_gpa_boundary reported by that CPUID leaf.

Tianyu Lan (12):
  x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()
  x86/Hyper-V: Add new hvcall guest address host visibility support
  x86/HV: Initialize GHCB page and shared memory boundary
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  hv/vmbus: Initialize VMbus ring buffer for Isolation VM
  x86/Hyper-V: Initialize bounce buffer page cache and list
  x86/Hyper-V: Add new parameter for
    vmbus_sendpacket_pagebuffer()/mpb_desc()
  HV: Add bounce buffer support for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add bounce buffer support for Storvsc

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  70 +++-
 arch/x86/hyperv/ivm.c              | 257 ++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  22 +
 arch/x86/include/asm/mshyperv.h    |  26 +-
 arch/x86/kernel/cpu/mshyperv.c     |   2 +
 drivers/hv/Makefile                |   2 +-
 drivers/hv/channel.c               | 103 ++++-
 drivers/hv/channel_mgmt.c          |  30 +-
 drivers/hv/connection.c            |  68 +++-
 drivers/hv/hv.c                    | 196 ++++++---
 drivers/hv/hv_bounce.c             | 619 +++++++++++++++++++++++++++++
 drivers/hv/hyperv_vmbus.h          |  42 ++
 drivers/hv/ring_buffer.c           |  83 +++-
 drivers/net/hyperv/hyperv_net.h    |   5 +
 drivers/net/hyperv/netvsc.c        | 111 +++++-
 drivers/scsi/storvsc_drv.c         |  46 ++-
 drivers/uio/uio_hv_generic.c       |  13 +-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  24 +-
 include/linux/hyperv.h             |  46 ++-
 mm/ioremap.c                       |   1 +
 mm/vmalloc.c                       |   1 +
 23 files changed, 1614 insertions(+), 156 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c
 create mode 100644 drivers/hv/hv_bounce.c

-- 
2.25.1



* [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-03-03 16:27   ` Vitaly Kuznetsov
  2021-02-28 15:03 ` [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support Tianyu Lan
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp, x86, hpa,
	davem, kuba, gregkh
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add a visibility parameter to vmbus_establish_gpadl() in preparation
for changing host visibility when creating a GPADL for a buffer.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h |  9 +++++++++
 drivers/hv/channel.c               | 20 +++++++++++---------
 drivers/net/hyperv/netvsc.c        |  8 ++++++--
 drivers/uio/uio_hv_generic.c       |  7 +++++--
 include/linux/hyperv.h             |  3 ++-
 5 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index e6cd3fee562b..fb1893a4c32b 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -236,6 +236,15 @@ enum hv_isolation_type {
 /* TSC invariant control */
 #define HV_X64_MSR_TSC_INVARIANT_CONTROL	0x40000118
 
+/* Hyper-V GPA map flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE		0x0
+#define HV_MAP_GPA_READABLE			0x1
+#define HV_MAP_GPA_WRITABLE			0x2
+
+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
+#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 0bd202de7960..daa21cc72beb 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -242,7 +242,7 @@ EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
  */
 static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
 			       u32 size, u32 send_offset,
-			       struct vmbus_channel_msginfo **msginfo)
+			       struct vmbus_channel_msginfo **msginfo, u32 visibility)
 {
 	int i;
 	int pagecount;
@@ -391,7 +391,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
 static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 				   enum hv_gpadl_type type, void *kbuffer,
 				   u32 size, u32 send_offset,
-				   u32 *gpadl_handle)
+				   u32 *gpadl_handle, u32 visibility)
 {
 	struct vmbus_channel_gpadl_header *gpadlmsg;
 	struct vmbus_channel_gpadl_body *gpadl_body;
@@ -405,7 +405,8 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	next_gpadl_handle =
 		(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
 
-	ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);
+	ret = create_gpadl_header(type, kbuffer, size, send_offset,
+				  &msginfo, visibility);
 	if (ret)
 		return ret;
 
@@ -496,10 +497,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
  * @gpadl_handle: some funky thing
  */
 int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
-			  u32 size, u32 *gpadl_handle)
+			  u32 size, u32 *gpadl_handle, u32 visibility)
 {
 	return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
-				       0U, gpadl_handle);
+				       0U, gpadl_handle, visibility);
 }
 EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
 
@@ -610,10 +611,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	newchannel->ringbuffer_gpadlhandle = 0;
 
 	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
-				      page_address(newchannel->ringbuffer_page),
-				      (send_pages + recv_pages) << PAGE_SHIFT,
-				      newchannel->ringbuffer_send_offset << PAGE_SHIFT,
-				      &newchannel->ringbuffer_gpadlhandle);
+			page_address(newchannel->ringbuffer_page),
+			(send_pages + recv_pages) << PAGE_SHIFT,
+			newchannel->ringbuffer_send_offset << PAGE_SHIFT,
+			&newchannel->ringbuffer_gpadlhandle,
+			VMBUS_PAGE_VISIBLE_READ_WRITE);
 	if (err)
 		goto error_clean_ring;
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2353623259f3..bb72c7578330 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -333,7 +333,8 @@ static int netvsc_init_buf(struct hv_device *device,
 	 */
 	ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
 				    buf_size,
-				    &net_device->recv_buf_gpadl_handle);
+				    &net_device->recv_buf_gpadl_handle,
+				    VMBUS_PAGE_VISIBLE_READ_WRITE);
 	if (ret != 0) {
 		netdev_err(ndev,
 			"unable to establish receive buffer's gpadl\n");
@@ -422,10 +423,13 @@ static int netvsc_init_buf(struct hv_device *device,
 	/* Establish the gpadl handle for this buffer on this
 	 * channel.  Note: This call uses the vmbus connection rather
 	 * than the channel to establish the gpadl handle.
+	 * Send buffer should theoretically be only marked as "read-only", but
+	 * the netvsp for some reason needs write capabilities on it.
 	 */
 	ret = vmbus_establish_gpadl(device->channel, net_device->send_buf,
 				    buf_size,
-				    &net_device->send_buf_gpadl_handle);
+				    &net_device->send_buf_gpadl_handle,
+				    VMBUS_PAGE_VISIBLE_READ_WRITE);
 	if (ret != 0) {
 		netdev_err(ndev,
 			   "unable to establish send buffer's gpadl\n");
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0330ba99730e..813a7bee5139 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -29,6 +29,7 @@
 #include <linux/hyperv.h>
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
+#include <asm/mshyperv.h>
 
 #include "../hv/hyperv_vmbus.h"
 
@@ -295,7 +296,8 @@ hv_uio_probe(struct hv_device *dev,
 	}
 
 	ret = vmbus_establish_gpadl(channel, pdata->recv_buf,
-				    RECV_BUFFER_SIZE, &pdata->recv_gpadl);
+				    RECV_BUFFER_SIZE, &pdata->recv_gpadl,
+				    VMBUS_PAGE_VISIBLE_READ_WRITE);
 	if (ret)
 		goto fail_close;
 
@@ -315,7 +317,8 @@ hv_uio_probe(struct hv_device *dev,
 	}
 
 	ret = vmbus_establish_gpadl(channel, pdata->send_buf,
-				    SEND_BUFFER_SIZE, &pdata->send_gpadl);
+				    SEND_BUFFER_SIZE, &pdata->send_gpadl,
+				    VMBUS_PAGE_VISIBLE_READ_ONLY);
 	if (ret)
 		goto fail_close;
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index f1d74dcf0353..016fdca20d6e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1179,7 +1179,8 @@ extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
 				      void *kbuffer,
 				      u32 size,
-				      u32 *gpadl_handle);
+				      u32 *gpadl_handle,
+				      u32 visibility);
 
 extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
 				     u32 gpadl_handle);
-- 
2.25.1



* [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl() Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-03-03 16:58   ` Vitaly Kuznetsov
  2021-02-28 15:03 ` [RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary Tianyu Lan
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp, x86, hpa,
	davem, kuba, gregkh, arnd
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, linux-arch,
	vkuznets, thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add support for the new hvcall that sets guest address visibility to
the host. Mark the VMBus ring buffer visible to the host when the
GPADL buffer is created, and mark it not visible again when the GPADL
buffer is torn down.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 13 ++++++++
 arch/x86/include/asm/mshyperv.h    |  4 +--
 arch/x86/kernel/cpu/mshyperv.c     | 46 ++++++++++++++++++++++++++
 drivers/hv/channel.c               | 53 ++++++++++++++++++++++++++++--
 drivers/net/hyperv/hyperv_net.h    |  1 +
 drivers/net/hyperv/netvsc.c        |  9 +++--
 drivers/uio/uio_hv_generic.c       |  6 ++--
 include/asm-generic/hyperv-tlfs.h  |  1 +
 include/linux/hyperv.h             |  3 +-
 9 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index fb1893a4c32b..d22b1c3f425a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -573,4 +573,17 @@ enum hv_interrupt_type {
 
 #include <asm-generic/hyperv-tlfs.h>
 
+/* All input parameters should be in single page. */
+#define HV_MAX_MODIFY_GPA_REP_COUNT		\
+	((PAGE_SIZE - 2 * sizeof(u64)) / (sizeof(u64)))
+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+struct hv_input_modify_sparse_gpa_page_host_visibility {
+	u64 partition_id;
+	u32 host_visibility:2;
+	u32 reserved0:30;
+	u32 reserved1;
+	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
 #endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccf60a809a17..1e8275d35c1f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -262,13 +262,13 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
 	msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
 	msi_entry->data.as_uint32 = msi_desc->msg.data;
 }
-
 struct irq_domain *hv_create_pci_msi_domain(void);
 
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 		struct hv_interrupt_entry *entry);
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e88bc296afca..347c32eac8fd 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,6 +37,8 @@
 bool hv_root_partition;
 EXPORT_SYMBOL_GPL(hv_root_partition);
 
+#define HV_PARTITION_ID_SELF ((u64)-1)
+
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
@@ -477,3 +479,47 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
 	.init.init_platform	= ms_hyperv_init_platform,
 };
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+	u16 pages_processed;
+	u64 hv_status;
+	unsigned long flags;
+
+	/* no-op if partition isolation is not enabled */
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+			HV_MAX_MODIFY_GPA_REP_COUNT);
+		return -EINVAL;
+	}
+
+	local_irq_save(flags);
+	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+			this_cpu_ptr(hyperv_pcpu_input_arg);
+	input = *input_pcpu;
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
+		return -1;
+	}
+
+	input->partition_id = HV_PARTITION_ID_SELF;
+	input->host_visibility = visibility;
+	input->reserved0 = 0;
+	input->reserved1 = 0;
+	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+	hv_status = hv_do_rep_hypercall(
+			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+			0, input, &pages_processed);
+	local_irq_restore(flags);
+
+	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+		return 0;
+
+	return -EFAULT;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index daa21cc72beb..204e6f3598a5 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -237,6 +237,38 @@ int vmbus_send_modifychannel(u32 child_relid, u32 target_vp)
 }
 EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
 
+/*
+ * hv_set_mem_host_visibility - Set host visibility for specified memory.
+ */
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
+{
+	int i, pfn;
+	int pagecount = size >> HV_HYP_PAGE_SHIFT;
+	u64 *pfn_array;
+	int ret = 0;
+
+	if (!hv_isolation_type_snp())
+		return 0;
+
+	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
+	if (!pfn_array)
+		return -ENOMEM;
+
+	for (i = 0, pfn = 0; i < pagecount; i++) {
+		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+		pfn++;
+
+		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+			ret |= hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+			pfn = 0;
+		}
+	}
+
+	vfree(pfn_array);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(hv_set_mem_host_visibility);
+
 /*
  * create_gpadl_header - Creates a gpadl for the specified buffer
  */
@@ -410,6 +442,12 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 	if (ret)
 		return ret;
 
+	ret = hv_set_mem_host_visibility(kbuffer, size, visibility);
+	if (ret) {
+		pr_warn("Failed to set host visibility.\n");
+		return ret;
+	}
+
 	init_completion(&msginfo->waitevent);
 	msginfo->waiting_channel = channel;
 
@@ -693,7 +731,9 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 error_free_info:
 	kfree(open_info);
 error_free_gpadl:
-	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
+	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle,
+			     page_address(newchannel->ringbuffer_page),
+			     newchannel->ringbuffer_pagecount << PAGE_SHIFT);
 	newchannel->ringbuffer_gpadlhandle = 0;
 error_clean_ring:
 	hv_ringbuffer_cleanup(&newchannel->outbound);
@@ -740,7 +780,8 @@ EXPORT_SYMBOL_GPL(vmbus_open);
 /*
  * vmbus_teardown_gpadl -Teardown the specified GPADL handle
  */
-int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
+int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle,
+			 void *kbuffer, u32 size)
 {
 	struct vmbus_channel_gpadl_teardown *msg;
 	struct vmbus_channel_msginfo *info;
@@ -793,6 +834,10 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
 	kfree(info);
+
+	if (hv_set_mem_host_visibility(kbuffer, size, VMBUS_PAGE_NOT_VISIBLE))
+		pr_warn("Failed to set mem host visibility.\n");
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
@@ -869,7 +914,9 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 	/* Tear down the gpadl for the channel's ring buffer */
 	else if (channel->ringbuffer_gpadlhandle) {
 		ret = vmbus_teardown_gpadl(channel,
-					   channel->ringbuffer_gpadlhandle);
+					   channel->ringbuffer_gpadlhandle,
+					   page_address(channel->ringbuffer_page),
+					   channel->ringbuffer_pagecount << PAGE_SHIFT);
 		if (ret) {
 			pr_err("Close failed: teardown gpadl return %d\n", ret);
 			/*
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 2a87cfa27ac0..b3a43c4ec8ab 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1034,6 +1034,7 @@ struct netvsc_device {
 
 	/* Send buffer allocated by us */
 	void *send_buf;
+	u32 send_buf_size;
 	u32 send_buf_gpadl_handle;
 	u32 send_section_cnt;
 	u32 send_section_size;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index bb72c7578330..08d73401bb28 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -245,7 +245,9 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
 
 	if (net_device->recv_buf_gpadl_handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
-					   net_device->recv_buf_gpadl_handle);
+					   net_device->recv_buf_gpadl_handle,
+					   net_device->recv_buf,
+					   net_device->recv_buf_size);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -267,7 +269,9 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
 
 	if (net_device->send_buf_gpadl_handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
-					   net_device->send_buf_gpadl_handle);
+					   net_device->send_buf_gpadl_handle,
+					   net_device->send_buf,
+					   net_device->send_buf_size);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -419,6 +423,7 @@ static int netvsc_init_buf(struct hv_device *device,
 		ret = -ENOMEM;
 		goto cleanup;
 	}
+	net_device->send_buf_size = buf_size;
 
 	/* Establish the gpadl handle for this buffer on this
 	 * channel.  Note: This call uses the vmbus connection rather
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 813a7bee5139..c8d4704fc90c 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -181,13 +181,15 @@ static void
 hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
 {
 	if (pdata->send_gpadl) {
-		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl);
+		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl,
+				     pdata->send_buf, SEND_BUFFER_SIZE);
 		pdata->send_gpadl = 0;
 		vfree(pdata->send_buf);
 	}
 
 	if (pdata->recv_gpadl) {
-		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl);
+		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl,
+				     pdata->recv_buf, RECV_BUFFER_SIZE);
 		pdata->recv_gpadl = 0;
 		vfree(pdata->recv_buf);
 	}
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 83448e837ded..ad19f4199f90 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
 #define HVCALL_RETARGET_INTERRUPT		0x007e
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
+#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
 
 #define HV_FLUSH_ALL_PROCESSORS			BIT(0)
 #define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	BIT(1)
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 016fdca20d6e..41cbaa2db567 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1183,7 +1183,8 @@ extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
 				      u32 visibility);
 
 extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
-				     u32 gpadl_handle);
+				u32 gpadl_handle,
+				void *kbuffer, u32 size);
 
 void vmbus_reset_channel_cb(struct vmbus_channel *channel);
 
-- 
2.25.1



* [RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl() Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-03-03 17:05   ` Vitaly Kuznetsov
  2021-02-28 15:03 ` [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb Tianyu Lan
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp, x86, hpa, arnd
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for SNP guests
to communicate with the hypervisor. Map the GHCB page for all CPUs
so MSR registers can be read/written and hvcall requests submitted
via the GHCB. Hyper-V also exposes the shared memory boundary via
the HYPERV_CPUID_ISOLATION_CONFIG CPUID leaf; store it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares for
sharing memory with the host in SNP guests.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/hv_init.c      | 52 +++++++++++++++++++++++++++++++---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 14 ++++++++-
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0db5137d5b81..90e65fbf4c58 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -82,6 +82,9 @@ static int hv_cpu_init(unsigned int cpu)
 	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
 	void **input_arg;
 	struct page *pg;
+	u64 ghcb_gpa;
+	void *ghcb_va;
+	void **ghcb_base;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
 	pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
@@ -128,6 +131,17 @@ static int hv_cpu_init(unsigned int cpu)
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
 	}
 
+	if (ms_hyperv.ghcb_base) {
+		rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+
+		ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+		if (!ghcb_va)
+			return -ENOMEM;
+
+		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+		*ghcb_base = ghcb_va;
+	}
+
 	return 0;
 }
 
@@ -223,6 +237,7 @@ static int hv_cpu_die(unsigned int cpu)
 	unsigned long flags;
 	void **input_arg;
 	void *pg;
+	void **ghcb_va = NULL;
 
 	local_irq_save(flags);
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -236,6 +251,13 @@ static int hv_cpu_die(unsigned int cpu)
 		*output_arg = NULL;
 	}
 
+	if (ms_hyperv.ghcb_base) {
+		ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+		if (*ghcb_va)
+			iounmap(*ghcb_va);
+		*ghcb_va = NULL;
+	}
+
 	local_irq_restore(flags);
 
 	free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
@@ -372,6 +394,9 @@ void __init hyperv_init(void)
 	u64 guest_id, required_msrs;
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 	int cpuhp, i;
+	u64 ghcb_gpa;
+	void *ghcb_va;
+	void **ghcb_base;
 
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return;
@@ -432,9 +457,24 @@ void __init hyperv_init(void)
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 			__builtin_return_address(0));
-	if (hv_hypercall_pg == NULL) {
-		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-		goto remove_cpuhp_state;
+	if (hv_hypercall_pg == NULL)
+		goto clean_guest_os_id;
+
+	if (hv_isolation_type_snp()) {
+		ms_hyperv.ghcb_base = alloc_percpu(void *);
+		if (!ms_hyperv.ghcb_base)
+			goto clean_guest_os_id;
+
+		rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+		ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+		if (!ghcb_va) {
+			free_percpu(ms_hyperv.ghcb_base);
+			ms_hyperv.ghcb_base = NULL;
+			goto clean_guest_os_id;
+		}
+
+		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+		*ghcb_base = ghcb_va;
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -499,7 +539,8 @@ void __init hyperv_init(void)
 
 	return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 	cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
 	kfree(hv_vp_assist_page);
@@ -528,6 +569,9 @@ void hyperv_cleanup(void)
 	 */
 	hv_hypercall_pg = NULL;
 
+	if (ms_hyperv.ghcb_base)
+		free_percpu(ms_hyperv.ghcb_base);
+
 	/* Reset the hypercall page */
 	hypercall_msr.as_uint64 = 0;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 347c32eac8fd..d6c363456cbf 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -330,6 +330,8 @@ static void __init ms_hyperv_init_platform(void)
 	if (ms_hyperv.features_b & HV_ISOLATION) {
 		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
 		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+		ms_hyperv.shared_gpa_boundary =
+			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index dff58a3db5d5..ad0e33776668 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -34,7 +34,19 @@ struct ms_hyperv_info {
 	u32 max_vp_index;
 	u32 max_lp_index;
 	u32 isolation_config_a;
-	u32 isolation_config_b;
+	union
+	{
+		u32 isolation_config_b;
+		struct {
+			u32 cvm_type : 4;
+			u32 Reserved11 : 1;
+			u32 shared_gpa_boundary_active : 1;
+			u32 shared_gpa_boundary_bits : 6;
+			u32 Reserved12 : 20;
+		};
+	};
+	void  __percpu **ghcb_base;
+	u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



* [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (2 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-03-03 17:16   ` Vitaly Kuznetsov
  2021-02-28 15:03 ` [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp, x86, hpa, arnd
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers, because these registers are emulated by
the hypervisor rather than the paravisor.

Hyper-V requires SINTx MSR registers to be written twice (once via
the GHCB and once via the wrmsr instruction, with the proxy bit 21
set). The Guest OS ID MSR also needs to be set via the GHCB.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/Makefile        |   2 +-
 arch/x86/hyperv/hv_init.c       |  18 +--
 arch/x86/hyperv/ivm.c           | 178 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h |  21 +++-
 arch/x86/kernel/cpu/mshyperv.c  |  46 --------
 drivers/hv/channel.c            |   2 +-
 drivers/hv/hv.c                 | 188 ++++++++++++++++++++++----------
 include/asm-generic/mshyperv.h  |  10 +-
 8 files changed, 343 insertions(+), 122 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
+obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 90e65fbf4c58..87b1dd9c84d6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -475,6 +475,9 @@ void __init hyperv_init(void)
 
 		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
 		*ghcb_base = ghcb_va;
+
+		/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+		hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 	}
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -561,6 +564,7 @@ void hyperv_cleanup(void)
 
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
 	/*
 	 * Reset hypercall page reference before reset the page,
@@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
 	return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
 }
 EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-	if (!(ms_hyperv.features_b & HV_ISOLATION))
-		return HV_ISOLATION_TYPE_NONE;
-	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index 000000000000..4332bf7aaf9b
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan <Tianyu.Lan@microsoft.com>
+ */
+#include <linux/types.h>
+#include <linux/bitfield.h>
+#include <asm/io.h>
+#include <asm/svm.h>
+#include <asm/sev-es.h>
+#include <asm/mshyperv.h>
+
+union hv_ghcb {
+	struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!ms_hyperv.ghcb_base)
+		return;
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+	hv_ghcb->ghcb.protocol_version = 1;
+	hv_ghcb->ghcb.ghcb_usage = 0;
+
+	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+	ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
+	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+	VMGEXIT();
+
+	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+		pr_warn("Failed to write MSR via GHCB.\n");
+
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!ms_hyperv.ghcb_base)
+		return;
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	memset(hv_ghcb, 0x00, PAGE_SIZE);
+	hv_ghcb->ghcb.protocol_version = 1;
+	hv_ghcb->ghcb.ghcb_usage = 0;
+
+	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
+	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+	VMGEXIT();
+
+	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+		pr_warn("Failed to read MSR via GHCB.\n");
+	else
+		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+	hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
+{
+	hv_ghcb_msr_write(msr, value);
+
+	/* Write proxy bit via wrmsrl instruction. */
+	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
+		wrmsrl(msr, value | 1 << 20);
+}
+EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
+
+inline void hv_signal_eom_ghcb(void)
+{
+	hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
+}
+EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
+
+enum hv_isolation_type hv_get_isolation_type(void)
+{
+	if (!(ms_hyperv.features_b & HV_ISOLATION))
+		return HV_ISOLATION_TYPE_NONE;
+	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
+}
+EXPORT_SYMBOL_GPL(hv_get_isolation_type);
+
+bool hv_is_isolation_supported(void)
+{
+	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
+}
+EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
+
+bool hv_isolation_type_snp(void)
+{
+	return hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+	u16 pages_processed;
+	u64 hv_status;
+	unsigned long flags;
+
+	/* no-op if partition isolation is not enabled */
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+			HV_MAX_MODIFY_GPA_REP_COUNT);
+		return -EINVAL;
+	}
+
+	local_irq_save(flags);
+	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+			this_cpu_ptr(hyperv_pcpu_input_arg);
+	input = *input_pcpu;
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
+		return -1;
+	}
+
+	input->partition_id = HV_PARTITION_ID_SELF;
+	input->host_visibility = visibility;
+	input->reserved0 = 0;
+	input->reserved1 = 0;
+	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+	hv_status = hv_do_rep_hypercall(
+			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+			0, input, &pages_processed);
+	local_irq_restore(flags);
+
+	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+		return 0;
+
+	return -EFAULT;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 1e8275d35c1f..f624d72b99d3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -269,6 +269,25 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
+void hv_signal_eom_ghcb(void);
+void hv_ghcb_msr_write(u64 msr, u64 value);
+void hv_ghcb_msr_read(u64 msr, u64 *value);
+
+#define hv_get_synint_state_ghcb(int_num, val)			\
+	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+#define hv_set_synint_state_ghcb(int_num, val) \
+	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+
+#define hv_get_simp_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
+#define hv_set_simp_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
+
+#define hv_get_siefp_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+#define hv_set_siefp_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+
+#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
+#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
@@ -287,9 +306,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 {
 	return -1;
 }
+static inline void hv_signal_eom_ghcb(void) { };
 #endif /* CONFIG_HYPERV */
 
-
 #include <asm-generic/mshyperv.h>
 
 #endif
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index d6c363456cbf..aeafd4017c89 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,8 +37,6 @@
 bool hv_root_partition;
 EXPORT_SYMBOL_GPL(hv_root_partition);
 
-#define HV_PARTITION_ID_SELF ((u64)-1)
-
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
@@ -481,47 +479,3 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
 	.init.init_platform	= ms_hyperv_init_platform,
 };
-
-int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
-{
-	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
-	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
-	u16 pages_processed;
-	u64 hv_status;
-	unsigned long flags;
-
-	/* no-op if partition isolation is not enabled */
-	if (!hv_is_isolation_supported())
-		return 0;
-
-	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
-		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
-			HV_MAX_MODIFY_GPA_REP_COUNT);
-		return -EINVAL;
-	}
-
-	local_irq_save(flags);
-	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
-			this_cpu_ptr(hyperv_pcpu_input_arg);
-	input = *input_pcpu;
-	if (unlikely(!input)) {
-		local_irq_restore(flags);
-		return -1;
-	}
-
-	input->partition_id = HV_PARTITION_ID_SELF;
-	input->host_visibility = visibility;
-	input->reserved0 = 0;
-	input->reserved1 = 0;
-	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
-	hv_status = hv_do_rep_hypercall(
-			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
-			0, input, &pages_processed);
-	local_irq_restore(flags);
-
-	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
-		return 0;
-
-	return -EFAULT;
-}
-EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 204e6f3598a5..f31b669a1ddf 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -247,7 +247,7 @@ int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
 	u64 *pfn_array;
 	int ret = 0;
 
-	if (!hv_isolation_type_snp())
+	if (!hv_is_isolation_supported())
 		return 0;
 
 	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index f202ac7f4b3d..28e28ccc2081 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -99,17 +99,24 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
 
-		hv_cpu->synic_message_page =
-			(void *)get_zeroed_page(GFP_ATOMIC);
-		if (hv_cpu->synic_message_page == NULL) {
-			pr_err("Unable to allocate SYNIC message page\n");
-			goto err;
-		}
+		/*
+		 * SynIC message and event pages are allocated by the paravisor.
+		 * Skip allocating these pages here.
+		 */
+		if (!hv_isolation_type_snp()) {
+			hv_cpu->synic_message_page =
+				(void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->synic_message_page == NULL) {
+				pr_err("Unable to allocate SYNIC message page\n");
+				goto err;
+			}
 
-		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
-		if (hv_cpu->synic_event_page == NULL) {
-			pr_err("Unable to allocate SYNIC event page\n");
-			goto err;
+			hv_cpu->synic_event_page =
+				(void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->synic_event_page == NULL) {
+				pr_err("Unable to allocate SYNIC event page\n");
+				goto err;
+			}
 		}
 
 		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
@@ -136,10 +143,17 @@ void hv_synic_free(void)
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
+		free_page((unsigned long)hv_cpu->post_msg_page);
+
+		/*
+		 * SynIC message and event pages are allocated by the paravisor.
+		 * Skip freeing these pages here.
+		 */
+		if (hv_isolation_type_snp())
+			continue;
 
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
-		free_page((unsigned long)hv_cpu->post_msg_page);
 	}
 
 	kfree(hv_context.hv_numa_map);
@@ -161,35 +175,72 @@ void hv_synic_enable_regs(unsigned int cpu)
 	union hv_synic_sint shared_sint;
 	union hv_synic_scontrol sctrl;
 
-	/* Setup the Synic's message page */
-	hv_get_simp(simp.as_uint64);
-	simp.simp_enabled = 1;
-	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
-		>> HV_HYP_PAGE_SHIFT;
-
-	hv_set_simp(simp.as_uint64);
-
-	/* Setup the Synic's event page */
-	hv_get_siefp(siefp.as_uint64);
-	siefp.siefp_enabled = 1;
-	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
-		>> HV_HYP_PAGE_SHIFT;
-
-	hv_set_siefp(siefp.as_uint64);
-
-	/* Setup the shared SINT. */
-	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-
-	shared_sint.vector = hv_get_vector();
-	shared_sint.masked = false;
-	shared_sint.auto_eoi = hv_recommend_using_aeoi();
-	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-
-	/* Enable the global synic bit */
-	hv_get_synic_state(sctrl.as_uint64);
-	sctrl.enable = 1;
-
-	hv_set_synic_state(sctrl.as_uint64);
+	/*
+	 * Set up SynIC pages for the CVM. SynIC message and event pages
+	 * are allocated by the paravisor in an SNP CVM.
+	 */
+	if (hv_isolation_type_snp()) {
+		/* Setup the Synic's message page */
+		hv_get_simp_ghcb(&simp.as_uint64);
+		simp.simp_enabled = 1;
+		hv_cpu->synic_message_page
+			= ioremap_cache(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
+					PAGE_SIZE);
+		if (!hv_cpu->synic_message_page)
+			pr_warn("Failed to map SynIC message page.\n");
+
+		hv_set_simp_ghcb(simp.as_uint64);
+
+		/* Setup the Synic's event page */
+		hv_get_siefp_ghcb(&siefp.as_uint64);
+		siefp.siefp_enabled = 1;
+		hv_cpu->synic_event_page = ioremap_cache(
+			 siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT, PAGE_SIZE);
+		if (!hv_cpu->synic_event_page)
+			pr_warn("Failed to map SynIC event page.\n");
+		hv_set_siefp_ghcb(siefp.as_uint64);
+
+		/* Setup the shared SINT. */
+		hv_get_synint_state_ghcb(VMBUS_MESSAGE_SINT,
+					 &shared_sint.as_uint64);
+		shared_sint.vector = hv_get_vector();
+		shared_sint.masked = false;
+		shared_sint.auto_eoi = hv_recommend_using_aeoi();
+		hv_set_synint_state_ghcb(VMBUS_MESSAGE_SINT,
+					 shared_sint.as_uint64);
+
+		/* Enable the global synic bit */
+		hv_get_synic_state_ghcb(&sctrl.as_uint64);
+		sctrl.enable = 1;
+		hv_set_synic_state_ghcb(sctrl.as_uint64);
+	} else {
+		/* Setup the Synic's message page */
+		hv_get_simp(simp.as_uint64);
+		simp.simp_enabled = 1;
+		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+			>> HV_HYP_PAGE_SHIFT;
+		hv_set_simp(simp.as_uint64);
+
+		/* Setup the Synic's event page */
+		hv_get_siefp(siefp.as_uint64);
+		siefp.siefp_enabled = 1;
+		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+			>> HV_HYP_PAGE_SHIFT;
+		hv_set_siefp(siefp.as_uint64);
+
+		/* Setup the shared SINT. */
+		hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+		shared_sint.vector = hv_get_vector();
+		shared_sint.masked = false;
+		shared_sint.auto_eoi = hv_recommend_using_aeoi();
+		hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+		/* Enable the global synic bit */
+		hv_get_synic_state(sctrl.as_uint64);
+		sctrl.enable = 1;
+		hv_set_synic_state(sctrl.as_uint64);
+	}
 }
 
 int hv_synic_init(unsigned int cpu)
@@ -211,30 +262,53 @@ void hv_synic_disable_regs(unsigned int cpu)
 	union hv_synic_siefp siefp;
 	union hv_synic_scontrol sctrl;
 
-	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+	if (hv_isolation_type_snp()) {
+		hv_get_synint_state_ghcb(VMBUS_MESSAGE_SINT,
+					 &shared_sint.as_uint64);
+		shared_sint.masked = 1;
+		hv_set_synint_state_ghcb(VMBUS_MESSAGE_SINT,
+					 shared_sint.as_uint64);
+
+		hv_get_simp_ghcb(&simp.as_uint64);
+		simp.simp_enabled = 0;
+		simp.base_simp_gpa = 0;
+		hv_set_simp_ghcb(simp.as_uint64);
+
+		hv_get_siefp_ghcb(&siefp.as_uint64);
+		siefp.siefp_enabled = 0;
+		siefp.base_siefp_gpa = 0;
+		hv_set_siefp_ghcb(siefp.as_uint64);
 
-	shared_sint.masked = 1;
+		/* Disable the global synic bit */
+		hv_get_synic_state_ghcb(&sctrl.as_uint64);
+		sctrl.enable = 0;
+		hv_set_synic_state_ghcb(sctrl.as_uint64);
+	} else {
+		hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 
-	/* Need to correctly cleanup in the case of SMP!!! */
-	/* Disable the interrupt */
-	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+		shared_sint.masked = 1;
 
-	hv_get_simp(simp.as_uint64);
-	simp.simp_enabled = 0;
-	simp.base_simp_gpa = 0;
+		/* Need to correctly cleanup in the case of SMP!!! */
+		/* Disable the interrupt */
+		hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
 
-	hv_set_simp(simp.as_uint64);
+		hv_get_simp(simp.as_uint64);
+		simp.simp_enabled = 0;
+		simp.base_simp_gpa = 0;
 
-	hv_get_siefp(siefp.as_uint64);
-	siefp.siefp_enabled = 0;
-	siefp.base_siefp_gpa = 0;
+		hv_set_simp(simp.as_uint64);
 
-	hv_set_siefp(siefp.as_uint64);
+		hv_get_siefp(siefp.as_uint64);
+		siefp.siefp_enabled = 0;
+		siefp.base_siefp_gpa = 0;
 
-	/* Disable the global synic bit */
-	hv_get_synic_state(sctrl.as_uint64);
-	sctrl.enable = 0;
-	hv_set_synic_state(sctrl.as_uint64);
+		hv_set_siefp(siefp.as_uint64);
+
+		/* Disable the global synic bit */
+		hv_get_synic_state(sctrl.as_uint64);
+		sctrl.enable = 0;
+		hv_set_synic_state(sctrl.as_uint64);
+	}
 }
 
 int hv_synic_cleanup(unsigned int cpu)
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index ad0e33776668..6727f4073b5a 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -23,6 +23,7 @@
 #include <linux/bitops.h>
 #include <linux/cpumask.h>
 #include <asm/ptrace.h>
+#include <asm/mshyperv.h>
 #include <asm/hyperv-tlfs.h>
 
 struct ms_hyperv_info {
@@ -52,7 +53,7 @@ extern struct ms_hyperv_info ms_hyperv;
 
 extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
-
+extern bool hv_isolation_type_snp(void);
 
 /* Generate the guest OS identifier as described in the Hyper-V TLFS */
 static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
@@ -100,7 +101,11 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 		 * possibly deliver another msg from the
 		 * hypervisor
 		 */
-		hv_signal_eom();
+		if (hv_isolation_type_snp() &&
+		    old_msg_type != HVMSG_TIMER_EXPIRED)
+			hv_signal_eom_ghcb();
+		else
+			hv_signal_eom();
 	}
 }
 
@@ -186,6 +191,7 @@ bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool hv_is_isolation_supported(void);
+bool hv_isolation_type_snp(void);
 void hyperv_cleanup(void);
 #else /* CONFIG_HYPERV */
 static inline bool hv_is_hyperv_initialized(void) { return false; }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (3 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-03-03 17:21   ` Vitaly Kuznetsov
  2021-02-28 15:03 ` [RFC PATCH 6/12] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp, x86, hpa
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides a GHCB-based hypercall to issue the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE messages in an SNP
Isolation VM. Add support for it.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/ivm.c           | 69 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c         |  6 ++-
 drivers/hv/hv.c                 |  8 +++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 4332bf7aaf9b..feaabcd151f5 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -14,8 +14,77 @@
 
 union hv_ghcb {
 	struct ghcb ghcb;
+	struct {
+		u64 hypercalldata[509];
+		u64 outputgpa;
+		union {
+			union {
+				struct {
+					u32 callcode        : 16;
+					u32 isfast          : 1;
+					u32 reserved1       : 14;
+					u32 isnested        : 1;
+					u32 countofelements : 12;
+					u32 reserved2       : 4;
+					u32 repstartindex   : 12;
+					u32 reserved3       : 4;
+				};
+				u64 asuint64;
+			} hypercallinput;
+			union {
+				struct {
+					u16 callstatus;
+					u16 reserved1;
+					u32 elementsprocessed : 12;
+					u32 reserved2         : 20;
+				};
+				u64 asuint64;
+			} hypercalloutput;
+		};
+		u64 reserved2;
+	} hypercall;
 } __packed __aligned(PAGE_SIZE);
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+	union hv_ghcb *hv_ghcb;
+	void **ghcb_base;
+	unsigned long flags;
+
+	if (!ms_hyperv.ghcb_base)
+		return -EFAULT;
+
+	local_irq_save(flags);
+	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+	hv_ghcb = (union hv_ghcb *)*ghcb_base;
+	if (!hv_ghcb) {
+		local_irq_restore(flags);
+		return -EFAULT;
+	}
+
+	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+	hv_ghcb->ghcb.protocol_version = 1;
+	hv_ghcb->ghcb.ghcb_usage = 1;
+
+	hv_ghcb->hypercall.outputgpa = (u64)output;
+	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+	hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+	if (input_size)
+		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+	VMGEXIT();
+
+	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+	local_irq_restore(flags);
+
+	return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
 	union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f624d72b99d3..c8f66d269e5b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -274,6 +274,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val)			\
 	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c83612cddb99..79bca653dce9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
 	++channel->sig_events;
 
-	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+	if (hv_isolation_type_snp())
+		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+				NULL, sizeof(u64));
+	else
+		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 28e28ccc2081..6c64a7fd1ebd 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,13 @@ int hv_post_message(union hv_connection_id connection_id,
 	aligned_msg->payload_size = payload_size;
 	memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-	status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+	if (hv_isolation_type_snp())
+		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+				(void *)aligned_msg, NULL,
+				sizeof(struct hv_input_post_message));
+	else
+		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+				aligned_msg, NULL);
 
 	/* Preemption must remain disabled until after the hypercall
 	 * so some other thread can't get scheduled onto this cpu and
-- 
2.25.1



* [RFC PATCH 6/12] HV/Vmbus: Add SNP support for VMbus channel initiate message
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (4 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 7/12] hv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

For SNP support, the physical addresses of the monitor pages in the
CHANNELMSG_INITIATE_CONTACT message should be in the extra address
space. These pages should also be accessed via the extra address space
inside the Linux guest, so remap the extra addresses with ioremap().

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/connection.c   | 62 +++++++++++++++++++++++++++++++++++++++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 63 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 79bca653dce9..a0be9c11d737 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
 	msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
 	msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+	if (hv_isolation_type_snp()) {
+		msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+		msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+	}
+
 	msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
 	/*
@@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 		return -ECONNREFUSED;
 	}
 
+	if (hv_isolation_type_snp()) {
+		vmbus_connection.monitor_pages_va[0]
+			= vmbus_connection.monitor_pages[0];
+		vmbus_connection.monitor_pages[0]
+			= ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
+		if (!vmbus_connection.monitor_pages[0])
+			return -ENOMEM;
+
+		vmbus_connection.monitor_pages_va[1]
+			= vmbus_connection.monitor_pages[1];
+		vmbus_connection.monitor_pages[1]
+			= ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
+		if (!vmbus_connection.monitor_pages[1]) {
+			vunmap(vmbus_connection.monitor_pages[0]);
+			return -ENOMEM;
+		}
+
+		memset(vmbus_connection.monitor_pages[0], 0x00,
+		       HV_HYP_PAGE_SIZE);
+		memset(vmbus_connection.monitor_pages[1], 0x00,
+		       HV_HYP_PAGE_SIZE);
+	}
+
 	return ret;
 }
 
@@ -156,6 +185,7 @@ int vmbus_connect(void)
 	struct vmbus_channel_msginfo *msginfo = NULL;
 	int i, ret = 0;
 	__u32 version;
+	u64 pfn[2];
 
 	/* Initialize the vmbus connection */
 	vmbus_connection.conn_state = CONNECTING;
@@ -213,6 +243,16 @@ int vmbus_connect(void)
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+		if (hv_mark_gpa_visibility(2, pfn,
+				VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+			ret = -EFAULT;
+			goto cleanup;
+		}
+	}
+
 	msginfo = kzalloc(sizeof(*msginfo) +
 			  sizeof(struct vmbus_channel_initiate_contact),
 			  GFP_KERNEL);
@@ -279,6 +319,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+	u64 pfn[2];
+
 	/*
 	 * First send the unload request to the host.
 	 */
@@ -298,6 +340,26 @@ void vmbus_disconnect(void)
 		vmbus_connection.int_page = NULL;
 	}
 
+	if (hv_isolation_type_snp()) {
+		if (vmbus_connection.monitor_pages_va[0]) {
+			vunmap(vmbus_connection.monitor_pages[0]);
+			vmbus_connection.monitor_pages[0]
+				= vmbus_connection.monitor_pages_va[0];
+			vmbus_connection.monitor_pages_va[0] = NULL;
+		}
+
+		if (vmbus_connection.monitor_pages_va[1]) {
+			vunmap(vmbus_connection.monitor_pages[1]);
+			vmbus_connection.monitor_pages[1]
+				= vmbus_connection.monitor_pages_va[1];
+			vmbus_connection.monitor_pages_va[1] = NULL;
+		}
+
+		pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+		pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+		hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+	}
+
 	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
 	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
 	vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9416e09ebd58..0778add21a9c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
 	 * is child->parent notification
 	 */
 	struct hv_monitor_page *monitor_pages[2];
+	void *monitor_pages_va[2];
 	struct list_head chn_msg_list;
 	spinlock_t channelmsg_lock;
 
-- 
2.25.1



* [RFC PATCH 7/12] hv/vmbus: Initialize VMbus ring buffer for Isolation VM
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (5 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 6/12] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 8/12] x86/Hyper-V: Initialize bounce buffer page cache and list Tianyu Lan
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, akpm
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-mm, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

VMbus ring buffers are shared with the host, and in an Isolation VM
with SNP support they need to be accessed via the extra address
space. This patch maps the ring buffer addresses into the extra
address space via ioremap(). The host visibility hvcall smears data
in the ring buffer, so reset the ring buffer memory to zero after
making the visibility hvcall.

Co-developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/channel.c      | 10 +++++
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 83 +++++++++++++++++++++++++++++----------
 mm/ioremap.c              |  1 +
 mm/vmalloc.c              |  1 +
 5 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f31b669a1ddf..4c05b1488649 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -657,6 +657,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	if (err)
 		goto error_clean_ring;
 
+	err = hv_ringbuffer_post_init(&newchannel->outbound,
+				      page, send_pages);
+	if (err)
+		goto error_free_gpadl;
+
+	err = hv_ringbuffer_post_init(&newchannel->inbound,
+				      &page[send_pages], recv_pages);
+	if (err)
+		goto error_free_gpadl;
+
 	/* Create and init the channel open message */
 	open_info = kzalloc(sizeof(*open_info) +
 			   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0778add21a9c..d78a04ad5490 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+		struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 		       struct page *pages, u32 pagecnt);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 35833d4d1a1d..c8b0f7b45158 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
 #include <linux/prefetch.h>
+#include <linux/io.h>
+#include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
 
@@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
 	mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+		       struct page *pages, u32 page_cnt)
+{
+	struct vm_struct *area;
+	u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+	unsigned long vaddr;
+	int err = 0;
+
+	if (!hv_isolation_type_snp())
+		return 0;
+
+	physic_addr += ms_hyperv.shared_gpa_boundary;
+	area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
+	if (!area || !area->addr)
+		return -EFAULT;
+
+	vaddr = (unsigned long)area->addr;
+	err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
+			   physic_addr, PAGE_KERNEL_IO);
+	err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
+				  vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
+				  physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
+	if (err) {
+		vunmap((void *)vaddr);
+		return -EFAULT;
+	}
+
+	/* Clean memory after setting host visibility. */
+	memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+	ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+	ring_info->ring_buffer->read_index = 0;
+	ring_info->ring_buffer->write_index = 0;
+	ring_info->ring_buffer->feature_bits.value = 1;
+
+	return 0;
+}
+
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 		       struct page *pages, u32 page_cnt)
@@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 
 	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
 
-	/*
-	 * First page holds struct hv_ring_buffer, do wraparound mapping for
-	 * the rest.
-	 */
-	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
-				   GFP_KERNEL);
-	if (!pages_wraparound)
-		return -ENOMEM;
-
-	pages_wraparound[0] = pages;
-	for (i = 0; i < 2 * (page_cnt - 1); i++)
-		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+	if (!hv_isolation_type_snp()) {
+		/*
+		 * First page holds struct hv_ring_buffer, do wraparound mapping for
+		 * the rest.
+		 */
+		pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+					   GFP_KERNEL);
+		if (!pages_wraparound)
+			return -ENOMEM;
 
-	ring_info->ring_buffer = (struct hv_ring_buffer *)
-		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
+		pages_wraparound[0] = pages;
+		for (i = 0; i < 2 * (page_cnt - 1); i++)
+			pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
 
-	kfree(pages_wraparound);
+		ring_info->ring_buffer = (struct hv_ring_buffer *)
+			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
 
+		kfree(pages_wraparound);
 
-	if (!ring_info->ring_buffer)
-		return -ENOMEM;
+		if (!ring_info->ring_buffer)
+			return -ENOMEM;
 
-	ring_info->ring_buffer->read_index =
-		ring_info->ring_buffer->write_index = 0;
+		ring_info->ring_buffer->read_index =
+			ring_info->ring_buffer->write_index = 0;
 
-	/* Set the feature bit for enabling flow control. */
-	ring_info->ring_buffer->feature_bits.value = 1;
+		/* Set the feature bit for enabling flow control. */
+		ring_info->ring_buffer->feature_bits.value = 1;
+	}
 
 	ring_info->ring_size = page_cnt << PAGE_SHIFT;
 	ring_info->ring_size_div10_reciprocal =
diff --git a/mm/ioremap.c b/mm/ioremap.c
index 5fa1ab41d152..d63c4ba067f9 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -248,6 +248,7 @@ int ioremap_page_range(unsigned long addr,
 
 	return err;
 }
+EXPORT_SYMBOL_GPL(ioremap_page_range);
 
 #ifdef CONFIG_GENERIC_IOREMAP
 void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e6f352bf0498..19724a8ebcb7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2131,6 +2131,7 @@ struct vm_struct *get_vm_area(unsigned long size, unsigned long flags)
 				  NUMA_NO_NODE, GFP_KERNEL,
 				  __builtin_return_address(0));
 }
+EXPORT_SYMBOL_GPL(get_vm_area);
 
 struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
 				const void *caller)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 8/12] x86/Hyper-V: Initialize bounce buffer page cache and list
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (6 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 7/12] hv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 9/12] x86/Hyper-V: Add new parameter for vmbus_sendpacket_pagebuffer()/mpb_desc() Tianyu Lan
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu
  Cc: Tianyu Lan, linux-kernel, linux-hyperv, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Initialize and free bounce buffer resources when a vmbus channel is
added or deleted in an Isolation VM.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/Makefile       |  2 +-
 drivers/hv/channel_mgmt.c | 29 +++++++++++++++++----------
 drivers/hv/hv_bounce.c    | 42 +++++++++++++++++++++++++++++++++++++++
 drivers/hv/hyperv_vmbus.h | 14 +++++++++++++
 include/linux/hyperv.h    | 22 ++++++++++++++++++++
 5 files changed, 97 insertions(+), 12 deletions(-)
 create mode 100644 drivers/hv/hv_bounce.c

diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index 94daf8240c95..b0c20fed9153 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -8,6 +8,6 @@ CFLAGS_hv_balloon.o = -I$(src)
 
 hv_vmbus-y := vmbus_drv.o \
 		 hv.o connection.o channel.o \
-		 channel_mgmt.o ring_buffer.o hv_trace.o
+		 channel_mgmt.o ring_buffer.o hv_trace.o hv_bounce.o
 hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
 hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_fcopy.o hv_utils_transport.o
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index f0ed730e2e4e..e2846cacfd70 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -336,6 +336,18 @@ bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf,
 
 EXPORT_SYMBOL_GPL(vmbus_prep_negotiate_resp);
 
+/*
+ * free_channel - Release the resources used by the vmbus channel object
+ */
+static void free_channel(struct vmbus_channel *channel)
+{
+	tasklet_kill(&channel->callback_event);
+	vmbus_remove_channel_attr_group(channel);
+
+	kobject_put(&channel->kobj);
+	hv_free_channel_ivm(channel);
+}
+
 /*
  * alloc_channel - Allocate and initialize a vmbus channel object
  */
@@ -360,17 +372,6 @@ static struct vmbus_channel *alloc_channel(void)
 	return channel;
 }
 
-/*
- * free_channel - Release the resources used by the vmbus channel object
- */
-static void free_channel(struct vmbus_channel *channel)
-{
-	tasklet_kill(&channel->callback_event);
-	vmbus_remove_channel_attr_group(channel);
-
-	kobject_put(&channel->kobj);
-}
-
 void vmbus_channel_map_relid(struct vmbus_channel *channel)
 {
 	if (WARN_ON(channel->offermsg.child_relid >= MAX_CHANNEL_RELIDS))
@@ -510,6 +511,8 @@ static void vmbus_add_channel_work(struct work_struct *work)
 		if (vmbus_add_channel_kobj(dev, newchannel))
 			goto err_deq_chan;
 
+		hv_init_channel_ivm(newchannel);
+
 		if (primary_channel->sc_creation_callback != NULL)
 			primary_channel->sc_creation_callback(newchannel);
 
@@ -543,6 +546,10 @@ static void vmbus_add_channel_work(struct work_struct *work)
 	}
 
 	newchannel->probe_done = true;
+
+	if (hv_init_channel_ivm(newchannel))
+		goto err_deq_chan;
+
 	return;
 
 err_deq_chan:
diff --git a/drivers/hv/hv_bounce.c b/drivers/hv/hv_bounce.c
new file mode 100644
index 000000000000..c5898325b238
--- /dev/null
+++ b/drivers/hv/hv_bounce.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Bounce buffer code for Hyper-V Isolation VM support.
+ *
+ * Authors:
+ *   Sunil Muthuswamy <sunilmut@microsoft.com>
+ *   Tianyu Lan <Tianyu.Lan@microsoft.com>
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "hyperv_vmbus.h"
+
+int hv_init_channel_ivm(struct vmbus_channel *channel)
+{
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	INIT_LIST_HEAD(&channel->bounce_page_free_head);
+	INIT_LIST_HEAD(&channel->bounce_pkt_free_list_head);
+
+	channel->bounce_pkt_cache = KMEM_CACHE(hv_bounce_pkt, 0);
+	if (unlikely(!channel->bounce_pkt_cache))
+		return -ENOMEM;
+	channel->bounce_page_cache = KMEM_CACHE(hv_bounce_page_list, 0);
+	if (unlikely(!channel->bounce_page_cache))
+		return -ENOMEM;
+
+	return 0;
+}
+
+void hv_free_channel_ivm(struct vmbus_channel *channel)
+{
+	if (!hv_is_isolation_supported())
+		return;
+
+
+	cancel_delayed_work_sync(&channel->bounce_page_list_maintain);
+	hv_bounce_pkt_list_free(channel, &channel->bounce_pkt_free_list_head);
+	hv_bounce_page_list_free(channel, &channel->bounce_page_free_head);
+	kmem_cache_destroy(channel->bounce_pkt_cache);
+	kmem_cache_destroy(channel->bounce_page_cache);
+}
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index d78a04ad5490..7edf2be60d2c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -19,6 +19,7 @@
 #include <linux/hyperv.h>
 #include <linux/interrupt.h>
 
+#include <asm/mshyperv.h>
 #include "hv_trace.h"
 
 /*
@@ -56,6 +57,19 @@ union hv_monitor_trigger_state {
 	};
 };
 
+/*
+ * All vmbus channels initially start with zero bounce pages and are required
+ * to set any non-zero size, if needed.
+ */
+#define HV_DEFAULT_BOUNCE_BUFFER_PAGES  0
+
+/* MIN should be a power of 2 */
+#define HV_MIN_BOUNCE_BUFFER_PAGES	64
+
+extern int hv_init_channel_ivm(struct vmbus_channel *channel);
+
+extern void hv_free_channel_ivm(struct vmbus_channel *channel);
+
 /* struct hv_monitor_page Layout */
 /* ------------------------------------------------------ */
 /* | 0   | TriggerState (4 bytes) | Rsvd1 (4 bytes)     | */
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 41cbaa2db567..d518aba17565 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -25,6 +25,9 @@
 #include <linux/interrupt.h>
 #include <linux/reciprocal_div.h>
 #include <asm/hyperv-tlfs.h>
+#include <linux/slab.h>
+#include <linux/mempool.h>
 
 #define MAX_PAGE_BUFFER_COUNT				32
 #define MAX_MULTIPAGE_BUFFER_COUNT			32 /* 128K */
@@ -1007,9 +1010,28 @@ struct vmbus_channel {
 	u32 fuzz_testing_interrupt_delay;
 	u32 fuzz_testing_message_delay;
 
+
 	/* request/transaction ids for VMBus */
 	struct vmbus_requestor requestor;
 	u32 rqstor_size;
+	/*
+	 * Minimum number of bounce resources (i.e. bounce packets & pages) that
+	 * should be allocated and reserved for this channel. Allocation is
+	 * permitted to go beyond this limit, and the maintenance task takes
+	 * care of releasing the extra allocated resources.
+	 */
+	u32 min_bounce_resource_count;
+
+	/* The free list of bounce pages is LRU sorted based on last used */
+	struct list_head bounce_page_free_head;
+	u32 bounce_page_alloc_count;
+	struct delayed_work bounce_page_list_maintain;
+
+	struct kmem_cache *bounce_page_cache;
+	struct kmem_cache *bounce_pkt_cache;
+	struct list_head bounce_pkt_free_list_head;
+	u32 bounce_pkt_free_count;
+	spinlock_t bp_lock;
 };
 
 u64 vmbus_next_request_id(struct vmbus_requestor *rqstor, u64 rqst_addr);
-- 
2.25.1



* [RFC PATCH 9/12] x86/Hyper-V: Add new parameter for vmbus_sendpacket_pagebuffer()/mpb_desc()
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (7 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 8/12] x86/Hyper-V: Initialize bounce buffer page cache and list Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 10/12] HV: Add bounce buffer support for Isolation VM Tianyu Lan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, davem, kuba, jejb, martin.petersen
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, linux-scsi,
	vkuznets, thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add a new io_type parameter and a struct hv_bounce_pkt pointer to
vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc() in
preparation for adding bounce buffer support later.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/channel.c            |  7 +++++--
 drivers/hv/hyperv_vmbus.h       | 12 ++++++++++++
 drivers/net/hyperv/hyperv_net.h |  1 +
 drivers/net/hyperv/netvsc.c     |  5 ++++-
 drivers/scsi/storvsc_drv.c      | 23 +++++++++++++++++------
 include/linux/hyperv.h          | 16 ++++++++++++++--
 6 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 4c05b1488649..976ef99dda28 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1044,7 +1044,8 @@ EXPORT_SYMBOL(vmbus_sendpacket);
 int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
 				struct hv_page_buffer pagebuffers[],
 				u32 pagecount, void *buffer, u32 bufferlen,
-				u64 requestid)
+				u64 requestid, u8 io_type,
+				struct hv_bounce_pkt **bounce_pkt)
 {
 	int i;
 	struct vmbus_channel_packet_page_buffer desc;
@@ -1101,7 +1102,9 @@ EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
 int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 			      struct vmbus_packet_mpb_array *desc,
 			      u32 desc_size,
-			      void *buffer, u32 bufferlen, u64 requestid)
+			      void *buffer, u32 bufferlen, u64 requestid,
+			      u32 pfn_count, u8 io_type,
+			      struct hv_bounce_pkt **bounce_pkt)
 {
 	u32 packetlen;
 	u32 packetlen_aligned;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 7edf2be60d2c..7677f083d33a 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -57,6 +57,18 @@ union hv_monitor_trigger_state {
 	};
 };
 
+/*
+ * Hyper-V bounce packet. Each in-use bounce packet is mapped to a vmbus
+ * transaction and contains a list of bounce pages for that transaction.
+ */
+struct hv_bounce_pkt {
+	/* Link to the next bounce packet, when it is in the free list */
+	struct list_head link;
+	struct list_head bounce_page_head;
+	u32 flags;
+};
+
+
 /*
  * All vmbus channels initially start with zero bounce pages and are required
  * to set any non-zero size, if needed.
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index b3a43c4ec8ab..11266b92bcf0 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,6 +130,7 @@ struct hv_netvsc_packet {
 	u32 total_bytes;
 	u32 send_buf_index;
 	u32 total_data_buflen;
+	struct hv_bounce_pkt *bounce_pkt;
 };
 
 #define NETVSC_HASH_KEYLEN 40
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 08d73401bb28..77657c5acc65 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -926,14 +926,17 @@ static inline int netvsc_send_pkt(
 
 	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
 
+	packet->bounce_pkt = NULL;
 	if (packet->page_buf_cnt) {
 		if (packet->cp_partial)
 			pb += packet->rmsg_pgcnt;
 
+		/* The I/O type is always 'write' for netvsc */
 		ret = vmbus_sendpacket_pagebuffer(out_channel,
 						  pb, packet->page_buf_cnt,
 						  &nvmsg, sizeof(nvmsg),
-						  req_id);
+						  req_id, IO_TYPE_WRITE,
+						  &packet->bounce_pkt);
 	} else {
 		ret = vmbus_sendpacket(out_channel,
 				       &nvmsg, sizeof(nvmsg),
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..c5b4974eb41f 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -31,6 +31,7 @@
 #include <scsi/scsi_dbg.h>
 #include <scsi/scsi_transport_fc.h>
 #include <scsi/scsi_transport.h>
+#include <asm/mshyperv.h>
 
 /*
  * All wire protocol details (storage protocol between the guest and the host)
@@ -427,6 +428,7 @@ struct storvsc_cmd_request {
 	u32 payload_sz;
 
 	struct vstor_packet vstor_packet;
+	struct hv_bounce_pkt *bounce_pkt;
 };
 
 
@@ -1390,7 +1392,8 @@ static struct vmbus_channel *get_og_chn(struct storvsc_device *stor_device,
 
 
 static int storvsc_do_io(struct hv_device *device,
-			 struct storvsc_cmd_request *request, u16 q_num)
+			 struct storvsc_cmd_request *request, u16 q_num,
+			 u32 pfn_count)
 {
 	struct storvsc_device *stor_device;
 	struct vstor_packet *vstor_packet;
@@ -1493,14 +1496,18 @@ static int storvsc_do_io(struct hv_device *device,
 
 	vstor_packet->operation = VSTOR_OPERATION_EXECUTE_SRB;
 
+	request->bounce_pkt = NULL;
 	if (request->payload->range.len) {
+		struct vmscsi_request *vm_srb = &request->vstor_packet.vm_srb;
 
 		ret = vmbus_sendpacket_mpb_desc(outgoing_channel,
 				request->payload, request->payload_sz,
 				vstor_packet,
 				(sizeof(struct vstor_packet) -
 				vmscsi_size_delta),
-				(unsigned long)request);
+				(unsigned long)request,
+				pfn_count,
+				vm_srb->data_in, &request->bounce_pkt);
 	} else {
 		ret = vmbus_sendpacket(outgoing_channel, vstor_packet,
 			       (sizeof(struct vstor_packet) -
@@ -1510,8 +1517,10 @@ static int storvsc_do_io(struct hv_device *device,
 			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
 	}
 
-	if (ret != 0)
+	if (ret != 0) {
+		request->bounce_pkt = NULL;
 		return ret;
+	}
 
 	atomic_inc(&stor_device->num_outstanding_req);
 
@@ -1825,14 +1834,16 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	cmd_request->payload_sz = payload_sz;
 
 	/* Invokes the vsc to start an IO */
-	ret = storvsc_do_io(dev, cmd_request, get_cpu());
+	ret = storvsc_do_io(dev, cmd_request, get_cpu(), sg_count);
 	put_cpu();
 
-	if (ret == -EAGAIN) {
+	if (ret) {
 		if (payload_sz > sizeof(cmd_request->mpb))
 			kfree(payload);
 		/* no more space */
-		return SCSI_MLQUEUE_DEVICE_BUSY;
+		if (ret == -EAGAIN || ret == -ENOSPC)
+			return SCSI_MLQUEUE_DEVICE_BUSY;
+		return ret;
 	}
 
 	return 0;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index d518aba17565..d1a936091665 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1184,19 +1184,31 @@ extern int vmbus_sendpacket(struct vmbus_channel *channel,
 				  enum vmbus_packet_type type,
 				  u32 flags);
 
+#define IO_TYPE_WRITE	0
+#define IO_TYPE_READ	1
+#define IO_TYPE_UNKNOWN 2
+
+struct hv_bounce_pkt;
+
 extern int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
 					    struct hv_page_buffer pagebuffers[],
 					    u32 pagecount,
 					    void *buffer,
 					    u32 bufferlen,
-					    u64 requestid);
+					    u64 requestid,
+					    u8 io_type,
+					    struct hv_bounce_pkt **bounce_pkt);
 
 extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 				     struct vmbus_packet_mpb_array *mpb,
 				     u32 desc_size,
 				     void *buffer,
 				     u32 bufferlen,
-				     u64 requestid);
+				     u64 requestid,
+				     u32 pfn_count,
+				     u8 io_type,
+				     struct hv_bounce_pkt **bounce_pkt);
+
 
 extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
 				      void *kbuffer,
-- 
2.25.1



* [RFC PATCH 10/12] HV: Add bounce buffer support for Isolation VM
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (8 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 9/12] x86/Hyper-V: Add new parameter for vmbus_sendpacket_pagebuffer()/mpb_desc() Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc Tianyu Lan
  11 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides two kinds of Isolation VM: VBS (Virtualization-Based
Security) and AMD SEV-SNP based Isolation VMs. The memory of these VMs
is encrypted and the host can't access guest memory directly. The
guest needs to call the Hyper-V host visibility hvcall to mark memory
visible to the host before sharing memory with the host for IO, so IO
operations require bounce buffers to exchange data with the host. To
deliver data, the host puts it into the shared memory (the bounce
buffer) and the guest copies it into private memory, and vice versa.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space, which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to
access the shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

The vmbus channel ring buffer has already been marked host visible and
works as a bounce buffer for vmbus devices. However,
vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc() send
packets that reference system memory outside the vmbus channel ring
buffer. That memory still needs additional bounce buffers to
communicate with the host. Add vmbus_sendpacket_pagebuffer_bounce()
and vmbus_sendpacket_mpb_desc_bounce() to handle such cases.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/channel.c      |  13 +-
 drivers/hv/channel_mgmt.c |   1 +
 drivers/hv/hv_bounce.c    | 579 +++++++++++++++++++++++++++++++++++++-
 drivers/hv/hyperv_vmbus.h |  13 +
 include/linux/hyperv.h    |   2 +
 5 files changed, 605 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 976ef99dda28..f5391a050bdc 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1090,7 +1090,11 @@ int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
 	bufferlist[2].iov_base = &aligned_data;
 	bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-	return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+	if (hv_is_isolation_supported())
+		return vmbus_sendpacket_pagebuffer_bounce(channel, &desc,
+			descsize, bufferlist, io_type, bounce_pkt, requestid);
+	else
+		return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
 
@@ -1130,7 +1134,12 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 	bufferlist[2].iov_base = &aligned_data;
 	bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-	return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+	if (hv_is_isolation_supported()) {
+		return vmbus_sendpacket_mpb_desc_bounce(channel, desc,
+			desc_size, bufferlist, io_type, bounce_pkt, requestid);
+	} else {
+		return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+	}
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);
 
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index e2846cacfd70..d8090b2e2421 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -359,6 +359,7 @@ static struct vmbus_channel *alloc_channel(void)
 	if (!channel)
 		return NULL;
 
+	spin_lock_init(&channel->bp_lock);
 	spin_lock_init(&channel->sched_lock);
 	init_completion(&channel->rescind_event);
 
diff --git a/drivers/hv/hv_bounce.c b/drivers/hv/hv_bounce.c
index c5898325b238..bed1a361d167 100644
--- a/drivers/hv/hv_bounce.c
+++ b/drivers/hv/hv_bounce.c
@@ -9,12 +9,589 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include "hyperv_vmbus.h"
+#include <linux/uio.h>
+#include <asm/mshyperv.h>
+
+/* BP == Bounce Pages here */
+#define BP_LIST_MAINTENANCE_FREQ (30 * HZ)
+#define BP_MIN_TIME_IN_FREE_LIST (30 * HZ)
+#define IS_BP_MAINTENANCE_TASK_NEEDED(channel) \
+	(channel->bounce_page_alloc_count > \
+	 channel->min_bounce_resource_count && \
+	 !list_empty(&channel->bounce_page_free_head))
+#define BP_QUEUE_MAINTENANCE_WORK(channel) \
+	queue_delayed_work(system_unbound_wq,		\
+			   &channel->bounce_page_list_maintain, \
+			   BP_LIST_MAINTENANCE_FREQ)
+
+#define hv_copy_to_bounce(bounce_pkt) \
+		hv_copy_to_from_bounce(bounce_pkt, true)
+#define hv_copy_from_bounce(bounce_pkt)	\
+		hv_copy_to_from_bounce(bounce_pkt, false)
+/*
+ * A list of bounce pages, with original va, bounce va and I/O details such as
+ * the offset and length.
+ */
+struct hv_bounce_page_list {
+	struct list_head link;
+	u32 offset;
+	u32 len;
+	unsigned long va;
+	unsigned long bounce_va;
+	unsigned long bounce_original_va;
+	unsigned long bounce_extra_pfn;
+	unsigned long last_used_jiff;
+};
+
+/*
+ * This structure can be safely used to iterate over objects of the type
+ * 'hv_page_buffer', 'hv_mpb_array' or 'hv_multipage_buffer'. The min array
+ * size of 1 is needed to include the size of 'pfn_array' as part of the struct.
+ */
+struct hv_page_range {
+	u32 len;
+	u32 offset;
+	u64 pfn_array[1];
+};
+
+static inline struct hv_bounce_pkt *__hv_bounce_pkt_alloc(
+	struct vmbus_channel *channel)
+{
+	return kmem_cache_alloc(channel->bounce_pkt_cache,
+				__GFP_ZERO | GFP_KERNEL);
+}
+
+static inline void __hv_bounce_pkt_free(struct vmbus_channel *channel,
+					struct hv_bounce_pkt *bounce_pkt)
+{
+	kmem_cache_free(channel->bounce_pkt_cache, bounce_pkt);
+}
+
+static inline void hv_bounce_pkt_list_free(struct vmbus_channel *channel,
+					   const struct list_head *head)
+{
+	struct hv_bounce_pkt *bounce_pkt;
+	struct hv_bounce_pkt *tmp;
+
+	list_for_each_entry_safe(bounce_pkt, tmp, head, link) {
+		list_del(&bounce_pkt->link);
+		__hv_bounce_pkt_free(channel, bounce_pkt);
+	}
+}
+
+/*
+ * Assigns a free bounce packet from the channel, if one is available. Else,
+ * allocates one. Use 'hv_bounce_resources_release' to release the bounce packet
+ * as it also takes care of releasing the bounce pages within, if any.
+ */
+static struct hv_bounce_pkt *hv_bounce_pkt_assign(struct vmbus_channel *channel)
+{
+	if (channel->min_bounce_resource_count) {
+		struct hv_bounce_pkt *bounce_pkt = NULL;
+		unsigned long flags;
+
+		spin_lock_irqsave(&channel->bp_lock, flags);
+		if (!list_empty(&channel->bounce_pkt_free_list_head)) {
+			bounce_pkt = list_first_entry(
+					&channel->bounce_pkt_free_list_head,
+					struct hv_bounce_pkt, link);
+			list_del(&bounce_pkt->link);
+			channel->bounce_pkt_free_count--;
+		}
+
+		spin_unlock_irqrestore(&channel->bp_lock, flags);
+		if (bounce_pkt)
+			return bounce_pkt;
+	}
+
+	return __hv_bounce_pkt_alloc(channel);
+}
+
+static void hv_bounce_pkt_release(struct vmbus_channel *channel,
+				  struct hv_bounce_pkt *bounce_pkt)
+{
+	bool free_pkt = true;
+
+	if (channel->min_bounce_resource_count) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&channel->bp_lock, flags);
+		if (channel->bounce_pkt_free_count <
+		    channel->min_bounce_resource_count) {
+			list_add(&bounce_pkt->link,
+				 &channel->bounce_pkt_free_list_head);
+			channel->bounce_pkt_free_count++;
+			free_pkt = false;
+		}
+
+		spin_unlock_irqrestore(&channel->bp_lock, flags);
+	}
+
+	if (free_pkt)
+		__hv_bounce_pkt_free(channel, bounce_pkt);
+}
+
+/* Frees the list of bounce pages and all of the resources within */
+static void hv_bounce_page_list_free(struct vmbus_channel *channel,
+				     const struct list_head *head)
+{
+	u16 count = 0;
+	u64 pfn[HV_MIN_BOUNCE_BUFFER_PAGES];
+	struct hv_bounce_page_list *bounce_page;
+	struct hv_bounce_page_list *tmp;
+
+	BUILD_BUG_ON(HV_MIN_BOUNCE_BUFFER_PAGES > HV_MAX_MODIFY_GPA_REP_COUNT);
+	list_for_each_entry(bounce_page, head, link) {
+		if (hv_isolation_type_snp())
+			pfn[count++] = virt_to_hvpfn(
+					(void *)bounce_page->bounce_original_va);
+		else
+			pfn[count++] = virt_to_hvpfn(
+					(void *)bounce_page->bounce_va);
+
+		if (count < HV_MIN_BOUNCE_BUFFER_PAGES &&
+		    !list_is_last(&bounce_page->link, head))
+			continue;
+		hv_mark_gpa_visibility(count, pfn, VMBUS_PAGE_NOT_VISIBLE);
+		count = 0;
+	}
+
+	/*
+	 * Need a second iteration because the page should not be freed until
+	 * it is marked not-visible to the host.
+	 */
+	list_for_each_entry_safe(bounce_page, tmp, head, link) {
+		list_del(&bounce_page->link);
+
+		if (hv_isolation_type_snp()) {
+			vunmap((void *)bounce_page->bounce_va);
+			free_page(bounce_page->bounce_original_va);
+		} else
+			free_page(bounce_page->bounce_va);
+
+		kmem_cache_free(channel->bounce_page_cache, bounce_page);
+	}
+}
+
+/* Allocate a list of bounce pages and make them host visible. */
+static int hv_bounce_page_list_alloc(struct vmbus_channel *channel, u32 count)
+{
+	unsigned long flags;
+	struct list_head head;
+	u32 p;
+	u64 pfn[HV_MIN_BOUNCE_BUFFER_PAGES];
+	u32 pfn_count = 0;
+	bool queue_work = false;
+	int ret = -ENOSPC;
+	unsigned long va = 0;
+
+	INIT_LIST_HEAD(&head);
+	for (p = 0; p < count; p++) {
+		struct hv_bounce_page_list *bounce_page;
+
+		va = __get_free_page(__GFP_ZERO | GFP_ATOMIC);
+		if (unlikely(!va))
+			goto err_free;
+		bounce_page = kmem_cache_alloc(channel->bounce_page_cache,
+					       __GFP_ZERO | GFP_ATOMIC);
+		if (unlikely(!bounce_page))
+			goto err_free;
+
+		if (hv_isolation_type_snp()) {
+			bounce_page->bounce_extra_pfn =
+				virt_to_hvpfn((void *)va) +
+				(ms_hyperv.shared_gpa_boundary
+				 >> HV_HYP_PAGE_SHIFT);
+			bounce_page->bounce_original_va = va;
+			bounce_page->bounce_va = (u64)ioremap_cache(
+				bounce_page->bounce_extra_pfn << HV_HYP_PAGE_SHIFT,
+				HV_HYP_PAGE_SIZE);
+			if (!bounce_page->bounce_va)
+				goto err_free;
+		} else {
+			bounce_page->bounce_va = va;
+		}
+
+		pfn[pfn_count++] = virt_to_hvpfn((void *)va);
+		bounce_page->last_used_jiff = jiffies;
+
+		/* Add to the tail to maintain LRU sorting */
+		list_add_tail(&bounce_page->link, &head);
+		va = 0;
+		if (pfn_count == HV_MIN_BOUNCE_BUFFER_PAGES || p == count - 1) {
+			ret = hv_mark_gpa_visibility(pfn_count, pfn,
+					VMBUS_PAGE_VISIBLE_READ_WRITE);
+			if (hv_isolation_type_snp())
+				list_for_each_entry(bounce_page, &head, link)
+					memset((u64 *)bounce_page->bounce_va, 0x00,
+					       HV_HYP_PAGE_SIZE);
+
+			if (unlikely(ret < 0))
+				goto err_free;
+			pfn_count = 0;
+		}
+	}
+
+	spin_lock_irqsave(&channel->bp_lock, flags);
+	list_splice_tail(&head, &channel->bounce_page_free_head);
+	channel->bounce_page_alloc_count += count;
+	queue_work = IS_BP_MAINTENANCE_TASK_NEEDED(channel);
+	spin_unlock_irqrestore(&channel->bp_lock, flags);
+	if (queue_work)
+		BP_QUEUE_MAINTENANCE_WORK(channel);
+	return 0;
+err_free:
+	if (va)
+		free_page(va);
+	hv_bounce_page_list_free(channel, &head);
+	return ret;
+}
+
+/*
+ * Puts the bounce pages in the list back into the channel's free bounce page
+ * list and schedules the bounce page maintenance routine.
+ */
+static void hv_bounce_page_list_release(struct vmbus_channel *channel,
+					struct list_head *head)
+{
+	struct hv_bounce_page_list *bounce_page;
+	unsigned long flags;
+	bool queue_work;
+	struct hv_bounce_page_list *tmp;
+
+	/*
+	 * Need to iterate, rather than a direct list merge so that the last
+	 * used timestamp can be updated for each page.
+	 * Add the page to the tail of the free list to maintain LRU sorting.
+	 */
+	spin_lock_irqsave(&channel->bp_lock, flags);
+	list_for_each_entry_safe(bounce_page, tmp, head, link) {
+		list_del(&bounce_page->link);
+		bounce_page->last_used_jiff = jiffies;
+
+		/* Maintain LRU */
+		list_add_tail(&bounce_page->link,
+			      &channel->bounce_page_free_head);
+	}
+
+	queue_work = IS_BP_MAINTENANCE_TASK_NEEDED(channel);
+	spin_unlock_irqrestore(&channel->bp_lock, flags);
+	if (queue_work)
+		BP_QUEUE_MAINTENANCE_WORK(channel);
+}
+
+/*
+ * Maintenance work to prune the vmbus channel's free bounce page list. It runs
+ * at every 'BP_LIST_MAINTENANCE_FREQ' and frees the bounce pages that are in
+ * the free list longer than 'BP_MIN_TIME_IN_FREE_LIST' once the min bounce
+ * resource reservation requirement is met.
+ */
+static void hv_bounce_page_list_maintain(struct work_struct *work)
+{
+	struct vmbus_channel *channel;
+	struct delayed_work *dwork = to_delayed_work(work);
+	unsigned long flags;
+	struct list_head head_to_free;
+	bool queue_work;
+
+	channel = container_of(dwork, struct vmbus_channel,
+			       bounce_page_list_maintain);
+	INIT_LIST_HEAD(&head_to_free);
+	spin_lock_irqsave(&channel->bp_lock, flags);
+	while (IS_BP_MAINTENANCE_TASK_NEEDED(channel)) {
+		struct hv_bounce_page_list *bounce_page = list_first_entry(
+				&channel->bounce_page_free_head,
+				struct hv_bounce_page_list,
+				link);
+
+		/*
+		 * Stop on the first entry that fails the check since the
+		 * list is expected to be sorted on LRU.
+		 */
+		if (time_before(jiffies, bounce_page->last_used_jiff +
+				BP_MIN_TIME_IN_FREE_LIST))
+			break;
+		list_del(&bounce_page->link);
+		list_add_tail(&bounce_page->link, &head_to_free);
+		channel->bounce_page_alloc_count--;
+	}
+
+	queue_work = IS_BP_MAINTENANCE_TASK_NEEDED(channel);
+	spin_unlock_irqrestore(&channel->bp_lock, flags);
+	if (!list_empty(&head_to_free))
+		hv_bounce_page_list_free(channel, &head_to_free);
+	if (queue_work)
+		BP_QUEUE_MAINTENANCE_WORK(channel);
+}
+
+/*
+ * Assigns a free bounce page from the channel, if one is available. Else,
+ * allocates a bunch of bounce pages into the channel and returns one. Use
+ * 'hv_bounce_page_list_release' to release the page.
+ */
+static struct hv_bounce_page_list *hv_bounce_page_assign(
+	struct vmbus_channel *channel)
+{
+	struct hv_bounce_page_list *bounce_page = NULL;
+	unsigned long flags;
+	int ret;
+
+retry:
+	spin_lock_irqsave(&channel->bp_lock, flags);
+	if (!list_empty(&channel->bounce_page_free_head)) {
+		bounce_page = list_first_entry(&channel->bounce_page_free_head,
+					       struct hv_bounce_page_list,
+					       link);
+		list_del(&bounce_page->link);
+	}
+	spin_unlock_irqrestore(&channel->bp_lock, flags);
+
+	if (likely(bounce_page))
+		return bounce_page;
+
+	if (hv_isolation_type_snp() && in_interrupt()) {
+		pr_warn_once("Not enough reserved bounce pages.\n");
+		return NULL;
+	}
+
+	ret = hv_bounce_page_list_alloc(channel, HV_MIN_BOUNCE_BUFFER_PAGES);
+	if (unlikely(ret < 0))
+		return NULL;
+	goto retry;
+}
+
+/*
+ * Allocate a linked list of 'count' bounce packets into the channel. Use
+ * 'hv_bounce_pkt_list_free' to free the list.
+ */
+static int hv_bounce_pkt_list_alloc(struct vmbus_channel *channel, u32 count)
+{
+	struct list_head bounce_pkt_head;
+	unsigned long flags;
+	u32 i;
+
+	INIT_LIST_HEAD(&bounce_pkt_head);
+	for (i = 0; i < count; i++) {
+		struct hv_bounce_pkt *bounce_pkt = __hv_bounce_pkt_alloc(
+							channel);
+
+		if (unlikely(!bounce_pkt))
+			goto err_free;
+		list_add(&bounce_pkt->link, &bounce_pkt_head);
+	}
+
+	spin_lock_irqsave(&channel->bp_lock, flags);
+	list_splice_tail(&bounce_pkt_head, &channel->bounce_pkt_free_list_head);
+	channel->bounce_pkt_free_count += count;
+	spin_unlock_irqrestore(&channel->bp_lock, flags);
+	return 0;
+err_free:
+	hv_bounce_pkt_list_free(channel, &bounce_pkt_head);
+	return -ENOMEM;
+}
+
+/*
+ * Allocate and reserve enough bounce resources to be able to handle the min
+ * specified bytes. This routine should be called prior to starting the I/O on
+ * the channel, else the channel will end up not reserving any bounce resources.
+ */
+int hv_bounce_resources_reserve(struct vmbus_channel *channel,
+				u32 min_bounce_bytes)
+{
+	unsigned long flags;
+	u32 round_up_count;
+	int ret;
+
+	if (!hv_is_isolation_supported())
+		return 0;
+
+	/* Resize operation is currently not supported */
+	if (unlikely((!min_bounce_bytes || channel->min_bounce_resource_count)))
+		return -EINVAL;
+
+	/*
+	 * Get the page count and round it up to the min bounce pages supported
+	 */
+	round_up_count = round_up(min_bounce_bytes, HV_HYP_PAGE_SIZE)
+			>> HV_HYP_PAGE_SHIFT;
+	round_up_count = round_up(round_up_count, HV_MIN_BOUNCE_BUFFER_PAGES);
+	spin_lock_irqsave(&channel->bp_lock, flags);
+	channel->min_bounce_resource_count = round_up_count;
+	spin_unlock_irqrestore(&channel->bp_lock, flags);
+	ret = hv_bounce_pkt_list_alloc(channel, round_up_count);
+	if (ret < 0)
+		return ret;
+	return hv_bounce_page_list_alloc(channel, round_up_count);
+}
+EXPORT_SYMBOL_GPL(hv_bounce_resources_reserve);
+
+static void hv_bounce_resources_release(struct vmbus_channel *channel,
+					struct hv_bounce_pkt *bounce_pkt)
+{
+	if (unlikely(!bounce_pkt))
+		return;
+	hv_bounce_page_list_release(channel, &bounce_pkt->bounce_page_head);
+	hv_bounce_pkt_release(channel, bounce_pkt);
+}
+
+static void hv_copy_to_from_bounce(const struct hv_bounce_pkt *bounce_pkt,
+				   bool copy_to_bounce)
+{
+	struct hv_bounce_page_list *bounce_page;
+
+	if ((copy_to_bounce && (bounce_pkt->flags != IO_TYPE_WRITE)) ||
+	    (!copy_to_bounce && (bounce_pkt->flags != IO_TYPE_READ)))
+		return;
+
+	list_for_each_entry(bounce_page, &bounce_pkt->bounce_page_head, link) {
+		u32 offset = bounce_page->offset;
+		u32 len = bounce_page->len;
+		u8 *bounce_buffer = (u8 *)bounce_page->bounce_va;
+		u8 *buffer = (u8 *)bounce_page->va;
+
+		if (copy_to_bounce)
+			memcpy(bounce_buffer + offset, buffer + offset, len);
+		else
+			memcpy(buffer + offset, bounce_buffer + offset, len);
+	}
+}
+
+/*
+ * Assigns the bounce resources needed to handle the PFNs within the range and
+ * updates the range accordingly. Uses resources from the pre-allocated pool if
+ * previously reserved, else allocates memory. Use 'hv_bounce_resources_release'
+ * to release.
+ */
+static struct hv_bounce_pkt *hv_bounce_resources_assign(
+	struct vmbus_channel *channel,
+	u32 rangecount,
+	struct hv_page_range *range,
+	u8 io_type)
+{
+	struct hv_bounce_pkt *bounce_pkt;
+	u32 r;
+
+	bounce_pkt = hv_bounce_pkt_assign(channel);
+	if (unlikely(!bounce_pkt))
+		return NULL;
+	bounce_pkt->flags = io_type;
+	INIT_LIST_HEAD(&bounce_pkt->bounce_page_head);
+	for (r = 0; r < rangecount; r++) {
+		u32 len = range[r].len;
+		u32 offset = range[r].offset;
+		u32 p;
+		u32 pfn_count;
+
+		pfn_count = round_up(offset + len, HV_HYP_PAGE_SIZE)
+				>> HV_HYP_PAGE_SHIFT;
+		for (p = 0; p < pfn_count; p++) {
+			struct hv_bounce_page_list *bounce_page;
+			u32 copy_len = min(len, ((u32)HV_HYP_PAGE_SIZE - offset));
+
+			bounce_page = hv_bounce_page_assign(channel);
+			if (unlikely(!bounce_page))
+				goto err_free;
+			bounce_page->va = (unsigned long)
+				__va(range[r].pfn_array[p] << PAGE_SHIFT);
+			bounce_page->offset = offset;
+			bounce_page->len = copy_len;
+			list_add_tail(&bounce_page->link,
+				      &bounce_pkt->bounce_page_head);
+
+			if (hv_isolation_type_snp()) {
+				range[r].pfn_array[p] =
+					bounce_page->bounce_extra_pfn;
+			} else {
+				range[r].pfn_array[p] = virt_to_hvpfn(
+					(void *)bounce_page->bounce_va);
+			}
+			offset = 0;
+			len -= copy_len;
+		}
+	}
+
+	/* Copy data from original buffer to bounce buffer, if needed */
+	hv_copy_to_bounce(bounce_pkt);
+	return bounce_pkt;
+err_free:
+	/* This will also reclaim any allocated bounce pages */
+	hv_bounce_resources_release(channel, bounce_pkt);
+	return NULL;
+}
+
+int vmbus_sendpacket_pagebuffer_bounce(
+	struct vmbus_channel *channel,
+	struct vmbus_channel_packet_page_buffer *desc,
+	u32 desc_size, struct kvec *bufferlist,
+	u8 io_type, struct hv_bounce_pkt **pbounce_pkt,
+	u64 requestid)
+{
+	struct hv_bounce_pkt *bounce_pkt;
+	int ret;
+
+	bounce_pkt = hv_bounce_resources_assign(channel, desc->rangecount,
+			(struct hv_page_range *)desc->range, io_type);
+	if (unlikely(!bounce_pkt))
+		return -EAGAIN;
+	ret = hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+	if (unlikely(ret < 0))
+		hv_bounce_resources_release(channel, bounce_pkt);
+	else
+		*pbounce_pkt = bounce_pkt;
+
+	return ret;
+}
+
+int vmbus_sendpacket_mpb_desc_bounce(
+	struct vmbus_channel *channel,  struct vmbus_packet_mpb_array *desc,
+	u32 desc_size, struct kvec *bufferlist, u8 io_type,
+	struct hv_bounce_pkt **pbounce_pkt, u64 requestid)
+{
+	struct hv_bounce_pkt *bounce_pkt;
+	struct vmbus_packet_mpb_array *desc_bounce;
+	struct hv_mpb_array *range;
+	int ret = -ENOSPC;
+
+	desc_bounce = kzalloc(desc_size, GFP_ATOMIC);
+	if (unlikely(!desc_bounce))
+		return ret;
+
+	memcpy(desc_bounce, desc, desc_size);
+	range = &desc_bounce->range;
+	bounce_pkt = hv_bounce_resources_assign(channel, desc->rangecount,
+			(struct hv_page_range *)range, io_type);
+	if (unlikely(!bounce_pkt))
+		goto free;
+	bufferlist[0].iov_base = desc_bounce;
+	ret = hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+free:
+	kfree(desc_bounce);
+	if (unlikely(ret < 0))
+		hv_bounce_resources_release(channel, bounce_pkt);
+	else
+		*pbounce_pkt = bounce_pkt;
+	return ret;
+}
+
+void hv_pkt_bounce(struct vmbus_channel *channel,
+		   struct hv_bounce_pkt *bounce_pkt)
+{
+	if (!bounce_pkt)
+		return;
+
+	hv_copy_from_bounce(bounce_pkt);
+	hv_bounce_resources_release(channel, bounce_pkt);
+}
+EXPORT_SYMBOL_GPL(hv_pkt_bounce);
 
 int hv_init_channel_ivm(struct vmbus_channel *channel)
 {
 	if (!hv_is_isolation_supported())
 		return 0;
 
+	INIT_DELAYED_WORK(&channel->bounce_page_list_maintain,
+			  hv_bounce_page_list_maintain);
+
 	INIT_LIST_HEAD(&channel->bounce_page_free_head);
 	INIT_LIST_HEAD(&channel->bounce_pkt_free_list_head);
 
@@ -33,8 +610,8 @@ void hv_free_channel_ivm(struct vmbus_channel *channel)
 	if (!hv_is_isolation_supported())
 		return;
 
-
 	cancel_delayed_work_sync(&channel->bounce_page_list_maintain);
+
 	hv_bounce_pkt_list_free(channel, &channel->bounce_pkt_free_list_head);
 	hv_bounce_page_list_free(channel, &channel->bounce_page_free_head);
 	kmem_cache_destroy(channel->bounce_pkt_cache);
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 7677f083d33a..d985271e5522 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -82,6 +82,19 @@ extern int hv_init_channel_ivm(struct vmbus_channel *channel);
 
 extern void hv_free_channel_ivm(struct vmbus_channel *channel);
 
+extern int vmbus_sendpacket_pagebuffer_bounce(struct vmbus_channel *channel,
+	struct vmbus_channel_packet_page_buffer *desc, u32 desc_size,
+	struct kvec *bufferlist, u8 io_type, struct hv_bounce_pkt **bounce_pkt,
+	u64 requestid);
+
+extern int vmbus_sendpacket_mpb_desc_bounce(struct vmbus_channel *channel,
+	struct vmbus_packet_mpb_array *desc, u32 desc_size,
+	struct kvec *bufferlist, u8 io_type, struct hv_bounce_pkt **bounce_pkt,
+	u64 requestid);
+
+extern void hv_pkt_bounce(struct vmbus_channel *channel,
+			  struct hv_bounce_pkt *bounce_pkt);
+
 /* struct hv_monitor_page Layout */
 /* ------------------------------------------------------ */
 /* | 0   | TriggerState (4 bytes) | Rsvd1 (4 bytes)     | */
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index d1a936091665..be7621b070f2 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1109,6 +1109,8 @@ void vmbus_set_sc_create_callback(struct vmbus_channel *primary_channel,
 void vmbus_set_chn_rescind_callback(struct vmbus_channel *channel,
 		void (*chn_rescind_cb)(struct vmbus_channel *));
 
+extern int hv_bounce_resources_reserve(struct vmbus_channel *channel,
+				       u32 min_bounce_bytes);
 /*
  * Check if sub-channels have already been offerred. This API will be useful
  * when the driver is unloaded after establishing sub-channels. In this case,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (9 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 10/12] HV: Add bounce buffer support for Isolation VM Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-02-28 15:03 ` [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc Tianyu Lan
  11 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, davem, kuba
  Cc: Tianyu Lan, linux-hyperv, netdev, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Add Isolation VM support for the netvsc driver. Map the send/receive
ring buffers into the extra address space in an SNP Isolation VM,
reserve bounce buffers for packets sent via
vmbus_sendpacket_pagebuffer(), and release the bounce buffer via
hv_pkt_bounce() when the send-complete response is received from the
host.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |  3 +
 drivers/net/hyperv/netvsc.c     | 97 ++++++++++++++++++++++++++++++---
 2 files changed, 92 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 11266b92bcf0..45d5838ff128 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1027,14 +1027,17 @@ struct netvsc_device {
 
 	/* Receive buffer allocated by us but manages by NetVSP */
 	void *recv_buf;
+	void *recv_original_buf;
 	u32 recv_buf_size; /* allocated bytes */
 	u32 recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
 	u32 recv_section_size;
 	u32 recv_completion_cnt;
 
+
 	/* Send buffer allocated by us */
 	void *send_buf;
+	void *send_original_buf;
 	u32 send_buf_size;
 	u32 send_buf_gpadl_handle;
 	u32 send_section_cnt;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 77657c5acc65..171af85e055d 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -26,7 +26,7 @@
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
-
+#include "../../hv/hyperv_vmbus.h"
 /*
  * Switch the data path from the synthetic interface to the VF
  * interface.
@@ -119,8 +119,21 @@ static void free_netvsc_device(struct rcu_head *head)
 	int i;
 
 	kfree(nvdev->extension);
-	vfree(nvdev->recv_buf);
-	vfree(nvdev->send_buf);
+
+	if (nvdev->recv_original_buf) {
+		iounmap(nvdev->recv_buf);
+		vfree(nvdev->recv_original_buf);
+	} else {
+		vfree(nvdev->recv_buf);
+	}
+
+	if (nvdev->send_original_buf) {
+		iounmap(nvdev->send_buf);
+		vfree(nvdev->send_original_buf);
+	} else {
+		vfree(nvdev->send_buf);
+	}
+
 	kfree(nvdev->send_section_map);
 
 	for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -241,13 +254,18 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
 				       struct netvsc_device *net_device,
 				       struct net_device *ndev)
 {
+	void *recv_buf;
 	int ret;
 
 	if (net_device->recv_buf_gpadl_handle) {
+		if (net_device->recv_original_buf)
+			recv_buf = net_device->recv_original_buf;
+		else
+			recv_buf = net_device->recv_buf;
+
 		ret = vmbus_teardown_gpadl(device->channel,
 					   net_device->recv_buf_gpadl_handle,
-					   net_device->recv_buf,
-					   net_device->recv_buf_size);
+					   recv_buf, net_device->recv_buf_size);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -265,13 +283,18 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
 				       struct netvsc_device *net_device,
 				       struct net_device *ndev)
 {
+	void *send_buf;
 	int ret;
 
 	if (net_device->send_buf_gpadl_handle) {
+		if (net_device->send_original_buf)
+			send_buf = net_device->send_original_buf;
+		else
+			send_buf = net_device->send_buf;
+
 		ret = vmbus_teardown_gpadl(device->channel,
 					   net_device->send_buf_gpadl_handle,
-					   net_device->send_buf,
-					   net_device->send_buf_size);
+					   send_buf, net_device->send_buf_size);
 
 		/* If we failed here, we might as well return and have a leak
 		 * rather than continue and a bugchk
@@ -306,9 +329,19 @@ static int netvsc_init_buf(struct hv_device *device,
 	struct nvsp_1_message_send_receive_buffer_complete *resp;
 	struct net_device *ndev = hv_get_drvdata(device);
 	struct nvsp_message *init_packet;
+	struct vm_struct *area;
+	u64 extra_phys;
 	unsigned int buf_size;
+	unsigned long vaddr;
 	size_t map_words;
-	int ret = 0;
+	int ret = 0, i;
+
+	ret = hv_bounce_resources_reserve(device->channel,
+			PAGE_SIZE * 1024);
+	if (ret) {
+		pr_warn("Failed to reserve bounce buffer.\n");
+		return -ENOMEM;
+	}
 
 	/* Get receive buffer area. */
 	buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -345,6 +378,28 @@ static int netvsc_init_buf(struct hv_device *device,
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		area = get_vm_area(buf_size, VM_IOREMAP);
+		if (!area)
+			goto cleanup;
+
+		vaddr = (unsigned long)area->addr;
+		for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+			extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
+				<< HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
+			ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+					   vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+					   extra_phys, PAGE_KERNEL_IO);
+		}
+
+		if (ret)
+			goto cleanup;
+
+		net_device->recv_original_buf = net_device->recv_buf;
+		net_device->recv_buf = (void *)vaddr;
+	}
+
+
 	/* Notify the NetVsp of the gpadl handle */
 	init_packet = &net_device->channel_init_pkt;
 	memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -435,12 +490,36 @@ static int netvsc_init_buf(struct hv_device *device,
 				    buf_size,
 				    &net_device->send_buf_gpadl_handle,
 				    VMBUS_PAGE_VISIBLE_READ_WRITE);
+
 	if (ret != 0) {
 		netdev_err(ndev,
 			   "unable to establish send buffer's gpadl\n");
 		goto cleanup;
 	}
 
+	if (hv_isolation_type_snp()) {
+		area = get_vm_area(buf_size, VM_IOREMAP);
+		if (!area)
+			goto cleanup;
+
+		vaddr = (unsigned long)area->addr;
+
+		for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+			extra_phys = (virt_to_hvpfn(net_device->send_buf + i * HV_HYP_PAGE_SIZE)
+				<< HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
+			ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+					   vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+					   extra_phys, PAGE_KERNEL_IO);
+		}
+
+		if (ret)
+			goto cleanup;
+
+		net_device->send_original_buf = net_device->send_buf;
+		net_device->send_buf = (void *)vaddr;
+	}
+
+
 	/* Notify the NetVsp of the gpadl handle */
 	init_packet = &net_device->channel_init_pkt;
 	memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -747,6 +826,8 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
 		tx_stats->bytes += packet->total_bytes;
 		u64_stats_update_end(&tx_stats->syncp);
 
+		if (desc->type == VM_PKT_COMP && packet->bounce_pkt)
+			hv_pkt_bounce(channel, packet->bounce_pkt);
 		napi_consume_skb(skb, budget);
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc
  2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
                   ` (10 preceding siblings ...)
  2021-02-28 15:03 ` [RFC PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
@ 2021-02-28 15:03 ` Tianyu Lan
  2021-03-01  6:54   ` Christoph Hellwig
  11 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-02-28 15:03 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, jejb, martin.petersen
  Cc: Tianyu Lan, linux-hyperv, linux-scsi, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

The storvsc driver needs to reserve additional bounce
buffers to receive multi-page-buffer packets, and to copy
data from the bounce buffer when the response message
is received from the host.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index c5b4974eb41f..4ae8e2a427e4 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -33,6 +33,8 @@
 #include <scsi/scsi_transport.h>
 #include <asm/mshyperv.h>
 
+#include "../hv/hyperv_vmbus.h"
+
 /*
  * All wire protocol details (storage protocol between the guest and the host)
  * are consolidated here.
@@ -725,6 +727,10 @@ static void handle_sc_creation(struct vmbus_channel *new_sc)
 	/* Add the sub-channel to the array of available channels. */
 	stor_device->stor_chns[new_sc->target_cpu] = new_sc;
 	cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
+
+	if (hv_bounce_resources_reserve(device->channel,
+			stor_device->max_transfer_bytes))
+		pr_warn("Failed to reserve bounce buffer\n");
 }
 
 static void  handle_multichannel_storage(struct hv_device *device, int max_chns)
@@ -964,6 +970,18 @@ static int storvsc_channel_init(struct hv_device *device, bool is_fc)
 	stor_device->max_transfer_bytes =
 		vstor_packet->storage_channel_properties.max_transfer_bytes;
 
+	/*
+	 * Reserve enough bounce resources to be able to support paging
+	 * operations under low memory conditions, that cannot rely on
+	 * additional resources to be allocated.
+	 */
+	ret =  hv_bounce_resources_reserve(device->channel,
+			stor_device->max_transfer_bytes);
+	if (ret < 0) {
+		pr_warn("Failed to reserve bounce buffer\n");
+		goto done;
+	}
+
 	if (!is_fc)
 		goto done;
 
@@ -1263,6 +1281,11 @@ static void storvsc_on_channel_callback(void *context)
 
 		request = (struct storvsc_cmd_request *)(unsigned long)cmd_rqst;
 
+		if (desc->type == VM_PKT_COMP && request->bounce_pkt) {
+			hv_pkt_bounce(channel, request->bounce_pkt);
+			request->bounce_pkt = NULL;
+		}
+
 		if (request == &stor_device->init_request ||
 		    request == &stor_device->reset_request) {
 			memcpy(&request->vstor_packet, packet,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc
  2021-02-28 15:03 ` [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc Tianyu Lan
@ 2021-03-01  6:54   ` Christoph Hellwig
  2021-03-01 13:43     ` Tianyu Lan
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2021-03-01  6:54 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, sthemmin, wei.liu, jejb, martin.petersen,
	Tianyu Lan, linux-hyperv, linux-scsi, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

This should be handled by the DMA mapping layer, just like for native
SEV support.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc
  2021-03-01  6:54   ` Christoph Hellwig
@ 2021-03-01 13:43     ` Tianyu Lan
  2021-03-01 19:45       ` [EXTERNAL] " Sunil Muthuswamy
  0 siblings, 1 reply; 26+ messages in thread
From: Tianyu Lan @ 2021-03-01 13:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kys, haiyangz, sthemmin, wei.liu, jejb, martin.petersen,
	Tianyu Lan, linux-hyperv, linux-scsi, linux-kernel, vkuznets,
	thomas.lendacky, brijesh.singh, sunilmut

Hi Christoph:
      Thanks a lot for your review. There are some reasons.
      1) Vmbus drivers don't use the DMA API now.
      2) The Hyper-V Vmbus channel ring buffer already plays the bounce
buffer role for most vmbus drivers. Just two kinds of packets from
netvsc/storvsc are uncovered.
      3) In an AMD SEV-SNP based Hyper-V guest, the physical address
used to access shared memory should be the bounce buffer memory
physical address plus a shared memory boundary (e.g. bit 47) reported
by a Hyper-V CPUID leaf. It's called virtual top of memory (vTom) in
the AMD spec and works as a watermark. So the guest needs to
ioremap/memremap the associated physical address above the shared
memory boundary before accessing it. swiotlb_bounce() uses the low-end
physical address to access the bounce buffer and this doesn't work in
this scenario. If I have something wrong, please correct me.

Thanks.


On 3/1/2021 2:54 PM, Christoph Hellwig wrote:
> This should be handled by the DMA mapping layer, just like for native
> SEV support.
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXTERNAL] Re: [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc
  2021-03-01 13:43     ` Tianyu Lan
@ 2021-03-01 19:45       ` Sunil Muthuswamy
  2021-03-02 16:03         ` Tianyu Lan
  0 siblings, 1 reply; 26+ messages in thread
From: Sunil Muthuswamy @ 2021-03-01 19:45 UTC (permalink / raw)
  To: Tianyu Lan, Christoph Hellwig
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu, jejb,
	martin.petersen, Tianyu Lan, linux-hyperv, linux-scsi,
	linux-kernel, vkuznets, thomas.lendacky, brijesh.singh

> Hi Christoph:
>       Thanks a lot for your review. There are some reasons.
>       1) Vmbus drivers don't use DMA API now.
What is blocking us from making the Hyper-V drivers use the DMA APIs? They
will generally be a no-op when no bounce buffer support is needed.

>       2) Hyper-V Vmbus channel ring buffer already play bounce buffer
> role for most vmbus drivers. Just two kinds of packets from
> netvsc/storvsc are uncovered.
How does this make a difference here?

>       3) In AMD SEV-SNP based Hyper-V guest, the access physical address
> of shared memory should be bounce buffer memory physical address plus
> with a shared memory boundary(e.g, 48bit) reported Hyper-V CPUID. It's
> called virtual top of memory(vTom) in AMD spec and works as a watermark.
> So it needs to ioremap/memremap the associated physical address above
> the share memory boundary before accessing them. swiotlb_bounce() uses
> low end physical address to access bounce buffer and this doesn't work
> in this senario. If something wrong, please help me correct me.
> 
There are alternative implementations of swiotlb on top of the core swiotlb
APIs. One option is to have Hyper-V specific swiotlb wrapper DMA APIs with
the custom logic above.

> Thanks.
> 
> 
> On 3/1/2021 2:54 PM, Christoph Hellwig wrote:
> > This should be handled by the DMA mapping layer, just like for native
> > SEV support.
I agree with Christoph's comment that in principle, this should be handled
using the DMA APIs.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [EXTERNAL] Re: [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc
  2021-03-01 19:45       ` [EXTERNAL] " Sunil Muthuswamy
@ 2021-03-02 16:03         ` Tianyu Lan
  0 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-03-02 16:03 UTC (permalink / raw)
  To: Sunil Muthuswamy, Christoph Hellwig
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, wei.liu, jejb,
	martin.petersen, Tianyu Lan, linux-hyperv, linux-scsi,
	linux-kernel, vkuznets, thomas.lendacky, brijesh.singh

Hi Sunil:
      Thanks for your review.

On 3/2/2021 3:45 AM, Sunil Muthuswamy wrote:
>> Hi Christoph:
>>        Thanks a lot for your review. There are some reasons.
>>        1) Vmbus drivers don't use DMA API now.
> What is blocking us from making the Hyper-V drivers use the DMA API's? They
> will be a null-op generally, when there is no bounce buffer support needed.
> 
>>        2) Hyper-V Vmbus channel ring buffer already play bounce buffer
>> role for most vmbus drivers. Just two kinds of packets from
>> netvsc/storvsc are uncovered.
> How does this make a difference here?
> 
>>        3) In AMD SEV-SNP based Hyper-V guest, the access physical address
>> of shared memory should be bounce buffer memory physical address plus
>> with a shared memory boundary(e.g, 48bit) reported Hyper-V CPUID. It's
>> called virtual top of memory(vTom) in AMD spec and works as a watermark.
>> So it needs to ioremap/memremap the associated physical address above
>> the share memory boundary before accessing them. swiotlb_bounce() uses
>> low end physical address to access bounce buffer and this doesn't work
>> in this senario. If something wrong, please help me correct me.
>>
> There are alternative implementations of swiotlb on top of the core swiotlb
> API's. One option is to have Hyper-V specific swiotlb wrapper DMA API's with
> the custom logic above.

Agree. Hyper-V should have its own DMA ops and put the Hyper-V bounce
buffer code in the DMA API callbacks. For the vmbus channel ring buffer,
no additional bounce buffer is needed and there are two options:
1) not call the DMA API around it, or 2) pass a flag in the DMA API to
notify the Hyper-V DMA callback not to allocate a bounce buffer for it.

> 
>> Thanks.
>>
>>
>> On 3/1/2021 2:54 PM, Christoph Hellwig wrote:
>>> This should be handled by the DMA mapping layer, just like for native
>>> SEV support.
> I agree with Christoph's comment that in principle, this should be handled using
> the DMA API's
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()
  2021-02-28 15:03 ` [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl() Tianyu Lan
@ 2021-03-03 16:27   ` Vitaly Kuznetsov
  2021-03-05  6:06     ` Tianyu Lan
  0 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-03 16:27 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, thomas.lendacky,
	brijesh.singh, sunilmut, kys, haiyangz, sthemmin, wei.liu, tglx,
	mingo, bp, x86, hpa, davem, kuba, gregkh

Tianyu Lan <ltykernel@gmail.com> writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Add visibility parameter for vmbus_establish_gpadl() and prepare
> to change host visibility when create gpadl for buffer.
>

"No functional change" as you don't actually use the parameter.

> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>

Nit: Sunil's SoB looks misleading because the patch is from you,
Co-Developed-by should be sufficient.

> ---
>  arch/x86/include/asm/hyperv-tlfs.h |  9 +++++++++
>  drivers/hv/channel.c               | 20 +++++++++++---------
>  drivers/net/hyperv/netvsc.c        |  8 ++++++--
>  drivers/uio/uio_hv_generic.c       |  7 +++++--
>  include/linux/hyperv.h             |  3 ++-
>  5 files changed, 33 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index e6cd3fee562b..fb1893a4c32b 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -236,6 +236,15 @@ enum hv_isolation_type {
>  /* TSC invariant control */
>  #define HV_X64_MSR_TSC_INVARIANT_CONTROL	0x40000118
>  
> +/* Hyper-V GPA map flags */
> +#define HV_MAP_GPA_PERMISSIONS_NONE		0x0
> +#define HV_MAP_GPA_READABLE			0x1
> +#define HV_MAP_GPA_WRITABLE			0x2
> +
> +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
> +#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
> +#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
> +

Are these x86-only? If not, then we should probably move these defines
to include/asm-generic/hyperv-tlfs.h. In case they are, we should do
something as we're using them from arch neutral places.

Also, could you please add a comment stating that these flags define
host's visibility of a page and not guest's (this seems to be not
obvious at least to me).

>  /*
>   * Declare the MSR used to setup pages used to communicate with the hypervisor.
>   */
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 0bd202de7960..daa21cc72beb 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -242,7 +242,7 @@ EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
>   */
>  static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
>  			       u32 size, u32 send_offset,
> -			       struct vmbus_channel_msginfo **msginfo)
> +			       struct vmbus_channel_msginfo **msginfo, u32 visibility)
>  {
>  	int i;
>  	int pagecount;
> @@ -391,7 +391,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
>  static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  				   enum hv_gpadl_type type, void *kbuffer,
>  				   u32 size, u32 send_offset,
> -				   u32 *gpadl_handle)
> +				   u32 *gpadl_handle, u32 visibility)
>  {
>  	struct vmbus_channel_gpadl_header *gpadlmsg;
>  	struct vmbus_channel_gpadl_body *gpadl_body;
> @@ -405,7 +405,8 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	next_gpadl_handle =
>  		(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
>  
> -	ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);
> +	ret = create_gpadl_header(type, kbuffer, size, send_offset,
> +				  &msginfo, visibility);
>  	if (ret)
>  		return ret;
>  
> @@ -496,10 +497,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   * @gpadl_handle: some funky thing
>   */
>  int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
> -			  u32 size, u32 *gpadl_handle)
> +			  u32 size, u32 *gpadl_handle, u32 visibility)
>  {
>  	return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
> -				       0U, gpadl_handle);
> +				       0U, gpadl_handle, visibility);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
>  
> @@ -610,10 +611,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	newchannel->ringbuffer_gpadlhandle = 0;
>  
>  	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
> -				      page_address(newchannel->ringbuffer_page),
> -				      (send_pages + recv_pages) << PAGE_SHIFT,
> -				      newchannel->ringbuffer_send_offset << PAGE_SHIFT,
> -				      &newchannel->ringbuffer_gpadlhandle);
> +			page_address(newchannel->ringbuffer_page),
> +			(send_pages + recv_pages) << PAGE_SHIFT,
> +			newchannel->ringbuffer_send_offset << PAGE_SHIFT,
> +			&newchannel->ringbuffer_gpadlhandle,
> +			VMBUS_PAGE_VISIBLE_READ_WRITE);

Nit: I liked the original alignment more and we can avoid the unneeded
code churn.

>  	if (err)
>  		goto error_clean_ring;
>  
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 2353623259f3..bb72c7578330 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -333,7 +333,8 @@ static int netvsc_init_buf(struct hv_device *device,
>  	 */
>  	ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
>  				    buf_size,
> -				    &net_device->recv_buf_gpadl_handle);
> +				    &net_device->recv_buf_gpadl_handle,
> +				    VMBUS_PAGE_VISIBLE_READ_WRITE);
>  	if (ret != 0) {
>  		netdev_err(ndev,
>  			"unable to establish receive buffer's gpadl\n");
> @@ -422,10 +423,13 @@ static int netvsc_init_buf(struct hv_device *device,
>  	/* Establish the gpadl handle for this buffer on this
>  	 * channel.  Note: This call uses the vmbus connection rather
>  	 * than the channel to establish the gpadl handle.
> +	 * Send buffer should theoretically be only marked as "read-only", but
> +	 * the netvsp for some reason needs write capabilities on it.
>  	 */
>  	ret = vmbus_establish_gpadl(device->channel, net_device->send_buf,
>  				    buf_size,
> -				    &net_device->send_buf_gpadl_handle);
> +				    &net_device->send_buf_gpadl_handle,
> +				    VMBUS_PAGE_VISIBLE_READ_WRITE);
>  	if (ret != 0) {
>  		netdev_err(ndev,
>  			   "unable to establish send buffer's gpadl\n");
> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
> index 0330ba99730e..813a7bee5139 100644
> --- a/drivers/uio/uio_hv_generic.c
> +++ b/drivers/uio/uio_hv_generic.c
> @@ -29,6 +29,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/vmalloc.h>
>  #include <linux/slab.h>
> +#include <asm/mshyperv.h>
>  
>  #include "../hv/hyperv_vmbus.h"
>  
> @@ -295,7 +296,8 @@ hv_uio_probe(struct hv_device *dev,
>  	}
>  
>  	ret = vmbus_establish_gpadl(channel, pdata->recv_buf,
> -				    RECV_BUFFER_SIZE, &pdata->recv_gpadl);
> +				    RECV_BUFFER_SIZE, &pdata->recv_gpadl,
> +				    VMBUS_PAGE_VISIBLE_READ_WRITE);
>  	if (ret)
>  		goto fail_close;
>  
> @@ -315,7 +317,8 @@ hv_uio_probe(struct hv_device *dev,
>  	}
>  
>  	ret = vmbus_establish_gpadl(channel, pdata->send_buf,
> -				    SEND_BUFFER_SIZE, &pdata->send_gpadl);
> +				    SEND_BUFFER_SIZE, &pdata->send_gpadl,
> +				    VMBUS_PAGE_VISIBLE_READ_ONLY);

Actually, this is the only place where you use the 'READ_ONLY' mapping --
which makes me wonder whether it's actually worth it, or whether we can
hard-code VMBUS_PAGE_VISIBLE_READ_WRITE for now and avoid this
additional parameter.

>  	if (ret)
>  		goto fail_close;
>  
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index f1d74dcf0353..016fdca20d6e 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1179,7 +1179,8 @@ extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
>  extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
>  				      void *kbuffer,
>  				      u32 size,
> -				      u32 *gpadl_handle);
> +				      u32 *gpadl_handle,
> +				      u32 visibility);
>  
>  extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
>  				     u32 gpadl_handle);

-- 
Vitaly


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support
  2021-02-28 15:03 ` [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support Tianyu Lan
@ 2021-03-03 16:58   ` Vitaly Kuznetsov
  2021-03-05  6:23     ` Tianyu Lan
  0 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-03 16:58 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, linux-arch,
	thomas.lendacky, brijesh.singh, sunilmut, kys, haiyangz,
	sthemmin, wei.liu, tglx, mingo, bp, x86, hpa, davem, kuba,
	gregkh, arnd

Tianyu Lan <ltykernel@gmail.com> writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Add new hvcall guest address host visibility support. Mark vmbus
> ring buffer visible to host when create gpadl buffer and mark back
> to not visible when tear down gpadl buffer.
>
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 13 ++++++++
>  arch/x86/include/asm/mshyperv.h    |  4 +--
>  arch/x86/kernel/cpu/mshyperv.c     | 46 ++++++++++++++++++++++++++
>  drivers/hv/channel.c               | 53 ++++++++++++++++++++++++++++--
>  drivers/net/hyperv/hyperv_net.h    |  1 +
>  drivers/net/hyperv/netvsc.c        |  9 +++--
>  drivers/uio/uio_hv_generic.c       |  6 ++--
>  include/asm-generic/hyperv-tlfs.h  |  1 +
>  include/linux/hyperv.h             |  3 +-
>  9 files changed, 126 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index fb1893a4c32b..d22b1c3f425a 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -573,4 +573,17 @@ enum hv_interrupt_type {
>  
>  #include <asm-generic/hyperv-tlfs.h>
>  
> +/* All input parameters should be in single page. */
> +#define HV_MAX_MODIFY_GPA_REP_COUNT		\
> +	((PAGE_SIZE - 2 * sizeof(u64)) / (sizeof(u64)))

Would it be easier to express this as '(PAGE_SIZE / sizeof(u64)) - 2'?

> +
> +/* HvCallModifySparseGpaPageHostVisibility hypercall */
> +struct hv_input_modify_sparse_gpa_page_host_visibility {
> +	u64 partition_id;
> +	u32 host_visibility:2;
> +	u32 reserved0:30;
> +	u32 reserved1;
> +	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
> +} __packed;
> +
>  #endif
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccf60a809a17..1e8275d35c1f 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -262,13 +262,13 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>  	msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
>  	msi_entry->data.as_uint32 = msi_desc->msg.data;
>  }
> -

stray change

>  struct irq_domain *hv_create_pci_msi_domain(void);
>  
>  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>  		struct hv_interrupt_entry *entry);
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> -
> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index e88bc296afca..347c32eac8fd 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -37,6 +37,8 @@
>  bool hv_root_partition;
>  EXPORT_SYMBOL_GPL(hv_root_partition);
>  
> +#define HV_PARTITION_ID_SELF ((u64)-1)
> +

We seem to have this already:

include/asm-generic/hyperv-tlfs.h:#define HV_PARTITION_ID_SELF          ((u64)-1)

>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
>  
> @@ -477,3 +479,47 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
>  	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
>  	.init.init_platform	= ms_hyperv_init_platform,
>  };
> +
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
> +{
> +	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
> +	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
> +	u16 pages_processed;
> +	u64 hv_status;
> +	unsigned long flags;
> +
> +	/* no-op if partition isolation is not enabled */
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> +			HV_MAX_MODIFY_GPA_REP_COUNT);
> +		return -EINVAL;
> +	}
> +
> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
> +			this_cpu_ptr(hyperv_pcpu_input_arg);
> +	input = *input_pcpu;
> +	if (unlikely(!input)) {
> +		local_irq_restore(flags);
> +		return -1;

-EFAULT/-ENOMEM/... maybe ?

> +	}
> +
> +	input->partition_id = HV_PARTITION_ID_SELF;
> +	input->host_visibility = visibility;
> +	input->reserved0 = 0;
> +	input->reserved1 = 0;
> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> +	hv_status = hv_do_rep_hypercall(
> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> +			0, input, &pages_processed);
> +	local_irq_restore(flags);
> +
> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> +		return 0;
> +
> +	return -EFAULT;

Could we just propagate "hv_status & HV_HYPERCALL_RESULT_MASK" maybe?
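
Something like the sketch below, perhaps. Note that hv_status_to_errno()
is hypothetical -- I don't think we have such a helper today -- and the
constants carry the values from hyperv-tlfs.h only to keep the sketch
self-contained:

```c
#include <errno.h>

/* Values as defined in include/asm-generic/hyperv-tlfs.h. */
#define HV_HYPERCALL_RESULT_MASK	0xffffULL
#define HV_STATUS_SUCCESS		0
#define HV_STATUS_INVALID_PARAMETER	5
#define HV_STATUS_INSUFFICIENT_MEMORY	11

/* Hypothetical helper: map a hypercall status to a meaningful errno
 * instead of collapsing every failure into -EFAULT. */
static int hv_status_to_errno(unsigned long long hv_status)
{
	switch (hv_status & HV_HYPERCALL_RESULT_MASK) {
	case HV_STATUS_SUCCESS:
		return 0;
	case HV_STATUS_INVALID_PARAMETER:
		return -EINVAL;
	case HV_STATUS_INSUFFICIENT_MEMORY:
		return -ENOMEM;
	default:
		return -EFAULT;
	}
}
```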

> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index daa21cc72beb..204e6f3598a5 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -237,6 +237,38 @@ int vmbus_send_modifychannel(u32 child_relid, u32 target_vp)
>  }
>  EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
>  
> +/*
> + * hv_set_mem_host_visibility - Set host visibility for specified memory.
> + */
> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
> +{
> +	int i, pfn;
> +	int pagecount = size >> HV_HYP_PAGE_SHIFT;
> +	u64 *pfn_array;
> +	int ret = 0;
> +
> +	if (!hv_isolation_type_snp())
> +		return 0;
> +
> +	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
> +	if (!pfn_array)
> +		return -ENOMEM;
> +
> +	for (i = 0, pfn = 0; i < pagecount; i++) {
> +		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
> +		pfn++;
> +
> +		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
> +			ret |= hv_mark_gpa_visibility(pfn, pfn_array, visibility);
> +			pfn = 0;

hv_mark_gpa_visibility() returns different error codes, and aggregating
them with

 ret |= ...

will give an unpredictable result. I'd suggest bailing out immediately
instead:

 if (ret)
     goto err_free_pfn_array;

> +		}
> +	}
> +

err_free_pfn_array:

> +	vfree(pfn_array);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(hv_set_mem_host_visibility);
> +
>  /*
>   * create_gpadl_header - Creates a gpadl for the specified buffer
>   */
> @@ -410,6 +442,12 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  	if (ret)
>  		return ret;
>  
> +	ret = hv_set_mem_host_visibility(kbuffer, size, visibility);
> +	if (ret) {
> +		pr_warn("Failed to set host visibility.\n");
> +		return ret;
> +	}
> +
>  	init_completion(&msginfo->waitevent);
>  	msginfo->waiting_channel = channel;
>  
> @@ -693,7 +731,9 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  error_free_info:
>  	kfree(open_info);
>  error_free_gpadl:
> -	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
> +	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle,
> +			     page_address(newchannel->ringbuffer_page),
> +			     newchannel->ringbuffer_pagecount << PAGE_SHIFT);

Instead of modifying vmbus_teardown_gpadl() interface and all its call
sites, could we just keep track of all established gpadls and then get 
the required data from there? I.e. make vmbus_establish_gpadl() save
kbuffer/size to some internal structure associated with 'gpadl_handle'.

>  	newchannel->ringbuffer_gpadlhandle = 0;
>  error_clean_ring:
>  	hv_ringbuffer_cleanup(&newchannel->outbound);
> @@ -740,7 +780,8 @@ EXPORT_SYMBOL_GPL(vmbus_open);
>  /*
>   * vmbus_teardown_gpadl -Teardown the specified GPADL handle
>   */
> -int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
> +int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle,
> +			 void *kbuffer, u32 size)

This probably doesn't matter but why not 'u64 size'?

>  {
>  	struct vmbus_channel_gpadl_teardown *msg;
>  	struct vmbus_channel_msginfo *info;
> @@ -793,6 +834,10 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>  	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
>  
>  	kfree(info);
> +
> +	if (hv_set_mem_host_visibility(kbuffer, size, VMBUS_PAGE_NOT_VISIBLE))
> +		pr_warn("Fail to set mem host visibility.\n");

pr_err() maybe?

> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
> @@ -869,7 +914,9 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
>  	/* Tear down the gpadl for the channel's ring buffer */
>  	else if (channel->ringbuffer_gpadlhandle) {
>  		ret = vmbus_teardown_gpadl(channel,
> -					   channel->ringbuffer_gpadlhandle);
> +					   channel->ringbuffer_gpadlhandle,
> +					   page_address(channel->ringbuffer_page),
> +					   channel->ringbuffer_pagecount << PAGE_SHIFT);
>  		if (ret) {
>  			pr_err("Close failed: teardown gpadl return %d\n", ret);
>  			/*
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 2a87cfa27ac0..b3a43c4ec8ab 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -1034,6 +1034,7 @@ struct netvsc_device {
>  
>  	/* Send buffer allocated by us */
>  	void *send_buf;
> +	u32 send_buf_size;
>  	u32 send_buf_gpadl_handle;
>  	u32 send_section_cnt;
>  	u32 send_section_size;
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index bb72c7578330..08d73401bb28 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -245,7 +245,9 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
>  
>  	if (net_device->recv_buf_gpadl_handle) {
>  		ret = vmbus_teardown_gpadl(device->channel,
> -					   net_device->recv_buf_gpadl_handle);
> +					   net_device->recv_buf_gpadl_handle,
> +					   net_device->recv_buf,
> +					   net_device->recv_buf_size);
>  
>  		/* If we failed here, we might as well return and have a leak
>  		 * rather than continue and a bugchk
> @@ -267,7 +269,9 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
>  
>  	if (net_device->send_buf_gpadl_handle) {
>  		ret = vmbus_teardown_gpadl(device->channel,
> -					   net_device->send_buf_gpadl_handle);
> +					   net_device->send_buf_gpadl_handle,
> +					   net_device->send_buf,
> +					   net_device->send_buf_size);
>  
>  		/* If we failed here, we might as well return and have a leak
>  		 * rather than continue and a bugchk
> @@ -419,6 +423,7 @@ static int netvsc_init_buf(struct hv_device *device,
>  		ret = -ENOMEM;
>  		goto cleanup;
>  	}
> +	net_device->send_buf_size = buf_size;
>  
>  	/* Establish the gpadl handle for this buffer on this
>  	 * channel.  Note: This call uses the vmbus connection rather
> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
> index 813a7bee5139..c8d4704fc90c 100644
> --- a/drivers/uio/uio_hv_generic.c
> +++ b/drivers/uio/uio_hv_generic.c
> @@ -181,13 +181,15 @@ static void
>  hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
>  {
>  	if (pdata->send_gpadl) {
> -		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl);
> +		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl,
> +				     pdata->send_buf, SEND_BUFFER_SIZE);
>  		pdata->send_gpadl = 0;
>  		vfree(pdata->send_buf);
>  	}
>  
>  	if (pdata->recv_gpadl) {
> -		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl);
> +		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl,
> +				     pdata->recv_buf, RECV_BUFFER_SIZE);
>  		pdata->recv_gpadl = 0;
>  		vfree(pdata->recv_buf);
>  	}
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 83448e837ded..ad19f4199f90 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_RETARGET_INTERRUPT		0x007e
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> +#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
>  
>  #define HV_FLUSH_ALL_PROCESSORS			BIT(0)
>  #define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	BIT(1)
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 016fdca20d6e..41cbaa2db567 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1183,7 +1183,8 @@ extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
>  				      u32 visibility);
>  
>  extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
> -				     u32 gpadl_handle);
> +				u32 gpadl_handle,
> +				void *kbuffer, u32 size);
>  
>  void vmbus_reset_channel_cb(struct vmbus_channel *channel);

-- 
Vitaly



* Re: [RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary
  2021-02-28 15:03 ` [RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary Tianyu Lan
@ 2021-03-03 17:05   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 26+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-03 17:05 UTC (permalink / raw)
  To: Tianyu Lan, kys, haiyangz, sthemmin, wei.liu, tglx, mingo, bp,
	x86, hpa, arnd
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch,
	thomas.lendacky, brijesh.singh, sunilmut

Tianyu Lan <ltykernel@gmail.com> writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V exposes GHCB page via SEV ES GHCB MSR for SNP guest
> to communicate with hypervisor. Map GHCB page for all
> cpus to read/write MSR register and submit hvcall request
> via GHCB. Hyper-V also exposes shared memory boundary via
> cpuid HYPERV_CPUID_ISOLATION_CONFIG and store it in the
> shared_gpa_boundary of ms_hyperv struct. This prepares
> to share memory with host for SNP guest.
>

It seems the patch could be split into two: per-cpu GHCB setup and
shared memory boundary setup, as these changes seem to be independent.

> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c      | 52 +++++++++++++++++++++++++++++++---
>  arch/x86/kernel/cpu/mshyperv.c |  2 ++
>  include/asm-generic/mshyperv.h | 14 ++++++++-
>  3 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0db5137d5b81..90e65fbf4c58 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -82,6 +82,9 @@ static int hv_cpu_init(unsigned int cpu)
>  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>  	void **input_arg;
>  	struct page *pg;
> +	u64 ghcb_gpa;
> +	void *ghcb_va;
> +	void **ghcb_base;
>  
>  	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
>  	pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
> @@ -128,6 +131,17 @@ static int hv_cpu_init(unsigned int cpu)
>  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
>  	}
>  
> +	if (ms_hyperv.ghcb_base) {
> +		rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> +
> +		ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);

Do we actually need to do ioremap_cache() and not ioremap_uc() for GHCB?
(just wondering)

> +		if (!ghcb_va)
> +			return -ENOMEM;
> +
> +		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +		*ghcb_base = ghcb_va;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -223,6 +237,7 @@ static int hv_cpu_die(unsigned int cpu)
>  	unsigned long flags;
>  	void **input_arg;
>  	void *pg;
> +	void **ghcb_va = NULL;
>  
>  	local_irq_save(flags);
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> @@ -236,6 +251,13 @@ static int hv_cpu_die(unsigned int cpu)
>  		*output_arg = NULL;
>  	}
>  
> +	if (ms_hyperv.ghcb_base) {
> +		ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +		if (*ghcb_va)
> +			iounmap(*ghcb_va);
> +		*ghcb_va = NULL;
> +	}
> +
>  	local_irq_restore(flags);
>  
>  	free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
> @@ -372,6 +394,9 @@ void __init hyperv_init(void)
>  	u64 guest_id, required_msrs;
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
>  	int cpuhp, i;
> +	u64 ghcb_gpa;
> +	void *ghcb_va;
> +	void **ghcb_base;
>  
>  	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>  		return;
> @@ -432,9 +457,24 @@ void __init hyperv_init(void)
>  			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>  			__builtin_return_address(0));
> -	if (hv_hypercall_pg == NULL) {
> -		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> -		goto remove_cpuhp_state;
> +	if (hv_hypercall_pg == NULL)
> +		goto clean_guest_os_id;
> +
> +	if (hv_isolation_type_snp()) {
> +		ms_hyperv.ghcb_base = alloc_percpu(void *);
> +		if (!ms_hyperv.ghcb_base)
> +			goto clean_guest_os_id;
> +
> +		rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> +		ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
> +		if (!ghcb_va) {
> +			free_percpu(ms_hyperv.ghcb_base);
> +			ms_hyperv.ghcb_base = NULL;
> +			goto clean_guest_os_id;
> +		}
> +
> +		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +		*ghcb_base = ghcb_va;
>  	}
>  
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -499,7 +539,8 @@ void __init hyperv_init(void)
>  
>  	return;
>  
> -remove_cpuhp_state:
> +clean_guest_os_id:
> +	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  	cpuhp_remove_state(cpuhp);
>  free_vp_assist_page:
>  	kfree(hv_vp_assist_page);
> @@ -528,6 +569,9 @@ void hyperv_cleanup(void)
>  	 */
>  	hv_hypercall_pg = NULL;
>  
> +	if (ms_hyperv.ghcb_base)
> +		free_percpu(ms_hyperv.ghcb_base);
> +
>  	/* Reset the hypercall page */
>  	hypercall_msr.as_uint64 = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 347c32eac8fd..d6c363456cbf 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -330,6 +330,8 @@ static void __init ms_hyperv_init_platform(void)
>  	if (ms_hyperv.features_b & HV_ISOLATION) {
>  		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>  		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
> +		ms_hyperv.shared_gpa_boundary =
> +			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;
>  
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index dff58a3db5d5..ad0e33776668 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -34,7 +34,19 @@ struct ms_hyperv_info {
>  	u32 max_vp_index;
>  	u32 max_lp_index;
>  	u32 isolation_config_a;
> -	u32 isolation_config_b;
> +	union
> +	{
> +		u32 isolation_config_b;;
> +		struct {
> +			u32 cvm_type : 4;
> +			u32 Reserved11 : 1;
> +			u32 shared_gpa_boundary_active : 1;
> +			u32 shared_gpa_boundary_bits : 6;
> +			u32 Reserved12 : 20;
> +		};
> +	};
> +	void  __percpu **ghcb_base;
> +	u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;

-- 
Vitaly



* Re: [RFC PATCH 4/12]  HV: Add Write/Read MSR registers via ghcb
  2021-02-28 15:03 ` [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb Tianyu Lan
@ 2021-03-03 17:16   ` Vitaly Kuznetsov
  2021-03-05  6:37     ` Tianyu Lan
  0 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-03 17:16 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch,
	thomas.lendacky, brijesh.singh, sunilmut, kys, haiyangz,
	sthemmin, wei.liu, tglx, mingo, bp, x86, hpa, arnd

Tianyu Lan <ltykernel@gmail.com> writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers and these registers are emulated by
> Hypervisor rather than paravisor.
>
> Hyper-V requests to write SINTx MSR registers twice(once via
> GHCB and once via wrmsr instruction including the proxy bit 21)
> Guest OS ID MSR also needs to be set via GHCB.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/Makefile        |   2 +-
>  arch/x86/hyperv/hv_init.c       |  18 +--
>  arch/x86/hyperv/ivm.c           | 178 ++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  21 +++-
>  arch/x86/kernel/cpu/mshyperv.c  |  46 --------
>  drivers/hv/channel.c            |   2 +-
>  drivers/hv/hv.c                 | 188 ++++++++++++++++++++++----------
>  include/asm-generic/mshyperv.h  |  10 +-
>  8 files changed, 343 insertions(+), 122 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
>  
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 90e65fbf4c58..87b1dd9c84d6 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -475,6 +475,9 @@ void __init hyperv_init(void)
>  
>  		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>  		*ghcb_base = ghcb_va;
> +
> +		/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> +		hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>  	}
>  
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -561,6 +564,7 @@ void hyperv_cleanup(void)
>  
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>  
>  	/*
>  	 * Reset hypercall page reference before reset the page,
> @@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
>  	return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> -	if (!(ms_hyperv.features_b & HV_ISOLATION))
> -		return HV_ISOLATION_TYPE_NONE;
> -	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> -	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> new file mode 100644
> index 000000000000..4332bf7aaf9b
> --- /dev/null
> +++ b/arch/x86/hyperv/ivm.c
> @@ -0,0 +1,178 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V Isolation VM interface with paravisor and hypervisor
> + *
> + * Author:
> + *  Tianyu Lan <Tianyu.Lan@microsoft.com>
> + */
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
> +#include <asm/io.h>
> +#include <asm/svm.h>
> +#include <asm/sev-es.h>
> +#include <asm/mshyperv.h>
> +
> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> +
> +	hv_ghcb->ghcb.protocol_version = 1;
> +	hv_ghcb->ghcb.ghcb_usage = 0;
> +
> +	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> +	ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
> +	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
> +	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
> +
> +	VMGEXIT();
> +
> +	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
> +		pr_warn("Fail to write msr via ghcb.\n.");
> +
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, PAGE_SIZE);
> +	hv_ghcb->ghcb.protocol_version = 1;
> +	hv_ghcb->ghcb.ghcb_usage = 0;
> +
> +	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> +	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
> +	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
> +
> +	VMGEXIT();
> +
> +	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
> +		pr_warn("Fail to write msr via ghcb.\n.");
> +	else
> +		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
> +			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
> +
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
> +{
> +	hv_ghcb_msr_read(msr, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
> +
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
> +{
> +	hv_ghcb_msr_write(msr, value);
> +
> +	/* Write proxy bit vua wrmsrl instruction. */
> +	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
> +		wrmsrl(msr, value | 1 << 20);
> +}
> +EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
> +
> +inline void hv_signal_eom_ghcb(void)
> +{
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
> +}
> +EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
> +
> +enum hv_isolation_type hv_get_isolation_type(void)
> +{
> +	if (!(ms_hyperv.features_b & HV_ISOLATION))
> +		return HV_ISOLATION_TYPE_NONE;
> +	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> +}
> +EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> +
> +bool hv_is_isolation_supported(void)
> +{
> +	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> +}
> +EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> +
> +bool hv_isolation_type_snp(void)
> +{
> +	return hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP;
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> +
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
> +{
> +	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
> +	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
> +	u16 pages_processed;
> +	u64 hv_status;
> +	unsigned long flags;
> +
> +	/* no-op if partition isolation is not enabled */
> +	if (!hv_is_isolation_supported())
> +		return 0;
> +
> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> +			HV_MAX_MODIFY_GPA_REP_COUNT);
> +		return -EINVAL;
> +	}
> +
> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
> +			this_cpu_ptr(hyperv_pcpu_input_arg);
> +	input = *input_pcpu;
> +	if (unlikely(!input)) {
> +		local_irq_restore(flags);
> +		return -1;
> +	}
> +
> +	input->partition_id = HV_PARTITION_ID_SELF;
> +	input->host_visibility = visibility;
> +	input->reserved0 = 0;
> +	input->reserved1 = 0;
> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> +	hv_status = hv_do_rep_hypercall(
> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> +			0, input, &pages_processed);
> +	local_irq_restore(flags);
> +
> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> +		return 0;
> +
> +	return -EFAULT;
> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);

This looks like unneeded code churn: first you implement this in
arch/x86/kernel/cpu/mshyperv.c, and several patches later you move it to
the dedicated arch/x86/hyperv/ivm.c. Let's just introduce the new
arch/x86/hyperv/ivm.c from the very beginning.

> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 1e8275d35c1f..f624d72b99d3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -269,6 +269,25 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
> +void hv_signal_eom_ghcb(void);
> +void hv_ghcb_msr_write(u64 msr, u64 value);
> +void hv_ghcb_msr_read(u64 msr, u64 *value);
> +
> +#define hv_get_synint_state_ghcb(int_num, val)			\
> +	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +#define hv_set_synint_state_ghcb(int_num, val) \
> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> +
> +#define hv_get_simp_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +#define hv_set_simp_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
> +
> +#define hv_get_siefp_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +#define hv_set_siefp_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
> +
> +#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
> +#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> @@ -287,9 +306,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
>  {
>  	return -1;
>  }
> +static inline void hv_signal_eom_ghcb(void) { };
>  #endif /* CONFIG_HYPERV */
>  
> -
>  #include <asm-generic/mshyperv.h>
>  
>  #endif
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index d6c363456cbf..aeafd4017c89 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -37,8 +37,6 @@
>  bool hv_root_partition;
>  EXPORT_SYMBOL_GPL(hv_root_partition);
>  
> -#define HV_PARTITION_ID_SELF ((u64)-1)
> -
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
>  
> @@ -481,47 +479,3 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
>  	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
>  	.init.init_platform	= ms_hyperv_init_platform,
>  };
> -
> -int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
> -{
> -	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
> -	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
> -	u16 pages_processed;
> -	u64 hv_status;
> -	unsigned long flags;
> -
> -	/* no-op if partition isolation is not enabled */
> -	if (!hv_is_isolation_supported())
> -		return 0;
> -
> -	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> -		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> -			HV_MAX_MODIFY_GPA_REP_COUNT);
> -		return -EINVAL;
> -	}
> -
> -	local_irq_save(flags);
> -	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
> -			this_cpu_ptr(hyperv_pcpu_input_arg);
> -	input = *input_pcpu;
> -	if (unlikely(!input)) {
> -		local_irq_restore(flags);
> -		return -1;
> -	}
> -
> -	input->partition_id = HV_PARTITION_ID_SELF;
> -	input->host_visibility = visibility;
> -	input->reserved0 = 0;
> -	input->reserved1 = 0;
> -	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> -	hv_status = hv_do_rep_hypercall(
> -			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> -			0, input, &pages_processed);
> -	local_irq_restore(flags);
> -
> -	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> -		return 0;
> -
> -	return -EFAULT;
> -}
> -EXPORT_SYMBOL(hv_mark_gpa_visibility);
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 204e6f3598a5..f31b669a1ddf 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -247,7 +247,7 @@ int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
>  	u64 *pfn_array;
>  	int ret = 0;
>  
> -	if (!hv_isolation_type_snp())
> +	if (!hv_is_isolation_supported())
>  		return 0;
>  
>  	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index f202ac7f4b3d..28e28ccc2081 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -99,17 +99,24 @@ int hv_synic_alloc(void)
>  		tasklet_init(&hv_cpu->msg_dpc,
>  			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
>  
> -		hv_cpu->synic_message_page =
> -			(void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_message_page == NULL) {
> -			pr_err("Unable to allocate SYNIC message page\n");
> -			goto err;
> -		}
> +		/*
> +		 * Synic message and event pages are allocated by the paravisor.
> +		 * Skip allocating these pages here.
> +		 */
> +		if (!hv_isolation_type_snp()) {
> +			hv_cpu->synic_message_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_message_page == NULL) {
> +				pr_err("Unable to allocate SYNIC message page\n");
> +				goto err;
> +			}
>  
> -		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
> -		if (hv_cpu->synic_event_page == NULL) {
> -			pr_err("Unable to allocate SYNIC event page\n");
> -			goto err;
> +			hv_cpu->synic_event_page =
> +				(void *)get_zeroed_page(GFP_ATOMIC);
> +			if (hv_cpu->synic_event_page == NULL) {
> +				pr_err("Unable to allocate SYNIC event page\n");
> +				goto err;
> +			}
>  		}
>  
>  		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> @@ -136,10 +143,17 @@ void hv_synic_free(void)
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> +		free_page((unsigned long)hv_cpu->post_msg_page);
> +
> +		/*
> +		 * Synic message and event pages are allocated by the paravisor.
> +		 * Skip freeing these pages here.
> +		 */
> +		if (hv_isolation_type_snp())
> +			continue;
>  
>  		free_page((unsigned long)hv_cpu->synic_event_page);
>  		free_page((unsigned long)hv_cpu->synic_message_page);
> -		free_page((unsigned long)hv_cpu->post_msg_page);
>  	}
>  
>  	kfree(hv_context.hv_numa_map);
> @@ -161,35 +175,72 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	union hv_synic_sint shared_sint;
>  	union hv_synic_scontrol sctrl;
>  
> -	/* Setup the Synic's message page */
> -	hv_get_simp(simp.as_uint64);
> -	simp.simp_enabled = 1;
> -	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> -		>> HV_HYP_PAGE_SHIFT;
> -
> -	hv_set_simp(simp.as_uint64);
> -
> -	/* Setup the Synic's event page */
> -	hv_get_siefp(siefp.as_uint64);
> -	siefp.siefp_enabled = 1;
> -	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> -		>> HV_HYP_PAGE_SHIFT;
> -
> -	hv_set_siefp(siefp.as_uint64);
> -
> -	/* Setup the shared SINT. */
> -	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> -
> -	shared_sint.vector = hv_get_vector();
> -	shared_sint.masked = false;
> -	shared_sint.auto_eoi = hv_recommend_using_aeoi();
> -	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> -
> -	/* Enable the global synic bit */
> -	hv_get_synic_state(sctrl.as_uint64);
> -	sctrl.enable = 1;
> -
> -	hv_set_synic_state(sctrl.as_uint64);
> +	/*
> +	 * Setup Synic pages for the CVM. Synic message and event pages
> +	 * are allocated by the paravisor in the SNP CVM.
> +	 */
> +	if (hv_isolation_type_snp()) {
> +		/* Setup the Synic's message. */
> +		hv_get_simp_ghcb(&simp.as_uint64);
> +		simp.simp_enabled = 1;
> +		hv_cpu->synic_message_page
> +			= ioremap_cache(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
> +					PAGE_SIZE);
> +		if (!hv_cpu->synic_message_page)
> +			pr_warn("Fail to map synic message page.\n");
> +
> +		hv_set_simp_ghcb(simp.as_uint64);
> +
> +		/* Setup the Synic's event page */
> +		hv_get_siefp_ghcb(&siefp.as_uint64);
> +		siefp.siefp_enabled = 1;
> +		hv_cpu->synic_event_page = ioremap_cache(
> +			 siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT, PAGE_SIZE);
> +		if (!hv_cpu->synic_event_page)
> +			pr_warn("Fail to map synic event page.\n");
> +		hv_set_siefp_ghcb(siefp.as_uint64);
> +
> +		/* Setup the shared SINT. */
> +		hv_get_synint_state_ghcb(VMBUS_MESSAGE_SINT,
> +					 &shared_sint.as_uint64);
> +		shared_sint.vector = hv_get_vector();
> +		shared_sint.masked = false;
> +		shared_sint.auto_eoi = hv_recommend_using_aeoi();
> +		hv_set_synint_state_ghcb(VMBUS_MESSAGE_SINT,
> +					 shared_sint.as_uint64);
> +
> +		/* Enable the global synic bit */
> +		hv_get_synic_state_ghcb(&sctrl.as_uint64);
> +		sctrl.enable = 1;
> +		hv_set_synic_state_ghcb(sctrl.as_uint64);
> +	} else {
> +		/* Setup the Synic's message. */
> +		hv_get_simp(simp.as_uint64);
> +		simp.simp_enabled = 1;
> +		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +		hv_set_simp(simp.as_uint64);
> +
> +		/* Setup the Synic's event page */
> +		hv_get_siefp(siefp.as_uint64);
> +		siefp.siefp_enabled = 1;
> +		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +		hv_set_siefp(siefp.as_uint64);
> +
> +		/* Setup the shared SINT. */
> +		hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +
> +		shared_sint.vector = hv_get_vector();
> +		shared_sint.masked = false;
> +		shared_sint.auto_eoi = hv_recommend_using_aeoi();
> +		hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +
> +		/* Enable the global synic bit */
> +		hv_get_synic_state(sctrl.as_uint64);
> +		sctrl.enable = 1;
> +		hv_set_synic_state(sctrl.as_uint64);

There's definitely some room for unification here. E.g. the part after
'Setup the shared SINT' looks identical in both branches; you could move
it outside the if/else block.
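Something along these lines maybe (completely untested sketch; the
hv_synic_rdmsrl()/hv_synic_wrmsrl() wrapper names are made up, the idea
is just to hide the GHCB dispatch so the common part is written once):

```c
/*
 * Untested sketch: hide the GHCB dispatch in small wrappers so the
 * SINT/sctrl programming below is written only once.
 */
static void hv_synic_rdmsrl(u64 msr, u64 *val)
{
	if (hv_isolation_type_snp())
		hv_sint_rdmsrl_ghcb(msr, val);
	else
		rdmsrl(msr, *val);
}

static void hv_synic_wrmsrl(u64 msr, u64 val)
{
	if (hv_isolation_type_snp())
		hv_sint_wrmsrl_ghcb(msr, val);
	else
		wrmsrl(msr, val);
}

/* ... and then, common to both the SNP and non-SNP paths: */
hv_synic_rdmsrl(HV_X64_MSR_SINT0 + VMBUS_MESSAGE_SINT,
		&shared_sint.as_uint64);
shared_sint.vector = hv_get_vector();
shared_sint.masked = false;
shared_sint.auto_eoi = hv_recommend_using_aeoi();
hv_synic_wrmsrl(HV_X64_MSR_SINT0 + VMBUS_MESSAGE_SINT,
		shared_sint.as_uint64);

/* Enable the global synic bit */
hv_synic_rdmsrl(HV_X64_MSR_SCONTROL, &sctrl.as_uint64);
sctrl.enable = 1;
hv_synic_wrmsrl(HV_X64_MSR_SCONTROL, sctrl.as_uint64);
```

Only the SIMP/SIEFP setup would then remain split, since the page
addresses genuinely come from different places in the two cases.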

> +	}
>  }
>  
>  int hv_synic_init(unsigned int cpu)
> @@ -211,30 +262,53 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	union hv_synic_siefp siefp;
>  	union hv_synic_scontrol sctrl;
>  
> -	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +	if (hv_isolation_type_snp()) {
> +		hv_get_synint_state_ghcb(VMBUS_MESSAGE_SINT,
> +					 &shared_sint.as_uint64);
> +		shared_sint.masked = 1;
> +		hv_set_synint_state_ghcb(VMBUS_MESSAGE_SINT,
> +					 shared_sint.as_uint64);
> +
> +		hv_get_simp_ghcb(&simp.as_uint64);
> +		simp.simp_enabled = 0;
> +		simp.base_simp_gpa = 0;
> +		hv_set_simp_ghcb(simp.as_uint64);
> +
> +		hv_get_siefp_ghcb(&siefp.as_uint64);
> +		siefp.siefp_enabled = 0;
> +		siefp.base_siefp_gpa = 0;
> +		hv_set_siefp_ghcb(siefp.as_uint64);
>  
> -	shared_sint.masked = 1;
> +		/* Disable the global synic bit */
> +		hv_get_synic_state_ghcb(&sctrl.as_uint64);
> +		sctrl.enable = 0;
> +		hv_set_synic_state_ghcb(sctrl.as_uint64);
> +	} else {
> +		hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>  
> -	/* Need to correctly cleanup in the case of SMP!!! */
> -	/* Disable the interrupt */
> -	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +		shared_sint.masked = 1;
>  
> -	hv_get_simp(simp.as_uint64);
> -	simp.simp_enabled = 0;
> -	simp.base_simp_gpa = 0;
> +		/* Need to correctly cleanup in the case of SMP!!! */
> +		/* Disable the interrupt */
> +		hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>  
> -	hv_set_simp(simp.as_uint64);
> +		hv_get_simp(simp.as_uint64);
> +		simp.simp_enabled = 0;
> +		simp.base_simp_gpa = 0;
>  
> -	hv_get_siefp(siefp.as_uint64);
> -	siefp.siefp_enabled = 0;
> -	siefp.base_siefp_gpa = 0;
> +		hv_set_simp(simp.as_uint64);
>  
> -	hv_set_siefp(siefp.as_uint64);
> +		hv_get_siefp(siefp.as_uint64);
> +		siefp.siefp_enabled = 0;
> +		siefp.base_siefp_gpa = 0;
>  
> -	/* Disable the global synic bit */
> -	hv_get_synic_state(sctrl.as_uint64);
> -	sctrl.enable = 0;
> -	hv_set_synic_state(sctrl.as_uint64);
> +		hv_set_siefp(siefp.as_uint64);
> +
> +		/* Disable the global synic bit */
> +		hv_get_synic_state(sctrl.as_uint64);
> +		sctrl.enable = 0;
> +		hv_set_synic_state(sctrl.as_uint64);
> +	}
>  }
>  
>  int hv_synic_cleanup(unsigned int cpu)
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index ad0e33776668..6727f4073b5a 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -23,6 +23,7 @@
>  #include <linux/bitops.h>
>  #include <linux/cpumask.h>
>  #include <asm/ptrace.h>
> +#include <asm/mshyperv.h>
>  #include <asm/hyperv-tlfs.h>
>  
>  struct ms_hyperv_info {
> @@ -52,7 +53,7 @@ extern struct ms_hyperv_info ms_hyperv;
>  
>  extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
>  extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> -
> +extern bool hv_isolation_type_snp(void);
>  
>  /* Generate the guest OS identifier as described in the Hyper-V TLFS */
>  static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
> @@ -100,7 +101,11 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
>  		 * possibly deliver another msg from the
>  		 * hypervisor
>  		 */
> -		hv_signal_eom();
> +		if (hv_isolation_type_snp() &&
> +		    old_msg_type != HVMSG_TIMER_EXPIRED)
> +			hv_signal_eom_ghcb();
> +		else
> +			hv_signal_eom();

Would it be better to hide the SNP specifics inside hv_signal_eom()?
Also, out of pure curiosity, why are timer messages special?
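I.e. callers would keep calling plain hv_signal_eom() and the check
would live in one place. A rough, untested sketch (the signature grows
an argument to keep the timer special case from this patch, whatever
its reason turns out to be):

```c
/* Untested sketch for asm-generic/mshyperv.h; wrmsrl() stands in for
 * whatever the existing non-SNP EOM write is on each arch. */
static inline void hv_signal_eom(u32 old_msg_type)
{
	if (hv_isolation_type_snp() && old_msg_type != HVMSG_TIMER_EXPIRED)
		hv_signal_eom_ghcb();
	else
		wrmsrl(HV_X64_MSR_EOM, 0);
}
```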

>  	}
>  }
>  
> @@ -186,6 +191,7 @@ bool hv_is_hyperv_initialized(void);
>  bool hv_is_hibernation_supported(void);
>  enum hv_isolation_type hv_get_isolation_type(void);
>  bool hv_is_isolation_supported(void);
> +bool hv_isolation_type_snp(void);
>  void hyperv_cleanup(void);
>  #else /* CONFIG_HYPERV */
>  static inline bool hv_is_hyperv_initialized(void) { return false; }

-- 
Vitaly


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM
  2021-02-28 15:03 ` [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
@ 2021-03-03 17:21   ` Vitaly Kuznetsov
  2021-03-05 15:21     ` Tianyu Lan
  0 siblings, 1 reply; 26+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-03 17:21 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, thomas.lendacky,
	brijesh.singh, sunilmut, kys, haiyangz, sthemmin, wei.liu, tglx,
	mingo, bp, x86, hpa

Tianyu Lan <ltykernel@gmail.com> writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V provides ghcb hvcall to handle VMBus
> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
> msg in SNP Isolation VM. Add such support.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/ivm.c           | 69 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h |  1 +
>  drivers/hv/connection.c         |  6 ++-
>  drivers/hv/hv.c                 |  8 +++-
>  4 files changed, 82 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 4332bf7aaf9b..feaabcd151f5 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -14,8 +14,77 @@
>  
>  union hv_ghcb {
>  	struct ghcb ghcb;
> +	struct {
> +		u64 hypercalldata[509];
> +		u64 outputgpa;
> +		union {
> +			union {
> +				struct {
> +					u32 callcode        : 16;
> +					u32 isfast          : 1;
> +					u32 reserved1       : 14;
> +					u32 isnested        : 1;
> +					u32 countofelements : 12;
> +					u32 reserved2       : 4;
> +					u32 repstartindex   : 12;
> +					u32 reserved3       : 4;
> +				};
> +				u64 asuint64;
> +			} hypercallinput;
> +			union {
> +				struct {
> +					u16 callstatus;
> +					u16 reserved1;
> +					u32 elementsprocessed : 12;
> +					u32 reserved2         : 20;
> +				};
> +				u64 asuint64;
> +			} hypercalloutput;
> +		};
> +		u64 reserved2;
> +	} hypercall;
>  } __packed __aligned(PAGE_SIZE);
>  
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return -EFAULT;
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return -EFAULT;
> +	}
> +
> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> +	hv_ghcb->ghcb.protocol_version = 1;
> +	hv_ghcb->ghcb.ghcb_usage = 1;
> +
> +	hv_ghcb->hypercall.outputgpa = (u64)output;
> +	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> +	hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> +	if (input_size)
> +		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> +	VMGEXIT();
> +
> +	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
> +	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
> +	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
> +
> +	local_irq_restore(flags);
> +
> +	return hv_ghcb->hypercall.hypercalloutput.callstatus;
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
> +
>  void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>  	union hv_ghcb *hv_ghcb;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index f624d72b99d3..c8f66d269e5b 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -274,6 +274,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
>  void hv_ghcb_msr_write(u64 msr, u64 value);
>  void hv_ghcb_msr_read(u64 msr, u64 *value);
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
>  
>  #define hv_get_synint_state_ghcb(int_num, val)			\
>  	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index c83612cddb99..79bca653dce9 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
>  
>  	++channel->sig_events;
>  
> -	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> +	if (hv_isolation_type_snp())
> +		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> +				NULL, sizeof(u64));
> +	else
> +		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);

vmbus_set_event() is a hot path, so I'd suggest we introduce a static
branch instead of checking hv_isolation_type_snp() every time.
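Something like this, perhaps (untested; the key name is made up):

```c
/* Untested sketch: flip a static key once at init time, then the hot
 * path compiles down to a patched jump instead of a function call. */
DEFINE_STATIC_KEY_FALSE(hv_isolation_type_snp_key);

/* e.g. somewhere in ms_hyperv_init_platform(): */
if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
	static_branch_enable(&hv_isolation_type_snp_key);

/* and in vmbus_set_event(): */
if (static_branch_unlikely(&hv_isolation_type_snp_key))
	hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
			  NULL, sizeof(u64));
else
	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
```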

>  }
>  EXPORT_SYMBOL_GPL(vmbus_set_event);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 28e28ccc2081..6c64a7fd1ebd 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -60,7 +60,13 @@ int hv_post_message(union hv_connection_id connection_id,
>  	aligned_msg->payload_size = payload_size;
>  	memcpy((void *)aligned_msg->payload, payload, payload_size);
>  
> -	status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
> +	if (hv_isolation_type_snp())
> +		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
> +				(void *)aligned_msg, NULL,
> +				sizeof(struct hv_input_post_message));
> +	else
> +		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
> +				aligned_msg, NULL);

and, if we are to introduce a static branch, we could use it here
(though it doesn't matter much for messages).

>  
>  	/* Preemption must remain disabled until after the hypercall
>  	 * so some other thread can't get scheduled onto this cpu and

-- 
Vitaly



* Re: [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()
  2021-03-03 16:27   ` Vitaly Kuznetsov
@ 2021-03-05  6:06     ` Tianyu Lan
  0 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-03-05  6:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, thomas.lendacky,
	brijesh.singh, sunilmut, kys, haiyangz, sthemmin, wei.liu, tglx,
	mingo, bp, x86, hpa, davem, kuba, gregkh

Hi Vitaly:
      Thanks for your review.

On 3/4/2021 12:27 AM, Vitaly Kuznetsov wrote:
> Tianyu Lan <ltykernel@gmail.com> writes:
> 
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Add visibility parameter for vmbus_establish_gpadl() and prepare
>> to change host visibility when creating a gpadl for a buffer.
>>
> 
> "No functional change" as you don't actually use the parameter.

Yes, will add it to the commit log.

> 
>> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Nit: Sunil's SoB looks misleading because the patch is from you;
> Co-Developed-by should be sufficient.
> 

Will update.

>> ---
>>   arch/x86/include/asm/hyperv-tlfs.h |  9 +++++++++
>>   drivers/hv/channel.c               | 20 +++++++++++---------
>>   drivers/net/hyperv/netvsc.c        |  8 ++++++--
>>   drivers/uio/uio_hv_generic.c       |  7 +++++--
>>   include/linux/hyperv.h             |  3 ++-
>>   5 files changed, 33 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
>> index e6cd3fee562b..fb1893a4c32b 100644
>> --- a/arch/x86/include/asm/hyperv-tlfs.h
>> +++ b/arch/x86/include/asm/hyperv-tlfs.h
>> @@ -236,6 +236,15 @@ enum hv_isolation_type {
>>   /* TSC invariant control */
>>   #define HV_X64_MSR_TSC_INVARIANT_CONTROL	0x40000118
>>   
>> +/* Hyper-V GPA map flags */
>> +#define HV_MAP_GPA_PERMISSIONS_NONE		0x0
>> +#define HV_MAP_GPA_READABLE			0x1
>> +#define HV_MAP_GPA_WRITABLE			0x2
>> +
>> +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
>> +#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
>> +#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
>> +
> 
> Are these x86-only? If not, then we should probably move these defines
> to include/asm-generic/hyperv-tlfs.h. In case they are, we should do
> something as we're using them from arch neutral places.
> 
> Also, could you please add a comment stating that these flags define
> host's visibility of a page and not guest's (this seems to be not
> obvious at least to me).
>




>>   /*
>>    * Declare the MSR used to setup pages used to communicate with the hypervisor.
>>    */
>> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
>> index 0bd202de7960..daa21cc72beb 100644
>> --- a/drivers/hv/channel.c
>> +++ b/drivers/hv/channel.c
>> @@ -242,7 +242,7 @@ EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
>>    */
>>   static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
>>   			       u32 size, u32 send_offset,
>> -			       struct vmbus_channel_msginfo **msginfo)
>> +			       struct vmbus_channel_msginfo **msginfo, u32 visibility)
>>   {
>>   	int i;
>>   	int pagecount;
>> @@ -391,7 +391,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
>>   static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   				   enum hv_gpadl_type type, void *kbuffer,
>>   				   u32 size, u32 send_offset,
>> -				   u32 *gpadl_handle)
>> +				   u32 *gpadl_handle, u32 visibility)
>>   {
>>   	struct vmbus_channel_gpadl_header *gpadlmsg;
>>   	struct vmbus_channel_gpadl_body *gpadl_body;
>> @@ -405,7 +405,8 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   	next_gpadl_handle =
>>   		(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
>>   
>> -	ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);
>> +	ret = create_gpadl_header(type, kbuffer, size, send_offset,
>> +				  &msginfo, visibility);
>>   	if (ret)
>>   		return ret;
>>   
>> @@ -496,10 +497,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>    * @gpadl_handle: some funky thing
>>    */
>>   int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
>> -			  u32 size, u32 *gpadl_handle)
>> +			  u32 size, u32 *gpadl_handle, u32 visibility)
>>   {
>>   	return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
>> -				       0U, gpadl_handle);
>> +				       0U, gpadl_handle, visibility);
>>   }
>>   EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
>>   
>> @@ -610,10 +611,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>>   	newchannel->ringbuffer_gpadlhandle = 0;
>>   
>>   	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
>> -				      page_address(newchannel->ringbuffer_page),
>> -				      (send_pages + recv_pages) << PAGE_SHIFT,
>> -				      newchannel->ringbuffer_send_offset << PAGE_SHIFT,
>> -				      &newchannel->ringbuffer_gpadlhandle);
>> +			page_address(newchannel->ringbuffer_page),
>> +			(send_pages + recv_pages) << PAGE_SHIFT,
>> +			newchannel->ringbuffer_send_offset << PAGE_SHIFT,
>> +			&newchannel->ringbuffer_gpadlhandle,
>> +			VMBUS_PAGE_VISIBLE_READ_WRITE);
> 
> Nit: I liked the original alignment more and we can avoid the unneeded
> code churn.
> 
>>   	if (err)
>>   		goto error_clean_ring;
>>   
>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>> index 2353623259f3..bb72c7578330 100644
>> --- a/drivers/net/hyperv/netvsc.c
>> +++ b/drivers/net/hyperv/netvsc.c
>> @@ -333,7 +333,8 @@ static int netvsc_init_buf(struct hv_device *device,
>>   	 */
>>   	ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
>>   				    buf_size,
>> -				    &net_device->recv_buf_gpadl_handle);
>> +				    &net_device->recv_buf_gpadl_handle,
>> +				    VMBUS_PAGE_VISIBLE_READ_WRITE);
>>   	if (ret != 0) {
>>   		netdev_err(ndev,
>>   			"unable to establish receive buffer's gpadl\n");
>> @@ -422,10 +423,13 @@ static int netvsc_init_buf(struct hv_device *device,
>>   	/* Establish the gpadl handle for this buffer on this
>>   	 * channel.  Note: This call uses the vmbus connection rather
>>   	 * than the channel to establish the gpadl handle.
>> +	 * Send buffer should theoretically be only marked as "read-only", but
>> +	 * the netvsp for some reason needs write capabilities on it.
>>   	 */
>>   	ret = vmbus_establish_gpadl(device->channel, net_device->send_buf,
>>   				    buf_size,
>> -				    &net_device->send_buf_gpadl_handle);
>> +				    &net_device->send_buf_gpadl_handle,
>> +				    VMBUS_PAGE_VISIBLE_READ_WRITE);
>>   	if (ret != 0) {
>>   		netdev_err(ndev,
>>   			   "unable to establish send buffer's gpadl\n");
>> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
>> index 0330ba99730e..813a7bee5139 100644
>> --- a/drivers/uio/uio_hv_generic.c
>> +++ b/drivers/uio/uio_hv_generic.c
>> @@ -29,6 +29,7 @@
>>   #include <linux/hyperv.h>
>>   #include <linux/vmalloc.h>
>>   #include <linux/slab.h>
>> +#include <asm/mshyperv.h>
>>   
>>   #include "../hv/hyperv_vmbus.h"
>>   
>> @@ -295,7 +296,8 @@ hv_uio_probe(struct hv_device *dev,
>>   	}
>>   
>>   	ret = vmbus_establish_gpadl(channel, pdata->recv_buf,
>> -				    RECV_BUFFER_SIZE, &pdata->recv_gpadl);
>> +				    RECV_BUFFER_SIZE, &pdata->recv_gpadl,
>> +				    VMBUS_PAGE_VISIBLE_READ_WRITE);
>>   	if (ret)
>>   		goto fail_close;
>>   
>> @@ -315,7 +317,8 @@ hv_uio_probe(struct hv_device *dev,
>>   	}
>>   
>>   	ret = vmbus_establish_gpadl(channel, pdata->send_buf,
>> -				    SEND_BUFFER_SIZE, &pdata->send_gpadl);
>> +				    SEND_BUFFER_SIZE, &pdata->send_gpadl,
>> +				    VMBUS_PAGE_VISIBLE_READ_ONLY);
> 
> Actually, this is the only place where you use 'READ_ONLY' mapping --
> which makes me wonder if it's actually worth it or we can hard-code
> VMBUS_PAGE_VISIBLE_READ_WRITE for now and avoid this additional
> parameter.
> 

Another option is to set the host visibility outside of
vmbus_establish_gpadl(). There are three callers of
vmbus_establish_gpadl(): the vmbus, netvsc and uio drivers.
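I.e. something like this at each call site (untested sketch, reusing
hv_set_mem_host_visibility() from this series; visibility would
presumably need to be set before the GPADL is offered to the host):

```c
/* Untested sketch of the caller-side alternative, using the netvsc
 * receive buffer as an example: */
ret = hv_set_mem_host_visibility(net_device->recv_buf, buf_size,
				 VMBUS_PAGE_VISIBLE_READ_WRITE);
if (ret != 0)
	goto cleanup;

ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
			    buf_size,
			    &net_device->recv_buf_gpadl_handle);
if (ret != 0)
	goto cleanup;
```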

>>   	if (ret)
>>   		goto fail_close;
>>   
>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>> index f1d74dcf0353..016fdca20d6e 100644
>> --- a/include/linux/hyperv.h
>> +++ b/include/linux/hyperv.h
>> @@ -1179,7 +1179,8 @@ extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
>>   extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   				      void *kbuffer,
>>   				      u32 size,
>> -				      u32 *gpadl_handle);
>> +				      u32 *gpadl_handle,
>> +				      u32 visibility);
>>   
>>   extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
>>   				     u32 gpadl_handle);
> 


* Re: [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support
  2021-03-03 16:58   ` Vitaly Kuznetsov
@ 2021-03-05  6:23     ` Tianyu Lan
  0 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-03-05  6:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, netdev, linux-arch,
	thomas.lendacky, brijesh.singh, sunilmut, kys, haiyangz,
	sthemmin, wei.liu, tglx, mingo, bp, x86, hpa, davem, kuba,
	gregkh, arnd


On 3/4/2021 12:58 AM, Vitaly Kuznetsov wrote:
> Tianyu Lan <ltykernel@gmail.com> writes:
> 
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Add new hvcall guest address host visibility support. Mark vmbus
>> ring buffer visible to the host when creating the gpadl buffer and mark
>> it not visible again when tearing down the gpadl buffer.
>>
>> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>>   arch/x86/include/asm/hyperv-tlfs.h | 13 ++++++++
>>   arch/x86/include/asm/mshyperv.h    |  4 +--
>>   arch/x86/kernel/cpu/mshyperv.c     | 46 ++++++++++++++++++++++++++
>>   drivers/hv/channel.c               | 53 ++++++++++++++++++++++++++++--
>>   drivers/net/hyperv/hyperv_net.h    |  1 +
>>   drivers/net/hyperv/netvsc.c        |  9 +++--
>>   drivers/uio/uio_hv_generic.c       |  6 ++--
>>   include/asm-generic/hyperv-tlfs.h  |  1 +
>>   include/linux/hyperv.h             |  3 +-
>>   9 files changed, 126 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
>> index fb1893a4c32b..d22b1c3f425a 100644
>> --- a/arch/x86/include/asm/hyperv-tlfs.h
>> +++ b/arch/x86/include/asm/hyperv-tlfs.h
>> @@ -573,4 +573,17 @@ enum hv_interrupt_type {
>>   
>>   #include <asm-generic/hyperv-tlfs.h>
>>   
>> +/* All input parameters should be in single page. */
>> +#define HV_MAX_MODIFY_GPA_REP_COUNT		\
>> +	((PAGE_SIZE - 2 * sizeof(u64)) / (sizeof(u64)))
> 
> Would it be easier to express this as '(PAGE_SIZE / sizeof(u64)) - 2'?
Yes, will update. Thanks.

>> +
>> +/* HvCallModifySparseGpaPageHostVisibility hypercall */
>> +struct hv_input_modify_sparse_gpa_page_host_visibility {
>> +	u64 partition_id;
>> +	u32 host_visibility:2;
>> +	u32 reserved0:30;
>> +	u32 reserved1;
>> +	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
>> +} __packed;
>> +
>>   #endif
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index ccf60a809a17..1e8275d35c1f 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -262,13 +262,13 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>>   	msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
>>   	msi_entry->data.as_uint32 = msi_desc->msg.data;
>>   }
>> -
> 
> stray change
> 
>>   struct irq_domain *hv_create_pci_msi_domain(void);
>>   
>>   int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>>   		struct hv_interrupt_entry *entry);
>>   int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>> -
>> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
>> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
>>   #else /* CONFIG_HYPERV */
>>   static inline void hyperv_init(void) {}
>>   static inline void hyperv_setup_mmu_ops(void) {}
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index e88bc296afca..347c32eac8fd 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -37,6 +37,8 @@
>>   bool hv_root_partition;
>>   EXPORT_SYMBOL_GPL(hv_root_partition);
>>   
>> +#define HV_PARTITION_ID_SELF ((u64)-1)
>> +
> 
> We seem to have this already:
> 
> include/asm-generic/hyperv-tlfs.h:#define HV_PARTITION_ID_SELF          ((u64)-1)

>>   struct ms_hyperv_info ms_hyperv;
>>   EXPORT_SYMBOL_GPL(ms_hyperv);
>>   
>> @@ -477,3 +479,47 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
>>   	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
>>   	.init.init_platform	= ms_hyperv_init_platform,
>>   };
>> +
>> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
>> +{
>> +	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
>> +	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
>> +	u16 pages_processed;
>> +	u64 hv_status;
>> +	unsigned long flags;
>> +
>> +	/* no-op if partition isolation is not enabled */
>> +	if (!hv_is_isolation_supported())
>> +		return 0;
>> +
>> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
>> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
>> +			HV_MAX_MODIFY_GPA_REP_COUNT);
>> +		return -EINVAL;
>> +	}
>> +
>> +	local_irq_save(flags);
>> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
>> +			this_cpu_ptr(hyperv_pcpu_input_arg);
>> +	input = *input_pcpu;
>> +	if (unlikely(!input)) {
>> +		local_irq_restore(flags);
>> +		return -1;
> 
> -EFAULT/-ENOMEM/... maybe ?

Yes, will update.
> 
>> +	}
>> +
>> +	input->partition_id = HV_PARTITION_ID_SELF;
>> +	input->host_visibility = visibility;
>> +	input->reserved0 = 0;
>> +	input->reserved1 = 0;
>> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
>> +	hv_status = hv_do_rep_hypercall(
>> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
>> +			0, input, &pages_processed);
>> +	local_irq_restore(flags);
>> +
>> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
>> +		return 0;
>> +
>> +	return -EFAULT;
> 
> Could we just propagate "hv_status & HV_HYPERCALL_RESULT_MASK" maybe?

Yes, will update.

> 
>> +}
>> +EXPORT_SYMBOL(hv_mark_gpa_visibility);
>> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
>> index daa21cc72beb..204e6f3598a5 100644
>> --- a/drivers/hv/channel.c
>> +++ b/drivers/hv/channel.c
>> @@ -237,6 +237,38 @@ int vmbus_send_modifychannel(u32 child_relid, u32 target_vp)
>>   }
>>   EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
>>   
>> +/*
>> + * hv_set_mem_host_visibility - Set host visibility for specified memory.
>> + */
>> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
>> +{
>> +	int i, pfn;
>> +	int pagecount = size >> HV_HYP_PAGE_SHIFT;
>> +	u64 *pfn_array;
>> +	int ret = 0;
>> +
>> +	if (!hv_isolation_type_snp())
>> +		return 0;
>> +
>> +	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
>> +	if (!pfn_array)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0, pfn = 0; i < pagecount; i++) {
>> +		pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
>> +		pfn++;
>> +
>> +		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
>> +			ret |= hv_mark_gpa_visibility(pfn, pfn_array, visibility);
>> +			pfn = 0;
> 
> hv_mark_gpa_visibility() return different error codes and aggregating
> them with
> 
>   ret |= ...
> 
> will have an unpredictable result. I'd suggest bail immediately instead:
> 
>   if (ret)
>       goto err_free_pfn_array;

Yes, this makes sense. Thanks.
> 
>> +		}
>> +	}
>> +
> 
> err_free_pfn_array:
> 
>> +	vfree(pfn_array);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(hv_set_mem_host_visibility);
>> +
>>   /*
>>    * create_gpadl_header - Creates a gpadl for the specified buffer
>>    */
>> @@ -410,6 +442,12 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   	if (ret)
>>   		return ret;
>>   
>> +	ret = hv_set_mem_host_visibility(kbuffer, size, visibility);
>> +	if (ret) {
>> +		pr_warn("Failed to set host visibility.\n");
>> +		return ret;
>> +	}
>> +
>>   	init_completion(&msginfo->waitevent);
>>   	msginfo->waiting_channel = channel;
>>   
>> @@ -693,7 +731,9 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>>   error_free_info:
>>   	kfree(open_info);
>>   error_free_gpadl:
>> -	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
>> +	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle,
>> +			     page_address(newchannel->ringbuffer_page),
>> +			     newchannel->ringbuffer_pagecount << PAGE_SHIFT);
> 
> Instead of modifying vmbus_teardown_gpadl() interface and all its call
> sites, could we just keep track of all established gpadls and then get
> the required data from there? I.e. make vmbus_establish_gpadl() save
> kbuffer/size to some internal structure associated with 'gpadl_handle'.
> 

Yes, that's another approach: add an array or list to struct vmbus_channel to track established gpadls.

>>   	newchannel->ringbuffer_gpadlhandle = 0;
>>   error_clean_ring:
>>   	hv_ringbuffer_cleanup(&newchannel->outbound);
>> @@ -740,7 +780,8 @@ EXPORT_SYMBOL_GPL(vmbus_open);
>>   /*
>>    * vmbus_teardown_gpadl -Teardown the specified GPADL handle
>>    */
>> -int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>> +int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle,
>> +			 void *kbuffer, u32 size)
> 
> This probably doesn't matter but why not 'u64 size'?
> 
>>   {
>>   	struct vmbus_channel_gpadl_teardown *msg;
>>   	struct vmbus_channel_msginfo *info;
>> @@ -793,6 +834,10 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>>   	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
>>   
>>   	kfree(info);
>> +
>> +	if (hv_set_mem_host_visibility(kbuffer, size, VMBUS_PAGE_NOT_VISIBLE))
>> +		pr_warn("Fail to set mem host visibility.\n");
> 
> pr_err() maybe?
> 

Yes, will update.

>> +
>>   	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
>> @@ -869,7 +914,9 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
>>   	/* Tear down the gpadl for the channel's ring buffer */
>>   	else if (channel->ringbuffer_gpadlhandle) {
>>   		ret = vmbus_teardown_gpadl(channel,
>> -					   channel->ringbuffer_gpadlhandle);
>> +					   channel->ringbuffer_gpadlhandle,
>> +					   page_address(channel->ringbuffer_page),
>> +					   channel->ringbuffer_pagecount << PAGE_SHIFT);
>>   		if (ret) {
>>   			pr_err("Close failed: teardown gpadl return %d\n", ret);
>>   			/*
>> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
>> index 2a87cfa27ac0..b3a43c4ec8ab 100644
>> --- a/drivers/net/hyperv/hyperv_net.h
>> +++ b/drivers/net/hyperv/hyperv_net.h
>> @@ -1034,6 +1034,7 @@ struct netvsc_device {
>>   
>>   	/* Send buffer allocated by us */
>>   	void *send_buf;
>> +	u32 send_buf_size;
>>   	u32 send_buf_gpadl_handle;
>>   	u32 send_section_cnt;
>>   	u32 send_section_size;
>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>> index bb72c7578330..08d73401bb28 100644
>> --- a/drivers/net/hyperv/netvsc.c
>> +++ b/drivers/net/hyperv/netvsc.c
>> @@ -245,7 +245,9 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
>>   
>>   	if (net_device->recv_buf_gpadl_handle) {
>>   		ret = vmbus_teardown_gpadl(device->channel,
>> -					   net_device->recv_buf_gpadl_handle);
>> +					   net_device->recv_buf_gpadl_handle,
>> +					   net_device->recv_buf,
>> +					   net_device->recv_buf_size);
>>   
>>   		/* If we failed here, we might as well return and have a leak
>>   		 * rather than continue and a bugchk
>> @@ -267,7 +269,9 @@ static void netvsc_teardown_send_gpadl(struct hv_device *device,
>>   
>>   	if (net_device->send_buf_gpadl_handle) {
>>   		ret = vmbus_teardown_gpadl(device->channel,
>> -					   net_device->send_buf_gpadl_handle);
>> +					   net_device->send_buf_gpadl_handle,
>> +					   net_device->send_buf,
>> +					   net_device->send_buf_size);
>>   
>>   		/* If we failed here, we might as well return and have a leak
>>   		 * rather than continue and a bugchk
>> @@ -419,6 +423,7 @@ static int netvsc_init_buf(struct hv_device *device,
>>   		ret = -ENOMEM;
>>   		goto cleanup;
>>   	}
>> +	net_device->send_buf_size = buf_size;
>>   
>>   	/* Establish the gpadl handle for this buffer on this
>>   	 * channel.  Note: This call uses the vmbus connection rather
>> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
>> index 813a7bee5139..c8d4704fc90c 100644
>> --- a/drivers/uio/uio_hv_generic.c
>> +++ b/drivers/uio/uio_hv_generic.c
>> @@ -181,13 +181,15 @@ static void
>>   hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
>>   {
>>   	if (pdata->send_gpadl) {
>> -		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl);
>> +		vmbus_teardown_gpadl(dev->channel, pdata->send_gpadl,
>> +				     pdata->send_buf, SEND_BUFFER_SIZE);
>>   		pdata->send_gpadl = 0;
>>   		vfree(pdata->send_buf);
>>   	}
>>   
>>   	if (pdata->recv_gpadl) {
>> -		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl);
>> +		vmbus_teardown_gpadl(dev->channel, pdata->recv_gpadl,
>> +				     pdata->recv_buf, RECV_BUFFER_SIZE);
>>   		pdata->recv_gpadl = 0;
>>   		vfree(pdata->recv_buf);
>>   	}
>> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
>> index 83448e837ded..ad19f4199f90 100644
>> --- a/include/asm-generic/hyperv-tlfs.h
>> +++ b/include/asm-generic/hyperv-tlfs.h
>> @@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
>>   #define HVCALL_RETARGET_INTERRUPT		0x007e
>>   #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>>   #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
>> +#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
>>   
>>   #define HV_FLUSH_ALL_PROCESSORS			BIT(0)
>>   #define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES	BIT(1)
>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>> index 016fdca20d6e..41cbaa2db567 100644
>> --- a/include/linux/hyperv.h
>> +++ b/include/linux/hyperv.h
>> @@ -1183,7 +1183,8 @@ extern int vmbus_establish_gpadl(struct vmbus_channel *channel,
>>   				      u32 visibility);
>>   
>>   extern int vmbus_teardown_gpadl(struct vmbus_channel *channel,
>> -				     u32 gpadl_handle);
>> +				u32 gpadl_handle,
>> +				void *kbuffer, u32 size);
>>   
>>   void vmbus_reset_channel_cb(struct vmbus_channel *channel);
> 


* Re: [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb
  2021-03-03 17:16   ` Vitaly Kuznetsov
@ 2021-03-05  6:37     ` Tianyu Lan
  0 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-03-05  6:37 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch,
	thomas.lendacky, brijesh.singh, sunilmut, kys, haiyangz,
	sthemmin, wei.liu, tglx, mingo, bp, x86, hpa, arnd



On 3/4/2021 1:16 AM, Vitaly Kuznetsov wrote:
> Tianyu Lan <ltykernel@gmail.com> writes:
> 
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Hyper-V provides the GHCB protocol to write Synthetic Interrupt
>> Controller MSR registers; these registers are emulated by the
>> hypervisor rather than the paravisor.
>>
>> Hyper-V requires the SINTx MSR registers to be written twice (once
>> via GHCB and once via the wrmsr instruction, including the proxy
>> bit 21). The guest OS ID MSR also needs to be set via GHCB.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>>   arch/x86/hyperv/Makefile        |   2 +-
>>   arch/x86/hyperv/hv_init.c       |  18 +--
>>   arch/x86/hyperv/ivm.c           | 178 ++++++++++++++++++++++++++++++
>>   arch/x86/include/asm/mshyperv.h |  21 +++-
>>   arch/x86/kernel/cpu/mshyperv.c  |  46 --------
>>   drivers/hv/channel.c            |   2 +-
>>   drivers/hv/hv.c                 | 188 ++++++++++++++++++++++----------
>>   include/asm-generic/mshyperv.h  |  10 +-
>>   8 files changed, 343 insertions(+), 122 deletions(-)
>>   create mode 100644 arch/x86/hyperv/ivm.c
>>
>> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
>> index 48e2c51464e8..5d2de10809ae 100644
>> --- a/arch/x86/hyperv/Makefile
>> +++ b/arch/x86/hyperv/Makefile
>> @@ -1,5 +1,5 @@
>>   # SPDX-License-Identifier: GPL-2.0-only
>> -obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
>> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>>   obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
>>   
>>   ifdef CONFIG_X86_64
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 90e65fbf4c58..87b1dd9c84d6 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -475,6 +475,9 @@ void __init hyperv_init(void)
>>   
>>   		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>>   		*ghcb_base = ghcb_va;
>> +
>> +		/* Hyper-V requires the guest OS ID to be written via GHCB in an SNP IVM. */
>> +		hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>>   	}
>>   
>>   	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> @@ -561,6 +564,7 @@ void hyperv_cleanup(void)
>>   
>>   	/* Reset our OS id */
>>   	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>> +	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>>   
>>   	/*
>>   	 * Reset hypercall page reference before reset the page,
>> @@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
>>   	return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
>>   }
>>   EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
>> -
>> -enum hv_isolation_type hv_get_isolation_type(void)
>> -{
>> -	if (!(ms_hyperv.features_b & HV_ISOLATION))
>> -		return HV_ISOLATION_TYPE_NONE;
>> -	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
>> -}
>> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>> -
>> -bool hv_is_isolation_supported(void)
>> -{
>> -	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>> -}
>> -EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
>> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
>> new file mode 100644
>> index 000000000000..4332bf7aaf9b
>> --- /dev/null
>> +++ b/arch/x86/hyperv/ivm.c
>> @@ -0,0 +1,178 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Hyper-V Isolation VM interface with paravisor and hypervisor
>> + *
>> + * Author:
>> + *  Tianyu Lan <Tianyu.Lan@microsoft.com>
>> + */
>> +#include <linux/types.h>
>> +#include <linux/bitfield.h>
>> +#include <asm/io.h>
>> +#include <asm/svm.h>
>> +#include <asm/sev-es.h>
>> +#include <asm/mshyperv.h>
>> +
>> +union hv_ghcb {
>> +	struct ghcb ghcb;
>> +} __packed __aligned(PAGE_SIZE);
>> +
>> +void hv_ghcb_msr_write(u64 msr, u64 value)
>> +{
>> +	union hv_ghcb *hv_ghcb;
>> +	void **ghcb_base;
>> +	unsigned long flags;
>> +
>> +	if (!ms_hyperv.ghcb_base)
>> +		return;
>> +
>> +	local_irq_save(flags);
>> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
>> +	if (!hv_ghcb) {
>> +		local_irq_restore(flags);
>> +		return;
>> +	}
>> +
>> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
>> +
>> +	hv_ghcb->ghcb.protocol_version = 1;
>> +	hv_ghcb->ghcb.ghcb_usage = 0;
>> +
>> +	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
>> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
>> +	ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
>> +	ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
>> +	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
>> +	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
>> +
>> +	VMGEXIT();
>> +
>> +	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
>> +		pr_warn("Failed to write msr via ghcb.\n");
>> +
>> +	local_irq_restore(flags);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
>> +
>> +void hv_ghcb_msr_read(u64 msr, u64 *value)
>> +{
>> +	union hv_ghcb *hv_ghcb;
>> +	void **ghcb_base;
>> +	unsigned long flags;
>> +
>> +	if (!ms_hyperv.ghcb_base)
>> +		return;
>> +
>> +	local_irq_save(flags);
>> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
>> +	if (!hv_ghcb) {
>> +		local_irq_restore(flags);
>> +		return;
>> +	}
>> +
>> +	memset(hv_ghcb, 0x00, PAGE_SIZE);
>> +	hv_ghcb->ghcb.protocol_version = 1;
>> +	hv_ghcb->ghcb.ghcb_usage = 0;
>> +
>> +	ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
>> +	ghcb_set_rcx(&hv_ghcb->ghcb, msr);
>> +	ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
>> +	ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
>> +
>> +	VMGEXIT();
>> +
>> +	if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
>> +		pr_warn("Failed to read msr via ghcb.\n");
>> +	else
>> +		*value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
>> +			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
>> +	local_irq_restore(flags);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
>> +
>> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
>> +{
>> +	hv_ghcb_msr_read(msr, value);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
>> +
>> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
>> +{
>> +	hv_ghcb_msr_write(msr, value);
>> +
>> +	/* Write the proxy bit via the wrmsrl instruction. */
>> +	if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
>> +		wrmsrl(msr, value | 1 << 20);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
>> +
>> +inline void hv_signal_eom_ghcb(void)
>> +{
>> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
>> +
>> +enum hv_isolation_type hv_get_isolation_type(void)
>> +{
>> +	if (!(ms_hyperv.features_b & HV_ISOLATION))
>> +		return HV_ISOLATION_TYPE_NONE;
>> +	return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>> +
>> +bool hv_is_isolation_supported(void)
>> +{
>> +	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>> +}
>> +EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
>> +
>> +bool hv_isolation_type_snp(void)
>> +{
>> +	return hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP;
>> +}
>> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
>> +
>> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
>> +{
>> +	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
>> +	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
>> +	u16 pages_processed;
>> +	u64 hv_status;
>> +	unsigned long flags;
>> +
>> +	/* no-op if partition isolation is not enabled */
>> +	if (!hv_is_isolation_supported())
>> +		return 0;
>> +
>> +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
>> +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
>> +			HV_MAX_MODIFY_GPA_REP_COUNT);
>> +		return -EINVAL;
>> +	}
>> +
>> +	local_irq_save(flags);
>> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
>> +			this_cpu_ptr(hyperv_pcpu_input_arg);
>> +	input = *input_pcpu;
>> +	if (unlikely(!input)) {
>> +		local_irq_restore(flags);
>> +		return -1;
>> +	}
>> +
>> +	input->partition_id = HV_PARTITION_ID_SELF;
>> +	input->host_visibility = visibility;
>> +	input->reserved0 = 0;
>> +	input->reserved1 = 0;
>> +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
>> +	hv_status = hv_do_rep_hypercall(
>> +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
>> +			0, input, &pages_processed);
>> +	local_irq_restore(flags);
>> +
>> +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
>> +		return 0;
>> +
>> +	return -EFAULT;
>> +}
>> +EXPORT_SYMBOL(hv_mark_gpa_visibility);
> 
> This looks like an unneeded code churn: first, you implement this in
> arch/x86/kernel/cpu/mshyperv.c and several patches later you move it to
> the dedicated arch/x86/hyperv/ivm.c. Let's just introduce this new
> arch/x86/hyperv/ivm.c from the very beginning.

OK. Will update.

> 
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index 1e8275d35c1f..f624d72b99d3 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -269,6 +269,25 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>>   int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>>   int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
>>   int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
>> +void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
>> +void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>> +void hv_signal_eom_ghcb(void);
>> +void hv_ghcb_msr_write(u64 msr, u64 value);
>> +void hv_ghcb_msr_read(u64 msr, u64 *value);
>> +
>> +#define hv_get_synint_state_ghcb(int_num, val)			\
>> +	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
>> +#define hv_set_synint_state_ghcb(int_num, val) \
>> +	hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
>> +
>> +#define hv_get_simp_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
>> +#define hv_set_simp_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
>> +
>> +#define hv_get_siefp_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
>> +#define hv_set_siefp_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
>> +
>> +#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
>> +#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
>>   #else /* CONFIG_HYPERV */
>>   static inline void hyperv_init(void) {}
>>   static inline void hyperv_setup_mmu_ops(void) {}
>> @@ -287,9 +306,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
>>   {
>>   	return -1;
>>   }
>> +static inline void hv_signal_eom_ghcb(void) { };
>>   #endif /* CONFIG_HYPERV */
>>   
>> -
>>   #include <asm-generic/mshyperv.h>
>>   
>>   #endif
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index d6c363456cbf..aeafd4017c89 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -37,8 +37,6 @@
>>   bool hv_root_partition;
>>   EXPORT_SYMBOL_GPL(hv_root_partition);
>>   
>> -#define HV_PARTITION_ID_SELF ((u64)-1)
>> -
>>   struct ms_hyperv_info ms_hyperv;
>>   EXPORT_SYMBOL_GPL(ms_hyperv);
>>   
>> @@ -481,47 +479,3 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
>>   	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
>>   	.init.init_platform	= ms_hyperv_init_platform,
>>   };
>> -
>> -int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
>> -{
>> -	struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
>> -	struct hv_input_modify_sparse_gpa_page_host_visibility *input;
>> -	u16 pages_processed;
>> -	u64 hv_status;
>> -	unsigned long flags;
>> -
>> -	/* no-op if partition isolation is not enabled */
>> -	if (!hv_is_isolation_supported())
>> -		return 0;
>> -
>> -	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
>> -		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
>> -			HV_MAX_MODIFY_GPA_REP_COUNT);
>> -		return -EINVAL;
>> -	}
>> -
>> -	local_irq_save(flags);
>> -	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
>> -			this_cpu_ptr(hyperv_pcpu_input_arg);
>> -	input = *input_pcpu;
>> -	if (unlikely(!input)) {
>> -		local_irq_restore(flags);
>> -		return -1;
>> -	}
>> -
>> -	input->partition_id = HV_PARTITION_ID_SELF;
>> -	input->host_visibility = visibility;
>> -	input->reserved0 = 0;
>> -	input->reserved1 = 0;
>> -	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
>> -	hv_status = hv_do_rep_hypercall(
>> -			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
>> -			0, input, &pages_processed);
>> -	local_irq_restore(flags);
>> -
>> -	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
>> -		return 0;
>> -
>> -	return -EFAULT;
>> -}
>> -EXPORT_SYMBOL(hv_mark_gpa_visibility);
>> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
>> index 204e6f3598a5..f31b669a1ddf 100644
>> --- a/drivers/hv/channel.c
>> +++ b/drivers/hv/channel.c
>> @@ -247,7 +247,7 @@ int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
>>   	u64 *pfn_array;
>>   	int ret = 0;
>>   
>> -	if (!hv_isolation_type_snp())
>> +	if (!hv_is_isolation_supported())
>>   		return 0;
>>   
>>   	pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
>> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
>> index f202ac7f4b3d..28e28ccc2081 100644
>> --- a/drivers/hv/hv.c
>> +++ b/drivers/hv/hv.c
>> @@ -99,17 +99,24 @@ int hv_synic_alloc(void)
>>   		tasklet_init(&hv_cpu->msg_dpc,
>>   			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
>>   
>> -		hv_cpu->synic_message_page =
>> -			(void *)get_zeroed_page(GFP_ATOMIC);
>> -		if (hv_cpu->synic_message_page == NULL) {
>> -			pr_err("Unable to allocate SYNIC message page\n");
>> -			goto err;
>> -		}
>> +		/*
>> +		 * Synic message and event pages are allocated by the paravisor.
>> +		 * Skip allocating these pages here.
>> +		 */
>> +		if (!hv_isolation_type_snp()) {
>> +			hv_cpu->synic_message_page =
>> +				(void *)get_zeroed_page(GFP_ATOMIC);
>> +			if (hv_cpu->synic_message_page == NULL) {
>> +				pr_err("Unable to allocate SYNIC message page\n");
>> +				goto err;
>> +			}
>>   
>> -		hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
>> -		if (hv_cpu->synic_event_page == NULL) {
>> -			pr_err("Unable to allocate SYNIC event page\n");
>> -			goto err;
>> +			hv_cpu->synic_event_page =
>> +				(void *)get_zeroed_page(GFP_ATOMIC);
>> +			if (hv_cpu->synic_event_page == NULL) {
>> +				pr_err("Unable to allocate SYNIC event page\n");
>> +				goto err;
>> +			}
>>   		}
>>   
>>   		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
>> @@ -136,10 +143,17 @@ void hv_synic_free(void)
>>   	for_each_present_cpu(cpu) {
>>   		struct hv_per_cpu_context *hv_cpu
>>   			= per_cpu_ptr(hv_context.cpu_context, cpu);
>> +		free_page((unsigned long)hv_cpu->post_msg_page);
>> +
>> +		/*
>> +		 * Synic message and event pages are allocated by the paravisor.
>> +		 * Skip freeing these pages here.
>> +		 */
>> +		if (hv_isolation_type_snp())
>> +			continue;
>>   
>>   		free_page((unsigned long)hv_cpu->synic_event_page);
>>   		free_page((unsigned long)hv_cpu->synic_message_page);
>> -		free_page((unsigned long)hv_cpu->post_msg_page);
>>   	}
>>   
>>   	kfree(hv_context.hv_numa_map);
>> @@ -161,35 +175,72 @@ void hv_synic_enable_regs(unsigned int cpu)
>>   	union hv_synic_sint shared_sint;
>>   	union hv_synic_scontrol sctrl;
>>   
>> -	/* Setup the Synic's message page */
>> -	hv_get_simp(simp.as_uint64);
>> -	simp.simp_enabled = 1;
>> -	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>> -		>> HV_HYP_PAGE_SHIFT;
>> -
>> -	hv_set_simp(simp.as_uint64);
>> -
>> -	/* Setup the Synic's event page */
>> -	hv_get_siefp(siefp.as_uint64);
>> -	siefp.siefp_enabled = 1;
>> -	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>> -		>> HV_HYP_PAGE_SHIFT;
>> -
>> -	hv_set_siefp(siefp.as_uint64);
>> -
>> -	/* Setup the shared SINT. */
>> -	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>> -
>> -	shared_sint.vector = hv_get_vector();
>> -	shared_sint.masked = false;
>> -	shared_sint.auto_eoi = hv_recommend_using_aeoi();
>> -	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>> -
>> -	/* Enable the global synic bit */
>> -	hv_get_synic_state(sctrl.as_uint64);
>> -	sctrl.enable = 1;
>> -
>> -	hv_set_synic_state(sctrl.as_uint64);
>> +	/*
>> +	 * Setup Synic pages for CVM. Synic message and event page
>> +	 * are allocated by paravisor in the SNP CVM.
>> +	 */
>> +	if (hv_isolation_type_snp()) {
>> +		/* Setup the Synic's message. */
>> +		hv_get_simp_ghcb(&simp.as_uint64);
>> +		simp.simp_enabled = 1;
>> +		hv_cpu->synic_message_page
>> +			= ioremap_cache(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
>> +					PAGE_SIZE);
>> +		if (!hv_cpu->synic_message_page)
>> +			pr_warn("Failed to map synic message page.\n");
>> +
>> +		hv_set_simp_ghcb(simp.as_uint64);
>> +
>> +		/* Setup the Synic's event page */
>> +		hv_get_siefp_ghcb(&siefp.as_uint64);
>> +		siefp.siefp_enabled = 1;
>> +		hv_cpu->synic_event_page = ioremap_cache(
>> +			 siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT, PAGE_SIZE);
>> +		if (!hv_cpu->synic_event_page)
>> +			pr_warn("Failed to map synic event page.\n");
>> +		hv_set_siefp_ghcb(siefp.as_uint64);
>> +
>> +		/* Setup the shared SINT. */
>> +		hv_get_synint_state_ghcb(VMBUS_MESSAGE_SINT,
>> +					 &shared_sint.as_uint64);
>> +		shared_sint.vector = hv_get_vector();
>> +		shared_sint.masked = false;
>> +		shared_sint.auto_eoi = hv_recommend_using_aeoi();
>> +		hv_set_synint_state_ghcb(VMBUS_MESSAGE_SINT,
>> +					 shared_sint.as_uint64);
>> +
>> +		/* Enable the global synic bit */
>> +		hv_get_synic_state_ghcb(&sctrl.as_uint64);
>> +		sctrl.enable = 1;
>> +		hv_set_synic_state_ghcb(sctrl.as_uint64);
>> +	} else {
>> +		/* Setup the Synic's message. */
>> +		hv_get_simp(simp.as_uint64);
>> +		simp.simp_enabled = 1;
>> +		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>> +			>> HV_HYP_PAGE_SHIFT;
>> +		hv_set_simp(simp.as_uint64);
>> +
>> +		/* Setup the Synic's event page */
>> +		hv_get_siefp(siefp.as_uint64);
>> +		siefp.siefp_enabled = 1;
>> +		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>> +			>> HV_HYP_PAGE_SHIFT;
>> +		hv_set_siefp(siefp.as_uint64);
>> +
>> +		/* Setup the shared SINT. */
>> +		hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>> +
>> +		shared_sint.vector = hv_get_vector();
>> +		shared_sint.masked = false;
>> +		shared_sint.auto_eoi = hv_recommend_using_aeoi();
>> +		hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>> +
>> +		/* Enable the global synic bit */
>> +		hv_get_synic_state(sctrl.as_uint64);
>> +		sctrl.enable = 1;
>> +		hv_set_synic_state(sctrl.as_uint64);
> 
> There's definitely some room for unification here. E.g. the part after
> 'Setup the shared SINT' looks identical, you can move it outside of the
> if/else block.

Yes, will rework it. Thanks.


> 
>> +	}
>>   }
>>   
>>   int hv_synic_init(unsigned int cpu)
>> @@ -211,30 +262,53 @@ void hv_synic_disable_regs(unsigned int cpu)
>>   	union hv_synic_siefp siefp;
>>   	union hv_synic_scontrol sctrl;
>>   
>> -	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>> +	if (hv_isolation_type_snp()) {
>> +		hv_get_synint_state_ghcb(VMBUS_MESSAGE_SINT,
>> +					 &shared_sint.as_uint64);
>> +		shared_sint.masked = 1;
>> +		hv_set_synint_state_ghcb(VMBUS_MESSAGE_SINT,
>> +					 shared_sint.as_uint64);
>> +
>> +		hv_get_simp_ghcb(&simp.as_uint64);
>> +		simp.simp_enabled = 0;
>> +		simp.base_simp_gpa = 0;
>> +		hv_set_simp_ghcb(simp.as_uint64);
>> +
>> +		hv_get_siefp_ghcb(&siefp.as_uint64);
>> +		siefp.siefp_enabled = 0;
>> +		siefp.base_siefp_gpa = 0;
>> +		hv_set_siefp_ghcb(siefp.as_uint64);
>>   
>> -	shared_sint.masked = 1;
>> +		/* Disable the global synic bit */
>> +		hv_get_synic_state_ghcb(&sctrl.as_uint64);
>> +		sctrl.enable = 0;
>> +		hv_set_synic_state_ghcb(sctrl.as_uint64);
>> +	} else {
>> +		hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>>   
>> -	/* Need to correctly cleanup in the case of SMP!!! */
>> -	/* Disable the interrupt */
>> -	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>> +		shared_sint.masked = 1;
>>   
>> -	hv_get_simp(simp.as_uint64);
>> -	simp.simp_enabled = 0;
>> -	simp.base_simp_gpa = 0;
>> +		/* Need to correctly cleanup in the case of SMP!!! */
>> +		/* Disable the interrupt */
>> +		hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
>>   
>> -	hv_set_simp(simp.as_uint64);
>> +		hv_get_simp(simp.as_uint64);
>> +		simp.simp_enabled = 0;
>> +		simp.base_simp_gpa = 0;
>>   
>> -	hv_get_siefp(siefp.as_uint64);
>> -	siefp.siefp_enabled = 0;
>> -	siefp.base_siefp_gpa = 0;
>> +		hv_set_simp(simp.as_uint64);
>>   
>> -	hv_set_siefp(siefp.as_uint64);
>> +		hv_get_siefp(siefp.as_uint64);
>> +		siefp.siefp_enabled = 0;
>> +		siefp.base_siefp_gpa = 0;
>>   
>> -	/* Disable the global synic bit */
>> -	hv_get_synic_state(sctrl.as_uint64);
>> -	sctrl.enable = 0;
>> -	hv_set_synic_state(sctrl.as_uint64);
>> +		hv_set_siefp(siefp.as_uint64);
>> +
>> +		/* Disable the global synic bit */
>> +		hv_get_synic_state(sctrl.as_uint64);
>> +		sctrl.enable = 0;
>> +		hv_set_synic_state(sctrl.as_uint64);
>> +	}
>>   }
>>   
>>   int hv_synic_cleanup(unsigned int cpu)
>> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
>> index ad0e33776668..6727f4073b5a 100644
>> --- a/include/asm-generic/mshyperv.h
>> +++ b/include/asm-generic/mshyperv.h
>> @@ -23,6 +23,7 @@
>>   #include <linux/bitops.h>
>>   #include <linux/cpumask.h>
>>   #include <asm/ptrace.h>
>> +#include <asm/mshyperv.h>
>>   #include <asm/hyperv-tlfs.h>
>>   
>>   struct ms_hyperv_info {
>> @@ -52,7 +53,7 @@ extern struct ms_hyperv_info ms_hyperv;
>>   
>>   extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
>>   extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
>> -
>> +extern bool hv_isolation_type_snp(void);
>>   
>>   /* Generate the guest OS identifier as described in the Hyper-V TLFS */
>>   static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
>> @@ -100,7 +101,11 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
>>   		 * possibly deliver another msg from the
>>   		 * hypervisor
>>   		 */
>> -		hv_signal_eom();
>> +		if (hv_isolation_type_snp() &&
>> +		    old_msg_type != HVMSG_TIMER_EXPIRED)
>> +			hv_signal_eom_ghcb();
>> +		else
>> +			hv_signal_eom();
> 
> Would it be better to hide SNP specifics into hv_signal_eom()? Also, out
> of pure curiosity, why are timer messages special?
> 
>>   	}
>>   }
>>   
>> @@ -186,6 +191,7 @@ bool hv_is_hyperv_initialized(void);
>>   bool hv_is_hibernation_supported(void);
>>   enum hv_isolation_type hv_get_isolation_type(void);
>>   bool hv_is_isolation_supported(void);
>> +bool hv_isolation_type_snp(void);
>>   void hyperv_cleanup(void);
>>   #else /* CONFIG_HYPERV */
>>   static inline bool hv_is_hyperv_initialized(void) { return false; }
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM
  2021-03-03 17:21   ` Vitaly Kuznetsov
@ 2021-03-05 15:21     ` Tianyu Lan
  0 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2021-03-05 15:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, thomas.lendacky,
	brijesh.singh, sunilmut, kys, haiyangz, sthemmin, wei.liu, tglx,
	mingo, bp, x86, hpa



On 3/4/2021 1:21 AM, Vitaly Kuznetsov wrote:
> Tianyu Lan <ltykernel@gmail.com> writes:
> 
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Hyper-V provides a GHCB-based hypercall to handle the VMBus
>> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE messages in
>> SNP Isolation VMs. Add such support.
>>
>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> ---
>>   arch/x86/hyperv/ivm.c           | 69 +++++++++++++++++++++++++++++++++
>>   arch/x86/include/asm/mshyperv.h |  1 +
>>   drivers/hv/connection.c         |  6 ++-
>>   drivers/hv/hv.c                 |  8 +++-
>>   4 files changed, 82 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
>> index 4332bf7aaf9b..feaabcd151f5 100644
>> --- a/arch/x86/hyperv/ivm.c
>> +++ b/arch/x86/hyperv/ivm.c
>> @@ -14,8 +14,77 @@
>>   
>>   union hv_ghcb {
>>   	struct ghcb ghcb;
>> +	struct {
>> +		u64 hypercalldata[509];
>> +		u64 outputgpa;
>> +		union {
>> +			union {
>> +				struct {
>> +					u32 callcode        : 16;
>> +					u32 isfast          : 1;
>> +					u32 reserved1       : 14;
>> +					u32 isnested        : 1;
>> +					u32 countofelements : 12;
>> +					u32 reserved2       : 4;
>> +					u32 repstartindex   : 12;
>> +					u32 reserved3       : 4;
>> +				};
>> +				u64 asuint64;
>> +			} hypercallinput;
>> +			union {
>> +				struct {
>> +					u16 callstatus;
>> +					u16 reserved1;
>> +					u32 elementsprocessed : 12;
>> +					u32 reserved2         : 20;
>> +				};
>> +				u64 asuint64;
>> +			} hypercalloutput;
>> +		};
>> +		u64 reserved2;
>> +	} hypercall;
>>   } __packed __aligned(PAGE_SIZE);
>>   
>> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
>> +{
>> +	union hv_ghcb *hv_ghcb;
>> +	void **ghcb_base;
>> +	unsigned long flags;
>> +
>> +	if (!ms_hyperv.ghcb_base)
>> +		return -EFAULT;
>> +
>> +	local_irq_save(flags);
>> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
>> +	if (!hv_ghcb) {
>> +		local_irq_restore(flags);
>> +		return -EFAULT;
>> +	}
>> +
>> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
>> +	hv_ghcb->ghcb.protocol_version = 1;
>> +	hv_ghcb->ghcb.ghcb_usage = 1;
>> +
>> +	hv_ghcb->hypercall.outputgpa = (u64)output;
>> +	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
>> +	hv_ghcb->hypercall.hypercallinput.callcode = control;
>> +
>> +	if (input_size)
>> +		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
>> +
>> +	VMGEXIT();
>> +
>> +	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
>> +	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
>> +	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));
>> +
>> +	local_irq_restore(flags);
>> +
>> +	return hv_ghcb->hypercall.hypercalloutput.callstatus;
>> +}
>> +EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
>> +
>>   void hv_ghcb_msr_write(u64 msr, u64 value)
>>   {
>>   	union hv_ghcb *hv_ghcb;
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index f624d72b99d3..c8f66d269e5b 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -274,6 +274,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>>   void hv_signal_eom_ghcb(void);
>>   void hv_ghcb_msr_write(u64 msr, u64 value);
>>   void hv_ghcb_msr_read(u64 msr, u64 *value);
>> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
>>   
>>   #define hv_get_synint_state_ghcb(int_num, val)			\
>>   	hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index c83612cddb99..79bca653dce9 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
>>   
>>   	++channel->sig_events;
>>   
>> -	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
>> +	if (hv_isolation_type_snp())
>> +		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
>> +				NULL, sizeof(u64));
>> +	else
>> +		hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> 
> vmbus_set_event() is a hot path, so I'd suggest we introduce a static
> branch instead of checking hv_isolation_type_snp() every time.
> 

Good suggestion. Will add it in the next version. Thanks.



end of thread

Thread overview: 26+ messages
2021-02-28 15:03 [RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl() Tianyu Lan
2021-03-03 16:27   ` Vitaly Kuznetsov
2021-03-05  6:06     ` Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support Tianyu Lan
2021-03-03 16:58   ` Vitaly Kuznetsov
2021-03-05  6:23     ` Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary Tianyu Lan
2021-03-03 17:05   ` Vitaly Kuznetsov
2021-02-28 15:03 ` [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb Tianyu Lan
2021-03-03 17:16   ` Vitaly Kuznetsov
2021-03-05  6:37     ` Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM Tianyu Lan
2021-03-03 17:21   ` Vitaly Kuznetsov
2021-03-05 15:21     ` Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 6/12] HV/Vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 7/12] hv/vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 8/12] x86/Hyper-V: Initialize bounce buffer page cache and list Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 9/12] x86/Hyper-V: Add new parameter for vmbus_sendpacket_pagebuffer()/mpb_desc() Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 10/12] HV: Add bounce buffer support for Isolation VM Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
2021-02-28 15:03 ` [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc Tianyu Lan
2021-03-01  6:54   ` Christoph Hellwig
2021-03-01 13:43     ` Tianyu Lan
2021-03-01 19:45       ` [EXTERNAL] " Sunil Muthuswamy
2021-03-02 16:03         ` Tianyu Lan
