Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
* [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K
@ 2020-07-21  1:41 Boqun Feng
  2020-07-21  1:41 ` [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl Boqun Feng
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

This patchset add the necessary changes to support guests whose page
size is larger than 4K.

Hyper-V always uses 4K as the page size and expects the same page size
when communicating with guests. That is, all the "pfn"s in the
hypervisor-guest communication protocol are the page numbers in the unit
of HV_HYP_PAGE_SIZE rather than PAGE_SIZE. To support guests with larger
page size, we need to convert between these two page sizes correctly in
the hypervisor-guest communication, which is basically what this
patchset does.

In this conversion, one challenge is how to handle the ringbuffer. A
ringbuffer has two parts: a header and a data part, both of which want
to be PAGE_SIZE aligned in the guest, because we use the "double
mapping" trick to map the data part twice in the guest virtual address
space for faster wrap-around and ease to process data in place. However,
the Hyper-V hypervisor always treats the ringbuffer headers as 4k pages.
To overcome this gap, we enlarge the hv_ring_buffer structure to be
always PAGE_SIZE aligned, and introduce the gpadl type concept to allow
vmbus_establish_gpadl() to handle ringbuffer cases specially. Note that
gpadl type is only meaningful to the guest, there is no such concept in
Hyper-V hypervisor.

This patchset consists of 11 patches:

Patch 1~4: Introduce the types of gpadl, so that we can handle
	   ringbuffer when PAGE_SIZE != HV_HYP_PAGE_SIZE, and also fix
	   a few places where we should use HV_HYP_PAGE_SIZE other than
	   PAGE_SIZE.

Patch 5~6: Add a few helper functions to help calculate the hvpfn (page
	   number in the unit of HV_HYP_PAGE_SIZE) and other related
	   data. So that we can use them in the code of drivers.

Patch 7~11: Use the helpers and change the driver code accordingly to
	    make net/input/util/storage driver work with PAGE_SIZE !=
	    HV_HYP_PAGE_SIZE

I've done some tests with PAGE_SIZE=64k and PAGE_SIZE=16k configurations
on ARM64 guests (with Michael's patchset[1] for ARM64 Hyper-V guest
support), nothing major breaks yet ;-) (I could observe an error caused
by unaligned firmware data, but it's better to have it fixed in the
Hyper-V).

Looking forwards to comments and suggestions!

Regards,
Boqun

[1]: https://lore.kernel.org/lkml/1584200119-18594-1-git-send-email-mikelley@microsoft.com/

Boqun Feng (11):
  Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl
  Drivers: hv: vmbus: Move __vmbus_open()
  Drivers: hv: vmbus: Introduce types of GPADL
  Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs()
  Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header
  hv: hyperv.h: Introduce some hvpfn helper functions
  hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication
  Input: hyperv-keyboard: Make ringbuffer at least take two pages
  HID: hyperv: Make ringbuffer at least take two pages
  Driver: hv: util: Make ringbuffer at least take two pages
  scsi: storvsc: Support PAGE_SIZE larger than 4K

 drivers/hid/hid-hyperv.c              |   4 +-
 drivers/hv/channel.c                  | 452 +++++++++++++++-----------
 drivers/hv/hv.c                       |   4 +-
 drivers/hv/hv_util.c                  |  16 +-
 drivers/input/serio/hyperv-keyboard.c |   4 +-
 drivers/net/hyperv/netvsc.c           |   2 +-
 drivers/net/hyperv/netvsc_drv.c       |  46 +--
 drivers/net/hyperv/rndis_filter.c     |  12 +-
 drivers/scsi/storvsc_drv.c            |  27 +-
 include/linux/hyperv.h                |  63 +++-
 10 files changed, 400 insertions(+), 230 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21 15:22   ` Wei Liu
  2020-07-21  1:41 ` [RFC 02/11] Drivers: hv: vmbus: Move __vmbus_open() Boqun Feng
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

Since the hypervisor always uses 4K as its page size, the size of PFNs
used for gpadl should be HV_HYP_PAGE_SIZE rather than PAGE_SIZE, so
adjust this accordingly as the preparation for supporting 16K/64K page
size guests.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hv/channel.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 23f358cb7f49..b4378d9ae6ca 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -21,9 +21,6 @@
 
 #include "hyperv_vmbus.h"
 
-#define NUM_PAGES_SPANNED(addr, len) \
-((PAGE_ALIGN(addr + len) >> PAGE_SHIFT) - (addr >> PAGE_SHIFT))
-
 static unsigned long virt_to_hvpfn(void *addr)
 {
 	phys_addr_t paddr;
@@ -34,7 +31,7 @@ static unsigned long virt_to_hvpfn(void *addr)
 	else
 		paddr = __pa(addr);
 
-	return  paddr >> PAGE_SHIFT;
+	return  paddr >> HV_HYP_PAGE_SHIFT;
 }
 
 /*
@@ -305,7 +302,7 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 
 	int pfnsum, pfncount, pfnleft, pfncurr, pfnsize;
 
-	pagecount = size >> PAGE_SHIFT;
+	pagecount = size >> HV_HYP_PAGE_SHIFT;
 
 	/* do we need a gpadl body msg */
 	pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
@@ -335,7 +332,7 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 		gpadl_header->range[0].byte_count = size;
 		for (i = 0; i < pfncount; i++)
 			gpadl_header->range[0].pfn_array[i] = virt_to_hvpfn(
-				kbuffer + PAGE_SIZE * i);
+				kbuffer + HV_HYP_PAGE_SIZE * i);
 		*msginfo = msgheader;
 
 		pfnsum = pfncount;
@@ -387,7 +384,7 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 			 */
 			for (i = 0; i < pfncurr; i++)
 				gpadl_body->pfn[i] = virt_to_hvpfn(
-					kbuffer + PAGE_SIZE * (pfnsum + i));
+					kbuffer + HV_HYP_PAGE_SIZE * (pfnsum + i));
 
 			/* add to msg header */
 			list_add_tail(&msgbody->msglistentry,
@@ -416,7 +413,7 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 		gpadl_header->range[0].byte_count = size;
 		for (i = 0; i < pagecount; i++)
 			gpadl_header->range[0].pfn_array[i] = virt_to_hvpfn(
-				kbuffer + PAGE_SIZE * i);
+				kbuffer + HV_HYP_PAGE_SIZE * i);
 
 		*msginfo = msgheader;
 	}
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 02/11] Drivers: hv: vmbus: Move __vmbus_open()
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
  2020-07-21  1:41 ` [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21 15:23   ` Wei Liu
  2020-07-21  1:41 ` [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL Boqun Feng
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

Pure function movement, no functional changes. The move is made, because
in a later change, __vmbus_open() will rely on some static functions
afterwards, so we sperate the move and the modification of
__vmbus_open() in two patches to make it easy to review.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hv/channel.c | 316 +++++++++++++++++++++----------------------
 1 file changed, 158 insertions(+), 158 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index b4378d9ae6ca..3d9d5d2c7fd1 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -108,164 +108,6 @@ int vmbus_alloc_ring(struct vmbus_channel *newchannel,
 }
 EXPORT_SYMBOL_GPL(vmbus_alloc_ring);
 
-static int __vmbus_open(struct vmbus_channel *newchannel,
-		       void *userdata, u32 userdatalen,
-		       void (*onchannelcallback)(void *context), void *context)
-{
-	struct vmbus_channel_open_channel *open_msg;
-	struct vmbus_channel_msginfo *open_info = NULL;
-	struct page *page = newchannel->ringbuffer_page;
-	u32 send_pages, recv_pages;
-	unsigned long flags;
-	int err;
-
-	if (userdatalen > MAX_USER_DEFINED_BYTES)
-		return -EINVAL;
-
-	send_pages = newchannel->ringbuffer_send_offset;
-	recv_pages = newchannel->ringbuffer_pagecount - send_pages;
-
-	spin_lock_irqsave(&newchannel->lock, flags);
-	if (newchannel->state != CHANNEL_OPEN_STATE) {
-		spin_unlock_irqrestore(&newchannel->lock, flags);
-		return -EINVAL;
-	}
-	spin_unlock_irqrestore(&newchannel->lock, flags);
-
-	newchannel->state = CHANNEL_OPENING_STATE;
-	newchannel->onchannel_callback = onchannelcallback;
-	newchannel->channel_callback_context = context;
-
-	err = hv_ringbuffer_init(&newchannel->outbound, page, send_pages);
-	if (err)
-		goto error_clean_ring;
-
-	err = hv_ringbuffer_init(&newchannel->inbound,
-				 &page[send_pages], recv_pages);
-	if (err)
-		goto error_clean_ring;
-
-	/* Establish the gpadl for the ring buffer */
-	newchannel->ringbuffer_gpadlhandle = 0;
-
-	err = vmbus_establish_gpadl(newchannel,
-				    page_address(newchannel->ringbuffer_page),
-				    (send_pages + recv_pages) << PAGE_SHIFT,
-				    &newchannel->ringbuffer_gpadlhandle);
-	if (err)
-		goto error_clean_ring;
-
-	/* Create and init the channel open message */
-	open_info = kmalloc(sizeof(*open_info) +
-			   sizeof(struct vmbus_channel_open_channel),
-			   GFP_KERNEL);
-	if (!open_info) {
-		err = -ENOMEM;
-		goto error_free_gpadl;
-	}
-
-	init_completion(&open_info->waitevent);
-	open_info->waiting_channel = newchannel;
-
-	open_msg = (struct vmbus_channel_open_channel *)open_info->msg;
-	open_msg->header.msgtype = CHANNELMSG_OPENCHANNEL;
-	open_msg->openid = newchannel->offermsg.child_relid;
-	open_msg->child_relid = newchannel->offermsg.child_relid;
-	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
-	open_msg->downstream_ringbuffer_pageoffset = newchannel->ringbuffer_send_offset;
-	open_msg->target_vp = newchannel->target_vp;
-
-	if (userdatalen)
-		memcpy(open_msg->userdata, userdata, userdatalen);
-
-	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
-	list_add_tail(&open_info->msglistentry,
-		      &vmbus_connection.chn_msg_list);
-	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
-
-	if (newchannel->rescind) {
-		err = -ENODEV;
-		goto error_free_info;
-	}
-
-	err = vmbus_post_msg(open_msg,
-			     sizeof(struct vmbus_channel_open_channel), true);
-
-	trace_vmbus_open(open_msg, err);
-
-	if (err != 0)
-		goto error_clean_msglist;
-
-	wait_for_completion(&open_info->waitevent);
-
-	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
-	list_del(&open_info->msglistentry);
-	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
-
-	if (newchannel->rescind) {
-		err = -ENODEV;
-		goto error_free_info;
-	}
-
-	if (open_info->response.open_result.status) {
-		err = -EAGAIN;
-		goto error_free_info;
-	}
-
-	newchannel->state = CHANNEL_OPENED_STATE;
-	kfree(open_info);
-	return 0;
-
-error_clean_msglist:
-	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
-	list_del(&open_info->msglistentry);
-	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
-error_free_info:
-	kfree(open_info);
-error_free_gpadl:
-	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
-	newchannel->ringbuffer_gpadlhandle = 0;
-error_clean_ring:
-	hv_ringbuffer_cleanup(&newchannel->outbound);
-	hv_ringbuffer_cleanup(&newchannel->inbound);
-	newchannel->state = CHANNEL_OPEN_STATE;
-	return err;
-}
-
-/*
- * vmbus_connect_ring - Open the channel but reuse ring buffer
- */
-int vmbus_connect_ring(struct vmbus_channel *newchannel,
-		       void (*onchannelcallback)(void *context), void *context)
-{
-	return  __vmbus_open(newchannel, NULL, 0, onchannelcallback, context);
-}
-EXPORT_SYMBOL_GPL(vmbus_connect_ring);
-
-/*
- * vmbus_open - Open the specified channel.
- */
-int vmbus_open(struct vmbus_channel *newchannel,
-	       u32 send_ringbuffer_size, u32 recv_ringbuffer_size,
-	       void *userdata, u32 userdatalen,
-	       void (*onchannelcallback)(void *context), void *context)
-{
-	int err;
-
-	err = vmbus_alloc_ring(newchannel, send_ringbuffer_size,
-			       recv_ringbuffer_size);
-	if (err)
-		return err;
-
-	err = __vmbus_open(newchannel, userdata, userdatalen,
-			   onchannelcallback, context);
-	if (err)
-		vmbus_free_ring(newchannel);
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(vmbus_open);
-
 /* Used for Hyper-V Socket: a guest client's connect() to the host */
 int vmbus_send_tl_connect_request(const guid_t *shv_guest_servie_id,
 				  const guid_t *shv_host_servie_id)
@@ -531,6 +373,164 @@ int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
 }
 EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
 
+static int __vmbus_open(struct vmbus_channel *newchannel,
+		       void *userdata, u32 userdatalen,
+		       void (*onchannelcallback)(void *context), void *context)
+{
+	struct vmbus_channel_open_channel *open_msg;
+	struct vmbus_channel_msginfo *open_info = NULL;
+	struct page *page = newchannel->ringbuffer_page;
+	u32 send_pages, recv_pages;
+	unsigned long flags;
+	int err;
+
+	if (userdatalen > MAX_USER_DEFINED_BYTES)
+		return -EINVAL;
+
+	send_pages = newchannel->ringbuffer_send_offset;
+	recv_pages = newchannel->ringbuffer_pagecount - send_pages;
+
+	spin_lock_irqsave(&newchannel->lock, flags);
+	if (newchannel->state != CHANNEL_OPEN_STATE) {
+		spin_unlock_irqrestore(&newchannel->lock, flags);
+		return -EINVAL;
+	}
+	spin_unlock_irqrestore(&newchannel->lock, flags);
+
+	newchannel->state = CHANNEL_OPENING_STATE;
+	newchannel->onchannel_callback = onchannelcallback;
+	newchannel->channel_callback_context = context;
+
+	err = hv_ringbuffer_init(&newchannel->outbound, page, send_pages);
+	if (err)
+		goto error_clean_ring;
+
+	err = hv_ringbuffer_init(&newchannel->inbound,
+				 &page[send_pages], recv_pages);
+	if (err)
+		goto error_clean_ring;
+
+	/* Establish the gpadl for the ring buffer */
+	newchannel->ringbuffer_gpadlhandle = 0;
+
+	err = vmbus_establish_gpadl(newchannel,
+				    page_address(newchannel->ringbuffer_page),
+				    (send_pages + recv_pages) << PAGE_SHIFT,
+				    &newchannel->ringbuffer_gpadlhandle);
+	if (err)
+		goto error_clean_ring;
+
+	/* Create and init the channel open message */
+	open_info = kmalloc(sizeof(*open_info) +
+			   sizeof(struct vmbus_channel_open_channel),
+			   GFP_KERNEL);
+	if (!open_info) {
+		err = -ENOMEM;
+		goto error_free_gpadl;
+	}
+
+	init_completion(&open_info->waitevent);
+	open_info->waiting_channel = newchannel;
+
+	open_msg = (struct vmbus_channel_open_channel *)open_info->msg;
+	open_msg->header.msgtype = CHANNELMSG_OPENCHANNEL;
+	open_msg->openid = newchannel->offermsg.child_relid;
+	open_msg->child_relid = newchannel->offermsg.child_relid;
+	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
+	open_msg->downstream_ringbuffer_pageoffset = newchannel->ringbuffer_send_offset;
+	open_msg->target_vp = newchannel->target_vp;
+
+	if (userdatalen)
+		memcpy(open_msg->userdata, userdata, userdatalen);
+
+	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
+	list_add_tail(&open_info->msglistentry,
+		      &vmbus_connection.chn_msg_list);
+	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
+
+	if (newchannel->rescind) {
+		err = -ENODEV;
+		goto error_free_info;
+	}
+
+	err = vmbus_post_msg(open_msg,
+			     sizeof(struct vmbus_channel_open_channel), true);
+
+	trace_vmbus_open(open_msg, err);
+
+	if (err != 0)
+		goto error_clean_msglist;
+
+	wait_for_completion(&open_info->waitevent);
+
+	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
+	list_del(&open_info->msglistentry);
+	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
+
+	if (newchannel->rescind) {
+		err = -ENODEV;
+		goto error_free_info;
+	}
+
+	if (open_info->response.open_result.status) {
+		err = -EAGAIN;
+		goto error_free_info;
+	}
+
+	newchannel->state = CHANNEL_OPENED_STATE;
+	kfree(open_info);
+	return 0;
+
+error_clean_msglist:
+	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
+	list_del(&open_info->msglistentry);
+	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
+error_free_info:
+	kfree(open_info);
+error_free_gpadl:
+	vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
+	newchannel->ringbuffer_gpadlhandle = 0;
+error_clean_ring:
+	hv_ringbuffer_cleanup(&newchannel->outbound);
+	hv_ringbuffer_cleanup(&newchannel->inbound);
+	newchannel->state = CHANNEL_OPEN_STATE;
+	return err;
+}
+
+/*
+ * vmbus_connect_ring - Open the channel but reuse ring buffer
+ */
+int vmbus_connect_ring(struct vmbus_channel *newchannel,
+		       void (*onchannelcallback)(void *context), void *context)
+{
+	return  __vmbus_open(newchannel, NULL, 0, onchannelcallback, context);
+}
+EXPORT_SYMBOL_GPL(vmbus_connect_ring);
+
+/*
+ * vmbus_open - Open the specified channel.
+ */
+int vmbus_open(struct vmbus_channel *newchannel,
+	       u32 send_ringbuffer_size, u32 recv_ringbuffer_size,
+	       void *userdata, u32 userdatalen,
+	       void (*onchannelcallback)(void *context), void *context)
+{
+	int err;
+
+	err = vmbus_alloc_ring(newchannel, send_ringbuffer_size,
+			       recv_ringbuffer_size);
+	if (err)
+		return err;
+
+	err = __vmbus_open(newchannel, userdata, userdatalen,
+			   onchannelcallback, context);
+	if (err)
+		vmbus_free_ring(newchannel);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(vmbus_open);
+
 /*
  * vmbus_teardown_gpadl -Teardown the specified GPADL handle
  */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
  2020-07-21  1:41 ` [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl Boqun Feng
  2020-07-21  1:41 ` [RFC 02/11] Drivers: hv: vmbus: Move __vmbus_open() Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-22 23:25   ` Michael Kelley
  2020-07-21  1:41 ` [RFC 04/11] Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs() Boqun Feng
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

This patch introduces two types of GPADL: HV_GPADL_{BUFFER, RING}. The
types of GPADL are purely the concept in the guest, IOW the hypervisor
treat them as the same.

The reason of introducing the types of GPADL is to support guests whose
page size is not 4k (the page size of Hyper-V hypervisor). In these
guests, both the headers and the data parts of the ringbuffers need to
be aligned to the PAGE_SIZE, because 1) some of the ringbuffers will be
mapped into userspace and 2) we use "double mapping" mechanism to
support fast wrap-around, and "double mapping" relies on ringbuffers
being page-aligned. However, the Hyper-V hypervisor only uses 4k
(HV_HYP_PAGE_SIZE) headers. Our solution to this is that we always make
the headers of ringbuffers take one guest page and when GPADL is
established between the guest and hypervisor, the only first 4k of
header is used. To handle this special case, we need the types of GPADL
to differ different guest memory usage for GPADL.

Type enum is introduced along with several general interfaces to
describe the differences between normal buffer GPADL and ringbuffer
GPADL.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hv/channel.c   | 140 +++++++++++++++++++++++++++++++++++------
 include/linux/hyperv.h |  44 ++++++++++++-
 2 files changed, 164 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 3d9d5d2c7fd1..13e98ac2a00f 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -34,6 +34,75 @@ static unsigned long virt_to_hvpfn(void *addr)
 	return  paddr >> HV_HYP_PAGE_SHIFT;
 }
 
+/*
+ * hv_gpadl_size - Return the real size of a gpadl, the size that Hyper-V uses
+ *
+ * For BUFFER gpadl, Hyper-V uses the exact same size as the guest does.
+ *
+ * For RING gpadl, in each ring, the guest uses one PAGE_SIZE as the header
+ * (because of the alignment requirement), however, the hypervisor only
+ * uses the first HV_HYP_PAGE_SIZE as the header, therefore leaving a
+ * (PAGE_SIZE - HV_HYP_PAGE_SIZE) gap. And since there are two rings in a
+ * ringbuffer, So the total size for a RING gpadl that Hyper-V uses is the
+ * total size that the guest uses minus twice of the gap size.
+ */
+static inline u32 hv_gpadl_size(enum hv_gpadl_type type, u32 size)
+{
+	switch (type) {
+	case HV_GPADL_BUFFER:
+		return size;
+	case HV_GPADL_RING:
+		/* The size of a ringbuffer must be page-aligned */
+		BUG_ON(size % PAGE_SIZE);
+		/*
+		 * Two things to notice here:
+		 * 1) We're processing two ring buffers as a unit
+		 * 2) We're skipping any space larger than HV_HYP_PAGE_SIZE in
+		 * the first guest-size page of each of the two ring buffers.
+		 * So we effectively subtract out two guest-size pages, and add
+		 * back two Hyper-V size pages.
+		 */
+		return size - 2 * (PAGE_SIZE - HV_HYP_PAGE_SIZE);
+	}
+	BUG();
+	return 0;
+}
+
+/*
+ * hv_gpadl_hvpfn - Return the Hyper-V page PFN of the @i th Hyper-V page in
+ *                  the gpadl
+ *
+ * @type: the type of the gpadl
+ * @kbuffer: the pointer to the gpadl in the guest
+ * @size: the total size (in bytes) of the gpadl
+ * @send_offset: the offset (in bytes) where the send ringbuffer starts
+ * @i: the index
+ */
+static inline u64 hv_gpadl_hvpfn(enum hv_gpadl_type type, void *kbuffer,
+				 u32 size, u32 send_offset, int i)
+{
+	int send_idx = (send_offset - (PAGE_SIZE - HV_HYP_PAGE_SIZE)) >> HV_HYP_PAGE_SHIFT;
+	unsigned long delta = 0UL;
+
+	switch (type) {
+	case HV_GPADL_BUFFER:
+		break;
+	case HV_GPADL_RING:
+		if (i == 0)
+			delta = 0;
+		else if (i <= send_idx)
+			delta = PAGE_SIZE - HV_HYP_PAGE_SIZE;
+		else
+			delta = 2 * (PAGE_SIZE - HV_HYP_PAGE_SIZE);
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	return virt_to_hvpfn(kbuffer + delta + (HV_HYP_PAGE_SIZE * i));
+}
+
 /*
  * vmbus_setevent- Trigger an event notification on the specified
  * channel.
@@ -131,7 +200,8 @@ EXPORT_SYMBOL_GPL(vmbus_send_tl_connect_request);
 /*
  * create_gpadl_header - Creates a gpadl for the specified buffer
  */
-static int create_gpadl_header(void *kbuffer, u32 size,
+static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
+			       u32 size, u32 send_offset,
 			       struct vmbus_channel_msginfo **msginfo)
 {
 	int i;
@@ -144,7 +214,7 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 
 	int pfnsum, pfncount, pfnleft, pfncurr, pfnsize;
 
-	pagecount = size >> HV_HYP_PAGE_SHIFT;
+	pagecount = hv_gpadl_size(type, size) >> HV_HYP_PAGE_SHIFT;
 
 	/* do we need a gpadl body msg */
 	pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
@@ -171,10 +241,10 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 		gpadl_header->range_buflen = sizeof(struct gpa_range) +
 					 pagecount * sizeof(u64);
 		gpadl_header->range[0].byte_offset = 0;
-		gpadl_header->range[0].byte_count = size;
+		gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
 		for (i = 0; i < pfncount; i++)
-			gpadl_header->range[0].pfn_array[i] = virt_to_hvpfn(
-				kbuffer + HV_HYP_PAGE_SIZE * i);
+			gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
+				type, kbuffer, size, send_offset, i);
 		*msginfo = msgheader;
 
 		pfnsum = pfncount;
@@ -225,8 +295,8 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 			 * so the hypervisor guarantees that this is ok.
 			 */
 			for (i = 0; i < pfncurr; i++)
-				gpadl_body->pfn[i] = virt_to_hvpfn(
-					kbuffer + HV_HYP_PAGE_SIZE * (pfnsum + i));
+				gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
+					kbuffer, size, send_offset, pfnsum + i);
 
 			/* add to msg header */
 			list_add_tail(&msgbody->msglistentry,
@@ -252,10 +322,10 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 		gpadl_header->range_buflen = sizeof(struct gpa_range) +
 					 pagecount * sizeof(u64);
 		gpadl_header->range[0].byte_offset = 0;
-		gpadl_header->range[0].byte_count = size;
+		gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
 		for (i = 0; i < pagecount; i++)
-			gpadl_header->range[0].pfn_array[i] = virt_to_hvpfn(
-				kbuffer + HV_HYP_PAGE_SIZE * i);
+			gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
+				type, kbuffer, size, send_offset, i);
 
 		*msginfo = msgheader;
 	}
@@ -268,15 +338,20 @@ static int create_gpadl_header(void *kbuffer, u32 size,
 }
 
 /*
- * vmbus_establish_gpadl - Establish a GPADL for the specified buffer
+ * __vmbus_establish_gpadl - Establish a GPADL for a buffer or ringbuffer
  *
  * @channel: a channel
+ * @type: the type of the corresponding GPADL, only meaningful for the guest.
  * @kbuffer: from kmalloc or vmalloc
  * @size: page-size multiple
+ * @send_offset: the offset (in bytes) where the send ring buffer starts,
+ * 		 should be 0 for BUFFER type gpadl
  * @gpadl_handle: some funky thing
  */
-int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
-			       u32 size, u32 *gpadl_handle)
+static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
+				   enum hv_gpadl_type type, void *kbuffer,
+				   u32 size, u32 send_offset,
+				   u32 *gpadl_handle)
 {
 	struct vmbus_channel_gpadl_header *gpadlmsg;
 	struct vmbus_channel_gpadl_body *gpadl_body;
@@ -290,7 +365,7 @@ int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
 	next_gpadl_handle =
 		(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
 
-	ret = create_gpadl_header(kbuffer, size, &msginfo);
+	ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);
 	if (ret)
 		return ret;
 
@@ -371,6 +446,21 @@ int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
 	kfree(msginfo);
 	return ret;
 }
+
+/*
+ * vmbus_establish_gpadl - Establish a GPADL for the specified buffer
+ *
+ * @channel: a channel
+ * @kbuffer: from kmalloc or vmalloc
+ * @size: page-size multiple
+ * @gpadl_handle: some funky thing
+ */
+int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
+			  u32 size, u32 *gpadl_handle)
+{
+	return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
+				       0U, gpadl_handle);
+}
 EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
 
 static int __vmbus_open(struct vmbus_channel *newchannel,
@@ -413,10 +503,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	/* Establish the gpadl for the ring buffer */
 	newchannel->ringbuffer_gpadlhandle = 0;
 
-	err = vmbus_establish_gpadl(newchannel,
-				    page_address(newchannel->ringbuffer_page),
-				    (send_pages + recv_pages) << PAGE_SHIFT,
-				    &newchannel->ringbuffer_gpadlhandle);
+	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
+				      page_address(newchannel->ringbuffer_page),
+				      (send_pages + recv_pages) << PAGE_SHIFT,
+				      newchannel->ringbuffer_send_offset << PAGE_SHIFT,
+				      &newchannel->ringbuffer_gpadlhandle);
 	if (err)
 		goto error_clean_ring;
 
@@ -437,7 +528,17 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	open_msg->openid = newchannel->offermsg.child_relid;
 	open_msg->child_relid = newchannel->offermsg.child_relid;
 	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
-	open_msg->downstream_ringbuffer_pageoffset = newchannel->ringbuffer_send_offset;
+	/*
+	 * The unit of ->downstream_ringbuffer_pageoffset is HV_HYP_PAGE and
+	 * the unit of ->ringbuffer_send_offset is PAGE, so here we first
+	 * calculate it into bytes and then convert into HV_HYP_PAGE. Also
+	 * ->ringbuffer_send_offset is the offset in guest, while
+	 * ->downstream_ringbuffer_pageoffset is the offset in gpadl (i.e. in
+	 * hypervisor), so a (PAGE_SIZE - HV_HYP_PAGE_SIZE) gap need to be
+	 * skipped.
+	 */
+	open_msg->downstream_ringbuffer_pageoffset =
+		((newchannel->ringbuffer_send_offset << PAGE_SHIFT) - (PAGE_SIZE - HV_HYP_PAGE_SIZE)) >> HV_HYP_PAGE_SHIFT;
 	open_msg->target_vp = newchannel->target_vp;
 
 	if (userdatalen)
@@ -497,6 +598,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 	return err;
 }
 
+
 /*
  * vmbus_connect_ring - Open the channel but reuse ring buffer
  */
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 692c89ccf5df..663f0a016237 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -29,6 +29,48 @@
 
 #pragma pack(push, 1)
 
+/*
+ * Types for GPADL, decides is how GPADL header is created.
+ *
+ * It doesn't make much difference between BUFFER and RING if PAGE_SIZE is the
+ * same as HV_HYP_PAGE_SIZE.
+ *
+ * If PAGE_SIZE is bigger than HV_HYP_PAGE_SIZE, the headers of ring buffers
+ * will be of PAGE_SIZE, however, only the first HV_HYP_PAGE will be put
+ * into gpadl, therefore the number for HV_HYP_PAGE and the indexes of each
+ * HV_HYP_PAGE will be different between different types of GPADL, for example
+ * if PAGE_SIZE is 64K:
+ *
+ * BUFFER:
+ *
+ * gva:    |--       64k      --|--       64k      --| ... |
+ * gpa:    | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k |
+ * index:  0    1    2     15   16   17   18 .. 31   32 ...
+ *         |    |    ...   |    |    |   ...    |   ...
+ *         v    V          V    V    V          V
+ * gpadl:  | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k | ... |
+ * index:  0    1    2 ... 15   16   17   18 .. 31   32 ...
+ *
+ * RING:
+ *
+ *         | header  |           data           | header  |     data      |
+ * gva:    |-- 64k --|--       64k      --| ... |-- 64k --|-- 64k --| ... |
+ * gpa:    | 4k | .. | 4k | 4k | ... | 4k | ... | 4k | .. | 4k | .. | ... |
+ * index:  0    1    16   17   18    31   ...   n   n+1  n+16 ...         2n
+ *         |         /    /          /          |         /               /
+ *         |        /    /          /           |        /               /
+ *         |       /    /   ...    /    ...     |       /      ...      /
+ *         |      /    /          /             |      /               /
+ *         |     /    /          /              |     /               /
+ *         V    V    V          V               V    V               v
+ * gpadl:  | 4k | 4k |   ...    |    ...        | 4k | 4k |  ...     |
+ * index:  0    1    2   ...    16   ...       n-15 n-14 n-13  ...  2n-30
+ */
+enum hv_gpadl_type {
+	HV_GPADL_BUFFER,
+	HV_GPADL_RING
+};
+
 /* Single-page buffer */
 struct hv_page_buffer {
 	u32 len;
@@ -111,7 +153,7 @@ struct hv_ring_buffer {
 	} feature_bits;
 
 	/* Pad it to PAGE_SIZE so that data starts on page boundary */
-	u8	reserved2[4028];
+	u8	reserved2[PAGE_SIZE - 68];
 
 	/*
 	 * Ring data starts here + RingDataStartOffset
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 04/11] Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs()
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (2 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21  1:41 ` [RFC 05/11] Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header Boqun Feng
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

Both the base_*_gpa should use the guest page number in Hyper-V page, so
use HV_HYP_PAGE instead of PAGE.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hv/hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 41b2ee06cc2f..e4f50dcdc46c 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -167,7 +167,7 @@ void hv_synic_enable_regs(unsigned int cpu)
 	hv_get_simp(simp.as_uint64);
 	simp.simp_enabled = 1;
 	simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
-		>> PAGE_SHIFT;
+		>> HV_HYP_PAGE_SHIFT;
 
 	hv_set_simp(simp.as_uint64);
 
@@ -175,7 +175,7 @@ void hv_synic_enable_regs(unsigned int cpu)
 	hv_get_siefp(siefp.as_uint64);
 	siefp.siefp_enabled = 1;
 	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
-		>> PAGE_SHIFT;
+		>> HV_HYP_PAGE_SHIFT;
 
 	hv_set_siefp(siefp.as_uint64);
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 05/11] Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (3 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 04/11] Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs() Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21  1:41 ` [RFC 06/11] hv: hyperv.h: Introduce some hvpfn helper functions Boqun Feng
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

There will be more places other than vmbus where we need to calculate
the Hyper-V page PFN from a virtual address, so move virt_to_hvpfn() to
hyperv generic header.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hv/channel.c   | 13 -------------
 include/linux/hyperv.h | 15 +++++++++++++++
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 13e98ac2a00f..ac7039fe2f29 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -21,19 +21,6 @@
 
 #include "hyperv_vmbus.h"
 
-static unsigned long virt_to_hvpfn(void *addr)
-{
-	phys_addr_t paddr;
-
-	if (is_vmalloc_addr(addr))
-		paddr = page_to_phys(vmalloc_to_page(addr)) +
-					 offset_in_page(addr);
-	else
-		paddr = __pa(addr);
-
-	return  paddr >> HV_HYP_PAGE_SHIFT;
-}
-
 /*
  * hv_gpadl_size - Return the real size of a gpadl, the size that Hyper-V uses
  *
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 663f0a016237..eda8e5f9a49d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -23,6 +23,8 @@
 #include <linux/mod_devicetable.h>
 #include <linux/interrupt.h>
 #include <linux/reciprocal_div.h>
+#include <asm/memory.h>
+#include <asm/hyperv-tlfs.h>
 
 #define MAX_PAGE_BUFFER_COUNT				32
 #define MAX_MULTIPAGE_BUFFER_COUNT			32 /* 128K */
@@ -1688,4 +1690,17 @@ struct hyperv_pci_block_ops {
 
 extern struct hyperv_pci_block_ops hvpci_block_ops;
 
+static inline unsigned long virt_to_hvpfn(void *addr)
+{
+	phys_addr_t paddr;
+
+	if (is_vmalloc_addr(addr))
+		paddr = page_to_phys(vmalloc_to_page(addr)) +
+				     offset_in_page(addr);
+	else
+		paddr = __pa(addr);
+
+	return  paddr >> HV_HYP_PAGE_SHIFT;
+}
+
 #endif /* _HYPERV_H */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 06/11] hv: hyperv.h: Introduce some hvpfn helper functions
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (4 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 05/11] Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21  1:41 ` [RFC 07/11] hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication Boqun Feng
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

When a guest communicate with the hypervisor, it must use HV_HYP_PAGE to
calculate PFN, so introduce a few hvpfn helper functions as the
counterpart of the page helper functions. This is the preparation for
supporting guest whose PAGE_SIZE is not 4k.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/hyperv.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index eda8e5f9a49d..b34d94b1f659 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1703,4 +1703,8 @@ static inline unsigned long virt_to_hvpfn(void *addr)
 	return  paddr >> HV_HYP_PAGE_SHIFT;
 }
 
+#define offset_in_hvpage(ptr)	((unsigned long)(ptr) & ~HV_HYP_PAGE_MASK)
+#define HVPFN_UP(x)	(((x) + HV_HYP_PAGE_SIZE-1) >> HV_HYP_PAGE_SHIFT)
+#define page_to_hvpfn(page)	((page_to_pfn(page) << PAGE_SHIFT) >> HV_HYP_PAGE_SHIFT)
+
 #endif /* _HYPERV_H */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 07/11] hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (5 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 06/11] hv: hyperv.h: Introduce some hvpfn helper functions Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21  1:41 ` [RFC 08/11] Input: hyperv-keyboard: Make ringbuffer at least take two pages Boqun Feng
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

When communicating with Hyper-V, HV_HYP_PAGE_SIZE should be used since
that's the page size used by Hyper-V and Hyper-V expects all
page-related data using the unit of HY_HYP_PAGE_SIZE, for example, the
"pfn" in hv_page_buffer is actually the HV_HYP_PAGE (i.e. the Hyper-V
page) number.

In order to support guest whose page size is not 4k, we need to make
hv_netvsc always use HV_HYP_PAGE_SIZE for Hyper-V communication.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/net/hyperv/netvsc.c       |  2 +-
 drivers/net/hyperv/netvsc_drv.c   | 46 +++++++++++++++----------------
 drivers/net/hyperv/rndis_filter.c | 12 ++++----
 3 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index ca68aa1df801..02877b560e4d 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -791,7 +791,7 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 	}
 
 	for (i = 0; i < page_count; i++) {
-		char *src = phys_to_virt(pb[i].pfn << PAGE_SHIFT);
+		char *src = phys_to_virt(pb[i].pfn << HV_HYP_PAGE_SHIFT);
 		u32 offset = pb[i].offset;
 		u32 len = pb[i].len;
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index ebcfbae05690..9d9cbaed5441 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -373,32 +373,29 @@ static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
 	return txq;
 }
 
-static u32 fill_pg_buf(struct page *page, u32 offset, u32 len,
+static u32 fill_pg_buf(unsigned long hvpfn, u32 offset, u32 len,
 		       struct hv_page_buffer *pb)
 {
 	int j = 0;
 
-	/* Deal with compound pages by ignoring unused part
-	 * of the page.
-	 */
-	page += (offset >> PAGE_SHIFT);
-	offset &= ~PAGE_MASK;
+	hvpfn += offset >> HV_HYP_PAGE_SHIFT;
+	offset = offset & ~HV_HYP_PAGE_MASK;
 
 	while (len > 0) {
 		unsigned long bytes;
 
-		bytes = PAGE_SIZE - offset;
+		bytes = HV_HYP_PAGE_SIZE - offset;
 		if (bytes > len)
 			bytes = len;
-		pb[j].pfn = page_to_pfn(page);
+		pb[j].pfn = hvpfn;
 		pb[j].offset = offset;
 		pb[j].len = bytes;
 
 		offset += bytes;
 		len -= bytes;
 
-		if (offset == PAGE_SIZE && len) {
-			page++;
+		if (offset == HV_HYP_PAGE_SIZE && len) {
+			hvpfn++;
 			offset = 0;
 			j++;
 		}
@@ -421,23 +418,26 @@ static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
 	 * 2. skb linear data
 	 * 3. skb fragment data
 	 */
-	slots_used += fill_pg_buf(virt_to_page(hdr),
-				  offset_in_page(hdr),
-				  len, &pb[slots_used]);
+	slots_used += fill_pg_buf(virt_to_hvpfn(hdr),
+				  offset_in_hvpage(hdr),
+				  len,
+				  &pb[slots_used]);
 
 	packet->rmsg_size = len;
 	packet->rmsg_pgcnt = slots_used;
 
-	slots_used += fill_pg_buf(virt_to_page(data),
-				offset_in_page(data),
-				skb_headlen(skb), &pb[slots_used]);
+	slots_used += fill_pg_buf(virt_to_hvpfn(data),
+				  offset_in_hvpage(data),
+				  skb_headlen(skb),
+				  &pb[slots_used]);
 
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
 
-		slots_used += fill_pg_buf(skb_frag_page(frag),
-					skb_frag_off(frag),
-					skb_frag_size(frag), &pb[slots_used]);
+		slots_used += fill_pg_buf(page_to_hvpfn(skb_frag_page(frag)),
+					  skb_frag_off(frag),
+					  skb_frag_size(frag),
+					  &pb[slots_used]);
 	}
 	return slots_used;
 }
@@ -453,8 +453,8 @@ static int count_skb_frag_slots(struct sk_buff *skb)
 		unsigned long offset = skb_frag_off(frag);
 
 		/* Skip unused frames from start of page */
-		offset &= ~PAGE_MASK;
-		pages += PFN_UP(offset + size);
+		offset &= ~HV_HYP_PAGE_MASK;
+		pages += HVPFN_UP(offset + size);
 	}
 	return pages;
 }
@@ -462,12 +462,12 @@ static int count_skb_frag_slots(struct sk_buff *skb)
 static int netvsc_get_slots(struct sk_buff *skb)
 {
 	char *data = skb->data;
-	unsigned int offset = offset_in_page(data);
+	unsigned int offset = offset_in_hvpage(data);
 	unsigned int len = skb_headlen(skb);
 	int slots;
 	int frag_slots;
 
-	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+	slots = DIV_ROUND_UP(offset + len, HV_HYP_PAGE_SIZE);
 	frag_slots = count_skb_frag_slots(skb);
 	return slots + frag_slots;
 }
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index b81ceba38218..acc8d957bbfc 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -25,7 +25,7 @@
 
 static void rndis_set_multicast(struct work_struct *w);
 
-#define RNDIS_EXT_LEN PAGE_SIZE
+#define RNDIS_EXT_LEN HV_HYP_PAGE_SIZE
 struct rndis_request {
 	struct list_head list_ent;
 	struct completion  wait_event;
@@ -215,18 +215,18 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 	packet->page_buf_cnt = 1;
 
 	pb[0].pfn = virt_to_phys(&req->request_msg) >>
-					PAGE_SHIFT;
+					HV_HYP_PAGE_SHIFT;
 	pb[0].len = req->request_msg.msg_len;
 	pb[0].offset =
-		(unsigned long)&req->request_msg & (PAGE_SIZE - 1);
+		(unsigned long)&req->request_msg & (HV_HYP_PAGE_SIZE - 1);
 
 	/* Add one page_buf when request_msg crossing page boundary */
-	if (pb[0].offset + pb[0].len > PAGE_SIZE) {
+	if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
 		packet->page_buf_cnt++;
-		pb[0].len = PAGE_SIZE -
+		pb[0].len = HV_HYP_PAGE_SIZE -
 			pb[0].offset;
 		pb[1].pfn = virt_to_phys((void *)&req->request_msg
-			+ pb[0].len) >> PAGE_SHIFT;
+			+ pb[0].len) >> HV_HYP_PAGE_SHIFT;
 		pb[1].offset = 0;
 		pb[1].len = req->request_msg.msg_len -
 			pb[0].len;
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 08/11] Input: hyperv-keyboard: Make ringbuffer at least take two pages
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (6 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 07/11] hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21  1:41 ` [RFC 09/11] HID: hyperv: " Boqun Feng
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

When PAGE_SIZE > HV_HYP_PAGE_SIZE, we need the ringbuffer size to be at
least 2 * PAGE_SIZE: one page for the header and at least one page of
the data part (because of the alignment requirement for double mapping).

So make sure the ringbuffer sizes to be at least 2 * PAGE_SIZE when
using vmbus_open() to establish the vmbus connection.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/input/serio/hyperv-keyboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index df4e9f6f4529..77ba57ba2691 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -75,8 +75,8 @@ struct synth_kbd_keystroke {
 
 #define HK_MAXIMUM_MESSAGE_SIZE 256
 
-#define KBD_VSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE		max(40 * 1024, 2 * PAGE_SIZE)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE		max(40 * 1024, 2 * PAGE_SIZE)
 
 #define XTKBD_EMUL0     0xe0
 #define XTKBD_EMUL1     0xe1
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 09/11] HID: hyperv: Make ringbuffer at least take two pages
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (7 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 08/11] Input: hyperv-keyboard: Make ringbuffer at least take two pages Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-22 23:36   ` Michael Kelley
  2020-07-21  1:41 ` [RFC 10/11] Driver: hv: util: " Boqun Feng
  2020-07-21  1:41 ` [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K Boqun Feng
  10 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

When PAGE_SIZE > HV_HYP_PAGE_SIZE, we need the ringbuffer size to be at
least 2 * PAGE_SIZE: one page for the header and at least one page of
the data part (because of the alignment requirement for double mapping).

So make sure the ringbuffer sizes to be at least 2 * PAGE_SIZE when
using vmbus_open() to establish the vmbus connection.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hid/hid-hyperv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 0b6ee1dee625..36c5e157c691 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -104,8 +104,8 @@ struct synthhid_input_report {
 
 #pragma pack(pop)
 
-#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE		(128 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE		(128 * 1024)
 
 
 enum pipe_prot_msg_type {
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 10/11] Driver: hv: util: Make ringbuffer at least take two pages
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (8 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 09/11] HID: hyperv: " Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-21  1:41 ` [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K Boqun Feng
  10 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

When PAGE_SIZE > HV_HYP_PAGE_SIZE, we need the ringbuffer size to be at
least 2 * PAGE_SIZE: one page for the header and at least one page of
the data part (because of the alignment requirement for double mapping).

So make sure the ringbuffer sizes to be at least 2 * PAGE_SIZE when
using vmbus_open() to establish the vmbus connection.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/hv/hv_util.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 92ee0fe4c919..73a77bead2be 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -461,6 +461,14 @@ static void heartbeat_onchannelcallback(void *context)
 	}
 }
 
+/*
+ * The size of each ring should be at least 2 * PAGE_SIZE, because we need one
+ * page for the header and at least another page (because of the alignment
+ * requirement for double mapping) for data part.
+ */
+#define HV_UTIL_RING_SEND_SIZE max(4 * HV_HYP_PAGE_SIZE, 2 * PAGE_SIZE)
+#define HV_UTIL_RING_RECV_SIZE max(4 * HV_HYP_PAGE_SIZE, 2 * PAGE_SIZE)
+
 static int util_probe(struct hv_device *dev,
 			const struct hv_vmbus_device_id *dev_id)
 {
@@ -491,8 +499,8 @@ static int util_probe(struct hv_device *dev,
 
 	hv_set_drvdata(dev, srv);
 
-	ret = vmbus_open(dev->channel, 4 * HV_HYP_PAGE_SIZE,
-			 4 * HV_HYP_PAGE_SIZE, NULL, 0, srv->util_cb,
+	ret = vmbus_open(dev->channel, HV_UTIL_RING_SEND_SIZE,
+			 HV_UTIL_RING_RECV_SIZE, NULL, 0, srv->util_cb,
 			 dev->channel);
 	if (ret)
 		goto error;
@@ -551,8 +559,8 @@ static int util_resume(struct hv_device *dev)
 			return ret;
 	}
 
-	ret = vmbus_open(dev->channel, 4 * HV_HYP_PAGE_SIZE,
-			 4 * HV_HYP_PAGE_SIZE, NULL, 0, srv->util_cb,
+	ret = vmbus_open(dev->channel, HV_UTIL_RING_SEND_SIZE,
+			 HV_UTIL_RING_RECV_SIZE, NULL, 0, srv->util_cb,
 			 dev->channel);
 	return ret;
 }
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K
  2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
                   ` (9 preceding siblings ...)
  2020-07-21  1:41 ` [RFC 10/11] Driver: hv: util: " Boqun Feng
@ 2020-07-21  1:41 ` Boqun Feng
  2020-07-23  0:13   ` Michael Kelley
  10 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2020-07-21  1:41 UTC (permalink / raw)
  To: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley, Boqun Feng

Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when
communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
as the unit for page related data. For storvsc, the data is
vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit
of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd
into Hyper-V pages in vmbus_packet_mpb_array.

This patch does the conversion by dividing pages in sglist into Hyper-V
pages, offset and indexes in vmbus_packet_mpb_array are recalculated
accordingly.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/scsi/storvsc_drv.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index fb41636519ee..c54d25f279bc 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1561,7 +1561,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	struct hv_host_device *host_dev = shost_priv(host);
 	struct hv_device *dev = host_dev->dev;
 	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-	int i;
+	int i, j, k;
 	struct scatterlist *sgl;
 	unsigned int sg_count = 0;
 	struct vmscsi_request *vm_srb;
@@ -1569,6 +1569,8 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	struct vmbus_packet_mpb_array  *payload;
 	u32 payload_sz;
 	u32 length;
+	int subpage_idx = 0;
+	unsigned int hvpg_count = 0;
 
 	if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
 		/*
@@ -1643,23 +1645,36 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	payload_sz = sizeof(cmd_request->mpb);
 
 	if (sg_count) {
-		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
+		hvpg_count = sg_count * (PAGE_SIZE / HV_HYP_PAGE_SIZE);
+		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
-			payload_sz = (sg_count * sizeof(u64) +
+			payload_sz = (hvpg_count * sizeof(u64) +
 				      sizeof(struct vmbus_packet_mpb_array));
 			payload = kzalloc(payload_sz, GFP_ATOMIC);
 			if (!payload)
 				return SCSI_MLQUEUE_DEVICE_BUSY;
 		}
 
+		/*
+		 * sgl is a list of PAGEs, and payload->range.pfn_array
+		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
+		 * page size that Hyper-V uses, so here we need to divide PAGEs
+		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
+		 */
 		payload->range.len = length;
-		payload->range.offset = sgl[0].offset;
+		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
+		subpage_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
 
 		cur_sgl = sgl;
+		k = 0;
 		for (i = 0; i < sg_count; i++) {
-			payload->range.pfn_array[i] =
-				page_to_pfn(sg_page((cur_sgl)));
+			for (j = subpage_idx; j < (PAGE_SIZE / HV_HYP_PAGE_SIZE); j++) {
+				payload->range.pfn_array[k] =
+					page_to_hvpfn(sg_page((cur_sgl))) + j;
+				k++;
+			}
 			cur_sgl = sg_next(cur_sgl);
+			subpage_idx = 0;
 		}
 	}
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl
  2020-07-21  1:41 ` [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl Boqun Feng
@ 2020-07-21 15:22   ` Wei Liu
  2020-07-22 23:20     ` Boqun Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Wei Liu @ 2020-07-21 15:22 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley

On Tue, Jul 21, 2020 at 09:41:25AM +0800, Boqun Feng wrote:
> Since the hypervisor always uses 4K as its page size, the size of PFNs
> used for gpadl should be HV_HYP_PAGE_SIZE rather than PAGE_SIZE, so
> adjust this accordingly as the preparation for supporting 16K/64K page
> size guests.

It may be worth calling out there is no change on x86 because
HV_HYP_PAGE_SHIFT and PAGE_SHIFT are of the same value there.

Wei.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 02/11] Drivers: hv: vmbus: Move __vmbus_open()
  2020-07-21  1:41 ` [RFC 02/11] Drivers: hv: vmbus: Move __vmbus_open() Boqun Feng
@ 2020-07-21 15:23   ` Wei Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Liu @ 2020-07-21 15:23 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen, Michael Kelley

On Tue, Jul 21, 2020 at 09:41:26AM +0800, Boqun Feng wrote:
> Pure function movement, no functional changes. The move is made, because
> in a later change, __vmbus_open() will rely on some static functions
> afterwards, so we sperate the move and the modification of
> __vmbus_open() in two patches to make it easy to review.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Reviewed-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl
  2020-07-21 15:22   ` Wei Liu
@ 2020-07-22 23:20     ` Boqun Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-22 23:20 UTC (permalink / raw)
  To: Wei Liu
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Jiri Kosina,
	Benjamin Tissoires, Dmitry Torokhov, David S. Miller,
	Jakub Kicinski, James E.J. Bottomley, Martin K. Petersen,
	Michael Kelley

On Tue, Jul 21, 2020 at 03:22:18PM +0000, Wei Liu wrote:
> On Tue, Jul 21, 2020 at 09:41:25AM +0800, Boqun Feng wrote:
> > Since the hypervisor always uses 4K as its page size, the size of PFNs
> > used for gpadl should be HV_HYP_PAGE_SIZE rather than PAGE_SIZE, so
> > adjust this accordingly as the preparation for supporting 16K/64K page
> > size guests.
> 
> It may be worth calling out there is no change on x86 because
> HV_HYP_PAGE_SHIFT and PAGE_SHIFT are of the same value there.
> 

Sure, I will call it out in the commit log of the next version, thanks!

Regards,
Boqun

> Wei.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL
  2020-07-21  1:41 ` [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL Boqun Feng
@ 2020-07-22 23:25   ` Michael Kelley
  2020-07-22 23:43     ` Boqun Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2020-07-22 23:25 UTC (permalink / raw)
  To: Boqun Feng, linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:41 PM
> 
> This patch introduces two types of GPADL: HV_GPADL_{BUFFER, RING}. The
> types of GPADL are purely the concept in the guest, IOW the hypervisor
> treat them as the same.
> 
> The reason of introducing the types of GPADL is to support guests whose
> page size is not 4k (the page size of Hyper-V hypervisor). In these
> guests, both the headers and the data parts of the ringbuffers need to
> be aligned to the PAGE_SIZE, because 1) some of the ringbuffers will be
> mapped into userspace and 2) we use "double mapping" mechanism to
> support fast wrap-around, and "double mapping" relies on ringbuffers
> being page-aligned. However, the Hyper-V hypervisor only uses 4k
> (HV_HYP_PAGE_SIZE) headers. Our solution to this is that we always make
> the headers of ringbuffers take one guest page and when GPADL is
> established between the guest and hypervisor, the only first 4k of
> header is used. To handle this special case, we need the types of GPADL
> to differ different guest memory usage for GPADL.
> 
> Type enum is introduced along with several general interfaces to
> describe the differences between normal buffer GPADL and ringbuffer
> GPADL.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  drivers/hv/channel.c   | 140 +++++++++++++++++++++++++++++++++++------
>  include/linux/hyperv.h |  44 ++++++++++++-
>  2 files changed, 164 insertions(+), 20 deletions(-)

[snip]

> 
> 
> @@ -437,7 +528,17 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	open_msg->openid = newchannel->offermsg.child_relid;
>  	open_msg->child_relid = newchannel->offermsg.child_relid;
>  	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
> -	open_msg->downstream_ringbuffer_pageoffset = newchannel-
> >ringbuffer_send_offset;
> +	/*
> +	 * The unit of ->downstream_ringbuffer_pageoffset is HV_HYP_PAGE and
> +	 * the unit of ->ringbuffer_send_offset is PAGE, so here we first
> +	 * calculate it into bytes and then convert into HV_HYP_PAGE. Also
> +	 * ->ringbuffer_send_offset is the offset in guest, while
> +	 * ->downstream_ringbuffer_pageoffset is the offset in gpadl (i.e. in
> +	 * hypervisor), so a (PAGE_SIZE - HV_HYP_PAGE_SIZE) gap need to be
> +	 * skipped.
> +	 */
> +	open_msg->downstream_ringbuffer_pageoffset =
> +		((newchannel->ringbuffer_send_offset << PAGE_SHIFT) - (PAGE_SIZE -
> HV_HYP_PAGE_SIZE)) >> HV_HYP_PAGE_SHIFT;

I couldn't find that the "downstream_ringbuffer_pageoffset" field
is used anywhere.  Can it just be deleted entirely instead of having
this really messy calculation?

>  	open_msg->target_vp = newchannel->target_vp;
> 
>  	if (userdatalen)
> @@ -497,6 +598,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	return err;
>  }
> 
> +

Spurious add of a blank line?

>  /*
>   * vmbus_connect_ring - Open the channel but reuse ring buffer
>   */
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 692c89ccf5df..663f0a016237 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -29,6 +29,48 @@
> 
>  #pragma pack(push, 1)
> 
> +/*
> + * Types for GPADL, decides is how GPADL header is created.
> + *
> + * It doesn't make much difference between BUFFER and RING if PAGE_SIZE is the
> + * same as HV_HYP_PAGE_SIZE.
> + *
> + * If PAGE_SIZE is bigger than HV_HYP_PAGE_SIZE, the headers of ring buffers
> + * will be of PAGE_SIZE, however, only the first HV_HYP_PAGE will be put
> + * into gpadl, therefore the number for HV_HYP_PAGE and the indexes of each
> + * HV_HYP_PAGE will be different between different types of GPADL, for example
> + * if PAGE_SIZE is 64K:
> + *
> + * BUFFER:
> + *
> + * gva:    |--       64k      --|--       64k      --| ... |
> + * gpa:    | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k |
> + * index:  0    1    2     15   16   17   18 .. 31   32 ...
> + *         |    |    ...   |    |    |   ...    |   ...
> + *         v    V          V    V    V          V
> + * gpadl:  | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k | ... |
> + * index:  0    1    2 ... 15   16   17   18 .. 31   32 ...
> + *
> + * RING:
> + *
> + *         | header  |           data           | header  |     data      |
> + * gva:    |-- 64k --|--       64k      --| ... |-- 64k --|-- 64k --| ... |
> + * gpa:    | 4k | .. | 4k | 4k | ... | 4k | ... | 4k | .. | 4k | .. | ... |
> + * index:  0    1    16   17   18    31   ...   n   n+1  n+16 ...         2n
> + *         |         /    /          /          |         /               /
> + *         |        /    /          /           |        /               /
> + *         |       /    /   ...    /    ...     |       /      ...      /
> + *         |      /    /          /             |      /               /
> + *         |     /    /          /              |     /               /
> + *         V    V    V          V               V    V               v
> + * gpadl:  | 4k | 4k |   ...    |    ...        | 4k | 4k |  ...     |
> + * index:  0    1    2   ...    16   ...       n-15 n-14 n-13  ...  2n-30
> + */
> +enum hv_gpadl_type {
> +	HV_GPADL_BUFFER,
> +	HV_GPADL_RING
> +};
> +
>  /* Single-page buffer */
>  struct hv_page_buffer {
>  	u32 len;
> @@ -111,7 +153,7 @@ struct hv_ring_buffer {
>  	} feature_bits;
> 
>  	/* Pad it to PAGE_SIZE so that data starts on page boundary */
> -	u8	reserved2[4028];
> +	u8	reserved2[PAGE_SIZE - 68];
> 
>  	/*
>  	 * Ring data starts here + RingDataStartOffset
> --
> 2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC 09/11] HID: hyperv: Make ringbuffer at least take two pages
  2020-07-21  1:41 ` [RFC 09/11] HID: hyperv: " Boqun Feng
@ 2020-07-22 23:36   ` Michael Kelley
  2020-07-23  1:28     ` boqun.feng
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2020-07-22 23:36 UTC (permalink / raw)
  To: Boqun Feng, linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:42 PM
> 
> When PAGE_SIZE > HV_HYP_PAGE_SIZE, we need the ringbuffer size to be at
> least 2 * PAGE_SIZE: one page for the header and at least one page of
> the data part (because of the alignment requirement for double mapping).
> 
> So make sure the ringbuffer sizes to be at least 2 * PAGE_SIZE when
> using vmbus_open() to establish the vmbus connection.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  drivers/hid/hid-hyperv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> index 0b6ee1dee625..36c5e157c691 100644
> --- a/drivers/hid/hid-hyperv.c
> +++ b/drivers/hid/hid-hyperv.c
> @@ -104,8 +104,8 @@ struct synthhid_input_report {
> 
>  #pragma pack(pop)
> 
> -#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
> -#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
> +#define INPUTVSC_SEND_RING_BUFFER_SIZE		(128 * 1024)
> +#define INPUTVSC_RECV_RING_BUFFER_SIZE		(128 * 1024)

Use max(40 * 1024, 2 * PAGE_SIZE) like in patch 8 of the series?

> 
> 
>  enum pipe_prot_msg_type {
> --
> 2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL
  2020-07-22 23:25   ` Michael Kelley
@ 2020-07-22 23:43     ` Boqun Feng
  2020-07-22 23:56       ` Michael Kelley
  0 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2020-07-22 23:43 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

On Wed, Jul 22, 2020 at 11:25:18PM +0000, Michael Kelley wrote:
> From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:41 PM
> > 
> > This patch introduces two types of GPADL: HV_GPADL_{BUFFER, RING}. The
> > types of GPADL are purely the concept in the guest, IOW the hypervisor
> > treat them as the same.
> > 
> > The reason of introducing the types of GPADL is to support guests whose
> > page size is not 4k (the page size of Hyper-V hypervisor). In these
> > guests, both the headers and the data parts of the ringbuffers need to
> > be aligned to the PAGE_SIZE, because 1) some of the ringbuffers will be
> > mapped into userspace and 2) we use "double mapping" mechanism to
> > support fast wrap-around, and "double mapping" relies on ringbuffers
> > being page-aligned. However, the Hyper-V hypervisor only uses 4k
> > (HV_HYP_PAGE_SIZE) headers. Our solution to this is that we always make
> > the headers of ringbuffers take one guest page and when GPADL is
> > established between the guest and hypervisor, the only first 4k of
> > header is used. To handle this special case, we need the types of GPADL
> > to differ different guest memory usage for GPADL.
> > 
> > Type enum is introduced along with several general interfaces to
> > describe the differences between normal buffer GPADL and ringbuffer
> > GPADL.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  drivers/hv/channel.c   | 140 +++++++++++++++++++++++++++++++++++------
> >  include/linux/hyperv.h |  44 ++++++++++++-
> >  2 files changed, 164 insertions(+), 20 deletions(-)
> 
> [snip]
> 
> > 
> > 
> > @@ -437,7 +528,17 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> >  	open_msg->openid = newchannel->offermsg.child_relid;
> >  	open_msg->child_relid = newchannel->offermsg.child_relid;
> >  	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
> > -	open_msg->downstream_ringbuffer_pageoffset = newchannel-
> > >ringbuffer_send_offset;
> > +	/*
> > +	 * The unit of ->downstream_ringbuffer_pageoffset is HV_HYP_PAGE and
> > +	 * the unit of ->ringbuffer_send_offset is PAGE, so here we first
> > +	 * calculate it into bytes and then convert into HV_HYP_PAGE. Also
> > +	 * ->ringbuffer_send_offset is the offset in guest, while
> > +	 * ->downstream_ringbuffer_pageoffset is the offset in gpadl (i.e. in
> > +	 * hypervisor), so a (PAGE_SIZE - HV_HYP_PAGE_SIZE) gap need to be
> > +	 * skipped.
> > +	 */
> > +	open_msg->downstream_ringbuffer_pageoffset =
> > +		((newchannel->ringbuffer_send_offset << PAGE_SHIFT) - (PAGE_SIZE -
> > HV_HYP_PAGE_SIZE)) >> HV_HYP_PAGE_SHIFT;
> 
> I couldn't find that the "downstream_ringbuffer_pageoffset" field
> is used anywhere.  Can it just be deleted entirely instead of having
> this really messy calculation?
> 

This field is part of struct vmbus_channel_open_channel, which means
guest-hypervisor communication protocal requires us to set the field,
IIUC. So I don't think we can delete it.

To deal with the messy calculation, I do realize there is a similar
calculation in hv_gpadl_hvpfn() too, so in the next version, I will
add a new helper to do this "send offset in guest virtual address to
send offset in GPADL calculation", and use it here and in
hv_gpadl_hvpfn(). Thoughts?

> >  	open_msg->target_vp = newchannel->target_vp;
> > 
> >  	if (userdatalen)
> > @@ -497,6 +598,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> >  	return err;
> >  }
> > 
> > +
> 
> Spurious add of a blank line?
> 

Yeah, I will fix this, thanks!

Regards,
Boqun

> >  /*
> >   * vmbus_connect_ring - Open the channel but reuse ring buffer
> >   */
> > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> > index 692c89ccf5df..663f0a016237 100644
> > --- a/include/linux/hyperv.h
> > +++ b/include/linux/hyperv.h
> > @@ -29,6 +29,48 @@
> > 
> >  #pragma pack(push, 1)
> > 
> > +/*
> > + * Types for GPADL, decides is how GPADL header is created.
> > + *
> > + * It doesn't make much difference between BUFFER and RING if PAGE_SIZE is the
> > + * same as HV_HYP_PAGE_SIZE.
> > + *
> > + * If PAGE_SIZE is bigger than HV_HYP_PAGE_SIZE, the headers of ring buffers
> > + * will be of PAGE_SIZE, however, only the first HV_HYP_PAGE will be put
> > + * into gpadl, therefore the number for HV_HYP_PAGE and the indexes of each
> > + * HV_HYP_PAGE will be different between different types of GPADL, for example
> > + * if PAGE_SIZE is 64K:
> > + *
> > + * BUFFER:
> > + *
> > + * gva:    |--       64k      --|--       64k      --| ... |
> > + * gpa:    | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k |
> > + * index:  0    1    2     15   16   17   18 .. 31   32 ...
> > + *         |    |    ...   |    |    |   ...    |   ...
> > + *         v    V          V    V    V          V
> > + * gpadl:  | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k | ... |
> > + * index:  0    1    2 ... 15   16   17   18 .. 31   32 ...
> > + *
> > + * RING:
> > + *
> > + *         | header  |           data           | header  |     data      |
> > + * gva:    |-- 64k --|--       64k      --| ... |-- 64k --|-- 64k --| ... |
> > + * gpa:    | 4k | .. | 4k | 4k | ... | 4k | ... | 4k | .. | 4k | .. | ... |
> > + * index:  0    1    16   17   18    31   ...   n   n+1  n+16 ...         2n
> > + *         |         /    /          /          |         /               /
> > + *         |        /    /          /           |        /               /
> > + *         |       /    /   ...    /    ...     |       /      ...      /
> > + *         |      /    /          /             |      /               /
> > + *         |     /    /          /              |     /               /
> > + *         V    V    V          V               V    V               v
> > + * gpadl:  | 4k | 4k |   ...    |    ...        | 4k | 4k |  ...     |
> > + * index:  0    1    2   ...    16   ...       n-15 n-14 n-13  ...  2n-30
> > + */
> > +enum hv_gpadl_type {
> > +	HV_GPADL_BUFFER,
> > +	HV_GPADL_RING
> > +};
> > +
> >  /* Single-page buffer */
> >  struct hv_page_buffer {
> >  	u32 len;
> > @@ -111,7 +153,7 @@ struct hv_ring_buffer {
> >  	} feature_bits;
> > 
> >  	/* Pad it to PAGE_SIZE so that data starts on page boundary */
> > -	u8	reserved2[4028];
> > +	u8	reserved2[PAGE_SIZE - 68];
> > 
> >  	/*
> >  	 * Ring data starts here + RingDataStartOffset
> > --
> > 2.27.0
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL
  2020-07-22 23:43     ` Boqun Feng
@ 2020-07-22 23:56       ` Michael Kelley
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2020-07-22 23:56 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

From: Boqun Feng <boqun.feng@gmail.com> Sent: Wednesday, July 22, 2020 4:43 PM
> 
> On Wed, Jul 22, 2020 at 11:25:18PM +0000, Michael Kelley wrote:
> > From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:41 PM
> > >
> > > This patch introduces two types of GPADL: HV_GPADL_{BUFFER, RING}. The
> > > types of GPADL are purely the concept in the guest, IOW the hypervisor
> > > treat them as the same.
> > >
> > > The reason of introducing the types of GPADL is to support guests whose
> > > page size is not 4k (the page size of Hyper-V hypervisor). In these
> > > guests, both the headers and the data parts of the ringbuffers need to
> > > be aligned to the PAGE_SIZE, because 1) some of the ringbuffers will be
> > > mapped into userspace and 2) we use "double mapping" mechanism to
> > > support fast wrap-around, and "double mapping" relies on ringbuffers
> > > being page-aligned. However, the Hyper-V hypervisor only uses 4k
> > > (HV_HYP_PAGE_SIZE) headers. Our solution to this is that we always make
> > > the headers of ringbuffers take one guest page and when GPADL is
> > > established between the guest and hypervisor, the only first 4k of
> > > header is used. To handle this special case, we need the types of GPADL
> > > to differ different guest memory usage for GPADL.
> > >
> > > Type enum is introduced along with several general interfaces to
> > > describe the differences between normal buffer GPADL and ringbuffer
> > > GPADL.
> > >
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > ---
> > >  drivers/hv/channel.c   | 140 +++++++++++++++++++++++++++++++++++------
> > >  include/linux/hyperv.h |  44 ++++++++++++-
> > >  2 files changed, 164 insertions(+), 20 deletions(-)
> >
> > [snip]
> >
> > >
> > >
> > > @@ -437,7 +528,17 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> > >  	open_msg->openid = newchannel->offermsg.child_relid;
> > >  	open_msg->child_relid = newchannel->offermsg.child_relid;
> > >  	open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
> > > -	open_msg->downstream_ringbuffer_pageoffset = newchannel-
> > > >ringbuffer_send_offset;
> > > +	/*
> > > +	 * The unit of ->downstream_ringbuffer_pageoffset is HV_HYP_PAGE and
> > > +	 * the unit of ->ringbuffer_send_offset is PAGE, so here we first
> > > +	 * calculate it into bytes and then convert into HV_HYP_PAGE. Also
> > > +	 * ->ringbuffer_send_offset is the offset in guest, while
> > > +	 * ->downstream_ringbuffer_pageoffset is the offset in gpadl (i.e. in
> > > +	 * hypervisor), so a (PAGE_SIZE - HV_HYP_PAGE_SIZE) gap need to be
> > > +	 * skipped.
> > > +	 */
> > > +	open_msg->downstream_ringbuffer_pageoffset =
> > > +		((newchannel->ringbuffer_send_offset << PAGE_SHIFT) - (PAGE_SIZE -
> > > HV_HYP_PAGE_SIZE)) >> HV_HYP_PAGE_SHIFT;
> >
> > I couldn't find that the "downstream_ringbuffer_pageoffset" field
> > is used anywhere.  Can it just be deleted entirely instead of having
> > this really messy calculation?
> >
> 
> This field is part of struct vmbus_channel_open_channel, which means
> guest-hypervisor communication protocal requires us to set the field,
> IIUC. So I don't think we can delete it.

Indeed, you are right.  I mis-read it as a field in struct vmbus_channel,
but that's not the case.  Thanks.

> 
> To deal with the messy calculation, I do realize there is a similar
> calculation in hv_gpadl_hvpfn() too, so in the next version, I will
> add a new helper to do this "send offset in guest virtual address to
> send offset in GPADL calculation", and use it here and in
> hv_gpadl_hvpfn(). Thoughts?

Yes, that helps.

> 
> > >  	open_msg->target_vp = newchannel->target_vp;
> > >
> > >  	if (userdatalen)
> > > @@ -497,6 +598,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> > >  	return err;
> > >  }
> > >
> > > +
> >
> > Spurious add of a blank line?
> >
> 
> Yeah, I will fix this, thanks!
> 
> Regards,
> Boqun
> 
> > >  /*
> > >   * vmbus_connect_ring - Open the channel but reuse ring buffer
> > >   */
> > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> > > index 692c89ccf5df..663f0a016237 100644
> > > --- a/include/linux/hyperv.h
> > > +++ b/include/linux/hyperv.h
> > > @@ -29,6 +29,48 @@
> > >
> > >  #pragma pack(push, 1)
> > >
> > > +/*
> > > + * Types for GPADL, decides is how GPADL header is created.
> > > + *
> > > + * It doesn't make much difference between BUFFER and RING if PAGE_SIZE is the
> > > + * same as HV_HYP_PAGE_SIZE.
> > > + *
> > > + * If PAGE_SIZE is bigger than HV_HYP_PAGE_SIZE, the headers of ring buffers
> > > + * will be of PAGE_SIZE, however, only the first HV_HYP_PAGE will be put
> > > + * into gpadl, therefore the number for HV_HYP_PAGE and the indexes of each
> > > + * HV_HYP_PAGE will be different between different types of GPADL, for example
> > > + * if PAGE_SIZE is 64K:
> > > + *
> > > + * BUFFER:
> > > + *
> > > + * gva:    |--       64k      --|--       64k      --| ... |
> > > + * gpa:    | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k |
> > > + * index:  0    1    2     15   16   17   18 .. 31   32 ...
> > > + *         |    |    ...   |    |    |   ...    |   ...
> > > + *         v    V          V    V    V          V
> > > + * gpadl:  | 4k | 4k | ... | 4k | 4k | 4k | ... | 4k | ... |
> > > + * index:  0    1    2 ... 15   16   17   18 .. 31   32 ...
> > > + *
> > > + * RING:
> > > + *
> > > + *         | header  |           data           | header  |     data      |
> > > + * gva:    |-- 64k --|--       64k      --| ... |-- 64k --|-- 64k --| ... |
> > > + * gpa:    | 4k | .. | 4k | 4k | ... | 4k | ... | 4k | .. | 4k | .. | ... |
> > > + * index:  0    1    16   17   18    31   ...   n   n+1  n+16 ...         2n
> > > + *         |         /    /          /          |         /               /
> > > + *         |        /    /          /           |        /               /
> > > + *         |       /    /   ...    /    ...     |       /      ...      /
> > > + *         |      /    /          /             |      /               /
> > > + *         |     /    /          /              |     /               /
> > > + *         V    V    V          V               V    V               v
> > > + * gpadl:  | 4k | 4k |   ...    |    ...        | 4k | 4k |  ...     |
> > > + * index:  0    1    2   ...    16   ...       n-15 n-14 n-13  ...  2n-30
> > > + */
> > > +enum hv_gpadl_type {
> > > +	HV_GPADL_BUFFER,
> > > +	HV_GPADL_RING
> > > +};
> > > +
> > >  /* Single-page buffer */
> > >  struct hv_page_buffer {
> > >  	u32 len;
> > > @@ -111,7 +153,7 @@ struct hv_ring_buffer {
> > >  	} feature_bits;
> > >
> > >  	/* Pad it to PAGE_SIZE so that data starts on page boundary */
> > > -	u8	reserved2[4028];
> > > +	u8	reserved2[PAGE_SIZE - 68];
> > >
> > >  	/*
> > >  	 * Ring data starts here + RingDataStartOffset
> > > --
> > > 2.27.0
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K
  2020-07-21  1:41 ` [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K Boqun Feng
@ 2020-07-23  0:13   ` Michael Kelley
  2020-07-23  1:51     ` boqun.feng
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2020-07-23  0:13 UTC (permalink / raw)
  To: Boqun Feng, linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:42 PM
> 
> Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when
> communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> as the unit for page related data. For storvsc, the data is
> vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit
> of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd
> into Hyper-V pages in vmbus_packet_mpb_array.
> 
> This patch does the conversion by dividing pages in sglist into Hyper-V
> pages, offset and indexes in vmbus_packet_mpb_array are recalculated
> accordingly.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  drivers/scsi/storvsc_drv.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index fb41636519ee..c54d25f279bc 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1561,7 +1561,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct
> scsi_cmnd *scmnd)
>  	struct hv_host_device *host_dev = shost_priv(host);
>  	struct hv_device *dev = host_dev->dev;
>  	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> -	int i;
> +	int i, j, k;
>  	struct scatterlist *sgl;
>  	unsigned int sg_count = 0;
>  	struct vmscsi_request *vm_srb;
> @@ -1569,6 +1569,8 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct
> scsi_cmnd *scmnd)
>  	struct vmbus_packet_mpb_array  *payload;
>  	u32 payload_sz;
>  	u32 length;
> +	int subpage_idx = 0;
> +	unsigned int hvpg_count = 0;
> 
>  	if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
>  		/*
> @@ -1643,23 +1645,36 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct
> scsi_cmnd *scmnd)
>  	payload_sz = sizeof(cmd_request->mpb);
> 
>  	if (sg_count) {
> -		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> +		hvpg_count = sg_count * (PAGE_SIZE / HV_HYP_PAGE_SIZE);

The above calculation doesn't take into account the offset in the
first sglist or the overall length of the transfer, so the value of hvpg_count
could be quite a bit bigger than it needs to be.  For example, with a 64K
page size and an 8 Kbyte transfer size that starts at offset 60K in the
first page, hvpg_count will be 32 when it really only needs to be 2.

The nested loops below that populate the pfn_array take the 
offset into account when starting, so that's good.  But it will potentially
leave allocated entries unused.  Furthermore, the nested loops could
terminate early when enough Hyper-V size pages are mapped to PFNs
based on the length of the transfer, even if all of the last guest size
page has not been mapped to PFNs.  Like the offset at the beginning of
first guest size page in the sglist, there's potentially an unused portion
at the end of the last guest size page in the sglist.

> +		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> -			payload_sz = (sg_count * sizeof(u64) +
> +			payload_sz = (hvpg_count * sizeof(u64) +
>  				      sizeof(struct vmbus_packet_mpb_array));
>  			payload = kzalloc(payload_sz, GFP_ATOMIC);
>  			if (!payload)
>  				return SCSI_MLQUEUE_DEVICE_BUSY;
>  		}
> 
> +		/*
> +		 * sgl is a list of PAGEs, and payload->range.pfn_array
> +		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> +		 * page size that Hyper-V uses, so here we need to divide PAGEs
> +		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> +		 */
>  		payload->range.len = length;
> -		payload->range.offset = sgl[0].offset;
> +		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> +		subpage_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> 
>  		cur_sgl = sgl;
> +		k = 0;
>  		for (i = 0; i < sg_count; i++) {
> -			payload->range.pfn_array[i] =
> -				page_to_pfn(sg_page((cur_sgl)));
> +			for (j = subpage_idx; j < (PAGE_SIZE / HV_HYP_PAGE_SIZE); j++) {

In the case where PAGE_SIZE == HV_HYP_PAGE_SIZE, would it help the compiler
eliminate the loop if local variable j is declared as unsigned?  In that case the test in the
for statement will always be false.

> +				payload->range.pfn_array[k] =
> +					page_to_hvpfn(sg_page((cur_sgl))) + j;
> +				k++;
> +			}
>  			cur_sgl = sg_next(cur_sgl);
> +			subpage_idx = 0;
>  		}
>  	}
> 
> --
> 2.27.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 09/11] HID: hyperv: Make ringbuffer at least take two pages
  2020-07-22 23:36   ` Michael Kelley
@ 2020-07-23  1:28     ` boqun.feng
  0 siblings, 0 replies; 24+ messages in thread
From: boqun.feng @ 2020-07-23  1:28 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

On Wed, Jul 22, 2020 at 11:36:15PM +0000, Michael Kelley wrote:
> From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:42 PM
> > 
> > When PAGE_SIZE > HV_HYP_PAGE_SIZE, we need the ringbuffer size to be at
> > least 2 * PAGE_SIZE: one page for the header and at least one page of
> > the data part (because of the alignment requirement for double mapping).
> > 
> > So make sure the ringbuffer sizes to be at least 2 * PAGE_SIZE when
> > using vmbus_open() to establish the vmbus connection.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  drivers/hid/hid-hyperv.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> > index 0b6ee1dee625..36c5e157c691 100644
> > --- a/drivers/hid/hid-hyperv.c
> > +++ b/drivers/hid/hid-hyperv.c
> > @@ -104,8 +104,8 @@ struct synthhid_input_report {
> > 
> >  #pragma pack(pop)
> > 
> > -#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
> > -#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
> > +#define INPUTVSC_SEND_RING_BUFFER_SIZE		(128 * 1024)
> > +#define INPUTVSC_RECV_RING_BUFFER_SIZE		(128 * 1024)
> 
> Use max(40 * 1024, 2 * PAGE_SIZE) like in patch 8 of the series?
> 

Sure! Will change it in next version.

Regards,
Boqun

> > 
> > 
> >  enum pipe_prot_msg_type {
> > --
> > 2.27.0
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K
  2020-07-23  0:13   ` Michael Kelley
@ 2020-07-23  1:51     ` boqun.feng
  2020-07-23  2:26       ` Michael Kelley
  0 siblings, 1 reply; 24+ messages in thread
From: boqun.feng @ 2020-07-23  1:51 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

On Thu, Jul 23, 2020 at 12:13:07AM +0000, Michael Kelley wrote:
> From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:42 PM
> > 
> > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when
> > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> > as the unit for page related data. For storvsc, the data is
> > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit
> > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd
> > into Hyper-V pages in vmbus_packet_mpb_array.
> > 
> > This patch does the conversion by dividing pages in sglist into Hyper-V
> > pages, offset and indexes in vmbus_packet_mpb_array are recalculated
> > accordingly.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  drivers/scsi/storvsc_drv.c | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index fb41636519ee..c54d25f279bc 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -1561,7 +1561,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct
> > scsi_cmnd *scmnd)
> >  	struct hv_host_device *host_dev = shost_priv(host);
> >  	struct hv_device *dev = host_dev->dev;
> >  	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> > -	int i;
> > +	int i, j, k;
> >  	struct scatterlist *sgl;
> >  	unsigned int sg_count = 0;
> >  	struct vmscsi_request *vm_srb;
> > @@ -1569,6 +1569,8 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct
> > scsi_cmnd *scmnd)
> >  	struct vmbus_packet_mpb_array  *payload;
> >  	u32 payload_sz;
> >  	u32 length;
> > +	int subpage_idx = 0;
> > +	unsigned int hvpg_count = 0;
> > 
> >  	if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
> >  		/*
> > @@ -1643,23 +1645,36 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct
> > scsi_cmnd *scmnd)
> >  	payload_sz = sizeof(cmd_request->mpb);
> > 
> >  	if (sg_count) {
> > -		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> > +		hvpg_count = sg_count * (PAGE_SIZE / HV_HYP_PAGE_SIZE);
> 
> The above calculation doesn't take into account the offset in the
> first sglist or the overall length of the transfer, so the value of hvpg_count
> could be quite a bit bigger than it needs to be.  For example, with a 64K
> page size and an 8 Kbyte transfer size that starts at offset 60K in the
> first page, hvpg_count will be 32 when it really only needs to be 2.
> 
> The nested loops below that populate the pfn_array take the 
> offset into account when starting, so that's good.  But it will potentially
> leave allocated entries unused.  Furthermore, the nested loops could
> terminate early when enough Hyper-V size pages are mapped to PFNs
> based on the length of the transfer, even if all of the last guest size
> page has not been mapped to PFNs.  Like the offset at the beginning of
> first guest size page in the sglist, there's potentially an unused portion
> at the end of the last guest size page in the sglist.
> 

Good point. I think we could calculate the exact hvpg_count as follow:

	hvpg_count = 0;
	cur_sgl = sgl;

	for (i = 0; i < sg_count; i++) {
		hvpg_count += HVPFN_UP(cur_sg->length)
		cur_sgl = sg_next(cur_sgl);
	}

> > +		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> > 
> > -			payload_sz = (sg_count * sizeof(u64) +
> > +			payload_sz = (hvpg_count * sizeof(u64) +
> >  				      sizeof(struct vmbus_packet_mpb_array));
> >  			payload = kzalloc(payload_sz, GFP_ATOMIC);
> >  			if (!payload)
> >  				return SCSI_MLQUEUE_DEVICE_BUSY;
> >  		}
> > 
> > +		/*
> > +		 * sgl is a list of PAGEs, and payload->range.pfn_array
> > +		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> > +		 * page size that Hyper-V uses, so here we need to divide PAGEs
> > +		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> > +		 */
> >  		payload->range.len = length;
> > -		payload->range.offset = sgl[0].offset;
> > +		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> > +		subpage_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> > 
> >  		cur_sgl = sgl;
> > +		k = 0;
> >  		for (i = 0; i < sg_count; i++) {
> > -			payload->range.pfn_array[i] =
> > -				page_to_pfn(sg_page((cur_sgl)));
> > +			for (j = subpage_idx; j < (PAGE_SIZE / HV_HYP_PAGE_SIZE); j++) {
> 
> In the case where PAGE_SIZE == HV_HYP_PAGE_SIZE, would it help the compiler
> eliminate the loop if local variable j is declared as unsigned?  In that case the test in the
> for statement will always be false.
> 

Good point! I did the following test:

test.c:

	int func(unsigned int input, int *arr)
	{
		unsigned int i;
		int result = 0;

		for (i = input; i < 1; i++)
			result += arr[i];

		return result;
	}

if I define i as "int", I got:

	0000000000000000 <func>:
	   0:	85 ff                	test   %edi,%edi
	   2:	7f 2c                	jg     30 <func+0x30>
	   4:	48 63 d7             	movslq %edi,%rdx
	   7:	f7 df                	neg    %edi
	   9:	45 31 c0             	xor    %r8d,%r8d
	   c:	89 ff                	mov    %edi,%edi
	   e:	48 8d 04 96          	lea    (%rsi,%rdx,4),%rax
	  12:	48 01 d7             	add    %rdx,%rdi
	  15:	48 8d 54 be 04       	lea    0x4(%rsi,%rdi,4),%rdx
	  1a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
	  20:	44 03 00             	add    (%rax),%r8d
	  23:	48 83 c0 04          	add    $0x4,%rax
	  27:	48 39 d0             	cmp    %rdx,%rax
	  2a:	75 f4                	jne    20 <func+0x20>
	  2c:	44 89 c0             	mov    %r8d,%eax
	  2f:	c3                   	retq   
	  30:	45 31 c0             	xor    %r8d,%r8d
	  33:	44 89 c0             	mov    %r8d,%eax
	  36:	c3                   	retq   

and when I define i as "unsigned int", I got:

	0000000000000000 <func>:
	   0:	85 ff                	test   %edi,%edi
	   2:	75 03                	jne    7 <func+0x7>
	   4:	8b 06                	mov    (%rsi),%eax
	   6:	c3                   	retq   
	   7:	31 c0                	xor    %eax,%eax
	   9:	c3                   	retq   

So clearly it helps, I will change this in the next version.

Regards,
Boqun

> > +				payload->range.pfn_array[k] =
> > +					page_to_hvpfn(sg_page((cur_sgl))) + j;
> > +				k++;
> > +			}
> >  			cur_sgl = sg_next(cur_sgl);
> > +			subpage_idx = 0;
> >  		}
> >  	}
> > 
> > --
> > 2.27.0
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K
  2020-07-23  1:51     ` boqun.feng
@ 2020-07-23  2:26       ` Michael Kelley
  2020-07-23  3:12         ` Boqun Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2020-07-23  2:26 UTC (permalink / raw)
  To: boqun.feng
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

From: boqun.feng@gmail.com <boqun.feng@gmail.com> Sent: Wednesday, July 22, 2020 6:52 PM
> 
> On Thu, Jul 23, 2020 at 12:13:07AM +0000, Michael Kelley wrote:
> > From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:42 PM
> > >
> > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when
> > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> > > as the unit for page related data. For storvsc, the data is
> > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit
> > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd
> > > into Hyper-V pages in vmbus_packet_mpb_array.
> > >
> > > This patch does the conversion by dividing pages in sglist into Hyper-V
> > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated
> > > accordingly.
> > >
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 27 +++++++++++++++++++++------
> > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index fb41636519ee..c54d25f279bc 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -1561,7 +1561,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host,
> struct
> > > scsi_cmnd *scmnd)
> > >  	struct hv_host_device *host_dev = shost_priv(host);
> > >  	struct hv_device *dev = host_dev->dev;
> > >  	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> > > -	int i;
> > > +	int i, j, k;
> > >  	struct scatterlist *sgl;
> > >  	unsigned int sg_count = 0;
> > >  	struct vmscsi_request *vm_srb;
> > > @@ -1569,6 +1569,8 @@ static int storvsc_queuecommand(struct Scsi_Host *host,
> struct
> > > scsi_cmnd *scmnd)
> > >  	struct vmbus_packet_mpb_array  *payload;
> > >  	u32 payload_sz;
> > >  	u32 length;
> > > +	int subpage_idx = 0;
> > > +	unsigned int hvpg_count = 0;
> > >
> > >  	if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
> > >  		/*
> > > @@ -1643,23 +1645,36 @@ static int storvsc_queuecommand(struct Scsi_Host *host,
> struct
> > > scsi_cmnd *scmnd)
> > >  	payload_sz = sizeof(cmd_request->mpb);
> > >
> > >  	if (sg_count) {
> > > -		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> > > +		hvpg_count = sg_count * (PAGE_SIZE / HV_HYP_PAGE_SIZE);
> >
> > The above calculation doesn't take into account the offset in the
> > first sglist or the overall length of the transfer, so the value of hvpg_count
> > could be quite a bit bigger than it needs to be.  For example, with a 64K
> > page size and an 8 Kbyte transfer size that starts at offset 60K in the
> > first page, hvpg_count will be 32 when it really only needs to be 2.
> >
> > The nested loops below that populate the pfn_array take the
> > offset into account when starting, so that's good.  But it will potentially
> > leave allocated entries unused.  Furthermore, the nested loops could
> > terminate early when enough Hyper-V size pages are mapped to PFNs
> > based on the length of the transfer, even if all of the last guest size
> > page has not been mapped to PFNs.  Like the offset at the beginning of
> > first guest size page in the sglist, there's potentially an unused portion
> > at the end of the last guest size page in the sglist.
> >
> 
> Good point. I think we could calculate the exact hvpg_count as follow:
> 
> 	hvpg_count = 0;
> 	cur_sgl = sgl;
> 
> 	for (i = 0; i < sg_count; i++) {
> 		hvpg_count += HVPFN_UP(cur_sg->length)
> 		cur_sgl = sg_next(cur_sgl);
> 	}
> 

The downside would be going around that loop a lot of times when
the page size is 4K bytes and the I/O transfer size is something like
256K bytes.  I think this gives the right result in constant time:  the
starting offset within a Hyper-V page, plus the transfer length,
rounded up to a Hyper-V page size, and divided by the Hyper-V
page size.


> > > +		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> > >
> > > -			payload_sz = (sg_count * sizeof(u64) +
> > > +			payload_sz = (hvpg_count * sizeof(u64) +
> > >  				      sizeof(struct vmbus_packet_mpb_array));
> > >  			payload = kzalloc(payload_sz, GFP_ATOMIC);
> > >  			if (!payload)
> > >  				return SCSI_MLQUEUE_DEVICE_BUSY;
> > >  		}
> > >
> > > +		/*
> > > +		 * sgl is a list of PAGEs, and payload->range.pfn_array
> > > +		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> > > +		 * page size that Hyper-V uses, so here we need to divide PAGEs
> > > +		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> > > +		 */
> > >  		payload->range.len = length;
> > > -		payload->range.offset = sgl[0].offset;
> > > +		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> > > +		subpage_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> > >
> > >  		cur_sgl = sgl;
> > > +		k = 0;
> > >  		for (i = 0; i < sg_count; i++) {
> > > -			payload->range.pfn_array[i] =
> > > -				page_to_pfn(sg_page((cur_sgl)));
> > > +			for (j = subpage_idx; j < (PAGE_SIZE / HV_HYP_PAGE_SIZE); j++) {
> >
> > In the case where PAGE_SIZE == HV_HYP_PAGE_SIZE, would it help the compiler
> > eliminate the loop if local variable j is declared as unsigned?  In that case the test in the
> > for statement will always be false.
> >
> 
> Good point! I did the following test:
> 
> test.c:
> 
> 	int func(unsigned int input, int *arr)
> 	{
> 		unsigned int i;
> 		int result = 0;
> 
> 		for (i = input; i < 1; i++)
> 			result += arr[i];
> 
> 		return result;
> 	}
> 
> if I define i as "int", I got:
> 
> 	0000000000000000 <func>:
> 	   0:	85 ff                	test   %edi,%edi
> 	   2:	7f 2c                	jg     30 <func+0x30>
> 	   4:	48 63 d7             	movslq %edi,%rdx
> 	   7:	f7 df                	neg    %edi
> 	   9:	45 31 c0             	xor    %r8d,%r8d
> 	   c:	89 ff                	mov    %edi,%edi
> 	   e:	48 8d 04 96          	lea    (%rsi,%rdx,4),%rax
> 	  12:	48 01 d7             	add    %rdx,%rdi
> 	  15:	48 8d 54 be 04       	lea    0x4(%rsi,%rdi,4),%rdx
> 	  1a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
> 	  20:	44 03 00             	add    (%rax),%r8d
> 	  23:	48 83 c0 04          	add    $0x4,%rax
> 	  27:	48 39 d0             	cmp    %rdx,%rax
> 	  2a:	75 f4                	jne    20 <func+0x20>
> 	  2c:	44 89 c0             	mov    %r8d,%eax
> 	  2f:	c3                   	retq
> 	  30:	45 31 c0             	xor    %r8d,%r8d
> 	  33:	44 89 c0             	mov    %r8d,%eax
> 	  36:	c3                   	retq
> 
> and when I define i as "unsigned int", I got:
> 
> 	0000000000000000 <func>:
> 	   0:	85 ff                	test   %edi,%edi
> 	   2:	75 03                	jne    7 <func+0x7>
> 	   4:	8b 06                	mov    (%rsi),%eax
> 	   6:	c3                   	retq
> 	   7:	31 c0                	xor    %eax,%eax
> 	   9:	c3                   	retq
> 
> So clearly it helps, I will change this in the next version.

Wow!  The compiler is good ....

> 
> Regards,
> Boqun
> 
> > > +				payload->range.pfn_array[k] =
> > > +					page_to_hvpfn(sg_page((cur_sgl))) + j;
> > > +				k++;
> > > +			}
> > >  			cur_sgl = sg_next(cur_sgl);
> > > +			subpage_idx = 0;
> > >  		}
> > >  	}
> > >
> > > --
> > > 2.27.0
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K
  2020-07-23  2:26       ` Michael Kelley
@ 2020-07-23  3:12         ` Boqun Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2020-07-23  3:12 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-hyperv, linux-input, linux-kernel, netdev, linux-scsi,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Jiri Kosina, Benjamin Tissoires, Dmitry Torokhov,
	David S. Miller, Jakub Kicinski, James E.J. Bottomley,
	Martin K. Petersen

On Thu, Jul 23, 2020 at 02:26:00AM +0000, Michael Kelley wrote:
> From: boqun.feng@gmail.com <boqun.feng@gmail.com> Sent: Wednesday, July 22, 2020 6:52 PM
> > 
> > On Thu, Jul 23, 2020 at 12:13:07AM +0000, Michael Kelley wrote:
> > > From: Boqun Feng <boqun.feng@gmail.com> Sent: Monday, July 20, 2020 6:42 PM
> > > >
> > > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when
> > > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> > > > as the unit for page related data. For storvsc, the data is
> > > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit
> > > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd
> > > > into Hyper-V pages in vmbus_packet_mpb_array.
> > > >
> > > > This patch does the conversion by dividing pages in sglist into Hyper-V
> > > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated
> > > > accordingly.
> > > >
> > > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > > ---
> > > >  drivers/scsi/storvsc_drv.c | 27 +++++++++++++++++++++------
> > > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > > index fb41636519ee..c54d25f279bc 100644
> > > > --- a/drivers/scsi/storvsc_drv.c
> > > > +++ b/drivers/scsi/storvsc_drv.c
> > > > @@ -1561,7 +1561,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host,
> > struct
> > > > scsi_cmnd *scmnd)
> > > >  	struct hv_host_device *host_dev = shost_priv(host);
> > > >  	struct hv_device *dev = host_dev->dev;
> > > >  	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> > > > -	int i;
> > > > +	int i, j, k;
> > > >  	struct scatterlist *sgl;
> > > >  	unsigned int sg_count = 0;
> > > >  	struct vmscsi_request *vm_srb;
> > > > @@ -1569,6 +1569,8 @@ static int storvsc_queuecommand(struct Scsi_Host *host,
> > struct
> > > > scsi_cmnd *scmnd)
> > > >  	struct vmbus_packet_mpb_array  *payload;
> > > >  	u32 payload_sz;
> > > >  	u32 length;
> > > > +	int subpage_idx = 0;
> > > > +	unsigned int hvpg_count = 0;
> > > >
> > > >  	if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
> > > >  		/*
> > > > @@ -1643,23 +1645,36 @@ static int storvsc_queuecommand(struct Scsi_Host *host,
> > struct
> > > > scsi_cmnd *scmnd)
> > > >  	payload_sz = sizeof(cmd_request->mpb);
> > > >
> > > >  	if (sg_count) {
> > > > -		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> > > > +		hvpg_count = sg_count * (PAGE_SIZE / HV_HYP_PAGE_SIZE);
> > >
> > > The above calculation doesn't take into account the offset in the
> > > first sglist or the overall length of the transfer, so the value of hvpg_count
> > > could be quite a bit bigger than it needs to be.  For example, with a 64K
> > > page size and an 8 Kbyte transfer size that starts at offset 60K in the
> > > first page, hvpg_count will be 32 when it really only needs to be 2.
> > >
> > > The nested loops below that populate the pfn_array take the
> > > offset into account when starting, so that's good.  But it will potentially
> > > leave allocated entries unused.  Furthermore, the nested loops could
> > > terminate early when enough Hyper-V size pages are mapped to PFNs
> > > based on the length of the transfer, even if all of the last guest size
> > > page has not been mapped to PFNs.  Like the offset at the beginning of
> > > first guest size page in the sglist, there's potentially an unused portion
> > > at the end of the last guest size page in the sglist.
> > >
> > 
> > Good point. I think we could calculate the exact hvpg_count as follow:
> > 
> > 	hvpg_count = 0;
> > 	cur_sgl = sgl;
> > 
> > 	for (i = 0; i < sg_count; i++) {
> > 		hvpg_count += HVPFN_UP(cur_sg->length)
> > 		cur_sgl = sg_next(cur_sgl);
> > 	}
> > 
> 
> The downside would be going around that loop a lot of times when
> the page size is 4K bytes and the I/O transfer size is something like
> 256K bytes.  I think this gives the right result in constant time:  the
> starting offset within a Hyper-V page, plus the transfer length,
> rounded up to a Hyper-V page size, and divided by the Hyper-V
> page size.
> 

Ok, then:

	hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK;
	hvpg_count = HVPFN_UP(hv_offset + length);

?

Thanks!

Regards,
Boqun

> 
> > > > +		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> > > >
> > > > -			payload_sz = (sg_count * sizeof(u64) +
> > > > +			payload_sz = (hvpg_count * sizeof(u64) +
> > > >  				      sizeof(struct vmbus_packet_mpb_array));
> > > >  			payload = kzalloc(payload_sz, GFP_ATOMIC);
> > > >  			if (!payload)
> > > >  				return SCSI_MLQUEUE_DEVICE_BUSY;
> > > >  		}
> > > >
> > > > +		/*
> > > > +		 * sgl is a list of PAGEs, and payload->range.pfn_array
> > > > +		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> > > > +		 * page size that Hyper-V uses, so here we need to divide PAGEs
> > > > +		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> > > > +		 */
> > > >  		payload->range.len = length;
> > > > -		payload->range.offset = sgl[0].offset;
> > > > +		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> > > > +		subpage_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> > > >
> > > >  		cur_sgl = sgl;
> > > > +		k = 0;
> > > >  		for (i = 0; i < sg_count; i++) {
> > > > -			payload->range.pfn_array[i] =
> > > > -				page_to_pfn(sg_page((cur_sgl)));
> > > > +			for (j = subpage_idx; j < (PAGE_SIZE / HV_HYP_PAGE_SIZE); j++) {
> > >
> > > In the case where PAGE_SIZE == HV_HYP_PAGE_SIZE, would it help the compiler
> > > eliminate the loop if local variable j is declared as unsigned?  In that case the test in the
> > > for statement will always be false.
> > >
> > 
> > Good point! I did the following test:
> > 
> > test.c:
> > 
> > 	int func(unsigned int input, int *arr)
> > 	{
> > 		unsigned int i;
> > 		int result = 0;
> > 
> > 		for (i = input; i < 1; i++)
> > 			result += arr[i];
> > 
> > 		return result;
> > 	}
> > 
> > if I define i as "int", I got:
> > 
> > 	0000000000000000 <func>:
> > 	   0:	85 ff                	test   %edi,%edi
> > 	   2:	7f 2c                	jg     30 <func+0x30>
> > 	   4:	48 63 d7             	movslq %edi,%rdx
> > 	   7:	f7 df                	neg    %edi
> > 	   9:	45 31 c0             	xor    %r8d,%r8d
> > 	   c:	89 ff                	mov    %edi,%edi
> > 	   e:	48 8d 04 96          	lea    (%rsi,%rdx,4),%rax
> > 	  12:	48 01 d7             	add    %rdx,%rdi
> > 	  15:	48 8d 54 be 04       	lea    0x4(%rsi,%rdi,4),%rdx
> > 	  1a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
> > 	  20:	44 03 00             	add    (%rax),%r8d
> > 	  23:	48 83 c0 04          	add    $0x4,%rax
> > 	  27:	48 39 d0             	cmp    %rdx,%rax
> > 	  2a:	75 f4                	jne    20 <func+0x20>
> > 	  2c:	44 89 c0             	mov    %r8d,%eax
> > 	  2f:	c3                   	retq
> > 	  30:	45 31 c0             	xor    %r8d,%r8d
> > 	  33:	44 89 c0             	mov    %r8d,%eax
> > 	  36:	c3                   	retq
> > 
> > and when I define i as "unsigned int", I got:
> > 
> > 	0000000000000000 <func>:
> > 	   0:	85 ff                	test   %edi,%edi
> > 	   2:	75 03                	jne    7 <func+0x7>
> > 	   4:	8b 06                	mov    (%rsi),%eax
> > 	   6:	c3                   	retq
> > 	   7:	31 c0                	xor    %eax,%eax
> > 	   9:	c3                   	retq
> > 
> > So clearly it helps, I will change this in the next version.
> 
> Wow!  The compiler is good ....
> 
> > 
> > Regards,
> > Boqun
> > 
> > > > +				payload->range.pfn_array[k] =
> > > > +					page_to_hvpfn(sg_page((cur_sgl))) + j;
> > > > +				k++;
> > > > +			}
> > > >  			cur_sgl = sg_next(cur_sgl);
> > > > +			subpage_idx = 0;
> > > >  		}
> > > >  	}
> > > >
> > > > --
> > > > 2.27.0
> > >

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, back to index

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21  1:41 [RFC 00/11] Hyper-V: Support PAGE_SIZE larger than 4K Boqun Feng
2020-07-21  1:41 ` [RFC 01/11] Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl Boqun Feng
2020-07-21 15:22   ` Wei Liu
2020-07-22 23:20     ` Boqun Feng
2020-07-21  1:41 ` [RFC 02/11] Drivers: hv: vmbus: Move __vmbus_open() Boqun Feng
2020-07-21 15:23   ` Wei Liu
2020-07-21  1:41 ` [RFC 03/11] Drivers: hv: vmbus: Introduce types of GPADL Boqun Feng
2020-07-22 23:25   ` Michael Kelley
2020-07-22 23:43     ` Boqun Feng
2020-07-22 23:56       ` Michael Kelley
2020-07-21  1:41 ` [RFC 04/11] Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs() Boqun Feng
2020-07-21  1:41 ` [RFC 05/11] Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header Boqun Feng
2020-07-21  1:41 ` [RFC 06/11] hv: hyperv.h: Introduce some hvpfn helper functions Boqun Feng
2020-07-21  1:41 ` [RFC 07/11] hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication Boqun Feng
2020-07-21  1:41 ` [RFC 08/11] Input: hyperv-keyboard: Make ringbuffer at least take two pages Boqun Feng
2020-07-21  1:41 ` [RFC 09/11] HID: hyperv: " Boqun Feng
2020-07-22 23:36   ` Michael Kelley
2020-07-23  1:28     ` boqun.feng
2020-07-21  1:41 ` [RFC 10/11] Driver: hv: util: " Boqun Feng
2020-07-21  1:41 ` [RFC 11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K Boqun Feng
2020-07-23  0:13   ` Michael Kelley
2020-07-23  1:51     ` boqun.feng
2020-07-23  2:26       ` Michael Kelley
2020-07-23  3:12         ` Boqun Feng

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git