netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts
@ 2018-01-23  9:34 Mohammed Gamal
  2018-01-23  9:34 ` [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl() Mohammed Gamal
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Mohammed Gamal @ 2018-01-23  9:34 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devel, kys, haiyangz, sthemmin, vkuznets, cavery,
	otubo, Mohammed Gamal

Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
a regression that caused VMs not to shutdown after netvsc_device_remove() is
called. This is caused by GPADL teardown sequence change, and while that was 
necessary to fix issues with Win2016 hosts, it did introduce a regression for
earlier versions.

Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as 
follows (as implemented in netvsc_destroy_buf()):
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Teardown receive buffer GPADL
3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
4- Teardown send buffer GPADL
5- Close vmbus

This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
into two functions and rearranged the order as follows
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Close vmbus
4- Teardown receive buffer GPADL
5- Teardown send buffer GPADL

That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
shutting down. 

This patch series works around this problem. The first patch splits
netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
functions for tearing down send and receive buffers individally. The second patch
uses the finer grained functions to implement the teardown sequence according to
the host's version. We keep the behavior introduced in 0cf737808ae7 for Windows
2016 hosts, while we re-introduce the old sequence for earlier verions.

Mohammed Gamal (2):
  hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  hv_netvsc: Change GPADL teardown order according to Hyper-V version

 drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 8 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-01-23  9:34 [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Mohammed Gamal
@ 2018-01-23  9:34 ` Mohammed Gamal
  2018-01-30 19:29   ` Stephen Hemminger
  2018-01-23  9:34 ` [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version Mohammed Gamal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Mohammed Gamal @ 2018-01-23  9:34 UTC (permalink / raw)
  To: netdev
  Cc: otubo, Mohammed Gamal, sthemmin, haiyangz, linux-kernel, devel, vkuznets

Split each of the functions into two for each of send/recv buffers

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index bfc7969..3982f76 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -100,8 +100,8 @@ static void free_netvsc_device_rcu(struct netvsc_device *nvdev)
 	call_rcu(&nvdev->rcu, free_netvsc_device);
 }
 
-static void netvsc_revoke_buf(struct hv_device *device,
-			      struct netvsc_device *net_device)
+static void netvsc_revoke_recv_buf(struct hv_device *device,
+				   struct netvsc_device *net_device)
 {
 	struct nvsp_message *revoke_packet;
 	struct net_device *ndev = hv_get_drvdata(device);
@@ -146,6 +146,14 @@ static void netvsc_revoke_buf(struct hv_device *device,
 		}
 		net_device->recv_section_cnt = 0;
 	}
+}
+
+static void netvsc_revoke_send_buf(struct hv_device *device,
+				   struct netvsc_device *net_device)
+{
+	struct nvsp_message *revoke_packet;
+	struct net_device *ndev = hv_get_drvdata(device);
+	int ret;
 
 	/* Deal with the send buffer we may have setup.
 	 * If we got a  send section size, it means we received a
@@ -189,8 +197,8 @@ static void netvsc_revoke_buf(struct hv_device *device,
 	}
 }
 
-static void netvsc_teardown_gpadl(struct hv_device *device,
-				  struct netvsc_device *net_device)
+static void netvsc_teardown_recv_buf_gpadl(struct hv_device *device,
+					   struct netvsc_device *net_device)
 {
 	struct net_device *ndev = hv_get_drvdata(device);
 	int ret;
@@ -215,6 +223,13 @@ static void netvsc_teardown_gpadl(struct hv_device *device,
 		vfree(net_device->recv_buf);
 		net_device->recv_buf = NULL;
 	}
+}
+
+static void netvsc_teardown_send_buf_gpadl(struct hv_device *device,
+					   struct netvsc_device *net_device)
+{
+	struct net_device *ndev = hv_get_drvdata(device);
+	int ret;
 
 	if (net_device->send_buf_gpadl_handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
@@ -425,8 +440,10 @@ static int netvsc_init_buf(struct hv_device *device,
 	goto exit;
 
 cleanup:
-	netvsc_revoke_buf(device, net_device);
-	netvsc_teardown_gpadl(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device);
+	netvsc_teardown_recv_buf_gpadl(device, net_device);
+	netvsc_teardown_send_buf_gpadl(device, net_device);
 
 exit:
 	return ret;
@@ -558,7 +575,8 @@ void netvsc_device_remove(struct hv_device *device)
 
 	cancel_work_sync(&net_device->subchan_work);
 
-	netvsc_revoke_buf(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -571,7 +589,8 @@ void netvsc_device_remove(struct hv_device *device)
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
-	netvsc_teardown_gpadl(device, net_device);
+	netvsc_teardown_recv_buf_gpadl(device, net_device);
+	netvsc_teardown_send_buf_gpadl(device, net_device);
 
 	/* And dissassociate NAPI context from device */
 	for (i = 0; i < net_device->num_chn; i++)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version
  2018-01-23  9:34 [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Mohammed Gamal
  2018-01-23  9:34 ` [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl() Mohammed Gamal
@ 2018-01-23  9:34 ` Mohammed Gamal
  2018-01-30 19:30   ` Stephen Hemminger
  2018-01-23 15:43 ` [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Haiyang Zhang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Mohammed Gamal @ 2018-01-23  9:34 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, devel, kys, haiyangz, sthemmin, vkuznets, cavery,
	otubo, Mohammed Gamal

Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
introduced a regression causing VMs not to shutdown on pre-Wind2016
hosts after netvsc_remove_device() is called. This was caused as the
GPADL teardown sequence was changed.

This patch restores the old behavior for pre-Win2016 hosts, while
keeping the changes from 0cf7378 for Win2016 and higher hosts.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 3982f76..d09bb3b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -575,8 +575,17 @@ void netvsc_device_remove(struct hv_device *device)
 
 	cancel_work_sync(&net_device->subchan_work);
 
+	/*
+	 * Revoke receive buffer. If host is pre-Win2016 then tear down
+	 * receive buffer GPADL. Do the same for send buffer.
+	 */
 	netvsc_revoke_recv_buf(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10)
+		netvsc_teardown_recv_buf_gpadl(device, net_device);
+
 	netvsc_revoke_send_buf(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10)
+		netvsc_teardown_send_buf_gpadl(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -589,8 +598,14 @@ void netvsc_device_remove(struct hv_device *device)
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
-	netvsc_teardown_recv_buf_gpadl(device, net_device);
-	netvsc_teardown_send_buf_gpadl(device, net_device);
+	/*
+	 * If host is Win2016 or higher then we do the GPADL tear down
+	 * here after VMBus is closed, instead of doing it earlier.
+	 */
+	if (vmbus_proto_version >= VERSION_WIN10) {
+		netvsc_teardown_recv_buf_gpadl(device, net_device);
+		netvsc_teardown_send_buf_gpadl(device, net_device);
+	}
 
 	/* And dissassociate NAPI context from device */
 	for (i = 0; i < net_device->num_chn; i++)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts
  2018-01-23  9:34 [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Mohammed Gamal
  2018-01-23  9:34 ` [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl() Mohammed Gamal
  2018-01-23  9:34 ` [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version Mohammed Gamal
@ 2018-01-23 15:43 ` Haiyang Zhang
  2018-01-23 16:33 ` Stephen Hemminger
  2018-01-26 18:10 ` Stephen Hemminger
  4 siblings, 0 replies; 13+ messages in thread
From: Haiyang Zhang @ 2018-01-23 15:43 UTC (permalink / raw)
  To: Mohammed Gamal, netdev
  Cc: otubo, Stephen Hemminger, linux-kernel, devel, vkuznets



> -----Original Message-----
> From: Mohammed Gamal [mailto:mgamal@redhat.com]
> Sent: Tuesday, January 23, 2018 4:34 AM
> To: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; KY
> Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; vkuznets@redhat.com;
> cavery@redhat.com; otubo@redhat.com; Mohammed Gamal
> <mgamal@redhat.com>
> Subject: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012
> hosts
> 
> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> introduced a regression that caused VMs not to shutdown after
> netvsc_device_remove() is called. This is caused by GPADL teardown
> sequence change, and while that was necessary to fix issues with Win2016
> hosts, it did introduce a regression for earlier versions.
> 
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was
> as follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
> 
> This didn't work for WS2016 hosts. Commit 0cf737808 split
> netvsc_destroy_buf() into two functions and rearranged the order as follows
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
> 
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs
> from shutting down.
> 
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
> functions for tearing down send and receive buffers individally. The second
> patch uses the finer grained functions to implement the teardown sequence
> according to the host's version. We keep the behavior introduced in
> 0cf737808ae7 for Windows
> 2016 hosts, while we re-introduce the old sequence for earlier verions.
> 
> Mohammed Gamal (2):
>   hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
>   hv_netvsc: Change GPADL teardown order according to Hyper-V version
> 
>  drivers/net/hyperv/netvsc.c | 50
> +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)

Thank you for the patches. We are testing them internally.

- Haiyang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts
  2018-01-23  9:34 [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Mohammed Gamal
                   ` (2 preceding siblings ...)
  2018-01-23 15:43 ` [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Haiyang Zhang
@ 2018-01-23 16:33 ` Stephen Hemminger
  2018-01-26 18:10 ` Stephen Hemminger
  4 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2018-01-23 16:33 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: netdev, otubo, sthemmin, haiyangz, linux-kernel, devel, vkuznets

On Tue, 23 Jan 2018 10:34:03 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
> a regression that caused VMs not to shutdown after netvsc_device_remove() is
> called. This is caused by GPADL teardown sequence change, and while that was 
> necessary to fix issues with Win2016 hosts, it did introduce a regression for
> earlier versions.
> 
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as 
> follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
> 
> This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
> into two functions and rearranged the order as follows
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
> 
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
> shutting down. 
> 
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
> functions for tearing down send and receive buffers individally. The second patch
> uses the finer grained functions to implement the teardown sequence according to
> the host's version. We keep the behavior introduced in 0cf737808ae7 for Windows
> 2016 hosts, while we re-introduce the old sequence for earlier verions.
> 
> Mohammed Gamal (2):
>   hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
>   hv_netvsc: Change GPADL teardown order according to Hyper-V version
> 
>  drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
> 

The problem the original commit was trying to solve was actions in flight
in the receive buffer on shutdown. Having different ordering for each version of Hyper-V
seems unnecessary. There should be a way to get a stable sequence here.

Let me see if I can shake more information out of the Windows team to see what
the handshake on the other side is. Let's not apply this until then.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts
  2018-01-23  9:34 [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Mohammed Gamal
                   ` (3 preceding siblings ...)
  2018-01-23 16:33 ` Stephen Hemminger
@ 2018-01-26 18:10 ` Stephen Hemminger
  4 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2018-01-26 18:10 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: netdev, otubo, sthemmin, haiyangz, linux-kernel, devel, vkuznets

On Tue, 23 Jan 2018 10:34:03 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
> a regression that caused VMs not to shutdown after netvsc_device_remove() is
> called. This is caused by GPADL teardown sequence change, and while that was 
> necessary to fix issues with Win2016 hosts, it did introduce a regression for
> earlier versions.
> 
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as 
> follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
> 
> This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
> into two functions and rearranged the order as follows
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
> 
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
> shutting down. 
> 
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
> functions for tearing down send and receive buffers individally. The second patch
> uses the finer grained functions to implement the teardown sequence according to
> the host's version. We keep the behavior introduced in 0cf737808ae7 for Windows
> 2016 hosts, while we re-introduce the old sequence for earlier verions.
> 
> Mohammed Gamal (2):
>   hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
>   hv_netvsc: Change GPADL teardown order according to Hyper-V version
> 
>  drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
> 

What I am experimenting with is sending an NDIS_RESET (instead of setting packet filter)
as part of the close processing. This seems more like what the description of what Windows
driver does and matches my reading of the public RNDIS specification.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-01-23  9:34 ` [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl() Mohammed Gamal
@ 2018-01-30 19:29   ` Stephen Hemminger
  2018-01-31 11:16     ` Mohammed Gamal
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2018-01-30 19:29 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: otubo, sthemmin, netdev, haiyangz, linux-kernel, devel, vkuznets

On Tue, 23 Jan 2018 10:34:04 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> Split each of the functions into two for each of send/recv buffers
> 
> Signed-off-by: Mohammed Gamal <mgamal@redhat.com>

Splitting these functions is not necessary

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version
  2018-01-23  9:34 ` [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version Mohammed Gamal
@ 2018-01-30 19:30   ` Stephen Hemminger
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2018-01-30 19:30 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: otubo, sthemmin, netdev, haiyangz, linux-kernel, devel, vkuznets

On Tue, 23 Jan 2018 10:34:05 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> introduced a regression causing VMs not to shutdown on pre-Wind2016
> hosts after netvsc_remove_device() is called. This was caused as the
> GPADL teardown sequence was changed.
> 
> This patch restores the old behavior for pre-Win2016 hosts, while
> keeping the changes from 0cf7378 for Win2016 and higher hosts.
> 
> Signed-off-by: Mohammed Gamal <mgamal@redhat.com>

Investigated the Windows driver to see how it handled this.
It uses NVSP version < 4 to check for older hosts. So that patch
should use that.

Currently testing a version with that change.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-01-30 19:29   ` Stephen Hemminger
@ 2018-01-31 11:16     ` Mohammed Gamal
  2018-01-31 23:01       ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Mohammed Gamal @ 2018-01-31 11:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, otubo, sthemmin, haiyangz, linux-kernel, devel, vkuznets

On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> On Tue, 23 Jan 2018 10:34:04 +0100
> Mohammed Gamal <mgamal@redhat.com> wrote:
> 
> > Split each of the functions into two for each of send/recv buffers
> > 
> > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
> 
> Splitting these functions is not necessary

How so? We need to send each message independently, and hence the split
(see cover letter). Is there another way?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-01-31 11:16     ` Mohammed Gamal
@ 2018-01-31 23:01       ` Stephen Hemminger
  2018-02-01  8:37         ` Mohammed Gamal
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2018-01-31 23:01 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: otubo, sthemmin, netdev, haiyangz, linux-kernel, devel, vkuznets

On Wed, 31 Jan 2018 12:16:49 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > On Tue, 23 Jan 2018 10:34:04 +0100
> > Mohammed Gamal <mgamal@redhat.com> wrote:
> >   
> > > Split each of the functions into two for each of send/recv buffers
> > > 
> > > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>  
> > 
> > Splitting these functions is not necessary  
> 
> How so? We need to send each message independently, and hence the split
> (see cover letter). Is there another way?

This is all that is needed.


Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older windows
 server

On WS2012 the host ignores messages after vmbus channel is closed.
Workaround this by doing what Windows does and send the teardown
before close on older versions of NVSP protocol.

Reported-by: Mohammed Gamal <mgamal@redhat.com>
Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 17e529af79dc..1a3df0eff42f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device *device)
 	 */
 	netdev_dbg(ndev, "net device safe to remove\n");
 
+	/* Workaround for older versions of Windows require that
+	 * buffer be revoked before channel is disabled
+	 */
+	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
+		netvsc_teardown_gpadl(device, net_device);
+
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
-	netvsc_teardown_gpadl(device, net_device);
+	if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
+		netvsc_teardown_gpadl(device, net_device);
 
 	/* And dissassociate NAPI context from device */
 	for (i = 0; i < net_device->num_chn; i++)
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-01-31 23:01       ` Stephen Hemminger
@ 2018-02-01  8:37         ` Mohammed Gamal
  2018-02-01 22:34           ` Mohammed Gamal
  0 siblings, 1 reply; 13+ messages in thread
From: Mohammed Gamal @ 2018-02-01  8:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, otubo, sthemmin, haiyangz, linux-kernel, devel, vkuznets

On Wed, 2018-01-31 at 15:01 -0800, Stephen Hemminger wrote:
> On Wed, 31 Jan 2018 12:16:49 +0100
> Mohammed Gamal <mgamal@redhat.com> wrote:
> 
> > On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > > On Tue, 23 Jan 2018 10:34:04 +0100
> > > Mohammed Gamal <mgamal@redhat.com> wrote:
> > >   
> > > > Split each of the functions into two for each of send/recv
> > > > buffers
> > > > 
> > > > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>  
> > > 
> > > Splitting these functions is not necessary  
> > 
> > How so? We need to send each message independently, and hence the
> > split
> > (see cover letter). Is there another way?
> 
> This is all that is needed.
> 
> 
> Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older
> windows
>  server
> 
> On WS2012 the host ignores messages after vmbus channel is closed.
> Workaround this by doing what Windows does and send the teardown
> before close on older versions of NVSP protocol.
> 
> Reported-by: Mohammed Gamal <mgamal@redhat.com>
> Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/hyperv/netvsc.c
> b/drivers/net/hyperv/netvsc.c
> index 17e529af79dc..1a3df0eff42f 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device
> *device)
>  	 */
>  	netdev_dbg(ndev, "net device safe to remove\n");
>  
> +	/* Workaround for older versions of Windows require that
> +	 * buffer be revoked before channel is disabled
> +	 */
> +	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
> +		netvsc_teardown_gpadl(device, net_device);
> +
>  	/* Now, we can close the channel safely */
>  	vmbus_close(device->channel);
>  
> -	netvsc_teardown_gpadl(device, net_device);
> +	if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
> +		netvsc_teardown_gpadl(device, net_device);
>  
>  	/* And dissassociate NAPI context from device */
>  	for (i = 0; i < net_device->num_chn; i++)

I've tried a similar workaround before by calling
netvsc_teardown_gpadl() after netvsc_revoke_buf(), but before setting
net_device_ctx->nvdev to NULL and it caused the guest to hang when
trying to change MTU. 

Let me try that change and see if it behaves differently.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-02-01  8:37         ` Mohammed Gamal
@ 2018-02-01 22:34           ` Mohammed Gamal
  2018-02-01 22:38             ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Mohammed Gamal @ 2018-02-01 22:34 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: otubo, sthemmin, netdev, haiyangz, linux-kernel, devel, vkuznets

On Thu, 2018-02-01 at 09:37 +0100, Mohammed Gamal wrote:
> On Wed, 2018-01-31 at 15:01 -0800, Stephen Hemminger wrote:
> > On Wed, 31 Jan 2018 12:16:49 +0100
> > Mohammed Gamal <mgamal@redhat.com> wrote:
> > 
> > > On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > > > On Tue, 23 Jan 2018 10:34:04 +0100
> > > > Mohammed Gamal <mgamal@redhat.com> wrote:
> > > >   
> > > > > Split each of the functions into two for each of send/recv
> > > > > buffers
> > > > > 
> > > > > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>  
> > > > 
> > > > Splitting these functions is not necessary  
> > > 
> > > How so? We need to send each message independently, and hence the
> > > split
> > > (see cover letter). Is there another way?
> > 
> > This is all that is needed.
> > 
> > 
> > Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older
> > windows
> >  server
> > 
> > On WS2012 the host ignores messages after vmbus channel is closed.
> > Workaround this by doing what Windows does and send the teardown
> > before close on older versions of NVSP protocol.
> > 
> > Reported-by: Mohammed Gamal <mgamal@redhat.com>
> > Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > ---
> >  drivers/net/hyperv/netvsc.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c
> > index 17e529af79dc..1a3df0eff42f 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device
> > *device)
> >  	 */
> >  	netdev_dbg(ndev, "net device safe to remove\n");
> >  
> > +	/* Workaround for older versions of Windows require that
> > +	 * buffer be revoked before channel is disabled
> > +	 */
> > +	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
> > +		netvsc_teardown_gpadl(device, net_device);
> > +
> >  	/* Now, we can close the channel safely */
> >  	vmbus_close(device->channel);
> >  
> > -	netvsc_teardown_gpadl(device, net_device);
> > +	if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
> > +		netvsc_teardown_gpadl(device, net_device);
> >  
> >  	/* And dissassociate NAPI context from device */
> >  	for (i = 0; i < net_device->num_chn; i++)
> 
> I've tried a similar workaround before by calling
> netvsc_teardown_gpadl() after netvsc_revoke_buf(), but before setting
> net_device_ctx->nvdev to NULL and it caused the guest to hang when
> trying to change MTU. 
> 
> Let me try that change and see if it behaves differently.

I tested the patch, but I've actually seen some unexpected behavior.

First, net_device->nvsp_version is actually NVSP_PROTOCOL_VERSION_5 on
both my Win2012 and Win2016 hosts that I tested on, so the condition is
never executed.

Second, when doing the check instead as  if (vmbus_proto_version <
VERSION_WIN10), I get the same behavior I described above where the
guest hangs as the kernel waits indefinitely in vmbus_teardown_gpadl()
for a completion to be signaled. This is actually what lead me to
propose splitting netvsc_revoke_buf() and netvsc_teardown_gpadl() in my
initial patchset so that we keep the same order of messages and avoid
that indefinite wait.
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  2018-02-01 22:34           ` Mohammed Gamal
@ 2018-02-01 22:38             ` Stephen Hemminger
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2018-02-01 22:38 UTC (permalink / raw)
  To: mgamal, Stephen Hemminger
  Cc: netdev, otubo, Haiyang Zhang, linux-kernel, devel, vkuznets

There are multiple issues with some of the parameter change paths.
Still working on getting something stable. Both upstream, and net-next do have crash issues under concurrent changes.

I don't want Linux doing different workaround than Windows if at all possible; because it means that it would require much wider testing against many different versions.
Ps: WS2008r2 still needs to be supported.

-----Original Message-----
From: Mohammed Gamal [mailto:mgamal@redhat.com] 
Sent: Thursday, February 1, 2018 2:34 PM
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: netdev@vger.kernel.org; otubo@redhat.com; Stephen Hemminger <sthemmin@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; vkuznets@redhat.com
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Thu, 2018-02-01 at 09:37 +0100, Mohammed Gamal wrote:
> On Wed, 2018-01-31 at 15:01 -0800, Stephen Hemminger wrote:
> > On Wed, 31 Jan 2018 12:16:49 +0100
> > Mohammed Gamal <mgamal@redhat.com> wrote:
> > 
> > > On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > > > On Tue, 23 Jan 2018 10:34:04 +0100
> > > > Mohammed Gamal <mgamal@redhat.com> wrote:
> > > >   
> > > > > Split each of the functions into two for each of send/recv
> > > > > buffers
> > > > > 
> > > > > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>  
> > > > 
> > > > Splitting these functions is not necessary  
> > > 
> > > How so? We need to send each message independently, and hence the
> > > split
> > > (see cover letter). Is there another way?
> > 
> > This is all that is needed.
> > 
> > 
> > Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older
> > windows
> >  server
> > 
> > On WS2012 the host ignores messages after vmbus channel is closed.
> > Workaround this by doing what Windows does and send the teardown
> > before close on older versions of NVSP protocol.
> > 
> > Reported-by: Mohammed Gamal <mgamal@redhat.com>
> > Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > ---
> >  drivers/net/hyperv/netvsc.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c
> > index 17e529af79dc..1a3df0eff42f 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device
> > *device)
> >  	 */
> >  	netdev_dbg(ndev, "net device safe to remove\n");
> >  
> > +	/* Workaround for older versions of Windows require that
> > +	 * buffer be revoked before channel is disabled
> > +	 */
> > +	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
> > +		netvsc_teardown_gpadl(device, net_device);
> > +
> >  	/* Now, we can close the channel safely */
> >  	vmbus_close(device->channel);
> >  
> > -	netvsc_teardown_gpadl(device, net_device);
> > +	if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
> > +		netvsc_teardown_gpadl(device, net_device);
> >  
> >  	/* And dissassociate NAPI context from device */
> >  	for (i = 0; i < net_device->num_chn; i++)
> 
> I've tried a similar workaround before by calling
> netvsc_teardown_gpadl() after netvsc_revoke_buf(), but before setting
> net_device_ctx->nvdev to NULL and it caused the guest to hang when
> trying to change MTU. 
> 
> Let me try that change and see if it behaves differently.

I tested the patch, but I've actually seen some unexpected behavior.

First, net_device->nvsp_version is actually NVSP_PROTOCOL_VERSION_5 on
both my Win2012 and Win2016 hosts that I tested on, so the condition is
never executed.

Second, when doing the check instead as  if (vmbus_proto_version <
VERSION_WIN10), I get the same behavior I described above where the
guest hangs as the kernel waits indefinitely in vmbus_teardown_gpadl()
for a completion to be signaled. This is actually what lead me to
propose splitting netvsc_revoke_buf() and netvsc_teardown_gpadl() in my
initial patchset so that we keep the same order of messages and avoid
that indefinite wait.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-02-01 22:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-23  9:34 [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Mohammed Gamal
2018-01-23  9:34 ` [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl() Mohammed Gamal
2018-01-30 19:29   ` Stephen Hemminger
2018-01-31 11:16     ` Mohammed Gamal
2018-01-31 23:01       ` Stephen Hemminger
2018-02-01  8:37         ` Mohammed Gamal
2018-02-01 22:34           ` Mohammed Gamal
2018-02-01 22:38             ` Stephen Hemminger
2018-01-23  9:34 ` [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version Mohammed Gamal
2018-01-30 19:30   ` Stephen Hemminger
2018-01-23 15:43 ` [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts Haiyang Zhang
2018-01-23 16:33 ` Stephen Hemminger
2018-01-26 18:10 ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).