linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
@ 2023-06-26  9:17 souradeep chakrabarti
  2023-06-26  9:18 ` [PATCH 1/2 " souradeep chakrabarti
  0 siblings, 1 reply; 17+ messages in thread
From: souradeep chakrabarti @ 2023-06-26  9:17 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma
  Cc: stable, schakrabarti, Souradeep Chakrabarti

From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

VF unload gets stuck in MANA driver, when the host is not responding.
The function mana_dealloc_queues() tries to clear the inflight packets,
and gets stuck in while loop. Another problem in this scenario is the
timeout from hwc send request.
These patch add fix for the same.
In mana driver we are adding a timeout in the while loop, to fix it.
Also we are adding a new attribute in mana_context, which gets set when
mana_hwc_send_request() hits a timeout because of host unresponsiveness.

Souradeep Chakrabarti (2):
  net: mana: Fix MANA VF unload when host is unresponsive
  net: mana: Fix MANA VF unload when host is unresponsive

 .../net/ethernet/microsoft/mana/gdma_main.c   |  4 +++-
 .../net/ethernet/microsoft/mana/hw_channel.c  | 12 +++++++++++-
 drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
 include/net/mana/mana.h                       |  2 ++
 4 files changed, 33 insertions(+), 4 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:17 [PATCH 0/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive souradeep chakrabarti
@ 2023-06-26  9:18 ` souradeep chakrabarti
  2023-06-26  9:18   ` souradeep chakrabarti
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: souradeep chakrabarti @ 2023-06-26  9:18 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma
  Cc: stable, schakrabarti, Souradeep Chakrabarti

From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

This patch addresses the VF unload issue, where mana_dealloc_queues()
gets stuck in infinite while loop, because of host unresponsiveness.
It adds a timeout in the while loop, to fix it.

Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
Microsoft Azure Network Adapter)
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V2 -> V3:
* Splitted the patch in two parts.
* Removed the unnecessary braces from mana_dealloc_queues().
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d907727c7b7a..cb5c43c3c47e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
 	struct gdma_dev *gd = apc->ac->gdma_dev;
+	unsigned long timeout;
 	struct mana_txq *txq;
+	struct sk_buff *skb;
+	struct mana_cq *cq;
 	int i, err;
 
 	if (apc->port_is_up)
@@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	 *
 	 * Drain all the in-flight TX packets
 	 */
+
+	timeout = jiffies + 120 * HZ;
 	for (i = 0; i < apc->num_queues; i++) {
 		txq = &apc->tx_qp[i].txq;
-
-		while (atomic_read(&txq->pending_sends) > 0)
+		while (atomic_read(&txq->pending_sends) > 0 &&
+		       time_before(jiffies, timeout))
 			usleep_range(1000, 2000);
 	}
 
+	for (i = 0; i < apc->num_queues; i++) {
+		txq = &apc->tx_qp[i].txq;
+		cq = &apc->tx_qp[i].tx_cq;
+		while (atomic_read(&txq->pending_sends)) {
+			skb = skb_dequeue(&txq->pending_skbs);
+			mana_unmap_skb(skb, apc);
+			napi_consume_skb(skb, cq->budget);
+			atomic_sub(1, &txq->pending_sends);
+		}
+	}
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:18 ` [PATCH 1/2 " souradeep chakrabarti
@ 2023-06-26  9:18   ` souradeep chakrabarti
  2023-06-26 14:20     ` Praveen Kumar
  2023-06-26 15:35     ` Michael Kelley (LINUX)
  2023-06-26  9:20   ` [PATCH 2/2 " souradeep chakrabarti
  2023-06-26 13:05   ` [PATCH 1/2 " Simon Horman
  2 siblings, 2 replies; 17+ messages in thread
From: souradeep chakrabarti @ 2023-06-26  9:18 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma
  Cc: stable, schakrabarti, Souradeep Chakrabarti

From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

This patch addresses the VF unload issue, where mana_dealloc_queues()
gets stuck in infinite while loop, because of host unresponsiveness.
It adds a timeout in the while loop, to fix it.

Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
Microsoft Azure Network Adapter)
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V2 -> V3:
* Splitted the patch in two parts.
* Removed the unnecessary braces from mana_dealloc_queues().
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d907727c7b7a..cb5c43c3c47e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
 	struct gdma_dev *gd = apc->ac->gdma_dev;
+	unsigned long timeout;
 	struct mana_txq *txq;
+	struct sk_buff *skb;
+	struct mana_cq *cq;
 	int i, err;
 
 	if (apc->port_is_up)
@@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	 *
 	 * Drain all the in-flight TX packets
 	 */
+
+	timeout = jiffies + 120 * HZ;
 	for (i = 0; i < apc->num_queues; i++) {
 		txq = &apc->tx_qp[i].txq;
-
-		while (atomic_read(&txq->pending_sends) > 0)
+		while (atomic_read(&txq->pending_sends) > 0 &&
+		       time_before(jiffies, timeout))
 			usleep_range(1000, 2000);
 	}
 
+	for (i = 0; i < apc->num_queues; i++) {
+		txq = &apc->tx_qp[i].txq;
+		cq = &apc->tx_qp[i].tx_cq;
+		while (atomic_read(&txq->pending_sends)) {
+			skb = skb_dequeue(&txq->pending_skbs);
+			mana_unmap_skb(skb, apc);
+			napi_consume_skb(skb, cq->budget);
+			atomic_sub(1, &txq->pending_sends);
+		}
+	}
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:18 ` [PATCH 1/2 " souradeep chakrabarti
  2023-06-26  9:18   ` souradeep chakrabarti
@ 2023-06-26  9:20   ` souradeep chakrabarti
  2023-06-26 14:13     ` Praveen Kumar
  2023-06-26 15:53     ` Haiyang Zhang
  2023-06-26 13:05   ` [PATCH 1/2 " Simon Horman
  2 siblings, 2 replies; 17+ messages in thread
From: souradeep chakrabarti @ 2023-06-26  9:20 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma
  Cc: stable, schakrabarti, Souradeep Chakrabarti

From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

This is the second part of the fix.

Also this patch adds a new attribute in mana_context, which gets set when
mana_hwc_send_request() hits a timeout because of host unresponsiveness.
This flag then helps to avoid the timeouts in successive calls.

Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
Microsoft Azure Network Adapter)
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V2 -> V3:
* Removed the initialization of vf_unload_timeout
* Splitted the patch in two.
* Fixed extra space from the commit message.
---
 drivers/net/ethernet/microsoft/mana/gdma_main.c  |  4 +++-
 drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
 include/net/mana/mana.h                          |  2 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..6411f01be0d9 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
 	struct gdma_context *gc = gd->gdma_context;
 	struct gdma_general_resp resp = {};
 	struct gdma_general_req req = {};
+	struct mana_context *ac;
 	int err;
 
 	if (gd->pdid == INVALID_PDID)
 		return -EINVAL;
+	ac = gd->driver_data;
 
 	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
 			     sizeof(resp));
@@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
 	req.hdr.dev_id = gd->dev_id;
 
 	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
-	if (err || resp.hdr.status) {
+	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
 		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
 			err, resp.hdr.status);
 		if (!err)
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 9d1507eba5b9..492cb2c6e2cb 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /* Copyright (c) 2021, Microsoft Corporation. */
 
+#include "asm-generic/errno.h"
 #include <net/mana/gdma.h>
 #include <net/mana/hw_channel.h>
+#include <net/mana/mana.h>
 
 static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
 {
@@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
 	struct hwc_wq *txq = hwc->txq;
 	struct gdma_req_hdr *req_msg;
 	struct hwc_caller_ctx *ctx;
+	struct mana_context *ac;
 	u32 dest_vrcq = 0;
 	u32 dest_vrq = 0;
 	u16 msg_id;
 	int err;
 
 	mana_hwc_get_msg_index(hwc, &msg_id);
+	ac = hwc->gdma_dev->driver_data;
+	if (ac->vf_unload_timeout) {
+		dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
+		err = -ETIMEDOUT;
+		goto out;
+	}
 
 	tx_wr = &txq->msg_buf->reqs[msg_id];
 
@@ -825,9 +834,10 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
 		goto out;
 	}
 
-	if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
+	if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
 		dev_err(hwc->dev, "HWC: Request timed out!\n");
 		err = -ETIMEDOUT;
+		ac->vf_unload_timeout = true;
 		goto out;
 	}
 
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 9eef19972845..5f5affdca1eb 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -358,6 +358,8 @@ struct mana_context {
 
 	u16 num_ports;
 
+	bool vf_unload_timeout;
+
 	struct mana_eq *eqs;
 
 	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:18 ` [PATCH 1/2 " souradeep chakrabarti
  2023-06-26  9:18   ` souradeep chakrabarti
  2023-06-26  9:20   ` [PATCH 2/2 " souradeep chakrabarti
@ 2023-06-26 13:05   ` Simon Horman
  2023-06-26 20:06     ` Dexuan Cui
  2 siblings, 1 reply; 17+ messages in thread
From: Simon Horman @ 2023-06-26 13:05 UTC (permalink / raw)
  To: souradeep chakrabarti
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
	schakrabarti

On Mon, Jun 26, 2023 at 02:18:18AM -0700, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)

nit: A correct format of this fixes tag is:

In particular:
* All on lone line
* Description in double quotes.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Splitted the patch in two parts.
> * Removed the unnecessary braces from mana_dealloc_queues().

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:20   ` [PATCH 2/2 " souradeep chakrabarti
@ 2023-06-26 14:13     ` Praveen Kumar
  2023-06-26 20:32       ` Haiyang Zhang
  2023-06-27  8:42       ` Souradeep Chakrabarti
  2023-06-26 15:53     ` Haiyang Zhang
  1 sibling, 2 replies; 17+ messages in thread
From: Praveen Kumar @ 2023-06-26 14:13 UTC (permalink / raw)
  To: souradeep chakrabarti, kys, haiyangz, wei.liu, decui, davem,
	edumazet, kuba, pabeni, longli, sharmaajay, leon, cai.huoqing,
	ssengar, vkuznets, tglx, linux-hyperv, netdev, linux-kernel,
	linux-rdma
  Cc: stable, schakrabarti

On 6/26/2023 2:50 PM, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This is the second part of the fix.
> 
> Also this patch adds a new attribute in mana_context, which gets set when
> mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> This flag then helps to avoid the timeouts in successive calls.
> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Removed the initialization of vf_unload_timeout
> * Splitted the patch in two.
> * Fixed extra space from the commit message.
> ---
>  drivers/net/ethernet/microsoft/mana/gdma_main.c  |  4 +++-
>  drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
>  include/net/mana/mana.h                          |  2 ++
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..6411f01be0d9 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
>  	struct gdma_context *gc = gd->gdma_context;
>  	struct gdma_general_resp resp = {};
>  	struct gdma_general_req req = {};
> +	struct mana_context *ac;
>  	int err;
>  
>  	if (gd->pdid == INVALID_PDID)
>  		return -EINVAL;
> +	ac = gd->driver_data;
>  
>  	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
>  			     sizeof(resp));
> @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
>  	req.hdr.dev_id = gd->dev_id;
>  
>  	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> -	if (err || resp.hdr.status) {
> +	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
>  		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
>  			err, resp.hdr.status);

With !ac->vf_unload_timeout option, this message may not be correctly showing err, status. Probably you want to add explicit information during timeouts so that it give right information ? Or have the err, status field properly updated.

>  		if (!err)
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..492cb2c6e2cb 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -1,8 +1,10 @@
>  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /* Copyright (c) 2021, Microsoft Corporation. */
>  
> +#include "asm-generic/errno.h"
>  #include <net/mana/gdma.h>
>  #include <net/mana/hw_channel.h>
> +#include <net/mana/mana.h>
>  
>  static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
>  {
> @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
>  	struct hwc_wq *txq = hwc->txq;
>  	struct gdma_req_hdr *req_msg;
>  	struct hwc_caller_ctx *ctx;
> +	struct mana_context *ac;
>  	u32 dest_vrcq = 0;
>  	u32 dest_vrq = 0;
>  	u16 msg_id;
>  	int err;
>  
>  	mana_hwc_get_msg_index(hwc, &msg_id);
> +	ac = hwc->gdma_dev->driver_data;

Is there a case where gdma_dev be invalid here ? If so, lets check the state and then proceed further ?

> +	if (ac->vf_unload_timeout) {
> +		dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
> +		err = -ETIMEDOUT;
> +		goto out;
> +	}
>  
>  	tx_wr = &txq->msg_buf->reqs[msg_id];
>  
> @@ -825,9 +834,10 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
>  		goto out;
>  	}
>  
> -	if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> +	if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {

IMHO we should have macros instead of magic numbers (5 , 30 or so). But would like others to comment here.

>  		dev_err(hwc->dev, "HWC: Request timed out!\n");
>  		err = -ETIMEDOUT;
> +		ac->vf_unload_timeout = true;
>  		goto out;
>  	}
>  
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 9eef19972845..5f5affdca1eb 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -358,6 +358,8 @@ struct mana_context {
>  
>  	u16 num_ports;
>  
> +	bool vf_unload_timeout;
> +
>  	struct mana_eq *eqs;
>  
>  	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:18   ` souradeep chakrabarti
@ 2023-06-26 14:20     ` Praveen Kumar
  2023-06-27  8:44       ` Souradeep Chakrabarti
  2023-06-26 15:35     ` Michael Kelley (LINUX)
  1 sibling, 1 reply; 17+ messages in thread
From: Praveen Kumar @ 2023-06-26 14:20 UTC (permalink / raw)
  To: souradeep chakrabarti, kys, haiyangz, wei.liu, decui, davem,
	edumazet, kuba, pabeni, longli, sharmaajay, leon, cai.huoqing,
	ssengar, vkuznets, tglx, linux-hyperv, netdev, linux-kernel,
	linux-rdma
  Cc: stable, schakrabarti

On 6/26/2023 2:48 PM, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Splitted the patch in two parts.
> * Removed the unnecessary braces from mana_dealloc_queues().
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..cb5c43c3c47e 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  {
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	struct gdma_dev *gd = apc->ac->gdma_dev;
> +	unsigned long timeout;
>  	struct mana_txq *txq;
> +	struct sk_buff *skb;
> +	struct mana_cq *cq;
>  	int i, err;
>  
>  	if (apc->port_is_up)
> @@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	 *
>  	 * Drain all the in-flight TX packets
>  	 */
> +
> +	timeout = jiffies + 120 * HZ;
>  	for (i = 0; i < apc->num_queues; i++) {
>  		txq = &apc->tx_qp[i].txq;
> -
> -		while (atomic_read(&txq->pending_sends) > 0)
> +		while (atomic_read(&txq->pending_sends) > 0 &&
> +		       time_before(jiffies, timeout))
>  			usleep_range(1000, 2000);
>  	}
>  
> +	for (i = 0; i < apc->num_queues; i++) {
> +		txq = &apc->tx_qp[i].txq;
> +		cq = &apc->tx_qp[i].tx_cq;
> +		while (atomic_read(&txq->pending_sends)) {
> +			skb = skb_dequeue(&txq->pending_skbs);
> +			mana_unmap_skb(skb, apc);
> +			napi_consume_skb(skb, cq->budget);
> +			atomic_sub(1, &txq->pending_sends);
> +		}
> +	}

Can we combine these 2 loops into 1 something like this ?

	for (i = 0; i < apc->num_queues; i++) {
		txq = &apc->tx_qp[i].txq;
		cq = &apc->tx_qp[i].tx_cq;
		while (atomic_read(&txq->pending_sends)) {
			if (time_before(jiffies, timeout)) {
				usleep_range(1000, 2000);
			} else {
				skb = skb_dequeue(&txq->pending_skbs);
				mana_unmap_skb(skb, apc);
				napi_consume_skb(skb, cq->budget);
				atomic_sub(1, &txq->pending_sends);
			}
		}
	}
>  	/* We're 100% sure the queues can no longer be woken up, because
>  	 * we're sure now mana_poll_tx_cq() can't be running.
>  	 */


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:18   ` souradeep chakrabarti
  2023-06-26 14:20     ` Praveen Kumar
@ 2023-06-26 15:35     ` Michael Kelley (LINUX)
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Kelley (LINUX) @ 2023-06-26 15:35 UTC (permalink / raw)
  To: souradeep chakrabarti, KY Srinivasan, Haiyang Zhang, wei.liu,
	Dexuan Cui, davem, edumazet, kuba, pabeni, Long Li, Ajay Sharma,
	leon, cai.huoqing, ssengar, vkuznets, tglx, linux-hyperv, netdev,
	linux-kernel, linux-rdma
  Cc: stable, Souradeep Chakrabarti

From: souradeep chakrabarti <schakrabarti@linux.microsoft.com> Sent: Monday, June 26, 2023 2:19 AM
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.

For a patch series, the cover letter (patch 0 of the series) does not get
included in the commit log anywhere.   The cover letter can provide
overall motivation and describe how the patches fit together, but the
commit message for each patch should be as self-contained as possible.

The commit message here refers to "the VF unload issue", and there's
no context for understanding what that issue is, though you do provide
some description in the text following "where". Could you provide a
commit message that is a bit more self-contained?

Same comment applies to commit message for the 2nd patch of this
series.

Also, avoid text like "this patch".   See the "Describe your changes"
section in Documentation/process/submitting-patches.rst where the
use of imperative mood is mentioned.  If you like, I can provide some
offline help on writing a good commit message.

Michael

> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Splitted the patch in two parts.
> * Removed the unnecessary braces from mana_dealloc_queues().
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..cb5c43c3c47e 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  {
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	struct gdma_dev *gd = apc->ac->gdma_dev;
> +	unsigned long timeout;
>  	struct mana_txq *txq;
> +	struct sk_buff *skb;
> +	struct mana_cq *cq;
>  	int i, err;
> 
>  	if (apc->port_is_up)
> @@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	 *
>  	 * Drain all the in-flight TX packets
>  	 */
> +
> +	timeout = jiffies + 120 * HZ;
>  	for (i = 0; i < apc->num_queues; i++) {
>  		txq = &apc->tx_qp[i].txq;
> -
> -		while (atomic_read(&txq->pending_sends) > 0)
> +		while (atomic_read(&txq->pending_sends) > 0 &&
> +		       time_before(jiffies, timeout))
>  			usleep_range(1000, 2000);
>  	}
> 
> +	for (i = 0; i < apc->num_queues; i++) {
> +		txq = &apc->tx_qp[i].txq;
> +		cq = &apc->tx_qp[i].tx_cq;
> +		while (atomic_read(&txq->pending_sends)) {
> +			skb = skb_dequeue(&txq->pending_skbs);
> +			mana_unmap_skb(skb, apc);
> +			napi_consume_skb(skb, cq->budget);
> +			atomic_sub(1, &txq->pending_sends);
> +		}
> +	}
>  	/* We're 100% sure the queues can no longer be woken up, because
>  	 * we're sure now mana_poll_tx_cq() can't be running.
>  	 */
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26  9:20   ` [PATCH 2/2 " souradeep chakrabarti
  2023-06-26 14:13     ` Praveen Kumar
@ 2023-06-26 15:53     ` Haiyang Zhang
  2023-06-26 16:02       ` Haiyang Zhang
  2023-06-27  8:45       ` Souradeep Chakrabarti
  1 sibling, 2 replies; 17+ messages in thread
From: Haiyang Zhang @ 2023-06-26 15:53 UTC (permalink / raw)
  To: souradeep chakrabarti, KY Srinivasan, wei.liu, Dexuan Cui, davem,
	edumazet, kuba, pabeni, Long Li, Ajay Sharma, leon, cai.huoqing,
	ssengar, vkuznets, tglx, linux-hyperv, netdev, linux-kernel,
	linux-rdma
  Cc: stable, Souradeep Chakrabarti



> -----Original Message-----
> From: souradeep chakrabarti <schakrabarti@linux.microsoft.com>
> Sent: Monday, June 26, 2023 5:20 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> <schakrabarti@microsoft.com>; Souradeep Chakrabarti
> <schakrabarti@linux.microsoft.com>
> Subject: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> unresponsive

In general, two patches shouldn't have the same Subject.

For this patch set, the two patches are two steps to fix the unloading issue, and
they are not very long. IMHO, they should be in one patch.

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 15:53     ` Haiyang Zhang
@ 2023-06-26 16:02       ` Haiyang Zhang
  2023-06-27  8:45       ` Souradeep Chakrabarti
  1 sibling, 0 replies; 17+ messages in thread
From: Haiyang Zhang @ 2023-06-26 16:02 UTC (permalink / raw)
  To: Haiyang Zhang, souradeep chakrabarti, KY Srinivasan, wei.liu,
	Dexuan Cui, davem, edumazet, kuba, pabeni, Long Li, Ajay Sharma,
	leon, cai.huoqing, ssengar, vkuznets, tglx, linux-hyperv, netdev,
	linux-kernel, linux-rdma
  Cc: stable, Souradeep Chakrabarti



> -----Original Message-----
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Sent: Monday, June 26, 2023 11:54 AM
> To: souradeep chakrabarti <schakrabarti@linux.microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> <schakrabarti@microsoft.com>
> Subject: RE: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> unresponsive
> 
> 
> 
> > -----Original Message-----
> > From: souradeep chakrabarti <schakrabarti@linux.microsoft.com>
> > Sent: Monday, June 26, 2023 5:20 AM
> > To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> > <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> > kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> > Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> > cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> > tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> > linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> > Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> > <schakrabarti@microsoft.com>; Souradeep Chakrabarti
> > <schakrabarti@linux.microsoft.com>
> > Subject: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> > unresponsive
> 
> In general, two patches shouldn't have the same Subject.
> 

If two patches are preferred, please use more descriptive subjects like these:

1/2: Fix the infinite waiting on pending_sends during host unresponsiveness
2/2: Avoid extra waits if host not responding in earlier steps

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 13:05   ` [PATCH 1/2 " Simon Horman
@ 2023-06-26 20:06     ` Dexuan Cui
  2023-06-26 20:47       ` Stephen Hemminger
  0 siblings, 1 reply; 17+ messages in thread
From: Dexuan Cui @ 2023-06-26 20:06 UTC (permalink / raw)
  To: Simon Horman, souradeep chakrabarti
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu, davem, edumazet, kuba,
	pabeni, Long Li, Ajay Sharma, leon, cai.huoqing, ssengar,
	vkuznets, tglx, linux-hyperv, netdev, linux-kernel, linux-rdma,
	stable, Souradeep Chakrabarti

> From: Simon Horman
> Sent: Monday, June 26, 2023 6:05 AM
> > ...
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> > driver for
> > Microsoft Azure Network Adapter)
> 
> nit: A correct format of this fixes tag is:
> 
> In particular:
> * All on lone line
> * Description in double quotes.
> 
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> Adapter (MANA)")

Hi Souradeep, FYI I often refer to:
https://marc.info/?l=linux-pci&m=150905742808166&w=2

The link mentions:
alias gsr='git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'

"gsr ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f" produces:
ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 14:13     ` Praveen Kumar
@ 2023-06-26 20:32       ` Haiyang Zhang
  2023-06-27  8:42       ` Souradeep Chakrabarti
  1 sibling, 0 replies; 17+ messages in thread
From: Haiyang Zhang @ 2023-06-26 20:32 UTC (permalink / raw)
  To: Praveen Kumar, souradeep chakrabarti, KY Srinivasan, wei.liu,
	Dexuan Cui, davem, edumazet, kuba, pabeni, Long Li, Ajay Sharma,
	leon, cai.huoqing, ssengar, vkuznets, tglx, linux-hyperv, netdev,
	linux-kernel, linux-rdma
  Cc: stable, Souradeep Chakrabarti



> -----Original Message-----
> From: Praveen Kumar <kumarpraveen@linux.microsoft.com>
> Sent: Monday, June 26, 2023 10:13 AM
> To: souradeep chakrabarti <schakrabarti@linux.microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay Sharma
> <sharmaajay@microsoft.com>; leon@kernel.org; cai.huoqing@linux.dev;
> ssengar@linux.microsoft.com; vkuznets@redhat.com; tglx@linutronix.de; linux-
> hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-rdma@vger.kernel.org
> Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> <schakrabarti@microsoft.com>
> Subject: Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> unresponsive
> 
> On 6/26/2023 2:50 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> >
> > This is the second part of the fix.
> >
> > Also this patch adds a new attribute in mana_context, which gets set when
> > mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> > This flag then helps to avoid the timeouts in successive calls.
> >
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > V2 -> V3:
> > * Removed the initialization of vf_unload_timeout
> > * Splitted the patch in two.
> > * Fixed extra space from the commit message.
> > ---
> >  drivers/net/ethernet/microsoft/mana/gdma_main.c  |  4 +++-
> >  drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
> >  include/net/mana/mana.h                          |  2 ++
> >  3 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..6411f01be0d9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev
> *gd)
> >  	struct gdma_context *gc = gd->gdma_context;
> >  	struct gdma_general_resp resp = {};
> >  	struct gdma_general_req req = {};
> > +	struct mana_context *ac;
> >  	int err;
> >
> >  	if (gd->pdid == INVALID_PDID)
> >  		return -EINVAL;
> > +	ac = gd->driver_data;
> >
> >  	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE,
> sizeof(req),
> >  			     sizeof(resp));
> > @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	req.hdr.dev_id = gd->dev_id;
> >
> >  	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > -	if (err || resp.hdr.status) {
> > +	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
> >  		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
> >  			err, resp.hdr.status);
> 
> With !ac->vf_unload_timeout option, this message may not be correctly
> showing err, status. Probably you want to add explicit information during
> timeouts so that it give right information ? Or have the err, status field properly
> updated.
> 
> >  		if (!err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..492cb2c6e2cb 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -1,8 +1,10 @@
> >  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> >  /* Copyright (c) 2021, Microsoft Corporation. */
> >
> > +#include "asm-generic/errno.h"
> >  #include <net/mana/gdma.h>
> >  #include <net/mana/hw_channel.h>
> > +#include <net/mana/mana.h>
> >
> >  static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16
> *msg_id)
> >  {
> > @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
> >  	struct hwc_wq *txq = hwc->txq;
> >  	struct gdma_req_hdr *req_msg;
> >  	struct hwc_caller_ctx *ctx;
> > +	struct mana_context *ac;
> >  	u32 dest_vrcq = 0;
> >  	u32 dest_vrq = 0;
> >  	u16 msg_id;
> >  	int err;
> >
> >  	mana_hwc_get_msg_index(hwc, &msg_id);
> > +	ac = hwc->gdma_dev->driver_data;
> 
> Is there a case where gdma_dev be invalid here ? If so, lets check the state and
> then proceed further ?

Yes, hwc->gdma_dev is assigned shortly after it's allocated - see the code below. So
it's valid.
But hwc->gdma_dev->driver_data is hwc, not "mana_context *ac". There are two 
gdma_dev in gdma_context: hwc & mana.

You can get ac from: hwc->gdma_dev->gdma_context->mana.driver_data
Or, to avoid too many pointer deference, I suggest to put the vf_unload_timeout 
into gdma_context.

int mana_hwc_create_channel(struct gdma_context *gc)
{
        hwc = kzalloc(sizeof(*hwc), GFP_KERNEL);
...
        gd->gdma_context = gc;
        gd->driver_data = hwc;
        hwc->gdma_dev = gd;
        hwc->dev = gc->dev;


Also, mana_gd_send_request/mana_hwc_send_request() is used in many places,
not just unloading.
Should you use timeout value 5 sec, and the vf_unload_timeout flag in unloading
path only, and avoid touching other code paths? Please check with hostnet team for
suggestions.

If we decide to let the vf_unload_timeout flag affect all code paths, not just 
unloading, then it should be renamed to hwc_timeout, and submit the second patch
separately.
If just use it for unloading, since mana_gd_deregister_device() is used by PF too, 
name it like: unload_hwc_timeout.

Thanks,
-Haiyang



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 20:06     ` Dexuan Cui
@ 2023-06-26 20:47       ` Stephen Hemminger
  2023-06-27  8:35         ` Souradeep Chakrabarti
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2023-06-26 20:47 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Simon Horman, souradeep chakrabarti, KY Srinivasan,
	Haiyang Zhang, wei.liu, davem, edumazet, kuba, pabeni, Long Li,
	Ajay Sharma, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
	Souradeep Chakrabarti

On Mon, 26 Jun 2023 20:06:48 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Simon Horman
> > Sent: Monday, June 26, 2023 6:05 AM  
> > > ...
> > > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> > > driver for
> > > Microsoft Azure Network Adapter)  
> > 
> > nit: A correct format of this fixes tag is:
> > 
> > In particular:
> > * All on lone line
> > * Description in double quotes.
> > 
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> > Adapter (MANA)")  
> 
> Hi Souradeep, FYI I often refer to:
> https://marc.info/?l=linux-pci&m=150905742808166&w=2
> 
> The link mentions:
> alias gsr='git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'
> 
> "gsr ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f" produces:
> ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

You can do same thing without shell alias by using git-config

[alias]
	fixes = log -1 --format=fixes
	gsr = log -1 --format=gsr

[pretty]
	fixes = Fixes: %h (\"%s\")
	gsr = %h (\"%s\")

Then:
$ git gsr 1919b39fc6eabb9a6f9a51706ff6d03865f5df29
1919b39fc6ea ("net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters")


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 20:47       ` Stephen Hemminger
@ 2023-06-27  8:35         ` Souradeep Chakrabarti
  0 siblings, 0 replies; 17+ messages in thread
From: Souradeep Chakrabarti @ 2023-06-27  8:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Dexuan Cui, Simon Horman, KY Srinivasan, Haiyang Zhang, wei.liu,
	davem, edumazet, kuba, pabeni, Long Li, Ajay Sharma, leon,
	cai.huoqing, ssengar, vkuznets, tglx, linux-hyperv, netdev,
	linux-kernel, linux-rdma, stable, Souradeep Chakrabarti

On Mon, Jun 26, 2023 at 01:47:21PM -0700, Stephen Hemminger wrote:
> On Mon, 26 Jun 2023 20:06:48 +0000
> Dexuan Cui <decui@microsoft.com> wrote:
> 
> > > From: Simon Horman
> > > Sent: Monday, June 26, 2023 6:05 AM  
> > > > ...
> > > > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> > > > driver for
> > > > Microsoft Azure Network Adapter)  
> > > 
> > > nit: A correct format of this fixes tag is:
> > > 
> > > In particular:
> > > * All on lone line
> > > * Description in double quotes.
> > > 
> > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> > > Adapter (MANA)")  
> > 
> > Hi Souradeep, FYI I often refer to:
> > https://marc.info/?l=linux-pci&m=150905742808166&w=2
> > 
> > The link mentions:
> > alias gsr='git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'
> > 
Thank you for the advice. Will use it from now onwards.
> > "gsr ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f" produces:
> > ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> 
> You can do same thing without shell alias by using git-config
> 
> [alias]
> 	fixes = log -1 --format=fixes
> 	gsr = log -1 --format=gsr
> 
> [pretty]
> 	fixes = Fixes: %h (\"%s\")
> 	gsr = %h (\"%s\")
> 
> Then:
> $ git gsr 1919b39fc6eabb9a6f9a51706ff6d03865f5df29
> 1919b39fc6ea ("net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters")
Thank you for the suggestion.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 14:13     ` Praveen Kumar
  2023-06-26 20:32       ` Haiyang Zhang
@ 2023-06-27  8:42       ` Souradeep Chakrabarti
  1 sibling, 0 replies; 17+ messages in thread
From: Souradeep Chakrabarti @ 2023-06-27  8:42 UTC (permalink / raw)
  To: Praveen Kumar
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
	schakrabarti

On Mon, Jun 26, 2023 at 07:43:07PM +0530, Praveen Kumar wrote:
> On 6/26/2023 2:50 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > 
> > This is the second part of the fix.
> > 
> > Also this patch adds a new attribute in mana_context, which gets set when
> > mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> > This flag then helps to avoid the timeouts in successive calls.
> > 
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > V2 -> V3:
> > * Removed the initialization of vf_unload_timeout
> > * Splitted the patch in two.
> > * Fixed extra space from the commit message.
> > ---
> >  drivers/net/ethernet/microsoft/mana/gdma_main.c  |  4 +++-
> >  drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
> >  include/net/mana/mana.h                          |  2 ++
> >  3 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..6411f01be0d9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	struct gdma_context *gc = gd->gdma_context;
> >  	struct gdma_general_resp resp = {};
> >  	struct gdma_general_req req = {};
> > +	struct mana_context *ac;
> >  	int err;
> >  
> >  	if (gd->pdid == INVALID_PDID)
> >  		return -EINVAL;
> > +	ac = gd->driver_data;
> >  
> >  	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
> >  			     sizeof(resp));
> > @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	req.hdr.dev_id = gd->dev_id;
> >  
> >  	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > -	if (err || resp.hdr.status) {
> > +	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
> >  		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
> >  			err, resp.hdr.status);
> 
> With !ac->vf_unload_timeout option, this message may not be correctly showing err, status. Probably you want to add explicit information during timeouts so that it give right information ? Or have the err, status field properly updated.
This check !ac->vf_unload_timeout here means if ac->vf_unload_timeout is not yet set,
then only consider the err path, else continue the remaining operation.
> 
> >  		if (!err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..492cb2c6e2cb 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -1,8 +1,10 @@
> >  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> >  /* Copyright (c) 2021, Microsoft Corporation. */
> >  
> > +#include "asm-generic/errno.h"
> >  #include <net/mana/gdma.h>
> >  #include <net/mana/hw_channel.h>
> > +#include <net/mana/mana.h>
> >  
> >  static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
> >  {
> > @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> >  	struct hwc_wq *txq = hwc->txq;
> >  	struct gdma_req_hdr *req_msg;
> >  	struct hwc_caller_ctx *ctx;
> > +	struct mana_context *ac;
> >  	u32 dest_vrcq = 0;
> >  	u32 dest_vrq = 0;
> >  	u16 msg_id;
> >  	int err;
> >  
> >  	mana_hwc_get_msg_index(hwc, &msg_id);
> > +	ac = hwc->gdma_dev->driver_data;
> 
> Is there a case where gdma_dev be invalid here ? If so, lets check the state and then proceed further ?
I can see Haiyang has already in his comment, responded on the same.
hwc->gdma_dev will be valid here, but as Haiyang pointed we need to use
hwc->gdma_dev->gdma_context->mana.driver_data, or better to relocate the
attribute in gdma_context.
> 
> > +	if (ac->vf_unload_timeout) {
> > +		dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
> > +		err = -ETIMEDOUT;
> > +		goto out;
> > +	}
> >  
> >  	tx_wr = &txq->msg_buf->reqs[msg_id];
> >  
> > @@ -825,9 +834,10 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> >  		goto out;
> >  	}
> >  
> > -	if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> > +	if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
> 
> IMHO we should have macros instead of magic numbers (5 , 30 or so). But would like others to comment here.
> 
> >  		dev_err(hwc->dev, "HWC: Request timed out!\n");
> >  		err = -ETIMEDOUT;
> > +		ac->vf_unload_timeout = true;
> >  		goto out;
> >  	}
> >  
> > diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> > index 9eef19972845..5f5affdca1eb 100644
> > --- a/include/net/mana/mana.h
> > +++ b/include/net/mana/mana.h
> > @@ -358,6 +358,8 @@ struct mana_context {
> >  
> >  	u16 num_ports;
> >  
> > +	bool vf_unload_timeout;
> > +
> >  	struct mana_eq *eqs;
> >  
> >  	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 14:20     ` Praveen Kumar
@ 2023-06-27  8:44       ` Souradeep Chakrabarti
  0 siblings, 0 replies; 17+ messages in thread
From: Souradeep Chakrabarti @ 2023-06-27  8:44 UTC (permalink / raw)
  To: Praveen Kumar
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
	schakrabarti

On Mon, Jun 26, 2023 at 07:50:44PM +0530, Praveen Kumar wrote:
> On 6/26/2023 2:48 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > 
> > This patch addresses the VF unload issue, where mana_dealloc_queues()
> > gets stuck in infinite while loop, because of host unresponsiveness.
> > It adds a timeout in the while loop, to fix it.
> > 
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > V2 -> V3:
> > * Splitted the patch in two parts.
> > * Removed the unnecessary braces from mana_dealloc_queues().
> > ---
> >  drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index d907727c7b7a..cb5c43c3c47e 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
> >  {
> >  	struct mana_port_context *apc = netdev_priv(ndev);
> >  	struct gdma_dev *gd = apc->ac->gdma_dev;
> > +	unsigned long timeout;
> >  	struct mana_txq *txq;
> > +	struct sk_buff *skb;
> > +	struct mana_cq *cq;
> >  	int i, err;
> >  
> >  	if (apc->port_is_up)
> > @@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
> >  	 *
> >  	 * Drain all the in-flight TX packets
> >  	 */
> > +
> > +	timeout = jiffies + 120 * HZ;
> >  	for (i = 0; i < apc->num_queues; i++) {
> >  		txq = &apc->tx_qp[i].txq;
> > -
> > -		while (atomic_read(&txq->pending_sends) > 0)
> > +		while (atomic_read(&txq->pending_sends) > 0 &&
> > +		       time_before(jiffies, timeout))
> >  			usleep_range(1000, 2000);
> >  	}
> >  
> > +	for (i = 0; i < apc->num_queues; i++) {
> > +		txq = &apc->tx_qp[i].txq;
> > +		cq = &apc->tx_qp[i].tx_cq;
> > +		while (atomic_read(&txq->pending_sends)) {
> > +			skb = skb_dequeue(&txq->pending_skbs);
> > +			mana_unmap_skb(skb, apc);
> > +			napi_consume_skb(skb, cq->budget);
> > +			atomic_sub(1, &txq->pending_sends);
> > +		}
> > +	}
> 
> Can we combine these 2 loops into 1 something like this ?
> 
> 	for (i = 0; i < apc->num_queues; i++) {
> 		txq = &apc->tx_qp[i].txq;
> 		cq = &apc->tx_qp[i].tx_cq;
> 		while (atomic_read(&txq->pending_sends)) {
> 			if (time_before(jiffies, timeout)) {
> 				usleep_range(1000, 2000);
> 			} else {
> 				skb = skb_dequeue(&txq->pending_skbs);
> 				mana_unmap_skb(skb, apc);
> 				napi_consume_skb(skb, cq->budget);
> 				atomic_sub(1, &txq->pending_sends);
> 			}
> 		}
> 	}
We should free up the skbs only after timeout has happened or after all the
queues are looped.
> >  	/* We're 100% sure the queues can no longer be woken up, because
> >  	 * we're sure now mana_poll_tx_cq() can't be running.
> >  	 */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive
  2023-06-26 15:53     ` Haiyang Zhang
  2023-06-26 16:02       ` Haiyang Zhang
@ 2023-06-27  8:45       ` Souradeep Chakrabarti
  1 sibling, 0 replies; 17+ messages in thread
From: Souradeep Chakrabarti @ 2023-06-27  8:45 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: KY Srinivasan, wei.liu, Dexuan Cui, davem, edumazet, kuba,
	pabeni, Long Li, Ajay Sharma, leon, cai.huoqing, ssengar,
	vkuznets, tglx, linux-hyperv, netdev, linux-kernel, linux-rdma,
	stable, Souradeep Chakrabarti

On Mon, Jun 26, 2023 at 03:53:50PM +0000, Haiyang Zhang wrote:
> 
> 
> > -----Original Message-----
> > From: souradeep chakrabarti <schakrabarti@linux.microsoft.com>
> > Sent: Monday, June 26, 2023 5:20 AM
> > To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> > <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> > kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> > Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> > cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> > tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> > linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> > Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> > <schakrabarti@microsoft.com>; Souradeep Chakrabarti
> > <schakrabarti@linux.microsoft.com>
> > Subject: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> > unresponsive
> 
> In general, two patches shouldn't have the same Subject.
> 
> For this patch set, the two patches are two steps to fix the unloading issue, and
> they are not very long. IMHO, they should be in one patch.
> 
> Thanks,
> - Haiyang
I will create a single patch in next version. As this fixes the unloading issue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-27  8:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-26  9:17 [PATCH 0/2 V3 net] net: mana: Fix MANA VF unload when host is unresponsive souradeep chakrabarti
2023-06-26  9:18 ` [PATCH 1/2 " souradeep chakrabarti
2023-06-26  9:18   ` souradeep chakrabarti
2023-06-26 14:20     ` Praveen Kumar
2023-06-27  8:44       ` Souradeep Chakrabarti
2023-06-26 15:35     ` Michael Kelley (LINUX)
2023-06-26  9:20   ` [PATCH 2/2 " souradeep chakrabarti
2023-06-26 14:13     ` Praveen Kumar
2023-06-26 20:32       ` Haiyang Zhang
2023-06-27  8:42       ` Souradeep Chakrabarti
2023-06-26 15:53     ` Haiyang Zhang
2023-06-26 16:02       ` Haiyang Zhang
2023-06-27  8:45       ` Souradeep Chakrabarti
2023-06-26 13:05   ` [PATCH 1/2 " Simon Horman
2023-06-26 20:06     ` Dexuan Cui
2023-06-26 20:47       ` Stephen Hemminger
2023-06-27  8:35         ` Souradeep Chakrabarti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).