linux-rdma.vger.kernel.org archive mirror
* [PATCH net v2] qed: rdma - don't wait for resources under hw error recovery flow
From: Shai Malin @ 2021-09-22  7:36 UTC
  To: netdev, davem, kuba
  Cc: linux-rdma, jgg, leon, aelior, smalin, malin1024, Michal Kalderon

If the HW device is in recovery, the HW resources will never be returned,
so we shouldn't wait for the CID (HW context ID) bitmaps to clear.
This fix speeds up the error recovery flow.

Changes since v1:
- Fix race condition (thanks to Leon Romanovsky).

Fixes: 64515dc899df ("qed: Add infrastructure for error detection and recovery")
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 8 ++++++++
 drivers/net/ethernet/qlogic/qed/qed_roce.c  | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index fc8b3e64f153..186d0048a9d1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -1297,6 +1297,14 @@ qed_iwarp_wait_cid_map_cleared(struct qed_hwfn *p_hwfn, struct qed_bmap *bmap)
 	prev_weight = weight;
 
 	while (weight) {
+		/* If the HW device is during recovery, all resources are
+		 * immediately reset without receiving a per-cid indication
+		 * from HW. In this case we don't expect the cid_map to be
+		 * cleared.
+		 */
+		if (p_hwfn->cdev->recov_in_prog)
+			return 0;
+
 		msleep(QED_IWARP_MAX_CID_CLEAN_TIME);
 
 		weight = bitmap_weight(bmap->bitmap, bmap->max_count);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index f16a157bb95a..cf5baa5e59bc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -77,6 +77,14 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
 	 * Beyond the added delay we clear the bitmap anyway.
 	 */
 	while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
+		/* If the HW device is during recovery, all resources are
+		 * immediately reset without receiving a per-cid indication
+		 * from HW. In this case we don't expect the cid bitmap to be
+		 * cleared.
+		 */
+		if (p_hwfn->cdev->recov_in_prog)
+			return;
+
 		msleep(100);
 		if (wait_count++ > 20) {
 			DP_NOTICE(p_hwfn, "cid bitmap wait timed out\n");
-- 
2.27.0
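
The early-exit check added in both loops can be illustrated outside the
kernel. The following is a minimal user-space sketch, not the driver code:
struct fake_dev, weight() and wait_cids_cleared() are hypothetical names
invented for illustration, and the "one CID completes per poll" step merely
stands in for the msleep() plus bitmap re-read in the real loops. Only the
recov_in_prog flag name is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; not the real qed structures. */
struct fake_dev {
	bool recov_in_prog;        /* same flag name as in the patch */
	unsigned long cid_bitmap;  /* one bit per outstanding CID */
};

/* Count set bits; plays the role of bitmap_weight() in this sketch. */
static int weight(unsigned long bits)
{
	int n = 0;

	for (; bits; bits &= bits - 1)
		n++;
	return n;
}

/*
 * Poll until all CIDs are released, the poll budget is spent, or the
 * device is found to be in recovery. Returns the number of polls made.
 */
static int wait_cids_cleared(struct fake_dev *dev, int max_polls)
{
	int polls = 0;

	assert(dev != NULL);

	while (weight(dev->cid_bitmap)) {
		/* In recovery no per-CID indication will arrive: stop waiting. */
		if (dev->recov_in_prog)
			return polls;

		if (polls >= max_polls)
			break;
		polls++;

		/* Assumption for the sketch: one CID is released per poll. */
		dev->cid_bitmap &= dev->cid_bitmap - 1;
	}

	return polls;
}
```

With three CIDs outstanding, a device that is not in recovery is polled until
the bitmap is empty, while a device with recov_in_prog set returns on the
first iteration and leaves the bitmap untouched.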



* Re: [PATCH net v2] qed: rdma - don't wait for resources under hw error recovery flow
From: Leon Romanovsky @ 2021-09-22  9:40 UTC
  To: Shai Malin
  Cc: netdev, davem, kuba, linux-rdma, jgg, aelior, malin1024, Michal Kalderon

On Wed, Sep 22, 2021 at 10:36:31AM +0300, Shai Malin wrote:
> If the HW device is in recovery, the HW resources will never be returned,
> so we shouldn't wait for the CID (HW context ID) bitmaps to clear.
> This fix speeds up the error recovery flow.
> 
> Changes since v1:
> - Fix race condition (thanks to Leon Romanovsky).

Please put the changelog under "---"; there is little value in keeping it
in the commit message.



* RE: [PATCH net v2] qed: rdma - don't wait for resources under hw error recovery flow
From: Shai Malin @ 2021-09-22 10:55 UTC
  To: Leon Romanovsky
  Cc: netdev, davem, kuba, linux-rdma, jgg, Ariel Elior, malin1024,
	Michal Kalderon

On Wed, 22 Sept 2021 at 12:40, Leon Romanovsky <leon@kernel.org> wrote:
> On Wed, Sep 22, 2021 at 10:36:31AM +0300, Shai Malin wrote:
> > If the HW device is in recovery, the HW resources will never be returned,
> > so we shouldn't wait for the CID (HW context ID) bitmaps to clear.
> > This fix speeds up the error recovery flow.
> >
> > Changes since v1:
> > - Fix race condition (thanks to Leon Romanovsky).
> 
> Please put the changelog under "---"; there is little value in keeping it
> in the commit message.

Sure. Thanks.

