* [PATCH for-next 0/6] RDMA/bnxt_re driver update
@ 2019-11-25  8:39 Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 1/6] RDMA/bnxt_re: Avoid freeing MR resources if dereg fails Selvin Xavier
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

This series includes a couple of bug fixes and some code refactoring
in the device init/deinit path. It also modifies the code to use the
new driver unregistration APIs.

Please apply to for-next.

Thanks,
Selvin Xavier

Selvin Xavier (6):
  RDMA/bnxt_re: Avoid freeing MR resources if dereg fails
  RDMA/bnxt_re: Fix Send Work Entry state check while polling
    completions
  RDMA/bnxt_re: Add more flags in device init and uninit path
  RDMA/bnxt_re: Refactor device add/remove functionalities
  RDMA/bnxt_re: Use driver_unregister and unregistration API
  RDMA/bnxt_re: Report more number of completion vectors

 drivers/infiniband/hw/bnxt_re/bnxt_re.h    |  16 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.c   |   4 +-
 drivers/infiniband/hw/bnxt_re/main.c       | 246 +++++++++++++++--------------
 drivers/infiniband/hw/bnxt_re/qplib_fp.c   |  12 +-
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |   7 +
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |   1 +
 6 files changed, 154 insertions(+), 132 deletions(-)

-- 
2.5.5


* [PATCH for-next 1/6] RDMA/bnxt_re: Avoid freeing MR resources if dereg fails
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
@ 2019-11-25  8:39 ` Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 2/6] RDMA/bnxt_re: Fix Send Work Entry state check while polling completions Selvin Xavier
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

The driver returns an error code for MR dereg but still frees the MR
structure. When the MR dereg is retried after the earlier failure, the
system crashes because the structure has already been freed.

[45545.547748] BUG: unable to handle kernel NULL pointer dereference at 00000000000001b8
[45545.557020] PGD 0 P4D 0
[45545.560370] Oops: 0000 [#1] SMP PTI
[45545.564778] CPU: 7 PID: 12178 Comm: ib_send_bw Kdump: loaded Not tainted 4.18.0-124.el8.x86_64 #1
[45545.575211] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015
[45545.584202] RIP: 0010:__dev_printk+0x2a/0x70
[45545.589495] Code: 0f 1f 44 00 00 49 89 d1 48 85 f6 0f 84 f6 2b 00 00 4c 8b 46 70 4d 85 c0 75 04 4c 8b
46 10 48 8b 86 a8 00 00 00 48 85 c0 74 16 <48> 8b 08 0f be 7f 01 48 c7 c2 13 ac ac 83 83 ef 30 e9 10 fe ff ff
[45545.611538] RSP: 0018:ffffaf7c04607a60 EFLAGS: 00010006
[45545.617903] RAX: 00000000000001b8 RBX: ffffa0010c91c488 RCX: 0000000000000246
[45545.626416] RDX: ffffaf7c04607a68 RSI: ffffa0010c91caa8 RDI: ffffffff83a788eb
[45545.634929] RBP: ffffaf7c04607ac8 R08: 0000000000000000 R09: ffffaf7c04607a68
[45545.643433] R10: 0000000000000000 R11: 0000000000000001 R12: ffffaf7c04607b90
[45545.651924] R13: 000000000000000e R14: 0000000000000000 R15: 00000000ffffa001
[45545.660411] FS:  0000146fa1f1cdc0(0000) GS:ffffa0012fac0000(0000) knlGS:0000000000000000
[45545.669969] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45545.676910] CR2: 00000000000001b8 CR3: 000000007680a003 CR4: 00000000001606e0
[45545.685405] Call Trace:
[45545.688661]  dev_err+0x6c/0x90
[45545.692592]  ? dev_printk_emit+0x4e/0x70
[45545.697490]  bnxt_qplib_rcfw_send_message+0x594/0x660 [bnxt_re]
[45545.704619]  ? dev_err+0x6c/0x90
[45545.708727]  bnxt_qplib_free_mrw+0x80/0xe0 [bnxt_re]
[45545.714782]  bnxt_re_dereg_mr+0x2e/0xd0 [bnxt_re]
[45545.720552]  ib_dereg_mr+0x2f/0x50 [ib_core]
[45545.725835]  destroy_hw_idr_uobject+0x20/0x70 [ib_uverbs]
[45545.732375]  uverbs_destroy_uobject+0x2e/0x170 [ib_uverbs]
[45545.739010]  __uverbs_cleanup_ufile+0x6e/0x90 [ib_uverbs]
[45545.745544]  uverbs_destroy_ufile_hw+0x61/0x130 [ib_uverbs]
[45545.752272]  ib_uverbs_close+0x1f/0x80 [ib_uverbs]
[45545.758126]  __fput+0xb7/0x230
[45545.762033]  task_work_run+0x8a/0xb0
[45545.766518]  do_exit+0x2da/0xb40
...
[45545.841546] RIP: 0033:0x146fa113a387
[45545.845934] Code: Bad RIP value.
[45545.849931] RSP: 002b:00007fff945d1478 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
[45545.858783] RAX: 0000000000000000 RBX: 000055a248908d70 RCX: 0000000000000000
[45545.867145] RDX: 0000146fa1f2b000 RSI: 0000000000000001 RDI: 000055a248906488
[45545.875503] RBP: 000055a248909630 R08: 0000000000010000 R09: 0000000000000000
[45545.883849] R10: 0000000000000000 R11: 0000000000000000 R12: 000055a248906488
[45545.892180] R13: 0000000000000001 R14: 0000000000000000 R15: 000055a2489095f0

Do not free the MR structures when the driver returns an error to the stack.

Fixes: 872f3578241d ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 9b6ca15..ad5112a 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3305,8 +3305,10 @@ int bnxt_re_dereg_mr(struct ib_mr *ib_mr, struct ib_udata *udata)
 	int rc;
 
 	rc = bnxt_qplib_free_mrw(&rdev->qplib_res, &mr->qplib_mr);
-	if (rc)
+	if (rc) {
 		dev_err(rdev_to_dev(rdev), "Dereg MR failed: %#x\n", rc);
+		return rc;
+	}
 
 	if (mr->pages) {
 		rc = bnxt_qplib_free_fast_reg_page_list(&rdev->qplib_res,
-- 
2.5.5


* [PATCH for-next 2/6] RDMA/bnxt_re: Fix Send Work Entry state check while polling completions
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 1/6] RDMA/bnxt_re: Avoid freeing MR resources if dereg fails Selvin Xavier
@ 2019-11-25  8:39 ` Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 3/6] RDMA/bnxt_re: Add more flags in device init and uninit path Selvin Xavier
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

Some adapters need a fence Work Entry to handle retransmission.
Currently the driver checks for this condition only if the Send Queue
entry is signalled. Implement the condition check irrespective of the
signalled state of the Work Queue entries.

Fixes: 9152e0b722b2 ("RDMA/bnxt_re: HW workarounds for handling specific conditions")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index 958c1ff..4d07d22 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -2283,13 +2283,13 @@ static int bnxt_qplib_cq_process_req(struct bnxt_qplib_cq *cq,
 			/* Add qp to flush list of the CQ */
 			bnxt_qplib_add_flush_qp(qp);
 		} else {
+			/* Before we complete, do WA 9060 */
+			if (do_wa9060(qp, cq, cq_cons, sw_sq_cons,
+				      cqe_sq_cons)) {
+				*lib_qp = qp;
+				goto out;
+			}
 			if (swq->flags & SQ_SEND_FLAGS_SIGNAL_COMP) {
-				/* Before we complete, do WA 9060 */
-				if (do_wa9060(qp, cq, cq_cons, sw_sq_cons,
-					      cqe_sq_cons)) {
-					*lib_qp = qp;
-					goto out;
-				}
 				cqe->status = CQ_REQ_STATUS_OK;
 				cqe++;
 				(*budget)--;
-- 
2.5.5


* [PATCH for-next 3/6] RDMA/bnxt_re: Add more flags in device init and uninit path
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 1/6] RDMA/bnxt_re: Avoid freeing MR resources if dereg fails Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 2/6] RDMA/bnxt_re: Fix Send Work Entry state check while polling completions Selvin Xavier
@ 2019-11-25  8:39 ` Selvin Xavier
  2019-11-25  8:39 ` [PATCH for-next 4/6] RDMA/bnxt_re: Refactor device add/remove functionalities Selvin Xavier
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

Add more flags for better granularity in the device init/uninit path.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h | 16 ++++++++----
 drivers/infiniband/hw/bnxt_re/main.c    | 45 ++++++++++++++++++---------------
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 725b235..3be495d 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -118,11 +118,17 @@ struct bnxt_re_dev {
 #define BNXT_RE_FLAG_IBDEV_REGISTERED		1
 #define BNXT_RE_FLAG_GOT_MSIX			2
 #define BNXT_RE_FLAG_HAVE_L2_REF		3
-#define BNXT_RE_FLAG_RCFW_CHANNEL_EN		4
-#define BNXT_RE_FLAG_QOS_WORK_REG		5
-#define BNXT_RE_FLAG_RESOURCES_ALLOCATED	7
-#define BNXT_RE_FLAG_RESOURCES_INITIALIZED	8
-#define BNXT_RE_FLAG_ISSUE_ROCE_STATS          29
+#define BNXT_RE_FLAG_ALLOC_RCFW			4
+#define BNXT_RE_FLAG_NET_RING_ALLOC		5
+#define BNXT_RE_FLAG_RCFW_CHANNEL_EN		6
+#define BNXT_RE_FLAG_ALLOC_CTX			7
+#define BNXT_RE_FLAG_STATS_CTX_ALLOC		8
+#define BNXT_RE_FLAG_STATS_CTX2_ALLOC		9
+#define BNXT_RE_FLAG_RCFW_CHANNEL_INIT		10
+#define BNXT_RE_FLAG_QOS_WORK_REG		11
+#define BNXT_RE_FLAG_RESOURCES_ALLOCATED	12
+#define BNXT_RE_FLAG_RESOURCES_INITIALIZED	13
+#define BNXT_RE_FLAG_ISSUE_ROCE_STATS		29
 	struct net_device		*netdev;
 	unsigned int			version, major, minor;
 	struct bnxt_qplib_chip_ctx	chip_ctx;
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index e7e8a0f..fbe3192 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1315,18 +1315,23 @@ static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev)
 	if (test_and_clear_bit(BNXT_RE_FLAG_RESOURCES_ALLOCATED, &rdev->flags))
 		bnxt_re_free_res(rdev);
 
-	if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags)) {
+	if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_INIT, &rdev->flags)) {
 		rc = bnxt_qplib_deinit_rcfw(&rdev->rcfw);
 		if (rc)
 			dev_warn(rdev_to_dev(rdev),
 				 "Failed to deinitialize RCFW: %#x", rc);
+	}
+	if (test_and_clear_bit(BNXT_RE_FLAG_STATS_CTX_ALLOC, &rdev->flags))
 		bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id);
+	if (test_and_clear_bit(BNXT_RE_FLAG_ALLOC_CTX, &rdev->flags))
 		bnxt_qplib_free_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx);
+	if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags))
 		bnxt_qplib_disable_rcfw_channel(&rdev->rcfw);
-		type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
+	type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
+	if (test_and_clear_bit(BNXT_RE_FLAG_NET_RING_ALLOC, &rdev->flags))
 		bnxt_re_net_ring_free(rdev, rdev->rcfw.creq_ring_id, type);
+	if (test_and_clear_bit(BNXT_RE_FLAG_ALLOC_RCFW, &rdev->flags))
 		bnxt_qplib_free_rcfw_channel(&rdev->rcfw);
-	}
 	if (test_and_clear_bit(BNXT_RE_FLAG_GOT_MSIX, &rdev->flags)) {
 		rc = bnxt_re_free_msix(rdev);
 		if (rc)
@@ -1404,6 +1409,9 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		pr_err("Failed to allocate RCFW Channel: %#x\n", rc);
 		goto fail;
 	}
+
+	set_bit(BNXT_RE_FLAG_ALLOC_RCFW, &rdev->flags);
+
 	type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
 	pg_map = rdev->rcfw.creq.pbl[PBL_LVL_0].pg_map_arr;
 	pages = rdev->rcfw.creq.pbl[rdev->rcfw.creq.level].pg_count;
@@ -1413,8 +1421,10 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 				    ridx, &rdev->rcfw.creq_ring_id);
 	if (rc) {
 		pr_err("Failed to allocate CREQ: %#x\n", rc);
-		goto free_rcfw;
+		goto fail;
 	}
+	set_bit(BNXT_RE_FLAG_NET_RING_ALLOC, &rdev->flags);
+
 	db_offt = bnxt_re_get_nqdb_offset(rdev, BNXT_RE_AEQ_IDX);
 	vid = rdev->msix_entries[BNXT_RE_AEQ_IDX].vector;
 	rc = bnxt_qplib_enable_rcfw_channel(rdev->en_dev->pdev, &rdev->rcfw,
@@ -1422,13 +1432,14 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 					    &bnxt_re_aeq_handler);
 	if (rc) {
 		pr_err("Failed to enable RCFW channel: %#x\n", rc);
-		goto free_ring;
+		goto fail;
 	}
 
+	set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags);
 	rc = bnxt_qplib_get_dev_attr(&rdev->rcfw, &rdev->dev_attr,
 				     rdev->is_virtfn);
 	if (rc)
-		goto disable_rcfw;
+		goto fail;
 
 	bnxt_re_set_resource_limits(rdev);
 
@@ -1436,23 +1447,26 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 				  bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx));
 	if (rc) {
 		pr_err("Failed to allocate QPLIB context: %#x\n", rc);
-		goto disable_rcfw;
+		goto fail;
 	}
+	set_bit(BNXT_RE_FLAG_ALLOC_CTX, &rdev->flags);
+
 	rc = bnxt_re_net_stats_ctx_alloc(rdev,
 					 rdev->qplib_ctx.stats.dma_map,
 					 &rdev->qplib_ctx.stats.fw_id);
 	if (rc) {
 		pr_err("Failed to allocate stats context: %#x\n", rc);
-		goto free_ctx;
+		goto fail;
 	}
+	set_bit(BNXT_RE_FLAG_STATS_CTX_ALLOC, &rdev->flags);
 
 	rc = bnxt_qplib_init_rcfw(&rdev->rcfw, &rdev->qplib_ctx,
 				  rdev->is_virtfn);
 	if (rc) {
 		pr_err("Failed to initialize RCFW: %#x\n", rc);
-		goto free_sctx;
+		goto fail;
 	}
-	set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags);
+	set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_INIT, &rdev->flags);
 
 	/* Resources based on the 'new' device caps */
 	rc = bnxt_re_alloc_res(rdev);
@@ -1496,17 +1510,6 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
 
 	return 0;
-free_sctx:
-	bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id);
-free_ctx:
-	bnxt_qplib_free_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx);
-disable_rcfw:
-	bnxt_qplib_disable_rcfw_channel(&rdev->rcfw);
-free_ring:
-	type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
-	bnxt_re_net_ring_free(rdev, rdev->rcfw.creq_ring_id, type);
-free_rcfw:
-	bnxt_qplib_free_rcfw_channel(&rdev->rcfw);
 fail:
 	if (!locked)
 		rtnl_lock();
-- 
2.5.5


* [PATCH for-next 4/6] RDMA/bnxt_re: Refactor device add/remove functionalities
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
                   ` (2 preceding siblings ...)
  2019-11-25  8:39 ` [PATCH for-next 3/6] RDMA/bnxt_re: Add more flags in device init and uninit path Selvin Xavier
@ 2019-11-25  8:39 ` Selvin Xavier
  2020-01-03 19:36   ` Jason Gunthorpe
  2019-11-25  8:39 ` [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API Selvin Xavier
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

 - bnxt_re_ib_reg() handles two main functionalities - initializing
   the device and registering with the IB stack. Split it into two
   functions, bnxt_re_dev_init() and bnxt_re_ib_init(), to improve
   modularity. Do the same for bnxt_re_ib_unreg(), i.e. split it into
   bnxt_re_dev_uninit() and bnxt_re_ib_uninit().
 - Simplify the code by combining the different steps to add and
   remove the device into two functions.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c | 133 ++++++++++++++++++++---------------
 1 file changed, 78 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index fbe3192..0cf38a4 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -78,7 +78,8 @@ static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
 /* Mutex to protect the list of bnxt_re devices added */
 static DEFINE_MUTEX(bnxt_re_dev_lock);
 static struct workqueue_struct *bnxt_re_wq;
-static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev);
+static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
+static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev);
 
 static void bnxt_re_destroy_chip_ctx(struct bnxt_re_dev *rdev)
 {
@@ -222,7 +223,9 @@ static void bnxt_re_shutdown(void *p)
 	if (!rdev)
 		return;
 
-	bnxt_re_ib_unreg(rdev);
+	bnxt_re_ib_uninit(rdev);
+	/* rtnl_lock held by L2 before coming here */
+	bnxt_re_remove_device(rdev);
 }
 
 static void bnxt_re_stop_irq(void *handle)
@@ -1297,7 +1300,37 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
 		le16_to_cpu(resp.hwrm_intf_patch);
 }
 
-static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev)
+static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev)
+{
+	/* Cleanup ib dev */
+	if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
+		ib_unregister_device(&rdev->ibdev);
+		clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+	}
+}
+
+int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
+{
+	int rc = 0;
+
+	/* Register ib dev */
+	rc = bnxt_re_register_ib(rdev);
+	if (rc) {
+		pr_err("Failed to register with IB: %#x\n", rc);
+		return rc;
+	}
+	set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+	dev_info(rdev_to_dev(rdev), "Device registered successfully");
+	ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
+			 &rdev->active_width);
+	set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
+	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
+	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_GID_CHANGE);
+
+	return rc;
+}
+
+static void bnxt_re_dev_uninit(struct bnxt_re_dev *rdev)
 {
 	u8 type;
 	int rc;
@@ -1358,19 +1391,14 @@ static void bnxt_re_worker(struct work_struct *work)
 	schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
 }
 
-static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
+static int bnxt_re_dev_init(struct bnxt_re_dev *rdev)
 {
 	dma_addr_t *pg_map;
 	u32 db_offt, ridx;
 	int pages, vid;
-	bool locked;
 	u8 type;
 	int rc;
 
-	/* Acquire rtnl lock through out this function */
-	rtnl_lock();
-	locked = true;
-
 	/* Registered a new RoCE device instance to netdev */
 	rc = bnxt_re_register_netdev(rdev);
 	if (rc) {
@@ -1493,29 +1521,10 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
 	}
 
-	rtnl_unlock();
-	locked = false;
-
-	/* Register ib dev */
-	rc = bnxt_re_register_ib(rdev);
-	if (rc) {
-		pr_err("Failed to register with IB: %#x\n", rc);
-		goto fail;
-	}
-	set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
-	dev_info(rdev_to_dev(rdev), "Device registered successfully");
-	ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
-			 &rdev->active_width);
-	set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
-	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
-
 	return 0;
 fail:
-	if (!locked)
-		rtnl_lock();
-	bnxt_re_ib_unreg(rdev);
-	rtnl_unlock();
 
+	bnxt_re_dev_uninit(rdev);
 	return rc;
 }
 
@@ -1555,9 +1564,35 @@ static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device *netdev)
 	return rc;
 }
 
-static void bnxt_re_remove_one(struct bnxt_re_dev *rdev)
+static void bnxt_re_remove_device(struct bnxt_re_dev *rdev)
 {
+	bnxt_re_dev_uninit(rdev);
 	pci_dev_put(rdev->en_dev->pdev);
+	bnxt_re_dev_unreg(rdev);
+}
+
+static int bnxt_re_add_device(struct bnxt_re_dev **rdev,
+			      struct net_device *netdev)
+{
+	int rc;
+
+	rc = bnxt_re_dev_reg(rdev, netdev);
+	if (rc == -ENODEV)
+		return rc;
+	if (rc) {
+		pr_err("Failed to register with the device %s: %#x\n",
+		       netdev->name, rc);
+		return rc;
+	}
+
+	pci_dev_get((*rdev)->en_dev->pdev);
+	rc = bnxt_re_dev_init(*rdev);
+	if (rc) {
+		pci_dev_put((*rdev)->en_dev->pdev);
+		bnxt_re_dev_unreg(*rdev);
+	}
+
+	return rc;
 }
 
 /* Handle all deferred netevents tasks */
@@ -1572,16 +1607,17 @@ static void bnxt_re_task(struct work_struct *work)
 
 	if (re_work->event != NETDEV_REGISTER &&
 	    !test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
-		return;
+		goto done;
 
 	switch (re_work->event) {
 	case NETDEV_REGISTER:
-		rc = bnxt_re_ib_reg(rdev);
+		rc = bnxt_re_ib_init(rdev);
 		if (rc) {
 			dev_err(rdev_to_dev(rdev),
 				"Failed to register with IB: %#x", rc);
-			bnxt_re_remove_one(rdev);
-			bnxt_re_dev_unreg(rdev);
+			rtnl_lock();
+			bnxt_re_remove_device(rdev);
+			rtnl_unlock();
 			goto exit;
 		}
 		break;
@@ -1604,17 +1640,13 @@ static void bnxt_re_task(struct work_struct *work)
 	default:
 		break;
 	}
+done:
 	smp_mb__before_atomic();
 	atomic_dec(&rdev->sched_count);
 exit:
 	kfree(re_work);
 }
 
-static void bnxt_re_init_one(struct bnxt_re_dev *rdev)
-{
-	pci_dev_get(rdev->en_dev->pdev);
-}
-
 /*
  * "Notifier chain callback can be invoked for the same chain from
  * different CPUs at the same time".
@@ -1652,16 +1684,9 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 	case NETDEV_REGISTER:
 		if (rdev)
 			break;
-		rc = bnxt_re_dev_reg(&rdev, real_dev);
-		if (rc == -ENODEV)
-			break;
-		if (rc) {
-			pr_err("Failed to register with the device %s: %#x\n",
-			       real_dev->name, rc);
-			break;
-		}
-		bnxt_re_init_one(rdev);
-		sch_work = true;
+		rc = bnxt_re_add_device(&rdev, real_dev);
+		if (!rc)
+			sch_work = true;
 		break;
 
 	case NETDEV_UNREGISTER:
@@ -1670,9 +1695,8 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 		 */
 		if (atomic_read(&rdev->sched_count) > 0)
 			goto exit;
-		bnxt_re_ib_unreg(rdev);
-		bnxt_re_remove_one(rdev);
-		bnxt_re_dev_unreg(rdev);
+		bnxt_re_ib_uninit(rdev);
+		bnxt_re_remove_device(rdev);
 		break;
 
 	default:
@@ -1749,12 +1773,11 @@ static void __exit bnxt_re_mod_exit(void)
 		 */
 		flush_workqueue(bnxt_re_wq);
 		bnxt_re_dev_stop(rdev);
+		bnxt_re_ib_uninit(rdev);
 		/* Acquire the rtnl_lock as the L2 resources are freed here */
 		rtnl_lock();
-		bnxt_re_ib_unreg(rdev);
+		bnxt_re_remove_device(rdev);
 		rtnl_unlock();
-		bnxt_re_remove_one(rdev);
-		bnxt_re_dev_unreg(rdev);
 	}
 	unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
 	if (bnxt_re_wq)
-- 
2.5.5


* [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
                   ` (3 preceding siblings ...)
  2019-11-25  8:39 ` [PATCH for-next 4/6] RDMA/bnxt_re: Refactor device add/remove functionalities Selvin Xavier
@ 2019-11-25  8:39 ` Selvin Xavier
  2020-01-03 19:44   ` Jason Gunthorpe
  2019-11-25  8:39 ` [PATCH for-next 6/6] RDMA/bnxt_re: Report more number of completion vectors Selvin Xavier
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

Use the new unregister APIs provided by the core.
Provide the dealloc_driver hook for the core to call back at the time
of device unregistration. Also, simplify bnxt_re_from_netdev() by
retrieving the device from the core API ib_device_get_by_netdev().

bnxt_re VF resources are created by the corresponding PF driver.
During driver unload, if the PF gets destroyed first, the VF driver
need not send any command down to the FW. So add a state flag
(res_deinit) in the per-device structure to detect driver removal and
avoid VF destroy failures caused by it.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c       | 100 ++++++++++++-----------------
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |   7 ++
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |   1 +
 3 files changed, 48 insertions(+), 60 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 0cf38a4..9ea4ce7 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -79,7 +79,7 @@ static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
 static DEFINE_MUTEX(bnxt_re_dev_lock);
 static struct workqueue_struct *bnxt_re_wq;
 static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
-static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev);
+static void bnxt_re_dealloc_driver(struct ib_device *ib_dev);
 
 static void bnxt_re_destroy_chip_ctx(struct bnxt_re_dev *rdev)
 {
@@ -223,9 +223,7 @@ static void bnxt_re_shutdown(void *p)
 	if (!rdev)
 		return;
 
-	bnxt_re_ib_uninit(rdev);
-	/* rtnl_lock held by L2 before coming here */
-	bnxt_re_remove_device(rdev);
+	ib_unregister_device_queued(&rdev->ibdev);
 }
 
 static void bnxt_re_stop_irq(void *handle)
@@ -527,17 +525,12 @@ static bool is_bnxt_re_dev(struct net_device *netdev)
 
 static struct bnxt_re_dev *bnxt_re_from_netdev(struct net_device *netdev)
 {
-	struct bnxt_re_dev *rdev;
+	struct ib_device *ibdev =
+		ib_device_get_by_netdev(netdev, RDMA_DRIVER_BNXT_RE);
+	if (!ibdev)
+		return NULL;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(rdev, &bnxt_re_dev_list, list) {
-		if (rdev->netdev == netdev) {
-			rcu_read_unlock();
-			return rdev;
-		}
-	}
-	rcu_read_unlock();
-	return NULL;
+	return container_of(ibdev, struct bnxt_re_dev, ibdev);
 }
 
 static void bnxt_re_dev_unprobe(struct net_device *netdev,
@@ -611,11 +604,6 @@ static const struct attribute_group bnxt_re_dev_attr_group = {
 	.attrs = bnxt_re_attributes,
 };
 
-static void bnxt_re_unregister_ib(struct bnxt_re_dev *rdev)
-{
-	ib_unregister_device(&rdev->ibdev);
-}
-
 static const struct ib_device_ops bnxt_re_dev_ops = {
 	.owner = THIS_MODULE,
 	.driver_id = RDMA_DRIVER_BNXT_RE,
@@ -630,6 +618,7 @@ static const struct ib_device_ops bnxt_re_dev_ops = {
 	.create_cq = bnxt_re_create_cq,
 	.create_qp = bnxt_re_create_qp,
 	.create_srq = bnxt_re_create_srq,
+	.dealloc_driver = bnxt_re_dealloc_driver,
 	.dealloc_pd = bnxt_re_dealloc_pd,
 	.dealloc_ucontext = bnxt_re_dealloc_ucontext,
 	.del_gid = bnxt_re_del_gid,
@@ -726,15 +715,11 @@ static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
 {
 	dev_put(rdev->netdev);
 	rdev->netdev = NULL;
-
 	mutex_lock(&bnxt_re_dev_lock);
 	list_del_rcu(&rdev->list);
 	mutex_unlock(&bnxt_re_dev_lock);
 
 	synchronize_rcu();
-
-	ib_dealloc_device(&rdev->ibdev);
-	/* rdev is gone */
 }
 
 static struct bnxt_re_dev *bnxt_re_dev_add(struct net_device *netdev,
@@ -767,6 +752,7 @@ static struct bnxt_re_dev *bnxt_re_dev_add(struct net_device *netdev,
 	mutex_lock(&bnxt_re_dev_lock);
 	list_add_tail_rcu(&rdev->list, &bnxt_re_dev_list);
 	mutex_unlock(&bnxt_re_dev_lock);
+
 	return rdev;
 }
 
@@ -1300,15 +1286,6 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
 		le16_to_cpu(resp.hwrm_intf_patch);
 }
 
-static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev)
-{
-	/* Cleanup ib dev */
-	if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
-		ib_unregister_device(&rdev->ibdev);
-		clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
-	}
-}
-
 int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
 {
 	int rc = 0;
@@ -1335,10 +1312,6 @@ static void bnxt_re_dev_uninit(struct bnxt_re_dev *rdev)
 	u8 type;
 	int rc;
 
-	if (test_and_clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
-		/* Cleanup ib dev */
-		bnxt_re_unregister_ib(rdev);
-	}
 	if (test_and_clear_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags))
 		cancel_delayed_work_sync(&rdev->worker);
 
@@ -1595,6 +1568,19 @@ static int bnxt_re_add_device(struct bnxt_re_dev **rdev,
 	return rc;
 }
 
+static void bnxt_re_dealloc_driver(struct ib_device *ib_dev)
+{
+	struct bnxt_re_dev *rdev =
+		container_of(ib_dev, struct bnxt_re_dev, ibdev);
+
+	clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+	dev_info(rdev_to_dev(rdev), "Unregistering Device");
+
+	rtnl_lock();
+	bnxt_re_remove_device(rdev);
+	rtnl_unlock();
+}
+
 /* Handle all deferred netevents tasks */
 static void bnxt_re_task(struct work_struct *work)
 {
@@ -1669,6 +1655,7 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 	struct bnxt_re_dev *rdev;
 	int rc = 0;
 	bool sch_work = false;
+	bool release = true;
 
 	real_dev = rdma_vlan_dev_real_dev(netdev);
 	if (!real_dev)
@@ -1676,7 +1663,8 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 
 	rdev = bnxt_re_from_netdev(real_dev);
 	if (!rdev && event != NETDEV_REGISTER)
-		goto exit;
+		return NOTIFY_OK;
+
 	if (real_dev != netdev)
 		goto exit;
 
@@ -1687,6 +1675,7 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 		rc = bnxt_re_add_device(&rdev, real_dev);
 		if (!rc)
 			sch_work = true;
+		release = false;
 		break;
 
 	case NETDEV_UNREGISTER:
@@ -1695,8 +1684,7 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 		 */
 		if (atomic_read(&rdev->sched_count) > 0)
 			goto exit;
-		bnxt_re_ib_uninit(rdev);
-		bnxt_re_remove_device(rdev);
+		ib_unregister_device_queued(&rdev->ibdev);
 		break;
 
 	default:
@@ -1718,6 +1706,8 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
 	}
 
 exit:
+	if (release)
+		ib_device_put(&rdev->ibdev);
 	return NOTIFY_DONE;
 }
 
@@ -1753,32 +1743,22 @@ static int __init bnxt_re_mod_init(void)
 
 static void __exit bnxt_re_mod_exit(void)
 {
-	struct bnxt_re_dev *rdev, *next;
-	LIST_HEAD(to_be_deleted);
+	struct bnxt_re_dev *rdev;
 
+	flush_workqueue(bnxt_re_wq);
 	mutex_lock(&bnxt_re_dev_lock);
-	/* Free all adapter allocated resources */
-	if (!list_empty(&bnxt_re_dev_list))
-		list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
-	mutex_unlock(&bnxt_re_dev_lock);
-       /*
-	* Cleanup the devices in reverse order so that the VF device
-	* cleanup is done before PF cleanup
-	*/
-	list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
-		dev_info(rdev_to_dev(rdev), "Unregistering Device");
+	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
 		/*
-		 * Flush out any scheduled tasks before destroying the
-		 * resources
+		 * Set unreg flag to avoid VF resource cleanup
+		 * in module unload path. This is required because
+		 * dealloc_driver for VF can come after PF cleaning
+		 * the VF resources.
 		 */
-		flush_workqueue(bnxt_re_wq);
-		bnxt_re_dev_stop(rdev);
-		bnxt_re_ib_uninit(rdev);
-		/* Acquire the rtnl_lock as the L2 resources are freed here */
-		rtnl_lock();
-		bnxt_re_remove_device(rdev);
-		rtnl_unlock();
+		if (rdev->is_virtfn)
+			rdev->rcfw.res_deinit = true;
 	}
+	mutex_unlock(&bnxt_re_dev_lock);
+	ib_unregister_driver(RDMA_DRIVER_BNXT_RE);
 	unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
 	if (bnxt_re_wq)
 		destroy_workqueue(bnxt_re_wq);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 5cdfa84..2304f97 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -211,6 +211,13 @@ int bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
 	u8 opcode, retry_cnt = 0xFF;
 	int rc = 0;
 
+	/* Check if the command needs to be sent down to HW.
+	 * Return success if resources are de-initialized
+	 */
+
+	if (rcfw->res_deinit)
+		return 0;
+
 	do {
 		opcode = req->opcode;
 		rc = __send_message(rcfw, req, resp, sb, is_block);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
index dfeadc1..c2b930e 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
@@ -265,6 +265,7 @@ struct bnxt_qplib_rcfw {
 	u64 oos_prev;
 	u32 init_oos_stats;
 	u32 cmdq_depth;
+	bool res_deinit;
 };
 
 void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw);
-- 
2.5.5


* [PATCH for-next 6/6] RDMA/bnxt_re: Report more number of completion vectors
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
                   ` (4 preceding siblings ...)
  2019-11-25  8:39 ` [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API Selvin Xavier
@ 2019-11-25  8:39 ` Selvin Xavier
  2020-01-03 19:46   ` Jason Gunthorpe
  2019-12-11 12:57 ` [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
  2020-01-03 19:29 ` Jason Gunthorpe
  7 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-11-25  8:39 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, Selvin Xavier

Report the data path MSI-X vectors allocated by the driver as the
number of completion vectors. One interrupt vector is used for the
control path, so report one less than the total number of MSI-X
vectors allocated by the driver.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 9ea4ce7..2e400da 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -669,7 +669,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 
 	bnxt_qplib_get_guid(rdev->netdev->dev_addr, (u8 *)&ibdev->node_guid);
 
-	ibdev->num_comp_vectors	= 1;
+	ibdev->num_comp_vectors	= rdev->num_msix - 1;
 	ibdev->dev.parent = &rdev->en_dev->pdev->dev;
 	ibdev->local_dma_lkey = BNXT_QPLIB_RSVD_LKEY;
 
-- 
2.5.5


* Re: [PATCH for-next 0/6] RDMA/bnxt_re driver update
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
                   ` (5 preceding siblings ...)
  2019-11-25  8:39 ` [PATCH for-next 6/6] RDMA/bnxt_re: Report more number of completion vectors Selvin Xavier
@ 2019-12-11 12:57 ` Selvin Xavier
  2019-12-11 16:20   ` Jason Gunthorpe
  2020-01-03 19:29 ` Jason Gunthorpe
  7 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-12-11 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

Hi Doug/Jason,

Can you please confirm whether you plan to pull this series into 5.5? I
marked it as for-next, but I was not sure whether it was too late for
the current merge window. If you don't plan to pull the whole series,
can you please pull patches 1 and 2 for -rc? Both are bug fixes.
Thanks,
Selvin

On Mon, Nov 25, 2019 at 2:09 PM Selvin Xavier
<selvin.xavier@broadcom.com> wrote:
>
> This series includes couple of bug fixes and
> code refactoring in the device init/deinit path.
> Also, modifies the code to use the new driver unregistration APIs.
>
> Please apply to for-next.
>
> Thanks,
> Selvin Xavier
>
> Selvin Xavier (6):
>   RDMA/bnxt_re: Avoid freeing MR resources if dereg fails
>   RDMA/bnxt_re: Fix Send Work Entry state check while polling
>     completions
>   RDMA/bnxt_re: Add more flags in device init and uninit path
>   RDMA/bnxt_re: Refactor device add/remove functionalities
>   RDMA/bnxt_re: Use driver_unregister and unregistration API
>   RDMA/bnxt_re: Report more number of completion vectors
>
>  drivers/infiniband/hw/bnxt_re/bnxt_re.h    |  16 +-
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c   |   4 +-
>  drivers/infiniband/hw/bnxt_re/main.c       | 246 +++++++++++++++--------------
>  drivers/infiniband/hw/bnxt_re/qplib_fp.c   |  12 +-
>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |   7 +
>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |   1 +
>  6 files changed, 154 insertions(+), 132 deletions(-)
>
> --
> 2.5.5
>

* Re: [PATCH for-next 0/6] RDMA/bnxt_re driver update
  2019-12-11 12:57 ` [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
@ 2019-12-11 16:20   ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2019-12-11 16:20 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: Doug Ledford, linux-rdma

On Wed, Dec 11, 2019 at 06:27:25PM +0530, Selvin Xavier wrote:
> Hi Doug/Jason,
> 
> Can you please confirm whether you plan to pull this series in 5.5? I
> marked it as for-next. I was not sure whether it was late for the
> current merge window.

It was way too late

> If you don't plan to pull this series, can you please pull patch 1
> and 2 for RC? Both are bug fixes.  Thanks, Selvin

I'll look at it next week

Jason

* Re: [PATCH for-next 0/6] RDMA/bnxt_re driver update
  2019-11-25  8:39 [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
                   ` (6 preceding siblings ...)
  2019-12-11 12:57 ` [PATCH for-next 0/6] RDMA/bnxt_re driver update Selvin Xavier
@ 2020-01-03 19:29 ` Jason Gunthorpe
  7 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2020-01-03 19:29 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: dledford, linux-rdma

On Mon, Nov 25, 2019 at 12:39:28AM -0800, Selvin Xavier wrote:
> This series includes couple of bug fixes and
> code refactoring in the device init/deinit path.
> Also, modifies the code to use the new driver unregistration APIs.
> 
> Please apply to for-next.
> 
> Thanks,
> Selvin Xavier
> 
> Selvin Xavier (6):
>   RDMA/bnxt_re: Avoid freeing MR resources if dereg fails
>   RDMA/bnxt_re: Fix Send Work Entry state check while polling
>     completions

These two have been applied to -rc, thanks

Jason

* Re: [PATCH for-next 4/6] RDMA/bnxt_re: Refactor device add/remove functionalities
  2019-11-25  8:39 ` [PATCH for-next 4/6] RDMA/bnxt_re: Refactor device add/remove functionalities Selvin Xavier
@ 2020-01-03 19:36   ` Jason Gunthorpe
  2020-02-18  7:23     ` Selvin Xavier
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2020-01-03 19:36 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: dledford, linux-rdma

On Mon, Nov 25, 2019 at 12:39:32AM -0800, Selvin Xavier wrote:
>  - bnxt_re_ib_reg() handles two main functionalities - initializing
>    the device and registering with the IB stack.  Split it into 2
>    functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
>    for the same thereby improve modularity. Do the same for
>    bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
>    and  bnxt_re_ib_uninit().
>  - Simplify the code by combining the different steps to add and
>    remove the device into two functions.
> 
> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
>  drivers/infiniband/hw/bnxt_re/main.c | 133 ++++++++++++++++++++---------------
>  1 file changed, 78 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> index fbe3192..0cf38a4 100644
> +++ b/drivers/infiniband/hw/bnxt_re/main.c
> @@ -78,7 +78,8 @@ static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
>  /* Mutex to protect the list of bnxt_re devices added */
>  static DEFINE_MUTEX(bnxt_re_dev_lock);
>  static struct workqueue_struct *bnxt_re_wq;
> -static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev);
> +static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
> +static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev);
>  
>  static void bnxt_re_destroy_chip_ctx(struct bnxt_re_dev *rdev)
>  {
> @@ -222,7 +223,9 @@ static void bnxt_re_shutdown(void *p)
>  	if (!rdev)
>  		return;
>  
> -	bnxt_re_ib_unreg(rdev);
> +	bnxt_re_ib_uninit(rdev);
> +	/* rtnl_lock held by L2 before coming here */
> +	bnxt_re_remove_device(rdev);

Is this a warning that RTNL is held, or a note that it is required? If
it is the latter, then please use ASSERT_RTNL() instead of a comment.
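
Something along these lines (just a sketch built from the hunk quoted
above; the function body outside the hunk and the exact placement of
the assert are assumptions):

	/* Assert the locking rule instead of only documenting it.
	 * ASSERT_RTNL() from <linux/rtnetlink.h> warns if the caller
	 * does not hold rtnl_lock.
	 */
	static void bnxt_re_shutdown(void *p)
	{
		struct bnxt_re_dev *rdev = p;

		if (!rdev)
			return;

		bnxt_re_ib_uninit(rdev);
		/* rtnl_lock is expected to be held by L2 already */
		ASSERT_RTNL();
		bnxt_re_remove_device(rdev);
	}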

Jason

* Re: [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API
  2019-11-25  8:39 ` [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API Selvin Xavier
@ 2020-01-03 19:44   ` Jason Gunthorpe
  2020-02-18 11:49     ` Selvin Xavier
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2020-01-03 19:44 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: dledford, linux-rdma

On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:

>  static void __exit bnxt_re_mod_exit(void)
>  {
> -	struct bnxt_re_dev *rdev, *next;
> -	LIST_HEAD(to_be_deleted);
> +	struct bnxt_re_dev *rdev;
>  
> +	flush_workqueue(bnxt_re_wq);
>  	mutex_lock(&bnxt_re_dev_lock);
> -	/* Free all adapter allocated resources */
> -	if (!list_empty(&bnxt_re_dev_list))
> -		list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> -	mutex_unlock(&bnxt_re_dev_lock);
> -       /*
> -	* Cleanup the devices in reverse order so that the VF device
> -	* cleanup is done before PF cleanup
> -	*/
> -	list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> -		dev_info(rdev_to_dev(rdev), "Unregistering Device");
> +	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
>  		/*
> -		 * Flush out any scheduled tasks before destroying the
> -		 * resources
> +		 * Set unreg flag to avoid VF resource cleanup
> +		 * in module unload path. This is required because
> +		 * dealloc_driver for VF can come after PF cleaning
> +		 * the VF resources.
>  		 */
> -		flush_workqueue(bnxt_re_wq);
> -		bnxt_re_dev_stop(rdev);
> -		bnxt_re_ib_uninit(rdev);
> -		/* Acquire the rtnl_lock as the L2 resources are freed here */
> -		rtnl_lock();
> -		bnxt_re_remove_device(rdev);
> -		rtnl_unlock();
> +		if (rdev->is_virtfn)
> +			rdev->rcfw.res_deinit = true;
>  	}
> +	mutex_unlock(&bnxt_re_dev_lock);

This is super ugly. This driver already has bugs if it has a
dependency on driver unbinding order as drivers can become unbound
from userspace using sysfs or hot-unplug in any ordering.

If the VF driver somehow depends on the PF driver then destruction of
the PF must synchronize and fence the VFs during its own shutdown.

But this seems very very strange, how can it work if the VF is in a VM
or something and the PF driver is unplugged?

Jason

* Re: [PATCH for-next 6/6] RDMA/bnxt_re: Report more number of completion vectors
  2019-11-25  8:39 ` [PATCH for-next 6/6] RDMA/bnxt_re: Report more number of completion vectors Selvin Xavier
@ 2020-01-03 19:46   ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2020-01-03 19:46 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: dledford, linux-rdma

On Mon, Nov 25, 2019 at 12:39:34AM -0800, Selvin Xavier wrote:
> Report the the data path MSIx vectors allocated by driver as
> number of completion vectors. One interrupt
> vector is used for Control path. So reporting one
> less than the total number of  MSIx vectors allocated
> by the driver.
> 
> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> ---
>  drivers/infiniband/hw/bnxt_re/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied to for-next, thanks

Jason

* Re: [PATCH for-next 4/6] RDMA/bnxt_re: Refactor device add/remove functionalities
  2020-01-03 19:36   ` Jason Gunthorpe
@ 2020-02-18  7:23     ` Selvin Xavier
  0 siblings, 0 replies; 18+ messages in thread
From: Selvin Xavier @ 2020-02-18  7:23 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

On Sat, Jan 4, 2020 at 1:06 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Nov 25, 2019 at 12:39:32AM -0800, Selvin Xavier wrote:
> >  - bnxt_re_ib_reg() handles two main functionalities - initializing
> >    the device and registering with the IB stack.  Split it into 2
> >    functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
> >    for the same thereby improve modularity. Do the same for
> >    bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
> >    and  bnxt_re_ib_uninit().
> >  - Simplify the code by combining the different steps to add and
> >    remove the device into two functions.
> >
> > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> >  drivers/infiniband/hw/bnxt_re/main.c | 133 ++++++++++++++++++++---------------
> >  1 file changed, 78 insertions(+), 55 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> > index fbe3192..0cf38a4 100644
> > +++ b/drivers/infiniband/hw/bnxt_re/main.c
> > @@ -78,7 +78,8 @@ static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
> >  /* Mutex to protect the list of bnxt_re devices added */
> >  static DEFINE_MUTEX(bnxt_re_dev_lock);
> >  static struct workqueue_struct *bnxt_re_wq;
> > -static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev);
> > +static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
> > +static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev);
> >
> >  static void bnxt_re_destroy_chip_ctx(struct bnxt_re_dev *rdev)
> >  {
> > @@ -222,7 +223,9 @@ static void bnxt_re_shutdown(void *p)
> >       if (!rdev)
> >               return;
> >
> > -     bnxt_re_ib_unreg(rdev);
> > +     bnxt_re_ib_uninit(rdev);
> > +     /* rtnl_lock held by L2 before coming here */
> > +     bnxt_re_remove_device(rdev);
>
> Is this a warning that RTNL is held, or a note that is is required? If
> it is the latter then plen use ASSERT_RTNL instead of a comment
>
It was intended as a note. I will replace the comment with ASSERT_RTNL().
Thanks
> Jason

* Re: [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API
  2020-01-03 19:44   ` Jason Gunthorpe
@ 2020-02-18 11:49     ` Selvin Xavier
  2020-02-19  0:15       ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2020-02-18 11:49 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
>
> >  static void __exit bnxt_re_mod_exit(void)
> >  {
> > -     struct bnxt_re_dev *rdev, *next;
> > -     LIST_HEAD(to_be_deleted);
> > +     struct bnxt_re_dev *rdev;
> >
> > +     flush_workqueue(bnxt_re_wq);
> >       mutex_lock(&bnxt_re_dev_lock);
> > -     /* Free all adapter allocated resources */
> > -     if (!list_empty(&bnxt_re_dev_list))
> > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > -     mutex_unlock(&bnxt_re_dev_lock);
> > -       /*
> > -     * Cleanup the devices in reverse order so that the VF device
> > -     * cleanup is done before PF cleanup
> > -     */
> > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> >               /*
> > -              * Flush out any scheduled tasks before destroying the
> > -              * resources
> > +              * Set unreg flag to avoid VF resource cleanup
> > +              * in module unload path. This is required because
> > +              * dealloc_driver for VF can come after PF cleaning
> > +              * the VF resources.
> >                */
> > -             flush_workqueue(bnxt_re_wq);
> > -             bnxt_re_dev_stop(rdev);
> > -             bnxt_re_ib_uninit(rdev);
> > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > -             rtnl_lock();
> > -             bnxt_re_remove_device(rdev);
> > -             rtnl_unlock();
> > +             if (rdev->is_virtfn)
> > +                     rdev->rcfw.res_deinit = true;
> >       }
> > +     mutex_unlock(&bnxt_re_dev_lock);
>
> This is super ugly. This driver already has bugs if it has a
> dependency on driver unbinding order as drivers can become unbound
> from userspace using sysfs or hot un-plug in any ordering.
>
The dependency is from the HW side and not from the driver side.
In some HW versions, the RoCE PF driver is allowed to allocate the
host memory for the VFs, and this dependency comes from that.
> If the VF driver somehow depends on the PF driver then destruction of
> the PF must synchronize and fence the VFs during it's own shutdown.
>
Do you suggest that the synchronization should be done in the stack
before invoking dealloc_driver for the VF?

> But this seems very strange, how can it work if the VF is in a VM
> or something and the PF driver is unplugged?
This code is not handling the case where the VF is attached to a VM.
The first command to the HW after the removal of the PF will fail with
a timeout. The driver stops sending commands to the HW once it sees
this timeout, so VF driver removal proceeds with cleaning up the host
resources without sending any commands to the FW and exits the removal
process.

On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
this patch, then when VF removal is initiated the first command will
time out and the driver will stop sending any more commands to the FW
and proceed with removal. All VFs exit in the same way; just that each
function waits for one command to time out. Setting
rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.
> Jason

* Re: [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API
  2020-02-18 11:49     ` Selvin Xavier
@ 2020-02-19  0:15       ` Jason Gunthorpe
  2020-02-19  9:27         ` Selvin Xavier
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2020-02-19  0:15 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: Doug Ledford, linux-rdma

On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> >
> > >  static void __exit bnxt_re_mod_exit(void)
> > >  {
> > > -     struct bnxt_re_dev *rdev, *next;
> > > -     LIST_HEAD(to_be_deleted);
> > > +     struct bnxt_re_dev *rdev;
> > >
> > > +     flush_workqueue(bnxt_re_wq);
> > >       mutex_lock(&bnxt_re_dev_lock);
> > > -     /* Free all adapter allocated resources */
> > > -     if (!list_empty(&bnxt_re_dev_list))
> > > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > -     mutex_unlock(&bnxt_re_dev_lock);
> > > -       /*
> > > -     * Cleanup the devices in reverse order so that the VF device
> > > -     * cleanup is done before PF cleanup
> > > -     */
> > > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > >               /*
> > > -              * Flush out any scheduled tasks before destroying the
> > > -              * resources
> > > +              * Set unreg flag to avoid VF resource cleanup
> > > +              * in module unload path. This is required because
> > > +              * dealloc_driver for VF can come after PF cleaning
> > > +              * the VF resources.
> > >                */
> > > -             flush_workqueue(bnxt_re_wq);
> > > -             bnxt_re_dev_stop(rdev);
> > > -             bnxt_re_ib_uninit(rdev);
> > > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > -             rtnl_lock();
> > > -             bnxt_re_remove_device(rdev);
> > > -             rtnl_unlock();
> > > +             if (rdev->is_virtfn)
> > > +                     rdev->rcfw.res_deinit = true;
> > >       }
> > > +     mutex_unlock(&bnxt_re_dev_lock);
> >
> > This is super ugly. This driver already has bugs if it has a
> > dependency on driver unbinding order as drivers can become unbound
> > from userspace using sysfs or hot un-plug in any ordering.
> >
> The dependency is from the HW side and not from the driver side.
> In some of the HW versions, RoCE PF driver is allowed to allocate the
> host memory
> for VFs and this dependency is due to this.
> > If the VF driver somehow depends on the PF driver then destruction of
> > the PF must synchronize and fence the VFs during it's own shutdown.
>
> Do you suggest that synchronization should be done in the stack before
> invoking the
> dealloc_driver for VF?

'in the stack'? This is a driver problem. You can't assume ordering
of driver detaches in Linux, and drivers should really not be
co-ordinating across their instances.

> > But this seems very strange, how can it work if the VF is in a VM
> > or something and the PF driver is unplugged?

> This code is not handling the case where the VF is attached to a VM.
> First command to HW after the removal of PF will fail with timeout.
> Driver will stop sending commands to HW once it sees this timeout. VF
> driver removal
> will proceed with cleaning up of host resources without sending any
> commands to FW
> and exit the removal process.

So why not use this for the host case as well? The timeout is too
long?

> On hypervisor, if we don't set rdev->rcfw.res_deinit as done in this
> patch, when VF removal is initiated,
> the first command will timeout and driver will stop sending any more commands
> to FW and proceed with removal. All VFs will exit in the same way.
> Just that each
> function will wait for one command to timeout. Setting
> rdev->rcfw.res_deinit was added
> as a hack to avoid this  waiting time.

The issue is that pci_driver_unregister undoes the driver binds in
FIFO not LIFO order?

What happens when the VF binds after the PF?

Jason

* Re: [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API
  2020-02-19  0:15       ` Jason Gunthorpe
@ 2020-02-19  9:27         ` Selvin Xavier
  2020-02-19 13:15           ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2020-02-19  9:27 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

On Wed, Feb 19, 2020 at 5:45 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> > On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> > >
> > > >  static void __exit bnxt_re_mod_exit(void)
> > > >  {
> > > > -     struct bnxt_re_dev *rdev, *next;
> > > > -     LIST_HEAD(to_be_deleted);
> > > > +     struct bnxt_re_dev *rdev;
> > > >
> > > > +     flush_workqueue(bnxt_re_wq);
> > > >       mutex_lock(&bnxt_re_dev_lock);
> > > > -     /* Free all adapter allocated resources */
> > > > -     if (!list_empty(&bnxt_re_dev_list))
> > > > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > > -     mutex_unlock(&bnxt_re_dev_lock);
> > > > -       /*
> > > > -     * Cleanup the devices in reverse order so that the VF device
> > > > -     * cleanup is done before PF cleanup
> > > > -     */
> > > > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > > >               /*
> > > > -              * Flush out any scheduled tasks before destroying the
> > > > -              * resources
> > > > +              * Set unreg flag to avoid VF resource cleanup
> > > > +              * in module unload path. This is required because
> > > > +              * dealloc_driver for VF can come after PF cleaning
> > > > +              * the VF resources.
> > > >                */
> > > > -             flush_workqueue(bnxt_re_wq);
> > > > -             bnxt_re_dev_stop(rdev);
> > > > -             bnxt_re_ib_uninit(rdev);
> > > > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > > -             rtnl_lock();
> > > > -             bnxt_re_remove_device(rdev);
> > > > -             rtnl_unlock();
> > > > +             if (rdev->is_virtfn)
> > > > +                     rdev->rcfw.res_deinit = true;
> > > >       }
> > > > +     mutex_unlock(&bnxt_re_dev_lock);
> > >
> > > This is super ugly. This driver already has bugs if it has a
> > > dependency on driver unbinding order as drivers can become unbound
> > > from userspace using sysfs or hot un-plug in any ordering.
> > >
> > The dependency is from the HW side and not from the driver side.
> > In some of the HW versions, RoCE PF driver is allowed to allocate the
> > host memory
> > for VFs and this dependency is due to this.
> > > If the VF driver somehow depends on the PF driver then destruction of
> > > the PF must synchronize and fence the VFs during it's own shutdown.
> >
> > Do you suggest that synchronization should be done in the stack before
> > invoking the
> > dealloc_driver for VF?
>
> 'in the stack'? This is a driver problem.. You can't assume ordering
> of driver detaches in Linux, and drivers should really not be
> co-ordinating across their instances.
>
> > > But this seems very strange, how can it work if the VF is in a VM
> > > or something and the PF driver is unplugged?
>
> > This code is not handling the case where the VF is attached to a VM.
> > First command to HW after the removal of PF will fail with timeout.
> > Driver will stop sending commands to HW once it sees this timeout. VF
> > driver removal
> > will proceed with cleaning up of host resources without sending any
> > commands to FW
> > and exit the removal process.
>
> So why not use this for the host case as well? The timeout is too
> long?
Yeah. The timeout for the first command is 20 sec now. Maybe I can use
a smaller timeout in the unreg path.
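
Roughly something like this (only a sketch -- the constant, the flag
and the helper below are made-up names, not existing driver symbols):

	/* Use a much shorter command timeout once unload has started,
	 * instead of waiting the full ~20 s default for a command that
	 * can never complete.
	 */
	#define BNXT_RE_UNREG_CMD_WAIT_MS	2000

	static unsigned long bnxt_re_cmd_wait_jiffies(struct bnxt_qplib_rcfw *rcfw)
	{
		unsigned int ms = rcfw->unreg_in_progress ?
				  BNXT_RE_UNREG_CMD_WAIT_MS : 20000;

		return msecs_to_jiffies(ms);
	}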
>
> > On hypervisor, if we don't set rdev->rcfw.res_deinit as done in this
> > patch, when VF removal is initiated,
> > the first command will timeout and driver will stop sending any more commands
> > to FW and proceed with removal. All VFs will exit in the same way.
> > Just that each
> > function will wait for one command to timeout. Setting
> > rdev->rcfw.res_deinit was added
> > as a hack to avoid this  waiting time.
>
> The issue is that pci_driver_unregister undoes the driver binds in
> FIFO not LIFO order?
>
> What happens when the VF binds after the PF?

This is not dependent on PCI driver unregister. This particular issue
happens when only the bnxt_re driver is unloaded and the new
ib_unregister_driver() is invoked by bnxt_re in the mod_exit hook.
dealloc_driver for each IB device is called mostly in FIFO order
(using xa_for_each). So since the PF ib device was added first, it
gets removed first and then the VF is removed.

After this discussion, I now feel it's better to remove the hack and
let the commands fail with a timeout and exit. The issue is seen only
when users try to unload bnxt_re alone, without destroying the VFs,
and the chances of that usage are slim. Anyway, it's not a fatal error
if a user tries this sequence; just that the rmmod will take some time
to exit.

Shall I repost by removing this hack?
>
> Jason

* Re: [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API
  2020-02-19  9:27         ` Selvin Xavier
@ 2020-02-19 13:15           ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2020-02-19 13:15 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: Doug Ledford, linux-rdma

On Wed, Feb 19, 2020 at 02:57:45PM +0530, Selvin Xavier wrote:
> > > On hypervisor, if we don't set rdev->rcfw.res_deinit as done in this
> > > patch, when VF removal is initiated,
> > > the first command will timeout and driver will stop sending any more commands
> > > to FW and proceed with removal. All VFs will exit in the same way.
> > > Just that each
> > > function will wait for one command to timeout. Setting
> > > rdev->rcfw.res_deinit was added
> > > as a hack to avoid this  waiting time.
> >
> > The issue is that pci_driver_unregister undoes the driver binds in
> > FIFO not LIFO order?
> >
> > What happens when the VF binds after the PF?
> 
> This is not dependent on PCI driver unregister. This particular issue
> is happening when bnxt_re
> driver only  is unloaded and the new  ib_unregister_driver is invoked
> by bnxt_re driver in the mod_exit hook.

Oh, now I remember, this driver sits on top of netdev using the
notification stuff so things are a little weird.

> dealloc_driver for each IB device  is called mostly in FIFO
> order(using xa_for_each).  So since PF ib device was added first, it
> gets removed and then VF is removed.

I see. You could probably arrange to remove your VFs first, then call
ib_unregister_driver() to finish it up and serialize away any
races. That would be reasonably clean and much better than hacking
special stuff in.
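
Something like this in bnxt_re_mod_exit() (only a sketch -- the locking
is simplified and needs care, since the dealloc_driver path takes
bnxt_re_dev_lock again when it drops the rdev from the list):

	static void __exit bnxt_re_mod_exit(void)
	{
		struct bnxt_re_dev *rdev;

		flush_workqueue(bnxt_re_wq);

		/* Pass 1: unregister the VF devices first */
	restart:
		mutex_lock(&bnxt_re_dev_lock);
		list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
			if (rdev->is_virtfn) {
				mutex_unlock(&bnxt_re_dev_lock);
				ib_unregister_device(&rdev->ibdev);
				goto restart;	/* rdev is freed, rescan */
			}
		}
		mutex_unlock(&bnxt_re_dev_lock);

		/* Pass 2: the PFs, serialized by the core */
		ib_unregister_driver(RDMA_DRIVER_BNXT_RE);

		unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
		if (bnxt_re_wq)
			destroy_workqueue(bnxt_re_wq);
	}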

> Shall i repost by removing this hack?

Please

Jason

