* [PATCH v4 1/6] iw_cxgb4: call dev_put() on l2t allocation failure
From: Steve Wise @ 2016-09-01 13:43 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b6a953a..434a7fb 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -2117,8 +2117,10 @@ static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
 		}
 		ep->l2t = cxgb4_l2t_get(cdev->rdev.lldi.l2t,
 					n, pdev, rt_tos2priority(tos));
-		if (!ep->l2t)
+		if (!ep->l2t) {
+			dev_put(pdev);
 			goto out;
+		}
 		ep->mtu = pdev->mtu;
 		ep->tx_chan = cxgb4_port_chan(pdev);
 		ep->smac_idx = cxgb4_tp_smt_idx(adapter_type,
-- 
2.7.0

* [PATCH v4 2/6] iw_cxgb4: block module unload until all ep resources are released
From: Steve Wise @ 2016-09-01 13:44 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

Otherwise an endpoint can still be closing down, causing a
touch-after-free crash.  Also WARN_ON if ULPs have failed to destroy
various resources during device removal.

Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c       | 2 ++
 drivers/infiniband/hw/cxgb4/device.c   | 5 +++++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 +
 3 files changed, 8 insertions(+)
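
A minimal sketch of the drain pattern this patch introduces, for readers
skimming the diff below.  The c4iw_dev field names match the driver; the
helper names and surrounding code are illustrative only:

#include <linux/idr.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

/* Releasing the last hw tid wakes up whoever is tearing the device down. */
static void drop_hwtid_sketch(struct c4iw_dev *dev, u32 hwtid)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->lock, flags);
	idr_remove(&dev->hwtid_idr, hwtid);
	if (idr_is_empty(&dev->hwtid_idr))
		wake_up(&dev->wait);
	spin_unlock_irqrestore(&dev->lock, flags);
}

static void dealloc_sketch(struct c4iw_dev *dev)
{
	/* Device removal blocks here until every endpoint is gone. */
	wait_event(dev->wait, idr_is_empty(&dev->hwtid_idr));
	idr_destroy(&dev->hwtid_idr);
}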

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 434a7fb..80f9889 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -333,6 +333,8 @@ static void remove_ep_tid(struct c4iw_ep *ep)
 
 	spin_lock_irqsave(&ep->com.dev->lock, flags);
 	_remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid, 0);
+	if (idr_is_empty(&ep->com.dev->hwtid_idr))
+		wake_up(&ep->com.dev->wait);
 	spin_unlock_irqrestore(&ep->com.dev->lock, flags);
 }
 
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 071d733..3c4b212 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -872,9 +872,13 @@ static void c4iw_rdev_close(struct c4iw_rdev *rdev)
 static void c4iw_dealloc(struct uld_ctx *ctx)
 {
 	c4iw_rdev_close(&ctx->dev->rdev);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->cqidr));
 	idr_destroy(&ctx->dev->cqidr);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->qpidr));
 	idr_destroy(&ctx->dev->qpidr);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->mmidr));
 	idr_destroy(&ctx->dev->mmidr);
+	wait_event(ctx->dev->wait, idr_is_empty(&ctx->dev->hwtid_idr));
 	idr_destroy(&ctx->dev->hwtid_idr);
 	idr_destroy(&ctx->dev->stid_idr);
 	idr_destroy(&ctx->dev->atid_idr);
@@ -992,6 +996,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
 	mutex_init(&devp->rdev.stats.lock);
 	mutex_init(&devp->db_mutex);
 	INIT_LIST_HEAD(&devp->db_fc_list);
+	init_waitqueue_head(&devp->wait);
 	devp->avail_ird = devp->rdev.lldi.max_ird_adapter;
 
 	if (c4iw_debugfs_root) {
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index aa47e0a..4b83b84 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -263,6 +263,7 @@ struct c4iw_dev {
 	struct idr stid_idr;
 	struct list_head db_fc_list;
 	u32 avail_ird;
+	wait_queue_head_t wait;
 };
 
 static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
-- 
2.7.0

* [PATCH v4 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Steve Wise @ 2016-09-01 16:12 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/nvme/host/rdma.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)
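
The reasoning lives in the comments added in the diff below.  In generic
form the pattern is: pin the object with a reference before queueing and
flushing work that may drop the last reference.  A minimal sketch with
hypothetical names (not the driver's code):

#include <linux/kref.h>
#include <linux/workqueue.h>

struct obj {
	struct kref kref;
	struct work_struct delete_work;	/* its handler drops a reference */
};

static void obj_release(struct kref *kref);	/* frees the object */

static int obj_delete_sketch(struct obj *o)
{
	/*
	 * Take our own reference first so the object cannot be freed
	 * underneath us while delete_work runs and we flush it.
	 */
	if (!kref_get_unless_zero(&o->kref))
		return -EBUSY;		/* already being torn down */
	schedule_work(&o->delete_work);
	flush_work(&o->delete_work);
	kref_put(&o->kref, obj_release);
	return 0;
}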

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ab545fb..15b0c1d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1351,9 +1351,15 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 		ret = 1;
 	}
 
-	/* Queue controller deletion */
+	/*
+	 * Queue controller deletion. Keep a reference until all
+	 * work is flushed since delete_work will free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);
 	queue_work(nvme_rdma_wq, &ctrl->delete_work);
 	flush_work(&ctrl->delete_work);
+	nvme_put_ctrl(&ctrl->ctrl);
+
 	return ret;
 }
 
@@ -1700,15 +1706,19 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
 static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
 {
 	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
-	int ret;
+	int ret = 0;
 
+	/*
+	 * Keep a reference until all work is flushed since
+	 * __nvme_rdma_del_ctrl can free the ctrl mem
+	 */
+	if (!kref_get_unless_zero(&ctrl->ctrl.kref))
+		return -EBUSY;
 	ret = __nvme_rdma_del_ctrl(ctrl);
-	if (ret)
-		return ret;
-
-	flush_work(&ctrl->delete_work);
-
-	return 0;
+	if (!ret)
+		flush_work(&ctrl->delete_work);
+	nvme_put_ctrl(&ctrl->ctrl);
+	return ret;
 }
 
 static void nvme_rdma_remove_ctrl_work(struct work_struct *work)
-- 
2.7.0

* [PATCH v4 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
From: Steve Wise @ 2016-09-01 16:12 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

After address resolution, the nvme_rdma_queue rdma resources are
allocated.  If rdma route resolution or the connect fails, or the
controller reconnect times out and gives up, then the rdma resources
need to be freed.  Otherwise, rdma resources are leaked.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/nvme/host/rdma.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
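
The fix uses a common idiom: remember that the ib resources were
allocated by setting a flag bit, and have the teardown path claim that
bit atomically, so it is safe to call from both the error paths and the
regular free path.  A minimal sketch of the idiom with hypothetical
names (the real queue/flag names are in the diff below):

#include <linux/bitops.h>

#define MY_Q_IB_ALLOCATED	0	/* bit number in q->flags */

struct my_queue {
	unsigned long flags;
	/* cq, qp, ... */
};

static int my_queue_create_ib_sketch(struct my_queue *q)
{
	/* ... allocate the cq/qp ... */
	set_bit(MY_Q_IB_ALLOCATED, &q->flags);
	return 0;
}

static void my_queue_destroy_ib_sketch(struct my_queue *q)
{
	/*
	 * Only the caller that clears the bit actually frees anything,
	 * so calling this twice (error path plus teardown) is harmless.
	 */
	if (!test_and_clear_bit(MY_Q_IB_ALLOCATED, &q->flags))
		return;
	/* ... destroy the qp, free the cq ... */
}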

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 15b0c1d..d97a16f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -82,6 +82,7 @@ struct nvme_rdma_request {
 
 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_CONNECTED = (1 << 0),
+	NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
 };
 
 struct nvme_rdma_queue {
@@ -483,6 +484,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
 
+	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
+		return;
 	rdma_destroy_qp(queue->cm_id);
 	ib_free_cq(queue->ib_cq);
 
@@ -533,6 +536,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue,
 		ret = -ENOMEM;
 		goto out_destroy_qp;
 	}
+	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
 
 	return 0;
 
@@ -590,6 +594,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 	return 0;
 
 out_destroy_cm_id:
+	if (test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &ctrl->queues[0].flags))
+		nvme_rdma_destroy_queue_ib(queue);
 	rdma_destroy_id(queue->cm_id);
 	return ret;
 }
@@ -652,7 +658,7 @@ static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
 	return 0;
 
 out_free_queues:
-	for (; i >= 1; i--)
+	for (i--; i >= 1; i--)
 		nvme_rdma_stop_and_free_queue(&ctrl->queues[i]);
 
 	return ret;
-- 
2.7.0

* [PATCH v4 5/6] nvme-rdma: add DELETING queue flag
From: Steve Wise @ 2016-09-01 16:12 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

From: Sagi Grimberg <sagi@grimberg.me>

When we get a surprise disconnect from the target we queue a periodic
reconnect (which is the sane thing to do...).

We only move the queues out of CONNECTED when we retry the reconnect
(after 10 seconds in the default case), but we stop the blk queues
immediately so we are not bothered with traffic from then on.  If
delete() kicks in during this window, the queues are still in the
CONNECTED state.

Part of the delete sequence is trying to issue a ctrl shutdown if the
admin queue is CONNECTED (which it is!).  This request is issued but
gets stuck in blk-mq waiting for the queues to start again, which can
prevent forward progress.

The patch separates the queue flags into CONNECTED and DELETING.  Now we
move out of CONNECTED as soon as error recovery kicks in (before
stopping the queues), and DELETING is set when we start the queue
deletion.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
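
The core of the change is that teardown ownership is now claimed by
winning test_and_set_bit() on the new DELETING flag, while CONNECTED is
cleared as soon as error recovery starts.  A minimal sketch of that
claim-then-free idiom with hypothetical names:

#include <linux/bitops.h>

#define MY_Q_CONNECTED	0	/* bit numbers in q->flags */
#define MY_Q_DELETING	1

struct my_queue {
	unsigned long flags;
};

static void my_stop_and_free_queue_sketch(struct my_queue *q)
{
	/*
	 * Whoever sets DELETING first owns the teardown; any later
	 * caller sees the bit already set and returns immediately.
	 */
	if (test_and_set_bit(MY_Q_DELETING, &q->flags))
		return;
	/* ... stop and free the queue ... */
}

static void my_error_recovery_sketch(struct my_queue *queues, int nr)
{
	int i;

	/*
	 * Leave CONNECTED before stopping the queues, so a concurrent
	 * delete no longer treats them as live.
	 */
	for (i = 0; i < nr; i++)
		clear_bit(MY_Q_CONNECTED, &queues[i].flags);
}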

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d97a16f..a5c2280 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -83,6 +83,7 @@ struct nvme_rdma_request {
 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_CONNECTED = (1 << 0),
 	NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
+	NVME_RDMA_Q_DELETING = (1 << 2),
 };
 
 struct nvme_rdma_queue {
@@ -614,7 +615,7 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
 
 static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
 {
-	if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
+	if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
 		return;
 	nvme_rdma_stop_queue(queue);
 	nvme_rdma_free_queue(queue);
@@ -767,8 +768,13 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 {
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
+	int i;
 
 	nvme_stop_keep_alive(&ctrl->ctrl);
+
+	for (i = 0; i < ctrl->queue_count; i++)
+		clear_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[i].flags);
+
 	if (ctrl->queue_count > 1)
 		nvme_stop_queues(&ctrl->ctrl);
 	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
@@ -1348,7 +1354,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 	cancel_delayed_work_sync(&ctrl->reconnect_work);
 
 	/* Disable the queue so ctrl delete won't free it */
-	if (test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags)) {
+	if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
 		/* Free this queue ourselves */
 		nvme_rdma_stop_queue(queue);
 		nvme_rdma_destroy_queue_ib(queue);
-- 
2.7.0

* [PATCH v4 6/6] nvme-rdma: use ib_client API to detect device removal
From: Steve Wise @ 2016-09-01 16:12 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

Change nvme-rdma to use the IB client API to detect device removal.
This has the wonderful benefit of being able to blow away all the
ib/rdma_cm resources for the device being removed.  No craziness about
not destroying the cm_id that is handling the event.  No deadlocks due
to broken iw_cm/rdma_cm/iWARP dependencies.  And no need to keep a
bound cm_id around during controller recovery/reconnect just to catch
device removal events.

We don't use the .add (device add) side of the ib_client service since
we only want to create resources for an IB device if we have a target
utilizing that device.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/nvme/host/rdma.c | 108 ++++++++++++++++++-----------------------------
 1 file changed, 40 insertions(+), 68 deletions(-)
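
For reference, the bare ib_client registration pattern the patch
switches to looks like this in isolation (a minimal sketch with
hypothetical names; the real hook bodies are in the diff below):

#include <linux/module.h>
#include <rdma/ib_verbs.h>

static void my_add_one(struct ib_device *dev)
{
	/* Nothing to do at device-add time in this design. */
}

static void my_remove_one(struct ib_device *dev, void *client_data)
{
	/* Tear down every consumer of 'dev' before it goes away. */
}

static struct ib_client my_ib_client = {
	.name	= "my_client",
	.add	= my_add_one,
	.remove	= my_remove_one,
};

static int __init my_init(void)
{
	return ib_register_client(&my_ib_client);
}

static void __exit my_exit(void)
{
	ib_unregister_client(&my_ib_client);
}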

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a5c2280..52e1e82 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1317,64 +1317,6 @@ out_destroy_queue_ib:
 	return ret;
 }
 
-/**
- * nvme_rdma_device_unplug() - Handle RDMA device unplug
- * @queue:      Queue that owns the cm_id that caught the event
- *
- * DEVICE_REMOVAL event notifies us that the RDMA device is about
- * to unplug so we should take care of destroying our RDMA resources.
- * This event will be generated for each allocated cm_id.
- *
- * In our case, the RDMA resources are managed per controller and not
- * only per queue. So the way we handle this is we trigger an implicit
- * controller deletion upon the first DEVICE_REMOVAL event we see, and
- * hold the event inflight until the controller deletion is completed.
- *
- * One exception that we need to handle is the destruction of the cm_id
- * that caught the event. Since we hold the callout until the controller
- * deletion is completed, we'll deadlock if the controller deletion will
- * call rdma_destroy_id on this queue's cm_id. Thus, we claim ownership
- * of destroying this queue before-hand, destroy the queue resources,
- * then queue the controller deletion which won't destroy this queue and
- * we destroy the cm_id implicitely by returning a non-zero rc to the callout.
- */
-static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
-{
-	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
-	int ret = 0;
-
-	/* Own the controller deletion */
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
-		return 0;
-
-	dev_warn(ctrl->ctrl.device,
-		"Got rdma device removal event, deleting ctrl\n");
-
-	/* Get rid of reconnect work if its running */
-	cancel_delayed_work_sync(&ctrl->reconnect_work);
-
-	/* Disable the queue so ctrl delete won't free it */
-	if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
-		/* Free this queue ourselves */
-		nvme_rdma_stop_queue(queue);
-		nvme_rdma_destroy_queue_ib(queue);
-
-		/* Return non-zero so the cm_id will destroy implicitly */
-		ret = 1;
-	}
-
-	/*
-	 * Queue controller deletion. Keep a reference until all
-	 * work is flushed since delete_work will free the ctrl mem
-	 */
-	kref_get(&ctrl->ctrl.kref);
-	queue_work(nvme_rdma_wq, &ctrl->delete_work);
-	flush_work(&ctrl->delete_work);
-	nvme_put_ctrl(&ctrl->ctrl);
-
-	return ret;
-}
-
 static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		struct rdma_cm_event *ev)
 {
@@ -1416,8 +1358,8 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		nvme_rdma_error_recovery(queue->ctrl);
 		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
-		/* return 1 means impliciy CM ID destroy */
-		return nvme_rdma_device_unplug(queue);
+		/* device removal is handled via the ib_client API */
+		break;
 	default:
 		dev_err(queue->ctrl->ctrl.device,
 			"Unexpected RDMA CM event (%d)\n", ev->event);
@@ -2027,27 +1969,57 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 	.create_ctrl	= nvme_rdma_create_ctrl,
 };
 
+static void nvme_rdma_add_one(struct ib_device *ib_device)
+{
+}
+
+static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
+{
+	struct nvme_rdma_ctrl *ctrl;
+
+	/* Delete all controllers using this device */
+	mutex_lock(&nvme_rdma_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
+		if (ctrl->device->dev != ib_device)
+			continue;
+		dev_info(ctrl->ctrl.device,
+			"Removing ctrl: NQN \"%s\", addr %pISp\n",
+			ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
+		__nvme_rdma_del_ctrl(ctrl);
+	}
+	mutex_unlock(&nvme_rdma_ctrl_mutex);
+
+	flush_workqueue(nvme_rdma_wq);
+}
+
+static struct ib_client nvme_rdma_ib_client = {
+	.name   = "nvme_rdma",
+	.add = nvme_rdma_add_one,
+	.remove = nvme_rdma_remove_one
+};
+
 static int __init nvme_rdma_init_module(void)
 {
+	int ret;
+
 	nvme_rdma_wq = create_workqueue("nvme_rdma_wq");
 	if (!nvme_rdma_wq)
 		return -ENOMEM;
 
+	ret = ib_register_client(&nvme_rdma_ib_client);
+	if (ret) {
+		destroy_workqueue(nvme_rdma_wq);
+		return ret;
+	}
+
 	nvmf_register_transport(&nvme_rdma_transport);
 	return 0;
 }
 
 static void __exit nvme_rdma_cleanup_module(void)
 {
-	struct nvme_rdma_ctrl *ctrl;
-
 	nvmf_unregister_transport(&nvme_rdma_transport);
-
-	mutex_lock(&nvme_rdma_ctrl_mutex);
-	list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list)
-		__nvme_rdma_del_ctrl(ctrl);
-	mutex_unlock(&nvme_rdma_ctrl_mutex);
-
+	ib_unregister_client(&nvme_rdma_ib_client);
 	destroy_workqueue(nvme_rdma_wq);
 }
 
-- 
2.7.0

* [PATCH v4 0/6] nvme-rdma device removal fixes
From: Steve Wise @ 2016-09-01 16:12 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org

This series addresses several problems that occur when shutting down an
nvme-rdma host while its controllers are attempting to reconnect to a
target that is no longer reachable.

To tickle these bugs:

1) attach over iw_cxgb4 to 10 devices on a target.
2) 'ifconfig down' the target's interface
3) wait for keep-alive to fire and begin reconnecting (~15-20 seconds)
4) do one of these on the host:
   - rmmod iw_cxgb4
   - reboot
   - reboot -f

Doug/Sagi, the first 2 iw_cxgb4 patches are included here because they're
needed for the testing.  While I've also submitted them to linux-rdma,
perhaps Sagi can merge them into nvmf-4.8-rc, if that is acceptable
to everyone?

Patch series:

1/6 iw_cxgb4: call dev_put() on l2t allocation failure
2/6 iw_cxgb4: block module unload until all ep resources are released
3/6 nvme_rdma: keep a ref on the ctrl during delete/flush
4/6 nvme-rdma: destroy nvme queue rdma resources on connect failure
5/6 nvme-rdma: add DELETING queue flag
6/6 nvme-rdma: use ib_client API to detect device removal

Changes since v3:

- removed WIP/RFC tag

- fixed a bug in patch 4 where an rdma reject from the target caused a
double free of the ib queue and cm_id.

- remove noisy pr_info()s in patch 6

- kref_get -> kref_get_unless_zero in nvme_rdma_del_ctrl() of patch 3

- add reviewed-by tags

Changes since v2:

- refactor/simplify the remove_one function.

- nvme-rdma module remove function doesn't need to explicitly
remove the controllers; they will be removed as part of
ib_client unregister.

- removed forward declarations.

Changes since v1:

- the big change was the patch 6 rewrite - use the ib_client API to handle
device removal instead of rdma_cm device removal events.

- tweaked patch 5 to avoid bisect issues

- small code rework on patch 3 based on Christoph's suggestion

- clear_bit() -> !test_and_clear_bit() in patch 4 (Christoph's comment)

- add reviewed-by tags.

---

Sagi Grimberg (1):
  nvme-rdma: add DELETING queue flag

Steve Wise (5):
  iw_cxgb4: call dev_put() on l2t allocation failure
  iw_cxgb4: block module unload until all ep resources are released
  nvme_rdma: keep a ref on the ctrl during delete/flush
  nvme-rdma: destroy nvme queue rdma resources on connect failure
  nvme-rdma: use ib_client API to detect device removal

 drivers/infiniband/hw/cxgb4/cm.c       |   6 +-
 drivers/infiniband/hw/cxgb4/device.c   |   5 ++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |   1 +
 drivers/nvme/host/rdma.c               | 136 ++++++++++++++++-----------------
 4 files changed, 76 insertions(+), 72 deletions(-)

-- 
2.7.0

* RE: [PATCH v4 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
From: Steve Wise @ 2016-09-02 15:08 UTC
  To: sagi@grimberg.me, hch@lst.de, linux-nvme@lists.infradead.org
  Cc: linux-rdma@vger.kernel.org, dledford@redhat.com

> 
> After address resolution, the nvme_rdma_queue rdma resources are
> allocated.  If rdma route resolution or the connect fails, or the
> controller reconnect times out and gives up, then the rdma resources
> need to be freed.  Otherwise, rdma resources are leaked.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>  drivers/nvme/host/rdma.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 15b0c1d..d97a16f 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -82,6 +82,7 @@ struct nvme_rdma_request {
> 
>  enum nvme_rdma_queue_flags {
>  	NVME_RDMA_Q_CONNECTED = (1 << 0),
> +	NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
>  };
> 
>  struct nvme_rdma_queue {
> @@ -483,6 +484,8 @@ static void nvme_rdma_destroy_queue_ib(struct
> nvme_rdma_queue *queue)
>  	struct nvme_rdma_device *dev = queue->device;
>  	struct ib_device *ibdev = dev->dev;
> 
> +	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED,
> &queue->flags))
> +		return;
>  	rdma_destroy_qp(queue->cm_id);
>  	ib_free_cq(queue->ib_cq);
> 
> @@ -533,6 +536,7 @@ static int nvme_rdma_create_queue_ib(struct
> nvme_rdma_queue *queue,
>  		ret = -ENOMEM;
>  		goto out_destroy_qp;
>  	}
> +	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
> 
>  	return 0;
> 
> @@ -590,6 +594,8 @@ static int nvme_rdma_init_queue(struct
> nvme_rdma_ctrl *ctrl,
>  	return 0;
> 
>  out_destroy_cm_id:
> +	if (test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &ctrl-
> >queues[0].flags))
> +		nvme_rdma_destroy_queue_ib(queue);
>  	rdma_destroy_id(queue->cm_id);
>  	return ret;
>  }

The above chunk is incorrect.  It is testing the ALLOCATED bit of the admin
queue (ctrl->queues[0]), which is just wrong.  This chunk should just call
nvme_rdma_destroy_queue_ib() and let that function test/clear the ALLOCATED
bit of the queue being cleaned up.  This bug was apparently benign and it
slipped through my testing.
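
In other words, the error path can rely on nvme_rdma_destroy_queue_ib()
doing its own flag check.  A sketch of what the corrected tail of
nvme_rdma_init_queue() would look like, assuming the fix described
above (the actual v5 hunk may differ):

out_destroy_cm_id:
	/*
	 * destroy_queue_ib() tests and clears the ALLOCATED bit itself,
	 * so it is safe to call unconditionally and no peeking at
	 * ctrl->queues[0].flags is needed.
	 */
	nvme_rdma_destroy_queue_ib(queue);
	rdma_destroy_id(queue->cm_id);
	return ret;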

I'll be posting v5 after I fix this and retest...

