* [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure
  2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
@ 2016-08-25 20:49 ` Steve Wise
  2016-08-28 12:42   ` Sagi Grimberg
  2016-08-26 13:50 ` [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Steve Wise @ 2016-08-25 20:49 UTC (permalink / raw)


Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b6a953a..a4f3f1d 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -2117,8 +2117,11 @@ static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
 		}
 		ep->l2t = cxgb4_l2t_get(cdev->rdev.lldi.l2t,
 					n, pdev, rt_tos2priority(tos));
-		if (!ep->l2t)
+		if (!ep->l2t) {
+			err = -ENOMEM;
+			dev_put(pdev);
 			goto out;
+		}
 		ep->mtu = pdev->mtu;
 		ep->tx_chan = cxgb4_port_chan(pdev);
 		ep->smac_idx = cxgb4_tp_smt_idx(adapter_type,
-- 
2.7.0


* [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released
  2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
  2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
@ 2016-08-26 13:50 ` Steve Wise
  2016-08-28 12:43   ` Sagi Grimberg
  2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Steve Wise @ 2016-08-26 13:50 UTC (permalink / raw)


Otherwise an endpoint can still be closing down, causing a
use-after-free crash.

Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c       | 2 ++
 drivers/infiniband/hw/cxgb4/device.c   | 5 +++++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index a4f3f1d..8c582bf 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -333,6 +333,8 @@ static void remove_ep_tid(struct c4iw_ep *ep)
 
 	spin_lock_irqsave(&ep->com.dev->lock, flags);
 	_remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid, 0);
+	if (idr_is_empty(&ep->com.dev->hwtid_idr))
+		wake_up(&ep->com.dev->wait);
 	spin_unlock_irqrestore(&ep->com.dev->lock, flags);
 }
 
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 071d733..3c4b212 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -872,9 +872,13 @@ static void c4iw_rdev_close(struct c4iw_rdev *rdev)
 static void c4iw_dealloc(struct uld_ctx *ctx)
 {
 	c4iw_rdev_close(&ctx->dev->rdev);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->cqidr));
 	idr_destroy(&ctx->dev->cqidr);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->qpidr));
 	idr_destroy(&ctx->dev->qpidr);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->mmidr));
 	idr_destroy(&ctx->dev->mmidr);
+	wait_event(ctx->dev->wait, idr_is_empty(&ctx->dev->hwtid_idr));
 	idr_destroy(&ctx->dev->hwtid_idr);
 	idr_destroy(&ctx->dev->stid_idr);
 	idr_destroy(&ctx->dev->atid_idr);
@@ -992,6 +996,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
 	mutex_init(&devp->rdev.stats.lock);
 	mutex_init(&devp->db_mutex);
 	INIT_LIST_HEAD(&devp->db_fc_list);
+	init_waitqueue_head(&devp->wait);
 	devp->avail_ird = devp->rdev.lldi.max_ird_adapter;
 
 	if (c4iw_debugfs_root) {
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index aa47e0a..4b83b84 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -263,6 +263,7 @@ struct c4iw_dev {
 	struct idr stid_idr;
 	struct list_head db_fc_list;
 	u32 avail_ird;
+	wait_queue_head_t wait;
 };
 
 static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
-- 
2.7.0


* [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
  2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
  2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
  2016-08-26 13:50 ` [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
@ 2016-08-26 13:50 ` Steve Wise
  2016-08-26 14:38   ` Christoph Hellwig
  2016-08-28 12:45   ` Sagi Grimberg
  2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 13:50 UTC (permalink / raw)


Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
 drivers/nvme/host/rdma.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c133256..00943cd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1356,9 +1356,15 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 		ret = 1;
 	}
 
-	/* Queue controller deletion */
+	/*
+	 * Queue controller deletion. Keep a reference until all
+	 * work is flushed since delete_work will free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);
 	queue_work(nvme_rdma_wq, &ctrl->delete_work);
 	flush_work(&ctrl->delete_work);
+	nvme_put_ctrl(&ctrl->ctrl);
+
 	return ret;
 }
 
@@ -1705,15 +1711,22 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
 static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
 {
 	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
-	int ret;
+	int ret = 0;
+
+	/*
+	 * Keep a reference until all work is flushed since
+	 * __nvme_rdma_del_ctrl can free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);
 
 	ret = __nvme_rdma_del_ctrl(ctrl);
 	if (ret)
-		return ret;
+		goto out;
 
 	flush_work(&ctrl->delete_work);
-
-	return 0;
+out:
+	nvme_put_ctrl(&ctrl->ctrl);
+	return ret;
 }
 
 static void nvme_rdma_remove_ctrl_work(struct work_struct *work)
-- 
2.7.0


* [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
  2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
                   ` (2 preceding siblings ...)
  2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
@ 2016-08-26 13:50 ` Steve Wise
  2016-08-26 14:39   ` Christoph Hellwig
  2016-08-28 12:44   ` Sagi Grimberg
  2016-08-26 13:50 ` [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag Steve Wise
  2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
  5 siblings, 2 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 13:50 UTC (permalink / raw)


After address resolution, the nvme_rdma_queue rdma resources are
allocated.  If rdma route resolution or the connect fails, or the
controller reconnect times out and gives up, then the rdma resources
need to be freed.  Otherwise, rdma resources are leaked.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
 drivers/nvme/host/rdma.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 00943cd..8be7b6c 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -87,6 +87,7 @@ struct nvme_rdma_request {
 
 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_CONNECTED = (1 << 0),
+	NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
 };
 
 struct nvme_rdma_queue {
@@ -488,6 +489,7 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
 
+	clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
 	rdma_destroy_qp(queue->cm_id);
 	ib_free_cq(queue->ib_cq);
 
@@ -538,6 +540,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue,
 		ret = -ENOMEM;
 		goto out_destroy_qp;
 	}
+	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
 
 	return 0;
 
@@ -595,6 +598,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 	return 0;
 
 out_destroy_cm_id:
+	if (test_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &ctrl->queues[0].flags))
+		nvme_rdma_destroy_queue_ib(queue);
 	rdma_destroy_id(queue->cm_id);
 	return ret;
 }
-- 
2.7.0


* [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag
  2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
                   ` (3 preceding siblings ...)
  2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
@ 2016-08-26 13:50 ` Steve Wise
  2016-08-26 14:14   ` Steve Wise
  2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
  5 siblings, 1 reply; 23+ messages in thread
From: Steve Wise @ 2016-08-26 13:50 UTC (permalink / raw)


From: Sagi Grimberg <sagi@grimberg.me>

When we get a surprise disconnect from the target we queue a periodic
reconnect (which is the sane thing to do...).

We only move the queues out of CONNECTED when we retry to reconnect (after
10 seconds in the default case) but we stop the blk queues immediately
so we are not bothered with traffic from now on. If delete() is kicking
off in this period the queues are still in CONNECTED state.

Part of the delete sequence is trying to issue ctrl shutdown if the
admin queue is CONNECTED (which it is!). This request is issued but
stuck in blk-mq waiting for the queues to start again. This might be
the one preventing us from forward progress...

The patch separates the queue flags to CONNECTED and DELETING. Now we
will move out of CONNECTED as soon as error recovery kicks in (before
stopping the queues) and DELETING is on when we start the queue deletion.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/rdma.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8be7b6c..b99d7fd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,6 +88,7 @@ struct nvme_rdma_request {
 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_CONNECTED = (1 << 0),
 	NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
+	NVME_RDMA_Q_DELETING = (1 << 2),
 };
 
 struct nvme_rdma_queue {
@@ -618,7 +619,7 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
 
 static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
 {
-	if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
+	if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
 		return;
 	nvme_rdma_stop_queue(queue);
 	nvme_rdma_free_queue(queue);
@@ -771,8 +772,13 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 {
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
+	int i;
 
 	nvme_stop_keep_alive(&ctrl->ctrl);
+
+	for (i = 0; i < ctrl->queue_count; i++)
+		clear_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[i].flags);
+
 	if (ctrl->queue_count > 1)
 		nvme_stop_queues(&ctrl->ctrl);
 	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
-- 
2.7.0


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
                   ` (4 preceding siblings ...)
  2016-08-26 13:50 ` [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag Steve Wise
@ 2016-08-26 13:52 ` Steve Wise
  2016-08-26 14:41   ` Christoph Hellwig
  2016-08-28 12:56   ` Sagi Grimberg
  5 siblings, 2 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 13:52 UTC (permalink / raw)


This patch adds the concept of an "unplug" cm_id for each nvme_rdma_ctrl
controller.  When the controller is first created and the admin qp
is connected to the target, the unplug_cm_id is created and address
resolution is done on it to bind it to the same device that the admin QP
is bound to.   This unplug_cm_id remains across any/all kato recovery and
thus will always be available for DEVICE_REMOVAL events.  This simplifies
the unplug handler because the cm_id isn't associated with any of the IO
queues nor the admin queue.  Plus it ensures a cm_id is always available
per controller to get the DEVICE_REMOVAL event.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
 drivers/nvme/host/rdma.c | 134 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 107 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index b99d7fd..f05fa0c 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -140,6 +140,11 @@ struct nvme_rdma_ctrl {
 	};
 
 	struct nvme_ctrl	ctrl;
+
+	/* the cm_id and vars for device unplug events */
+	struct rdma_cm_id	*unplug_cm_id;
+	int			unplug_cm_error;
+	struct completion	unplug_cm_done;
 };
 
 static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
@@ -168,6 +173,7 @@ MODULE_PARM_DESC(register_always,
 static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		struct rdma_cm_event *event);
 static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
+static int nvme_rdma_device_unplug(struct nvme_rdma_ctrl *ctrl);
 
 /* XXX: really should move to a generic header sooner or later.. */
 static inline void put_unaligned_le24(u32 val, u8 *p)
@@ -553,6 +559,79 @@ out:
 	return ret;
 }
 
+static int nvme_rdma_unplug_cm_handler(struct rdma_cm_id *cm_id,
+		struct rdma_cm_event *ev)
+{
+	struct nvme_rdma_ctrl *ctrl = cm_id->context;
+	int ret = 0;
+
+	dev_warn(ctrl->ctrl.device, "%s (%d): status %d id %p\n",
+		rdma_event_msg(ev->event), ev->event,
+		ev->status, cm_id);
+
+	switch (ev->event) {
+	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		ctrl->unplug_cm_error = 0;
+		complete(&ctrl->unplug_cm_done);
+		break;
+	case RDMA_CM_EVENT_DEVICE_REMOVAL:
+		/* return 1 means implicit CM ID destroy */
+		ret = nvme_rdma_device_unplug(ctrl);
+		break;
+	default:
+		dev_err(ctrl->ctrl.device,
+			"Unexpected RDMA CM event (%d) status %d\n",
+			ev->event, ev->status);
+		ctrl->unplug_cm_error = ev->status;
+		complete(&ctrl->unplug_cm_done);
+		break;
+	}
+	return ret;
+}
+
+static int nvme_rdma_init_unplug_cm_id(struct nvme_rdma_ctrl *ctrl)
+{
+	int ret;
+
+	dev_info(ctrl->ctrl.device, "%s enter\n", __func__);
+	init_completion(&ctrl->unplug_cm_done);
+
+	ctrl->unplug_cm_id = rdma_create_id(&init_net,
+			nvme_rdma_unplug_cm_handler, ctrl,
+			RDMA_PS_TCP, IB_QPT_RC);
+	if (IS_ERR(ctrl->unplug_cm_id)) {
+		dev_info(ctrl->ctrl.device, "failed to create CM ID: %ld\n",
+			PTR_ERR(ctrl->unplug_cm_id));
+		return PTR_ERR(ctrl->unplug_cm_id);
+	}
+
+	ctrl->unplug_cm_error = -ETIMEDOUT;
+	ret = rdma_resolve_addr(ctrl->unplug_cm_id, NULL, &ctrl->addr,
+			NVME_RDMA_CONNECT_TIMEOUT_MS);
+	if (ret) {
+		dev_info(ctrl->ctrl.device,
+			"rdma_resolve_addr failed (%d).\n", ret);
+		goto out_destroy_unplug_cm_id;
+	}
+	wait_for_completion_interruptible_timeout(&ctrl->unplug_cm_done,
+			msecs_to_jiffies(NVME_RDMA_CONNECT_TIMEOUT_MS) + 1);
+	ret = ctrl->unplug_cm_error;
+	if (ret) {
+		dev_info(ctrl->ctrl.device,
+			"nvme_rdma_init_unplug_unplug_cm_id failed (%d).\n",
+			ret);
+		goto out_destroy_unplug_cm_id;
+	}
+
+	dev_info(ctrl->ctrl.device, "%s exit\n", __func__);
+	return 0;
+
+out_destroy_unplug_cm_id:
+	rdma_destroy_id(ctrl->unplug_cm_id);
+	dev_info(ctrl->ctrl.device, "%s exit err %d\n", __func__, ret);
+	return ret;
+}
+
 static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 		int idx, size_t queue_size)
 {
@@ -594,6 +673,15 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 		goto out_destroy_cm_id;
 	}
 
+	if (idx == 0 && !ctrl->unplug_cm_id) {
+		ret = nvme_rdma_init_unplug_cm_id(ctrl);
+		if (ret) {
+			dev_info(ctrl->ctrl.device,
+				"init_unplug_cm_id failed (%d).\n", ret);
+			goto out_destroy_cm_id;
+		}
+	}
+
 	set_bit(NVME_RDMA_Q_CONNECTED, &queue->flags);
 
 	return 0;
@@ -1323,30 +1411,24 @@ out_destroy_queue_ib:
 
 /**
  * nvme_rdma_device_unplug() - Handle RDMA device unplug
- * @queue:      Queue that owns the cm_id that caught the event
+ * @ctrl:      Controller that owns the unplug_cm_id that caught the event
  *
  * DEVICE_REMOVAL event notifies us that the RDMA device is about
  * to unplug so we should take care of destroying our RDMA resources.
- * This event will be generated for each allocated cm_id.
+ * This event will be generated for each allocated cm_id, but only handled
+ * by each controller's unplug_cm_id.
  *
- * In our case, the RDMA resources are managed per controller and not
- * only per queue. So the way we handle this is we trigger an implicit
- * controller deletion upon the first DEVICE_REMOVAL event we see, and
- * hold the event inflight until the controller deletion is completed.
+ * Trigger an implicit controller deletion and hold the event inflight
+ * until the controller deletion is completed.
  *
- * One exception that we need to handle is the destruction of the cm_id
+ * One exception that we need to handle is the destruction of the unplug_cm_id
  * that caught the event. Since we hold the callout until the controller
  * deletion is completed, we'll deadlock if the controller deletion will
- * call rdma_destroy_id on this queue's cm_id. Thus, we claim ownership
- * of destroying this queue before-hand, destroy the queue resources,
- * then queue the controller deletion which won't destroy this queue and
- * we destroy the cm_id implicitely by returning a non-zero rc to the callout.
+ * call rdma_destroy_id on this queue's cm_id.  We destroy the unplug_cm_id
+ * implicitly by returning a non-zero rc to the callout.
  */
-static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
+static int nvme_rdma_device_unplug(struct nvme_rdma_ctrl *ctrl)
 {
-	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
-	int ret = 0;
-
 	/* Own the controller deletion */
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
 		return 0;
@@ -1357,15 +1439,11 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 	/* Get rid of reconnect work if its running */
 	cancel_delayed_work_sync(&ctrl->reconnect_work);
 
-	/* Disable the queue so ctrl delete won't free it */
-	if (test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags)) {
-		/* Free this queue ourselves */
-		nvme_rdma_stop_queue(queue);
-		nvme_rdma_destroy_queue_ib(queue);
-
-		/* Return non-zero so the cm_id will destroy implicitly */
-		ret = 1;
-	}
+	/*
+	 * NULL out the unplug_cm_id pointer so the controller deletion
+	 * does not free it.
+	 */
+	ctrl->unplug_cm_id = NULL;
 
 	/*
 	 * Queue controller deletion. Keep a reference until all
@@ -1376,7 +1454,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 	flush_work(&ctrl->delete_work);
 	nvme_put_ctrl(&ctrl->ctrl);
 
-	return ret;
+	return 1;
 }
 
 static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
@@ -1420,8 +1498,8 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		nvme_rdma_error_recovery(queue->ctrl);
 		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
-		/* return 1 means impliciy CM ID destroy */
-		return nvme_rdma_device_unplug(queue);
+		/* handled on ctrl->unplug_cm_id */
+		break;
 	default:
 		dev_err(queue->ctrl->ctrl.device,
 			"Unexpected RDMA CM event (%d)\n", ev->event);
@@ -1697,6 +1775,8 @@ static void __nvme_rdma_remove_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
 		nvme_rdma_dev_put(ctrl->device);
 	}
 
+	if (ctrl->unplug_cm_id)
+		rdma_destroy_id(ctrl->unplug_cm_id);
 	nvme_put_ctrl(&ctrl->ctrl);
 }
 
-- 
2.7.0


* [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes
@ 2016-08-26 13:53 Steve Wise
  2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 13:53 UTC (permalink / raw)


This series is a Work In Progress (WIP) attempting to address several
problems seen when shutting down an nvme-rdma host while its controllers
are attempting to reconnect to a target that is no longer reachable.

I'm still testing, and there is at least one outstanding bug I'm still
chasing, but I welcome review, specifically of the last patch, which
solves the problem of reliably detecting a device removal.

To tickle these bugs:

1) attach over iw_cxgb4 to 10 devices on a target.
2) 'ifconfig down' the target's interface
3) wait for keep-alive to fire and begin reconnecting (~15-20 seconds)
4) do one of these on the host:
   - rmmod iw_cxgb4
   - reboot
   - reboot -f

Note: the default configuration of the Chelsio RNIC is for a LAN/WAN
environment.  This causes very long delays due to TCP retransmit
backoff algorithms and basically hangs the host during a shutdown for an
unreasonable amount of time.  This is further complicated by the fact
that the RDMA_CM blocks cm_id destruction while the cm_id is attempting
connection setup.  I will address this issue in another series.
To work around the problem, the Chelsio RNIC can be configured for a
storage-cluster environment, where the retransmit timeouts are
much shorter.  If anyone is doing this sort of testing, I can provide
you with a config file for the storage/cluster configuration.

Sagi, I included your DELETING patch since it is needed to make forward
progress on my device removal testing.

Thanks,

Steve.

----

Sagi Grimberg (1):
  nvme-rdma: add DELETING queue flag

Steve Wise (5):
  iw_cxgb4: call dev_put() on l2t allocation failure
  iw_cxgb4: block module unload until all ep resources are released
  nvme_rdma: keep a ref on the ctrl during delete/flush
  nvme-rdma: destroy nvme queue rdma resources on connect failure
  nvme-rdma: keep a cm_id around during reconnect to get events

 drivers/infiniband/hw/cxgb4/cm.c       |   7 +-
 drivers/infiniband/hw/cxgb4/device.c   |   5 +
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |   1 +
 drivers/nvme/host/rdma.c               | 170 ++++++++++++++++++++++++++-------
 4 files changed, 149 insertions(+), 34 deletions(-)

-- 
2.7.0


* [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag
  2016-08-26 13:50 ` [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag Steve Wise
@ 2016-08-26 14:14   ` Steve Wise
  2016-08-28 12:48     ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Wise @ 2016-08-26 14:14 UTC (permalink / raw)



> 
> From: Sagi Grimberg <sagi at grimberg.me>
> 
> When we get a surprise disconnect from the target we queue a periodic
> reconnect (which is the sane thing to do...).
> 
> We only move the queues out of CONNECTED when we retry to reconnect (after
> 10 seconds in the default case) but we stop the blk queues immediately
> so we are not bothered with traffic from now on. If delete() is kicking
> off in this period the queues are still in CONNECTED state.
> 
> Part of the delete sequence is trying to issue ctrl shutdown if the
> admin queue is CONNECTED (which it is!). This request is issued but
> stuck in blk-mq waiting for the queues to start again. This might be
> the one preventing us from forward progress...
> 
> The patch separates the queue flags to CONNECTED and DELETING. Now we
> will move out of CONNECTED as soon as error recovery kicks in (before
> stopping the queues) and DELETING is on when we start the queue deletion.
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

Sagi,

This patch is missing the change to nvme_rdma_device_unplug().  That is my
mistake.  Since patch 6 removes that part of the unplug logic, the omission is
benign for the series, but it should be fixed so that this patch in and of
itself fixes the problem it is addressing regardless of whether patch 6 is
applied.  I can fix this up if we decide patch 6 is the correct approach...


Steve.


* [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
  2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
@ 2016-08-26 14:38   ` Christoph Hellwig
  2016-08-26 14:41     ` Steve Wise
  2016-08-28 12:45   ` Sagi Grimberg
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-08-26 14:38 UTC (permalink / raw)


This looks good to me:

Reviewed-by: Christoph Hellwig <hch at lst.de>

Minor nitpick in case you're going to respin it anyway:

> +	/*
> +	 * Keep a reference until all work is flushed since
> +	 * __nvme_rdma_del_ctrl can free the ctrl mem
> +	 */
> +	kref_get(&ctrl->ctrl.kref);
>  
>  	ret = __nvme_rdma_del_ctrl(ctrl);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	flush_work(&ctrl->delete_work);
> -
> -	return 0;
> +out:
> +	nvme_put_ctrl(&ctrl->ctrl);
> +	return ret;

How about:

	kref_get(&ctrl->ctrl.kref);
	ret = __nvme_rdma_del_ctrl(ctrl);
	if (!ret)
		flush_work(&ctrl->delete_work);
	nvme_put_ctrl(&ctrl->ctrl);
	return ret;

here?


* [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
  2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
@ 2016-08-26 14:39   ` Christoph Hellwig
  2016-08-26 14:42     ` Steve Wise
  2016-08-28 12:44   ` Sagi Grimberg
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-08-26 14:39 UTC (permalink / raw)


On Fri, Aug 26, 2016 at 06:50:58AM -0700, Steve Wise wrote:
> After address resolution, the nvme_rdma_queue rdma resources are
> allocated.  If rdma route resolution or the connect fails, or the
> controller reconnect times out and gives up, then the rdma resources
> need to be freed.  Otherwise, rdma resources are leaked.

I'm pretty sure Sagi tried to convince me we'd need something like this
before and I tried to resist.  Sorry guys..

>  struct nvme_rdma_queue {
> @@ -488,6 +489,7 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
>  	struct nvme_rdma_device *dev = queue->device;
>  	struct ib_device *ibdev = dev->dev;
>  
> +	clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);

How about doing a

	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
		return;

here so the callers don't have to worry?

Otherwise this looks fine to me.
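
In full, folding that check into the front of the function would make it
look roughly like this (just a sketch with the tail of the teardown
elided, not code that was actually posted):

	static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
	{
		/* Bail out if the IB resources were never allocated (or
		 * were already torn down), so callers don't have to test
		 * the flag themselves.
		 */
		if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED,
					&queue->flags))
			return;

		rdma_destroy_qp(queue->cm_id);
		ib_free_cq(queue->ib_cq);

		/* ... MR pool teardown and device reference drop as before ... */
	}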


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
@ 2016-08-26 14:41   ` Christoph Hellwig
  2016-08-26 14:48     ` Steve Wise
  2016-08-28 12:56   ` Sagi Grimberg
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-08-26 14:41 UTC (permalink / raw)


On Fri, Aug 26, 2016 at 06:52:59AM -0700, Steve Wise wrote:
> This patch adds the concept of an "unplug" cm_id for each nvme_rdma_ctrl
> controller.  When the controller is first created and the admin qp
> is connected to the target, the unplug_cm_id is created and address
> resolution is done on it to bind it to the same device that the admin QP
> is bound to.   This unplug_cm_id remains across any/all kato recovery and
> thus will always be available for DEVICE_REMOVAL events.  This simplifies
> the unplug handler because the cm_id isn't associated with any of the IO
> queues nor the admin queue.  Plus it ensures a cm_id is always available
> per controller to get the DEVICE_REMOVAL event.

I'll need some time to digest all the crazy RDMA/CM interactions here.
Do you have any clue how other drivers handle this situation?


* [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
  2016-08-26 14:38   ` Christoph Hellwig
@ 2016-08-26 14:41     ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 14:41 UTC (permalink / raw)


> 
> This looks good to me:
> 
> Reviewed-by: Christoph Hellwig <hch at lst.de>
> 
> Minor nitpick in case you're going to respin it anyway:
> 
> > +	/*
> > +	 * Keep a reference until all work is flushed since
> > +	 * __nvme_rdma_del_ctrl can free the ctrl mem
> > +	 */
> > +	kref_get(&ctrl->ctrl.kref);
> >
> >  	ret = __nvme_rdma_del_ctrl(ctrl);
> >  	if (ret)
> > -		return ret;
> > +		goto out;
> >
> >  	flush_work(&ctrl->delete_work);
> > -
> > -	return 0;
> > +out:
> > +	nvme_put_ctrl(&ctrl->ctrl);
> > +	return ret;
> 
> How about:
> 
> 	kref_get(&ctrl->ctrl.kref);
> 	ret = __nvme_rdma_del_ctrl(ctrl);
> 	if (!ret)
> 		flush_work(&ctrl->delete_work);
> 	nvme_put_ctrl(&ctrl->ctrl);
> 	return ret;
> 
> here?

That looks cleaner.


* [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
  2016-08-26 14:39   ` Christoph Hellwig
@ 2016-08-26 14:42     ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 14:42 UTC (permalink / raw)


> 
> On Fri, Aug 26, 2016 at 06:50:58AM -0700, Steve Wise wrote:
> > After address resolution, the nvme_rdma_queue rdma resources are
> > allocated.  If rdma route resolution or the connect fails, or the
> > controller reconnect times out and gives up, then the rdma resources
> > need to be freed.  Otherwise, rdma resources are leaked.
> 
> I'm pretty sure Sagi tried to convince me we'd need something like this
> before and I tried to resist.  Sorry guys..
> 
> >  struct nvme_rdma_queue {
> > @@ -488,6 +489,7 @@ static void nvme_rdma_destroy_queue_ib(struct
> nvme_rdma_queue *queue)
> >  	struct nvme_rdma_device *dev = queue->device;
> >  	struct ib_device *ibdev = dev->dev;
> >
> > +	clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
> 
> How about doing a
> 
> 	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
> 		return;
> 
> here so the callers don't have to worry?
> 
> Otherwise this looks fine to me.

Sure. 


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-26 14:41   ` Christoph Hellwig
@ 2016-08-26 14:48     ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-26 14:48 UTC (permalink / raw)


> 
> On Fri, Aug 26, 2016 at 06:52:59AM -0700, Steve Wise wrote:
> > This patch adds the concept of an "unplug" cm_id for each nvme_rdma_ctrl
> > controller.  When the controller is first created and the admin qp
> > is connected to the target, the unplug_cm_id is created and address
> > resolution is done on it to bind it to the same device that the admin QP
> > is bound to.   This unplug_cm_id remains across any/all kato recovery and
> > thus will always be available for DEVICE_REMOVAL events.  This simplifies
> > the unplug handler because the cm_id isn't associated with any of the IO
> > queues nor the admin queue.  Plus it ensures a cm_id is always available
> > per controller to get the DEVICE_REMOVAL event.
> 
> I'll need some time to digest all the crazy RDMA/CM interactions here.
> Do you have any clue how other drivers handle this situation?

No.   I'm not sure the other users correctly handle device removal.   There is
another way though:  I sent out a patch earlier using the ib_client interface to
handle this.  ib_clients register with the ib core and provide add and remove
callback functions to handle device insertion/removal.  We could use that here
instead.   But the vision of the RDMA_CM was to provide a transport-independent
way to handle this I guess...
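
For reference, hooking into that interface is just a matter of registering
a struct ib_client from the module init/exit paths; a rough sketch (all of
the names below are placeholders, not lifted from that earlier patch):

	static struct ib_client nvme_rdma_ib_client = {
		.name	= "nvme_rdma",
		.add	= nvme_rdma_add_one,
		.remove	= nvme_rdma_remove_one,
	};

	static int __init nvme_rdma_init_module(void)
	{
		int ret;

		/* get an add/remove callout for every RDMA device */
		ret = ib_register_client(&nvme_rdma_ib_client);
		if (ret)
			return ret;

		/* ... the rest of module init as today ... */
		return 0;
	}

	static void __exit nvme_rdma_cleanup_module(void)
	{
		/* ... existing module cleanup ... */
		ib_unregister_client(&nvme_rdma_ib_client);
	}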


* [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure
  2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
@ 2016-08-28 12:42   ` Sagi Grimberg
  0 siblings, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-28 12:42 UTC (permalink / raw)


Looks fine,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released
  2016-08-26 13:50 ` [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
@ 2016-08-28 12:43   ` Sagi Grimberg
  0 siblings, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-28 12:43 UTC (permalink / raw)


Looks reasonable,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
  2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
  2016-08-26 14:39   ` Christoph Hellwig
@ 2016-08-28 12:44   ` Sagi Grimberg
  1 sibling, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-28 12:44 UTC (permalink / raw)


No 3/6?


* [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
  2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
  2016-08-26 14:38   ` Christoph Hellwig
@ 2016-08-28 12:45   ` Sagi Grimberg
  1 sibling, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-28 12:45 UTC (permalink / raw)


Here it is...

With Christoph's comment looks fine to me,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag
  2016-08-26 14:14   ` Steve Wise
@ 2016-08-28 12:48     ` Sagi Grimberg
  0 siblings, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-28 12:48 UTC (permalink / raw)



>> From: Sagi Grimberg <sagi at grimberg.me>
>>
>> When we get a surprise disconnect from the target we queue a periodic
>> reconnect (which is the sane thing to do...).
>>
>> We only move the queues out of CONNECTED when we retry to reconnect (after
>> 10 seconds in the default case) but we stop the blk queues immediately
>> so we are not bothered with traffic from now on. If delete() is kicking
>> off in this period the queues are still in CONNECTED state.
>>
>> Part of the delete sequence is trying to issue ctrl shutdown if the
>> admin queue is CONNECTED (which it is!). This request is issued but
>> stuck in blk-mq waiting for the queues to start again. This might be
>> the one preventing us from forward progress...
>>
>> The patch separates the queue flags to CONNECTED and DELETING. Now we
>> will move out of CONNECTED as soon as error recovery kicks in (before
>> stopping the queues) and DELETING is on when we start the queue deletion.
>>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>
> Sagi,
>
> This patch is missing the change to nvme_rdma_device_unplug().  That is my
> mistake.  Since patch 6 removes that part of the unplug logic, the omission is
> benign for the series, but it should be fixed so that this patch in and of
> itself fixes the problem it is addressing regardless of whether patch 6 is
> applied.  I can fix this up if we decide patch 6 is the correct approach...

Let's fix it in unplug regardless of moving the logic so we'll have a
better bisection experience...


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
  2016-08-26 14:41   ` Christoph Hellwig
@ 2016-08-28 12:56   ` Sagi Grimberg
  2016-08-29  7:30     ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-28 12:56 UTC (permalink / raw)


I think this patch will have a problem running over a bond device which
can trigger an ADDR_CHANGE event. The problem with this is that with
bonding you can easily get a different device as a result. Also, I have
a feeling that we are not prepared to handle this scenario anyway (but
this approach is not making it any better...)

Care to respin your client registration patch so we can judge which
is better?


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-28 12:56   ` Sagi Grimberg
@ 2016-08-29  7:30     ` Christoph Hellwig
  2016-08-29 14:32       ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-08-29  7:30 UTC (permalink / raw)


On Sun, Aug 28, 2016 at 03:56:24PM +0300, Sagi Grimberg wrote:
> Care to respin your client registration patch so we can judge which
> is better?

FYI, I also really hate the idea of having to potentially allocate
resources on each device at driver load time which the client registration
forces us into.

I really think we need to take a step back and offer interfaces that don't
suck in the core instead of trying to work around RDMA/CM in the core.
Unfortunately I don't really know what it takes for that yet.  I'm pretty
busy this work, but I'd be happy to reserve a lot of time next week to
dig into it unless someone beats me.

I suspect a big part of that is having a queue state machine in the core,
and getting rid of that horrible RDMA/CM event multiplexer.


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-29  7:30     ` Christoph Hellwig
@ 2016-08-29 14:32       ` Sagi Grimberg
  2016-08-29 19:42         ` Steve Wise
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2016-08-29 14:32 UTC (permalink / raw)



>> Care to respin your client registration patch so we can judge which
>> is better?
>
> FYI, I also really hate the idea of having to potentially allocate
> resources on each device at driver load time which the client registration
> forces us into.

The client registration doesn't force us to allocate anything.
It's simply for us to trigger cleanups when the device is unplugged...

static void nvme_rdma_add_one(struct ib_device *device)
{
	/* Do nothing */
}

static void nvme_rdma_remove_one(struct ib_device *device,
		void *cdata)
{
	/*
	 * for each ctrl where (ctrl->dev->device == device)
	 * 	queue delete controller
	 *
	 * flush the workqueue
	 */
}

static struct ib_client nvme_rdma_client = {
         .name   = "nvme_rdma",
         .add    = nvme_rdma_add_one,
         .remove = nvme_rdma_remove_one
};
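
Roughly, and assuming the driver's existing nvme_rdma_ctrl_list /
nvme_rdma_ctrl_mutex and the per-ctrl list linkage (none of which are
shown in this thread), the remove callback boils down to something like:

	static void nvme_rdma_remove_one(struct ib_device *ib_device,
			void *client_data)
	{
		struct nvme_rdma_ctrl *ctrl;

		/* queue deletion of every controller bound to this device */
		mutex_lock(&nvme_rdma_ctrl_mutex);
		list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
			if (ctrl->device->dev != ib_device)
				continue;
			__nvme_rdma_del_ctrl(ctrl);
		}
		mutex_unlock(&nvme_rdma_ctrl_mutex);

		/* wait for the queued delete_work items to complete */
		flush_workqueue(nvme_rdma_wq);
	}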


> I really think we need to take a step back and offer interfaces that don't
> suck in the core instead of trying to work around RDMA/CM in the core.
> Unfortunately I don't really know what it takes for that yet.  I'm pretty
> busy this week, but I'd be happy to reserve a lot of time next week to
> dig into it unless someone beats me.

I agree we have *plenty* of room to improve in the RDMA_CM interface.
But this particular problem is the fact that we might get a device
removal right in the moment where we have no cm_id's open because we
are in the middle of periodic reconnects. This is why we can't even see
the event.

What sort of interface did you have in mind that would help here?

> I suspect a big part of that is having a queue state machine in the core,

We have a queue-pair state machine in the core, but currently it's not
very useful for the consumers, and the silly thing is that it's not
represented in the ib_qp struct and needs an ib_query_qp to figure it
out (one of the reasons is that the QP states and their transitions
are detailed in the different specs and not all of them are
synchronous).
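
For example, just reading the current state today takes something like
this (sketch only, the helper name is made up):

	static enum ib_qp_state nvme_rdma_qp_state(struct ib_qp *qp)
	{
		struct ib_qp_attr attr;
		struct ib_qp_init_attr init_attr;

		/* a full query, possibly a mailbox command to the device,
		 * just to learn the state
		 */
		if (ib_query_qp(qp, &attr, IB_QP_STATE, &init_attr))
			return IB_QPS_ERR;

		return attr.qp_state;
	}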

> and getting rid of that horrible RDMA/CM event multiplexer.

That would be very nice improvement...


* [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
  2016-08-29 14:32       ` Sagi Grimberg
@ 2016-08-29 19:42         ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2016-08-29 19:42 UTC (permalink / raw)


> >> Care to respin your client registration patch so we can judge which
> >> is better?
> >
> > FYI, I also really hate the idea of having to potentially allocate
> > resources on each device at driver load time which the client registration
> > forces us into.
> 
> The client registration doesn't force us to allocate anything.
> It's simply for us to trigger cleanups when the device is unplugged...
> 
> static void nvme_rdma_add_one(struct ib_device *device)
> {
> 	/* Do nothing */
> }
> 
> static void nvme_rdma_remove_one(struct ib_device *device,
> 		void *cdata)
> {
> 	/*
> 	 * for each ctrl where (ctrl->dev->device == device)
> 	 * 	queue delete controller
> 	 *
> 	 * flush the workqueue
> 	 */
> }
> 
> static struct ib_client nvme_rdma_client = {
>          .name   = "nvme_rdma",
>          .add    = nvme_rdma_add_one,
>          .remove = nvme_rdma_remove_one
> };
> 
> 
> > I really think we need to take a step back and offer interfaces that don't
> > suck in the core instead of trying to work around RDMA/CM in the core.
> > Unfortunately I don't really know what it takes for that yet.  I'm pretty
> > busy this week, but I'd be happy to reserve a lot of time next week to
> > dig into it unless someone beats me.
> 
> I agree we have *plenty* of room to improve in the RDMA_CM interface.
> But this particular problem is the fact that we might get a device
> removal right in the moment where we have no cm_id's open because we
> are in the middle of periodic reconnects. This is why we can't even see
> the event.
> 
> What sort of interface did you have in mind that would help here?
> 
> > I suspect a big part of that is having a queue state machine in the core,
> 
> We have a queue-pair state machine in the core, but currently it's not
> very useful for the consumers, and the silly thing is that it's not
> represented in the ib_qp struct and needs an ib_query_qp to figure it
> out (one of the reasons is that the QP states and their transitions
> are detailed in the different specs and not all of them are
> synchronous).
> 
> > and getting rid of that horrible RDMA/CM event multiplexer.
> 
> That would be very nice improvement...
> 

So should I respin the ib_client patch to just do device removal, or am I
wasting my time?

