* [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset
@ 2019-05-06 10:47 Max Gurtovoy
  2019-05-08  7:17 ` Christoph Hellwig
  2019-05-12 15:26 ` Sagi Grimberg
  0 siblings, 2 replies; 5+ messages in thread
From: Max Gurtovoy @ 2019-05-06 10:47 UTC (permalink / raw)


In the past, before commit f41725bb ("nvme-rdma: Use mr pool"), we
needed a reference on the ib_device for as long as the tagset was
alive, because the MRs in the request structures required a valid
ib_device. Now we allocate and free the MR pool per QP and consume
MRs on demand.
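
For context, a rough sketch (not part of this patch) of how the
per-QP MR pool is consumed; the example_* wrappers and the max_num_sg
parameter are illustrative, while the ib_mr_pool_* calls are the
helpers added by f41725bb. The MRs live and die with the QP, so the
tagset no longer has to pin the ib_device:

static int example_setup_mr_pool(struct nvme_rdma_queue *queue,
		u32 max_num_sg)
{
	/* At queue creation: pre-allocate one MR per outstanding request. */
	return ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
			       queue->queue_size, IB_MR_TYPE_MEM_REG,
			       max_num_sg);
}

static int example_map_request(struct nvme_rdma_queue *queue,
		struct nvme_rdma_request *req)
{
	/* Per I/O: take an MR from the QP's pool on demand... */
	req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
	return req->mr ? 0 : -EAGAIN;
}

static void example_unmap_request(struct nvme_rdma_queue *queue,
		struct nvme_rdma_request *req)
{
	/* ...and return it on completion or error. */
	ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
	req->mr = NULL;
}

static void example_teardown_mr_pool(struct nvme_rdma_queue *queue)
{
	/* At queue teardown: the pool is destroyed together with the QP. */
	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
}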

Also remove the nvme_rdma_free_tagset function, which is no longer
needed, and call blk_mq_free_tag_set directly instead.

This commit also fixes a memory leak and a possible segmentation
fault. When configuring the system with NIC teaming (aka bonding), we
use one network interface to create an HA connection to the target
side. If that connection breaks down, the nvme-rdma driver gets a
notification from the rdma-cm layer that the underlying address has
changed and starts the error recovery process. During this process we
reconnect to the target via the second interface in the bond without
destroying the tagset. This leaks the initial rdma device (ndev) and
miscounts the reference count of the newly created rdma device (new
ndev). During the final destruction (or in another error flow),
ib_dealloc_pd emits a warning dump because there are still in-flight
MRs related to that pd. This happens because the reference count of
the rdma device is wrong, which leads to access violations on its
elements (some queues are not destroyed yet).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
---
 drivers/nvme/host/rdma.c | 34 +++++-----------------------------
 1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 794b08e60acc..40a5a98e4aa1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -697,15 +697,6 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 	return ret;
 }
 
-static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl,
-		struct blk_mq_tag_set *set)
-{
-	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
-
-	blk_mq_free_tag_set(set);
-	nvme_rdma_dev_put(ctrl->device);
-}
-
 static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		bool admin)
 {
@@ -744,24 +735,9 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 
 	ret = blk_mq_alloc_tag_set(set);
 	if (ret)
-		goto out;
-
-	/*
-	 * We need a reference on the device as long as the tag_set is alive,
-	 * as the MRs in the request structures need a valid ib_device.
-	 */
-	ret = nvme_rdma_dev_get(ctrl->device);
-	if (!ret) {
-		ret = -EINVAL;
-		goto out_free_tagset;
-	}
+		return ERR_PTR(ret);
 
 	return set;
-
-out_free_tagset:
-	blk_mq_free_tag_set(set);
-out:
-	return ERR_PTR(ret);
 }
 
 static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
@@ -769,7 +745,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
 {
 	if (remove) {
 		blk_cleanup_queue(ctrl->ctrl.admin_q);
-		nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
+		blk_mq_free_tag_set(ctrl->ctrl.admin_tagset);
 	}
 	if (ctrl->async_event_sqe.data) {
 		nvme_rdma_free_qe(ctrl->device->dev, &ctrl->async_event_sqe,
@@ -847,7 +823,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		blk_cleanup_queue(ctrl->ctrl.admin_q);
 out_free_tagset:
 	if (new)
-		nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
+		blk_mq_free_tag_set(ctrl->ctrl.admin_tagset);
 out_free_async_qe:
 	nvme_rdma_free_qe(ctrl->device->dev, &ctrl->async_event_sqe,
 		sizeof(struct nvme_command), DMA_TO_DEVICE);
@@ -862,7 +838,7 @@ static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl,
 {
 	if (remove) {
 		blk_cleanup_queue(ctrl->ctrl.connect_q);
-		nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
+		blk_mq_free_tag_set(ctrl->ctrl.tagset);
 	}
 	nvme_rdma_free_io_queues(ctrl);
 }
@@ -903,7 +879,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 		blk_cleanup_queue(ctrl->ctrl.connect_q);
 out_free_tag_set:
 	if (new)
-		nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
+		blk_mq_free_tag_set(ctrl->ctrl.tagset);
 out_free_io_queues:
 	nvme_rdma_free_io_queues(ctrl);
 	return ret;
-- 
2.16.3

* [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset
  2019-05-06 10:47 [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset Max Gurtovoy
@ 2019-05-08  7:17 ` Christoph Hellwig
  2019-05-12 15:26 ` Sagi Grimberg
  1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2019-05-08  7:17 UTC (permalink / raw)


Thanks,

applied to nvme-5.2.

* [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset
  2019-05-06 10:47 [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset Max Gurtovoy
  2019-05-08  7:17 ` Christoph Hellwig
@ 2019-05-12 15:26 ` Sagi Grimberg
  2019-05-14 10:30   ` Max Gurtovoy
  1 sibling, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2019-05-12 15:26 UTC (permalink / raw)


This looks good, Max.

On 5/6/19 3:47 AM, Max Gurtovoy wrote:
> In the past, before commit f41725bb ("nvme-rdma: Use mr pool"), we
> needed a reference on the ib_device for as long as the tagset was
> alive, because the MRs in the request structures required a valid
> ib_device. Now we allocate and free the MR pool per QP and consume
> MRs on demand.
> 
> Also remove the nvme_rdma_free_tagset function, which is no longer
> needed, and call blk_mq_free_tag_set directly instead.
> 
> This commit also fixes a memory leak and a possible segmentation
> fault. When configuring the system with NIC teaming (aka bonding)

Question: what about the sqe dma mappings? Don't they need to be
remapped as well?

* [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset
  2019-05-12 15:26 ` Sagi Grimberg
@ 2019-05-14 10:30   ` Max Gurtovoy
  2019-05-14 11:18     ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2019-05-14 10:30 UTC (permalink / raw)



On 5/12/2019 6:26 PM, Sagi Grimberg wrote:
> This looks good, Max.
>
> On 5/6/19 3:47 AM, Max Gurtovoy wrote:
>> In the past, before commit f41725bb ("nvme-rdma: Use mr pool"), we
>> needed a reference on the ib_device for as long as the tagset was
>> alive, because the MRs in the request structures required a valid
>> ib_device. Now we allocate and free the MR pool per QP and consume
>> MRs on demand.
>>
>> Also remove the nvme_rdma_free_tagset function, which is no longer
>> needed, and call blk_mq_free_tag_set directly instead.
>>
>> This commit also fixes a memory leak and a possible segmentation
>> fault. When configuring the system with NIC teaming (aka bonding)
>
> Question: what about the sqe dma mappings? Don't they need to be
> remapped as well?


Yes, I'm looking into the block layer for a function that runs
exit_request and then init_request for every request in each hctx.
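
For illustration, a sketch of what such a pass could look like. It
assumes a tagset-wide iterator in the spirit of the old
blk_mq_tagset_iter() (which may no longer be available); the
nvme_rdma_remap_sqe callback and the ctrl->old_dev handle to the
previous device are hypothetical, not in the driver today:

/* Hypothetical callback: re-map one request's sqe to the new device. */
static int nvme_rdma_remap_sqe(void *data, struct request *rq)
{
	struct nvme_rdma_ctrl *ctrl = data;
	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);

	/* exit_request-like step against the old device... */
	nvme_rdma_free_qe(ctrl->old_dev->dev, &req->sqe,
			sizeof(struct nvme_command), DMA_TO_DEVICE);
	/* ...then an init_request-like step against the new one. */
	return nvme_rdma_alloc_qe(ctrl->device->dev, &req->sqe,
			sizeof(struct nvme_command), DMA_TO_DEVICE);
}

/* During error recovery, run it over every request in the tagset: */
ret = blk_mq_tagset_iter(ctrl->ctrl.tagset, ctrl, nvme_rdma_remap_sqe);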

* [PATCH] nvme-rdma: remove redundant reference between ib_device and tagset
  2019-05-14 10:30   ` Max Gurtovoy
@ 2019-05-14 11:18     ` Sagi Grimberg
  0 siblings, 0 replies; 5+ messages in thread
From: Sagi Grimberg @ 2019-05-14 11:18 UTC (permalink / raw)



>> This looks good, Max.
>>
>> On 5/6/19 3:47 AM, Max Gurtovoy wrote:
>>> In the past, before commit f41725bb ("nvme-rdma: Use mr pool"), we
>>> needed a reference on the ib_device for as long as the tagset was
>>> alive, because the MRs in the request structures required a valid
>>> ib_device. Now we allocate and free the MR pool per QP and consume
>>> MRs on demand.
>>>
>>> Also remove the nvme_rdma_free_tagset function, which is no longer
>>> needed, and call blk_mq_free_tag_set directly instead.
>>>
>>> This commit also fixes a memory leak and a possible segmentation
>>> fault. When configuring the system with NIC teaming (aka bonding)
>>
>> Question: what about the sqe dma mappings? Don't they need to be
>> remapped as well?
> 
> 
> Yes, I'm looking into the block layer for a function that runs
> exit_request and then init_request for every request in each hctx.

Once upon a time there was a .reinit_request handler, but it was
removed at some point...
