* [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation
@ 2017-11-12 10:12 Max Gurtovoy
  2017-11-13 19:31 ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2017-11-12 10:12 UTC (permalink / raw)


In case the nvme_rdma_wait_for_cm timeout expires before we get an
established or rejected event from rdma_cm (i.e. after rdma_connect
succeeded), we end up leaking the ib transport resources of the
dedicated queue. This scenario can easily be reproduced by running a
traffic test during port toggling.
Also, in order to protect against parallel ib queue destruction, which
may be invoked from different contexts, introduce a new flag that
indicates transport readiness. While we're here, also protect against
receiving rdma_cm events during ib queue destruction.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---

Changes from v1:
 - added new queue flag bit NVME_RDMA_Q_TR_READY to avoid parallel destruction
 - guarantee that cm_id destroyed before ib queue resources (from Sagi)
 - rebase over nvme-4.15

---
 drivers/nvme/host/rdma.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c8d8544..738d870 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -73,6 +73,7 @@ struct nvme_rdma_request {
 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_ALLOCATED		= 0,
 	NVME_RDMA_Q_LIVE		= 1,
+	NVME_RDMA_Q_TR_READY		= 2,
 };
 
 struct nvme_rdma_queue {
@@ -428,10 +429,16 @@ static int nvme_rdma_dev_get(struct nvme_rdma_device *dev)
 
 static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 {
-	struct nvme_rdma_device *dev = queue->device;
-	struct ib_device *ibdev = dev->dev;
+	struct nvme_rdma_device *dev;
+	struct ib_device *ibdev;
+
+	if (!test_and_clear_bit(NVME_RDMA_Q_TR_READY, &queue->flags))
+		return;
 
-	rdma_destroy_qp(queue->cm_id);
+	dev = queue->device;
+	ibdev = dev->dev;
+
+	ib_destroy_qp(queue->qp);
 	ib_free_cq(queue->ib_cq);
 
 	nvme_rdma_free_ring(ibdev, queue->rsp_ring, queue->queue_size,
@@ -482,10 +489,12 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 		goto out_destroy_qp;
 	}
 
+	set_bit(NVME_RDMA_Q_TR_READY, &queue->flags);
+
 	return 0;
 
 out_destroy_qp:
-	rdma_destroy_qp(queue->cm_id);
+	ib_destroy_qp(queue->qp);
 out_destroy_ib_cq:
 	ib_free_cq(queue->ib_cq);
 out_put_dev:
@@ -546,6 +555,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 
 out_destroy_cm_id:
 	rdma_destroy_id(queue->cm_id);
+	nvme_rdma_destroy_queue_ib(queue);
 	return ret;
 }
 
@@ -563,8 +573,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
 	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
 		return;
 
-	nvme_rdma_destroy_queue_ib(queue);
 	rdma_destroy_id(queue->cm_id);
+	nvme_rdma_destroy_queue_ib(queue);
 }
 
 static void nvme_rdma_free_io_queues(struct nvme_rdma_ctrl *ctrl)
-- 
1.8.3.1
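
For readers skimming the archive: the error path this patch extends in
nvme_rdma_alloc_queue() ends up looking roughly like the sketch below.
This is a condensed illustration based on the hunks above and the commit
message, not the literal drivers/nvme/host/rdma.c contents.

static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl, ...)
{
	int ret;

	/* cm_id creation and address resolution elided */

	/*
	 * rdma_connect() may already have succeeded in the cm handler, so
	 * the ib resources (qp, cq, rings) can exist even if we time out
	 * waiting for the ESTABLISHED/REJECTED event here.
	 */
	ret = nvme_rdma_wait_for_cm(queue);
	if (ret)
		goto out_destroy_cm_id;

	set_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags);
	return 0;

out_destroy_cm_id:
	rdma_destroy_id(queue->cm_id);		/* stop rdma_cm events first */
	nvme_rdma_destroy_queue_ib(queue);	/* new: frees qp/cq/rings; a no-op
						 * unless NVME_RDMA_Q_TR_READY is set */
	return ret;
}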

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation
  2017-11-12 10:12 [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation Max Gurtovoy
@ 2017-11-13 19:31 ` Sagi Grimberg
  2017-11-14  8:57   ` Max Gurtovoy
  0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2017-11-13 19:31 UTC (permalink / raw)


Hi Max,

>   out_destroy_qp:
> -	rdma_destroy_qp(queue->cm_id);
> +	ib_destroy_qp(queue->qp);

Why was this changed? Any specific reason?

>   out_destroy_ib_cq:
>   	ib_free_cq(queue->ib_cq);
>   out_put_dev:
> @@ -546,6 +555,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>   
>   out_destroy_cm_id:
>   	rdma_destroy_id(queue->cm_id);
> +	nvme_rdma_destroy_queue_ib(queue);
>   	return ret;
>   }
>   
> @@ -563,8 +573,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>   	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>   		return;
>   
> -	nvme_rdma_destroy_queue_ib(queue);
>   	rdma_destroy_id(queue->cm_id);
> +	nvme_rdma_destroy_queue_ib(queue);

Why was this changed? What race are you preventing here?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation
  2017-11-13 19:31 ` Sagi Grimberg
@ 2017-11-14  8:57   ` Max Gurtovoy
  2017-11-22 13:48     ` [Suspected-Phishing]Re: " Max Gurtovoy
  0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2017-11-14  8:57 UTC (permalink / raw)




On 11/13/2017 9:31 PM, Sagi Grimberg wrote:
> Hi Max,
> 
>>   out_destroy_qp:
>> -	rdma_destroy_qp(queue->cm_id);
>> +	ib_destroy_qp(queue->qp);
> 
> Why was this changed? Any specific reason?
> 

In order to destroy the QP using one API call instead of two.
We can leave it "rdma_destroy_qp(queue->cm_id);", here it's safe.
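
To spell it out, the two calls tear down the same object: rdma_destroy_qp()
is essentially a wrapper that resolves the QP from the cm_id. Roughly (a
simplified sketch from memory of drivers/infiniband/core/cma.c, not the
exact upstream code):

void rdma_destroy_qp(struct rdma_cm_id *id)
{
	struct rdma_id_private *id_priv =
		container_of(id, struct rdma_id_private, id);

	mutex_lock(&id_priv->qp_mutex);
	ib_destroy_qp(id_priv->id.qp);	/* the verb the patch now calls directly */
	id_priv->id.qp = NULL;
	mutex_unlock(&id_priv->qp_mutex);
}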

>>   out_destroy_ib_cq:
>>   	ib_free_cq(queue->ib_cq);
>>   out_put_dev:
>> @@ -546,6 +555,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>>   out_destroy_cm_id:
>>   	rdma_destroy_id(queue->cm_id);
>> +	nvme_rdma_destroy_queue_ib(queue);
>>   	return ret;
>>   }
>> @@ -563,8 +573,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>>   	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>>   		return;
>> -	nvme_rdma_destroy_queue_ib(queue);
>>   	rdma_destroy_id(queue->cm_id);
>> +	nvme_rdma_destroy_queue_ib(queue);
> 
> Why was this changed? What race are you preventing here?

No race here; I just wanted to align the order of destruction and make
sure we don't get any rdma_cm events during queue_ib destruction, as we
did above.

Do you prefer leaving these 2 lines "as is" and adding comments in the code?
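
In other words, the teardown order the patch ends up with in
nvme_rdma_free_queue() is the following (a condensed restatement of the
hunk above, with the intended comments spelled out):

static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
{
	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
		return;

	/* destroy the cm_id first so no further rdma_cm events can arrive
	 * while the ib resources are being torn down
	 */
	rdma_destroy_id(queue->cm_id);

	/* NVME_RDMA_Q_TR_READY makes this a no-op if the ib resources were
	 * never created, or were already destroyed from another context
	 */
	nvme_rdma_destroy_queue_ib(queue);
}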

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Suspected-Phishing]Re: [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation
  2017-11-14  8:57   ` Max Gurtovoy
@ 2017-11-22 13:48     ` Max Gurtovoy
  2017-11-22 14:31       ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2017-11-22 13:48 UTC (permalink / raw)


Hi Sagi/Christoph,

On 11/14/2017 10:57 AM, Max Gurtovoy wrote:
> 
> 
> On 11/13/2017 9:31 PM, Sagi Grimberg wrote:
>> Hi Max,
>>
>>>   out_destroy_qp:
>>> -	rdma_destroy_qp(queue->cm_id);
>>> +	ib_destroy_qp(queue->qp);
>>
>> Why was this changed? Any specific reason?
>>
> 
> In order to destroy the QP using one API call instead of two.
> We can leave it "rdma_destroy_qp(queue->cm_id);", here it's safe.
> 

Should I leave it rdma_destroy_qp(queue->cm_id) or modify it to
ib_destroy_qp(queue->qp)?

>>>   out_destroy_ib_cq:
>>>   	ib_free_cq(queue->ib_cq);
>>>   out_put_dev:
>>> @@ -546,6 +555,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>>>   out_destroy_cm_id:
>>>   	rdma_destroy_id(queue->cm_id);
>>> +	nvme_rdma_destroy_queue_ib(queue);
>>>   	return ret;
>>>   }
>>> @@ -563,8 +573,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>>>   	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>>>   		return;
>>> -	nvme_rdma_destroy_queue_ib(queue);
>>>   	rdma_destroy_id(queue->cm_id);
>>> +	nvme_rdma_destroy_queue_ib(queue);
>>
>> Why was this changed? What race are you preventing here?
> 
> No race here; I just wanted to align the order of destruction and make
> sure we don't get any rdma_cm events during queue_ib destruction, as we
> did above.
> 
> Do you prefer leaving these 2 lines "as is" and adding comments in the code?

What is preferred here?


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Suspected-Phishing]Re: [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation
  2017-11-22 13:48     ` [Suspected-Phishing]Re: " Max Gurtovoy
@ 2017-11-22 14:31       ` Sagi Grimberg
  0 siblings, 0 replies; 5+ messages in thread
From: Sagi Grimberg @ 2017-11-22 14:31 UTC (permalink / raw)



>> In order to destroy the QP using one API call instead of two.
>> We can leave it "rdma_destroy_qp(queue->cm_id);", here it's safe.
>>
> 
> Should I leave it rdma_destroy_qp(queue->cm_id) or modify it to
> ib_destroy_qp(queue->qp)?

Yes

>>>> @@ -563,8 +573,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>>>>   	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>>>>   		return;
>>>> -	nvme_rdma_destroy_queue_ib(queue);
>>>>   	rdma_destroy_id(queue->cm_id);
>>>> +	nvme_rdma_destroy_queue_ib(queue);
>>>
>>> Why was this changed? What race are you preventing here?
>>
>> No race here; I just wanted to align the order of destruction and make
>> sure we don't get any rdma_cm events during queue_ib destruction, as we
>> did above.
>>
>> Do you prefer leaving these 2 lines "as is" and adding comments in the
>> code?
> 
> What is preferred here?

Keep it as is

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-11-22 14:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-12 10:12 [PATCH v2 1/1] nvme-rdma: Fix memory leak during queue allocation Max Gurtovoy
2017-11-13 19:31 ` Sagi Grimberg
2017-11-14  8:57   ` Max Gurtovoy
2017-11-22 13:48     ` [Suspected-Phishing]Re: " Max Gurtovoy
2017-11-22 14:31       ` Sagi Grimberg
