* [PATCH v2 0/2] nvmet: fixup processing async events
@ 2020-05-18 18:59 David Milburn
  2020-05-18 18:59 ` [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen David Milburn
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: David Milburn @ 2020-05-18 18:59 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, chaitanya.kulkarni, dwagner

This patch series changes how async events are processed
and freed, based upon review from Christoph Hellwig and
Daniel Wagner. The issue was discovered while debugging a
memory leak after clearing the connection on the target.

Changes from v1:
 - free aens separate from completing requests in nvmet_async_events_free().
 - declare struct nvmet_async_event in second patch.


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen
  2020-05-18 18:59 [PATCH v2 0/2] nvmet: fixup processing async events David Milburn
@ 2020-05-18 18:59 ` David Milburn
  2020-05-20 17:18   ` Christoph Hellwig
  2020-05-18 18:59 ` [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free David Milburn
  2020-05-19  8:40 ` [PATCH v2 0/2] nvmet: fixup processing async events Chaitanya Kulkarni
  2 siblings, 1 reply; 29+ messages in thread
From: David Milburn @ 2020-05-18 18:59 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, chaitanya.kulkarni, dwagner

Free aens while completing requests in nvmet_async_events_process().

Based-on-a-patch-by: Christoph Hellwig <hch@infradead.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: David Milburn <dmilburn@redhat.com>
---
Changes from v1:
 - free aens separate from completing requests in nvmet_async_events_free().

 drivers/nvme/target/core.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b685f99d56a1..dc036a815d39 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -134,15 +134,10 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
 	struct nvmet_async_event *aen;
 	struct nvmet_req *req;
 
-	while (1) {
-		mutex_lock(&ctrl->lock);
-		aen = list_first_entry_or_null(&ctrl->async_events,
-				struct nvmet_async_event, entry);
-		if (!aen || !ctrl->nr_async_event_cmds) {
-			mutex_unlock(&ctrl->lock);
-			break;
-		}
-
+	mutex_lock(&ctrl->lock);
+	while (ctrl->nr_async_event_cmds && !list_empty(&ctrl->async_events)) {
+		aen = list_first_entry(&ctrl->async_events,
+				       struct nvmet_async_event, entry);
 		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
 		if (status == 0)
 			nvmet_set_result(req, nvmet_async_event_result(aen));
@@ -152,7 +147,9 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
 
 		mutex_unlock(&ctrl->lock);
 		nvmet_req_complete(req, status);
+		mutex_lock(&ctrl->lock);
 	}
+	mutex_unlock(&ctrl->lock);
 }
 
 static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
-- 
2.18.1
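The rewritten loop above can be sketched as a minimal single-threaded model: pair queued events with command slots under the lock, dropping the lock only around the completion callback and retaking it before re-checking the loop condition. This is purely illustrative; struct ctrl, struct aen, events_process() etc. are stand-ins, not the kernel's nvmet types.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-ins for the nvmet structures. */
struct aen {
	struct aen *next;
};

struct ctrl {
	pthread_mutex_t lock;
	struct aen *events;	/* models ctrl->async_events */
	int nr_cmds;		/* models ctrl->nr_async_event_cmds */
	int completed;		/* completions delivered */
};

/* Stand-in for nvmet_req_complete(); must run without the lock held. */
static void complete_one(struct ctrl *c)
{
	c->completed++;
}

/* Models the rewritten nvmet_async_events_process() loop. */
static void events_process(struct ctrl *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->nr_cmds && c->events) {
		struct aen *a = c->events;

		c->events = a->next;
		c->nr_cmds--;
		free(a);
		/* Completion may sleep, so drop the lock around it and
		 * retake it before re-checking the loop condition. */
		pthread_mutex_unlock(&c->lock);
		complete_one(c);
		pthread_mutex_lock(&c->lock);
	}
	pthread_mutex_unlock(&c->lock);
}

/* Models nvmet_add_async_event() queueing an event. */
static void queue_event(struct ctrl *c)
{
	struct aen *a = malloc(sizeof(*a));

	pthread_mutex_lock(&c->lock);
	a->next = c->events;
	c->events = a;
	pthread_mutex_unlock(&c->lock);
}
```

With two slots and three queued events, only two completions are delivered and one event stays queued, which is exactly the state patch 2 has to clean up later.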



* [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-18 18:59 [PATCH v2 0/2] nvmet: fixup processing async events David Milburn
  2020-05-18 18:59 ` [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen David Milburn
@ 2020-05-18 18:59 ` David Milburn
  2020-05-19  8:33   ` Sagi Grimberg
  2020-05-19  8:40 ` [PATCH v2 0/2] nvmet: fixup processing async events Chaitanya Kulkarni
  2 siblings, 1 reply; 29+ messages in thread
From: David Milburn @ 2020-05-18 18:59 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, chaitanya.kulkarni, dwagner

Make sure we free all resources, including any remaining aens,
which otherwise may result in a memory leak.

$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888c1af2c000 (size 32):
  comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
  hex dump (first 32 bytes):
    28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
    02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
  backtrace:
    [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
    [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
    [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
    [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
    [<00000000a15deb52>] config_item_release+0xf1/0x1c0
    [<000000007e148432>] configfs_rmdir+0x555/0x7c0
    [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
    [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
    [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
    [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

Steps to Reproduce:

target:
1. nvmetcli restore rdma.json

client:
2. nvme connect -t rdma -a $IP -s 4420 -n testnqn

target:
3. nvmetcli clear
4. sleep 5 && nvmetcli restore rdma.json
5. cat /sys/kernel/debug/kmemleak after 5 minutes

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: David Milburn <dmilburn@redhat.com>
---
Changes from v1:
 - declare struct nvmet_async_event in this patch.

 drivers/nvme/target/core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index dc036a815d39..dda888672f31 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -154,6 +154,7 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
 
 static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
 {
+	struct nvmet_async_event *aen;
 	struct nvmet_req *req;
 
 	mutex_lock(&ctrl->lock);
@@ -163,6 +164,13 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
 		nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
 		mutex_lock(&ctrl->lock);
 	}
+
+	while (!list_empty(&ctrl->async_events)) {
+		aen = list_first_entry(&ctrl->async_events,
+				       struct nvmet_async_event, entry);
+		list_del(&aen->entry);
+		kfree(aen);
+	}
 	mutex_unlock(&ctrl->lock);
 }
 
-- 
2.18.1
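The two-phase teardown in the patch above can be modeled as a small standalone sketch: first fail any outstanding AER commands, then reap events that never found a command slot. Names (struct ctrl, events_free(), etc.) are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins for the nvmet structures. */
struct aen {
	struct aen *next;
};

struct ctrl {
	struct aen *events;	/* models ctrl->async_events */
	int nr_cmds;		/* models ctrl->nr_async_event_cmds */
	int errored;		/* commands completed with an error */
};

/* Models patch 2's nvmet_async_events_free(); returns the number of
 * leftover events reaped. */
static int events_free(struct ctrl *c)
{
	int reaped = 0;

	/* Phase 1: complete outstanding AER commands with an error
	 * status (NVME_SC_INTERNAL | NVME_SC_DNR in the kernel). */
	while (c->nr_cmds) {
		c->nr_cmds--;
		c->errored++;
	}

	/* Phase 2: free events that never found a command slot -
	 * the allocations kmemleak reported. */
	while (c->events) {
		struct aen *a = c->events;

		c->events = a->next;
		free(a);
		reaped++;
	}
	return reaped;
}

/* Models nvmet_add_async_event() queueing an event. */
static void add_event(struct ctrl *c)
{
	struct aen *a = malloc(sizeof(*a));

	a->next = c->events;
	c->events = a;
}
```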



* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-18 18:59 ` [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free David Milburn
@ 2020-05-19  8:33   ` Sagi Grimberg
  2020-05-19 19:14     ` David Milburn
  2020-05-20  6:16     ` Christoph Hellwig
  0 siblings, 2 replies; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-19  8:33 UTC (permalink / raw)
  To: David Milburn, linux-nvme; +Cc: hch, dwagner, chaitanya.kulkarni



On 5/18/20 11:59 AM, David Milburn wrote:
> Make sure we free all resources including any remaining aens
> which may result in a memory leak.
>
> $ cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff888c1af2c000 (size 32):
>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>    hex dump (first 32 bytes):
>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>    backtrace:
>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>
> Steps to Reproduce:
>
> target:
> 1. nvmetcli restore rdma.json
>
> client:
> 2. nvme connect -t rdma -a $IP -s 4420 -n testnqn
>
> target:
> 3. nvmetcli clear
> 4. sleep 5 && nvmetcli restore rdma.json
> 5. cat /sys/kernel/debug/kmemleak after 5 minutes
>
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: David Milburn <dmilburn@redhat.com>
> ---
> Changes from v1:
>   - declare struct nvmet_async_event in this patch.
>
>   drivers/nvme/target/core.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index dc036a815d39..dda888672f31 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -154,6 +154,7 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>   
>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   {
> +	struct nvmet_async_event *aen;
>   	struct nvmet_req *req;
>   
>   	mutex_lock(&ctrl->lock);
> @@ -163,6 +164,13 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   		nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>   		mutex_lock(&ctrl->lock);
>   	}
> +
> +	while (!list_empty(&ctrl->async_events)) {
> +		aen = list_first_entry(&ctrl->async_events,
> +				       struct nvmet_async_event, entry);
> +		list_del(&aen->entry);
> +		kfree(aen);
> +	}
>   	mutex_unlock(&ctrl->lock);
>   }
>   

Something here looks wrong to me... There is no reason to free aens here...

Also, seeing the prior discussion on this patch: we don't actually take
anything from the list if we don't have an available slot, so I don't see
how patch #1 helps anything...

Did you analyze the root cause of the issue? It's not clear what the root
cause is here...

Looking at the code, nvmet_async_events_free, which is designed to free all
the pending aens that are not going to be sent anywhere, is not freeing
anything... It's also not clear to me from the code how the
ctrl->async_events list and ctrl->nr_async_event_cmds can get out of sync...

Does this patch solve your issue?
--
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b685f99d56a1..190d36ceda47 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)

  static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
  {
+       struct nvmet_async_event *aen;
         struct nvmet_req *req;

         mutex_lock(&ctrl->lock);
         while (ctrl->nr_async_event_cmds) {
+               aen = list_first_entry(&ctrl->async_events,
+                               struct nvmet_async_event, entry);
+               list_del(&aen->entry);
+               kfree(aen);
                 req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
                 mutex_unlock(&ctrl->lock);
                 nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
--


* Re: [PATCH v2 0/2] nvmet: fixup processing async events
  2020-05-18 18:59 [PATCH v2 0/2] nvmet: fixup processing async events David Milburn
  2020-05-18 18:59 ` [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen David Milburn
  2020-05-18 18:59 ` [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free David Milburn
@ 2020-05-19  8:40 ` Chaitanya Kulkarni
  2020-05-19 19:17   ` David Milburn
  2 siblings, 1 reply; 29+ messages in thread
From: Chaitanya Kulkarni @ 2020-05-19  8:40 UTC (permalink / raw)
  To: David Milburn, linux-nvme; +Cc: hch, dwagner

On 5/18/20 12:00 PM, David Milburn wrote:
> This patch series changes how async events are processed
> and freed based upon review from Christoph Hellwig and
> Daniel Wagner, this was discovered while debugging a memory
> leak after clearing connection on the target.
> 
> Changes from v1:
>   - free aens separate from completing requests in nvmet_async_events_free().
>   - declare struct nvmet_async_event in second patch.
> 
> 
I'm still trying to reproduce the issue with nvme-loop.
Are you able to reproduce this issue with nvme-loop? If so, can you
please share the steps? Or is this transport-specific?



* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-19  8:33   ` Sagi Grimberg
@ 2020-05-19 19:14     ` David Milburn
  2020-05-19 20:42       ` Sagi Grimberg
  2020-05-20  6:16     ` Christoph Hellwig
  1 sibling, 1 reply; 29+ messages in thread
From: David Milburn @ 2020-05-19 19:14 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: hch, dwagner, chaitanya.kulkarni

Hi Sagi,

On 05/19/2020 03:33 AM, Sagi Grimberg wrote:
> 
> 
> On 5/18/20 11:59 AM, David Milburn wrote:
>> Make sure we free all resources including any remaining aens
>> which may result in a memory leak.
>>
>> $ cat /sys/kernel/debug/kmemleak
>> unreferenced object 0xffff888c1af2c000 (size 32):
>>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>>    hex dump (first 32 bytes):
>>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>>    backtrace:
>>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>
>> Steps to Reproduce:
>>
>> target:
>> 1. nvmetcli restore rdma.json
>>
>> client:
>> 2. nvme connect -t rdma -a $IP -s 4420 -n testnqn
>>
>> target:
>> 3. nvmetcli clear
>> 4. sleep 5 && nvmetcli restore rdma.json
>> 5. cat /sys/kernel/debug/kmemleak after 5 minutes
>>
>> Reported-by: Yi Zhang <yi.zhang@redhat.com>
>> Signed-off-by: David Milburn <dmilburn@redhat.com>
>> ---
>> Changes from v1:
>>   - declare struct nvmet_async_event in this patch.
>>
>>   drivers/nvme/target/core.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index dc036a815d39..dda888672f31 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -154,6 +154,7 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>   {
>> +    struct nvmet_async_event *aen;
>>       struct nvmet_req *req;
>>       mutex_lock(&ctrl->lock);
>> @@ -163,6 +164,13 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>           nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>           mutex_lock(&ctrl->lock);
>>       }
>> +
>> +    while (!list_empty(&ctrl->async_events)) {
>> +        aen = list_first_entry(&ctrl->async_events,
>> +                       struct nvmet_async_event, entry);
>> +        list_del(&aen->entry);
>> +        kfree(aen);
>> +    }
>>       mutex_unlock(&ctrl->lock);
>>   }
> 
> Something here looks wrong to me... There is no reason to free aens here...
> 
> Also, seeing prior discussion on this patch
> we don't actually take anything from the list if we don't have an 
> available slot, so I
> don't see how patch #1 helps anything...
> 
> Did you analyze the root cause of the issue? It's not clear what is the 
> root cause
> here..
> 
> Looking at the code, nvmet_async_events_free which is designed to free 
> all the
> pending aens that are not going to be sent anywhere, is not freeing 
> anything...
> Its also not clear to me from the code how can ctrl->async_events list and
> ctrl->nr_async_event_cmds are not correlated...
> 
> Does this patch solve your issue?
> -- 
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index b685f99d56a1..190d36ceda47 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
>> @@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
> 
>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   {
> +       struct nvmet_async_event *aen;
>          struct nvmet_req *req;
> 
>          mutex_lock(&ctrl->lock);
>          while (ctrl->nr_async_event_cmds) {
> +               aen = list_first_entry(&ctrl->async_events,
> +                               struct nvmet_async_event, entry);
> +               list_del(&aen->entry);
> +               kfree(aen);
>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>                  mutex_unlock(&ctrl->lock);
>                  nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
> -- 
> 

The above doesn't solve the issue. This is what I see with
the handling of ctrl->async_events and ctrl->nr_async_event_cmds.

After the host system connects to the target

nvmet_rdma_handle_command
  nvmet_rdma_execute_command
   nvmet_execute_async_event

Now the request is added to async_event_cmds, and
ctrl->nr_async_event_cmds is incremented.

(just used the above, not patch #1 of this series)

nvmet_async_events_process

So, at this point nothing has been added to ctrl->async_events
and ctrl->nr_async_event_cmds is 1, so the driver breaks out
of while(1).

Next test does "nvmetcli clear"

nvmet_sq_destroy
  nvmet_async_events_process

Same as before, nothing has been added to ctrl->async_events
and ctrl->nr_async_event_cmds is 1, so the driver breaks out
of while(1).

nvmet_async_events_free

Nothing yet has been added to ctrl->async_events; the driver
pulls the request, decrements ctrl->nr_async_event_cmds to 0,
calls nvmet_req_complete, and unlocks ctrl->lock.

Then,

nvmet_ns_free
  nvmet_ns_disable
   nvmet_ns_changed
    nvmet_add_async_event

Now at this point we add the entry to ctrl->async_events and go
back through nvmet_async_events_process; we have an entry
on ctrl->async_events, but ctrl->nr_async_event_cmds is 0,
so the driver breaks out of while(1).
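The sequence above can be replayed as a tiny standalone model (counts only; struct state and the function names are illustrative, not the kernel code), showing how an event queued after the slots are drained is left behind:

```c
#include <assert.h>

/* Toy replay of the leak sequence described above:
 * 1. host issues an AER  -> one command slot outstanding, no events
 * 2. sq destroy: process() finds no events; free() completes the slot
 * 3. namespace teardown queues an event -> no slot left to carry it */

struct state {
	int nr_cmds;	/* models ctrl->nr_async_event_cmds */
	int nr_events;	/* models entries on ctrl->async_events */
};

/* Models nvmet_async_events_process(): needs both a slot and an event. */
static void process(struct state *s)
{
	while (s->nr_cmds && s->nr_events) {
		s->nr_cmds--;
		s->nr_events--;
	}
}

/* Models the old nvmet_async_events_free(): completes slots only. */
static void free_cmds(struct state *s)
{
	while (s->nr_cmds)
		s->nr_cmds--;
}

/* Returns the number of events left stranded on the list. */
static int replay(void)
{
	struct state s = { 0, 0 };

	s.nr_cmds++;	/* nvmet_execute_async_event */
	process(&s);	/* no events yet: loop body never runs */
	free_cmds(&s);	/* nvmet_sq_destroy drains the slot */
	s.nr_events++;	/* nvmet_ns_changed -> nvmet_add_async_event */
	process(&s);	/* no slots left: event stays queued */
	return s.nr_events;
}
```

The stranded entry is the 32-byte allocation kmemleak reports in the commit message.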

Thanks,
David





* Re: [PATCH v2 0/2] nvmet: fixup processing async events
  2020-05-19  8:40 ` [PATCH v2 0/2] nvmet: fixup processing async events Chaitanya Kulkarni
@ 2020-05-19 19:17   ` David Milburn
  2020-05-20  6:20     ` hch
  0 siblings, 1 reply; 29+ messages in thread
From: David Milburn @ 2020-05-19 19:17 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, dwagner

Hi Chaitanya,

On 05/19/2020 03:40 AM, Chaitanya Kulkarni wrote:
> On 5/18/20 12:00 PM, David Milburn wrote:
>> This patch series changes how async events are processed
>> and freed based upon review from Christoph Hellwig and
>> Daniel Wagner, this was discovered while debugging a memory
>> leak after clearing connection on the target.
>>
>> Changes from v1:
>>    - free aens separate from completing requests in nvmet_async_events_free().
>>    - declare struct nvmet_async_event in second patch.
>>
>>
> I'm still trying to reproduce the issue with nvme-loop.
> Are able to reproduce this issue with nvme-loop ? if so can you
> please share the steps ? OR this is transport specific ?
> 

We have only tested with rdma.

Thanks,
David



* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-19 19:14     ` David Milburn
@ 2020-05-19 20:42       ` Sagi Grimberg
  2020-05-19 20:51         ` Sagi Grimberg
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-19 20:42 UTC (permalink / raw)
  To: David Milburn, linux-nvme; +Cc: hch, chaitanya.kulkarni, dwagner



On 5/19/20 12:14 PM, David Milburn wrote:
> Hi Sagi,
> 
> On 05/19/2020 03:33 AM, Sagi Grimberg wrote:
>>
>>
>> On 5/18/20 11:59 AM, David Milburn wrote:
>>> Make sure we free all resources including any remaining aens
>>> which may result in a memory leak.
>>>
>>> $ cat /sys/kernel/debug/kmemleak
>>> unreferenced object 0xffff888c1af2c000 (size 32):
>>>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>>>    hex dump (first 32 bytes):
>>>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>>>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>>>    backtrace:
>>>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>>>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>>>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>>>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>>>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>>>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>>>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>>>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>>>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>>>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>>
>>> Steps to Reproduce:
>>>
>>> target:
>>> 1. nvmetcli restore rdma.json
>>>
>>> client:
>>> 2. nvme connect -t rdma -a $IP -s 4420 -n testnqn
>>>
>>> target:
>>> 3. nvmetcli clear
>>> 4. sleep 5 && nvmetcli restore rdma.json
>>> 5. cat /sys/kernel/debug/kmemleak after 5 minutes
>>>
>>> Reported-by: Yi Zhang <yi.zhang@redhat.com>
>>> Signed-off-by: David Milburn <dmilburn@redhat.com>
>>> ---
>>> Changes from v1:
>>>   - declare struct nvmet_async_event in this patch.
>>>
>>>   drivers/nvme/target/core.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>> index dc036a815d39..dda888672f31 100644
>>> --- a/drivers/nvme/target/core.c
>>> +++ b/drivers/nvme/target/core.c
>>> @@ -154,6 +154,7 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>   {
>>> +    struct nvmet_async_event *aen;
>>>       struct nvmet_req *req;
>>>       mutex_lock(&ctrl->lock);
>>> @@ -163,6 +164,13 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>           nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>>           mutex_lock(&ctrl->lock);
>>>       }
>>> +
>>> +    while (!list_empty(&ctrl->async_events)) {
>>> +        aen = list_first_entry(&ctrl->async_events,
>>> +                       struct nvmet_async_event, entry);
>>> +        list_del(&aen->entry);
>>> +        kfree(aen);
>>> +    }
>>>       mutex_unlock(&ctrl->lock);
>>>   }
>>
>> Something here looks wrong to me... There is no reason to free aens 
>> here...
>>
>> Also, seeing prior discussion on this patch
>> we don't actually take anything from the list if we don't have an 
>> available slot, so I
>> don't see how patch #1 helps anything...
>>
>> Did you analyze the root cause of the issue? It's not clear what is 
>> the root cause
>> here..
>>
>> Looking at the code, nvmet_async_events_free which is designed to free 
>> all the
>> pending aens that are not going to be sent anywhere, is not freeing 
>> anything...
>> Its also not clear to me from the code how can ctrl->async_events list 
>> and
>> ctrl->nr_async_event_cmds are not correlated...
>>
>> Does this patch solve your issue?
>> -- 
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index b685f99d56a1..190d36ceda47 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>
>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>   {
>> +       struct nvmet_async_event *aen;
>>          struct nvmet_req *req;
>>
>>          mutex_lock(&ctrl->lock);
>>          while (ctrl->nr_async_event_cmds) {
>> +               aen = list_first_entry(&ctrl->async_events,
>> +                               struct nvmet_async_event, entry);
>> +               list_del(&aen->entry);
>> +               kfree(aen);
>>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>>                  mutex_unlock(&ctrl->lock);
>>                  nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>> -- 
>>
> 
> The above doesn't solve the issue, this is what I see with
> the handling of ctrl->async_events and ctrl->nr_async_events_cmds.
> 
> After host system connects to target
> 
> nvmet_rdma_handle_command
>   nvmet_rdma_execute_command
>    nvmet_execute_async_event
> 
> Now, request is added to async_event_cmds, increment 
> ctrl->nr_async_event_cmds++
> 
> (just used the above, not patch #1 of this series)
> 
> nvmet_async_events_process
> 
> So, at this point nothing has been added to ctrl->async_events
> and ctrl->nr_async_events_cmd is 1 so the driver breaks out
> of while(1).
> 
> Next test does "nvmetcli clear"
> 
> nvmet_sq_destroy
>   nvmet_async_events_process
> 
> Same as before, nothing has been added to ctrl->async_events
> and ctrl->nr_async_events_cmd is 1 so the drivers breaks out
> of while(1).
> 
> nvmet_async_events_free
> 
> Nothing yet has been added to ctrl->async_events, and the
> driver pulls the request, dec ctrl->nr_async_events_cmd to 0,
> and nvmet_req_complete, unlock ctrl->lock.
> 
> Then,
> 
> nvmet_ns_free
>   nvmet_ns_disable
>    nvmet_ns_changed
>     nvmet_add_async_event
> 
> Now at this point we add the entry to ctrl->async_events, go
> back through nvmet_async_events_process, we have an entry
> on ctrl->async_events, but ctrl->nr_async_event_cmds is 0,
> so the driver breaks out of while(1).

And there is your problem, the admin sq was destroyed before
the async event was processed, and nothing cleans it up
in the ctrl removal.

How about this?
--
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b685f99d56a1..027166c7d172 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)

  static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
  {
+       struct nvmet_async_event *aen;
         struct nvmet_req *req;

         mutex_lock(&ctrl->lock);
         while (ctrl->nr_async_event_cmds) {
+               aen = list_first_entry(&ctrl->async_events,
+                               struct nvmet_async_event, entry);
+               list_del(&aen->entry);
+               kfree(aen);
                 req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
                 mutex_unlock(&ctrl->lock);
                 nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
@@ -764,10 +769,8 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
          * If this is the admin queue, complete all AERs so that our
          * queue doesn't have outstanding requests on it.
          */
-       if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
+       if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq)
                 nvmet_async_events_process(ctrl, status);
-               nvmet_async_events_free(ctrl);
-       }
         percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
         wait_for_completion(&sq->confirm_done);
         wait_for_completion(&sq->free_done);
@@ -1353,6 +1356,7 @@ static void nvmet_ctrl_free(struct kref *ref)
         nvmet_stop_keep_alive_timer(ctrl);

         flush_work(&ctrl->async_event_work);
+       nvmet_async_events_free(ctrl);
         cancel_work_sync(&ctrl->fatal_err_work);

         ida_simple_remove(&cntlid_ida, ctrl->cntlid);
--
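The ordering this sketch proposes can be modeled in a few lines (counts only; struct state and the function names are illustrative): sq destroy only pairs events with slots, and the leftover reaping moves to controller free, which runs after the async event work has been flushed so nothing can queue concurrently.

```c
#include <assert.h>

struct state {
	int nr_cmds;	/* outstanding AER slots */
	int nr_events;	/* entries on ctrl->async_events */
	int freed;	/* events reaped at controller free */
};

/* Models nvmet_sq_destroy() -> nvmet_async_events_process():
 * match queued events to available slots only. */
static void sq_destroy(struct state *s)
{
	while (s->nr_cmds && s->nr_events) {
		s->nr_cmds--;
		s->nr_events--;
	}
}

/* Models nvmet_ctrl_free() -> nvmet_async_events_free(): fail any
 * remaining slots, then reap events queued after the queues went away. */
static void ctrl_free(struct state *s)
{
	s->nr_cmds = 0;
	while (s->nr_events) {
		s->nr_events--;
		s->freed++;
	}
}

/* Replays the reported scenario; returns the number of late events
 * reaped instead of leaked. */
static int replay(void)
{
	struct state s = { 1, 0, 0 };	/* one AER outstanding, no events */

	sq_destroy(&s);		/* no events yet */
	s.nr_events++;		/* nvmet_ns_free() queues a change event */
	ctrl_free(&s);		/* reaps the late event */
	return s.freed;
}
```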


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-19 20:42       ` Sagi Grimberg
@ 2020-05-19 20:51         ` Sagi Grimberg
  2020-05-20  6:18           ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-19 20:51 UTC (permalink / raw)
  To: David Milburn, linux-nvme; +Cc: hch, chaitanya.kulkarni, dwagner



On 5/19/20 1:42 PM, Sagi Grimberg wrote:
> 
> 
> On 5/19/20 12:14 PM, David Milburn wrote:
>> Hi Sagi,
>>
>> On 05/19/2020 03:33 AM, Sagi Grimberg wrote:
>>>
>>>
>>> On 5/18/20 11:59 AM, David Milburn wrote:
>>>> Make sure we free all resources including any remaining aens
>>>> which may result in a memory leak.
>>>>
>>>> $ cat /sys/kernel/debug/kmemleak
>>>> unreferenced object 0xffff888c1af2c000 (size 32):
>>>>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>>>>    hex dump (first 32 bytes):
>>>>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>>>>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>>>>    backtrace:
>>>>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>>>>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>>>>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>>>>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>>>>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>>>>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>>>>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>>>>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>>>>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>>>>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>>>
>>>> Steps to Reproduce:
>>>>
>>>> target:
>>>> 1. nvmetcli restore rdma.json
>>>>
>>>> client:
>>>> 2. nvme connect -t rdma -a $IP -s 4420 -n testnqn
>>>>
>>>> target:
>>>> 3. nvmetcli clear
>>>> 4. sleep 5 && nvmetcli restore rdma.json
>>>> 5. cat /sys/kernel/debug/kmemleak after 5 minutes
>>>>
>>>> Reported-by: Yi Zhang <yi.zhang@redhat.com>
>>>> Signed-off-by: David Milburn <dmilburn@redhat.com>
>>>> ---
>>>> Changes from v1:
>>>>   - declare struct nvmet_async_event in this patch.
>>>>
>>>>   drivers/nvme/target/core.c | 8 ++++++++
>>>>   1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>>> index dc036a815d39..dda888672f31 100644
>>>> --- a/drivers/nvme/target/core.c
>>>> +++ b/drivers/nvme/target/core.c
>>>> @@ -154,6 +154,7 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>>   {
>>>> +    struct nvmet_async_event *aen;
>>>>       struct nvmet_req *req;
>>>>       mutex_lock(&ctrl->lock);
>>>> @@ -163,6 +164,13 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>>           nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>>>           mutex_lock(&ctrl->lock);
>>>>       }
>>>> +
>>>> +    while (!list_empty(&ctrl->async_events)) {
>>>> +        aen = list_first_entry(&ctrl->async_events,
>>>> +                       struct nvmet_async_event, entry);
>>>> +        list_del(&aen->entry);
>>>> +        kfree(aen);
>>>> +    }
>>>>       mutex_unlock(&ctrl->lock);
>>>>   }
>>>
>>> Something here looks wrong to me... There is no reason to free aens 
>>> here...
>>>
>>> Also, seeing prior discussion on this patch
>>> we don't actually take anything from the list if we don't have an 
>>> available slot, so I
>>> don't see how patch #1 helps anything...
>>>
>>> Did you analyze the root cause of the issue? It's not clear what is 
>>> the root cause
>>> here..
>>>
>>> Looking at the code, nvmet_async_events_free, which is designed to
>>> free all the pending aens that are not going to be sent anywhere, is
>>> not freeing anything...
>>> It's also not clear to me from the code how the ctrl->async_events
>>> list and ctrl->nr_async_event_cmds can be uncorrelated...
>>>
>>> Does this patch solve your issue?
>>> -- 
>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>> index b685f99d56a1..190d36ceda47 100644
>>> --- a/drivers/nvme/target/core.c
>>> +++ b/drivers/nvme/target/core.c
>>> @@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>>
>>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>   {
>>> +       struct nvmet_async_event *aen;
>>>          struct nvmet_req *req;
>>>
>>>          mutex_lock(&ctrl->lock);
>>>          while (ctrl->nr_async_event_cmds) {
>>> +               aen = list_first_entry(&ctrl->async_events,
>>> +                               struct nvmet_async_event, entry);
>>> +               list_del(&aen->entry);
>>> +               kfree(aen);
>>>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>>>                  mutex_unlock(&ctrl->lock);
>>>                  nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>> -- 
>>>
>>
>> The above doesn't solve the issue; this is what I see with
>> the handling of ctrl->async_events and ctrl->nr_async_event_cmds.
>>
>> After host system connects to target
>>
>> nvmet_rdma_handle_command
>>   nvmet_rdma_execute_command
>>    nvmet_execute_async_event
>>
>> Now the request is added to ctrl->async_event_cmds and
>> ctrl->nr_async_event_cmds is incremented.
>>
>> (just used the above, not patch #1 of this series)
>>
>> nvmet_async_events_process
>>
>> So, at this point nothing has been added to ctrl->async_events,
>> and ctrl->nr_async_event_cmds is 1, so the driver breaks out
>> of while(1).
>>
>> Next test does "nvmetcli clear"
>>
>> nvmet_sq_destroy
>>   nvmet_async_events_process
>>
>> Same as before, nothing has been added to ctrl->async_events,
>> and ctrl->nr_async_event_cmds is 1, so the driver breaks out
>> of while(1).
>>
>> nvmet_async_events_free
>>
>> Nothing yet has been added to ctrl->async_events; the driver
>> pulls the request, decrements ctrl->nr_async_event_cmds to 0,
>> calls nvmet_req_complete(), and unlocks ctrl->lock.
>>
>> Then,
>>
>> nvmet_ns_free
>>   nvmet_ns_disable
>>    nvmet_ns_changed
>>     nvmet_add_async_event
>>
>> Now at this point we add the entry to ctrl->async_events and go
>> back through nvmet_async_events_process; we have an entry
>> on ctrl->async_events, but ctrl->nr_async_event_cmds is 0,
>> so the driver breaks out of while(1).
> 
> And there is your problem, the admin sq was destroyed before
> the async event was processed, and nothing cleans it up
> in the ctrl removal.
> 
> How about this?
> -- 
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index b685f99d56a1..027166c7d172 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
> 
>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   {
> +       struct nvmet_async_event *aen;
>          struct nvmet_req *req;
> 
>          mutex_lock(&ctrl->lock);
>          while (ctrl->nr_async_event_cmds) {
> +               aen = list_first_entry(&ctrl->async_events,
> +                               struct nvmet_async_event, entry);
> +               list_del(&aen->entry);
> +               kfree(aen);
>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>                  mutex_unlock(&ctrl->lock);
>                  nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);

Umm, and this section needs to be removed now of course...

The loop here needs to be:
--
         mutex_lock(&ctrl->lock);
         while (ctrl->nr_async_event_cmds) {
                 aen = list_first_entry(&ctrl->async_events,
                                 struct nvmet_async_event, entry);
                 list_del(&aen->entry);
                 kfree(aen);
                 req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
         }
         mutex_unlock(&ctrl->lock);
--

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-19  8:33   ` Sagi Grimberg
  2020-05-19 19:14     ` David Milburn
@ 2020-05-20  6:16     ` Christoph Hellwig
  2020-05-20  6:59       ` Sagi Grimberg
  1 sibling, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20  6:16 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: hch, David Milburn, chaitanya.kulkarni, dwagner, linux-nvme

On Tue, May 19, 2020 at 01:33:17AM -0700, Sagi Grimberg wrote:
> Something here looks wrong to me... There is no reason to free aens here...

I think there is.  Remember we have two aen related resources:

 (1) the nvmet_req structures in ctrl->async_event_cmds.  One of these
     exists for each command that the host has outstanding.
 (2) the nvmet_async_event structures hanging off ctrl->async_events.
     One of these is added for each event that the target generates.

So when the host usually only has a single aen command outstanding (like
the Linux host) we might start having a pretty long list on the target
for a while until all the roundtrips to deliver them have been
completed, and if the admin queue gets shut down during that time we'll
need to free them.



* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-19 20:51         ` Sagi Grimberg
@ 2020-05-20  6:18           ` Christoph Hellwig
  2020-05-20  7:01             ` Sagi Grimberg
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20  6:18 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: hch, David Milburn, dwagner, chaitanya.kulkarni, linux-nvme

On Tue, May 19, 2020 at 01:51:38PM -0700, Sagi Grimberg wrote:
> Umm, and this section needs to be removed now of course...
> 
> The loop here needs to be:
> --
>         mutex_lock(&ctrl->lock);
>         while (ctrl->nr_async_event_cmds) {
>                 aen = list_first_entry(&ctrl->async_events,
>                                 struct nvmet_async_event, entry);
>                 list_del(&aen->entry);
>                 kfree(aen);
>                 req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>         }
>         mutex_unlock(&ctrl->lock);
> --

->nr_async_event_cmds has nothing to do with the number of entries on
->async_events


* Re: [PATCH v2 0/2] nvmet: fixup processing async events
  2020-05-19 19:17   ` David Milburn
@ 2020-05-20  6:20     ` hch
  0 siblings, 0 replies; 29+ messages in thread
From: hch @ 2020-05-20  6:20 UTC (permalink / raw)
  To: David Milburn; +Cc: hch, dwagner, linux-nvme, Chaitanya Kulkarni

On Tue, May 19, 2020 at 02:17:39PM -0500, David Milburn wrote:
> We have only tested with rdma.

If you want to reproduce it, I think we'd need to do something like:

 - generate lots of aens
 - shut down the admin connection



* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  6:16     ` Christoph Hellwig
@ 2020-05-20  6:59       ` Sagi Grimberg
  2020-05-20  7:03         ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20  6:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Milburn, linux-nvme, dwagner, chaitanya.kulkarni



On 5/19/20 11:16 PM, Christoph Hellwig wrote:
> On Tue, May 19, 2020 at 01:33:17AM -0700, Sagi Grimberg wrote:
>> Something here looks wrong to me... There is no reason to free aens here...
> 
> I think there is.  Remember we have two aen related resources:
> 
>   (1) the nvmet_req structures in ctrl->async_event_cmds.  One of these
>       exists for each command that the host has outstanding
>   (2) the nvmet_async_event structures hanging off ctrl->async_events.
>       One of these is added for each event that the target generates.
> 
> So when the host usually only has a single aen command outstanding (like
> the Linux host) we might start having a pretty long list on the target
> for a while until all the roundtrips to deliver them have been
> completed, and if the admin queue gets shut down during that time we'll
> need to free them.

That's fine, but why should it do that in the process loop?


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  6:18           ` Christoph Hellwig
@ 2020-05-20  7:01             ` Sagi Grimberg
  0 siblings, 0 replies; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20  7:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Milburn, dwagner, chaitanya.kulkarni, linux-nvme


>> Umm, and this section needs to be removed now of course...
>>
>> The loop here needs to be:
>> --
>>          mutex_lock(&ctrl->lock);
>>          while (ctrl->nr_async_event_cmds) {
>>                  aen = list_first_entry(&ctrl->async_events,
>>                                  struct nvmet_async_event, entry);
>>                  list_del(&aen->entry);
>>                  kfree(aen);
>>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>>          }
>>          mutex_unlock(&ctrl->lock);
>> --
> 
> ->nr_async_event_cmds has nothing to do with the number of entries on
> ->async_events

You're right, oversight.

We can just do list_for_each_safe... Moving this to the final release,
we shouldn't worry about commands, the transport will free these
resources.


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  6:59       ` Sagi Grimberg
@ 2020-05-20  7:03         ` Christoph Hellwig
  2020-05-20  7:08           ` Sagi Grimberg
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20  7:03 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, David Milburn, linux-nvme, dwagner,
	chaitanya.kulkarni

On Tue, May 19, 2020 at 11:59:25PM -0700, Sagi Grimberg wrote:
> > So when the host usually only has a single aen command outstanding (like
> > the Linux host) we might start having a pretty long list on the target
> > for a while until all the roundtrips to deliver them have been
> > completed, and if the admin queue gets shut down during that time we'll
> > need to free them.
> 
> That's fine, but why should it do that in the process loop?

This patch doesn't touch the process loop, it is the free loop when
destroying the admin queue.


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  7:03         ` Christoph Hellwig
@ 2020-05-20  7:08           ` Sagi Grimberg
  2020-05-20  7:15             ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20  7:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chaitanya.kulkarni, David Milburn, dwagner, linux-nvme


>>> So when the host usually only has a single aen command outstanding (like
>>> the Linux host) we might start having a pretty long list on the target
>>> for a while until all the roundtrips to deliver them have been
>>> completed, and if the admin queue gets shut down during that time we'll
>>> need to free them.
>>
>> That's fine, but why should it do that in the process loop?
> 
> This patch doesn't touch the process loop, it is the free loop when
> destroying the admin queue.

:) I guess the pandemic got to me...

Things make a lot more sense now. But don't we need to move the aens
free to after we remove the controller from the subsys->ctrls?


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  7:08           ` Sagi Grimberg
@ 2020-05-20  7:15             ` Christoph Hellwig
  2020-05-20  8:06               ` Sagi Grimberg
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20  7:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, David Milburn, chaitanya.kulkarni, dwagner,
	linux-nvme

On Wed, May 20, 2020 at 12:08:54AM -0700, Sagi Grimberg wrote:
> :) I guess the pandemic got to me...
> 
> Things make a lot more sense now. But don't we need to move the aens
> free to after we remove the controller from the subsys->ctrls?

Yes.  I also think we can clean up a few things here.  How about I apply
the series from David now and send a few additional fixups on top?


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  7:15             ` Christoph Hellwig
@ 2020-05-20  8:06               ` Sagi Grimberg
  2020-05-20 10:39                 ` David Milburn
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20  8:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chaitanya.kulkarni, David Milburn, dwagner, linux-nvme


>> :) I guess the pandemic got to me...
>>
>> Things make a lot more sense now. But don't we need to move the aens
>> free to after we remove the controller from the subsys->ctrls?
> 
> Yes.  I also think we can clean up a few things here.  How about I apply
> the series from David now and send a few additional fixups on top?

Sounds good


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20  8:06               ` Sagi Grimberg
@ 2020-05-20 10:39                 ` David Milburn
  2020-05-20 17:19                   ` Sagi Grimberg
  2020-05-20 17:27                   ` Christoph Hellwig
  0 siblings, 2 replies; 29+ messages in thread
From: David Milburn @ 2020-05-20 10:39 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: chaitanya.kulkarni, dwagner, linux-nvme

Hi Christoph, Sagi,

On 05/20/2020 03:06 AM, Sagi Grimberg wrote:
> 
>>> :) I guess the pandemic got to me...
>>>
>>> Things make a lot more sense now. But don't we need to move the aens
>>> free to after we remove the controller from the subsys->ctrls?
>>
>> Yes.  I also think we can clean up a few things here.  How about I apply
>> the series from David now and send a few additional fixups on top?
> 
> Sounds good
> 

Yi was able to reproduce the memleak using v2 of the patch series,
since nvmet_async_events_free() ran before nvmet_add_async_event().

http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html

With Sagi's patch below, I do see that after nvmet_add_async_event(),
nvmet_async_events_process() pulls the request, decrements
ctrl->nr_async_event_cmds to 0, and frees the aen,
with no memory leak.

http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html

Thanks,
David



* Re: [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen
  2020-05-18 18:59 ` [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen David Milburn
@ 2020-05-20 17:18   ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20 17:18 UTC (permalink / raw)
  To: David Milburn; +Cc: hch, dwagner, chaitanya.kulkarni, linux-nvme

On Mon, May 18, 2020 at 01:59:55PM -0500, David Milburn wrote:
> Free aens while completing requests in nvmet_async_events_process().
> 
> Based-on-a-patch-by: Christoph Hellwig <hch@infradead.org>
> Tested-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: David Milburn <dmilburn@redhat.com>

Applied to nvme-5.8.


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 10:39                 ` David Milburn
@ 2020-05-20 17:19                   ` Sagi Grimberg
  2020-05-20 17:23                     ` David Milburn
  2020-05-20 17:30                     ` Christoph Hellwig
  2020-05-20 17:27                   ` Christoph Hellwig
  1 sibling, 2 replies; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20 17:19 UTC (permalink / raw)
  To: David Milburn, Christoph Hellwig; +Cc: chaitanya.kulkarni, dwagner, linux-nvme


> Hi Christoph, Sagi,
> 
> On 05/20/2020 03:06 AM, Sagi Grimberg wrote:
>>
>>>> :) I guess the pandemic got to me...
>>>>
>>>> Things make a lot more sense now. But don't we need to move the aens
>>>> free to after we remove the controller from the subsys->ctrls?
>>>
>>> Yes.  I also think we can clean up a few things here.  How about I apply
>>> the series from David now and send a few additional fixups on top?
>>
>> Sounds good
>>
> 
> Yi was able to reproduce the memleak using the v2 of the patch series
> since nvmet_async_events_free() ran before nvmet_add_async_event().
> 
> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
> 
> With Sagi's patch below, I do see after nvmet_add_async_event(),
> nvmet_async_events_process pulls the request, decrements
> ctrl->nr_async_event_cmds to 0, and frees the aen,
> and no memory leak.
> 
> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html

I don't think that patch #1 is required to fix the issue; can you
confirm that? Christoph said he has some more cleaning up in this
area.

Christoph,
Do you want me to send a formal patch for this?


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 17:19                   ` Sagi Grimberg
@ 2020-05-20 17:23                     ` David Milburn
  2020-05-20 17:30                     ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: David Milburn @ 2020-05-20 17:23 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: chaitanya.kulkarni, dwagner, linux-nvme

On 05/20/2020 12:19 PM, Sagi Grimberg wrote:
> 
>> Hi Christoph, Sagi,
>>
>> On 05/20/2020 03:06 AM, Sagi Grimberg wrote:
>>>
>>>>> :) I guess the pandemic got to me...
>>>>>
>>>>> Things make a lot more sense now. But don't we need to move the aens
>>>>> free to after we remove the controller from the subsys->ctrls?
>>>>
>>>> Yes.  I also think we can clean up a few things here.  How about I 
>>>> apply
>>>> the series from David now and send a few additional fixups on top?
>>>
>>> Sounds good
>>>
>>
>> Yi was able to reproduce the memleak using the v2 of the patch series
>> since nvmet_async_events_free() ran before nvmet_add_async_event().
>>
>> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
>>
>> With Sagi's patch below, I do see after nvmet_add_async_event(),
>> nvmet_async_events_process pulls the request, decrements
>> ctrl->nr_async_event_cmds to 0, and frees the aen,
>> and no memory leak.
>>
>> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html
> 
> I don't think that patch #1 is required to fix the issue, can you
> confirm that? Christoph said he has some more cleaning up in this
> area.

Yes, that is correct; when I tested your change I did not apply
patch 1.

Thanks,
David



* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 10:39                 ` David Milburn
  2020-05-20 17:19                   ` Sagi Grimberg
@ 2020-05-20 17:27                   ` Christoph Hellwig
  2020-05-20 17:41                     ` Sagi Grimberg
  1 sibling, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20 17:27 UTC (permalink / raw)
  To: David Milburn
  Cc: Christoph Hellwig, linux-nvme, dwagner, Sagi Grimberg,
	chaitanya.kulkarni

On Wed, May 20, 2020 at 05:39:18AM -0500, David Milburn wrote:
> Yi was able to reproduce the memleak using the v2 of the patch series
> since nvmet_async_events_free() ran before nvmet_add_async_event().
> 
> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
> 
> With Sagi's patch below, I do see after nvmet_add_async_event(),
> nvmet_async_events_process pulls the request, decrements
> ctrl->nr_async_event_cmds to 0, and frees the aen,
> and no memory leak.
> 
> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html

Ok, let's try a version of Sagi's latest suggestion then.  What about
this (replacement for this patch only, I applied the first one already):

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 54679260e6648..1525426d0a31f 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -129,8 +129,10 @@ static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
 	return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
 }
 
-static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
+static void nvmet_async_event_work(struct work_struct *work)
 {
+	struct nvmet_ctrl *ctrl =
+		container_of(work, struct nvmet_ctrl, async_event_work);
 	struct nvmet_async_event *aen;
 	struct nvmet_req *req;
 
@@ -139,21 +141,20 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
 		aen = list_first_entry(&ctrl->async_events,
 				       struct nvmet_async_event, entry);
 		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
-		if (status == 0)
-			nvmet_set_result(req, nvmet_async_event_result(aen));
+		nvmet_set_result(req, nvmet_async_event_result(aen));
 
 		list_del(&aen->entry);
 		kfree(aen);
 
 		mutex_unlock(&ctrl->lock);
 		trace_nvmet_async_event(ctrl, req->cqe->result.u32);
-		nvmet_req_complete(req, status);
+		nvmet_req_complete(req, 0);
 		mutex_lock(&ctrl->lock);
 	}
 	mutex_unlock(&ctrl->lock);
 }
 
-static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
+static void nvmet_async_events_fail_all(struct nvmet_ctrl *ctrl)
 {
 	struct nvmet_req *req;
 
@@ -167,12 +168,14 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
 	mutex_unlock(&ctrl->lock);
 }
 
-static void nvmet_async_event_work(struct work_struct *work)
+static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
 {
-	struct nvmet_ctrl *ctrl =
-		container_of(work, struct nvmet_ctrl, async_event_work);
+	struct nvmet_async_event *aen, *n;
 
-	nvmet_async_events_process(ctrl, 0);
+	list_for_each_entry_safe(aen, n, &ctrl->async_events, entry) {
+		list_del(&aen->entry);
+		kfree(aen);
+	}
 }
 
 void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
@@ -768,17 +771,14 @@ static void nvmet_confirm_sq(struct percpu_ref *ref)
 
 void nvmet_sq_destroy(struct nvmet_sq *sq)
 {
-	u16 status = NVME_SC_INTERNAL | NVME_SC_DNR;
 	struct nvmet_ctrl *ctrl = sq->ctrl;
 
 	/*
 	 * If this is the admin queue, complete all AERs so that our
 	 * queue doesn't have outstanding requests on it.
 	 */
-	if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
-		nvmet_async_events_process(ctrl, status);
-		nvmet_async_events_free(ctrl);
-	}
+	if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq)
+		nvmet_async_events_fail_all(ctrl);
 	percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
 	wait_for_completion(&sq->confirm_done);
 	wait_for_completion(&sq->free_done);
@@ -1368,6 +1368,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 
 	ida_simple_remove(&cntlid_ida, ctrl->cntlid);
 
+	nvmet_async_events_free(ctrl);
 	kfree(ctrl->sqs);
 	kfree(ctrl->cqs);
 	kfree(ctrl->changed_ns_list);


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 17:19                   ` Sagi Grimberg
  2020-05-20 17:23                     ` David Milburn
@ 2020-05-20 17:30                     ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20 17:30 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, David Milburn, linux-nvme, dwagner,
	chaitanya.kulkarni

On Wed, May 20, 2020 at 10:19:27AM -0700, Sagi Grimberg wrote:
> I don't think that patch #1 is required to fix the issue, can you
> confirm that? Christoph said he has some more cleaning up in this
> area.

1 is really just a cleanup.

> Christoph,
> Do you want me to send a formal patch for this?

Take a look at what I just sent.


* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 17:27                   ` Christoph Hellwig
@ 2020-05-20 17:41                     ` Sagi Grimberg
  2020-05-20 17:46                       ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20 17:41 UTC (permalink / raw)
  To: Christoph Hellwig, David Milburn; +Cc: chaitanya.kulkarni, dwagner, linux-nvme



On 5/20/20 10:27 AM, Christoph Hellwig wrote:
> On Wed, May 20, 2020 at 05:39:18AM -0500, David Milburn wrote:
>> Yi was able to reproduce the memleak using the v2 of the patch series
>> since nvmet_async_events_free() ran before nvmet_add_async_event().
>>
>> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
>>
>> With Sagi's patch below, I do see after nvmet_add_async_event(),
>> nvmet_async_events_process pulls the request, decrements
>> ctrl->nr_async_event_cmds to 0, and frees the aen,
>> and no memory leak.
>>
>> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html
> 
> Ok, let's try a version of Sagi's latest suggestion then.  What about
> this (replacement for this patch only, I applied the first one already):

Patch #1 is not needed, but where did you apply it?

Do you think the fix is 5.7-rc material?

The below looks fine, but maybe it would be good to split it into a small
and easy fix that can go to stable, and then a bit more cleanup?

> 
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 54679260e6648..1525426d0a31f 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -129,8 +129,10 @@ static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
>   	return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
>   }
>   
> -static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
> +static void nvmet_async_event_work(struct work_struct *work)
>   {
> +	struct nvmet_ctrl *ctrl =
> +		container_of(work, struct nvmet_ctrl, async_event_work);
>   	struct nvmet_async_event *aen;
>   	struct nvmet_req *req;
>   
> @@ -139,21 +141,20 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>   		aen = list_first_entry(&ctrl->async_events,
>   				       struct nvmet_async_event, entry);
>   		req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
> -		if (status == 0)
> -			nvmet_set_result(req, nvmet_async_event_result(aen));
> +		nvmet_set_result(req, nvmet_async_event_result(aen));
>   
>   		list_del(&aen->entry);
>   		kfree(aen);
>   
>   		mutex_unlock(&ctrl->lock);
>   		trace_nvmet_async_event(ctrl, req->cqe->result.u32);
> -		nvmet_req_complete(req, status);
> +		nvmet_req_complete(req, 0);
>   		mutex_lock(&ctrl->lock);
>   	}
>   	mutex_unlock(&ctrl->lock);
>   }
>   
> -static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
> +static void nvmet_async_events_fail_all(struct nvmet_ctrl *ctrl)
>   {
>   	struct nvmet_req *req;
>   
> @@ -167,12 +168,14 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   	mutex_unlock(&ctrl->lock);
>   }
>   
> -static void nvmet_async_event_work(struct work_struct *work)
> +static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   {
> -	struct nvmet_ctrl *ctrl =
> -		container_of(work, struct nvmet_ctrl, async_event_work);
> +	struct nvmet_async_event *aen, *n;
>   
> -	nvmet_async_events_process(ctrl, 0);
> +	list_for_each_entry_safe(aen, n, &ctrl->async_events, entry) {
> +		list_del(&aen->entry);
> +		kfree(aen);
> +	}
>   }
>   
>   void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
> @@ -768,17 +771,14 @@ static void nvmet_confirm_sq(struct percpu_ref *ref)
>   
>   void nvmet_sq_destroy(struct nvmet_sq *sq)
>   {
> -	u16 status = NVME_SC_INTERNAL | NVME_SC_DNR;
>   	struct nvmet_ctrl *ctrl = sq->ctrl;
>   
>   	/*
>   	 * If this is the admin queue, complete all AERs so that our
>   	 * queue doesn't have outstanding requests on it.
>   	 */
> -	if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
> -		nvmet_async_events_process(ctrl, status);
> -		nvmet_async_events_free(ctrl);
> -	}
> +	if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq)
> +		nvmet_async_events_fail_all(ctrl);
>   	percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
>   	wait_for_completion(&sq->confirm_done);
>   	wait_for_completion(&sq->free_done);
> @@ -1368,6 +1368,7 @@ static void nvmet_ctrl_free(struct kref *ref)
>   
>   	ida_simple_remove(&cntlid_ida, ctrl->cntlid);
>   
> +	nvmet_async_events_free(ctrl);
>   	kfree(ctrl->sqs);
>   	kfree(ctrl->cqs);
>   	kfree(ctrl->changed_ns_list);
> 

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 17:41                     ` Sagi Grimberg
@ 2020-05-20 17:46                       ` Christoph Hellwig
  2020-05-20 18:04                         ` Sagi Grimberg
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20 17:46 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, David Milburn, chaitanya.kulkarni, dwagner,
	linux-nvme

On Wed, May 20, 2020 at 10:41:56AM -0700, Sagi Grimberg wrote:
> 
> 
> On 5/20/20 10:27 AM, Christoph Hellwig wrote:
> > On Wed, May 20, 2020 at 05:39:18AM -0500, David Milburn wrote:
> > > Yi was able to reproduce the memleak using v2 of the patch series
> > > since nvmet_async_events_free() ran before nvmet_add_async_event().
> > > 
> > > http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
> > > 
> > > With Sagi's patch below, I do see after nvmet_add_async_event(),
> > > nvmet_async_events_process pulls the request, decrements
> > > ctrl->nr_async_event_cmds to 0, and frees the aen,
> > > and no memory leak.
> > > 
> > > http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html
> > 
> > Ok, let's try a version of Sagi's latest suggestion then.  What about
> > this (replacement for this patch only, I applied the first one already):
> 
> Patch #1 is not needed, but where did you apply it?

nvme-5.8.

> Do you think the fix is 5.7-rc material?

At this point I don't think a memleak that has been there since day 1
is 5.7 material.  I'd rather wait for 5.8 and get a stable backport
with the proper Fixes tag.

> The below looks fine, but maybe it would be good to split out a small
> and easy fix that can go to stable, and then do a bit more cleanup?

What would the small and simple fix be?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 17:46                       ` Christoph Hellwig
@ 2020-05-20 18:04                         ` Sagi Grimberg
  2020-05-20 18:15                           ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20 18:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chaitanya.kulkarni, David Milburn, dwagner, linux-nvme



On 5/20/20 10:46 AM, Christoph Hellwig wrote:
> On Wed, May 20, 2020 at 10:41:56AM -0700, Sagi Grimberg wrote:
>>
>>
>> On 5/20/20 10:27 AM, Christoph Hellwig wrote:
>>> On Wed, May 20, 2020 at 05:39:18AM -0500, David Milburn wrote:
>>>> Yi was able to reproduce the memleak using v2 of the patch series
>>>> since nvmet_async_events_free() ran before nvmet_add_async_event().
>>>>
>>>> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
>>>>
>>>> With Sagi's patch below, I do see after nvmet_add_async_event(),
>>>> nvmet_async_events_process pulls the request, decrements
>>>> ctrl->nr_async_event_cmds to 0, and frees the aen,
>>>> and no memory leak.
>>>>
>>>> http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html
>>>
>>> Ok, let's try a version of Sagi's latest suggestion then.  What about
>>> this (replacement for this patch only, I applied the first one already):
>>
>> Patch #1 is not needed, but where did you apply it?
> 
> nvme-5.8.
> 
>> Do you think the fix is 5.7-rc material?
> 
> At this point I don't think a memleak that has been there since day 1
> is 5.7 material.  I'd rather wait for 5.8 and get a stable backport
> with the proper Fixes tag.

Makes sense.

>> The below looks fine, but maybe it would be good to split out a small
>> and easy fix that can go to stable, and then do a bit more cleanup?
> 
> What would the small and simple fix be?

Just the fix-related part. Then have a small cleanup incrementally:
move nvmet_async_events_free() to nvmet_ctrl_free() and make it
just free the aens.
--
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index edf54d9957b7..48f5123d875b 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -158,14 +158,12 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)

  static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
  {
-       struct nvmet_req *req;
+       struct nvmet_async_event *aen, *tmp;

         mutex_lock(&ctrl->lock);
-       while (ctrl->nr_async_event_cmds) {
-               req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
-               mutex_unlock(&ctrl->lock);
-               nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
-               mutex_lock(&ctrl->lock);
+       list_for_each_entry_safe(aen, tmp, &ctrl->async_events, entry) {
+               list_del(&aen->entry);
+               kfree(aen);
         }
         mutex_unlock(&ctrl->lock);
  }
@@ -791,10 +789,8 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
          * If this is the admin queue, complete all AERs so that our
          * queue doesn't have outstanding requests on it.
          */
-       if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
+       if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq)
                 nvmet_async_events_process(ctrl, status);
-               nvmet_async_events_free(ctrl);
-       }
         percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
         wait_for_completion(&sq->confirm_done);
         wait_for_completion(&sq->free_done);
@@ -1427,6 +1423,7 @@ static void nvmet_ctrl_free(struct kref *ref)

         ida_simple_remove(&cntlid_ida, ctrl->cntlid);

+       nvmet_async_events_free(ctrl);
         kfree(ctrl->sqs);
         kfree(ctrl->cqs);
         kfree(ctrl->changed_ns_list);
--


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 18:04                         ` Sagi Grimberg
@ 2020-05-20 18:15                           ` Christoph Hellwig
  2020-05-20 19:40                             ` Sagi Grimberg
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2020-05-20 18:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, David Milburn, chaitanya.kulkarni, dwagner,
	linux-nvme

On Wed, May 20, 2020 at 11:04:28AM -0700, Sagi Grimberg wrote:
> 
> 
> On 5/20/20 10:46 AM, Christoph Hellwig wrote:
> > On Wed, May 20, 2020 at 10:41:56AM -0700, Sagi Grimberg wrote:
> > > 
> > > 
> > > On 5/20/20 10:27 AM, Christoph Hellwig wrote:
> > > > On Wed, May 20, 2020 at 05:39:18AM -0500, David Milburn wrote:
> > > > > Yi was able to reproduce the memleak using v2 of the patch series
> > > > > since nvmet_async_events_free() ran before nvmet_add_async_event().
> > > > > 
> > > > > http://lists.infradead.org/pipermail/linux-nvme/2020-May/030512.html
> > > > > 
> > > > > With Sagi's patch below, I do see after nvmet_add_async_event(),
> > > > > nvmet_async_events_process pulls the request, decrements
> > > > > ctrl->nr_async_event_cmds to 0, and frees the aen,
> > > > > and no memory leak.
> > > > > 
> > > > > http://lists.infradead.org/pipermail/linux-nvme/2020-May/030548.html
> > > > 
> > > > Ok, let's try a version of Sagi's latest suggestion then.  What about
> > > > this (replacement for this patch only, I applied the first one already):
> > > 
> > > Patch #1 is not needed, but where did you apply it?
> > 
> > nvme-5.8.
> > 
> > > Do you think the fix is 5.7-rc material?
> > 
> > At this point I don't think a memleak that has been there since day 1
> > is 5.7 material.  I'd rather wait for 5.8 and get a stable backport
> > with the proper Fixes tag.
> 
> Makes sense.
> 
> > > The below looks fine, but maybe it would be good to split out a small
> > > and easy fix that can go to stable, and then do a bit more cleanup?
> > 
> > What would the small and simple fix be?
> 
> Just the fix related part. Then have a small cleanup incrementally.
> move the nvmet_async_events_free to nvmet_ctrl_free and make it
> just free the aens.

Ok.  Do you want to send a proper patch?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
  2020-05-20 18:15                           ` Christoph Hellwig
@ 2020-05-20 19:40                             ` Sagi Grimberg
  0 siblings, 0 replies; 29+ messages in thread
From: Sagi Grimberg @ 2020-05-20 19:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chaitanya.kulkarni, David Milburn, dwagner, linux-nvme


>> Just the fix-related part. Then have a small cleanup incrementally:
>> move nvmet_async_events_free() to nvmet_ctrl_free() and make it
>> just free the aens.
> 
> Ok.  Do you want to send a proper patch?

Sure, I can do that.


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-05-20 19:40 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-18 18:59 [PATCH v2 0/2] nvmet: fixup processing async events David Milburn
2020-05-18 18:59 ` [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen David Milburn
2020-05-20 17:18   ` Christoph Hellwig
2020-05-18 18:59 ` [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free David Milburn
2020-05-19  8:33   ` Sagi Grimberg
2020-05-19 19:14     ` David Milburn
2020-05-19 20:42       ` Sagi Grimberg
2020-05-19 20:51         ` Sagi Grimberg
2020-05-20  6:18           ` Christoph Hellwig
2020-05-20  7:01             ` Sagi Grimberg
2020-05-20  6:16     ` Christoph Hellwig
2020-05-20  6:59       ` Sagi Grimberg
2020-05-20  7:03         ` Christoph Hellwig
2020-05-20  7:08           ` Sagi Grimberg
2020-05-20  7:15             ` Christoph Hellwig
2020-05-20  8:06               ` Sagi Grimberg
2020-05-20 10:39                 ` David Milburn
2020-05-20 17:19                   ` Sagi Grimberg
2020-05-20 17:23                     ` David Milburn
2020-05-20 17:30                     ` Christoph Hellwig
2020-05-20 17:27                   ` Christoph Hellwig
2020-05-20 17:41                     ` Sagi Grimberg
2020-05-20 17:46                       ` Christoph Hellwig
2020-05-20 18:04                         ` Sagi Grimberg
2020-05-20 18:15                           ` Christoph Hellwig
2020-05-20 19:40                             ` Sagi Grimberg
2020-05-19  8:40 ` [PATCH v2 0/2] nvmet: fixup processing async events Chaitanya Kulkarni
2020-05-19 19:17   ` David Milburn
2020-05-20  6:20     ` hch
