From: Sagi Grimberg <sagi@grimberg.me>
To: David Milburn <dmilburn@redhat.com>, linux-nvme@lists.infradead.org
Cc: hch@infradead.org, chaitanya.kulkarni@wdc.com, dwagner@suse.de
Subject: Re: [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free
Date: Tue, 19 May 2020 13:51:38 -0700
Message-ID: <37746101-6300-4364-079d-c6850d2d55d5@grimberg.me>
In-Reply-To: <a47fb849-df48-1d7e-d34a-269257487393@grimberg.me>



On 5/19/20 1:42 PM, Sagi Grimberg wrote:
> 
> 
> On 5/19/20 12:14 PM, David Milburn wrote:
>> Hi Sagi,
>>
>> On 05/19/2020 03:33 AM, Sagi Grimberg wrote:
>>>
>>>
>>> On 5/18/20 11:59 AM, David Milburn wrote:
>>>> Make sure we free all resources, including any remaining aens,
>>>> which would otherwise be leaked.
>>>>
>>>> $ cat /sys/kernel/debug/kmemleak
>>>> unreferenced object 0xffff888c1af2c000 (size 32):
>>>>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>>>>    hex dump (first 32 bytes):
>>>>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>>>>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>>>>    backtrace:
>>>>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>>>>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>>>>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>>>>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>>>>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>>>>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>>>>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>>>>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>>>>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>>>>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>>>
>>>> Steps to Reproduce:
>>>>
>>>> target:
>>>> 1. nvmetcli restore rdma.json
>>>>
>>>> client:
>>>> 2. nvme connect -t rdma -a $IP -s 4420 -n testnqn
>>>>
>>>> target:
>>>> 3. nvmetcli clear
>>>> 4. sleep 5 && nvmetcli restore rdma.json
>>>> 5. cat /sys/kernel/debug/kmemleak after 5 minutes
>>>>
>>>> Reported-by: Yi Zhang <yi.zhang@redhat.com>
>>>> Signed-off-by: David Milburn <dmilburn@redhat.com>
>>>> ---
>>>> Changes from v1:
>>>>   - declare struct nvmet_async_event in this patch.
>>>>
>>>>   drivers/nvme/target/core.c | 8 ++++++++
>>>>   1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>>> index dc036a815d39..dda888672f31 100644
>>>> --- a/drivers/nvme/target/core.c
>>>> +++ b/drivers/nvme/target/core.c
>>>> @@ -154,6 +154,7 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>>   {
>>>> +    struct nvmet_async_event *aen;
>>>>       struct nvmet_req *req;
>>>>       mutex_lock(&ctrl->lock);
>>>> @@ -163,6 +164,13 @@ static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>>           nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>>>           mutex_lock(&ctrl->lock);
>>>>       }
>>>> +
>>>> +    while (!list_empty(&ctrl->async_events)) {
>>>> +        aen = list_first_entry(&ctrl->async_events,
>>>> +                       struct nvmet_async_event, entry);
>>>> +        list_del(&aen->entry);
>>>> +        kfree(aen);
>>>> +    }
>>>>       mutex_unlock(&ctrl->lock);
>>>>   }
>>>
>>> Something here looks wrong to me... There is no reason to free aens
>>> here...
>>>
>>> Also, judging by the prior discussion on this patch, we don't actually
>>> take anything from the list if we don't have an available slot, so I
>>> don't see how patch #1 helps anything...
>>>
>>> Did you analyze the root cause of the issue? It's not clear what the
>>> root cause is here...
>>>
>>> Looking at the code, nvmet_async_events_free, which is designed to free
>>> all the pending aens that are not going to be sent anywhere, is not
>>> freeing anything... It's also not clear to me from the code how the
>>> ctrl->async_events list and ctrl->nr_async_event_cmds can become
>>> uncorrelated...
>>>
>>> Does this patch solve your issue?
>>> -- 
>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>> index b685f99d56a1..190d36ceda47 100644
>>> --- a/drivers/nvme/target/core.c
>>> +++ b/drivers/nvme/target/core.c
>>> @@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>>
>>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>   {
>>> +       struct nvmet_async_event *aen;
>>>          struct nvmet_req *req;
>>>
>>>          mutex_lock(&ctrl->lock);
>>>          while (ctrl->nr_async_event_cmds) {
>>> +               aen = list_first_entry(&ctrl->async_events,
>>> +                               struct nvmet_async_event, entry);
>>> +               list_del(&aen->entry);
>>> +               kfree(aen);
>>>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>>>                  mutex_unlock(&ctrl->lock);
>>>                  nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>> -- 
>>>
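Sagi's question above, about how ctrl->async_events and ctrl->nr_async_event_cmds can become uncorrelated, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not nvmet code: the aen list and the AER command array are reduced to two counters, and `struct model_ctrl` and `model_events_process()` are hypothetical names. The pairing loop only delivers an event when both a pending aen and a queued command exist, so either side can be left over, but never both:

```c
#include <assert.h>

/*
 * Hypothetical counter-only model of the nvmet controller state: each
 * pending aen and each queued AER command is reduced to a count.
 */
struct model_ctrl {
	int nr_pending_aens;      /* stands in for the ctrl->async_events list */
	int nr_async_event_cmds;  /* queued AER commands waiting for an event */
	int nr_delivered;         /* AER commands completed with an event */
};

/*
 * Pair one pending aen with one queued command until either side runs out,
 * mirroring the "break out of while (1)" behaviour described in the thread.
 */
static void model_events_process(struct model_ctrl *c)
{
	while (c->nr_pending_aens > 0 && c->nr_async_event_cmds > 0) {
		c->nr_pending_aens--;
		c->nr_async_event_cmds--;
		c->nr_delivered++;
	}
	/* After a pass, at most one of the two counts can be nonzero. */
	assert(c->nr_pending_aens == 0 || c->nr_async_event_cmds == 0);
}
```

For example, one queued command with no pending aens (the state right after connect in David's trace) is left untouched, while three aens and two commands leave one aen queued with no command to carry it.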
>>
>> The above doesn't solve the issue. This is what I see with
>> the handling of ctrl->async_events and ctrl->nr_async_event_cmds.
>>
>> After host system connects to target
>>
>> nvmet_rdma_handle_command
>>   nvmet_rdma_execute_command
>>    nvmet_execute_async_event
>>
>> Now the request is added to async_event_cmds and
>> ctrl->nr_async_event_cmds is incremented.
>>
>> (I used the patch above, not patch #1 of this series.)
>>
>> nvmet_async_events_process
>>
>> So, at this point nothing has been added to ctrl->async_events
>> and ctrl->nr_async_event_cmds is 1, so the driver breaks out
>> of the while (1) loop.
>>
>> Next, the test does "nvmetcli clear":
>>
>> nvmet_sq_destroy
>>   nvmet_async_events_process
>>
>> Same as before, nothing has been added to ctrl->async_events
>> and ctrl->nr_async_event_cmds is 1, so the driver breaks out
>> of the while (1) loop.
>>
>> nvmet_async_events_free
>>
>> Nothing has been added to ctrl->async_events yet, so the
>> driver pulls the request, decrements ctrl->nr_async_event_cmds to 0,
>> unlocks ctrl->lock, and calls nvmet_req_complete.
>>
>> Then,
>>
>> nvmet_ns_free
>>   nvmet_ns_disable
>>    nvmet_ns_changed
>>     nvmet_add_async_event
>>
>> Now at this point we add the entry to ctrl->async_events and go
>> back through nvmet_async_events_process. We have an entry
>> on ctrl->async_events, but ctrl->nr_async_event_cmds is 0,
>> so the driver breaks out of the while (1) loop.
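The ordering David traces can be replayed with a small counter model to show why an aen is left behind: the queued AER command is consumed by nvmet_async_events_free before nvmet_ns_free queues the namespace-change event, so the delivery loop has nothing to pair it with. This is a single-threaded sketch without locking, and every `model_` name is hypothetical rather than part of nvmet:

```c
#include <assert.h>

/* Hypothetical counter-only model; these names are illustrative, not nvmet's. */
struct model_ctrl {
	int nr_pending_aens;      /* stands in for ctrl->async_events */
	int nr_async_event_cmds;  /* queued AER commands */
};

/* Delivery loop as described in the thread: stop when either side is empty. */
static void model_events_process(struct model_ctrl *c)
{
	while (c->nr_pending_aens > 0 && c->nr_async_event_cmds > 0) {
		c->nr_pending_aens--;
		c->nr_async_event_cmds--;
	}
}

/* Teardown as traced: fail every queued command, leave the aen list alone. */
static void model_events_free_cmds_only(struct model_ctrl *c)
{
	while (c->nr_async_event_cmds > 0)
		c->nr_async_event_cmds--;
}

/* Queue an aen, then run the delivery loop (the trace's nvmet_add_async_event step). */
static void model_add_async_event(struct model_ctrl *c)
{
	c->nr_pending_aens++;
	model_events_process(c);
}

/*
 * Replay the call order from the trace and return how many aens are still
 * queued once teardown is over, i.e. how many would be leaked.
 */
static int model_replay_trace(void)
{
	struct model_ctrl c = { 0, 0 };

	/* Host connects: nvmet_execute_async_event queues one AER command. */
	c.nr_async_event_cmds++;
	model_events_process(&c);
	assert(c.nr_async_event_cmds == 1 && c.nr_pending_aens == 0);

	/* "nvmetcli clear": nvmet_sq_destroy processes, then frees commands. */
	model_events_process(&c);
	model_events_free_cmds_only(&c);
	assert(c.nr_async_event_cmds == 0);

	/* nvmet_ns_free -> ... -> nvmet_add_async_event runs afterwards. */
	model_add_async_event(&c);

	return c.nr_pending_aens;
}
```

The replay ends with one aen still queued and no command left to deliver it, which matches the single 32-byte object in the kmemleak report.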
> 
> And there is your problem: the admin SQ was destroyed before
> the async event was processed, and nothing cleans it up
> during controller removal.
> 
> How about this?
> -- 
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index b685f99d56a1..027166c7d172 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -157,10 +157,15 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
> 
>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>   {
> +       struct nvmet_async_event *aen;
>          struct nvmet_req *req;
> 
>          mutex_lock(&ctrl->lock);
>          while (ctrl->nr_async_event_cmds) {
> +               aen = list_first_entry(&ctrl->async_events,
> +                               struct nvmet_async_event, entry);
> +               list_del(&aen->entry);
> +               kfree(aen);
>                  req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>                  mutex_unlock(&ctrl->lock);
>                  nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);

Umm, and the unlock/nvmet_req_complete/relock section needs to be removed now, of course...

The loop here needs to be:
--
         mutex_lock(&ctrl->lock);
         while (ctrl->nr_async_event_cmds) {
                 aen = list_first_entry(&ctrl->async_events,
                                 struct nvmet_async_event, entry);
                 list_del(&aen->entry);
                 kfree(aen);
                 req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
         }
         mutex_unlock(&ctrl->lock);
--
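As quoted, this loop is bounded by ctrl->nr_async_event_cmds, which David's trace shows is already 0 when the orphaned aen is queued, and it calls list_first_entry() without first checking that ctrl->async_events is non-empty. Below is a minimal userspace sketch of a drain keyed on the list itself, assuming simplified stand-ins for struct list_head and struct nvmet_async_event. Every `model_` name is hypothetical and is not the kernel's list.h API, and ctrl->lock is omitted because the model is single-threaded:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for struct list_head (hypothetical helpers,
 * not the kernel's include/linux/list.h). */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head;
	head->prev = head;
}

static int model_list_empty(const struct model_list *head)
{
	return head->next == head;
}

static void model_list_add_tail(struct model_list *node, struct model_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void model_list_del(struct model_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct nvmet_async_event. */
struct model_aen {
	struct model_list entry;
	int event_type;
};

/* Hypothetical stand-in for the relevant part of struct nvmet_ctrl. */
struct model_ctrl {
	struct model_list async_events;
	int nr_async_event_cmds;
};

static void model_queue_aen(struct model_ctrl *ctrl, int event_type)
{
	struct model_aen *aen = malloc(sizeof(*aen));

	assert(aen != NULL);
	aen->event_type = event_type;
	model_list_add_tail(&aen->entry, &ctrl->async_events);
}

/*
 * Drain keyed on the list rather than on nr_async_event_cmds: every queued
 * aen is freed even when no AER command is outstanding. Returns the count.
 */
static int model_events_free(struct model_ctrl *ctrl)
{
	int freed = 0;

	while (!model_list_empty(&ctrl->async_events)) {
		struct model_aen *aen = model_container_of(ctrl->async_events.next,
							   struct model_aen, entry);

		model_list_del(&aen->entry);
		free(aen);
		freed++;
	}
	return freed;
}
```

In kernel code the same walk is usually written with list_for_each_entry_safe() while holding ctrl->lock.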

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

Thread overview: 29+ messages
2020-05-18 18:59 [PATCH v2 0/2] nvmet: fixup processing async events David Milburn
2020-05-18 18:59 ` [PATCH v2 1/2] nvmet: check command slot before pulling and freeing aen David Milburn
2020-05-20 17:18   ` Christoph Hellwig
2020-05-18 18:59 ` [PATCH v2 2/2] nvmet: avoid memleak by freeing any remaining aens in nvmet_async_events_free David Milburn
2020-05-19  8:33   ` Sagi Grimberg
2020-05-19 19:14     ` David Milburn
2020-05-19 20:42       ` Sagi Grimberg
2020-05-19 20:51         ` Sagi Grimberg [this message]
2020-05-20  6:18           ` Christoph Hellwig
2020-05-20  7:01             ` Sagi Grimberg
2020-05-20  6:16     ` Christoph Hellwig
2020-05-20  6:59       ` Sagi Grimberg
2020-05-20  7:03         ` Christoph Hellwig
2020-05-20  7:08           ` Sagi Grimberg
2020-05-20  7:15             ` Christoph Hellwig
2020-05-20  8:06               ` Sagi Grimberg
2020-05-20 10:39                 ` David Milburn
2020-05-20 17:19                   ` Sagi Grimberg
2020-05-20 17:23                     ` David Milburn
2020-05-20 17:30                     ` Christoph Hellwig
2020-05-20 17:27                   ` Christoph Hellwig
2020-05-20 17:41                     ` Sagi Grimberg
2020-05-20 17:46                       ` Christoph Hellwig
2020-05-20 18:04                         ` Sagi Grimberg
2020-05-20 18:15                           ` Christoph Hellwig
2020-05-20 19:40                             ` Sagi Grimberg
2020-05-19  8:40 ` [PATCH v2 0/2] nvmet: fixup processing async events Chaitanya Kulkarni
2020-05-19 19:17   ` David Milburn
2020-05-20  6:20     ` hch
