* [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
@ 2022-02-21 11:37 Yi Zhang
  2022-02-21 12:37 ` Yi Zhang
  2022-03-08 15:51 ` Max Gurtovoy
  0 siblings, 2 replies; 12+ messages in thread
From: Yi Zhang @ 2022-02-21 11:37 UTC (permalink / raw)
  To: RDMA mailing list; +Cc: Sagi Grimberg

Hello

The following kmemleak reports were triggered when I did nvme connect/reset/disconnect
operations on the latest 5.17.0-rc5; please check them.

# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8883e398bc00 (size 192):
  comm "nvme", pid 2632, jiffies 4295317772 (age 2951.476s)
  hex dump (first 32 bytes):
    80 50 84 a3 ff ff ff ff 70 d4 12 67 81 88 ff ff  .P......p..g....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
    [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
    [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
    [<00000000aade682c>] blk_alloc_queue+0x400/0x840
    [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
    [<00000000cbff6d39>] nvme_rdma_setup_ctrl+0x4ca/0x15f0 [nvme_rdma]
    [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
    [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000031d8624b>] vfs_write+0x17e/0x9a0
    [<00000000471d7945>] ksys_write+0xf1/0x1c0
    [<00000000a963bc79>] do_syscall_64+0x3a/0x80
    [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8883e398a700 (size 192):
  comm "nvme", pid 2632, jiffies 4295317782 (age 2951.466s)
  hex dump (first 32 bytes):
    80 50 84 a3 ff ff ff ff 60 c8 12 67 81 88 ff ff  .P......`..g....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
    [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
    [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
    [<00000000aade682c>] blk_alloc_queue+0x400/0x840
    [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
    [<000000004f80b965>] nvme_rdma_setup_ctrl+0xf37/0x15f0 [nvme_rdma]
    [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
    [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000031d8624b>] vfs_write+0x17e/0x9a0
    [<00000000471d7945>] ksys_write+0xf1/0x1c0
    [<00000000a963bc79>] do_syscall_64+0x3a/0x80
    [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8894253d9d00 (size 192):
  comm "nvme", pid 2632, jiffies 4295331915 (age 2937.333s)
  hex dump (first 32 bytes):
    80 50 84 a3 ff ff ff ff 80 e0 12 67 81 88 ff ff  .P.........g....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
    [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
    [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
    [<00000000aade682c>] blk_alloc_queue+0x400/0x840
    [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
    [<000000009f9abba5>] nvme_rdma_setup_ctrl.cold.70+0x5ee/0xb01 [nvme_rdma]
    [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
    [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000031d8624b>] vfs_write+0x17e/0x9a0
    [<00000000471d7945>] ksys_write+0xf1/0x1c0
    [<00000000a963bc79>] do_syscall_64+0x3a/0x80
    [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae



-- 
Best Regards,
  Yi Zhang



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-02-21 11:37 [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing Yi Zhang
@ 2022-02-21 12:37 ` Yi Zhang
  2022-03-08 15:51 ` Max Gurtovoy
  1 sibling, 0 replies; 12+ messages in thread
From: Yi Zhang @ 2022-02-21 12:37 UTC (permalink / raw)
  To: RDMA mailing list, open list:NVM EXPRESS DRIVER; +Cc: Sagi Grimberg

Adding the linux-nvme mailing list.
On Mon, Feb 21, 2022 at 7:37 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> Hello
>
> Below kmemleak triggered when I do nvme connect/reset/disconnect
> operations on latest 5.17.0-rc5, pls check it.
>
> # cat /sys/kernel/debug/kmemleak
> [...]



-- 
Best Regards,
  Yi Zhang



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-02-21 11:37 [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing Yi Zhang
  2022-02-21 12:37 ` Yi Zhang
@ 2022-03-08 15:51 ` Max Gurtovoy
  2022-03-08 22:59   ` Yi Zhang
  1 sibling, 1 reply; 12+ messages in thread
From: Max Gurtovoy @ 2022-03-08 15:51 UTC (permalink / raw)
  To: Yi Zhang, RDMA mailing list; +Cc: Sagi Grimberg

Hi Yi Zhang,

Please send the commands to reproduce it.

I ran the following without managing to reproduce it:

for i in `seq 100`; do echo $i &&  cat /sys/kernel/debug/kmemleak && 
echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 && 
sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done

-Max.

On 2/21/2022 1:37 PM, Yi Zhang wrote:
> Hello
>
> Below kmemleak triggered when I do nvme connect/reset/disconnect
> operations on latest 5.17.0-rc5, pls check it.
>
> # cat /sys/kernel/debug/kmemleak
> [...]


* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-08 15:51 ` Max Gurtovoy
@ 2022-03-08 22:59   ` Yi Zhang
  2022-03-10 11:52     ` Max Gurtovoy
  0 siblings, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2022-03-08 22:59 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: RDMA mailing list, Sagi Grimberg, open list:NVM EXPRESS DRIVER

On Tue, Mar 8, 2022 at 11:51 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>
> Hi Yi Zhang,
>
> Please send the commands to repro.
>
> I run the following with no success to repro:
>
> for i in `seq 100`; do echo $i &&  cat /sys/kernel/debug/kmemleak &&
> echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 &&
> sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done

Hi Max,
Sorry, I should have added more details when I reported it.
The kmemleak was observed while I was reproducing the "nvme reset" timeout
issue we discussed before [1], and the commands I used are listed in [2].

[1]
https://lore.kernel.org/linux-nvme/CAHj4cs_ir917u7Up5PBfwWpZtnVLey69pXXNjFNAjbqQ5vwU0w@mail.gmail.com/T/#m5e6dcc434fc1925b18047c348226cfbc48ffbd14
[2]
# nvme connect to target
# nvme reset /dev/nvme0
# nvme disconnect-all
# sleep 10
# echo scan > /sys/kernel/debug/kmemleak
# sleep 60
# cat /sys/kernel/debug/kmemleak
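
For reference, the sequence in [2] can be wrapped into a small script. This is
only a sketch: the transport, target address, port and subsystem NQN below are
placeholders rather than the values from my setup, so adjust them to the local
test target before running it.

#!/bin/bash
# Placeholder connection parameters -- adjust to the local test target.
TRTYPE=rdma
TRADDR=192.168.1.1
TRSVCID=4420
NQN=testnqn

nvme connect -t "$TRTYPE" -a "$TRADDR" -s "$TRSVCID" -n "$NQN"
nvme reset /dev/nvme0
nvme disconnect-all
sleep 10
echo scan > /sys/kernel/debug/kmemleak
sleep 60
cat /sys/kernel/debug/kmemleak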


>
> -Max.
>
> On 2/21/2022 1:37 PM, Yi Zhang wrote:
> > Hello
> >
> > Below kmemleak triggered when I do nvme connect/reset/disconnect
> > operations on latest 5.17.0-rc5, pls check it.
> >
> > # cat /sys/kernel/debug/kmemleak
> > [...]


-- 
Best Regards,
  Yi Zhang



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-08 22:59   ` Yi Zhang
@ 2022-03-10 11:52     ` Max Gurtovoy
  2022-03-19  6:54       ` Yi Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Max Gurtovoy @ 2022-03-10 11:52 UTC (permalink / raw)
  To: Yi Zhang
  Cc: RDMA mailing list, Sagi Grimberg, open list:NVM EXPRESS DRIVER,
	Nitzan Carmi, Israel Rukshin


On 3/9/2022 12:59 AM, Yi Zhang wrote:
> On Tue, Mar 8, 2022 at 11:51 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>> Hi Yi Zhang,
>>
>> Please send the commands to repro.
>>
>> I run the following with no success to repro:
>>
>> for i in `seq 100`; do echo $i &&  cat /sys/kernel/debug/kmemleak &&
>> echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 &&
>> sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done
> Hi Max
> Sorry, I should add more details when I report it.
> The kmemleak observed when I was reproducing the "nvme reset" timeout
> issue we discussed before[1], and the cmd I used are[2]
>
> [1]
> https://lore.kernel.org/linux-nvme/CAHj4cs_ir917u7Up5PBfwWpZtnVLey69pXXNjFNAjbqQ5vwU0w@mail.gmail.com/T/#m5e6dcc434fc1925b18047c348226cfbc48ffbd14
> [2]
> # nvme connect to target
> # nvme reset /dev/nvme0
> # nvme disconnect-all
> # sleep 10
> # echo scan > /sys/kernel/debug/kmemleak
> # sleep 60
> # cat /sys/kernel/debug/kmemleak
>
Thanks, I was able to reproduce it with the above commands.

It is still not clear where the leak is, but I do see some non-symmetric
code in the error flows that we need to fix, plus the keep-alive timing
movement.

It will take me some time to debug this.

Can you reproduce it with the tcp transport as well?

Maybe add some debug prints to catch the exact flow in which it happens?

>> -Max.
>>
>> On 2/21/2022 1:37 PM, Yi Zhang wrote:
>>> Hello
>>>
>>> Below kmemleak triggered when I do nvme connect/reset/disconnect
>>> operations on latest 5.17.0-rc5, pls check it.
>>>
>>> # cat /sys/kernel/debug/kmemleak
>>> [...]


* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-10 11:52     ` Max Gurtovoy
@ 2022-03-19  6:54       ` Yi Zhang
  2022-03-20 13:00         ` Sagi Grimberg
  0 siblings, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2022-03-19  6:54 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: RDMA mailing list, Sagi Grimberg, open list:NVM EXPRESS DRIVER,
	Nitzan Carmi, Israel Rukshin

On Thu, Mar 10, 2022 at 7:52 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>
>
> On 3/9/2022 12:59 AM, Yi Zhang wrote:
> > On Tue, Mar 8, 2022 at 11:51 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> >> Hi Yi Zhang,
> >>
> >> Please send the commands to repro.
> >>
> >> I run the following with no success to repro:
> >>
> >> for i in `seq 100`; do echo $i &&  cat /sys/kernel/debug/kmemleak &&
> >> echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 &&
> >> sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done
> > Hi Max
> > Sorry, I should add more details when I report it.
> > The kmemleak observed when I was reproducing the "nvme reset" timeout
> > issue we discussed before[1], and the cmd I used are[2]
> >
> > [1]
> > https://lore.kernel.org/linux-nvme/CAHj4cs_ir917u7Up5PBfwWpZtnVLey69pXXNjFNAjbqQ5vwU0w@mail.gmail.com/T/#m5e6dcc434fc1925b18047c348226cfbc48ffbd14
> > [2]
> > # nvme connect to target
> > # nvme reset /dev/nvme0
> > # nvme disconnect-all
> > # sleep 10
> > # echo scan > /sys/kernel/debug/kmemleak
> > # sleep 60
> > # cat /sys/kernel/debug/kmemleak
> >
> Thanks I was able to repro it with the above commands.
>
> Still not clear where is the leak is, but I do see some non-symmetric
> code in the error flows that we need to fix. Plus the keep-alive timing
> movement.
>
> It will take some time for me to debug this.
>
> Can you repro it with tcp transport as well ?

Yes, nvme/tcp can also reproduce it; here is the log:

unreferenced object 0xffff8881675f7000 (size 192):
  comm "nvme", pid 3711, jiffies 4296033311 (age 2272.976s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 da 13 81 88 ff ff   Y..............
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<00000000486936b6>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8881675f7600 (size 192):
  comm "nvme", pid 3711, jiffies 4296033320 (age 2272.967s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 22 92 81 88 ff ff   Y........".....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<000000006ca5f9f6>] nvme_tcp_setup_ctrl+0x772/0xbe0 [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8891fb6a3600 (size 192):
  comm "nvme", pid 3711, jiffies 4296033511 (age 2272.776s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 5c 1d 81 88 ff ff   Y........\.....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<000000004a3bf20e>] nvme_tcp_setup_ctrl.cold.57+0x868/0xa5d [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae



>
> maybe add some debug prints to catch the exact flow it happens ?
>
> >> -Max.
> >>
> >> On 2/21/2022 1:37 PM, Yi Zhang wrote:
> >>> Hello
> >>>
> >>> Below kmemleak triggered when I do nvme connect/reset/disconnect
> >>> operations on latest 5.17.0-rc5, pls check it.
> >>>
> >>> # cat /sys/kernel/debug/kmemleak
> >>> [...]
>


-- 
Best Regards,
  Yi Zhang



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-19  6:54       ` Yi Zhang
@ 2022-03-20 13:00         ` Sagi Grimberg
  2022-03-21  2:06           ` Yi Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2022-03-20 13:00 UTC (permalink / raw)
  To: Yi Zhang, Max Gurtovoy
  Cc: RDMA mailing list, open list:NVM EXPRESS DRIVER, Nitzan Carmi,
	Israel Rukshin


>>> # nvme connect to target
>>> # nvme reset /dev/nvme0
>>> # nvme disconnect-all
>>> # sleep 10
>>> # echo scan > /sys/kernel/debug/kmemleak
>>> # sleep 60
>>> # cat /sys/kernel/debug/kmemleak
>>>
>> Thanks I was able to repro it with the above commands.
>>
>> Still not clear where is the leak is, but I do see some non-symmetric
>> code in the error flows that we need to fix. Plus the keep-alive timing
>> movement.
>>
>> It will take some time for me to debug this.
>>
>> Can you repro it with tcp transport as well ?
> 
> Yes, nvme/tcp also can reproduce it, here is the log:
> 
> [...]

Looks like there is some asymmetry in blk_iolatency. It is initialized
when allocating a request queue and exited when deleting a genhd. In
nvme we have request queues that will never have a genhd corresponding
to them (like the admin queue).

Does this patch eliminate the issue?
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d..6ccc02a41f25 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)

         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);

+       rq_qos_exit(q);
         blk_sync_queue(q);
         if (queue_is_mq(q)) {
                 blk_mq_cancel_work_sync(q);
diff --git a/block/genhd.c b/block/genhd.c
index 54f60ded2ee6..10ff0606c100 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -626,7 +626,6 @@ void del_gendisk(struct gendisk *disk)

         blk_mq_freeze_queue_wait(q);

-       rq_qos_exit(q);
         blk_sync_queue(q);
         blk_flush_integrity();
         /*
--


* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-20 13:00         ` Sagi Grimberg
@ 2022-03-21  2:06           ` Yi Zhang
  2022-03-21  9:25             ` Sagi Grimberg
  0 siblings, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2022-03-21  2:06 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, RDMA mailing list, open list:NVM EXPRESS DRIVER,
	Nitzan Carmi, Israel Rukshin

On Sun, Mar 20, 2022 at 9:00 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >>> # nvme connect to target
> >>> # nvme reset /dev/nvme0
> >>> # nvme disconnect-all
> >>> # sleep 10
> >>> # echo scan > /sys/kernel/debug/kmemleak
> >>> # sleep 60
> >>> # cat /sys/kernel/debug/kmemleak
> >>>
> >> Thanks I was able to repro it with the above commands.
> >>
> >> Still not clear where is the leak is, but I do see some non-symmetric
> >> code in the error flows that we need to fix. Plus the keep-alive timing
> >> movement.
> >>
> >> It will take some time for me to debug this.
> >>
> >> Can you repro it with tcp transport as well ?
> >
> > Yes, nvme/tcp also can reproduce it, here is the log:
> >
> > [...]
>
> Looks like there is some asymmetry on blk_iolatency. It is intialized
> when allocating a request queue and exited when deleting a genhd. In
> nvme we have request queues that will never have genhd that corresponds
> to them (like the admin queue).
>
> Does this patch eliminate the issue?

Yes, the nvme/rdma and nvme/tcp kmemleaks are fixed with this change.

> --
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 94bf37f8e61d..6ccc02a41f25 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
>
>          blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>
> +       rq_qos_exit(q);
>          blk_sync_queue(q);
>          if (queue_is_mq(q)) {
>                  blk_mq_cancel_work_sync(q);
> diff --git a/block/genhd.c b/block/genhd.c
> index 54f60ded2ee6..10ff0606c100 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -626,7 +626,6 @@ void del_gendisk(struct gendisk *disk)
>
>          blk_mq_freeze_queue_wait(q);
>
> -       rq_qos_exit(q);
>          blk_sync_queue(q);
>          blk_flush_integrity();
>          /*
> --
>


-- 
Best Regards,
  Yi Zhang



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-21  2:06           ` Yi Zhang
@ 2022-03-21  9:25             ` Sagi Grimberg
  2022-03-22  4:58               ` Yi Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2022-03-21  9:25 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Max Gurtovoy, RDMA mailing list, open list:NVM EXPRESS DRIVER,
	Nitzan Carmi, Israel Rukshin


>>>>> # nvme connect to target
>>>>> # nvme reset /dev/nvme0
>>>>> # nvme disconnect-all
>>>>> # sleep 10
>>>>> # echo scan > /sys/kernel/debug/kmemleak
>>>>> # sleep 60
>>>>> # cat /sys/kernel/debug/kmemleak
>>>>>
>>>> Thanks I was able to repro it with the above commands.
>>>>
>>>> Still not clear where is the leak is, but I do see some non-symmetric
>>>> code in the error flows that we need to fix. Plus the keep-alive timing
>>>> movement.
>>>>
>>>> It will take some time for me to debug this.
>>>>
>>>> Can you repro it with tcp transport as well ?
>>>
>>> Yes, nvme/tcp also can reproduce it, here is the log:

Looks like the offending commit was 8e141f9eb803 ("block: drain file
system I/O on del_gendisk"), which moved the call site for a reason.

However, rq_qos_exit() should be reentrant-safe, so can you verify
that this change eliminates the issue as well?
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d..6ccc02a41f25 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)

         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);

+       rq_qos_exit(q);
         blk_sync_queue(q);
         if (queue_is_mq(q)) {
                 blk_mq_cancel_work_sync(q);
--


* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-21  9:25             ` Sagi Grimberg
@ 2022-03-22  4:58               ` Yi Zhang
  2022-03-22  7:36                 ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2022-03-22  4:58 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, RDMA mailing list, open list:NVM EXPRESS DRIVER,
	Nitzan Carmi, Israel Rukshin, Ming Lei

On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >>>>> # nvme connect to target
> >>>>> # nvme reset /dev/nvme0
> >>>>> # nvme disconnect-all
> >>>>> # sleep 10
> >>>>> # echo scan > /sys/kernel/debug/kmemleak
> >>>>> # sleep 60
> >>>>> # cat /sys/kernel/debug/kmemleak
> >>>>>
> >>>> Thanks I was able to repro it with the above commands.
> >>>>
> >>>> Still not clear where is the leak is, but I do see some non-symmetric
> >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> >>>> movement.
> >>>>
> >>>> It will take some time for me to debug this.
> >>>>
> >>>> Can you repro it with tcp transport as well ?
> >>>
> >>> Yes, nvme/tcp also can reproduce it, here is the log:
>
> Looks like the offending commit was 8e141f9eb803 ("block: drain file
> system I/O on del_gendisk") which moved the call-site for a reason.
>
> However rq_qos_exit() should be reentrant safe, so can you verify
> that this change eliminates the issue as well?

Yes, this change also fixed the kmemleak, thanks.

> --
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 94bf37f8e61d..6ccc02a41f25 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
>
>          blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>
> +       rq_qos_exit(q);
>          blk_sync_queue(q);
>          if (queue_is_mq(q)) {
>                  blk_mq_cancel_work_sync(q);
> --
>


-- 
Best Regards,
  Yi Zhang



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-22  4:58               ` Yi Zhang
@ 2022-03-22  7:36                 ` Ming Lei
  2022-03-22 12:50                   ` Yi Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-03-22  7:36 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Sagi Grimberg, Max Gurtovoy, RDMA mailing list,
	open list:NVM EXPRESS DRIVER, Nitzan Carmi, Israel Rukshin

On Tue, Mar 22, 2022 at 12:58 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> > >>>>> # nvme connect to target
> > >>>>> # nvme reset /dev/nvme0
> > >>>>> # nvme disconnect-all
> > >>>>> # sleep 10
> > >>>>> # echo scan > /sys/kernel/debug/kmemleak
> > >>>>> # sleep 60
> > >>>>> # cat /sys/kernel/debug/kmemleak
> > >>>>>
> > >>>> Thanks I was able to repro it with the above commands.
> > >>>>
> > >>>> Still not clear where is the leak is, but I do see some non-symmetric
> > >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> > >>>> movement.
> > >>>>
> > >>>> It will take some time for me to debug this.
> > >>>>
> > >>>> Can you repro it with tcp transport as well ?
> > >>>
> > >>> Yes, nvme/tcp also can reproduce it, here is the log:
> >
> > Looks like the offending commit was 8e141f9eb803 ("block: drain file
> > system I/O on del_gendisk") which moved the call-site for a reason.
> >
> > However rq_qos_exit() should be reentrant safe, so can you verify
> > that this change eliminates the issue as well?
>
> Yes, this change also fixed the kmemleak, thanks.
>
> > --
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 94bf37f8e61d..6ccc02a41f25 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
> >
> >          blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
> >
> > +       rq_qos_exit(q);
> >          blk_sync_queue(q);
> >          if (queue_is_mq(q)) {
> >                  blk_mq_cancel_work_sync(q);

BTW, a similar fix has been merged into v5.17:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daaca3522a8e67c46e39ef09c1d542e866f85f3b
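
One way to check whether a given tree already has it (just a sketch, assuming
you are inside a kernel git checkout with the release tags available):

# print the earliest tag that contains the fix
git describe --contains daaca3522a8e
# or test a specific tag
git merge-base --is-ancestor daaca3522a8e v5.17 && echo "v5.17 has the fix"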

Thanks,



* Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
  2022-03-22  7:36                 ` Ming Lei
@ 2022-03-22 12:50                   ` Yi Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Yi Zhang @ 2022-03-22 12:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sagi Grimberg, Max Gurtovoy, RDMA mailing list,
	open list:NVM EXPRESS DRIVER, Nitzan Carmi, Israel Rukshin

On Tue, Mar 22, 2022 at 3:36 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Mar 22, 2022 at 12:58 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> >
> > On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> > >
> > >
> > > >>>>> # nvme connect to target
> > > >>>>> # nvme reset /dev/nvme0
> > > >>>>> # nvme disconnect-all
> > > >>>>> # sleep 10
> > > >>>>> # echo scan > /sys/kernel/debug/kmemleak
> > > >>>>> # sleep 60
> > > >>>>> # cat /sys/kernel/debug/kmemleak
> > > >>>>>
> > > >>>> Thanks I was able to repro it with the above commands.
> > > >>>>
> > > >>>> Still not clear where is the leak is, but I do see some non-symmetric
> > > >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> > > >>>> movement.
> > > >>>>
> > > >>>> It will take some time for me to debug this.
> > > >>>>
> > > >>>> Can you repro it with tcp transport as well ?
> > > >>>
> > > >>> Yes, nvme/tcp also can reproduce it, here is the log:
> > >
> > > Looks like the offending commit was 8e141f9eb803 ("block: drain file
> > > system I/O on del_gendisk") which moved the call-site for a reason.
> > >
> > > However rq_qos_exit() should be reentrant safe, so can you verify
> > > that this change eliminates the issue as well?
> >
> > Yes, this change also fixed the kmemleak, thanks.
> >
> > > --
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 94bf37f8e61d..6ccc02a41f25 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
> > >
> > >          blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >
> > > +       rq_qos_exit(q);
> > >          blk_sync_queue(q);
> > >          if (queue_is_mq(q)) {
> > >                  blk_mq_cancel_work_sync(q);
>
> BTW,  the similar fix has been merged to v5.17:

Thanks Ming, I confirmed the kmemleak is fixed on v5.17.

>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daaca3522a8e67c46e39ef09c1d542e866f85f3b
>
> Thanks,
>


-- 
Best Regards,
  Yi Zhang


