linux-nvme.lists.infradead.org archive mirror
From: David Milburn <dmilburn@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH] nvmet: fix memory leak when removing namespaces and controllers concurrently
Date: Thu, 21 May 2020 04:15:36 -0500	[thread overview]
Message-ID: <02cdd11e-7999-7912-3a6b-bd9b898a113e@redhat.com> (raw)
In-Reply-To: <57a9b906-abfe-e0f4-0558-6e5ab2218b88@grimberg.me>

On 05/20/2020 06:58 PM, Sagi Grimberg wrote:
> 
>> Hi Sagi,
>>
>> On 05/20/2020 02:48 PM, Sagi Grimberg wrote:
>>> When removing a namespace, we add an NS_CHANGE async event; however,
>>> if the controller admin queue is removed after the event was added
>>> but before it was processed, we never free the aens, resulting in
>>> the memory leak below [1].
>>>
>>> Fix that by moving nvmet_async_events_free to the final controller
>>> release, after the controller is detached from subsys->ctrls (so no
>>> further async events can be added), and modify it to simply remove
>>> all pending aens.
>>>
>>> -- 
>>> $ cat /sys/kernel/debug/kmemleak
>>> unreferenced object 0xffff888c1af2c000 (size 32):
>>>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>>>    hex dump (first 32 bytes):
>>>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>>>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>>>    backtrace:
>>>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>>>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>>>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>>>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>>>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>>>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>>>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>>>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>>>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>>>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>>
>>> Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
>>> Reported-by: David Milburn <dmilburn@redhat.com>
>>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
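
To make the window concrete, here is a rough timeline of the leak as I
read the backtrace above (a sketch of the ordering only, not verbatim
kernel code):

/*
 * Sketch of the ordering (simplified):
 *
 *   namespace removal (nvmetcli rmdir)     controller teardown
 *   ----------------------------------     -------------------
 *   nvmet_ns_free()
 *     nvmet_ns_disable()
 *       nvmet_ns_changed()
 *         nvmet_add_async_event()
 *           kmalloc()s a struct nvmet_async_event
 *           and queues it on ctrl->async_events
 *
 *                                          nvmet_sq_destroy(admin sq)
 *                                            nvmet_async_events_process()
 *                                              no outstanding AER command
 *                                              is left to carry the event,
 *                                              so the aen stays queued
 *                                            nvmet_async_events_free()
 *                                              the old version only completed
 *                                              ctrl->async_event_cmds and
 *                                              never walked ctrl->async_events
 *
 *                                          nvmet_ctrl_free()
 *                                            the aen is still on the list and
 *                                            nothing frees it, so it leaks
 *
 * With this patch nvmet_ctrl_free() calls nvmet_async_events_free() after
 * the controller has been removed from subsys->ctrls, so no new event can
 * be queued behind the cleanup and anything still pending is kfree()d.
 */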
>>> ---
>>> David, it would be great to get your Tested-by tag.
>>
>> This takes care of the memory leak, though we did see a warning from
>> the mlx5 code when we rebooted the target:
> 
> This looks unrelated... your mlx5 device is timing out at teardown
> resulting in the dev_watchdog complaining. I'd be very surprised if the change caused any of this...
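
For reference, the warning at net/sched/sch_generic.c:443 is the core
networking transmit watchdog. A simplified sketch of the idea (not the
actual dev_watchdog() code):

static void dev_watchdog_sketch(struct net_device *dev)
{
	unsigned int i;

	for (i = 0; i < dev->num_tx_queues; i++) {
		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

		/*
		 * A stopped TX queue that has made no progress within
		 * dev->watchdog_timeo trips the warning ...
		 */
		if (netif_xmit_stopped(txq) &&
		    time_after(jiffies, txq->trans_start + dev->watchdog_timeo)) {
			WARN_ONCE(1, "transmit queue %u timed out\n", i);
			/*
			 * ... and the driver is asked to recover; for mlx5e
			 * that is the mlx5e_safe_reopen_channels() attempt
			 * failing with err(-5) in the log above, because the
			 * HCA teardown itself already failed.
			 */
			dev->netdev_ops->ndo_tx_timeout(dev, i);
			break;
		}
	}
}

Which matches Sagi's read: the splat is a symptom of the mlx5 teardown
failing, not of the nvmet change.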

Ok, thanks Sagi.

Tested-by: David Milburn <dmilburn@redhat.com>

> 
>>
>> [ 3109.755246] ib_srpt received unrecognized IB event 8
>> [ 3109.761400] mlx5_core 0000:04:00.1: 
>> mlx5_cmd_fast_teardown_hca:339:(pid 5135): teardown with fast mode failed
>> [ 3109.773399] mlx5_core 0000:04:00.1: 
>> mlx5_cmd_force_teardown_hca:308:(pid 5135): teardown with force mode 
>> failed, doing normal n
>> [ 3113.147598] mlx5_core 0000:04:00.0: 
>> mlx5_cmd_fast_teardown_hca:339:(pid 5135): teardown with fast mode failed
>> [ 3113.159330] mlx5_core 0000:04:00.0: 
>> mlx5_cmd_force_teardown_hca:308:(pid 5135): teardown with force mode 
>> failed, doing normal n
>> [ 3139.203383] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>> [ 3139.228137] nvmet: ctrl 1 fatal error occurred!
>> [ 3197.571533] ------------[ cut here ]------------
>> [ 3197.577286] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 
>> dev_watchdog+0x258/0x260
>> [ 3197.587074] Modules linked in: null_blk nvme_rdma nvme_fabrics 
>> nvme_core nvmet_rdma nvmet 8021q garp mrp bonding bridge stp lld
>> [ 3197.684124] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.7.0-rc6+ #2
>> [ 3197.691806] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 
>> 1.6.2 01/08/2016
>> [ 3197.700856] RIP: 0010:dev_watchdog+0x258/0x260
>> [ 3197.706512] Code: 48 85 c0 75 e4 eb 9b 4c 89 ef c6 05 ea da dd 00 
>> 01 e8 2c 26 fb ff 89 d9 4c 89 ee 48 c7 c7 a8 cd dc 91 48 89 5
>> [ 3197.728930] RSP: 0000:ffffa68606550e80 EFLAGS: 00010282
>> [ 3197.735495] RAX: 0000000000000000 RBX: 000000000000000c RCX: 
>> ffffffff9205e228
>> [ 3197.744201] RDX: 0000000000000001 RSI: 0000000000000092 RDI: 
>> ffffffff92896d8c
>> [ 3197.752906] RBP: ffff8ab814c003dc R08: 00000000000007a6 R09: 
>> 0000000000000005
>> [ 3197.761613] R10: 0000000000000000 R11: 0000000000000001 R12: 
>> ffff8ab8232f8f40
>> [ 3197.770329] R13: ffff8ab814c00000 R14: ffff8ab814c00480 R15: 
>> 0000000000000140
>> [ 3197.779052] FS:  0000000000000000(0000) GS:ffff8ab83f840000(0000) 
>> knlGS:0000000000000000
>> [ 3197.788855] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3197.796041] CR2: 00007fb509690800 CR3: 0000001000d68004 CR4: 
>> 00000000001606e0
>> [ 3197.804789] Call Trace:
>> [ 3197.808291]  <IRQ>
>> [ 3197.811290]  ? pfifo_fast_enqueue+0x140/0x140
>> [ 3197.816919]  call_timer_fn+0x2d/0x130
>> [ 3197.821765]  run_timer_softirq+0x1a0/0x410
>> [ 3197.827090]  ? tick_sched_timer+0x37/0x70
>> [ 3197.832321]  ? __hrtimer_run_queues+0x110/0x280
>> [ 3197.838136]  __do_softirq+0xde/0x2ec
>> [ 3197.842879]  ? ktime_get+0x36/0xa0
>> [ 3197.847419]  irq_exit+0xe3/0xf0
>> [ 3197.851657]  smp_apic_timer_interrupt+0x74/0x130
>> [ 3197.857542]  apic_timer_interrupt+0xf/0x20
>> [ 3197.862835]  </IRQ>
>> [ 3197.865889] RIP: 0010:mwait_idle+0x94/0x1c0
>> [ 3197.871273] Code: 40 7f 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 
>> 85 a3 00 00 00 e9 07 00 00 00 0f 00 2d 37 dc 4b 00 31 c0 48 8
>> [ 3197.893717] RSP: 0000:ffffa686000ebeb8 EFLAGS: 00000246 ORIG_RAX: 
>> ffffffffffffff13
>> [ 3197.902905] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 
>> 0000000000000000
>> [ 3197.911597] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 
>> ffff8ab83f85fbe0
>> [ 3197.920270] RBP: 0000000000000002 R08: 0000000000000000 R09: 
>> 0000000000000000
>> [ 3197.928934] R10: 000009a747fa1ff4 R11: 0000000000000000 R12: 
>> ffffffffffffffff
>> [ 3197.937581] R13: 0000000000000000 R14: 0000000000000000 R15: 
>> 0000000000000000
>> [ 3197.946225]  ? tick_nohz_idle_stop_tick+0x5e/0xc0
>> [ 3197.952154]  do_idle+0x1bd/0x240
>> [ 3197.956429]  cpu_startup_entry+0x19/0x20
>> [ 3197.961473]  start_secondary+0x15f/0x1b0
>> [ 3197.966499]  secondary_startup_64+0xb6/0xc0
>> [ 3197.971798] ---[ end trace 95e950eb607e4843 ]---
>> [ 3197.977567] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3197.985175] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3198.002627] mlx5_core 0000:05:00.0: mlx5_rsc_dump_trigger:95:(pid 
>> 3203): Resource dump: Failed to access err -5
>> [ 3198.014897] mlx5_core 0000:05:00.0: mlx5_rsc_dump_next:150:(pid 
>> 3203): Resource dump: Failed to trigger dump, -5
>> [ 3198.027229] mlx5_core 0000:05:00.0 mlx5_slave1: EQ 0x13: Cons = 
>> 0xd, irqn = 0x87
>> [ 3198.037669] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3213.443565] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3213.451158] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3213.503496] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3214.467566] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3214.475169] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3214.492699] mlx5_core 0000:05:00.1: mlx5_rsc_dump_trigger:95:(pid 
>> 3615): Resource dump: Failed to access err -5
>> [ 3214.505046] mlx5_core 0000:05:00.1: mlx5_rsc_dump_next:150:(pid 
>> 3615): Resource dump: Failed to trigger dump, -5
>> [ 3214.517502] mlx5_core 0000:05:00.1 mlx5_slave2: EQ 0x12: Cons = 
>> 0xa, irqn = 0xb0
>> [ 3214.526795] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3214.544974] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3228.803601] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3228.811280] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3228.846174] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3229.315601] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3229.323333] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
>> [ 3229.341177] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3229.359106] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3229.393142] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3244.675638] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3244.675639] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3244.683486] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
>> [ 3244.710638] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3244.728732] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3244.753172] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3244.766798] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3244.823195] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3260.547675] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3260.547676] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3260.547683] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
>> [ 3260.582542] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3260.601189] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3260.625671] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3260.639449] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
>> [ 3260.657873] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3260.678095] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3275.395712] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3275.403736] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
>> [ 3275.451159] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3275.504724] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3276.419712] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3276.427760] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
>> [ 3276.446435] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3276.465180] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3276.497511] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3290.755746] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3290.763803] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
>> [ 3290.782450] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3290.801156] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 19, SQ: 0x1294, CQ: 0xee, SQ Cons: 0x2 SQ Prod: 0x4, usec0
>> [ 3290.834065] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3291.779748] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3291.787825] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
>> [ 3291.806619] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3291.825602] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3291.844466] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 14, SQ: 0x1a80, CQ: 0x8cc, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3291.875065] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3306.627784] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
>> [ 3306.635862] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
>> [ 3306.654708] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
>> [ 3306.673537] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on 
>> queue: 19, SQ: 0x1294, CQ: 0xee, SQ Cons: 0x2 SQ Prod: 0x4, usec0
>> [ 3306.705696] mlx5_core 0000:05:00.0 mlx5_slave1: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3307.651786] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
>> [ 3307.659884] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
>> [ 3307.678633] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3307.697642] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
>> [ 3307.716531] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on 
>> queue: 14, SQ: 0x1a80, CQ: 0x8cc, SQ Cons: 0x5 SQ Prod: 0x7, use0
>> [ 3307.747161] mlx5_core 0000:05:00.1 mlx5_slave2: 
>> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
>> [ 3319.427821] INFO: task kworker/2:3:1640 blocked for more than 122 
>> seconds.
>> [ 3319.436618]       Tainted: G        W         5.7.0-rc6+ #2
>> [ 3319.443934] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> disables this message.
>> [ 3319.453774] Call Trace:
>> [ 3319.457583]  ? __schedule+0x262/0x6e0
>> [ 3319.462744]  ? __switch_to_asm+0x40/0x70
>> [ 3319.468177]  ? schedule+0x40/0xb0
>> [ 3319.472913]  ? schedule_timeout+0x21e/0x310
>> [ 3319.478604]  ? nvmet_destroy_namespace+0x20/0x20 [nvmet]
>> [ 3319.485539]  ? wait_for_completion+0x8d/0xf0
>> [ 3319.491293]  ? nvmet_sq_destroy+0x51/0xb0 [nvmet]
>> [ 3319.497515]  ? nvmet_rdma_free_queue+0x17/0xb0 [nvmet_rdma]
>> [ 3319.504692]  ? nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
>> [ 3319.512650]  ? process_one_work+0x1a7/0x370
>> [ 3319.518263]  ? worker_thread+0x30/0x380
>> [ 3319.523468]  ? process_one_work+0x370/0x370
>> [ 3319.529048]  ? kthread+0x10c/0x130
>> [ 3319.533732]  ? kthread_park+0x80/0x80
>> [ 3319.538687]  ? ret_from_fork+0x35/0x40
>>
>> Thanks,
>> David
>>
>>
>>>
>>>   drivers/nvme/target/core.c | 15 ++++++---------
>>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>> index edf54d9957b7..48f5123d875b 100644
>>> --- a/drivers/nvme/target/core.c
>>> +++ b/drivers/nvme/target/core.c
>>> @@ -158,14 +158,12 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
>>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>>   {
>>> -    struct nvmet_req *req;
>>> +    struct nvmet_async_event *aen, *tmp;
>>>       mutex_lock(&ctrl->lock);
>>> -    while (ctrl->nr_async_event_cmds) {
>>> -        req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>>> -        mutex_unlock(&ctrl->lock);
>>> -        nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>>> -        mutex_lock(&ctrl->lock);
>>> +    list_for_each_entry_safe(aen, tmp, &ctrl->async_events, entry) {
>>> +        list_del(&aen->entry);
>>> +        kfree(aen);
>>>       }
>>>       mutex_unlock(&ctrl->lock);
>>>   }
>>> @@ -791,10 +789,8 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
>>>        * If this is the admin queue, complete all AERs so that our
>>>        * queue doesn't have outstanding requests on it.
>>>        */
>>> -    if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
>>> +    if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq)
>>>           nvmet_async_events_process(ctrl, status);
>>> -        nvmet_async_events_free(ctrl);
>>> -    }
>>>       percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
>>>       wait_for_completion(&sq->confirm_done);
>>>       wait_for_completion(&sq->free_done);
>>> @@ -1427,6 +1423,7 @@ static void nvmet_ctrl_free(struct kref *ref)
>>>       ida_simple_remove(&cntlid_ida, ctrl->cntlid);
>>> +    nvmet_async_events_free(ctrl);
>>>       kfree(ctrl->sqs);
>>>       kfree(ctrl->cqs);
>>>       kfree(ctrl->changed_ns_list);
>>>
>>
> 



Thread overview: 5+ messages
2020-05-20 19:48 [PATCH] nvmet: fix memory leak when removing namespaces and controllers concurrently Sagi Grimberg
2020-05-20 22:55 ` David Milburn
2020-05-20 23:58   ` Sagi Grimberg
2020-05-21  9:15     ` David Milburn [this message]
2020-05-21 15:27 ` Christoph Hellwig
