From: Sagi Grimberg <sagi@grimberg.me>
To: David Milburn <dmilburn@redhat.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH] nvmet: fix memory leak when removing namespaces and controllers concurrently
Date: Wed, 20 May 2020 16:58:16 -0700
Message-ID: <57a9b906-abfe-e0f4-0558-6e5ab2218b88@grimberg.me>
In-Reply-To: <396c0e62-fac2-2236-5f2d-40ee542e4814@redhat.com>


> Hi Sagi,
> 
> On 05/20/2020 02:48 PM, Sagi Grimberg wrote:
>> When removing a namespace, we add an NS_CHANGE async event; however, if
>> the controller admin queue is removed after the event was added but
>> before it was processed, we won't free the AENs, resulting in the memory
>> leak below [1].
>>
>> Fix that by moving nvmet_async_events_free to the final controller
>> release, after the controller is detached from subsys->ctrls (ensuring
>> no further async events can be added), and modify it to simply remove
>> all pending AENs.
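>>
>> In other words (a sketch of the resulting cleanup, matching the diff
>> further down): pending AENs are now reaped only from the final
>> controller release, where no new events can possibly be queued:
>>
>> 	static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>> 	{
>> 		struct nvmet_async_event *aen, *tmp;
>>
>> 		mutex_lock(&ctrl->lock);
>> 		list_for_each_entry_safe(aen, tmp, &ctrl->async_events, entry) {
>> 			list_del(&aen->entry);
>> 			kfree(aen);
>> 		}
>> 		mutex_unlock(&ctrl->lock);
>> 	}
>>
>> with nvmet_ctrl_free() calling it right before freeing the controller's
>> queues and changed-ns list.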
>>
>> -- 
>> $ cat /sys/kernel/debug/kmemleak
>> unreferenced object 0xffff888c1af2c000 (size 32):
>>    comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
>>    hex dump (first 32 bytes):
>>      28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff  (..;....(..;....
>>      02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00  ...event_file...
>>    backtrace:
>>      [<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
>>      [<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
>>      [<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
>>      [<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
>>      [<00000000a15deb52>] config_item_release+0xf1/0x1c0
>>      [<000000007e148432>] configfs_rmdir+0x555/0x7c0
>>      [<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
>>      [<0000000000acaaf0>] do_rmdir+0x2b2/0x340
>>      [<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
>>      [<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>
>> Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
>> Reported-by: David Milburn <dmilburn@redhat.com>
>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>> ---
>> David, it would be great to get your Tested-by tag.
> 
> This takes care of the memory leak, though we did see a warning from
> the mlx5 code when we rebooted the target:

This looks unrelated... your mlx5 device is timing out at teardown,
which makes the dev_watchdog complain. I'd be very surprised
if the change caused any of this...
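For context, here is roughly what that watchdog does (a simplified
sketch, not the actual net/sched/sch_generic.c source): it notices that
a netdev TX queue has been stalled past dev->watchdog_timeo and kicks
the driver's tx_timeout handler, so it trips on any stuck mlx5 send
queue regardless of what nvmet is doing:

	#include <linux/netdevice.h>
	#include <linux/jiffies.h>

	/*
	 * Simplified illustration of the check behind the WARNING at
	 * net/sched/sch_generic.c:443: if any TX queue has been stopped
	 * longer than dev->watchdog_timeo, warn and ask the driver
	 * (mlx5e here) to recover via its tx_timeout handler.
	 */
	static void dev_watchdog_sketch(struct net_device *dev)
	{
		unsigned int i;

		for (i = 0; i < dev->num_tx_queues; i++) {
			struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

			if (netif_xmit_stopped(txq) &&
			    time_after(jiffies, txq->trans_start + dev->watchdog_timeo)) {
				netdev_warn(dev, "transmit queue %u timed out\n", i);
				dev->netdev_ops->ndo_tx_timeout(dev, i);
				break;
			}
		}
	}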

> 
> [ 3109.755246] ib_srpt received unrecognized IB event 8
> [ 3109.761400] mlx5_core 0000:04:00.1: 
> mlx5_cmd_fast_teardown_hca:339:(pid 5135): teardown with fast mode failed
> [ 3109.773399] mlx5_core 0000:04:00.1: 
> mlx5_cmd_force_teardown_hca:308:(pid 5135): teardown with force mode 
> failed, doing normal n
> [ 3113.147598] mlx5_core 0000:04:00.0: 
> mlx5_cmd_fast_teardown_hca:339:(pid 5135): teardown with fast mode failed
> [ 3113.159330] mlx5_core 0000:04:00.0: 
> mlx5_cmd_force_teardown_hca:308:(pid 5135): teardown with force mode 
> failed, doing normal n
> [ 3139.203383] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [ 3139.228137] nvmet: ctrl 1 fatal error occurred!
> [ 3197.571533] ------------[ cut here ]------------
> [ 3197.577286] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 
> dev_watchdog+0x258/0x260
> [ 3197.587074] Modules linked in: null_blk nvme_rdma nvme_fabrics 
> nvme_core nvmet_rdma nvmet 8021q garp mrp bonding bridge stp lld
> [ 3197.684124] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.7.0-rc6+ #2
> [ 3197.691806] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 
> 1.6.2 01/08/2016
> [ 3197.700856] RIP: 0010:dev_watchdog+0x258/0x260
> [ 3197.706512] Code: 48 85 c0 75 e4 eb 9b 4c 89 ef c6 05 ea da dd 00 01 
> e8 2c 26 fb ff 89 d9 4c 89 ee 48 c7 c7 a8 cd dc 91 48 89 5
> [ 3197.728930] RSP: 0000:ffffa68606550e80 EFLAGS: 00010282
> [ 3197.735495] RAX: 0000000000000000 RBX: 000000000000000c RCX: 
> ffffffff9205e228
> [ 3197.744201] RDX: 0000000000000001 RSI: 0000000000000092 RDI: 
> ffffffff92896d8c
> [ 3197.752906] RBP: ffff8ab814c003dc R08: 00000000000007a6 R09: 
> 0000000000000005
> [ 3197.761613] R10: 0000000000000000 R11: 0000000000000001 R12: 
> ffff8ab8232f8f40
> [ 3197.770329] R13: ffff8ab814c00000 R14: ffff8ab814c00480 R15: 
> 0000000000000140
> [ 3197.779052] FS:  0000000000000000(0000) GS:ffff8ab83f840000(0000) 
> knlGS:0000000000000000
> [ 3197.788855] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3197.796041] CR2: 00007fb509690800 CR3: 0000001000d68004 CR4: 
> 00000000001606e0
> [ 3197.804789] Call Trace:
> [ 3197.808291]  <IRQ>
> [ 3197.811290]  ? pfifo_fast_enqueue+0x140/0x140
> [ 3197.816919]  call_timer_fn+0x2d/0x130
> [ 3197.821765]  run_timer_softirq+0x1a0/0x410
> [ 3197.827090]  ? tick_sched_timer+0x37/0x70
> [ 3197.832321]  ? __hrtimer_run_queues+0x110/0x280
> [ 3197.838136]  __do_softirq+0xde/0x2ec
> [ 3197.842879]  ? ktime_get+0x36/0xa0
> [ 3197.847419]  irq_exit+0xe3/0xf0
> [ 3197.851657]  smp_apic_timer_interrupt+0x74/0x130
> [ 3197.857542]  apic_timer_interrupt+0xf/0x20
> [ 3197.862835]  </IRQ>
> [ 3197.865889] RIP: 0010:mwait_idle+0x94/0x1c0
> [ 3197.871273] Code: 40 7f 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 
> a3 00 00 00 e9 07 00 00 00 0f 00 2d 37 dc 4b 00 31 c0 48 8
> [ 3197.893717] RSP: 0000:ffffa686000ebeb8 EFLAGS: 00000246 ORIG_RAX: 
> ffffffffffffff13
> [ 3197.902905] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 
> 0000000000000000
> [ 3197.911597] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 
> ffff8ab83f85fbe0
> [ 3197.920270] RBP: 0000000000000002 R08: 0000000000000000 R09: 
> 0000000000000000
> [ 3197.928934] R10: 000009a747fa1ff4 R11: 0000000000000000 R12: 
> ffffffffffffffff
> [ 3197.937581] R13: 0000000000000000 R14: 0000000000000000 R15: 
> 0000000000000000
> [ 3197.946225]  ? tick_nohz_idle_stop_tick+0x5e/0xc0
> [ 3197.952154]  do_idle+0x1bd/0x240
> [ 3197.956429]  cpu_startup_entry+0x19/0x20
> [ 3197.961473]  start_secondary+0x15f/0x1b0
> [ 3197.966499]  secondary_startup_64+0xb6/0xc0
> [ 3197.971798] ---[ end trace 95e950eb607e4843 ]---
> [ 3197.977567] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3197.985175] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3198.002627] mlx5_core 0000:05:00.0: mlx5_rsc_dump_trigger:95:(pid 
> 3203): Resource dump: Failed to access err -5
> [ 3198.014897] mlx5_core 0000:05:00.0: mlx5_rsc_dump_next:150:(pid 
> 3203): Resource dump: Failed to trigger dump, -5
> [ 3198.027229] mlx5_core 0000:05:00.0 mlx5_slave1: EQ 0x13: Cons = 0xd, 
> irqn = 0x87
> [ 3198.037669] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3213.443565] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3213.451158] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3213.503496] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3214.467566] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3214.475169] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3214.492699] mlx5_core 0000:05:00.1: mlx5_rsc_dump_trigger:95:(pid 
> 3615): Resource dump: Failed to access err -5
> [ 3214.505046] mlx5_core 0000:05:00.1: mlx5_rsc_dump_next:150:(pid 
> 3615): Resource dump: Failed to trigger dump, -5
> [ 3214.517502] mlx5_core 0000:05:00.1 mlx5_slave2: EQ 0x12: Cons = 0xa, 
> irqn = 0xb0
> [ 3214.526795] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3214.544974] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3228.803601] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3228.811280] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3228.846174] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3229.315601] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3229.323333] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
> [ 3229.341177] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3229.359106] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3229.393142] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3244.675638] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3244.675639] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3244.683486] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
> [ 3244.710638] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3244.728732] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3244.753172] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3244.766798] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3244.823195] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3260.547675] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3260.547676] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3260.547683] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
> [ 3260.582542] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3260.601189] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3260.625671] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3260.639449] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
> [ 3260.657873] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3260.678095] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3275.395712] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3275.403736] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
> [ 3275.451159] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3275.504724] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3276.419712] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3276.427760] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
> [ 3276.446435] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3276.465180] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3276.497511] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3290.755746] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3290.763803] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
> [ 3290.782450] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3290.801156] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 19, SQ: 0x1294, CQ: 0xee, SQ Cons: 0x2 SQ Prod: 0x4, usec0
> [ 3290.834065] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3291.779748] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3291.787825] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
> [ 3291.806619] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3291.825602] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3291.844466] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 14, SQ: 0x1a80, CQ: 0x8cc, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3291.875065] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3306.627784] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout detected
> [ 3306.635862] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 11, SQ: 0x1274, CQ: 0xce, SQ Cons: 0x7 SQ Prod: 0x9, usec0
> [ 3306.654708] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 12, SQ: 0x1278, CQ: 0xf2, SQ Cons: 0x8 SQ Prod: 0xa, usec0
> [ 3306.673537] mlx5_core 0000:05:00.0 mlx5_slave1: TX timeout on queue: 
> 19, SQ: 0x1294, CQ: 0xee, SQ Cons: 0x2 SQ Prod: 0x4, usec0
> [ 3306.705696] mlx5_core 0000:05:00.0 mlx5_slave1: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3307.651786] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout detected
> [ 3307.659884] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 10, SQ: 0x1a70, CQ: 0x8dc, SQ Cons: 0x8 SQ Prod: 0xa, use0
> [ 3307.678633] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 11, SQ: 0x1a74, CQ: 0x8c0, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3307.697642] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 12, SQ: 0x1a78, CQ: 0x8c4, SQ Cons: 0x6 SQ Prod: 0x8, use0
> [ 3307.716531] mlx5_core 0000:05:00.1 mlx5_slave2: TX timeout on queue: 
> 14, SQ: 0x1a80, CQ: 0x8cc, SQ Cons: 0x5 SQ Prod: 0x7, use0
> [ 3307.747161] mlx5_core 0000:05:00.1 mlx5_slave2: 
> mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(-5).
> [ 3319.427821] INFO: task kworker/2:3:1640 blocked for more than 122 
> seconds.
> [ 3319.436618]       Tainted: G        W         5.7.0-rc6+ #2
> [ 3319.443934] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [ 3319.453774] Call Trace:
> [ 3319.457583]  ? __schedule+0x262/0x6e0
> [ 3319.462744]  ? __switch_to_asm+0x40/0x70
> [ 3319.468177]  ? schedule+0x40/0xb0
> [ 3319.472913]  ? schedule_timeout+0x21e/0x310
> [ 3319.478604]  ? nvmet_destroy_namespace+0x20/0x20 [nvmet]
> [ 3319.485539]  ? wait_for_completion+0x8d/0xf0
> [ 3319.491293]  ? nvmet_sq_destroy+0x51/0xb0 [nvmet]
> [ 3319.497515]  ? nvmet_rdma_free_queue+0x17/0xb0 [nvmet_rdma]
> [ 3319.504692]  ? nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
> [ 3319.512650]  ? process_one_work+0x1a7/0x370
> [ 3319.518263]  ? worker_thread+0x30/0x380
> [ 3319.523468]  ? process_one_work+0x370/0x370
> [ 3319.529048]  ? kthread+0x10c/0x130
> [ 3319.533732]  ? kthread_park+0x80/0x80
> [ 3319.538687]  ? ret_from_fork+0x35/0x40
> 
> Thanks,
> David
> 
> 
>>
>>   drivers/nvme/target/core.c | 15 ++++++---------
>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index edf54d9957b7..48f5123d875b 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -158,14 +158,12 @@ static void nvmet_async_events_process(struct 
>> nvmet_ctrl *ctrl, u16 status)
>>   static void nvmet_async_events_free(struct nvmet_ctrl *ctrl)
>>   {
>> -    struct nvmet_req *req;
>> +    struct nvmet_async_event *aen, *tmp;
>>       mutex_lock(&ctrl->lock);
>> -    while (ctrl->nr_async_event_cmds) {
>> -        req = ctrl->async_event_cmds[--ctrl->nr_async_event_cmds];
>> -        mutex_unlock(&ctrl->lock);
>> -        nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
>> -        mutex_lock(&ctrl->lock);
>> +    list_for_each_entry_safe(aen, tmp, &ctrl->async_events, entry) {
>> +        list_del(&aen->entry);
>> +        kfree(aen);
>>       }
>>       mutex_unlock(&ctrl->lock);
>>   }
>> @@ -791,10 +789,8 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
>>        * If this is the admin queue, complete all AERs so that our
>>        * queue doesn't have outstanding requests on it.
>>        */
>> -    if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
>> +    if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq)
>>           nvmet_async_events_process(ctrl, status);
>> -        nvmet_async_events_free(ctrl);
>> -    }
>>       percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
>>       wait_for_completion(&sq->confirm_done);
>>       wait_for_completion(&sq->free_done);
>> @@ -1427,6 +1423,7 @@ static void nvmet_ctrl_free(struct kref *ref)
>>       ida_simple_remove(&cntlid_ida, ctrl->cntlid);
>> +    nvmet_async_events_free(ctrl);
>>       kfree(ctrl->sqs);
>>       kfree(ctrl->cqs);
>>       kfree(ctrl->changed_ns_list);
>>
> 

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
