From mboxrd@z Thu Jan 1 00:00:00 1970
From: sagi@grimberg.me (Sagi Grimberg)
Date: Mon, 29 Jul 2019 16:31:59 -0700
Subject: [PATCH rfc 0/2] nvme controller reset and namespace scan work race conditions
Message-ID: <20190729233201.27993-1-sagi@grimberg.me>

Hey Hannes,

Here are two patches that, to my understanding, address the issues you
describe in your patchset "nvme: flush rescan worker before resetting"
and your report of "spurious I/O errors during failover".

Patch #1 avoids removing a namespace if the revalidation I/O failed
because of a racing controller reset (or removal). This should fix any
spurious I/O failures during path failover.

Patch #2 avoids a use-after-free condition in the case where the scan
work ends up actually removing a namespace but is racing with a
controller reset (that is scheduled after the removal) and is accessing
the request queue (nvme_stop_queues) after it was destroyed.

The one trace that you have mentioned in our discussions that is still
not clear to me is:
--
[67088.344034] WARNING: CPU: 4 PID: 25020 at ../lib/percpu-refcount.c:334 percpu_ref_kill_and_confirm+0x7a/0xa0
[...]
[67088.344106] Call Trace:
[67088.344112]  blk_freeze_queue_start+0x2a/0x40
[67088.344114]  blk_freeze_queue+0xe/0x40
[67088.344118]  nvme_update_disk_info+0x36/0x260 [nvme_core]
[67088.344122]  __nvme_revalidate_disk+0xca/0xf0 [nvme_core]
[67088.344125]  nvme_revalidate_disk+0xa6/0x120 [nvme_core]
[67088.344127]  ? blk_mq_get_tag+0xa3/0x220
[67088.344130]  revalidate_disk+0x23/0xc0
[67088.344133]  nvme_validate_ns+0x43/0x830 [nvme_core]
[67088.344137]  ? wake_up_q+0x70/0x70
[67088.344139]  ? blk_mq_free_request+0x12a/0x160
[67088.344142]  ? __nvme_submit_sync_cmd+0x73/0xe0 [nvme_core]
[67088.344145]  nvme_scan_work+0x2b3/0x350 [nvme_core]
[67088.344149]  process_one_work+0x1da/0x400
[67088.344150]  worker_thread+0x2b/0x3f0
[67088.344152]  ? process_one_work+0x400/0x400
--

This indicates that we are revalidating a namespace that was already
removed.
Given that the only namespace removal outside of the scan_work is in
nvme_remove_namespaces(), which flushes the scan_work before it actually
removes the namespaces, I'm still lost as to how this can happen.

Can you please apply the following two patches and report if they
address the issues you are seeing? And if not, can you please report a
call trace of the hung threads?

And, given that you are in a multipath environment, can you apply
these on top of "nvme: fix controller removal race with scan work"?

Thanks.

Sagi Grimberg (2):
  nvme: don't remove namespace if revalidate failed because of
    controller reset
  nvme: fix possible use-after-free condition when controller reset is
    racing namespace scanning

 drivers/nvme/host/core.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

-- 
2.17.1