From: Judy Brock <judy.brock@samsung.com>
To: "Sagi Grimberg" <sagi@grimberg.me>,
	"Javier González" <javier@javigon.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: Judy Brock <judy.brock@samsung.com>
Subject: RE: : name: possible deadlock in nvme_ns_remove
Date: Tue, 1 Oct 2019 22:09:30 +0000
Message-ID: <ed69f7a430a74c988af825735968c066@samsung.com>
In-Reply-To: <3ac73ede-1be4-8459-01f2-1a24b83a1155@grimberg.me>

Thank you, Sagi - we are trying it now. If it does not fail within a few days, I think it will be safe to say it worked (I need to check with the test team to know exactly what the failure cadence has been).

I think this email might not make it to the mailing list so if you could please post my response I would appreciate it. Thank you so much for the quick response!

Judy Brock
Samsung

-----Original Message-----
From: Sagi Grimberg [mailto:sagi@grimberg.me] 
Sent: Tuesday, October 1, 2019 11:08 AM
To: Javier González; linux-nvme@lists.infradead.org
Cc: Judy Brock
Subject: Re: : name: possible deadlock in nvme_ns_remove

> Sending this bug report from Judy, who is having problems with the mailing list.

Hey Judy,

It appears that you have indeed stepped on a real issue. It seems that
if nvme_update_formats fails the revalidate, we will try to acquire
the subsys->lock while it is already held, and hang.

Does this change solve the issue?
-- 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd7dea36c3b6..dc785b1aff8a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1306,8 +1306,6 @@ static void nvme_update_formats(struct nvme_ctrl *ctrl)
                 if (ns->disk && nvme_revalidate_disk(ns->disk))
                         nvme_set_queue_dying(ns);
         up_read(&ctrl->namespaces_rwsem);
-
-       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
  }

  static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
@@ -1325,6 +1323,9 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
                 mutex_unlock(&ctrl->subsys->lock);
                 mutex_unlock(&ctrl->scan_lock);
         }
+
+       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
+
         if (effects & NVME_CMD_EFFECTS_CCC)
                 nvme_init_identify(ctrl);
         if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
-- 
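
To spell out the chain the patch is breaking (a rough sketch of the call
path as I read the 5.3 code, so double-check it against your tree):

  nvme_user_cmd()
    nvme_passthru_start()                  /* Format carries LBCC effects */
      mutex_lock(&ctrl->scan_lock);
      mutex_lock(&ctrl->subsys->lock);     /* still held below */
    ...
    nvme_passthru_end()
      nvme_update_formats()
        /* revalidate fails, queue marked dying */
        nvme_remove_invalid_namespaces()
          nvme_ns_remove()
            mutex_lock(&ns->ctrl->subsys->lock);  /* already held -> hang */

With the change above, the removal only happens after nvme_passthru_end()
has dropped subsys->lock and scan_lock, so nvme_ns_remove() no longer
tries to take a lock the same thread is already holding.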

> 
> —
> 
> Hi,
> 
> We are running test cases to stress our device that seem to hit race conditions / deadlocks - every couple of days or so - where multiple threads are trying to acquire ctrl->subsys->lock or ctrl->scan_lock.
> 
> We'd like to know if our test is valid. If not, we are hoping for some general guidelines about what a host is not permitted, or not expected, to do.
> 
> We are sending a lot of nvme cli requests to do Sanitize, Format, FW Download, FW Activate, Flush, Get Log, Identify, and reset requests to two controllers that share a namespace - some of those commands target a namespace, some target a controller.  They're sent in random order / random mix to the two controllers.
> 
> We DO NOT WAIT for nvme cli requests to finish before sending more. So, for example, there could be multiple reset requests and multiple format requests outstanding at the same time as a sanitize, on both paths at the same time, etc. - combinations that don't really make sense; the test is just meant to create as much stress as possible.
> 
> From looking at the kernel driver sources and various hang traces, these requests all go through nvme_user_cmd; many have effects (e.g. NVME_CMD_EFFECTS_LBCC, NVME_CMD_EFFECTS_NCC) and trigger things like namespace invalidation and new scans as part of processing those effects.
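> 
> For example, our rough reading of nvme_user_cmd() in the 5.3 core code (please correct us if we have misread it) is that a passthru command with LBA-content effects goes roughly like this:
> 
>     effects = nvme_passthru_start(ctrl, ns, opcode);
>       /* if effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE): */
>       mutex_lock(&ctrl->scan_lock);
>       mutex_lock(&ctrl->subsys->lock);
>       /* freeze I/O on the namespaces */
>     nvme_submit_user_cmd(...);
>     nvme_passthru_end(ctrl, effects);
>       nvme_update_formats(ctrl);         /* revalidate namespaces */
>       /* unfreeze, then: */
>       mutex_unlock(&ctrl->subsys->lock);
>       mutex_unlock(&ctrl->scan_lock);
>       nvme_queue_scan(ctrl);             /* for NIC/NCC effects */
> 
> so every such command serializes on ctrl->scan_lock and ctrl->subsys->lock, and the scan work appears to take ctrl->scan_lock as well - which would explain why so many of our blocked threads are waiting on the same mutexes.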
> 
> I've tried to understand various old threads on the linux-nvme reflector and see several that seem to address issues similar to some of the hang traces we've seen. For example, these threads:
> 
> - nvme namespace scanning might race with controller reset
> - shorten race window in nvme_ns_remove()
> - nvme: fix a possible deadlock when passthru commands sent to a multipath device
> 
> We see hang traces that look like some of those addressed in the threads above (so maybe we are hitting some deadlock scenarios not yet discovered) - for example, we see multiple blocked scans and/or threads stuck in nvme_ns_remove / nvme_scan_queue / nvme_passthru_start at the same time, waiting on the same mutex.
> 
> I've pasted one such trace below.
> 
> I know people normally post not just problems but solutions on this reflector - we are not Linux kernel devs, though, so what we're trying to do is find out what we need to do to move forward. We need to be able to avoid any potential deadlocks on the OS side so we can do multi-day stress testing of our devices. If we need to change things on our side, that's fine - we just aren't sure of next steps. By the way, I forgot to mention that everything always looks clean on the drive side - no commands outstanding, etc.
> 
> If there is additional information we should/could provide or other outlets we should direct our inquiries to, please let me know.
> 
> Thank any/all of you for your time and please see the trace below:
> 
> 2019-09-19 02:42:11,610 INFO nvmeSanitize:: >>> Executing the command:nvme sanitize /dev/nvme0 --sanact=4  PID:13559 childPID:19919
> 
> 2019-09-19 02:43:11,610 ERROR stopTest:: Function:nvmeSanitize cmd:nvme sanitize /dev/nvme0 --sanact=4  timeout occurred ourPid:13559 childPID:19919; Test run will be stopped
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:19919 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 19919  13559 0x00004080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ns_remove+0x2d/0x130 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_remove_invalid_namespaces+0xf1/0x120 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x2c4/0x390 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_dev_ioctl+0x10d/0x1d0 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
> Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
> Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
> Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:16,790 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13591 childPID:20094
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20094 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20094      1 0x00000080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0
> Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40
> Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
> Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
> Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
> Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:17,380 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --ses=1  PID:13625 childPID:20113
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20113 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20113      1 0x00000080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0
> Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40
> Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
> Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
> Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
> Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:22,268 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13586 childPID:20230
> 
> Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task nvme:20230 blocked for more than 122 seconds.
> Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:46:24 IOP-Z370-026 kernel: nvme            D    0 20230      1 0x00000080
> Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
> Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0
> Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
> Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core]
> Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core]
> Sep 19 02:46:24 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0
> Sep 19 02:46:24 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40
> Sep 19 02:46:24 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
> Sep 19 02:46:24 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
> Sep 19 02:46:24 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
> Sep 19 02:46:24 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> The two nvme scan worker kernel threads are also blocked waiting for the same mutex:
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task kworker/u24:3:5351 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: kworker/u24:3   D    0  5351      2 0x80004080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
> Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core]
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? check_preempt_curr+0x6b/0x90
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? try_to_wake_up+0x203/0x5e0
> Sep 19 02:44:21 IOP-Z370-026 kernel: process_one_work+0x171/0x380
> Sep 19 02:44:21 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0
> Sep 19 02:44:21 IOP-Z370-026 kernel: kthread+0xf8/0x130
> 
> Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task kworker/u24:1:2229 blocked for more than 122 seconds.
> Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:46:24 IOP-Z370-026 kernel: kworker/u24:1   D    0  2229      2 0x80004080
> Sep 19 02:46:24 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
> Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
> Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0
> Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core]
> Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core]
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to+0x7a/0x400
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
> Sep 19 02:46:24 IOP-Z370-026 kernel: process_one_work+0x171/0x380
> Sep 19 02:46:24 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0
> 
> Thank you,
> Judy Brock, Samsung
> p.s. - I tried to post an email about this a few days ago, so if it finally goes through, sorry for the redundancy.
> 
