Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* : name: possible deadlock in nvme_ns_remove 
@ 2019-10-01  8:07 Javier González
  2019-10-01 18:08 ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Javier González @ 2019-10-01  8:07 UTC (permalink / raw)
  To: linux-nvme; +Cc: Judy Brock

[-- Attachment #1.1: Type: text/plain, Size: 11800 bytes --]

Sending this bug report from Judy, who is having problems with the mailing list.

—

Hi,

We are running test cases to stress our device that seem to hit race conditions / deadlocks  - every couple of days or so - where multiple threads are trying to acquire ctrl->subsystem->lock or ctrl->scan_lock.

We'd like to know if our test is valid. If not, we are hoping for some overall guidelines about what a host is not permitted/expected not to do.

We are sending a lot of nvme cli requests to do Sanitize, Format, FW Download, FW Activate, Flush, Get Log, Identify, and reset requests to two controllers that share a namespace - some of those commands target a namespace, some target a controller.  They're sent in random order / random mix to the two controllers.

We DO NOT WAIT for nvme cli requests to finish before sending more. So for example, there could be multiple reset requests, multiple format requests, outstanding at the same time as a sanitize, on both paths at the same time, etc. Combos that don't really make sense - it's just meant to create as much stress as possible.

From looking at kernel driver sources and various hang traces, they all go thru the nvme_user_cmd, many have effects (e.g. NVME_CMD_EFFECTS_LBCC, NVME_CMD_EFFECTS_NCC), launch things like namespace invalidation, new scans, etc. as part of processing of those effects.

I've tried to understand various old threads on the nvme_linux reflector and see several that seem to address issues similar to some of the hang traces we've seen.  For example, these threads:

- nvme namespace scanning might race with controller reset
- shorten race window in nvme_ns_remove() -
- nvme: fix a possible deadlock when passthru commands sent to a multipath device

We are see hang traces that look like some of those addressed in the threads above (so maybe we are hitting some deadlock scenarios not yet discovered .) - for ex. we see multiple blocked scans, and/or  threads stuck in nvme_ns_remove / nvme_scan_queue / nvme_passthru_start at the same time waiting on the same mutex.

I've pasted one such trace below.

I know normally people paste not just problems but solutions on this reflector - we are not Linux kernel devs though so what we're trying to do is find out what we need to do to move forward. We need to be able to avoid any potential deadlocks on the OS side so we can do multi-day stress testing of our devices. If we need to change things on our side, that's fine - we just aren't sure of next steps. By the way, forgot to mention that everything always looks clean on the drive side - no commands outstanding, etc.

If there is additional information we should/could provide or other outlets we should direct our inquiries to, please let me know.

Thank any/all of you for your time and please see the trace below:

2019-09-19 02:42:11,610 INFO nvmeSanitize:: >>> Executing the command:nvme sanitize /dev/nvme0 --sanact=4  PID:13559 childPID:19919

2019-09-19 02:43:11,610 ERROR stopTest:: Function:nvmeSanitize cmd:nvme sanitize /dev/nvme0 --sanact=4  timeout occurred ourPid:13559 childPID:19919; Test run will be stopped Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:19919 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 19919  13559 0x00004080
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ns_remove+0x2d/0x130 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_remove_invalid_namespaces+0xf1/0x120 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x2c4/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_dev_ioctl+0x10d/0x1d0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

2019-09-19 02:42:16,790 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13591 childPID:20094

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20094 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20094      1 0x00000080
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

2019-09-19 02:42:17,380 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --ses=1  PID:13625 childPID:20113

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20113 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20113      1 0x00000080
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

2019-09-19 02:42:22,268 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13586 childPID:20230

Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task nvme:20230 blocked for more than 122 seconds.
Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:46:24 IOP-Z370-026 kernel: nvme            D    0 20230      1 0x00000080
Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:46:24 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:46:24 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:46:24 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:46:24 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:46:24 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:46:24 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

the 2 nvme scan worker kernel threads are also blocked waiting for the same mutex:

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task kworker/u24:3:5351 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: kworker/u24:3   D    0  5351      2 0x80004080
Sep 19 02:44:21 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: ? check_preempt_curr+0x6b/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: ? try_to_wake_up+0x203/0x5e0 Sep 19 02:44:21 IOP-Z370-026 kernel: process_one_work+0x171/0x380 Sep 19 02:44:21 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0 Sep 19 02:44:21 IOP-Z370-026 kernel: kthread+0xf8/0x130

Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task kworker/u24:1:2229 blocked for more than 122 seconds.
Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:46:24 IOP-Z370-026 kernel: kworker/u24:1   D    0  2229      2 0x80004080
Sep 19 02:46:24 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:46:24 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to+0x7a/0x400 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: process_one_work+0x171/0x380 Sep 19 02:46:24 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0

Thank you,
Judy Brock, Samsung
p.s. - I tried to post an email about this a few days ago so if it finally goes thru, sorry for the redundancy.



[-- Attachment #1.2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 158 bytes --]

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: : name: possible deadlock in nvme_ns_remove
  2019-10-01  8:07 : name: possible deadlock in nvme_ns_remove Javier González
@ 2019-10-01 18:08 ` Sagi Grimberg
  2019-10-01 22:09   ` Judy Brock
  2019-10-02  8:49   ` Keith Busch
  0 siblings, 2 replies; 5+ messages in thread
From: Sagi Grimberg @ 2019-10-01 18:08 UTC (permalink / raw)
  To: Javier González, linux-nvme; +Cc: Judy Brock

> Sending this bug report from Judy, who is having problems with the mailing list.

Hey Judy,

It appears that you indeed step on a real issue. It seems that
if nvme_update_formats will fail the revalidate we will acquire
the sybsys->lock and hang.

Does this change solve the issue?
-- 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd7dea36c3b6..dc785b1aff8a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1306,8 +1306,6 @@ static void nvme_update_formats(struct nvme_ctrl 
*ctrl)
                 if (ns->disk && nvme_revalidate_disk(ns->disk))
                         nvme_set_queue_dying(ns);
         up_read(&ctrl->namespaces_rwsem);
-
-       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
  }

  static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
@@ -1325,6 +1323,9 @@ static void nvme_passthru_end(struct nvme_ctrl 
*ctrl, u32 effects)
                 mutex_unlock(&ctrl->subsys->lock);
                 mutex_unlock(&ctrl->scan_lock);
         }
+
+       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
+
         if (effects & NVME_CMD_EFFECTS_CCC)
                 nvme_init_identify(ctrl);
         if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
-- 

> 
> —
> 
> Hi,
> 
> We are running test cases to stress our device that seem to hit race conditions / deadlocks  - every couple of days or so - where multiple threads are trying to acquire ctrl->subsystem->lock or ctrl->scan_lock.
> 
> We'd like to know if our test is valid. If not, we are hoping for some overall guidelines about what a host is not permitted/expected not to do.
> 
> We are sending a lot of nvme cli requests to do Sanitize, Format, FW Download, FW Activate, Flush, Get Log, Identify, and reset requests to two controllers that share a namespace - some of those commands target a namespace, some target a controller.  They're sent in random order / random mix to the two controllers.
> 
> We DO NOT WAIT for nvme cli requests to finish before sending more. So for example, there could be multiple reset requests, multiple format requests, outstanding at the same time as a sanitize, on both paths at the same time, etc. Combos that don't really make sense - it's just meant to create as much stress as possible.
> 
>  From looking at kernel driver sources and various hang traces, they all go thru the nvme_user_cmd, many have effects (e.g. NVME_CMD_EFFECTS_LBCC, NVME_CMD_EFFECTS_NCC), launch things like namespace invalidation, new scans, etc. as part of processing of those effects.
> 
> I've tried to understand various old threads on the nvme_linux reflector and see several that seem to address issues similar to some of the hang traces we've seen.  For example, these threads:
> 
> - nvme namespace scanning might race with controller reset
> - shorten race window in nvme_ns_remove() -
> - nvme: fix a possible deadlock when passthru commands sent to a multipath device
> 
> We are see hang traces that look like some of those addressed in the threads above (so maybe we are hitting some deadlock scenarios not yet discovered .) - for ex. we see multiple blocked scans, and/or  threads stuck in nvme_ns_remove / nvme_scan_queue / nvme_passthru_start at the same time waiting on the same mutex.
> 
> I've pasted one such trace below.
> 
> I know normally people paste not just problems but solutions on this reflector - we are not Linux kernel devs though so what we're trying to do is find out what we need to do to move forward. We need to be able to avoid any potential deadlocks on the OS side so we can do multi-day stress testing of our devices. If we need to change things on our side, that's fine - we just aren't sure of next steps. By the way, forgot to mention that everything always looks clean on the drive side - no commands outstanding, etc.
> 
> If there is additional information we should/could provide or other outlets we should direct our inquiries to, please let me know.
> 
> Thank any/all of you for your time and please see the trace below:
> 
> 2019-09-19 02:42:11,610 INFO nvmeSanitize:: >>> Executing the command:nvme sanitize /dev/nvme0 --sanact=4  PID:13559 childPID:19919
> 
> 2019-09-19 02:43:11,610 ERROR stopTest:: Function:nvmeSanitize cmd:nvme sanitize /dev/nvme0 --sanact=4  timeout occurred ourPid:13559 childPID:19919; Test run will be stopped Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:19919 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 19919  13559 0x00004080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ns_remove+0x2d/0x130 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_remove_invalid_namespaces+0xf1/0x120 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x2c4/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_dev_ioctl+0x10d/0x1d0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:16,790 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13591 childPID:20094
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20094 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20094      1 0x00000080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:17,380 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --ses=1  PID:13625 childPID:20113
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20113 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20113      1 0x00000080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:22,268 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13586 childPID:20230
> 
> Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task nvme:20230 blocked for more than 122 seconds.
> Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:46:24 IOP-Z370-026 kernel: nvme            D    0 20230      1 0x00000080
> Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:46:24 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:46:24 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:46:24 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:46:24 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:46:24 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:46:24 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> the 2 nvme scan worker kernel threads are also blocked waiting for the same mutex:
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task kworker/u24:3:5351 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: kworker/u24:3   D    0  5351      2 0x80004080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: ? check_preempt_curr+0x6b/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: ? try_to_wake_up+0x203/0x5e0 Sep 19 02:44:21 IOP-Z370-026 kernel: process_one_work+0x171/0x380 Sep 19 02:44:21 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0 Sep 19 02:44:21 IOP-Z370-026 kernel: kthread+0xf8/0x130
> 
> Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task kworker/u24:1:2229 blocked for more than 122 seconds.
> Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:46:24 IOP-Z370-026 kernel: kworker/u24:1   D    0  2229      2 0x80004080
> Sep 19 02:46:24 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:46:24 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to+0x7a/0x400 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: process_one_work+0x171/0x380 Sep 19 02:46:24 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0
> 
> Thank you,
> Judy Brock, Samsung
> p.s. - I tried to post an email about this a few days ago so if it finally goes thru, sorry for the redundancy.
> 
> 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
> 
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: : name: possible deadlock in nvme_ns_remove
  2019-10-01 18:08 ` Sagi Grimberg
@ 2019-10-01 22:09   ` Judy Brock
  2019-10-02  8:49   ` Keith Busch
  1 sibling, 0 replies; 5+ messages in thread
From: Judy Brock @ 2019-10-01 22:09 UTC (permalink / raw)
  To: Sagi Grimberg, Javier González, linux-nvme; +Cc: Judy Brock

Thank you Sagi - we are trying it now. If it does not fail within a few days I think it will be safe to say it worked (I need to check with the test team to know exactly what the failure cadence has been).

I think this email might not make it to the mailing list so if you could please post my response I would appreciate it. Thank you so much for the quick response!

Judy Brock
Samsung

-----Original Message-----
From: Sagi Grimberg [mailto:sagi@grimberg.me] 
Sent: Tuesday, October 1, 2019 11:08 AM
To: Javier González; linux-nvme@lists.infradead.org
Cc: Judy Brock
Subject: Re: : name: possible deadlock in nvme_ns_remove

> Sending this bug report from Judy, who is having problems with the mailing list.

Hey Judy,

It appears that you indeed step on a real issue. It seems that
if nvme_update_formats will fail the revalidate we will acquire
the sybsys->lock and hang.

Does this change solve the issue?
-- 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd7dea36c3b6..dc785b1aff8a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1306,8 +1306,6 @@ static void nvme_update_formats(struct nvme_ctrl 
*ctrl)
                 if (ns->disk && nvme_revalidate_disk(ns->disk))
                         nvme_set_queue_dying(ns);
         up_read(&ctrl->namespaces_rwsem);
-
-       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
  }

  static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
@@ -1325,6 +1323,9 @@ static void nvme_passthru_end(struct nvme_ctrl 
*ctrl, u32 effects)
                 mutex_unlock(&ctrl->subsys->lock);
                 mutex_unlock(&ctrl->scan_lock);
         }
+
+       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
+
         if (effects & NVME_CMD_EFFECTS_CCC)
                 nvme_init_identify(ctrl);
         if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
-- 

> 
> —
> 
> Hi,
> 
> We are running test cases to stress our device that seem to hit race conditions / deadlocks  - every couple of days or so - where multiple threads are trying to acquire ctrl->subsystem->lock or ctrl->scan_lock.
> 
> We'd like to know if our test is valid. If not, we are hoping for some overall guidelines about what a host is not permitted/expected not to do.
> 
> We are sending a lot of nvme cli requests to do Sanitize, Format, FW Download, FW Activate, Flush, Get Log, Identify, and reset requests to two controllers that share a namespace - some of those commands target a namespace, some target a controller.  They're sent in random order / random mix to the two controllers.
> 
> We DO NOT WAIT for nvme cli requests to finish before sending more. So for example, there could be multiple reset requests, multiple format requests, outstanding at the same time as a sanitize, on both paths at the same time, etc. Combos that don't really make sense - it's just meant to create as much stress as possible.
> 
>  From looking at kernel driver sources and various hang traces, they all go thru the nvme_user_cmd, many have effects (e.g. NVME_CMD_EFFECTS_LBCC, NVME_CMD_EFFECTS_NCC), launch things like namespace invalidation, new scans, etc. as part of processing of those effects.
> 
> I've tried to understand various old threads on the nvme_linux reflector and see several that seem to address issues similar to some of the hang traces we've seen.  For example, these threads:
> 
> - nvme namespace scanning might race with controller reset
> - shorten race window in nvme_ns_remove() -
> - nvme: fix a possible deadlock when passthru commands sent to a multipath device
> 
> We are see hang traces that look like some of those addressed in the threads above (so maybe we are hitting some deadlock scenarios not yet discovered .) - for ex. we see multiple blocked scans, and/or  threads stuck in nvme_ns_remove / nvme_scan_queue / nvme_passthru_start at the same time waiting on the same mutex.
> 
> I've pasted one such trace below.
> 
> I know normally people paste not just problems but solutions on this reflector - we are not Linux kernel devs though so what we're trying to do is find out what we need to do to move forward. We need to be able to avoid any potential deadlocks on the OS side so we can do multi-day stress testing of our devices. If we need to change things on our side, that's fine - we just aren't sure of next steps. By the way, forgot to mention that everything always looks clean on the drive side - no commands outstanding, etc.
> 
> If there is additional information we should/could provide or other outlets we should direct our inquiries to, please let me know.
> 
> Thank any/all of you for your time and please see the trace below:
> 
> 2019-09-19 02:42:11,610 INFO nvmeSanitize:: >>> Executing the command:nvme sanitize /dev/nvme0 --sanact=4  PID:13559 childPID:19919
> 
> 2019-09-19 02:43:11,610 ERROR stopTest:: Function:nvmeSanitize cmd:nvme sanitize /dev/nvme0 --sanact=4  timeout occurred ourPid:13559 childPID:19919; Test run will be stopped Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:19919 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 19919  13559 0x00004080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ns_remove+0x2d/0x130 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_remove_invalid_namespaces+0xf1/0x120 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x2c4/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_dev_ioctl+0x10d/0x1d0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:16,790 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13591 childPID:20094
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20094 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20094      1 0x00000080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:17,380 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --ses=1  PID:13625 childPID:20113
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20113 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20113      1 0x00000080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> 2019-09-19 02:42:22,268 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13586 childPID:20230
> 
> Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task nvme:20230 blocked for more than 122 seconds.
> Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:46:24 IOP-Z370-026 kernel: nvme            D    0 20230      1 0x00000080
> Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0 Sep 19 02:46:24 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0 Sep 19 02:46:24 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40 Sep 19 02:46:24 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620 Sep 19 02:46:24 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90 Sep 19 02:46:24 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20 Sep 19 02:46:24 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0
> 
> the 2 nvme scan worker kernel threads are also blocked waiting for the same mutex:
> 
> Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task kworker/u24:3:5351 blocked for more than 122 seconds.
> Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:44:21 IOP-Z370-026 kernel: kworker/u24:3   D    0  5351      2 0x80004080
> Sep 19 02:44:21 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:44:21 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core] Sep 19 02:44:21 IOP-Z370-026 kernel: ? check_preempt_curr+0x6b/0x90 Sep 19 02:44:21 IOP-Z370-026 kernel: ? try_to_wake_up+0x203/0x5e0 Sep 19 02:44:21 IOP-Z370-026 kernel: process_one_work+0x171/0x380 Sep 19 02:44:21 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0 Sep 19 02:44:21 IOP-Z370-026 kernel: kthread+0xf8/0x130
> 
> Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task kworker/u24:1:2229 blocked for more than 122 seconds.
> Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
> Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 19 02:46:24 IOP-Z370-026 kernel: kworker/u24:1   D    0  2229      2 0x80004080
> Sep 19 02:46:24 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
> Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0 Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
> Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0 Sep 19 02:46:24 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core] Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to+0x7a/0x400 Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70 Sep 19 02:46:24 IOP-Z370-026 kernel: process_one_work+0x171/0x380 Sep 19 02:46:24 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0
> 
> Thank you,
> Judy Brock, Samsung
> p.s. - I tried to post an email about this a few days ago so if it finally goes thru, sorry for the redundancy.
> 
> 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dnvme&d=DwIGaQ&c=JfeWlBa6VbDyTXraMENjy_b_0yKWuqQ4qY-FPhxK4x8w-TfgRBDyeV4hVQQBEgL2&r=YJM_QPk2w1CRIo5NNBXnCXGzNnmIIfG_iTRs6chBf6s&m=650RqofpbeiU8wTLHyRmfqlHvmTGVifoySPqbtlF11I&s=olcIVKvCpS7_kUtqsMKSZ0HSngQZ4ETw-3EKBqIRbME&e= 
> 
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: : name: possible deadlock in nvme_ns_remove
  2019-10-01 18:08 ` Sagi Grimberg
  2019-10-01 22:09   ` Judy Brock
@ 2019-10-02  8:49   ` Keith Busch
  2019-10-03  3:32     ` Judy Brock
  1 sibling, 1 reply; 5+ messages in thread
From: Keith Busch @ 2019-10-02  8:49 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Javier González, Judy Brock, linux-nvme

On Tue, Oct 01, 2019 at 11:08:06AM -0700, Sagi Grimberg wrote:
> > Sending this bug report from Judy, who is having problems with the mailing list.
> 
> Hey Judy,
> 
> It appears that you indeed step on a real issue. It seems that
> if nvme_update_formats will fail the revalidate we will acquire
> the sybsys->lock and hang.
> 
> Does this change solve the issue?
> -- 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fd7dea36c3b6..dc785b1aff8a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1306,8 +1306,6 @@ static void nvme_update_formats(struct nvme_ctrl
> *ctrl)
>                 if (ns->disk && nvme_revalidate_disk(ns->disk))
>                         nvme_set_queue_dying(ns);
>         up_read(&ctrl->namespaces_rwsem);
> -
> -       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
>  }
> 
>  static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
> @@ -1325,6 +1323,9 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl,
> u32 effects)
>                 mutex_unlock(&ctrl->subsys->lock);
>                 mutex_unlock(&ctrl->scan_lock);
>         }
> +
> +       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
> +
>         if (effects & NVME_CMD_EFFECTS_CCC)
>                 nvme_init_identify(ctrl);
>         if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
> -- 

Thanks, this patch fixes a real issue regardless. I would just recommend moving
the nvme_remove_invalid_namespaces() call from where you have it to after
releasing the subsys->lock, but before releasing the scan_lock.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: : name: possible deadlock in nvme_ns_remove
  2019-10-02  8:49   ` Keith Busch
@ 2019-10-03  3:32     ` Judy Brock
  0 siblings, 0 replies; 5+ messages in thread
From: Judy Brock @ 2019-10-03  3:32 UTC (permalink / raw)
  To: Keith Busch, Sagi Grimberg; +Cc: Javier González, Judy Brock, linux-nvme

We started the test earlier today w/original patch I'm assuming we should stop it and make the new minor modification, right? Just wanted to confirm.

Thanks,
Judy

-----Original Message-----
From: Keith Busch [mailto:kbusch@kernel.org] 
Sent: Wednesday, October 2, 2019 1:49 AM
To: Sagi Grimberg
Cc: Javier González; linux-nvme@lists.infradead.org; Judy Brock
Subject: Re: : name: possible deadlock in nvme_ns_remove

On Tue, Oct 01, 2019 at 11:08:06AM -0700, Sagi Grimberg wrote:
> > Sending this bug report from Judy, who is having problems with the mailing list.
> 
> Hey Judy,
> 
> It appears that you indeed step on a real issue. It seems that
> if nvme_update_formats will fail the revalidate we will acquire
> the sybsys->lock and hang.
> 
> Does this change solve the issue?
> -- 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fd7dea36c3b6..dc785b1aff8a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1306,8 +1306,6 @@ static void nvme_update_formats(struct nvme_ctrl
> *ctrl)
>                 if (ns->disk && nvme_revalidate_disk(ns->disk))
>                         nvme_set_queue_dying(ns);
>         up_read(&ctrl->namespaces_rwsem);
> -
> -       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
>  }
> 
>  static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
> @@ -1325,6 +1323,9 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl,
> u32 effects)
>                 mutex_unlock(&ctrl->subsys->lock);
>                 mutex_unlock(&ctrl->scan_lock);
>         }
> +
> +       nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
> +
>         if (effects & NVME_CMD_EFFECTS_CCC)
>                 nvme_init_identify(ctrl);
>         if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
> -- 

Thanks, this patch fixes a real issue regardless. I would just recommend moving
the nvme_remove_invalid_namespaces() call from where you have it to after
releasing the subsys->lock, but before releasing the scan_lock.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-01  8:07 : name: possible deadlock in nvme_ns_remove Javier González
2019-10-01 18:08 ` Sagi Grimberg
2019-10-01 22:09   ` Judy Brock
2019-10-02  8:49   ` Keith Busch
2019-10-03  3:32     ` Judy Brock

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org linux-nvme@archiver.kernel.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/ public-inbox