From: "Javier González" <javier@javigon.com>
To: linux-nvme@lists.infradead.org
Cc: Judy Brock <judy.brock@samsung.com>
Subject: possible deadlock in nvme_ns_remove
Date: Tue, 1 Oct 2019 10:07:26 +0200	[thread overview]
Message-ID: <04580CD6-7652-459D-ABDD-732947B4A6DF@javigon.com> (raw)



Sending this bug report from Judy, who is having problems with the mailing list.

—

Hi,

We are running test cases that stress our device, and every couple of days or so they seem to hit race conditions / deadlocks where multiple threads are stuck trying to acquire ctrl->subsys->lock or ctrl->scan_lock.

We'd like to know if our test is valid. If it is not, we are hoping for some overall guidance about what a host is not permitted, or not expected, to do.

We are sending a lot of nvme-cli requests - Sanitize, Format, FW Download, FW Activate, Flush, Get Log, Identify, and reset - to two controllers that share a namespace; some of those commands target a namespace, some target a controller. They are sent to the two controllers in a random order and a random mix.

We DO NOT WAIT for nvme-cli requests to finish before sending more. So, for example, there could be multiple reset requests and multiple format requests outstanding at the same time as a sanitize, on both paths at once, and so on - combinations that don't really make sense operationally; the point is simply to create as much stress as possible.
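To make the pattern concrete, below is a rough sketch of the fire-and-forget issuing our harness does. This is illustrative only - the device names, the command mix, the loop count, and the stagger are made up for this example and are not our actual scripts:

/* Illustrative sketch: launch random nvme-cli commands against both
 * controller paths without waiting for earlier ones to complete. */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const char *cmds[] = {
		"nvme sanitize /dev/nvme0 --sanact=4",
		"nvme format /dev/nvme0n1 --lbaf=0 --ses=0",
		"nvme format /dev/nvme1n1 --lbaf=0 --ses=1",
		"nvme reset /dev/nvme0",
		"nvme reset /dev/nvme1",
		"nvme flush /dev/nvme0n1",
		"nvme id-ctrl /dev/nvme1",
		"nvme get-log /dev/nvme0 --log-id=2 --log-len=512",
	};
	const int ncmds = sizeof(cmds) / sizeof(cmds[0]);

	signal(SIGCHLD, SIG_IGN);	/* never reap children: fire and forget */
	srand(getpid());

	for (int i = 0; i < 1000; i++) {
		const char *cmd = cmds[rand() % ncmds];

		if (fork() == 0) {
			/* child runs one command; the parent never waits on it,
			 * so many of these overlap on both controllers */
			execlp("sh", "sh", "-c", cmd, (char *)NULL);
			_exit(1);
		}
		usleep(20000);	/* brief stagger between submissions */
	}
	return 0;
}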

From looking at the kernel driver sources and the various hang traces, these commands all go through nvme_user_cmd(); many of them have command effects (e.g. NVME_CMD_EFFECTS_LBCC, NVME_CMD_EFFECTS_NCC), and processing those effects launches things like namespace invalidation and new scans.
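For reference, this is how we currently read that path, reduced to a user-space paraphrase - it is our mental model, not the kernel code, so please correct us if the locking below is not what the 5.3 driver actually does. The two pthread mutexes stand in for ctrl->scan_lock and ctrl->subsys->lock, and passthru_start()/passthru_end() stand in for nvme_passthru_start() and its completion-side counterpart:

/* User-space paraphrase of how we read the nvme_user_cmd() effects handling
 * in 5.3; a model of the driver, not the actual kernel code. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t scan_lock   = PTHREAD_MUTEX_INITIALIZER;  /* stands in for ctrl->scan_lock    */
static pthread_mutex_t subsys_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for ctrl->subsys->lock */

/* Our reading of nvme_passthru_start(): for opcodes whose effects entry marks
 * them as needing exclusive execution or as changing controller capabilities,
 * take both locks and freeze I/O on every path before the command is issued. */
static void passthru_start(void)
{
	pthread_mutex_lock(&scan_lock);
	pthread_mutex_lock(&subsys_lock);
	puts("passthru: scan_lock + subsys->lock held for the whole command");
}

/* Our reading of the completion side: LBCC -> revalidate namespaces and remove
 * invalid ones (the nvme_remove_invalid_namespaces -> nvme_ns_remove chain in
 * the first trace below), then unfreeze and drop both locks; NIC/NCC -> queue
 * nvme_scan_work(), which wants scan_lock again. */
static void passthru_end(void)
{
	puts("passthru: processing effects (revalidate, rescan)");
	pthread_mutex_unlock(&subsys_lock);
	pthread_mutex_unlock(&scan_lock);
}

int main(void)
{
	passthru_start();
	/* ... a long-running Format/Sanitize executes here; anything else that
	 * needs either lock in the meantime sleeps in mutex_lock ... */
	passthru_end();
	return 0;
}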

I've tried to understand various old threads on the linux-nvme list and several of them seem to address issues similar to some of the hang traces we've seen. For example, these threads:

- nvme namespace scanning might race with controller reset
- shorten race window in nvme_ns_remove()
- nvme: fix a possible deadlock when passthru commands sent to a multipath device

We see hang traces that look like some of those addressed in the threads above (or maybe we are hitting deadlock scenarios that have not yet been discovered) - for example, we see multiple blocked scans, and/or threads stuck in nvme_ns_remove / nvme_scan_work / nvme_passthru_start at the same time, all waiting on the same mutex.
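As a toy user-space model of that state (same stand-in mutexes as above, purely illustrative): one in-flight passthru command holds both locks while the other call paths pile up in mutex_lock, which is the shape we think the hung-task output below is showing.

/* Toy model of the blocked state in the traces below: main plays the in-flight
 * Format/Sanitize passthru holding both locks; the two threads play the call
 * paths that end up in D state waiting on them. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t scan_lock   = PTHREAD_MUTEX_INITIALIZER;  /* ctrl->scan_lock    */
static pthread_mutex_t subsys_lock = PTHREAD_MUTEX_INITIALIZER;  /* ctrl->subsys->lock */

static void *ns_remove(void *arg)	/* nvme_ns_remove() wants subsys->lock */
{
	(void)arg;
	pthread_mutex_lock(&subsys_lock);
	puts("ns_remove: finally got subsys->lock");
	pthread_mutex_unlock(&subsys_lock);
	return NULL;
}

static void *scan_work(void *arg)	/* nvme_scan_work() wants scan_lock */
{
	(void)arg;
	pthread_mutex_lock(&scan_lock);
	puts("scan_work: finally got scan_lock");
	pthread_mutex_unlock(&scan_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_mutex_lock(&scan_lock);		/* the in-flight passthru command */
	pthread_mutex_lock(&subsys_lock);

	pthread_create(&t1, NULL, ns_remove, NULL);
	pthread_create(&t2, NULL, scan_work, NULL);
	sleep(2);	/* both threads now sleep in mutex_lock - the user-space
			 * analogue of the D-state tasks in the hung-task output */

	pthread_mutex_unlock(&subsys_lock);
	pthread_mutex_unlock(&scan_lock);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

(Build with cc -pthread. In the model everything drains once the holder releases the locks; in our runs the blocked tasks appear never to make progress.)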

I've pasted one such trace below.

I know people normally post solutions on this list, not just problems - we are not Linux kernel developers, though, so what we are trying to do is find out what we need to do to move forward. We need to be able to avoid any potential deadlocks on the OS side so that we can do multi-day stress testing of our devices. If we need to change things on our side, that's fine - we just aren't sure of the next steps. By the way, I forgot to mention that everything always looks clean on the drive side - no commands outstanding, etc.

If there is additional information we should/could provide or other outlets we should direct our inquiries to, please let me know.

Thanks to any and all of you for your time, and please see the traces below:

2019-09-19 02:42:11,610 INFO nvmeSanitize:: >>> Executing the command:nvme sanitize /dev/nvme0 --sanact=4  PID:13559 childPID:19919

2019-09-19 02:43:11,610 ERROR stopTest:: Function:nvmeSanitize cmd:nvme sanitize /dev/nvme0 --sanact=4  timeout occurred ourPid:13559 childPID:19919; Test run will be stopped

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:19919 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 19919  13559 0x00004080
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ns_remove+0x2d/0x130 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_remove_invalid_namespaces+0xf1/0x120 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x2c4/0x390 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_dev_ioctl+0x10d/0x1d0 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

2019-09-19 02:42:16,790 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13591 childPID:20094

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20094 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20094      1 0x00000080
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0
Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0
Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40
Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

2019-09-19 02:42:17,380 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --ses=1  PID:13625 childPID:20113

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task nvme:20113 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme            D    0 20113      1 0x00000080
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0
Sep 19 02:44:21 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0
Sep 19 02:44:21 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40
Sep 19 02:44:21 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
Sep 19 02:44:21 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
Sep 19 02:44:21 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
Sep 19 02:44:21 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

2019-09-19 02:42:22,268 INFO nvmeFormat:: >>> Executing the command:nvme format /dev/nvme0n1 --lbaf=0 --pil=0 --ms=0 --ses=0 --pi=0  PID:13586 childPID:20230

Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task nvme:20230 blocked for more than 122 seconds.
Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:46:24 IOP-Z370-026 kernel: nvme            D    0 20230      1 0x00000080
Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0
Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_user_cmd+0x330/0x390 [nvme_core]
Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_ioctl+0x92/0xb0 [nvme_core]
Sep 19 02:46:24 IOP-Z370-026 kernel: blkdev_ioctl+0x8f5/0x9a0
Sep 19 02:46:24 IOP-Z370-026 kernel: ? selinux_file_ioctl+0x13b/0x1f0
Sep 19 02:46:24 IOP-Z370-026 kernel: block_ioctl+0x3d/0x40
Sep 19 02:46:24 IOP-Z370-026 kernel: do_vfs_ioctl+0xa9/0x620
Sep 19 02:46:24 IOP-Z370-026 kernel: ksys_ioctl+0x60/0x90
Sep 19 02:46:24 IOP-Z370-026 kernel: __x64_sys_ioctl+0x16/0x20
Sep 19 02:46:24 IOP-Z370-026 kernel: do_syscall_64+0x5b/0x1d0

The two nvme scan worker kernel threads are also blocked, waiting on the same mutex:

Sep 19 02:44:21 IOP-Z370-026 kernel: INFO: task kworker/u24:3:5351 blocked for more than 122 seconds.
Sep 19 02:44:21 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:44:21 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:44:21 IOP-Z370-026 kernel: kworker/u24:3   D    0  5351      2 0x80004080
Sep 19 02:44:21 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:44:21 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule+0x39/0xa0
Sep 19 02:44:21 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:44:21 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
Sep 19 02:44:21 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core]
Sep 19 02:44:21 IOP-Z370-026 kernel: ? check_preempt_curr+0x6b/0x90
Sep 19 02:44:21 IOP-Z370-026 kernel: ? try_to_wake_up+0x203/0x5e0
Sep 19 02:44:21 IOP-Z370-026 kernel: process_one_work+0x171/0x380
Sep 19 02:44:21 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0
Sep 19 02:44:21 IOP-Z370-026 kernel: kthread+0xf8/0x130

Sep 19 02:46:24 IOP-Z370-026 kernel: INFO: task kworker/u24:1:2229 blocked for more than 122 seconds.
Sep 19 02:46:24 IOP-Z370-026 kernel:      Not tainted 5.3.0-rc6 #1
Sep 19 02:46:24 IOP-Z370-026 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 19 02:46:24 IOP-Z370-026 kernel: kworker/u24:1   D    0  2229      2 0x80004080
Sep 19 02:46:24 IOP-Z370-026 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
Sep 19 02:46:24 IOP-Z370-026 kernel: Call Trace:
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __schedule+0x2a9/0x650
Sep 19 02:46:24 IOP-Z370-026 kernel: schedule+0x39/0xa0
Sep 19 02:46:24 IOP-Z370-026 kernel: schedule_preempt_disabled+0xa/0x10
Sep 19 02:46:24 IOP-Z370-026 kernel: __mutex_lock.isra.11+0x233/0x4e0
Sep 19 02:46:24 IOP-Z370-026 kernel: ? nvme_identify_ctrl.isra.58+0x6a/0xb0 [nvme_core]
Sep 19 02:46:24 IOP-Z370-026 kernel: nvme_scan_work+0xa9/0x380 [nvme_core]
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x40/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to+0x7a/0x400
Sep 19 02:46:24 IOP-Z370-026 kernel: ? __switch_to_asm+0x34/0x70
Sep 19 02:46:24 IOP-Z370-026 kernel: process_one_work+0x171/0x380
Sep 19 02:46:24 IOP-Z370-026 kernel: worker_thread+0x49/0x3f0

Thank you,
Judy Brock, Samsung
P.S. I tried to post an email about this a few days ago, so if that one finally goes through, sorry for the redundancy.





