From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Anton Eidelman <anton@lightbitslabs.com>,
Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 16/65] nvme-multipath: fix deadlock due to head->lock
Date: Tue, 7 Jul 2020 17:16:55 +0200 [thread overview]
Message-ID: <20200707145753.240242645@linuxfoundation.org> (raw)
In-Reply-To: <20200707145752.417212219@linuxfoundation.org>
From: Anton Eidelman <anton@lightbitslabs.com>
[ Upstream commit d8a22f85609fadb46ba699e0136cc3ebdeebff79 ]
In the following scenario scan_work and ana_work will deadlock:
When scan_work calls nvme_mpath_add_disk() this holds ana_lock
and invokes nvme_parse_ana_log(), which may issue IO
in device_add_disk() and hang waiting for an accessible path.
While nvme_mpath_set_live() only called when nvme_state_is_live(),
a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO.
Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on
ANY ctrl will not be able to complete nvme_mpath_set_live()
on the same ns->head, which is required in order to update
the new accessible path and remove NVME_NS_ANA_PENDING..
Therefore IO never completes: deadlock [1].
Fix:
Move device_add_disk out of the head->lock and protect it with an
atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit.
[1]:
kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds.
kernel: Tainted: G OE 5.3.5-050305-generic #201910071830
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/u8:2 D 0 160 2 0x80004000
kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: schedule_preempt_disabled+0xe/0x10
kernel: __mutex_lock.isra.0+0x182/0x4f0
kernel: __mutex_lock_slowpath+0x13/0x20
kernel: mutex_lock+0x2e/0x40
kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core]
kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core]
kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
kernel: nvme_read_ana_log+0x76/0x100 [nvme_core]
kernel: nvme_ana_work+0x15/0x20 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x4d/0x400
kernel: kthread+0x104/0x140
kernel: ret_from_fork+0x35/0x40
kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds.
kernel: Tainted: G OE 5.3.5-050305-generic #201910071830
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/u8:4 D 0 439 2 0x80004000
kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: io_schedule+0x16/0x40
kernel: do_read_cache_page+0x438/0x830
kernel: read_cache_page+0x12/0x20
kernel: read_dev_sector+0x27/0xc0
kernel: read_lba+0xc1/0x220
kernel: efi_partition+0x1e6/0x708
kernel: check_partition+0x154/0x244
kernel: rescan_partitions+0xae/0x280
kernel: __blkdev_get+0x40f/0x560
kernel: blkdev_get+0x3d/0x140
kernel: __device_add_disk+0x388/0x480
kernel: device_add_disk+0x13/0x20
kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core]
kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
kernel: nvme_scan_work+0x256/0x390 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x4d/0x400
kernel: kthread+0x104/0x140
kernel: ret_from_fork+0x35/0x40
Fixes: 0d0b660f214d ("nvme: add ANA support")
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/multipath.c | 4 ++--
drivers/nvme/host/nvme.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 18f0a05c74b56..574b52e911f08 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -417,11 +417,11 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
if (!head->disk)
return;
- mutex_lock(&head->lock);
- if (!(head->disk->flags & GENHD_FL_UP))
+ if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
device_add_disk(&head->subsys->dev, head->disk,
nvme_ns_id_attr_groups);
+ mutex_lock(&head->lock);
if (nvme_path_is_optimized(ns)) {
int node, srcu_idx;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 22e8401352c22..ed02260862cb5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -345,6 +345,8 @@ struct nvme_ns_head {
spinlock_t requeue_lock;
struct work_struct requeue_work;
struct mutex lock;
+ unsigned long flags;
+#define NVME_NSHEAD_DISK_LIVE 0
struct nvme_ns __rcu *current_path[];
#endif
};
--
2.25.1
next prev parent reply other threads:[~2020-07-07 15:33 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-07 15:16 [PATCH 5.4 00/65] 5.4.51-rc1 review Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 01/65] io_uring: make sure async workqueue is canceled on exit Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 02/65] mm: fix swap cache node allocation mask Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 03/65] EDAC/amd64: Read back the scrub rate PCI register on F15h Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 04/65] usbnet: smsc95xx: Fix use-after-free after removal Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 05/65] sched/debug: Make sd->flags sysctl read-only Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 06/65] mm/slub.c: fix corrupted freechain in deactivate_slab() Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 07/65] mm/slub: fix stack overruns with SLUB_STATS Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 08/65] rxrpc: Fix race between incoming ACK parser and retransmitter Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 09/65] usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 10/65] tools lib traceevent: Add append() function helper for appending strings Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 11/65] tools lib traceevent: Handle __attribute__((user)) in field names Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 12/65] s390/debug: avoid kernel warning on too large number of pages Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 13/65] nvme-multipath: set bdi capabilities once Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 14/65] nvme: fix possible deadlock when I/O is blocked Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 15/65] nvme-multipath: fix deadlock between ana_work and scan_work Greg Kroah-Hartman
2020-07-07 15:16 ` Greg Kroah-Hartman [this message]
2020-07-07 15:16 ` [PATCH 5.4 17/65] nvme-multipath: fix bogus request queue reference put Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 18/65] kgdb: Avoid suspicious RCU usage warning Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 19/65] selftests: tpm: Use /bin/sh instead of /bin/bash Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 5.4 20/65] tpm: Fix TIS locality timeout problems Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 21/65] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 22/65] drm/msm/dpu: fix error return code in dpu_encoder_init Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 23/65] rxrpc: Fix afs large storage transmission performance drop Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 24/65] RDMA/counter: Query a counter before release Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 25/65] cxgb4: use unaligned conversion for fetching timestamp Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 26/65] cxgb4: parse TC-U32 key values and masks natively Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 27/65] cxgb4: fix endian conversions for L4 ports in filters Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 28/65] cxgb4: use correct type for all-mask IP address comparison Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 29/65] cxgb4: fix SGE queue dump destination buffer context Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 30/65] hwmon: (max6697) Make sure the OVERT mask is set correctly Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 31/65] hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 32/65] thermal/drivers/mediatek: Fix bank number settings on mt8183 Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 33/65] thermal/drivers/rcar_gen3: Fix undefined temperature if negative Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 34/65] kthread: save thread function Greg Kroah-Hartman
2020-07-07 15:31 ` J. Bruce Fields
2020-07-07 15:17 ` [PATCH 5.4 35/65] nfsd: clients dont need to break their own delegations Greg Kroah-Hartman
2020-07-07 15:31 ` J. Bruce Fields
2020-07-07 17:20 ` Sasha Levin
2020-07-07 17:29 ` Sasha Levin
2020-07-07 17:31 ` J. Bruce Fields
2020-07-07 15:17 ` [PATCH 5.4 36/65] nfsd4: fix nfsdfs reference count loop Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 37/65] nfsd: fix nfsdfs inode reference count leak Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 38/65] drm: sun4i: hdmi: Remove extra HPD polling Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 39/65] virtio-blk: free vblk-vqs in error path of virtblk_probe() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 40/65] SMB3: Honor posix flag for multiuser mounts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 41/65] nvme: fix identify error status silent ignore Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 42/65] nvme: fix a crash in nvme_mpath_add_disk Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 43/65] samples/vfs: avoid warning in statx override Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 44/65] i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 45/65] i2c: mlxcpld: check correct size of maximum RECV_LEN packet Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 46/65] spi: spi-fsl-dspi: Fix external abort on interrupt in resume or exit paths Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 47/65] nfsd: apply umask on fs without ACL support Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 48/65] Revert "ALSA: usb-audio: Improve frames size computation" Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 49/65] SMB3: Honor seal flag for multiuser mounts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 50/65] SMB3: Honor persistent/resilient handle flags " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 51/65] SMB3: Honor lease disabling " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 52/65] SMB3: Honor handletimeout flag " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 53/65] cifs: Fix the target file was deleted when rename failed Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 54/65] MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 55/65] MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 56/65] drm/amd/display: Only revalidate bandwidth on medium and fast updates Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 57/65] drm/amdgpu: use %u rather than %d for sclk/mclk Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 58/65] drm/amdgpu/atomfirmware: fix vram_info fetching for renoir Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 59/65] dma-buf: Move dma_buf_release() from fops to dentry_ops Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 60/65] irqchip/gic: Atomically update affinity Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 61/65] mm, compaction: fully assume capture is not NULL in compact_zone_order() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 62/65] mm, compaction: make capture control handling safe wrt interrupts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 63/65] x86/resctrl: Fix memory bandwidth counter width for AMD Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 64/65] dm zoned: assign max_io_len correctly Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 5.4 65/65] efi: Make it possible to disable efivar_ssdt entirely Greg Kroah-Hartman
2020-07-08 5:30 ` [PATCH 5.4 00/65] 5.4.51-rc1 review Naresh Kamboju
2020-07-08 8:41 ` Jon Hunter
2020-07-08 15:15 ` Greg Kroah-Hartman
2020-07-08 16:58 ` Jon Hunter
2020-07-09 9:30 ` Greg Kroah-Hartman
2020-07-08 15:00 ` Shuah Khan
2020-07-08 17:53 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200707145753.240242645@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=anton@lightbitslabs.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=sagi@grimberg.me \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).