From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Tong Zhang <ztong0001@gmail.com>, Keith Busch <kbusch@kernel.org>,
Sagi Grimberg <sagi@grimberg.me>, Sasha Levin <sashal@kernel.org>,
linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 5.4 29/43] nvme-pci: cancel nvme device request before disabling
Date: Mon, 7 Sep 2020 12:33:15 -0400 [thread overview]
Message-ID: <20200907163329.1280888-29-sashal@kernel.org> (raw)
In-Reply-To: <20200907163329.1280888-1-sashal@kernel.org>
From: Tong Zhang <ztong0001@gmail.com>
[ Upstream commit 7ad92f656bddff4cf8f641e0e3b1acd4eb9644cb ]
This patch addresses an irq free warning and null pointer dereference
error problem when nvme devices got timeout error during initialization.
This problem happens when nvme_timeout() function is called while
nvme_reset_work() is still in execution. This patch fixed the problem by
setting flag of the problematic request to NVME_REQ_CANCELLED before
calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns
an error code and let nvme_submit_sync_cmd() fail gracefully.
The following is console output.
[ 62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[ 62.488796] nvme nvme0: could not set timestamp (881)
[ 62.494888] ------------[ cut here ]------------
[ 62.495142] Trying to free already-free IRQ 11
[ 62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[ 62.495742] Modules linked in:
[ 62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[ 62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[ 62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[ 62.497019] RIP: 0010:free_irq+0x1f7/0x370
[ 62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[ 62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
[ 62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
[ 62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
[ 62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
[ 62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
[ 62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
[ 62.500140] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[ 62.500534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[ 62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 62.501864] Call Trace:
[ 62.501993] pci_free_irq+0x13/0x20
[ 62.502167] nvme_reset_work+0x5d0/0x12a0
[ 62.502369] ? update_load_avg+0x59/0x580
[ 62.502569] ? ttwu_queue_wakelist+0xa8/0xc0
[ 62.502780] ? try_to_wake_up+0x1a2/0x450
[ 62.502979] process_one_work+0x1d2/0x390
[ 62.503179] worker_thread+0x45/0x3b0
[ 62.503361] ? process_one_work+0x390/0x390
[ 62.503568] kthread+0xf9/0x130
[ 62.503726] ? kthread_park+0x80/0x80
[ 62.503911] ret_from_fork+0x22/0x30
[ 62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[ 123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[ 123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[ 123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 123.917469] #PF: supervisor write access in kernel mode
[ 123.917725] #PF: error_code(0x0002) - not-present page
[ 123.917976] PGD 0 P4D 0
[ 123.918109] Oops: 0002 [#1] SMP PTI
[ 123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G W 5.8.0+ #8
[ 123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[ 123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[ 123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[ 123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[ 123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[ 123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[ 123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[ 123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[ 123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[ 123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[ 123.922641] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[ 123.923032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[ 123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 123.924353] Call Trace:
[ 123.924479] blk_mq_alloc_tag_set+0x137/0x2a0
[ 123.924694] nvme_reset_work+0xed6/0x12a0
[ 123.924898] process_one_work+0x1d2/0x390
[ 123.925099] worker_thread+0x45/0x3b0
[ 123.925280] ? process_one_work+0x390/0x390
[ 123.925486] kthread+0xf9/0x130
[ 123.925642] ? kthread_park+0x80/0x80
[ 123.925825] ret_from_fork+0x22/0x30
[ 123.926004] Modules linked in:
[ 123.926158] CR2: 0000000000000000
[ 123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[ 123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[ 123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[ 123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[ 123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[ 123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[ 123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[ 123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[ 123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[ 123.929715] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[ 123.930106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[ 123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 100da11ce98cb..a91433bdf5de4 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1274,8 +1274,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
dev_warn_ratelimited(dev->ctrl.device,
"I/O %d QID %d timeout, disable controller\n",
req->tag, nvmeq->qid);
- nvme_dev_disable(dev, true);
nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+ nvme_dev_disable(dev, true);
return BLK_EH_DONE;
case NVME_CTRL_RESETTING:
return BLK_EH_RESET_TIMER;
@@ -1292,10 +1292,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
dev_warn(dev->ctrl.device,
"I/O %d QID %d timeout, reset controller\n",
req->tag, nvmeq->qid);
+ nvme_req(req)->flags |= NVME_REQ_CANCELLED;
nvme_dev_disable(dev, false);
nvme_reset_ctrl(&dev->ctrl);
- nvme_req(req)->flags |= NVME_REQ_CANCELLED;
return BLK_EH_DONE;
}
--
2.25.1
next prev parent reply other threads:[~2020-09-07 16:56 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-07 16:32 [PATCH AUTOSEL 5.4 01/43] ARC: HSDK: wireup perf irq Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 02/43] dmaengine: acpi: Put the CSRT table after using it Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 03/43] netfilter: conntrack: allow sctp hearbeat after connection re-use Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 04/43] rxrpc: Keep the ACK serial in a var in rxrpc_input_ack() Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 05/43] drivers/net/wan/lapbether: Added needed_tailroom Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 06/43] NFC: st95hf: Fix memleak in st95hf_in_send_cmd Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 07/43] firestream: Fix memleak in fs_open Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 08/43] ALSA: hda: Fix 2 channel swapping for Tegra Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 09/43] ALSA: hda/tegra: Program WAKEEN register " Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 10/43] drivers/dma/dma-jz4780: Fix race condition between probe and irq handler Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 11/43] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 12/43] net: hns3: Fix for geneve tx checksum bug Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 13/43] xfs: fix off-by-one in inode alloc block reservation calculation Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 14/43] drivers/net/wan/lapbether: Set network_header before transmitting Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 15/43] cfg80211: regulatory: reject invalid hints Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 16/43] cfg80211: Adjust 6 GHz frequency to channel conversion Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 17/43] net: usb: Fix uninit-was-stored issue in asix_read_phy_addr() Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 18/43] xfs: initialize the shortform attr header padding entry Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 19/43] irqchip/eznps: Fix build error for !ARC700 builds Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 20/43] nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 21/43] nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 22/43] nvme: have nvme_wait_freeze_timeout return if it timed out Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 23/43] nvme-tcp: serialize controller teardown sequences Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 24/43] nvme-tcp: fix timeout handler Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 25/43] nvme-tcp: fix reset hang if controller died in the middle of a reset Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 26/43] nvme-rdma: serialize controller teardown sequences Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 27/43] nvme-rdma: fix timeout handler Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 28/43] nvme-rdma: fix reset hang if controller died in the middle of a reset Sasha Levin
2020-09-07 16:33 ` Sasha Levin [this message]
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 30/43] HID: quirks: Set INCREMENT_USAGE_ON_DUPLICATE for all Saitek X52 devices Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 31/43] HID: microsoft: Add rumble support for the 8bitdo SN30 Pro+ controller Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 32/43] drivers/net/wan/hdlc_cisco: Add hard_header_len Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 33/43] HID: elan: Fix memleak in elan_input_configured Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 34/43] ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 35/43] cpufreq: intel_pstate: Refuse to turn off with HWP enabled Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 36/43] cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 37/43] net: usb: dm9601: Add USB ID of Keenetic Plus DSL Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 38/43] arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 39/43] ALSA: hda: hdmi - add Rocketlake support Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 40/43] ALSA: hda: fix a runtime pm issue in SOF when integrated GPU is disabled Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 41/43] drm/amdgpu: Fix bug in reporting voltage for CIK Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 42/43] iommu/amd: Do not use IOMMUv2 functionality when SME is active Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 43/43] gcov: Disable gcov build with GCC 10 Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200907163329.1280888-29-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
--cc=stable@vger.kernel.org \
--cc=ztong0001@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).