linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Tong Zhang <ztong0001@gmail.com>, Keith Busch <kbusch@kernel.org>,
	Sagi Grimberg <sagi@grimberg.me>, Sasha Levin <sashal@kernel.org>,
	linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 5.4 29/43] nvme-pci: cancel nvme device request before disabling
Date: Mon,  7 Sep 2020 12:33:15 -0400	[thread overview]
Message-ID: <20200907163329.1280888-29-sashal@kernel.org> (raw)
In-Reply-To: <20200907163329.1280888-1-sashal@kernel.org>

From: Tong Zhang <ztong0001@gmail.com>

[ Upstream commit 7ad92f656bddff4cf8f641e0e3b1acd4eb9644cb ]

This patch addresses an irq free warning and null pointer dereference
error problem when nvme devices got timeout error during initialization.
This problem happens when nvme_timeout() function is called while
nvme_reset_work() is still in execution. This patch fixed the problem by
setting flag of the problematic request to NVME_REQ_CANCELLED before
calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns
an error code and let nvme_submit_sync_cmd() fail gracefully.
The following is console output.

[   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[   62.488796] nvme nvme0: could not set timestamp (881)
[   62.494888] ------------[ cut here ]------------
[   62.495142] Trying to free already-free IRQ 11
[   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[   62.495742] Modules linked in:
[   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[   62.497019] RIP: 0010:free_irq+0x1f7/0x370
[   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
[   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
[   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
[   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
[   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
[   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
[   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   62.501864] Call Trace:
[   62.501993]  pci_free_irq+0x13/0x20
[   62.502167]  nvme_reset_work+0x5d0/0x12a0
[   62.502369]  ? update_load_avg+0x59/0x580
[   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
[   62.502780]  ? try_to_wake_up+0x1a2/0x450
[   62.502979]  process_one_work+0x1d2/0x390
[   62.503179]  worker_thread+0x45/0x3b0
[   62.503361]  ? process_one_work+0x390/0x390
[   62.503568]  kthread+0xf9/0x130
[   62.503726]  ? kthread_park+0x80/0x80
[   62.503911]  ret_from_fork+0x22/0x30
[   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  123.917469] #PF: supervisor write access in kernel mode
[  123.917725] #PF: error_code(0x0002) - not-present page
[  123.917976] PGD 0 P4D 0
[  123.918109] Oops: 0002 [#1] SMP PTI
[  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
[  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  123.924353] Call Trace:
[  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
[  123.924694]  nvme_reset_work+0xed6/0x12a0
[  123.924898]  process_one_work+0x1d2/0x390
[  123.925099]  worker_thread+0x45/0x3b0
[  123.925280]  ? process_one_work+0x390/0x390
[  123.925486]  kthread+0xf9/0x130
[  123.925642]  ? kthread_park+0x80/0x80
[  123.925825]  ret_from_fork+0x22/0x30
[  123.926004] Modules linked in:
[  123.926158] CR2: 0000000000000000
[  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/nvme/host/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 100da11ce98cb..a91433bdf5de4 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1274,8 +1274,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn_ratelimited(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, true);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+		nvme_dev_disable(dev, true);
 		return BLK_EH_DONE;
 	case NVME_CTRL_RESETTING:
 		return BLK_EH_RESET_TIMER;
@@ -1292,10 +1292,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
+		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		nvme_dev_disable(dev, false);
 		nvme_reset_ctrl(&dev->ctrl);
 
-		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	}
 
-- 
2.25.1


  parent reply	other threads:[~2020-09-07 16:56 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 16:32 [PATCH AUTOSEL 5.4 01/43] ARC: HSDK: wireup perf irq Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 02/43] dmaengine: acpi: Put the CSRT table after using it Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 03/43] netfilter: conntrack: allow sctp hearbeat after connection re-use Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 04/43] rxrpc: Keep the ACK serial in a var in rxrpc_input_ack() Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 05/43] drivers/net/wan/lapbether: Added needed_tailroom Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 06/43] NFC: st95hf: Fix memleak in st95hf_in_send_cmd Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 07/43] firestream: Fix memleak in fs_open Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 08/43] ALSA: hda: Fix 2 channel swapping for Tegra Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 09/43] ALSA: hda/tegra: Program WAKEEN register " Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 10/43] drivers/dma/dma-jz4780: Fix race condition between probe and irq handler Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 11/43] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 12/43] net: hns3: Fix for geneve tx checksum bug Sasha Levin
2020-09-07 16:32 ` [PATCH AUTOSEL 5.4 13/43] xfs: fix off-by-one in inode alloc block reservation calculation Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 14/43] drivers/net/wan/lapbether: Set network_header before transmitting Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 15/43] cfg80211: regulatory: reject invalid hints Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 16/43] cfg80211: Adjust 6 GHz frequency to channel conversion Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 17/43] net: usb: Fix uninit-was-stored issue in asix_read_phy_addr() Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 18/43] xfs: initialize the shortform attr header padding entry Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 19/43] irqchip/eznps: Fix build error for !ARC700 builds Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 20/43] nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 21/43] nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 22/43] nvme: have nvme_wait_freeze_timeout return if it timed out Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 23/43] nvme-tcp: serialize controller teardown sequences Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 24/43] nvme-tcp: fix timeout handler Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 25/43] nvme-tcp: fix reset hang if controller died in the middle of a reset Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 26/43] nvme-rdma: serialize controller teardown sequences Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 27/43] nvme-rdma: fix timeout handler Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 28/43] nvme-rdma: fix reset hang if controller died in the middle of a reset Sasha Levin
2020-09-07 16:33 ` Sasha Levin [this message]
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 30/43] HID: quirks: Set INCREMENT_USAGE_ON_DUPLICATE for all Saitek X52 devices Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 31/43] HID: microsoft: Add rumble support for the 8bitdo SN30 Pro+ controller Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 32/43] drivers/net/wan/hdlc_cisco: Add hard_header_len Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 33/43] HID: elan: Fix memleak in elan_input_configured Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 34/43] ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 35/43] cpufreq: intel_pstate: Refuse to turn off with HWP enabled Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 36/43] cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 37/43] net: usb: dm9601: Add USB ID of Keenetic Plus DSL Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 38/43] arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 39/43] ALSA: hda: hdmi - add Rocketlake support Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 40/43] ALSA: hda: fix a runtime pm issue in SOF when integrated GPU is disabled Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 41/43] drm/amdgpu: Fix bug in reporting voltage for CIK Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 42/43] iommu/amd: Do not use IOMMUv2 functionality when SME is active Sasha Levin
2020-09-07 16:33 ` [PATCH AUTOSEL 5.4 43/43] gcov: Disable gcov build with GCC 10 Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200907163329.1280888-29-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    --cc=stable@vger.kernel.org \
    --cc=ztong0001@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).