linux-kernel.vger.kernel.org archive mirror
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>, Jens Axboe <axboe@kernel.dk>,
	Sasha Levin <sashal@kernel.org>,
	linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 4.20 60/77] nvme-rdma: fix timeout handler
Date: Thu, 14 Feb 2019 21:08:38 -0500
Message-ID: <20190215020855.176727-60-sashal@kernel.org>
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Sagi Grimberg <sagi@grimberg.me>

[ Upstream commit 4c174e6366746ae8d49f9cc409f728eebb7a9ac9 ]

Currently, we have several problems with the timeout
handler:
1. If we time out on the controller establishment flow, we will hang
because we don't execute the error recovery (and we shouldn't, because
the create_ctrl flow needs to fail and clean up on its own).
2. We might also hang if we get a disconnect on a queue while the
controller is already deleting. This racy flow can cause the controller
disable/shutdown admin command to hang.

We cannot complete a timed out request from the timeout handler without
mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work).
So we serialize it in the timeout handler and tear down the I/O and admin
queues to guarantee that no one races with us in completing the
request.

Reported-by: Jaesoo Lee <jalee@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/nvme/host/rdma.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)
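
For readers skimming the series, the decision the reworked handler makes can
be sketched in plain userspace C. This is only an illustrative model under
stated assumptions: every sketch_* name below is hypothetical and is not a
kernel definition, and the two boolean flags merely stand in for the real
teardown and error-recovery calls in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states; not the kernel enums. */
enum sketch_ctrl_state {
	SKETCH_CTRL_CONNECTING,
	SKETCH_CTRL_LIVE,
	SKETCH_CTRL_DELETING,
};

/* Hypothetical stand-ins for BLK_EH_DONE and BLK_EH_RESET_TIMER. */
enum sketch_eh_return {
	SKETCH_EH_DONE,
	SKETCH_EH_RESET_TIMER,
};

/* Minimal model of a controller: its state plus two observable effects. */
struct sketch_ctrl {
	enum sketch_ctrl_state state;
	bool queues_torn_down;   /* models the I/O and admin queue teardown */
	bool recovery_scheduled; /* models kicking off error recovery */
};

/*
 * Model of the timeout decision described in the commit message:
 * - controller not live: tear the queues down right here, which completes
 *   all outstanding requests, and report the request as done;
 * - controller live: start error recovery and re-arm the request timer so
 *   the recovery work gets a chance to complete the request.
 */
enum sketch_eh_return sketch_timeout(struct sketch_ctrl *ctrl)
{
	if (ctrl->state != SKETCH_CTRL_LIVE) {
		/* Assumption: any pending recovery work has been flushed first. */
		ctrl->queues_torn_down = true;
		return SKETCH_EH_DONE;
	}

	ctrl->recovery_scheduled = true;
	return SKETCH_EH_RESET_TIMER;
}

/* Small self-check exercising both branches of the model. */
void sketch_selfcheck(void)
{
	struct sketch_ctrl connecting = { SKETCH_CTRL_CONNECTING, false, false };
	struct sketch_ctrl live = { SKETCH_CTRL_LIVE, false, false };

	assert(sketch_timeout(&connecting) == SKETCH_EH_DONE);
	assert(sketch_timeout(&live) == SKETCH_EH_RESET_TIMER);
}
```

Calling sketch_selfcheck() from a test program exercises both paths. The
model leaves out the locking and workqueue serialization that the real
handler relies on, so it shows only the branch structure, not the race fix.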

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ab6ec7295bf9..6e24b20304b5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1679,18 +1679,28 @@ static enum blk_eh_timer_return
 nvme_rdma_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_rdma_queue *queue = req->queue;
+	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
 
-	dev_warn(req->queue->ctrl->ctrl.device,
-		 "I/O %d QID %d timeout, reset controller\n",
-		 rq->tag, nvme_rdma_queue_idx(req->queue));
+	dev_warn(ctrl->ctrl.device, "I/O %d QID %d timeout\n",
+		 rq->tag, nvme_rdma_queue_idx(queue));
 
-	/* queue error recovery */
-	nvme_rdma_error_recovery(req->queue->ctrl);
+	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
+		/*
+		 * Tear down immediately if the controller times out while starting
+		 * or error recovery has already started. All outstanding
+		 * requests are completed on shutdown, so we return BLK_EH_DONE.
+		 */
+		flush_work(&ctrl->err_work);
+		nvme_rdma_teardown_io_queues(ctrl, false);
+		nvme_rdma_teardown_admin_queue(ctrl, false);
+		return BLK_EH_DONE;
+	}
 
-	/* fail with DNR on cmd timeout */
-	nvme_req(rq)->status = NVME_SC_ABORT_REQ | NVME_SC_DNR;
+	dev_warn(ctrl->ctrl.device, "starting error recovery\n");
+	nvme_rdma_error_recovery(ctrl);
 
-	return BLK_EH_DONE;
+	return BLK_EH_RESET_TIMER;
 }
 
 static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
-- 
2.19.1

