* [PATCHv4 0/3] scsi timeout handling updates
@ 2018-11-26 16:54 Keith Busch
  2018-11-26 16:54 ` [PATCHv4 1/3] blk-mq: Return true if request was completed Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Keith Busch @ 2018-11-26 16:54 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Christoph Hellwig,
	Keith Busch

This is an iterative update to the previous version, taking review
comments into account.

Background:

The main objective is to remove the generic block layer's lock prefix
currently required to transition a request to its completed state by
shifting that expense to the lower-level drivers that actually need it,
and to remove the software layering violation that was required to use
that mechanism.
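
To make the cost concrete, here is a minimal standalone sketch (the
names are illustrative, loosely modeled on blk-mq; this is not code
from the series) contrasting the lock-prefixed cmpxchg the generic
path uses today with the plain store this series moves to:

#include <stdbool.h>

enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

/* Today: an atomic transition, a lock-prefixed cmpxchg on x86 */
static inline bool mark_complete_atomic(int *state)
{
	return __sync_bool_compare_and_swap(state, MQ_RQ_IN_FLIGHT,
					    MQ_RQ_COMPLETE);
}

/*
 * After the series: a plain store (WRITE_ONCE() in the kernel) is
 * enough once the drivers that need double-completion protection keep
 * their own flag, as scsi does in patch 2/3.
 */
static inline void mark_complete_plain(int *state)
{
	*(volatile int *)state = MQ_RQ_COMPLETE;
}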

Changes since v3:

  The complete state is moved to its own field separate from the
  non-atomic scsi_cmnd "flags" field.

  Code comments added to describe the more obscure handling for fake
  timeout injection.

Keith Busch (3):
  blk-mq: Return true if request was completed
  scsi: Do not rely on blk-mq for double completions
  blk-mq: Simplify request completion state

 block/blk-mq.c            |  9 ++++-----
 drivers/scsi/scsi_error.c | 22 +++++++++++-----------
 drivers/scsi/scsi_lib.c   | 13 ++++++++++++-
 include/linux/blk-mq.h    | 16 +---------------
 include/scsi/scsi_cmnd.h  |  4 ++++
 5 files changed, 32 insertions(+), 32 deletions(-)

-- 
2.14.4



* [PATCHv4 1/3] blk-mq: Return true if request was completed
  2018-11-26 16:54 [PATCHv4 0/3] scsi timeout handling updates Keith Busch
@ 2018-11-26 16:54 ` Keith Busch
  2018-11-26 17:00   ` Christoph Hellwig
  2018-11-26 16:54 ` [PATCHv4 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-26 16:54 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Christoph Hellwig,
	Keith Busch

A driver may have internal state to clean up if we're pretending a request
didn't complete. Return 'false' if the command wasn't actually completed
due to the timeout error injection, and 'true' otherwise.
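
As a sketch of how a lower-level driver might consume the new return
value (hypothetical code; example_cmd, its fields, and EXAMPLE_CMD_DONE
are made up for illustration, not part of this patch):

static void example_lld_done(struct example_cmd *cmd)
{
	set_bit(EXAMPLE_CMD_DONE, &cmd->flags);
	if (!blk_mq_complete_request(cmd->rq))
		/*
		 * The block layer pretended the request didn't complete
		 * (fake timeout injection), so roll back the private state
		 * and let the timeout handler take over.
		 */
		clear_bit(EXAMPLE_CMD_DONE, &cmd->flags);
}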

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         | 5 +++--
 include/linux/blk-mq.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 37674c1766a7..7c8cfa0cd420 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -638,11 +638,12 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
  *	Ends all I/O on a request. It does not handle partial completions.
  *	The actual completion happens out-of-order, through a IPI handler.
  **/
-void blk_mq_complete_request(struct request *rq)
+bool blk_mq_complete_request(struct request *rq)
 {
 	if (unlikely(blk_should_fake_timeout(rq->q)))
-		return;
+		return false;
 	__blk_mq_complete_request(rq);
+	return true;
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ca0520ca6437..6e3da356a8eb 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -298,7 +298,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 				bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
-void blk_mq_complete_request(struct request *rq);
+bool blk_mq_complete_request(struct request *rq);
 bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
 			   struct bio *bio);
 bool blk_mq_queue_stopped(struct request_queue *q);
-- 
2.14.4



* [PATCHv4 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-26 16:54 [PATCHv4 0/3] scsi timeout handling updates Keith Busch
  2018-11-26 16:54 ` [PATCHv4 1/3] blk-mq: Return true if request was completed Keith Busch
@ 2018-11-26 16:54 ` Keith Busch
  2018-11-26 17:01   ` Christoph Hellwig
  2018-11-26 16:54 ` [PATCHv4 3/3] blk-mq: Simplify request completion state Keith Busch
  2018-11-26 17:33 ` [PATCHv4 0/3] scsi timeout handling updates Jens Axboe
  3 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-26 16:54 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Christoph Hellwig,
	Keith Busch

The scsi timeout error handling had been directly updating the block
layer's request state to prevent error handling and a natural completion
from completing the same request twice. Fix this layering violation
by having scsi control the fate of its commands with scsi-owned flags
rather than blk-mq's.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/scsi/scsi_error.c | 22 +++++++++++-----------
 drivers/scsi/scsi_lib.c   | 13 ++++++++++++-
 include/scsi/scsi_cmnd.h  |  4 ++++
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index dd338a8cd275..16eef068e9e9 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -297,19 +297,19 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 
 	if (rtn == BLK_EH_DONE) {
 		/*
-		 * For blk-mq, we must set the request state to complete now
-		 * before sending the request to the scsi error handler. This
-		 * will prevent a use-after-free in the event the LLD manages
-		 * to complete the request before the error handler finishes
-		 * processing this timed out request.
+		 * Set the command to complete first in order to prevent a real
+		 * completion from releasing the command while error handling
+		 * is using it. If the command was already completed, then the
+		 * lower level driver beat the timeout handler, and it is safe
+		 * to return without escalating error recovery.
 		 *
-		 * If the request was already completed, then the LLD beat the
-		 * time out handler from transferring the request to the scsi
-		 * error handler. In that case we can return immediately as no
-		 * further action is required.
+		 * If timeout handling lost the race to a real completion, the
+		 * block layer may ignore that due to a fake timeout injection,
+		 * so return RESET_TIMER to allow error handling another shot
+		 * at this command.
 		 */
-		if (!blk_mq_mark_complete(req))
-			return rtn;
+		if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state))
+			return BLK_EH_RESET_TIMER;
 		if (scsi_abort_command(scmd) != SUCCESS) {
 			set_host_byte(scmd, DID_TIME_OUT);
 			scsi_eh_scmd_add(scmd);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0df15cb738d2..0dbf25512778 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1642,8 +1642,18 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
 
 static void scsi_mq_done(struct scsi_cmnd *cmd)
 {
+	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
+		return;
 	trace_scsi_dispatch_cmd_done(cmd);
-	blk_mq_complete_request(cmd->request);
+
+	/*
+	 * If the block layer didn't complete the request due to a timeout
+	 * injection, scsi must clear its internal completed state so that the
+	 * timeout handler will see it needs to escalate its own error
+	 * recovery.
+	 */
+	if (unlikely(!blk_mq_complete_request(cmd->request)))
+		clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
 }
 
 static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
@@ -1702,6 +1712,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (!scsi_host_queue_ready(q, shost, sdev))
 		goto out_dec_target_busy;
 
+	clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
 	if (!(req->rq_flags & RQF_DONTPREP)) {
 		ret = scsi_mq_prep_fn(req);
 		if (ret != BLK_STS_OK)
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index d6fd2aba0380..3de905e205ce 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -61,6 +61,9 @@ struct scsi_pointer {
 /* flags preserved across unprep / reprep */
 #define SCMD_PRESERVED_FLAGS	(SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED)
 
+/* for scmd->state */
+#define SCMD_STATE_COMPLETE	(1 << 0)
+
 struct scsi_cmnd {
 	struct scsi_request req;
 	struct scsi_device *device;
@@ -145,6 +148,7 @@ struct scsi_cmnd {
 
 	int result;		/* Status code from lower level driver */
 	int flags;		/* Command flags */
+	unsigned long state;	/* Command completion state */
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 };
-- 
2.14.4



* [PATCHv4 3/3] blk-mq: Simplify request completion state
  2018-11-26 16:54 [PATCHv4 0/3] scsi timeout handling updates Keith Busch
  2018-11-26 16:54 ` [PATCHv4 1/3] blk-mq: Return true if request was completed Keith Busch
  2018-11-26 16:54 ` [PATCHv4 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
@ 2018-11-26 16:54 ` Keith Busch
  2018-11-26 17:01   ` Christoph Hellwig
  2018-11-26 17:33 ` [PATCHv4 0/3] scsi timeout handling updates Jens Axboe
  3 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-26 16:54 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Christoph Hellwig,
	Keith Busch

There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         |  4 +---
 include/linux/blk-mq.h | 14 --------------
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7c8cfa0cd420..cda698804422 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -568,9 +568,7 @@ static void __blk_mq_complete_request(struct request *rq)
 	bool shared = false;
 	int cpu;
 
-	if (!blk_mq_mark_complete(rq))
-		return;
-
+	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 	/*
 	 * Most of single queue controllers, there is only one irq vector
 	 * for handling IO completion, and the only irq's affinity is set
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 6e3da356a8eb..b8de11e0603b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -329,20 +329,6 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q);
 
 unsigned int blk_mq_rq_cpu(struct request *rq);
 
-/**
- * blk_mq_mark_complete() - Set request state to complete
- * @rq: request to set to complete state
- *
- * Returns true if request state was successfully set to complete. If
- * successful, the caller is responsibile for seeing this request is ended, as
- * blk_mq_complete_request will not work again.
- */
-static inline bool blk_mq_mark_complete(struct request *rq)
-{
-	return cmpxchg(&rq->state, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE) ==
-			MQ_RQ_IN_FLIGHT;
-}
-
 /*
  * Driver command data is immediately after the request. So subtract request
  * size to get back to the original request, add request size to get the PDU.
-- 
2.14.4



* Re: [PATCHv4 1/3] blk-mq: Return true if request was completed
  2018-11-26 16:54 ` [PATCHv4 1/3] blk-mq: Return true if request was completed Keith Busch
@ 2018-11-26 17:00   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-11-26 17:00 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen,
	Bart Van Assche, Christoph Hellwig

On Mon, Nov 26, 2018 at 09:54:28AM -0700, Keith Busch wrote:
> A driver may have internal state to clean up if we're pretending a request
> didn't complete. Return 'false' if the command wasn't actually completed
> due to the timeout error injection, and 'true' otherwise.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCHv4 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-26 16:54 ` [PATCHv4 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
@ 2018-11-26 17:01   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-11-26 17:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen,
	Bart Van Assche, Christoph Hellwig

On Mon, Nov 26, 2018 at 09:54:29AM -0700, Keith Busch wrote:
> The scsi timeout error handling had been directly updating the block
> layer's request state to prevent error handling and a natural completion
> from completing the same request twice. Fix this layering violation
> by having scsi control the fate of its commands with scsi-owned flags
> rather than blk-mq's.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCHv4 3/3] blk-mq: Simplify request completion state
  2018-11-26 16:54 ` [PATCHv4 3/3] blk-mq: Simplify request completion state Keith Busch
@ 2018-11-26 17:01   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-11-26 17:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen,
	Bart Van Assche, Christoph Hellwig

On Mon, Nov 26, 2018 at 09:54:30AM -0700, Keith Busch wrote:
> There are no more users relying on blk-mq request states to prevent
> double completions, so replace the relatively expensive cmpxchg operation
> with WRITE_ONCE.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-26 16:54 [PATCHv4 0/3] scsi timeout handling updates Keith Busch
                   ` (2 preceding siblings ...)
  2018-11-26 16:54 ` [PATCHv4 3/3] blk-mq: Simplify request completion state Keith Busch
@ 2018-11-26 17:33 ` Jens Axboe
  2018-11-28  2:20   ` Ming Lei
  3 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2018-11-26 17:33 UTC (permalink / raw)
  To: Keith Busch, linux-scsi, linux-block
  Cc: Martin Petersen, Bart Van Assche, Christoph Hellwig

On 11/26/18 9:54 AM, Keith Busch wrote:
> This is an iterative update to the previous version, taking review
> comments into account.
> 
> Background:
> 
> The main objective is to remove the generic block layer's lock prefix
> currently required to transition a request to its completed state by
> shifting that expense to the lower-level drivers that actually need it,
> and to remove the software layering violation that was required to use
> that mechanism.

Thanks Keith, added for 4.21.

-- 
Jens Axboe



* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-26 17:33 ` [PATCHv4 0/3] scsi timeout handling updates Jens Axboe
@ 2018-11-28  2:20   ` Ming Lei
  2018-11-28  7:00     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2018-11-28  2:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Keith Busch, linux-scsi, linux-block, Martin Petersen,
	Bart Van Assche, Christoph Hellwig

On Mon, Nov 26, 2018 at 10:33:32AM -0700, Jens Axboe wrote:
> On 11/26/18 9:54 AM, Keith Busch wrote:
> > This is an iterative update to the previous version, taking review
> > comments into account.
> > 
> > Background:
> > 
> > The main objective is to remove the generic block layer's lock prefix
> > currently required to transition a request to its completed state by
> > shifting that expense to the lower-level drivers that actually need it,
> > and to remove the software layering violation that was required to use
> > that mechanism.
> 
> Thanks Keith, added for 4.21.

Hi Jens & Keith,

This patchset seems to cause the following kernel panic; I don't see
this issue with commit 4ab32bf3305e:

[ 1478.366776] nvmet: adding nsid 1 to subsystem testnqn
[ 1478.385104] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:8786b40e-ed54-4f9e-b1b0-6507377ffa97.
[ 1478.386975] nvme nvme1: creating 4 I/O queues.
[ 1478.388784] nvme nvme1: new ctrl: "testnqn"
[ 1479.396523] nvme io: runtime 40
[ 1575.259358] nvme reset: runtime 40
[ 1575.263920] nvme nvme0: resetting controller
[ 1575.288473] nvme nvme0: 4/0/0 read/write/poll queues
[ 1575.291531] nvme nvme1: resetting controller
[ 1575.298338] print_req_error: I/O error, dev nvme1n1, sector 524287272
[ 1575.299169] print_req_error: I/O error, dev nvme1n1, sector 524287328
[ 1575.299950] BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
[ 1575.299951] PGD 0 P4D 0 
[ 1575.299954] Oops: 0002 [#1] PREEMPT SMP PTI
[ 1575.299972] CPU: 0 PID: 12014 Comm: kworker/u8:0 Not tainted 4.20.0-rc3_5f0ed774ed29_for-next+ #1
[ 1575.299973] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[ 1575.299978] Workqueue: nvme-reset-wq nvme_loop_reset_ctrl_work [nvme_loop]
[ 1575.299985] RIP: 0010:blk_mq_free_request+0x7e/0xe4
[ 1575.299987] Code: 00 00 00 00 00 8b 53 18 b8 01 00 00 00 84 d2 74 0b 31 c0 81 e2 00 08 06 00 0f 95 c0 49 ff 84 c5 80 00 00 00 f6 43 1c 40 74 09 <f0> 41 ff 8c 24 98 01 00 00 83 3d 0b c6 52 01 00 74 15 0f b6 43 18
[ 1575.299988] RSP: 0018:ffff888277a03e48 EFLAGS: 00010002
[ 1575.299989] RAX: 0000000000000001 RBX: ffff88823aa04e00 RCX: ffff888265e71dc0
[ 1575.299990] RDX: 0000000000080700 RSI: ffff888265e71c00 RDI: ffff88821d822808
[ 1575.299991] RBP: ffff888267a3d750 R08: 00000000ffffffff R09: ffff888265e71dd0
[ 1575.299992] R10: ffff888265e71dd0 R11: 0000000000000020 R12: 0000000000000000
[ 1575.299993] R13: ffffe8ffffc0be40 R14: 0000000000000000 R15: 0000000000006000
[ 1575.299994] FS:  0000000000000000(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
[ 1575.299995] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1575.299996] CR2: 0000000000000198 CR3: 0000000236de8005 CR4: 0000000000760ef0
[ 1575.300000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1575.300001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1575.300001] PKRU: 55555554
[ 1575.300002] Call Trace:
[ 1575.300012]  <IRQ>
[ 1575.300016]  blk_mq_complete_request+0xda/0xfd
[ 1575.300021]  nvmet_req_complete+0x11/0x5e [nvmet]
[ 1575.300025]  nvmet_bio_done+0x2b/0x3d [nvmet]
[ 1575.300027]  blk_update_request+0x177/0x27b
[ 1575.300032]  ? null_complete_rq+0x11/0x11 [null_blk]
[ 1575.300034]  blk_mq_end_request+0x1a/0xc8
[ 1575.300036]  null_cmd_timer_expired+0xe/0x11 [null_blk]
[ 1575.300040]  __hrtimer_run_queues+0x176/0x238
[ 1575.300043]  ? kvm_clock_read+0x14/0x23
[ 1575.300045]  hrtimer_interrupt+0x103/0x21c
[ 1575.300049]  smp_apic_timer_interrupt+0xd3/0x141
[ 1575.300051]  apic_timer_interrupt+0xf/0x20
[ 1575.300053]  </IRQ>
[ 1575.300055] RIP: 0010:console_unlock+0x3a8/0x45f
[ 1575.300057] Code: 59 00 8a 1d bc 0e 5d 01 48 c7 c7 f0 0c 6a 82 4c 89 2d b6 0e 5d 01 e8 fe c0 59 00 84 db 75 15 e8 9f 1c 00 00 48 8b 3c 24 57 9d <0f> 1f 44 00 00 e9 aa fc ff ff c6 05 89 0e 5d 01 00 e8 83 1c 00 00
[ 1575.300057] RSP: 0018:ffffc90003a3fc20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 1575.300059] RAX: 0000000080000002 RBX: 0000000000000000 RCX: 0000000000000000
[ 1575.300060] RDX: 0000000000000001 RSI: 0000000000000046 RDI: 0000000000000246
[ 1575.300060] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
[ 1575.300061] R10: 0000000005f5e100 R11: ffffffff826a00e7 R12: ffffffff821e7460
[ 1575.300062] R13: 0000000000000000 R14: ffffffff826a00e0 R15: 0000000000000000
[ 1575.300067]  vprintk_emit+0x1ff/0x236
[ 1575.300069]  printk+0x52/0x6e
[ 1575.300071]  blk_update_request+0x10a/0x27b
[ 1575.300074]  blk_mq_end_request+0x1a/0xc8
[ 1575.300076]  blk_mq_complete_request+0xda/0xfd
[ 1575.300080]  nvme_cancel_request+0x5b/0x60
[ 1575.300082]  bt_tags_for_each+0xe2/0x10c
[ 1575.300085]  ? nvme_complete_rq+0x162/0x162
[ 1575.300087]  ? nvme_complete_rq+0x162/0x162
[ 1575.300089]  blk_mq_tagset_busy_iter+0x68/0x75
[ 1575.300091]  nvme_loop_shutdown_ctrl+0x38/0x86 [nvme_loop]
[ 1575.300094]  nvme_loop_reset_ctrl_work+0x2a/0xcf [nvme_loop]
[ 1575.300098]  process_one_work+0x1da/0x313
[ 1575.300101]  ? rescuer_thread+0x282/0x282
[ 1575.300103]  process_scheduled_works+0x27/0x2c
[ 1575.300105]  worker_thread+0x1e7/0x295
[ 1575.300107]  kthread+0x115/0x11d
[ 1575.300109]  ? kthread_park+0x76/0x76
[ 1575.300111]  ret_from_fork+0x35/0x40
[ 1575.300113] Modules linked in: nvme_loop nvmet nvme_fabrics null_blk scsi_debug xfs libcrc32c isofs iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core lpc_ich mfd_core ip_tables sr_mod cdrom usb_storage sd_mod ahci libahci libata crc32c_intel virtio_scsi qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_mod [last unloaded: null_blk]
[ 1575.300127] Dumping ftrace buffer:
[ 1575.300129]    (ftrace buffer empty)
[ 1575.300130] CR2: 0000000000000198
[ 1575.300133] ---[ end trace f941476562612d99 ]---

thanks,
Ming


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28  2:20   ` Ming Lei
@ 2018-11-28  7:00     ` Christoph Hellwig
  2018-11-28 10:07       ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-11-28  7:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche, Christoph Hellwig

On Wed, Nov 28, 2018 at 10:20:01AM +0800, Ming Lei wrote:
> On Mon, Nov 26, 2018 at 10:33:32AM -0700, Jens Axboe wrote:
> > On 11/26/18 9:54 AM, Keith Busch wrote:
> > > This is an iterative update to the previous version, taking review
> > > comments into account.
> > > 
> > > Background:
> > > 
> > > The main objective is to remove the generic block layer's lock prefix
> > > currently required to transition a request to its completed state by
> > > shifting that expense to the lower-level drivers that actually need it,
> > > and to remove the software layering violation that was required to use
> > > that mechanism.
> > 
> > Thanks Keith, added for 4.21.
> 
> Hi Jens & Keith,
> 
> This patchset seems to cause the following kernel panic; I don't see
> this issue with commit 4ab32bf3305e:

Is this the nvme target on top of null_blk?


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28  7:00     ` Christoph Hellwig
@ 2018-11-28 10:07       ` Ming Lei
  2018-11-28 10:08         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2018-11-28 10:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 08:00:10AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 28, 2018 at 10:20:01AM +0800, Ming Lei wrote:
> > On Mon, Nov 26, 2018 at 10:33:32AM -0700, Jens Axboe wrote:
> > > On 11/26/18 9:54 AM, Keith Busch wrote:
> > > > This is an iterative update to the previous version, taking review
> > > > comments into account.
> > > > 
> > > > Background:
> > > > 
> > > > The main objective is to remove the generic block layer's lock prefix
> > > > currently required to transition a request to its completed state by
> > > > shifting that expense to the lower-level drivers that actually need it,
> > > > and to remove the software layering violation that was required to use
> > > > that mechanism.
> > > 
> > > Thanks Keith, added for 4.21.
> > 
> > Hi Jens & Keith,
> > 
> > This patchset seems to cause the following kernel panic; I don't see
> > this issue with commit 4ab32bf3305e:
> 
> Is this the nvme target on top of null_blk?

Yes.

Thanks,
Ming


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 10:07       ` Ming Lei
@ 2018-11-28 10:08         ` Christoph Hellwig
  2018-11-28 15:49           ` Keith Busch
  2018-11-29  2:15           ` Ming Lei
  0 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-11-28 10:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, linux-scsi,
	linux-block, Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> > Is this the nvme target on top of null_blk?
> 
> Yes.

And it goes away if you revert just the last patch?


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 10:08         ` Christoph Hellwig
@ 2018-11-28 15:49           ` Keith Busch
  2018-11-28 15:58             ` Jens Axboe
  2018-11-29  2:15           ` Ming Lei
  1 sibling, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-28 15:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-scsi, linux-block, Martin Petersen,
	Bart Van Assche

On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> > > Is this the nvme target on top of null_blk?
> > 
> > Yes.
> 
> And it goes away if you revert just the last patch?

It looks like a problem existed before that last patch. Reverting it
helps only if the request happened to have not been reallocated. If it
had been reallocated, the NULL_IRQ_TIMER would have completed the wrong
request out-of-order. If this were a real device, that'd probably result
in data corruption.
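
To make the interleaving concrete, here is a standalone toy model of
that reuse hazard (hypothetical names; the generation counter stands in
for tag reuse, and the check models whatever ownership test the timer
path would need):

#include <stdio.h>

struct req { int tag; unsigned int gen; };

static void timer_expired(struct req *r, unsigned int armed_gen)
{
	if (r->gen != armed_gen)
		printf("stale timer: tag %d was recycled, skipping\n", r->tag);
	else
		printf("completing tag %d\n", r->tag);
}

int main(void)
{
	struct req slot = { .tag = 7, .gen = 1 };
	unsigned int armed_gen = slot.gen;	/* timer armed for this I/O */

	slot.gen++;			/* request canceled, tag reallocated */
	timer_expired(&slot, armed_gen);	/* without the check: wrong request */
	return 0;
}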


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 15:49           ` Keith Busch
@ 2018-11-28 15:58             ` Jens Axboe
  2018-11-28 16:26               ` Keith Busch
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2018-11-28 15:58 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig
  Cc: Ming Lei, linux-scsi, linux-block, Martin Petersen, Bart Van Assche

On 11/28/18 8:49 AM, Keith Busch wrote:
> On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
>> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
>>>> Is this the nvme target on top of null_blk?
>>>
>>> Yes.
>>
>> And it goes away if you revert just the last patch?
> 
> It looks like a problem existed before that last patch. Reverting it
> helps only if the request happened to have not been reallocated. If it
> had been reallocated, the NULL_IRQ_TIMER would have completed the wrong
> request out-of-order. If this were a real device, that'd probably result
> in data corruption.

null_blk just needs updating for this.

-- 
Jens Axboe



* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 15:58             ` Jens Axboe
@ 2018-11-28 16:26               ` Keith Busch
  2018-11-28 16:31                 ` Jens Axboe
                                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Keith Busch @ 2018-11-28 16:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 08:58:00AM -0700, Jens Axboe wrote:
> On 11/28/18 8:49 AM, Keith Busch wrote:
> > On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
> >> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> >>>> Is this the nvme target on top of null_blk?
> >>>
> >>> Yes.
> >>
> >> And it goes away if you revert just the last patch?
> > 
> > It looks like a problem existed before that last patch. Reverting it
> > helps only if the request happened to have not been reallocated. If it
> > had been reallocated, the NULL_IRQ_TIMER would have completed the wrong
> > request out-of-order. If this were a real device, that'd probably result
> > in data corruption.
> 
> null_blk just needs updating for this.

Isn't this the nvme target's problem? It shouldn't complete requests
dispatched to its backing device, so I'm thinking something like the
following is what should happen. Untested at the moment, but will try
it out shortly.

---
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9908082b32c4..116398b240e5 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -428,10 +428,14 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 {
 	if (ctrl->ctrl.queue_count > 1) {
-		nvme_stop_queues(&ctrl->ctrl);
-		blk_mq_tagset_busy_iter(&ctrl->tag_set,
-					nvme_cancel_request, &ctrl->ctrl);
+		/*
+		 * The back end device driver is responsible for completing all
+		 * entered requests
+		 */
+		nvme_start_freeze(&ctrl->ctrl);
+		nvme_wait_freeze(&ctrl->ctrl);
 		nvme_loop_destroy_io_queues(ctrl);
+		nvme_unfreeze(&ctrl->ctrl);
 	}
 
 	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
---


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 16:26               ` Keith Busch
@ 2018-11-28 16:31                 ` Jens Axboe
  2018-11-28 17:56                 ` Keith Busch
  2018-11-28 22:31                 ` Keith Busch
  2 siblings, 0 replies; 26+ messages in thread
From: Jens Axboe @ 2018-11-28 16:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Ming Lei, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On 11/28/18 9:26 AM, Keith Busch wrote:
> On Wed, Nov 28, 2018 at 08:58:00AM -0700, Jens Axboe wrote:
>> On 11/28/18 8:49 AM, Keith Busch wrote:
>>> On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
>>>> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
>>>>>> Is this the nvme target on top of null_blk?
>>>>>
>>>>> Yes.
>>>>
>>>> And it goes away if you revert just the last patch?
>>>
>>> It looks like a problem existed before that last patch. Reverting it
>>> helps only if the request happened to have not been reallocated. If it
>>> had been reallocated, the NULL_IRQ_TIMER would have completed the wrong
>>> request out-of-order. If this were a real device, that'd probably result
>>> in data corruption.
>>
>> null_blk just needs updating for this.
> 
> Isn't this the nvme target's problem? It shouldn't complete requests
> dispatched to its backing device, so I'm thinking something like the
> following is what should happen. Untested at the moment, but will try
> it out shortly.

I looked at null_blk after the fact, and my recollection of how
that timeout stuff worked was wrong. I think you are right.

-- 
Jens Axboe



* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 16:26               ` Keith Busch
  2018-11-28 16:31                 ` Jens Axboe
@ 2018-11-28 17:56                 ` Keith Busch
  2018-11-29  1:18                   ` Ming Lei
  2018-11-28 22:31                 ` Keith Busch
  2 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-28 17:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 09:26:55AM -0700, Keith Busch wrote:
> ---
> diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> index 9908082b32c4..116398b240e5 100644
> --- a/drivers/nvme/target/loop.c
> +++ b/drivers/nvme/target/loop.c
> @@ -428,10 +428,14 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>  static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
>  {
>  	if (ctrl->ctrl.queue_count > 1) {
> -		nvme_stop_queues(&ctrl->ctrl);
> -		blk_mq_tagset_busy_iter(&ctrl->tag_set,
> -					nvme_cancel_request, &ctrl->ctrl);
> +		/*
> +		 * The back end device driver is responsible for completing all
> +		 * entered requests
> +		 */
> +		nvme_start_freeze(&ctrl->ctrl);
> +		nvme_wait_freeze(&ctrl->ctrl);
>  		nvme_loop_destroy_io_queues(ctrl);
> +		nvme_unfreeze(&ctrl->ctrl);
>  	}
>  
>  	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
> ---

The above tests fine with io and nvme resets on a target nvme loop
backed by null_blk, but I also couldn't recreate the original report
without the patch.

Ming,

Could you tell me a little more how you set it up? I'm just configuring
null_blk with queue_mode=2 irqmode=2. Anything else to recreate?

Thanks,
Keith 


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 16:26               ` Keith Busch
  2018-11-28 16:31                 ` Jens Axboe
  2018-11-28 17:56                 ` Keith Busch
@ 2018-11-28 22:31                 ` Keith Busch
  2018-11-28 23:36                   ` Keith Busch
  2 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-28 22:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 09:26:55AM -0700, Keith Busch wrote:
> ---
> diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> index 9908082b32c4..116398b240e5 100644
> --- a/drivers/nvme/target/loop.c
> +++ b/drivers/nvme/target/loop.c
> @@ -428,10 +428,14 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
>  static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
>  {
>  	if (ctrl->ctrl.queue_count > 1) {
> -		nvme_stop_queues(&ctrl->ctrl);
> -		blk_mq_tagset_busy_iter(&ctrl->tag_set,
> -					nvme_cancel_request, &ctrl->ctrl);
> +		/*
> +		 * The back end device driver is responsible for completing all
> +		 * entered requests
> +		 */
> +		nvme_start_freeze(&ctrl->ctrl);
> +		nvme_wait_freeze(&ctrl->ctrl);
>  		nvme_loop_destroy_io_queues(ctrl);
> +		nvme_unfreeze(&ctrl->ctrl);
>  	}
>  
>  	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
> ---

Grr, the above doesn't work so well when not using NVMe mpath because
nvmf_check_ready() will requeue it, leaving entered requests that won't
be started.

Waiting for a freeze isn't really the criterion we need anyway: we don't
care if there are entered requests still in MQ_RQ_IDLE. We just want to
wait for dispatched ones to return, and we currently don't have a good
way to sync with that condition.


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 22:31                 ` Keith Busch
@ 2018-11-28 23:36                   ` Keith Busch
  2018-11-29 17:11                     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-28 23:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 03:31:46PM -0700, Keith Busch wrote:
> Waiting for a freeze isn't really the criterion we need anyway: we don't
> care if there are entered requests still in MQ_RQ_IDLE. We just want to
> wait for dispatched ones to return, and we currently don't have a good
> way to sync with that condition.

One thing making this weird is that blk_mq_request_started() returns true
for COMPLETED requests, and that makes no sense to me. Completed is
the opposite of started, so I'm not sure why we would return true for
such states. Is anyone actually depending on that?

If we can return true only for started commands, the following
implements the desired wait criteria:

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a82830f39933..d0ef540711c7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
 
 int blk_mq_request_started(struct request *rq)
 {
-	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
+	return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT;
 }
 EXPORT_SYMBOL_GPL(blk_mq_request_started);
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9908082b32c4..ae50b6ed95fb 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -14,6 +14,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include <linux/scatterlist.h>
 #include <linux/blk-mq.h>
+#include <linux/delay.h>
 #include <linux/nvme.h>
 #include <linux/module.h>
 #include <linux/parser.h>
@@ -425,12 +426,37 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	return error;
 }
 
+bool nvme_count_active(struct request *req, void *data, bool reserved)
+{
+	unsigned int *active = data;
+
+	(*active)++;
+	return true;
+}
+
+/*
+ * It is the backing device driver's responsibility to ensure all dispatched
+ * requests are eventually completed.
+ */
+static void nvme_wait_for_stopped(struct nvme_loop_ctrl *ctrl)
+{
+	unsigned int active;
+
+	do {
+		active = 0;
+		blk_mq_tagset_busy_iter(&ctrl->tag_set, nvme_count_active,
+					&active);
+		if (!active)
+			return;
+		msleep(100);
+	} while (true);
+}
+
 static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 {
 	if (ctrl->ctrl.queue_count > 1) {
 		nvme_stop_queues(&ctrl->ctrl);
-		blk_mq_tagset_busy_iter(&ctrl->tag_set,
-					nvme_cancel_request, &ctrl->ctrl);
+		nvme_wait_for_stopped(ctrl);
 		nvme_loop_destroy_io_queues(ctrl);
 	}
 
--


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 17:56                 ` Keith Busch
@ 2018-11-29  1:18                   ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2018-11-29  1:18 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Christoph Hellwig, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 10:56:25AM -0700, Keith Busch wrote:
> On Wed, Nov 28, 2018 at 09:26:55AM -0700, Keith Busch wrote:
> > ---
> > diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> > index 9908082b32c4..116398b240e5 100644
> > --- a/drivers/nvme/target/loop.c
> > +++ b/drivers/nvme/target/loop.c
> > @@ -428,10 +428,14 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
> >  static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
> >  {
> >  	if (ctrl->ctrl.queue_count > 1) {
> > -		nvme_stop_queues(&ctrl->ctrl);
> > -		blk_mq_tagset_busy_iter(&ctrl->tag_set,
> > -					nvme_cancel_request, &ctrl->ctrl);
> > +		/*
> > +		 * The back end device driver is responsible for completing all
> > +		 * entered requests
> > +		 */
> > +		nvme_start_freeze(&ctrl->ctrl);
> > +		nvme_wait_freeze(&ctrl->ctrl);
> >  		nvme_loop_destroy_io_queues(ctrl);
> > +		nvme_unfreeze(&ctrl->ctrl);
> >  	}
> >  
> >  	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
> > ---
> 
> The above tests fine with io and nvme resets on a target nvme loop
> backed by null_blk, but I also couldn't recreate the original report
> without the patch.
> 
> Ming,
> 
> Could you tell me a little more how you set it up? I'm just configuring
> null_blk with queue_mode=2 irqmode=2. Anything else to recreate?

No parameters for null_blk.

You may find all the scripts and .json here:

http://people.redhat.com/minlei/tests/tools/nvme/

And the test entry is nvme_sanity.

thanks,
Ming


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 10:08         ` Christoph Hellwig
  2018-11-28 15:49           ` Keith Busch
@ 2018-11-29  2:15           ` Ming Lei
  2018-11-29  2:39             ` Martin K. Petersen
  1 sibling, 1 reply; 26+ messages in thread
From: Ming Lei @ 2018-11-29  2:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> > > Is this the nvme target on top of null_blk?
> > 
> > Yes.
> 
> And it goes away if you revert just the last patch?

Today I ran this test again and can't reproduce it any more.

So maybe it is not a new issue, and it was just triggered in yesterday's
for-4.21/block, :-)

Thanks,
Ming


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-29  2:15           ` Ming Lei
@ 2018-11-29  2:39             ` Martin K. Petersen
  2018-11-29  8:12               ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Martin K. Petersen @ 2018-11-29  2:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, linux-scsi,
	linux-block, Martin Petersen, Bart Van Assche


Ming,

> On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
>> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
>> > > Is this the nvme target on top of null_blk?
>> > 
>> > Yes.
>> 
>> And it goes away if you revert just the last patch?
>
> Today I ran this test again and can't reproduce it any more.
>
> So maybe it is not a new issue, and it was just triggered in yesterday's
> for-4.21/block, :-)

Can you reproduce with yesterday's tree then?

I'm obviously a bit concerned about merging something without knowing
the cause of the problems you were seeing.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-29  2:39             ` Martin K. Petersen
@ 2018-11-29  8:12               ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2018-11-29  8:12 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, linux-scsi,
	linux-block, Bart Van Assche

On Wed, Nov 28, 2018 at 09:39:44PM -0500, Martin K. Petersen wrote:
> 
> Ming,
> 
> > On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
> >> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> >> > > Is this the nvme target on top of null_blk?
> >> > 
> >> > Yes.
> >> 
> >> And it goes away if you revert just the last patch?
> >
> > Today I ran this test again and can't reproduce it any more.
> >
> > So maybe it is not a new issue, and it was just triggered in yesterday's
> > for-4.21/block, :-)
> 
> Can you reproduce with yesterday's tree then?
> 
> I'm obviously a bit concerned about merging something without knowing
> the cause of the problems you were seeing.

Hi Martin,

The panic was triggered by my automated daily test on yesterday's
for-4.21/block.

It isn't triggered in today's for-4.21/block, and it also isn't triggered
any more when I rerun the same test on yesterday's for-4.21/block
(5f0ed774ed29).


Thanks,
Ming


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-28 23:36                   ` Keith Busch
@ 2018-11-29 17:11                     ` Christoph Hellwig
  2018-11-29 17:20                       ` Keith Busch
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-11-29 17:11 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Christoph Hellwig, Ming Lei, linux-scsi, linux-block,
	Martin Petersen, Bart Van Assche

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a82830f39933..d0ef540711c7 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
>  
>  int blk_mq_request_started(struct request *rq)
>  {
> -	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
> +	return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT;
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_request_started);

Independent of this series, this change looks like the right thing to do.
But this whole area is a minefield, so separate testing would be
very helpful.

I also wonder why we even bother with the above helper; a direct
state comparison seems a lot more obvious to the reader.

Last but not least, the blk_mq_request_started check in nbd
should probably be lifted into blk_mq_tag_to_rq while we're at it.

As for the nvme issue - it seems to me like we need to decouple the
nvme loop frontend request from the target backend request.  In case
of a timeout/reset we'll complete struct request like all other nvme
transport drivers, but we leave the backend target state around, which
will be freed when it completes (or leaks when the completion is lost).


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-29 17:11                     ` Christoph Hellwig
@ 2018-11-29 17:20                       ` Keith Busch
  2018-12-28 12:47                         ` Hannes Reinecke
  0 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2018-11-29 17:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Ming Lei, linux-scsi, linux-block, Martin Petersen,
	Bart Van Assche

On Thu, Nov 29, 2018 at 06:11:59PM +0100, Christoph Hellwig wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index a82830f39933..d0ef540711c7 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
> >  
> >  int blk_mq_request_started(struct request *rq)
> >  {
> > -	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
> > +	return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT;
> >  }
> >  EXPORT_SYMBOL_GPL(blk_mq_request_started);
> 
> Independent of this series, this change looks like the right thing to do.
> But this whole area is a minefield, so separate testing would be
> very helpful.
>
> I also wonder why we even bother with the above helper; a direct
> state comparison seems a lot more obvious to the reader.

I think it's just because blk_mq_rq_state() is a private interface. The
enum mq_rq_state is defined under include/linux/, so it looks okay to
make getting the state public too.
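
For reference, the private helper in block/blk-mq.h at the time was
roughly the following (quoted approximately, as a sketch):

static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
{
	return READ_ONCE(rq->state);
}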
 
> Last but not least, the blk_mq_request_started check in nbd
> should probably be lifted into blk_mq_tag_to_rq while we're at it.
> 
> As for the nvme issue - it seems to me like we need to decouple the
> nvme loop frontend request from the target backend request.  In case
> of a timeout/reset we'll complete struct request like all other nvme
> transport drivers, but we leave the backend target state around, which
> will be freed when it completes (or leaks when the completion is lost).

I don't think nvme's loop target should do anything to help a command
complete. It shouldn't even implement a timeout for the same reason
no stacking block driver implements these. If a request is stuck, the
lowest level is the only driver that should have the responsibility to
make it unstuck.


* Re: [PATCHv4 0/3] scsi timeout handling updates
  2018-11-29 17:20                       ` Keith Busch
@ 2018-12-28 12:47                         ` Hannes Reinecke
  0 siblings, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2018-12-28 12:47 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig
  Cc: Jens Axboe, Ming Lei, linux-scsi, linux-block, Martin Petersen,
	Bart Van Assche

On 11/29/18 6:20 PM, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 06:11:59PM +0100, Christoph Hellwig wrote:
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index a82830f39933..d0ef540711c7 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
>>>   
>>>   int blk_mq_request_started(struct request *rq)
>>>   {
>>> -	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
>>> +	return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT;
>>>   }
>>>   EXPORT_SYMBOL_GPL(blk_mq_request_started);
>>
>> Independent of this series, this change looks like the right thing to do.
>> But this whole area is a minefield, so separate testing would be
>> very helpful.
>>
>> I also wonder why we even bother with the above helper; a direct
>> state comparison seems a lot more obvious to the reader.
> 
> I think it's just because blk_mq_rq_state() is a private interface. The
> enum mq_rq_state is defined under include/linux/, so it looks okay to
> make getting the state public too.
>   
>> Last but not least, the blk_mq_request_started check in nbd
>> should probably be lifted into blk_mq_tag_to_rq while we're at it.
>>
>> As for the nvme issue - it seems to me like we need to decouple the
>> nvme loop frontend request from the target backend request.  In case
>> of a timeout/reset we'll complete struct request like all other nvme
>> transport drivers, but we leave the backend target state around, which
>> will be freed when it completes (or leaks when the completion is lost).
> 
> I don't think nvme's loop target should do anything to help a command
> complete. It shouldn't even implement a timeout for the same reason
> no stacking block driver implements these. If a request is stuck, the
> lowest level is the only driver that should have the responsibility to
> make it unstuck.
> 
Not quite.
You still need to be able to reset the controller, which you can't if 
you have to wait for the lowest level.

Cheers,

Hannes

