All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv3 0/3] scsi timeout handling updates
@ 2018-11-15 17:58 Keith Busch
  2018-11-15 17:58 ` [PATCHv3 1/3] blk-mq: Return true if request was completed Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Keith Busch @ 2018-11-15 17:58 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Keith Busch

The main objective is to remove the generic block layer's lock prefix
currently required to transition a request to its completed state by
shifting that expense to lower level drivers that need it, and removing
the software layering violation that was required to use that mechnaism.

Changes since v2:

  This one really plugs any gaps with fake timeout injection and
  additional coode comments included.

Keith Busch (3):
  blk-mq: Return true if request was completed
  scsi: Do not rely on blk-mq for double completions
  blk-mq: Simplify request completion state

 block/blk-mq.c            |  9 ++++-----
 drivers/scsi/scsi_error.c | 22 +++++++++++-----------
 drivers/scsi/scsi_lib.c   |  6 +++++-
 include/linux/blk-mq.h    | 16 +---------------
 include/scsi/scsi_cmnd.h  |  5 ++++-
 5 files changed, 25 insertions(+), 33 deletions(-)

-- 
2.14.4


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv3 1/3] blk-mq: Return true if request was completed
  2018-11-15 17:58 [PATCHv3 0/3] scsi timeout handling updates Keith Busch
@ 2018-11-15 17:58 ` Keith Busch
  2018-11-19  8:31   ` Christoph Hellwig
  2018-11-15 17:58 ` [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-11-15 17:58 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Keith Busch

A driver may have internal state to cleanup if we're pretending a request
didn't complete. Return 'false' if the command wasn't actually completed
due to the timeout error injection, and true otherwise.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         | 5 +++--
 include/linux/blk-mq.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 352051ea30f7..1fb0f2050d31 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -633,11 +633,12 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
  *	Ends all I/O on a request. It does not handle partial completions.
  *	The actual completion happens out-of-order, through a IPI handler.
  **/
-void blk_mq_complete_request(struct request *rq)
+bool blk_mq_complete_request(struct request *rq)
 {
 	if (unlikely(blk_should_fake_timeout(rq->q)))
-		return;
+		return false;
 	__blk_mq_complete_request(rq);
+	return true;
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 929e8abc5535..f0dc26fc86a9 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -298,7 +298,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 				bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
-void blk_mq_complete_request(struct request *rq);
+bool blk_mq_complete_request(struct request *rq);
 bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
 			   struct bio *bio);
 bool blk_mq_queue_stopped(struct request_queue *q);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-15 17:58 [PATCHv3 0/3] scsi timeout handling updates Keith Busch
  2018-11-15 17:58 ` [PATCHv3 1/3] blk-mq: Return true if request was completed Keith Busch
@ 2018-11-15 17:58 ` Keith Busch
  2018-11-16  9:53   ` Hannes Reinecke
  2018-11-19  8:58   ` Christoph Hellwig
  2018-11-15 17:58 ` [PATCHv3 3/3] blk-mq: Simplify request completion state Keith Busch
  2018-11-15 19:53 ` [PATCHv3 0/3] scsi timeout handling updates Jens Axboe
  3 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2018-11-15 17:58 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Keith Busch

The scsi timeout error handling had been directly updating the block
layer's request state to prevent a error handling and a natural completion
from completing the same request twice. Fix this layering violation
by having scsi control the fate of its commands with scsi owned flags
rather than use blk-mq's.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/scsi/scsi_error.c | 22 +++++++++++-----------
 drivers/scsi/scsi_lib.c   |  6 +++++-
 include/scsi/scsi_cmnd.h  |  5 ++++-
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index dd338a8cd275..e92e088f636f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -297,19 +297,19 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 
 	if (rtn == BLK_EH_DONE) {
 		/*
-		 * For blk-mq, we must set the request state to complete now
-		 * before sending the request to the scsi error handler. This
-		 * will prevent a use-after-free in the event the LLD manages
-		 * to complete the request before the error handler finishes
-		 * processing this timed out request.
+		 * Set the command to complete first in order to prevent a real
+		 * completion from releasing the command while error handling
+		 * is using it. If the command was already completed, then the
+		 * lower level driver beat the timeout handler, and it is safe
+		 * to return without escalating error recovery.
 		 *
-		 * If the request was already completed, then the LLD beat the
-		 * time out handler from transferring the request to the scsi
-		 * error handler. In that case we can return immediately as no
-		 * further action is required.
+		 * If timeout handling lost the race to a real completion, the
+		 * block layer may ignore that due to a fake timeout injection,
+		 * so return RESET_TIMER to allow error handling another shot
+		 * at this command.
 		 */
-		if (!blk_mq_mark_complete(req))
-			return rtn;
+		if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags))
+			return BLK_EH_RESET_TIMER;
 		if (scsi_abort_command(scmd) != SUCCESS) {
 			set_host_byte(scmd, DID_TIME_OUT);
 			scsi_eh_scmd_add(scmd);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5d83a162d03b..c1d5e4e36125 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
 
 static void scsi_mq_done(struct scsi_cmnd *cmd)
 {
+	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
+		return;
 	trace_scsi_dispatch_cmd_done(cmd);
-	blk_mq_complete_request(cmd->request);
+	if (unlikely(!blk_mq_complete_request(cmd->request)))
+		clear_bit(__SCMD_COMPLETE, &cmd->flags);
 }
 
 static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
@@ -1701,6 +1704,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 			goto out_dec_host_busy;
 		req->rq_flags |= RQF_DONTPREP;
 	} else {
+		cmd->flags &= ~SCMD_COMPLETE;
 		blk_mq_start_request(req);
 	}
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index d6fd2aba0380..ded7c7194a28 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -58,6 +58,9 @@ struct scsi_pointer {
 #define SCMD_TAGGED		(1 << 0)
 #define SCMD_UNCHECKED_ISA_DMA	(1 << 1)
 #define SCMD_INITIALIZED	(1 << 2)
+
+#define __SCMD_COMPLETE		3
+#define SCMD_COMPLETE		(1 << __SCMD_COMPLETE)
 /* flags preserved across unprep / reprep */
 #define SCMD_PRESERVED_FLAGS	(SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED)
 
@@ -144,7 +147,7 @@ struct scsi_cmnd {
 					 * to be at an address < 16Mb). */
 
 	int result;		/* Status code from lower level driver */
-	int flags;		/* Command flags */
+	unsigned long flags;	/* Command flags */
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 };
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv3 3/3] blk-mq: Simplify request completion state
  2018-11-15 17:58 [PATCHv3 0/3] scsi timeout handling updates Keith Busch
  2018-11-15 17:58 ` [PATCHv3 1/3] blk-mq: Return true if request was completed Keith Busch
  2018-11-15 17:58 ` [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
@ 2018-11-15 17:58 ` Keith Busch
  2018-11-19  8:58   ` Christoph Hellwig
  2018-11-15 19:53 ` [PATCHv3 0/3] scsi timeout handling updates Jens Axboe
  3 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-11-15 17:58 UTC (permalink / raw)
  To: linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche, Keith Busch

There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         |  4 +---
 include/linux/blk-mq.h | 14 --------------
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1fb0f2050d31..ce769132cfbe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -568,9 +568,7 @@ static void __blk_mq_complete_request(struct request *rq)
 	bool shared = false;
 	int cpu;
 
-	if (!blk_mq_mark_complete(rq))
-		return;
-
+	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 	/*
 	 * Most of single queue controllers, there is only one irq vector
 	 * for handling IO completion, and the only irq's affinity is set
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f0dc26fc86a9..9b5528fcdd45 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -329,20 +329,6 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q);
 
 unsigned int blk_mq_rq_cpu(struct request *rq);
 
-/**
- * blk_mq_mark_complete() - Set request state to complete
- * @rq: request to set to complete state
- *
- * Returns true if request state was successfully set to complete. If
- * successful, the caller is responsibile for seeing this request is ended, as
- * blk_mq_complete_request will not work again.
- */
-static inline bool blk_mq_mark_complete(struct request *rq)
-{
-	return cmpxchg(&rq->state, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE) ==
-			MQ_RQ_IN_FLIGHT;
-}
-
 /*
  * Driver command data is immediately after the request. So subtract request
  * size to get back to the original request, add request size to get the PDU.
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 0/3] scsi timeout handling updates
  2018-11-15 17:58 [PATCHv3 0/3] scsi timeout handling updates Keith Busch
                   ` (2 preceding siblings ...)
  2018-11-15 17:58 ` [PATCHv3 3/3] blk-mq: Simplify request completion state Keith Busch
@ 2018-11-15 19:53 ` Jens Axboe
  3 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2018-11-15 19:53 UTC (permalink / raw)
  To: Keith Busch, linux-scsi, linux-block; +Cc: Martin Petersen, Bart Van Assche

On 11/15/18 10:58 AM, Keith Busch wrote:
> The main objective is to remove the generic block layer's lock prefix
> currently required to transition a request to its completed state by
> shifting that expense to lower level drivers that need it, and removing
> the software layering violation that was required to use that mechnaism.

Looks good to me.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-15 17:58 ` [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
@ 2018-11-16  9:53   ` Hannes Reinecke
  2018-11-16 14:46     ` Bart Van Assche
  2018-11-19  8:58   ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2018-11-16  9:53 UTC (permalink / raw)
  To: Keith Busch, linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen, Bart Van Assche

On 11/15/18 6:58 PM, Keith Busch wrote:
> The scsi timeout error handling had been directly updating the block
> layer's request state to prevent a error handling and a natural completion
> from completing the same request twice. Fix this layering violation
> by having scsi control the fate of its commands with scsi owned flags
> rather than use blk-mq's.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>   drivers/scsi/scsi_error.c | 22 +++++++++++-----------
>   drivers/scsi/scsi_lib.c   |  6 +++++-
>   include/scsi/scsi_cmnd.h  |  5 ++++-
>   3 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index dd338a8cd275..e92e088f636f 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -297,19 +297,19 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>   
>   	if (rtn == BLK_EH_DONE) {
>   		/*
> -		 * For blk-mq, we must set the request state to complete now
> -		 * before sending the request to the scsi error handler. This
> -		 * will prevent a use-after-free in the event the LLD manages
> -		 * to complete the request before the error handler finishes
> -		 * processing this timed out request.
> +		 * Set the command to complete first in order to prevent a real
> +		 * completion from releasing the command while error handling
> +		 * is using it. If the command was already completed, then the
> +		 * lower level driver beat the timeout handler, and it is safe
> +		 * to return without escalating error recovery.
>   		 *
> -		 * If the request was already completed, then the LLD beat the
> -		 * time out handler from transferring the request to the scsi
> -		 * error handler. In that case we can return immediately as no
> -		 * further action is required.
> +		 * If timeout handling lost the race to a real completion, the
> +		 * block layer may ignore that due to a fake timeout injection,
> +		 * so return RESET_TIMER to allow error handling another shot
> +		 * at this command.
>   		 */
> -		if (!blk_mq_mark_complete(req))
> -			return rtn;
> +		if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags))
> +			return BLK_EH_RESET_TIMER;
>   		if (scsi_abort_command(scmd) != SUCCESS) {
>   			set_host_byte(scmd, DID_TIME_OUT);
>   			scsi_eh_scmd_add(scmd);
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 5d83a162d03b..c1d5e4e36125 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
>   
>   static void scsi_mq_done(struct scsi_cmnd *cmd)
>   {
> +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
> +		return;
>   	trace_scsi_dispatch_cmd_done(cmd);
> -	blk_mq_complete_request(cmd->request);
> +	if (unlikely(!blk_mq_complete_request(cmd->request)))
> +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
>   }
>   
>   static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
> @@ -1701,6 +1704,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>   			goto out_dec_host_busy;
>   		req->rq_flags |= RQF_DONTPREP;
>   	} else {
> +		cmd->flags &= ~SCMD_COMPLETE;
>   		blk_mq_start_request(req);
>   	}
>   
Why do you use a normal assignment here (and not a clear_bit() as in the 
above hunk)?

And shouldn't you add a barrier here to broadcast the state change to 
other cores?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-16  9:53   ` Hannes Reinecke
@ 2018-11-16 14:46     ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-11-16 14:46 UTC (permalink / raw)
  To: Hannes Reinecke, Keith Busch, linux-scsi, linux-block
  Cc: Jens Axboe, Martin Petersen

On Fri, 2018-11-16 at 10:53 +0100, Hannes Reinecke wrote:
> On 11/15/18 6:58 PM, Keith Busch wrote:
> > The scsi timeout error handling had been directly updating the block
> > layer's request state to prevent a error handling and a natural completion
> > from completing the same request twice. Fix this layering violation
> > by having scsi control the fate of its commands with scsi owned flags
> > rather than use blk-mq's.
> > 
> > Signed-off-by: Keith Busch <keith.busch@intel.com>
> > ---
> >   drivers/scsi/scsi_error.c | 22 +++++++++++-----------
> >   drivers/scsi/scsi_lib.c   |  6 +++++-
> >   include/scsi/scsi_cmnd.h  |  5 ++++-
> >   3 files changed, 20 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index dd338a8cd275..e92e088f636f 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -297,19 +297,19 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> >   
> >   	if (rtn == BLK_EH_DONE) {
> >   		/*
> > -		 * For blk-mq, we must set the request state to complete now
> > -		 * before sending the request to the scsi error handler. This
> > -		 * will prevent a use-after-free in the event the LLD manages
> > -		 * to complete the request before the error handler finishes
> > -		 * processing this timed out request.
> > +		 * Set the command to complete first in order to prevent a real
> > +		 * completion from releasing the command while error handling
> > +		 * is using it. If the command was already completed, then the
> > +		 * lower level driver beat the timeout handler, and it is safe
> > +		 * to return without escalating error recovery.
> >   		 *
> > -		 * If the request was already completed, then the LLD beat the
> > -		 * time out handler from transferring the request to the scsi
> > -		 * error handler. In that case we can return immediately as no
> > -		 * further action is required.
> > +		 * If timeout handling lost the race to a real completion, the
> > +		 * block layer may ignore that due to a fake timeout injection,
> > +		 * so return RESET_TIMER to allow error handling another shot
> > +		 * at this command.
> >   		 */
> > -		if (!blk_mq_mark_complete(req))
> > -			return rtn;
> > +		if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags))
> > +			return BLK_EH_RESET_TIMER;
> >   		if (scsi_abort_command(scmd) != SUCCESS) {
> >   			set_host_byte(scmd, DID_TIME_OUT);
> >   			scsi_eh_scmd_add(scmd);
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 5d83a162d03b..c1d5e4e36125 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
> >   
> >   static void scsi_mq_done(struct scsi_cmnd *cmd)
> >   {
> > +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
> > +		return;
> >   	trace_scsi_dispatch_cmd_done(cmd);
> > -	blk_mq_complete_request(cmd->request);
> > +	if (unlikely(!blk_mq_complete_request(cmd->request)))
> > +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
> >   }
> >   
> >   static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
> > @@ -1701,6 +1704,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> >   			goto out_dec_host_busy;
> >   		req->rq_flags |= RQF_DONTPREP;
> >   	} else {
> > +		cmd->flags &= ~SCMD_COMPLETE;
> >   		blk_mq_start_request(req);
> >   	}
> >   
> 
> Why do you use a normal assignment here (and not a clear_bit() as in the 
> above hunk)?
> 
> And shouldn't you add a barrier here to broadcast the state change to 
> other cores?

Although I don't like it that request state information is being moved from 
the block layer core into the SCSI core, I think that this patch is fine.
scsi_queue_rq() is only called after a request structure has been recycled
so I don't think that there can be any kind of race between the code paths
that use atomic instructions to manipulate the __SCMD_COMPLETE bit and
scsi_queue_rq().

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 1/3] blk-mq: Return true if request was completed
  2018-11-15 17:58 ` [PATCHv3 1/3] blk-mq: Return true if request was completed Keith Busch
@ 2018-11-19  8:31   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen, Bart Van Assche

On Thu, Nov 15, 2018 at 10:58:18AM -0700, Keith Busch wrote:
> A driver may have internal state to cleanup if we're pretending a request
> didn't complete. Return 'false' if the command wasn't actually completed
> due to the timeout error injection, and true otherwise.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-15 17:58 ` [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
  2018-11-16  9:53   ` Hannes Reinecke
@ 2018-11-19  8:58   ` Christoph Hellwig
  2018-11-19 15:17     ` Jens Axboe
  2018-11-19 15:19     ` Keith Busch
  1 sibling, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen, Bart Van Assche

> index 5d83a162d03b..c1d5e4e36125 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
>  
>  static void scsi_mq_done(struct scsi_cmnd *cmd)
>  {
> +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
> +		return;
>  	trace_scsi_dispatch_cmd_done(cmd);
> -	blk_mq_complete_request(cmd->request);
> +	if (unlikely(!blk_mq_complete_request(cmd->request)))
> +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
>  }

This looks a little odd to me.  If we didn't complete the command
someone else did.  Why would we clear the bit in this case?

>  static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
> @@ -1701,6 +1704,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			goto out_dec_host_busy;
>  		req->rq_flags |= RQF_DONTPREP;
>  	} else {
> +		cmd->flags &= ~SCMD_COMPLETE;
>  		blk_mq_start_request(req);
>  	}
>  
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index d6fd2aba0380..ded7c7194a28 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -58,6 +58,9 @@ struct scsi_pointer {
>  #define SCMD_TAGGED		(1 << 0)
>  #define SCMD_UNCHECKED_ISA_DMA	(1 << 1)
>  #define SCMD_INITIALIZED	(1 << 2)
> +
> +#define __SCMD_COMPLETE		3
> +#define SCMD_COMPLETE		(1 << __SCMD_COMPLETE)

This mixing of atomic and non-atomic bitops looks rather dangerous
to me.  Can you add a new atomic_flags just for the completed flag,
and always use the bitops on it for now? I think we can eventually
kill most of the existing flags except for SCMD_TAGGED over the
next merge window or two and then move that over as well.

Otherwise the concept looks fine to me.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 3/3] blk-mq: Simplify request completion state
  2018-11-15 17:58 ` [PATCHv3 3/3] blk-mq: Simplify request completion state Keith Busch
@ 2018-11-19  8:58   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen, Bart Van Assche

On Thu, Nov 15, 2018 at 10:58:20AM -0700, Keith Busch wrote:
> There are no more users relying on blk-mq request states to prevent
> double completions, so replace the relatively expensive cmpxchg operation
> with WRITE_ONCE.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-19  8:58   ` Christoph Hellwig
@ 2018-11-19 15:17     ` Jens Axboe
  2018-11-19 15:19     ` Keith Busch
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2018-11-19 15:17 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: linux-scsi, linux-block, Martin Petersen, Bart Van Assche

On 11/19/18 1:58 AM, Christoph Hellwig wrote:
>> index 5d83a162d03b..c1d5e4e36125 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
>>  
>>  static void scsi_mq_done(struct scsi_cmnd *cmd)
>>  {
>> +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
>> +		return;
>>  	trace_scsi_dispatch_cmd_done(cmd);
>> -	blk_mq_complete_request(cmd->request);
>> +	if (unlikely(!blk_mq_complete_request(cmd->request)))
>> +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
>>  }
> 
> This looks a little odd to me.  If we didn't complete the command
> someone else did.  Why would we clear the bit in this case?

It's strictly for the fake timeout, it didn't complete it, it just
ignored it. This is an artifact of the weird way that works.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-19  8:58   ` Christoph Hellwig
  2018-11-19 15:17     ` Jens Axboe
@ 2018-11-19 15:19     ` Keith Busch
  2018-11-21 13:12       ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Keith Busch @ 2018-11-19 15:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-scsi, linux-block, Jens Axboe, Martin Petersen, Bart Van Assche

On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
> > index 5d83a162d03b..c1d5e4e36125 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
> >  
> >  static void scsi_mq_done(struct scsi_cmnd *cmd)
> >  {
> > +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
> > +		return;
> >  	trace_scsi_dispatch_cmd_done(cmd);
> > -	blk_mq_complete_request(cmd->request);
> > +	if (unlikely(!blk_mq_complete_request(cmd->request)))
> > +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
> >  }
> 
> This looks a little odd to me.  If we didn't complete the command
> someone else did.  Why would we clear the bit in this case?

It's only to go along with the fake timeout. If we don't clear the bit,
then then scsi timeout handler will believe it has nothing to do because
scsi did its required part. The block layer just pretends the LLD didn't
do its part, so scsi has to play along too.

> >  static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
> > @@ -1701,6 +1704,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  			goto out_dec_host_busy;
> >  		req->rq_flags |= RQF_DONTPREP;
> >  	} else {
> > +		cmd->flags &= ~SCMD_COMPLETE;
> >  		blk_mq_start_request(req);
> >  	}
> >  
> > diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> > index d6fd2aba0380..ded7c7194a28 100644
> > --- a/include/scsi/scsi_cmnd.h
> > +++ b/include/scsi/scsi_cmnd.h
> > @@ -58,6 +58,9 @@ struct scsi_pointer {
> >  #define SCMD_TAGGED		(1 << 0)
> >  #define SCMD_UNCHECKED_ISA_DMA	(1 << 1)
> >  #define SCMD_INITIALIZED	(1 << 2)
> > +
> > +#define __SCMD_COMPLETE		3
> > +#define SCMD_COMPLETE		(1 << __SCMD_COMPLETE)
> 
> This mixing of atomic and non-atomic bitops looks rather dangerous
> to me.  Can you add a new atomic_flags just for the completed flag,
> and always use the bitops on it for now? I think we can eventually
> kill most of the existing flags except for SCMD_TAGGED over the
> next merge window or two and then move that over as well.

The only concurrent access is completion + timeout, otherwise access is
single-threaded. I'm using the atomic operations only where it is
needed.

We implicitly clear the SCMD_COMPLETED flag along with SCMD_TAGGED in
scsi_init_command() too, and I didn't want to add new overhead with
new atomics.

> Otherwise the concept looks fine to me.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-19 15:19     ` Keith Busch
@ 2018-11-21 13:12       ` Christoph Hellwig
  2018-11-26 15:32         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2018-11-21 13:12 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, linux-scsi, linux-block, Jens Axboe,
	Martin Petersen, Bart Van Assche

On Mon, Nov 19, 2018 at 08:19:00AM -0700, Keith Busch wrote:
> On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
> > > index 5d83a162d03b..c1d5e4e36125 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
> > >  
> > >  static void scsi_mq_done(struct scsi_cmnd *cmd)
> > >  {
> > > +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
> > > +		return;
> > >  	trace_scsi_dispatch_cmd_done(cmd);
> > > -	blk_mq_complete_request(cmd->request);
> > > +	if (unlikely(!blk_mq_complete_request(cmd->request)))
> > > +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
> > >  }
> > 
> > This looks a little odd to me.  If we didn't complete the command
> > someone else did.  Why would we clear the bit in this case?
> 
> It's only to go along with the fake timeout. If we don't clear the bit,
> then then scsi timeout handler will believe it has nothing to do because
> scsi did its required part. The block layer just pretends the LLD didn't
> do its part, so scsi has to play along too.

This just looks way to magic to me.  In other word - it needs a big fat
comment explaining the situation.

> > > +#define __SCMD_COMPLETE		3
> > > +#define SCMD_COMPLETE		(1 << __SCMD_COMPLETE)
> > 
> > This mixing of atomic and non-atomic bitops looks rather dangerous
> > to me.  Can you add a new atomic_flags just for the completed flag,
> > and always use the bitops on it for now? I think we can eventually
> > kill most of the existing flags except for SCMD_TAGGED over the
> > next merge window or two and then move that over as well.
> 
> The only concurrent access is completion + timeout, otherwise access is
> single-threaded. I'm using the atomic operations only where it is
> needed.
> 
> We implicitly clear the SCMD_COMPLETED flag along with SCMD_TAGGED in
> scsi_init_command() too, and I didn't want to add new overhead with
> new atomics.

In general mixing access types on a single field (nevermind bit)
is going to cause us problems further down the road sooner or later.

I'd be much happier with a separate field.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-26 15:32         ` Jens Axboe
@ 2018-11-26 15:31           ` Keith Busch
  0 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2018-11-26 15:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, linux-scsi, linux-block, Martin Petersen,
	Bart Van Assche

On Mon, Nov 26, 2018 at 08:32:45AM -0700, Jens Axboe wrote:
> On 11/21/18 6:12 AM, Christoph Hellwig wrote:
> > On Mon, Nov 19, 2018 at 08:19:00AM -0700, Keith Busch wrote:
> >> On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
> >>>> index 5d83a162d03b..c1d5e4e36125 100644
> >>>> --- a/drivers/scsi/scsi_lib.c
> >>>> +++ b/drivers/scsi/scsi_lib.c
> >>>> @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
> >>>>  
> >>>>  static void scsi_mq_done(struct scsi_cmnd *cmd)
> >>>>  {
> >>>> +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
> >>>> +		return;
> >>>>  	trace_scsi_dispatch_cmd_done(cmd);
> >>>> -	blk_mq_complete_request(cmd->request);
> >>>> +	if (unlikely(!blk_mq_complete_request(cmd->request)))
> >>>> +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
> >>>>  }
> >>>
> >>> This looks a little odd to me.  If we didn't complete the command
> >>> someone else did.  Why would we clear the bit in this case?
> >>
> >> It's only to go along with the fake timeout. If we don't clear the bit,
> >> then then scsi timeout handler will believe it has nothing to do because
> >> scsi did its required part. The block layer just pretends the LLD didn't
> >> do its part, so scsi has to play along too.
> > 
> > This just looks way to magic to me.  In other word - it needs a big fat
> > comment explaining the situation.
> > 
> >>>> +#define __SCMD_COMPLETE		3
> >>>> +#define SCMD_COMPLETE		(1 << __SCMD_COMPLETE)
> >>>
> >>> This mixing of atomic and non-atomic bitops looks rather dangerous
> >>> to me.  Can you add a new atomic_flags just for the completed flag,
> >>> and always use the bitops on it for now? I think we can eventually
> >>> kill most of the existing flags except for SCMD_TAGGED over the
> >>> next merge window or two and then move that over as well.
> >>
> >> The only concurrent access is completion + timeout, otherwise access is
> >> single-threaded. I'm using the atomic operations only where it is
> >> needed.
> >>
> >> We implicitly clear the SCMD_COMPLETED flag along with SCMD_TAGGED in
> >> scsi_init_command() too, and I didn't want to add new overhead with
> >> new atomics.
> > 
> > In general mixing access types on a single field (nevermind bit)
> > is going to cause us problems further down the road sooner or later.
> > 
> > I'd be much happier with a separate field.
> 
> Keith, will you please respin with the separate field? Would be nice
> to get this merged for 4.21.

I'll send out a new version today.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions
  2018-11-21 13:12       ` Christoph Hellwig
@ 2018-11-26 15:32         ` Jens Axboe
  2018-11-26 15:31           ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2018-11-26 15:32 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: linux-scsi, linux-block, Martin Petersen, Bart Van Assche

On 11/21/18 6:12 AM, Christoph Hellwig wrote:
> On Mon, Nov 19, 2018 at 08:19:00AM -0700, Keith Busch wrote:
>> On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
>>>> index 5d83a162d03b..c1d5e4e36125 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
>>>>  
>>>>  static void scsi_mq_done(struct scsi_cmnd *cmd)
>>>>  {
>>>> +	if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)))
>>>> +		return;
>>>>  	trace_scsi_dispatch_cmd_done(cmd);
>>>> -	blk_mq_complete_request(cmd->request);
>>>> +	if (unlikely(!blk_mq_complete_request(cmd->request)))
>>>> +		clear_bit(__SCMD_COMPLETE, &cmd->flags);
>>>>  }
>>>
>>> This looks a little odd to me.  If we didn't complete the command
>>> someone else did.  Why would we clear the bit in this case?
>>
>> It's only to go along with the fake timeout. If we don't clear the bit,
>> then then scsi timeout handler will believe it has nothing to do because
>> scsi did its required part. The block layer just pretends the LLD didn't
>> do its part, so scsi has to play along too.
> 
> This just looks way to magic to me.  In other word - it needs a big fat
> comment explaining the situation.
> 
>>>> +#define __SCMD_COMPLETE		3
>>>> +#define SCMD_COMPLETE		(1 << __SCMD_COMPLETE)
>>>
>>> This mixing of atomic and non-atomic bitops looks rather dangerous
>>> to me.  Can you add a new atomic_flags just for the completed flag,
>>> and always use the bitops on it for now? I think we can eventually
>>> kill most of the existing flags except for SCMD_TAGGED over the
>>> next merge window or two and then move that over as well.
>>
>> The only concurrent access is completion + timeout, otherwise access is
>> single-threaded. I'm using the atomic operations only where it is
>> needed.
>>
>> We implicitly clear the SCMD_COMPLETED flag along with SCMD_TAGGED in
>> scsi_init_command() too, and I didn't want to add new overhead with
>> new atomics.
> 
> In general mixing access types on a single field (nevermind bit)
> is going to cause us problems further down the road sooner or later.
> 
> I'd be much happier with a separate field.

Keith, will you please respin with the separate field? Would be nice
to get this merged for 4.21.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-11-26 15:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-15 17:58 [PATCHv3 0/3] scsi timeout handling updates Keith Busch
2018-11-15 17:58 ` [PATCHv3 1/3] blk-mq: Return true if request was completed Keith Busch
2018-11-19  8:31   ` Christoph Hellwig
2018-11-15 17:58 ` [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions Keith Busch
2018-11-16  9:53   ` Hannes Reinecke
2018-11-16 14:46     ` Bart Van Assche
2018-11-19  8:58   ` Christoph Hellwig
2018-11-19 15:17     ` Jens Axboe
2018-11-19 15:19     ` Keith Busch
2018-11-21 13:12       ` Christoph Hellwig
2018-11-26 15:32         ` Jens Axboe
2018-11-26 15:31           ` Keith Busch
2018-11-15 17:58 ` [PATCHv3 3/3] blk-mq: Simplify request completion state Keith Busch
2018-11-19  8:58   ` Christoph Hellwig
2018-11-15 19:53 ` [PATCHv3 0/3] scsi timeout handling updates Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.