All of lore.kernel.org
 help / color / mirror / Atom feed
* Make SCSI error handler code easier to understand
@ 2014-05-26 15:12 Bart Van Assche
  2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Bart Van Assche @ 2014-05-26 15:12 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Hannes Reinecke, Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

Every now and then someone asks how it is avoided that the SCSI error
handler and the SCSI completion handler are invoked concurrently for
the same SCSI command. Hence this patch series that should make the SCSI
error handler code a little easier to understand.

This patch series consists of the following three patches:

0001-Remove-two-cancel_delayed_work-calls-from-the-error-.patch
0002-block-Introduce-blk_rq_completed.patch
0003-Make-SCSI-error-handler-code-easier-to-understand.patch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-26 15:12 Make SCSI error handler code easier to understand Bart Van Assche
@ 2014-05-26 15:14 ` Bart Van Assche
  2014-05-26 15:15   ` [PATCH 2/3] block: Introduce blk_rq_completed() Bart Van Assche
                     ` (2 more replies)
  2014-05-26 15:15 ` [PATCH 3/3] Make SCSI error handler code easier to understand Bart Van Assche
  2014-05-28 20:15 ` Joe Lawrence
  2 siblings, 3 replies; 31+ messages in thread
From: Bart Van Assche @ 2014-05-26 15:14 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Hannes Reinecke, Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

scsi_put_command() is either invoked before a command is queued or
after a command has completed. scsi_cmnd.abort_work is scheduled
after a command has timed out and before it is finished. The block
layer guarantees that either the softirq_done_fn() or the
rq_timed_out_fn() function is invoked but not both. This means that
scsi_put_command() is never invoked while abort_work is scheduled.
Hence remove the cancel_delayed_work() call from scsi_put_command().

Similarly, scsi_abort_command() is only invoked from the SCSI
timeout handler. If scsi_abort_command() is invoked for a SCSI
command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
hence that scsi_cmnd.abort_work is no longer pending. Hence also
remove the cancel_delayed_work() call from scsi_abort_command().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/scsi.c       | 2 +-
 drivers/scsi/scsi_error.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..c972eab 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 	list_del_init(&cmd->list);
 	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
 
-	cancel_delayed_work(&cmd->abort_work);
+	WARN_ON_ONCE(delayed_work_pending(&cmd->abort_work));
 
 	__scsi_put_command(cmd->device->host, cmd);
 }
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f17aa7a..5232583 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "scmd %p previous abort failed\n", scmd));
-		cancel_delayed_work(&scmd->abort_work);
+		WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
 		return FAILED;
 	}
 
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
@ 2014-05-26 15:15   ` Bart Van Assche
  2014-05-26 15:27     ` James Bottomley
  2014-05-27  5:40     ` Hannes Reinecke
  2014-05-26 15:23   ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Paolo Bonzini
  2014-05-27  5:40   ` Hannes Reinecke
  2 siblings, 2 replies; 31+ messages in thread
From: Bart Van Assche @ 2014-05-26 15:15 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Hannes Reinecke, Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
block layer core.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
---
 block/blk-softirq.c    | 13 +++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 53b1737..fc7d160 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -172,6 +172,19 @@ void blk_complete_request(struct request *req)
 }
 EXPORT_SYMBOL(blk_complete_request);
 
+/**
+ * blk_rq_completed - whether or not a request has been completed
+ *
+ * Intended for debugging purposes only. Any completion handler code should go
+ * into the softirq_done_fn() and/or in the rq_timed_out_fn() request_queue
+ * callback functions.
+ */
+bool blk_rq_completed(struct request *rq)
+{
+	return test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
+}
+EXPORT_SYMBOL(blk_rq_completed);
+
 static __init int blk_softirq_init(void)
 {
 	int i;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0d84981..a621bc5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,6 +952,7 @@ extern void blk_complete_request(struct request *);
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
 extern void blk_unprep_request(struct request *);
+extern bool blk_rq_completed(struct request *);
 
 /*
  * Access functions for manipulating queue properties
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/3] Make SCSI error handler code easier to understand
  2014-05-26 15:12 Make SCSI error handler code easier to understand Bart Van Assche
  2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
@ 2014-05-26 15:15 ` Bart Van Assche
  2014-05-27  5:42   ` Hannes Reinecke
  2014-05-28 20:15 ` Joe Lawrence
  2 siblings, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2014-05-26 15:15 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Hannes Reinecke, Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

Every now and then someone asks how it is avoided that the SCSI error
handler and the SCSI completion handler are invoked concurrently for
the same SCSI command. Add a few WARN_ON_ONCE() statements that make
it clear how this is avoided.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/scsi_error.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5232583..aee0dc0 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -120,6 +120,8 @@ scmd_eh_abort_handler(struct work_struct *work)
 	struct scsi_device *sdev = scmd->device;
 	int rtn;
 
+	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
+
 	if (scsi_host_eh_past_deadline(sdev->host)) {
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
@@ -185,6 +187,8 @@ scsi_abort_command(struct scsi_cmnd *scmd)
 	struct Scsi_Host *shost = sdev->host;
 	unsigned long flags;
 
+	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
+
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) {
 		/*
 		 * Retry after abort failed, escalate to next level.
@@ -237,6 +241,8 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	unsigned long flags;
 	int ret = 0;
 
+	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
+
 	if (!shost->ehandler)
 		return 0;
 
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
  2014-05-26 15:15   ` [PATCH 2/3] block: Introduce blk_rq_completed() Bart Van Assche
@ 2014-05-26 15:23   ` Paolo Bonzini
  2014-05-26 15:25     ` James Bottomley
  2014-05-27  8:06     ` Bart Van Assche
  2014-05-27  5:40   ` Hannes Reinecke
  2 siblings, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2014-05-26 15:23 UTC (permalink / raw)
  To: Bart Van Assche, James Bottomley, Christoph Hellwig
  Cc: Hannes Reinecke, Jens Axboe, Joe Lawrence, linux-scsi

Il 26/05/2014 17:14, Bart Van Assche ha scritto:
> scsi_put_command() is either invoked before a command is queued or
> after a command has completed. scsi_cmnd.abort_work is scheduled
> after a command has timed out and before it is finished. The block
> layer guarantees that either the softirq_done_fn() or the
> rq_timed_out_fn() function is invoked but not both. This means that
> scsi_put_command() is never invoked while abort_work is scheduled.
> Hence remove the cancel_delayed_work() call from scsi_put_command().
>
> Similarly, scsi_abort_command() is only invoked from the SCSI
> timeout handler. If scsi_abort_command() is invoked for a SCSI
> command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
> scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
> hence that scsi_cmnd.abort_work is no longer pending. Hence also
> remove the cancel_delayed_work() call from scsi_abort_command().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>  drivers/scsi/scsi.c       | 2 +-
>  drivers/scsi/scsi_error.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 88d46fe..c972eab 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>  	list_del_init(&cmd->list);
>  	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>
> -	cancel_delayed_work(&cmd->abort_work);
> +	WARN_ON_ONCE(delayed_work_pending(&cmd->abort_work));
>
>  	__scsi_put_command(cmd->device->host, cmd);
>  }
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f17aa7a..5232583 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>  		SCSI_LOG_ERROR_RECOVERY(3,
>  			scmd_printk(KERN_INFO, scmd,
>  				    "scmd %p previous abort failed\n", scmd));
> -		cancel_delayed_work(&scmd->abort_work);
> +		WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
>  		return FAILED;
>  	}
>
>

I still prefer a BUG_ON in scsi_put_command, but anyway: series

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-26 15:23   ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Paolo Bonzini
@ 2014-05-26 15:25     ` James Bottomley
  2014-05-27  8:06     ` Bart Van Assche
  1 sibling, 0 replies; 31+ messages in thread
From: James Bottomley @ 2014-05-26 15:25 UTC (permalink / raw)
  To: pbonzini; +Cc: linux-scsi, bvanassche, hch, hare, axboe, jdl1291

On Mon, 2014-05-26 at 17:23 +0200, Paolo Bonzini wrote:
> Il 26/05/2014 17:14, Bart Van Assche ha scritto:
> > scsi_put_command() is either invoked before a command is queued or
> > after a command has completed. scsi_cmnd.abort_work is scheduled
> > after a command has timed out and before it is finished. The block
> > layer guarantees that either the softirq_done_fn() or the
> > rq_timed_out_fn() function is invoked but not both. This means that
> > scsi_put_command() is never invoked while abort_work is scheduled.
> > Hence remove the cancel_delayed_work() call from scsi_put_command().
> >
> > Similarly, scsi_abort_command() is only invoked from the SCSI
> > timeout handler. If scsi_abort_command() is invoked for a SCSI
> > command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
> > scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
> > hence that scsi_cmnd.abort_work is no longer pending. Hence also
> > remove the cancel_delayed_work() call from scsi_abort_command().
> >
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Joe Lawrence <joe.lawrence@stratus.com>
> > ---
> >  drivers/scsi/scsi.c       | 2 +-
> >  drivers/scsi/scsi_error.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > index 88d46fe..c972eab 100644
> > --- a/drivers/scsi/scsi.c
> > +++ b/drivers/scsi/scsi.c
> > @@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
> >  	list_del_init(&cmd->list);
> >  	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
> >
> > -	cancel_delayed_work(&cmd->abort_work);
> > +	WARN_ON_ONCE(delayed_work_pending(&cmd->abort_work));
> >
> >  	__scsi_put_command(cmd->device->host, cmd);
> >  }
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index f17aa7a..5232583 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
> >  		SCSI_LOG_ERROR_RECOVERY(3,
> >  			scmd_printk(KERN_INFO, scmd,
> >  				    "scmd %p previous abort failed\n", scmd));
> > -		cancel_delayed_work(&scmd->abort_work);
> > +		WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
> >  		return FAILED;
> >  	}
> >
> >
> 
> I still prefer a BUG_ON in scsi_put_command, but anyway: series

Me too, since this is supposed to be an impossible condition that would
induce corruption in the cmd free lists and hence make the system
unusable.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-26 15:15   ` [PATCH 2/3] block: Introduce blk_rq_completed() Bart Van Assche
@ 2014-05-26 15:27     ` James Bottomley
  2014-05-27  7:49       ` Bart Van Assche
  2014-05-27  5:40     ` Hannes Reinecke
  1 sibling, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-26 15:27 UTC (permalink / raw)
  To: bvanassche; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On Mon, 2014-05-26 at 17:15 +0200, Bart Van Assche wrote:
> Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
> block layer core.

I don't see the value of patches 2,3 they're checking for an impossible
condition ... why might it be possible?

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
  2014-05-26 15:15   ` [PATCH 2/3] block: Introduce blk_rq_completed() Bart Van Assche
  2014-05-26 15:23   ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Paolo Bonzini
@ 2014-05-27  5:40   ` Hannes Reinecke
  2014-05-27  6:08     ` Bart Van Assche
  2 siblings, 1 reply; 31+ messages in thread
From: Hannes Reinecke @ 2014-05-27  5:40 UTC (permalink / raw)
  To: Bart Van Assche, James Bottomley, Christoph Hellwig
  Cc: Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

On 05/26/2014 05:14 PM, Bart Van Assche wrote:
> scsi_put_command() is either invoked before a command is queued or
> after a command has completed. scsi_cmnd.abort_work is scheduled
> after a command has timed out and before it is finished. The block
> layer guarantees that either the softirq_done_fn() or the
> rq_timed_out_fn() function is invoked but not both. This means that
> scsi_put_command() is never invoked while abort_work is scheduled.
> Hence remove the cancel_delayed_work() call from scsi_put_command().
>
> Similarly, scsi_abort_command() is only invoked from the SCSI
> timeout handler. If scsi_abort_command() is invoked for a SCSI
> command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
> scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
> hence that scsi_cmnd.abort_work is no longer pending. Hence also
> remove the cancel_delayed_work() call from scsi_abort_command().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>   drivers/scsi/scsi.c       | 2 +-
>   drivers/scsi/scsi_error.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 88d46fe..c972eab 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>   	list_del_init(&cmd->list);
>   	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>
> -	cancel_delayed_work(&cmd->abort_work);
> +	WARN_ON_ONCE(delayed_work_pending(&cmd->abort_work));
>
>   	__scsi_put_command(cmd->device->host, cmd);
>   }
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f17aa7a..5232583 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>   		SCSI_LOG_ERROR_RECOVERY(3,
>   			scmd_printk(KERN_INFO, scmd,
>   				    "scmd %p previous abort failed\n", scmd));
> -		cancel_delayed_work(&scmd->abort_work);
> +		WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
>   		return FAILED;
>   	}
>
>
The first bit is okay, the second isn't.

The second bit is for these cases where the abort got scheduled (in 
scsi_abort_command()), but the workqueue didn't get executed by the 
time the next timeout occured.
I know, highly unlikely, but there is no safeguarding that it 
_cannot_ happen.
So the second cancel_delayed_work() has to stay.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-26 15:15   ` [PATCH 2/3] block: Introduce blk_rq_completed() Bart Van Assche
  2014-05-26 15:27     ` James Bottomley
@ 2014-05-27  5:40     ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2014-05-27  5:40 UTC (permalink / raw)
  To: Bart Van Assche, James Bottomley, Christoph Hellwig
  Cc: Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

On 05/26/2014 05:15 PM, Bart Van Assche wrote:
> Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
> block layer core.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Joe Lawrence <joe.lawrence@stratus.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/3] Make SCSI error handler code easier to understand
  2014-05-26 15:15 ` [PATCH 3/3] Make SCSI error handler code easier to understand Bart Van Assche
@ 2014-05-27  5:42   ` Hannes Reinecke
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2014-05-27  5:42 UTC (permalink / raw)
  To: Bart Van Assche, James Bottomley, Christoph Hellwig
  Cc: Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

On 05/26/2014 05:15 PM, Bart Van Assche wrote:
> Every now and then someone asks how it is avoided that the SCSI error
> handler and the SCSI completion handler are invoked concurrently for
> the same SCSI command. Add a few WARN_ON_ONCE() statements that make
> it clear how this is avoided.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>   drivers/scsi/scsi_error.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 5232583..aee0dc0 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -120,6 +120,8 @@ scmd_eh_abort_handler(struct work_struct *work)
>   	struct scsi_device *sdev = scmd->device;
>   	int rtn;
>
> +	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
> +
>   	if (scsi_host_eh_past_deadline(sdev->host)) {
>   		SCSI_LOG_ERROR_RECOVERY(3,
>   			scmd_printk(KERN_INFO, scmd,
> @@ -185,6 +187,8 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>   	struct Scsi_Host *shost = sdev->host;
>   	unsigned long flags;
>
> +	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
> +
>   	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) {
>   		/*
>   		 * Retry after abort failed, escalate to next level.
> @@ -237,6 +241,8 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>   	unsigned long flags;
>   	int ret = 0;
>
> +	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
> +
>   	if (!shost->ehandler)
>   		return 0;
>
>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-27  5:40   ` Hannes Reinecke
@ 2014-05-27  6:08     ` Bart Van Assche
  2014-05-27  6:22       ` Hannes Reinecke
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2014-05-27  6:08 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley, Christoph Hellwig
  Cc: Jens Axboe, Paolo Bonzini, Joe Lawrence, linux-scsi

On 05/27/14 07:40, Hannes Reinecke wrote:
> On 05/26/2014 05:14 PM, Bart Van Assche wrote:
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index f17aa7a..5232583 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>>           SCSI_LOG_ERROR_RECOVERY(3,
>>               scmd_printk(KERN_INFO, scmd,
>>                       "scmd %p previous abort failed\n", scmd));
>> -        cancel_delayed_work(&scmd->abort_work);
>> +        WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
>>           return FAILED;
>>       }
>>
>>
> The first bit is okay, the second isn't.
> 
> The second bit is for these cases where the abort got scheduled (in
> scsi_abort_command()), but the workqueue didn't get executed by the time
> the next timeout occured.
> I know, highly unlikely, but there is no safeguarding that it _cannot_
> happen.
> So the second cancel_delayed_work() has to stay.

But how could that next timeout occur while abort_work is still pending
? The block layer removes a request from the timeout list before
invoking the timeout handler (see also blk_rq_check_expired()). This
means that no block layer timers are active after abort_work has been
scheduled and before scmd_eh_abort_handler() is called. This also means
that a second timeout can only occur after a SCSI command has been
reinserted to a SCSI device queue. And such a reinsertion can only occur
after scmd_eh_abort_handler() has started. The pending bit is cleared
from a work struct before the associated handler is invoked. This is why
I think the above cancel_delayed_work() statement is not necessary. Or
did I perhaps overlook something ?

Bart.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-27  6:08     ` Bart Van Assche
@ 2014-05-27  6:22       ` Hannes Reinecke
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2014-05-27  6:22 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-scsi

(Resending; mailer rejected it ...)

On 05/27/2014 08:08 AM, Bart Van Assche wrote:
> On 05/27/14 07:40, Hannes Reinecke wrote:
>> On 05/26/2014 05:14 PM, Bart Van Assche wrote:
>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>> index f17aa7a..5232583 100644
>>> --- a/drivers/scsi/scsi_error.c
>>> +++ b/drivers/scsi/scsi_error.c
>>> @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>>>            SCSI_LOG_ERROR_RECOVERY(3,
>>>                scmd_printk(KERN_INFO, scmd,
>>>                        "scmd %p previous abort failed\n", scmd));
>>> -        cancel_delayed_work(&scmd->abort_work);
>>> +        WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
>>>            return FAILED;
>>>        }
>>>
>>>
>> The first bit is okay, the second isn't.
>>
>> The second bit is for these cases where the abort got scheduled (in
>> scsi_abort_command()), but the workqueue didn't get executed by the time
>> the next timeout occured.
>> I know, highly unlikely, but there is no safeguarding that it _cannot_
>> happen.
>> So the second cancel_delayed_work() has to stay.
>
> But how could that next timeout occur while abort_work is still pending
> ? The block layer removes a request from the timeout list before
> invoking the timeout handler (see also blk_rq_check_expired()). This
> means that no block layer timers are active after abort_work has been
> scheduled and before scmd_eh_abort_handler() is called. This also means
> that a second timeout can only occur after a SCSI command has been
> reinserted to a SCSI device queue. And such a reinsertion can only occur
> after scmd_eh_abort_handler() has started. The pending bit is cleared
> from a work struct before the associated handler is invoked. This is why
> I think the above cancel_delayed_work() statement is not necessary. Or
> did I perhaps overlook something ?
>
Hmm. Okay, convinced.

Acked-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-26 15:27     ` James Bottomley
@ 2014-05-27  7:49       ` Bart Van Assche
  2014-05-27  7:52         ` hch
  2014-05-27  8:23         ` James Bottomley
  0 siblings, 2 replies; 31+ messages in thread
From: Bart Van Assche @ 2014-05-27  7:49 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On 05/26/14 17:27, James Bottomley wrote:
> On Mon, 2014-05-26 at 17:15 +0200, Bart Van Assche wrote:
>> Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
>> block layer core.
> 
> I don't see the value of patches 2,3 they're checking for an impossible
> condition ... why might it be possible?

When reading the source code in scsi_error.c it's easy to overlook that
scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
set. Although it is possible to mention this as a comment above these
functions, such comments are not checked at runtime. It would require
additional work from the reader to verify whether or not such source
code comments are up to date. However, the condition inside a
WARN_ON_ONCE() statement is checked every time the code is executed.
Hence my preference for a WARN_ON_ONCE() statement instead of writing
down somewhere that these three functions operate on requests in which
the REQ_ATOM_COMPLETE bit has been set.

Bart.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27  7:49       ` Bart Van Assche
@ 2014-05-27  7:52         ` hch
  2014-05-27  8:00           ` James Bottomley
  2014-05-27  8:23         ` James Bottomley
  1 sibling, 1 reply; 31+ messages in thread
From: hch @ 2014-05-27  7:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On Tue, May 27, 2014 at 09:49:48AM +0200, Bart Van Assche wrote:
> > I don't see the value of patches 2,3 they're checking for an impossible
> > condition ... why might it be possible?
> 
> When reading the source code in scsi_error.c it's easy to overlook that
> scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
> all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
> set. Although it is possible to mention this as a comment above these
> functions, such comments are not checked at runtime. It would require
> additional work from the reader to verify whether or not such source
> code comments are up to date. However, the condition inside a
> WARN_ON_ONCE() statement is checked every time the code is executed.
> Hence my preference for a WARN_ON_ONCE() statement instead of writing
> down somewhere that these three functions operate on requests in which
> the REQ_ATOM_COMPLETE bit has been set.


I really do like the REQ_ATOM_COMPLETE asserts - as experience tells
these kinds of assumptions are best checked, otherwise they will
unintentionally be violated.

I'm less excited about the list walk I have to say, as the overhead is
getting fairly large for a simple assertation.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27  7:52         ` hch
@ 2014-05-27  8:00           ` James Bottomley
  0 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2014-05-27  8:00 UTC (permalink / raw)
  To: hch; +Cc: linux-scsi, bvanassche, hare, pbonzini, axboe, jdl1291

On Tue, 2014-05-27 at 00:52 -0700, hch@infradead.org wrote:
> On Tue, May 27, 2014 at 09:49:48AM +0200, Bart Van Assche wrote:
> > > I don't see the value of patches 2,3 they're checking for an impossible
> > > condition ... why might it be possible?
> > 
> > When reading the source code in scsi_error.c it's easy to overlook that
> > scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
> > all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
> > set. Although it is possible to mention this as a comment above these
> > functions, such comments are not checked at runtime. It would require
> > additional work from the reader to verify whether or not such source
> > code comments are up to date. However, the condition inside a
> > WARN_ON_ONCE() statement is checked every time the code is executed.
> > Hence my preference for a WARN_ON_ONCE() statement instead of writing
> > down somewhere that these three functions operate on requests in which
> > the REQ_ATOM_COMPLETE bit has been set.
> 
> 
> I really do like the REQ_ATOM_COMPLETE asserts - as experience tells
> these kinds of assumptions are best checked, otherwise they will
> unintentionally be violated.
> 
> I'm less excited about the list walk I have to say, as the overhead is
> getting fairly large for a simple assertation.

OK, my two objections are unconditional export of state from block that
we're trying to confine there.  Experience shows we'll grow unwanted
users of blk_rq_completed.  Probably export the assert from block not
blk_rq_completed().

The second is the WARN_ON_ONCE.  If this assert fails, we're doing error
handling on an uncompleted command and that will cause double
completion, leading to a massive cockup but one which might not
necessarily bring the machine down, so this should be BUG_ON

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-26 15:23   ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Paolo Bonzini
  2014-05-26 15:25     ` James Bottomley
@ 2014-05-27  8:06     ` Bart Van Assche
  2014-05-27  8:09       ` James Bottomley
  1 sibling, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2014-05-27  8:06 UTC (permalink / raw)
  To: Paolo Bonzini, James Bottomley, Christoph Hellwig
  Cc: Hannes Reinecke, Jens Axboe, Joe Lawrence, linux-scsi

On 05/26/14 17:23, Paolo Bonzini wrote:
> Il 26/05/2014 17:14, Bart Van Assche ha scritto:
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 88d46fe..c972eab 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>>      list_del_init(&cmd->list);
>>      spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>>
>> -    cancel_delayed_work(&cmd->abort_work);
>> +    WARN_ON_ONCE(delayed_work_pending(&cmd->abort_work));
>>
>>      __scsi_put_command(cmd->device->host, cmd);
>>  }
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index f17aa7a..5232583 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>>          SCSI_LOG_ERROR_RECOVERY(3,
>>              scmd_printk(KERN_INFO, scmd,
>>                      "scmd %p previous abort failed\n", scmd));
>> -        cancel_delayed_work(&scmd->abort_work);
>> +        WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
>>          return FAILED;
>>      }
>>
>>
> 
> I still prefer a BUG_ON in scsi_put_command, but anyway: series
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Hello Paolo,

As you probably know scsi_put_command() can get called from softirq
context. A BUG_ON() in that context might make it unnecessary hard for a
user to collect call traces.

Bart.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-27  8:06     ` Bart Van Assche
@ 2014-05-27  8:09       ` James Bottomley
  2014-05-27  8:36         ` Bart Van Assche
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-27  8:09 UTC (permalink / raw)
  To: bvanassche; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On Tue, 2014-05-27 at 10:06 +0200, Bart Van Assche wrote:
> On 05/26/14 17:23, Paolo Bonzini wrote:
> > Il 26/05/2014 17:14, Bart Van Assche ha scritto:
> >> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> >> index 88d46fe..c972eab 100644
> >> --- a/drivers/scsi/scsi.c
> >> +++ b/drivers/scsi/scsi.c
> >> @@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
> >>      list_del_init(&cmd->list);
> >>      spin_unlock_irqrestore(&cmd->device->list_lock, flags);
> >>
> >> -    cancel_delayed_work(&cmd->abort_work);
> >> +    WARN_ON_ONCE(delayed_work_pending(&cmd->abort_work));
> >>
> >>      __scsi_put_command(cmd->device->host, cmd);
> >>  }
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index f17aa7a..5232583 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -193,7 +193,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
> >>          SCSI_LOG_ERROR_RECOVERY(3,
> >>              scmd_printk(KERN_INFO, scmd,
> >>                      "scmd %p previous abort failed\n", scmd));
> >> -        cancel_delayed_work(&scmd->abort_work);
> >> +        WARN_ON_ONCE(delayed_work_pending(&scmd->abort_work));
> >>          return FAILED;
> >>      }
> >>
> >>
> > 
> > I still prefer a BUG_ON in scsi_put_command, but anyway: series
> > 
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Hello Paolo,
> 
> As you probably know scsi_put_command() can get called from softirq
> context. A BUG_ON() in that context might make it unnecessary hard for a
> user to collect call traces.

Why?  The messages dumped are the same, the trace just starts from the
IRQ context ... I don't see what the problem is.

The question isn't ease of gathering the data, it's correctness.  The
point is that if the assert fails we have a free of an in-use command
leading to a nasty use after free ... the machine state is hosed at that
point.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27  7:49       ` Bart Van Assche
  2014-05-27  7:52         ` hch
@ 2014-05-27  8:23         ` James Bottomley
  2014-05-27  9:00           ` Bart Van Assche
  1 sibling, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-27  8:23 UTC (permalink / raw)
  To: bvanassche; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On Tue, 2014-05-27 at 09:49 +0200, Bart Van Assche wrote:
> On 05/26/14 17:27, James Bottomley wrote:
> > On Mon, 2014-05-26 at 17:15 +0200, Bart Van Assche wrote:
> >> Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
> >> block layer core.
> > 
> > I don't see the value of patches 2,3 they're checking for an impossible
> > condition ... why might it be possible?
> 
> When reading the source code in scsi_error.c it's easy to overlook that
> scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
> all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
> set.

I really don't like this entanglement of state of block and SCSI.
"complete" in block terms isn't the same as in SCSI terms.  The
REQ_ATOM_COMPLETE flag is fully internal to block and indicates that
we've taken over processing the command and any completions into block
get ignored.  This is for the possible race between timeout and inbound
command completion.  If you start coding SCSI assertions in terms of it,
you're entangling layers that should be separate.

The assertion in SCSI terms is that abort and ->done cannot race.

James

>  Although it is possible to mention this as a comment above these
> functions, such comments are not checked at runtime. It would require
> additional work from the reader to verify whether or not such source
> code comments are up to date. However, the condition inside a
> WARN_ON_ONCE() statement is checked every time the code is executed.
> Hence my preference for a WARN_ON_ONCE() statement instead of writing
> down somewhere that these three functions operate on requests in which
> the REQ_ATOM_COMPLETE bit has been set.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-27  8:09       ` James Bottomley
@ 2014-05-27  8:36         ` Bart Van Assche
  2014-05-27  8:56           ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2014-05-27  8:36 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On 05/27/14 10:09, James Bottomley wrote:
> On Tue, 2014-05-27 at 10:06 +0200, Bart Van Assche wrote:
>> As you probably know scsi_put_command() can get called from softirq
>> context. A BUG_ON() in that context might make it unnecessary hard for a
>> user to collect call traces.
> 
> Why?  The messages dumped are the same, the trace just starts from the
> IRQ context ... I don't see what the problem is.
> 
> The question isn't ease of gathering the data, it's correctness.  The
> point is that if the assert fails we have a free of an in-use command
> leading to a nasty use after free ... the machine state is hosed at that
> point.

Please keep in mind that even if the SCSI mid-layer functions correctly
it is still possible that another driver in the system could cause these
tests to fail if it triggers e.g. a use-after-free.

If BUG_ON() is invoked the dumped message will be displayed on the
screen but will not be saved in the system log. This is inconvenient
because it means that if someone wants to capture the dumped message a
camera is necessary and one has to step to the physical console to
capture this message. Using WARN_ON() or WARN_ON_ONCE() makes it a lot
easier for users to capture any message that is displayed.

Bart.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-27  8:36         ` Bart Van Assche
@ 2014-05-27  8:56           ` James Bottomley
  2014-05-27  9:06             ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-27  8:56 UTC (permalink / raw)
  To: bvanassche; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On Tue, 2014-05-27 at 10:36 +0200, Bart Van Assche wrote:
> On 05/27/14 10:09, James Bottomley wrote:
> > On Tue, 2014-05-27 at 10:06 +0200, Bart Van Assche wrote:
> >> As you probably know scsi_put_command() can get called from softirq
> >> context. A BUG_ON() in that context might make it unnecessary hard for a
> >> user to collect call traces.
> > 
> > Why?  The messages dumped are the same, the trace just starts from the
> > IRQ context ... I don't see what the problem is.
> > 
> > The question isn't ease of gathering the data, it's correctness.  The
> > point is that if the assert fails we have a free of an in-use command
> > leading to a nasty use after free ... the machine state is hosed at that
> > point.
> 
> Please keep in mind that even if the SCSI mid-layer functions correctly
> it is still possible that another driver in the system could cause these
> tests to fail if it triggers e.g. a use-after-free.
> 
> If BUG_ON() is invoked the dumped message will be displayed on the
> screen but will not be saved in the system log. This is inconvenient
> because it means that if someone wants to capture the dumped message a
> camera is necessary and one has to step to the physical console to
> capture this message. Using WARN_ON() or WARN_ON_ONCE() makes it a lot
> easier for users to capture any message that is displayed.

This isn't debatable: we code for the safety of the user not for some
academic need to capture data; if you don't understand that, you might
like to re-review systems design.  If this assertion fails, the system
state is corrupt and if we let it continue, that corruption will
propagate.  The *only* safe course that protects the user is to stop it
as fast as possible, hopefully before the corruption penetrates to the
permanent storage.

The whole reason BUG_ON doesn't leave a log trace is to try to prevent
corruption propagating to the data storage devices.  What you propose
would be inviting that corruption in the name of getting a log entry.

If I prioritise getting log information over protecting user data, no
user would, quite rightly, ever trust Linux to store their vital
information.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27  8:23         ` James Bottomley
@ 2014-05-27  9:00           ` Bart Van Assche
  2014-05-27 10:21             ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2014-05-27  9:00 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On 05/27/14 10:23, James Bottomley wrote:
> On Tue, 2014-05-27 at 09:49 +0200, Bart Van Assche wrote:
>> When reading the source code in scsi_error.c it's easy to overlook that
>> scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
>> all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
>> set.
> 
> I really don't like this entanglement of state of block and SCSI.
> "complete" in block terms isn't the same as in SCSI terms.  The
> REQ_ATOM_COMPLETE flag is fully internal to block and indicates that
> we've taken over processing the command and any completions into block
> get ignored.  This is for the possible race between timeout and inbound
> command completion.  If you start coding SCSI assertions in terms of it,
> you're entangling layers that should be separate.
> 
> The assertion in SCSI terms is that abort and ->done cannot race.

Recent e-mail threads on the linux-scsi mailing list have shown that it
is easily overlooked which error handler functions are called only for
requests marked by the block layer as "complete" and hence cannot race
with scsi_softirq_done(). So far my proposals for how to improve the
documentation of how this race is avoided were rejected. Do you perhaps
have a proposal of how this should be documented properly ?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler
  2014-05-27  8:56           ` James Bottomley
@ 2014-05-27  9:06             ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2014-05-27  9:06 UTC (permalink / raw)
  To: James Bottomley, bvanassche; +Cc: linux-scsi, hch, hare, axboe, jdl1291

Il 27/05/2014 10:56, James Bottomley ha scritto:
> The whole reason BUG_ON doesn't leave a log trace is to try to prevent
> corruption propagating to the data storage devices.  What you propose
> would be inviting that corruption in the name of getting a log entry.

Even this is not entirely correct.  kdump can be used to capture the 
full log of the BUG_ON.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27  9:00           ` Bart Van Assche
@ 2014-05-27 10:21             ` James Bottomley
  2014-05-27 10:47               ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-27 10:21 UTC (permalink / raw)
  To: bvanassche; +Cc: linux-scsi, hch, hare, pbonzini, axboe, jdl1291

On Tue, 2014-05-27 at 11:00 +0200, Bart Van Assche wrote:
> On 05/27/14 10:23, James Bottomley wrote:
> > On Tue, 2014-05-27 at 09:49 +0200, Bart Van Assche wrote:
> >> When reading the source code in scsi_error.c it's easy to overlook that
> >> scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
> >> all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
> >> set.
> > 
> > I really don't like this entanglement of state of block and SCSI.
> > "complete" in block terms isn't the same as in SCSI terms.  The
> > REQ_ATOM_COMPLETE flag is fully internal to block and indicates that
> > we've taken over processing the command and any completions into block
> > get ignored.  This is for the possible race between timeout and inbound
> > command completion.  If you start coding SCSI assertions in terms of it,
> > you're entangling layers that should be separate.
> > 
> > The assertion in SCSI terms is that abort and ->done cannot race.
> 
> Recent e-mail threads on the linux-scsi mailing list have shown that it
> is easily overlooked which error handler functions are called only for
> requests marked by the block layer as "complete" and hence cannot race
> with scsi_softirq_done(). So far my proposals for how to improve the
> documentation of how this race is avoided were rejected. Do you perhaps
> have a proposal of how this should be documented properly ?

Um, if that's your goal, then I don't see how adding a 
	WARN_ON_ONCE(!blk_rq_completed(scmd->request));
makes it clear that timeout and the softirq cannot race.

I suppose for me, it's just an obvious systems assertion since every
command must be in a defined state for the state model, either a command
has timed out or it's in normal completion.  But since we inherit this
race mediation from block, isn't that the place to document it if people
are confused?

I could also see us one day extending the TMF capability to abort any
running command, which would make even an assertion of block timed out
or completed invalid.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27 10:21             ` James Bottomley
@ 2014-05-27 10:47               ` Paolo Bonzini
  2014-05-27 10:59                 ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2014-05-27 10:47 UTC (permalink / raw)
  To: James Bottomley, bvanassche; +Cc: linux-scsi, hch, hare, axboe, jdl1291

Il 27/05/2014 12:21, James Bottomley ha scritto:
> I could also see us one day extending the TMF capability to abort any
> running command, which would make even an assertion of block timed out
> or completed invalid.

Actually the assertion would remain valid, and that's exactly what Bart 
wants to document with this assertion.

"Completed" from the point of view of the block layer really means "do 
not run the softirq".  If you wanted to abort any running command, you 
still would mark the request as completed for the block layer before 
issuing the TMF.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27 10:47               ` Paolo Bonzini
@ 2014-05-27 10:59                 ` James Bottomley
  2014-05-27 11:13                   ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-27 10:59 UTC (permalink / raw)
  To: pbonzini; +Cc: linux-scsi, bvanassche, hch, hare, axboe, jdl1291

On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote:
> Il 27/05/2014 12:21, James Bottomley ha scritto:
> > I could also see us one day extending the TMF capability to abort any
> > running command, which would make even an assertion of block timed out
> > or completed invalid.
> 
> Actually the assertion would remain valid, and that's exactly what Bart 
> wants to document with this assertion.

No, it wouldn't: if we abort a running command by definition the command
hadn't timed out and might not be completed.  This is required by TMF
handling because now you have an abort racing with a completion.  Either
the command completes normally because it misses the abort or the abort
gets to it and its returned status is set to TASK_ABORTED.  That's the
only way you can tell if the abort was successful or not.

If you're thinking we would tell block to ignore returning commands
before issuing the abort, we'd never be able to tell if the abort were
successful, so we have to allow the race to collect the status.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27 10:59                 ` James Bottomley
@ 2014-05-27 11:13                   ` Paolo Bonzini
  2014-05-27 11:26                     ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2014-05-27 11:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, bvanassche, hch, hare, axboe, jdl1291

Il 27/05/2014 12:59, James Bottomley ha scritto:
> On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote:
>> Il 27/05/2014 12:21, James Bottomley ha scritto:
>>> I could also see us one day extending the TMF capability to abort any
>>> running command, which would make even an assertion of block timed out
>>> or completed invalid.
>>
>> Actually the assertion would remain valid, and that's exactly what Bart
>> wants to document with this assertion.
>
> No, it wouldn't: if we abort a running command by definition the command
> hadn't timed out and might not be completed.  This is required by TMF
> handling because now you have an abort racing with a completion.  Either
> the command completes normally because it misses the abort or the abort
> gets to it and its returned status is set to TASK_ABORTED.  That's the
> only way you can tell if the abort was successful or not.
>
> If you're thinking we would tell block to ignore returning commands
> before issuing the abort, we'd never be able to tell if the abort were
> successful, so we have to allow the race to collect the status.

You could use a different mechanism than a softirq to tell the abort 
were successful, for example by overriding scsi_done.  But with respect 
to the block layer, the mechanics of avoiding the race and double-free 
would probably be the same.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27 11:13                   ` Paolo Bonzini
@ 2014-05-27 11:26                     ` James Bottomley
  2014-05-27 11:52                       ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2014-05-27 11:26 UTC (permalink / raw)
  To: pbonzini; +Cc: linux-scsi, bvanassche, hch, hare, axboe, jdl1291

On Tue, 2014-05-27 at 13:13 +0200, Paolo Bonzini wrote:
> Il 27/05/2014 12:59, James Bottomley ha scritto:
> > On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote:
> >> Il 27/05/2014 12:21, James Bottomley ha scritto:
> >>> I could also see us one day extending the TMF capability to abort any
> >>> running command, which would make even an assertion of block timed out
> >>> or completed invalid.
> >>
> >> Actually the assertion would remain valid, and that's exactly what Bart
> >> wants to document with this assertion.
> >
> > No, it wouldn't: if we abort a running command by definition the command
> > hadn't timed out and might not be completed.  This is required by TMF
> > handling because now you have an abort racing with a completion.  Either
> > the command completes normally because it misses the abort or the abort
> > gets to it and its returned status is set to TASK_ABORTED.  That's the
> > only way you can tell if the abort was successful or not.
> >
> > If you're thinking we would tell block to ignore returning commands
> > before issuing the abort, we'd never be able to tell if the abort were
> > successful, so we have to allow the race to collect the status.
> 
> You could use a different mechanism than a softirq to tell the abort 
> were successful, for example by overriding scsi_done.  But with respect 
> to the block layer, the mechanics of avoiding the race and double-free 
> would probably be the same.

I think there's some confusion about what the race and double free is:
It only occurs with timeouts.  In a timeout situation, the host had
decided it's not waiting any longer for the target to respond and
proceeds to error recovery.  At any time between the host making this
decision up to the point it kicks the target hard enough to clear all
in-flight commands, the target may return the command.  If we didn't
have some ignore function on command completions while we're handling
errors, this would lead to double completion.

If we decided to allow arbitrary aborts of running commands, we would
send a TMF in during the normal (i.e. un timed out) command period.
Because there's no timeout involved, there's no double free problem.
The race in this case is whether the abort catches the command or not
and to mediate that race we need the normal status return.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27 11:26                     ` James Bottomley
@ 2014-05-27 11:52                       ` Paolo Bonzini
  2014-05-27 11:57                         ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2014-05-27 11:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, bvanassche, hch, hare, axboe, jdl1291

Il 27/05/2014 13:26, James Bottomley ha scritto:
>> You could use a different mechanism than a softirq to tell the abort
>> were successful, for example by overriding scsi_done.  But with respect
>> to the block layer, the mechanics of avoiding the race and double-free
>> would probably be the same.
>
> I think there's some confusion about what the race and double free is:
> It only occurs with timeouts.  In a timeout situation, the host had
> decided it's not waiting any longer for the target to respond and
> proceeds to error recovery.  At any time between the host making this
> decision up to the point it kicks the target hard enough to clear all
> in-flight commands, the target may return the command.  If we didn't
> have some ignore function on command completions while we're handling
> errors, this would lead to double completion.
>
> If we decided to allow arbitrary aborts of running commands, we would
> send a TMF in during the normal (i.e. un timed out) command period.
> Because there's no timeout involved, there's no double free problem.
> The race in this case is whether the abort catches the command or not
> and to mediate that race we need the normal status return.

I'm not sure why "no timeout" implies "no double free".  There would 
still be a race between the interrupt handler and softirq on one side, 
and the abort handler on the other.  The interrupt handler's call to 
cmd->scsi_done ends up triggering the softirq and thus freeing the 
command with scsi_put_command.  The abort handler, as you mentioned, 
wants the status return so it needs the interrupt handler to run---but 
not the softirq.

A simple way to avoid this could be to skip the softirq processing, by 
marking the request block-layer-complete, and do everything in the abort 
handler.  The interrupt handler would still run and fill in the status.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] block: Introduce blk_rq_completed()
  2014-05-27 11:52                       ` Paolo Bonzini
@ 2014-05-27 11:57                         ` James Bottomley
  0 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2014-05-27 11:57 UTC (permalink / raw)
  To: pbonzini; +Cc: linux-scsi, bvanassche, hch, hare, axboe, jdl1291

On Tue, 2014-05-27 at 13:52 +0200, Paolo Bonzini wrote:
> Il 27/05/2014 13:26, James Bottomley ha scritto:
> >> You could use a different mechanism than a softirq to tell the abort
> >> were successful, for example by overriding scsi_done.  But with respect
> >> to the block layer, the mechanics of avoiding the race and double-free
> >> would probably be the same.
> >
> > I think there's some confusion about what the race and double free is:
> > It only occurs with timeouts.  In a timeout situation, the host had
> > decided it's not waiting any longer for the target to respond and
> > proceeds to error recovery.  At any time between the host making this
> > decision up to the point it kicks the target hard enough to clear all
> > in-flight commands, the target may return the command.  If we didn't
> > have some ignore function on command completions while we're handling
> > errors, this would lead to double completion.
> >
> > If we decided to allow arbitrary aborts of running commands, we would
> > send a TMF in during the normal (i.e. un timed out) command period.
> > Because there's no timeout involved, there's no double free problem.
> > The race in this case is whether the abort catches the command or not
> > and to mediate that race we need the normal status return.
> 
> I'm not sure why "no timeout" implies "no double free".  There would 
> still be a race between the interrupt handler and softirq on one side, 
> and the abort handler on the other.

You're assuming the current error handling after timeout methodology.
If we allow arbitrary TMFs, that doesn't hold because we have to wait
for the command completion.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Make SCSI error handler code easier to understand
  2014-05-26 15:12 Make SCSI error handler code easier to understand Bart Van Assche
  2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
  2014-05-26 15:15 ` [PATCH 3/3] Make SCSI error handler code easier to understand Bart Van Assche
@ 2014-05-28 20:15 ` Joe Lawrence
  2014-05-29 11:33   ` James Bottomley
  2 siblings, 1 reply; 31+ messages in thread
From: Joe Lawrence @ 2014-05-28 20:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Christoph Hellwig, Hannes Reinecke, Jens Axboe,
	Paolo Bonzini, Joe Lawrence, linux-scsi, Bill Kuzeja

On Mon, 26 May 2014 17:12:27 +0200
Bart Van Assche <bvanassche@acm.org> wrote:

> Every now and then someone asks how it is avoided that the SCSI error
> handler and the SCSI completion handler are invoked concurrently for
> the same SCSI command. Hence this patch series that should make the SCSI
> error handler code a little easier to understand.

Hi Bart,

We talked about REQ_ATOM_COMPLETE a while back (so I may be one of
those people who periodically ask about SCSI error handling /
completion), and I just thought I'd chime in here to clarify the
scenario we were worried about.

Before d555a2ab "[SCSI] Fix spurious request sense in error handling"
we saw the following sequence of events:

  [ 1394.345991] sd 2:0:1:0: [sdw] Done:
  [ 1394.361790] TIMEOUT
  [ 1394.374148] sd 2:0:1:0: [sdw]
  [ 1394.388775] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
  [ 1394.409068] sd 2:0:1:0: [sdw] CDB:
  [ 1394.424782] Read(10): 28 00 00 00 00 18 00 00 08 00
  [ 1394.443772] sd 2:0:1:0: [sdw] scmd ffff88104dce02c0 abort scheduled
  [ 1394.474026] sd 2:0:1:0: [sdw] aborting command ffff88104dce02c0
  [ 1394.573338] qla2xxx [0000:46:00.0]-801c:2: Abort command issued nexus=2:1:0 --  1 2002.
  [ 1394.609313] sd 2:0:1:0: [sdw] scmd ffff88104dce02c0 retry aborted command
  
  [ 1425.397434] sd 2:0:1:0: [sdw] Done:
  [ 1425.417802] TIMEOUT
  [ 1425.431914] sd 2:0:1:0: [sdw]
  [ 1425.448247] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
  [ 1425.470736] sd 2:0:1:0: [sdw] CDB:
  [ 1425.488005] Read(10): 28 00 00 00 00 18 00 00 08 00
  [ 1425.508472] sd 2:0:1:0: [sdw] scmd ffff88104dce02c0 previous abort failed
  [ 1425.533299] scsi_eh_2: waking up 0/1/1
  [ 1425.551189] sd 2:0:1:0: scsi_eh_prt_fail_stats: cmds failed: 1, cancel: 0
  [ 1425.575897] Total of 1 commands on 1 devices require eh work
  [ 1425.597970] sd 2:0:1:0: [sdw] scsi_eh_2: requesting sense

  ie, something like:

  CMD -> .queuecommand ... timeout
  ABORT -> .eh_abort_handler
  CMD (retry) -> .queuecommand ... timeout
  ABORT -> scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)
  ... into scsi_error_handler ...
  REQUEST SENSE -> .queuecommand

We eventually found our way into scsi_unjam_host, which "*knows*
that all commands on the bus have either completed, failed or timed
out.", and then scsi_send_eh_cmnd, which "hijacks" a SCSI command
structure.  Presumably the second comment follows from the first.

But in the case of timeout, when was the LLDD informed to abort or
forget about that scsi command structure?

In our testing, it appeared that the LLDD's .queuecommand was called
back-to-back with the same scsi_cmnd pointer.  The driver in question
was qla2xxx, who keeps a driver-private structure to scsi_cmnd
mapping (assumed to be one-to-one) that got confused by this sequence
of events.

After d555a2ab, in this scenario scsi_unjam_host skips the request
sense and escalates to the next level (scsi_eh_abort_cmds), skipping
straight to the abort as we would have expected...  This avoids the
scsi_cmnd "hijack"... but was it ever safe to do so in the first place?

Or should LLDD expect to gracefully handle the same scsi_cmnd pointer
coming into .queuecommand w/o an intermediate abort or other
scsi_host_template callback?

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Make SCSI error handler code easier to understand
  2014-05-28 20:15 ` Joe Lawrence
@ 2014-05-29 11:33   ` James Bottomley
  0 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2014-05-29 11:33 UTC (permalink / raw)
  To: joe.lawrence
  Cc: hch, bvanassche, hare, axboe, linux-scsi, william.kuzeja,
	jdl1291, pbonzini

On Wed, 2014-05-28 at 16:15 -0400, Joe Lawrence wrote:
> On Mon, 26 May 2014 17:12:27 +0200
> Bart Van Assche <bvanassche@acm.org> wrote:
> 
> > Every now and then someone asks how it is avoided that the SCSI error
> > handler and the SCSI completion handler are invoked concurrently for
> > the same SCSI command. Hence this patch series that should make the SCSI
> > error handler code a little easier to understand.
> 
> Hi Bart,
> 
> We talked about REQ_ATOM_COMPLETE a while back (so I may be one of
> those people who periodically ask about SCSI error handling /
> completion), and I just thought I'd chime in here to clarify the
> scenario we were worried about.
> 
> Before d555a2ab "[SCSI] Fix spurious request sense in error handling"
> we saw the following sequence of events:
> 
>   [ 1394.345991] sd 2:0:1:0: [sdw] Done:
>   [ 1394.361790] TIMEOUT
>   [ 1394.374148] sd 2:0:1:0: [sdw]
>   [ 1394.388775] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
>   [ 1394.409068] sd 2:0:1:0: [sdw] CDB:
>   [ 1394.424782] Read(10): 28 00 00 00 00 18 00 00 08 00
>   [ 1394.443772] sd 2:0:1:0: [sdw] scmd ffff88104dce02c0 abort scheduled
>   [ 1394.474026] sd 2:0:1:0: [sdw] aborting command ffff88104dce02c0
>   [ 1394.573338] qla2xxx [0000:46:00.0]-801c:2: Abort command issued nexus=2:1:0 --  1 2002.
>   [ 1394.609313] sd 2:0:1:0: [sdw] scmd ffff88104dce02c0 retry aborted command
>   
>   [ 1425.397434] sd 2:0:1:0: [sdw] Done:
>   [ 1425.417802] TIMEOUT
>   [ 1425.431914] sd 2:0:1:0: [sdw]
>   [ 1425.448247] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
>   [ 1425.470736] sd 2:0:1:0: [sdw] CDB:
>   [ 1425.488005] Read(10): 28 00 00 00 00 18 00 00 08 00
>   [ 1425.508472] sd 2:0:1:0: [sdw] scmd ffff88104dce02c0 previous abort failed
>   [ 1425.533299] scsi_eh_2: waking up 0/1/1
>   [ 1425.551189] sd 2:0:1:0: scsi_eh_prt_fail_stats: cmds failed: 1, cancel: 0
>   [ 1425.575897] Total of 1 commands on 1 devices require eh work
>   [ 1425.597970] sd 2:0:1:0: [sdw] scsi_eh_2: requesting sense
> 
>   ie, something like:
> 
>   CMD -> .queuecommand ... timeout
>   ABORT -> .eh_abort_handler
>   CMD (retry) -> .queuecommand ... timeout
>   ABORT -> scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)
>   ... into scsi_error_handler ...
>   REQUEST SENSE -> .queuecommand
> 
> We eventually found our way into scsi_unjam_host, which "*knows*
> that all commands on the bus have either completed, failed or timed
> out.", and then scsi_send_eh_cmnd, which "hijacks" a SCSI command
> structure.  Presumably the second comment follows from the first.
> 
> But in the case of timeout, when was the LLDD informed to abort or
> forget about that scsi command structure?

When we get a successful abort, the command is back with the mid layer,
or when we complete a reset (all commands should now be taken from the
LLD).  We don't start trying to probe the device with a TUR until after
the command should be back with the mid layer.

> In our testing, it appeared that the LLDD's .queuecommand was called
> back-to-back with the same scsi_cmnd pointer.  The driver in question
> was qla2xxx, who keeps a driver-private structure to scsi_cmnd
> mapping (assumed to be one-to-one) that got confused by this sequence
> of events.

This was the bug in ACA emulation which the fix corrected.  After that
fix, we should only reuse the command for ACA emulation *if* we get a
successful return code indicating sense should be collected (meaning the
LLD relinquished the command).

> After d555a2ab, in this scenario scsi_unjam_host skips the request
> sense and escalates to the next level (scsi_eh_abort_cmds), skipping
> straight to the abort as we would have expected...  This avoids the
> scsi_cmnd "hijack"... but was it ever safe to do so in the first place?

The hijack is safe provided the LLD has relinquished the command, which
it does after status return, abort or reset.

> Or should LLDD expect to gracefully handle the same scsi_cmnd pointer
> coming into .queuecommand w/o an intermediate abort or other
> scsi_host_template callback?

No.

James


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-05-29 11:34 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-26 15:12 Make SCSI error handler code easier to understand Bart Van Assche
2014-05-26 15:14 ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Bart Van Assche
2014-05-26 15:15   ` [PATCH 2/3] block: Introduce blk_rq_completed() Bart Van Assche
2014-05-26 15:27     ` James Bottomley
2014-05-27  7:49       ` Bart Van Assche
2014-05-27  7:52         ` hch
2014-05-27  8:00           ` James Bottomley
2014-05-27  8:23         ` James Bottomley
2014-05-27  9:00           ` Bart Van Assche
2014-05-27 10:21             ` James Bottomley
2014-05-27 10:47               ` Paolo Bonzini
2014-05-27 10:59                 ` James Bottomley
2014-05-27 11:13                   ` Paolo Bonzini
2014-05-27 11:26                     ` James Bottomley
2014-05-27 11:52                       ` Paolo Bonzini
2014-05-27 11:57                         ` James Bottomley
2014-05-27  5:40     ` Hannes Reinecke
2014-05-26 15:23   ` [PATCH 1/3] Remove two cancel_delayed_work() calls from the error handler Paolo Bonzini
2014-05-26 15:25     ` James Bottomley
2014-05-27  8:06     ` Bart Van Assche
2014-05-27  8:09       ` James Bottomley
2014-05-27  8:36         ` Bart Van Assche
2014-05-27  8:56           ` James Bottomley
2014-05-27  9:06             ` Paolo Bonzini
2014-05-27  5:40   ` Hannes Reinecke
2014-05-27  6:08     ` Bart Van Assche
2014-05-27  6:22       ` Hannes Reinecke
2014-05-26 15:15 ` [PATCH 3/3] Make SCSI error handler code easier to understand Bart Van Assche
2014-05-27  5:42   ` Hannes Reinecke
2014-05-28 20:15 ` Joe Lawrence
2014-05-29 11:33   ` James Bottomley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.