All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Ensure that the SCSI error handler gets woken up
@ 2017-11-30 22:44 Bart Van Assche
  2017-11-30 22:44 ` [PATCH 1/2] " Bart Van Assche
  2017-11-30 22:44 ` [PATCH 2/2] Convert a source code comment into a runtime check Bart Van Assche
  0 siblings, 2 replies; 8+ messages in thread
From: Bart Van Assche @ 2017-11-30 22:44 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley; +Cc: linux-scsi, Bart Van Assche

Hello Martin,

As reported by Pavel Tikhomirov it can happen that the SCSI error handler does
not get woken up. This is very annoying because it results in a queue
stall. The two patches in this series address this issue without acquiring the
SCSI host lock in the hot path. Please consider these patches for kernel
v4.16.

Thanks,

Bart.

Changes compared to v1:
- Ensure that host_lock is held while checking host_failed.
- Moved the lockdep_assert_held() change into a separate patch.

Bart Van Assche (2):
  Ensure that the SCSI error handler gets woken up
  Convert a source code comment into a runtime check

 drivers/scsi/scsi_error.c | 11 +++++++++--
 drivers/scsi/scsi_lib.c   | 39 ++++++++++++++++++++++++++++-----------
 2 files changed, 37 insertions(+), 13 deletions(-)

-- 
2.15.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] Ensure that the SCSI error handler gets woken up
  2017-11-30 22:44 [PATCH 0/2] Ensure that the SCSI error handler gets woken up Bart Van Assche
@ 2017-11-30 22:44 ` Bart Van Assche
  2017-12-01  8:42   ` Pavel Tikhomirov
  2017-12-01  8:45   ` Johannes Thumshirn
  2017-11-30 22:44 ` [PATCH 2/2] Convert a source code comment into a runtime check Bart Van Assche
  1 sibling, 2 replies; 8+ messages in thread
From: Bart Van Assche @ 2017-11-30 22:44 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Bart Van Assche, Konstantin Khorenko, Stuart Hayes,
	Pavel Tikhomirov, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn, stable

If scsi_eh_scmd_add() is called concurrently with
scsi_host_queue_ready() while shost->host_blocked > 0 then it can
happen that neither function wakes up the SCSI error handler. Fix
this by making every function that decreases the host_busy counter
wake up the error handler if necessary and by protecting the
host_failed checks with the SCSI host lock.

Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_error.c |  8 +++++++-
 drivers/scsi/scsi_lib.c   | 39 ++++++++++++++++++++++++++++-----------
 2 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5e89049e9b4e..b22a9a23c74c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -233,19 +233,25 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
 void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 {
 	struct Scsi_Host *shost = scmd->device->host;
+	enum scsi_host_state shost_state;
 	unsigned long flags;
 	int ret;
 
 	WARN_ON_ONCE(!shost->ehandler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
+	shost_state = shost->shost_state;
 	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
 		ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
 		WARN_ON_ONCE(ret);
 	}
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
-
+	if (shost_state != shost->shost_state) {
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		synchronize_rcu();
+		spin_lock_irqsave(shost->host_lock, flags);
+	}
 	scsi_eh_reset(scmd);
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b6d3842b6809..7d18fb245d7d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,22 +318,39 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
 		cmd->cmd_len = scsi_command_size(cmd->cmnd);
 }
 
-void scsi_device_unbusy(struct scsi_device *sdev)
+/*
+ * Decrement the host_busy counter and wake up the error handler if necessary.
+ * Avoid as follows that the error handler is not woken up if shost->host_busy
+ * == shost->host_failed: use synchronize_rcu() in scsi_eh_scmd_add() in
+ * combination with an RCU read lock in this function to ensure that this
+ * function in its entirety either finishes before scsi_eh_scmd_add()
+ * increases the host_failed counter or that it notices the shost state change
+ * made by scsi_eh_scmd_add().
+ */
+static void scsi_dec_host_busy(struct Scsi_Host *shost)
 {
-	struct Scsi_Host *shost = sdev->host;
-	struct scsi_target *starget = scsi_target(sdev);
 	unsigned long flags;
 
+	rcu_read_lock();
 	atomic_dec(&shost->host_busy);
-	if (starget->can_queue > 0)
-		atomic_dec(&starget->target_busy);
-
-	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled))) {
+	if (unlikely(scsi_host_in_recovery(shost))) {
 		spin_lock_irqsave(shost->host_lock, flags);
-		scsi_eh_wakeup(shost);
+		if (shost->host_failed || shost->host_eh_scheduled)
+			scsi_eh_wakeup(shost);
 		spin_unlock_irqrestore(shost->host_lock, flags);
 	}
+	rcu_read_unlock();
+}
+
+void scsi_device_unbusy(struct scsi_device *sdev)
+{
+	struct Scsi_Host *shost = sdev->host;
+	struct scsi_target *starget = scsi_target(sdev);
+
+	scsi_dec_host_busy(shost);
+
+	if (starget->can_queue > 0)
+		atomic_dec(&starget->target_busy);
 
 	atomic_dec(&sdev->device_busy);
 }
@@ -1531,7 +1548,7 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		list_add_tail(&sdev->starved_entry, &shost->starved_list);
 	spin_unlock_irq(shost->host_lock);
 out_dec:
-	atomic_dec(&shost->host_busy);
+	scsi_dec_host_busy(shost);
 	return 0;
 }
 
@@ -2017,7 +2034,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_OK;
 
 out_dec_host_busy:
-       atomic_dec(&shost->host_busy);
+	scsi_dec_host_busy(shost);
 out_dec_target_busy:
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] Convert a source code comment into a runtime check
  2017-11-30 22:44 [PATCH 0/2] Ensure that the SCSI error handler gets woken up Bart Van Assche
  2017-11-30 22:44 ` [PATCH 1/2] " Bart Van Assche
@ 2017-11-30 22:44 ` Bart Van Assche
  2017-12-01  8:46   ` Johannes Thumshirn
  1 sibling, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2017-11-30 22:44 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Bart Van Assche, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_error.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b22a9a23c74c..1622eacbf577 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -61,9 +61,10 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 				 struct scsi_cmnd *);
 
-/* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
+	lockdep_assert_held(shost->host_lock);
+
 	if (atomic_read(&shost->host_busy) == shost->host_failed) {
 		trace_scsi_eh_wakeup(shost);
 		wake_up_process(shost->ehandler);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Ensure that the SCSI error handler gets woken up
  2017-11-30 22:44 ` [PATCH 1/2] " Bart Van Assche
@ 2017-12-01  8:42   ` Pavel Tikhomirov
  2017-12-01 17:42     ` Bart Van Assche
  2017-12-01  8:45   ` Johannes Thumshirn
  1 sibling, 1 reply; 8+ messages in thread
From: Pavel Tikhomirov @ 2017-12-01  8:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Konstantin Khorenko, Stuart Hayes, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn, stable



On 12/01/2017 01:44 AM, Bart Van Assche wrote:
> If scsi_eh_scmd_add() is called concurrently with
> scsi_host_queue_ready() while shost->host_blocked > 0 then it can
> happen that neither function wakes up the SCSI error handler. Fix
> this by making every function that decreases the host_busy counter
> wake up the error handler if necessary and by protecting the
> host_failed checks with the SCSI host lock.
> 
> Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
> Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>   drivers/scsi/scsi_error.c |  8 +++++++-
>   drivers/scsi/scsi_lib.c   | 39 ++++++++++++++++++++++++++++-----------
>   2 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 5e89049e9b4e..b22a9a23c74c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -233,19 +233,25 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
>   void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>   {
>   	struct Scsi_Host *shost = scmd->device->host;
> +	enum scsi_host_state shost_state;
>   	unsigned long flags;
>   	int ret;
>   
>   	WARN_ON_ONCE(!shost->ehandler);
>   
>   	spin_lock_irqsave(shost->host_lock, flags);
> +	shost_state = shost->shost_state;
>   	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
>   		ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
>   		WARN_ON_ONCE(ret);
>   	}
>   	if (shost->eh_deadline != -1 && !shost->last_reset)
>   		shost->last_reset = jiffies;
> -
> +	if (shost_state != shost->shost_state) {
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +		synchronize_rcu();

We can come here from interrupt context, so may be we should use 
call_rcu() here instead, possible backtrace:

  => scsi_eh_scmd_add
  => scsi_times_out
  => blk_rq_timed_out
  => blk_abort_request
  => ata_qc_schedule_eh
  => ata_qc_complete
  => ata_do_link_abort
  => ata_port_abort
  => ahci_handle_port_interrupt
  => ahci_single_irq_intr
  => __handle_irq_event_percpu
  => handle_irq_event_percpu
  => handle_irq_event
  => handle_edge_irq
  => handle_irq
  => do_IRQ

> +		spin_lock_irqsave(shost->host_lock, flags);
> +	}
>   	scsi_eh_reset(scmd);
>   	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>   	shost->host_failed++;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b6d3842b6809..7d18fb245d7d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -318,22 +318,39 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
>   		cmd->cmd_len = scsi_command_size(cmd->cmnd);
>   }
>   
> -void scsi_device_unbusy(struct scsi_device *sdev)
> +/*
> + * Decrement the host_busy counter and wake up the error handler if necessary.
> + * Avoid as follows that the error handler is not woken up if shost->host_busy
> + * == shost->host_failed: use synchronize_rcu() in scsi_eh_scmd_add() in
> + * combination with an RCU read lock in this function to ensure that this
> + * function in its entirety either finishes before scsi_eh_scmd_add()
> + * increases the host_failed counter or that it notices the shost state change
> + * made by scsi_eh_scmd_add().
> + */
> +static void scsi_dec_host_busy(struct Scsi_Host *shost)
>   {
> -	struct Scsi_Host *shost = sdev->host;
> -	struct scsi_target *starget = scsi_target(sdev);
>   	unsigned long flags;
>   
> +	rcu_read_lock();
>   	atomic_dec(&shost->host_busy);
> -	if (starget->can_queue > 0)
> -		atomic_dec(&starget->target_busy);
> -
> -	if (unlikely(scsi_host_in_recovery(shost) &&
> -		     (shost->host_failed || shost->host_eh_scheduled))) {
> +	if (unlikely(scsi_host_in_recovery(shost))) {
>   		spin_lock_irqsave(shost->host_lock, flags);
> -		scsi_eh_wakeup(shost);
> +		if (shost->host_failed || shost->host_eh_scheduled)
> +			scsi_eh_wakeup(shost);
>   		spin_unlock_irqrestore(shost->host_lock, flags);
>   	}
> +	rcu_read_unlock();
> +}
> +
> +void scsi_device_unbusy(struct scsi_device *sdev)
> +{
> +	struct Scsi_Host *shost = sdev->host;
> +	struct scsi_target *starget = scsi_target(sdev);
> +
> +	scsi_dec_host_busy(shost);
> +
> +	if (starget->can_queue > 0)
> +		atomic_dec(&starget->target_busy);
>   
>   	atomic_dec(&sdev->device_busy);
>   }
> @@ -1531,7 +1548,7 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   		list_add_tail(&sdev->starved_entry, &shost->starved_list);
>   	spin_unlock_irq(shost->host_lock);
>   out_dec:
> -	atomic_dec(&shost->host_busy);
> +	scsi_dec_host_busy(shost);
>   	return 0;
>   }
>   
> @@ -2017,7 +2034,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	return BLK_STS_OK;
>   
>   out_dec_host_busy:
> -       atomic_dec(&shost->host_busy);
> +	scsi_dec_host_busy(shost);
>   out_dec_target_busy:
>   	if (scsi_target(sdev)->can_queue > 0)
>   		atomic_dec(&scsi_target(sdev)->target_busy);
> 

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Ensure that the SCSI error handler gets woken up
  2017-11-30 22:44 ` [PATCH 1/2] " Bart Van Assche
  2017-12-01  8:42   ` Pavel Tikhomirov
@ 2017-12-01  8:45   ` Johannes Thumshirn
  2017-12-01 17:40     ` Bart Van Assche
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Thumshirn @ 2017-12-01  8:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Konstantin Khorenko, Stuart Hayes, Pavel Tikhomirov,
	Christoph Hellwig, Hannes Reinecke, stable

Hi Bart,

Bart Van Assche <bart.vanassche@wdc.com> writes:
[...]

> +	if (shost_state != shost->shost_state) {
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +		synchronize_rcu();
> +		spin_lock_irqsave(shost->host_lock, flags);
> +	}

Plese correct me if I'm wrong, but once you drop the host lock all
assumptions about states it protects are void, aren't they? 

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] Convert a source code comment into a runtime check
  2017-11-30 22:44 ` [PATCH 2/2] Convert a source code comment into a runtime check Bart Van Assche
@ 2017-12-01  8:46   ` Johannes Thumshirn
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2017-12-01  8:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Christoph Hellwig, Hannes Reinecke

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Ensure that the SCSI error handler gets woken up
  2017-12-01  8:45   ` Johannes Thumshirn
@ 2017-12-01 17:40     ` Bart Van Assche
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2017-12-01 17:40 UTC (permalink / raw)
  To: jthumshirn
  Cc: hch, stuart.w.hayes, martin.petersen, stable, linux-scsi, hare,
	jejb, ptikhomirov, khorenko

On Fri, 2017-12-01 at 09:45 +0100, Johannes Thumshirn wrote:
> Bart Van Assche <bart.vanassche@wdc.com> writes:
> [...]
> 
> > +	if (shost_state != shost->shost_state) {
> > +		spin_unlock_irqrestore(shost->host_lock, flags);
> > +		synchronize_rcu();
> > +		spin_lock_irqsave(shost->host_lock, flags);
> > +	}
> 
> Plese correct me if I'm wrong, but once you drop the host lock all
> assumptions about states it protects are void, aren't they? 

Hello Johannes,

That's a good question. I think it is safe to drop the host lock at that point
because waking up the error handler thread will only happen after host_failed
has been incremented.

Bart.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Ensure that the SCSI error handler gets woken up
  2017-12-01  8:42   ` Pavel Tikhomirov
@ 2017-12-01 17:42     ` Bart Van Assche
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2017-12-01 17:42 UTC (permalink / raw)
  To: ptikhomirov
  Cc: jthumshirn, hch, stuart.w.hayes, martin.petersen, stable,
	linux-scsi, hare, jejb, khorenko

On Fri, 2017-12-01 at 11:42 +0300, Pavel Tikhomirov wrote:
> On 12/01/2017 01:44 AM, Bart Van Assche wrote:
> > +	if (shost_state != shost->shost_state) {
> > +		spin_unlock_irqrestore(shost->host_lock, flags);
> > +		synchronize_rcu();
> 
> We can come here from interrupt context, so may be we should use 
> call_rcu() here instead.

Hello Pavel,

I will rework this patch such that it uses call_rcu() instead of
synchronize_rcu().

Bart.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-12-01 17:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30 22:44 [PATCH 0/2] Ensure that the SCSI error handler gets woken up Bart Van Assche
2017-11-30 22:44 ` [PATCH 1/2] " Bart Van Assche
2017-12-01  8:42   ` Pavel Tikhomirov
2017-12-01 17:42     ` Bart Van Assche
2017-12-01  8:45   ` Johannes Thumshirn
2017-12-01 17:40     ` Bart Van Assche
2017-11-30 22:44 ` [PATCH 2/2] Convert a source code comment into a runtime check Bart Van Assche
2017-12-01  8:46   ` Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.