linux-kernel.vger.kernel.org archive mirror
* [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
@ 2017-09-05 12:54 Pavel Tikhomirov
  2017-10-04  7:18 ` Pavel Tikhomirov
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-09-05 12:54 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-kernel, Konstantin Khorenko,
	Pavel Tikhomirov, devel

We have a problem with SCSI EH on several of our nodes. Imagine the
following interleaving of two threads:

CPU1 scsi_eh_scmd_add		CPU2 scsi_host_queue_ready
/* shost->host_busy == 1 initially */

				if (shost->shost_state == SHOST_RECOVERY)
					/* does not get here */
					return 0;

lock(shost->host_lock);
shost->shost_state = SHOST_RECOVERY;

				busy = shost->host_busy++;
				/* host->can_queue == 1 initially, busy == 1
				 * - go to starved label */
				lock(shost->host_lock) /* wait */

shost->host_failed++;
/* shost->host_busy == 2, shost->host_failed == 1 */
call scsi_eh_wakeup(shost) {
	if (host_busy == host_failed) {
		/* does not get here */
		wake_up_process(shost->ehandler)
	}
}
unlock(shost->host_lock)

				/* acquire lock */
				shost->host_busy--;

Finally we do not wake up scsi_error_handler, and all subsequent
commands hang: we are stuck in the recovery state forever because
no one is left to wake up the handler.

So the SCSI disks on this host become unresponsive and all bios on the
node hang. (We triggered this problem when SCSI commands to a DVD drive
timed out.)

The main idea of the fix is to attempt the wakeup every time we
decrement host_busy or increment host_failed (the latter is already
handled correctly).

Now the very *last* of the busy threads to take host_lock after
decrementing host_busy will see all writes to the host's shost_state,
host_busy and host_failed completed, thanks to the memory barriers
implied by spin_lock/unlock; so at the moment busy == failed we will
trigger the wakeup in at least one thread. (That is why the recovery
and failed checks are moved under the lock.)
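
The same "re-check under host_lock" pattern now appears at three call
sites; if preferred it could be factored into a small helper, roughly
like the sketch below (the helper name is illustrative, not part of
this patch):

/* Sketch: drop one host_busy reference, then re-check under host_lock
 * whether the error handler needs a wakeup. */
static void scsi_dec_host_busy(struct Scsi_Host *shost)
{
	unsigned long flags;

	atomic_dec(&shost->host_busy);
	spin_lock_irqsave(shost->host_lock, flags);
	if (unlikely(scsi_host_in_recovery(shost) &&
		     (shost->host_failed || shost->host_eh_scheduled)))
		scsi_eh_wakeup(shost);
	spin_unlock_irqrestore(shost->host_lock, flags);
}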

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 drivers/scsi/scsi_lib.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..6c99221d60aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	if (starget->can_queue > 0)
 		atomic_dec(&starget->target_busy);
 
+	spin_lock_irqsave(shost->host_lock, flags);
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled))) {
-		spin_lock_irqsave(shost->host_lock, flags);
+		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
-		spin_unlock_irqrestore(shost->host_lock, flags);
-	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	atomic_dec(&sdev->device_busy);
 }
@@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	spin_unlock_irq(shost->host_lock);
 out_dec:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 	return 0;
 }
 
@@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 out_dec_host_busy:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 out_dec_target_busy:
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);
-- 
2.13.5

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-09-05 12:54 [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy Pavel Tikhomirov
@ 2017-10-04  7:18 ` Pavel Tikhomirov
  2017-10-20  7:48 ` Pavel Tikhomirov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-10-04  7:18 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-kernel, Konstantin Khorenko, devel

Hi. Please tell me if there is anything I can do to help get the patch
processed. It has been on the list without a reply for almost a month.

On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:
> [...]

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-09-05 12:54 [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy Pavel Tikhomirov
  2017-10-04  7:18 ` Pavel Tikhomirov
@ 2017-10-20  7:48 ` Pavel Tikhomirov
  2017-11-09 14:54 ` Pavel Tikhomirov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-10-20  7:48 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-kernel, Konstantin Khorenko, devel

ping

On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:
> [...]

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-09-05 12:54 [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy Pavel Tikhomirov
  2017-10-04  7:18 ` Pavel Tikhomirov
  2017-10-20  7:48 ` Pavel Tikhomirov
@ 2017-11-09 14:54 ` Pavel Tikhomirov
  2017-11-21  6:10   ` Stuart Hayes
  2017-11-21  8:39 ` Pavel Tikhomirov
  2017-11-21 16:53 ` Bart Van Assche
  4 siblings, 1 reply; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-11-09 14:54 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: James E . J . Bottomley, Martin K . Petersen, Christoph Hellwig,
	linux-scsi, linux-kernel, Konstantin Khorenko, devel

> Are there any issues with this patch
> (https://patchwork.kernel.org/patch/9938919/) that Pavel Tikhomirov
> submitted back in September?  I am willing to help if there's anything
> I can do to get it accepted.

Hi Stuart, I asked James Bottomley about the patch status off-list, and
it seems the problem is that the patch lacks testing and review. I would
highly appreciate a review from you and anyone else who wants to
participate!

And if you can confirm that the patch solves the problem in your
environment with no side effects, please also add a "Tested-by:" tag.

Thanks, Pavel

On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:
> [...]

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-11-09 14:54 ` Pavel Tikhomirov
@ 2017-11-21  6:10   ` Stuart Hayes
  2017-11-21  8:09     ` Pavel Tikhomirov
  0 siblings, 1 reply; 11+ messages in thread
From: Stuart Hayes @ 2017-11-21  6:10 UTC (permalink / raw)
  To: Pavel Tikhomirov
  Cc: James E . J . Bottomley, Martin K . Petersen, Christoph Hellwig,
	linux-scsi, linux-kernel, Konstantin Khorenko, devel

Pavel,

It turns out that the error handler on our systems was not getting woken up for a different reason... I submitted a patch earlier today that fixes the issue I was seeing (I CCed you on the patch).

Before I got my hands on the failing system and was able to root cause it, I was pretty sure that your patch was going to fix our issue, because after I examined the code paths, I couldn't find any other reason that the error handler would not get woken up.  I tried forcing the bug that your patch fixes to occur, by compiling in some mdelay()s in a key place or two in the scsi code, but it never failed for me that way.  With my patch, several systems that previously failed in 10 minutes or less successfully ran for many days. 
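
One of the injections looked roughly like this, in scsi_host_queue_ready() (from memory, placement illustrative -- a test hack only, not a patch):

	if (scsi_host_in_recovery(shost))
		return 0;

	/* injected: widen the window between the recovery check and
	 * the host_busy increment to give the race a chance to hit */
	mdelay(100);

	busy = atomic_inc_return(&shost->host_busy) - 1;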

Thanks,
Stuart

On 11/9/2017 8:54 AM, Pavel Tikhomirov wrote:
> [...]

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-11-21  6:10   ` Stuart Hayes
@ 2017-11-21  8:09     ` Pavel Tikhomirov
  2017-11-22  0:49       ` Stuart Hayes
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-11-21  8:09 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: James E . J . Bottomley, Martin K . Petersen, Christoph Hellwig,
	linux-scsi, linux-kernel, Konstantin Khorenko, devel

My patch should fix your issue too; please see the explanation in my
reply to your patch. Does your testing show that it doesn't?

Thanks, Pavel.

On 11/21/2017 09:10 AM, Stuart Hayes wrote:
> [...]

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-09-05 12:54 [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy Pavel Tikhomirov
                   ` (2 preceding siblings ...)
  2017-11-09 14:54 ` Pavel Tikhomirov
@ 2017-11-21  8:39 ` Pavel Tikhomirov
  2017-11-21 16:53 ` Bart Van Assche
  4 siblings, 0 replies; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-11-21  8:39 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-kernel, Konstantin Khorenko, devel

JFYI, this patch has been in the Virtuozzo 7 kernel since September; our
testing has found no issues with it so far, and the initial problem has
not reproduced for 2.5 months.

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-09-05 12:54 [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy Pavel Tikhomirov
                   ` (3 preceding siblings ...)
  2017-11-21  8:39 ` Pavel Tikhomirov
@ 2017-11-21 16:53 ` Bart Van Assche
  4 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2017-11-21 16:53 UTC (permalink / raw)
  To: jejb, ptikhomirov, martin.petersen
  Cc: hch, linux-scsi, linux-kernel, devel, khorenko

On Tue, 2017-09-05 at 15:54 +0300, Pavel Tikhomirov wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f6097b89d5d3..6c99221d60aa 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
>  	if (starget->can_queue > 0)
>  		atomic_dec(&starget->target_busy);
>  
> +	spin_lock_irqsave(shost->host_lock, flags);
>  	if (unlikely(scsi_host_in_recovery(shost) &&
> -		     (shost->host_failed || shost->host_eh_scheduled))) {
> -		spin_lock_irqsave(shost->host_lock, flags);
> +		     (shost->host_failed || shost->host_eh_scheduled)))
>  		scsi_eh_wakeup(shost);
> -		spin_unlock_irqrestore(shost->host_lock, flags);
> -	}
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>  
>  	atomic_dec(&sdev->device_busy);
>  }
> @@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  	spin_unlock_irq(shost->host_lock);
>  out_dec:
>  	atomic_dec(&shost->host_busy);
> +
> +	spin_lock_irq(shost->host_lock);
> +	if (unlikely(scsi_host_in_recovery(shost) &&
> +		     (shost->host_failed || shost->host_eh_scheduled)))
> +		scsi_eh_wakeup(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>  	return 0;
>  }
>  
> @@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  
>  out_dec_host_busy:
>  	atomic_dec(&shost->host_busy);
> +
> +	spin_lock_irq(shost->host_lock);
> +	if (unlikely(scsi_host_in_recovery(shost) &&
> +		     (shost->host_failed || shost->host_eh_scheduled)))
> +		scsi_eh_wakeup(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>  out_dec_target_busy:
>  	if (scsi_target(sdev)->can_queue > 0)
>  		atomic_dec(&scsi_target(sdev)->target_busy);

An important achievement of the scsi-mq code was removal of all
spin_lock_irq(shost->host_lock) statements from the hot path. The above
changes will have a significant negative performance impact, especially if
multiple LUNs associated with the same SCSI host are involved. Can the
reported race be fixed without slowing down the hot path significantly? I
think that adding either spin lock or smp_mb() calls in the hot path will
have a significant negative performance impact.
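
One shape that might keep the lock off the fast path would be to do the
unlocked recovery check first and take host_lock only in the (rare)
recovery case -- a sketch, not a tested implementation:

	atomic_dec(&shost->host_busy);
	if (unlikely(scsi_host_in_recovery(shost))) {
		spin_lock_irqsave(shost->host_lock, flags);
		if (shost->host_failed || shost->host_eh_scheduled)
			scsi_eh_wakeup(shost);
		spin_unlock_irqrestore(shost->host_lock, flags);
	}

although that still needs an argument for how the host_busy decrement is
ordered against the SHOST_RECOVERY store on the other CPU.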

Thanks,

Bart.

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-11-21  8:09     ` Pavel Tikhomirov
@ 2017-11-22  0:49       ` Stuart Hayes
  2017-11-22  7:01         ` Pavel Tikhomirov
  0 siblings, 1 reply; 11+ messages in thread
From: Stuart Hayes @ 2017-11-22  0:49 UTC (permalink / raw)
  To: Pavel Tikhomirov
  Cc: James E . J . Bottomley, Martin K . Petersen, Christoph Hellwig,
	linux-scsi, linux-kernel, Konstantin Khorenko, devel

My apologies... yes, your patch also fixes my issue.  I was looking at the two new places from which you were calling scsi_eh_wakeup(), and didn't notice that you moved the spinlock in scsi_device_unbusy()... moving the spinlock in scsi_device_unbusy() should also fix the issue I'm seeing, given that scsi_eh_scmd_add() also takes the spinlock.
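
For reference, the part of scsi_eh_scmd_add() I had in mind (an abridged sketch of the then-current code, from memory -- it matches the interleaving diagram in your commit message):

	spin_lock_irqsave(shost->host_lock, flags);
	scsi_host_set_state(shost, SHOST_RECOVERY);
	/* ... queue the failed command on the EH list ... */
	shost->host_failed++;
	scsi_eh_wakeup(shost);	/* wakes EH only when busy == failed */
	spin_unlock_irqrestore(shost->host_lock, flags);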

I tested your patch on my issue, and it did indeed fix my issue.

So you can add...

Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>

Thanks
Stuart


On 11/21/2017 2:09 AM, Pavel Tikhomirov wrote:
> [...]

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
  2017-11-22  0:49       ` Stuart Hayes
@ 2017-11-22  7:01         ` Pavel Tikhomirov
  0 siblings, 0 replies; 11+ messages in thread
From: Pavel Tikhomirov @ 2017-11-22  7:01 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: James E . J . Bottomley, Martin K . Petersen, Christoph Hellwig,
	linux-scsi, linux-kernel, Konstantin Khorenko, devel

Great news that it works for you!

Thanks a lot!
Pavel

On 11/22/2017 03:49 AM, Stuart Hayes wrote:
> [...]

* Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
@ 2017-11-09  5:38 Stuart Hayes
  0 siblings, 0 replies; 11+ messages in thread
From: Stuart Hayes @ 2017-11-09  5:38 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, Pavel Tikhomirov
  Cc: linux-kernel, linux-scsi, Konstantin Khorenko, devel, Christoph Hellwig


Are there any issues with this patch (https://patchwork.kernel.org/patch/9938919/) that Pavel Tikhomirov submitted back in September?  I am willing to help if there's anything I can do to get it accepted.

The failing case I'm working on involves lots of servers with disk read/write activity plus periodic INQUIRY commands that aren't supported by the target device, so the error handler gets invoked periodically, and some of the systems hang after a day or two.
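
For reference, that INQUIRY traffic can be approximated with a small test program along these lines (an illustrative sketch only; whether an unsupported INQUIRY actually ends up invoking the error handler depends on the device and driver):

/* inq_loop.c: periodically send an INQUIRY for a VPD page the device
 * most likely does not support.
 * Build: gcc -o inq_loop inq_loop.c
 * Run:   ./inq_loop /dev/sg0   (needs read/write access to the device)
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
	/* INQUIRY (0x12), EVPD=1, page code 0xff, 96-byte allocation */
	unsigned char cdb[6] = { 0x12, 0x01, 0xff, 0x00, 0x60, 0x00 };
	unsigned char buf[96], sense[32];
	struct sg_io_hdr io;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDWR)) < 0) {
		perror("open");
		return 1;
	}
	for (;;) {
		memset(&io, 0, sizeof(io));
		io.interface_id = 'S';
		io.cmd_len = sizeof(cdb);
		io.cmdp = cdb;
		io.dxfer_direction = SG_DXFER_FROM_DEV;
		io.dxferp = buf;
		io.dxfer_len = sizeof(buf);
		io.sbp = sense;
		io.mx_sb_len = sizeof(sense);
		io.timeout = 5000;	/* milliseconds */
		if (ioctl(fd, SG_IO, &io) < 0)
			perror("SG_IO");
		sleep(5);		/* "periodic" */
	}
}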

Thanks
Stuart

end of thread (newest: ~2017-11-22  7:01 UTC)

Thread overview: 11+ messages
2017-09-05 12:54 [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy Pavel Tikhomirov
2017-10-04  7:18 ` Pavel Tikhomirov
2017-10-20  7:48 ` Pavel Tikhomirov
2017-11-09 14:54 ` Pavel Tikhomirov
2017-11-21  6:10   ` Stuart Hayes
2017-11-21  8:09     ` Pavel Tikhomirov
2017-11-22  0:49       ` Stuart Hayes
2017-11-22  7:01         ` Pavel Tikhomirov
2017-11-21  8:39 ` Pavel Tikhomirov
2017-11-21 16:53 ` Bart Van Assche
2017-11-09  5:38 Stuart Hayes
