* [PATCH] scsi: scsi_host_queue_ready: increase busy count early
@ 2021-01-20 18:45 mwilck
  2021-01-20 20:26 ` John Garry
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: mwilck @ 2021-01-20 18:45 UTC (permalink / raw)
  To: Donald Buczek, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Martin Wilck,
	Don Brace, Kevin Barnett, John Garry, Paul Menzel

From: Martin Wilck <mwilck@suse.com>

Donald: please give this patch a try.

Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
contained this hunk:

-       busy = atomic_inc_return(&shost->host_busy) - 1;
        if (atomic_read(&shost->host_blocked) > 0) {
-               if (busy)
+               if (scsi_host_busy(shost) > 0)
                        goto starved;

The previous code would increase the busy count before checking host_blocked.
With 6eb045e092ef, the busy count would be increased (by setting the
SCMD_STATE_INFLIGHT bit) after the if clause for host_blocked above.

Users have reported a regression with the smartpqi driver [1] which has been
shown to be caused by this commit [2].

It seems that by moving the increase of the busy counter further down, the
can_queue limit of the controller could be exceeded if several CPUs were
executing this code in parallel on different queues.

This patch attempts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
before the host_blocked test. It also inserts barriers to make sure
scsi_host_busy() on one CPU will notice the increase of the count from another.

[1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
[2]: https://marc.info/?l=linux-scsi&m=161116163722099&w=2

Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Don Brace <Don.Brace@microchip.com>
Cc: Kevin Barnett <Kevin.Barnett@microchip.com>
Cc: Donald Buczek <buczek@molgen.mpg.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/hosts.c    | 2 ++
 drivers/scsi/scsi_lib.c | 8 +++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 2f162603876f..1c452a1c18fd 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct request *rq, void *data,
 	int *count = data;
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
 
+	/* This pairs with set_bit() in scsi_host_queue_ready() */
+	smp_mb__before_atomic();
 	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
 		(*count)++;
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b3f14f05340a..0a9a36c349ee 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1353,8 +1353,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	if (scsi_host_in_recovery(shost))
 		return 0;
 
+	set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
+	/* This pairs with test_bit() in scsi_host_check_in_flight() */
+	smp_mb__after_atomic();
+
 	if (atomic_read(&shost->host_blocked) > 0) {
-		if (scsi_host_busy(shost) > 0)
+		if (scsi_host_busy(shost) > 1)
 			goto starved;
 
 		/*
@@ -1379,8 +1383,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		spin_unlock_irq(shost->host_lock);
 	}
 
-	__set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
-
 	return 1;
 
 starved:
-- 
2.29.2



* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-20 18:45 [PATCH] scsi: scsi_host_queue_ready: increase busy count early mwilck
@ 2021-01-20 20:26 ` John Garry
  2021-01-21 12:01   ` Donald Buczek
  2021-01-21  9:07 ` Donald Buczek
  2021-01-22  3:23 ` Ming Lei
  2 siblings, 1 reply; 27+ messages in thread
From: John Garry @ 2021-01-20 20:26 UTC (permalink / raw)
  To: mwilck, Donald Buczek, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel

On 20/01/2021 18:45, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Donald: please give this patch a try.
> 
> Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> contained this hunk:
> 
> -       busy = atomic_inc_return(&shost->host_busy) - 1;
>          if (atomic_read(&shost->host_blocked) > 0) {
> -               if (busy)
> +               if (scsi_host_busy(shost) > 0)
>                          goto starved;
> 
> The previous code would increase the busy count before checking host_blocked.
> With 6eb045e092ef, the busy count would be increased (by setting the
> SCMD_STATE_INFLIGHT bit) after the if clause for host_blocked above.
> 
> Users have reported a regression with the smartpqi driver [1] which has been
> shown to be caused by this commit [2].
> 

Please note these, where potential issue was discussed before:
https://lore.kernel.org/linux-scsi/8a3c8d37-8d15-4060-f461-8d400b19cb34@suse.de/
https://lore.kernel.org/linux-scsi/20200714104437.GB602708@T590/

> It seems that by moving the increase of the busy counter further down, the
> can_queue limit of the controller could be exceeded if several CPUs were
> executing this code in parallel on different queues.
> 
> This patch attempts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
> before the host_blocked test. It also inserts barriers to make sure
> scsi_host_busy() on one CPU will notice the increase of the count from another.
> 
> [1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
> [2]: https://marc.info/?l=linux-scsi&m=161116163722099&w=2
> 
> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")

For failing test, it would be good to know:
- host .can_queue
- host .nr_hw_queues
- number of LUNs in test and their queue depth

All can be got from lsscsi, apart from nr_hw_queues, which can be got 
from scsi_host sysfs for kernel >= 5.10

Cheers,
John

> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Don Brace <Don.Brace@microchip.com>
> Cc: Kevin Barnett <Kevin.Barnett@microchip.com>
> Cc: Donald Buczek <buczek@molgen.mpg.de>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   drivers/scsi/hosts.c    | 2 ++
>   drivers/scsi/scsi_lib.c | 8 +++++---
>   2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 2f162603876f..1c452a1c18fd 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct request *rq, void *data,
>   	int *count = data;
>   	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>   
> +	/* This pairs with set_bit() in scsi_host_queue_ready() */
> +	smp_mb__before_atomic();
>   	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
>   		(*count)++;
>   
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b3f14f05340a..0a9a36c349ee 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1353,8 +1353,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   	if (scsi_host_in_recovery(shost))
>   		return 0;
>   
> +	set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> +	/* This pairs with test_bit() in scsi_host_check_in_flight() */
> +	smp_mb__after_atomic();
> +
>   	if (atomic_read(&shost->host_blocked) > 0) {
> -		if (scsi_host_busy(shost) > 0)
> +		if (scsi_host_busy(shost) > 1)
>   			goto starved;
>   
>   		/*
> @@ -1379,8 +1383,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   		spin_unlock_irq(shost->host_lock);
>   	}
>   
> -	__set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> -
>   	return 1;
>   
>   starved:
> 



* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-20 18:45 [PATCH] scsi: scsi_host_queue_ready: increase busy count early mwilck
  2021-01-20 20:26 ` John Garry
@ 2021-01-21  9:07 ` Donald Buczek
  2021-01-21 10:05   ` Martin Wilck
  2021-01-22  3:23 ` Ming Lei
  2 siblings, 1 reply; 27+ messages in thread
From: Donald Buczek @ 2021-01-21  9:07 UTC (permalink / raw)
  To: mwilck, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, John Garry, Paul Menzel

On 20.01.21 19:45, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Donald: please give this patch a try.
> 
> Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> [...]


Unfortunately, this patch (on top of 6eb045e092ef) did not fix the problem. Same error ("controller is offline: status code 0x6100c") after about 20 minutes of the test load.

Best
   Donald



* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21  9:07 ` Donald Buczek
@ 2021-01-21 10:05   ` Martin Wilck
  2021-01-22  0:14     ` Martin Wilck
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Wilck @ 2021-01-21 10:05 UTC (permalink / raw)
  To: Donald Buczek, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, John Garry, Paul Menzel

On Thu, 2021-01-21 at 10:07 +0100, Donald Buczek wrote:
> On 20.01.21 19:45, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > Donald: please give this patch a try.
> > 
> > Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter
> > for scsi_mq")
> > [...]
> 
> 
> Unfortunately, this patch (on top of 6eb045e092ef) did not fix the
> problem. Same error ("controller is offline: status code 0x6100c")
> after about 20 minutes of the test load.

Too bad. Well, I believe it was worth a try. Thanks for testing it.

Martin







* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-20 20:26 ` John Garry
@ 2021-01-21 12:01   ` Donald Buczek
  2021-01-21 12:35     ` John Garry
  0 siblings, 1 reply; 27+ messages in thread
From: Donald Buczek @ 2021-01-21 12:01 UTC (permalink / raw)
  To: John Garry, mwilck, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel

On 20.01.21 21:26, John Garry wrote:
> On 20/01/2021 18:45, mwilck@suse.com wrote:
>> From: Martin Wilck <mwilck@suse.com>
>>
>> Donald: please give this patch a try.
>>
>> Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
>> contained this hunk:
>>
>> -       busy = atomic_inc_return(&shost->host_busy) - 1;
>>          if (atomic_read(&shost->host_blocked) > 0) {
>> -               if (busy)
>> +               if (scsi_host_busy(shost) > 0)
>>                          goto starved;
>>
>> The previous code would increase the busy count before checking host_blocked.
>> With 6eb045e092ef, the busy count would be increased (by setting the
>> SCMD_STATE_INFLIGHT bit) after the if clause for host_blocked above.
>>
>> Users have reported a regression with the smartpqi driver [1] which has been
>> shown to be caused by this commit [2].
>>
> 
> Please note these, where potential issue was discussed before:
> https://lore.kernel.org/linux-scsi/8a3c8d37-8d15-4060-f461-8d400b19cb34@suse.de/
> https://lore.kernel.org/linux-scsi/20200714104437.GB602708@T590/
> 
>> It seems that by moving the increase of the busy counter further down, the
>> can_queue limit of the controller could be exceeded if several CPUs were
>> executing this code in parallel on different queues.
>>
>> This patch attempts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
>> before the host_blocked test. It also inserts barriers to make sure
>> scsi_host_busy() on one CPU will notice the increase of the count from another.
>>
>> [1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
>> [2]: https://marc.info/?l=linux-scsi&m=161116163722099&w=2
>>
>> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> 
> For failing test, it would be good to know:
> - host .can_queue
> - host .nr_hw_queues
> - number of LUNs in test and their queue depth
> 
> All can be got from lsscsi, apart from nr_hw_queues, which can be got from scsi_host sysfs for kernel >= 5.10

The last test was on 6eb045e092ef + Martin's experimental patch, which technically is 5.4.0-rc1

I'm not 100% sure about which data you need and where to find  nr_hw_queues exposed. Please ask, if you need more data.

+ lsscsi
[0:0:0:0]    disk    ATA      ST500NM0011      PA09  /dev/sda
[0:0:32:0]   enclosu DP       BP13G+           2.23  -
[12:0:0:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdb
[12:0:1:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdc
[12:0:2:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdd
[12:0:3:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sde
[12:0:4:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdf
[12:0:5:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdg
[12:0:6:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdh
[12:0:7:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdi
[12:0:8:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdj
[12:0:9:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdk
[12:0:10:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdl
[12:0:11:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdm
[12:0:12:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdn
[12:0:13:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdo
[12:0:14:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdp
[12:0:15:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdq
[12:0:16:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdr
[12:0:17:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sds
[12:0:18:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdt
[12:0:19:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdu
[12:0:20:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdv
[12:0:21:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdw
[12:0:22:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdx
[12:0:23:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdy
[12:0:24:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdz
[12:0:25:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaa
[12:0:26:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdab
[12:0:27:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdac
[12:0:28:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdad
[12:0:29:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdae
[12:0:30:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaf
[12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
[12:0:32:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
[12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
[12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
[12:2:0:0]   storage Adaptec  1100-8e          3.21  -
+ for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
+ lsscsi -L 12:2:0:0
[12:2:0:0]   storage Adaptec  1100-8e          3.21  -
   device_blocked=0
   iocounterbits=32
   iodone_cnt=0x1e
   ioerr_cnt=0x0
   iorequest_cnt=0x1e
   queue_depth=1013
   queue_type=simple
   scsi_level=6
   state=running
   timeout=30
   type=12
+ for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
+ lsscsi -L 12:0:34:0
[12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
   device_blocked=0
   iocounterbits=32
   iodone_cnt=0x12
   ioerr_cnt=0x0
   iorequest_cnt=0x12
   queue_depth=1
   queue_type=none
   scsi_level=6
   state=running
   timeout=30
   type=13
+ for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
+ lsscsi -L 12:0:33:0
[12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
   device_blocked=0
   iocounterbits=32
   iodone_cnt=0x12
   ioerr_cnt=0x0
   iorequest_cnt=0x12
   queue_depth=1
   queue_type=simple
   scsi_level=6
   state=running
   timeout=30
   type=13
+ for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
+ lsscsi -L 12:0:31:0
[12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
   device_blocked=0
   iocounterbits=32
   iodone_cnt=0x19b94
   ioerr_cnt=0x0
   iorequest_cnt=0x19bba
   queue_depth=64
   queue_type=simple
   scsi_level=8
   state=running
   timeout=30
   type=0
+ cat /sys/bus/scsi/devices/host12/scsi_host/host12/can_queue
1013

Best
   Donald

> 
> Cheers,
> John
> 
>>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Don Brace <Don.Brace@microchip.com>
>> Cc: Kevin Barnett <Kevin.Barnett@microchip.com>
>> Cc: Donald Buczek <buczek@molgen.mpg.de>
>> Cc: John Garry <john.garry@huawei.com>
>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>> Signed-off-by: Martin Wilck <mwilck@suse.com>
>> ---
>>   drivers/scsi/hosts.c    | 2 ++
>>   drivers/scsi/scsi_lib.c | 8 +++++---
>>   2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index 2f162603876f..1c452a1c18fd 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct request *rq, void *data,
>>       int *count = data;
>>       struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>> +    /* This pairs with set_bit() in scsi_host_queue_ready() */
>> +    smp_mb__before_atomic();
>>       if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
>>           (*count)++;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index b3f14f05340a..0a9a36c349ee 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1353,8 +1353,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>>       if (scsi_host_in_recovery(shost))
>>           return 0;
>> +    set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
>> +    /* This pairs with test_bit() in scsi_host_check_in_flight() */
>> +    smp_mb__after_atomic();
>> +
>>       if (atomic_read(&shost->host_blocked) > 0) {
>> -        if (scsi_host_busy(shost) > 0)
>> +        if (scsi_host_busy(shost) > 1)
>>               goto starved;
>>           /*
>> @@ -1379,8 +1383,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>>           spin_unlock_irq(shost->host_lock);
>>       }
>> -    __set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
>> -
>>       return 1;
>>   starved:
>>
> 

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433


* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 12:01   ` Donald Buczek
@ 2021-01-21 12:35     ` John Garry
  2021-01-21 12:44       ` Donald Buczek
  0 siblings, 1 reply; 27+ messages in thread
From: John Garry @ 2021-01-21 12:35 UTC (permalink / raw)
  To: Donald Buczek, mwilck, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel, Hannes Reinecke

On 21/01/2021 12:01, Donald Buczek wrote:
>>> pts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
>>> before the host_blocked test. It also inserts barriers to make sure
>>> scsi_host_busy() on one CPU will notice the increase of the count from another.
>>>
>>> [1]:https://marc.info/?l=linux-scsi&m=160271263114829&w=2
>>> [2]:https://marc.info/?l=linux-scsi&m=161116163722099&w=2
>>>
>>> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
>> For failing test, it would be good to know:
>> - host .can_queue
>> - host .nr_hw_queues
>> - number of LUNs in test and their queue depth
>>
>> All can be got from lsscsi, apart from nr_hw_queues, which can be got from scsi_host sysfs for kernel >= 5.10
> The last test was on 6eb045e092ef + Martins experimental patch, which technically is 5.4.0-rc1
> 
> I'm not 100% sure about which data you need and where to find  nr_hw_queues exposed. 

nr_hw_queues is not available on 5.4 kernel via sysfs

it's prob same as count of CPUs in the system

or you can check number of hctxX folders in /sys/kernel/debug/block/sdX 
(need to be root, and debugfs enabled)

> Please ask, if you need more data.
> 
> + lsscsi
> [0:0:0:0]    disk    ATA      ST500NM0011      PA09  /dev/sda
> [0:0:32:0]   enclosu DP       BP13G+           2.23  -
> [12:0:0:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdb
> [12:0:1:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdc
> [12:0:2:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdd
> [12:0:3:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sde
> [12:0:4:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdf
> [12:0:5:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdg
> [12:0:6:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdh
> [12:0:7:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdi
> [12:0:8:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdj
> [12:0:9:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdk
> [12:0:10:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdl
> [12:0:11:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdm
> [12:0:12:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdn
> [12:0:13:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdo
> [12:0:14:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdp
> [12:0:15:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdq
> [12:0:16:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdr
> [12:0:17:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sds
> [12:0:18:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdt
> [12:0:19:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdu
> [12:0:20:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdv
> [12:0:21:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdw
> [12:0:22:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdx
> [12:0:23:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdy
> [12:0:24:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdz
> [12:0:25:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaa
> [12:0:26:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdab
> [12:0:27:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdac
> [12:0:28:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdad
> [12:0:29:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdae
> [12:0:30:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaf
> [12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
> [12:0:32:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
> [12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
> [12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
> [12:2:0:0]   storage Adaptec  1100-8e          3.21  -
> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0

I figure sdev queue depth is 64 for all disks, like /dev/sdag, below.

lsscsi -l would give that

But, as said here:
https://lore.kernel.org/linux-scsi/20200714104437.GB602708@T590/

Looks like the block layer can send too many commands to the SCSI host if
trying to achieve highest data rates with all those 32x disks. Before
6eb045e092ef, that would be avoided. That's how I see it.

> + lsscsi -L 12:2:0:0
> [12:2:0:0]   storage Adaptec  1100-8e          3.21  -
>     device_blocked=0
>     iocounterbits=32
>     iodone_cnt=0x1e
>     ioerr_cnt=0x0
>     iorequest_cnt=0x1e
>     queue_depth=1013
>     queue_type=simple
>     scsi_level=6
>     state=running
>     timeout=30
>     type=12
> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
> + lsscsi -L 12:0:34:0
> [12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
>     device_blocked=0
>     iocounterbits=32
>     iodone_cnt=0x12
>     ioerr_cnt=0x0
>     iorequest_cnt=0x12
>     queue_depth=1
>     queue_type=none
>     scsi_level=6
>     state=running
>     timeout=30
>     type=13
> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
> + lsscsi -L 12:0:33:0
> [12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
>     device_blocked=0
>     iocounterbits=32
>     iodone_cnt=0x12
>     ioerr_cnt=0x0
>     iorequest_cnt=0x12
>     queue_depth=1
>     queue_type=simple
>     scsi_level=6
>     state=running
>     timeout=30
>     type=13
> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
> + lsscsi -L 12:0:31:0
> [12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
>     device_blocked=0
>     iocounterbits=32
>     iodone_cnt=0x19b94
>     ioerr_cnt=0x0
>     iorequest_cnt=0x19bba
>     queue_depth=64
>     queue_type=simple
>     scsi_level=8
>     state=running
>     timeout=30
>     type=0
> + cat /sys/bus/scsi/devices/host12/scsi_host/host12/can_queue
> 1013
> 

Thanks,
John



* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 12:35     ` John Garry
@ 2021-01-21 12:44       ` Donald Buczek
  2021-01-21 13:05         ` John Garry
  0 siblings, 1 reply; 27+ messages in thread
From: Donald Buczek @ 2021-01-21 12:44 UTC (permalink / raw)
  To: John Garry, mwilck, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel, Hannes Reinecke

On 21.01.21 13:35, John Garry wrote:
> On 21/01/2021 12:01, Donald Buczek wrote:
>>>> pts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
>>>> before the host_blocked test. It also inserts barriers to make sure
>>>> scsi_host_busy() on one CPU will notice the increase of the count from another.
>>>>
>>>> [1]:https://marc.info/?l=linux-scsi&m=160271263114829&w=2
>>>> [2]:https://marc.info/?l=linux-scsi&m=161116163722099&w=2
>>>>
>>>> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
>>> For failing test, it would be good to know:
>>> - host .can_queue
>>> - host .nr_hw_queues
>>> - number of LUNs in test and their queue depth
>>>
>>> All can be got from lsscsi, apart from nr_hw_queues, which can be got from scsi_host sysfs for kernel >= 5.10
>> The last test was on 6eb045e092ef + Martins experimental patch, which technically is 5.4.0-rc1
>>
>> I'm not 100% sure about which data you need and where to find  nr_hw_queues exposed. 
> 
> nr_hw_queues is not available on 5.4 kernel via sysfs
> 
> it's prob same as count of CPUs in the system
> 
> or you can check number of hctxX folders in /sys/kernel/debug/block/sdX (need to be root, and debugfs enabled)

root@deadbird:~# ls -d /sys/kernel/debug/block/sdz/hc*
/sys/kernel/debug/block/sdz/hctx0   /sys/kernel/debug/block/sdz/hctx16	/sys/kernel/debug/block/sdz/hctx23  /sys/kernel/debug/block/sdz/hctx30	/sys/kernel/debug/block/sdz/hctx38
/sys/kernel/debug/block/sdz/hctx1   /sys/kernel/debug/block/sdz/hctx17	/sys/kernel/debug/block/sdz/hctx24  /sys/kernel/debug/block/sdz/hctx31	/sys/kernel/debug/block/sdz/hctx39
/sys/kernel/debug/block/sdz/hctx10  /sys/kernel/debug/block/sdz/hctx18	/sys/kernel/debug/block/sdz/hctx25  /sys/kernel/debug/block/sdz/hctx32	/sys/kernel/debug/block/sdz/hctx4
/sys/kernel/debug/block/sdz/hctx11  /sys/kernel/debug/block/sdz/hctx19	/sys/kernel/debug/block/sdz/hctx26  /sys/kernel/debug/block/sdz/hctx33	/sys/kernel/debug/block/sdz/hctx5
/sys/kernel/debug/block/sdz/hctx12  /sys/kernel/debug/block/sdz/hctx2	/sys/kernel/debug/block/sdz/hctx27  /sys/kernel/debug/block/sdz/hctx34	/sys/kernel/debug/block/sdz/hctx6
/sys/kernel/debug/block/sdz/hctx13  /sys/kernel/debug/block/sdz/hctx20	/sys/kernel/debug/block/sdz/hctx28  /sys/kernel/debug/block/sdz/hctx35	/sys/kernel/debug/block/sdz/hctx7
/sys/kernel/debug/block/sdz/hctx14  /sys/kernel/debug/block/sdz/hctx21	/sys/kernel/debug/block/sdz/hctx29  /sys/kernel/debug/block/sdz/hctx36	/sys/kernel/debug/block/sdz/hctx8
/sys/kernel/debug/block/sdz/hctx15  /sys/kernel/debug/block/sdz/hctx22	/sys/kernel/debug/block/sdz/hctx3   /sys/kernel/debug/block/sdz/hctx37	/sys/kernel/debug/block/sdz/hctx9
root@deadbird:~# ls -d /sys/kernel/debug/block/sdz/hc*|wc
      40      40    1390
root@deadbird:~# nproc
40


> Please ask, if you need more data.
>>
>> + lsscsi
>> [0:0:0:0]    disk    ATA      ST500NM0011      PA09  /dev/sda
>> [0:0:32:0]   enclosu DP       BP13G+           2.23  -
>> [12:0:0:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdb
>> [12:0:1:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdc
>> [12:0:2:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdd
>> [12:0:3:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sde
>> [12:0:4:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdf
>> [12:0:5:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdg
>> [12:0:6:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdh
>> [12:0:7:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdi
>> [12:0:8:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdj
>> [12:0:9:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdk
>> [12:0:10:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdl
>> [12:0:11:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdm
>> [12:0:12:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdn
>> [12:0:13:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdo
>> [12:0:14:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdp
>> [12:0:15:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdq
>> [12:0:16:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdr
>> [12:0:17:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sds
>> [12:0:18:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdt
>> [12:0:19:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdu
>> [12:0:20:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdv
>> [12:0:21:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdw
>> [12:0:22:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdx
>> [12:0:23:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdy
>> [12:0:24:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdz
>> [12:0:25:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaa
>> [12:0:26:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdab
>> [12:0:27:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdac
>> [12:0:28:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdad
>> [12:0:29:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdae
>> [12:0:30:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaf
>> [12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
>> [12:0:32:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
>> [12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
>> [12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
>> [12:2:0:0]   storage Adaptec  1100-8e          3.21  -
>> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
> 
> I figure sdev queue depth is 64 for all disks, like /dev/sdag, below.

Yes, I sent an example (one of two enclosures, 1 of 32 disks) but verified that they are the same.

Best
   Donald

> 
> lsscsi -l would give that
> 
> But, as said here:
> https://lore.kernel.org/linux-scsi/20200714104437.GB602708@T590/
> 
> Looks like the block layer can send too many commands to the SCSI host if trying to achieve highest data rates with all those 32x disks. Before 6eb045e092ef, that would be avoided. That's how I see it.
> 
>> + lsscsi -L 12:2:0:0
>> [12:2:0:0]   storage Adaptec  1100-8e          3.21  -
>>     device_blocked=0
>>     iocounterbits=32
>>     iodone_cnt=0x1e
>>     ioerr_cnt=0x0
>>     iorequest_cnt=0x1e
>>     queue_depth=1013
>>     queue_type=simple
>>     scsi_level=6
>>     state=running
>>     timeout=30
>>     type=12
>> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
>> + lsscsi -L 12:0:34:0
>> [12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
>>     device_blocked=0
>>     iocounterbits=32
>>     iodone_cnt=0x12
>>     ioerr_cnt=0x0
>>     iorequest_cnt=0x12
>>     queue_depth=1
>>     queue_type=none
>>     scsi_level=6
>>     state=running
>>     timeout=30
>>     type=13
>> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
>> + lsscsi -L 12:0:33:0
>> [12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
>>     device_blocked=0
>>     iocounterbits=32
>>     iodone_cnt=0x12
>>     ioerr_cnt=0x0
>>     iorequest_cnt=0x12
>>     queue_depth=1
>>     queue_type=simple
>>     scsi_level=6
>>     state=running
>>     timeout=30
>>     type=13
>> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
>> + lsscsi -L 12:0:31:0
>> [12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
>>     device_blocked=0
>>     iocounterbits=32
>>     iodone_cnt=0x19b94
>>     ioerr_cnt=0x0
>>     iorequest_cnt=0x19bba
>>     queue_depth=64
>>     queue_type=simple
>>     scsi_level=8
>>     state=running
>>     timeout=30
>>     type=0
>> + cat /sys/bus/scsi/devices/host12/scsi_host/host12/can_queue
>> 1013
>>
> 
> Thanks,
> John
> 


* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 12:44       ` Donald Buczek
@ 2021-01-21 13:05         ` John Garry
  2021-01-21 23:32           ` Martin Wilck
                             ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: John Garry @ 2021-01-21 13:05 UTC (permalink / raw)
  To: Donald Buczek, mwilck, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel, Hannes Reinecke

>>> I'm not 100% sure about which data you need and where to find  
>>> nr_hw_queues exposed. 
>>
>> nr_hw_queues is not available on 5.4 kernel via sysfs
>>
>> it's prob same as count of CPUs in the system
>>
>> or you can check number of hctxX folders in 
>> /sys/kernel/debug/block/sdX (need to be root, and debugfs enabled)
> 
> root@deadbird:~# ls -d /sys/kernel/debug/block/sdz/hc*
> /sys/kernel/debug/block/sdz/hctx0 /sys/kernel/debug/block/sdz/hctx16 
> /sys/kernel/debug/block/sdz/hctx23 /sys/kernel/debug/block/sdz/hctx30    
> /sys/kernel/debug/block/sdz/hctx38
> /sys/kernel/debug/block/sdz/hctx1 /sys/kernel/debug/block/sdz/hctx17 
> /sys/kernel/debug/block/sdz/hctx24 /sys/kernel/debug/block/sdz/hctx31    
> /sys/kernel/debug/block/sdz/hctx39
> /sys/kernel/debug/block/sdz/hctx10 /sys/kernel/debug/block/sdz/hctx18 
> /sys/kernel/debug/block/sdz/hctx25 /sys/kernel/debug/block/sdz/hctx32    
> /sys/kernel/debug/block/sdz/hctx4
> /sys/kernel/debug/block/sdz/hctx11 /sys/kernel/debug/block/sdz/hctx19 
> /sys/kernel/debug/block/sdz/hctx26 /sys/kernel/debug/block/sdz/hctx33    
> /sys/kernel/debug/block/sdz/hctx5
> /sys/kernel/debug/block/sdz/hctx12 /sys/kernel/debug/block/sdz/hctx2    
> /sys/kernel/debug/block/sdz/hctx27 /sys/kernel/debug/block/sdz/hctx34    
> /sys/kernel/debug/block/sdz/hctx6
> /sys/kernel/debug/block/sdz/hctx13 /sys/kernel/debug/block/sdz/hctx20 
> /sys/kernel/debug/block/sdz/hctx28 /sys/kernel/debug/block/sdz/hctx35    
> /sys/kernel/debug/block/sdz/hctx7
> /sys/kernel/debug/block/sdz/hctx14 /sys/kernel/debug/block/sdz/hctx21 
> /sys/kernel/debug/block/sdz/hctx29 /sys/kernel/debug/block/sdz/hctx36    
> /sys/kernel/debug/block/sdz/hctx8
> /sys/kernel/debug/block/sdz/hctx15 /sys/kernel/debug/block/sdz/hctx22 
> /sys/kernel/debug/block/sdz/hctx3 /sys/kernel/debug/block/sdz/hctx37    
> /sys/kernel/debug/block/sdz/hctx9
> root@deadbird:~# ls -d /sys/kernel/debug/block/sdz/hc*|wc
>       40      40    1390
> root@deadbird:~# nproc
> 40
> 
> 
>> Please ask, if you need more data.
>>>
>>> + lsscsi
>>> [0:0:0:0]    disk    ATA      ST500NM0011      PA09  /dev/sda
>>> [0:0:32:0]   enclosu DP       BP13G+           2.23  -
>>> [12:0:0:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdb
>>> [12:0:1:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdc
>>> [12:0:2:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdd
>>> [12:0:3:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sde
>>> [12:0:4:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdf
>>> [12:0:5:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdg
>>> [12:0:6:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdh
>>> [12:0:7:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdi
>>> [12:0:8:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdj
>>> [12:0:9:0]   disk    SEAGATE  ST8000NM001A     E002  /dev/sdk
>>> [12:0:10:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdl
>>> [12:0:11:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdm
>>> [12:0:12:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdn
>>> [12:0:13:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdo
>>> [12:0:14:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdp
>>> [12:0:15:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdq
>>> [12:0:16:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdr
>>> [12:0:17:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sds
>>> [12:0:18:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdt
>>> [12:0:19:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdu
>>> [12:0:20:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdv
>>> [12:0:21:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdw
>>> [12:0:22:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdx
>>> [12:0:23:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdy
>>> [12:0:24:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdz
>>> [12:0:25:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaa
>>> [12:0:26:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdab
>>> [12:0:27:0]  disk    SEAGATE  ST8000NM001A     E002  /dev/sdac
>>> [12:0:28:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdad
>>> [12:0:29:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdae
>>> [12:0:30:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdaf
>>> [12:0:31:0]  disk    SEAGATE  ST8000NM001A     E001  /dev/sdag
>>> [12:0:32:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
>>> [12:0:33:0]  enclosu AIC 12G  3U16SAS3swap     0c01  -
>>> [12:0:34:0]  enclosu Adaptec  Smart Adapter    3.21  -
>>> [12:2:0:0]   storage Adaptec  1100-8e          3.21  -
>>> + for i in 12:2:0:0 12:0:34:0 12:0:33:0 12:0:31:0
>>
>> I figure sdev queue depth is 64 for all disks, like /dev/sdag, below.
> 
> Yes, I sent an example (one of two enclosures, 1 of 32 disks) but
> verified that they are the same.

Confirmed my suspicions - it looks like the host is sent more commands 
than it can handle. We would need many disks to see this issue though, 
which you have.
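
To put the numbers from this thread together (rough arithmetic, not a
measurement): the box has 40 CPUs and hence 40 hw queues, and without
host_tagset each hctx gets its own tag set of depth can_queue = 1013. The
32 disks at a per-LUN queue depth of 64 allow up to 32 x 64 = 2048
outstanding commands, so nothing at the blk-mq level caps the host-wide
total at the 1013 commands the controller can actually accept.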

So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and I 
suppose it could be simply fixed by setting .host_tagset in scsi host 
template there.

Thanks,
John


* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 13:05         ` John Garry
@ 2021-01-21 23:32           ` Martin Wilck
  2021-03-11 16:36             ` Donald Buczek
  2021-02-01 22:44           ` Don.Brace
  2021-02-02 20:04           ` Don.Brace
  2 siblings, 1 reply; 27+ messages in thread
From: Martin Wilck @ 2021-01-21 23:32 UTC (permalink / raw)
  To: John Garry, Donald Buczek, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel, Hannes Reinecke

On Thu, 2021-01-21 at 13:05 +0000, John Garry wrote:
> > > 
> 
> Confirmed my suspicions - it looks like the host is sent more
> commands 
> than it can handle. We would need many disks to see this issue
> though, 
> which you have.
> 
> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and
> I 
> suppose it could be simply fixed by setting .host_tagset in scsi host
> template there.

If it's really just that, it should be easy enough to verify.

@Donald, you'd need to test with a 5.10 kernel, and after reproducing
the issue, add 

        .host_tagset            = 1,

to the definition of pqi_driver_template in
drivers/scsi/smartpqi/smartpqi_init.c. 
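
For illustration, roughly where that line would sit (a sketch from memory of
the 5.10-era template; only the .host_tagset line is new, the other fields are
just context and may not match the file exactly):

static struct scsi_host_template pqi_driver_template = {
	.module			= THIS_MODULE,
	.name			= DRIVER_NAME_SHORT,
	.proc_name		= DRIVER_NAME_SHORT,
	.queuecommand		= pqi_scsi_queue_command,
	/* ... remaining callbacks and limits unchanged ... */
	.host_tagset		= 1,	/* share one tag space across all hw queues */
};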

You don't need a patch to test that, I believe. Would you be able to do
this test?

Regards,
Martin




* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 10:05   ` Martin Wilck
@ 2021-01-22  0:14     ` Martin Wilck
  0 siblings, 0 replies; 27+ messages in thread
From: Martin Wilck @ 2021-01-22  0:14 UTC (permalink / raw)
  To: Donald Buczek, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, John Garry, Paul Menzel

On Thu, 2021-01-21 at 11:05 +0100, Martin Wilck wrote:
> On Thu, 2021-01-21 at 10:07 +0100, Donald Buczek wrote:
> > 
> > Unfortunately, this patch (on top of 6eb045e092ef) did not fix the
> > problem. Same error ("controller is offline: status code
> > 0x6100c")

Rethinking, this patch couldn't have fixed your problem. I apologize
for the dumb suggestion.

However, I still believe I had a point in that patch, and would like to
ask the experts for an opinion.

Assume no commands in flight (busy == 0), host_blocked == 2, and that
two CPUs enter scsi_host_queue_ready() at the same time. Before
6eb045e092ef, it was impossible that the function returned 1 on either
CPU in this situation.

CPU 1                              CPU 2

scsi_host_queue_ready              scsi_host_queue_ready
   host_busy = 1                      host_busy = 2
      (busy = 0)                         (busy = 1)
   host_blocked = 1                   goto starved
   goto out_dec                       add sdev to starved list
   host_busy = 1                      host_busy = 0
   return 0                           return 0
                                      
With 6eb045e092ef (and without my patch), the result could be:

CPU 1                              CPU 2

scsi_host_queue_ready              scsi_host_queue_ready
  read scsi_host_busy() = 0          read scsi_host_busy() = 0
  host_blocked = 1                   host_blocked = 0
  goto out_dec                       remove sdev from starved list
  return 0                           set SCMD_STATE_INFLIGHT 
                                       (host_busy = 1)
                                     return 1
  
So now, one command can be sent, and the host is unblocked. A very
different outcome than before. Was this intentional?

Thanks
Martin





* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-20 18:45 [PATCH] scsi: scsi_host_queue_ready: increase busy count early mwilck
  2021-01-20 20:26 ` John Garry
  2021-01-21  9:07 ` Donald Buczek
@ 2021-01-22  3:23 ` Ming Lei
  2021-01-22 14:05   ` Martin Wilck
  2 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2021-01-22  3:23 UTC (permalink / raw)
  To: mwilck
  Cc: Donald Buczek, Martin K. Petersen, James Bottomley, linux-scsi,
	Hannes Reinecke, Don Brace, Kevin Barnett, John Garry,
	Paul Menzel

On Wed, Jan 20, 2021 at 07:45:48PM +0100, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Donald: please give this patch a try.
> 
> Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> contained this hunk:
> 
> -       busy = atomic_inc_return(&shost->host_busy) - 1;
>         if (atomic_read(&shost->host_blocked) > 0) {
> -               if (busy)
> +               if (scsi_host_busy(shost) > 0)
>                         goto starved;
> 
> The previous code would increase the busy count before checking host_blocked.
> With 6eb045e092ef, the busy count would be increased (by setting the
> SCMD_STATE_INFLIGHT bit) after the if clause for host_blocked above.
> 
> Users have reported a regression with the smartpqi driver [1] which has been
> shown to be caused by this commit [2].
> 
> It seems that by moving the increase of the busy counter further down, the
> can_queue limit of the controller could be exceeded if several CPUs were
> executing this code in parallel on different queues.

can_queue limit should never be exceeded because it is respected by
blk-mq since each hw queue's queue depth is .can_queue.

smartpqi's issue is that its .can_queue does not represent each hw
queue's depth; instead, .can_queue represents the queue depth of the
whole HBA.
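
For reference, scsi_mq_setup_tags() does roughly this (paraphrased from
memory, not an exact quote):

	tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
	tag_set->queue_depth = shost->can_queue;

so without host-wide tags every hw queue gets its own can_queue-deep tag
space, and a .can_queue that describes the whole HBA is effectively
multiplied by nr_hw_queues.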

As John mentioned, smartpqi should have switched to hosttags.

BTW, it looks like the following code has a soft lockup risk:

pqi_alloc_io_request():
        while (1) {
                io_request = &ctrl_info->io_request_pool[i];
                if (atomic_inc_return(&io_request->refcount) == 1)
                        break;
                atomic_dec(&io_request->refcount);
                i = (i + 1) % ctrl_info->max_io_slots;
        }
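
Purely as an illustration of that risk (not the driver's actual fix, and
using only the names visible in the loop above), the scan could be bounded
so the caller backs off instead of spinning when every slot is busy:

static struct pqi_io_request *pqi_try_alloc_io_request(
	struct pqi_ctrl_info *ctrl_info, unsigned int start)
{
	unsigned int i = start % ctrl_info->max_io_slots;
	unsigned int scanned;

	for (scanned = 0; scanned < ctrl_info->max_io_slots; scanned++) {
		struct pqi_io_request *io_request =
			&ctrl_info->io_request_pool[i];

		if (atomic_inc_return(&io_request->refcount) == 1)
			return io_request;	/* we own this slot now */
		atomic_dec(&io_request->refcount);
		i = (i + 1) % ctrl_info->max_io_slots;
	}

	return NULL;	/* all slots in use; caller should retry later */
}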

> 
> This patch attempts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
> before the host_blocked test. It also inserts barriers to make sure
> scsi_host_busy() on one CPU will notice the increase of the count from another.
> 
> [1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
> [2]: https://marc.info/?l=linux-scsi&m=161116163722099&w=2

If the above is true wrt. smartpqi's can_queue usage, your patch may not
completely fix the issue in which you think '.can_queue is exceeded'.

> 
> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Don Brace <Don.Brace@microchip.com>
> Cc: Kevin Barnett <Kevin.Barnett@microchip.com>
> Cc: Donald Buczek <buczek@molgen.mpg.de>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/hosts.c    | 2 ++
>  drivers/scsi/scsi_lib.c | 8 +++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 2f162603876f..1c452a1c18fd 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct request *rq, void *data,
>  	int *count = data;
>  	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>  
> +	/* This pairs with set_bit() in scsi_host_queue_ready() */
> +	smp_mb__before_atomic();

So the above barrier orders atomic_read(&shost->host_blocked) and
test_bit()?

>  	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
>  		(*count)++;
>  
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b3f14f05340a..0a9a36c349ee 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1353,8 +1353,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  	if (scsi_host_in_recovery(shost))
>  		return 0;
>  
> +	set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> +	/* This pairs with test_bit() in scsi_host_check_in_flight() */
> +	smp_mb__after_atomic();
> +
>  	if (atomic_read(&shost->host_blocked) > 0) {
> -		if (scsi_host_busy(shost) > 0)
> +		if (scsi_host_busy(shost) > 1)
>  			goto starved;
>  
>  		/*
> @@ -1379,8 +1383,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  		spin_unlock_irq(shost->host_lock);
>  	}
>  
> -	__set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> -

This patch looks fine.

However, I'd suggest confirming smartpqi's .can_queue usage first, which
looks like one big issue.

-- 
Ming



* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-22  3:23 ` Ming Lei
@ 2021-01-22 14:05   ` Martin Wilck
  0 siblings, 0 replies; 27+ messages in thread
From: Martin Wilck @ 2021-01-22 14:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: Donald Buczek, Martin K. Petersen, James Bottomley, linux-scsi,
	Hannes Reinecke, Don Brace, Kevin Barnett, John Garry,
	Paul Menzel

On Fri, 2021-01-22 at 11:23 +0800, Ming Lei wrote:
> > 
> > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> > index 2f162603876f..1c452a1c18fd 100644
> > --- a/drivers/scsi/hosts.c
> > +++ b/drivers/scsi/hosts.c
> > @@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct
> > request *rq, void *data,
> >         int *count = data;
> >         struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> >  
> > +       /* This pairs with set_bit() in scsi_host_queue_ready() */
> > +       smp_mb__before_atomic();
> 
> So the above barrier orders atomic_read(&shost->host_blocked) and
> test_bit()?

Yes, I believe the change to SCMD_STATE_INFLIGHT should be visible
before host_blocked is tested. For that, I would need to insert a full
smp_mb() instead of smp_mb__after_atomic() below, though ...
right? Which means that the smp_mb__before_atomic() above could be
removed. The important thing is that if two threads execute
scsi_host_queue_ready() simultaneously, one of them must see the
SCMD_STATE_INFLIGHT bit of the other.

Btw, I realized that calling scsi_host_busy() in
scsi_host_queue_ready() is inefficient, because all we need to know
there is whether the value is > 1; we don't need to iterate through the
entire tag space.
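
A sketch of what that could look like (untested, just to illustrate the
idea; it reuses the SCMD_STATE_INFLIGHT test from scsi_host_check_in_flight()
but stops the tag iteration as soon as the limit is crossed):

struct busy_gt_data {
	int count;
	int limit;
	bool exceeded;
};

static bool scsi_host_check_in_flight_gt(struct request *rq, void *data,
					 bool reserved)
{
	struct busy_gt_data *d = data;
	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);

	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state) &&
	    ++d->count > d->limit) {
		d->exceeded = true;
		return false;	/* stop iterating, we already know enough */
	}
	return true;
}

/* true iff more than @limit commands are in flight on @shost */
static bool scsi_host_busy_gt(struct Scsi_Host *shost, int limit)
{
	struct busy_gt_data d = { .limit = limit };

	blk_mq_tagset_busy_iter(&shost->tag_set, scsi_host_check_in_flight_gt,
				&d);
	return d.exceeded;
}

scsi_host_queue_ready() could then call scsi_host_busy_gt(shost, 1) instead
of scsi_host_busy(shost) > 1.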

Thanks,
Martin

> 
> >         if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
> >                 (*count)++;
> >  
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index b3f14f05340a..0a9a36c349ee 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1353,8 +1353,12 @@ static inline int
> > scsi_host_queue_ready(struct request_queue *q,
> >         if (scsi_host_in_recovery(shost))
> >                 return 0;
> >  
> > +       set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> > +       /* This pairs with test_bit() in
> > scsi_host_check_in_flight() */
> > +       smp_mb__after_atomic();
> > +
> >         if (atomic_read(&shost->host_blocked) > 0) {
> > -               if (scsi_host_busy(shost) > 0)
> > +               if (scsi_host_busy(shost) > 1)
> >                         goto starved;
> >  
> >                 /*
> > @@ -1379,8 +1383,6 @@ static inline int
> > scsi_host_queue_ready(struct request_queue *q,
> >                 spin_unlock_irq(shost->host_lock);
> >         }
> >  
> > -       __set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> > -
> 
> Looks this patch fine.
> 
> However, I'd suggest to confirm smartpqi's .can_queue usage first,
> which
> looks one big issue.



> 




* RE: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 13:05         ` John Garry
  2021-01-21 23:32           ` Martin Wilck
@ 2021-02-01 22:44           ` Don.Brace
  2021-02-02 20:04           ` Don.Brace
  2 siblings, 0 replies; 27+ messages in thread
From: Don.Brace @ 2021-02-01 22:44 UTC (permalink / raw)
  To: john.garry, buczek, mwilck, martin.petersen, ming.lei
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

-----Original Message-----
From: John Garry [mailto:john.garry@huawei.com] 

Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early

>>> I'm not 100% sure about which data you need and where to find 
>>> nr_hw_queues exposed.
>>
>> nr_hw_queues is not available on 5.4 kernel via sysfs
>>
>> it's prob same as count of CPUs in the system
>>
>> or you can check number of hctxX folders in 
>> /sys/kernel/debug/block/sdX (need to be root, and debugfs enabled)
>
>> I figure sdev queue depth is 64 for all disks, like /dev/sdag, below.
>
> Yes, I send an example (one of two enclosures, 1 of 32 disks) but 
> verified, that they are the same.

Confirmed my suspicions - it looks like the host is sent more commands than it can handle. We would need many disks to see this issue though, which you have.

So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and I suppose it could be simply fixed by setting .host_tagset in scsi host template there.

Thanks,
John


John, I tried this for smartpqi; so far, setting host_tagset = 1 seems to be working. The issue normally repros very quickly.

I want to run a few more tests before calling this a good fix.
Thanks for your suggestion,
Don Brace 

----
[root@cleddyf ~]# lsscsi
[0:0:0:0]    disk    Generic- SD/MMC CRW       1.00  /dev/sdc 
[1:0:0:0]    disk    ASMT     2115             0     /dev/sda 
[2:0:0:0]    disk    ASMT     2115             0     /dev/sdb 
[3:0:0:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdd 
[3:0:1:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sde 
[3:0:2:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdf 
[3:0:3:0]    disk    HP       EH0300FBQDD      HPD5  /dev/sdg 
[3:0:4:0]    disk    HP       EG0900FDJYR      HPD4  /dev/sdh 
[3:0:5:0]    disk    HP       EG0300FCVBF      HPD9  /dev/sdi 
[3:0:6:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdj 
[3:0:7:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdk 
[3:0:8:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdl 
[3:0:9:0]    disk    HP       MO0200FBRWB      HPD9  /dev/sdm 
[3:0:10:0]   disk    HP       MM0500FBFVQ      HPD8  /dev/sdn 
[3:0:11:0]   disk    ATA      MM0500GBKAK      HPGC  /dev/sdo 
[3:0:12:0]   disk    HP       EG0900FBVFQ      HPDC  /dev/sdp 
[3:0:13:0]   disk    HP       VO006400JWZJT    HP00  /dev/sdq 
[3:0:14:0]   disk    HP       VO015360JWZJN    HP00  /dev/sdr 
[3:0:15:0]   enclosu HP       D3700            5.04  -        
[3:0:16:0]   enclosu HP       D3700            5.04  -        
[3:0:17:0]   enclosu HPE      Smart Adapter    3.00  -        
[3:1:0:0]    disk    HPE      LOGICAL VOLUME   3.00  /dev/sds 
[3:2:0:0]    storage HPE      P408e-p SR Gen10 3.00  -      

---
[global]
ioengine=libaio
; rw=randwrite
; percentage_random=40
rw=write
size=100g
bs=4k
direct=1
ramp_time=15
; filename=/mnt/fio_test
; cpus_allowed=0-27
iodepth=1024

[/dev/sdd]
[/dev/sde]
[/dev/sdf]
[/dev/sdg]
[/dev/sdh]
[/dev/sdi]
[/dev/sdj]
[/dev/sdk]
[/dev/sdl]
[/dev/sdm]
[/dev/sdn]
[/dev/sdo]
[/dev/sdp]
[/dev/sdq]
[/dev/sdr]  


* RE: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 13:05         ` John Garry
  2021-01-21 23:32           ` Martin Wilck
  2021-02-01 22:44           ` Don.Brace
@ 2021-02-02 20:04           ` Don.Brace
  2021-02-02 20:48             ` Martin Wilck
  2 siblings, 1 reply; 27+ messages in thread
From: Don.Brace @ 2021-02-02 20:04 UTC (permalink / raw)
  To: john.garry, buczek, mwilck, martin.petersen, ming.lei, mwilck
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

-----Original Message-----
From: John Garry [mailto:john.garry@huawei.com] 
Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early


Confirmed my suspicions - it looks like the host is sent more commands than it can handle. We would need many disks to see this issue though, which you have.

So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and I suppose it could be simply fixed by setting .host_tagset in scsi host template there.

Thanks,
John
--
Don: Even though this works for current kernels, what are the chances of this getting back-ported to 5.9 or even further?

Otherwise the original patch smartpqi_fix_host_qdepth_limit would correct this issue for older kernels.

Thanks,
Don Brace 


* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-02 20:04           ` Don.Brace
@ 2021-02-02 20:48             ` Martin Wilck
  2021-02-03  8:49               ` John Garry
  2021-02-03 15:56               ` Don.Brace
  0 siblings, 2 replies; 27+ messages in thread
From: Martin Wilck @ 2021-02-02 20:48 UTC (permalink / raw)
  To: Don.Brace, john.garry, buczek, martin.petersen, ming.lei
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

On Tue, 2021-02-02 at 20:04 +0000, Don.Brace@microchip.com wrote:
> -----Original Message-----
> From: John Garry [mailto:john.garry@huawei.com] 
> Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count
> early
> 
> 
> Confirmed my suspicions - it looks like the host is sent more
> commands than it can handle. We would need many disks to see this
> issue though, which you have.
> 
> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and
> I suppose it could be simply fixed by setting .host_tagset in scsi
> host template there.
> 
> Thanks,
> John
> --
> Don: Even though this works for current kernels, what are the chances
> of this getting back-ported to 5.9 or even further?
> 
> Otherwise the original patch smartpqi_fix_host_qdepth_limit would
> correct this issue for older kernels.

True. However this is 5.12 material, so we shouldn't be bothered by
that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure
whether smartpqi_fix_host_qdepth_limit would be the solution.
You could simply divide can_queue by nr_hw_queues, as suggested before,
or even simpler, set nr_hw_queues = 1.
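
A rough sketch of what either workaround amounts to (the Scsi_Host fields are
real; "controller_queue_depth" is just a placeholder for whatever depth the
firmware reports):

	/* either: a single hw queue, so can_queue is a host-wide limit again */
	shost->nr_hw_queues = 1;

	/* or: split the controller's budget across the hw queues */
	shost->can_queue = controller_queue_depth / shost->nr_hw_queues;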

How much performance would that cost you?

Distribution kernels would be yet another issue; distros can backport
host_tagset and get rid of the issue.

Regards
Martin











* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-02 20:48             ` Martin Wilck
@ 2021-02-03  8:49               ` John Garry
  2021-02-03  8:58                 ` Paul Menzel
  2021-02-03 15:56               ` Don.Brace
  1 sibling, 1 reply; 27+ messages in thread
From: John Garry @ 2021-02-03  8:49 UTC (permalink / raw)
  To: Martin Wilck, Don.Brace, buczek, martin.petersen, ming.lei
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

On 02/02/2021 20:48, Martin Wilck wrote:
> On Tue, 2021-02-02 at 20:04 +0000, Don.Brace@microchip.com wrote:
>> -----Original Message-----
>> From: John Garry [mailto:john.garry@huawei.com]
>> Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count
>> early
>>
>>
>> Confirmed my suspicions - it looks like the host is sent more
>> commands than it can handle. We would need many disks to see this
>> issue though, which you have.
>>
>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and
>> I suppose it could be simply fixed by setting .host_tagset in scsi
>> host template there.
>>
>> Thanks,
>> John
>> --
>> Don: Even though this works for current kernels, what would chances
>> of this getting back-ported to 5.9 or even further?
>>
>> Otherwise the original patch smartpqi_fix_host_qdepth_limit would
>> correct this issue for older kernels.
> 
> True. However this is 5.12 material, so we shouldn't be bothered by
> that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure
> whether smartpqi_fix_host_qdepth_limit would be the solution.
> You could simply divide can_queue by nr_hw_queues, as suggested before,
> or even simpler, set nr_hw_queues = 1.
> 
> How much performance would that cost you?
> 
> Distribution kernels would be yet another issue, distros can backport
> host_tagset and get rid of the issue.

Aren't they (distros) the only issue? As I mentioned above, for 5.10 
mainline stable, I think it's reasonable to backport a patch to set 
.host_tagset for the driver.

Thanks,
John

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-03  8:49               ` John Garry
@ 2021-02-03  8:58                 ` Paul Menzel
  2021-02-03 15:30                   ` Don.Brace
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Menzel @ 2021-02-03  8:58 UTC (permalink / raw)
  To: John Garry, Martin Wilck, Don Brace
  Cc: jejb, linux-scsi, hare, Kevin Barnett, hare, Ming Lei,
	Martin Petersen, Donald Buczek, it+linux-scsi

Dear Linux folks,


Am 03.02.21 um 09:49 schrieb John Garry:
> On 02/02/2021 20:48, Martin Wilck wrote:
>> On Tue, 2021-02-02 at 20:04 +0000, Don.Brace@microchip.com wrote:
>>> -----Original Message-----
>>> From: John Garry [mailto:john.garry@huawei.com]
>>> Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
>>>
>>>
>>> Confirmed my suspicions - it looks like the host is sent more
>>> commands than it can handle. We would need many disks to see this
>>> issue though, which you have.
>>>
>>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and
>>> I suppose it could be simply fixed by setting .host_tagset in scsi
>>> host template there.
>>>
>>> Thanks,
>>> John
>>> -- 
>>> Don: Even though this works for current kernels, what would chances
>>> of this getting back-ported to 5.9 or even further?
>>>
>>> Otherwise the original patch smartpqi_fix_host_qdepth_limit would
>>> correct this issue for older kernels.
>>
>> True. However this is 5.12 material, so we shouldn't be bothered by
>> that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure
>> whether smartpqi_fix_host_qdepth_limit would be the solution.
>> You could simply divide can_queue by nr_hw_queues, as suggested before,
>> or even simpler, set nr_hw_queues = 1.
>>
>> How much performance would that cost you?
>>
>> Distribution kernels would be yet another issue, distros can backport
>> host_tagset and get rid of the issue.
> 
> Aren't they (distros) the only issue? As I mentioned above, for 5.10 
> mainline stable, I think it's reasonable to backport a patch to set 
> .host_tagset for the driver.

Indeed. As per the Linux kernel Web site [1], 5.5 through 5.9 are not
maintained anymore (EOL) upstream. So, if distributions decide to go
with another Linux kernel release, it’s their job to backport things. If
the commit message is well written and contains the Fixes tag, their
tooling should be able to pick it up.


Kind regards,

Paul


[1]: https://www.kernel.org/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-03  8:58                 ` Paul Menzel
@ 2021-02-03 15:30                   ` Don.Brace
  0 siblings, 0 replies; 27+ messages in thread
From: Don.Brace @ 2021-02-03 15:30 UTC (permalink / raw)
  To: pmenzel, john.garry, mwilck
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, hare, ming.lei,
	martin.petersen, buczek, it+linux-scsi

-----Original Message-----
From: Paul Menzel [mailto:pmenzel@molgen.mpg.de] 
Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early

>>> Confirmed my suspicions - it looks like the host is sent more 
>>> commands than it can handle. We would need many disks to see this 
>>> issue though, which you have.
>>>
>>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, 
>>> and I suppose it could be simply fixed by setting .host_tagset in 
>>> scsi host template there.
>>>
>>> Thanks,
>>> John
>>> --
>>> Don: Even though this works for current kernels, what would chances 
>>> of this getting back-ported to 5.9 or even further?
>>>
>>> Otherwise the original patch smartpqi_fix_host_qdepth_limit would 
>>> correct this issue for older kernels.
>>
>> True. However this is 5.12 material, so we shouldn't be bothered by 
>> that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure 
>> whether smartpqi_fix_host_qdepth_limit would be the solution.
>> You could simply divide can_queue by nr_hw_queues, as suggested 
>> before, or even simpler, set nr_hw_queues = 1.
>>
>> How much performance would that cost you?
>>
>> Distribution kernels would be yet another issue, distros can backport 
>> host_tagset and get rid of the issue.
>
> Aren't they (distros) the only issue? As I mentioned above, for 5.10 
> mainline stable, I think it's reasonable to backport a patch to set 
> .host_tagset for the driver.

Indeed. As per the Linux kernel Web site [1]: 5.5 and 5.9 are not maintained anymore (EOL) upstream. So, if distributions decided to go with another Linux kernel release, it’s their job to backport things. If the commit message is well written, and contains the Fixes tag, their tooling should be able to pick it up.


Kind regards,

Paul


[1]: https://www.kernel.org/

I removed the patch smartpqi-fix_host_qdepth_limit and replaced it with a patch that sets host_tagset = 1 in the function pqi_register_scsi.

Thanks to all for your hard work.
Don

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-02 20:48             ` Martin Wilck
  2021-02-03  8:49               ` John Garry
@ 2021-02-03 15:56               ` Don.Brace
  2021-02-03 18:25                 ` John Garry
  2021-02-22 14:23                 ` Roger Willcocks
  1 sibling, 2 replies; 27+ messages in thread
From: Don.Brace @ 2021-02-03 15:56 UTC (permalink / raw)
  To: mwilck, john.garry, buczek, martin.petersen, ming.lei
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

-----Original Message-----
From: Martin Wilck [mailto:mwilck@suse.com] 
Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early

>
>
> Confirmed my suspicions - it looks like the host is sent more commands 
> than it can handle. We would need many disks to see this issue though, 
> which you have.
>
> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and 
> I suppose it could be simply fixed by setting .host_tagset in scsi 
> host template there.
>
> Thanks,
> John
> --
> Don: Even though this works for current kernels, what would chances of 
> this getting back-ported to 5.9 or even further?
>
> Otherwise the original patch smartpqi_fix_host_qdepth_limit would 
> correct this issue for older kernels.

True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution.
You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1.

How much performance would that cost you?

Don: For my HBA disk tests...

Dividing can_queue / nr_hw_queues is about a 40% drop.
~380K - 400K IOPS
Setting nr_hw_queues = 1 results in a 1.5 X gain in performance.
~980K IOPS
Setting host_tagset = 1
~640K IOPS

So, it seems that setting nr_hw_queues = 1 results in the best performance.

Is this expected? Would this also be true for the future?

Thanks,
Don Brace

Below is my setup.
---
[3:0:0:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdd 
[3:0:1:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sde 
[3:0:2:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdf 
[3:0:3:0]    disk    HP       EH0300FBQDD      HPD5  /dev/sdg 
[3:0:4:0]    disk    HP       EG0900FDJYR      HPD4  /dev/sdh 
[3:0:5:0]    disk    HP       EG0300FCVBF      HPD9  /dev/sdi 
[3:0:6:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdj 
[3:0:7:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdk 
[3:0:8:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdl 
[3:0:9:0]    disk    HP       MO0200FBRWB      HPD9  /dev/sdm 
[3:0:10:0]   disk    HP       MM0500FBFVQ      HPD8  /dev/sdn 
[3:0:11:0]   disk    ATA      MM0500GBKAK      HPGC  /dev/sdo 
[3:0:12:0]   disk    HP       EG0900FBVFQ      HPDC  /dev/sdp 
[3:0:13:0]   disk    HP       VO006400JWZJT    HP00  /dev/sdq 
[3:0:14:0]   disk    HP       VO015360JWZJN    HP00  /dev/sdr 
[3:0:15:0]   enclosu HP       D3700            5.04  -        
[3:0:16:0]   enclosu HP       D3700            5.04  -        
[3:0:17:0]   enclosu HPE      Smart Adapter    3.00  -        
[3:1:0:0]    disk    HPE      LOGICAL VOLUME   3.00  /dev/sds 
[3:2:0:0]    storage HPE      P408e-p SR Gen10 3.00  -        
-----
[global]
ioengine=libaio
; rw=randwrite
; percentage_random=40
rw=write
size=100g
bs=4k
direct=1
ramp_time=15
; filename=/mnt/fio_test
; cpus_allowed=0-27
iodepth=4096

[/dev/sdd]
[/dev/sde]
[/dev/sdf]
[/dev/sdg]
[/dev/sdh]
[/dev/sdi]
[/dev/sdj]
[/dev/sdk]
[/dev/sdl]
[/dev/sdm]
[/dev/sdn]
[/dev/sdo]
[/dev/sdp]
[/dev/sdq]
[/dev/sdr]


Distribution kernels would be yet another issue, distros can backport host_tagset and get rid of the issue.

Regards
Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-03 15:56               ` Don.Brace
@ 2021-02-03 18:25                 ` John Garry
  2021-02-03 19:01                   ` Don.Brace
  2021-02-22 14:23                 ` Roger Willcocks
  1 sibling, 1 reply; 27+ messages in thread
From: John Garry @ 2021-02-03 18:25 UTC (permalink / raw)
  To: Don.Brace, mwilck, buczek, martin.petersen, ming.lei
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

On 03/02/2021 15:56, Don.Brace@microchip.com wrote:
> True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution.
> You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1.
> 
> How much performance would that cost you?
> 
> Don: For my HBA disk tests...
> 
> Dividing can_queue / nr_hw_queues is about a 40% drop.
> ~380K - 400K IOPS
> Setting nr_hw_queues = 1 results in a 1.5 X gain in performance.
> ~980K IOPS

So do you just set shost.nr_hw_queues = 1, yet leave the rest of the 
driver as is?

Please note that when changing nr_hw_queues from many -> 1, the default
IO scheduler changes from none -> mq-deadline, but I would hope that
would not make such a big difference.
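
For reference, the v5.10-era default-elevator logic in block/elevator.c
looks roughly like this (paraphrased from memory, not quoted verbatim):

/*
 * For single queue devices, default to using mq-deadline. If we have
 * multiple queues or mq-deadline is not available, default to "none".
 */
static struct elevator_type *elevator_get_default(struct request_queue *q)
{
        if (q->nr_hw_queues != 1)
                return NULL;

        return elevator_get(q, "mq-deadline", false);
}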

> Setting host_tagset = 1

For this, v5.11-rc6 has a fix which may affect you (2569063c7140), so 
please include it

> ~640K IOPS
> 
> So, it seem that setting nr_hw_queues = 1 results in the best performance.
> 
> Is this expected? Would this also be true for the future?

Not expected by me

> 
> Thanks,
> Don Brace
> 
> Below is my setup.
> ---
> [3:0:0:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdd
> [3:0:1:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sde
> [3:0:2:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdf
> [3:0:3:0]    disk    HP       EH0300FBQDD      HPD5  /dev/sdg
> [3:0:4:0]    disk    HP       EG0900FDJYR      HPD4  /dev/sdh
> [3:0:5:0]    disk    HP       EG0300FCVBF      HPD9  /dev/sdi
> [3:0:6:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdj
> [3:0:7:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdk
> [3:0:8:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdl
> [3:0:9:0]    disk    HP       MO0200FBRWB      HPD9  /dev/sdm
> [3:0:10:0]   disk    HP       MM0500FBFVQ      HPD8  /dev/sdn
> [3:0:11:0]   disk    ATA      MM0500GBKAK      HPGC  /dev/sdo
> [3:0:12:0]   disk    HP       EG0900FBVFQ      HPDC  /dev/sdp
> [3:0:13:0]   disk    HP       VO006400JWZJT    HP00  /dev/sdq
> [3:0:14:0]   disk    HP       VO015360JWZJN    HP00  /dev/sdr
> [3:0:15:0]   enclosu HP       D3700            5.04  -
> [3:0:16:0]   enclosu HP       D3700            5.04  -
> [3:0:17:0]   enclosu HPE      Smart Adapter    3.00  -
> [3:1:0:0]    disk    HPE      LOGICAL VOLUME   3.00  /dev/sds
> [3:2:0:0]    storage HPE      P408e-p SR Gen10 3.00  -
> -----
> [global]
> ioengine=libaio
> ; rw=randwrite
> ; percentage_random=40
> rw=write
> size=100g
> bs=4k
> direct=1
> ramp_time=15
> ; filename=/mnt/fio_test
> ; cpus_allowed=0-27
> iodepth=4096

I normally use an iodepth of around 40 to 128, but then I just do
rw=read for performance testing

> 
> [/dev/sdd]
> [/dev/sde]
> [/dev/sdf]
> [/dev/sdg]
> [/dev/sdh]
> [/dev/sdi]
> [/dev/sdj]
> [/dev/sdk]
> [/dev/sdl]
> [/dev/sdm]
> [/dev/sdn]
> [/dev/sdo]
> [/dev/sdp]
> [/dev/sdq]
> [/dev/sdr]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-03 18:25                 ` John Garry
@ 2021-02-03 19:01                   ` Don.Brace
  0 siblings, 0 replies; 27+ messages in thread
From: Don.Brace @ 2021-02-03 19:01 UTC (permalink / raw)
  To: john.garry, mwilck, buczek, martin.petersen, ming.lei
  Cc: jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

-----Original Message-----
From: John Garry [mailto:john.garry@huawei.com] 
Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early

On 03/02/2021 15:56, Don.Brace@microchip.com wrote:
> True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution.
> You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1.
>
> How much performance would that cost you?
>
> Don: For my HBA disk tests...
>
> Dividing can_queue / nr_hw_queues is about a 40% drop.
> ~380K - 400K IOPS
> Setting nr_hw_queues = 1 results in a 1.5 X gain in performance.
> ~980K IOPS

So do you just set shost.nr_hw_queues = 1, yet leave the rest of the driver as is?

Please note that when changing from nr_hw_queues many -> 1, then the default IO scheduler changes from none -> mq-deadline, but I would hope that would not make such a big difference.

> Setting host_tagset = 1

For this, v5.11-rc6 has a fix which may affect you (2569063c7140), so please include it

> ~640K IOPS
>
> So, it seem that setting nr_hw_queues = 1 results in the best performance.
>
> Is this expected? Would this also be true for the future?

Not expected by me

Don: OK, setting both host_tagset = 1 and nr_hw_queues = 1 yields the same improved performance, about 940K IOPS.

Thanks,
Don Brace


>
> Thanks,
> Don Brace
>
> Below is my setup.
> ---
> [3:0:0:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdd
> [3:0:1:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sde
> [3:0:2:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdf
> [3:0:3:0]    disk    HP       EH0300FBQDD      HPD5  /dev/sdg
> [3:0:4:0]    disk    HP       EG0900FDJYR      HPD4  /dev/sdh
> [3:0:5:0]    disk    HP       EG0300FCVBF      HPD9  /dev/sdi
> [3:0:6:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdj
> [3:0:7:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdk
> [3:0:8:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdl
> [3:0:9:0]    disk    HP       MO0200FBRWB      HPD9  /dev/sdm
> [3:0:10:0]   disk    HP       MM0500FBFVQ      HPD8  /dev/sdn
> [3:0:11:0]   disk    ATA      MM0500GBKAK      HPGC  /dev/sdo
> [3:0:12:0]   disk    HP       EG0900FBVFQ      HPDC  /dev/sdp
> [3:0:13:0]   disk    HP       VO006400JWZJT    HP00  /dev/sdq
> [3:0:14:0]   disk    HP       VO015360JWZJN    HP00  /dev/sdr
> [3:0:15:0]   enclosu HP       D3700            5.04  -
> [3:0:16:0]   enclosu HP       D3700            5.04  -
> [3:0:17:0]   enclosu HPE      Smart Adapter    3.00  -
> [3:1:0:0]    disk    HPE      LOGICAL VOLUME   3.00  /dev/sds
> [3:2:0:0]    storage HPE      P408e-p SR Gen10 3.00  -
> -----
> [global]
> ioengine=libaio
> ; rw=randwrite
> ; percentage_random=40
> rw=write
> size=100g
> bs=4k
> direct=1
> ramp_time=15
> ; filename=/mnt/fio_test
> ; cpus_allowed=0-27
> iodepth=4096

I normally use iodepth circa 40 to 128, but then I normally just do rw=read for performance testing

>
> [/dev/sdd]
> [/dev/sde]
> [/dev/sdf]
> [/dev/sdg]
> [/dev/sdh]
> [/dev/sdi]
> [/dev/sdj]
> [/dev/sdk]
> [/dev/sdl]
> [/dev/sdm]
> [/dev/sdn]
> [/dev/sdo]
> [/dev/sdp]
> [/dev/sdq]
> [/dev/sdr]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-03 15:56               ` Don.Brace
  2021-02-03 18:25                 ` John Garry
@ 2021-02-22 14:23                 ` Roger Willcocks
  2021-02-23  8:57                   ` John Garry
  2021-03-01 14:51                   ` Paul Menzel
  1 sibling, 2 replies; 27+ messages in thread
From: Roger Willcocks @ 2021-02-22 14:23 UTC (permalink / raw)
  To: Don.Brace
  Cc: Roger Willcocks, mwilck, john.garry, buczek, martin.petersen,
	ming.lei, jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare

FYI we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.)

Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives configured as five x twelve-drive raid-6, software striped using md, and formatted with xfs.

Test software writes to the array using multiple threads in parallel.

The smartpqi driver would report controller offline within ten minutes or so, with status code 0x6100c

Changed the driver to set 'nr_hw_queues = 1' and then tested by filling the array with random files (which took a couple of days), which completed fine, so it looks like that one-line change fixes it.

Would, of course, be helpful if this was back-ported.

—
Roger



> On 3 Feb 2021, at 15:56, Don.Brace@microchip.com wrote:
> 
> -----Original Message-----
> From: Martin Wilck [mailto:mwilck@suse.com] 
> Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
> 
>> 
>> 
>> Confirmed my suspicions - it looks like the host is sent more commands 
>> than it can handle. We would need many disks to see this issue though, 
>> which you have.
>> 
>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and 
>> I suppose it could be simply fixed by setting .host_tagset in scsi 
>> host template there.
>> 
>> Thanks,
>> John
>> --
>> Don: Even though this works for current kernels, what would chances of 
>> this getting back-ported to 5.9 or even further?
>> 
>> Otherwise the original patch smartpqi_fix_host_qdepth_limit would 
>> correct this issue for older kernels.
> 
> True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution.
> You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1.
> 
> How much performance would that cost you?
> 
> Don: For my HBA disk tests...
> 
> Dividing can_queue / nr_hw_queues is about a 40% drop.
> ~380K - 400K IOPS
> Setting nr_hw_queues = 1 results in a 1.5 X gain in performance.
> ~980K IOPS
> Setting host_tagset = 1
> ~640K IOPS
> 
> So, it seem that setting nr_hw_queues = 1 results in the best performance.
> 
> Is this expected? Would this also be true for the future?
> 
> Thanks,
> Don Brace
> 
> Below is my setup.
> ---
> [3:0:0:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdd 
> [3:0:1:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sde 
> [3:0:2:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdf 
> [3:0:3:0]    disk    HP       EH0300FBQDD      HPD5  /dev/sdg 
> [3:0:4:0]    disk    HP       EG0900FDJYR      HPD4  /dev/sdh 
> [3:0:5:0]    disk    HP       EG0300FCVBF      HPD9  /dev/sdi 
> [3:0:6:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdj 
> [3:0:7:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdk 
> [3:0:8:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdl 
> [3:0:9:0]    disk    HP       MO0200FBRWB      HPD9  /dev/sdm 
> [3:0:10:0]   disk    HP       MM0500FBFVQ      HPD8  /dev/sdn 
> [3:0:11:0]   disk    ATA      MM0500GBKAK      HPGC  /dev/sdo 
> [3:0:12:0]   disk    HP       EG0900FBVFQ      HPDC  /dev/sdp 
> [3:0:13:0]   disk    HP       VO006400JWZJT    HP00  /dev/sdq 
> [3:0:14:0]   disk    HP       VO015360JWZJN    HP00  /dev/sdr 
> [3:0:15:0]   enclosu HP       D3700            5.04  -        
> [3:0:16:0]   enclosu HP       D3700            5.04  -        
> [3:0:17:0]   enclosu HPE      Smart Adapter    3.00  -        
> [3:1:0:0]    disk    HPE      LOGICAL VOLUME   3.00  /dev/sds 
> [3:2:0:0]    storage HPE      P408e-p SR Gen10 3.00  -        
> -----
> [global]
> ioengine=libaio
> ; rw=randwrite
> ; percentage_random=40
> rw=write
> size=100g
> bs=4k
> direct=1
> ramp_time=15
> ; filename=/mnt/fio_test
> ; cpus_allowed=0-27
> iodepth=4096
> 
> [/dev/sdd]
> [/dev/sde]
> [/dev/sdf]
> [/dev/sdg]
> [/dev/sdh]
> [/dev/sdi]
> [/dev/sdj]
> [/dev/sdk]
> [/dev/sdl]
> [/dev/sdm]
> [/dev/sdn]
> [/dev/sdo]
> [/dev/sdp]
> [/dev/sdq]
> [/dev/sdr]
> 
> 
> Distribution kernels would be yet another issue, distros can backport host_tagset and get rid of the issue.
> 
> Regards
> Martin


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-22 14:23                 ` Roger Willcocks
@ 2021-02-23  8:57                   ` John Garry
  2021-02-23 14:06                     ` Roger Willcocks
  2021-03-01 14:51                   ` Paul Menzel
  1 sibling, 1 reply; 27+ messages in thread
From: John Garry @ 2021-02-23  8:57 UTC (permalink / raw)
  To: Roger Willcocks, Don.Brace
  Cc: mwilck, buczek, martin.petersen, ming.lei, jejb, linux-scsi,
	hare, Kevin.Barnett, pmenzel, hare

On 22/02/2021 14:23, Roger Willcocks wrote:
> FYI we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.)
> 
> Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives configured as five x twelve-drive raid-6, software striped using md, and formatted with xfs.
> 
> Test software writes to the array using multiple threads in parallel.
> 
> The smartpqi driver would report controller offline within ten minutes or so, with status code 0x6100c
> 
> Changed the driver to set 'nr_hw_queues = 1’ and then tested by filling the array with random files (which took a couple of days), which completed fine, so it looks like that one-line change fixes it.
> 

That just makes the driver single-queue.

As such, since the driver uses blk_mq_unique_tag_to_hwq(), only hw queue 
#0 will ever be used in the driver.

And then, since the driver still spreads MSI-X interrupt vectors over
all CPUs [from pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)], if the CPUs
associated with HW queue #0 are offlined (probably just cpu0), there are
no CPUs available to service the queue #0 interrupt. That's what I think
would happen, from a quick glance at the code.
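
As a rough illustration of that combination (the function names below
are invented for the sketch; only the kernel APIs are real):

#include <linux/pci.h>
#include <linux/blk-mq.h>

/* Managed affinity: each MSI-X vector is bound to a fixed CPU set and
 * is shut down by the IRQ core once all of those CPUs are offline. */
static int example_alloc_vectors(struct pci_dev *pdev, unsigned int max_queues)
{
        return pci_alloc_irq_vectors(pdev, 1, max_queues,
                                     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
}

/* With shost->nr_hw_queues = 1 the block layer only ever creates hctx 0,
 * so this always returns 0 - yet the managed vector backing "queue 0"
 * covers only its original CPUs and is disabled when they go offline,
 * leaving those completions unserviced. */
static u16 example_pick_hw_queue(struct request *rq)
{
        return blk_mq_unique_tag_to_hwq(blk_mq_unique_tag(rq));
}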


> Would, of course, be helpful if this was back-ported.
> 
> —
> Roger
> 
> 
> 
>> On 3 Feb 2021, at 15:56, Don.Brace@microchip.com wrote:
>>
>> -----Original Message-----
>> From: Martin Wilck [mailto:mwilck@suse.com]
>> Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
>>
>>>
>>>
>>> Confirmed my suspicions - it looks like the host is sent more commands
>>> than it can handle. We would need many disks to see this issue though,
>>> which you have.
>>>
>>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and
>>> I suppose it could be simply fixed by setting .host_tagset in scsi
>>> host template there.
>>>
>>> Thanks,
>>> John
>>> --
>>> Don: Even though this works for current kernels, what would chances of
>>> this getting back-ported to 5.9 or even further?
>>>
>>> Otherwise the original patch smartpqi_fix_host_qdepth_limit would
>>> correct this issue for older kernels.
>>
>> True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution.
>> You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1.
>>
>> How much performance would that cost you?
>>
>> Don: For my HBA disk tests...
>>
>> Dividing can_queue / nr_hw_queues is about a 40% drop.
>> ~380K - 400K IOPS
>> Setting nr_hw_queues = 1 results in a 1.5 X gain in performance.
>> ~980K IOPS
>> Setting host_tagset = 1
>> ~640K IOPS
>>
>> So, it seem that setting nr_hw_queues = 1 results in the best performance.
>>
>> Is this expected? Would this also be true for the future?
>>
>> Thanks,
>> Don Brace
>>
>> Below is my setup.
>> ---
>> [3:0:0:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdd
>> [3:0:1:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sde
>> [3:0:2:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdf
>> [3:0:3:0]    disk    HP       EH0300FBQDD      HPD5  /dev/sdg
>> [3:0:4:0]    disk    HP       EG0900FDJYR      HPD4  /dev/sdh
>> [3:0:5:0]    disk    HP       EG0300FCVBF      HPD9  /dev/sdi
>> [3:0:6:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdj
>> [3:0:7:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdk
>> [3:0:8:0]    disk    HP       EG0900FBLSK      HPD7  /dev/sdl
>> [3:0:9:0]    disk    HP       MO0200FBRWB      HPD9  /dev/sdm
>> [3:0:10:0]   disk    HP       MM0500FBFVQ      HPD8  /dev/sdn
>> [3:0:11:0]   disk    ATA      MM0500GBKAK      HPGC  /dev/sdo
>> [3:0:12:0]   disk    HP       EG0900FBVFQ      HPDC  /dev/sdp
>> [3:0:13:0]   disk    HP       VO006400JWZJT    HP00  /dev/sdq
>> [3:0:14:0]   disk    HP       VO015360JWZJN    HP00  /dev/sdr
>> [3:0:15:0]   enclosu HP       D3700            5.04  -
>> [3:0:16:0]   enclosu HP       D3700            5.04  -
>> [3:0:17:0]   enclosu HPE      Smart Adapter    3.00  -
>> [3:1:0:0]    disk    HPE      LOGICAL VOLUME   3.00  /dev/sds
>> [3:2:0:0]    storage HPE      P408e-p SR Gen10 3.00  -
>> -----
>> [global]
>> ioengine=libaio
>> ; rw=randwrite
>> ; percentage_random=40
>> rw=write
>> size=100g
>> bs=4k
>> direct=1
>> ramp_time=15
>> ; filename=/mnt/fio_test
>> ; cpus_allowed=0-27
>> iodepth=4096
>>
>> [/dev/sdd]
>> [/dev/sde]
>> [/dev/sdf]
>> [/dev/sdg]
>> [/dev/sdh]
>> [/dev/sdi]
>> [/dev/sdj]
>> [/dev/sdk]
>> [/dev/sdl]
>> [/dev/sdm]
>> [/dev/sdn]
>> [/dev/sdo]
>> [/dev/sdp]
>> [/dev/sdq]
>> [/dev/sdr]
>>
>>
>> Distribution kernels would be yet another issue, distros can backport host_tagset and get rid of the issue.
>>
>> Regards
>> Martin
> 
> .
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-23  8:57                   ` John Garry
@ 2021-02-23 14:06                     ` Roger Willcocks
  2021-02-23 16:17                       ` John Garry
  0 siblings, 1 reply; 27+ messages in thread
From: Roger Willcocks @ 2021-02-23 14:06 UTC (permalink / raw)
  To: John Garry
  Cc: Roger Willcocks, Don.Brace, mwilck, buczek, martin.petersen,
	ming.lei, jejb, linux-scsi, hare, Kevin.Barnett, pmenzel, hare



> On 23 Feb 2021, at 08:57, John Garry <john.garry@huawei.com> wrote:
> 
> On 22/02/2021 14:23, Roger Willcocks wrote:
>> FYI we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.)
>> Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives configured as five x twelve-drive raid-6, software striped using md, and formatted with xfs.
>> Test software writes to the array using multiple threads in parallel.
>> The smartpqi driver would report controller offline within ten minutes or so, with status code 0x6100c
>> Changed the driver to set 'nr_hw_queues = 1’ and then tested by filling the array with random files (which took a couple of days), which completed fine, so it looks like that one-line change fixes it.
> 
> That just makes the driver single-queue.
> 

All I can say is it fixes the problem. Write performance is two or three percent faster than CentOS 6.5 on the same hardware.


> As such, since the driver uses blk_mq_unique_tag_to_hwq(), only hw queue #0 will ever be used in the driver.
> 
> And then, since the driver still spreads MSI-X interrupt vectors over all CPUs [from pci_alloc_vectors(PCI_IRQ_AFFINITY)], if CPUs associated with HW queue #0 are offlined (probably just cpu0), there is no CPUs available to service queue #0 interrupt. That's what I think would happen, from a quick glance at the code.
> 

Surely that would be an issue even if it used multiple queues (one of which would be queue #0)?

> 
>> Would, of course, be helpful if this was back-ported.
>> —
>> Roger



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-23 14:06                     ` Roger Willcocks
@ 2021-02-23 16:17                       ` John Garry
  0 siblings, 0 replies; 27+ messages in thread
From: John Garry @ 2021-02-23 16:17 UTC (permalink / raw)
  To: Roger Willcocks
  Cc: Don.Brace, mwilck, buczek, martin.petersen, ming.lei, jejb,
	linux-scsi, hare, Kevin.Barnett, pmenzel, hare

On 23/02/2021 14:06, Roger Willcocks wrote:
> 
> 
>> On 23 Feb 2021, at 08:57, John Garry <john.garry@huawei.com> wrote:
>>
>> On 22/02/2021 14:23, Roger Willcocks wrote:
>>> FYI we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.)
>>> Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives configured as five x twelve-drive raid-6, software striped using md, and formatted with xfs.
>>> Test software writes to the array using multiple threads in parallel.
>>> The smartpqi driver would report controller offline within ten minutes or so, with status code 0x6100c
>>> Changed the driver to set 'nr_hw_queues = 1’ and then tested by filling the array with random files (which took a couple of days), which completed fine, so it looks like that one-line change fixes it.
>>
>> That just makes the driver single-queue.
>>
> 
> All I can say is it fixes the problem. Write performance is two or three percent faster than CentOS 6.5 on the same hardware.
> 
> 
>> As such, since the driver uses blk_mq_unique_tag_to_hwq(), only hw queue #0 will ever be used in the driver.
>>
>> And then, since the driver still spreads MSI-X interrupt vectors over all CPUs [from pci_alloc_vectors(PCI_IRQ_AFFINITY)], if CPUs associated with HW queue #0 are offlined (probably just cpu0), there is no CPUs available to service queue #0 interrupt. That's what I think would happen, from a quick glance at the code.
>>
> 
> Surely that would be an issue even if it used multiple queues (one of which would be queue #0) ?
> 

Well, no, because there is currently a symmetry between the HW queue
contexts in the block layer and the HW queues in the LLDD. So if hwq0
were mapped to cpu0 only and cpu0 is offline, the block layer will not
send commands on hwq0. By setting nr_hw_queues=1, that symmetry breaks:
every CPU tries to send on hwq0, but the irq core code will disable the
hwq0 interrupt when cpu0 is offline, because that interrupt is managed.
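
A sketch of that symmetry, assuming a generic LLDD rather than smartpqi
itself (the callback shape is simplified; a real map_queues hook takes
only the Scsi_Host):

#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>
#include <scsi/scsi_host.h>

/* The cpu -> hctx map is copied from the managed MSI-X affinity, so an
 * hctx whose CPUs are all offline is simply never used by the block
 * layer; that is the symmetry nr_hw_queues = 1 breaks, because then
 * every CPU maps to hctx 0 while the "queue 0" vector still covers
 * only its original CPUs. */
static void example_map_queues(struct Scsi_Host *shost, struct pci_dev *pdev)
{
        blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
                              pdev, 0);
}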

That's how it looks to me - I did not check the LLDD code too closely. 
Please discuss with Don.

Thanks,
John

>>
>>> Would, of course, be helpful if this was back-ported.
>>> —
>>> Roger
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-02-22 14:23                 ` Roger Willcocks
  2021-02-23  8:57                   ` John Garry
@ 2021-03-01 14:51                   ` Paul Menzel
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Menzel @ 2021-03-01 14:51 UTC (permalink / raw)
  To: Roger Willcocks, Don.Brace
  Cc: mwilck, john.garry, buczek, martin.petersen, ming.lei, jejb,
	linux-scsi, hare, Kevin.Barnett, hare, linux-scsi


Dear Roger,


Am 22.02.21 um 15:23 schrieb Roger Willcocks:
> FYI we have exactly this issue on a machine here running CentOS 8.3
> (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.)

What driver version do you use?

> Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives
> configured as five x twelve-drive raid-6, software striped using md,
> and formatted with xfs.
> 
> Test software writes to the array using multiple threads in
> parallel.
> 
> The smartpqi driver would report controller offline within ten
> minutes or so, with status code 0x6100c
> 
> Changed the driver to set 'nr_hw_queues = 1’ and then tested by
> filling the array with random files (which took a couple of days),
> which completed fine, so it looks like that one-line change fixes
> it.
> 
> Would, of course, be helpful if this was back-ported.

We only noticed the issue starting with Linux 5.5 (commit 6eb045e092ef
("scsi: core: avoid host-wide host_busy counter for scsi_mq")).

So I am curious how this problem can show up in CentOS 8.3 (Linux kernel 4.18.x).

> —
> Roger

Apple Mail mangled your signature delimiter (-- ).


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
  2021-01-21 23:32           ` Martin Wilck
@ 2021-03-11 16:36             ` Donald Buczek
  0 siblings, 0 replies; 27+ messages in thread
From: Donald Buczek @ 2021-03-11 16:36 UTC (permalink / raw)
  To: Martin Wilck, John Garry, Martin K. Petersen, Ming Lei
  Cc: James Bottomley, linux-scsi, Hannes Reinecke, Don Brace,
	Kevin Barnett, Paul Menzel, Hannes Reinecke

On 22.01.21 00:32, Martin Wilck wrote:
> On Thu, 2021-01-21 at 13:05 +0000, John Garry wrote:
>>>>
>>
>> Confirmed my suspicions - it looks like the host is sent more
>> commands
>> than it can handle. We would need many disks to see this issue
>> though,
>> which you have.
>>
>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and
>> I
>> suppose it could be simply fixed by setting .host_tagset in scsi host
>> template there.
> 
> If it's really just that, it should be easy enough to verify.
> 
> @Donald, you'd need to test with a 5.10 kernel, and after reproducing
> the issue, add
> 
>          .host_tagset            = 1,
> 
> to the definition of pqi_driver_template in
> drivers/scsi/smartpqi/smartpqi_init.c.
> 
> You don't need a patch to test that, I believe. Would you able to do
> this test?

Sorry, I had overlooked this request. I have reviewed this thread now, because I want to switch our production systems to 5.10 LTS.

I could reproduce the problem with Linux 5.10.22. When setting `host_tagset = 1`, the problem disappeared. Additionally, we have 5.10.22 with the patch running on two previously affected production systems for over 24 hours now. Statistics suggest that these systems would very likely have triggered the problem in that time frame if the patch didn't work.

So I think this is a working fix which should go to 5.10 stable.

Best
   Donald


diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 9d0229656681f..be429a7cb1512 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -6571,6 +6571,7 @@ static struct scsi_host_template pqi_driver_template = {
         .map_queues = pqi_map_queues,
         .sdev_attrs = pqi_sdev_attrs,
         .shost_attrs = pqi_shost_attrs,
+       .host_tagset = 1,
  };
  
  static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-03-11 16:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20 18:45 [PATCH] scsi: scsi_host_queue_ready: increase busy count early mwilck
2021-01-20 20:26 ` John Garry
2021-01-21 12:01   ` Donald Buczek
2021-01-21 12:35     ` John Garry
2021-01-21 12:44       ` Donald Buczek
2021-01-21 13:05         ` John Garry
2021-01-21 23:32           ` Martin Wilck
2021-03-11 16:36             ` Donald Buczek
2021-02-01 22:44           ` Don.Brace
2021-02-02 20:04           ` Don.Brace
2021-02-02 20:48             ` Martin Wilck
2021-02-03  8:49               ` John Garry
2021-02-03  8:58                 ` Paul Menzel
2021-02-03 15:30                   ` Don.Brace
2021-02-03 15:56               ` Don.Brace
2021-02-03 18:25                 ` John Garry
2021-02-03 19:01                   ` Don.Brace
2021-02-22 14:23                 ` Roger Willcocks
2021-02-23  8:57                   ` John Garry
2021-02-23 14:06                     ` Roger Willcocks
2021-02-23 16:17                       ` John Garry
2021-03-01 14:51                   ` Paul Menzel
2021-01-21  9:07 ` Donald Buczek
2021-01-21 10:05   ` Martin Wilck
2021-01-22  0:14     ` Martin Wilck
2021-01-22  3:23 ` Ming Lei
2021-01-22 14:05   ` Martin Wilck
