linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mpt3sas and /sys/block/<dev>/device/timeout
@ 2022-06-24 23:10 Forza
  2022-06-25 10:42 ` Forza
  2022-06-28  3:11 ` Martin K. Petersen
  0 siblings, 2 replies; 3+ messages in thread
From: Forza @ 2022-06-24 23:10 UTC (permalink / raw)
  To: linux-scsi

Hi,

I am currently facing an issue with a Broadcom HBA 9500-8i SAS 
controller where 'blkdiscard /dev/sdX' on WD SA500 SATA SSDs cause an IO 
timeout and device reset.

* LSI/Broadcom HBA 9500-8i SAS/SATA controller
* WD RED SA500 NAS SATA SSD 2TB (WDS200T1R0A-68A4W0)
   Drive FW: 411000WR
* Alpine Linux kernel 5.15.48
* /sys/block/sdf/queue/
   discard_granularity:512
   discard_max_bytes:134217216
   discard_max_hw_bytes:134217216

I simply issue a 'blkdiscard /dev/sdf' and after about 30 seconds the 
following errors show in dmesg (quite a lot of rows). The full 
blkdiscard takes between 1.5 and 2.5 minutes depending on the SSD I run 
on (I have 4 drives). The problem is the same if I run fstrim on a 
mounted XFS or Btrfs (but not ext4) filesystem on these drives.

[  +0.000003] scsi target6:0:4: handle(0x0029), 
sas_address(0x5003048020db4543), phy(3)
[  +0.000003] scsi target6:0:4: enclosure logical 
id(0x5003048020db457f), slot(3)
[  +0.000003] scsi target6:0:4: enclosure level(0x0000), connector name( 
C0.1)
[  +0.000003] sd 6:0:4:0: No reference found at driver, assuming 
scmd(0x00000000eb0d9438) might have completed
[  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x00000000eb0d9438)
[  +0.000012] sd 6:0:4:0: attempting task 
abort!scmd(0x0000000075f63919), outstanding for 30397 ms & timeout 30000 ms
[  +0.000003] sd 6:0:4:0: [sdg] tag#2762 CDB: opcode=0x42 42 00 00 00 00 
00 00 00 18 00
[  +0.000002] scsi target6:0:4: handle(0x0029), 
sas_address(0x5003048020db4543), phy(3)
[  +0.000004] scsi target6:0:4: enclosure logical 
id(0x5003048020db457f), slot(3)
[  +0.000002] scsi target6:0:4: enclosure level(0x0000), connector name( 
C0.1)
[  +0.000003] sd 6:0:4:0: No reference found at driver, assuming 
scmd(0x0000000075f63919) might have completed
[  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x0000000075f63919)
[  +0.255021] sd 6:0:4:0: Power-on or device reset occurred


Does the mpt3sas driver or the HBA controller not follow the 
/sys/block/<dev>/device/timeout value? I have mine set to 180 seconds.

It seems that there are many hardcoded timeout values in the driver code.

https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt3sas/mpt3sas_scsih.c
https://github.com/torvalds/linux/blob/6a0a17e6c6d1091ada18d43afd87fb26a82a9823/drivers/scsi/mpt3sas/mpt3sas_scsih.c#L3303-L3306

Any thoughts other than trying to avoid using discards/fstrim? I did 
reach out to Broadcom for support, and they claim it is a fault in the 
fstrim code and that on FreeBSD they had fixed this. Not sure how 
relevant that statement is though.

Thanks,
Forza


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: mpt3sas and /sys/block/<dev>/device/timeout
  2022-06-24 23:10 mpt3sas and /sys/block/<dev>/device/timeout Forza
@ 2022-06-25 10:42 ` Forza
  2022-06-28  3:11 ` Martin K. Petersen
  1 sibling, 0 replies; 3+ messages in thread
From: Forza @ 2022-06-25 10:42 UTC (permalink / raw)
  To: linux-scsi



On 6/25/22 01:10, Forza wrote:
> Hi,
> 
> I am currently facing an issue with a Broadcom HBA 9500-8i SAS 
> controller where 'blkdiscard /dev/sdX' on WD SA500 SATA SSDs cause an IO 
> timeout and device reset.
> 
> * LSI/Broadcom HBA 9500-8i SAS/SATA controller
> * WD RED SA500 NAS SATA SSD 2TB (WDS200T1R0A-68A4W0)
>    Drive FW: 411000WR
> * Alpine Linux kernel 5.15.48
> * /sys/block/sdf/queue/
>    discard_granularity:512
>    discard_max_bytes:134217216
>    discard_max_hw_bytes:134217216
> 
> I simply issue a 'blkdiscard /dev/sdf' and after about 30 seconds the 
> following errors show in dmesg (quite a lot of rows). The full 
> blkdiscard takes between 1.5 and 2.5 minutes depending on the SSD I run 
> on (I have 4 drives). The problem is the same if I run fstrim on a 
> mounted XFS or Btrfs (but not ext4) filesystem on these drives.
> 
> [  +0.000003] scsi target6:0:4: handle(0x0029), 
> sas_address(0x5003048020db4543), phy(3)
> [  +0.000003] scsi target6:0:4: enclosure logical 
> id(0x5003048020db457f), slot(3)
> [  +0.000003] scsi target6:0:4: enclosure level(0x0000), connector name( 
> C0.1)
> [  +0.000003] sd 6:0:4:0: No reference found at driver, assuming 
> scmd(0x00000000eb0d9438) might have completed
> [  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x00000000eb0d9438)
> [  +0.000012] sd 6:0:4:0: attempting task 
> abort!scmd(0x0000000075f63919), outstanding for 30397 ms & timeout 30000 ms
> [  +0.000003] sd 6:0:4:0: [sdg] tag#2762 CDB: opcode=0x42 42 00 00 00 00 
> 00 00 00 18 00
> [  +0.000002] scsi target6:0:4: handle(0x0029), 
> sas_address(0x5003048020db4543), phy(3)
> [  +0.000004] scsi target6:0:4: enclosure logical 
> id(0x5003048020db457f), slot(3)
> [  +0.000002] scsi target6:0:4: enclosure level(0x0000), connector name( 
> C0.1)
> [  +0.000003] sd 6:0:4:0: No reference found at driver, assuming 
> scmd(0x0000000075f63919) might have completed
> [  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x0000000075f63919)
> [  +0.255021] sd 6:0:4:0: Power-on or device reset occurred
> 
> 
> Does the mpt3sas driver or the HBA controller not follow the 
> /sys/block/<dev>/device/timeout value? I have mine set to 180 seconds.
> 
> It seems that there are many hardcoded timeout values in the driver code.
> 
> https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
> 
> https://github.com/torvalds/linux/blob/6a0a17e6c6d1091ada18d43afd87fb26a82a9823/drivers/scsi/mpt3sas/mpt3sas_scsih.c#L3303-L3306 
> 
> 
> Any thoughts other than trying to avoid using discards/fstrim? I did 
> reach out to Broadcom for support, and they claim it is a fault in the 
> fstrim code and that on FreeBSD they had fixed this. Not sure how 
> relevant that statement is though.
> 
> Thanks,
> Forza
> 

I used blktrace to capture some events that might help debug where the 
issues are. I don't know how to interpret the results though.


1) running blkdiscard /dev/sdf1
https://paste.tnonline.net/files/MYymVOfP4ROQ_blktrace.tar.xz

2) running mkfs.btrfs /dev/sdf1
https://paste.tnonline.net/files/5UUvxYlQD46Q_blktrace_mkfs.tar.xz

3) running fstrim on mounted btrfs
https://paste.tnonline.net/files/gultxaa7OP8P_blktrace_fstrim.tar.xz
https://paste.tnonline.net/files/8Vy7D0uR2Mk5_dmesg_fstrim.txt

Example output:
Input file sdf.blktrace.0 added
   8,80   1        1     0.000000000  5527  A   D 2048 + 8388607 <- (8,81) 0
   8,80   1        2     0.000002456  5527  Q   D 2048 + 8388607 
[blkdiscard]
   8,80   1        3     0.000004876  5527  X   D 2048 / 264191 [blkdiscard]
   8,80   1        4     0.000009894  5527  G   D 2048 + 262143 [blkdiscard]
   8,80   1        5     0.000010754  5527  P   N [blkdiscard]
   8,80   1        6     0.000012881  5527  X   D 264191 / 526334 
[blkdiscard]
   8,80   1        7     0.000013925  5527  G   D 264191 + 262143 
[blkdiscard]
   8,80   1        8     0.000014896  5527  U   N [blkdiscard] 1
   8,80   1        9     0.000015890  5527  I   D 2048 + 262143 [blkdiscard]
   8,80   1       10     0.000027270  5527  D   D 2048 + 262143 [blkdiscard]
   8,80   1       11     0.000039543  5527  P   N [blkdiscard]
   8,80   1       12     0.000040775  5527  X   D 526334 / 788477 
[blkdiscard]
   8,80   1       13     0.000041572  5527  G   D 526334 + 262143 
[blkdiscard]
   8,80   1       14     0.000041842  5527  U   N [blkdiscard] 1
   8,80   1       15     0.000042175  5527  I   D 264191 + 262143 
[blkdiscard]
   8,80   1       16     0.000044967  5527  D   D 264191 + 262143 
[blkdiscard]
   8,80   1       17     0.000047802  5527  P   N [blkdiscard]
   8,80   1       18     0.000048798  5527  X   D 788477 / 1050620 
[blkdiscard]
   8,80   1       19     0.000049493  5527  G   D 788477 + 262143 
[blkdiscard]
   8,80   1       20     0.000049715  5527  U   N [blkdiscard] 1
   8,80   1       21     0.000050022  5527  I   D 526334 + 262143 
[blkdiscard]
   8,80   1       22     0.000052386  5527  D   D 526334 + 262143 
[blkdiscard]
   8,80   1       23     0.000055063  5527  P   N [blkdiscard]
...
snip ~100k lines
...
   8,80   4    10940    97.593173642   548  D  DS 3904898591 + 262143 
[kworker/4:1H]
   8,80   4    10941    97.603196002     0  C  DS 3900704303 + 3932145 [0]
   8,80   4    10942    97.603246256   548  D  DS 3905160734 + 262143 
[kworker/4:1H]
   8,80   4    10943    97.613166491     0  C  DS 3900704303 + 4194288 [0]
   8,80   4    10944    97.613216413   548  D  DS 3905422877 + 262143 
[kworker/4:1H]
   8,80   4    10945    97.623198445     0  C  DS 3900704303 + 4456431 [0]
   8,80   4    10946    97.623249722   548  D  DS 3905685020 + 262143 
[kworker/4:1H]
   8,80   4    10947    97.633162688     0  C  DS 3900704303 + 4718574 [0]
   8,80   4    10948    97.633212563   548  D  DS 3905947163 + 262143 
[kworker/4:1H]
   8,80   4    10949    97.643154345     0  C  DS 3900704303 + 4980717 [0]
   8,80   4    10950    97.643204045   548  D  DS 3906209306 + 262143 
[kworker/4:1H]
   8,80   4    10951    97.653270127     0  C  DS 3900704303 + 5242860 [0]
   8,80   4    10952    97.653319479   548  D  DS 3906471449 + 262143 
[kworker/4:1H]
   8,80   4    10953    97.663202719     0  C  DS 3900704303 + 5505003 [0]
   8,80   4    10954    97.663253381   548  D  DS 3906733592 + 262143 
[kworker/4:1H]
   8,80   4    10955    97.673273616     0  C  DS 3900704303 + 5767146 [0]
   8,80   4    10956    97.673321526   548  D  DS 3906995735 + 33257 
[kworker/4:1H]
   8,80   4    10957    97.683204146     0  C  DS 3900704303 + 6029289 [0]
   8,80   4    10958    97.693695724     0  C  DS 3900704303 + 6291432 [0]
   8,80   4    10959    97.703304939     0  C   D 3892315696 + 6553575 [0]
   8,80   4    10960    97.713278422     0  C   D 3867149875 + 6553575 [0]
   8,80   4    10961    97.723309633     0  C   D 3867149875 + 6815718 [0]
   8,80   4    10962    97.733228328     0  C   D 3867149875 + 7077861 [0]
   8,80   4    10963    97.743206422     0  C   D 3867149875 + 7340004 [0]
   8,80   4    10964    97.753131380     0  C   D 3867149875 + 7602147 [0]
   8,80   4    10965    97.763353833     0  C   D 3867149875 + 7864290 [0]
   8,80   4    10966    97.773132921     0  C   D 3867149875 + 8126433 [0]
   8,80   4    10967    97.783267092     0  C   D 3867149875 + 8388576 [0]
   8,80   4    10968    97.784616230     0  C   D 3867149875 + 8388607 [0]
   8,80   4    10969    97.793083064     0  C   D 3875538482 + 262143 [0]
   8,80   4    10970    97.803187466     0  C   D 3875538482 + 524286 [0]
   8,80   4    10971    97.813136517     0  C   D 3875538482 + 786429 [0]
   8,80   4    10972    97.823092905     0  C   D 3875538482 + 1048572 [0]
   8,80   4    10973    97.833401461     0  C   D 3875538482 + 1310715 [0]
   8,80   4    10974    97.843100535     0  C   D 3875538482 + 1572858 [0]
   8,80   4    10975    97.853242242     0  C   D 3875538482 + 1835001 [0]
   8,80   4    10976    97.863173598     0  C   D 3875538482 + 2097144 [0]
   8,80   4    10977    97.873196944     0  C   D 3875538482 + 2359287 [0]
   8,80   4    10978    97.883177261     0  C   D 3875538482 + 2621430 [0]
   8,80   4    10979    97.893191605     0  C   D 3875538482 + 2883573 [0]
   8,80   4    10980    97.903370941     0  C   D 3875538482 + 3145716 [0]
   8,80   4    10981    97.913331748     0  C   D 3875538482 + 3407859 [0]
   8,80   4    10982    97.923181282     0  C   D 3875538482 + 3670002 [0]
   8,80   4    10983    97.933252172     0  C   D 3875538482 + 3932145 [0]
   8,80   4    10984    97.943217689     0  C   D 3875538482 + 4194288 [0]
   8,80   4    10985    97.953153062     0  C   D 3875538482 + 4456431 [0]
   8,80   4    10986    97.963283883     0  C   D 3875538482 + 4718574 [0]
   8,80   4    10987    97.973210464     0  C   D 3892315696 + 6291432 [0]
   8,80   4    10988    97.983769074     0  C  DS 3900704303 + 6324689 [0]

CPU0 (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  Read depth:             2               Write depth:            32
  PC Reads Queued: , iB   PC Writes Queued: , iB
  PC Read Disp.:   , iB   PC Write Disp.:   , iB
  PC Reads Req.:                  PC Writes Req.:
  PC Reads Compl.:                PC Writes Compl.:
  IO unplugs:          2360               Timer unplugs:         348
CPU1 (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  Read depth:             2               Write depth:            32
  IO unplugs:          5500               Timer unplugs:         778
CPU2 (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  Read depth:             2               Write depth:            32
  IO unplugs:          2419               Timer unplugs:         354
CPU3 (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  Read depth:             2               Write depth:            32
  IO unplugs:           977               Timer unplugs:         144
CPU4 (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  Read depth:             2               Write depth:            32
  IO unplugs:          1334               Timer unplugs:         197
CPU5 (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  Read depth:             2               Write depth:            32
  PC Reads Queued: , iB   PC Writes Queued: , iB
  PC Read Disp.:   , iB   PC Write Disp.:   , iB
  PC Reads Req.:                  PC Writes Req.:
  PC Reads Compl.:                PC Writes Compl.:
  IO unplugs:           518               Timer unplugs:          71

Total (sdf):
  Reads Queued:    , iB   Writes Queued:    , iB
  Read Dispatches: , iB   Write Dispatches: , iB
  Reads Requeued:                 Writes Requeued:
  Reads Completed: , iB   Writes Completed: , iB
  Read Merges:     , iB   Write Merges:     , iB
  PC Reads Queued: , iB   PC Writes Queued: , iB
  PC Read Disp.:   , iB   PC Write Disp.:   , iB
  PC Reads Req.:                  PC Writes Req.:
  PC Reads Compl.:                PC Writes Compl.:
  IO unplugs:         13108               Timer unplugs:        1892
Events (sdf): Skips: 0 forward (0 -   0.0%)
Trace started at Sat Jun 25 12:05:11 2022


Thanks,
Forza

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: mpt3sas and /sys/block/<dev>/device/timeout
  2022-06-24 23:10 mpt3sas and /sys/block/<dev>/device/timeout Forza
  2022-06-25 10:42 ` Forza
@ 2022-06-28  3:11 ` Martin K. Petersen
  1 sibling, 0 replies; 3+ messages in thread
From: Martin K. Petersen @ 2022-06-28  3:11 UTC (permalink / raw)
  To: Forza; +Cc: linux-scsi


Forza,

> Does the mpt3sas driver or the HBA controller not follow the
> /sys/block/<dev>/device/timeout value?

Discard operations have different timeouts than regular filesystem
read/write I/O.

> Any thoughts other than trying to avoid using discards/fstrim? I did
> reach out to Broadcom for support, and they claim it is a fault in the
> fstrim code and that on FreeBSD they had fixed this. Not sure how
> relevant that statement is though.

Since this HBA performs SCSI->ATA translation in firmware, Linux has
very little control over what's going on. We issue discard commands
constrained by the drive limits reported to us by the HBA.

I recommend making sure you run the latest HBA firmware. You can also
try setting discard_max_bytes in sysfs for the block device in question
to a smaller value.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-06-28  3:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-24 23:10 mpt3sas and /sys/block/<dev>/device/timeout Forza
2022-06-25 10:42 ` Forza
2022-06-28  3:11 ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).