All of lore.kernel.org
 help / color / mirror / Atom feed
* Scrub: Uncorrectable error due to SSD read error
@ 2021-06-03 19:59 Martin Steigerwald
  2021-06-04  5:50 ` Chris Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Steigerwald @ 2021-06-03 19:59 UTC (permalink / raw)
  To: linux-btrfs

Hi!

I got read errors on a Samsung SSD 870 2 TB drive that is just a few months
old with BTRFS on Devuan Beowulf with 5.10.7 kernel from Debian Sid. The
affected BTRFS filesystem are running on top of LVM with LUKS. This is
on a ThinkPad L560.

BTRFS reported most of them during a scrub. One was an input/output
error during trying to backup a file with rsync – this one was not reported
by scrub. I fixed it by replacing the file from an older backup.

Most other scrub errors where related to a file. I fixed it them replacing
it from an older backup.

However two more scrub errors were not related to a file. One example:

[15786.830191] BTRFS error (device dm-3): bdev /dev/mapper/sata-home errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
[15786.830206] BTRFS error (device dm-3): unable to fixup (regular) error at logical 176045789184 on dev /dev/mapper/sata-home
[15930.966633] BTRFS info (device dm-3): scrub: finished on devid 1 with status: 0

(rd is 3 cause I did the scrub before already – these are on a different
BTRFS filesystem than the other errors, but on the same disk and the
same LVM)

I bet this error is somewhere in metadata.

So what is your best suggestion to fix up this one? btrfs check --repair?
Or just redoing the whole BTRFS from backup?


I reported the errors to the retailer I bought the SSD from, but their
suggestion was just to try updating the BIOS and try without
encryption. I got the impression they did not really understand that
LUKS is software based encryption. They did not really answer to
my question whether the SSD needs replacement. Turning of encryption
is nothing I would like to do.

The device reports SMART status clean, but has the error in SMART log.
So far it appears there are no further errors.

So far no further errors appeared, I find it difficult to judge whether the
drive needs replacement or these where just one time bad blocks that
the SSD firmware will not try to use again.


Here is a somewhat shortened excerpt of what the kernel reported for just
one of the read errors that BTRFS reported.

[15487.535977] BTRFS info (device dm-3): scrub: started on devid 1
[15784.497016] ata1.00: READ LOG DMA EXT failed, trying PIO
[15784.497903] ata1: failed to read log page 10h (errno=-5)
[15784.497928] ata1.00: exception Emask 0x1 SAct 0xff9fffff SErr 0x40000 action 0x0
[15784.497936] ata1.00: irq_stat 0x40000008
[15784.497944] ata1: SError: { CommWake }
[15784.497953] ata1.00: failed command: READ FPDMA QUEUED
[15784.497971] ata1.00: cmd 60/00:00:28:e5:b1/01:00:19:00:00/40 tag 0 ncq dma 131072 in
                        res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
[15784.497978] ata1.00: status: { DRDY }
[15784.497986] ata1.00: failed command: READ FPDMA QUEUED
[15784.498001] ata1.00: cmd 60/00:08:28:e6:b1/02:00:19:00:00/40 tag 1 ncq dma 262144 in
                        res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
[15784.498007] ata1.00: status: { DRDY }
[15784.498014] ata1.00: failed command: READ FPDMA QUEUED
[15784.498028] ata1.00: cmd 60/00:10:28:e8:b1/02:00:19:00:00/40 tag 2 ncq dma 262144 in
                        res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
[…]
[15784.498707] ata1.00: failed command: READ FPDMA QUEUED
[15784.498721] ata1.00: cmd 60/00:f8:28:f3:b1/01:00:19:00:00/40 tag 31 ncq dma 131072 in
                        res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
[15784.498726] ata1.00: status: { DRDY }
[15784.499040] ata1.00: both IDENTIFYs aborted, assuming NODEV
[15784.499049] ata1.00: revalidation failed (errno=-2)
[15784.499064] ata1: hard resetting link
[15784.810213] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[15784.812120] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15784.812131] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15784.812328] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15784.812339] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15784.812759] ata1.00: supports DRM functions and may not be fully accessible
[15784.815649] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15784.815658] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15784.815731] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15784.815738] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15784.816006] ata1.00: supports DRM functions and may not be fully accessible
[15784.817970] ata1.00: configured for UDMA/133
[15784.828580] ata1: EH complete
[15784.829623] ata1.00: Enabling discard_zeroes_data
[15785.021191] ata1: failed to read log page 10h (errno=-5)
[15785.021216] ata1.00: exception Emask 0x1 SAct 0xfffffff1 SErr 0x0 action 0x0
[15785.021222] ata1.00: irq_stat 0x40000008
[15785.021232] ata1.00: failed command: WRITE FPDMA QUEUED
[15785.021249] ata1.00: cmd 61/00:00:60:16:7c/02:00:00:00:00/40 tag 0 ncq dma 262144 out
                        res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
[15785.021257] ata1.00: status: { DRDY }
[15785.021264] ata1.00: failed command: READ FPDMA QUEUED
[15785.021280] ata1.00: cmd 60/00:20:28:d0:b1/06:00:19:00:00/40 tag 4 ncq dma 786432 in
                        res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
[15785.021286] ata1.00: status: { DRDY }
[15785.021293] ata1.00: failed command: READ FPDMA QUEUED
[15785.021308] ata1.00: cmd 60/00:28:d8:0d:b2/01:00:19:00:00/40 tag 5 ncq dma 131072 in
                        res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
[…]
[15785.021957] ata1.00: failed command: READ FPDMA QUEUED
[15785.021970] ata1.00: cmd 60/00:f8:28:e8:b1/02:00:19:00:00/40 tag 31 ncq dma 262144 in
                        res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
[15785.021976] ata1.00: status: { DRDY }
[15785.022456] ata1.00: both IDENTIFYs aborted, assuming NODEV
[15785.022464] ata1.00: revalidation failed (errno=-2)
[15785.022480] ata1: hard resetting link
[15785.333073] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[15785.335233] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15785.335244] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15785.335508] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15785.335519] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15785.336179] ata1.00: supports DRM functions and may not be fully accessible
[15785.341214] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15785.341226] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15785.341398] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15785.341406] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15785.341731] ata1.00: supports DRM functions and may not be fully accessible
[15785.344055] ata1.00: configured for UDMA/133
[15785.354541] ata1: EH complete
[15785.355149] ata1.00: Enabling discard_zeroes_data
[15785.553288] ata1: failed to read log page 10h (errno=-5)
[15785.553314] ata1.00: exception Emask 0x1 SAct 0x83ffffff SErr 0x0 action 0x0
[15785.553321] ata1.00: irq_stat 0x40000008
[15785.553330] ata1.00: failed command: READ FPDMA QUEUED
[15785.553348] ata1.00: cmd 60/00:00:28:d0:b1/06:00:19:00:00/40 tag 0 ncq dma 786432 in
                        res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15785.553356] ata1.00: status: { DRDY }
[15785.553363] ata1.00: failed command: WRITE FPDMA QUEUED
[15785.553379] ata1.00: cmd 61/00:08:60:16:7c/02:00:00:00:00/40 tag 1 ncq dma 262144 out
                        res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15785.553385] ata1.00: status: { DRDY }
[15785.553392] ata1.00: failed command: READ FPDMA QUEUED
[15785.553407] ata1.00: cmd 60/00:10:d8:20:b2/02:00:19:00:00/40 tag 2 ncq dma 262144 in
                        res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15785.553413] ata1.00: status: { DRDY }
[…]
[15785.553984] ata1.00: failed command: SEND FPDMA QUEUED
[15785.553998] ata1.00: cmd 64/01:c8:00:00:00/00:00:00:00:00/a0 tag 25 ncq dma 512 out
                        res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15785.554004] ata1.00: status: { DRDY }
[15785.554011] ata1.00: failed command: READ FPDMA QUEUED
[15785.554025] ata1.00: cmd 60/00:f8:d8:1e:b2/02:00:19:00:00/40 tag 31 ncq dma 262144 in
                        res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15785.554031] ata1.00: status: { DRDY }
[15785.554667] ata1.00: both IDENTIFYs aborted, assuming NODEV
[15785.554676] ata1.00: revalidation failed (errno=-2)
[15785.554692] ata1: hard resetting link
[15785.868941] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[15785.870412] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15785.870424] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15785.870554] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15785.870562] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15785.870946] ata1.00: supports DRM functions and may not be fully accessible
[15785.874186] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15785.874194] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15785.874288] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15785.874295] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15785.874583] ata1.00: supports DRM functions and may not be fully accessible
[15785.876723] ata1.00: configured for UDMA/133
[15785.887214] sd 0:0:0:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[15785.887225] sd 0:0:0:0: [sda] tag#25 Sense Key : Illegal Request [current] 
[15785.887232] sd 0:0:0:0: [sda] tag#25 Add. Sense: Unaligned write command
[15785.887241] sd 0:0:0:0: [sda] tag#25 CDB: Write same(16) 93 08 00 00 00 00 1c 39 a6 30 00 00 00 40 00 00
[15785.887251] blk_update_request: I/O error, dev sda, sector 473540144 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[15785.887295] ata1: EH complete
[15785.887988] ata1.00: Enabling discard_zeroes_data
[15786.077348] ata1: failed to read log page 10h (errno=-5)
[15786.077367] ata1.00: NCQ disabled due to excessive errors
[15786.077377] ata1.00: exception Emask 0x1 SAct 0xf8807fff SErr 0x0 action 0x0
[15786.077383] ata1.00: irq_stat 0x40000008
[15786.077392] ata1.00: failed command: READ FPDMA QUEUED
[15786.077411] ata1.00: cmd 60/00:00:d8:66:b2/05:00:19:00:00/40 tag 0 ncq dma 655360 in
                        res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15786.077418] ata1.00: status: { DRDY }
[15786.077426] ata1.00: failed command: READ FPDMA QUEUED
[15786.077441] ata1.00: cmd 60/00:08:d8:6b:b2/05:00:19:00:00/40 tag 1 ncq dma 655360 in
                        res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15786.077448] ata1.00: status: { DRDY }
[15786.077454] ata1.00: failed command: READ FPDMA QUEUED
[15786.077469] ata1.00: cmd 60/00:10:d8:70:b2/05:00:19:00:00/40 tag 2 ncq dma 655360 in
                        res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15786.077475] ata1.00: status: { DRDY }
[15786.077481] ata1.00: failed command: READ FPDMA QUEUED
[15786.077497] ata1.00: cmd 60/00:18:d8:75:b2/02:00:19:00:00/40 tag 3 ncq dma 262144 in
                        res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[…]
[15786.077930] ata1.00: failed command: READ FPDMA QUEUED
[15786.077944] ata1.00: cmd 60/00:f8:d8:61:b2/05:00:19:00:00/40 tag 31 ncq dma 655360 in
                        res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
[15786.077950] ata1.00: status: { DRDY }
[15786.078721] ata1.00: both IDENTIFYs aborted, assuming NODEV
[15786.078730] ata1.00: revalidation failed (errno=-2)
[15786.078747] ata1: hard resetting link
[15786.390145] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[15786.391640] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15786.391652] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15786.391920] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15786.391932] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15786.392523] ata1.00: supports DRM functions and may not be fully accessible
[15786.397343] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[15786.397354] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[15786.397527] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[15786.397536] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[15786.397934] ata1.00: supports DRM functions and may not be fully accessible
[15786.400238] ata1.00: configured for UDMA/133
[15786.410829] ata1: EH complete
[15786.521139] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[15786.521158] ata1.00: irq_stat 0x40000001
[15786.521169] ata1.00: failed command: READ DMA EXT
[15786.521188] ata1.00: cmd 25/00:00:28:d0:b1/00:06:19:00:00/e0 tag 17 dma 786432 in
                        res 51/40:00:c8:d2:b1/00:00:19:00:00/e0 Emask 0x9 (media error)
[15786.521195] ata1.00: status: { DRDY ERR }
[15786.521202] ata1.00: error: { UNC }
[15786.521937] ata1.00: supports DRM functions and may not be fully accessible
[15786.525850] ata1.00: supports DRM functions and may not be fully accessible
[15786.528948] ata1.00: configured for UDMA/133
[15786.529028] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
[15786.529038] sd 0:0:0:0: [sda] tag#17 Sense Key : Medium Error [current] 
[15786.529047] sd 0:0:0:0: [sda] tag#17 Add. Sense: Unrecovered read error - auto reallocate failed
[15786.529055] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 19 b1 d0 28 00 06 00 00
[15786.529066] blk_update_request: I/O error, dev sda, sector 431084232 op 0x0:(READ) flags 0x4000 phys_seg 63 prio class 0
[15786.529150] ata1: EH complete
[15786.533347] ata1.00: Enabling discard_zeroes_data
[15786.552084] ata1.00: Enabling discard_zeroes_data
[15786.820703] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[15786.820718] ata1.00: irq_stat 0x40000001
[15786.820728] ata1.00: failed command: READ DMA EXT
[15786.820748] ata1.00: cmd 25/00:08:c8:d2:b1/00:00:19:00:00/e0 tag 18 dma 4096 in
                        res 51/40:00:c8:d2:b1/00:00:19:00:00/e0 Emask 0x9 (media error)
[15786.820756] ata1.00: status: { DRDY ERR }
[15786.820762] ata1.00: error: { UNC }
[15786.821313] ata1.00: supports DRM functions and may not be fully accessible
[15786.824200] ata1.00: supports DRM functions and may not be fully accessible
[15786.826515] ata1.00: configured for UDMA/133
[15786.826557] sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[15786.826567] sd 0:0:0:0: [sda] tag#18 Sense Key : Medium Error [current] 
[15786.826575] sd 0:0:0:0: [sda] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
[15786.826584] sd 0:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 19 b1 d2 c8 00 00 08 00
[15786.826594] blk_update_request: I/O error, dev sda, sector 431084232 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
[15786.826643] ata1: EH complete
[15786.828426] ata1.00: Enabling discard_zeroes_data
[15786.830191] BTRFS error (device dm-3): bdev /dev/mapper/sata-home errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
[15786.830206] BTRFS error (device dm-3): unable to fixup (regular) error at logical 176045789184 on dev /dev/mapper/sata-home
[15930.966633] BTRFS info (device dm-3): scrub: finished on devid 1 with status: 0

Thanks
-- 
Martin



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Scrub: Uncorrectable error due to SSD read error
  2021-06-03 19:59 Scrub: Uncorrectable error due to SSD read error Martin Steigerwald
@ 2021-06-04  5:50 ` Chris Murphy
  2021-06-04  7:07   ` Martin Steigerwald
  2021-06-04 14:00   ` Phillip Susi
  0 siblings, 2 replies; 5+ messages in thread
From: Chris Murphy @ 2021-06-04  5:50 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Btrfs BTRFS

On Thu, Jun 3, 2021 at 2:08 PM Martin Steigerwald <martin@lichtvoll.de> wrote:
>
> Hi!
>
> I got read errors on a Samsung SSD 870 2 TB drive that is just a few months
> old with BTRFS on Devuan Beowulf with 5.10.7 kernel from Debian Sid. The
> affected BTRFS filesystem are running on top of LVM with LUKS. This is
> on a ThinkPad L560.
>
> BTRFS reported most of them during a scrub. One was an input/output
> error during trying to backup a file with rsync – this one was not reported
> by scrub. I fixed it by replacing the file from an older backup.
>
> Most other scrub errors where related to a file. I fixed it them replacing
> it from an older backup.
>
> However two more scrub errors were not related to a file. One example:
>
> [15786.830191] BTRFS error (device dm-3): bdev /dev/mapper/sata-home errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> [15786.830206] BTRFS error (device dm-3): unable to fixup (regular) error at logical 176045789184 on dev /dev/mapper/sata-home
> [15930.966633] BTRFS info (device dm-3): scrub: finished on devid 1 with status: 0
>
> (rd is 3 cause I did the scrub before already – these are on a different
> BTRFS filesystem than the other errors, but on the same disk and the
> same LVM)
>
> I bet this error is somewhere in metadata.
>
> So what is your best suggestion to fix up this one? btrfs check --repair?

btrfs check --readonly

Let's see what the problem is first.

>
>
> Here is a somewhat shortened excerpt of what the kernel reported for just
> one of the read errors that BTRFS reported.

The vast majority of these messages are from libata. Not dm-crypt or
btrfs. This is the part of the kernel that talks to the drive, and
many are errors being reported by the drive itself...


>
> [15487.535977] BTRFS info (device dm-3): scrub: started on devid 1
> [15784.497016] ata1.00: READ LOG DMA EXT failed, trying PIO
> [15784.497903] ata1: failed to read log page 10h (errno=-5)
> [15784.497928] ata1.00: exception Emask 0x1 SAct 0xff9fffff SErr 0x40000 action 0x0
> [15784.497936] ata1.00: irq_stat 0x40000008
> [15784.497944] ata1: SError: { CommWake }
> [15784.497953] ata1.00: failed command: READ FPDMA QUEUED
> [15784.497971] ata1.00: cmd 60/00:00:28:e5:b1/01:00:19:00:00/40 tag 0 ncq dma 131072 in
>                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> [15784.497978] ata1.00: status: { DRDY }
> [15784.497986] ata1.00: failed command: READ FPDMA QUEUED
> [15784.498001] ata1.00: cmd 60/00:08:28:e6:b1/02:00:19:00:00/40 tag 1 ncq dma 262144 in
>                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> [15784.498007] ata1.00: status: { DRDY }
> [15784.498014] ata1.00: failed command: READ FPDMA QUEUED
> [15784.498028] ata1.00: cmd 60/00:10:28:e8:b1/02:00:19:00:00/40 tag 2 ncq dma 262144 in
>                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> […]
> [15784.498707] ata1.00: failed command: READ FPDMA QUEUED
> [15784.498721] ata1.00: cmd 60/00:f8:28:f3:b1/01:00:19:00:00/40 tag 31 ncq dma 131072 in
>                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> [15784.498726] ata1.00: status: { DRDY }
> [15784.499040] ata1.00: both IDENTIFYs aborted, assuming NODEV
> [15784.499049] ata1.00: revalidation failed (errno=-2)
> [15784.499064] ata1: hard resetting link
> [15784.810213] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [15784.812120] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15784.812131] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15784.812328] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15784.812339] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15784.812759] ata1.00: supports DRM functions and may not be fully accessible
> [15784.815649] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15784.815658] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15784.815731] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15784.815738] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15784.816006] ata1.00: supports DRM functions and may not be fully accessible
> [15784.817970] ata1.00: configured for UDMA/133
> [15784.828580] ata1: EH complete
> [15784.829623] ata1.00: Enabling discard_zeroes_data
> [15785.021191] ata1: failed to read log page 10h (errno=-5)
> [15785.021216] ata1.00: exception Emask 0x1 SAct 0xfffffff1 SErr 0x0 action 0x0
> [15785.021222] ata1.00: irq_stat 0x40000008
> [15785.021232] ata1.00: failed command: WRITE FPDMA QUEUED
> [15785.021249] ata1.00: cmd 61/00:00:60:16:7c/02:00:00:00:00/40 tag 0 ncq dma 262144 out
>                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> [15785.021257] ata1.00: status: { DRDY }
> [15785.021264] ata1.00: failed command: READ FPDMA QUEUED
> [15785.021280] ata1.00: cmd 60/00:20:28:d0:b1/06:00:19:00:00/40 tag 4 ncq dma 786432 in
>                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> [15785.021286] ata1.00: status: { DRDY }
> [15785.021293] ata1.00: failed command: READ FPDMA QUEUED
> [15785.021308] ata1.00: cmd 60/00:28:d8:0d:b2/01:00:19:00:00/40 tag 5 ncq dma 131072 in
>                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> […]
> [15785.021957] ata1.00: failed command: READ FPDMA QUEUED
> [15785.021970] ata1.00: cmd 60/00:f8:28:e8:b1/02:00:19:00:00/40 tag 31 ncq dma 262144 in
>                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> [15785.021976] ata1.00: status: { DRDY }
> [15785.022456] ata1.00: both IDENTIFYs aborted, assuming NODEV
> [15785.022464] ata1.00: revalidation failed (errno=-2)
> [15785.022480] ata1: hard resetting link
> [15785.333073] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [15785.335233] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15785.335244] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15785.335508] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15785.335519] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15785.336179] ata1.00: supports DRM functions and may not be fully accessible
> [15785.341214] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15785.341226] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15785.341398] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15785.341406] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15785.341731] ata1.00: supports DRM functions and may not be fully accessible
> [15785.344055] ata1.00: configured for UDMA/133
> [15785.354541] ata1: EH complete
> [15785.355149] ata1.00: Enabling discard_zeroes_data
> [15785.553288] ata1: failed to read log page 10h (errno=-5)
> [15785.553314] ata1.00: exception Emask 0x1 SAct 0x83ffffff SErr 0x0 action 0x0
> [15785.553321] ata1.00: irq_stat 0x40000008
> [15785.553330] ata1.00: failed command: READ FPDMA QUEUED
> [15785.553348] ata1.00: cmd 60/00:00:28:d0:b1/06:00:19:00:00/40 tag 0 ncq dma 786432 in
>                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15785.553356] ata1.00: status: { DRDY }
> [15785.553363] ata1.00: failed command: WRITE FPDMA QUEUED
> [15785.553379] ata1.00: cmd 61/00:08:60:16:7c/02:00:00:00:00/40 tag 1 ncq dma 262144 out
>                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15785.553385] ata1.00: status: { DRDY }
> [15785.553392] ata1.00: failed command: READ FPDMA QUEUED
> [15785.553407] ata1.00: cmd 60/00:10:d8:20:b2/02:00:19:00:00/40 tag 2 ncq dma 262144 in
>                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15785.553413] ata1.00: status: { DRDY }
> […]
> [15785.553984] ata1.00: failed command: SEND FPDMA QUEUED
> [15785.553998] ata1.00: cmd 64/01:c8:00:00:00/00:00:00:00:00/a0 tag 25 ncq dma 512 out
>                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15785.554004] ata1.00: status: { DRDY }
> [15785.554011] ata1.00: failed command: READ FPDMA QUEUED
> [15785.554025] ata1.00: cmd 60/00:f8:d8:1e:b2/02:00:19:00:00/40 tag 31 ncq dma 262144 in
>                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15785.554031] ata1.00: status: { DRDY }
> [15785.554667] ata1.00: both IDENTIFYs aborted, assuming NODEV
> [15785.554676] ata1.00: revalidation failed (errno=-2)
> [15785.554692] ata1: hard resetting link
> [15785.868941] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [15785.870412] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15785.870424] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15785.870554] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15785.870562] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15785.870946] ata1.00: supports DRM functions and may not be fully accessible
> [15785.874186] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15785.874194] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15785.874288] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15785.874295] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15785.874583] ata1.00: supports DRM functions and may not be fully accessible
> [15785.876723] ata1.00: configured for UDMA/133
> [15785.887214] sd 0:0:0:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
> [15785.887225] sd 0:0:0:0: [sda] tag#25 Sense Key : Illegal Request [current]
> [15785.887232] sd 0:0:0:0: [sda] tag#25 Add. Sense: Unaligned write command
> [15785.887241] sd 0:0:0:0: [sda] tag#25 CDB: Write same(16) 93 08 00 00 00 00 1c 39 a6 30 00 00 00 40 00 00
> [15785.887251] blk_update_request: I/O error, dev sda, sector 473540144 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0

What mount options are you using? Please list all of them from fstab
or /proc/mounts. In particular though it looks like the discard mount
option is being used? Remove it.

Also it might be worth looking at mainline kernel source and see if
this model is in the deny list for any discard related things. And if
not, maybe report your findings to the proper list. I'm not sure if
that would be libata, scsi, or block. But this would work around the
problem by not sending commands that the drive firmware can't handle.
Yes, that means the kernel is working around known manufacturing
device defects. That's where things are.


> [15785.887295] ata1: EH complete
> [15785.887988] ata1.00: Enabling discard_zeroes_data
> [15786.077348] ata1: failed to read log page 10h (errno=-5)
> [15786.077367] ata1.00: NCQ disabled due to excessive errors
> [15786.077377] ata1.00: exception Emask 0x1 SAct 0xf8807fff SErr 0x0 action 0x0
> [15786.077383] ata1.00: irq_stat 0x40000008
> [15786.077392] ata1.00: failed command: READ FPDMA QUEUED
> [15786.077411] ata1.00: cmd 60/00:00:d8:66:b2/05:00:19:00:00/40 tag 0 ncq dma 655360 in
>                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15786.077418] ata1.00: status: { DRDY }
> [15786.077426] ata1.00: failed command: READ FPDMA QUEUED
> [15786.077441] ata1.00: cmd 60/00:08:d8:6b:b2/05:00:19:00:00/40 tag 1 ncq dma 655360 in
>                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15786.077448] ata1.00: status: { DRDY }
> [15786.077454] ata1.00: failed command: READ FPDMA QUEUED
> [15786.077469] ata1.00: cmd 60/00:10:d8:70:b2/05:00:19:00:00/40 tag 2 ncq dma 655360 in
>                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15786.077475] ata1.00: status: { DRDY }
> [15786.077481] ata1.00: failed command: READ FPDMA QUEUED
> [15786.077497] ata1.00: cmd 60/00:18:d8:75:b2/02:00:19:00:00/40 tag 3 ncq dma 262144 in
>                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> […]
> [15786.077930] ata1.00: failed command: READ FPDMA QUEUED
> [15786.077944] ata1.00: cmd 60/00:f8:d8:61:b2/05:00:19:00:00/40 tag 31 ncq dma 655360 in
>                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> [15786.077950] ata1.00: status: { DRDY }
> [15786.078721] ata1.00: both IDENTIFYs aborted, assuming NODEV
> [15786.078730] ata1.00: revalidation failed (errno=-2)
> [15786.078747] ata1: hard resetting link
> [15786.390145] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [15786.391640] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15786.391652] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15786.391920] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15786.391932] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15786.392523] ata1.00: supports DRM functions and may not be fully accessible
> [15786.397343] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [15786.397354] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [15786.397527] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> [15786.397536] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [15786.397934] ata1.00: supports DRM functions and may not be fully accessible
> [15786.400238] ata1.00: configured for UDMA/133
> [15786.410829] ata1: EH complete
> [15786.521139] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [15786.521158] ata1.00: irq_stat 0x40000001
> [15786.521169] ata1.00: failed command: READ DMA EXT
> [15786.521188] ata1.00: cmd 25/00:00:28:d0:b1/00:06:19:00:00/e0 tag 17 dma 786432 in
>                         res 51/40:00:c8:d2:b1/00:00:19:00:00/e0 Emask 0x9 (media error)
> [15786.521195] ata1.00: status: { DRDY ERR }
> [15786.521202] ata1.00: error: { UNC }
> [15786.521937] ata1.00: supports DRM functions and may not be fully accessible
> [15786.525850] ata1.00: supports DRM functions and may not be fully accessible
> [15786.528948] ata1.00: configured for UDMA/133
> [15786.529028] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
> [15786.529038] sd 0:0:0:0: [sda] tag#17 Sense Key : Medium Error [current]
> [15786.529047] sd 0:0:0:0: [sda] tag#17 Add. Sense: Unrecovered read error - auto reallocate failed
> [15786.529055] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 19 b1 d0 28 00 06 00 00
> [15786.529066] blk_update_request: I/O error, dev sda, sector 431084232 op 0x0:(READ) flags 0x4000 phys_seg 63 prio class 0
> [15786.529150] ata1: EH complete
> [15786.533347] ata1.00: Enabling discard_zeroes_data
> [15786.552084] ata1.00: Enabling discard_zeroes_data

Yeah here's a pretty good bet that discard is included in the btrfs
mount option for this file system. I don't think discards happen
without the file system issuing them, and dm-crypt and device-mapper
for the LVM layer each have discard pass down enabled. It's frequently
the default for LVM these days, but not dm-crypt (it is the default on
Fedora though, for example).

So the drive is not handling queued discard properly is what I
suspect. It gets confused. Face plants. Libata then does a hard reset
to try to get the drive to come back to reality. And then another
discard gets issued and the drive wigs out again.

I'd say it's some kind of defect. Is it unique to this particular
drive and thus a warranty issue? Maybe. Or is it buggy firmware that
affects this entire make/model/firmware? Maybe. If removing discard
mount option fixes the problem, it's probably the latter. But still
worth searching mainline kernel and 5.10.42 if it's in the deny list
for queued trim. If not, worth letting the proper upstream list know
about this problem. If they get more reports, then they'll add the
make/model to the appropriate deny list.

A better option any is scheduled fstrim once a week. Or for very heavy
write, delete, write workloads where it's useful to provide hinting
for unused blocks more often than once per week, use discard=async.
But for a few weeks at least, I suggest no discard option at all to
see if the problem goes away.


> [15786.820703] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [15786.820718] ata1.00: irq_stat 0x40000001
> [15786.820728] ata1.00: failed command: READ DMA EXT
> [15786.820748] ata1.00: cmd 25/00:08:c8:d2:b1/00:00:19:00:00/e0 tag 18 dma 4096 in
>                         res 51/40:00:c8:d2:b1/00:00:19:00:00/e0 Emask 0x9 (media error)
> [15786.820756] ata1.00: status: { DRDY ERR }
> [15786.820762] ata1.00: error: { UNC }
> [15786.821313] ata1.00: supports DRM functions and may not be fully accessible
> [15786.824200] ata1.00: supports DRM functions and may not be fully accessible
> [15786.826515] ata1.00: configured for UDMA/133
> [15786.826557] sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
> [15786.826567] sd 0:0:0:0: [sda] tag#18 Sense Key : Medium Error [current]
> [15786.826575] sd 0:0:0:0: [sda] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
> [15786.826584] sd 0:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 19 b1 d2 c8 00 00 08 00
> [15786.826594] blk_update_request: I/O error, dev sda, sector 431084232 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [15786.826643] ata1: EH complete
> [15786.828426] ata1.00: Enabling discard_zeroes_data
> [15786.830191] BTRFS error (device dm-3): bdev /dev/mapper/sata-home errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> [15786.830206] BTRFS error (device dm-3): unable to fixup (regular) error at logical 176045789184 on dev /dev/mapper/sata-home
> [15930.966633] BTRFS info (device dm-3): scrub: finished on devid 1 with status: 0


Yeah I'm not sure if this metadata corruption is the result of
discards. Could be. But the first underlying problem to figure out are
all these ata errors, which point to the drive itself not liking
discards (queued trim), and not something btrfs or device-mapper are
doing wrong.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Scrub: Uncorrectable error due to SSD read error
  2021-06-04  5:50 ` Chris Murphy
@ 2021-06-04  7:07   ` Martin Steigerwald
  2021-06-04  7:26     ` Chris Murphy
  2021-06-04 14:00   ` Phillip Susi
  1 sibling, 1 reply; 5+ messages in thread
From: Martin Steigerwald @ 2021-06-04  7:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Chris Murphy - 04.06.21, 07:50:27 CEST:
> On Thu, Jun 3, 2021 at 2:08 PM Martin Steigerwald <martin@lichtvoll.de> wrote:
> > I got read errors on a Samsung SSD 870 2 TB drive that is just a few months
> > old with BTRFS on Devuan Beowulf with 5.10.7 kernel from Debian Sid. The
> > affected BTRFS filesystem are running on top of LVM with LUKS. This is
> > on a ThinkPad L560.
> >
> > BTRFS reported most of them during a scrub. One was an input/output
> > error during trying to backup a file with rsync – this one was not reported
> > by scrub. I fixed it by replacing the file from an older backup.
> >
> > Most other scrub errors where related to a file. I fixed it them replacing
> > it from an older backup.
> >
> > However two more scrub errors were not related to a file. One example:
> >
> > [15786.830191] BTRFS error (device dm-3): bdev /dev/mapper/sata-home errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> > [15786.830206] BTRFS error (device dm-3): unable to fixup (regular) error at logical 176045789184 on dev /dev/mapper/sata-home
> > [15930.966633] BTRFS info (device dm-3): scrub: finished on devid 1 with status: 0
> >
> > (rd is 3 cause I did the scrub before already – these are on a different
> > BTRFS filesystem than the other errors, but on the same disk and the
> > same LVM)
> >
> > I bet this error is somewhere in metadata.
> >
> > So what is your best suggestion to fix up this one? btrfs check --repair?
> 
> btrfs check --readonly
> 
> Let's see what the problem is first.

Ok, will do.

Interestingly one of the scrub read errors of that filesystem disappeared
in the scrub I did yesterday. It was there about a week ago. Maybe BTRFS
got rid of the block. So it may be that, if I make sure those errors do not
happen again, the other one may also disappear. Let's see.

> > Here is a somewhat shortened excerpt of what the kernel reported for just
> > one of the read errors that BTRFS reported.
> 
> The vast majority of these messages are from libata. Not dm-crypt or
> btrfs. This is the part of the kernel that talks to the drive, and
> many are errors being reported by the drive itself...

Yeah, I am aware of that.

> > [15487.535977] BTRFS info (device dm-3): scrub: started on devid 1
> > [15784.497016] ata1.00: READ LOG DMA EXT failed, trying PIO
> > [15784.497903] ata1: failed to read log page 10h (errno=-5)
> > [15784.497928] ata1.00: exception Emask 0x1 SAct 0xff9fffff SErr 0x40000 action 0x0
> > [15784.497936] ata1.00: irq_stat 0x40000008
> > [15784.497944] ata1: SError: { CommWake }
> > [15784.497953] ata1.00: failed command: READ FPDMA QUEUED
> > [15784.497971] ata1.00: cmd 60/00:00:28:e5:b1/01:00:19:00:00/40 tag 0 ncq dma 131072 in
> >                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15784.497978] ata1.00: status: { DRDY }
> > [15784.497986] ata1.00: failed command: READ FPDMA QUEUED
> > [15784.498001] ata1.00: cmd 60/00:08:28:e6:b1/02:00:19:00:00/40 tag 1 ncq dma 262144 in
> >                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15784.498007] ata1.00: status: { DRDY }
> > [15784.498014] ata1.00: failed command: READ FPDMA QUEUED
> > [15784.498028] ata1.00: cmd 60/00:10:28:e8:b1/02:00:19:00:00/40 tag 2 ncq dma 262144 in
> >                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> > […]
> > [15784.498707] ata1.00: failed command: READ FPDMA QUEUED
> > [15784.498721] ata1.00: cmd 60/00:f8:28:f3:b1/01:00:19:00:00/40 tag 31 ncq dma 131072 in
> >                         res 40/00:c8:28:d8:b1/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15784.498726] ata1.00: status: { DRDY }
> > [15784.499040] ata1.00: both IDENTIFYs aborted, assuming NODEV
> > [15784.499049] ata1.00: revalidation failed (errno=-2)
> > [15784.499064] ata1: hard resetting link
> > [15784.810213] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [15784.812120] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15784.812131] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15784.812328] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15784.812339] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15784.812759] ata1.00: supports DRM functions and may not be fully accessible
> > [15784.815649] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15784.815658] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15784.815731] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15784.815738] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15784.816006] ata1.00: supports DRM functions and may not be fully accessible
> > [15784.817970] ata1.00: configured for UDMA/133
> > [15784.828580] ata1: EH complete
> > [15784.829623] ata1.00: Enabling discard_zeroes_data
> > [15785.021191] ata1: failed to read log page 10h (errno=-5)
> > [15785.021216] ata1.00: exception Emask 0x1 SAct 0xfffffff1 SErr 0x0 action 0x0
> > [15785.021222] ata1.00: irq_stat 0x40000008
> > [15785.021232] ata1.00: failed command: WRITE FPDMA QUEUED
> > [15785.021249] ata1.00: cmd 61/00:00:60:16:7c/02:00:00:00:00/40 tag 0 ncq dma 262144 out
> >                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> > [15785.021257] ata1.00: status: { DRDY }
> > [15785.021264] ata1.00: failed command: READ FPDMA QUEUED
> > [15785.021280] ata1.00: cmd 60/00:20:28:d0:b1/06:00:19:00:00/40 tag 4 ncq dma 786432 in
> >                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> > [15785.021286] ata1.00: status: { DRDY }
> > [15785.021293] ata1.00: failed command: READ FPDMA QUEUED
> > [15785.021308] ata1.00: cmd 60/00:28:d8:0d:b2/01:00:19:00:00/40 tag 5 ncq dma 131072 in
> >                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> > […]
> > [15785.021957] ata1.00: failed command: READ FPDMA QUEUED
> > [15785.021970] ata1.00: cmd 60/00:f8:28:e8:b1/02:00:19:00:00/40 tag 31 ncq dma 262144 in
> >                         res 40/00:e0:88:4a:67/00:00:07:00:00/40 Emask 0x1 (device error)
> > [15785.021976] ata1.00: status: { DRDY }
> > [15785.022456] ata1.00: both IDENTIFYs aborted, assuming NODEV
> > [15785.022464] ata1.00: revalidation failed (errno=-2)
> > [15785.022480] ata1: hard resetting link
> > [15785.333073] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [15785.335233] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.335244] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15785.335508] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.335519] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15785.336179] ata1.00: supports DRM functions and may not be fully accessible
> > [15785.341214] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.341226] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15785.341398] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.341406] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15785.341731] ata1.00: supports DRM functions and may not be fully accessible
> > [15785.344055] ata1.00: configured for UDMA/133
> > [15785.354541] ata1: EH complete
> > [15785.355149] ata1.00: Enabling discard_zeroes_data
> > [15785.553288] ata1: failed to read log page 10h (errno=-5)
> > [15785.553314] ata1.00: exception Emask 0x1 SAct 0x83ffffff SErr 0x0 action 0x0
> > [15785.553321] ata1.00: irq_stat 0x40000008
> > [15785.553330] ata1.00: failed command: READ FPDMA QUEUED
> > [15785.553348] ata1.00: cmd 60/00:00:28:d0:b1/06:00:19:00:00/40 tag 0 ncq dma 786432 in
> >                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15785.553356] ata1.00: status: { DRDY }
> > [15785.553363] ata1.00: failed command: WRITE FPDMA QUEUED
> > [15785.553379] ata1.00: cmd 61/00:08:60:16:7c/02:00:00:00:00/40 tag 1 ncq dma 262144 out
> >                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15785.553385] ata1.00: status: { DRDY }
> > [15785.553392] ata1.00: failed command: READ FPDMA QUEUED
> > [15785.553407] ata1.00: cmd 60/00:10:d8:20:b2/02:00:19:00:00/40 tag 2 ncq dma 262144 in
> >                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15785.553413] ata1.00: status: { DRDY }
> > […]
> > [15785.553984] ata1.00: failed command: SEND FPDMA QUEUED
> > [15785.553998] ata1.00: cmd 64/01:c8:00:00:00/00:00:00:00:00/a0 tag 25 ncq dma 512 out
> >                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15785.554004] ata1.00: status: { DRDY }
> > [15785.554011] ata1.00: failed command: READ FPDMA QUEUED
> > [15785.554025] ata1.00: cmd 60/00:f8:d8:1e:b2/02:00:19:00:00/40 tag 31 ncq dma 262144 in
> >                         res 40/00:78:d8:47:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15785.554031] ata1.00: status: { DRDY }
> > [15785.554667] ata1.00: both IDENTIFYs aborted, assuming NODEV
> > [15785.554676] ata1.00: revalidation failed (errno=-2)
> > [15785.554692] ata1: hard resetting link
> > [15785.868941] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [15785.870412] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.870424] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15785.870554] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.870562] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15785.870946] ata1.00: supports DRM functions and may not be fully accessible
> > [15785.874186] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.874194] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15785.874288] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15785.874295] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15785.874583] ata1.00: supports DRM functions and may not be fully accessible
> > [15785.876723] ata1.00: configured for UDMA/133
> > [15785.887214] sd 0:0:0:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
> > [15785.887225] sd 0:0:0:0: [sda] tag#25 Sense Key : Illegal Request [current]
> > [15785.887232] sd 0:0:0:0: [sda] tag#25 Add. Sense: Unaligned write command
> > [15785.887241] sd 0:0:0:0: [sda] tag#25 CDB: Write same(16) 93 08 00 00 00 00 1c 39 a6 30 00 00 00 40 00 00
> > [15785.887251] blk_update_request: I/O error, dev sda, sector 473540144 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
> 
> What mount options are you using? Please list all of them from fstab
> or /proc/mounts. In particular though it looks like the discard mount
> option is being used? Remove it.

Mount options look like this.
 
rw,relatime,lazytime,compress=zstd:3,ssd,discard=async,space_cache,subvolid=257,subvol=/home

I am removing "discard=async". I thought this would be safe with modern
SSDs.

I always used fstrim manually before, because I was aware of SSDs having
issue with discard, but when BTRFS introduced queued or batched discard
I thought I give it a try on a modern SSD.
 
> Also it might be worth looking at mainline kernel source and see if
> this model is in the deny list for any discard related things. And if
> not, maybe report your findings to the proper list. I'm not sure if
> that would be libata, scsi, or block. But this would work around the
> problem by not sending commands that the drive firmware can't handle.
> Yes, that means the kernel is working around known manufacturing
> device defects. That's where things are.
>
> > [15785.887295] ata1: EH complete
> > [15785.887988] ata1.00: Enabling discard_zeroes_data
> > [15786.077348] ata1: failed to read log page 10h (errno=-5)
> > [15786.077367] ata1.00: NCQ disabled due to excessive errors
> > [15786.077377] ata1.00: exception Emask 0x1 SAct 0xf8807fff SErr 0x0 action 0x0
> > [15786.077383] ata1.00: irq_stat 0x40000008
> > [15786.077392] ata1.00: failed command: READ FPDMA QUEUED
> > [15786.077411] ata1.00: cmd 60/00:00:d8:66:b2/05:00:19:00:00/40 tag 0 ncq dma 655360 in
> >                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15786.077418] ata1.00: status: { DRDY }
> > [15786.077426] ata1.00: failed command: READ FPDMA QUEUED
> > [15786.077441] ata1.00: cmd 60/00:08:d8:6b:b2/05:00:19:00:00/40 tag 1 ncq dma 655360 in
> >                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15786.077448] ata1.00: status: { DRDY }
> > [15786.077454] ata1.00: failed command: READ FPDMA QUEUED
> > [15786.077469] ata1.00: cmd 60/00:10:d8:70:b2/05:00:19:00:00/40 tag 2 ncq dma 655360 in
> >                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15786.077475] ata1.00: status: { DRDY }
> > [15786.077481] ata1.00: failed command: READ FPDMA QUEUED
> > [15786.077497] ata1.00: cmd 60/00:18:d8:75:b2/02:00:19:00:00/40 tag 3 ncq dma 262144 in
> >                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > […]
> > [15786.077930] ata1.00: failed command: READ FPDMA QUEUED
> > [15786.077944] ata1.00: cmd 60/00:f8:d8:61:b2/05:00:19:00:00/40 tag 31 ncq dma 655360 in
> >                         res 40/00:28:d8:7e:b2/00:00:19:00:00/40 Emask 0x1 (device error)
> > [15786.077950] ata1.00: status: { DRDY }
> > [15786.078721] ata1.00: both IDENTIFYs aborted, assuming NODEV
> > [15786.078730] ata1.00: revalidation failed (errno=-2)
> > [15786.078747] ata1: hard resetting link
> > [15786.390145] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [15786.391640] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15786.391652] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15786.391920] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15786.391932] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15786.392523] ata1.00: supports DRM functions and may not be fully accessible
> > [15786.397343] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> > [15786.397354] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [15786.397527] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> > [15786.397536] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> > [15786.397934] ata1.00: supports DRM functions and may not be fully accessible
> > [15786.400238] ata1.00: configured for UDMA/133
> > [15786.410829] ata1: EH complete
> > [15786.521139] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [15786.521158] ata1.00: irq_stat 0x40000001
> > [15786.521169] ata1.00: failed command: READ DMA EXT
> > [15786.521188] ata1.00: cmd 25/00:00:28:d0:b1/00:06:19:00:00/e0 tag 17 dma 786432 in
> >                         res 51/40:00:c8:d2:b1/00:00:19:00:00/e0 Emask 0x9 (media error)
> > [15786.521195] ata1.00: status: { DRDY ERR }
> > [15786.521202] ata1.00: error: { UNC }
> > [15786.521937] ata1.00: supports DRM functions and may not be fully accessible
> > [15786.525850] ata1.00: supports DRM functions and may not be fully accessible
> > [15786.528948] ata1.00: configured for UDMA/133
> > [15786.529028] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
> > [15786.529038] sd 0:0:0:0: [sda] tag#17 Sense Key : Medium Error [current]
> > [15786.529047] sd 0:0:0:0: [sda] tag#17 Add. Sense: Unrecovered read error - auto reallocate failed
> > [15786.529055] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 19 b1 d0 28 00 06 00 00
> > [15786.529066] blk_update_request: I/O error, dev sda, sector 431084232 op 0x0:(READ) flags 0x4000 phys_seg 63 prio class 0
> > [15786.529150] ata1: EH complete
> > [15786.533347] ata1.00: Enabling discard_zeroes_data
> > [15786.552084] ata1.00: Enabling discard_zeroes_data
> 
> Yeah here's a pretty good bet that discard is included in the btrfs
> mount option for this file system. I don't think discards happen
> without the file system issuing them, and dm-crypt and device-mapper
> for the LVM layer each have discard pass down enabled. It's frequently
> the default for LVM these days, but not dm-crypt (it is the default on
> Fedora though, for example).

I think it is passing discards through by default on Debian 10 as well.

> So the drive is not handling queued discard properly is what I
> suspect. It gets confused. Face plants. Libata then does a hard reset
> to try to get the drive to come back to reality. And then another
> discard gets issued and the drive wigs out again.

Thank you. Where did you read this from? I find it challenging to interpret
the error message the kernel ata layer puts out. They are very detailed and
I think that is good, but I just do not understand most of it.

> I'd say it's some kind of defect. Is it unique to this particular
> drive and thus a warranty issue? Maybe. Or is it buggy firmware that
> affects this entire make/model/firmware? Maybe. If removing discard
> mount option fixes the problem, it's probably the latter. But still
> worth searching mainline kernel and 5.10.42 if it's in the deny list
> for queued trim. If not, worth letting the proper upstream list know
> about this problem. If they get more reports, then they'll add the
> make/model to the appropriate deny list.

Will do so. At least there does not seem to be a firmware update available.
(Or Samsung does not provide those through LVFS. Need to check.)

% fwupdmgr update
Devices with no available firmware updates: 
[…]
 • Samsung SSD 870 EVO 2TB
[…]

> A better option any is scheduled fstrim once a week. Or for very heavy
> write, delete, write workloads where it's useful to provide hinting
> for unused blocks more often than once per week, use discard=async.

Well that is what I use. "discard=async"

> But for a few weeks at least, I suggest no discard option at all to
> see if the problem goes away.

Will do so.

> > [15786.820703] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [15786.820718] ata1.00: irq_stat 0x40000001
> > [15786.820728] ata1.00: failed command: READ DMA EXT
> > [15786.820748] ata1.00: cmd 25/00:08:c8:d2:b1/00:00:19:00:00/e0 tag 18 dma 4096 in
> >                         res 51/40:00:c8:d2:b1/00:00:19:00:00/e0 Emask 0x9 (media error)
> > [15786.820756] ata1.00: status: { DRDY ERR }
> > [15786.820762] ata1.00: error: { UNC }
> > [15786.821313] ata1.00: supports DRM functions and may not be fully accessible
> > [15786.824200] ata1.00: supports DRM functions and may not be fully accessible
> > [15786.826515] ata1.00: configured for UDMA/133
> > [15786.826557] sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
> > [15786.826567] sd 0:0:0:0: [sda] tag#18 Sense Key : Medium Error [current]
> > [15786.826575] sd 0:0:0:0: [sda] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
> > [15786.826584] sd 0:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 19 b1 d2 c8 00 00 08 00
> > [15786.826594] blk_update_request: I/O error, dev sda, sector 431084232 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> > [15786.826643] ata1: EH complete
> > [15786.828426] ata1.00: Enabling discard_zeroes_data
> > [15786.830191] BTRFS error (device dm-3): bdev /dev/mapper/sata-home errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> > [15786.830206] BTRFS error (device dm-3): unable to fixup (regular) error at logical 176045789184 on dev /dev/mapper/sata-home
> > [15930.966633] BTRFS info (device dm-3): scrub: finished on devid 1 with status: 0
[…]
> Yeah I'm not sure if this metadata corruption is the result of
> discards. Could be. But the first underlying problem to figure out are
> all these ata errors, which point to the drive itself not liking
> discards (queued trim), and not something btrfs or device-mapper are
> doing wrong.

I did not suspect Linux doing anything wrong here. I was pretty sure it
would be something with the drive, yet did not know that it is discard
related.

Ciao,
-- 
Martin



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Scrub: Uncorrectable error due to SSD read error
  2021-06-04  7:07   ` Martin Steigerwald
@ 2021-06-04  7:26     ` Chris Murphy
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Murphy @ 2021-06-04  7:26 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Chris Murphy, Btrfs BTRFS

On Fri, Jun 4, 2021 at 1:07 AM Martin Steigerwald <martin@lichtvoll.de> wrote:
>
> Mount options look like this.
>
> rw,relatime,lazytime,compress=zstd:3,ssd,discard=async,space_cache,subvolid=257,subvol=/home
>
> I am removing "discard=async". I thought this would be safe with modern
> SSDs.

It should be. So now the question is whether it's some kind of defect
with this particular SSD. Or if it affects make/model. And if it's in
the deny list for mainline. And if upstream is aware of any issues
with this model.

> I always used fstrim manually before, because I was aware of SSDs having
> issue with discard, but when BTRFS introduced queued or batched discard
> I thought I give it a try on a modern SSD.

Completely reasonable.



> > So the drive is not handling queued discard properly is what I
> > suspect. It gets confused. Face plants. Libata then does a hard reset
> > to try to get the drive to come back to reality. And then another
> > discard gets issued and the drive wigs out again.
>
> Thank you. Where did you read this from? I find it challenging to interpret
> the error message the kernel ata layer puts out. They are very detailed and
> I think that is good, but I just do not understand most of it.

Oh man lots of emails on linux-raid@ list over the years. And other
people decipher them. I don't understand most of it either.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Scrub: Uncorrectable error due to SSD read error
  2021-06-04  5:50 ` Chris Murphy
  2021-06-04  7:07   ` Martin Steigerwald
@ 2021-06-04 14:00   ` Phillip Susi
  1 sibling, 0 replies; 5+ messages in thread
From: Phillip Susi @ 2021-06-04 14:00 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Martin Steigerwald, Btrfs BTRFS


Chris Murphy <lists@colorremedies.com> writes:

> Yeah here's a pretty good bet that discard is included in the btrfs
> mount option for this file system. I don't think discards happen
> without the file system issuing them, and dm-crypt and device-mapper
> for the LVM layer each have discard pass down enabled. It's frequently
> the default for LVM these days, but not dm-crypt (it is the default on
> Fedora though, for example).
>
> So the drive is not handling queued discard properly is what I
> suspect. It gets confused. Face plants. Libata then does a hard reset
> to try to get the drive to come back to reality. And then another
> discard gets issued and the drive wigs out again.

This is an interesting theory, but I was going to ask what gave you the
idea since I didn't see anything about trim or discard on my first read
through the log.  Now that I look at it again, I see some interesting
things.

[15784.497016] ata1.00: READ LOG DMA EXT failed, trying PIO
[15784.497903] ata1: failed to read log page 10h (errno=-5)

Some googling indicates that log page 10h contains detailed error
information for NCQ.  IIRC, without that, when an error happens there is
no way to know which queued command had the error, so all of them have
to assume to have failed and be retried.  Then several READ commands
with different tags are listed, indicating that they were all marked as
failed.  There is no TRIM command listed.  The drive indeed has gone out
to lunch as attempting to re-read its IDENTIFY page also fails twice,
and so a hard reset is performed to attempt to recover.  After that we
see this:

ata1.00: Enabling discard_zeroes_data

I believe that means not that a TRIM is being issued, but only that
libata noticed that the drive has the flag set in its IDENTIFY page that
means it claims that reading from a block after TRIMming it will return
zeros.

This repeats a few times before finally we see:

[15785.887214] sd 0:0:0:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[15785.887225] sd 0:0:0:0: [sda] tag#25 Sense Key : Illegal Request [current] 
[15785.887232] sd 0:0:0:0: [sda] tag#25 Add. Sense: Unaligned write command
[15785.887241] sd 0:0:0:0: [sda] tag#25 CDB: Write same(16) 93 08 00 00 00 00 1c 39 a6 30 00 00 00 40 00 00
[15785.887251] blk_update_request: I/O error, dev sda, sector
473540144 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0

One of the previously failed commands at the ATA layer was a "SEND FPDMA
QUEUED".  I'm guessing that is the discard.  I believe the scsi layer
tried to fall back to using WRITE SAME at this point to zero the
discarded blocks, but I am not sure if that previous SEND FPDMA was the
attempted WRITE SAME, or if that was the TRIM and then after that
failed, scsi tried WRITE SAME.  I think it was the former.  In that
case, I don't see where an actual TRIM was ever issued or failed to
cause scsi to call back to WRITE SAME.  Or maybe it was that at the scsi
layer, a discard is translated to a WRITE SAME command with some flag
bit set that indicates that it is really a TRIM request.  In that case,
then this was simply a discard that happened to occur this time and fail
along with the other reads.  The previous cycles all failed without the
need of a TRIM, so I think it is just happenstance that there was one
here.

Finally after another round of hard reset, the drive starts giving
normal uncorrectable read errors ( media error ).

Something else that might be worth a try is disabling NCQ.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-06-04 14:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-03 19:59 Scrub: Uncorrectable error due to SSD read error Martin Steigerwald
2021-06-04  5:50 ` Chris Murphy
2021-06-04  7:07   ` Martin Steigerwald
2021-06-04  7:26     ` Chris Murphy
2021-06-04 14:00   ` Phillip Susi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.