linux-btrfs.vger.kernel.org archive mirror
* Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache
@ 2020-01-06 17:32 Stephen Conrad
  2020-01-06 23:56 ` Chris Murphy
  2020-01-07  1:49 ` Remi Gauvin
  0 siblings, 2 replies; 5+ messages in thread
From: Stephen Conrad @ 2020-01-06 17:32 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 7359 bytes --]

Hello,

I have a pair of identical drives that I set up with btrfs to provide 
redundancy for a 24x7 network-attached storage application.  The drives 
reside in an external enclosure and are each connected to the server with a 
dedicated eSATA cable.  I set up the array in the fall of 2017 and had about 
a year of operation without noticing any issues.

In January 2019 the file system remounted read-only.  I have retained the dmesg 
logs and notes on the actions I took at that time.  In summary, I saw a bunch 
of SATA "link is slow to respond" and "hard resetting link" messages followed 
by BTRFS errors, "BTRFS: Transaction aborted (error -5)", and a stack trace.  
I tried a few things (unsuccessfully) and had to power down the hardware.  
After a power cycle the file system mounted R/W as normal.  I then ran a 
btrfs check and a btrfs scrub, both of which showed no errors.

# btrfs check -p /dev/disk/by-uuid/X
Checking filesystem on /dev/disk/by-uuid/X
UUID: X
checking extents [o]
checking free space cache [.]
checking fs roots [o]
checking csums
checking root refs
found 2953419067392 bytes used err is 0
total csum bytes: 2880319844
total tree bytes: 3249602560
total fs tree bytes: 153665536
total extent tree bytes: 30392320
btree space waste bytes: 162811208
file data blocks allocated: 2950169763840
 referenced 2950168137728

# btrfs scrub start -B /dev/disk/by-uuid/X
scrub done for X
        scrub started at Thu Jan 24 21:18:33 2019 and finished after 03:58:21
        total bytes scrubbed: 2.69TiB with 0 errors

I also swapped out my long eSATA cables (6 foot) for shorter cables (20 inch) 
to try to address the link issues.  I thought everything was good at that 
point, but this episode may be related to subsequent events...

<10 Months Later>

In October 2019 I had a similar event.  Logs show SATA "hard resetting link" 
for both drives as well as BTRFS "lost page write due to IO error" messages.  
The logs show "forced readonly", "BTRFS: Transaction aborted (error -5)", and 
a stack trace.  After a hard power cycle I ran "btrfs check -p", which 
produced a stream of messages like "parent transid verify failed on 
2418006753280 wanted 34457 found 30647" and then the following:
parent transid verify failed on 2417279598592 wanted 35322 found 29823
checking root refs
found 3066087231488 bytes used err is 0
total csum bytes: 2990197476
total tree bytes: 3376070656
total fs tree bytes: 158498816
total extent tree bytes: 31932416
btree space waste bytes: 171614389
file data blocks allocated: 3062711459840
 referenced 3062709796864

I re-ran the check and captured all output to a file just in case.  The file 
is 315,000 lines long...

After some internet research I decided to mount the file system and run a 
scrub.  The file system mounted read/write successfully, but I was seeing 
patterns of log entries like "BTRFS error (device dm-1): parent transid 
verify failed on 2176843776 wanted 34455 found 27114" followed by "BTRFS info 
(device dm-1): read error corrected: ino 1 off 2176843776 (dev 
/dev/mapper/K1JG82AD sector 4251648)".

# btrfs scrub start -B /dev/disk/by-uuid/X
After starting the scrub, the logs showed several recurring patterns:
verify_parent_transid: 89 callbacks suppressed
repair_io_failure: 386 callbacks suppressed
BTRFS error (device dm-1): csum mismatch on free space cache
BTRFS warning (device dm-1): failed to load free space cache for block group 1131786797056, rebuilding it now
BTRFS error (device dm-1): parent transid verify failed on 2417279385600 wanted 35322 found 29823
BTRFS info (device dm-1): read error corrected: ino 1 off 2417279385600 (dev /dev/mapper/K1JG82AD sector 4719086112)

But the summary output looked OK...
scrub done for X
        scrub started at Thu Oct 31 00:27:29 2019 and finished after 04:07:45
        total bytes scrubbed: 2.79TiB with 0 errors

# date
Sun Dec  8 11:56:29 EST 2019
# btrfs fi show
Label: none  uuid: X
        Total devices 2 FS bytes used 2.80TiB
        devid    1 size 5.46TiB used 2.80TiB path /dev/mapper/K1JG82AD
        devid    2 size 5.46TiB used 2.80TiB path /dev/mapper/K1JGYJJD

# btrfs dev stat /mnt/raid
[/dev/mapper/K1JG82AD].write_io_errs   15799831
[/dev/mapper/K1JG82AD].read_io_errs    15764242
[/dev/mapper/K1JG82AD].flush_io_errs   4385
[/dev/mapper/K1JG82AD].corruption_errs 0
[/dev/mapper/K1JG82AD].generation_errs 0
[/dev/mapper/K1JGYJJD].write_io_errs   0
[/dev/mapper/K1JGYJJD].read_io_errs    0
[/dev/mapper/K1JGYJJD].flush_io_errs   0
[/dev/mapper/K1JGYJJD].corruption_errs 0
[/dev/mapper/K1JGYJJD].generation_errs 0

At this point I was still seeing occasional log entries for "parent transid 
verify failed" and "read error corrected", so I decided to upgrade from 
Debian 9 to Debian 10 to get more current tools.  Running a scrub with the 
Debian 10 tools, I saw errors detected and corrected... I also saw SATA link 
issues during the scrub...

# date
Mon 09 Dec 2019 10:29:05 PM EST
# btrfs scrub start -B -d /dev/disk/by-uuid/X
scrub device /dev/mapper/K1JG82AD (id 1) done
        scrub started at Sun Dec  8 23:06:59 2019 and finished after 05:46:26
        total bytes scrubbed: 2.80TiB with 9490467 errors
        error details: verify=1349 csum=9489118
        corrected errors: 9490467, uncorrectable errors: 0, unverified errors: 0
WARNING: errors detected during scrubbing, corrected

# btrfs dev stat /mnt/raid
[/dev/mapper/K1JG82AD].write_io_errs    15799831
[/dev/mapper/K1JG82AD].read_io_errs     15764242
[/dev/mapper/K1JG82AD].flush_io_errs    4385
[/dev/mapper/K1JG82AD].corruption_errs  9489118
[/dev/mapper/K1JG82AD].generation_errs  1349
[/dev/mapper/K1JGYJJD].write_io_errs    0
[/dev/mapper/K1JGYJJD].read_io_errs     0
[/dev/mapper/K1JGYJJD].flush_io_errs    0
[/dev/mapper/K1JGYJJD].corruption_errs  0
[/dev/mapper/K1JGYJJD].generation_errs  0
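
For reference, my understanding from the man page is that these per-device 
counters are cumulative and persist across reboots; once the hardware is 
proven out, they can apparently be reset with something like:

```shell
# -z prints the current per-device stats and then resets them to zero;
# /mnt/raid is my mount point.
btrfs device stats -z /mnt/raid
```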

At this point I want to separate the file system debugging from the hardware 
debugging.  I would like to offload all data to a different disk (if 
possible) to maintain availability while I work on the hardware, then return 
to this hardware once I am certain it is rock solid...

Questions:
1) How should I interpret these errors?  The btrfs messages seem to be telling 
me that there is an abundance of errors everywhere, but that they are all 
correctable...  Should I panic?  Should I proceed?

2) Is my file system broken?  Is my data corrupted?  Should I be able to scrub 
etc. to get back to normal operation without scary log messages?  Can I trust 
data that I copy out now, or do I need to fall back on old/incomplete backups?

3) What steps are recommended to back up/offload/recover the data?  I am 
considering installing the disks in a different machine, mounting the array 
read-only, and then pulling a full copy of the data...
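
The rough sequence I have in mind would be something like the following 
(mount point and destination are placeholders, and I would unlock the 
dm-crypt devices first):

```shell
# Mount the array read-only so nothing on the suspect hardware changes.
mount -o ro /dev/mapper/K1JG82AD /mnt/raid

# Pull everything onto the standalone disk, preserving hard links, ACLs,
# and xattrs; rsync can be re-run to resume if a SATA reset interrupts it.
rsync -aHAX --info=progress2 /mnt/raid/ /mnt/backup/
```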

4) What steps should I take to clean up the file system errors/messages?  Start 
fresh after a full backup (though I hate the idea of migrating off a redundant 
array onto a single disk in the process)?  Scrub etc.?  Evaluate each disk 
independently and rebuild one from the other?

Regards,
Stephen

 - System Information - 
2017 - 2019/12/8 (Debian 9)
linux-4.9.110
Package: btrfs-progs (4.7.3-1)

2019/12/8 - present (Debian 10)
# uname -a
Linux 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.20.1
# btrfs fi show
<See above>

- dmesg / stack traces - 
Attached




[-- Attachment #2: 20190121_dmesg_trace.txt --]
[-- Type: text/plain, Size: 11761 bytes --]

...
[13940826.679535] ata4: link is slow to respond, please be patient (ready=0)
[13940826.715588] ata3: link is slow to respond, please be patient (ready=0)
[13940830.855320] ata3: COMRESET failed (errno=-16)
[13940830.857487] ata3: hard resetting link
[13940830.879305] ata4: COMRESET failed (errno=-16)
[13940830.881402] ata4: hard resetting link
[13940831.567340] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[13940831.591337] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[13940831.842762] ata4.00: configured for UDMA/133
[13940831.842774] ata4: EH complete
[13940831.895485] ata3.00: configured for UDMA/133
[13940831.895495] ata3: EH complete
[13942645.785185] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[13942645.787579] ata4: irq_stat 0x00400040, connection status changed
[13942645.789981] ata4: SError: { PHYRdyChg 10B8B DevExch }
[13942645.792035] ata4: hard resetting link
[13942645.792053] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[13942645.794225] ata3: irq_stat 0x00400040, connection status changed
[13942645.796163] ata3: SError: { PHYRdyChg 10B8B DevExch }
[13942645.798122] ata3: hard resetting link
[13942651.692129] ata3: link is slow to respond, please be patient (ready=0)
[13942651.692141] ata4: link is slow to respond, please be patient (ready=0)
[13942654.932049] ata3: SATA link down (SStatus 0 SControl 310)
[13942654.932064] ata3.00: link offline, clearing class 1 to NONE
[13942655.076201] ata3: hard resetting link
[13942655.076288] ata4: SATA link down (SStatus 0 SControl 310)
[13942655.076297] ata4.00: link offline, clearing class 1 to NONE
[13942655.082787] ata4: hard resetting link
[13942660.431729] ata3: link is slow to respond, please be patient (ready=0)
[13942660.439653] ata4: link is slow to respond, please be patient (ready=0)
[13942663.859531] ata4: SATA link down (SStatus 0 SControl 310)
[13942663.859540] ata4.00: link offline, clearing class 1 to NONE
[13942663.998391] ata4: hard resetting link
[13942663.998462] ata3: SATA link down (SStatus 0 SControl 310)
[13942663.998471] ata3.00: link offline, clearing class 1 to NONE
[13942664.042066] ata3: hard resetting link
[13942669.391142] ata4: link is slow to respond, please be patient (ready=0)
[13942669.391156] ata3: link is slow to respond, please be patient (ready=0)
[13942672.811032] ata4: SATA link down (SStatus 0 SControl 310)
[13942672.811043] ata4.00: link offline, clearing class 1 to NONE
[13942672.811052] ata4.00: disabled
[13942672.811085] ata4: EH complete
[13942672.811094] ata3: SATA link down (SStatus 0 SControl 310)
[13942672.811105] ata3.00: link offline, clearing class 1 to NONE
[13942672.811115] ata3.00: disabled
[13942672.811118] ata4.00: detaching (SCSI 3:0:0:0)
[13942672.811143] ata3: EH complete
[13942672.811165] ata3.00: detaching (SCSI 2:0:0:0)
[13942672.812676] sd 3:0:0:0: [sdc] Stopping disk
[13942672.812751] sd 2:0:0:0: [sdb] Stopping disk
[13942672.812777] sd 3:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[13942672.812803] sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[13942672.927959] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[13942672.931572] ata4: irq_stat 0x00000040, connection status changed
[13942672.934757] ata4: SError: { CommWake DevExch }
[13942672.937431] ata4: hard resetting link
[13942672.983564] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[13942672.986278] ata3: irq_stat 0x00000040, connection status changed
[13942672.989031] ata3: SError: { CommWake DevExch }
[13942672.992316] ata3: hard resetting link
[13942678.686710] ata4: link is slow to respond, please be patient (ready=0)
[13942678.742705] ata3: link is slow to respond, please be patient (ready=0)
[13942682.946487] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[13942683.002426] ata3: COMRESET failed (errno=-16)
[13942683.005434] ata3: hard resetting link
[13942683.284655] ata4.00: ATA-9: WDC WD6002FFWX-68TZ4N0, 83.H0A83, max UDMA/133
[13942683.284662] ata4.00: 11721045168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[13942683.292587] ata4.00: configured for UDMA/133
[13942683.292606] ata4: EH complete
[13942683.293066] scsi 3:0:0:0: Direct-Access     ATA      WDC WD6002FFWX-6 0A83 PQ: 0 ANSI: 5
[13942683.335331] sd 3:0:0:0: [sdf] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[13942683.335341] sd 3:0:0:0: [sdf] 4096-byte physical blocks
[13942683.335470] sd 3:0:0:0: Attached scsi generic sg1 type 0
[13942683.335706] sd 3:0:0:0: [sdf] Write Protect is off
[13942683.335720] sd 3:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[13942683.335856] sd 3:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13942683.352758] sd 3:0:0:0: [sdf] Attached SCSI removable disk
[13942683.722409] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[13942683.726180] ata3.00: ATA-9: WDC WD6002FFWX-68TZ4N0, 83.H0A83, max UDMA/133
[13942683.726188] ata3.00: 11721045168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[13942683.734444] ata3.00: configured for UDMA/133
[13942683.734462] ata3: EH complete
[13942683.734971] scsi 2:0:0:0: Direct-Access     ATA      WDC WD6002FFWX-6 0A83 PQ: 0 ANSI: 5
[13942683.779103] sd 2:0:0:0: [sdg] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[13942683.779111] sd 2:0:0:0: [sdg] 4096-byte physical blocks
[13942683.779282] sd 2:0:0:0: Attached scsi generic sg2 type 0
[13942683.779483] sd 2:0:0:0: [sdg] Write Protect is off
[13942683.779498] sd 2:0:0:0: [sdg] Mode Sense: 00 3a 00 00
[13942683.779664] sd 2:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13942683.796395] sd 2:0:0:0: [sdg] Attached SCSI removable disk
[13942886.397293] BTRFS error (device dm-1): bdev /dev/mapper/K1JGYJJD errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[13942886.397515] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[13942886.398349] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[13942886.399066] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[13942886.399834] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[13942886.402749] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[13942886.402763] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[13942886.402786] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[13942886.402801] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[13942886.404401] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[13942886.427753] ------------[ cut here ]------------
[13942886.427801] WARNING: CPU: 2 PID: 3344 at /build/linux-iyqref/linux-4.9.110/fs/btrfs/extent-tree.c:6959 __btrfs_free_extent.isra.69+0x821/0xd40 [btrfs]
[13942886.427803] BTRFS: Transaction aborted (error -5)
[13942886.427804] Modules linked in: ses enclosure scsi_transport_sas uas usb_storage cpuid dm_crypt algif_skcipher af_alg fuse ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs libcrc32c dm_mod snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ir_lirc_codec ir_rc5_decoder lirc_dev rc_hauppauge au8522_dig xc5000 tuner au8522_decoder au8522_common ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 iTCO_wdt iTCO_vendor_support intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_hdmi ipt_REJECT ghash_clmulni_intel nf_reject_ipv4 intel_cstate arc4 intel_uncore snd_hda_codec_realtek intel_rapl_perf snd_hda_codec_generic pcspkr evdev serio_raw rtl8723ae btcoexist snd_hda_intel rtl8723_common rtl_pci rtlwifi
[13942886.427884]  snd_hda_codec i915 mac80211 snd_hda_core snd_hwdep au0828 btusb btrtl btbcm tveeprom btintel snd_pcm drm_kms_helper xt_limit dvb_core rc_core cfg80211 v4l2_common bluetooth videobuf2_vmalloc xt_tcpudp videobuf2_memops drm videobuf2_v4l2 rfkill snd_timer videobuf2_core snd videodev mei_me i2c_algo_bit soundcore sg media mei lpc_ich shpchp mfd_core xt_addrtype video ac button nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse i2c_i801 i2c_smbus
[13942886.427969]  ahci libahci r8169 mii libata xhci_pci ehci_pci xhci_hcd ehci_hcd scsi_mod e1000e usbcore ptp usb_common pps_core fan thermal
[13942886.427991] CPU: 2 PID: 3344 Comm: btrfs-transacti Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-3+deb9u1
[13942886.427993] Hardware name: CompuLab Intense-PC/Intense-PC, BIOS CR_2.2.0.346 X64 11/25/2012
[13942886.427996]  0000000000000000 ffffffffa5530694 ffffaf9c03443af8 0000000000000000
[13942886.428002]  ffffffffa5278d6e 0000000008eb0000 ffffaf9c03443b50 00000000fffffffb
[13942886.428007]  ffff9114141aa8c0 ffff9117891e6000 0000000000000000 ffffffffa5278def
[13942886.428012] Call Trace:
[13942886.428022]  [<ffffffffa5530694>] ? dump_stack+0x5c/0x78
[13942886.428030]  [<ffffffffa5278d6e>] ? __warn+0xbe/0xe0
[13942886.428035]  [<ffffffffa5278def>] ? warn_slowpath_fmt+0x5f/0x80
[13942886.428067]  [<ffffffffc04e81a1>] ? __btrfs_free_extent.isra.69+0x821/0xd40 [btrfs]
[13942886.428074]  [<ffffffffa538ea01>] ? __set_page_dirty_nobuffers+0xa1/0x140
[13942886.428108]  [<ffffffffc0558af9>] ? btrfs_merge_delayed_refs+0x69/0x570 [btrfs]
[13942886.428136]  [<ffffffffc04ec99b>] ? __btrfs_run_delayed_refs+0xb9b/0x13a0 [btrfs]
[13942886.428163]  [<ffffffffc04f013d>] ? btrfs_run_delayed_refs+0x9d/0x2b0 [btrfs]
[13942886.428193]  [<ffffffffc05045f3>] ? commit_cowonly_roots+0x113/0x2e0 [btrfs]
[13942886.428219]  [<ffffffffc04f02a4>] ? btrfs_run_delayed_refs+0x204/0x2b0 [btrfs]
[13942886.428251]  [<ffffffffc05744b5>] ? btrfs_qgroup_account_extents+0x85/0x180 [btrfs]
[13942886.428280]  [<ffffffffc050707c>] ? btrfs_commit_transaction+0x4dc/0xa10 [btrfs]
[13942886.428306]  [<ffffffffc0507646>] ? start_transaction+0x96/0x480 [btrfs]
[13942886.428333]  [<ffffffffc0501a4c>] ? transaction_kthread+0x1dc/0x200 [btrfs]
[13942886.428358]  [<ffffffffc0501870>] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
[13942886.428363]  [<ffffffffa52988e9>] ? kthread+0xd9/0xf0
[13942886.428368]  [<ffffffffa5298810>] ? kthread_park+0x60/0x60
[13942886.428372]  [<ffffffffa5814df7>] ? ret_from_fork+0x57/0x70
[13942886.428375] ---[ end trace 5167b44ab18fee6a ]---
[13942886.428379] BTRFS: error (device dm-1) in __btrfs_free_extent:6959: errno=-5 IO failure
[13942886.430140] BTRFS info (device dm-1): forced readonly
[13942886.430144] BTRFS: error (device dm-1) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
[13942886.431954] BTRFS warning (device dm-1): Skipping commit of aborted transaction.
[13942886.431958] BTRFS: error (device dm-1) in cleanup_transaction:1850: errno=-5 IO failure
[13942886.433953] BTRFS error (device dm-1): pending csums is 1048576
[13943151.748688] btrfs_dev_stat_print_on_error: 32 callbacks suppressed
[13943151.748700] BTRFS error (device dm-1): bdev /dev/mapper/K1JGYJJD errs: wr 20, rd 2, flush 0, corrupt 0, gen 0
[13943151.750963] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 20, rd 2, flush 0, corrupt 0, gen 0

[-- Attachment #3: 20191028_syslog_trace.txt --]
[-- Type: text/plain, Size: 10193 bytes --]

Oct 28 21:07:35 kernel: [22575699.639844] btrfs_dev_stat_print_on_error: 151 callbacks suppressed
Oct 28 21:07:35 kernel: [22575699.639859] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798124, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.448265] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798125, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.456928] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798126, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.464365] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798127, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.470414] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798128, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.476524] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798129, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.482249] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798130, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.487931] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798131, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.493286] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798132, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.498637] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798133, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.503710] BTRFS error (device dm-1): bdev /dev/mapper/K1JG82AD errs: wr 15798134, rd 15764237, flush 4361, corrupt 0, gen 0
Oct 28 21:07:50 kernel: [22575714.532609] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:07:50 kernel: [22575714.686683] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:07:50 kernel: [22575714.686694] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:07:50 kernel: [22575714.686734] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:07:50 kernel: [22575714.693146] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:08:11 kernel: [22575735.017883] btrfs_dev_stat_print_on_error: 167 callbacks suppressed
...
Oct 28 21:10:58 kernel: [22575902.412741] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:10:58 kernel: [22575902.412753] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 21:10:58 kernel: [22575902.412761] BTRFS warning (device dm-1): lost page write due to IO error on /dev/mapper/K1JG82AD
Oct 28 22:14:56 kernel: [22579740.894360] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
Oct 28 22:14:56 kernel: [22579740.900000] ata3: irq_stat 0x00400040, connection status changed
Oct 28 22:14:56 kernel: [22579740.905262] ata3: SError: { PHYRdyChg 10B8B DevExch }
Oct 28 22:14:56 kernel: [22579740.910205] ata3: hard resetting link
Oct 28 22:14:56 kernel: [22579740.910251] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
Oct 28 22:14:56 kernel: [22579740.915268] ata4: irq_stat 0x00400040, connection status changed
Oct 28 22:14:56 kernel: [22579740.919644] ata4: SError: { PHYRdyChg 10B8B DevExch }
Oct 28 22:14:56 kernel: [22579740.924090] ata4: hard resetting link
Oct 28 22:15:02 kernel: [22579746.747303] ata3: link is slow to respond, please be patient (ready=0)
Oct 28 22:15:02 kernel: [22579746.783228] ata4: link is slow to respond, please be patient (ready=0)
Oct 28 22:15:05 kernel: [22579749.987500] ata3: SATA link down (SStatus 0 SControl 310)
Oct 28 22:15:05 kernel: [22579749.987509] ata3.00: link offline, clearing class 1 to NONE
Oct 28 22:15:06 kernel: [22579750.102050] ata3: hard resetting link
Oct 28 22:15:06 kernel: [22579750.939545] ata4: COMRESET failed (errno=-16)
Oct 28 22:15:06 kernel: [22579750.944784] ata4: hard resetting link
Oct 28 22:15:11 kernel: [22579755.451791] ata3: link is slow to respond, please be patient (ready=0)
Oct 28 22:15:12 kernel: [22579756.695842] ata4: link is slow to respond, please be patient (ready=0)
Oct 28 22:15:15 kernel: [22579759.096054] ata4: SATA link down (SStatus 0 SControl 310)
Oct 28 22:15:15 kernel: [22579759.096064] ata4.00: link offline, clearing class 1 to NONE
Oct 28 22:15:15 kernel: [22579759.184120] ata4: hard resetting link
Oct 28 22:15:15 kernel: [22579759.244021] ata3: SATA link down (SStatus 0 SControl 310)
Oct 28 22:15:15 kernel: [22579759.244031] ata3.00: link offline, clearing class 1 to NONE
Oct 28 22:15:15 kernel: [22579759.302538] ata3: hard resetting link
Oct 28 22:15:20 kernel: [22579764.536373] ata4: link is slow to respond, please be patient (ready=0)
Oct 28 22:15:20 kernel: [22579764.652375] ata3: link is slow to respond, please be patient (ready=0)
Oct 28 22:15:24 kernel: [22579768.432584] ata3: SATA link down (SStatus 0 SControl 310)
Oct 28 22:15:24 kernel: [22579768.432594] ata3.00: link offline, clearing class 1 to NONE
Oct 28 22:15:24 kernel: [22579768.432600] ata3.00: disabled
Oct 28 22:15:24 kernel: [22579768.432620] ata3: EH complete
Oct 28 22:15:24 kernel: [22579768.432681] ata3.00: detaching (SCSI 2:0:0:0)
Oct 28 22:15:24 kernel: [22579768.433252] sd 2:0:0:0: [sdb] Stopping disk
Oct 28 22:15:24 kernel: [22579768.433323] sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
...
Oct 28 23:03:58 kernel: [22582682.521731] BTRFS info (device dm-1): forced readonly
Oct 28 23:03:58 kernel: [22582682.521732] BTRFS warning (device dm-1): Skipping commit of aborted transaction.
Oct 28 23:03:58 kernel: [22582682.521733] ------------[ cut here ]------------
Oct 28 23:03:58 kernel: [22582682.521755] WARNING: CPU: 0 PID: 1659 at /build/linux-EbeuWA/linux-4.9.130/fs/btrfs/transaction.c:1850 cleanup_transaction+0x1f3/0x2e0 [btrfs]
Oct 28 23:03:58 kernel: [22582682.521756] BTRFS: Transaction aborted (error -5)
Oct 28 23:03:58 kernel: [22582682.521756] Modules linked in: fuse ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs libcrc32c cpuid dm_crypt algif_skcipher af_alg dm_mod snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ir_lirc_codec ir_rc5_decoder lirc_dev rc_hauppauge au8522_dig xc5000 tuner au8522_decoder au8522_common iTCO_wdt iTCO_vendor_support ip6t_REJECT nf_reject_ipv6 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm irqbypass arc4 crct10dif_pclmul crc32_pclmul xt_hl snd_hda_codec_realtek snd_hda_codec_generic ip6t_rt ghash_clmulni_intel rtl8723ae intel_cstate btcoexist rtl8723_common rtl_pci intel_uncore intel_rapl_perf rtlwifi i915 evdev serio_raw pcspkr snd_hda_intel mac80211 nf_conntrack_ipv6 nf_defrag_ipv6 snd_hda_codec au0828 btusb cfg80211 drm_kms_helper tveeprom
Oct 28 23:03:58 kernel: [22582682.521797]  snd_hda_core btrtl sg btbcm snd_hwdep btintel dvb_core snd_pcm rc_core v4l2_common bluetooth videobuf2_vmalloc snd_timer videobuf2_memops videobuf2_v4l2 videobuf2_core drm mei_me snd videodev rfkill i2c_algo_bit lpc_ich soundcore ipt_REJECT media shpchp mei mfd_core nf_reject_ipv4 xt_limit xt_tcpudp xt_addrtype video button ac nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ahci libahci libata i2c_i801 scsi_mod i2c_smbus r8169 mii xhci_pci ehci_pci xhci_hcd e1000e ehci_hcd ptp pps_core usbcore usb_common fan thermal
Oct 28 23:03:58 kernel: [22582682.521846] CPU: 0 PID: 1659 Comm: btrfs-transacti Not tainted 4.9.0-8-amd64 #1 Debian 4.9.130-2
Oct 28 23:03:58 kernel: [22582682.521847] Hardware name: CompuLab Intense-PC/Intense-PC, BIOS CR_2.2.0.346 X64 11/25/2012
Oct 28 23:03:58 kernel: [22582682.521849]  0000000000000000 ffffffff9bd33d74 ffffb3d04273fd50 0000000000000000
Oct 28 23:03:58 kernel: [22582682.521851]  ffffffff9ba7a59e ffff94e6732621b8 ffffb3d04273fda8 ffff94e573994460
Oct 28 23:03:58 kernel: [22582682.521853]  00000000fffffffb ffff94e7c78f40c0 0000000000000000 ffffffff9ba7a61f
Oct 28 23:03:58 kernel: [22582682.521855] Call Trace:
Oct 28 23:03:58 kernel: [22582682.521861]  [<ffffffff9bd33d74>] ? dump_stack+0x5c/0x78
Oct 28 23:03:58 kernel: [22582682.521863]  [<ffffffff9ba7a59e>] ? __warn+0xbe/0xe0
Oct 28 23:03:58 kernel: [22582682.521865]  [<ffffffff9ba7a61f>] ? warn_slowpath_fmt+0x5f/0x80
Oct 28 23:03:58 kernel: [22582682.521878]  [<ffffffffc0346423>] ? cleanup_transaction+0x1f3/0x2e0 [btrfs]
Oct 28 23:03:58 kernel: [22582682.521880]  [<ffffffff9babd210>] ? prepare_to_wait_event+0xf0/0xf0
Oct 28 23:03:58 kernel: [22582682.521892]  [<ffffffffc0347e78>] ? btrfs_commit_transaction+0x298/0xa10 [btrfs]
Oct 28 23:03:58 kernel: [22582682.521903]  [<ffffffffc0348686>] ? start_transaction+0x96/0x480 [btrfs]
Oct 28 23:03:58 kernel: [22582682.521914]  [<ffffffffc0342a8c>] ? transaction_kthread+0x1dc/0x200 [btrfs]
Oct 28 23:03:58 kernel: [22582682.521924]  [<ffffffffc03428b0>] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
Oct 28 23:03:58 kernel: [22582682.521926]  [<ffffffff9ba9a569>] ? kthread+0xd9/0xf0
Oct 28 23:03:58 kernel: [22582682.521928]  [<ffffffff9c0190a4>] ? __switch_to_asm+0x34/0x70
Oct 28 23:03:58 kernel: [22582682.521930]  [<ffffffff9ba9a490>] ? kthread_park+0x60/0x60
Oct 28 23:03:58 kernel: [22582682.521932]  [<ffffffff9c019137>] ? ret_from_fork+0x57/0x70
Oct 28 23:03:58 kernel: [22582682.521933] ---[ end trace 4d239283ca8f1b47 ]---
Oct 28 23:03:58 kernel: [22582682.521934] BTRFS: error (device dm-1) in cleanup_transaction:1850: errno=-5 IO failure
Oct 28 23:03:58 kernel: [22582682.523013] BTRFS info (device dm-1): delayed_refs has NO entry


* Re: Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache
  2020-01-06 17:32 Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache Stephen Conrad
@ 2020-01-06 23:56 ` Chris Murphy
  2020-03-16 20:17   ` Stephen Conrad
  2020-01-07  1:49 ` Remi Gauvin
  1 sibling, 1 reply; 5+ messages in thread
From: Chris Murphy @ 2020-01-06 23:56 UTC (permalink / raw)
  To: Stephen Conrad; +Cc: Btrfs BTRFS, Qu Wenruo

On Mon, Jan 6, 2020 at 10:32 AM Stephen Conrad <conradsd@gmail.com> wrote:

> In January 2019 the file system remounted read-only.  I have retained the dmesg
> logs and notes on the actions I took at that time.  Summary is that I saw a
> bunch of SATA "link is slow to respond" and "hard resetting link" messages

This is raid1? I can't tell.

Anyway, this is common when the drive's SCT ERC timeout is a greater duration
than the SCSI command timer (which defaults to 30s). Consumer drives often
ship with SCT ERC disabled, and you'd have to dig through the specs to find
out what recovery duration that implies, but with HDDs it is often way more
than 30s. The idea is single-drive "deep recovery": it's better for the
drive to become slow than to report EIO for a bad sector. That may be
specious, but for sure it's not a good idea for RAID. First choice is to set
SCT ERC to less than 30s; 70 deciseconds is common for NAS and enterprise
drives. Second choice, if SCT ERC cannot be configured, is to change the
SCSI command timer to something like 120-180s, which I know is crazy but
there you go. More here:

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Note that SCT ERC is a drive firmware setting, whereas the SCSI command
timer is a kernel timer that applies per (whole) block device. Both reset to
defaults on power cycle. You can use a udev rule for either of them (you
only need one; again, SCT ERC is preferred if available). SCT ERC is set
with smartctl, and the SCSI command timer is set via sysfs, so you can just
echo the value you want.
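
To spell that out, roughly (sketch only; /dev/sdb is a placeholder, check
your own drives with `smartctl -l scterc /dev/sdX` first):

```shell
# Preferred: set SCT ERC to 7 seconds. smartctl takes deciseconds,
# read timeout then write timeout. Only works if the drive supports it.
smartctl -l scterc,70,70 /dev/sdb

# Fallback when SCT ERC can't be set: raise the kernel's per-device
# SCSI command timer (in seconds) well above the drive's recovery time.
echo 180 > /sys/block/sdb/device/timeout

# Either can be reapplied at boot with a udev rule, e.g. for the
# fallback, in /etc/udev/rules.d/60-scsi-timeout.rules:
#   ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="180"
```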


> followed by BTRFS errors, "BTRFS: Transaction aborted (error -5)", and a stack
> trace.  I tried a few things (unsuccessfully) and had to power down the
> hardware.  After a power cycle the file system mounted R/W as normal.   I then
> ran a btrfs check and btrfs scrub which showed no errors.

Make sure a timeout mismatch doesn't exist, and if you make a change,
redo the scrub to be sure. You really want the drive to give up as fast
as possible so Btrfs is informed of the bad read. That error includes
the bad sector's LBA, so Btrfs can look up what's there, find a good
copy on the mirror, and write over the bad sector, causing it to be
fixed up.


> In October 2019 I had a similar event.  Logs show SATA "hard resetting link"
> for both drives as well as BTRFS "lost page write due to IO error" messages.

Yeah, virtually certain it's a misconfiguration. Very common. Bad
sectors start out recovering pretty quickly, but get slower and slower,
eventually hitting the 30s SCSI command timer, at which point the
kernel does a link reset. And then these things never get fixed up,
because when the link is reset, the entire drive's command queue is
lost. That means no discrete read error, and no LBA for the bad sector,
which Btrfs needs in order to find a good copy of that data and write
it back to the bad sector, thereby fixing it.


> The logs show "forced readonly", "BTRFS: Transaction aborted (error -5)" and a
> stack trace.  After hard power cycle I ran a "btrfs check -p" which resulted
> in a stream of messages like "parent transid verify failed on 2418006753280
> wanted 34457 found 30647" and then the following:
> parent transid verify failed on 2417279598592 wanted 35322 found 29823

Yuck. That's a lot of missing generations. I have no idea how that's
possible, especially if this is raid1 (or DUP) metadata.


> # btrfs dev stat /mnt/
> [/dev/mapper/K1JG82AD].write_io_errs   15799831
> [/dev/mapper/K1JG82AD].read_io_errs    15764242
> [/dev/mapper/K1JG82AD].flush_io_errs   4385

That is incredibly bad. That drive has dropped many millions of
writes, and the flush errors could mean this drive is failing to
adhere to proper write ordering. Maybe Qu can tell us.


> At this point I was still seeing occasional log entries for "parent transid
> verify failed" and "read error corrected" so I decided to upgrade from Debian9
> to Debian10 to get more current tools.  Running a scrub with Debian10 tools I
> saw errors detected and corrected... I also saw sata link issues during the
> scrub...

Right. You'll also find this discussed a ton on the linux-raid@ list
for the very same reason; it's not unique to Btrfs.


> # date
> Mon 09 Dec 2019 10:29:05 PM EST
> # btrfs scrub start -B -d /dev/disk/by-uuid/X
> scrub device /dev/mapper/K1JG82AD (id 1) done
>         scrub started at Sun Dec  8 23:06:59 2019 and finished after 05:46:26
>         total bytes scrubbed: 2.80TiB with 9490467 errors
>         error details: verify=1349 csum=9489118
>         corrected errors: 9490467, uncorrectable errors: 0, unverified errors:
> 0

Cool!

BTW you'll want to reset those stats with -z, so that you can tell if
these errors start happening again.

> 1) How should I interpret these errors?  Seems that btrfs messages are telling
> me that there are an abundance of errors everywhere, but that they are all
> correctable...  Should I panic?  Should I proceed?

Never panic. It often results in misunderstanding the problem and bad decisions.

The drive with all the errors has a bunch of bad sectors. Maybe by now
they've been repaired, but it won't be clear until you ensure the
timeout mismatch is addressed. Then the scrub will be reliable, even if
there are stalls on bad sectors that take a damn long time to either
successfully read or fail with the proper error.

What you really want are consumer drives with configurable SCT ERC, or
"NAS" drives, which already default to 70 deciseconds.

>
> 2) Is my file system broken? Is my data corrupted?  Should I be able to scrub
> etc to get back to operation without scary log messages? Can I trust data that
> I copy out now, or need to fall back on old/incomplete backups?

Btrfs will EIO rather than propagate corrupt data to user space. You
can trust the data you copy out. This assumes all data has checksums.
If you're using nocow for anything, then there can be silent
corruption.
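
A quick way to spot nocow files is the 'C' attribute in lsattr output;
a tiny helper for reading that flag string (the helper name and the
example path are mine, not from any btrfs tool):

```shell
#!/bin/sh
# has_nocow FLAGS: classify an lsattr flag string; 'C' means nocow,
# i.e. the file's data is not checksummed by btrfs.
has_nocow() {
    case "$1" in
        *C*) echo nocow ;;
        *)   echo cow ;;
    esac
}

# Typical usage (the path is just an example):
#   has_nocow "$(lsattr -d /mnt/vmimages | awk '{print $1}')"
has_nocow "---------------C------"   # prints "nocow"
```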


> 3) What steps are recommended to backup/offload/recover data?  I am considering
> installing the disks into a different machine, then mounting the array read-
> only, and then pulling a full copy of the data...
>
> 4) What steps should I take to clean up the file system errors/messages?  Start
> fresh after full backup, (though I hate the idea of migrating off a redundant
> array onto a single disk in the process)?  Scrub etc?  Evaluate each disk
> independently and rebuild one from the other?

I think all of these are addressed, but let me know if something isn't clear.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache
  2020-01-06 17:32 Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache Stephen Conrad
  2020-01-06 23:56 ` Chris Murphy
@ 2020-01-07  1:49 ` Remi Gauvin
  1 sibling, 0 replies; 5+ messages in thread
From: Remi Gauvin @ 2020-01-07  1:49 UTC (permalink / raw)
  To: Stephen Conrad, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1833 bytes --]

On 2020-01-06 12:32 p.m., Stephen Conrad wrote:

> 
> 4) What steps should I take to clean up the file system errors/messages?  Start 
> fresh after full backup, (though I hate the idea of migrating off a redundant 
> array onto a single disk in the process)?  Scrub etc?  Evaluate each disk 
> independently and rebuild one from the other?


I'm not really an expert, so this is just my understanding of btrfs so far:

Assuming these two devices are in a btrfs raid1, only one of the two
devices has errors (which is what I would expect for a device that
disconnected while the filesystem was running).

With millions of failed writes, I don't think it's appropriate to count
on scrub to have fixed up all the potential data errors.  Scrub on btrfs
checks the CRC independently on both drives and copies over data where
the CRC fails, but CRC isn't perfect.  To the best of my knowledge,
btrfs does not actually compare the data between mirror copies.

So my suggestion to fix this safely is to do a btrfs replace, replacing
/dev/mapper/K1JG82AD with a new device:


btrfs replace start -r /dev/mapper/K1JG82AD /dev/new_device_partition /btrfs

The -r switch ensures data is only copied from the drive that is not
full of potentially bad data with CRC collisions.

If you determine that /dev/mapper/K1JG82AD is actually physically fine,
you can re-add it to the array... maybe even use it to replace the
spare drive we just put in.

One note about using btrfs replace: I don't know if this has been fixed
in newer kernels, but I find that the error counts from the old device
re-appear on the new device, though only after a reboot (or maybe an
unmount/mount).  You should check for that, and if the new device
inherits the old errors, zero them with "btrfs dev stats -z".

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache
  2020-01-06 23:56 ` Chris Murphy
@ 2020-03-16 20:17   ` Stephen Conrad
  2020-03-17 21:42     ` Chris Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Conrad @ 2020-03-16 20:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

On Monday, January 6, 2020 6:56:19 PM EDT Chris Murphy wrote:
> On Mon, Jan 6, 2020 at 10:32 AM Stephen Conrad <conradsd@gmail.com> wrote:
> > In January 2019 the file system remounted read-only.  I have retained the
> > dmesg logs and notes on the actions I took at that time.  Summary is that
> > I saw a bunch of SATA "link is slow to respond" and "hard resetting link"
> > messages
> This is raid1? I can't tell.
> 

# btrfs filesystem show <UUID>
Label: none  uuid: <UUID>
        Total devices 2 FS bytes used 2.80TiB
        devid    1 size 5.46TiB used 2.80TiB path /dev/mapper/XXX
        devid    2 size 5.46TiB used 2.80TiB path /dev/mapper/YYY
 
# btrfs filesystem df /mnt/raid 
Data, RAID1: total=2.80TiB, used=2.80TiB
System, RAID1: total=32.00MiB, used=416.00KiB
Metadata, RAID1: total=4.00GiB, used=3.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

> Anyway, this is common when drive SCT ERC is a greater duration than
> the SCSI command timer (defaults to 30s). Consumer drives often have
> it disabled and you'd have to dig through specs to find out what
> duration that means, but with HDDs it is often way more than 30s. The
> idea is single drive "deep recovery" - it's better for the drive to
> become slow than to report EIO for a bad sector. That may be specious,
> but for sure it's not a good idea for RAID. First choice is to set SCT
> ERC less than 30s. Common is 70 deciseconds for NAS and enterprise
> drives. Second choice, if SCT ERC cannot be configured, is change the
> SCSI command timer to something like 120-180s, which I know is crazy
> but there you go. More here.
> 
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
> 
> Note that SCT ERC is a drive firmware configuration. Where the SCSI
> command timer is a kernel timer that applies per (whole) block device.
> Both reset to defaults on power cycle. You can use a udev rule for
> either of these (you only need one, again SCT ERC is preferred if
> available). SCT ERC is set by smartctl, and SCSI command timer is set
> via sysfs so you can just echo the value you want.
> 

These are WD Red Pro drives.  At the time I set things up (Fall 2017)
they were not fully supported in smartmontools, and while I was aware of
the importance of SCT ERC I was unable to see or change the settings.
Based on some internet hearsay I proceeded on the assumption that the
default would be acceptable.

Now (2020 / Debian 10) they are fully supported by smartmontools:
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-8-amd64] (local build)
Model Family:     Western Digital Red Pro
Device Model:     WDC WD6002FFWX-68TZ4N0
Firmware Version: 83.H0A83
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

# smartctl -l scterc /dev/disk/by-id/ata-WDC_WD6002FFWX-68TZ4N0_...
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

So at least now both disks look like they default to 7 seconds.  Since
this is a firmware setting, I presume they have always been at 7 seconds.

# cat /sys/block/sdb/device/timeout 
30
# cat /sys/block/sdc/device/timeout 
30

So all of my timing looks like it is (and probably always has been) good...
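
For completeness, the comparison I'm relying on can be sketched as a
tiny shell check (the helper name and the idea of feeding it the two
values read above are mine, not from any existing tool):

```shell
#!/bin/sh
# timeout_ok ERC_DECISECONDS KERNEL_TIMEOUT_SECONDS
# Prints "ok" when the drive gives up (SCT ERC) before the kernel
# command timer fires, "MISMATCH" otherwise.  An ERC value of 0 means
# error recovery control is disabled, which is always a mismatch risk.
timeout_ok() {
    erc_ds=$1
    kernel_s=$2
    if [ "$erc_ds" -eq 0 ]; then
        echo MISMATCH
        return
    fi
    # compare in deciseconds to avoid fractional arithmetic
    if [ "$erc_ds" -lt $((kernel_s * 10)) ]; then
        echo ok
    else
        echo MISMATCH
    fi
}

# Values from the output above: ERC 70 deciseconds vs. a 30s kernel timer.
timeout_ok 70 30   # prints "ok"
```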

> > followed by BTRFS errors, "BTRFS: Transaction aborted (error -5)", and a
> > stack trace.  I tried a few things (unsuccessfully) and had to power down
> > the hardware.  After a power cycle the file system mounted R/W as normal.
> >   I then ran a btrfs check and btrfs scrub which showed no errors.
> 
> Make sure timeout mismatch doesn't exist, and if you make a change,
> redo the scrub to be sure. You really want the drive to give up as
> fast as possible so Btrfs is informed of the bad read, which will
> include the bad sector's LBA, Btrfs can look up what's there, find a
> good copy from mirror, and write over the bad sector causing it to be
> fixed up.
> 
> > In October 2019 I had a similar event.  Logs show SATA "hard resetting
> > link" for both drives as well as BTRFS "lost page write due to IO error"
> > messages.
> Yeah virtually certain it's a misconfiguration. Very common. Bad
> sectors start out recovering pretty quick, but get slower and slower,
> eventually hitting the 30s SCSI command timer, and then the kernel
> does a link reset. And then these things never get fixed up, because
> when the link is reset, the entire drive's command queue is lost. That
> means no discrete read error, and no LBA for the bad sector which
> Btrfs needs to know in order to find a good copy of that data and
> write it back to the bad sector, thereby fixing it.
> 
> > The logs show "forced readonly", "BTRFS: Transaction aborted (error -5)"
> > and a stack trace.  After hard power cycle I ran a "btrfs check -p" which
> > resulted in a stream messages like "parent transid verify failed on
> > 2418006753280 wanted 34457 found 30647" and then the following:
> > parent transid verify failed on 2417279598592 wanted 35322 found 29823
> 
> Yuck. That's a lot of missing generations. I have no idea how that's
> possible, especially if this is raid1 (or DUP) metadata.
> 
> > # btrfs dev stat /mnt/
> > [/dev/mapper/K1JG82AD].write_io_errs   15799831
> > [/dev/mapper/K1JG82AD].read_io_errs    15764242
> > [/dev/mapper/K1JG82AD].flush_io_errs   4385
> 
> That is incredibly bad. That drive has dropped many millions of
> writes, and the flush errors could mean this drive is failing to
> adhere to proper write ordering. Maybe Qu can tell us.
> 
> > At this point I was still seeing occasional log entries for "parent
> > transid
> > verify failed" and "read error corrected" so I decided to upgrade from
> > Debian9 to Debian10 to get more current tools.  Running a scrub with
> > Debian10 tools I saw errors detected and corrected... I also saw sata
> > link issues during the scrub...
> 
> Right. You'll also find this discussed a ton on the linux-raid@ list
> for the very same reason, it's not unique to Btrfs.
> 
> > # date
> > Mon 09 Dec 2019 10:29:05 PM EST
> > # btrfs scrub start -B -d /dev/disk/by-uuid/X
> > scrub device /dev/mapper/K1JG82AD (id 1) done
> > 
> >         scrub started at Sun Dec  8 23:06:59 2019 and finished after
> >         05:46:26
> >         total bytes scrubbed: 2.80TiB with 9490467 errors
> >         error details: verify=1349 csum=9489118
> > 
> >         corrected errors: 9490467, uncorrectable errors: 0, unverified 
errors:
> > 0
> 
> Cool!
> 
> BTW you'll want to reset those stats with -z, so that you can tell if
> these errors start happening again.
> 
> > 1) How should I interpret these errors?  Seems that btrfs messages are
> > telling me that there are an abundance of errors everywhere, but that
> > they are all correctable...  Should I panic?  Should I proceed?
> 
> Never panic. Often results in misunderstanding the problem and bad
> decisions.
> 
> The drive with all the errors has a bunch of bad sectors, maybe by now
> they've been repaired, but it's not clear until you ensure the timeout
> mismatch is addressed. Then the scrub will be reliable, even if there
> are stalls on bad sectors that take a damn long time to either
> successfully read or fail with the proper error.
> 
> What you really want are consumer drives with configurable SCT ERC; or
> get "NAS" drives which already have this set to a default of 70
> deciseconds.
> 

As I just added above, these are WD Red Pro drives.

> > 2) Is my file system broken? Is my data corrupted?  Should I be able to
> > scrub etc to get back to operation without scary log messages? Can I
> > trust data that I copy out now, or need to fall back on old/incomplete
> > backups?
> 
> Btrfs will EIO rather than propagate corrupt data to user space. You
> can trust the data you copy out. This assumes all data has checksums.
> If you're using nocow for anything, then there can be silent
> corruption.
> 
> > 3) What steps are recommended to backup/offload/recover data?  I am
> > considering installing the disks into a different machine, then mounting
> > the array read- only, and then pulling a full copy of the data...
> > 
> > 4) What steps should I take to clean up the file system errors/messages? 
> > Start fresh after full backup, (though I hate the idea of migrating off a
> > redundant array onto a single disk in the process)?  Scrub etc?  Evaluate
> > each disk independently and rebuild one from the other?
> 
> I think all of these are addressed, but let me know if something isn't
> clear.

I have pulled the drives from the original server and installed them in a 
different workstation.  On that system I mounted the array read-only 
(deliberately) and made a new additional backup copy of all the files.  Neither 
drive had any SMART errors logged and both disks passed a long self test.

At that point I ran a btrfs check, and step "[2/7] checking extents" logs a
number of errors.  I am seeing four errors like "space cache generation (xxx)
does not match inode (yyy)", twelve errors like "csum mismatch on free space
cache", and each of those errors is followed by a "failed to load free space
cache for block group ZZZZ".  I then mounted read/write and ran a scrub, which
reported zero errors.  Re-running the btrfs check, however, shows the same
errors as before.

In this entire process neither disk had any more link reset errors while
installed in the new (different) system, and the error counts shown by "btrfs
dev stat" remained constant.  I also note that the SMART data for these drives
shows a much higher "Power_Cycle_Count" than expected for the server, so either
the link errors were resulting in disk power cycles, or perhaps the disk was
power cycling and causing link resets.  In any case it seems that my original
problem is hardware related (power supply/firmware/chipset/etc.).  Seems my
disks are good but I still have residual file system trouble.

At this point what are my options to correct the remaining errors shown in 
btrfs check?  Then what steps can I take to verify my data integrity and gain 
confidence that everything is OK?

In order to debug the hardware on the original server, any ideas on generating 
a lot of disk activity to try to test for link resets without risking any 
data?
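
One idea I had is simple repeated sequential reads of the raw devices
(read-only, so no risk to the data); a rough sketch, where the device
path is obviously just an example for my system:

```shell
#!/bin/sh
# read_pass DEVICE: one full sequential read of DEVICE, data discarded.
# Reading the raw block device is safe for the filesystem on it.
read_pass() {
    dd if="$1" of=/dev/null bs=1M status=none
}

# Example: hammer the suspect disk with several passes while watching
# "dmesg -w" in another terminal for "hard resetting link" messages.
#   for i in 1 2 3 4 5; do read_pass /dev/sdb; done
```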

Thanks




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache
  2020-03-16 20:17   ` Stephen Conrad
@ 2020-03-17 21:42     ` Chris Murphy
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Murphy @ 2020-03-17 21:42 UTC (permalink / raw)
  To: Stephen Conrad; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

From the original email I see: btrfs-progs v4.20.1

Can you use something newer to do a btrfs check? v5.4.1 is current but
I think v5.3 is probably adequate. Whereas btrfs-progs 4.20 is a year
old. Please post the entire output of the btrfs check (without using
repair).

---
Chris Murphy

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-03-17 21:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
2020-01-06 17:32 Errors after SATA hard resets: parent transid verify failed, csum mismatch on free space cache Stephen Conrad
2020-01-06 23:56 ` Chris Murphy
2020-03-16 20:17   ` Stephen Conrad
2020-03-17 21:42     ` Chris Murphy
2020-01-07  1:49 ` Remi Gauvin
