linux-btrfs.vger.kernel.org archive mirror
* Unmountable degraded BTRFS RAID6 filesystem
@ 2019-09-03 22:20 Edmund Urbani
  2019-09-03 23:30 ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Edmund Urbani @ 2019-09-03 22:20 UTC (permalink / raw)
  To: linux-btrfs


Hi all,

two days ago my btrfs filesystem became quite slow and the logs showed a
lot of I/O errors on one of the HDDs. I ordered a replacement drive and
tried to remove the failing drive from the filesystem (btrfs device
remove). That removal command did not finish but just sat there without
any output.

Today the new drive arrived. Device removal still had not finished, but
the filesystem had entered read-only mode last night. I shut down the
system to replace the defective drive. However, after the reboot I am no
longer able to mount the filesystem at all or recover any data from it. :(

*****
uname -a

Linux phoenix 4.14.78-gentoo #1 SMP Mon Dec 3 09:25:24 CET 2018 x86_64
AMD Opteron(tm) Processor 6174 AuthenticAMD GNU/Linux

*****
btrfs --version

btrfs-progs v4.19

*****
btrfs fi show

warning, device 8 is missing
warning, device 8 is missing
checksum verify failed on 71133554540544 found B52922D9 wanted C8FB97CF
checksum verify failed on 71133554540544 found 9820D207 wanted 189B50C0
checksum verify failed on 71133554540544 found 9820D207 wanted 189B50C0
bad tree block 71133554540544, bytenr mismatch, want=71133554540544,
have=7227596181724576485
ERROR: cannot read chunk root
Label: none uuid: 108df6ea-2846-4a88-8a50-61aedeef92b4
Total devices 10 FS bytes used 14.71TiB
devid 1 size 2.73TiB used 2.04TiB path /dev/sdg1
devid 2 size 2.73TiB used 2.04TiB path /dev/sdh1
devid 3 size 2.73TiB used 2.04TiB path /dev/sdj1
devid 4 size 2.73TiB used 2.04TiB path /dev/sdi1
devid 5 size 2.73TiB used 2.04TiB path /dev/sde1
devid 6 size 2.73TiB used 2.04TiB path /dev/sdf1
devid 7 size 2.73TiB used 2.04TiB path /dev/sda1
devid 9 size 2.73TiB used 2.04TiB path /dev/sdc1
devid 10 size 2.73TiB used 2.04TiB path /dev/sdd1
*** Some devices missing

*****
dmesg (after attempting mount with -o degraded)

...
[ 8904.358084] BTRFS info (device sda1): turning on discard
[ 8904.358088] BTRFS info (device sda1): allowing degraded mounts
[ 8904.358089] BTRFS info (device sda1): disk space caching is enabled
[ 8904.358091] BTRFS info (device sda1): has skinny extents
[ 8904.361743] BTRFS warning (device sda1): devid 8 uuid
0e8b4aff-6d64-4d31-a135-705421928f94 is missing
[ 8905.705036] BTRFS info (device sda1): bdev (null) errs: wr 0, rd
14809, flush 0, corrupt 4, gen 0
[ 8905.705041] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
4, flush 0, corrupt 0, gen 0
[ 8905.705052] BTRFS info (device sda1): bdev /dev/sdf1 errs: wr 0, rd
10543, flush 0, corrupt 0, gen 0
[ 8905.705062] BTRFS info (device sda1): bdev /dev/sdc1 errs: wr 0, rd
8, flush 0, corrupt 0, gen 0
[ 8909.565118] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.565978] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.567462] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.568439] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.569861] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.570695] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.572146] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.572969] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.574175] BTRFS error (device sda1): bad tree block start
12170572967447269873 34958581399552
[ 8909.574189] BTRFS error (device sda1): failed to read block groups: -5
[ 8909.635991] BTRFS error (device sda1): open_ctree failed

*****
btrfs check /dev/sda1

Opening filesystem to check...
warning, device 8 is missing
warning, device 8 is missing
checksum verify failed on 71133554540544 found B52922D9 wanted C8FB97CF
checksum verify failed on 71133554540544 found 9820D207 wanted 189B50C0
checksum verify failed on 71133554540544 found 9820D207 wanted 189B50C0
bad tree block 71133554540544, bytenr mismatch, want=71133554540544,
have=7227596181724576485
ERROR: cannot read chunk root
ERROR: cannot open file system

*****

I have tried all the mount / restore options listed here:
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490

... and all I keep getting is "bad tree block" errors. Superblocks seem
fine (btrfs rescue super-recover found no problem). I am considering
trying "btrfs rescue chunk-recover" at this point.

Could this help in my situation? What do you think?

Kind regards
 Edmund





* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-03 22:20 Unmountable degraded BTRFS RAID6 filesystem Edmund Urbani
@ 2019-09-03 23:30 ` Chris Murphy
  2019-09-04  4:39   ` Edmund Urbani
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2019-09-03 23:30 UTC (permalink / raw)
  To: Edmund Urbani, Qu Wenruo; +Cc: Btrfs BTRFS

On Tue, Sep 3, 2019 at 2:20 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>
>
> Hi all,
>
> two days ago my btrfs filesystem became quite slow and the logs showed a
> lot of I/O errors on one of the HDDs. I ordered a replacement drive and
> tried to remove the failing drive from the filesystem (btrfs device
> remove). That removal command did not finish but just sat there without
> any output.

What exact commands?

'btrfs device del missing' I expect causes reconstruction from parity
as well as a balance to create the new 9 device stripe width (well, 7
data + 2 parity). This is not an inherently bad thing to do, it should
work and should be COW. And there's one extra copy available in case
of an unrecoverable read error, it can still do additional
reconstruction.
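
The stripe arithmetic above can be sketched in a few lines of Python.
This is a simplified model, assuming every RAID6 chunk is striped across
all devices present and always carries two parity stripes. The function
name raid6_layout is made up for illustration and is not part of
btrfs-progs.

```python
def raid6_layout(num_devices, parity_stripes=2):
    """Split a full-width RAID6 stripe into (data, parity) stripe counts.

    Assumption: each chunk spans all devices, as in the 10-device
    filesystem discussed in this thread.
    """
    if num_devices <= parity_stripes:
        raise ValueError("need more devices than parity stripes")
    return num_devices - parity_stripes, parity_stripes


before = raid6_layout(10)  # all ten devices present
after = raid6_layout(9)    # after one device has been removed
print("before removal: %d data + %d parity" % before)
print("after removal:  %d data + %d parity" % after)
```

Under these assumptions, 10 devices give 8 data + 2 parity, and 9
devices give the 7 data + 2 parity layout mentioned above.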

Because it's a balance though, it might be really, really slow and I
don't think there is any way to cancel device removal. I don't think
it's possible to cancel it with btrfs balance stop.

How many subvolumes and snapshots? Are quotas enabled?


> Today the new drive arrived. Device removal still had not finished, but
> the filesystem had entered read-only mode last night.

Likely a pre-existing problem was discovered during the balance, or a bug
was triggered, or both, and the file system went read-only to avoid
further corruption. Do you have kernel messages for the entire time
starting at 'device delete' until the file system goes read only?

> Linux phoenix 4.14.78-gentoo #1 SMP Mon Dec 3 09:25:24 CET 2018 x86_64

Kernel 4.14.141 is the current LTS version for that series, and there
are hundreds of bug fix insertions/removals between just those two
versions:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v4.14.141&id2=v4.14.78&dt=2

between kernel 4.14.141 and 5.2.11, there are thousands of changes
just in Btrfs... thousands
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v5.2.11&id2=v4.14.141&dt=2

And quite a few in raid56.c which isn't that big to begin with, but
there are a lot of simplifications and improvements from what I can
tell
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/fs/btrfs/raid56.c?id=v5.2.11&id2=v4.14.141

Anyway, it's worth trying to mount with 5.2.11 using '-o
ro,degraded' and at least see if it will mount. But it gives you some
idea why there's a strong bias toward using newer kernels. It's too
hard to remember all the changes, even for developers.



> AMD Opteron(tm) Processor 6174 AuthenticAMD GNU/Linux
>
> *****
> btrfs --version
>
> btrfs-progs v4.19

This is OK, but the change log will show lots of bug fixes here too. I
wouldn't make changes (no repair attempts at all, including chunk
recover or --repair) until you get some dev advice about the next
step.


> [ 8904.358084] BTRFS info (device sda1): turning on discard

Unexpected.

> [ 8904.358088] BTRFS info (device sda1): allowing degraded mounts
> [ 8904.358089] BTRFS info (device sda1): disk space caching is enabled
> [ 8904.358091] BTRFS info (device sda1): has skinny extents
> [ 8904.361743] BTRFS warning (device sda1): devid 8 uuid
> 0e8b4aff-6d64-4d31-a135-705421928f94 is missing
> [ 8905.705036] BTRFS info (device sda1): bdev (null) errs: wr 0, rd
> 14809, flush 0, corrupt 4, gen 0
> [ 8905.705041] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
> 4, flush 0, corrupt 0, gen 0
> [ 8905.705052] BTRFS info (device sda1): bdev /dev/sdf1 errs: wr 0, rd
> 10543, flush 0, corrupt 0, gen 0
> [ 8905.705062] BTRFS info (device sda1): bdev /dev/sdc1 errs: wr 0, rd
> 8, flush 0, corrupt 0, gen 0

four devices with read errors

When was the last time the volume was scrubbed? Do you know for sure
these errors have not gone up at all since the last successful scrub?
And were any errors reported for that last scrub?
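
To compare these counters against a later run (for instance after the
next scrub), the "bdev ... errs" lines can be extracted mechanically.
Below is a rough Python sketch. The regular expression is based only on
the four lines quoted above, and parse_dev_stats is a hypothetical
helper name, not an existing tool. It expects unquoted log text and
tolerates the line wrapping introduced by mail clients.

```python
import re

# Pattern derived from the four "bdev ... errs" lines quoted in this thread.
# \s+ also matches the newline where a mail client wrapped the line.
BDEV_RE = re.compile(
    r"bdev\s+(?P<dev>\S+)\s+errs:\s+wr\s+(?P<wr>\d+),\s+rd\s+(?P<rd>\d+),"
    r"\s+flush\s+(?P<flush>\d+),\s+corrupt\s+(?P<corrupt>\d+),"
    r"\s+gen\s+(?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_dev_stats(log_text):
    """Return {device: {counter: value}} for every bdev line found."""
    stats = {}
    for match in BDEV_RE.finditer(log_text):
        stats[match.group("dev")] = {k: int(match.group(k)) for k in COUNTERS}
    return stats


SAMPLE = """\
[ 8905.705036] BTRFS info (device sda1): bdev (null) errs: wr 0, rd
14809, flush 0, corrupt 4, gen 0
[ 8905.705041] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
4, flush 0, corrupt 0, gen 0
[ 8905.705052] BTRFS info (device sda1): bdev /dev/sdf1 errs: wr 0, rd
10543, flush 0, corrupt 0, gen 0
[ 8905.705062] BTRFS info (device sda1): bdev /dev/sdc1 errs: wr 0, rd
8, flush 0, corrupt 0, gen 0
"""

stats = parse_dev_stats(SAMPLE)
for dev, counters in sorted(stats.items()):
    print(dev, counters)
```

Saving the parsed dictionary from each run makes it easy to see whether
any counter has grown between two points in time.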


> I have tried all the mount / restore options listed here:
> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490

Good. Stick with ro attempts for now. Including if you want to try a
newer kernel. If it mounts ro successfully, my advice is to update
backups so at least critical information isn't lost. Back up while you
can. Any repair attempt makes changes that will risk the data being
permanently lost. So it's important to be really deliberate about any
changes.


> ... and all I keep getting is "bad tree block" errors. Superblocks seem
> fine (btrfs rescue super-recover found no problem). I am considering
> trying "btrfs rescue chunk-recover" at this point.
>
> Could this help in my situation? What do you think?

I'm not sure if chunk recover can work on degraded volumes. Your best
bet is to not make any further changes to the volume itself.

Preserve all logs.

-- 
Chris Murphy


* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-03 23:30 ` Chris Murphy
@ 2019-09-04  4:39   ` Edmund Urbani
  2019-09-04  5:36     ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Edmund Urbani @ 2019-09-04  4:39 UTC (permalink / raw)
  To: linux-btrfs



On 09/04/2019 01:30 AM, Chris Murphy wrote:
> On Tue, Sep 3, 2019 at 2:20 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>>
>> Hi all,
>>
>> two days ago my btrfs filesystem became quite slow and the logs showed a
>> lot of I/O errors on one of the HDDs. I ordered a replacement drive and
>> tried to remove the failing drive from the filesystem (btrfs device
>> remove). That removal command did not finish but just sat there without
>> any output.
> What exact commands?
btrfs device delete /dev/sdb1 /mnt/shared
>
> 'btrfs device del missing' I expect causes reconstruction from parity
> as well as a balance to create the new 9 device stripe width (well, 7
> data + 2 parity). This is not an inherently bad thing to do, it should
> work and should be COW. And there's one extra copy available in case
> of an unrecoverable read error, it can still do additional
> reconstruction.
>
> Because it's a balance though, it might be really, really slow and I
> don't think there is any way to cancel device removal. I don't think
> it's possible to cancel it with btrfs balance stop.
>
> How many subvolumes and snapshots? Are quotas enabled?
One subvolume, no snapshot (I used to have some, but I think I removed
them all), no quotas.
>
>
>> Today the new drive arrived. Device removal still had not finished, but
>> the filesystem had entered read-only mode last night.
> Likely pre-existing problem is discovered during the balance, or bug
> triggered, or both, and the file system goes read only to avoid
> further corruption. Do you have kernel messages for the entire time
> starting at 'device delete' until the file system goes read only?
Well, I found this in the logs and that might be around the time I
started device removal:

Sep  1 16:19:30 phoenix kernel: ------------[ cut here ]------------
Sep  1 16:19:30 phoenix kernel: WARNING: CPU: 9 PID: 6401 at
fs/btrfs/ctree.h:1564 btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: Modules linked in: nfnetlink_queue
nfnetlink_log nvidia_uvm(O) nvidia(PO) tun nfsd auth_rpcgss oid_registry
nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink
nfnetlink xfrm_user xt_addrtype br_netfilter bridge stp llc
nf_conntrack_irc xt_CT xt_tcpudp xt_helper nf_conntrack_ftp nf_log_ipv4
nf_log_common ip6table_raw ip6table_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT
nf_reject_ipv4 iptable_raw iptable_mangle xt_multiport xt_state xt_limit
xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter
ip_tables x_tables binfmt_misc
snd_hda_codec_hdmi ata_generic kvm_amd kvm snd_hda_intel
irqbypass ftdi_sio snd_hda_codec usbserial snd_hda_core pcspkr ipmi_si
snd_pcm i2c_piix4 k10temp snd_timer pata_acpi ohci_pci
Sep  1 16:19:30 phoenix kernel:  snd e1000e nvidiafb vgastate shpchp
evdev xts crypto_simd cryptd glue_helper aes_x86_64 ixgb ixgbe tulip
cxgb3 cxgb mdio bonding vxlan ip6_udp_tunnel udp_tunnel macvlan tg3
libphy sky2 r8169 pcnet32 mii igb ptp pps_core dca e1000 bnx2 msdos fat
fscrypto configfs overlay fuse nfs lockd grace sunrpc fscache btrfs
zstd_decompress zstd_compress xxhash zlib_deflate dm_thin_pool
dm_persistent_data dm_bio_prison hid_sunplus hid_sony hid_samsung hid_pl
hid_petalynx hid_logitech_dj hid_gyration sl811_hcd usbhid xhci_pci
xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd qla2xxx
megaraid_sas megaraid aacraid
sx8 DAC960 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc
scsi_transport_fc mptspi mptscsih mptbase atp870u dc395x qla1280 imm
parport dmx3191d sym53c8xx gdth BusLogic aic7xxx aic79xx
Sep  1 16:19:30 phoenix kernel:  scsi_transport_spi sr_mod cdrom sg
pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti
pata_amd pata_ali pata_it8213 pata_ns87415 pata_ns87410 pata_serverworks
pata_cypress pata_oldpiix pata_artop pata_it821x pata_optidma
pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix [last
unloaded: nvidia]
Sep  1 16:19:30 phoenix kernel: CPU: 9 PID: 6401 Comm: btrfs Tainted:
P        W  O    4.14.78-gentoo #1
Sep  1 16:19:30 phoenix kernel: Hardware name: System manufacturer
System Product Name/KGP(M)E-D16, BIOS 2202    03/29/2012
Sep  1 16:19:30 phoenix kernel: task: ffff880090b7c700 task.stack:
ffffc9000b394000
Sep  1 16:19:30 phoenix kernel: RIP:
0010:btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: RSP: 0018:ffffc9000b397b98 EFLAGS: 00010206
Sep  1 16:19:30 phoenix kernel: RAX: 0000000000000fff RBX:
ffff88035d1b4690 RCX: 000002baa1371e00
Sep  1 16:19:30 phoenix kernel: RDX: 0000000000001000 RSI:
ffff880000000000 RDI: ffff88031455fd20
Sep  1 16:19:30 phoenix kernel: RBP: ffff88022c53e400 R08:
ffffc9000b397b50 R09: ffffc9000b397b58
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000003 R11:
0000000000003000 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000003cf0 R14:
ffff88031455fd20 R15: ffff880826250348
Sep  1 16:19:30 phoenix kernel: FS:  00007f16f8e4e8c0(0000)
GS:ffff88042f980000(0000) knlGS:0000000000000000
Sep  1 16:19:30 phoenix kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep  1 16:19:30 phoenix kernel: CR2: 00007f0c0564b180 CR3:
0000000438a32000 CR4: 00000000000006e0
Sep  1 16:19:30 phoenix kernel: Call Trace:
Sep  1 16:19:30 phoenix kernel:  btrfs_remove_chunk+0x2f9/0x700 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_relocate_chunk+0x9c/0xd0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_shrink_device+0x1c0/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ?
btrfs_find_device_missing_or_by_path+0x30/0x120 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_rm_device+0x19b/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_from_user+0x3f/0x80
Sep  1 16:19:30 phoenix kernel:  btrfs_ioctl+0x2129/0x2380 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_to_user+0x22/0x30
Sep  1 16:19:30 phoenix kernel:  ? cp_new_stat+0x138/0x150
Sep  1 16:19:30 phoenix kernel:  do_vfs_ioctl+0x9b/0x5e0
Sep  1 16:19:30 phoenix kernel:  ? SyS_newstat+0x35/0x40
Sep  1 16:19:30 phoenix kernel:  SyS_ioctl+0x47/0x90
Sep  1 16:19:30 phoenix kernel:  do_syscall_64+0x55/0x100
Sep  1 16:19:30 phoenix kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Sep  1 16:19:30 phoenix kernel: RIP: 0033:0x7f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RSP: 002b:00007ffd701d26e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Sep  1 16:19:30 phoenix kernel: RAX: ffffffffffffffda RBX:
00007ffd701d4880 RCX: 00007f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RDX: 00007ffd701d3720 RSI:
000000005000943a RDI: 0000000000000003
Sep  1 16:19:30 phoenix kernel: RBP: 00007ffd701d3720 R08:
0000000000000000 R09: 000000000000000c
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000572 R11:
0000000000000206 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000000000 R14:
0000000000000003 R15: 00007ffd701d4888
Sep  1 16:19:30 phoenix kernel: Code: 4c 89 f7 45 31 c0 ba 10 00 00 00
4c 89 ee e8 9a 40 ff ff 4c 89 f7 e8 42 29 fd ff e9 de fe ff ff 41 bc f4
ff ff ff e9 db fe ff ff <0f> 0b eb b7 0f 1f 40 00 66 2e 0f 1f 84 00 00
00 00 00 53 31 d2
Sep  1 16:19:30 phoenix kernel: ---[ end trace edd626af3a502d93 ]---
Sep  1 16:19:30 phoenix kernel: ------------[ cut here ]------------
Sep  1 16:19:30 phoenix kernel: WARNING: CPU: 10 PID: 6401 at
fs/btrfs/ctree.h:1564 btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: Modules linked in: nfnetlink_queue
nfnetlink_log nvidia_uvm(O) nvidia(PO) tun nfsd auth_rpcgss oid_registry
nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink
nfnetlink xfrm_user xt_addrtype br_netfilter bridge stp llc
nf_conntrack_irc xt_CT xt_tcpudp xt_helper nf_conntrack_ftp nf_log_ipv4
nf_log_common ip6table_raw ip6table_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT
nf_reject_ipv4 iptable_raw iptable_mangle xt_multiport xt_state xt_limit
xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter
ip_tables x_tables binfmt_misc
snd_hda_codec_hdmi ata_generic kvm_amd kvm snd_hda_intel
irqbypass ftdi_sio snd_hda_codec usbserial snd_hda_core pcspkr ipmi_si
snd_pcm i2c_piix4 k10temp snd_timer pata_acpi ohci_pci
Sep  1 16:19:30 phoenix kernel:  snd e1000e nvidiafb vgastate shpchp
evdev xts crypto_simd cryptd glue_helper aes_x86_64 ixgb ixgbe tulip
cxgb3 cxgb mdio bonding vxlan ip6_udp_tunnel udp_tunnel macvlan tg3
libphy sky2 r8169 pcnet32 mii igb ptp pps_core dca e1000 bnx2 msdos fat
fscrypto configfs overlay fuse nfs lockd grace sunrpc fscache btrfs
zstd_decompress zstd_compress xxhash zlib_deflate dm_thin_pool
dm_persistent_data dm_bio_prison hid_sunplus hid_sony hid_samsung hid_pl
hid_petalynx hid_logitech_dj hid_gyration sl811_hcd usbhid xhci_pci
xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd qla2xxx
megaraid_sas megaraid aacraid
sx8 DAC960 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc
scsi_transport_fc mptspi mptscsih mptbase atp870u dc395x qla1280 imm
parport dmx3191d sym53c8xx gdth BusLogic aic7xxx aic79xx
Sep  1 16:19:30 phoenix kernel:  scsi_transport_spi sr_mod cdrom sg
pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti
pata_amd pata_ali pata_it8213 pata_ns87415 pata_ns87410 pata_serverworks
pata_cypress pata_oldpiix pata_artop pata_it821x pata_optidma
pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix [last
unloaded: nvidia]
Sep  1 16:19:30 phoenix kernel: CPU: 10 PID: 6401 Comm: btrfs Tainted:
P        W  O    4.14.78-gentoo #1
Sep  1 16:19:30 phoenix kernel: Hardware name: System manufacturer
System Product Name/KGP(M)E-D16, BIOS 2202    03/29/2012
Sep  1 16:19:30 phoenix kernel: task: ffff880090b7c700 task.stack:
ffffc9000b394000
Sep  1 16:19:30 phoenix kernel: RIP:
0010:btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: RSP: 0018:ffffc9000b397b98 EFLAGS: 00010206
Sep  1 16:19:30 phoenix kernel: RAX: 0000000000000fff RBX:
ffff88035d1b4690 RCX: 000002baa1371e00
Sep  1 16:19:30 phoenix kernel: RDX: 0000000000001000 RSI:
ffff880000000000 RDI: ffff88031455fd20
Sep  1 16:19:30 phoenix kernel: RBP: ffff8808261adc00 R08:
ffffc9000b397b50 R09: ffffc9000b397b58
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000003 R11:
0000000000003000 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000003e16 R14:
ffff88031455fd20 R15: ffff880826250348
Sep  1 16:19:30 phoenix kernel: FS:  00007f16f8e4e8c0(0000)
GS:ffff88042fa00000(0000) knlGS:0000000000000000
Sep  1 16:19:30 phoenix kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep  1 16:19:30 phoenix kernel: CR2: 00007f0c0564b180 CR3:
0000000438a32000 CR4: 00000000000006e0
Sep  1 16:19:30 phoenix kernel: Call Trace:
Sep  1 16:19:30 phoenix kernel:  btrfs_remove_chunk+0x2f9/0x700 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_relocate_chunk+0x9c/0xd0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_shrink_device+0x1c0/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ?
btrfs_find_device_missing_or_by_path+0x30/0x120 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_rm_device+0x19b/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_from_user+0x3f/0x80
Sep  1 16:19:30 phoenix kernel:  btrfs_ioctl+0x2129/0x2380 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_to_user+0x22/0x30
Sep  1 16:19:30 phoenix kernel:  ? cp_new_stat+0x138/0x150
Sep  1 16:19:30 phoenix kernel:  do_vfs_ioctl+0x9b/0x5e0
Sep  1 16:19:30 phoenix kernel:  ? SyS_newstat+0x35/0x40
Sep  1 16:19:30 phoenix kernel:  SyS_ioctl+0x47/0x90
Sep  1 16:19:30 phoenix kernel:  do_syscall_64+0x55/0x100
Sep  1 16:19:30 phoenix kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Sep  1 16:19:30 phoenix kernel: RIP: 0033:0x7f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RSP: 002b:00007ffd701d26e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Sep  1 16:19:30 phoenix kernel: RAX: ffffffffffffffda RBX:
00007ffd701d4880 RCX: 00007f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RDX: 00007ffd701d3720 RSI:
000000005000943a RDI: 0000000000000003
Sep  1 16:19:30 phoenix kernel: RBP: 00007ffd701d3720 R08:
0000000000000000 R09: 000000000000000c
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000572 R11:
0000000000000206 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000000000 R14:
0000000000000003 R15: 00007ffd701d4888
Sep  1 16:19:30 phoenix kernel: Code: 4c 89 f7 45 31 c0 ba 10 00 00 00
4c 89 ee e8 9a 40 ff ff 4c 89 f7 e8 42 29 fd ff e9 de fe ff ff 41 bc f4
ff ff ff e9 db fe ff ff <0f> 0b eb b7 0f 1f 40 00 66 2e 0f 1f 84 00 00
00 00 00 53 31 d2
Sep  1 16:19:30 phoenix kernel: ---[ end trace edd626af3a502d94 ]---
Sep  1 16:19:30 phoenix kernel: ------------[ cut here ]------------
Sep  1 16:19:30 phoenix kernel: WARNING: CPU: 11 PID: 6401 at
fs/btrfs/ctree.h:1564 btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: Modules linked in: nfnetlink_queue
nfnetlink_log nvidia_uvm(O) nvidia(PO) tun nfsd auth_rpcgss oid_registry
nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink
nfnetlink xfrm_user xt_addrtype br_netfilter bridge stp llc
nf_conntrack_irc xt_CT xt_tcpudp xt_helper nf_conntrack_ftp nf_log_ipv4
nf_log_common ip6table_raw ip6table_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT
nf_reject_ipv4 iptable_raw iptable_mangle xt_multiport xt_state xt_limit
xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter
ip_tables x_tables binfmt_misc snd_hda_codec_hdmi ata_generic kvm_amd
kvm snd_hda_intel irqbypass ftdi_sio snd_hda_codec usbserial
snd_hda_core pcspkr ipmi_si snd_pcm i2c_piix4 k10temp snd_timer
pata_acpi ohci_pci
Sep  1 16:19:30 phoenix kernel:  snd e1000e nvidiafb vgastate shpchp
evdev xts crypto_simd cryptd glue_helper aes_x86_64 ixgb ixgbe tulip
cxgb3 cxgb mdio bonding vxlan ip6_udp_tunnel udp_tunnel macvlan tg3
libphy sky2 r8169 pcnet32 mii igb ptp pps_core dca e1000 bnx2 msdos fat
fscrypto configfs overlay fuse nfs lockd grace sunrpc fscache btrfs
zstd_decompress zstd_compress xxhash zlib_deflate dm_thin_pool
dm_persistent_data dm_bio_prison hid_sunplus hid_sony hid_samsung hid_pl
hid_petalynx hid_logitech_dj hid_gyration sl811_hcd usbhid xhci_pci
xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd qla2xxx
megaraid_sas megaraid aacraid sx8 DAC960 3w_9xxx 3w_xxxx mptsas
scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase
atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth BusLogic
aic7xxx aic79xx
Sep  1 16:19:30 phoenix kernel:  scsi_transport_spi sr_mod cdrom sg
pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti
pata_amd pata_ali pata_it8213 pata_ns87415 pata_ns87410 pata_serverworks
pata_cypress pata_oldpiix pata_artop pata_it821x pata_optidma
pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix [last
unloaded: nvidia]
Sep  1 16:19:30 phoenix kernel: CPU: 11 PID: 6401 Comm: btrfs Tainted:
P        W  O    4.14.78-gentoo #1
Sep  1 16:19:30 phoenix kernel: Hardware name: System manufacturer
System Product Name/KGP(M)E-D16, BIOS 2202    03/29/2012
Sep  1 16:19:30 phoenix kernel: task: ffff880090b7c700 task.stack:
ffffc9000b394000
Sep  1 16:19:30 phoenix kernel: RIP:
0010:btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: RSP: 0018:ffffc9000b397b98 EFLAGS: 00010206
Sep  1 16:19:30 phoenix kernel: RAX: 0000000000000fff RBX:
ffff88035d1b4690 RCX: 000002baa1371e00
Sep  1 16:19:30 phoenix kernel: RDX: 0000000000001000 RSI:
ffff880000000000 RDI: ffff88031455fd20
Sep  1 16:19:30 phoenix kernel: RBP: ffff880825590400 R08:
ffffc9000b397b50 R09: ffffc9000b397b58
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000003 R11:
0000000000003000 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000003db4 R14:
ffff88031455fd20 R15: ffff880826250348
Sep  1 16:19:30 phoenix kernel: FS:  00007f16f8e4e8c0(0000)
GS:ffff88042fa80000(0000) knlGS:0000000000000000
Sep  1 16:19:30 phoenix kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep  1 16:19:30 phoenix kernel: CR2: 000056068cac8ec8 CR3:
0000000438a32000 CR4: 00000000000006e0
Sep  1 16:19:30 phoenix kernel: Call Trace:
Sep  1 16:19:30 phoenix kernel:  btrfs_remove_chunk+0x2f9/0x700 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_relocate_chunk+0x9c/0xd0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_shrink_device+0x1c0/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ?
btrfs_find_device_missing_or_by_path+0x30/0x120 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_rm_device+0x19b/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_from_user+0x3f/0x80
Sep  1 16:19:30 phoenix kernel:  btrfs_ioctl+0x2129/0x2380 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_to_user+0x22/0x30
Sep  1 16:19:30 phoenix kernel:  ? cp_new_stat+0x138/0x150
Sep  1 16:19:30 phoenix kernel:  do_vfs_ioctl+0x9b/0x5e0
Sep  1 16:19:30 phoenix kernel:  ? SyS_newstat+0x35/0x40
Sep  1 16:19:30 phoenix kernel:  SyS_ioctl+0x47/0x90
Sep  1 16:19:30 phoenix kernel:  do_syscall_64+0x55/0x100
Sep  1 16:19:30 phoenix kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Sep  1 16:19:30 phoenix kernel: RIP: 0033:0x7f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RSP: 002b:00007ffd701d26e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Sep  1 16:19:30 phoenix kernel: RAX: ffffffffffffffda RBX:
00007ffd701d4880 RCX: 00007f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RDX: 00007ffd701d3720 RSI:
000000005000943a RDI: 0000000000000003
Sep  1 16:19:30 phoenix kernel: RBP: 00007ffd701d3720 R08:
0000000000000000 R09: 000000000000000c
Sep  1 16:19:30 phoenix kernel: R13: 0000000000000000 R14:
0000000000000003 R15: 00007ffd701d4888
Sep  1 16:19:30 phoenix kernel: Code: 4c 89 f7 45 31 c0 ba 10 00 00 00
4c 89 ee e8 9a 40 ff ff 4c 89 f7 e8 42 29 fd ff e9 de fe ff ff 41 bc f4
ff ff ff e9 db fe ff ff <0f> 0b eb b7 0f 1f 40 00 66 2e 0f 1f 84 00 00
00 00 00 53 31 d2
Sep  1 16:19:30 phoenix kernel: ---[ end trace edd626af3a502d95 ]---
Sep  1 16:19:30 phoenix kernel: ------------[ cut here ]------------
Sep  1 16:19:30 phoenix kernel: WARNING: CPU: 6 PID: 6401 at
fs/btrfs/ctree.h:1564 btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: Modules linked in: nfnetlink_queue
nfnetlink_log nvidia_uvm(O) nvidia(PO) tun nfsd auth_rpcgss oid_registry
nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink
nfnetlink xfrm_user xt_addrtype br_netfilter bridge stp llc
nf_conntrack_irc xt_CT xt_tcpudp xt_helper nf_conntrack_ftp nf_log_ipv4
nf_log_common ip6table_raw ip6table_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT
nf_reject_ipv4 iptable_raw iptable_mangle xt_multiport xt_state xt_limit
xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter
ip_tables x_tables binfmt_misc snd_hda_codec_hdmi ata_generic kvm_amd
kvm snd_hda_intel irqbypass ftdi_sio snd_hda_codec usbserial
snd_hda_core pcspkr ipmi_si snd_pcm i2c_piix4 k10temp snd_timer
pata_acpi ohci_pci
Sep  1 16:19:30 phoenix kernel:  snd e1000e nvidiafb vgastate shpchp
evdev xts crypto_simd cryptd glue_helper aes_x86_64 ixgb ixgbe tulip
cxgb3 cxgb mdio bonding vxlan ip6_udp_tunnel udp_tunnel macvlan tg3
libphy sky2 r8169 pcnet32 mii igb ptp pps_core dca e1000 bnx2 msdos fat
fscrypto configfs overlay fuse nfs lockd grace sunrpc fscache btrfs
zstd_decompress zstd_compress xxhash zlib_deflate dm_thin_pool
dm_persistent_data dm_bio_prison hid_sunplus hid_sony hid_samsung hid_pl
hid_petalynx hid_logitech_dj hid_gyration sl811_hcd usbhid xhci_pci
xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd qla2xxx
megaraid_sas megaraid aacraid sx8 DAC960 3w_9xxx 3w_xxxx mptsas
scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase
atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth BusLogic
aic7xxx aic79xx
Sep  1 16:19:30 phoenix kernel:  scsi_transport_spi sr_mod cdrom sg
pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti
pata_amd pata_ali pata_it8213 pata_ns87415 pata_ns87410 pata_serverworks
pata_cypress pata_oldpiix pata_artop pata_it821x pata_optidma
pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix [last
unloaded: nvidia]
Sep  1 16:19:30 phoenix kernel: CPU: 6 PID: 6401 Comm: btrfs Tainted:
P        W  O    4.14.78-gentoo #1
Sep  1 16:19:30 phoenix kernel: Hardware name: System manufacturer
System Product Name/KGP(M)E-D16, BIOS 2202    03/29/2012
Sep  1 16:19:30 phoenix kernel: task: ffff880090b7c700 task.stack:
ffffc9000b394000
Sep  1 16:19:30 phoenix kernel: RIP:
0010:btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: RSP: 0018:ffffc9000b397b98 EFLAGS: 00010206
Sep  1 16:19:30 phoenix kernel: RAX: 0000000000000fff RBX:
ffff88035d1b4690 RCX: 000002baa1371e00
Sep  1 16:19:30 phoenix kernel: RDX: 0000000000001000 RSI:
ffff880000000000 RDI: ffff880250a8a388
Sep  1 16:19:30 phoenix kernel: RBP: ffff880825886400 R08:
ffffc9000b397b50 R09: ffffc9000b397b58
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000003 R11:
0000000000003000 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000003f9e R14:
ffff880250a8a388 R15: ffff880826250348
Sep  1 16:19:30 phoenix kernel: FS:  00007f16f8e4e8c0(0000)
GS:ffff88042f800000(0000) knlGS:0000000000000000
Sep  1 16:19:30 phoenix kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep  1 16:19:30 phoenix kernel: CR2: 00007f0c0564b180 CR3:
0000000438a32000 CR4: 00000000000006e0
Sep  1 16:19:30 phoenix kernel: Call Trace:
Sep  1 16:19:30 phoenix kernel:  btrfs_remove_chunk+0x2f9/0x700 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_relocate_chunk+0x9c/0xd0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_shrink_device+0x1c0/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ?
btrfs_find_device_missing_or_by_path+0x30/0x120 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_rm_device+0x19b/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_from_user+0x3f/0x80
Sep  1 16:19:30 phoenix kernel:  btrfs_ioctl+0x2129/0x2380 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_to_user+0x22/0x30
Sep  1 16:19:30 phoenix kernel:  ? cp_new_stat+0x138/0x150
Sep  1 16:19:30 phoenix kernel:  do_vfs_ioctl+0x9b/0x5e0
Sep  1 16:19:30 phoenix kernel:  ? SyS_newstat+0x35/0x40
Sep  1 16:19:30 phoenix kernel:  SyS_ioctl+0x47/0x90
Sep  1 16:19:30 phoenix kernel:  do_syscall_64+0x55/0x100
Sep  1 16:19:30 phoenix kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Sep  1 16:19:30 phoenix kernel: RIP: 0033:0x7f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RSP: 002b:00007ffd701d26e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Sep  1 16:19:30 phoenix kernel: RAX: ffffffffffffffda RBX:
00007ffd701d4880 RCX: 00007f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RDX: 00007ffd701d3720 RSI:
000000005000943a RDI: 0000000000000003
Sep  1 16:19:30 phoenix kernel: RBP: 00007ffd701d3720 R08:
0000000000000000 R09: 000000000000000c
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000572 R11:
0000000000000206 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000000000 R14:
0000000000000003 R15: 00007ffd701d4888
Sep  1 16:19:30 phoenix kernel: Code: 4c 89 f7 45 31 c0 ba 10 00 00 00
4c 89 ee e8 9a 40 ff ff 4c 89 f7 e8 42 29 fd ff e9 de fe ff ff 41 bc f4
ff ff ff e9 db fe ff ff <0f> 0b eb b7 0f 1f 40 00 66 2e 0f 1f 84 00 00
00 00 00 53 31 d2
Sep  1 16:19:30 phoenix kernel: ---[ end trace edd626af3a502d96 ]---
Sep  1 16:19:30 phoenix kernel: ------------[ cut here ]------------
Sep  1 16:19:30 phoenix kernel: WARNING: CPU: 7 PID: 6401 at
fs/btrfs/ctree.h:1564 btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: Modules linked in: nfnetlink_queue
nfnetlink_log nvidia_uvm(O) nvidia(PO) tun nfsd auth_rpcgss oid_registry
nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink
nfnetlink xfrm_user xt_addrtype br_netfilter bridge stp llc
nf_conntrack_irc xt_CT xt_tcpudp xt_helper nf_conntrack_ftp nf_log_ipv4
nf_log_common ip6table_raw ip6table_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT
nf_reject_ipv4 iptable_raw iptable_mangle xt_multiport xt_state xt_limit
xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter
ip_tables x_tables binfmt_misc snd_hda_codec_hdmi ata_generic kvm_amd
kvm snd_hda_intel irqbypass ftdi_sio snd_hda_codec usbserial
snd_hda_core pcspkr ipmi_si snd_pcm i2c_piix4 k10temp snd_timer
pata_acpi ohci_pci
Sep  1 16:19:30 phoenix kernel:  snd e1000e nvidiafb vgastate shpchp
evdev xts crypto_simd cryptd glue_helper aes_x86_64 ixgb ixgbe tulip
cxgb3 cxgb mdio bonding vxlan ip6_udp_tunnel udp_tunnel macvlan tg3
libphy sky2 r8169 pcnet32 mii igb ptp pps_core dca e1000 bnx2 msdos fat
fscrypto configfs overlay fuse nfs lockd grace sunrpc fscache btrfs
zstd_decompress zstd_compress xxhash zlib_deflate dm_thin_pool
dm_persistent_data dm_bio_prison hid_sunplus hid_sony hid_samsung hid_pl
hid_petalynx hid_logitech_dj hid_gyration sl811_hcd usbhid xhci_pci
xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd qla2xxx
megaraid_sas megaraid aacraid sx8 DAC960 3w_9xxx 3w_xxxx mptsas
scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase
atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth BusLogic
aic7xxx aic79xx
Sep  1 16:19:30 phoenix kernel:  scsi_transport_spi sr_mod cdrom sg
pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti
pata_amd pata_ali pata_it8213 pata_ns87415 pata_ns87410 pata_serverworks
pata_cypress pata_oldpiix pata_artop pata_it821x pata_optidma
pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix [last
unloaded: nvidia]
Sep  1 16:19:30 phoenix kernel: CPU: 7 PID: 6401 Comm: btrfs Tainted:
P        W  O    4.14.78-gentoo #1
Sep  1 16:19:30 phoenix kernel: Hardware name: System manufacturer
System Product Name/KGP(M)E-D16, BIOS 2202    03/29/2012
Sep  1 16:19:30 phoenix kernel: task: ffff880090b7c700 task.stack:
ffffc9000b394000
Sep  1 16:19:30 phoenix kernel: RIP:
0010:btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: RSP: 0018:ffffc9000b397b98 EFLAGS: 00010206
Sep  1 16:19:30 phoenix kernel: RAX: 0000000000000fff RBX:
ffff88035d1b4690 RCX: 000002baa1371e00
Sep  1 16:19:30 phoenix kernel: RDX: 0000000000001000 RSI:
ffff880000000000 RDI: ffff88031455fd20
Sep  1 16:19:30 phoenix kernel: RBP: ffff880825881400 R08:
ffffc9000b397b50 R09: ffffc9000b397b58
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000003 R11:
0000000000003000 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000003c8e R14:
ffff88031455fd20 R15: ffff880826250348
Sep  1 16:19:30 phoenix kernel: FS:  00007f16f8e4e8c0(0000)
GS:ffff88042f880000(0000) knlGS:0000000000000000
Sep  1 16:19:30 phoenix kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep  1 16:19:30 phoenix kernel: CR2: 00007f0bfd599394 CR3:
0000000438a32000 CR4: 00000000000006e0
Sep  1 16:19:30 phoenix kernel: Call Trace:
Sep  1 16:19:30 phoenix kernel:  btrfs_remove_chunk+0x2f9/0x700 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_relocate_chunk+0x9c/0xd0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_shrink_device+0x1c0/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ?
btrfs_find_device_missing_or_by_path+0x30/0x120 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_rm_device+0x19b/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_from_user+0x3f/0x80
Sep  1 16:19:30 phoenix kernel:  btrfs_ioctl+0x2129/0x2380 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_to_user+0x22/0x30
Sep  1 16:19:30 phoenix kernel:  ? cp_new_stat+0x138/0x150
Sep  1 16:19:30 phoenix kernel:  do_vfs_ioctl+0x9b/0x5e0
Sep  1 16:19:30 phoenix kernel:  ? SyS_newstat+0x35/0x40
Sep  1 16:19:30 phoenix kernel:  SyS_ioctl+0x47/0x90
Sep  1 16:19:30 phoenix kernel:  do_syscall_64+0x55/0x100
Sep  1 16:19:30 phoenix kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Sep  1 16:19:30 phoenix kernel: RIP: 0033:0x7f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RSP: 002b:00007ffd701d26e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Sep  1 16:19:30 phoenix kernel: RAX: ffffffffffffffda RBX:
00007ffd701d4880 RCX: 00007f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RDX: 00007ffd701d3720 RSI:
000000005000943a RDI: 0000000000000003
Sep  1 16:19:30 phoenix kernel: RBP: 00007ffd701d3720 R08:
0000000000000000 R09: 000000000000000c
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000572 R11:
0000000000000206 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000000000 R14:
0000000000000003 R15: 00007ffd701d4888
Sep  1 16:19:30 phoenix kernel: Code: 4c 89 f7 45 31 c0 ba 10 00 00 00
4c 89 ee e8 9a 40 ff ff 4c 89 f7 e8 42 29 fd ff e9 de fe ff ff 41 bc f4
ff ff ff e9 db fe ff ff <0f> 0b eb b7 0f 1f 40 00 66 2e 0f 1f 84 00 00
00 00 00 53 31 d2
Sep  1 16:19:30 phoenix kernel: ---[ end trace edd626af3a502d97 ]---
Sep  1 16:19:30 phoenix kernel: ------------[ cut here ]------------
Sep  1 16:19:30 phoenix kernel: WARNING: CPU: 8 PID: 6401 at
fs/btrfs/ctree.h:1564 btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: Modules linked in: nfnetlink_queue
nfnetlink_log nvidia_uvm(O) nvidia(PO) tun nfsd auth_rpcgss oid_registry
nfs_acl ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink
nfnetlink xfrm_user xt_addrtype br_netfilter bridge stp llc
nf_conntrack_irc xt_CT xt_tcpudp xt_helper nf_conntrack_ftp nf_log_ipv4
nf_log_common ip6table_raw ip6table_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT
nf_reject_ipv4 iptable_raw iptable_mangle xt_multiport xt_state xt_limit
xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter
ip_tables x_tables binfmt_misc snd_hda_codec_hdmi ata_generic kvm_amd
kvm snd_hda_intel irqbypass ftdi_sio snd_hda_codec usbserial
snd_hda_core pcspkr ipmi_si snd_pcm i2c_piix4 k10temp snd_timer
pata_acpi ohci_pci
Sep  1 16:19:30 phoenix kernel:  snd e1000e nvidiafb vgastate shpchp
evdev xts crypto_simd cryptd glue_helper aes_x86_64 ixgb ixgbe tulip
cxgb3 cxgb mdio bonding vxlan ip6_udp_tunnel udp_tunnel macvlan tg3
libphy sky2 r8169 pcnet32 mii igb ptp pps_core dca e1000 bnx2 msdos fat
fscrypto configfs overlay fuse nfs lockd grace sunrpc fscache btrfs
zstd_decompress zstd_compress xxhash zlib_deflate dm_thin_pool
dm_persistent_data dm_bio_prison hid_sunplus hid_sony hid_samsung hid_pl
hid_petalynx hid_logitech_dj hid_gyration sl811_hcd usbhid xhci_pci
xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd qla2xxx
megaraid_sas megaraid aacraid sx8 DAC960 3w_9xxx 3w_xxxx mptsas
scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase
atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth BusLogic
aic7xxx aic79xx
Sep  1 16:19:30 phoenix kernel:  scsi_transport_spi sr_mod cdrom sg
pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti
pata_amd pata_ali pata_it8213 pata_ns87415 pata_ns87410 pata_serverworks
pata_cypress pata_oldpiix pata_artop pata_it821x pata_optidma
pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix [last
unloaded: nvidia]
Sep  1 16:19:30 phoenix kernel: CPU: 8 PID: 6401 Comm: btrfs Tainted:
P        W  O    4.14.78-gentoo #1
Sep  1 16:19:30 phoenix kernel: Hardware name: System manufacturer
System Product Name/KGP(M)E-D16, BIOS 2202    03/29/2012
Sep  1 16:19:30 phoenix kernel: task: ffff880090b7c700 task.stack:
ffffc9000b394000
Sep  1 16:19:30 phoenix kernel: RIP:
0010:btrfs_update_device+0x1ae/0x1c0 [btrfs]
Sep  1 16:19:30 phoenix kernel: RSP: 0018:ffffc9000b397b98 EFLAGS: 00010206
Sep  1 16:19:30 phoenix kernel: RAX: 0000000000000fff RBX:
ffff88035d1b4690 RCX: 000002baa1371e00
Sep  1 16:19:30 phoenix kernel: RDX: 0000000000001000 RSI:
ffff880000000000 RDI: ffff88031455fd20
Sep  1 16:19:30 phoenix kernel: RBP: ffff88022bb44c00 R08:
ffffc9000b397b50 R09: ffffc9000b397b58
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000003 R11:
0000000000003000 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000003d52 R14:
ffff88031455fd20 R15: ffff880826250348
Sep  1 16:19:30 phoenix kernel: FS:  00007f16f8e4e8c0(0000)
GS:ffff88042f900000(0000) knlGS:0000000000000000
Sep  1 16:19:30 phoenix kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep  1 16:19:30 phoenix kernel: CR2: 00007f5854176078 CR3:
0000000438a32000 CR4: 00000000000006e0
Sep  1 16:19:30 phoenix kernel: Call Trace:
Sep  1 16:19:30 phoenix kernel:  btrfs_remove_chunk+0x2f9/0x700 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_relocate_chunk+0x9c/0xd0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_shrink_device+0x1c0/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ?
btrfs_find_device_missing_or_by_path+0x30/0x120 [btrfs]
Sep  1 16:19:30 phoenix kernel:  btrfs_rm_device+0x19b/0x4f0 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_from_user+0x3f/0x80
Sep  1 16:19:30 phoenix kernel:  btrfs_ioctl+0x2129/0x2380 [btrfs]
Sep  1 16:19:30 phoenix kernel:  ? _copy_to_user+0x22/0x30
Sep  1 16:19:30 phoenix kernel:  ? cp_new_stat+0x138/0x150
Sep  1 16:19:30 phoenix kernel:  do_vfs_ioctl+0x9b/0x5e0
Sep  1 16:19:30 phoenix kernel:  ? SyS_newstat+0x35/0x40
Sep  1 16:19:30 phoenix kernel:  SyS_ioctl+0x47/0x90
Sep  1 16:19:30 phoenix kernel:  do_syscall_64+0x55/0x100
Sep  1 16:19:30 phoenix kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Sep  1 16:19:30 phoenix kernel: RIP: 0033:0x7f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RSP: 002b:00007ffd701d26e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Sep  1 16:19:30 phoenix kernel: RAX: ffffffffffffffda RBX:
00007ffd701d4880 RCX: 00007f16f7be80f7
Sep  1 16:19:30 phoenix kernel: RDX: 00007ffd701d3720 RSI:
000000005000943a RDI: 0000000000000003
Sep  1 16:19:30 phoenix kernel: RBP: 00007ffd701d3720 R08:
0000000000000000 R09: 000000000000000c
Sep  1 16:19:30 phoenix kernel: R10: 0000000000000572 R11:
0000000000000206 R12: 0000000000000000
Sep  1 16:19:30 phoenix kernel: R13: 0000000000000000 R14:
0000000000000003 R15: 00007ffd701d4888
Sep  1 16:19:30 phoenix kernel: Code: 4c 89 f7 45 31 c0 ba 10 00 00 00
4c 89 ee e8 9a 40 ff ff 4c 89 f7 e8 42 29 fd ff e9 de fe ff ff 41 bc f4
ff ff ff e9 db fe ff ff <0f> 0b eb b7 0f 1f 40 00 66 2e 0f 1f 84 00 00
00 00 00 53 31 d2
Sep  1 16:19:30 phoenix kernel: ---[ end trace edd626af3a502d98 ]---
Sep  1 16:19:30 phoenix kernel: BTRFS info (device sdb1): relocating
block group 71262403559424 flags data|raid6

There are more of this sort, for other block groups, to be found in the log.

Also there are a few of these:
Sep  1 21:10:17 phoenix kernel: ata6.00: exception Emask 0x0 SAct
0x10000020 SErr 0x0 action 0x0
Sep  1 21:10:17 phoenix kernel: ata6.00: irq_stat 0x40000008
Sep  1 21:10:17 phoenix kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep  1 21:10:17 phoenix kernel: ata6.00: cmd
60/20:28:80:66:09/00:00:50:01:00/40 tag 5 ncq dma 16384 in\x0a        
res 41/40:00:88:66:09/00:00:50:01:00/40 Emask 0x409 (media error) <F>
Sep  1 21:10:17 phoenix kernel: ata6.00: status: { DRDY ERR }
Sep  1 21:10:17 phoenix kernel: ata6.00: error: { UNC }
Sep  1 21:10:17 phoenix kernel: ata6.00: configured for UDMA/133
Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 UNKNOWN(0x2003)
Result: hostbyte=0x00 driverbyte=0x08
Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 Sense Key : 0x3
[current]
Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 ASC=0x11 ASCQ=0x4
Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 CDB: opcode=0x88
88 00 00 00 00 01 50 09 66 80 00 00 00 20 00 00
Sep  1 21:10:17 phoenix kernel: print_req_error: I/O error, dev sdf,
sector 5637760640
Sep  1 21:10:17 phoenix kernel: BTRFS error (device sdb1): bdev
/dev/sdf1 errs: wr 0, rd 289, flush 0, corrupt 0, gen 0
Sep  1 21:10:17 phoenix kernel: ata6: EH complete
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0
Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
level 0

So yes, /dev/sdf is also causing problems. I was going to replace that
one next. I still have the old /dev/sdb lying around BTW. It has not
completely failed yet; it's just not installed at the moment.

I am still looking for log entries related to the filesystem going
read-only. Not sure when exactly that happened and the logs are spammed
with plenty of the above...
>
>> Linux phoenix 4.14.78-gentoo #1 SMP Mon Dec 3 09:25:24 CET 2018 x86_64
> kernel 4.14.141 is the current version LTS for that series, and there
> are hundreds of bug fix insertions/removals between just those two
> versions
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v4.14.141&id2=v4.14.78&dt=2
>
> between kernel 4.14.141 and 5.2.11, there are thousands of changes
> just in Btrfs... thousands
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v5.2.11&id2=v4.14.141&dt=2
>
> And quite a few in raid56.c which isn't that big to begin with, but
> there are a lot of simplifications and improvements from what I can
> tell
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/fs/btrfs/raid56.c?id=v5.2.11&id2=v4.14.141
>
> Anyway, it's worth a try to try and mount with 5.2.11 using '-o
> ro,degraded' and at least see if it will mount. But it gives you some
> idea why there's a strong bias toward using newer kernels. It's too
> hard to remember all the changes, even for developers.
The latest kernel version in the gentoo tree is 5.2.9. I am compiling
that now...

>
>
>> AMD Opteron(tm) Processor 6174 AuthenticAMD GNU/Linux
>>
>> *****
>> btrfs --version
>>
>> btrfs-progs v4.19
> This is OK, but the change log will show lots of bug fixes here too. I
> wouldn't make changes (no repair attempts at all, including chunk
> recover or --repair) until you get some dev advice about the next
> step.
I already tried --repair as well, but it would not do anything anyway in
the filesystem's current state.
>
>
>
>
>> [ 8904.358084] BTRFS info (device sda1): turning on discard
> Unexpected.
I still had that in fstab for some reason. Not sure why/when I added
that. These are not SSDs.
>
>> [ 8904.358088] BTRFS info (device sda1): allowing degraded mounts
>> [ 8904.358089] BTRFS info (device sda1): disk space caching is enabled
>> [ 8904.358091] BTRFS info (device sda1): has skinny extents
>> [ 8904.361743] BTRFS warning (device sda1): devid 8 uuid
>> 0e8b4aff-6d64-4d31-a135-705421928f94 is missing
>> [ 8905.705036] BTRFS info (device sda1): bdev (null) errs: wr 0, rd
>> 14809, flush 0, corrupt 4, gen 0
>> [ 8905.705041] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
>> 4, flush 0, corrupt 0, gen 0
>> [ 8905.705052] BTRFS info (device sda1): bdev /dev/sdf1 errs: wr 0, rd
>> 10543, flush 0, corrupt 0, gen 0
>> [ 8905.705062] BTRFS info (device sda1): bdev /dev/sdc1 errs: wr 0, rd
>> 8, flush 0, corrupt 0, gen 0
> four devices with read errors
>
> When was the last time the volume was scrubbed? Do you know for sure
> these errors have not gone up at all since the last successful scrub?
> And were any errors reported for that last scrub?
Oh, that must have been quite a while ago. Sometime in 2018? Maybe? All
these drives have been up and running for several years now. sda and sdc
should still be fine, the replaced drive is sdb and sdf is next in line.
>
>
>> I have tried all the mount / restore options listed here:
>> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
> Good. Stick with ro attempts for now. Including if you want to try a
> newer kernel. If it succeeds to mount ro, my advice is to update
> backups so at least critical information isn't lost. Back up while you
> can. Any repair attempt makes changes that will risk the data being
> permanently lost. So it's important to be really deliberate about any
> changes.
I'll let you know when I have the new kernel up and running.
>
>
>> ... and all I keep getting is "bad tree block" errors. Superblocks seem
>> fine (btrfs rescue super-recover found no problem). I am considering
>> trying "btrfs rescue chunk-recover" at this point.
>>
>> Could this help in my situation? What do you think?
> I'm not sure if chunk recover can work on degraded volumes. Your best
> bet is to not make any further changes to the volume itself.
>
> Preserve all logs.
>
ok




-- 
*Liland IT GmbH*


Ferlach ● Wien ● München
Tel: +43 463 220111
Tel: +49 89 458 15 940
office@Liland.com
https://Liland.com



Copyright © 2019 Liland IT GmbH 

This email may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this email in
error) please notify the sender immediately and destroy this email. Any
unauthorised copying, disclosure or distribution of the material in this
email is strictly forbidden.



* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-04  4:39   ` Edmund Urbani
@ 2019-09-04  5:36     ` Chris Murphy
  2019-09-04  6:18       ` Edmund Urbani
  2019-09-05 19:17       ` Edmund Urbani
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Murphy @ 2019-09-04  5:36 UTC (permalink / raw)
  To: Edmund Urbani; +Cc: Btrfs BTRFS

On Tue, Sep 3, 2019 at 10:41 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>
> Also there are a few of these:
> Sep  1 21:10:17 phoenix kernel: ata6.00: exception Emask 0x0 SAct
> 0x10000020 SErr 0x0 action 0x0
> Sep  1 21:10:17 phoenix kernel: ata6.00: irq_stat 0x40000008
> Sep  1 21:10:17 phoenix kernel: ata6.00: failed command: READ FPDMA QUEUED
> Sep  1 21:10:17 phoenix kernel: ata6.00: cmd
> 60/20:28:80:66:09/00:00:50:01:00/40 tag 5 ncq dma 16384 in\x0a
> res 41/40:00:88:66:09/00:00:50:01:00/40 Emask 0x409 (media error) <F>
> Sep  1 21:10:17 phoenix kernel: ata6.00: status: { DRDY ERR }
> Sep  1 21:10:17 phoenix kernel: ata6.00: error: { UNC }
> Sep  1 21:10:17 phoenix kernel: ata6.00: configured for UDMA/133
> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 UNKNOWN(0x2003)
> Result: hostbyte=0x00 driverbyte=0x08
> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 Sense Key : 0x3
> [current]
> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 ASC=0x11 ASCQ=0x4
> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 CDB: opcode=0x88
> 88 00 00 00 00 01 50 09 66 80 00 00 00 20 00 00
> Sep  1 21:10:17 phoenix kernel: print_req_error: I/O error, dev sdf,
> sector 5637760640
> Sep  1 21:10:17 phoenix kernel: BTRFS error (device sdb1): bdev
> /dev/sdf1 errs: wr 0, rd 289, flush 0, corrupt 0, gen 0
> Sep  1 21:10:17 phoenix kernel: ata6: EH complete
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0
> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
> level 0

OK so the file system is not degraded, but sdb1 is giving you
problems, so you've deleted it and it's in the process of being removed
(fs shrink, move chunks, and restripe).

Here /dev/sdf has issued an uncorrectable read error. Classic case of
bad sector. And btrfs is trying to get data off sdb1 to try and fix
it, but this fails with checksum errors multiple times. So basically
it is a two device failure for the stripe currently being read. It
should still be possible to recover the stripe unless there is one
more error from another drive - but the included dmesg doesn't go on
far enough to tell us how this event turned out.
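
For what it's worth, the CDB line in that log can be decoded by hand to
confirm which sectors the failed read covered. A minimal Python sketch,
assuming the standard READ(16) layout (opcode 0x88, big-endian 8-byte LBA
in bytes 2-9, 4-byte transfer length in bytes 10-13) and 512-byte logical
sectors; decode_read16 is just an illustrative name:

```python
def decode_read16(cdb_hex):
    """Return (start LBA, sector count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, count


# CDB copied from the "sd 5:0:0:0: [sdf] tag#5 CDB" log line
lba, count = decode_read16("88 00 00 00 00 01 50 09 66 80 00 00 00 20 00 00")
print(lba, count, count * 512)  # -> 5637760640 32 16384
```

The start LBA matches the sector in the print_req_error line, and 32
sectors of 512 bytes is the 16384 bytes shown in the ata6.00 cmd line.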


> I am still looking for log entries related to the filesystem going
> read-only. Not sure when exactly that happened and the logs are spammed
> with plenty of the above...

They're relevant because if there's a third failure at the same time
and it affects metadata, reconstruction isn't possible: the metadata
is simply missing. Then the question becomes what's missing and
whether it can be reconstructed manually. It's super tedious.





> >> [ 8904.358088] BTRFS info (device sda1): allowing degraded mounts
> >> [ 8904.358089] BTRFS info (device sda1): disk space caching is enabled
> >> [ 8904.358091] BTRFS info (device sda1): has skinny extents
> >> [ 8904.361743] BTRFS warning (device sda1): devid 8 uuid
> >> 0e8b4aff-6d64-4d31-a135-705421928f94 is missing
> >> [ 8905.705036] BTRFS info (device sda1): bdev (null) errs: wr 0, rd
> >> 14809, flush 0, corrupt 4, gen 0
> >> [ 8905.705041] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
> >> 4, flush 0, corrupt 0, gen 0
> >> [ 8905.705052] BTRFS info (device sda1): bdev /dev/sdf1 errs: wr 0, rd
> >> 10543, flush 0, corrupt 0, gen 0
> >> [ 8905.705062] BTRFS info (device sda1): bdev /dev/sdc1 errs: wr 0, rd
> >> 8, flush 0, corrupt 0, gen 0
> > four devices with read errors
> >
> > When was the last time the volume was scrubbed? Do you know for sure
> > these errors have not gone up at all since the last successful scrub?
> > And were any errors reported for that last scrub?
> Oh, that must have been quite a while ago. Sometime in 2018? Maybe? All
> these drives have been up and running for several years now. sda and sdc
> should still be fine, the replaced drive is sdb and sdf is next in line.

There's evidence of four drives with problems at some point in time.
And there's evidence in the kernel messages above of at least two
problems at the same time with the same stripe. So all it takes is
one more problem with that stripe, and then that stripe can't be
recovered. And if it's a metadata stripe? That's 512KiB of metadata
lost, which is quite a lot; it probably kills the file system,
depending on where it happens. If it's data, no big deal: Btrfs won't
even care, it will just report EIO and the path to the bad file, and
continue on.
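
The 512KiB figure and the "one more problem" limit can be worked out
from the geometry. A rough Python sketch, assuming a 64KiB per-device
stripe element and a full-width raid6 stripe across all 10 devices from
the btrfs fi show output (which only holds for metadata if metadata also
uses the raid6 profile); the function names are made up for illustration:

```python
STRIPE_ELEMENT_KIB = 64  # assumption: per-device stripe element size in btrfs
RAID6_PARITY = 2         # raid6 keeps two parity strips (P and Q) per stripe


def data_per_full_stripe_kib(num_devices):
    """KiB of data or metadata carried by one full stripe."""
    return (num_devices - RAID6_PARITY) * STRIPE_ELEMENT_KIB


def stripe_recoverable(bad_strips):
    """A raid6 stripe can be rebuilt while at most two strips are unreadable."""
    return bad_strips <= RAID6_PARITY


print(data_per_full_stripe_kib(10))  # -> 512
print(stripe_recoverable(2))         # -> True  (sdb1 checksum failure + sdf read error)
print(stripe_recoverable(3))         # -> False (one more bad strip in the same stripe)
```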

The whole point of regular scrubs is to find and fix single sector
corruptions and failures early. If you don't do that, they can accumulate
over time and then it's a huge problem when just one drive dies. So
when did you last do a scrub? Are the drives all the same make and model?
Do they all have the same SCT ERC value? And is that value, for all
drives, less than the value found at /sys/block/sdN/device/timeout ?
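
The comparison behind that last question can be sketched as follows.
This assumes the SCT ERC value is in tenths of a second, as
smartctl -l scterc reports it, and that the sysfs timeout is in seconds.
The function names and the example numbers are illustrative, not values
taken from this system:

```python
from pathlib import Path


def read_kernel_timeout(dev):
    """Read the SCSI command timeout (seconds) for a block device such as 'sdf'."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text())


def erc_below_kernel_timeout(erc_deciseconds, kernel_timeout_seconds):
    """True if the drive gives up on a bad sector before the kernel timer fires.

    erc_deciseconds is None (or 0) when SCT ERC is disabled or unsupported,
    in which case the drive's retry time is unbounded as far as we know.
    """
    if not erc_deciseconds:
        return False
    return erc_deciseconds / 10 < kernel_timeout_seconds


# Example values only: ERC of 7.0 seconds against a 30 second kernel timeout
print(erc_below_kernel_timeout(70, 30))    # -> True
print(erc_below_kernel_timeout(None, 30))  # -> False
```

When the check fails, the drive may still be retrying a bad sector when
the kernel gives up and resets the link, so Btrfs never sees a clean read
error for that sector that it could repair from parity.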



> >
> >
> >> I have tried all the mount / restore options listed here:
> >> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
> > Good. Stick with ro attempts for now. Including if you want to try a
> > newer kernel. If it succeeds to mount ro, my advice is to update
> > backups so at least critical information isn't lost. Back up while you
> > can. Any repair attempt makes changes that will risk the data being
> > permanently lost. So it's important to be really deliberate about any
> > changes.
> I'll let you know when I have the new kernel up and running.

I think you should have all the original drives installed, and try to
mount -o ro first. And if that doesn't work, try -o ro,degraded, and
then we'll just have to see which drive it doesn't like.
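
That order of attempts could be scripted roughly like this. It is only a
sketch: the device and mount point are placeholders to fill in, the
function names are made up, and it sticks to read-only mounts so nothing
on the volume gets changed:

```python
import subprocess


def mount_attempts(device, mountpoint):
    """Yield read-only mount commands, least invasive first."""
    for opts in ("ro", "ro,degraded"):
        yield ["mount", "-o", opts, device, mountpoint]


def try_mounts(device, mountpoint, run=subprocess.run):
    """Run each attempt in order and return the first command that succeeds."""
    for cmd in mount_attempts(device, mountpoint):
        if run(cmd).returncode == 0:
            return cmd
    return None  # neither attempt mounted; check dmesg for the reason


# Usage (as root), with placeholder paths:
#   try_mounts("/dev/sdX1", "/mnt/recovery")
```

If both attempts fail, the kernel messages from each one are the thing to
keep and post.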



--
Chris Murphy


* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-04  5:36     ` Chris Murphy
@ 2019-09-04  6:18       ` Edmund Urbani
  2019-09-04 12:35         ` Piotr Szymaniak
  2019-09-05 19:17       ` Edmund Urbani
  1 sibling, 1 reply; 12+ messages in thread
From: Edmund Urbani @ 2019-09-04  6:18 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 09/04/2019 07:36 AM, Chris Murphy wrote:
> On Tue, Sep 3, 2019 at 10:41 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>> Also there are a few of these:
>> Sep  1 21:10:17 phoenix kernel: ata6.00: exception Emask 0x0 SAct
>> 0x10000020 SErr 0x0 action 0x0
>> Sep  1 21:10:17 phoenix kernel: ata6.00: irq_stat 0x40000008
>> Sep  1 21:10:17 phoenix kernel: ata6.00: failed command: READ FPDMA QUEUED
>> Sep  1 21:10:17 phoenix kernel: ata6.00: cmd
>> 60/20:28:80:66:09/00:00:50:01:00/40 tag 5 ncq dma 16384 in\x0a
>> res 41/40:00:88:66:09/00:00:50:01:00/40 Emask 0x409 (media error) <F>
>> Sep  1 21:10:17 phoenix kernel: ata6.00: status: { DRDY ERR }
>> Sep  1 21:10:17 phoenix kernel: ata6.00: error: { UNC }
>> Sep  1 21:10:17 phoenix kernel: ata6.00: configured for UDMA/133
>> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 UNKNOWN(0x2003)
>> Result: hostbyte=0x00 driverbyte=0x08
>> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 Sense Key : 0x3
>> [current]
>> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 ASC=0x11 ASCQ=0x4
>> Sep  1 21:10:17 phoenix kernel: sd 5:0:0:0: [sdf] tag#5 CDB: opcode=0x88
>> 88 00 00 00 00 01 50 09 66 80 00 00 00 20 00 00
>> Sep  1 21:10:17 phoenix kernel: print_req_error: I/O error, dev sdf,
>> sector 5637760640
>> Sep  1 21:10:17 phoenix kernel: BTRFS error (device sdb1): bdev
>> /dev/sdf1 errs: wr 0, rd 289, flush 0, corrupt 0, gen 0
>> Sep  1 21:10:17 phoenix kernel: ata6: EH complete
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
>> Sep  1 21:10:17 phoenix kernel: BTRFS warning (device sdb1): sdb1
>> checksum verify failed on 70943861833728 wanted 49137758 found 776101D6
>> level 0
> OK so the file system is not degraded, but sdb1 is giving you
> problems, so you've deleted it and it's in the process of being removed
> (fs shrink, move chunks, and restripe).
I am about to install the old sdb in another system to try ddrescue on
it and see what I can salvage.
>
> Here /dev/sdf has issued an uncorrectable read error. Classic case of
> bad sector. And btrfs is trying to get data off sdb1 to try and fix
> it, but this fails with checksum errors multiple times. So basically
> it is a two device failure for the stripe currently being read. It
> should still be possible to recover the stripe unless there is one
> more error from another drive - but the included dmesg doesn't go on
> far enough to tell us how this event turned out.
I suspect that whatever is in sector 5637760640 on sdf is metadata. I
grepped the log for it and that bad sector pops up 10000+ times.
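
A rough sketch of how such a count can be produced, assuming a
syslog-style file with one kernel message per line (not mail-wrapped like
the excerpts above). The function name count_io_errors and the log path
in the usage line are illustrative only:

```shell
#!/bin/sh
# Count "I/O error" kernel messages per device and sector.
# Reads log text on stdin and prints "<count> <dev> <sector>",
# most frequent first.
count_io_errors() {
    awk '/I\/O error, dev / {
             for (i = 1; i <= NF; i++) {
                 if ($i == "dev")    { dev = $(i + 1); sub(/,$/, "", dev) }
                 if ($i == "sector") { sec = $(i + 1) }
             }
             n[dev " " sec]++
         }
         END { for (k in n) print n[k], k }' | sort -rn
}
```

Usage would be something like: count_io_errors < /var/log/messages | head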


>
>
>> I am still looking for log entries related to the filesystem going
>> read-only. Not sure when exactly that happened and the logs are spammed
>> with plenty of the above...
> They're relevant because if there's a third failure at the same time,
> and if it affects metadata, reconstruction isn't possible, the
> metadata is missing. So then it's, what's missing and can it be
> manually reconstructed. It's super tedious.
>
Apparently, this is where it went read-only:

Sep  3 00:31:31 phoenix kernel: ata1.01: exception Emask 0x0 SAct 0x0
SErr 0x0 action 0x0
Sep  3 00:31:31 phoenix kernel: ata1.01: BMDMA stat 0x64
Sep  3 00:31:31 phoenix kernel: ata1.01: failed command: READ DMA EXT
Sep  3 00:31:31 phoenix kernel: ata1.01: cmd
25/00:20:80:72:5a/00:00:a9:00:00/f0 tag 0 dma 16384 in\x0a         res
51/40:20:80:72:5a/40:00:a9:00:00/f0 Emask 0x9 (media error)
Sep  3 00:31:31 phoenix kernel: ata1.01: status: { DRDY ERR }
Sep  3 00:31:31 phoenix kernel: ata1.01: error: { UNC }
Sep  3 00:31:31 phoenix kernel: ata1.00: configured for UDMA/100
Sep  3 00:31:31 phoenix kernel: ata1.01: configured for UDMA/100
Sep  3 00:31:31 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 UNKNOWN(0x2003)
Result: hostbyte=0x00 driverbyte=0x08
Sep  3 00:31:31 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 Sense Key : 0x3
[current]
Sep  3 00:31:31 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 ASC=0x11 ASCQ=0x4
Sep  3 00:31:31 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 CDB: opcode=0x88
88 00 00 00 00 00 a9 5a 72 80 00 00 00 20 00 00
Sep  3 00:31:31 phoenix kernel: print_req_error: I/O error, dev sdb,
sector 2841277056
Sep  3 00:31:31 phoenix kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 14786, flush 0, corrupt 0, gen 0
Sep  3 00:31:31 phoenix kernel: ata1: EH complete
Sep  3 00:31:31 phoenix kernel: BTRFS info (device sdb1): read error
corrected: ino 0 off 34958490861568 (dev /dev/sdb1 sector 2841275008)
Sep  3 00:31:31 phoenix kernel: BTRFS info (device sdb1): read error
corrected: ino 0 off 34958490865664 (dev /dev/sdb1 sector 2841275016)
Sep  3 00:31:31 phoenix kernel: BTRFS info (device sdb1): read error
corrected: ino 0 off 34958490869760 (dev /dev/sdb1 sector 2841275024)
Sep  3 00:31:31 phoenix kernel: BTRFS info (device sdb1): read error
corrected: ino 0 off 34958490873856 (dev /dev/sdb1 sector 2841275032)
Sep  3 00:31:39 phoenix kernel: ata1.01: exception Emask 0x0 SAct 0x0
SErr 0x0 action 0x0
Sep  3 00:31:39 phoenix kernel: ata1.01: BMDMA stat 0x64
Sep  3 00:31:39 phoenix kernel: ata1.01: failed command: READ DMA EXT
Sep  3 00:31:39 phoenix kernel: ata1.01: cmd
25/00:20:60:74:5b/00:00:a9:00:00/f0 tag 0 dma 16384 in\x0a         res
51/40:20:60:74:5b/40:00:a9:00:00/f0 Emask 0x9 (media error)
Sep  3 00:31:39 phoenix kernel: ata1.01: status: { DRDY ERR }
Sep  3 00:31:39 phoenix kernel: ata1.01: error: { UNC }
Sep  3 00:31:39 phoenix kernel: ata1.00: configured for UDMA/100
Sep  3 00:31:39 phoenix kernel: ata1.01: configured for UDMA/100
Sep  3 00:31:39 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 UNKNOWN(0x2003)
Result: hostbyte=0x00 driverbyte=0x08
Sep  3 00:31:39 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 Sense Key : 0x3
[current]
Sep  3 00:31:39 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 ASC=0x11 ASCQ=0x4
Sep  3 00:31:39 phoenix kernel: sd 0:0:1:0: [sdb] tag#0 CDB: opcode=0x88
88 00 00 00 00 00 a9 5b 74 60 00 00 00 20 00 00
Sep  3 00:31:39 phoenix kernel: print_req_error: I/O error, dev sdb,
sector 2841343072
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 14787, flush 0, corrupt 0, gen 0
Sep  3 00:31:39 phoenix kernel: ata1: EH complete
Sep  3 00:31:39 phoenix kernel: btree_readpage_end_io_hook: 8 callbacks
suppressed
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): bad tree
block start 3417494780583899951 34958760591360
Sep  3 00:31:39 phoenix kernel: BTRFS: error (device sdb1) in
__btrfs_free_extent:7084: errno=-5 IO failure
Sep  3 00:31:39 phoenix kernel: BTRFS info (device sdb1): forced readonly
Sep  3 00:31:39 phoenix kernel: BTRFS: error (device sdb1) in
btrfs_run_delayed_refs:3089: errno=-5 IO failure
Sep  3 00:31:39 phoenix kernel: BTRFS error (device sdb1): pending csums
is 24776704
>
>
>
>>>> [ 8904.358088] BTRFS info (device sda1): allowing degraded mounts
>>>> [ 8904.358089] BTRFS info (device sda1): disk space caching is enabled
>>>> [ 8904.358091] BTRFS info (device sda1): has skinny extents
>>>> [ 8904.361743] BTRFS warning (device sda1): devid 8 uuid
>>>> 0e8b4aff-6d64-4d31-a135-705421928f94 is missing
>>>> [ 8905.705036] BTRFS info (device sda1): bdev (null) errs: wr 0, rd
>>>> 14809, flush 0, corrupt 4, gen 0
>>>> [ 8905.705041] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
>>>> 4, flush 0, corrupt 0, gen 0
>>>> [ 8905.705052] BTRFS info (device sda1): bdev /dev/sdf1 errs: wr 0, rd
>>>> 10543, flush 0, corrupt 0, gen 0
>>>> [ 8905.705062] BTRFS info (device sda1): bdev /dev/sdc1 errs: wr 0, rd
>>>> 8, flush 0, corrupt 0, gen 0
>>> four devices with read errors
>>>
>>> When was the last time the volume was scrubbed? Do you know for sure
>>> these errors have not gone up at all since the last successful scrub?
>>> And were any errors reported for that last scrub?
>> Oh, that must have been quite a while ago. Sometime in 2018? Maybe? All
>> these drives have been up and running for several years now. sda and sdc
>> should still be fine, the replaced drive is sdb and sdf is next in line.
> There's evidence of four drives with problems at some point in time.
> And there's evidence in the kernel messages above of at least two
> problems at the same time with the same stripe. So, all it takes is
> one more problem with that stripe, and then that stripe can't be
> recovered - and if it's a metadata stripe? That's 512KiB of metadata
> lost, which is quite a lot, it probably kills the file system,
> depending on where it happens. If it's data - no big deal. Btrfs won't
> even care, it will just report EIO and the path to the bad file, and
> continue on.
>
> The whole point of regular scrubs is to prevent single sector
> corruptions and failures. If you don't do that, they can accumulate
> over time and then it's a huge problem when just one drive dies. So
> when did you last do a scrub? Are they all the same make model drive?
> Do they all have the same SCT ERC value? And is that value, for all
> drives, less than the value found at /sys/block/sdN/device/timeout ?
I'll make sure to set up a cron job for monthly scrubbing once this is
over. I really can't remember exactly when I last did a manual scrub.

They are all WD Red 3TB, though not all the same generation.

smartctl reports these models:
Device Model:     WDC WD30EFRX-68AX9N0
Device Model:     WDC WD30EFRX-68EUZN0

The timeout under /sys is 30 for all of them.

I am not sure what you mean by SCT ERC value and where to look for that.
Here's all the info smartctl gives me about sda:

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.2.9-gentoo] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T1311633
LU WWN Device Id: 5 0014ee 602ee2554
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Sep  4 06:04:20 2019 -00
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (40320) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 404) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   178   177   021    Pre-fail  Always       -       6058
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       61
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   020   020   000    Old_age   Always       -       58487
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   110   098   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        13         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



>
>
>
>>>
>>>> I have tried all the mount / restore options listed here:
>>>> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
>>> Good. Stick with ro attempts for now. Including if you want to try a
>>> newer kernel. If it succeeds to mount ro, my advice is to update
>>> backups so at least critical information isn't lost. Back up while you
>>> can. Any repair attempt makes changes that will risk the data being
>>> permanently lost. So it's important to be really deliberate about any
>>> changes.
>> I'll let you know, when I have the new kernel up and running.
> I think you should have all the original drives installed, and try to
> mount -o ro first. And if that doesn't work, try -o ro,degraded, and
> then we'll just have to see which drive it doesn't like.
>
>
>
> --
> Chris Murphy
Well, I already tried mounting with 5.2.9 in the meantime without
success (without the original sdb drive). It still does not mount (mount
-o ro,degraded /dev/sda1 /mnt/shared/):

[  209.459309] BTRFS info (device sdg1): allowing degraded mounts
[  209.459313] BTRFS info (device sdg1): disk space caching is enabled
[  209.459314] BTRFS info (device sdg1): has skinny extents
[  209.461246] BTRFS warning (device sdg1): devid 8 uuid
0e8b4aff-6d64-4d31-a135-705421928f94 is missing
[  209.544603] BTRFS warning (device sdg1): devid 8 uuid
0e8b4aff-6d64-4d31-a135-705421928f94 is missing
[  211.401375] BTRFS info (device sdg1): bdev (efault) errs: wr 0, rd
14809, flush 0, corrupt 4, gen 0
[  211.401388] BTRFS info (device sdg1): bdev /dev/sdf1 errs: wr 0, rd
10543, flush 0, corrupt 0, gen 0
[  211.401391] BTRFS info (device sdg1): bdev /dev/sda1 errs: wr 0, rd
4, flush 0, corrupt 0, gen 0
[  211.401394] BTRFS info (device sdg1): bdev /dev/sdc1 errs: wr 0, rd
8, flush 0, corrupt 0, gen 0
[  215.381805] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.382603] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.386155] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.389539] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.393053] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.395223] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.397188] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.399307] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.402410] BTRFS error (device sdg1): bad tree block start, want
34958581399552 have 12170572967447269873
[  215.402447] BTRFS error (device sdg1): failed to read block groups: -5
[  215.844527] BTRFS error (device sdg1): open_ctree failed


I'm planning to install the copy of sdb once ddrescue is done.

Regards,
 Edmund

-- 
*Liland IT GmbH*


Ferlach ● Wien ● München
Tel: +43 463 220111
Tel: +49 89 458 15 940
office@Liland.com
https://Liland.com



Copyright © 2019 Liland IT GmbH 

This email may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this email in error) please
notify the sender immediately and destroy this email. Any unauthorised
copying, disclosure or distribution of the material in this email is
strictly forbidden.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-04  6:18       ` Edmund Urbani
@ 2019-09-04 12:35         ` Piotr Szymaniak
  2019-09-04 14:04           ` Edmund Urbani
  0 siblings, 1 reply; 12+ messages in thread
From: Piotr Szymaniak @ 2019-09-04 12:35 UTC (permalink / raw)
  To: Edmund Urbani; +Cc: Chris Murphy, Btrfs BTRFS

On Wed, Sep 04, 2019 at 08:18:09AM +0200, Edmund Urbani wrote:
> *snip*
> The timeout under /sys is 30 for all of them.
> 
> I am not sure what you mean by SCT ERC value and where to look for that.
> Here's all the info smartctl gives me about sda:

Try:
smartctl -l scterc /dev/ice

E.g. one of my drives (also WD Red) outputs:
$ smartctl -l scterc /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.62] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

SCT ERC value should be lower than the value in /sys.
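
A minimal sketch of that comparison, for illustration only. smartctl
reports ERC in tenths of a second, /sys reports the timeout in seconds.
The function names are made up, and check_all assumes root access and
drives named /dev/sd?:

```shell
#!/bin/sh
# Extract the SCT ERC read value (in deciseconds) from
# `smartctl -l scterc` output supplied on stdin.
scterc_read_ds() {
    awk '$1 == "Read:" { print $2; exit }'
}

# Succeed if the drive gives up (ERC, deciseconds) before the kernel
# command timeout (seconds) expires.
erc_below_timeout() {
    erc_ds=$1
    timeout_s=$2
    [ "$erc_ds" -lt $((timeout_s * 10)) ]
}

# Hypothetical loop over all sd? devices; needs root and real drives.
check_all() {
    for dev in /sys/block/sd?; do
        name=${dev##*/}
        erc=$(smartctl -l scterc "/dev/$name" | scterc_read_ds)
        tmo=$(cat "$dev/device/timeout")
        if erc_below_timeout "$erc" "$tmo"; then
            echo "$name: ok (ERC ${erc} ds, timeout ${tmo} s)"
        else
            echo "$name: ERC not below kernel timeout (or ERC disabled)"
        fi
    done
}
```

A drive that reports ERC as "Disabled" fails the numeric comparison and
ends up in the second branch.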


Best regards,
Piotr Szymaniak.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-04 12:35         ` Piotr Szymaniak
@ 2019-09-04 14:04           ` Edmund Urbani
  0 siblings, 0 replies; 12+ messages in thread
From: Edmund Urbani @ 2019-09-04 14:04 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: Chris Murphy, Btrfs BTRFS

On 9/4/19 2:35 PM, Piotr Szymaniak wrote:
> On Wed, Sep 04, 2019 at 08:18:09AM +0200, Edmund Urbani wrote:
>> *snip*
>> The timeout under /sys is 30 for all of them.
>>
>> I am not sure what you mean by SCT ERC value and where to look for that.
>> Here's all the info smartctl gives me about sda:
> Try:
> smartctl -l scterc /dev/ice
>
> E.g. one of my drives (also WD Red) outputs:
> $ smartctl -l scterc /dev/sdb
> smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.62] (local build)
> Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>            Read:     70 (7.0 seconds)
>           Write:     70 (7.0 seconds)
>
> SCT ERC value should be lower than the value in /sys.
>
>
> Best regards,
> Piotr Szymaniak.

Ok, thanks. Mine also all have 7 seconds set.

Kind regards,
 Edmund


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-04  5:36     ` Chris Murphy
  2019-09-04  6:18       ` Edmund Urbani
@ 2019-09-05 19:17       ` Edmund Urbani
  2019-09-05 19:57         ` Chris Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Edmund Urbani @ 2019-09-05 19:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


On 04.09.2019 07:36, Chris Murphy wrote:
>
>>>
>>>> I have tried all the mount / restore options listed here:
>>>> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
>>> Good. Stick with ro attempts for now. Including if you want to try a
>>> newer kernel. If it succeeds to mount ro, my advice is to update
>>> backups so at least critical information isn't lost. Back up while you
>>> can. Any repair attempt makes changes that will risk the data being
>>> permanently lost. So it's important to be really deliberate about any
>>> changes.
>> I'll let you know, when I have the new kernel up and running.
> I think you should have all the original drives installed, and try to
> mount -o ro first. And if that doesn't work, try -o ro,degraded, and
> then we'll just have to see which drive it doesn't like.

Things are finally looking up. I have replaced both sdb and sdf with
ddrescue'd copies. About 10 MB of bad sectors on sdb and 8 KB on sdf
could not be recovered.

I am now able to mount the volume again. :)

btrfsck /dev/sda1

Opening filesystem to check...
Checking filesystem on /dev/sda1
UUID: 108df6ea-2846-4a88-8a50-61aedeef92b4
[1/7] checking root items
checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
parent transid verify failed on 34958760591360 wanted 3331734 found 1544337
checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
bad tree block 34958760591360, bytenr mismatch, want=34958760591360, 
have=27967614209536
ERROR: failed to repair root items: Input/output error

Anyway, I am about to mount it read-only again to try and back up a few
things. And once I am done with that, should I run btrfs scrub?

Kind regards,
  Edmund





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-05 19:17       ` Edmund Urbani
@ 2019-09-05 19:57         ` Chris Murphy
  2019-09-05 20:44           ` Edmund Urbani
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2019-09-05 19:57 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Edmund Urbani, Qu Wenruo

On Thu, Sep 5, 2019 at 1:18 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>
>
> On 04.09.2019 07:36, Chris Murphy wrote:
> >
> >>>
> >>>> I have tried all the mount / restore options listed here:
> >>>> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
> >>> Good. Stick with ro attempts for now. Including if you want to try a
> >>> newer kernel. If it succeeds to mount ro, my advice is to update
> >>> backups so at least critical information isn't lost. Back up while you
> >>> can. Any repair attempt makes changes that will risk the data being
> >>> permanently lost. So it's important to be really deliberate about any
> >>> changes.
> >> I'll let you know, when I have the new kernel up and running.
> > I think you should have all the original drives installed, and try to
> > mount -o ro first. And if that doesn't work, try -o ro,degraded, and
> > then we'll just have to see which drive it doesn't like.
>
> Things are finally looking up. I have replaced both sdb and sdf with
> ddrescue'd copies. sdb had some 10MB bad sectors and sdf 8KB which could
> not be recovered.
>
> I am now able to mount the volume again. :)
>
> btrfsck /dev/sda1
>
> Opening filesystem to check...
> Checking filesystem on /dev/sda1
> UUID: 108df6ea-2846-4a88-8a50-61aedeef92b4
> [1/7] checking root items
> checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
> checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
> parent transid verify failed on 34958760591360 wanted 3331734 found 1544337
> checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
> checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
> bad tree block 34958760591360, bytenr mismatch, want=34958760591360,
> have=27967614209536
> ERROR: failed to repair root items: Input/output error
>
> Anyway, I am about to mount it read-only again to try and backup a few
> things. And once I am done with that, should I run btrfs scrub?

Did it mount with ro alone, or did you need ro,degraded?

I'm a little confused by the i/o error, which I'd expect will also
produce a message at the same time in dmesg that will hint what the
nature of the i/o error is. That suggests some kind of hardware issue
still exists, even if it is an uncorrectable sector read error. For
sure rw mounted scrubs can fix those things, if enough redundancy
exists, and those copies aren't also corrupt. But I'm off hand not
sure whether 'btrfs check --repair' can fixup bad sectors like scrub
can.

Anyway, I suggest 'btrfs check --repair' is a last resort, no matter
the version of btrfs-progs. 'btrfs check' alone is safe. So in order:

* you've done these

*dmesg
*btrfs check --readonly  ##safe, makes no changes, maybe gives a hint
of the problem
*mount -o ro
*mount -o ro,degraded
mount -o rw  ## all devices available
mount -o rw,degraded
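
The read-only part of that order could be sketched roughly like this.
It is only an illustration: try_ro_mounts and the MOUNT override are
made-up names (MOUNT exists so the logic can be dry-run), and it
deliberately stops before any rw attempt:

```shell
#!/bin/sh
# Try the read-only mount options in order and stop at the first one
# that works. MOUNT is a hypothetical override; it defaults to mount(8).
try_ro_mounts() {
    dev=$1
    mnt=$2
    for opts in ro ro,degraded; do
        if ${MOUNT:-mount} -o "$opts" "$dev" "$mnt"; then
            echo "mounted with -o $opts"
            return 0
        fi
    done
    echo "read-only mount failed; check dmesg before trying anything rw" >&2
    return 1
}
```

For example: try_ro_mounts /dev/sda1 /mnt/shared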

I'm not sure a read only scrub helps much. It might be interesting?
What you really want is to be able to mount rw with all devices, and
then scrub.

But even rw,degraded is better, because you must be rw mounted to make
scrub repairs, and also to do device replacements. I personally would
not do a degraded scrub, because that scrub requires reading the whole
volume. If you're going to read the whole volume anyway, you might as
well rebuild the bad/missing device, so that you can more quickly get
back to undegraded/normal RAID6 operation.

If you can only mount 'rw,degraded' we need to see 'btrfs fi show' and
the kernel messages for the failed mount and the successful degraded
mount, so we can figure out what devices are affected, maybe why, and
then what the next step is.

Anyone know if latest kernel and progs now reliably support 'btrfs
replace' for RAID6? For a bit it was recommended to do it the old way,
with 'btrfs device add' followed by 'btrfs device delete'. Main
difference for the user is that 'replace' requires that the
replacement drive is at least as big (in bytes) as the one being
replaced and also that 'replace' will not resize the volume after
replacement is finished, that has to be done manually. Otherwise I
think it's preferred?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-05 19:57         ` Chris Murphy
@ 2019-09-05 20:44           ` Edmund Urbani
  2019-09-05 22:33             ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Edmund Urbani @ 2019-09-05 20:44 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS; +Cc: Qu Wenruo


On 05.09.2019 21:57, Chris Murphy wrote:
> On Thu, Sep 5, 2019 at 1:18 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>>
>> On 04.09.2019 07:36, Chris Murphy wrote:
>>>>>> I have tried all the mount / restore options listed here:
>>>>>> https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
>>>>> Good. Stick with ro attempts for now. Including if you want to try a
>>>>> newer kernel. If it succeeds to mount ro, my advice is to update
>>>>> backups so at least critical information isn't lost. Back up while you
>>>>> can. Any repair attempt makes changes that will risk the data being
>>>>> permanently lost. So it's important to be really deliberate about any
>>>>> changes.
>>>> I'll let you know, when I have the new kernel up and running.
>>> I think you should have all the original drives installed, and try to
>>> mount -o ro first. And if that doesn't work, try -o ro,degraded, and
>>> then we'll just have to see which drive it doesn't like.
>> Things are finally looking up. I have replaced both sdb and sdf with
>> ddrescue'd copies. sdb had some 10MB bad sectors and sdf 8KB which could
>> not be recovered.
>>
>> I am now able to mount the volume again. :)
>>
>> btrfsck /dev/sda1
>>
>> Opening filesystem to check...
>> Checking filesystem on /dev/sda1
>> UUID: 108df6ea-2846-4a88-8a50-61aedeef92b4
>> [1/7] checking root items
>> checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
>> checksum verify failed on 34958760591360 found E4E3BDB6 wanted 00000000
>> parent transid verify failed on 34958760591360 wanted 3331734 found 1544337
>> checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
>> checksum verify failed on 34958760591360 found 04DEBA71 wanted B9FBE54D
>> bad tree block 34958760591360, bytenr mismatch, want=34958760591360,
>> have=27967614209536
>> ERROR: failed to repair root items: Input/output error
>>
>> Anyway, I am about to mount it read-only again to try and backup a few
>> things. And once I am done with that, should I run btrfs scrub?
> Did it mount with ro alone, or did you need ro,degraded?
>
> I'm a little confused by the i/o error, which I'd expect will also
> produce a message at the same time in dmesg that will hint what the
> nature of the i/o error is. That suggests some kind of hardware issue
> still exists, even if it is an uncorrectable sector read error. For
> sure rw mounted scrubs can fix those things, if enough redundancy
> exists, and those copies aren't also corrupt. But I'm off hand not
> sure whether 'btrfs check --repair' can fixup bad sectors like scrub
> can.
>
> Anyway, I suggest 'btrfs check --repair' is a last resort, no matter
> the version of btrfs-progs. 'btrfs check' alone is safe. So in order:
>
> * you've done these
>
> *dmesg
> *btrfs check --readonly  ##safe, makes no changes, maybe gives a hint
> of the problem
> *mount -o ro
> *mount -o ro,degraded
> mount -o rw  ## all devices available
> mount -o rw,degraded
>
> I'm not sure a read only scrub helps much. It might be interesting?
> What you really want is to be able to mount rw with all devices, and
> then scrub.
>
> But even rw,degraded is better, because you must be rw mounted to make
> scrub repairs, and also to do device replacements. I personally would
> not do a degraded scrub, because that scrub requires reading the whole
> volume. If you're going to read the whole volume anyway, you might as
> well rebuild the bad/missing device, so that you can more quickly get
> back to undegraded/normal RAID6 operation.
>
> If you can only mount 'rw,degraded' we need to see 'btrfs fi show' and
> the kernel messages for the failed mount and the successful degraded
> mount, so we can figure out what devices are affected, maybe why, and
> then what the next step is.
>
> Anyone know if the latest kernel and progs now reliably support 'btrfs
> replace' for RAID6? For a bit it was recommended to do it the old way,
> with 'btrfs device add' followed by 'btrfs device delete'. Main
> difference for the user is that 'replace' requires that the
> replacement drive is at least as big (in bytes) as the one being
> replaced and also that 'replace' will not resize the volume after
> replacement is finished, that has to be done manually. Otherwise I
> think it's preferred?
>
I did not need the degraded option. And so far I see no HW I/O errors in 
dmesg. I have encountered a few errors while copying files and found 
these in the log:

[ 3560.273634] btrfs_print_data_csum_error: 50 callbacks suppressed
[ 3560.273639] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0x98f94189 expected csum 0xcb3af09a mirror 1
[ 3560.825942] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 2
[ 3560.826588] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 3
[ 3560.827813] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 4
[ 3560.829063] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 5
[ 3560.830366] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 6
[ 3560.831559] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 7
[ 3560.832998] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 8
[ 3560.834649] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 9
[ 3560.836188] BTRFS warning (device sdg1): csum failed root 262 ino 
1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 10

and also:

[ 3889.813300] btree_readpage_end_io_hook: 1860 callbacks suppressed
[ 3889.813304] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 0
[ 3889.825732] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.826375] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.828149] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.829649] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.831592] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.833436] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.835458] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.836968] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972
[ 3889.848545] BTRFS error (device sdg1): bad tree block start, want 
34958548107264 have 12157064991241308972

I think the Input/output error that btrfsck is showing is actually a
filesystem checksum error and not triggered by faulty hardware (not 
anymore, I hope). If there actually are any more failing drives here, I 
will most likely do the ddrescue thing again. Currently there are no 
free SATA ports in that system to connect an additional drive, so I 
cannot simply add one (at least not without also installing an 
additional SATA controller).

Anyway, I have some peace of mind now that most of my data is accessible 
again. Time to get some sleep...

Thank you, Chris!

Kind regards,
  Edmund



-- 
*Liland IT GmbH*

Ferlach ● Wien ● München
Tel: +43 463 220111
Tel: +49 89 458 15 940
office@Liland.com
https://Liland.com

Copyright © 2019 Liland IT GmbH

This email may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this email in
error) please notify the sender immediately and destroy this email. Any
unauthorised copying, disclosure or distribution of the material in this
email is strictly forbidden.



* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-05 20:44           ` Edmund Urbani
@ 2019-09-05 22:33             ` Chris Murphy
  2019-09-16 14:22               ` Urbani, Edmund
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2019-09-05 22:33 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Qu Wenruo, Edmund Urbani

On Thu, Sep 5, 2019 at 2:44 PM Edmund Urbani <edmund.urbani@liland.com> wrote:
>
> I did not need the degraded option. And so far I see no HW I/O errors in
> dmesg. I have encountered a few errors while copying files and found
> these in the log:
>
> [ 3560.273634] btrfs_print_data_csum_error: 50 callbacks suppressed
> [ 3560.273639] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0x98f94189 expected csum 0xcb3af09a mirror 1

Not a bit flip
0x98f94189
10011000111110010100000110001001
0xcb3af09a
11001011001110101111000010011010


> [ 3560.825942] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 2
> [ 3560.826588] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 3
> [ 3560.827813] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 4
> [ 3560.829063] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 5
> [ 3560.830366] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 6
> [ 3560.831559] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 7
> [ 3560.832998] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 8
> [ 3560.834649] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 9
> [ 3560.836188] BTRFS warning (device sdg1): csum failed root 262 ino
> 1838364 off 14467072 csum 0xc0248289 expected csum 0xcb3af09a mirror 10

Also not a bit flip.
0xc0248289
11000000001001001000001010001001
0xcb3af09a
11001011001110101111000010011010
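
The comparison above can be reproduced mechanically. The sketch below
XORs a reported checksum with the expected one and counts the set bits;
a result of exactly 1 would mean the two values differ by a single
flipped bit. The function name bit_diff is made up for this
illustration, and the values are the ones from the log lines quoted
above.

```shell
#!/bin/sh
# Count the bits that differ between two 32-bit checksum values.
# bit_diff is a hypothetical helper name, not a btrfs-progs command.
bit_diff() {
    x=$(( $1 ^ $2 ))        # bits set where the two values disagree
    n=0
    while [ "$x" -ne 0 ]; do
        n=$(( n + (x & 1) ))
        x=$(( x >> 1 ))
    done
    echo "$n"
}

bit_diff 0x98f94189 0xcb3af09a   # mirror 1 vs expected: prints 15
bit_diff 0xc0248289 0xcb3af09a   # mirrors 2-10 vs expected: prints 14
```

With 15 and 14 differing bits out of 32, neither value is one bit away
from the expected checksum.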

I'm not sure what it means or suggests has happened, that all the
copies are wrong. Plausible with raid5 metadata. But seems unlikely
with raid6 metadata, and also with all devices accounted for.

The file itself is probably fine - these look like metadata
complaints. If you find the file this inode belongs to, either
duplicating it or deleting it is fine, should cause this bad leaf to
just go away. Make sure you delete the correct file, each subvolume
has its own list of inodes, this one is in subvol id 262.
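
One way to find the file behind an inode number is a plain find
restricted to the subvolume in question. This is only a sketch: the
helper name find_by_inode and the mount path in the comments are
assumptions, and the subvolume with id 262 has to be located first (for
example with 'btrfs subvolume list').

```shell
#!/bin/sh
# Print every path under directory $1 whose inode number is $2.
# -xdev keeps find from descending into other mounts or nested
# subvolumes, which have their own inode numbering.
# find_by_inode is a hypothetical helper name.
find_by_inode() {
    find "$1" -xdev -inum "$2" -print
}

# Example, assuming subvolume id 262 is reachable at this made-up path:
#   find_by_inode /mnt/pool/subvol262 1838364
# On a mounted btrfs the same lookup can be done with:
#   btrfs inspect-internal inode-resolve 1838364 /mnt/pool/subvol262
```

Running the lookup against the wrong subvolume can return an unrelated
file with the same inode number, so the path argument matters.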

>
> and also:
>
> [ 3889.813300] btree_readpage_end_io_hook: 1860 callbacks suppressed
> [ 3889.813304] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 0
> [ 3889.825732] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.826375] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.828149] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.829649] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.831592] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.833436] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.835458] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.836968] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972
> [ 3889.848545] BTRFS error (device sdg1): bad tree block start, want
> 34958548107264 have 12157064991241308972

I'm skeptical that a scrub will fix these things, because Btrfs is
passively scrubbing on reads, so any checksum mismatches should get
fixed up, if they can be fixed, from reconstruction, on the fly as
well as scrub. This is a different problem, I'm not sure how serious
it is.

I would still do the full scrub. And then unmount it and run 'btrfs
check --mode=lowmem'. On a file system of this size it will take a
long time. So maybe do it over a weekend.

>
> I think that Input/output error btrfsck is showing is actually a
> filesystem checksum error and not triggered by faulty hardware (not
> anymore, I hope). If there actually are any more failing drives here, I
> will most likely do the ddrescue thing again. Currently there are no
> free SATA ports in that system to connect an additional drive, so I
> cannot simply add one (at least not without also installing an
> additional SATA controller).

I suggest you start planning how to migrate the data to a new Btrfs
volume. If the problems can't be repaired, this becomes inevitable. A
reasonable strategy is to take read-only snapshots of each subvolume
you want to preserve. And either 'btrfs send/receive' or 'rsync' to
new storage. That way you can keep using the volume rw in the
meantime. Once that completes, do another read only snapshot of each
subvolume, and do an incremental 'send -p' or rsync to migrate the
much smaller changes.
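
The two-pass migration described above could look roughly like the
following. The sketch only prints the commands instead of running them;
the function name migration_plan, the snapshot names "base" and "incr",
and all paths are placeholders. It assumes the old volume can be mounted
rw (snapshots cannot be created on a read-only mount) and that the
destination is itself a btrfs filesystem, which 'btrfs receive' requires.

```shell
#!/bin/sh
# Print (do not execute) the commands for a full transfer followed by an
# incremental one. $1 = subvolume to migrate, $2 = directory on the SAME
# filesystem that holds the snapshots, $3 = directory on the new volume.
# migration_plan and every path below are hypothetical.
migration_plan() {
    src=$1; snapdir=$2; dest=$3
    # First pass: read-only snapshot, then a full send.
    echo "btrfs subvolume snapshot -r $src $snapdir/base"
    echo "btrfs send $snapdir/base | btrfs receive $dest"
    # Second pass: new read-only snapshot, send only the differences
    # relative to the parent snapshot given with -p.
    echo "btrfs subvolume snapshot -r $src $snapdir/incr"
    echo "btrfs send -p $snapdir/base $snapdir/incr | btrfs receive $dest"
}

migration_plan /mnt/old/data /mnt/old/.snapshots /mnt/new
```

The parent snapshot has to remain in place on both sides until the
incremental send has finished, because the second stream is expressed
relative to it.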


-- 
Chris Murphy


* Re: Unmountable degraded BTRFS RAID6 filesystem
  2019-09-05 22:33             ` Chris Murphy
@ 2019-09-16 14:22               ` Urbani, Edmund
  0 siblings, 0 replies; 12+ messages in thread
From: Urbani, Edmund @ 2019-09-16 14:22 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

On Fri, Sep 6, 2019 at 00:33, Chris Murphy
<lists@colorremedies.com> wrote:
> [...]


Here's a little status update. I am still in the process of salvaging
files (remounting rw did not work for long and btrfs soon reverted to
read-only state and I left it that way for now). After completing my
first rsync pass I was still missing several large directory trees and
found corresponding errors in the logs:
Sep 15 20:34:39 phoenix kernel: BTRFS error (device sdg1): parent
transid verify failed on 34960626352128 wanted 3332854 found 3332691

I remounted with ro,recover,nospace_cache,clear_cache. Now I am able
to access more of the filesystem, but some errors still remain. I am
seeing plenty of csum errors in the logs:
Sep 16 12:08:53 phoenix kernel: BTRFS info (device sdg1): no csum
found for inode 6126287 start 1673527296

Then there are these (for all 10 mirrors):
Sep 16 12:09:13 phoenix kernel: BTRFS warning (device sdg1): csum
failed root 261 ino 6126287 off 1734606848 csum 0x7430ddcb expected
csum 0x00000000 mirror 10
Curiously, at least the recent log entries all refer to inode 6126287
(start, offset, etc. vary).
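
To check whether the csum warnings really concentrate on one inode, the
kernel log can be tallied per root and inode. This is a sketch that
assumes each warning sits on a single line in the log (the lines above
are wrapped by the mail client); the function name
csum_failures_by_inode is made up.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "count root inode" for every
# (root, inode) pair seen in "csum failed" warnings, most frequent first.
# csum_failures_by_inode is a hypothetical helper name.
csum_failures_by_inode() {
    awk '/csum failed root/ {
        for (i = 1; i < NF; i++) {
            if ($i == "root") root = $(i + 1)
            if ($i == "ino")  ino  = $(i + 1)
        }
        count[root " " ino]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}

# Typical use:
#   dmesg | csum_failures_by_inode
```

An inode found this way can then be mapped back to a path within its
subvolume before deciding what to do with the file.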

And then there's also still occasionally this:
Sep 16 12:09:19 phoenix kernel: BTRFS error (device sdg1): parent
transid verify failed on 34960627597312 wanted 3332854 found 3332691

I'll investigate the logs further when the second rsync pass is done.

Kind regards,
 Edmund




end of thread

Thread overview: 12+ messages
2019-09-03 22:20 Unmountable degraded BTRFS RAID6 filesystem Edmund Urbani
2019-09-03 23:30 ` Chris Murphy
2019-09-04  4:39   ` Edmund Urbani
2019-09-04  5:36     ` Chris Murphy
2019-09-04  6:18       ` Edmund Urbani
2019-09-04 12:35         ` Piotr Szymaniak
2019-09-04 14:04           ` Edmund Urbani
2019-09-05 19:17       ` Edmund Urbani
2019-09-05 19:57         ` Chris Murphy
2019-09-05 20:44           ` Edmund Urbani
2019-09-05 22:33             ` Chris Murphy
2019-09-16 14:22               ` Urbani, Edmund
