* Trying to rescue my data :(
@ 2016-06-24 14:52 Steven Haigh
  2016-06-24 16:26 ` Steven Haigh
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Haigh @ 2016-06-24 14:52 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 6633 bytes --]

Ok, so I figured that, despite what the BTRFS wiki seems to imply, the
'multi-parity' support just isn't stable enough to be used. So, I'm
trying to revert to what I had before.

My setup consists of:
	* 2 x 3TB drives, plus
	* 3 x 2TB drives.

I've got (had?) about 4.9TB of data.

My idea was to use a balance to convert the existing setup to the
'single' profile, delete the 3 x 2TB drives from the BTRFS filesystem,
then create a new mdadm-based RAID6 (5 drives, degraded to 3), create a
new filesystem on that, and then copy the data across.
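
In command form, the plan was roughly along these lines (device names
and mount points are just the ones from my setup, and the choice of new
filesystem was still up in the air):

# 1. Convert data and metadata to the 'single' profile
$ btrfs balance start -dconvert=single -mconvert=single -f /mnt/fileshare

# 2. Drop the three 2TB drives out of the btrfs filesystem
$ btrfs device delete /dev/xvde /dev/xvdf /dev/xvdg /mnt/fileshare

# 3. Build the md RAID6 degraded on those drives (5 devices, 2 missing)
$ mdadm --create /dev/md0 --level=6 --raid-devices=5 \
    /dev/xvde /dev/xvdf /dev/xvdg missing missing

# 4. New filesystem on the array, then copy everything across
$ mkfs.btrfs /dev/md0
$ mount /dev/md0 /mnt/newraid
$ cp -a /mnt/fileshare/. /mnt/newraid/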

So, great - first the balance:
$ btrfs balance start -dconvert=single -mconvert=single -f
(yes, I know it'll reduce the metadata redundancy).

This was promptly followed by a system crash.

After a reboot, I can no longer mount the BTRFS filesystem read-write:
[  134.768908] BTRFS info (device xvdd): disk space caching is enabled
[  134.769032] BTRFS: has skinny extents
[  134.769856] BTRFS: failed to read the system array on xvdd
[  134.776055] BTRFS: open_ctree failed
[  143.900055] BTRFS info (device xvdd): allowing degraded mounts
[  143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
[  143.900243] BTRFS info (device xvdd): disk space caching is enabled
[  143.900330] BTRFS: has skinny extents
[  143.901860] BTRFS warning (device xvdd): devid 4 uuid
61ccce61-9787-453e-b793-1b86f8015ee1 is missing
[  146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable
mount is not allowed
[  146.552051] BTRFS: open_ctree failed

I can mount it read-only - but then I also get crashes when it seems to
hit a read error:
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064
csum 3245290974 wanted 982056704 mirror 0
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
390821102 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
550556475 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
1279883714 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
2566472073 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
1876236691 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
3350537857 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
3319706190 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
2377458007 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
2066127208 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
657140479 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
1239359620 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
1598877324 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
1082738394 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
371906697 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
2156787247 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
3777709399 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
180814340 wanted 982056704 mirror 1
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2401!
invalid opcode: 0000 [#1] SMP
Modules linked in: btrfs x86_pkg_temp_thermal coretemp crct10dif_pclmul
xor aesni_intel aes_x86_64 lrw gf128mul glue_helper pcspkr raid6_pq
ablk_helper cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
2610978113 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
59610051 wanted 982056704 mirror 1
CPU: 1 PID: 1273 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
task: ffff880079ce12c0 ti: ffff880078788000 task.ti: ffff880078788000
RIP: e030:[<ffffffffa039e0e0>]  [<ffffffffa039e0e0>]
btrfs_check_repairable+0x100/0x110 [btrfs]
RSP: e02b:ffff88007878bcc8  EFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff880079db2080 RCX: 0000000000000003
RDX: 0000000000000003 RSI: 000004db13730000 RDI: ffff88007889ef38
RBP: ffff88007878bce0 R08: 000004db01c00000 R09: 000004dbc1c00000
R10: ffff88006bb0c1b8 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88007b213ea8 R14: 0000000000001000 R15: 0000000000000000
FS:  00007fbf2fdc0880(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbf2d96702b CR3: 000000007969f000 CR4: 0000000000042660
Stack:
 ffffea00019db180 0000000000010000 ffff88007b213f30 ffff88007878bd88
 ffffffffa03a0808 ffff880002d15500 ffff88007878bd18 ffff880079ce12c0
 ffff88007b213e40 000000000000001f ffff880000000000 ffff88006bb0c048
Call Trace:
 [<ffffffffa03a0808>] end_bio_extent_readpage+0x428/0x560 [btrfs]
 [<ffffffff812f40c0>] bio_endio+0x40/0x60
 [<ffffffffa0375a6c>] end_workqueue_fn+0x3c/0x40 [btrfs]
 [<ffffffffa03af3f1>] normal_work_helper+0xc1/0x300 [btrfs]
 [<ffffffff810a1352>] ? finish_task_switch+0x82/0x280
 [<ffffffffa03af702>] btrfs_endio_helper+0x12/0x20 [btrfs]
 [<ffffffff81093844>] process_one_work+0x154/0x400
 [<ffffffff8109438a>] worker_thread+0x11a/0x460
 [<ffffffff8165a24f>] ? __schedule+0x2bf/0x880
 [<ffffffff81094270>] ? rescuer_thread+0x2f0/0x2f0
 [<ffffffff810993f9>] kthread+0xc9/0xe0
 [<ffffffff81099330>] ? kthread_park+0x60/0x60
 [<ffffffff8165e14f>] ret_from_fork+0x3f/0x70
 [<ffffffff81099330>] ? kthread_park+0x60/0x60
Code: 00 31 c0 eb d5 8d 48 02 eb d9 31 c0 45 89 e0 48 c7 c6 a0 f8 3f a0
48 c7 c7 00 05 41 a0 e8 c9 f2 fa e0 31 c0 e9 70 ff ff ff 0f 0b <0f> 0b
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffffa039e0e0>] btrfs_check_repairable+0x100/0x110 [btrfs]
 RSP <ffff88007878bcc8>
------------[ cut here ]------------
<more crashes until the system hangs>

So, where to from here? Sadly, I feel there is data loss in my future,
but I'm not sure how to minimise this :\

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 14:52 Trying to rescue my data :( Steven Haigh
@ 2016-06-24 16:26 ` Steven Haigh
  2016-06-24 16:59   ` ronnie sahlberg
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Haigh @ 2016-06-24 16:26 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 13359 bytes --]

On 25/06/16 00:52, Steven Haigh wrote:
> Ok, so I figured that, despite what the BTRFS wiki seems to imply, the
> 'multi-parity' support just isn't stable enough to be used. So, I'm
> trying to revert to what I had before.
> [snip]
> So, where to from here? Sadly, I feel there is data loss in my future,
> but I'm not sure how to minimise this :\
>

The more I look at this, the more I'm wondering if this is a total
corruption scenario:

$ btrfs restore -D -l /dev/xvdc
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=59973363410688
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=59973363410688
Couldn't read chunk tree
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdd
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvde
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
bytenr mismatch, want=11224137170944, have=59973365311232
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
bytenr mismatch, want=11224137170944, have=59973365311232
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdf
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdg
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=11224137105408
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=11224137105408
ERROR: cannot read chunk root
Could not open root, trying backup super

If I mount it read-only:
$ mount -o nossd,degraded,ro /dev/xvdc /mnt/fileshare/

$ btrfs device usage /mnt/fileshare/

/dev/xvdc, ID: 1
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             5.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              2.75GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvdd, ID: 2
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             1.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvde, ID: 3
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:           213.23GiB

/dev/xvdf, ID: 6
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:            882.62GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   Unallocated:           977.33GiB

/dev/xvdg, ID: 5
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:           213.23GiB

missing, ID: 4
   Device size:               0.00B
   Device slack:           16.00EiB
   Data,RAID6:            758.00GiB
   Data,RAID6:              4.31GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.07TiB

Hoping this isn't a total loss ;)

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 16:26 ` Steven Haigh
@ 2016-06-24 16:59   ` ronnie sahlberg
  2016-06-24 17:05     ` Steven Haigh
  0 siblings, 1 reply; 20+ messages in thread
From: ronnie sahlberg @ 2016-06-24 16:59 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Btrfs BTRFS

What I would do in this situation:

1, Immediately stop writing to these disks/filesystem. ONLY access it
in read-only mode until you have salvaged what can be salvaged.
2, get a new 5TB USB drive (they are cheap) and copy file by file off the array.
3, when you hit files that cause panics, make a note of the inode and
avoid touching that file again.

It will likely take a lot of work and time, since I suspect it is a
largely manual process. But if the data is important ...
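
Once an inode number shows up in the csum errors (e.g. ino 42179 in your
logs), something like this should map it back to a filename so you know
what to skip next time - adjust the mount point to yours:

$ find /mnt/fileshare -xdev -inum 42179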


Once you have all salvageable data copied to the new drive, you can
decide how to proceed - i.e. whether you want to try to repair the
filesystem (I have low confidence in this for the parity raid case) or
simply rebuild a new fs from scratch.

On Fri, Jun 24, 2016 at 9:26 AM, Steven Haigh <netwiz@crc.id.au> wrote:
> On 25/06/16 00:52, Steven Haigh wrote:
>> [snip]
>
> The more I look at this, the more I'm wondering if this is a total
> corruption scenario:
> [snip]
>
> Hoping this isn't a total loss ;)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 16:59   ` ronnie sahlberg
@ 2016-06-24 17:05     ` Steven Haigh
  2016-06-24 17:40       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Haigh @ 2016-06-24 17:05 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 16239 bytes --]

On 25/06/16 02:59, ronnie sahlberg wrote:
> What I would do in this situation:
> 
> 1, Immediately stop writing to these disks/filesystem. ONLY access it
> in read-only mode until you have salvaged what can be salvaged.

That's ok - I can't even mount it in RW mode :)

> 2, get a new 5TB USB drive (they are cheap) and copy file by file off the array.

I've actually got enough combined space elsewhere to store stuff in the
meantime...

> 3, when you hit files that cause panics, make a note of the inode and
> avoid touching that file again.

What I have in mind here is that a zero-byte file seems to get CREATED
in the target directory when I copy a file that crashes the system. I'm
thinking that if I 'cp -an source/ target/' it will make this somewhat
easier (it won't overwrite the zero-byte file).
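
Something along these lines, I guess - the exact paths will be whatever
I end up using:

$ cp -an /mnt/fileshare/data/ /mnt/recover/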

> It will likely take a lot of work and time, since I suspect it is a
> largely manual process. But if the data is important ...

Yeah - there's only about 80GB on the array that I *really* care about -
the rest is just a bonus if it's there - not rage-worthy :P

> Once you have all salvageable data copied to the new drive, you can
> decide how to proceed - i.e. whether you want to try to repair the
> filesystem (I have low confidence in this for the parity raid case) or
> simply rebuild a new fs from scratch.

I honestly think it'll be scorched earth and starting again with a new
FS. I'm thinking of going back to mdadm for the RAID (which has worked
perfectly for years) and maybe using a vanilla BTRFS on top of that
block device.

Anything else seems like too much work for too little reward - and too
little confidence.
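
Roughly what I have in mind - device names below are just placeholders;
the point is that btrfs would only ever see the single md block device:

$ mdadm --create /dev/md0 --level=6 --raid-devices=5 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
$ mkfs.btrfs /dev/md0
$ mount /dev/md0 /mnt/fileshare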

> On Fri, Jun 24, 2016 at 9:26 AM, Steven Haigh <netwiz@crc.id.au> wrote:
>> [snip]

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 17:05     ` Steven Haigh
@ 2016-06-24 17:40       ` Austin S. Hemmelgarn
  2016-06-24 17:43         ` Steven Haigh
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-24 17:40 UTC (permalink / raw)
  To: Steven Haigh, linux-btrfs

On 2016-06-24 13:05, Steven Haigh wrote:
> On 25/06/16 02:59, ronnie sahlberg wrote:
> What I have in mind here is that a zero-byte file seems to get CREATED
> in the target directory when I copy a file that crashes the system. I'm
> thinking that if I 'cp -an source/ target/' it will make this somewhat
> easier (it won't overwrite the zero-byte file).
You may want to try with rsync (rsync -vahogSHAXOP should get just about 
everything possible out of the filesystem except for some security 
attributes (stuff like SELinux context), and will give you nice 
information about progress as well).  It will keep running in the face 
of individual read errors, and will only try each file once.  It also 
has the advantage of showing you the transfer rate and exactly where in 
the directory structure you are, and handles partial copies sanely too 
(it's more reliable restarting an rsync transfer than a cp one that got 
interrupted part way through).
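
Concretely, something like this (source and destination paths being
whatever yours actually are):

$ rsync -vahogSHAXOP /mnt/fileshare/ /mnt/recover/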


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 17:40       ` Austin S. Hemmelgarn
@ 2016-06-24 17:43         ` Steven Haigh
  2016-06-24 17:50           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Haigh @ 2016-06-24 17:43 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2145 bytes --]

On 25/06/16 03:40, Austin S. Hemmelgarn wrote:
> On 2016-06-24 13:05, Steven Haigh wrote:
>> On 25/06/16 02:59, ronnie sahlberg wrote:
>> What I have in mind here is that a zero-byte file seems to get CREATED
>> in the target directory when I copy a file that crashes the system. I'm
>> thinking that if I 'cp -an source/ target/' it will make this somewhat
>> easier (it won't overwrite the zero-byte file).
> You may want to try with rsync (rsync -vahogSHAXOP should get just about
> everything possible out of the filesystem except for some security
> attributes (stuff like SELinux context), and will give you nice
> information about progress as well).  It will keep running in the face
> of individual read errors, and will only try each file once.  It also
> has the advantage of showing you the transfer rate and exactly where in
> the directory structure you are, and handles partial copies sanely too
> (it's more reliable restarting an rsync transfer than a cp one that got
> interrupted part way through).

I may try that - I came up with this:
#!/bin/bash

# Mount the damaged array read-only/degraded, copy what we can, unmount.
mount -o ro,nossd,degraded /dev/xvdc /mnt/fileshare/

find /mnt/fileshare/data/Photos/ -type f -print0 |
    while IFS= read -r -d $'\0' line; do
        echo "Processing $line"
        DIR=$(dirname "$line")
        mkdir -p "/mnt/recover/$DIR"
        # The zero-byte marker is touched before the copy, so if copying
        # this file crashes the box, it gets skipped on the next run.
        if [ ! -e "/mnt/recover/$line" ]; then
                echo "Copying $line to /mnt/recover/$line"
                touch "/mnt/recover/$line"
                sync
                cp -f "$line" "/mnt/recover/$line"
                sync
        fi
    done

umount /mnt/fileshare

I'm slowly picking through the data - and it has crashed a few times...
It seems that there are some checksum failures that don't crash the
entire system - so that's a good thing to know - not sure if that means
that it is correcting the data with parity - or something else.
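
I might keep an eye on the per-device error counters between runs to see
what it's actually hitting - assuming the array is still mounted
read-only at /mnt/fileshare:

$ btrfs device stats /mnt/fileshare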

I'll see how much data I can extract with this and go from there - as it
may be good enough to call it a success.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 17:43         ` Steven Haigh
@ 2016-06-24 17:50           ` Austin S. Hemmelgarn
  2016-06-25  4:19             ` Steven Haigh
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-24 17:50 UTC (permalink / raw)
  To: Steven Haigh, linux-btrfs

On 2016-06-24 13:43, Steven Haigh wrote:
> On 25/06/16 03:40, Austin S. Hemmelgarn wrote:
>> [snip]
>
> I may try that - I came up with this:
> #!/bin/bash
>
> mount -o ro,nossd,degraded /dev/xvdc /mnt/fileshare/
>
> find /mnt/fileshare/data/Photos/ -type f -print0 |
>     while IFS= read -r -d $'\0' line; do
>         echo "Processing $line"
>         DIR=`dirname "$line"`
>         mkdir -p "/mnt/recover/$DIR"
>         if [ ! -e "/mnt/recover/$line" ]; then
>                 echo "Copying $line to /mnt/recover/$line"
>                 touch "/mnt/recover/$line"
>                 sync
>                 cp -f "$line" "/mnt/recover/$line"
>                 sync
>         fi
>     done
>
> umount /mnt/fileshare
>
> I'm slowly picking through the data - and it has crashed a few times...
> It seems that there are some checksum failures that don't crash the
> entire system - so that's a good thing to know - not sure if that means
> that it is correcting the data with parity - or something else.
>
> I'll see how much data I can extract with this and go from there - as it
> may be good enough to call it a success.
>
Ah, if you're having issues with crashes when you hit errors, you may 
want to avoid rsync then; it will try to reread any files that don't 
match in size and mtime, so it would likely just keep crashing on the 
same file over and over again.

Also, looking at the script you've got, it will probably run faster 
than rsync because it shouldn't need to call stat() on everything the 
way rsync does for its size and mtime comparison.
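
A minimal sketch of such an rsync invocation, for reference - the
--ignore-existing option is an assumption here (it makes rsync skip
anything already present on the destination, including the zero-byte
markers the script creates), and the paths just mirror the script above:

    rsync -vahogSHAXOP --ignore-existing \
        /mnt/fileshare/data/Photos/ /mnt/recover/mnt/fileshare/data/Photos/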

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-24 17:50           ` Austin S. Hemmelgarn
@ 2016-06-25  4:19             ` Steven Haigh
  2016-06-25 16:25               ` Chris Murphy
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Haigh @ 2016-06-25  4:19 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3783 bytes --]

On 25/06/2016 3:50 AM, Austin S. Hemmelgarn wrote:
> On 2016-06-24 13:43, Steven Haigh wrote:
>> On 25/06/16 03:40, Austin S. Hemmelgarn wrote:
>>> On 2016-06-24 13:05, Steven Haigh wrote:
>>>> On 25/06/16 02:59, ronnie sahlberg wrote:
>>>> What I have in mind here is that a file seems to get CREATED when I
>>>> copy
>>>> the file that crashes the system in the target directory. I'm thinking
>>>> if I 'cp -an source/ target/' that it will make this somewhat easier
>>>> (it
>>>> won't overwrite the zero byte file).
>>> You may want to try with rsync (rsync -vahogSHAXOP should get just about
>>> everything possible out of the filesystem except for some security
>>> attributes (stuff like SELinux context), and will give you nice
>>> information about progress as well).  It will keep running in the face
>>> of individual read errors, and will only try each file once.  It also
>>> has the advantage of showing you the transfer rate and exactly where in
>>> the directory structure you are, and handles partial copies sanely too
>>> (it's more reliable restarting an rsync transfer than a cp one that got
>>> interrupted part way through).
>>
>> I may try that - I came up with this:
>> #!/bin/bash
>>
>> mount -o ro,nossd,degraded /dev/xvdc /mnt/fileshare/
>>
>> find /mnt/fileshare/data/Photos/ -type f -print0 |
>>     while IFS= read -r -d $'\0' line; do
>>         echo "Processing $line"
>>         DIR=`dirname "$line"`
>>         mkdir -p "/mnt/recover/$DIR"
>>         if [ ! -e "/mnt/recover/$line" ]; then
>>                 echo "Copying $line to /mnt/recover/$line"
>>                 touch "/mnt/recover/$line"
>>                 sync
>>                 cp -f "$line" "/mnt/recover/$line"
>>                 sync
>>         fi
>>     done
>>
>> umount /mnt/fileshare
>>
>> I'm slowly picking through the data - and it has crashed a few times...
>> It seems that there are some checksum failures that don't crash the
>> entire system - so that's a good thing to know - not sure if that means
>> that it is correcting the data with parity - or something else.
>>
>> I'll see how much data I can extract with this and go from there - as it
>> may be good enough to call it a success.
>>
> AH, if you're having issues with crashes when you hit errors, you may
> want to avoid rsync then, it will try to reread any files that don't
> match in size and mtime, so it would likely just keep crashing on the
> same file over and over again.
> 
> Also, looking at the script you've got, that will probably run faster
> too because it shouldn't need to call stat() on everything like rsync
> does (because of the size and mtime comparison).

Well, as a data point, the data is slowly coming off the RAID6 array.
Some stuff is just dead and crashes the entire host whenever you try to
access it. At the moment, my average uptime is about 2-3 minutes...

I've added my recovery rsync script to /etc/rc.local - and I'm just
starting / destroying the VM every time it crashes.

I'm also rsync'ing the data from that system out to other areas of
storage so I can pull off as much data as possible (I don't have a spare
4.4Tb to use).

I lost a total of 5 photos out of 83Gb worth - which is good. My music
collection doesn't seem to be that lucky - which means lots of time
ripping CDs in the future :P

I haven't tried the applications / ISOs directory yet - but we'll see
how that goes when I get there...

The photos were the main thing I was concerned about, the rest is just
handy.

Interesting though that EVERY crash references:
	kernel BUG at fs/btrfs/extent_io.c:2401!


-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-25  4:19             ` Steven Haigh
@ 2016-06-25 16:25               ` Chris Murphy
  2016-06-25 16:39                 ` Steven Haigh
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2016-06-25 16:25 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Btrfs BTRFS

On Fri, Jun 24, 2016 at 10:19 PM, Steven Haigh <netwiz@crc.id.au> wrote:

>
> Interesting though that EVERY crash references:
>         kernel BUG at fs/btrfs/extent_io.c:2401!

Yeah because you're mounted ro, and if this is 4.4.13 unmodified btrfs
from kernel.org then that's the 3rd line:

if (head->is_data) {
    ret = btrfs_del_csums(trans, root,
       node->bytenr,
       node->num_bytes);

So why/what is it cleaning up if it's mounted ro? Anyway, once you're
no longer making forward progress you could try something newer,
although it's a coin toss what to try. There are some issues with
4.6.0-4.6.2 but there have been a lot of changes in btrfs/extent_io.c
and btrfs/raid56.c between 4.4.13 that you're using and 4.6.2, so you
could try that or even build 4.7-rc4 or rc5 by tomorrowish and see how
that fares. It sounds like there's just too much (mostly metadata)
corruption for the degraded state to deal with, so it may not matter.
I'm really skeptical of btrfsck on degraded fs's so I don't think
that'll help.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-25 16:25               ` Chris Murphy
@ 2016-06-25 16:39                 ` Steven Haigh
  2016-06-25 17:14                   ` Chris Murphy
  2016-06-26  2:30                   ` Duncan
  0 siblings, 2 replies; 20+ messages in thread
From: Steven Haigh @ 2016-06-25 16:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 2874 bytes --]

On 26/06/16 02:25, Chris Murphy wrote:
> On Fri, Jun 24, 2016 at 10:19 PM, Steven Haigh <netwiz@crc.id.au> wrote:
> 
>>
>> Interesting though that EVERY crash references:
>>         kernel BUG at fs/btrfs/extent_io.c:2401!
> 
> Yeah because you're mounted ro, and if this is 4.4.13 unmodified btrfs
> from kernel.org then that's the 3rd line:
> 
> if (head->is_data) {
>     ret = btrfs_del_csums(trans, root,
>        node->bytenr,
>        node->num_bytes);
> 
> So why/what is it cleaning up if it's mounted ro? Anyway, once you're
> no longer making forward progress you could try something newer,
> although it's a coin toss what to try. There are some issues with
> 4.6.0-4.6.2 but there have been a lot of changes in btrfs/extent_io.c
> and btrfs/raid56.c between 4.4.13 that you're using and 4.6.2, so you
> could try that or even build 4.7.rc4 or rc5 by tomorrowish and see how
> that fairs. It sounds like there's just too much (mostly metadata)
> corruption for the degraded state to deal with so it may not matter.
> I'm really skeptical of btrfsck on degraded fs's so I don't think
> that'll help.

Well, I did end up recovering the data that I cared about. I'm not
really keen to ride the BTRFS RAID6 train again any time soon :\

I now have the same as I've had for years - md RAID6 with XFS on top of
it. I'm still copying data back to the array from the various sources I
had to copy it to so I had enough space to do so.

What I find interesting is that the pattern of corruption in the BTRFS
RAID6 is quite clustered. I have ~80Gb of MP3s ripped over the years -
of that, the corruption would take out 3-4 songs in a row, then the next
10 albums or so were intact. What made recovery VERY hard, is that it
got to several situations that just caused a complete system hang.

I tried it on bare metal - just in case it was a Xen thing, but it hard
hung the entire machine then. In every case, it was a flurry of csum
error messages, then instant death. I would have been much happier if
the file had been skipped or returned as unavailable instead of having
the entire machine crash.

I ended up putting the bit of script that I posted earlier in
/etc/rc.local - then just kept doing:
	xl destroy myvm && xl create /etc/xen/myvm -c

Wait for the crash, run the above again.

All in all, it took me about 350 boots with an average uptime of about 3
minutes to get the data out that I decided to keep. While not a BTRFS
loss, I did decide with how long it was going to take to not bother
recovering ~3.5Tb of other data that is easily available in other places
on the internet. If I really need the Fedora 24 KDE Spin ISO, or the
CentOS 6 Install DVD, etc etc I can download it again.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-25 16:39                 ` Steven Haigh
@ 2016-06-25 17:14                   ` Chris Murphy
  2016-06-26  2:30                   ` Duncan
  1 sibling, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2016-06-25 17:14 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, Jun 25, 2016 at 10:39 AM, Steven Haigh <netwiz@crc.id.au> wrote:

> Well, I did end up recovering the data that I cared about. I'm not
> really keen to ride the BTRFS RAID6 train again any time soon :\
>
> I now have the same as I've had for years - md RAID6 with XFS on top of
> it. I'm still copying data back to the array from the various sources I
> had to copy it to so I had enough space to do so.

Just make sure you've got each drive's SCT ERC shorter than the kernel
SCSI command timer for each block device in
/sys/block/device-name/device/timeout, or you can very easily end up
with the same if not worse problem, which is total array collapse. It's
more rare to see the problem on mdraid6 because the extra parity ends
up papering over the problem caused by this misconfiguration, but it's
a misconfiguration that's the default unless you're using
enterprise/NAS specific drives with short recoveries set on them by
default. The linux-raid@ list is full of problems resulting from this
issue.
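
A rough sketch of the checks meant here (device names are placeholders,
and the 7.0 second value assumes a drive that actually supports SCT ERC):

    smartctl -l scterc /dev/sda              # read the current ERC setting
    smartctl -l scterc,70,70 /dev/sda        # set read/write recovery to 7.0s
    cat /sys/block/sda/device/timeout        # kernel SCSI command timer, default 30s
    # if the drive has no SCT ERC support, raise the kernel timer instead:
    echo 180 > /sys/block/sda/device/timeout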

I think the obvious mistake here though is assuming reshapes entail no
risk. There's a -f required for a reason. You could have ended up in
just as bad a situation doing a reshape without a backup of an md or lvm
based array. Yes it should work, and if it doesn't it's a bug, but how
much data do you want to lose today?



> What I find interesting is that the patterns of corruption in the BTRFS
> RAID6 is quite clustered. I have ~80Gb of MP3s ripped over the years -
> of that, the corruption would take out 3-4 songs in a row, then the next
> 10 albums or so were intact. What made recovery VERY hard, is that it
> got to several situations that just caused a complete system hang.

The data stripe size is 64KiB * (num of disks - 2). So in your case I
think that's 64KiB * 3 = 192KiB. That's less than the size of one song, so
that means roughly 15 bad stripes in a row. That's less than a block
group also.

The Btrfs conversion should be safer than the methods used by mdadm and
lvm because the operation is copy-on-write. The raid6 block group is
supposed to remain intact and "live", if you will, until the single block
group is written to stable media. The full crash set of kernel messages
might be useful to find out what was happening that instigated all of this
corruption. But even so, the subsequent mount should at worst roll back
to a state with block groups of different profiles, where the most
recent (failed) conversion still has its raid6 block group intact.

So, I'd still say btrfs-image it and host it somewhere, file a bug,
cross reference this thread in the bug, and the bug URL in this
thread. Might take months or even a year before a dev looks at it, but
better than nothing.
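
For reference, a minimal sketch of capturing such an image - btrfs-image
stores metadata only, the device and output path are placeholders, and
it obviously needs the metadata to still be readable:

    btrfs-image -c 9 -t 4 /dev/xvdc /tmp/broken-fs-metadata.img
    # -c 9: compress the image, -t 4: use 4 threads; add -s to sanitize
    # file names before hosting the image publicly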


>
> I tried it on bare metal - just in case it was a Xen thing, but it hard
> hung the entire machine then. In every case, it was a flurry of csum
> error messages, then instant death. I would have been much happier if
> the file had been skipped or returned as unavailable instead of having
> the entire machine crash.

Of course. The unanswered question though is why are there so many
csum errors? Are these metadata csum errors, or are they EXTENT_CSUM
errors, and how are they becoming wrong? Wrongly read, wrongly
written, wrongly recomputed from parity? How did the parity go bad if
that's the case? So it needs an autopsy or it just doesn't get better.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-25 16:39                 ` Steven Haigh
  2016-06-25 17:14                   ` Chris Murphy
@ 2016-06-26  2:30                   ` Duncan
  2016-06-26  3:13                     ` Steven Haigh
  1 sibling, 1 reply; 20+ messages in thread
From: Duncan @ 2016-06-26  2:30 UTC (permalink / raw)
  To: linux-btrfs

Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:

> In every case, it was a flurry of csum error messages, then instant
> death.

This is very possibly a known bug in btrfs, that occurs even in raid1 
where a later scrub repairs all csum errors.  While in theory btrfs raid1 
should simply pull from the mirrored copy if its first try fails checksum 
(assuming the second one passes, of course), and it seems to do this just 
fine if there's only an occasional csum error, if it gets too many at 
once, it *does* unfortunately crash, despite the second copy being 
available and being just fine as later demonstrated by the scrub fixing 
the bad copy from the good one.

I'm used to dealing with that here any time I have a bad shutdown (and 
I'm running live-git kde, which currently has a bug that triggers a 
system crash if I let it idle and shut off the monitors, so I've been 
getting crash shutdowns and having to deal with this unfortunately often, 
recently).  Fortunately I keep my root, with all system executables, etc, 
mounted read-only by default, so it's not affected and I can /almost/ 
boot normally after such a crash.  The problem is /var/log and /home 
(which has some parts of /var that need to be writable symlinked into /
home/var, so / can stay read-only).  Something in the normal after-crash 
boot triggers enough csum errors there that I often crash again.

So I have to boot to emergency mode and manually mount the filesystems in 
question, so nothing's trying to access them until I run the scrub and 
fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully, 
and once it has repaired all the csum errors due to partial writes on one 
mirror that either were never made or were properly completed on the 
other mirror, I can exit emergency mode and complete the normal boot (to 
the multi-user default target).  As there's no more csum errors then 
because scrub fixed them all, the boot doesn't crash due to too many such 
errors, and I'm back in business.
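
In case it helps anyone hitting the same thing, a minimal sketch of that 
emergency-mode routine - it assumes a systemd boot and /home as the 
affected filesystem, so adjust to taste:

    mount /home                      # mount it with nothing else touching it yet
    btrfs scrub start -B -d /home    # -B: stay in foreground, -d: per-device stats
    systemctl default                # then continue to the normal default target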


Tho I believe at least the csum bug that affects me may only trigger if 
compression is (or perhaps has been in the past) enabled.  Since I run 
compress=lzo everywhere, that would certainly affect me.  It would also 
explain why the bug has remained around for quite some time as well, 
since presumably the devs don't run with compression on enough for this 
to have become a personal itch they needed to scratch, thus its remaining 
untraced and unfixed.

So if you weren't using the compress option, your bug is probably 
different, but either way, the whole thing about too many csum errors at 
once triggering a system crash sure does sound familiar, here.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Trying to rescue my data :(
  2016-06-26  2:30                   ` Duncan
@ 2016-06-26  3:13                     ` Steven Haigh
  2016-09-11 19:48                       ` compress=lzo safe to use? (was: Re: Trying to rescue my data :() Martin Steigerwald
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Haigh @ 2016-06-26  3:13 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3072 bytes --]

On 26/06/16 12:30, Duncan wrote:
> Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
> 
>> In every case, it was a flurry of csum error messages, then instant
>> death.
> 
> This is very possibly a known bug in btrfs, that occurs even in raid1 
> where a later scrub repairs all csum errors.  While in theory btrfs raid1 
> should simply pull from the mirrored copy if its first try fails checksum 
> (assuming the second one passes, of course), and it seems to do this just 
> fine if there's only an occasional csum error, if it gets too many at 
> once, it *does* unfortunately crash, despite the second copy being 
> available and being just fine as later demonstrated by the scrub fixing 
> the bad copy from the good one.
> 
> I'm used to dealing with that here any time I have a bad shutdown (and 
> I'm running live-git kde, which currently has a bug that triggers a 
> system crash if I let it idle and shut off the monitors, so I've been 
> getting crash shutdowns and having to deal with this unfortunately often, 
> recently).  Fortunately I keep my root, with all system executables, etc, 
> mounted read-only by default, so it's not affected and I can /almost/ 
> boot normally after such a crash.  The problem is /var/log and /home 
> (which has some parts of /var that need to be writable symlinked into /
> home/var, so / can stay read-only).  Something in the normal after-crash 
> boot triggers enough csum errors there that I often crash again.
> 
> So I have to boot to emergency mode and manually mount the filesystems in 
> question, so nothing's trying to access them until I run the scrub and 
> fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully, 
> and once it has repaired all the csum errors due to partial writes on one 
> mirror that either were never made or were properly completed on the 
> other mirror, I can exit emergency mode and complete the normal boot (to 
> the multi-user default target).  As there's no more csum errors then 
> because scrub fixed them all, the boot doesn't crash due to too many such 
> errors, and I'm back in business.
> 
> 
> Tho I believe at least the csum bug that affects me may only trigger if 
> compression is (or perhaps has been in the past) enabled.  Since I run 
> compress=lzo everywhere, that would certainly affect me.  It would also 
> explain why the bug has remained around for quite some time as well, 
> since presumably the devs don't run with compression on enough for this 
> to have become a personal itch they needed to scratch, thus its remaining 
> untraced and unfixed.
> 
> So if you weren't using the compress option, your bug is probably 
> different, but either way, the whole thing about too many csum errors at 
> once triggering a system crash sure does sound familiar, here.

Yes, I was running the compress=lzo option as well... Maybe here lays a
common problem?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* compress=lzo safe to use? (was: Re: Trying to rescue my data :()
  2016-06-26  3:13                     ` Steven Haigh
@ 2016-09-11 19:48                       ` Martin Steigerwald
  2016-09-11 20:06                         ` Adam Borowski
                                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Martin Steigerwald @ 2016-09-11 19:48 UTC (permalink / raw)
  To: Steven Haigh; +Cc: linux-btrfs

Am Sonntag, 26. Juni 2016, 13:13:04 CEST schrieb Steven Haigh:
> On 26/06/16 12:30, Duncan wrote:
> > Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
> >> In every case, it was a flurry of csum error messages, then instant
> >> death.
> > 
> > This is very possibly a known bug in btrfs, that occurs even in raid1
> > where a later scrub repairs all csum errors.  While in theory btrfs raid1
> > should simply pull from the mirrored copy if its first try fails checksum
> > (assuming the second one passes, of course), and it seems to do this just
> > fine if there's only an occasional csum error, if it gets too many at
> > once, it *does* unfortunately crash, despite the second copy being
> > available and being just fine as later demonstrated by the scrub fixing
> > the bad copy from the good one.
> > 
> > I'm used to dealing with that here any time I have a bad shutdown (and
> > I'm running live-git kde, which currently has a bug that triggers a
> > system crash if I let it idle and shut off the monitors, so I've been
> > getting crash shutdowns and having to deal with this unfortunately often,
> > recently).  Fortunately I keep my root, with all system executables, etc,
> > mounted read-only by default, so it's not affected and I can /almost/
> > boot normally after such a crash.  The problem is /var/log and /home
> > (which has some parts of /var that need to be writable symlinked into /
> > home/var, so / can stay read-only).  Something in the normal after-crash
> > boot triggers enough csum errors there that I often crash again.
> > 
> > So I have to boot to emergency mode and manually mount the filesystems in
> > question, so nothing's trying to access them until I run the scrub and
> > fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully,
> > and once it has repaired all the csum errors due to partial writes on one
> > mirror that either were never made or were properly completed on the
> > other mirror, I can exit emergency mode and complete the normal boot (to
> > the multi-user default target).  As there's no more csum errors then
> > because scrub fixed them all, the boot doesn't crash due to too many such
> > errors, and I'm back in business.
> > 
> > 
> > Tho I believe at least the csum bug that affects me may only trigger if
> > compression is (or perhaps has been in the past) enabled.  Since I run
> > compress=lzo everywhere, that would certainly affect me.  It would also
> > explain why the bug has remained around for quite some time as well,
> > since presumably the devs don't run with compression on enough for this
> > to have become a personal itch they needed to scratch, thus its remaining
> > untraced and unfixed.
> > 
> > So if you weren't using the compress option, your bug is probably
> > different, but either way, the whole thing about too many csum errors at
> > once triggering a system crash sure does sound familiar, here.
> 
> Yes, I was running the compress=lzo option as well... Maybe here lays a
> common problem?

Hmm… I found this thread by following a reference from the Debian wiki page 
on BTRFS¹.

I use compress=lzo on BTRFS RAID 1 since April 2014 and I never found an 
issue. Steven, your filesystem wasn´t RAID 1 but RAID 5 or 6?

I just want to assess whether using compress=lzo might be dangerous to use in 
my setup. Actually right now I like to keep using it, since I think at least 
one of the SSDs does not compress. And… well… /home and / where I use it are 
both quite full already.

[1] https://wiki.debian.org/Btrfs#WARNINGS

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compress=lzo safe to use? (was: Re: Trying to rescue my data :()
  2016-09-11 19:48                       ` compress=lzo safe to use? (was: Re: Trying to rescue my data :() Martin Steigerwald
@ 2016-09-11 20:06                         ` Adam Borowski
  2016-09-11 20:27                           ` Chris Murphy
  2016-09-11 20:49                         ` compress=lzo safe to use? Hans van Kranenburg
  2016-09-12  1:00                         ` Steven Haigh
  2 siblings, 1 reply; 20+ messages in thread
From: Adam Borowski @ 2016-09-11 20:06 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Steven Haigh, linux-btrfs

On Sun, Sep 11, 2016 at 09:48:35PM +0200, Martin Steigerwald wrote:
> Hmm… I found this from being referred to by reading Debian wiki page on 
> BTRFS¹.
> 
> I use compress=lzo on BTRFS RAID 1 since April 2014 and I never found an 
> issue. Steven, your filesystem wasn´t RAID 1 but RAID 5 or 6?
> 
> I just want to assess whether using compress=lzo might be dangerous to use in 
> my setup. Actually right now I like to keep using it, since I think at least 
> one of the SSDs does not compress. And… well… /home and / where I use it are 
> both quite full already.
> 
> [1] https://wiki.debian.org/Btrfs#WARNINGS

I have used compress=lzo for years, kernels 3.8, 3.13 and 3.14 (a bunch of
machines), without a single glitch; heavy snapshotting, single dev only, no
quota.  Until recently I had never balanced.

I did have a case of ENOSPC with <80% full on 4.7 which might or might not
be related to compress=lzo.

-- 
Second "wet cat laying down on a powered-on box-less SoC on the desk" close
shave in a week.  Protect your ARMs, folks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compress=lzo safe to use? (was: Re: Trying to rescue my data :()
  2016-09-11 20:06                         ` Adam Borowski
@ 2016-09-11 20:27                           ` Chris Murphy
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2016-09-11 20:27 UTC (permalink / raw)
  To: Adam Borowski; +Cc: Martin Steigerwald, Steven Haigh, Btrfs BTRFS

On Sun, Sep 11, 2016 at 2:06 PM, Adam Borowski <kilobyte@angband.pl> wrote:
> On Sun, Sep 11, 2016 at 09:48:35PM +0200, Martin Steigerwald wrote:
>> Hmm… I found this from being referred to by reading Debian wiki page on
>> BTRFS¹.
>>
>> I use compress=lzo on BTRFS RAID 1 since April 2014 and I never found an
>> issue. Steven, your filesystem wasn´t RAID 1 but RAID 5 or 6?
>>
>> I just want to assess whether using compress=lzo might be dangerous to use in
>> my setup. Actually right now I like to keep using it, since I think at least
>> one of the SSDs does not compress. And… well… /home and / where I use it are
>> both quite full already.
>>
>> [1] https://wiki.debian.org/Btrfs#WARNINGS
>
> I have used compress=lzo for years, kernels 3.8, 3.13 and 3.14 (a bunch of
> machines), without a single glitch; heavy snapshotting, single dev only, no
> quota.  Until recently I did never balanced.
>
> I did have a case of ENOSPC with <80% full on 4.7 which might or might not
> be related to compress=lzo.

I'm not finding it offhand, but Duncan has some experience with this
issue, where he'd occasionally have some sort of problem (hand wave) -
I don't know how serious it was, maybe just scary warnings like a call
trace or something, but no actual problem? My recollection is that
compression might be making certain edge case problems more difficult
to recover from. I don't know why that would be, as metadata itself
isn't compressed (the inline data saved in metadata nodes can be
compressed). But there you go, if things start going wonky compression
might make it more difficult. But that's speculative. And I also don't
know if there's any difference between lzo and zlib in this regard
either.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compress=lzo safe to use?
  2016-09-11 19:48                       ` compress=lzo safe to use? (was: Re: Trying to rescue my data :() Martin Steigerwald
  2016-09-11 20:06                         ` Adam Borowski
@ 2016-09-11 20:49                         ` Hans van Kranenburg
  2016-09-12  4:36                           ` Duncan
  2016-09-12  1:00                         ` Steven Haigh
  2 siblings, 1 reply; 20+ messages in thread
From: Hans van Kranenburg @ 2016-09-11 20:49 UTC (permalink / raw)
  To: Martin Steigerwald, Steven Haigh; +Cc: linux-btrfs

On 09/11/2016 09:48 PM, Martin Steigerwald wrote:
> Am Sonntag, 26. Juni 2016, 13:13:04 CEST schrieb Steven Haigh:
>> On 26/06/16 12:30, Duncan wrote:
>>> Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
>>>> In every case, it was a flurry of csum error messages, then instant
>>>> death.
>>>
>>> This is very possibly a known bug in btrfs, that occurs even in raid1
>>> where a later scrub repairs all csum errors.  While in theory btrfs raid1
>>> should simply pull from the mirrored copy if its first try fails checksum
>>> (assuming the second one passes, of course), and it seems to do this just
>>> fine if there's only an occasional csum error, if it gets too many at
>>> once, it *does* unfortunately crash [...]

[...]

>>> different, but either way, the whole thing about too many csum errors at
>>> once triggering a system crash sure does sound familiar, here.
>>
>> Yes, I was running the compress=lzo option as well... Maybe here lays a
>> common problem?
> 
> Hmm… I found this from being referred to by reading Debian wiki page on 
> BTRFS¹.
> 
> I use compress=lzo on BTRFS RAID 1 since April 2014 and I never found an 
> issue. Steven, your filesystem wasn´t RAID 1 but RAID 5 or 6?

To quote you from the "stability a joke" thread (which I guess this
might be related to)... "For me so far even compress=lzo seems to be
stable, but well for others it may not."

So, you can use a lot of compress without problems for years.

Only when your hardware starts to break in a specific way, causing
lots and lots of checksum errors, might the kernel currently be unable to
handle all of them at the same time.

The compression might be super stable itself, but in this case another part
of the filesystem is not perfectly able to handle certain failure
scenarios involving it.

Another way to find out about "are there issues with compression" is
looking in the kernel git history.

When searching for "compression" and "corruption", you'll find fixes
like these:

commit 0305cd5f7fca85dae392b9ba85b116896eb7c1c7
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Oct 16 12:34:25 2015 +0100

    Btrfs: fix truncation of compressed and inlined extents

commit 808f80b46790f27e145c72112189d6a3be2bc884
Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Sep 28 09:56:26 2015 +0100

    Btrfs: update fix for read corruption of compressed and shared extents

commit 005efedf2c7d0a270ffbe28d8997b03844f3e3e7
Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Sep 14 09:09:31 2015 +0100

    Btrfs: fix read corruption of compressed and shared extents

commit 619d8c4ef7c5dd346add55da82c9179cd2e3387e
Author: Filipe Manana <fdmanana@suse.com>
Date:   Sun May 3 01:56:00 2015 +0100

    Btrfs: incremental send, fix clone operations for compressed extents

These commits fix actual data corruption issues. Still, it might be bugs
that you've never seen, even when using a kernel with these bugs for
years, because they require a certain "nasty sequence of events" to trigger.

But, when using compression you certainly want to have these commits in
the kernel you're running right now. And when the bugs caused
corruption, using a fixed kernel will not retroactively fix the corrupt
data.
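
The kind of history search described above can be done with something
like this (a sketch - the grep terms are just one reasonable choice):

    # btrfs commits whose log message mentions both compression and corruption
    git log --oneline --all-match -i --grep=compress --grep=corrupt -- fs/btrfs/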

Hint: "this was fixed in 4.x.y, so run that version or later" is not
always the only answer here, because you'll see that fixes like these
even show up in kernels like 3.16.y.

But maybe I should continue by replying on the joke thread instead of
typing more here.

-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compress=lzo safe to use?
  2016-09-11 19:48                       ` compress=lzo safe to use? (was: Re: Trying to rescue my data :() Martin Steigerwald
  2016-09-11 20:06                         ` Adam Borowski
  2016-09-11 20:49                         ` compress=lzo safe to use? Hans van Kranenburg
@ 2016-09-12  1:00                         ` Steven Haigh
  2 siblings, 0 replies; 20+ messages in thread
From: Steven Haigh @ 2016-09-12  1:00 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs

On 2016-09-12 05:48, Martin Steigerwald wrote:
> Am Sonntag, 26. Juni 2016, 13:13:04 CEST schrieb Steven Haigh:
>> On 26/06/16 12:30, Duncan wrote:
>> > Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
>> >> In every case, it was a flurry of csum error messages, then instant
>> >> death.
>> >
>> > This is very possibly a known bug in btrfs, that occurs even in raid1
>> > where a later scrub repairs all csum errors.  While in theory btrfs raid1
>> > should simply pull from the mirrored copy if its first try fails checksum
>> > (assuming the second one passes, of course), and it seems to do this just
>> > fine if there's only an occasional csum error, if it gets too many at
>> > once, it *does* unfortunately crash, despite the second copy being
>> > available and being just fine as later demonstrated by the scrub fixing
>> > the bad copy from the good one.
>> >
>> > I'm used to dealing with that here any time I have a bad shutdown (and
>> > I'm running live-git kde, which currently has a bug that triggers a
>> > system crash if I let it idle and shut off the monitors, so I've been
>> > getting crash shutdowns and having to deal with this unfortunately often,
>> > recently).  Fortunately I keep my root, with all system executables, etc,
>> > mounted read-only by default, so it's not affected and I can /almost/
>> > boot normally after such a crash.  The problem is /var/log and /home
>> > (which has some parts of /var that need to be writable symlinked into /
>> > home/var, so / can stay read-only).  Something in the normal after-crash
>> > boot triggers enough csum errors there that I often crash again.
>> >
>> > So I have to boot to emergency mode and manually mount the filesystems in
>> > question, so nothing's trying to access them until I run the scrub and
>> > fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully,
>> > and once it has repaired all the csum errors due to partial writes on one
>> > mirror that either were never made or were properly completed on the
>> > other mirror, I can exit emergency mode and complete the normal boot (to
>> > the multi-user default target).  As there's no more csum errors then
>> > because scrub fixed them all, the boot doesn't crash due to too many such
>> > errors, and I'm back in business.
>> >
>> >
>> > Tho I believe at least the csum bug that affects me may only trigger if
>> > compression is (or perhaps has been in the past) enabled.  Since I run
>> > compress=lzo everywhere, that would certainly affect me.  It would also
>> > explain why the bug has remained around for quite some time as well,
>> > since presumably the devs don't run with compression on enough for this
>> > to have become a personal itch they needed to scratch, thus its remaining
>> > untraced and unfixed.
>> >
>> > So if you weren't using the compress option, your bug is probably
>> > different, but either way, the whole thing about too many csum errors at
>> > once triggering a system crash sure does sound familiar, here.
>> 
>> Yes, I was running the compress=lzo option as well... Maybe here lays 
>> a
>> common problem?
> 
> Hmm… I found this from being referred to by reading Debian wiki page on
> BTRFS¹.
> 
> I use compress=lzo on BTRFS RAID 1 since April 2014 and I never found 
> an
> issue. Steven, your filesystem wasn´t RAID 1 but RAID 5 or 6?

Yes, I was using RAID6 - and it has had a track record of eating data. 
There are lots of problems with the implementation / correctness of 
RAID5/6 parity - which I'm pretty sure haven't been nailed down yet. The 
recommendation at the moment is just not to use RAID5 or RAID6 modes of 
BTRFS. The last I heard, if you were using RAID5/6 in BTRFS, the 
recommended action was to migrate your data to a different profile or a 
different FS.
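
For completeness, a minimal sketch of what such a profile migration looks 
like - this assumes a healthy, mounted filesystem with enough free space, 
and the mount point is a placeholder:

    # convert data and metadata block groups away from raid6, e.g. to raid1
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/array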

> I just want to assess whether using compress=lzo might be dangerous to 
> use in
> my setup. Actually right now I like to keep using it, since I think at 
> least
> one of the SSDs does not compress. And… well… /home and / where I use 
> it are
> both quite full already.

I don't believe the compress=lzo option by itself was a problem - but it 
*may* have an impact on the RAID5/6 parity problems? I'd be guessing 
here, but am happy to be corrected.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compress=lzo safe to use?
  2016-09-11 20:49                         ` compress=lzo safe to use? Hans van Kranenburg
@ 2016-09-12  4:36                           ` Duncan
  2016-09-17  9:30                             ` Kai Krakow
  0 siblings, 1 reply; 20+ messages in thread
From: Duncan @ 2016-09-12  4:36 UTC (permalink / raw)
  To: linux-btrfs

Hans van Kranenburg posted on Sun, 11 Sep 2016 22:49:58 +0200 as
excerpted:

> So, you can use a lot of compress without problems for years.
> 
> Only if your hardware is starting to break in a specific way, causing
> lots and lots of checksum errors, the kernel might not be able to handle
> all of them at the same time currently.
> 
> The compress might be super stable itself, but in this case another part
> of the filesystem is not perfecty able to handle certain failure
> scenario's involving it.

Well put.

In my case I had problems trigger due to exactly two things, tho there 
are obviously other ways of triggering the same issues, including a crash 
in the middle of a commit, with one copy of the raid1 already updated 
while the other is still being written:

1) I first discovered the problem when one of my pair of ssds was going 
bad.  Because I had btrfs raid1 and could normally scrub-fix things, and 
because I had backups anyway, I chose to continue running it for some 
time, just to see how it handled things, as more and more sectors became 
unwritable and were replaced by spares.  By the end I had several MiB 
worth of spares in-use, altho smart reported I had only used about 15% of 
the available spares, but by then it was getting bad enough and the 
newness had worn off, so I just replaced it and got rid of the hassle.

But as a result of the above, I had a *LOT* of practice with btrfs 
recovery, mostly running scrub.

And what I found was that if btrfs raid1 encounters too many checksum 
errors in compressed data it will crash btrfs and the kernel, even when 
it *SHOULD* recover from the other device because it has a good copy, as 
demonstrated by the fact that after a reboot, I could run a scrub and fix 
everything, no uncorrected errors at all.

At first I thought it was just the way btrfs worked -- that it could 
handle a few checksum errors but not too many at once.  I had no idea it 
was compression related.  But nobody else seemed to mention the problem, 
which I thought a bit strange, until someone /did/ mention it, and 
furthermore, actually tested both compressed and uncompressed btrfs, and 
found the problem only when btrfs was reading compressed data.  If the 
data wasn't compressed, btrfs went ahead and read the second copy 
correctly, without crashing the system, every time.

The extra kink in this is that at the time, I had a boot-time service 
setup to cache (via cat > /dev/null) a bunch of files in a particular 
directory.  This particular directory is a cache for news archives, with 
articles on some groups going back over a decade to 2002, and my news 
client (pan) is slow to startup with several gigs of cached messages like 
that, so I had the boot-time service pre-cache everything, so by the time 
I started X and pan, it would be done or nearly so and I'd not have to 
wait for pan to startup.
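
The pre-caching itself can be as simple as something like this (a sketch; 
the path is just whatever directory the client is slow to scan):

    # read every cached article once so it lands in the page cache
    find /home/user/News/cache -type f -exec cat {} + > /dev/null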

The problem was that many of the new files were in this directory, and 
all that activity tended to hit the going-bad sectors on that ssd rather 
frequently, making one copy often bad.  Additionally, these are mostly 
text messages, so they compress quite well, meaning compress=lzo would 
trigger compression on many of them.

And because I had it reading them at boot, the kernel tended to overload 
on checksum errors before it finished booting, far more frequently than 
it would have otherwise.  Of course, that would crash the system before I 
could get a login in ordered to run btrfs scrub and fix the problem.

What I had to do then was boot to rescue mode, with the filesystems 
mounted but before normal services (including this caching service) ran, 
run the scrub from there, and then continue boot, which would then work 
just fine because I'd fixed all the checksum errors.

But, as I said I eventually got tired of the hassle and just replaced the 
failing device.  Btrfs replace worked nicely. =:^)

2a) My second trigger is that I've found that with multiple devices, as 
in multi-device btrfs, but also when I used to run mdraid, don't always 
resume from suspend-to-RAM very well.  Often one device takes longer to 
wake up than the other(s), and the kernel will try to resume while one 
still isn't responding properly.  (FWIW, I ran into this problem on 
spinning rust back on mdraid, but I see it now on ssds on btrfs as well, 
so it seems to be a common issue, which probably remains relatively 
obscure I'd guess because relatively few people with multi-device btrfs 
or mdraid do suspend-to-ram.)

The result is that btrfs will try to write to the remaining device(s), 
getting them out of sync with the one that isn't responding properly 
yet.  Ultimately this leads to a crash if I don't catch it and complete a 
controlled shutdown before that, and sometimes I see the same crash-on-
boot-due-to-too-many-checksum-errors problem I saw with #1.  I no longer 
have that caching job running at boot and thus don't see it as often, but 
it still happens occasionally.  Again, once I boot to rescue mode and run 
scrub, it fixes the problem and I can resume the normal mode boot without 
further issue.

So I pretty much quit suspending to RAM, at least for any longer period, 
and just shutdown and reboot, now.  With systemd and ssds, the boot 
doesn't take significantly longer anyway, tho it does mean I can't simply 
resume and pick up where I was, I have to reopen my work, etc.

2b) Closely related to #2a and most recent, since I'm no longer trying to 
suspend to RAM, I think one of the ssds now has a bad backup capacitor or 
something, as if I leave it idle for too long it'll fail to respond once 
I start trying to use it again.  Same story, the other device gets writes 
that the unresponsive device is missing, and eventually if I don't reboot 
I crash.  Upon reboot, again, if there were too many things written to 
the device that stayed up that didn't make it to the other one, it can 
trigger a crash due to checksum failure.  However, if I can get a command 
prompt, either because it boots all the way or because I boot to rescue 
mode, I can run a scrub and update the bad device from the good one, and 
then everything works fine once again... until the device goes 
unresponsive, again.


Again, I once thought all this was just the stage at which btrfs was, 
until I found out that it doesn't seem to happen if btrfs compression 
isn't being used.  Something about the way it recovers from checksum 
errors on compressed data differs from the way it recovers from checksum 
errors on uncompressed data, and there's a bug in the compressed data 
processing path.  But beyond that, I'm not a dev and it gets a bit fuzzy, 
which also explains why I've not gone code diving and submitted patches 
to try to fix it, myself.

But if I'm correct, it probably doesn't matter what the compression type 
is, only how much of it there is.  So compress-force would tend to 
trigger the issue far more frequently than simply compress, unless of 
course your use-case is a corner-case like my trying to read all those 
compressible text messages into cache at boot was, but compress (or 
compress-force) =lzo vs =zlib shouldn't matter.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: compress=lzo safe to use?
  2016-09-12  4:36                           ` Duncan
@ 2016-09-17  9:30                             ` Kai Krakow
  0 siblings, 0 replies; 20+ messages in thread
From: Kai Krakow @ 2016-09-17  9:30 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 12 Sep 2016 04:36:07 +0000 (UTC)
schrieb Duncan <1i5t5.duncan@cox.net>:

> Again, I once thought all this was just the stage at which btrfs was, 
> until I found out that it doesn't seem to happen if btrfs compression 
> isn't being used.  Something about the way it recovers from checksum 
> errors on compressed data differs from the way it recovers from
> checksum errors on uncompressed data, and there's a bug in the
> compressed data processing path.  But beyond that, I'm not a dev and
> it gets a bit fuzzy, which also explains why I've not gone code
> diving and submitted patches to try to fix it, myself.

I suspect that may very well come from the decompression routine which
crashes - and not from btrfs itself. So essentially, the decompression
needs to be fixed instead (which probably slows it down by factors).

Only when this is tested and fixed, one should look into why btrfs
fails when decompression fails.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-09-17  9:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-24 14:52 Trying to rescue my data :( Steven Haigh
2016-06-24 16:26 ` Steven Haigh
2016-06-24 16:59   ` ronnie sahlberg
2016-06-24 17:05     ` Steven Haigh
2016-06-24 17:40       ` Austin S. Hemmelgarn
2016-06-24 17:43         ` Steven Haigh
2016-06-24 17:50           ` Austin S. Hemmelgarn
2016-06-25  4:19             ` Steven Haigh
2016-06-25 16:25               ` Chris Murphy
2016-06-25 16:39                 ` Steven Haigh
2016-06-25 17:14                   ` Chris Murphy
2016-06-26  2:30                   ` Duncan
2016-06-26  3:13                     ` Steven Haigh
2016-09-11 19:48                       ` compress=lzo safe to use? (was: Re: Trying to rescue my data :() Martin Steigerwald
2016-09-11 20:06                         ` Adam Borowski
2016-09-11 20:27                           ` Chris Murphy
2016-09-11 20:49                         ` compress=lzo safe to use? Hans van Kranenburg
2016-09-12  4:36                           ` Duncan
2016-09-17  9:30                             ` Kai Krakow
2016-09-12  1:00                         ` Steven Haigh
