Nasty corruption on large array, ideas welcome

* Nasty corruption on large array, ideas welcome
@ 2019-01-08 19:33 Thiago Ramon
  2019-01-09  0:05 ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Thiago Ramon @ 2019-01-08 19:33 UTC (permalink / raw)
  To: linux-btrfs

I have a pretty complicated setup here, so first a general description:
8 HDs: 4x5TB, 2x4TB, 2x8TB

Each disk is a LVM PV containing a BCACHE backing device, which then
contains the BTRFS disks. All the drives then were in writeback mode
on a SSD BCACHE cache partition (terrible setup, I know, but without
the caching the system was getting too slow to use).

I had all my data, metadata and system blocks on RAID1, but as I'm
running out of space, and the new kernels are getting better RAID5/6
support recently, I've finally decided to migrate to RAID6 and was
starting it off with the metadata.

It was running well (I was already expecting it to be slow, so no
problem there), but I had to spend some days away from the machine.
Due to an air conditioning failure, the room temperature went pretty
high and one of the disks decided to die (apparently only
temporarily). BCACHE couldn't write to the backing device anymore, so
it ejected all drives and let them cope with it by themselves. I've
caught the trouble some 12h later, still away, and shut down anything
accessing the disks until I could be physically there to handle the
issue.

After I got back and got the temperature down to acceptable levels,
I've checked the failed drive, which seems to be working well after
getting re-inserted, but it's of course out of date with the rest of
the drives. But apparently the rest got some corruption as well when
they got ejected from the cache, and I'm getting some errors I haven't
been able to handle.

I've gone through the steps here that helped me before when having
complicated crashes on this system, but this time it wasn't enough,
and I'll need some advice from people who know the BTRFS internals
better than me to get this back running. I have around 20TB of data in
the drives, so copying the data out is the last resort, as I'd prefer
to let most of it die than to buy a few disks to fit all of that.

Now on to the errors:

I've tried both with the "failed" drive in (which gives me additional
transid errors) and without it.

Trying to mount with it gives me:
[Jan 7 20:18] BTRFS info (device bcache0): enabling auto defrag
[ +0.000010] BTRFS info (device bcache0): disk space caching is enabled
[ +0.671411] BTRFS error (device bcache0): parent transid verify
failed on 77292724051968 wanted > 1499510 found 1499467
[ +0.005950] BTRFS critical (device bcache0): corrupt leaf: root=2
block=77292724051968 slot=2, bad key order, prev (39029522223104 168
212992) current (39029521915904 168 16384)
[ +0.000378] BTRFS error (device bcache0): failed to read block groups: -5
[ +0.022884] BTRFS error (device bcache0): open_ctree failed

Trying without the disk (and -o degraded) gives me:
[Jan 8 12:51] BTRFS info (device bcache1): enabling auto defrag
[ +0.000002] BTRFS info (device bcache1): allowing degraded mounts
[ +0.000002] BTRFS warning (device bcache1): 'recovery' is deprecated,
use 'usebackuproot' instead
[ +0.000000] BTRFS info (device bcache1): trying to use backup root at
mount time[ +0.000002] BTRFS info (device bcache1): disabling disk
space caching
[ +0.000001] BTRFS info (device bcache1): force clearing of disk cache
[ +0.001334] BTRFS warning (device bcache1): devid 2 uuid
27f87964-1b9a-466c-ac18-b47c0d2faa1c is missing
[ +1.049591] BTRFS critical (device bcache1): corrupt leaf: root=2
block=77291982323712 slot=0, unexpected item end, have 685883288
expect 3995
[ +0.000739] BTRFS error (device bcache1): failed to read block groups: -5
[ +0.017842] BTRFS error (device bcache1): open_ctree failed

btrfs check output (without drive):
warning, device 2 is missing
checksum verify failed on 77088164081664 found 715B4470 wanted 580444F6
checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
bytenr mismatch, want=77088164081664, have=274663271295232
Couldn't read chunk tree
ERROR: cannot open file system

I've already tried super-recover, zero-log and chunk-recover without
any results, and check with --repair fails the same way as without.

So, any ideas? I'll be happy to run experiments and grab more logs if
anyone wants more details.

And thanks for any suggestions.

^ permalink raw reply	[flat|nested] 9+ messages in thread