Linux-BTRFS Archive on lore.kernel.org
* Nasty corruption on large array, ideas welcome
@ 2019-01-08 19:33 Thiago Ramon
  2019-01-09  0:05 ` Qu Wenruo
  0 siblings, 1 reply; 3+ messages in thread
From: Thiago Ramon @ 2019-01-08 19:33 UTC (permalink / raw)
  To: linux-btrfs

I have a pretty complicated setup here, so first a general description:
8 HDs: 4x5TB, 2x4TB, 2x8TB

Each disk is an LVM PV containing a bcache backing device, which in
turn holds one of the BTRFS member devices. All the backing devices
were then attached in writeback mode to an SSD bcache cache partition
(terrible setup, I know, but without the caching the system was
getting too slow to use).
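
To make the layering concrete, each of the 8 disks looks roughly like
this (device names and the cache-set UUID below are just placeholders,
not my actual ones):

	# one spinning disk: LVM PV -> LV -> bcache backing device -> BTRFS member
	pvcreate /dev/sda
	vgcreate vg_sda /dev/sda
	lvcreate -n backing -l 100%FREE vg_sda
	make-bcache -B /dev/vg_sda/backing        # shows up as /dev/bcacheN

	# one shared SSD cache partition, attached to every backing device in writeback mode
	make-bcache -C /dev/ssd-cache-partition   # prints the cache set UUID
	echo <cset-uuid> > /sys/block/bcache0/bcache/attach
	echo writeback > /sys/block/bcache0/bcache/cache_mode

	# BTRFS then spans all 8 resulting bcache devices (everything RAID1)
	mkfs.btrfs -d raid1 -m raid1 /dev/bcache0 /dev/bcache1 ... /dev/bcache7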

I had all my data, metadata and system blocks on RAID1, but I'm
running out of space and recent kernels have been getting better
RAID5/6 support, so I finally decided to migrate to RAID6, starting
with the metadata.
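
The conversion itself is just the usual balance with convert filters,
metadata first (mount point below is illustrative):

	# convert metadata chunks to RAID6 first; data (and system) in a later pass
	btrfs balance start -mconvert=raid6 /mnt/array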


It was running well (I was already expecting it to be slow, so no
problem there), but I had to spend some days away from the machine.
Due to an air conditioning failure, the room temperature went pretty
high and one of the disks decided to die (apparently only
temporarily). bcache couldn't write to that backing device anymore,
so it ejected all the drives from the cache and left them to cope by
themselves. I caught the trouble some 12 hours later, still away, and
shut down everything accessing the disks until I could be physically
there to handle the issue.

After I got back and brought the temperature down to acceptable
levels, I checked the failed drive, which seems to be working fine
after being re-inserted, but it's of course out of date with the rest
of the array. Apparently the rest of the drives picked up some
corruption as well when they were ejected from the cache, and I'm
getting errors I haven't been able to handle.

I've gone through the steps that have helped me before with
complicated crashes on this system, but this time they weren't
enough, and I'll need some advice from people who know the BTRFS
internals better than me to get this back up and running. I have
around 20TB of data on the drives, so copying the data out is the
last resort; I'd rather let most of it die than buy a few more disks
to fit all of it.


Now on to the errors:

I've tried both with the "failed" drive in (which gives me additional
transid errors) and without it.

Trying to mount with it gives me:
[Jan 7 20:18] BTRFS info (device bcache0): enabling auto defrag
[ +0.000010] BTRFS info (device bcache0): disk space caching is enabled
[ +0.671411] BTRFS error (device bcache0): parent transid verify
failed on 77292724051968 wanted 1499510 found 1499467
[ +0.005950] BTRFS critical (device bcache0): corrupt leaf: root=2
block=77292724051968 slot=2, bad key order, prev (39029522223104 168
212992) current (39029521915904 168 16384)
[ +0.000378] BTRFS error (device bcache0): failed to read block groups: -5
[ +0.022884] BTRFS error (device bcache0): open_ctree failed
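
For reference, that attempt was just my usual mount, roughly (mount
point illustrative):

	mount -o autodefrag /dev/bcache0 /mnt/array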

Trying without the disk (and -o degraded) gives me:
[Jan 8 12:51] BTRFS info (device bcache1): enabling auto defrag
[ +0.000002] BTRFS info (device bcache1): allowing degraded mounts
[ +0.000002] BTRFS warning (device bcache1): 'recovery' is deprecated,
use 'usebackuproot' instead
[ +0.000000] BTRFS info (device bcache1): trying to use backup root at mount time
[ +0.000002] BTRFS info (device bcache1): disabling disk space caching
[ +0.000001] BTRFS info (device bcache1): force clearing of disk cache
[ +0.001334] BTRFS warning (device bcache1): devid 2 uuid
27f87964-1b9a-466c-ac18-b47c0d2faa1c is missing
[ +1.049591] BTRFS critical (device bcache1): corrupt leaf: root=2
block=77291982323712 slot=0, unexpected item end, have 685883288
expect 3995
[ +0.000739] BTRFS error (device bcache1): failed to read block groups: -5
[ +0.017842] BTRFS error (device bcache1): open_ctree failed
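
That one was roughly as follows (options reconstructed from the
messages above, mount point illustrative):

	mount -o degraded,recovery,clear_cache,nospace_cache,autodefrag /dev/bcache1 /mnt/array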

btrfs check output (without drive):
warning, device 2 is missing
checksum verify failed on 77088164081664 found 715B4470 wanted 580444F6
checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
bytenr mismatch, want=77088164081664, have=274663271295232
Couldn't read chunk tree
ERROR: cannot open file system

I've already tried btrfs rescue super-recover, zero-log and
chunk-recover without any results, and btrfs check --repair fails in
the same way as a plain check.
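
For the record, those attempts were along these lines (shown against
a single device; they operate on the whole filesystem):

	btrfs rescue super-recover /dev/bcache0
	btrfs rescue zero-log /dev/bcache0
	btrfs rescue chunk-recover /dev/bcache0   # very slow, scans every device
	btrfs check --repair /dev/bcache0         # bails out at the same point as a plain check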

So, any ideas? I'll be happy to run experiments and grab more logs if
anyone wants more details.


And thanks for any suggestions.


* Re: Nasty corruption on large array, ideas welcome
  2019-01-08 19:33 Nasty corruption on large array, ideas welcome Thiago Ramon
@ 2019-01-09  0:05 ` Qu Wenruo
  2019-01-10 15:50   ` Thiago Ramon
  0 siblings, 1 reply; 3+ messages in thread
From: Qu Wenruo @ 2019-01-09  0:05 UTC (permalink / raw)
  To: Thiago Ramon, linux-btrfs




On 2019/1/9 3:33 AM, Thiago Ramon wrote:
> I have a pretty complicated setup here, so first a general description:
> 8 HDs: 4x5TB, 2x4TB, 2x8TB
> 
> Each disk is a LVM PV containing a BCACHE backing device, which then
> contains the BTRFS disks. All the drives then were in writeback mode
> on a SSD BCACHE cache partition (terrible setup, I know, but without
> the caching the system was getting too slow to use).
> 
> I had all my data, metadata and system blocks on RAID1, but as I'm
> running out of space, and the new kernels are getting better RAID5/6
> support recently, I've finally decided to migrate to RAID6 and was
> starting it off with the metadata.
> 
> 
> It was running well (I was already expecting it to be slow, so no
> problem there), but I had to spend some days away from the machine.
> Due to an air conditioning failure, the room temperature went pretty
> high and one of the disks decided to die (apparently only
> temporarily). BCACHE couldn't write to the backing device anymore, so
> it ejected all drives and let them cope with it by themselves. I've
> caught the trouble some 12h later, still away, and shut down anything
> accessing the disks until I could be physically there to handle the
> issue.
> 
> After I got back and got the temperature down to acceptable levels,
> I've checked the failed drive, which seems to be working well after
> getting re-inserted, but it's of course out of date with the rest of
> the drives. But apparently the rest got some corruption as well when
> they got ejected from the cache, and I'm getting some errors I haven't
> been able to handle.
> 
> I've gone through the steps here that helped me before when having
> complicated crashes on this system, but this time it wasn't enough,
> and I'll need some advice from people who know the BTRFS internals
> better than me to get this back running. I have around 20TB of data in
> the drives, so copying the data out is the last resort, as I'd prefer
> to let most of it die than to buy a few disks to fit all of that.
> 
> 
> Now on to the errors:
> 
> I've tried both with the "failed" drive in (which gives me additional
> transid errors) and without it.
> 
> Trying to mount with it gives me:
> [Jan 7 20:18] BTRFS info (device bcache0): enabling auto defrag
> [ +0.000010] BTRFS info (device bcache0): disk space caching is enabled
> [ +0.671411] BTRFS error (device bcache0): parent transid verify
> failed on 77292724051968 wanted 1499510 found 1499467
> [ +0.005950] BTRFS critical (device bcache0): corrupt leaf: root=2
> block=77292724051968 slot=2, bad key order, prev (39029522223104 168
> 212992) current (39029521915904 168 16384)

Heavily corrupted extent tree.

And there is a very good experimental patch for you:
https://patchwork.kernel.org/patch/10738583/

Then mount with the "skip_bg,ro" mount options.

Please note this can only help you salvage data (it's essentially a
kernel version of btrfs restore).

AFAIK, the corruption may affect fs trees too, so be aware of corrupted
data.
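
A rough outline of the salvage flow, assuming the patch applies
cleanly to your kernel tree (the patch file name and paths below are
just placeholders):

	# build and boot a kernel with the skip_bg patch applied
	cd linux
	git am skip_bg.patch              # the patch from the link above, saved locally
	make -j$(nproc) && make modules_install install

	# mount read-only with block group items skipped; add "degraded" if the failed device stays out
	mount -o ro,skip_bg /dev/bcache0 /mnt/salvage
	rsync -aHAX /mnt/salvage/ /somewhere/with/space/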

Thanks,
Qu


> [ +0.000378] BTRFS error (device bcache0): failed to read block groups: -5
> [ +0.022884] BTRFS error (device bcache0): open_ctree failed
> 
> Trying without the disk (and -o degraded) gives me:
> [Jan 8 12:51] BTRFS info (device bcache1): enabling auto defrag
> [ +0.000002] BTRFS info (device bcache1): allowing degraded mounts
> [ +0.000002] BTRFS warning (device bcache1): 'recovery' is deprecated,
> use 'usebackuproot' instead
> [ +0.000000] BTRFS info (device bcache1): trying to use backup root at mount time
> [ +0.000002] BTRFS info (device bcache1): disabling disk space caching
> [ +0.000001] BTRFS info (device bcache1): force clearing of disk cache
> [ +0.001334] BTRFS warning (device bcache1): devid 2 uuid
> 27f87964-1b9a-466c-ac18-b47c0d2faa1c is missing
> [ +1.049591] BTRFS critical (device bcache1): corrupt leaf: root=2
> block=77291982323712 slot=0, unexpected item end, have 685883288
> expect 3995
> [ +0.000739] BTRFS error (device bcache1): failed to read block groups: -5
> [ +0.017842] BTRFS error (device bcache1): open_ctree failed
> 
> btrfs check output (without drive):
> warning, device 2 is missing
> checksum verify failed on 77088164081664 found 715B4470 wanted 580444F6
> checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
> checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
> bytenr mismatch, want=77088164081664, have=274663271295232
> Couldn't read chunk tree
> ERROR: cannot open file system
> 
> I've already tried super-recover, zero-log and chunk-recover without
> any results, and check with --repair fails the same way as without.
> 
> So, any ideas? I'll be happy to run experiments and grab more logs if
> anyone wants more details.
> 
> 
> And thanks for any suggestions.
> 




* Re: Nasty corruption on large array, ideas welcome
  2019-01-09  0:05 ` Qu Wenruo
@ 2019-01-10 15:50   ` Thiago Ramon
  0 siblings, 0 replies; 3+ messages in thread
From: Thiago Ramon @ 2019-01-10 15:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jan 8, 2019 at 10:05 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/1/9 3:33 AM, Thiago Ramon wrote:
> > I have a pretty complicated setup here, so first a general description:
> > 8 HDs: 4x5TB, 2x4TB, 2x8TB
> >
> > Each disk is a LVM PV containing a BCACHE backing device, which then
> > contains the BTRFS disks. All the drives then were in writeback mode
> > on a SSD BCACHE cache partition (terrible setup, I know, but without
> > the caching the system was getting too slow to use).
> >
> > I had all my data, metadata and system blocks on RAID1, but as I'm
> > running out of space, and the new kernels are getting better RAID5/6
> > support recently, I've finally decided to migrate to RAID6 and was
> > starting it off with the metadata.
> >
> >
> > It was running well (I was already expecting it to be slow, so no
> > problem there), but I had to spend some days away from the machine.
> > Due to an air conditioning failure, the room temperature went pretty
> > high and one of the disks decided to die (apparently only
> > temporarily). BCACHE couldn't write to the backing device anymore, so
> > it ejected all drives and let them cope with it by themselves. I've
> > caught the trouble some 12h later, still away, and shut down anything
> > accessing the disks until I could be physically there to handle the
> > issue.
> >
> > After I got back and got the temperature down to acceptable levels,
> > I've checked the failed drive, which seems to be working well after
> > getting re-inserted, but it's of course out of date with the rest of
> > the drives. But apparently the rest got some corruption as well when
> > they got ejected from the cache, and I'm getting some errors I haven't
> > been able to handle.
> >
> > I've gone through the steps here that helped me before when having
> > complicated crashes on this system, but this time it wasn't enough,
> > and I'll need some advice from people who know the BTRFS internals
> > better than me to get this back running. I have around 20TB of data in
> > the drives, so copying the data out is the last resort, as I'd prefer
> > to let most of it die than to buy a few disks to fit all of that.
> >
> >
> > Now on to the errors:
> >
> > I've tried both with the "failed" drive in (which gives me additional
> > transid errors) and without it.
> >
> > Trying to mount with it gives me:
> > [Jan 7 20:18] BTRFS info (device bcache0): enabling auto defrag
> > [ +0.000010] BTRFS info (device bcache0): disk space caching is enabled
> > [ +0.671411] BTRFS error (device bcache0): parent transid verify
> > failed on 77292724051968 wanted 1499510 found 1499467
> > [ +0.005950] BTRFS critical (device bcache0): corrupt leaf: root=2
> > block=77292724051968 slot=2, bad key order, prev (39029522223104 168
> > 212992) current (39029521915904 168 16384)
>
> Heavily corrupted extent tree.
>
> And there is a very good experimental patch for you:
> https://patchwork.kernel.org/patch/10738583/
>
> Then go mount with "skip_bg,ro" mount option.
>
> Please note this can only help you to salvage data (kernel version of
> btrfs-store).
>
> AFAIK, the corruption may affect fs trees too, so be aware of corrupted
> data.
>
> Thanks,
> Qu
>
>

Thanks for pointing me to that patch; I've tried it and the FS
mounted without issues.
I've managed to get a snapshot of the folder structure and haven't
noticed anything important missing. Is there some way to get a list
of anything that might have been corrupted, or will I just find out
as I try to access the file contents?
Also, is there any hope of recovering the trees in place, or should I
just abandon this one and start with a new volume?
It also occurred to me that I might be able to run a scrub on the
disks now that the filesystem is mounted. Is that even possible in a
situation like this, and more importantly, is it sane?
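
If there's nothing better, the brute-force version of "finding out as
I try to access the file contents" I have in mind is just reading
everything back and watching dmesg for csum errors, roughly (mount
point illustrative):

	# read every file once; data checksum failures show up in dmesg
	find /mnt/salvage -type f -exec cat {} + > /dev/null 2> read-errors.log
	dmesg | grep -i 'csum failed'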

and finally, thanks again for the patch,
Thiago Ramon

> > [ +0.000378] BTRFS error (device bcache0): failed to read block groups: -5
> > [ +0.022884] BTRFS error (device bcache0): open_ctree failed
> >
> > Trying without the disk (and -o degraded) gives me:
> > [Jan 8 12:51] BTRFS info (device bcache1): enabling auto defrag
> > [ +0.000002] BTRFS info (device bcache1): allowing degraded mounts
> > [ +0.000002] BTRFS warning (device bcache1): 'recovery' is deprecated,
> > use 'usebackuproot' instead
> > [ +0.000000] BTRFS info (device bcache1): trying to use backup root at mount time
> > [ +0.000002] BTRFS info (device bcache1): disabling disk space caching
> > [ +0.000001] BTRFS info (device bcache1): force clearing of disk cache
> > [ +0.001334] BTRFS warning (device bcache1): devid 2 uuid
> > 27f87964-1b9a-466c-ac18-b47c0d2faa1c is missing
> > [ +1.049591] BTRFS critical (device bcache1): corrupt leaf: root=2
> > block=77291982323712 slot=0, unexpected item end, have 685883288
> > expect 3995
> > [ +0.000739] BTRFS error (device bcache1): failed to read block groups: -5
> > [ +0.017842] BTRFS error (device bcache1): open_ctree failed
> >
> > btrfs check output (without drive):
> > warning, device 2 is missing
> > checksum verify failed on 77088164081664 found 715B4470 wanted 580444F6
> > checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
> > checksum verify failed on 77088164081664 found 98775719 wanted FA63AD42
> > bytenr mismatch, want=77088164081664, have=274663271295232
> > Couldn't read chunk tree
> > ERROR: cannot open file system
> >
> > I've already tried super-recover, zero-log and chunk-recover without
> > any results, and check with --repair fails the same way as without.
> >
> > So, any ideas? I'll be happy to run experiments and grab more logs if
> > anyone wants more details.
> >
> >
> > And thanks for any suggestions.
> >
>

