* Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"
@ 2018-07-11 15:37 Udo Waechter
  2018-07-11 17:48 ` Chris Murphy
  2018-07-16  8:15 ` Udo Waechter
  0 siblings, 2 replies; 5+ messages in thread
From: Udo Waechter @ 2018-07-11 15:37 UTC (permalink / raw)
  To: linux-btrfs



Hello everyone,

I have a corrupted filesystem which I can't seem to recover.

The machine is:
Debian Linux, kernel 4.9 and btrfs-progs v4.13.3

I have a HDD RAID5 with LVM and the volume in question is a LVM volume.
On top of that I had a RAID1 SSD cache with lvm-cache.
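
For reference, the stack can be inspected with something like this (the
VG and LV names are abbreviated and only illustrative):

  lvs -a -o lv_name,segtype,pool_lv,devices vg00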

Yesterday both SSDs died within minutes. This led to the corrupted
filesystem that I have now.

I hope I followed the procedure correctly.

What I tried so far:
* "mount -o usebackuproot,ro " and "nospace_cache" "clear_cache" and all
permutations of these mount options
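
Concretely, the attempts looked roughly like this (the LV path is
abbreviated and only illustrative):

  mount -o ro /dev/vg00/var /mnt
  mount -o ro,usebackuproot /dev/vg00/var /mnt
  mount -o ro,usebackuproot,nospace_cache /dev/vg00/var /mnt
  mount -o ro,usebackuproot,clear_cache /dev/vg00/var /mnt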

I'm getting:

[96926.830400] BTRFS info (device dm-2): trying to use backup root at
mount time
[96926.830406] BTRFS info (device dm-2): disk space caching is enabled
[96926.927978] BTRFS error (device dm-2): parent transid verify failed
on 321269628928 wanted 3276017 found 3275985
[96926.938619] BTRFS error (device dm-2): parent transid verify failed
on 321269628928 wanted 3276017 found 3275985
[96926.940705] BTRFS error (device dm-2): failed to recover balance: -5
[96926.985801] BTRFS error (device dm-2): open_ctree failed

The weird thing is that I can't really find information about the
"failed to recover balance: -5" error. - There was no rebalancing
running during the crash.

* btrfs-find-root: https://pastebin.com/qkjnSUF7 - It bothers me that I
don't see any "good generations" as described here:
https://btrfs.wiki.kernel.org/index.php/Restore

* "btrfs rescue" - it starts, then goes to "looping on XYZ" then stops

* "btrfs rescue super-recover -v" gives:

All Devices:
	Device: id = 1, name = /dev/vg00/...
Before Recovering:
	[All good supers]:
		device name = /dev/vg00/...
		superblock bytenr = 65536

		device name = /dev/vg00/...
		superblock bytenr = 67108864

		device name = /dev/vg00/...
		superblock bytenr = 274877906944

	[All bad supers]:

All supers are valid, no need to recover


* Unfortunately I did a "btrfs rescue zero-log" at some point :( - As it
turns out that might have been a bad idea


* Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy

The volume contained qcow2 images for VMs. I need only one of those,
since one piece of important software decided to not do backups :(

Any help is highly appreciated.

Many thanks,
udo.




* Re: Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"
  2018-07-11 15:37 Corrupted FS with "open_ctree failed" and "failed to recover balance: -5" Udo Waechter
@ 2018-07-11 17:48 ` Chris Murphy
  2018-07-16  8:15 ` Udo Waechter
  1 sibling, 0 replies; 5+ messages in thread
From: Chris Murphy @ 2018-07-11 17:48 UTC (permalink / raw)
  To: Udo Waechter; +Cc: Btrfs BTRFS

On Wed, Jul 11, 2018 at 9:37 AM, Udo Waechter <root@zoide.net> wrote:
> Hello everyone,
>
> I have a corrupted filesystem which I can't seem to recover.
>
> The machine is:
> Debian Linux, kernel 4.9 and btrfs-progs v4.13.3
>
> I have a HDD RAID5 with LVM and the volume in question is a LVM volume.
> On top of that I had a RAID1 SSD cache with lvm-cache.
>
> Yesterday both SSDs died within minutes. This led to the corrupted
> filesystem that I have now.
>
> I hope I followed the procedure correctly.
>
> What I tried so far:
> * "mount -o usebackuproot,ro " and "nospace_cache" "clear_cache" and all
> permutations of these mount options
>
> I'm getting:
>
> [96926.830400] BTRFS info (device dm-2): trying to use backup root at
> mount time
> [96926.830406] BTRFS info (device dm-2): disk space caching is enabled
> [96926.927978] BTRFS error (device dm-2): parent transid verify failed
> on 321269628928 wanted 3276017 found 3275985
> [96926.938619] BTRFS error (device dm-2): parent transid verify failed
> on 321269628928 wanted 3276017 found 3275985
> [96926.940705] BTRFS error (device dm-2): failed to recover balance: -5
> [96926.985801] BTRFS error (device dm-2): open_ctree failed
>
> The weird thing is that I can't really find information about the
> "failed to recover balance: -5" error. - There was no rebalancing
> running during the crash.
>
> * btrfs-find-root: https://pastebin.com/qkjnSUF7 - It bothers me that I
> don't see any "good generations" as described here:
> https://btrfs.wiki.kernel.org/index.php/Restore
>
> * "btrfs rescue" - it starts, then goes to "looping on XYZ" then stops
>
> * "btrfs rescue super-recover -v" gives:
>
> All Devices:
>         Device: id = 1, name = /dev/vg00/...
> Before Recovering:
>         [All good supers]:
>                 device name = /dev/vg00/...
>                 superblock bytenr = 65536
>
>                 device name = /dev/vg00/...
>                 superblock bytenr = 67108864
>
>                 device name = /dev/vg00/...
>                 superblock bytenr = 274877906944
>
>         [All bad supers]:
>
> All supers are valid, no need to recover
>
>
> * Unfortunately I did a "btrfs rescue zero-log" at some point :( - As it
> turns out that might have been a bad idea
>
>
> * Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy
>
> The volume contained qcow2 images for VMs. I need only one of those,
> since one piece of important software decided to not do backups :(
>
> Any help is highly appreciated.

You should have asked for help sooner. It's much harder to give advice
after you've modified the file system multiple times since the original
problem happened. But maybe someone has ideas on a way forward other
than 'btrfs restore', which is the offline scrape tool.
https://btrfs.wiki.kernel.org/index.php/Restore
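
A rough sketch of how restore is usually run -- the device, target
directory, and qcow2 path below are hypothetical, so adjust them to
your layout:

  # dry run first, to see what restore can actually reach
  btrfs restore -D -v -i /dev/vg00/var /mnt/recovery

  # scrape only the one image you need; --path-regex has to match every
  # path component from the top of the tree down to the file
  btrfs restore -i -v \
      --path-regex '^/(|images(|/important-vm\.qcow2))$' \
      /dev/vg00/var /mnt/recovery

  # if the current root is too damaged, point restore at an older tree
  # root bytenr reported by btrfs-find-root
  btrfs restore -t <bytenr> -i -v /dev/vg00/var /mnt/recovery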

There are a bunch of fixes between btrfs-progs 4.13 and 4.17, which is
now current. But anyway, with lvmcache and both SSDs dying, it sounds
like too many Btrfs transaction commits were lost in the failed
lvmcache.

Also, Gmail considers your email phishing, so something about your mail
setup is misconfigured for use on lists:

"This message has a from address in zoide.net but has failed
zoide.net's required tests for authentication.  Learn more"

My best guess from the headers is that your email provider publishes a
strict DMARC policy (p=REJECT), and while many receivers ignore this,
Google honors it. It's the DMARC failure that makes your mail
incompatible with email lists, because lists always rewrite the posting
(they add footers and rewrite headers).

Authentication-Results: mx.google.com;
       dkim=neutral (body hash did not verify) header.i=@zoide.net
header.s=mx header.b=vATMNdwx;
       spf=pass (google.com: best guess record for domain of
linux-btrfs-owner@vger.kernel.org designates 209.132.180.67 as
permitted sender) smtp.mailfrom=linux-btrfs-owner@vger.kernel.org;
       dmarc=fail (p=REJECT sp=REJECT dis=QUARANTINE) header.from=zoide.net
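
You can check the published policy yourself; the record below is only an
illustration of what a p=reject policy looks like, not your actual DNS:

  $ dig +short TXT _dmarc.example.net
  "v=DMARC1; p=reject; sp=reject"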


-- 
Chris Murphy


* Re: Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"
  2018-07-11 15:37 Corrupted FS with "open_ctree failed" and "failed to recover balance: -5" Udo Waechter
  2018-07-11 17:48 ` Chris Murphy
@ 2018-07-16  8:15 ` Udo Waechter
  2018-07-16  8:32   ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Udo Waechter @ 2018-07-16  8:15 UTC (permalink / raw)
  To: linux-btrfs



Hello,

No one has any ideas? Do you need more information?

Cheers,
udo.

On 11/07/18 17:37, Udo Waechter wrote:
> Hello everyone,
> 
> I have a corrupted filesystem which I can't seem to recover.
> 
> The machine is:
> Debian Linux, kernel 4.9 and btrfs-progs v4.13.3
> 
> I have a HDD RAID5 with LVM and the volume in question is a LVM volume.
> On top of that I had a RAID1 SSD cache with lvm-cache.
> 
> Yesterday both SSDs died within minutes. This led to the corrupted
> filesystem that I have now.
> 
> I hope I followed the procedure correctly.
> 
> What I tried so far:
> * "mount -o usebackuproot,ro " and "nospace_cache" "clear_cache" and all
> permutations of these mount options
> 
> I'm getting:
> 
> [96926.830400] BTRFS info (device dm-2): trying to use backup root at
> mount time
> [96926.830406] BTRFS info (device dm-2): disk space caching is enabled
> [96926.927978] BTRFS error (device dm-2): parent transid verify failed
> on 321269628928 wanted 3276017 found 3275985
> [96926.938619] BTRFS error (device dm-2): parent transid verify failed
> on 321269628928 wanted 3276017 found 3275985
> [96926.940705] BTRFS error (device dm-2): failed to recover balance: -5
> [96926.985801] BTRFS error (device dm-2): open_ctree failed
> 
> The weird thing is that I can't really find information about the
> "failed to recover balance: -5" error. - There was no rebalancing
> running during the crash.
> 
> * btrfs-find-root: https://pastebin.com/qkjnSUF7 - It bothers me that I
> don't see any "good generations" as described here:
> https://btrfs.wiki.kernel.org/index.php/Restore
> 
> * "btrfs rescue" - it starts, then goes to "looping on XYZ" then stops
> 
> * "btrfs rescue super-recover -v" gives:
> 
> All Devices:
> 	Device: id = 1, name = /dev/vg00/...
> Before Recovering:
> 	[All good supers]:
> 		device name = /dev/vg00/...
> 		superblock bytenr = 65536
> 
> 		device name = /dev/vg00/...
> 		superblock bytenr = 67108864
> 
> 		device name = /dev/vg00/...
> 		superblock bytenr = 274877906944
> 
> 	[All bad supers]:
> 
> All supers are valid, no need to recover
> 
> 
> * Unfortunately I did a "btrfs rescue zero-log" at some point :( - As it
> turns out that might have been a bad idea
> 
> 
> * Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy
> 
> The volume contained qcow2 images for VMs. I need only one of those,
> since one piece of important software decided to not do backups :(
> 
> Any help is highly appreciated.
> 
> Many thanks,
> udo.
> 




* Re: Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"
  2018-07-16  8:15 ` Udo Waechter
@ 2018-07-16  8:32   ` Qu Wenruo
  2018-07-17  8:00     ` Udo Waechter
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2018-07-16  8:32 UTC (permalink / raw)
  To: Udo Waechter, linux-btrfs





On 2018-07-16 16:15, Udo Waechter wrote:
> Hello,
> 
> No one has any ideas? Do you need more information?
> 
> Cheers,
> udo.
> 
> On 11/07/18 17:37, Udo Waechter wrote:
>> Hello everyone,
>>
>> I have a corrupted filesystem which I can't seem to recover.
>>
>> The machine is:
>> Debian Linux, kernel 4.9 and btrfs-progs v4.13.3
>>
>> I have a HDD RAID5 with LVM and the volume in question is a LVM volume.
>> On top of that I had a RAID1 SSD cache with lvm-cache.
>>
>> Yesterday both SSDs died within minutes. This led to the corrupted
>> filesystem that I have now.
>>
>> I hope I followed the procedure correctly.
>>
>> What I tried so far:
>> * "mount -o usebackuproot,ro " and "nospace_cache" "clear_cache" and all
>> permutations of these mount options
>>
>> I'm getting:
>>
>> [96926.830400] BTRFS info (device dm-2): trying to use backup root at
>> mount time
>> [96926.830406] BTRFS info (device dm-2): disk space caching is enabled
>> [96926.927978] BTRFS error (device dm-2): parent transid verify failed
>> on 321269628928 wanted 3276017 found 3275985
>> [96926.938619] BTRFS error (device dm-2): parent transid verify failed
>> on 321269628928 wanted 3276017 found 3275985
>> [96926.940705] BTRFS error (device dm-2): failed to recover balance: -5

This means your fs failed to recover the balance.

It is most likely caused by the transid error just one line above.
Normally this means your fs is more or less corrupted, possibly caused
by power loss or something else.

>> [96926.985801] BTRFS error (device dm-2): open_ctree failed
>>
>> The weird thing is that I can't really find information about the
>> "failed to recover balance: -5" error. - There was no rebalancing
>> running during the crash.

That can only be determined by a tree dump:

# btrfs ins dump-tree -t root <device>

>>
>> * btrfs-find-root: https://pastebin.com/qkjnSUF7 - It bothers me that I
>> don't see any "good generations" as described here:
>> https://btrfs.wiki.kernel.org/index.php/Restore
>>
>> * "btrfs rescue" - it starts, then goes to "looping on XYZ" then stops
>>
>> * "btrfs rescue super-recover -v" gives:
>>
>> All Devices:
>> 	Device: id = 1, name = /dev/vg00/...
>> Before Recovering:
>> 	[All good supers]:
>> 		device name = /dev/vg00/...
>> 		superblock bytenr = 65536
>>
>> 		device name = /dev/vg00/...
>> 		superblock bytenr = 67108864
>>
>> 		device name = /dev/vg00/...
>> 		superblock bytenr = 274877906944
>>
>> 	[All bad supers]:
>>
>> All supers are valid, no need to recover
>>
>>
>> * Unfortunately I did a "btrfs rescue zero-log" at some point :( - As it
>> turns out that might have been a bad idea
>>
>>
>> * Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy

Then it is making things worse; fortunately it should terminate before
it causes more damage.

I'm just curious why people don't try the safest "btrfs check" without
any options, but go for the most dangerous option.

And "btrfs check" output please.
If possible, "btrfs check --mode=lowmem" is also good for debug.
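
Something along these lines, with output saved to files you can attach
(the device path is just an example):

  btrfs check /dev/vg00/var 2>&1 | tee btrfs-check.log
  btrfs check --mode=lowmem /dev/vg00/var 2>&1 | tee btrfs-check-lowmem.log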

Thanks,
Qu

>>
>> The volume contained qcow2 images for VMs. I need only one of those,
>> since one piece of important software decided to not do backups :(
>>
>> Any help is highly appreciated.
>>
>> Many thanks,
>> udo.
>>
> 




* Re: Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"
  2018-07-16  8:32   ` Qu Wenruo
@ 2018-07-17  8:00     ` Udo Waechter
  0 siblings, 0 replies; 5+ messages in thread
From: Udo Waechter @ 2018-07-17  8:00 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



Thanks for the answer.

On 16/07/18 10:32, Qu Wenruo wrote:
> 
> 
> On 2018-07-16 16:15, Udo Waechter wrote:
>>> The weird thing is that I can't really find information about the
>>> "failed to recover balance: -5" error. - There was no rebalancing
>>> running during the crash.
> 
> That can only be determined by a tree dump:
> 
> # btrfs ins dump-tree -t root <device>
This gives me:

btrfs-progs v4.13.3
parent transid verify failed on 321265147904 wanted 3276017 found 3273915
parent transid verify failed on 321265147904 wanted 3276017 found 3273915
parent transid verify failed on 321265147904 wanted 3276017 found 3263707
parent transid verify failed on 321265147904 wanted 3276017 found 3273915
Ignoring transid failure
leaf parent key incorrect 321265147904
ERROR: unable to open /dev/vg00/var_....

>>> * Unfortunately I did a "btrfs rescue zero-log" at some point :( - As it
>>> turns out that might have been a bad idea
>>>
>>>
>>> * Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy
> 
> Then it is making things worse; fortunately it should terminate before
> it causes more damage.
> 
> I'm just curious why people don't try the safest "btrfs check" without
> any options, but go for the most dangerous option.
> 
> And "btrfs check" output please.
> If possible, "btrfs check --mode=lowmem" is also good for debug.
> 
Same thing here:

parent transid verify failed on 321265147904 wanted 3276017 found 3273915
parent transid verify failed on 321265147904 wanted 3276017 found 3273915
parent transid verify failed on 321265147904 wanted 3276017 found 3263707
parent transid verify failed on 321265147904 wanted 3276017 found 3273915
Ignoring transid failure
leaf parent key incorrect 321265147904
ERROR: cannot open file system


I made an image with dd pretty early in this process. Unfortunately,
it gives me the same error.
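
(For completeness, this is roughly how I work on the copy -- the paths
are illustrative:)

  losetup --find --show --read-only /srv/rescue/var.img
  # prints the allocated device, e.g. /dev/loop0
  btrfs check /dev/loop0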


Thanks,
udo.

> Thanks,
> Qu
> 
>>>
>>> The volume contained qcow2 images for VMs. I need only one of those,
>>> since one piece of important software decided to not do backups :(
>>>
>>> Any help is highly appreciated.
>>>
>>> Many thanks,
>>> udo.
>>>
>>
> 




end of thread, other threads:[~2018-07-17  8:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-11 15:37 Corrupted FS with "open_ctree failed" and "failed to recover balance: -5" Udo Waechter
2018-07-11 17:48 ` Chris Murphy
2018-07-16  8:15 ` Udo Waechter
2018-07-16  8:32   ` Qu Wenruo
2018-07-17  8:00     ` Udo Waechter
