linux-btrfs.vger.kernel.org archive mirror
* BTRFS recovery not possible
@ 2019-06-11 10:53 claudius
  2019-06-11 13:02 ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: claudius @ 2019-06-11 10:53 UTC (permalink / raw)
  To: linux-btrfs

Hi guys,

You are my last hope. I was so happy to be using BTRFS, but now I really
hate it...


Linux CIA 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019 
x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs v4.15.1

btrfs fi show
Label: none  uuid: 9622fd5c-5f7a-4e72-8efa-3d56a462ba85
         Total devices 1 FS bytes used 4.58TiB
         devid    1 size 7.28TiB used 4.59TiB path /dev/mapper/volume1


dmesg

[57501.267526] BTRFS info (device dm-5): trying to use backup root at 
mount time
[57501.267528] BTRFS info (device dm-5): disk space caching is enabled
[57501.267529] BTRFS info (device dm-5): has skinny extents
[57507.511830] BTRFS error (device dm-5): parent transid verify failed 
on 2069131051008 wanted 4240 found 5115
[57507.518764] BTRFS error (device dm-5): parent transid verify failed 
on 2069131051008 wanted 4240 found 5115
[57507.519265] BTRFS error (device dm-5): failed to read block groups: 
-5
[57507.605939] BTRFS error (device dm-5): open_ctree failed


btrfs check /dev/mapper/volume1
parent transid verify failed on 2069131051008 wanted 4240 found 5115
parent transid verify failed on 2069131051008 wanted 4240 found 5115
parent transid verify failed on 2069131051008 wanted 4240 found 5115
parent transid verify failed on 2069131051008 wanted 4240 found 5115
Ignoring transid failure
extent buffer leak: start 2024985772032 len 16384
ERROR: cannot open file system



I'm not able to mount it anymore.


I found the drive mounted read-only (RO) the other day and realized
something was wrong... I did a reboot and now I can't mount it anymore.


Any help would be appreciated.


* Re: BTRFS recovery not possible
  2019-06-11 10:53 BTRFS recovery not possible claudius
@ 2019-06-11 13:02 ` Qu Wenruo
  2019-06-15 22:05   ` Claudius Winkel
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2019-06-11 13:02 UTC (permalink / raw)
  To: claudius, linux-btrfs





On 2019/6/11 at 6:53 PM, claudius@winca.de wrote:
> HI Guys,
> 
> you are my last try. I was so happy to use BTRFS but now i really hate
> it....
> 
> 
> Linux CIA 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019
> x86_64 x86_64 x86_64 GNU/Linux
> btrfs-progs v4.15.1

So, an old kernel and old progs.

> 
> btrfs fi show
> Label: none  uuid: 9622fd5c-5f7a-4e72-8efa-3d56a462ba85
>         Total devices 1 FS bytes used 4.58TiB
>         devid    1 size 7.28TiB used 4.59TiB path /dev/mapper/volume1
> 
> 
> dmesg
> 
> [57501.267526] BTRFS info (device dm-5): trying to use backup root at
> mount time
> [57501.267528] BTRFS info (device dm-5): disk space caching is enabled
> [57501.267529] BTRFS info (device dm-5): has skinny extents
> [57507.511830] BTRFS error (device dm-5): parent transid verify failed
> on 2069131051008 wanted 4240 found 5115

Some metadata CoW is not recorded correctly.

Hopefully you didn't ever try btrfs check --repair|--init-* or anything
other than --readonly, as there is a long-existing bug in btrfs-progs
which can cause similar corruption.
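
For reference, the read-only form (which is safe to run and makes no
changes to the disk) is simply:

  btrfs check --readonly /dev/mapper/volume1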



> [57507.518764] BTRFS error (device dm-5): parent transid verify failed
> on 2069131051008 wanted 4240 found 5115
> [57507.519265] BTRFS error (device dm-5): failed to read block groups: -5
> [57507.605939] BTRFS error (device dm-5): open_ctree failed
> 
> 
> btrfs check /dev/mapper/volume1
> parent transid verify failed on 2069131051008 wanted 4240 found 5115
> parent transid verify failed on 2069131051008 wanted 4240 found 5115
> parent transid verify failed on 2069131051008 wanted 4240 found 5115
> parent transid verify failed on 2069131051008 wanted 4240 found 5115
> Ignoring transid failure
> extent buffer leak: start 2024985772032 len 16384
> ERROR: cannot open file system
> 
> 
> 
> im not able to mount it anymore.
> 
> 
> I found the drive in RO the other day and realized somthing was wrong
> ... i did a reboot and now i cant mount anmyore

The btrfs extent tree must have been corrupted at that time.

Full recovery back to a fully RW-mountable fs doesn't look possible,
as metadata CoW is completely screwed up in this case.

You could either use btrfs restore to try to restore the data to
another location.
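
For example (the destination directory is just a placeholder for any
path with enough free space):

  btrfs restore -v /dev/mapper/volume1 /path/to/restore/dir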

Or try my kernel branch:
https://github.com/adam900710/linux/tree/rescue_options

It's an older branch based on v5.1-rc4, but it has some extra new mount
options.
For your case, you need to compile that kernel and then mount with
"-o ro,rescue=skip_bg,rescue=no_log_replay".

If it mounts (as RO), then do all your salvage from there.
It should be faster than btrfs restore, and you can use all your regular
tools to back up.
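
A rough sketch of what that could look like (untested here; the mount
point and backup destination are placeholders, and the kernel is built
and booted the usual way for your distro):

  git clone -b rescue_options https://github.com/adam900710/linux.git
  # ... build and boot that kernel, then:
  mount -o ro,rescue=skip_bg,rescue=no_log_replay /dev/mapper/volume1 /mnt
  rsync -aHAX /mnt/ /path/to/backup/   # or cp -a, tar, whatever you prefer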

Thanks,
Qu

> 
> 
> any help




* Re: BTRFS recovery not possible
  2019-06-11 13:02 ` Qu Wenruo
@ 2019-06-15 22:05   ` Claudius Winkel
  2019-06-19 23:45     ` Zygo Blaxell
  0 siblings, 1 reply; 5+ messages in thread
From: Claudius Winkel @ 2019-06-15 22:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Thanks for the help.

I got my data back.

But now I'm wondering... how did it get this far?

Was it LUKS / dm-crypt?

What did I do wrong? An old Ubuntu kernel? Ubuntu 18.04.

What should I do now to use btrfs safely? Should I not use it with
dm-crypt?

Or even use ZFS instead...

On 11/06/2019 at 15:02, Qu Wenruo wrote:
>
> On 2019/6/11 at 6:53 PM, claudius@winca.de wrote:
>> HI Guys,
>>
>> you are my last try. I was so happy to use BTRFS but now i really hate
>> it....
>>
>>
>> Linux CIA 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019
>> x86_64 x86_64 x86_64 GNU/Linux
>> btrfs-progs v4.15.1
> So old kernel and old progs.
>
>> btrfs fi show
>> Label: none  uuid: 9622fd5c-5f7a-4e72-8efa-3d56a462ba85
>>          Total devices 1 FS bytes used 4.58TiB
>>          devid    1 size 7.28TiB used 4.59TiB path /dev/mapper/volume1
>>
>>
>> dmesg
>>
>> [57501.267526] BTRFS info (device dm-5): trying to use backup root at
>> mount time
>> [57501.267528] BTRFS info (device dm-5): disk space caching is enabled
>> [57501.267529] BTRFS info (device dm-5): has skinny extents
>> [57507.511830] BTRFS error (device dm-5): parent transid verify failed
>> on 2069131051008 wanted 4240 found 5115
> Some metadata CoW is not recorded correctly.
>
> Hopes you didn't every try any btrfs check --repair|--init-* or anything
> other than --readonly.
> As there is a long exiting bug in btrfs-progs which could cause similar
> corruption.
>
>
>
>> [57507.518764] BTRFS error (device dm-5): parent transid verify failed
>> on 2069131051008 wanted 4240 found 5115
>> [57507.519265] BTRFS error (device dm-5): failed to read block groups: -5
>> [57507.605939] BTRFS error (device dm-5): open_ctree failed
>>
>>
>> btrfs check /dev/mapper/volume1
>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>> Ignoring transid failure
>> extent buffer leak: start 2024985772032 len 16384
>> ERROR: cannot open file system
>>
>>
>>
>> im not able to mount it anymore.
>>
>>
>> I found the drive in RO the other day and realized somthing was wrong
>> ... i did a reboot and now i cant mount anmyore
> Btrfs extent tree must has been corrupted at that time.
>
> Full recovery back to fully RW mountable fs doesn't look possible.
> As metadata CoW is completely screwed up in this case.
>
> Either you could use btrfs-restore to try to restore the data into
> another location.
>
> Or try my kernel branch:
> https://github.com/adam900710/linux/tree/rescue_options
>
> It's an older branch based on v5.1-rc4.
> But it has some extra new mount options.
> For your case, you need to compile the kernel, then mount it with "-o
> ro,rescue=skip_bg,rescue=no_log_replay".
>
> If it mounts (as RO), then do all your salvage.
> It should be a faster than btrfs-restore, and you can use all your
> regular tool to backup.
>
> Thanks,
> Qu
>
>>
>> any help


* Re: BTRFS recovery not possible
  2019-06-15 22:05   ` Claudius Winkel
@ 2019-06-19 23:45     ` Zygo Blaxell
  2019-06-20  5:00       ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Zygo Blaxell @ 2019-06-19 23:45 UTC (permalink / raw)
  To: Claudius Winkel; +Cc: linux-btrfs


On Sun, Jun 16, 2019 at 12:05:21AM +0200, Claudius Winkel wrote:
> Thanks for the Help
> 
> I get my data back.
> 
> But now I`m thinking... how did it come so far?
> 
> Was it luks the dm-crypt?

dm-crypt is fine.  dm-crypt is not a magical tool for creating data loss
in Linux storage stacks.  I've never been able to prove dm-crypt ever
lost any data on my watch, and I've been testing for that event for 10+
years (half of them on btrfs).

dm-crypt's predecessors (e.g. cryptoloop) were notoriously underspecified
and buggy, but they are not dm-crypt.  Hopefully no modern distro still
offers these as an install option.

> What did i do wrong? Old Ubuntu Kernel? ubuntu 18.04

4.15 isn't old enough for its age alone to cause the issues you
encountered.

> What should I do now ... to use btrfs safely? Should i not use it with
> DM-crypt

You might need to disable write caching on your drives, i.e. hdparm -W0.
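
For example (sdX is a placeholder for each physical disk underneath the
dm-crypt volume; note the setting is per-boot unless you also persist it,
e.g. via /etc/hdparm.conf or a udev rule):

  hdparm -W /dev/sdX    # show the current write-cache setting
  hdparm -W0 /dev/sdX   # disable the volatile write cache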

I have a few drives in my collection that don't have working write cache.
They are usually fine, but when otherwise minor failure events occur (e.g.
bad cables, bad power supply, failing UNC sectors) then the write cache
doesn't behave correctly, and any filesystem or database on the drive
gets trashed.  This isn't normal behavior, but the problem does affect
the default configuration of some popular mid-range drive models from
top-3 hard disk vendors, so it's quite common.

After turning off write caching, btrfs can keep running on these problem
drive models until they get too old and broken to spin up any more.
With write caching turned on, these drive models will eat a btrfs every
few months.


> Or even use ZFS instead...
>
> On 11/06/2019 at 15:02, Qu Wenruo wrote:
> > 
> > On 2019/6/11 at 6:53 PM, claudius@winca.de wrote:
> > > HI Guys,
> > > 
> > > you are my last try. I was so happy to use BTRFS but now i really hate
> > > it....
> > > 
> > > 
> > > Linux CIA 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019
> > > x86_64 x86_64 x86_64 GNU/Linux
> > > btrfs-progs v4.15.1
> > So old kernel and old progs.
> > 
> > > btrfs fi show
> > > Label: none  uuid: 9622fd5c-5f7a-4e72-8efa-3d56a462ba85
> > >          Total devices 1 FS bytes used 4.58TiB
> > >          devid    1 size 7.28TiB used 4.59TiB path /dev/mapper/volume1
> > > 
> > > 
> > > dmesg
> > > 
> > > [57501.267526] BTRFS info (device dm-5): trying to use backup root at
> > > mount time
> > > [57501.267528] BTRFS info (device dm-5): disk space caching is enabled
> > > [57501.267529] BTRFS info (device dm-5): has skinny extents
> > > [57507.511830] BTRFS error (device dm-5): parent transid verify failed
> > > on 2069131051008 wanted 4240 found 5115
> > Some metadata CoW is not recorded correctly.
> > 
> > Hopes you didn't every try any btrfs check --repair|--init-* or anything
> > other than --readonly.
> > As there is a long exiting bug in btrfs-progs which could cause similar
> > corruption.
> > 
> > 
> > 
> > > [57507.518764] BTRFS error (device dm-5): parent transid verify failed
> > > on 2069131051008 wanted 4240 found 5115
> > > [57507.519265] BTRFS error (device dm-5): failed to read block groups: -5
> > > [57507.605939] BTRFS error (device dm-5): open_ctree failed
> > > 
> > > 
> > > btrfs check /dev/mapper/volume1
> > > parent transid verify failed on 2069131051008 wanted 4240 found 5115
> > > parent transid verify failed on 2069131051008 wanted 4240 found 5115
> > > parent transid verify failed on 2069131051008 wanted 4240 found 5115
> > > parent transid verify failed on 2069131051008 wanted 4240 found 5115
> > > Ignoring transid failure
> > > extent buffer leak: start 2024985772032 len 16384
> > > ERROR: cannot open file system
> > > 
> > > 
> > > 
> > > im not able to mount it anymore.
> > > 
> > > 
> > > I found the drive in RO the other day and realized somthing was wrong
> > > ... i did a reboot and now i cant mount anmyore
> > Btrfs extent tree must has been corrupted at that time.
> > 
> > Full recovery back to fully RW mountable fs doesn't look possible.
> > As metadata CoW is completely screwed up in this case.
> > 
> > Either you could use btrfs-restore to try to restore the data into
> > another location.
> > 
> > Or try my kernel branch:
> > https://github.com/adam900710/linux/tree/rescue_options
> > 
> > It's an older branch based on v5.1-rc4.
> > But it has some extra new mount options.
> > For your case, you need to compile the kernel, then mount it with "-o
> > ro,rescue=skip_bg,rescue=no_log_replay".
> > 
> > If it mounts (as RO), then do all your salvage.
> > It should be a faster than btrfs-restore, and you can use all your
> > regular tool to backup.
> > 
> > Thanks,
> > Qu
> > 
> > > 
> > > any help



* Re: BTRFS recovery not possible
  2019-06-19 23:45     ` Zygo Blaxell
@ 2019-06-20  5:00       ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-06-20  5:00 UTC (permalink / raw)
  To: Zygo Blaxell, Claudius Winkel; +Cc: linux-btrfs





On 2019/6/20 at 7:45 AM, Zygo Blaxell wrote:
> On Sun, Jun 16, 2019 at 12:05:21AM +0200, Claudius Winkel wrote:
>> Thanks for the Help
>>
>> I get my data back.
>>
>> But now I`m thinking... how did it come so far?
>>
>> Was it luks the dm-crypt?
> 
> dm-crypt is fine.  dm-crypt is not a magical tool for creating data loss
> in Linux storage stacks.  I've never been able to prove dm-crypt ever
> lost any data on my watch, and I've been testing for that event for 10+
> years (half of them on btrfs).
> 
> dm-crypt's predecessors (e.g. cryptoloop) were notoriously underspecified
> and buggy, but they are not dm-crypt.  Hopefully no modern distro still
> offers these as an install option.
> 
>> What did i do wrong? Old Ubuntu Kernel? ubuntu 18.04
> 
> 4.15 isn't old enough for its age alone to cause the issues you
> encountered.
> 
>> What should I do now ... to use btrfs safely? Should i not use it with
>> DM-crypt
> 
> You might need to disable write caching on your drives, i.e. hdparm -W0.

This is quite troublesome.

Disabling the write cache normally means a performance impact.

And disabling it would normally hide the true cause (if it's something
that is btrfs' fault).

> 
> I have a few drives in my collection that don't have working write cache.
> They are usually fine, but when otherwise minor failure events occur (e.g.
> bad cables, bad power supply, failing UNC sectors) then the write cache
> doesn't behave correctly, and any filesystem or database on the drive
> gets trashed.

Normally this shouldn't be the case, as long as the fs issues correct
journal and flush/barrier operations.

If the hardware is really to blame, then its FLUSH/FUA is not implemented
properly at all, and thus the chance of a single power loss leading to
corruption should be VERY VERY high.

>  This isn't normal behavior, but the problem does affect
> the default configuration of some popular mid-range drive models from
> top-3 hard disk vendors, so it's quite common.

Would you like to share the info and the test methodology used to
determine that the device is to blame? (Maybe in another thread.)

Your idea of a faulty FLUSH/FUA implementation in the hardware could
definitely cause exactly this problem, but the last time I asked fs-devel
about a similar problem, there was no proof of such a possibility.

The problem is always a ghost to chase; extra info would greatly help us
pin it down.

Thanks,
Qu

> 
> After turning off write caching, btrfs can keep running on these problem
> drive models until they get too old and broken to spin up any more.
> With write caching turned on, these drive models will eat a btrfs every
> few months.
> 
> 
>> Or even use ZFS instead...
>>
>> On 11/06/2019 at 15:02, Qu Wenruo wrote:
>>>
>>> On 2019/6/11 at 6:53 PM, claudius@winca.de wrote:
>>>> HI Guys,
>>>>
>>>> you are my last try. I was so happy to use BTRFS but now i really hate
>>>> it....
>>>>
>>>>
>>>> Linux CIA 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019
>>>> x86_64 x86_64 x86_64 GNU/Linux
>>>> btrfs-progs v4.15.1
>>> So old kernel and old progs.
>>>
>>>> btrfs fi show
>>>> Label: none  uuid: 9622fd5c-5f7a-4e72-8efa-3d56a462ba85
>>>>          Total devices 1 FS bytes used 4.58TiB
>>>>          devid    1 size 7.28TiB used 4.59TiB path /dev/mapper/volume1
>>>>
>>>>
>>>> dmesg
>>>>
>>>> [57501.267526] BTRFS info (device dm-5): trying to use backup root at
>>>> mount time
>>>> [57501.267528] BTRFS info (device dm-5): disk space caching is enabled
>>>> [57501.267529] BTRFS info (device dm-5): has skinny extents
>>>> [57507.511830] BTRFS error (device dm-5): parent transid verify failed
>>>> on 2069131051008 wanted 4240 found 5115
>>> Some metadata CoW is not recorded correctly.
>>>
>>> Hopes you didn't every try any btrfs check --repair|--init-* or anything
>>> other than --readonly.
>>> As there is a long exiting bug in btrfs-progs which could cause similar
>>> corruption.
>>>
>>>
>>>
>>>> [57507.518764] BTRFS error (device dm-5): parent transid verify failed
>>>> on 2069131051008 wanted 4240 found 5115
>>>> [57507.519265] BTRFS error (device dm-5): failed to read block groups: -5
>>>> [57507.605939] BTRFS error (device dm-5): open_ctree failed
>>>>
>>>>
>>>> btrfs check /dev/mapper/volume1
>>>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>>>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>>>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>>>> parent transid verify failed on 2069131051008 wanted 4240 found 5115
>>>> Ignoring transid failure
>>>> extent buffer leak: start 2024985772032 len 16384
>>>> ERROR: cannot open file system
>>>>
>>>>
>>>>
>>>> im not able to mount it anymore.
>>>>
>>>>
>>>> I found the drive in RO the other day and realized somthing was wrong
>>>> ... i did a reboot and now i cant mount anmyore
>>> Btrfs extent tree must has been corrupted at that time.
>>>
>>> Full recovery back to fully RW mountable fs doesn't look possible.
>>> As metadata CoW is completely screwed up in this case.
>>>
>>> Either you could use btrfs-restore to try to restore the data into
>>> another location.
>>>
>>> Or try my kernel branch:
>>> https://github.com/adam900710/linux/tree/rescue_options
>>>
>>> It's an older branch based on v5.1-rc4.
>>> But it has some extra new mount options.
>>> For your case, you need to compile the kernel, then mount it with "-o
>>> ro,rescue=skip_bg,rescue=no_log_replay".
>>>
>>> If it mounts (as RO), then do all your salvage.
>>> It should be a faster than btrfs-restore, and you can use all your
>>> regular tool to backup.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> any help





Thread overview: 5+ messages
2019-06-11 10:53 BTRFS recovery not possible claudius
2019-06-11 13:02 ` Qu Wenruo
2019-06-15 22:05   ` Claudius Winkel
2019-06-19 23:45     ` Zygo Blaxell
2019-06-20  5:00       ` Qu Wenruo
