* btrfs fails to mount after power outage
@ 2018-04-11 15:33 Tom Vincent
  2018-04-11 23:25 ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Tom Vincent @ 2018-04-11 15:33 UTC (permalink / raw)
  To: linux-btrfs

My btrfs laptop had a power outage and failed to boot with "parent
transid verify failed..." errors. (I have backups).

I couldn't rw mount on a live disk, but could ro mount. I tried btrfs
scrub and then btrfs check --repair to no avail. However, btrfs rescue
zero-log _did_ work; the drive can be rw mounted and the machine boots
fine again.

Although there doesn't appear to be an immediate data loss, there's
still a transid error during boot. "BTRFS error (device dm-0): parent
transid verify failed on 115490816 wanted 339949 found 340182".
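
The kernel message above carries three numbers: the logical address of the tree block, the generation (transid) the parent pointer expected, and the generation actually found in the block. A minimal Python sketch for pulling them out of a log line follows. The function name `parse_transid_error` and the returned field names are illustrative assumptions, not part of any btrfs tool.

```python
import re

# Matches the kernel message format quoted in this thread.
_TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_error(line):
    """Return the numbers from a transid error line as a dict, or None."""
    m = _TRANSID_RE.search(line)
    if m is None:
        return None
    info = {key: int(value) for key, value in m.groupdict().items()}
    # How far the on-disk generation is from what the parent expected.
    info["delta"] = info["found"] - info["wanted"]
    return info


if __name__ == "__main__":
    msg = ("BTRFS error (device dm-0): parent transid verify failed "
           "on 115490816 wanted 339949 found 340182")
    print(parse_transid_error(msg))
```

For the line reported here, the block on disk is 233 generations newer than the one its parent points to.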

If I umount and re-run btrfs check, I'm given further transid errors
and pages of "inode [n] errors 2001, no inode item, link count wrong".

What steps should I take now?

btrfs progs 4.15
kernel 4.15.15
NVMe drive


* Re: btrfs fails to mount after power outage
  2018-04-11 15:33 btrfs fails to mount after power outage Tom Vincent
@ 2018-04-11 23:25 ` Qu Wenruo
  2018-04-13  5:46   ` Duncan
  2018-04-16 16:07   ` Tom Vincent
  0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2018-04-11 23:25 UTC (permalink / raw)
  To: Tom Vincent, linux-btrfs



On 2018-04-11 23:33, Tom Vincent wrote:
> My btrfs laptop had a power outage and failed to boot with "parent
> transid verify failed..." errors. (I have backups).

Metadata corruption, again.

What is the underlying disk?
Is it a plain physical device, or are there other layers such as bcache or LVM?

And what is the physical device, SSD or HDD? Vendor info would also help
here.
(The Intel 600P used to have problems with XFS; I'm not sure whether that affects btrfs.)

> 
> I couldn't rw mount on a live disk, but could ro mount. I tried btrfs
> scrub and then btrfs check --repair to no avail. However, btrfs rescue
> zero-log _did_ work; the drive can be rw mounted and the machine boots
> fine again.
> 
> Although there doesn't appear to be an immediate data loss, there's
> still a transid error during boot. "BTRFS error (device dm-0): parent
> transid verify failed on 115490816 wanted 339949 found 340182".
> 
> If I umount and re-run btrfs check, I'm given further transid errors
> and pages of "inode [n] errors 2001, no inode item, link count wrong".

Full output please.

> 
> What steps should I take now?

For a transid error, btrfs check --repair can fix it, but only run it when
that is the only problem.

Thanks,
Qu

> 
> btrfs progs 4.15
> kernel 4.15.15
> NVMe drive


* Re: btrfs fails to mount after power outage
  2018-04-11 23:25 ` Qu Wenruo
@ 2018-04-13  5:46   ` Duncan
  2018-04-16 16:07   ` Tom Vincent
  1 sibling, 0 replies; 5+ messages in thread
From: Duncan @ 2018-04-13  5:46 UTC (permalink / raw)
  To: linux-btrfs

Qu Wenruo posted on Thu, 12 Apr 2018 07:25:15 +0800 as excerpted:


> On 2018-04-11 23:33, Tom Vincent wrote:
>> My btrfs laptop had a power outage and failed to boot with "parent
>> transid verify failed..." errors. (I have backups).
> 
> Metadata corruption, again.
> 
> I'm curious about what's the underlying disk?
> Is it plain physical device? Or have other layers like bcache/lvm?
> 
> And what's the physical device? SSD or HDD?

The last line of his message said progs 4.15, kernel 4.15.15, NVMe, so 
it's SSD.

Another important question, though, if not for this instance, then for 
easier repair the next time something goes wrong:

What mount options?  In particular, is the discard option used (and of 
course I'm assuming nothing as insane as nobarrier)?

Because as came up on a recent thread here...

Btrfs normally keeps a few generations of root blocks around and one 
method of recovery is using the usebackuproot (or the deprecated 
recovery) option to try to use them if the current root is bad.  But 
apparently nobody considered how discard and the backup roots would 
interact, and there's (currently) nothing keeping them from being marked 
for discard just as soon as the next new root becomes current.  Now some 
device firmware batches up discards as garbage-collection that can be 
done periodically, when the number of unwritten erase-blocks gets low, 
but others do discards basically immediately, meaning those backup roots 
are lost effectively immediately, making the usebackuproot recovery 
feature worthless. =:^(

That is not a tradeoff that would occur to most people when deciding 
whether to enable discard, evidently including the btrfs devs who set up 
the btrfs discard behavior. =:^(

But it's definitely a tradeoff to consider once you /do/ know it!

Presumably that'll be fixed at some point, but not being a dev nor 
knowing how complex the fix might be, I won't venture a guess as to when, 
or whether it'd be considered stable-kernel backport material or not, 
when it happens.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs fails to mount after power outage
  2018-04-11 23:25 ` Qu Wenruo
  2018-04-13  5:46   ` Duncan
@ 2018-04-16 16:07   ` Tom Vincent
  2018-04-17  0:31     ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Tom Vincent @ 2018-04-16 16:07 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12 April 2018 at 00:25, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> I'm curious about what's the underlying disk?

It's a Samsung PM951 NVMe SSD.

> Is it plain physical device? Or have other layers like bcache/lvm?

btrfs on LUKS

>> btrfs check
> Full output please.

https://gist.githubusercontent.com/tlvince/acf51b37622c216e1c33cdc3dfbd321f/raw/d0237948bbffacd4bb8d53fdfa5f23391416c1e2/btrfs-check.txt

> For transid error, btrfs check --repair can fix it, but only do it when
> that's the only problem.

I ran this (for ~12+ hours) to no avail; it appears to have been
looping around "Btree for root 259 is fixed". I grew impatient and
SIGINT-ed, which unsurprisingly toasted the file system once and for
all (I rebuilt from backups at that point).

Full output:

https://gist.githubusercontent.com/tlvince/8060c19526aa011b0baff2b12e3873fd/raw/ecc43bd9dc7b352e490aa0bf0deac368af04e117/btrfs-check-repair.txt

Note, the system was fine for a few days after zero-log (before check
--repair), but then hit the same transid error at boot.

On 13 April 2018 at 06:46, Duncan <1i5t5.duncan@cox.net> wrote:
> What mount options?  In particular, is the discard option used (and of
> course I'm assuming nothing as insane as nobarrier)?

noatime,compress=lzo

... as well as some defaults: rw,noatime,compress=lzo,ssd,space_cache
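
The question above comes down to whether `discard` or `nobarrier` appears in the option string. A minimal Python sketch of that check follows. The helper names `parse_mount_options` and `risky_options` and the `RISKY` tuple are hypothetical, introduced only for illustration, and the list of options to flag is limited to the two named in this thread.

```python
def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (for example "ssd") map to None.
    """
    parsed = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


# Options singled out in this thread as relevant to recovery.
RISKY = ("discard", "nobarrier")


def risky_options(opts):
    """Return the options from RISKY that are present in opts."""
    parsed = parse_mount_options(opts)
    return [name for name in RISKY if name in parsed]


if __name__ == "__main__":
    print(risky_options("rw,noatime,compress=lzo,ssd,space_cache"))
```

For the option string reported here the result is an empty list, so neither option was in use.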


* Re: btrfs fails to mount after power outage
  2018-04-16 16:07   ` Tom Vincent
@ 2018-04-17  0:31     ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2018-04-17  0:31 UTC (permalink / raw)
  To: Tom Vincent, linux-btrfs





On 2018-04-17 00:07, Tom Vincent wrote:
> On 12 April 2018 at 00:25, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> I'm curious about what's the underlying disk?
> 
> It's a Samsung PM951 NVMe SSD.
> 
>> Is it plain physical device? Or have other layers like bcache/lvm?
> 
> btrfs on LUKS
> 
>>> btrfs check
>> Full output please.
> 
> https://gist.githubusercontent.com/tlvince/acf51b37622c216e1c33cdc3dfbd321f/raw/d0237948bbffacd4bb8d53fdfa5f23391416c1e2/btrfs-check.txt

Unfortunately, not only the extent tree but also the fs trees are corrupted:
root 259 inode 19916 errors 2000, link count wrong
	unresolved ref dir 16196710 index 2 namelen 12 name foo.gpg filetype 0
errors 3, no dir item, no dir index

Output like this, together with the other error messages, means that at least
one tree block of your fs trees is corrupted.

And it seems that all corrupted tree blocks belong to subvolume 259.
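
One way to check which subvolumes are affected is to tally the per-inode error lines by root ID. The Python sketch below is a rough illustration that assumes every such line begins with "root <id> inode <n> errors", as in the excerpt above. Lines in any other format are ignored, and the function name `count_errors_by_root` is hypothetical.

```python
import re
from collections import Counter

# Per-inode error line as it appears in the excerpt above.
_INODE_ERR_RE = re.compile(
    r"^root (?P<root>\d+) inode (?P<inode>\d+) errors (?P<errors>[0-9a-fA-F]+)"
)


def count_errors_by_root(lines):
    """Count per-inode error lines for each root (subvolume) ID."""
    counts = Counter()
    for line in lines:
        m = _INODE_ERR_RE.match(line.strip())
        if m:
            counts[int(m.group("root"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "root 259 inode 19916 errors 2000, link count wrong",
        "\tunresolved ref dir 16196710 index 2 namelen 12 name foo.gpg filetype 0",
        "errors 3, no dir item, no dir index",
    ]
    print(count_errors_by_root(sample))
```

If every counted line falls under a single root ID, the damage is confined to that one subvolume's tree as far as these lines show.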

>> For transid error, btrfs check --repair can fix it, but only do it when
>> that's the only problem.
> 
> I ran this (for ~12+ hours) to no avail; it appears to have been
> looping around "Btree for root 259 is fixed". I grew impatient and
> SIGINT-ed, which unsurprisingly toasted the file system once and for
> all (I rebuilt from backups at that point).

check --repair won't help much in this case.
So btrfs-restore would be your last chance to salvage data.

> 
> Full output:
> 
> https://gist.githubusercontent.com/tlvince/8060c19526aa011b0baff2b12e3873fd/raw/ecc43bd9dc7b352e490aa0bf0deac368af04e117/btrfs-check-repair.txt
> 
> Note, the system was fine for a few days after zero-log (before check
> --repair), but then hit the same transid error at boot.

Your filesystem is already *CORRUPTED*, so whatever happens is not a
surprise.

Only a filesystem that passes "btrfs check" without any problems can
be expected to keep running reliably for a long time.

Thanks,
Qu

> 
> On 13 April 2018 at 06:46, Duncan <1i5t5.duncan@cox.net> wrote:
>> What mount options?  In particular, is the discard option used (and of
>> course I'm assuming nothing as insane as nobarrier)?
> 
> noatime,compress=lzo
> 
> ... as well as some defaults: rw,noatime,compress=lzo,ssd,space_cache
> 



