linux-btrfs.vger.kernel.org archive mirror
* Unbootable root btrfs
@ 2019-05-16 10:36 Lee Fleming
  2019-05-16 21:39 ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Lee Fleming @ 2019-05-16 10:36 UTC (permalink / raw)
  To: linux-btrfs

After seeking advice on reddit I've been advised to post this problem
here. See https://www.reddit.com/r/btrfs/comments/bp0awe/broken_btrfs_filesystem_following_a_reboot/

I have a root btrfs filesystem on top of mdadm raid10 and lvm. The
raid and lvm appear to be ok but the btrfs partition will not mount.

I have booted a live recovery and tried to mount/repair the
filesystem. This is the result.

    % mount /dev/mapper/vg-root /mnt/gentoo
    mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on /dev/mapper/vg-root, missing codepage or helper program, or other error.

Trying to mount with recovery gives the same result:

    % mount -o ro,recovery /dev/mapper/vg-root /mnt/gentoo
    mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on /dev/mapper/vg-root, missing codepage or helper program, or other error.

And a btrfs check gives the following:

    % btrfs check --repair /dev/mapper/vg-root
    enabling repair mode
    bytenr mismatch, want=898031484928, have=898006728704

    ERROR: cannot open file system

    % dmesg | grep -i btrfs
    [ 5.562419] Btrfs loaded, crc32c=crc32c-generic
    [ 14.381989] BTRFS: device fsid 1fb019f1-a8cc-46ef-8122-ac6b1bedd522 devid 1 transid 51979 /dev/dm-1
    [ 14.382647] BTRFS info (device dm-1): disk space caching is enabled
    [ 14.382652] BTRFS info (device dm-1): has skinny extents
    [ 15.777186] BTRFS error (device dm-1): bad tree block start 0 898031337472
    [ 15.777334] BTRFS error (device dm-1): bad tree block start 0 898031353856
    [ 15.777486] BTRFS error (device dm-1): bad tree block start 0 898031370240
    [ 15.864239] BTRFS error (device dm-1): bad tree block start 898006728704 898031484928
    [ 15.871367] BTRFS error (device dm-1): bad tree block start 898003812352 898031484928
    [ 15.871382] BTRFS error (device dm-1): failed to read block groups: -5
    [ 15.892051] BTRFS error (device dm-1): open_ctree failed
    [ 16.016182] BTRFS info (device dm-1): disk space caching is enabled
    [ 16.016186] BTRFS info (device dm-1): has skinny extents
    [ 17.319016] BTRFS error (device dm-1): bad tree block start 0 898031337472
    [ 17.319157] BTRFS error (device dm-1): bad tree block start 0 898031353856
    [ 17.319303] BTRFS error (device dm-1): bad tree block start 0 898031370240
    [ 17.422706] BTRFS error (device dm-1): bad tree block start 898006728704 898031484928
    [ 17.429831] BTRFS error (device dm-1): bad tree block start 898003812352 898031484928
    [ 17.429845] BTRFS error (device dm-1): failed to read block groups: -5
    [ 17.450035] BTRFS error (device dm-1): open_ctree failed

    % uname -r
    4.14.70-std531-amd64

    % wipefs /dev/mapper/vg-root
    DEVICE OFFSET TYPE UUID LABEL
    vg-root 0x10040 btrfs 1fb019f1-a8cc-46ef-8122-ac6b1bedd522

I was asked to try with a more recent kernel. I booted archiso which
showed similar results.

    # uname -r
    5.0.10-arch1-1-ARCH

    # mount /dev/mapper/vg-root /mnt/funtoo
    [ 208.724214] BTRFS error (device dm-1): bad tree block start, want 898031337472 have 0
    [ 208.724343] BTRFS error (device dm-1): bad tree block start, want 898031353856 have 0
    [ 208.724556] BTRFS error (device dm-1): bad tree block start, want 898031370240 have 0
    [ 208.805279] BTRFS error (device dm-1): bad tree block start, want 898031484928 have 898006728704
    [ 208.812412] BTRFS error (device dm-1): bad tree block start, want 898031484928 have 898003812352
    [ 208.812451] BTRFS error (device dm-1): failed to read block groups: -5
    [ 208.840576] BTRFS error (device dm-1): open_ctree failed
    mount: /mnt/funtoo: wrong fs type, bad option, bad superblock on /dev/mapper/vg-root, missing codepage or helper program, or other error.
    32

    # dmesg | grep -i btrfs
    [ 23.028283] Btrfs loaded, crc32c=crc32c-intel
    [ 23.061402] BTRFS: device fsid 1fb019f1-a8cc-46ef-8122-ac6b1bedd522 devid 1 transid 51979 /dev/dm-1
    [ 207.437375] BTRFS info (device dm-1): disk space caching is enabled
    [ 207.437379] BTRFS info (device dm-1): has skinny extents
    [ 208.724214] BTRFS error (device dm-1): bad tree block start, want 898031337472 have 0
    [ 208.724343] BTRFS error (device dm-1): bad tree block start, want 898031353856 have 0
    [ 208.724556] BTRFS error (device dm-1): bad tree block start, want 898031370240 have 0
    [ 208.805279] BTRFS error (device dm-1): bad tree block start, want 898031484928 have 898006728704
    [ 208.812412] BTRFS error (device dm-1): bad tree block start, want 898031484928 have 898003812352
    [ 208.812451] BTRFS error (device dm-1): failed to read block groups: -5
    [ 208.840576] BTRFS error (device dm-1): open_ctree failed

Any idea if this can be fixed?

Cheers
Lee


* Re: Unbootable root btrfs
  2019-05-16 10:36 Unbootable root btrfs Lee Fleming
@ 2019-05-16 21:39 ` Chris Murphy
       [not found]   ` <CAKS=YrMB6SNbCnJsU=rD5gC6cR5yEnSzPDax5eP-VQ-UpzHvAg@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Murphy @ 2019-05-16 21:39 UTC (permalink / raw)
  To: Lee Fleming; +Cc: Btrfs BTRFS

On Thu, May 16, 2019 at 4:37 AM Lee Fleming <leeflemingster@gmail.com> wrote:

> And a btrfs check gives the following:
>
>     % btrfs check --repair /dev/mapper/vg-root

Why use repair? From the man page

Warning
           Do not use --repair unless you are advised to do so by a developer
           or an experienced user


>     [ 17.429845] BTRFS error (device dm-1): failed to read block groups: -5
>     [ 17.450035] BTRFS error (device dm-1): open_ctree failed

Was there a crash or power failure during a write before the problem
started? What precipitated the problem?

It might be possible to successfully mount with '-o ro,nologreplay,degraded'

If that works, I'd take the opportunity to refresh backups. I'm not
sure if this can be repaired but also not sure what the problem is.

If it doesn't work, then the next step, until a developer has an
opinion on it, is 'btrfs restore', which is a way to scrape data out
of an unmountable file system. It's better than nothing if the data
is important, though a working ro mount would be the ideal outcome.
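
Roughly like this (untested; the mount point and restore destination
are just examples, and the restore destination needs to be a separate,
healthy filesystem with enough space). Adding -m also restores
owner/mode/timestamps, and -S restores symlinks:

    # (device, mount point and restore destination below are examples only)
    # mount -o ro,nologreplay,degraded /dev/mapper/vg-root /mnt/gentoo

    # mkdir -p /mnt/rescue
    # btrfs restore -v /dev/mapper/vg-root /mnt/rescue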

-- 
Chris Murphy


* Re: Unbootable root btrfs
       [not found]   ` <CAKS=YrMB6SNbCnJsU=rD5gC6cR5yEnSzPDax5eP-VQ-UpzHvAg@mail.gmail.com>
@ 2019-05-18  4:06     ` Chris Murphy
  2019-05-18  4:39       ` Robert White
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Murphy @ 2019-05-18  4:06 UTC (permalink / raw)
  To: Lee Fleming, Btrfs BTRFS; +Cc: Chris Murphy

On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@gmail.com> wrote:
>
> I didn't see that particular warning. I did see a warning that it could cause damage and should be tried after trying some other things which I did. The data on this drive isn't important. I just wanted to see if it could be recovered before reinstalling.
>
> There was no crash, just a reboot. I was setting up KVM and I rebooted into a different kernel to see if some performance problems were kernel related. And it just didn't boot.

OK the corrupted Btrfs volume is a guest file system?

That's unexpected. There must be some configuration-specific issue
that's instigating this. I've done quite a lot of Btrfs testing in
qemu-kvm, including virtio-blk devices with unsafe caching, and I do
vile things to the VMs, intentionally trying to blow up Btrfs,
including force-quitting a VM while it's writing. And I haven't
gotten any corruption.

All I can recommend is to try to reproduce it again, and this time
keep track of the exact steps so that anyone can try to reproduce it.
It might be a bug you've found, but we need a reproducer. Is it using
qcow2 or raw file backing, or LVM, or a plain partition? What is the
qemu command line for the VM? You can get that with 'ps aux | grep
qemu', and it should show all the options used, including the kind of
block devices and caching. And then, what is the workload inside the
VM?
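
For example, something like this is what I'd look for (the command
line below is purely illustrative, not from your system): the cache=
setting on each -drive, or the equivalent cache properties on a
-blockdev. If there is no cache= at all you get qemu's default, which
as far as I know still honors guest flushes.

    # ps aux | grep [q]emu
      ... qemu-system-x86_64 ... -drive file=/var/lib/libvirt/images/guest.qcow2,format=qcow2,if=virtio,cache=unsafe ...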



-- 
Chris Murphy


* Re: Unbootable root btrfs
  2019-05-18  4:06     ` Chris Murphy
@ 2019-05-18  4:39       ` Robert White
  2019-05-18 19:28         ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Robert White @ 2019-05-18  4:39 UTC (permalink / raw)
  To: Chris Murphy, Lee Fleming, Btrfs BTRFS

On 5/18/19 4:06 AM, Chris Murphy wrote:
> On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@gmail.com> wrote:
>>
>> I didn't see that particular warning. I did see a warning that it could cause damage and should be tried after trying some other things which I did. The data on this drive isn't important. I just wanted to see if it could be recovered before reinstalling.
>>
>> There was no crash, just a reboot. I was setting up KVM and I rebooted into a different kernel to see if some performance problems were kernel related. And it just didn't boot.
> 
> OK the corrupted Btrfs volume is a guest file system?

Was the reboot a reboot of the guest instance or the host? The reboot of 
the host can be indistinguishable from a crash to the guest file system 
images if shutdown is taking a long time. That meager fifteen-second gap 
between SIGTERM and SIGKILL can be a real VM killer even in an orderly 
shutdown. If you don't have a qemu shutdown script in your host 
environment then every orderly shutdown is a risk to any running VM.
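
If the guests are managed through libvirt, one way to close that
window is the libvirt-guests service, which shuts down (or suspends)
each VM with its own, longer timeout before the host carries on. This
is only a sketch; the file location and the defaults vary by
distribution:

    # /etc/sysconfig/libvirt-guests  (Debian/Ubuntu: /etc/default/libvirt-guests)
    ON_SHUTDOWN=shutdown      # ask each guest for a clean shutdown, don't just kill it
    SHUTDOWN_TIMEOUT=120      # seconds to wait, well beyond the usual SIGTERM/SIGKILL gap
    ON_BOOT=ignore            # don't auto-start guests on boot unless you want that

    # systemctl enable --now libvirt-guests.service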

The question that comes to my mind is: what -blockdev and/or -drive 
parameters are you using? Some of the combinations of features and 
flags can, in the name of speed, "helpfully violate" the necessary 
I/O orderings that filesystems depend on.
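
Purely as an illustration (the file name and the rest of the command
line are placeholders, not anyone's actual setup), the difference
looks something like this:

    # placeholder command lines, not the OP's:
    # cache=none honors guest flush requests, so write ordering survives a host crash
    qemu-system-x86_64 ... -drive file=guest.img,format=raw,if=virtio,cache=none

    # cache=unsafe ignores guest flush requests entirely; fast, but a host crash or
    # an early SIGKILL can leave the guest filesystem pointing at metadata that
    # never reached the disk
    qemu-system-x86_64 ... -drive file=guest.img,format=raw,if=virtio,cache=unsafe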

So if the crash kills qemu before qemu has flushed and completed a 
guest-system-critical write to the host store you've suffered a 
corruption that has nothing to do with the filesystem code base.

So, for example, you shut down your host system. It sends SIGTERM to 
qemu. The guest system sends SIGTERM to its processes. The guest is 
still waiting out its nominal 15 seconds when the host evicts it from 
memory with a SIGKILL, because the host's 15-second timer started 
sooner.

(Fifteen seconds is the canonical time from my UNIX days; I don't 
know what the real times are for every distribution.)

Upping the caching behaviours for writes can be just as deadly in some 
conditions.

None of this may apply to the OP, but it's the thing I'd check before 
digging too far.


* Re: Unbootable root btrfs
  2019-05-18  4:39       ` Robert White
@ 2019-05-18 19:28         ` Chris Murphy
  2019-05-18 19:43           ` Lee Fleming
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Murphy @ 2019-05-18 19:28 UTC (permalink / raw)
  To: Robert White; +Cc: Chris Murphy, Lee Fleming, Btrfs BTRFS

On Fri, May 17, 2019 at 10:39 PM Robert White <rwhite@pobox.com> wrote:
>
> On 5/18/19 4:06 AM, Chris Murphy wrote:
> > On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@gmail.com> wrote:
> >>
> >> I didn't see that particular warning. I did see a warning that it could cause damage and should be tried after trying some other things which I did. The data on this drive isn't important. I just wanted to see if it could be recovered before reinstalling.
> >>
> >> There was no crash, just a reboot. I was setting up KVM and I rebooted into a different kernel to see if some performance problems were kernel related. And it just didn't boot.
> >
> > OK the corrupted Btrfs volume is a guest file system?
>
> Was the reboot a reboot of the guest instance or the host? The reboot of
> the host can be indistinguishable from a crash to the guest file system
> images if shutdown is taking a long time. That meager fifteen-second gap
> between SIGTERM and SIGKILL can be a real VM killer even in an orderly
> shutdown. If you don't have a qemu shutdown script in your host
> environment then every orderly shutdown is a risk to any running VM.

Yep it's a good point.


>
> The question that comes to my mind is: what -blockdev and/or
> -drive parameters are you using? Some of the combinations of features
> and flags can, in the name of speed, "helpfully violate" the necessary
> I/O orderings that filesystems depend on.

In particular unsafe caching. But it does make for faster writes, in
particular with NTFS and Btrfs in the VM guest.


> So if the crash kills qemu before qemu has flushed and completed a
> guest-system-critical write to the host store you've suffered a
> corruption that has nothing to do with the filesystem code base.

For Btrfs, I think the worst case scenario should be you lose up to
30s of writes. The super block should still point to a valid,
completely committed set of trees that point to valid data extents.
But yeah I have no idea what the write ordering could be if say the
guest has written data>metadata>super, and then the host, not honoring
fsync (some cache policies do ignore it), maybe it ends up writing out
a new super before it writes out metadata - of course the host has no
idea what these writes are for from the guest. And before all metadata
is written by the host, the host reboots. So now you have a superblock
that's pointing to a partial metadata write and that will show up as
corruption.

What *should* still be true is Btrfs can be made to fallback to a
previous root tree by using mount option -o usebackuproot
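
Something along these lines (the device path is just an example);
dump-super lists the backup roots near the end of its output, so you
can see what the kernel would be falling back to:

    # (device path is an example)
    # btrfs inspect-internal dump-super -f /dev/mapper/vg-root
    # mount -o ro,usebackuproot /dev/mapper/vg-root /mnt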



-- 
Chris Murphy


* Re: Unbootable root btrfs
  2019-05-18 19:28         ` Chris Murphy
@ 2019-05-18 19:43           ` Lee Fleming
  0 siblings, 0 replies; 6+ messages in thread
From: Lee Fleming @ 2019-05-18 19:43 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Robert White, Btrfs BTRFS

No. It was the host. I've nuked the filesystem now. I'm sorry - I know
that doesn't help you diagnose this problem. There wasn't anything
important on this drive. I just wanted to see if it could be recovered
before reinstalling everything, but I needed to get it back up and
running now.

On Sat, 18 May 2019 at 20:28, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Fri, May 17, 2019 at 10:39 PM Robert White <rwhite@pobox.com> wrote:
> >
> > On 5/18/19 4:06 AM, Chris Murphy wrote:
> > > On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@gmail.com> wrote:
> > >>
> > >> I didn't see that particular warning. I did see a warning that it could cause damage and should be tried after trying some other things which I did. The data on this drive isn't important. I just wanted to see if it could be recovered before reinstalling.
> > >>
> > >> There was no crash, just a reboot. I was setting up KVM and I rebooted into a different kernel to see if some performance problems were kernel related. And it just didn't boot.
> > >
> > > OK the corrupted Btrfs volume is a guest file system?
> >
> > Was the reboot a reboot of the guest instance or the host? The reboot of
> > the host can be indistinguishable from a crash to the guest file system
> > images if shutdown is taking a long time. That meager fifteen-second gap
> > between SIGTERM and SIGKILL can be a real VM killer even in an orderly
> > shutdown. If you don't have a qemu shutdown script in your host
> > environment then every orderly shutdown is a risk to any running VM.
>
> Yep it's a good point.
>
>
> >
> > The question that comes to my mind is: what -blockdev and/or
> > -drive parameters are you using? Some of the combinations of features
> > and flags can, in the name of speed, "helpfully violate" the necessary
> > I/O orderings that filesystems depend on.
>
> In particular unsafe caching. But it does make for faster writes, in
> particular with NTFS and Btrfs in the VM guest.
>
>
> > So if the crash kills qemu before qemu has flushed and completed a
> > guest-system-critical write to the host store you've suffered a
> > corruption that has nothing to do with the filesystem code base.
>
> For Btrfs, I think the worst case scenario should be you lose up to
> 30s of writes. The super block should still point to a valid,
> completely committed set of trees that point to valid data extents.
> But yeah I have no idea what the write ordering could be if say the
> guest has written data>metadata>super, and then the host, not honoring
> fsync (some cache policies do ignore it), maybe it ends up writing out
> a new super before it writes out metadata - of course the host has no
> idea what these writes are for from the guest. And before all metadata
> is written by the host, the host reboots. So now you have a superblock
> that's pointing to a partial metadata write and that will show up as
> corruption.
>
> What *should* still be true is Btrfs can be made to fallback to a
> previous root tree by using mount option -o usebackuproot
>
>
>
> --
> Chris Murphy

