* Yet another guy with a "parent transid verify failed" problem
From: MegaBrutal @ 2020-08-24 23:00 UTC
  To: linux-btrfs

Hi all,

My home server, which has a BTRFS root file system, recently suffered
a power supply failure that caused a sudden power loss.

Now the OS (Ubuntu 18.04) boots properly and starts a bunch of LXC
containers with the applications the server is supposed to host. After
some time of running normally, the root filesystem gets remounted
read-only and the following message appears in dmesg. Fortunately, the
file systems of the containers are not affected (they are mounted from
separate LVs).

[57038.544637] BTRFS error (device dm-160): parent transid verify
failed on 169222144 wanted 9897860 found 9895362

This happened after multiple reboots.

The file system is located on an LVM volume with raid1 mirroring. I
have already run an LVM raid1 scrub; no mismatches were found.

I haven't done much BTRFS-level troubleshooting yet, but the wiki
suggested that I should try mounting with usebackuproot.
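
For reference, usebackuproot is just a mount option; a mount from a
rescue shell would look roughly like this (device node taken from the
dmesg line above):

    mount -o usebackuproot /dev/dm-160 /mnt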

The usebackuproot mount was successful, but I'm not sure how it's
supposed to work... does it correct the file system after one mount
and then I'm supposed to mount the file system normally? Or should I
always use the file system with usebackuproot from now on? (It doesn't
feel right.) Anyway, after one mount with usebackuproot, I started the
system regularly again. But I'm not sure whether that solved the
problem, or whether usebackuproot did anything at all, especially since
I have now rebooted with the regular mount options. Since the problem
presents itself after hours of normal operation, I'm afraid it might
come back at any time.

What should I do if the problem reemerges?


Thanks in advance for any insight,
MegaBrutal


* Re: Yet another guy with a "parent transid verify failed" problem
From: Chris Murphy @ 2020-08-24 23:41 UTC
  To: MegaBrutal; +Cc: linux-btrfs

On Mon, Aug 24, 2020 at 5:01 PM MegaBrutal <megabrutal@gmail.com> wrote:
>
> Hi all,
>
> My home server, which has a BTRFS root file system, recently suffered
> a power supply failure that caused a sudden power loss.
>
> Now the OS (Ubuntu 18.04) boots properly and starts a bunch of LXC
> containers with the applications the server is supposed to host. After
> some time of running normally, the root filesystem gets remounted
> read-only and the following message appears in dmesg. Fortunately, the
> file systems of the containers are not affected (they are mounted from
> separate LVs).
>
> [57038.544637] BTRFS error (device dm-160): parent transid verify
> failed on 169222144 wanted 9897860 found 9895362

On the surface that looks like roughly 2500 transaction IDs have been
dropped. But it's more likely that this location had long since been
deallocated, and the recent commit should have written new metadata
there before the super block; for some reason (firmware bug?) the
current super block was written before the metadata writes had been
committed to stable media.

That is, lost writes. It could be a write-order failure, and you just
got unlucky: the crash turned it into a lost write. Had there been no
crash, the write would eventually have happened, and its being out of
order wouldn't have mattered.

It's possible that the drive does this all the time. It's also
possible it only happens on 1 in 100 fsync/FUA commands.


> The usebackuproot mount was successful, but I'm not sure how it's
> supposed to work... does it correct the file system after one mount
> and then I'm supposed to mount the file system normally?

That usebackuproot succeeded further suggests a write-order failure:
the super block made it to disk, but the tree root (or whatever exactly
it's failing on that it expects but isn't there) didn't.

The way it works is that it tries the backup roots stored in the super
block when mounting. It's sort of like a rollback, which means you have
probably lost up to a minute's worth of data: whatever happened between
the last properly completed commit, in which everything made it to
stable media, and the time of the crash.
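
If you're curious what it had to work with, you can dump the super
block and look at the backup root entries; roughly, using the dm-160
device node from your dmesg output:

    btrfs inspect-internal dump-super -f /dev/dm-160

The backup_roots section near the end lists the last few tree roots it
can fall back to.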

At this point it's probably fixed. But it's possible this would have
gone slightly better if the setup were using Btrfs raid1, because in
that case there's a chance one drive didn't drop that write, and Btrfs
would find what it wants on that drive, automatically.

But you should do a `btrfs scrub` to see if there are other issues.
And when you get a chance it's ideal to run a `btrfs check` as well,
because scrub only verifies checksums.
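
Something like this, for example; scrub can run on the mounted
filesystem (-B keeps it in the foreground so you get the summary at
the end), while check needs the filesystem unmounted, e.g. from a
live/rescue environment:

    btrfs scrub start -B /
    btrfs check --readonly /dev/dm-160

--readonly is check's default mode anyway, so it won't write anything.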

Disabling the write caches on both drives might reduce the chance of
this happening, but without testing it's hard to say; it may only end
up reducing write performance, though probably not by much.
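
A sketch, assuming SATA drives; the sdX names are placeholders for the
two physical drives underneath LVM, not the LV or dm devices:

    hdparm -W 0 /dev/sda
    hdparm -W 0 /dev/sdb

The setting usually doesn't survive a power cycle, so it would need to
go into a udev rule or boot script to stick.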


>Or should I
> always use the file system with usebackuproot from now on?

No need. But even if you use it, btrfs will figure out the current
root is OK and use it.


>(It doesn't
> feel right.) Anyway, after one mount with usebackuproot, I started the
> system regularly again. But I'm not sure whether that solved the
> problem, or whether usebackuproot did anything at all, especially since
> I have now rebooted with the regular mount options. Since the problem
> presents itself after hours of normal operation, I'm afraid it might
> come back at any time.
>
> What should I do if the problem reemerges?

If either drive is dropping writes, there's a chance it'll really
confuse the file system. And an md (or LVM) raid1 scrub check just
detects differences; it doesn't know which copy is correct. If you do
a raid1 scrub repair, it just picks one copy as correct and stomps on
the other one.
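
The same applies to LVM raid1, which is what you're actually using; a
rough example, with the VG/LV name guessed from the dm device name:

    lvchange --syncaction check vmhost-vg/vmhost-rootfs
    lvs -o +raid_sync_action,raid_mismatch_count vmhost-vg/vmhost-rootfs

A nonzero mismatch count only tells you the two legs differ, not which
one is right.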

What do you get for 'btrfs fi us /' ?

Hopefully metadata is at least dup profile.

In the short term I'd probably disable write caching on both drives
because even if it's slower, it's incrementally safer. And also stop
having power failures :D

And in the longer term you want to redo this setup to use Btrfs raid1.
That way it will explicitly rat out which of the two drives is
dropping writes on power failures.
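
Roughly like this, once a second device is available (the second LV
path here is hypothetical):

    btrfs device add /dev/mapper/some-second-lv /
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /

After that, the per-device error counters from 'btrfs device stats /'
will show which drive is misbehaving.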

-- 
Chris Murphy


* Re: Yet another guy with a "parent transid verify failed" problem
From: MegaBrutal @ 2020-08-25  6:30 UTC
  To: Chris Murphy; +Cc: linux-btrfs

Thanks, Chris!

Now I got something else...


[22116.465254] BTRFS critical (device dm-160): corrupt leaf:
block=28653215744 slot=22 extent bytenr=11525382144 len=958464 invalid
generation, have 94701344 expect (0, 9898143]
[22116.465866] BTRFS: error (device dm-160) in
btrfs_run_delayed_refs:3083: errno=-5 IO failure
[22116.466480] BTRFS info (device dm-160): forced readonly


It seems a little friendlier, as it doesn't lock up the system like the
earlier message did. When the previous "parent transid verify failed"
happened, it screwed up the system so much that I couldn't even do a
graceful reboot (I couldn't even use sudo, for example). Luckily I
could stop the containers gracefully, but then I had to hard reset.


root@vmhost:~# btrfs fi us /
Overall:
    Device size:          20.46GiB
    Device allocated:          20.46GiB
    Device unallocated:           1.00MiB
    Device missing:             0.00B
    Used:              14.46GiB
    Free (estimated):           3.30GiB    (min: 3.30GiB)
    Data ratio:                  1.00
    Metadata ratio:              2.00
    Global reserve:         512.00MiB    (used: 0.00B)

Data,single: Size:16.95GiB, Used:13.64GiB
   /dev/mapper/vmhost--vg-vmhost--rootfs      16.95GiB

Metadata,DUP: Size:1.75GiB, Used:417.00MiB
   /dev/mapper/vmhost--vg-vmhost--rootfs       3.50GiB

System,DUP: Size:8.00MiB, Used:4.00KiB
   /dev/mapper/vmhost--vg-vmhost--rootfs      16.00MiB

Unallocated:
   /dev/mapper/vmhost--vg-vmhost--rootfs       1.00MiB


And... I did technically start a scrub, but since the FS is read-only,
it can't record its status...


root@vmhost:~# btrfs scrub start /
WARNING: failed to open the progress status socket at
/var/lib/btrfs/scrub.progress.1b23076d-74fc-4091-9c02-fe3f02f02b96:
Read-only file system. Progress cannot be queried
WARNING: failed to write the progress status file: Read-only file
system. Status recording disabled
scrub started on /, fsid 1b23076d-74fc-4091-9c02-fe3f02f02b96 (pid=31845)
root@vmhost:~# btrfs scrub status /
scrub status for 1b23076d-74fc-4091-9c02-fe3f02f02b96
    no stats available
    total bytes scrubbed: 0.00B with 0 errors


So... I don't know how I will know what it found. :/ Probably I should
have rebooted first, or should have tried a remount,rw (no, that
doesn't work, because "Remounting read-write after error is not
allowed"). Now I don't know what to do, because stopping all the
containers for a reboot and then restarting them is an ordeal, and I
won't like it if I have to do it often. At the moment I don't even have
time for that, and I still have a SMART long test running on one of the
drives.

I had 695 days of uptime before the power failure happened, so this
doesn't seem common, though it might be relevant that up to the day of
the failure I was using a 2018 kernel build, because that was the
latest one installed when I last started the system. :D Now I have
rebooted into a ~2 year newer kernel, which I had installed but never
actually booted before. But it's the same major kernel version as the
earlier one that comes with Ubuntu 18.04 (4.15.0-112-generic).

Can I add two linear LVs to a BTRFS raid1, or should I go with raw
partitions? Reclaiming space from LVM to create partitions would be
cumbersome, and I wouldn't like it because I couldn't resize and move
the FS dynamically like I can with LVM, so I'd probably pass on that.
The former option, however, would be possible.

I'm getting ahead of myself with this question, but I'm planning to add
two SSDs to the system in the coming months and move the root file
system to them. Is there anything I should watch out for when moving
and then using the file system, or can I just pvmove and then add the
"ssd" and probably "discard" mount options?


~ MegaBrutal


