From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Eric Levy <contact@ericlevy.name>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: parent transid verify failed
Date: Fri, 31 Dec 2021 14:14:10 -0500
Message-ID: <Yc9Wgsint947Tj59@hungrycats.org>
In-Reply-To: <c0c6ec8de80b8e10185fe1980377dcc7af8d3200.camel@ericlevy.name>

On Thu, Dec 30, 2021 at 04:10:23PM -0500, Eric Levy wrote:
> Hello.
> 
> I had a simple Btrfs partition, with only one subvolume, about 250 Gb
> in size. As it began to fill, I added a second volume, live. By the
> time the size of the file system reached the limit for the first
> volume, the file system reverted to read only.
> 
> From journalctl, the following message has recurred, with the same
> numeric values:
> 
> BTRFS error (device sdc1): parent transid verify failed on 867434496
> wanted 9212 found 8675

To be clear, do the parent transid verify failed messages appear _before_
or _after_ the filesystem switches to read-only?

"After" is fine.  When the kernel switches btrfs to read-only, it stops
updating the disk, so pointers in memory no longer match what's on disk
and you will get a whole stream of errors that only exist in btrfs's
kernel RAM.  A umount/mount will clear those.  The switch to read-only
itself is most likely caused by running out of metadata space because you
didn't balance data block groups on the first drive after (or in some
cases before) adding the second drive.

"Before" is the unrecoverable case.  Some drives silently dropped writes,
which is a failure btrfs can detect but cannot recover from unless the
missing data is present on some other drive (e.g. RAID1
configurations).  Depending on how "added a second volume, live" was done,
writes could be interrupted or lost on the first drive without any error
being reported to the kernel (e.g. bumping the cables or browning out a
power supply).

Since the "after" case can happen on healthy hardware in this scenario,
but the "before" case requires a hardware failure, it's more likely
you're in the "after" case, and the filesystem can be recovered by
carefully rearranging the data on the disks.  We'll need the output of
'btrfs fi usage' to see where to start with this.
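
One way to answer the before/after question is to scan the kernel log
from the affected boot (e.g. the output of 'journalctl -k -b') and see
which of the two events shows up first.  The following is only a rough
sketch: the "forced readonly" pattern is an assumption about what the
kernel logs when it switches the filesystem to read-only, the exact
wording may differ between kernel versions, and classify() is a
hypothetical helper name.

```python
import re

# Taken from the error quoted above.
TRANSID_RE = re.compile(r"BTRFS.*parent transid verify failed")
# Assumption: the kernel logs a line containing "forced readonly"
# when it switches the filesystem to read-only.
READONLY_RE = re.compile(r"BTRFS.*forced readonly")


def classify(log_lines):
    """Report where the first transid failure falls relative to read-only.

    Returns "before" if a transid failure is logged before any
    forced-readonly line, "after" if the read-only switch is logged first,
    and None if no transid failure is found at all.
    """
    readonly_seen = False
    for line in log_lines:
        if READONLY_RE.search(line):
            readonly_seen = True
        elif TRANSID_RE.search(line):
            return "after" if readonly_seen else "before"
    return None
```

The input should cover a single mount, starting from before the
filesystem first went read-only; mixing several boots or remounts in one
log would make the ordering meaningless.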

> Presently, the file system mounts only as read only. It will not mount
> in read-write, even with the usebackuproot option. 
> 
> It seems that balance and scrub are not available, either due to read-
> only mode, or some other reason. Both abort as soon as they begin to
> run.

Mount with '-o skip_balance'.  If you're in the "after" case then this
will avoid running out of metadata space again during mount.

> What is the best next step for recovery?

Confirm whether the first "parent transid verify failed" message appears
before or after the filesystem is forced read-only.  If it's before,
the best next step is mkfs and restore your backups.

If it's after, try -o skip_balance and provide us with 'btrfs fi usage'
details.
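
As a sketch of the sequence for the "after" case: unmount, mount again
with skip_balance, then collect the usage report.  The device name is
taken from the log message quoted above; the mount point /mnt/data and
the helper name are placeholders.  The function only builds the command
lines so they can be reviewed; it does not run anything.

```python
import shlex


def after_case_commands(device="/dev/sdc1", mountpoint="/mnt/data"):
    """Build (but do not run) the commands for the "after" case."""
    steps = [
        # A umount/mount cycle clears the errors that exist only in RAM.
        ["umount", mountpoint],
        # skip_balance keeps a paused balance from resuming during mount.
        ["mount", "-o", "skip_balance", device, mountpoint],
        # Report how data and metadata are allocated on each device.
        ["btrfs", "filesystem", "usage", mountpoint],
    ]
    return [shlex.join(argv) for argv in steps]
```

Printing each returned string gives command lines that can be run by
hand as root, one at a time.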

You will need to rearrange free space (balance with filters, delete some
data, or add additional drives temporarily) so that you can do a data
balance, then balance data block groups until both drives have equal free
space on them.  Also, you should convert all existing metadata to raid1
profile (there's no sane use case for dup metadata on multiple drives)
but you'll have to do that after making space with data balances.
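
The ordering above (data balances first, metadata conversion last) could
be laid out roughly as follows.  This is a sketch under assumptions: the
"usage" filter and the percentage steps are illustrative choices for
"balance with filters", not values from this thread, and the "soft"
modifier just skips chunks that already have the target profile.  The
right filters depend on what 'btrfs fi usage' shows.  balance_plan() is a
hypothetical helper that returns argument lists without executing them.

```python
def balance_plan(mountpoint, usage_steps=(0, 5, 10, 25, 50)):
    """Return argv lists: filtered data balances, then metadata conversion.

    Nothing is executed here; the caller decides whether to run each step.
    """
    plan = []
    # Data balances first, starting with the emptiest block groups, so
    # that space is freed before anything touches metadata.
    for pct in usage_steps:
        plan.append(["btrfs", "balance", "start", f"-dusage={pct}", mountpoint])
    # Metadata conversion last, once the data balances have made room.
    plan.append(["btrfs", "balance", "start", "-mconvert=raid1,soft", mountpoint])
    return plan
```

Each entry could be run individually (e.g. with subprocess.run) while
checking 'btrfs fi usage' between steps to see whether free space on the
two drives is evening out.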
