* RAID1 parent transid failed
@ 2023-09-25 23:17 Shaun Hills
  2023-09-25 23:25 ` Qu Wenruo
  0 siblings, 1 reply; 2+ messages in thread
From: Shaun Hills @ 2023-09-25 23:17 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]

Hello,

I have a RAID1 BTRFS volume on my Ubuntu home NAS with 3x4TB SATA
disks. It has been running successfully for some years. But I run an
hourly script to check its health that just started reporting errors
last night. Script does this:

/usr/bin/btrfs device stats --check /mnt/btrfs | grep -vE ' 0$'

# grep exits 0 (success) when it matches, i.e. a non-zero counter was found
if [ $? -eq 0 ]; then
    echo "Subject: Alert - BTRFS degradation reported on $(hostname)" \
        | sendmail <address redacted>
fi
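
For reference, a variant that avoids the separate $? test and mails the
non-zero counters as the message body could look like this (untested
sketch; same mount point and redacted address assumed):

stats="$(/usr/bin/btrfs device stats --check /mnt/btrfs | grep -vE ' 0$')"
if [ -n "$stats" ]; then
    printf 'Subject: Alert - BTRFS degradation reported on %s\n\n%s\n' \
        "$(hostname)" "$stats" | sendmail <address redacted>
fi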

What it started getting back:

[/dev/sdd].write_io_errs    93877
[/dev/sdd].read_io_errs     556
[/dev/sdd].corruption_errs  569

Looks like /dev/sdd might be failing. Though I haven't 100% ruled out
a SATA controller problem because an ext4 disk on the box was also
showing some odd behaviour.
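
To help tell a failing disk apart from a SATA controller/cable problem,
the drive's SMART data is probably worth a look too (assuming
smartmontools is installed):

sudo smartctl -a /dev/sdd        # full report; reallocated/pending sectors are the usual suspects
sudo smartctl -t short /dev/sdd  # kick off a short self-test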

I rebooted and ended up in Ubuntu "emergency mode". journalctl -xb
showed some problems with mounting the BTRFS and ext4 disks; don't
remember exact messages.

A further reboot brought everything back seemingly OK, and ext4 fsck is
clean, but when digging deeper I found worrying BTRFS errors:

sudo btrfs check --force --readonly /dev/sdb1
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
parent transid verify failed on 15182296121344 wanted 2311287 found 2310777
parent transid verify failed on 15182153809920 wanted 2311286 found 2311267
parent transid verify failed on 15182088847360 wanted 2311286 found 2311267
parent transid verify failed on 15182164918272 wanted 2311286 found 2311267
parent transid verify failed on 15182318321664 wanted 2311286 found 2311267
parent transid verify failed on 15182164983808 wanted 2311286 found 2311267
parent transid verify failed on 15182319714304 wanted 2311286 found 2311267
parent transid verify failed on 15182319845376 wanted 2311286 found 2311267
parent transid verify failed on 15182396751872 wanted 2311328 found 2310808
^C
(truncated for brevity)

Luckily I can still mount the volume, and am currently running btrfs
scrub (hasn't completed yet).

I *did* have backups, but in a case of bad timing I got rid of them a
few weeks ago :( - offsite company I lost confidence in. New backups
by shipping snapshots to an offsite server are being worked on but not
done yet.
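
For context, the planned snapshot shipping is basically btrfs
send/receive over ssh, roughly like this (host and paths are
placeholders):

snap="/mnt/btrfs/.snap-$(date +%F)"
btrfs subvolume snapshot -r /mnt/btrfs "$snap"
btrfs send "$snap" | ssh backup-host "btrfs receive /backup"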

I'm going to buy an external USB disk and try copying all the data
off. After getting the data off I will probably also try replacing
/dev/sdd.

At this stage I have two questions:

1) Is there a command I should use to maximise chances of
success/minimise problems/check for errors when copying data, or is
something like rsync -av OK?

2) Is there anything else I should be trying to correct the transid
verify error? Quite a lot of info on the Internet is a few years old,
and this seems like the sort of error where it pays to be careful.

thanks

===
Info asked for by the mailing list:

uname -a
Linux storage 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC
2023 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version
btrfs-progs v5.16.2

btrfs fi show
Label: none  uuid: ab1359a8-98ae-4f6d-a581-fbfa40ef633f
        Total devices 3 FS bytes used 4.79TiB
        devid    1 size 3.64TiB used 3.34TiB path /dev/sdd
        devid    2 size 3.64TiB used 3.34TiB path /dev/sdb1
        devid    3 size 3.64TiB used 3.34TiB path /dev/sda

btrfs fi df /mnt/btrfs
Data, RAID1: total=5.00TiB, used=4.78TiB
System, RAID1: total=32.00MiB, used=768.00KiB
Metadata, RAID1: total=6.00GiB, used=5.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg > dmesg.log (attached)

[-- Attachment #2: dmesg.log.gz --]
[-- Type: application/x-gzip, Size: 39992 bytes --]


* Re: RAID1 parent transid failed
  2023-09-25 23:17 RAID1 parent transid failed Shaun Hills
@ 2023-09-25 23:25 ` Qu Wenruo
  0 siblings, 0 replies; 2+ messages in thread
From: Qu Wenruo @ 2023-09-25 23:25 UTC (permalink / raw)
  To: Shaun Hills, linux-btrfs



On 2023/9/26 08:47, Shaun Hills wrote:
> Hello,
>
> I have a RAID1 BTRFS volume on my Ubuntu home NAS with 3x4TB SATA
> disks. It has been running successfully for some years. But I run an
> hourly script to check its health that just started reporting errors
> last night. Script does this:
>
> /usr/bin/btrfs device stats --check /mnt/btrfs | grep -vE ' 0$'
>
> # grep exits 0 (success) when it matches, i.e. a non-zero counter was found
> if [ $? -eq 0 ]; then
>     echo "Subject: Alert - BTRFS degradation reported on $(hostname)" \
>         | sendmail <address redacted>
> fi
>
> What it started getting back:
>
> [/dev/sdd].write_io_errs    93877
> [/dev/sdd].read_io_errs     556
> [/dev/sdd].corruption_errs  569
>
> Looks like /dev/sdd might be failing. Though I haven't 100% ruled out
> a SATA controller problem because an ext4 disk on the box was also
> showing some odd behaviour.
>
> I rebooted and ended up in Ubuntu "emergency mode". journalctl -xb
> showed some problems with mounting the BTRFS and ext4 disks; don't
> remember exact messages.
>
> A further reboot brought everything back seemingly OK, and ext4 fsck is
> clean, but when digging deeper I found worrying BTRFS errors:
>
> sudo btrfs check --force --readonly /dev/sdb1

--force is only recommended if your fs is mounted read-only (RO).

Otherwise, any transid mismatch can simply be the result of a race
between btrfs check and the kernel modifying the fs underneath it.

> Opening filesystem to check...
> WARNING: filesystem mounted, continuing because of --force
> parent transid verify failed on 15182296121344 wanted 2311287 found 2310777
> parent transid verify failed on 15182153809920 wanted 2311286 found 2311267
> parent transid verify failed on 15182088847360 wanted 2311286 found 2311267
> parent transid verify failed on 15182164918272 wanted 2311286 found 2311267
> parent transid verify failed on 15182318321664 wanted 2311286 found 2311267
> parent transid verify failed on 15182164983808 wanted 2311286 found 2311267
> parent transid verify failed on 15182319714304 wanted 2311286 found 2311267
> parent transid verify failed on 15182319845376 wanted 2311286 found 2311267
> parent transid verify failed on 15182396751872 wanted 2311328 found 2310808
> ^C
> (truncated for brevity)
>
> Luckily I can still mount the volume, and am currently running btrfs
> scrub (hasn't completed yet).

For a mounted fs, scrub is the recommended tool, but for a fuller, more
comprehensive result it's strongly recommended to run "btrfs check
--readonly" on an unmounted fs.

>
> I *did* have backups, but in a case of bad timing I got rid of them a
> few weeks ago :( - offsite company I lost confidence in. New backups
> by shipping snapshots to an offsite server are being worked on but not
> done yet.
>
> I'm going to buy an external USB disk and try copying all the data
> off. After getting the data off I will probably also try replacing
> /dev/sdd.
>
> At this stage I have two questions:
>
> 1) Is there a command I should use to maximise chances of
> success/minimise problems/check for errors when copying data, or is
> something like rsync -av OK?

No special requirement. If you hit an EIO and you're using btrfs RAID1,
it really means a problem (neither copy of that block could be read
correctly). Otherwise btrfs will try to fix the problem all by itself
from the good copy.
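
E.g., if the USB disk were mounted at /mnt/usb (path is just an example):

rsync -avxH /mnt/btrfs/ /mnt/usb/
# an optional second pass with --checksum re-reads and compares everything
rsync -avxH --checksum /mnt/btrfs/ /mnt/usb/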

>
> 2) Is there anything else I should be trying to correct the transid
> verify error? Quite a lot of info on the Internet is a few years old,
> and this seems like the sort of error where it pays to be careful.

I'd say it's more likely a false alert caused by running the forced
check on a RW-mounted fs.

But still, your sdd doesn't look good at all.
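
When you do replace it, "btrfs replace" is the usual route, e.g. (the
new device path below is only an example; devid 1 is /dev/sdd per your
fi show output):

btrfs replace start 1 /dev/sdX /mnt/btrfs   # /dev/sdX = the new disk
btrfs replace status /mnt/btrfs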

Thanks,
Qu

>
> thanks
>
> ===
> Info asked for by the mailing list:
>
> uname -a
> Linux storage 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC
> 2023 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version
> btrfs-progs v5.16.2
>
> btrfs fi show
> Label: none  uuid: ab1359a8-98ae-4f6d-a581-fbfa40ef633f
>          Total devices 3 FS bytes used 4.79TiB
>          devid    1 size 3.64TiB used 3.34TiB path /dev/sdd
>          devid    2 size 3.64TiB used 3.34TiB path /dev/sdb1
>          devid    3 size 3.64TiB used 3.34TiB path /dev/sda
>
> btrfs fi df /mnt/btrfs
> Data, RAID1: total=5.00TiB, used=4.78TiB
> System, RAID1: total=32.00MiB, used=768.00KiB
> Metadata, RAID1: total=6.00GiB, used=5.84GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> dmesg > dmesg.log (attached)
