Linux-BTRFS Archive on lore.kernel.org
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Richard Weinberger <richard@nod.at>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Decoding "unable to fixup (regular)" errors
Date: Fri, 8 Nov 2019 18:39:33 -0500
Message-ID: <20191108233933.GU22121@hungrycats.org>
In-Reply-To: <1063943113.78786.1573252282368.JavaMail.zimbra@nod.at>

On Fri, Nov 08, 2019 at 11:31:22PM +0100, Richard Weinberger wrote:
> ----- Original Message -----
> > From: "Zygo Blaxell" <ce3g8jdj@umail.furryterror.org>
> > To: "richard" <richard@nod.at>
> > CC: "linux-btrfs" <linux-btrfs@vger.kernel.org>
> > Sent: Friday, 8 November 2019 23:25:57
> > Subject: Re: Decoding "unable to fixup (regular)" errors
> 
> > On Fri, Nov 08, 2019 at 11:21:56PM +0100, Richard Weinberger wrote:
> >> ----- Original Message -----
> >> > btrfs found corrupted data on md1.  You appear to be using btrfs
> >> > -dsingle on a single mdadm raid1 device, so no recovery is possible
> >> > ("unable to fixup").
> >> > 
> >> >> The system has ECC memory with md1 being a RAID1 which passes all health checks.
> >> > 
> >> > mdadm doesn't have any way to repair data corruption--it can find
> >> > differences, but it cannot identify which version of the data is correct.
> >> > If one of your drives is corrupting data without reporting IO errors,
> >> > mdadm will simply copy the corruption to the other drive.  If one
> >> > drive is failing by intermittently injecting corrupted bits into reads
> >> > (e.g. because of a failure in the RAM on the drive control board),
> >> > this behavior may not show up in mdadm health checks.
> >> 
> >> Well, this is not cheap hardware...
> >> Possible, but not very likely IMHO
> > 
> > Even the disks?  We see RAM failures in disk drive embedded boards from
> > time to time.
> 
> Yes. Enterprise-Storage RAID-Edition disks (sorry for the marketing buzzwords).

Can you share the model numbers and firmware revisions?  There are a
lot of enterprise RE disks.  Not all of them work.

At least one vendor has the same firmware in their enterprise RE disks
as in their consumer drives, and it's unusually bad.  Apart from the
identical firmware revision string, the consumer and RE disks have
indistinguishable behavior in our failure mode testing, e.g.  they both
have write caching bugs on power failures, they both silently corrupt
a few blocks of data once or twice a drive-year...

> Even if one disk is silently corrupting data, having the bad block copied to
> the second disk is even less likely to happen.
> And I run the RAID-Health check often.

Your setup is not able to detect this kind of failure very well.
We've had problems with mdadm health-check failing to report errors
even in deliberate data corruption tests.  If a resync is triggered,
all data on one drive is blindly copied to the other.  You also have
nothing checking for integrity failures between mdadm health checks
(other than btrfs csum failures when the corruption propagates to the
filesystem layer, as shown above in your log).
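
To illustrate the gap, here is a toy Python model of a two-way mirror
with no checksums.  It is only a sketch: the function names and the
in-memory "disks" are made up for illustration and this is not how
mdadm is implemented.  The check can see that the copies differ, but
nothing tells it which copy is right, and a resync copies whichever
side it uses as the source.

```python
# Toy model of a two-way mirror with no checksums.
# Hypothetical names for illustration only, not mdadm code.
# Each "disk" is a list of blocks.

def check(disk_a, disk_b):
    """Return the indexes of blocks that differ between the mirrors."""
    return [i for i, (a, b) in enumerate(zip(disk_a, disk_b)) if a != b]

def resync(source, target):
    """Blindly copy every block from source to target."""
    target[:] = source

good = [b"alpha", b"beta", b"gamma"]
disk_a = list(good)
disk_b = list(good)

# Silent corruption on disk A: the drive returns bad data, no IO error.
disk_a[1] = b"bXta"

# The check notices a difference, but cannot say whether A or B is correct.
mismatches = check(disk_a, disk_b)

# A resync that happens to use A as the source overwrites the good copy on B.
resync(disk_a, disk_b)
still_good = [i for i, blk in enumerate(disk_b) if blk == good[i]]
```

After the resync the two disks agree again, so a later check reports
nothing, even though block 1 is now wrong on both.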

We do a regression test where we corrupt every block on one disk in
a btrfs raid1 (even the superblocks) and check to ensure they are all
correctly reported and repaired without interrupting applications running
on the filesystem.  btrfs has a separate csum so it knows which version
of the block is wrong, and it checks on every read so it will detect
and report errors that occur between scrubs.
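
A rough sketch of why the separate csum matters, again as a toy Python
model rather than actual btrfs code.  The names are hypothetical, and
zlib.crc32 is only a stand-in for the filesystem's real checksum.
Every read is verified against the stored csum, a bad copy is rewritten
from a good one, and when no good copy exists (the single-copy case)
the read fails instead of returning bad data.

```python
import zlib

def csum(block):
    # Stand-in checksum for illustration; not the filesystem's real one.
    return zlib.crc32(block)

def read_block(index, mirrors, csums):
    """Read one block, verifying each copy against the stored checksum.

    Returns (data, bad) where bad lists the mirrors that held a corrupt
    copy and were repaired. Raises IOError if no copy matches the csum.
    """
    expected = csums[index]
    good = None
    bad = []
    for n, disk in enumerate(mirrors):
        if csum(disk[index]) == expected:
            if good is None:
                good = disk[index]
        else:
            bad.append(n)          # this copy is known to be the wrong one
    if good is None:
        raise IOError("unable to fixup block %d" % index)
    for n in bad:
        mirrors[n][index] = good   # repair the bad copy from the good one
    return good, bad

blocks = [b"alpha", b"beta", b"gamma"]
csums = [csum(b) for b in blocks]   # stored separately from the data

disk_a = list(blocks)
disk_b = list(blocks)
disk_a[1] = b"bXta"                 # silent corruption on disk A

data, bad = read_block(1, [disk_a, disk_b], csums)
```

With two copies the read returns the good data and identifies disk A
as the one that went bad.  With only one copy in the list, the same
corruption makes read_block raise, which is the "unable to fixup" case.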

The most striking thing about the description of your setup is that you
have ECC RAM and you have a scrub regime to detect errors...but you have
both a huge gap in error detection coverage and a mechanism to propagate
errors across what is supposed to be a fault isolation boundary because
you're using mdadm raid1 instead of btrfs raid1.  If one of your disks
goes bad, not only will it break your filesystem, but you won't know
which disk you need to replace.

> 
> Thanks,
> //richard
