From: telsch <telsch@gmx.de>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	linux-btrfs@vger.kernel.org
Subject: Aw: Re: Random csum errors
Date: Tue, 3 Aug 2021 20:55:07 +0200	[thread overview]
Message-ID: <trinity-7b251a66-4376-4938-91f7-9fae2a72c5ef-1628016907507@3c-app-gmx-bap48> (raw)
In-Reply-To: <20210802233850.GO10170@hungrycats.org>

> On Mon, Aug 02, 2021 at 04:20:43PM +0200, telsch wrote:
> > Dear devs,
> >
> > since 26.07., scrub keeps reporting csum errors on random files.
> > I replaced those files from backups, then deleted the snapshots that
> > still contained the corrupt files. I identified the affected snapshots
> > with md5sum, which returns an input/output error on the corrupt files.
> > A subsequent scrub still finds new csum errors that did not exist before.
> >
> > Beginning with Kernel 5.10.52, current 5.10.55
> > btrfs-progs 5.13
> >
> > Disk layout with problems:
> >
> > mdadm raid10 4xhdd => bcache => luks
> > mdadm raid6  4xhdd => bcache => luks
>
> Missing information:  what are the model/firmware revision of the
> devices, is the bcache in writeback or writethrough mode, how many
> SSDs are there, is there a separate bcache SSD for each HDD or are
> multiple HDDs sharing any bcache SSDs?

1 SanDisk SDSSDA120G/Firmware Version: Z22000RL
I'm using only one SSD in writearound mode for both arrays.
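
For reference, the active cache mode of each bcache device can be read from sysfs. A minimal sketch; the bcache0/bcache1 device naming is an assumption, adjust for your setup:

```shell
#!/bin/sh
# List the cache mode of every registered bcache device.
# The active mode is printed in brackets, e.g. "writethrough [writearound] ...".
for dev in /sys/block/bcache*/bcache; do
    # An unmatched glob stays a literal string, so skip anything that does not exist.
    [ -e "$dev/cache_mode" ] || continue
    printf '%s: %s\n' "${dev%/bcache}" "$(cat "$dev/cache_mode")"
done
```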

>
> Based on the symptoms, the most likely case is there's one SSD or a
> mdadm-mirrored pair of SSDs for bcache, and at least one SSD is failing.
> It may be a SSD that is not rated for caching use cases, or a SSD with
> firmware bugs that prevent reliable error reporting.  It's also possible
> one or more HDDs is silently corrupting data, but that is less common
> in the wild.
>
> The writeback/writethrough question informs us how recoverable the
> damage is.  Damage in writethrough mode is recoverable in some cases
> by simply removing the cache and mounting the backing drives directly.
> In writeback mode the data is already gone, and if the SSD fails before
> the bcache can be fully flushed, the filesystem will be destroyed.
>
> > Already replaced 2 old hdds with high Raw_Read_Error_Rate values.
>
> 1.  Replace all SSDs in the system, or cleanly remove the SSD devices
> from the bcache.  Silent corruption is a common early failure mode on
> SSDs, and bcache doesn't use checksums to detect it.  If you continue
> to use bcache in writeback mode with a bad SSD, it will corrupt more
> and more data until the SSD finally dies, and the filesystem will be
> unrecoverable after that.  If you're using bcache in writethrough mode,
> the corruption will only be affecting reads, and you can simply remove
> and discard the SSD without damaging the filesystem (it might even fix
> previously uncorrectable data if the copy on the backing HDDs is intact).

Thanks for your explanations!
Since I am in writearound mode and the corrupted files were not rewritten,
I had not considered that a failing SSD could corrupt bcache reads.

As a last step I detached the caching device, and the previous input/output
errors disappeared :) So you were right, the SSD looks faulty. Many thanks
for your help!
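
For anyone hitting the same issue, a minimal sketch of the detach step; bcache0 is an assumption, adjust for your device (bcache flushes any dirty data before the detach completes, so this is safe in every cache mode):

```shell
#!/bin/sh
# Detach the cache set from a bcache backing device.
# bcache0 is a placeholder for the cached device on your system.
DEV=/sys/block/bcache0/bcache
if [ -w "$DEV/detach" ]; then
    # Any write to the detach node starts the detach.
    echo 1 > "$DEV/detach"
    echo "detach requested for bcache0"
else
    echo "bcache0 not registered or detach node not writable"
fi
```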

>
> 2.  If that doesn't solve the problem, run mdadm checkarray and look at
> /sys/block/md*/md/mismatch_cnt afterwards.  checkarray doesn't report
> non-zero mismatch_cnt, so you'll need to check for it separately.
> If the mismatch_cnt is non-zero, you'll have to figure out which
> drive is at fault somehow.  Neither mdadm nor SMART will tell you if
> one drive's cache RAM goes bad in an array:  mdadm doesn't know which
> drive is correct when they have different contents, and generally SMART
> cannot detect failures inside the disk's firmware runtime environment
> that might affect data integrity like cache DRAM failure.  You might
> be able to identify the bad drive by manually inspecting blocks with
> different data, but there's no automated way to do this.

It seems Arch Linux does not provide the checkarray script, so I ran the
check manually; mismatch_cnt is still zero afterwards.
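
For reference, the manual equivalent of checkarray is a write to the array's sync_action node, then reading mismatch_cnt once the check finishes. A sketch; md0 is a placeholder, repeat for each array:

```shell
#!/bin/sh
# Start a read-only consistency check on one md array and report
# the mismatch count once it finishes.  md0 is a placeholder.
MD=/sys/block/md0/md
if [ -w "$MD/sync_action" ]; then
    echo check > "$MD/sync_action"
    # sync_action returns to "idle" when the check is done.
    while [ "$(cat "$MD/sync_action")" != idle ]; do
        sleep 10
    done
    echo "mismatch_cnt: $(cat "$MD/mismatch_cnt")"
else
    echo "md0 not present"
fi
```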

>
> 3.  To avoid future problems, break the mdadm arrays into separate
> devices and put them all in a btrfs raid1 so in future btrfs can tell you
> immediately which device is corrupting your data.  (raid1 here to avoid
> issues with striped access through a SSD cache).  This might be tricky
> to achieve before the bad device is identified, because the bad device
> will keep injecting corrupted data that will abort btrfs resize/device
> delete operations.

On new systems I have already been using btrfs raid1 instead of mdadm.
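
A minimal sketch of such a setup, assuming two HDDs already freed from the mdadm array; the device names are placeholders, and the CONFIRM guard is only there to prevent formatting the wrong disks by accident:

```shell
#!/bin/sh
# Create a btrfs raid1 across two whole disks, so btrfs checksums can
# identify which device returns bad data.  DESTRUCTIVE: erases both disks.
DEV1=${DEV1:-/dev/sdb}   # placeholder, substitute your device
DEV2=${DEV2:-/dev/sdc}   # placeholder, substitute your device
if [ "$CONFIRM" = yes ]; then
    mkfs.btrfs -d raid1 -m raid1 "$DEV1" "$DEV2"
else
    echo "dry run: would run: mkfs.btrfs -d raid1 -m raid1 $DEV1 $DEV2"
fi
```

Run it again with CONFIRM=yes once the right devices are set.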

>
> > Aug 02 15:43:18 server kernel: BTRFS info (device dm-0): scrub: started on devid 1
> > Aug 02 15:46:06 server kernel: BTRFS warning (device dm-0): checksum error at logical 462380818432 on dev /dev/mapper/root, physical 31640150016, root 29539, inode 27412268, offset 131072, length 4096, links 1 (path: docker-volumes/mayan-edms/media/document_cache/804391c5-e3fe-4941-96dc-ecc0a1d5d8c9-23-1815-92bcac02c4a72586e21044c0b244b052f5747c7d2c25e6086ca89ca64098e3f3)
> > Aug 02 15:46:06 server kernel: BTRFS error (device dm-0): bdev /dev/mapper/root errs: wr 0, rd 0, flush 0, corrupt 414, gen 0
> > Aug 02 15:46:06 server kernel: BTRFS error (device dm-0): unable to fixup (regular) error at logical 462380818432 on dev /dev/mapper/root
> > Aug 02 15:47:25 server kernel: BTRFS info (device dm-0): scrub: finished on devid 1 with status: 0
>

Thread overview: 5 messages
2021-08-02 14:20 Random csum errors telsch
2021-08-02 23:38 ` Zygo Blaxell
2021-08-03 18:55   ` telsch [this message]
2021-08-03 21:16     ` Zygo Blaxell
2021-08-04  6:07       ` Andrei Borzenkov
