linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Remi Gauvin <remi@georgianit.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs vs write caching firmware bugs (was: Re: BTRFS recovery not possible)
Date: Mon, 24 Jun 2019 00:37:51 -0400
Message-ID: <20190624043751.GB11820@hungrycats.org>
In-Reply-To: <8e1b9a48-178b-4f93-6efd-e933ff1a4f54@georgianit.com>

On Sun, Jun 23, 2019 at 10:45:50PM -0400, Remi Gauvin wrote:
> On 2019-06-23 4:45 p.m., Zygo Blaxell wrote:
> 
> > 	Model Family:     Western Digital Green
> > 	Device Model:     WDC WD20EZRX-00DC0B0
> > 	Firmware Version: 80.00A80
> > 
> > Change the query to 1-30 power cycles, and we get another model with
> > the same firmware version string:
> > 
> > 	Model Family:     Western Digital Red
> > 	Device Model:     WDC WD40EFRX-68WT0N0
> > 	Firmware Version: 80.00A80
> > 
> 
> > 
> > These drives have 0 power fail events between mkfs and "parent transid
> > verify failed" events, i.e. it's not necessary to have a power failure
> > at all for these drives to unrecoverably corrupt btrfs.  In all cases the
> > failure occurs on the same days as "Current Pending Sector" and "Offline
> > UNC sector" SMART events.  The WD Black firmware seems to be OK with write
> > cache enabled most of the time (there are years in the log data without any
> > transid-verify failures), but the WD Black will drop its write cache when
> > it sees a UNC sector, and btrfs notices the failure a few hours later.
> > 
> 
> First, thank you very much for sharing.  I've seen you mention several
> times before problems with common consumer drives, but seeing one
> specific identified problem firmware version is *very* valuable info.
> 
> I have a question about the Black Drives dropping the cache on UNC
> error.  If a transid error like that occurred on a BTRFS RAID 1,
> would BTRFS find the correct metadata on the 2nd drive, or does it stop
> dead on 1 transid failure?

Well, the 2nd drive has to have correct metadata--if you are mirroring
a pair of disks with the same firmware bug, that's not likely to happen.

There is a bench test that will demonstrate the transid verify self-repair
procedure: disconnect one half of a RAID1 array, write for a while, then
reconnect and do a scrub.  btrfs should self-repair all the metadata on
the disconnected drive until it all matches the connected one.  Some of
the data blocks might be hosed though (due to CRC32C collisions), so
don't do this test on data you care about.
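
The bench test described above could be sketched roughly as follows.
This is an untested outline, not a procedure from the original message:
it uses loop devices instead of real disks, and it approximates the
"disconnect" by detaching one loop device and mounting the other half
with -o degraded.  All paths, loop device names and sizes are made-up
placeholders.  By default the script only prints the commands; running
them needs root and should only ever touch throwaway devices.

```python
import shlex
import subprocess


def bench_plan(img_a="/tmp/btrfs-a.img", img_b="/tmp/btrfs-b.img",
               loop_a="/dev/loop10", loop_b="/dev/loop11",
               mnt="/mnt/btrfs-bench", size="2G"):
    """Build the bench-test command sequence as a list of argv lists.

    All paths, loop device names and sizes are hypothetical defaults.
    """
    return [
        # Two sparse files stand in for the two disks of the array.
        ["truncate", "-s", size, img_a],
        ["truncate", "-s", size, img_b],
        ["losetup", loop_a, img_a],
        ["losetup", loop_b, img_b],
        # RAID1 for both data (-d) and metadata (-m).
        ["mkfs.btrfs", "-f", "-d", "raid1", "-m", "raid1", loop_a, loop_b],
        ["mkdir", "-p", mnt],
        # "Disconnect" one half: detach its loop device, then mount
        # the remaining half degraded (an approximation of unplugging).
        ["losetup", "-d", loop_b],
        ["mount", "-o", "degraded", loop_a, mnt],
        # Write for a while so the detached half falls behind.
        ["dd", "if=/dev/urandom", f"of={mnt}/fill.bin", "bs=1M", "count=256"],
        ["sync"],
        ["umount", mnt],
        # Reconnect, rescan, remount with both halves present.
        ["losetup", loop_b, img_b],
        ["btrfs", "device", "scan"],
        ["mount", loop_a, mnt],
        # Scrub in the foreground (-B), then show per-device error counters.
        ["btrfs", "scrub", "start", "-B", mnt],
        ["btrfs", "device", "stats", mnt],
    ]


def run(plan, execute=False):
    """Print each command; run it only when execute=True (needs root)."""
    for cmd in plan:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(bench_plan())  # dry run: prints the commands, changes nothing
```

Calling run(bench_plan(), execute=True) would actually perform the steps.
The scrub summary and the device stats counters are where any repaired
or uncorrectable blocks on the stale half should show up.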

