linux-btrfs.vger.kernel.org archive mirror
* "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?
@ 2021-07-16 22:44 Jorge Bastos
  2021-07-21 17:44 ` David Sterba
  0 siblings, 1 reply; 11+ messages in thread
From: Jorge Bastos @ 2021-07-16 22:44 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi,

This was a single-disk filesystem with DUP metadata, and this week it
stopped mounting out of the blue. The data is not a concern since I
have a full fs snapshot on another server, I'm just curious why this
happened. I remember reading that some WD disks have firmware with
write cache issues, and I believe this disk is affected:

Model family:Western Digital Green
Device model:WDC WD20EZRX-00D8PB0
Firmware version:80.00A80

SMART looks mostly OK, except "Raw read error rate" is high, which in
my experience is never a good sign on these disks, but I haven't had
any read errors so far. There was also no unclean shutdown: it was
working normally the last time I mounted it, and after a clean
shutdown, probably just after deleting some snapshots, I now get this:

Jul 16 23:27:38 TV1 emhttpd: shcmd (129): mount -t btrfs -o
noatime,nodiratime /dev/md20 /mnt/disk20
Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): using free space tree
Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): has skinny extents
Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block
start, want 419774464 have 0
Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block
start, want 419774464 have 0
Jul 16 23:27:38 TV1 kernel: BTRFS warning (device md20): failed to
read root (objectid=2): -5

The kernel is kind of old, 4.19.107, but there are 21 more btrfs
filesystems on this server, some using identical disks, with no issues
for a long time until now. btrfs check output:

~# btrfs check /dev/md20
Opening filesystem to check...
checksum verify failed on 419774464 found 000000B6 wanted 00000000
checksum verify failed on 419774464 found 00000058 wanted 00000000
checksum verify failed on 419774464 found 000000B6 wanted 00000000
bad tree block 419774464, bytenr mismatch, want=419774464, have=0
ERROR: could not setup extent tree
ERROR: cannot open file system

Could this type of error be explained by bad disk firmware?

Regards,
Jorge


* Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?
  2021-07-16 22:44 "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue? Jorge Bastos
@ 2021-07-21 17:44 ` David Sterba
  2021-07-21 18:14   ` Jorge Bastos
  2021-07-22  0:18   ` Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?') Qu Wenruo
  0 siblings, 2 replies; 11+ messages in thread
From: David Sterba @ 2021-07-21 17:44 UTC (permalink / raw)
  To: Jorge Bastos; +Cc: Btrfs BTRFS, Zygo Blaxell

On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
> Hi,
> 
> This was a single disk filesystem, DUP metadata, and this week it stop
> mounting out of the blue, the data is not a concern since I have a
> full fs snapshot in another server, just curious why this happened, I
> remember reading that some WD disks have firmware with write caches
> issues, and I believe this disk is affected:
> 
> Model family:Western Digital Green
> Device model:WDC WD20EZRX-00D8PB0
> Firmware version:80.00A80

For the record, summing up the discussion from IRC with Zygo: this
particular firmware, 80.00A80 on WD Green, is known to be problematic
and would explain the observed errors.

The recommendation is either not to use WD Green at all, or to
periodically disable the write cache with 'hdparm -W0' (the setting is
typically not persistent across power cycles, so it needs to be
reapplied, e.g. at boot).
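
A minimal sketch of the commands involved (/dev/sdX here is a
placeholder for the actual drive):

  # turn off the drive's volatile write cache
  hdparm -W0 /dev/sdX

  # query the current setting to confirm it took
  hdparm -W /dev/sdX

A udev rule or boot-time script is the usual way to reapply it on
every boot.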

> SMART looks mostly OK, except "Raw read error rate" is high, which in
> my experience is never a good sign on these disks, but I didn't get
> any read errors so far, also no unclean shutdown, it was working
> normally last time I mounted it, and after a clean shutdown, probably
> just after deleting some snapshots, I now get this:
> 
> Jul 16 23:27:38 TV1 emhttpd: shcmd (129): mount -t btrfs -o
> noatime,nodiratime /dev/md20 /mnt/disk20
> Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): using free space tree
> Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): has skinny extents
> Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block
> start, want 419774464 have 0

When the 'have' values are zeros it means the blocks were empty, e.g.
trimmed or never written at all.
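
One way to see what is actually on disk at that bytenr, read-only and
assuming a reasonably recent btrfs-progs, is something like:

  # dump the single tree block at the logical address from the error
  btrfs inspect-internal dump-tree -b 419774464 /dev/md20

If the block really is all zeros, this will typically fail with a
similar checksum/bytenr mismatch.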

> Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block
> start, want 419774464 have 0
> Jul 16 23:27:38 TV1 kernel: BTRFS warning (device md20): failed to
> read root (objectid=2): -5
> 
> Kernel is kind of old, 4.19.107, but there are 21 more btrfs file
> systems on this server, some using identical disks and no issues for a
> long time until now, btrfs check output:
> 
> ~# btrfs check /dev/md20
> Opening filesystem to check...
> checksum verify failed on 419774464 found 000000B6 wanted 00000000
> checksum verify failed on 419774464 found 00000058 wanted 00000000
> checksum verify failed on 419774464 found 000000B6 wanted 00000000
                                            ^^^^^^^^

This is an artifact of incorrectly printed checksums, fixed in
btrfs-progs v5.11.1.

> bad tree block 419774464, bytenr mismatch, want=419774464, have=0
> ERROR: could not setup extent tree
> ERROR: cannot open file system
> 
> Could this type of error be explained by a bad disk firmware?


* Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?
  2021-07-21 17:44 ` David Sterba
@ 2021-07-21 18:14   ` Jorge Bastos
  2021-11-22 13:49     ` Jorge Bastos
  2021-07-22  0:18   ` Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?') Qu Wenruo
  1 sibling, 1 reply; 11+ messages in thread
From: Jorge Bastos @ 2021-07-21 18:14 UTC (permalink / raw)
  To: dsterba, Jorge Bastos, Btrfs BTRFS, Zygo Blaxell

On Wed, Jul 21, 2021 at 6:47 PM David Sterba <dsterba@suse.cz> wrote:
>
> For the record summing up the discussion from IRC with Zygo, this
> particular firmware 80.00A80 on WD Green is known to have problematic
> firmware and would explain the observed errors.
>
> Recommendation is not to use WD Green or periodically disable the write
> cache by 'hdparm -W0'.
>

Thank you for the reply. Yes, from now on I intend to disable the
write cache on those disks, since I still have a lot of them in use.
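
In case it's useful to others, a sketch of a udev rule to do that
automatically (untested here; the rule path/name and the ATTRS{model}
match are assumptions, verify the real model string with 'udevadm
info' and the hdparm path for your distro):

  # /etc/udev/rules.d/99-wd-wcache.rules
  # disable the write cache whenever a matching disk shows up
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTRS{model}=="WDC WD20EZRX*", RUN+="/usr/sbin/hdparm -W0 /dev/%k"

A simple boot script looping over the affected devices would work just
as well.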

Jorge


* Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-21 17:44 ` David Sterba
  2021-07-21 18:14   ` Jorge Bastos
@ 2021-07-22  0:18   ` Qu Wenruo
  2021-07-22 13:54     ` David Sterba
  1 sibling, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2021-07-22  0:18 UTC (permalink / raw)
  To: dsterba, Jorge Bastos, Btrfs BTRFS, Zygo Blaxell



On 2021/7/22 上午1:44, David Sterba wrote:
> On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
>> Hi,
>>
>> This was a single disk filesystem, DUP metadata, and this week it stop
>> mounting out of the blue, the data is not a concern since I have a
>> full fs snapshot in another server, just curious why this happened, I
>> remember reading that some WD disks have firmware with write caches
>> issues, and I believe this disk is affected:
>>
>> Model family:Western Digital Green
>> Device model:WDC WD20EZRX-00D8PB0
>> Firmware version:80.00A80
>
> For the record summing up the discussion from IRC with Zygo, this
> particular firmware 80.00A80 on WD Green is known to have problematic
> firmware and would explain the observed errors.
>
> Recommendation is not to use WD Green or periodically disable the write
> cache by 'hdparm -W0'.

Zygo is always the god of exposing bad hardware.

Can we maintain a list of known bad hardware in the btrfs wiki?
And maybe extend it to other filesystems too?

Thanks,
Qu


* Re: Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-22  0:18   ` Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?') Qu Wenruo
@ 2021-07-22 13:54     ` David Sterba
  2021-07-24 23:15       ` Zygo Blaxell
  0 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2021-07-22 13:54 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Jorge Bastos, Btrfs BTRFS, Zygo Blaxell

On Thu, Jul 22, 2021 at 08:18:21AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/7/22 上午1:44, David Sterba wrote:
> > On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
> >> Hi,
> >>
> >> This was a single disk filesystem, DUP metadata, and this week it stop
> >> mounting out of the blue, the data is not a concern since I have a
> >> full fs snapshot in another server, just curious why this happened, I
> >> remember reading that some WD disks have firmware with write caches
> >> issues, and I believe this disk is affected:
> >>
> >> Model family:Western Digital Green
> >> Device model:WDC WD20EZRX-00D8PB0
> >> Firmware version:80.00A80
> >
> > For the record summing up the discussion from IRC with Zygo, this
> > particular firmware 80.00A80 on WD Green is known to have problematic
> > firmware and would explain the observed errors.
> >
> > Recommendation is not to use WD Green or periodically disable the write
> > cache by 'hdparm -W0'.
> 
> Zygo is always the god to expose bad hardware.
> 
> Can we maintain a list of known bad hardware inside btrfs-wiki?
> And maybe escalate it to other fses too?

Yeah, a list on the wiki would be great, though I'm a bit skeptical
about keeping it up to date; there are very few active wiki editors
and the knowledge is still mostly stored in the IRC logs. But without
a landing page on the wiki we can't even start, so I'll create it.


* Re: Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-22 13:54     ` David Sterba
@ 2021-07-24 23:15       ` Zygo Blaxell
  2021-07-25  3:34         ` Chris Murphy
  2021-07-25  5:27         ` Qu Wenruo
  0 siblings, 2 replies; 11+ messages in thread
From: Zygo Blaxell @ 2021-07-24 23:15 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Jorge Bastos, Btrfs BTRFS

On Thu, Jul 22, 2021 at 03:54:55PM +0200, David Sterba wrote:
> On Thu, Jul 22, 2021 at 08:18:21AM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2021/7/22 上午1:44, David Sterba wrote:
> > > On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
> > >> Hi,
> > >>
> > >> This was a single disk filesystem, DUP metadata, and this week it stop
> > >> mounting out of the blue, the data is not a concern since I have a
> > >> full fs snapshot in another server, just curious why this happened, I
> > >> remember reading that some WD disks have firmware with write caches
> > >> issues, and I believe this disk is affected:
> > >>
> > >> Model family:Western Digital Green
> > >> Device model:WDC WD20EZRX-00D8PB0
> > >> Firmware version:80.00A80
> > >
> > > For the record summing up the discussion from IRC with Zygo, this
> > > particular firmware 80.00A80 on WD Green is known to have problematic
> > > firmware and would explain the observed errors.
> > >
> > > Recommendation is not to use WD Green or periodically disable the write
> > > cache by 'hdparm -W0'.
> > 
> > Zygo is always the god to expose bad hardware.
> > 
> > Can we maintain a list of known bad hardware inside btrfs-wiki?
> > And maybe escalate it to other fses too?
> 
> Yeah a list on wiki would be great, though I'm a bit skeptical about
> keeping it up up to date, there are very few active wiki editors, the
> knowledge is still mostly stored in the IRC logs. But without a landing
> page on wiki we can't even start, so I'll create it.

Some points to note:

Most HDD *models* are good (all but 4% of models I've tested, and the
ones that failed were mostly 8?.00A8?), but the very few models that
are bad form a significant portion of drives in use:  they are the cheap
drives that consumers and OEMs buy millions of every year.

80.00A80 keeps popping up in parent-transid-verify-failed reports from
IRC users.  Sometimes also 81.00A81 and 82.00A82 (those two revisions
appear on some NAS vendor blacklists as well).  I've never seen 83.00A83
fail--I have some drives with that firmware, and they seem OK, and I
have not seen any reports about it.

80.00A80 may appear in a lot of low-end WD drive models (here "low end"
is "anything below Gold and Ultrastar"), marketed under other names like
White Label, or starring as the unspecified model inside USB external
drives.
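
For anyone trying to figure out what they have: smartctl prints both
fields in its identify section, and drives behind USB-SATA bridges
often need an explicit passthrough type (/dev/sdX is a placeholder):

  # model and firmware revision of a directly attached drive
  smartctl -i /dev/sdX

  # the same drive in a USB enclosure usually needs SAT passthrough
  smartctl -d sat -i /dev/sdX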

The bad WD firmware has been sold over a period of at least 8 years.
Retail consumers can buy new drives today with this firmware (the most
recent instance we found was a WD Blue 1TB if I'm decoding the model
string correctly).  Even though WD seems to have fixed the bugs years
ago (in 83.00A83), the bad firmware doesn't die out as hardware ages
out of the user population because users keep buying new drives with
the old firmware.

It seems that _any_ HDD might have write cache issues if it is having
some kind of hardware failure at the same time (e.g. UNC sectors or
power supply issues).  A failing drive is a failing drive, it might blow
up a btrfs with dup profile that would otherwise have survived.  It is
possible that firmware bugs are involved in these cases, but it's hard
to make a test fleet large enough for meaningful and consistent results.

SSDs are a different story:  there are so many models, firmware revisions
are far more diverse, and vendors are still rapidly updating their
designs, so we never see exactly the same firmware in any two incident
reports.  A firmware list would be obsolete in days.  There is nothing
in SSD firmware like the decade-long stability there is in HDD firmware.

IRC users report occasional parent-transid-verify-failure or similar
metadata corruption failures on SSDs, but they don't seem to be repeatable
with other instances of the same model device.  Samsung dominates the
SSD problem reports, but Samsung also dominates the consumer SSD market,
so I think we are just seeing messy-but-normal-for-SSD hardware failures,
not evidence of firmware bugs.

It's also possible that the window for exploiting a powerfail write cache
bug is much, much shorter for SSD than HDD, so even if the bugs do exist,
the probability of hitting one is negligible.


* Re: Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-24 23:15       ` Zygo Blaxell
@ 2021-07-25  3:34         ` Chris Murphy
  2021-07-27  9:02           ` David Sterba
  2021-07-25  5:27         ` Qu Wenruo
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2021-07-25  3:34 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: David Sterba, Qu Wenruo, Jorge Bastos, Btrfs BTRFS

On Sat, Jul 24, 2021 at 5:16 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> SSDs are a different story:  there are so many models, firmware revisions
> are far more diverse, and vendors are still rapidly updating their
> designs, so we never see exactly the same firmware in any two incident
> reports.  A firmware list would be obsolete in days.  There is nothing
> in SSD firmware like the decade-long stability there is in HDD firmware.

It might still be worth having reports act as a counter. 0-3 might be
"not enough info", 4-7 might be "suspicious", 8+ might be "consider
yourself warned".

But the scale could be a problem due to the small sample size.



-- 
Chris Murphy


* Re: Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-24 23:15       ` Zygo Blaxell
  2021-07-25  3:34         ` Chris Murphy
@ 2021-07-25  5:27         ` Qu Wenruo
  2021-07-26  2:53           ` Zygo Blaxell
  1 sibling, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2021-07-25  5:27 UTC (permalink / raw)
  To: Zygo Blaxell, dsterba, Jorge Bastos, Btrfs BTRFS



On 2021/7/25 上午7:15, Zygo Blaxell wrote:
> On Thu, Jul 22, 2021 at 03:54:55PM +0200, David Sterba wrote:
>> On Thu, Jul 22, 2021 at 08:18:21AM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/7/22 上午1:44, David Sterba wrote:
>>>> On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
>>>>> Hi,
>>>>>
>>>>> This was a single disk filesystem, DUP metadata, and this week it stop
>>>>> mounting out of the blue, the data is not a concern since I have a
>>>>> full fs snapshot in another server, just curious why this happened, I
>>>>> remember reading that some WD disks have firmware with write caches
>>>>> issues, and I believe this disk is affected:
>>>>>
>>>>> Model family:Western Digital Green
>>>>> Device model:WDC WD20EZRX-00D8PB0
>>>>> Firmware version:80.00A80
>>>>
>>>> For the record summing up the discussion from IRC with Zygo, this
>>>> particular firmware 80.00A80 on WD Green is known to have problematic
>>>> firmware and would explain the observed errors.
>>>>
>>>> Recommendation is not to use WD Green or periodically disable the write
>>>> cache by 'hdparm -W0'.
>>>
>>> Zygo is always the god to expose bad hardware.
>>>
>>> Can we maintain a list of known bad hardware inside btrfs-wiki?
>>> And maybe escalate it to other fses too?
>>
>> Yeah a list on wiki would be great, though I'm a bit skeptical about
>> keeping it up up to date, there are very few active wiki editors, the
>> knowledge is still mostly stored in the IRC logs. But without a landing
>> page on wiki we can't even start, so I'll create it.
>
> Some points to note:
>
> Most HDD *models* are good (all but 4% of models I've tested, and the
> ones that failed were mostly 8?.00A8?),

That's what we expect.

> but the very few models that
> are bad form a significant portion of drives in use:  they are the cheap
> drives that consumers and OEMs buy millions of every year.
>
> 80.00A80 keeps popping up in parent-transid-verify-failed reports from
> IRC users.  Sometimes also 81.00A81 and 82.00A82 (those two revisions
> appear on some NAS vendor blacklists as well).  I've never seen 83.00A83
> fail--I have some drives with that firmware, and they seem OK, and I
> have not seen any reports about it.

In fact, even just one model number is much better than nothing.

We know that nowadays btrfs is even able to detect bitflips, but we
don't really want weird hardware to bring us blame we don't deserve.

>
> 80.00A80 may appear in a lot of low-end WD drive models (here "low end"
> is "anything below Gold and Ultrastar"), marketed under other names like
> White Label, or starring as the unspecified model inside USB external
> drives.
>
> The bad WD firmware has been sold over a period of at least 8 years.
> Retail consumers can buy new drives today with this firmware (the most
> recent instance we found was a WD Blue 1TB if I'm decoding the model
> string correctly).  Even though WD seems to have fixed the bugs years
> ago (in 83.00A83), the bad firmware doesn't die out as hardware ages
> out of the user population because users keep buying new drives with
> the old firmware.
>
> It seems that _any_ HDD might have write cache issues if it is having
> some kind of hardware failure at the same time (e.g. UNC sectors or
> power supply issues).  A failing drive is a failing drive, it might blow
> up a btrfs with dup profile that would otherwise have survived.  It is
> possible that firmware bugs are involved in these cases, but it's hard
> to make a test fleet large enough for meaningful and consistent results.

For such a case, I guess SMART is enough to tell that the drive is
failing? Thus it shouldn't be that big a concern IMHO.

>
> SSDs are a different story:  there are so many models, firmware revisions
> are far more diverse, and vendors are still rapidly updating their
> designs, so we never see exactly the same firmware in any two incident
> reports.  A firmware list would be obsolete in days.  There is nothing
> in SSD firmware like the decade-long stability there is in HDD firmware.

Yeah, that's more or less expected.

So we don't need to bother with that for now.

Thanks for your awesome info again!
Qu

>
> IRC users report occasional parent-transid-verify-failure or similar
> metadata corruption failures on SSDs, but they don't seem to be repeatable
> with other instances of the same model device.  Samsung dominates the
> SSD problem reports, but Samsung also dominates the consumer SSD market,
> so I think we are just seeing messy-but-normal-for-SSD hardware failures,
> not evidence of firmware bugs.
>
> It's also possible that the window for exploiting a powerfail write cache
> bug is much, much shorter for SSD than HDD, so even if the bugs do exist,
> the probability of hitting one is negligible.
>


* Re: Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-25  5:27         ` Qu Wenruo
@ 2021-07-26  2:53           ` Zygo Blaxell
  0 siblings, 0 replies; 11+ messages in thread
From: Zygo Blaxell @ 2021-07-26  2:53 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Jorge Bastos, Btrfs BTRFS

On Sun, Jul 25, 2021 at 01:27:56PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/7/25 上午7:15, Zygo Blaxell wrote:
> > On Thu, Jul 22, 2021 at 03:54:55PM +0200, David Sterba wrote:
> > > On Thu, Jul 22, 2021 at 08:18:21AM +0800, Qu Wenruo wrote:
> > > > 
> > > > 
> > > > On 2021/7/22 上午1:44, David Sterba wrote:
> > > > > On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > This was a single disk filesystem, DUP metadata, and this week it stop
> > > > > > mounting out of the blue, the data is not a concern since I have a
> > > > > > full fs snapshot in another server, just curious why this happened, I
> > > > > > remember reading that some WD disks have firmware with write caches
> > > > > > issues, and I believe this disk is affected:
> > > > > > 
> > > > > > Model family:Western Digital Green
> > > > > > Device model:WDC WD20EZRX-00D8PB0
> > > > > > Firmware version:80.00A80
> > > > > 
> > > > > For the record summing up the discussion from IRC with Zygo, this
> > > > > particular firmware 80.00A80 on WD Green is known to have problematic
> > > > > firmware and would explain the observed errors.
> > > > > 
> > > > > Recommendation is not to use WD Green or periodically disable the write
> > > > > cache by 'hdparm -W0'.
> > > > 
> > > > Zygo is always the god to expose bad hardware.
> > > > 
> > > > Can we maintain a list of known bad hardware inside btrfs-wiki?
> > > > And maybe escalate it to other fses too?
> > > 
> > > Yeah a list on wiki would be great, though I'm a bit skeptical about
> > > keeping it up up to date, there are very few active wiki editors, the
> > > knowledge is still mostly stored in the IRC logs. But without a landing
> > > page on wiki we can't even start, so I'll create it.
> > 
> > Some points to note:
> > 
> > Most HDD *models* are good (all but 4% of models I've tested, and the
> > ones that failed were mostly 8?.00A8?),
> 
> That's what we expect.
> 
> > but the very few models that
> > are bad form a significant portion of drives in use:  they are the cheap
> > drives that consumers and OEMs buy millions of every year.
> > 
> > 80.00A80 keeps popping up in parent-transid-verify-failed reports from
> > IRC users.  Sometimes also 81.00A81 and 82.00A82 (those two revisions
> > appear on some NAS vendor blacklists as well).  I've never seen 83.00A83
> > fail--I have some drives with that firmware, and they seem OK, and I
> > have not seen any reports about it.
> 
> In fact, even just one model number is much better than nothing.

WD in particular seems to use firmware revisions as unique identifiers
across many drive models.  For this vendor, firmware revision is a more
predictive indicator than model number.  There may exist drives where
the same model number is used, but newer (and bug-fixed) firmware is
inside them.

Of course if you want to buy a drive, you usually only get to pick the
model number, not the firmware revision.

Other vendors don't do it this way.  Some vendors put "0001" or similar in
the firmware revision field of several completely different drive models.
In those cases we really do need the model number.

A raw table of drive stats should include both fields.
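
As a sketch, something like this collects both fields from every drive
in a box (assumes ATA drives, where smartctl labels the fields "Device
Model" and "Firmware Version"; run as root):

  for d in /dev/sd?; do
    printf '%s: ' "$d"
    smartctl -i "$d" | awk -F': *' '/^Device Model|^Firmware Version/ {printf "%s  ", $2}'
    echo
  done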

> We know nowadays btrfs is even able to detect bitflip, but we don't
> really want weird hardware to bring blame which we don't deserve.
> 
> > 80.00A80 may appear in a lot of low-end WD drive models (here "low end"
> > is "anything below Gold and Ultrastar"), marketed under other names like
> > White Label, or starring as the unspecified model inside USB external
> > drives.
> > 
> > The bad WD firmware has been sold over a period of at least 8 years.
> > Retail consumers can buy new drives today with this firmware (the most
> > recent instance we found was a WD Blue 1TB if I'm decoding the model
> > string correctly).  Even though WD seems to have fixed the bugs years
> > ago (in 83.00A83), the bad firmware doesn't die out as hardware ages
> > out of the user population because users keep buying new drives with
> > the old firmware.
> > 
> > It seems that _any_ HDD might have write cache issues if it is having
> > some kind of hardware failure at the same time (e.g. UNC sectors or
> > power supply issues).  A failing drive is a failing drive, it might blow
> > up a btrfs with dup profile that would otherwise have survived.  It is
> > possible that firmware bugs are involved in these cases, but it's hard
> > to make a test fleet large enough for meaningful and consistent results.
> 
> For such case, I guess smart is enough to tell the drive is failing?
> Thus it shouldn't be that a big concern IMHO.

A clean report from SMART does not necessarily mean the drive is
healthy--it just means the drive didn't record any failures, possibly
because the facility for recording failures itself is broken.

Some SMART implementations store SMART failure stats on the drive platter.
If there's a hardware failure that prevents the firmware from completing
data writes, that failure can also prevent updates of SMART error logs
or event counters.

Some SSD firmware implementations report no errors at any point in the
drive's lifetime, even as the drive ages and begins to silently corrupt
data at exponentially increasing rates, and the drive eventually stops
responding entirely.  SMART is useless on such devices--the firmware
can't detect errors, so it can't increment any SMART error counters or
add entries to SMART error logs.

Certainly if SMART is reporting new errors, around the same time when
btrfs splats, and drive write cache was enabled during the failures, then
it's very likely that the drive failure broke write caching.  Write cache
disable can let the drive be used successfully with btrfs for some time
after that, until some more fatal failure occurs.
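
To check whether the cache is currently on and whether the drive has
logged anything itself, something like this works (/dev/sdX again
being a placeholder):

  # current volatile write cache state, two equivalent views
  hdparm -W /dev/sdX
  smartctl -g wcache /dev/sdX

  # the drive's own ATA error log, if the firmware bothered to keep one
  smartctl -l error /dev/sdX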

> > SSDs are a different story:  there are so many models, firmware revisions
> > are far more diverse, and vendors are still rapidly updating their
> > designs, so we never see exactly the same firmware in any two incident
> > reports.  A firmware list would be obsolete in days.  There is nothing
> > in SSD firmware like the decade-long stability there is in HDD firmware.
> 
> Yeah, that's more or less expected.
> 
> So we don't need to bother that for now.
> 
> Thanks for your awesome info again!
> Qu
> 
> > 
> > IRC users report occasional parent-transid-verify-failure or similar
> > metadata corruption failures on SSDs, but they don't seem to be repeatable
> > with other instances of the same model device.  Samsung dominates the
> > SSD problem reports, but Samsung also dominates the consumer SSD market,
> > so I think we are just seeing messy-but-normal-for-SSD hardware failures,
> > not evidence of firmware bugs.
> > 
> > It's also possible that the window for exploiting a powerfail write cache
> > bug is much, much shorter for SSD than HDD, so even if the bugs do exist,
> > the probability of hitting one is negligible.
> > 


* Re: Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?')
  2021-07-25  3:34         ` Chris Murphy
@ 2021-07-27  9:02           ` David Sterba
  0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2021-07-27  9:02 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Zygo Blaxell, David Sterba, Qu Wenruo, Jorge Bastos, Btrfs BTRFS

On Sat, Jul 24, 2021 at 09:34:23PM -0600, Chris Murphy wrote:
> On Sat, Jul 24, 2021 at 5:16 PM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > SSDs are a different story:  there are so many models, firmware revisions
> > are far more diverse, and vendors are still rapidly updating their
> > designs, so we never see exactly the same firmware in any two incident
> > reports.  A firmware list would be obsolete in days.  There is nothing
> > in SSD firmware like the decade-long stability there is in HDD firmware.
> 
> It might still be worth having reports act as a counter. 0-3 might be
> "not enough info", 4-7 might be "suspicious", 8+ might be "consider
> yourself warned".
> 
> But the scale could be a problem due to the small sample size.

That's a good idea. I've started something at
https://btrfs.wiki.kernel.org/index.php/Hardware_bugs
using the mentioned WD drive and firmware as a first example.


* Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?
  2021-07-21 18:14   ` Jorge Bastos
@ 2021-11-22 13:49     ` Jorge Bastos
  0 siblings, 0 replies; 11+ messages in thread
From: Jorge Bastos @ 2021-11-22 13:49 UTC (permalink / raw)
  To: dsterba, Jorge Bastos, Btrfs BTRFS, Zygo Blaxell

Hello all,

A small update on this issue: it happened again with the same disk:

Nov 22 13:32:15 TV1 emhttpd: shcmd (126): mount -t btrfs -o
noatime,space_cache=v2 /dev/md20 /mnt/disk20
Nov 22 13:32:15 TV1 kernel: BTRFS info (device md20): enabling free space tree
Nov 22 13:32:15 TV1 kernel: BTRFS info (device md20): using free space tree
Nov 22 13:32:15 TV1 kernel: BTRFS info (device md20): has skinny extents
Nov 22 13:32:16 TV1 kernel: BTRFS error (device md20): bad tree block
start, want 1589201485824 have 620757024
Nov 22 13:32:16 TV1 kernel: BTRFS error (device md20): bad tree block
start, want 1589201485824 have 620757024
Nov 22 13:32:16 TV1 kernel: BTRFS error (device md20): failed to read
block groups: -5

This is a new filesystem, the previous one was unrecoverable, and
again it happened on boot after a clean shutdown. I have over 100
similar btrfs filesystems, more than 10 of them using this same disk
model, with no other issues for years, so the same thing happening to
the same disk within a few months suggests to me it's not just a
firmware issue; something else must be going on, maybe something in
the disk is going bad. The controller is an LSI 9207 HBA with 17 more
disks connected. For now I'm going to restore the data to a different
disk and see if it happens again, and I might also use this disk in a
small zfs pool I have to see if it gives issues there too.

Regards,
Jorge Bastos

On Wed, Jul 21, 2021 at 7:14 PM Jorge Bastos <jorge.mrbastos@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 6:47 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > For the record summing up the discussion from IRC with Zygo, this
> > particular firmware 80.00A80 on WD Green is known to have problematic
> > firmware and would explain the observed errors.
> >
> > Recommendation is not to use WD Green or periodically disable the write
> > cache by 'hdparm -W0'.
> >
>
> Thank you for the reply, yes, from now on I intend to disable write
> cache on those disks, since I still have a lot of them in use.
>
> Jorge


end of thread

Thread overview: 11+ messages
2021-07-16 22:44 "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue? Jorge Bastos
2021-07-21 17:44 ` David Sterba
2021-07-21 18:14   ` Jorge Bastos
2021-11-22 13:49     ` Jorge Bastos
2021-07-22  0:18   ` Maybe we want to maintain a bad driver list? (Was 'Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?') Qu Wenruo
2021-07-22 13:54     ` David Sterba
2021-07-24 23:15       ` Zygo Blaxell
2021-07-25  3:34         ` Chris Murphy
2021-07-27  9:02           ` David Sterba
2021-07-25  5:27         ` Qu Wenruo
2021-07-26  2:53           ` Zygo Blaxell
