linux-btrfs.vger.kernel.org archive mirror
* bad sector / bad block support
@ 2019-11-27  3:30 Christopher Staples
  2019-11-27  3:48 ` Zygo Blaxell
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Christopher Staples @ 2019-11-27  3:30 UTC (permalink / raw)
  To: linux-btrfs

will there ever be a better way to handle bad sectors?  I keep
getting silent corruption from random bad sectors.
Scrubs keep passing without showing any errors, but if I do a
ddrescue backup to a new drive I find the bad sectors.


the only thing I can do for now is mark I/O error files as bad by
renaming them and making another copy of the file on the filesystem.


I like btrfs for its snapshot ability, but when it comes to keeping
data safe ext4 seems better? At least it looks for bad sectors and
marks them; btrfs just seems to write and assume it's written.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bad sector / bad block support
  2019-11-27  3:30 bad sector / bad block support Christopher Staples
@ 2019-11-27  3:48 ` Zygo Blaxell
  2019-11-27 14:15 ` Austin S. Hemmelgarn
  2019-11-28  1:23 ` Chris Murphy
  2 siblings, 0 replies; 5+ messages in thread
From: Zygo Blaxell @ 2019-11-27  3:48 UTC (permalink / raw)
  To: Christopher Staples; +Cc: linux-btrfs


On Wed, Nov 27, 2019 at 01:30:04PM +1000, Christopher Staples wrote:
> will there ever be a better way to handle bad sectors?  I keep
> getting silent corruption from random bad sectors.
> Scrubs keep passing without showing any errors, but if I do a
> ddrescue backup to a new drive I find the bad sectors.

That is a typical symptom of host RAM corruption.  Make sure your memory
and CPU are OK.

A similar test you can do is to copy a large file or group of files
(say a few GB) and compare the copy with the original.  If there are
differences but no btrfs csum errors, chances are good that you are
looking at some kind of host RAM failure.
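That copy-and-compare test is easy to script; the paths and size below are placeholders, so point them at the filesystem you want to exercise (and use a few GB rather than 64 MB for a serious test):

```shell
#!/bin/sh
# Write a file of random data, copy it, and compare the two.
# If cmp reports a difference but scrub reports no csum errors,
# suspect RAM/CPU rather than the disk.
SRC=/tmp/ramtest.src    # placeholder paths -- use the fs under test
DST=/tmp/ramtest.dst

dd if=/dev/urandom of="$SRC" bs=1M count=64 status=none
cp "$SRC" "$DST"

# Optionally drop the page cache first so cmp re-reads from disk:
#   echo 3 | sudo tee /proc/sys/vm/drop_caches

if cmp -s "$SRC" "$DST"; then
    echo "copies match"
else
    echo "copies differ"
fi
rm -f "$SRC" "$DST"
```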

> the only thing I can do for now is mark I/O error files as bad by
> renaming them and making another copy of the file on the filesystem.
> 
> 
> I like btrfs for its snapshot ability, but when it comes to keeping
> data safe ext4 seems better? At least it looks for bad sectors and
> marks them; btrfs just seems to write and assume it's written.



* Re: bad sector / bad block support
  2019-11-27  3:30 bad sector / bad block support Christopher Staples
  2019-11-27  3:48 ` Zygo Blaxell
@ 2019-11-27 14:15 ` Austin S. Hemmelgarn
  2019-11-27 15:42   ` Graham Cobb
  2019-11-28  1:23 ` Chris Murphy
  2 siblings, 1 reply; 5+ messages in thread
From: Austin S. Hemmelgarn @ 2019-11-27 14:15 UTC (permalink / raw)
  To: Christopher Staples, linux-btrfs

On 2019-11-26 22:30, Christopher Staples wrote:
> will there ever be a better way to handle bad sectors?  I keep
> getting silent corruption from random bad sectors.
> Scrubs keep passing without showing any errors, but if I do a
> ddrescue backup to a new drive I find the bad sectors.
Zygo is correct: if there are no checksum errors, it's almost certainly 
not the storage device.

Put simply, for a media error to cause corruption without a checksum 
error, all of the following need to happen at the same time:

* At least one sector in a data block has to go bad without the storage 
device's built-in error correction catching it. If the ECC functionality 
of the drive caught it, it would either return the correct data or a 
read error. This is actually rather unlikely for small numbers of 
devices (but the likelihood of it happening goes up as you increase the 
number of devices involved).
* The block containing the checksum for that data block has to go 
similarly bad without being detected or corrected, and the corrupted 
checksum has to match the corrupted data _or_ the original checksum has 
to happen to be valid for the corrupted data. This is astronomically 
unlikely, to the point that you're far more likely to be struck and 
killed by a meteor.
* The above then has to happen for the checksum for the metadata block 
containing the checksum for that data block, and in turn the same 
condition has to repeat for each block up the tree to the root (usually 
3+ times). This is so unlikely as to be a statistical impossibility.

So, as Zygo suggests, check your RAM, check your CPU, possibly check 
your PSU (bad power supplies can cause all kinds of weird things).
> 
> 
> the only thing I can do for now is mark I/O error files as bad by
> renaming them and making another copy of the file on the filesystem.
> 
> 
> I like btrfs for its snapshot ability, but when it comes to keeping
> data safe ext4 seems better? At least it looks for bad sectors and
> marks them; btrfs just seems to write and assume it's written.

ext4 wouldn't save you here, because you almost certainly aren't dealing 
with bad sectors.

BTRFS doesn't include bad sector support like ext4 because it solves the 
issue a different way, namely by checksumming everything and then 
validating the checksums on read.
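The principle can be demonstrated with ordinary userspace tools. This is only an analogy for what btrfs does per block inside the kernel, and the file names are made up:

```shell
#!/bin/sh
# Analogy for checksum-on-read: store a checksum at write time, and
# any later silent corruption is caught when the data is re-verified.
f=/tmp/csumdemo.dat
printf 'important data\n' > "$f"
sha256sum "$f" > /tmp/csumdemo.sha      # "write time": record checksum

# Simulate silent single-byte corruption, as a bad sector might cause:
printf 'X' | dd of="$f" bs=1 seek=3 conv=notrunc status=none

# "read time": the mismatch is detected instead of silently returned
sha256sum -c /tmp/csumdemo.sha || echo "corruption detected"
rm -f "$f" /tmp/csumdemo.sha
```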

On a slightly separate note, you should never need the ext4 bad block 
functionality on any modern hardware unless your storage devices are way 
beyond the point at which they should be replaced.  The original reason 
for having bad block lists in the filesystem was that disk drives didn't 
remap bad sectors, which in turn meant that the filesystem had to deal 
with them. It's been multiple decades since that was the case though, 
and all modern storage devices (except possibly some really cheap USB 
flash drives) remap bad sectors and only let the OS see them when they 
run out of space to remap them to.
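You can watch those remap counters yourself via SMART; `smartctl` is from smartmontools, and /dev/sdX is a placeholder for the device under test:

```shell
#!/bin/sh
# Print the drive's remapping-related SMART attributes (the raw value
# is field 10).  Nonzero Reallocated_Sector_Ct means spare sectors are
# being consumed; nonzero Current_Pending_Sector means reads are
# already failing.  /dev/sdX is a placeholder device.
smartctl -A /dev/sdX |
    awk '/Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable/ \
        {print $2, $10}'
```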


* Re: bad sector / bad block support
  2019-11-27 14:15 ` Austin S. Hemmelgarn
@ 2019-11-27 15:42   ` Graham Cobb
  0 siblings, 0 replies; 5+ messages in thread
From: Graham Cobb @ 2019-11-27 15:42 UTC (permalink / raw)
  To: Christopher Staples, linux-btrfs

On 27/11/2019 14:15, Austin S. Hemmelgarn wrote:
> On 2019-11-26 22:30, Christopher Staples wrote:
>> will there ever be a better way to handle bad sectors?  I keep
>> getting silent corruption from random bad sectors.
>> Scrubs keep passing without showing any errors, but if I do a
>> ddrescue backup to a new drive I find the bad sectors.
> Zygo is correct, if there are no checksum errors, it's almost certainly
> not the storage device.
> 
> Put simply, for a media error to cause corruption without a checksum
> error, all of the following need to happen at the same time:

Or, of course, be using NOCOW (or other no-checksum) files.
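Whether a given file is NOCOW, and therefore unprotected by checksums, is visible in its attributes; /data/vm.img is a placeholder path:

```shell
#!/bin/sh
# The 'C' (No_COW) attribute means btrfs stores no checksums for the
# file's data, so scrub cannot detect corruption in it.
# /data/vm.img is a placeholder path.
attrs=$(lsattr -d /data/vm.img | awk '{print $1}')
case "$attrs" in
    *C*) echo "NOCOW: data is not checksummed" ;;
    *)   echo "COW: data is checksummed" ;;
esac
```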


* Re: bad sector / bad block support
  2019-11-27  3:30 bad sector / bad block support Christopher Staples
  2019-11-27  3:48 ` Zygo Blaxell
  2019-11-27 14:15 ` Austin S. Hemmelgarn
@ 2019-11-28  1:23 ` Chris Murphy
  2 siblings, 0 replies; 5+ messages in thread
From: Chris Murphy @ 2019-11-28  1:23 UTC (permalink / raw)
  To: Christopher Staples; +Cc: Btrfs BTRFS

On Tue, Nov 26, 2019 at 8:30 PM Christopher Staples
<mastercatz@gmail.com> wrote:
>
> will there ever be a better way to handle bad sectors?  I keep
> getting silent corruption from random bad sectors.
> Scrubs keep passing without showing any errors, but if I do a
> ddrescue backup to a new drive I find the bad sectors.

Bad sectors manifest in two ways: the drive reports UNC on read or
write, or Btrfs reports a checksum mismatch.

If Btrfs isn't catching it but the data is wrong, it's probably a
memory problem: the corruption happens in RAM, the checksum is then
computed over the already-corrupted data, and Btrfs therefore accepts
the bad data as correct.
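Whether the drive itself is reporting bad sectors is easy to check in the kernel log; the grep patterns below cover common block-layer messages, and the mount point in the comment is a placeholder:

```shell
#!/bin/sh
# UNC/media errors are reported by the kernel, so a log search shows
# whether the drive itself is complaining:
dmesg | grep -iE 'UNC|I/O error|blk_update_request' \
    || echo "no drive errors logged"

# btrfs also keeps per-device error counters (placeholder mount point):
#   btrfs device stats /mnt
```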

> I like btrfs for its snapshot ability, but when it comes to keeping
> data safe ext4 seems better? At least it looks for bad sectors and
> marks them; btrfs just seems to write and assume it's written.

That's the wrong way of looking at it.

If there are a small number of bad physical sectors, upon write, the
drive firmware will remap the LBA to a reserve sector. There's no
appearance of bad sectors outside the drive at all, and no error
reported. That's normal behavior. If the drive has a lot of bad
sectors, eventually all the reserve sectors get used up, and now the
drive has to report UNC on write - a write error. This is a device
that's inevitably going to betray you with far worse problems and data
loss, so papering over it with an external bad sector map isn't
something anyone will recommend in a data integrity context. Replace
the drive.

-- 
Chris Murphy

