Feature request: Remove the badblocks list

* Feature request: Remove the badblocks list
@ 2020-08-18 18:00 Roy Sigurd Karlsbakk
  2020-08-18 19:26 ` Wols Lists
  2020-08-18 21:03 ` Håkon Struijk Holmen
From: Roy Sigurd Karlsbakk @ 2020-08-18 18:00 UTC
  To: Linux RAID Mailing List; +Cc: Håkon

Hi all

It seems the badblocks list was added around 10 years ago[1]. The reason was to keep track of unreadable sectors, which might have made sense 20 years earlier, but not in 2010. The first IDE drives came out at the end of the 1980s and were named for their 'Integrated Drive Electronics', which was a new thing at the time. As opposed to earlier MFM drives and such, these were "smart" and could handle errors a bit better, even reallocating bad sectors elsewhere when needed. A lot happened between 1987 and 2010, but for some reason this feature slipped through anyway, perhaps because Linus was drunk, I don't know. As far as I can understand, the feature works a bit like this:

 - If a bad (that is, unreadable) block is found, it is flagged in the md member's superblock as bad, never to be used again
 - If a new disk is added to the array, the block number of the initial bad block is flagged on the new drive as well, since the whole stripe is considered useless (erm, didn't we have redundancy here?)
 - If the original drive is replaced with a new one, md happily copies all the data to the new drive and writes the same badblocks list to its superblock.

So no attempt is ever made to check or repair that sector. Disks reallocate sectors themselves when they go bad, and that's not necessarily a big issue unless there are a lot of such errors. We just say 'this sector or block said *ouch* and is thus dead, and so will its siblings be for ever and ever'. There's a nice article about it here[2].
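
To make the three steps listed above concrete, here is a toy Python model of the behaviour as I understand it. The class and method names are made up for illustration; this is not md's real code or on-disk format, just my reading of what happens to the list.

```python
class Member:
    """One md member device, reduced to a name and its bad block list."""

    def __init__(self, name):
        self.name = name
        self.badblocks = set()  # sector numbers flagged in the superblock


class ToyArray:
    """Hypothetical model of how the list spreads between members."""

    def __init__(self, names):
        self.members = {n: Member(n) for n in names}

    def read_error(self, name, sector):
        # Step 1: the unreadable sector is flagged and never retried.
        self.members[name].badblocks.add(sector)

    def add_member(self, name):
        # Step 2: the new member inherits every sector already flagged
        # on the existing members.
        new = Member(name)
        for member in self.members.values():
            new.badblocks |= member.badblocks
        self.members[name] = new

    def replace_member(self, old_name, new_name):
        # Step 3: the replacement gets the data and the same list.
        old = self.members.pop(old_name)
        new = Member(new_name)
        new.badblocks = set(old.badblocks)
        self.members[new_name] = new


array = ToyArray(["sda", "sdb"])
array.read_error("sda", 4096)       # one bad read on the original drive
array.add_member("sdc")             # new disk inherits the flag
array.replace_member("sda", "sdd")  # so does the replacement
print({n: sorted(m.badblocks) for n, m in array.members.items()})
```

In this model the failing drive is long gone, yet sector 4096 is still flagged on two perfectly healthy disks.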

In practice, if you have data in stripes with badblocks, it may be lost forever and for no reason at all, since drives tend to fix their problems if you issue a write to that sector. ZFS does this nicely - when it finds a bad read, it reconstructs the data from parity or mirror and writes it again. If it encounters a write error, it tries again. Eventually the drive may fail, but hell, that's why we have redundancy.
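
What I'd expect instead looks roughly like the sketch below: on a read error, rebuild the chunk from the other members and write it back, so the drive gets a chance to reallocate the sector. This is a toy with single XOR parity over in-memory byte strings; ToyDisk and read_with_repair are hypothetical names, and it is neither ZFS nor md code. It also assumes that a write always cures the sector, which a real drive does not guarantee.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class ToyDisk:
    """Hypothetical disk holding a single chunk."""

    def __init__(self, data, unreadable=False):
        self.data = data
        self.unreadable = unreadable

    def read(self):
        if self.unreadable:
            raise OSError("unreadable sector")
        return self.data

    def write(self, data):
        # Assumption: the write lets the drive remap the sector,
        # so subsequent reads succeed.
        self.data = data
        self.unreadable = False


def read_with_repair(disks, index):
    """Read one chunk of a stripe, repairing it from the others on error."""
    try:
        return disks[index].read()
    except OSError:
        others = [d.read() for i, d in enumerate(disks) if i != index]
        rebuilt = xor_blocks(others)
        disks[index].write(rebuilt)  # rewrite instead of flagging forever
        return rebuilt


d0, d1 = b"abcd", b"wxyz"
parity = xor_blocks([d0, d1])
stripe = [ToyDisk(b"????", unreadable=True), ToyDisk(d1), ToyDisk(parity)]
print(read_with_repair(stripe, 0))
```

With only two disks the XOR of a single block is just a copy, so the same function covers the mirror case.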

As far as I can see, the only way to remove the badblocks list is "mdadm ... --assemble --update=no-bbl", from [2], and then have md return garbage for those lost sectors, which is fine, since fsck/xfs_repair should fix what's fixable (and the rest won't be readable anyway). An alternative written by a friend of mine (Håkon on cc) is available at [3]; it removes the list from an offlined array.
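
Before removing anything, it helps to know whether a member has entries in its list at all. As far as I know, md exposes the list per member in sysfs as lines of "start length" in sectors, in a file like /sys/block/md0/md/dev-sda1/bad_blocks. Treat that path and format as an assumption and check on your own kernel; the device names and the sample text below are invented. A minimal Python sketch for summing it up:

```python
def parse_bad_blocks(text):
    """Parse 'start length' lines into (start_sector, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def total_bad_sectors(ranges):
    """Number of sectors covered by all listed ranges."""
    return sum(length for _start, length in ranges)


# Invented sample; on a real system, read the text from the sysfs file.
sample = "2048 8\n1048576 16\n"
ranges = parse_bad_blocks(sample)
print(ranges, total_bad_sectors(ranges))
```

An empty result means there is nothing to lose by dropping the list on that member.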

As far as I can understand, this list doesn't have any reason to exist, except to annoy sysadmins.

So please remove this useless thing, or at least don't enable it by default.

[1] https://linux-raid.vger.kernel.narkive.com/R1rvkUiQ/using-the-new-bad-block-log-in-md-for-linux-3-1
[2] https://raid.wiki.kernel.org/index.php/The_Badblocks_controversy
[3] https://git.thehawken.org/hawken/md-badblocktool.git

Kind regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Carve the good in stone, write the bad in snow.
