From: Phil Turmel <philip@turmel.org>
To: Andreas Klauer <Andreas.Klauer@metamorpher.de>,
	Dark Penguin <darkpenguin@yandex.ru>
Cc: linux-raid@vger.kernel.org
Subject: Re: Why not just return an error?
Date: Fri, 7 Oct 2016 10:43:49 -0400
Message-ID: <e887908f-ba51-0f88-f891-f60e8bac50bd@turmel.org>
In-Reply-To: <20161007112151.GA4405@metamorpher.de>

Good morning Andreas,

On 10/07/2016 07:21 AM, Andreas Klauer wrote:
> On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
>> why not just return a read error instead?
> 
> You make it sound like it solves all problems, but it does not.
> Errors are just not part of the concept anywhere really.

That's not strictly true. The majority of read errors on large modern
drives are fixable by writing over the troublesome sector.  That may or
may not relocate the sector to the drive's spare area.  Read error
locations that haven't yet been overwritten are identified in the drive
firmware as "Pending Relocations", since the drive doesn't yet know if
the problem is a true media defect or just a write error (power
transient during write, whatever).

Since brand new drives almost never have errors, people assume that's
normal.  Get three or four years in and you see that's not true.  In my
experience, when actual relocations hit double digits, it's time to
replace the drive.  The drive is still operating within spec, though --
it won't be a warranty replacement.
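
A minimal sketch of that rule of thumb, assuming the attribute table layout printed by `smartctl -A` and the common ATA attribute names Reallocated_Sector_Ct and Current_Pending_Sector. The sample text is made up for illustration, and the threshold of 10 is only the "double digits" heuristic above, not a vendor limit.

```python
# Assumption: attribute names as reported by smartmontools for most ATA drives.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                counts[fields[1]] = int(fields[9])
    return counts


def should_replace(counts, threshold=10):
    """Apply the 'double-digit relocations' heuristic (threshold is an assumption)."""
    return counts.get("Reallocated_Sector_Ct", 0) >= threshold


# Hypothetical sample in the layout of `smartctl -A`; not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

counts = parse_smart_attributes(SAMPLE)
print(counts, should_replace(counts))
```

Pending sectors are reported separately because, as noted above, they may clear on the next write without ever being relocated.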

> If a filesystem encounters one, it might flip into read only mode;
> if a program encounters one it might do whatever.
> You still have a huge data loss, corrupt databases, et cetera.

Concur.

> Even so, is that not what you have with "bad block log" enabled, 
> within reason? I disable it everywhere. I want my disks kicked.

I want my disks *fixed* if possible, not kicked.  If they're kicked, the
rest of the good data on that disk is unavailable for keeping my array
running.  I want to see the relocations growing in my daily logwatch
reports so I can use mdadm --replace to maintain the array without *any*
loss of redundancy.
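
A rough sketch of the replacement workflow, assuming the new disk is first added as a spare and then named with mdadm's `--with` option. The device names are hypothetical, and the helper only builds the command lines; it does not run them.

```python
def replace_commands(array, old_member, new_member):
    """Build (but do not execute) mdadm commands for a hot replace.

    The old member stays active in the array until the new one has been
    fully rebuilt, so redundancy is not reduced during the copy.
    """
    return [
        # The replacement must already be a spare in the array.
        ["mdadm", array, "--add", new_member],
        # Mark the old member for replacement, preferring the new spare.
        ["mdadm", array, "--replace", old_member, "--with", new_member],
    ]


# Hypothetical device names, for illustration only.
for cmd in replace_commands("/dev/md0", "/dev/sdb1", "/dev/sdd1"):
    print(" ".join(cmd))
```

Building the commands as lists keeps them ready for `subprocess.run` once the device names have been checked by hand.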

> Using cosmetics to hide errors only works to a certain limit. 
> In the end, RAID only works if the disks work. RAID 5 with 
> two dead disks is dead, no way to get around that. Disks go bad 
> and need to be replaced, if you don't do that, you'll just fail 
> even more horribly later on.

Concur.  We seem to differ on where to draw the line on "bad".

> Your disk produces read errors, or needs 3 minutes to read a single sector, 
> what use is it to anyone? I'm not letting those disks stay, no matter how 
> many more people preach that "read errors are normal". No. They're not. 
> Such disks are utter and complete trash and have to go.

Really?  You get rid of drives on the first read error event?  If you're
discarding them, I'll pay shipping for you to send them to me.  That
would be an especially cost-effective source of drives for me. None of
the green or desktop POSes, though.  (-:  Or are you just not noticing
the read errors because MD is silently fixing them for you?
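
As a toy illustration of what "silently fixing" means on a parity array: when one member cannot return a block, the block can be rebuilt from the other members of the stripe and written back over the bad sector. This is a simplified single-parity (RAID 5 style) model in Python, not the kernel's implementation; the function names and the in-memory "stripe" are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_with_repair(stripe, failed_index):
    """Rebuild the unreadable block from the surviving blocks and write it back."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    rebuilt = xor_blocks(survivors)
    # The rewrite gives the drive the chance to fix or relocate the sector.
    stripe[failed_index] = rebuilt
    return rebuilt


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]  # three data blocks plus one parity block
stripe[1] = None                    # simulate a read error on the second member
assert read_with_repair(stripe, 1) == b"BBBB"
```

The caller still gets its data, which is why such errors can go unnoticed unless the drive's own counters are monitored.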

> Don't wait for MD to kick disks out either. Check your disks. 
> Actually replace them if they have errors. Most RAIDs die due 
> to people not monitoring their disks, or delaying replacements.

Yup.

> Replacing disks costs money but that is the price you have to pay 
> for the luxury of using RAID (especially at home) in the first place. 
> When buying a RAID system, the money for the next replacement disk 
> should always be planned into your budget. If you max it out or 
> overdraw your budget for those fancy enterprise RAID disks, 
> you'll find they die just the same.

Enterprise drives are easily justified for heavily loaded arrays in a
small shop.  NAS drives are just fine for small business and home media
servers.  Green and modern desktop drives are utterly unsuited to RAID duty.

> Also make backups. RAID never replaces backups.

Indeed.

Phil
