From: Mikael Abrahamsson <swmike@swm.pp.se>
To: Chris <email.bug@arcor.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: What are mdadm maintainers to do? (error recovery redundancy/data loss)
Date: Tue, 17 Feb 2015 09:48:07 +0100 (CET)
Message-ID: <alpine.DEB.2.02.1502170940010.4007@uplift.swm.pp.se>
In-Reply-To: <loom.20150217T080345-764@post.gmane.org>

On Tue, 17 Feb 2015, Chris wrote:

> Everybody, please answer with improved versions if you can.
>
> if smartctl tool is available
>  if scterc is disabled
>    /usr/sbin/smartctl -l scterc,70,70 ${DEVNAME}
>  else
>    if scterc is not available
>      echo 180 >/sys/block/${DEVNAME}/device/timeout
>
> Found an older implementation that "seems to work fine":

Hi,

Generally I like this idea, but if I were running raid0 or linear, I might 
not want scterc enabled: without redundancy, the drive's own long internal 
retries are the only chance of reading back a marginal sector.
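A minimal shell sketch of that exception, assuming md exposes each array's RAID level in `/sys/block/<md>/md/level` and that only levels with redundancy should get a short error recovery limit. `SYSFS_ROOT` and `wants_erc` are hypothetical names introduced here, not part of mdadm:

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm: decide whether the members of an
# md array should get a short SCT ERC limit, based on the array's level.
# SYSFS_ROOT is an assumption that lets the check run against a fake tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# wants_erc MD (e.g. "md0"): succeed only for levels with redundancy.
wants_erc() {
    level=$(cat "$SYSFS_ROOT/block/$1/md/level" 2>/dev/null) || return 1
    case "$level" in
        raid1|raid4|raid5|raid6|raid10)
            return 0 ;;  # another copy exists: fail fast, md can rebuild the data
        *)
            return 1 ;;  # raid0, linear or unknown: let the drive retry as long as it needs
    esac
}
```

Usage would be along the lines of `wants_erc md0 && echo "enable scterc on md0 members"`. Levels are whitelisted rather than blacklisted so that anything unrecognised falls back to the conservative choice.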

Also, what harm would there be in always bumping the timeout to 180 seconds? 
Yes, failing drives would take longer to be kicked out, but if we're 
confident that scterc works, wouldn't we then want to turn the timeout down 
to 10-15 seconds?
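For comparison, the conditional scheme from the quoted outline could be sketched as below. It is simplified: instead of first checking whether scterc is disabled, it just tries to set 7.0 second limits and treats a nonzero `smartctl` exit status as "not supported", an assumption worth checking against the EXIT STATUS section of smartctl(8). The function name and `SYSFS_ROOT` are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of the conditional scheme from the quoted outline.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# erc_or_long_timeout NAME (e.g. "sda"): try a 7.0 s SCT ERC limit
# (scterc values are in tenths of a second); only if that fails, raise
# the kernel command timeout to 180 s.
erc_or_long_timeout() {
    if command -v smartctl >/dev/null 2>&1 &&
       smartctl -q silent -l scterc,70,70 "/dev/$1"; then
        return 0  # ERC set: the drive should give up before the default 30 s timeout
    fi
    echo 180 > "$SYSFS_ROOT/block/$1/device/timeout"
}
```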

Personally I turn on scterc if available and turn up the timeout to 180 
seconds, always, regardless of which drives I'm running. I'd rather wait 
longer for a drive to be considered dead than have drives kicked out due 
to some hiccup in the system (a controller or drive reset) that might 
rectify itself.

So I would suggest turning on scterc and turning up the timeout to 180 
seconds as soon as mdadm is installed. This is the best tradeoff I can 
come up with between stability and fast detection of dead drives.
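A sketch of that "always" policy, under the same assumptions (smartmontools installed, a nonzero smartctl exit treated as failure, hypothetical function names and `SYSFS_ROOT`). Unlike the conditional variant, the 180 second timeout is applied whether or not ERC could be enabled:

```shell
#!/bin/sh
# Hypothetical sketch of the "always" policy: enable SCT ERC where the
# drive supports it, and raise the kernel timeout to 180 s unconditionally.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}
CMD_TIMEOUT=180  # seconds

# set_drive_timeout NAME: write CMD_TIMEOUT to /sys/block/NAME/device/timeout
set_drive_timeout() {
    f="$SYSFS_ROOT/block/$1/device/timeout"
    [ -w "$f" ] || return 1
    echo "$CMD_TIMEOUT" > "$f"
}

# enable_erc NAME: ask the drive for 7.0 s read/write recovery limits
enable_erc() {
    command -v smartctl >/dev/null 2>&1 || return 1
    smartctl -q silent -l scterc,70,70 "/dev/$1"
}

# apply_policy NAME: ERC if possible, long timeout regardless
apply_policy() {
    enable_erc "$1" || echo "$1: SCT ERC not enabled" >&2
    set_drive_timeout "$1"
}
```

It would be called per drive, e.g. `apply_policy sda`, from a boot script or udev rule. The sysfs timeout does not survive a reboot, and on most drives the SCT ERC setting is also lost at power cycle, so this has to run on every boot.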

Here on the list I constantly see people coming in with multiple drives 
kicked out due to controller resets and other intermittent flukes; I never 
see anyone complaining that it took 30 seconds to detect a drive error. I 
doubt there would be many complaints about 180 seconds either. If someone 
needs faster detection, then in my opinion they are in the category who can 
be expected to tune this value for their application. 180 seconds works 
best for the "larger crowd" using mdadm.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
