From: pg@list.for.sabi.co.UK (Peter Grandi)
To: list Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: SSD based sw RAID: is ERC/TLER really important?
Date: Sat, 24 Jul 2021 22:19:02 +0200
Message-ID: <24828.30134.873619.942883@cyme.ty.sabi.co.uk>
In-Reply-To: <2232919.g0K5C1TF2C@chirone>

> the recovery time in case of media errors could exceed kernel
> timeouts and possibly kick off the entire drive from the RAID
> set and, in turn, lead to a fault of a RAID5 system upon a
> subsequent error in a second drive.

My understanding seems different:

* The purpose of a short device error retry period is the
  opposite: it is to fail a drive as fast as possible, in
  workloads where latency matters (or where there is also a risk
  of bus/link resets hitting multiple drives). In those cases
  error retry periods of 1-2 seconds (at most) are common, rather
  than the middle-of-the-road "7 seconds" copied and pasted from
  web pages.

* The purpose of a long device error retry period is instead to
  minimize the chances of declaring a drive failed, in the hope
  that one of the many retries succeeds (but note the difference
  between reads and writes).

* It is possible to set the kernel timeouts higher than the
  device retry periods, if one does not care about latency, to
  minimize the chances of declaring a drive failed (note the
  difference between Linux command timeouts and retry timeouts;
  the latter can also be long).
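A minimal Python sketch of the comparison in the three points
above, assuming the drive's SCT ERC value is taken as smartctl
reports it (tenths of a second, 0 meaning ERC disabled) and the
kernel command timeout is the per-device value in
/sys/block/<dev>/device/timeout (seconds). The helper names and
the labels are hypothetical; the 2-second "fail fast" cut-off
just mirrors the 1-2 seconds mentioned above.

```python
from pathlib import Path
from typing import Optional


def erc_seconds(deciseconds: int) -> Optional[float]:
    """Convert an SCT ERC value (tenths of a second, as smartctl reports
    it) to seconds. 0 means ERC is disabled: no drive-side limit."""
    return None if deciseconds == 0 else deciseconds / 10.0


def classify(erc_ds: int, kernel_timeout_s: float) -> str:
    """Hypothetical helper: which side gives up first on a media error."""
    erc = erc_seconds(erc_ds)
    if erc is None:
        # Drive retries for as long as its firmware decides.
        return "drive-unbounded"
    if erc >= kernel_timeout_s:
        # Kernel command timeout fires before the drive reports the error.
        return "kernel-first"
    if erc <= 2.0:
        # Short retry period: fail the drive fast (latency-sensitive case).
        return "fail-fast"
    # Drive reports the error first, after a longer retry period.
    return "drive-first"


def read_kernel_timeout(dev: str) -> float:
    """Read the SCSI command timeout in seconds for a device like 'sda'."""
    return float(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())
```

For example, classify(70, 30) gives "drive-first" (the common
"7 seconds" against the usual 30-second kernel default), while
classify(0, 30) flags a drive with ERC disabled.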

> But in the case of SSD drives (where, possibly, the error
> recovery activities performed by the drive firmware are very
> fast) [...]

I guess that depends on the firmware. On one hand, MLC cells can
become quite unreliable, especially at higher temperatures,
requiring many retries and lots of ECC; on the other hand, on
"write" allocating a new erase-block is easy because, unlike on
most HDDs, with an FTL the logical and physical sector locations
of an SSD are independent. Unfortunately most flash SSD makers
don't supply technical information on details like error
recovery strategies.
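A toy Python model of why a failed "write" is cheap to handle
behind an FTL: the logical sector is simply remapped to another
free physical block and the bad one is retired. This is a
hypothetical sketch, not any vendor's firmware; real FTLs work
with pages inside erase blocks and also do garbage collection
and wear levelling.

```python
from typing import AbstractSet, Dict, List, Set


class ToyFTL:
    """Hypothetical flash translation layer: logical sector -> physical block."""

    def __init__(self, physical_blocks: int) -> None:
        self.free: List[int] = list(range(physical_blocks))  # unused blocks
        self.bad: Set[int] = set()      # blocks retired after a failed write
        self.stale: Set[int] = set()    # blocks holding superseded data
        self.map: Dict[int, int] = {}   # logical sector -> physical block

    def write(self, lba: int, failing: AbstractSet[int] = frozenset()) -> int:
        """Write logical sector `lba` and return the physical block used.
        `failing` simulates blocks whose program operation fails."""
        while self.free:
            block = self.free.pop(0)
            if block in failing:
                # No lengthy in-place retry: retire the block, try the next.
                self.bad.add(block)
                continue
            old = self.map.get(lba)
            if old is not None:
                self.stale.add(old)     # previous copy becomes garbage
            self.map[lba] = block       # logical location is remapped
            return block
        raise OSError("no free physical blocks left")
```

With ToyFTL(4), writing sector 7 lands on block 0; rewriting it
while block 1 fails lands on block 2, with block 1 retired and
block 0 stale. A read has no such escape: if the only copy sits
on a degrading block, the data must be recovered from that
block, which is where the long retries and ECC come in.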

Thread overview: 10+ messages
2021-07-24 18:41 SSD based sw RAID: is ERC/TLER really important? Gianluca Frustagli
2021-07-24 20:19 ` Peter Grandi [this message]
2021-07-24 21:45   ` Phil Turmel
2021-07-25  7:00     ` Wols Lists
2021-07-25 10:28     ` Peter Grandi
2021-07-26  1:06       ` Phil Turmel
2021-07-26  7:57         ` Peter Grandi
2021-07-26 16:12           ` Peter Grandi
2021-07-25 11:04     ` Peter Grandi
2021-07-24 20:21 ` Andy Smith
