From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Chris Murphy <lists@colorremedies.com>
Cc: Phil Karn <karn@ka9q.net>, Paul Jones <paul@pauljones.id.au>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Extremely slow device removals
Date: Sun, 3 May 2020 01:26:37 -0400
Message-ID: <20200503052637.GE10796@hungrycats.org>
In-Reply-To: <CAJCQCtTGg+Rmisw9QAj4SMaDcZ5e_2h_83-3Hjd=FDC5krgjCg@mail.gmail.com>

On Sat, May 02, 2020 at 11:48:18AM -0600, Chris Murphy wrote:
> On Sat, May 2, 2020 at 3:09 AM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > On SD/MMC and below-$50 SSDs, silent data corruption is the most common
> > failure mode.  I don't think these disks are capable of detecting or
> > reporting individual sector errors.  I've never seen it happen.  They
> > either fall off the bus or they have a catastrophic failure and give
> > an error on every single access.
> 
> I'm still curious about the allocator to use for this device class. SD
> Cards usually self-report rotational=0. Whereas USB sticks report
> rotational=1. The man page seems to suggest nossd or ssd_spread.

Use dup metadata on all single-disk filesystems, unless you are making
an intentionally temporary filesystem (like a RAM disk, or a cache with
totally expendable contents).  The correct function for maximizing btrfs
lifetime does not have "rotational" as a parameter.

> In my very limited sample size from a single vendor, I've only seen SD
> Card fail by becoming read only. i.e. hardware read-only, with the
> kernel spewing sd/mmc related debugging info about the card (or the card's
> firmware). Maybe that's a good example?

Yes, that would be a good example if you can read the card.  Usually
when these devices hit the end of their lives there's nothing left
to read, or big chunks of data are misplaced or missing entirely.

All SSDs eventually end up read-only, completely inaccessible, or
otherwise incapable of accepting further writes, if you run them long
enough.  Since it's no longer possible to test the drive's capability
as a storage device after this happens, you can have at most one such
failure per drive.  All the other failure modes can happen multiple times.

Some cheap SSDs will flip a bit (either in data or in a sector address)
at some point during their testable lifetimes.  The same drive can do
this over and over, so the error counts get quite high, and this is
easily the single most common failure event.  Since the drive itself
seems unaware of the errors, it never hits any kind of internal limit
on the number of failures (contrast with UNC sectors, where eventually
the remapping table fills up).  Typical error rates are one sector every
few weeks once the drive is past 50% of its endurance rating, but some
cheap SSDs don't wait for 50% and start corrupting data right away.

Some cheap SSDs fail by dropping off the bus until power-cycled.
Sometimes they corrupt data and drop off the bus at the same time, so
this event can end up being included in the silent data corruption count.
That may produce an elevated silent data corruption count, but silent
data corruption is still the most common event even if all bus drops
are subtracted.

Some cheap SSDs fail by suddenly becoming two orders of magnitude slower.
This is rare, and there's no data loss in these events.

Some SSDs detect and report UNC sector errors, either on read operations
or SMART self-tests, which I presume are due to internal data corruption
combined with error checking by the firmware, though they could be
false positives.  Cheap SSDs never do this; it only occurs on drives
outside the cheap SSD group.

I believe that the cheap SSDs are not capable of detecting or reporting
data corruption errors on individual sectors, given the large number
of opportunities they've been provided to demonstrate this capability
under my observation, and the exactly zero times they've used one.

Most of the above applies to SD/MMC devices as well, except I've never
seen an SD/MMC device that had the UNC sector error detection capability.
They only seem to have the cheap SSD failure modes.

> I suppose it's better to go
> read-only with data still readable, and insofar as Btrfs was concerned
> the data was correct, rather than start returning transiently bad
> data. However, I only knew this due to data checksums.
> 
> 
> -- 
> Chris Murphy
> 
