From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Christoph Anton Mitterer <calestyo@scientia.net>
Cc: Goffredo Baroncelli <kreijack@inwind.it>,
	Qu Wenruo <quwenruo@cn.fujitsu.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
Date: Mon, 28 Nov 2016 22:19:08 -0500
Message-ID: <20161129031908.GP8685@hungrycats.org>
In-Reply-To: <1480384367.6747.46.camel@scientia.net>

On Tue, Nov 29, 2016 at 02:52:47AM +0100, Christoph Anton Mitterer wrote:
> On Mon, 2016-11-28 at 16:48 -0500, Zygo Blaxell wrote:
> > If a drive's embedded controller RAM fails, you get corruption on the
> > majority of reads from a single disk, and most writes will be corrupted
> > (even if they were not before).
> 
> Administrating a multi-PiB Tier-2 for the LHC Computing Grid with quite
> a number of disks for nearly 10 years now, I'd have never stumbled on
> such a case of breakage so far...

In data centers you won't see breakages that are common on desktop and
laptop drives.  Laptops in particular sometimes (often?) go to places
that are much less friendly to hardware.

All my NAS and enterprise drives in server racks and data centers just
wake up one morning stone dead or with a few well-behaved bad sectors,
with none of this drama.  Boring!

> Actually most cases are as simple as HDD fails to work and this is
> properly signalled to the controller.
> 
> 
> 
> > If there's a transient failure due to environmental issues (e.g.
> > short-term high-amplitude vibration or overheating) then writes may
> > pause for mechanical retry loops.  If there is bitrot in SSDs
> > (particularly in the address translation tables) it looks like a wall
> > of random noise that only ends when the disk goes offline.  You can get
> > combinations of these (e.g. RAM failures caused by transient
> > overheating) where the drive's behavior changes over time.
> > 
> > When in doubt, don't write.
> 
> Sorry, but these cases as any cases of memory issues (be it main memory
> or HDD controller) would also kick in at any normal writes.

Yes, but in a RAID1 context there will be another disk with a good copy
(or if main RAM is failing, the entire filesystem will be toast no matter
what you do).
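
To make the RAID1 point concrete, here is a minimal sketch of a
checksum-verified mirrored read.  It is an illustration only, not btrfs
code: the function name read_mirrored is hypothetical, and zlib.crc32
stands in for whatever checksum the filesystem actually uses.

```python
import zlib


def read_mirrored(copies, expected_crc):
    """Return the first copy whose CRC32 matches, or None if every copy is bad.

    `copies` is a list of byte strings, one per mirror (hypothetical model
    of a RAID1 read); `expected_crc` is the checksum stored at write time.
    """
    for data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
    return None


# One mirror returns corrupted data, the other still holds the good copy.
good = b"file contents"
stored_crc = zlib.crc32(good)
corrupted = b"file c0ntents"

recovered = read_mirrored([corrupted, good], stored_crc)
unrecoverable = read_mirrored([corrupted, corrupted], stored_crc)
```

With one bad disk the read falls through to the good mirror; if every
copy fails its checksum there is nothing left to recover from.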

> So there's no point in protecting against this on the storage side...
> 
> Either never write at all... or have good backups for these rare cases.
> 
> 
> 
> Cheers,
> Chris.


