linux-btrfs.vger.kernel.org archive mirror
From: Adam Borowski <kilobyte@angband.pl>
To: Supercilious Dude <supercilious.dude@gmail.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	DanglingPointer <danglingpointerexception@gmail.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: RAID56 Warning on "multiple serious data-loss bugs"
Date: Mon, 28 Jan 2019 17:24:27 +0100
Message-ID: <20190128162427.oztw55e6e3l5fpll@angband.pl>
In-Reply-To: <CAGmvKk7yMOUJWbzro-=+DdzgMpo71vi7+QaCWu4LYcibBAzhrg@mail.gmail.com>

On Mon, Jan 28, 2019 at 03:23:28PM +0000, Supercilious Dude wrote:
> On Mon, 28 Jan 2019 at 01:18, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> > So for current upstream kernel, there should be no major problem despite
> > write hole.
> 
> 
> Can you please elaborate on the implications of the write-hole? Does
> it mean that the transaction currently in-flight might be lost but the
> filesystem is otherwise intact?

No, losing the in-flight transaction is normal operation of every modern
filesystem -- in fact, you _want_ the transaction to be lost instead of
partially torn.

The write hole means corruption of a random _old_ piece of data.

It can be fatal (i.e., lead to data loss) if two errors happen together:
* the stripe is degraded
* there's an unexpected crash/power loss

Every RAID implementation (not just btrfs) suffers from the write hole
unless some special, costly precaution is taken.  Those include
journaling, plug extents, varying-width stripes (ZFS: RAIDZ).  The first
two effectively require writing small writes twice; the last degrades
small writes to RAID1 in terms of disk capacity.

The write hole affects only writes that neighbour some old (i.e., not from
the current transaction) data in the same stripe -- as long as everything in
a single stripe belongs to no more than one transaction, all is fine.
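
To make the failure mode concrete, here is a toy model of a three-device
parity stripe.  It is only a sketch of generic XOR-parity RAID, not btrfs
code: the device names, the 4-byte blocks and the xor_blocks() helper are
made up for illustration.  One block is rewritten, the matching parity
update is lost in a crash, and then an _old_ block that was never touched
has to be rebuilt from what is left.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe: two data blocks plus their parity.
old_a = b"AAAA"  # old data; nothing below ever writes to it
old_b = b"BBBB"
stripe = {"dev0": old_a, "dev1": old_b, "dev2": xor_blocks(old_a, old_b)}

# A new transaction rewrites only dev1. The data block reaches the disk,
# but the crash happens before the matching parity update does.
new_b = b"bbbb"
stripe["dev1"] = new_b
# Lost in the crash: stripe["dev2"] = xor_blocks(old_a, new_b)

# The stripe is degraded: dev0 is gone and must be rebuilt from the rest.
del stripe["dev0"]
rebuilt_a = xor_blocks(stripe["dev1"], stripe["dev2"])

print("rebuilt block matches old data:", rebuilt_a == old_a)

# For comparison: had the parity update landed, the rebuild would be correct.
good_parity = xor_blocks(old_a, new_b)
print("with updated parity:", xor_blocks(new_b, good_parity) == old_a)
```

The first check fails and the second succeeds: the damaged block is the
old one on dev0, not the block the interrupted transaction was writing.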

> How does it interact with data and metadata being stored with a different
> profile (one with write hole and one without)?

If there's an unrecoverable error due to the write hole, you lose a single
stripe's worth.  For data, this means a single piece of a file is beyond
repair.  For metadata, you lose a potentially large swath of the filesystem
-- and as tree nodes close to the root get rewritten the most, a total
filesystem loss is pretty likely.  To make things worse, while data writes
are mostly linear (for small files, btrfs batches writes from the same
transaction), metadata is strewn all around, mixing pieces of different
importance and different age.  RAID5 (all implementations) is also very slow
for random writes (such as btrfs metadata), thus you really want RAID1
metadata both for safety and performance.  Metadata being only around 1-2%
of disk space, the only upside of RAID5 (better use of capacity) doesn't
really matter.
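
A back-of-the-envelope check of the capacity argument, as a sketch only.
The numbers are assumptions chosen for illustration: 4 devices, 1000 GiB
of data, and metadata at 2% of the data size (the upper end of the range
above).  raw_space() is a hypothetical helper that uses the textbook
overheads: two copies for RAID1, n/(n-1) for RAID5 on n devices.

```python
def raw_space(logical_gib: float, profile: str, devices: int) -> float:
    """Raw disk space needed to store logical_gib under a given profile."""
    if profile == "raid1":
        return logical_gib * 2  # two full copies
    if profile == "raid5":
        return logical_gib * devices / (devices - 1)  # one parity block per stripe
    raise ValueError(f"unknown profile: {profile}")


devices = 4                 # assumed array size
data_gib = 1000.0           # assumed amount of file data
meta_gib = data_gib * 0.02  # assumed metadata size (2% of data)

raid5_meta = raw_space(meta_gib, "raid5", devices)
raid1_meta = raw_space(meta_gib, "raid1", devices)
extra = raid1_meta - raid5_meta

# Total raw usage with RAID5 data and RAID1 metadata.
total = raw_space(data_gib, "raid5", devices) + raid1_meta

print(f"extra raw space for RAID1 metadata: {extra:.1f} GiB")
print(f"share of total raw usage: {extra / total:.2%}")
```

Under these assumptions, keeping metadata on RAID1 costs about 13 GiB of
raw space, or roughly 1% of what the array uses overall.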

I.e.: RAID1 is a clear winner for btrfs metadata; mixing profiles for data vs
metadata is safe.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄⠀⠀⠀⠀
