Linux-BTRFS Archive on
From: Adam Borowski <>
To: Supercilious Dude <>
Cc: Qu Wenruo <>,
	DanglingPointer <>,
Subject: Re: RAID56 Warning on "multiple serious data-loss bugs"
Date: Mon, 28 Jan 2019 17:24:27 +0100
Message-ID: <>
In-Reply-To: <>

On Mon, Jan 28, 2019 at 03:23:28PM +0000, Supercilious Dude wrote:
> On Mon, 28 Jan 2019 at 01:18, Qu Wenruo <> wrote:
> >
> > So for current upstream kernel, there should be no major problem despite
> > write hole.
> Can you please elaborate on the implications of the write-hole? Does
> it mean that the transaction currently in-flight might be lost but the
> filesystem is otherwise intact?

No, losing the in-flight transaction is normal operation of every modern
filesystem -- in fact, you _want_ the transaction to be lost instead of
partially torn.

The write hole means corruption of a random _old_ piece of data.

It can be fatal (i.e., lead to data loss) if two errors happen together:
* the stripe is degraded
* there's an unexpected crash or power loss
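A toy simulation may make the two-error scenario concrete. This is a minimal sketch, not btrfs code: it assumes a three-disk RAID5 stripe (two data blocks plus XOR parity), and the names `xor_blocks`, `d0_old`, `d1` and so on are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID5 parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical three-disk stripe: two data blocks plus one parity block.
d0_old = b"AAAA"  # data from an older, committed transaction
d1 = b"BBBB"      # old data sharing the stripe; it is never rewritten
parity = xor_blocks(d0_old, d1)

# A new transaction rewrites d0.  The new data reaches the disk, but power
# is lost before the matching parity update is written, so parity is
# still xor(d0_old, d1) and the stripe is inconsistent.
d0_new = b"CCCC"

# Later the disk holding d1 fails (the stripe is degraded), so d1 has to
# be rebuilt from the surviving data block and the stale parity.
d1_rebuilt = xor_blocks(d0_new, parity)

# Had the parity update landed, the rebuild would have been correct.
d1_rebuilt_ok = xor_blocks(d0_new, xor_blocks(d0_new, d1))

print(d1_rebuilt == d1)     # old, untouched data comes back corrupted
print(d1_rebuilt_ok == d1)  # consistent parity rebuilds it exactly
```

Neither error alone loses `d1`: with all disks present it is read directly, and with consistent parity it is rebuilt correctly. Only the combination of a torn stripe and a missing disk corrupts data that the interrupted transaction never touched.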

Every RAID implementation (not just btrfs) suffers from the write hole
unless some special, costly precaution is taken.  Those include
journaling, plug extents, and varying-width stripes (ZFS: RAIDZ).  The
first two effectively require writing small writes twice; the last
degrades small writes to RAID1 as far as disk capacity goes.
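The capacity cost of varying-width stripes can be estimated with a little arithmetic. This is a simplified model, not ZFS's actual allocator: it assumes single parity, one parity sector per row of at most (disks - 1) data sectors, and it ignores the allocation padding real RAIDZ adds. The helper `space_efficiency` is hypothetical.

```python
import math


def space_efficiency(data_sectors: int, disks: int) -> float:
    """Fraction of written sectors that carry data, single-parity model.

    Each row of a variable-width stripe holds at most (disks - 1) data
    sectors plus one parity sector, and a write uses only as many rows as
    it needs.  Real RAIDZ also pads allocations; this sketch ignores that.
    """
    rows = math.ceil(data_sectors / (disks - 1))
    return data_sectors / (data_sectors + rows)


for size in (1, 2, 4, 100):
    print(size, round(space_efficiency(size, disks=5), 3))
```

In this model a one-sector write costs two sectors, the same as a RAID1 mirror, while writes that fill whole rows approach the (disks - 1) / disks efficiency of a full RAID5 stripe.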

The write hole affects only writes that neighbour some old (i.e., not from the
current transaction) data in the same stripe -- as long as everything in a
single stripe belongs to no more than one transaction, all is fine.
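The rule above can be phrased as a predicate over a stripe. This is a hypothetical sketch, not a btrfs interface: it assumes each data slot in the stripe is tagged with the transaction generation that wrote it (or `None` if the slot is free). Slots written earlier in the *current* transaction are safe, because a crash discards that whole transaction anyway.

```python
from typing import Optional, Sequence


def stripe_has_write_hole_risk(slot_generations: Sequence[Optional[int]],
                               current_generation: int) -> bool:
    """True if a partial write into this stripe could endanger old data.

    slot_generations[i] is the (hypothetical) generation that wrote data
    slot i, or None if the slot holds no live data.  Risk exists only when
    some live slot belongs to an older transaction than the current one.
    """
    return any(gen is not None and gen != current_generation
               for gen in slot_generations)


print(stripe_has_write_hole_risk([None, None, None], 42))  # empty stripe
print(stripe_has_write_hole_risk([42, None, None], 42))    # same transaction
print(stripe_has_write_hole_risk([17, None, None], 42))    # old neighbour
```

An allocator that only ever fills empty stripes, or stripes already owned by the running transaction, would never return `True` here, which is the idea behind avoiding the write hole by never mixing transactions within a stripe.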

> How does it interact with data and metadata being stored with a different
> profile (one with write hole and one without)?

If there's an unrecoverable error due to the write hole, you lose a single
stripe's worth.  For data, this means a single piece of a file is beyond
repair.  For metadata, you lose a potentially large swath of the filesystem
-- and as tree nodes close to the root get rewritten the most, a total
filesystem loss is pretty likely.  To make things worse, while data writes
are mostly linear (for small files, btrfs batches writes from the same
transaction), metadata is strewn all around, mixing pieces of different
importance and different age.  RAID5 (all implementations) is also very
slow for random writes such as btrfs metadata, since each small write forces
a read-modify-write of the stripe's parity; thus you really want RAID1
metadata for both safety and performance.  As metadata takes only around
1-2% of disk space, the only upside of RAID5 (better use of capacity)
doesn't really matter.
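A back-of-the-envelope calculation shows how little capacity RAID1 metadata costs. The figures below are assumptions for illustration only: a hypothetical 6-disk array, 10 TB of data, metadata at the high end of the 1-2% estimate, and full-stripe accounting for RAID5. The helper `raw_space` is hypothetical.

```python
DISKS = 6                  # hypothetical number of devices
DATA_TB = 10.0             # hypothetical amount of file data
META_TB = 0.02 * DATA_TB   # metadata at the high end of the 1-2% estimate


def raw_space(logical_tb: float, profile: str, disks: int) -> float:
    """Raw disk space (TB) consumed by logical_tb under a profile."""
    if profile == "raid1":
        return 2 * logical_tb                  # btrfs RAID1 keeps two copies
    if profile == "raid5":
        return logical_tb * disks / (disks - 1)  # one parity per full stripe
    raise ValueError(f"unknown profile: {profile}")


data_raw = raw_space(DATA_TB, "raid5", DISKS)
meta_raid5 = raw_space(META_TB, "raid5", DISKS)
meta_raid1 = raw_space(META_TB, "raid1", DISKS)

# Extra raw space needed when metadata is RAID1 instead of RAID5.
extra = (meta_raid1 - meta_raid5) / (data_raw + meta_raid5)
print(f"RAID1 metadata costs {extra:.2%} more raw space")
```

With these assumed numbers the penalty is roughly 1.3% of raw capacity, which is small compared with the risk of losing tree nodes near the root.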

I.e., RAID1 is the clear winner for btrfs metadata, and mixing profiles for
data vs metadata is safe.

⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
