From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Chris Murphy <lists@colorremedies.com>
Cc: Alex Elsayed <eternaleye@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC] Preliminary BTRFS Encryption
Date: Mon, 19 Sep 2016 21:10:08 -0400
Message-ID: <20160920011008.GL21290@hungrycats.org>
In-Reply-To: <20160919223119.GK21290@hungrycats.org>

On Mon, Sep 19, 2016 at 06:31:22PM -0400, Zygo Blaxell wrote:
> On Mon, Sep 19, 2016 at 04:25:55PM -0600, Chris Murphy wrote:
> > >> Files are modified by creating new extents (using parameters inherited
> > >> from the inode to fill in the extent attributes) and updating the inode
> > >> to refer to the new extent instead of the old one at the modified
> > >> offset. Cloned extents are references to existing extents associated
> > >> with a different inode or at a different place within the same inode (if
> > >> the extent is not compatible with the destination inode, clone fails
> > >> with an error).  A snapshot is an efficient way to clone an entire
> > >> subvol tree at once, including all inodes and attributes.
> > >
> > > There is the caveat of chattr +C, which would need hard-disabled for
> > > extent-level encryption (vs block level).
> > 
> > What about raid56 partial stripe writes? Aren't these effectively nocow?
> 
> Those are a straight-up bug that should be fixed.  They are mixing committed
> data with uncommitted data from two different transactions, and the stripe
> temporarily contains garbage.  Combine that with unclean shutdown in degraded
> mode and the data is gone.

A slightly more detailed answer:

nocow and raid56 partial stripe writes are different because nocow writes
won't corrupt unrelated extents, while raid56 partial stripe writes will.
They are entirely different classes of problem.

Even in non-degraded mode, an interrupted write to a modified stripe
is not recoverable from parity until after the parity is reconstructed
(e.g. by scrub or a later write to the stripe in non-degraded mode).
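
To make that concrete, here is a toy model in Python.  It is not btrfs
code: it assumes a stripe of two data blocks plus one XOR parity block,
and every name in it is made up for illustration.  It shows how an
interrupted update of one block poisons the reconstruction of the other,
unrelated block:

```python
# Toy RAID5 stripe: two data blocks plus XOR parity.  Not btrfs code.

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Committed stripe: d0 and d1 belong to unrelated extents.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor(d0, d1)

# Partial stripe write: d0 is overwritten in place, but the machine
# crashes before the matching parity update reaches its disk.
d0 = b"CCCC"
# parity = xor(d0, d1)   <- this write never happened

# The disk holding d1 now fails, so d1 is rebuilt from d0 and the
# stale parity.  The result is garbage, although d1 was never written.
rebuilt_d1 = xor(d0, parity)
d1_recovered = (rebuilt_d1 == b"BBBB")   # False
```

A nocow overwrite of d0 on a non-parity profile would at worst damage
d0 itself; here the block that gets destroyed is d1.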

If one of the disks is significantly slower or has deeper queues than
the others, this could affect many extents, as btrfs could submit a
lot of writes to each disk and then wait asynchronously for all the
disks to finish, leaving a large time window for interruption.

If a disk fails after an unclean shutdown but before a scrub is complete,
data in all of the uncorrected stripes will be lost.  If an unclean
shutdown occurs during a write while the array is in degraded mode (or
as it enters degraded mode), data will be lost immediately.

Users who don't scrub immediately after unclean shutdowns are sitting on
a ticking time bomb of corruption that explodes when a disk fails.
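
Why the scrub matters can be shown with the same kind of toy model
(again a sketch under assumptions, not btrfs code: XOR parity over two
data blocks, and a hypothetical scrub() helper that just recomputes
parity while every data block is still readable):

```python
# Toy model: scrubbing before a disk failure closes the window.

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def scrub(data_blocks: list[bytes]) -> bytes:
    """Recompute parity from data blocks that are all still readable."""
    parity = bytes(len(data_blocks[0]))   # all-zero block
    for block in data_blocks:
        parity = xor(parity, block)
    return parity

# State after the unclean shutdown: d0 was rewritten, parity was not.
d0, d1 = b"CCCC", b"BBBB"
stale_parity = xor(b"AAAA", b"BBBB")      # still describes the old d0

# Scrub while both data disks are present, then lose the d1 disk.
fresh_parity = scrub([d0, d1])
rebuilt_d1 = xor(d0, fresh_parity)        # matches d1 again
```

Without the scrub, the same rebuild using stale_parity does not return
d1, which is the time bomb described above.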

If this happens to data extents, only file data is lost.  If it happens
to metadata extents, the filesystem is severely damaged or destroyed
(more likely destroyed as the roots of the metadata trees are usually
the most recently written blocks).

mdadm avoids this by scrubbing immediately after an unclean shutdown
to minimize the vulnerable window (or using the new stripe journalling
feature), but it fails (causing severe filesystem damage) when there
are crashes in degraded mode.  ZFS avoids this using a combination of
dynamic stripe width to avoid failed devices and the ZIL journal.

The best thing to do is rework the raid56 layer (and probably some
higher layers in btrfs) until there are no further references to
raid56_rmw_stripe or async_rmw_stripe, then remove those functions and
never put them back.



