linux-btrfs.vger.kernel.org archive mirror
From: Hans van Kranenburg <Hans.van.Kranenburg@mendix.com>
To: Christoph Anton Mitterer <calestyo@scientia.net>,
	"Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: dm-integrity + mdadm + btrfs = no journal?
Date: Wed, 30 Jan 2019 16:56:14 +0000
Message-ID: <9e026f25-703f-0bde-2f35-b405f40d4a07@mendix.com>
In-Reply-To: <c83b2619-240f-4048-7c4d-545226f4ef0a@mendix.com>

On 1/30/19 5:38 PM, Hans van Kranenburg wrote:
> On 1/30/19 4:26 PM, Christoph Anton Mitterer wrote:
>> On Wed, 2019-01-30 at 07:58 -0500, Austin S. Hemmelgarn wrote:
>>> Running dm-integrity without a journal is roughly equivalent to using
>>> the nobarrier mount option (the journal is used to provide the same
>>> guarantees that barriers do).  IOW, don't do this unless you are willing
>>> to lose the whole volume.
>>
>> That sounds a bit strange to me.
>>
>> My understanding was that the idea of being able to disable the journal
>> of dm-integrity was just to avoid any double work, if equivalent
>> guarantees are already given by higher levels.
>>
>> If btrfs is by itself already safe (by using barriers), then I'd have
>> expected that no transaction is committed, unless it got through all
>> lower layers... so either everything works well on the dm-integrity
>> base (and thus no journal is needed)... or it fails there... but then
>> btrfs would already be safe by its own means (barriers + CoW)?
> 
> This. Exactly this.
> 
> The reason that this journal of dm-integrity has to be used is because
> data and the checksum of that data get written in two different places.
> The result of using it is that you'll always read back data with
> matching checksums, either the previous data, or the new data.
> 
> https://arxiv.org/pdf/1807.00309.pdf
> See Section 4.4 "Recovery on Write Failure".
> 
> "A device must provide atomic updating of both data and metadata.  A
> situation in which one part is written to media while another part
> failed must not occur."
> 
> Now, the great thing here is that btrfs does not overwrite disk data in
> place. It writes out new data, metadata and then the superblock. So,
> e.g. on power loss, I don't care about whatever happened to writes that
> are not visible because the superblock was never written? Btrfs will not
> read these disk sectors back, because it's unused space.

So, to reiterate from my first post, this means that I cannot use nocow or
directio, because those go around the CoW safety net.
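
To make that argument a bit more concrete, here is a toy sketch in Python.
It is only a model of the reasoning above, not of how dm-integrity or btrfs
are actually implemented. IntegrityDevice, cow_update and nocow_update are
made-up names, CRC32 stands in for whatever checksum the device uses, and a
"crash" is simply a write that stores the data but never updates the tag.

```python
import zlib


class IntegrityDevice:
    """Toy journal-less integrity device (hypothetical model).

    Data and its checksum ("tag") live in two separate places, so a
    crash can leave new data on media next to a stale tag.
    """

    def __init__(self, n_sectors):
        self.data = [b""] * n_sectors
        self.tags = [zlib.crc32(b"")] * n_sectors

    def write(self, sector, payload, crash_before_tag=False):
        self.data[sector] = payload
        if crash_before_tag:
            return  # power lost: data landed, tag did not
        self.tags[sector] = zlib.crc32(payload)

    def read(self, sector):
        payload = self.data[sector]
        if zlib.crc32(payload) != self.tags[sector]:
            raise IOError(f"integrity error on sector {sector}")
        return payload


def cow_update(dev, live, payload, free_sector, crash=False):
    """CoW: write to unused space, move the pointer only afterwards."""
    dev.write(free_sector, payload, crash_before_tag=crash)
    # After a crash the old pointer is still the one on disk.
    return live if crash else free_sector


def nocow_update(dev, live, payload, crash=False):
    """nocow: overwrite the live sector in place."""
    dev.write(live, payload, crash_before_tag=crash)
    return live


if __name__ == "__main__":
    dev = IntegrityDevice(4)
    dev.write(0, b"old")
    live = cow_update(dev, 0, b"new", free_sector=1, crash=True)
    print(dev.read(live))  # the torn sector 1 is never read back

    dev = IntegrityDevice(4)
    dev.write(0, b"old")
    live = nocow_update(dev, 0, b"new", crash=True)
    try:
        dev.read(live)
    except IOError as err:
        print(err)  # the live sector itself is now unreadable
```

In the CoW case the torn sector sits in space that nothing points to, so it
is never read. In the nocow case the torn sector is the live one.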

Also, there is still a risk, which is of course writing the superblocks.
If all copies of the superblock on a single device are written, and all of
them lack the updated checksum, then I'll lose the fs, and will have to
either repair that manually, or restore from send/receive backups of a
few minutes ago.
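
The superblock case can be sketched the same way. Again this is only a toy
model with made-up helper names (intact, torn, usable_superblocks), and it
assumes each superblock copy is overwritten in place at a fixed location,
with its integrity tag stored separately. Each copy is a (payload, tag)
pair and CRC32 stands in for the real checksum.

```python
import zlib


def intact(payload):
    """A copy whose data and tag were both written."""
    return (payload, zlib.crc32(payload))


def torn(old_copy, payload):
    """An in-place overwrite where the new data landed but the old tag stayed."""
    _old_payload, old_tag = old_copy
    return (payload, old_tag)


def usable_superblocks(copies):
    """Return the payloads of all copies that still pass the integrity check."""
    return [payload for payload, tag in copies if zlib.crc32(payload) == tag]


if __name__ == "__main__":
    previous = [intact(b"generation 41")] * 3

    # Crash after only the first copy was touched: two copies survive.
    partly = [torn(previous[0], b"generation 42"), previous[1], previous[2]]
    print(usable_superblocks(partly))

    # Crash with every copy torn: nothing readable is left on this device.
    all_torn = [torn(copy, b"generation 42") for copy in previous]
    print(usable_superblocks(all_torn))
```

As long as at least one copy on the device is either fully old or fully
new, something readable remains. Only the all-copies-torn case is the one I
am worried about.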

> Also, it's not a write hole like in RAID56, because when "pulling the
> plug" between writing out data and metadata, the checksums of older
> existing data sectors are not corrupted, only new writes that were in
> flight... I think... But the pdf is still mentioning (also in 4.4)
> "Furthermore, metadata sectors are packed with tags for multiple
> sectors; thus, a write failure must not cause an integrity validation
> failure for other sectors". From the design, I can however not see how
> this could happen.
> 
> I asked on dm-devel list a while ago about this, but the mailing list
> post never got any reply.

Hans
