Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Filipe Manana <fdmanana@kernel.org>
Cc: dsterba@suse.cz, linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2] Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
Date: Wed, 27 Feb 2019 19:31:49 +0100
Message-ID: <20190227183149.GB31119@twin.jikos.cz> (raw)
In-Reply-To: <CAL3q7H4a6VjrFnxRtZVxdsjt7=bTGJCiXACyYDcEsCif7SHHcw@mail.gmail.com>

On Wed, Feb 27, 2019 at 05:32:55PM +0000, Filipe Manana wrote:
> On Wed, Feb 27, 2019 at 5:25 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Wed, Feb 27, 2019 at 01:42:30PM +0000, fdmanana@kernel.org wrote:
> > > @@ -1897,15 +1899,45 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info)
> > >        * from already being in a transaction and our join_transaction doesn't
> > >        * have to re-take the fs freeze lock.
> > >        */
> > > -     if (btrfs_test_opt(fs_info, FLUSHONCOMMIT))
> > > +     if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) {
> > >               writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC);
> > > +     } else {
> > > +             struct btrfs_pending_snapshot *pending;
> > > +             struct list_head *head = &trans->transaction->pending_snapshots;
> > > +
> > > +             /*
> > > +              * Flush dellaloc for any root that is going to be snapshotted.
> > > +              * This is done to avoid a corrupted version of files, in the
> > > +              * snapshots, that had both buffered and direct IO writes (even
> > > +              * if they were done sequentially) due to an unordered update of
> > > +              * the inode's size on disk.
> > > +              */
> > > +             list_for_each_entry(pending, head, list)
> > > +                     btrfs_start_delalloc_snapshot(pending->root);
> > > +     }
> >
> > A diff would be better than words, incremental on top of your patch:
> 
> You mean, better than a full 2nd version patch I suppose.

I mean better than my attempts to explain by words the error handling
logic that I was proposing.

> >
> > @@ -1912,8 +1912,15 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_trans_handle *trans)
> >                  * if they were done sequentially) due to an unordered update of
> >                  * the inode's size on disk.
> >                  */
> > -               list_for_each_entry(pending, head, list)
> > -                       btrfs_start_delalloc_snapshot(pending->root);
> > +               list_for_each_entry(pending, head, list) {
> > +                       int ret;
> > +
> > +                       ret = btrfs_start_delalloc_snapshot(pending->root);
> > +                       if (ret < 0) {
> > +                               writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC);
> > +                               break;
> > +                       }
> 
> What do you expect by falling back to writeback_inodes_sb()?
> It all ends up going through the same btrfs writeback path.
> And as I left it, if an error happens for one root, it still tries to
> flush writeback for all the remaining roots as well, so I don't get it
> why you fallback to writeback_inodes_sb().

As I read the changelog, you say that the corruption does not happen
with FLUSHONCOMMIT, which does writeback_inodes_sb. Using that would be
too heavy, thus you only start the delalloc on snapshotted roots.

So the idea is to use the same logic of flushoncommit in the unlikely
case when btrfs_start_delalloc_snapshot would fail. Only at some
performance cost, unless I'm missing something.

As for the v2 as you implement it without any error handling, doesn't
this allow the corruption to happen? If start_delalloc_inodes has a lot
of inodes for which it needs to allocate delalloc_work, the failure is
possible. That the list_for_each continues does not affect that
particular root.

> > Making a switch by the exact error is probably not necessary and wouldn't be
> > future proof anyway.
> 
> Not sure I understood that sentence.

Under v1 I was suggesting to filter out all possible errors from
btrfs_start_delalloc_snapshot, EROFS and ENOMEM. So by 'making a switch'
I meant

	if (ret == -EROFS) {
		break;
	} else {
		writeback_inodes_sb();
		break;
	}

> Anyway, it's not clear to me whether as it is it's fine, or do you
> want an incremental patch, or something else.

I want to continue the discussion about the error handling.  The
incremental diff was to show actual code behind my idea.

If this weren't a correctness and commit related code I probably would
not go that far be ok with the errors and abort. So I'm hoping you
could help me find good options with low impact as the change affects
the default commit path.

  reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-04 14:28 [PATCH] Btrfs: fix file corruption after snapshotting fdmanana
2019-02-18 17:11 ` David Sterba
2019-02-18 17:27   ` Filipe Manana
2019-02-27 12:47     ` David Sterba
2019-02-27 13:42       ` Filipe Manana
2019-02-27 17:19         ` David Sterba
2019-02-27 13:42 ` [PATCH v2] Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes fdmanana
2019-02-27 17:26   ` David Sterba
2019-02-27 17:32     ` Filipe Manana
2019-02-27 18:31       ` David Sterba [this message]
2019-02-27 18:56         ` Filipe Manana
2019-03-12 17:13           ` David Sterba

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190227183149.GB31119@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox