From: Sage Weil <sage@newdream.net>
To: Mike Fedyk <mfedyk@mikefedyk.com>
Cc: sander@humilis.net, Josef Bacik <josef@redhat.com>,
	Chris Ball <cjb@laptop.org>,
	Nickolai Zeldovich <nickolai@csail.mit.edu>,
	linux-btrfs@vger.kernel.org
Subject: Re: zero-length files in snapshots
Date: Fri, 19 Feb 2010 14:22:25 -0800 (PST)
Message-ID: <Pine.LNX.4.64.1002191355490.21676@cobra.newdream.net>
In-Reply-To: <93cdabd21002131126v4b4d3c1ei2a747a7a5de7b1c8@mail.gmail.com>
On Sat, 13 Feb 2010, Mike Fedyk wrote:
> On Sat, Feb 13, 2010 at 3:25 AM, Sander <sander@humilis.net> wrote:
> > Mike Fedyk wrote (ao):
> >> On Fri, Feb 12, 2010 at 8:32 AM, Josef Bacik <josef@redhat.com> wrote:
> >> > Creating a file is a metadata operation, and _any_ metadata operation has to be
> >> > committed to disk when the transaction commits in order to maintain a coherent
> >> > fs.  Thanks,
> >>
> >> What I still don't understand though is that the create could have
> >> taken up to 30 seconds to commit and the same for the few bytes of
> >> data, but a few ms later a snapshot was made and the metadata change
> >> was there and the data change was not. Could it have happened that
> >> the snapshot would not have the newly created file and this was just a
> >> timing issue that should not be relied upon?
> >>
> >> I'm just wondering why that file was there at all.
> >
> > I would say that is because the moment the file got created, the
> > resulting metadata was committed immediately. The data not yet.
>
> Josef explained it to me on IRC. Meta-data changes like file creation
> get added to the current transaction and snapshots start a new
> transaction so that is why the empty file is in the snapshot.
>
> The file is empty because with delayed allocation, the data has not
> hit the filesystem yet and thus has no representation in filesystem
> operations like snapshots.

You can make btrfs include the file data in the snapshot along with the
metadata with the 'flushoncommit' mount option.  The problem is that this
will make _all_ btrfs commits more expensive, as they'll block new
operations during the commit while old data is being flushed out.

We could trivially make this happen only when there is a new snapshot, to
get the behavior you expect (see patch below).  If the goal is to make a
perfectly consistent snapshot of the file system, this is better than

	sync ; btrfsctl -s snap whatever

because there wouldn't be a window where metadata changes make it into the
snapshot but file data does not.
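
To make that window concrete, here is a rough user-space toy model in
Python.  It is only a sketch under simplifying assumptions, not btrfs code:
the ToyFS class, its methods, and the file name new.txt are all invented
for illustration.  The only behavior it borrows from the discussion above
is that a file create is visible to the next commit right away, while
written data stays in delalloc until something flushes it.

```python
class ToyFS:
    """Toy model (hypothetical): metadata is visible to a snapshot at once,
    file data only after a flush of the delayed-allocation buffer."""

    def __init__(self):
        self.sizes = {}     # file name -> bytes a snapshot would capture
        self.delalloc = {}  # file name -> bytes written but not yet flushed

    def create(self, name):
        # File creation is a metadata change in the running transaction.
        self.sizes[name] = 0

    def write(self, name, nbytes):
        # Data is buffered; it has no on-disk representation yet.
        self.delalloc[name] = self.delalloc.get(name, 0) + nbytes

    def flush(self):
        # Push all buffered data out so a snapshot can see it.
        for name, nbytes in self.delalloc.items():
            self.sizes[name] = self.sizes.get(name, 0) + nbytes
        self.delalloc.clear()

    def snapshot(self, flush_first=False):
        if flush_first:
            self.flush()
        return dict(self.sizes)


def writer(fs):
    """A second process that creates a file and writes 5 bytes to it."""
    fs.create("new.txt")
    fs.write("new.txt", 5)


def sync_then_snapshot(fs, racing_writer):
    fs.flush()            # stands in for "sync"
    racing_writer(fs)     # the writer runs inside the window
    return fs.snapshot()  # stands in for "btrfsctl -s snap whatever"


def snapshot_with_flush(fs, racing_writer):
    racing_writer(fs)
    return fs.snapshot(flush_first=True)  # flush happens as part of the snapshot


if __name__ == "__main__":
    print(sync_then_snapshot(ToyFS(), writer))
    print(snapshot_with_flush(ToyFS(), writer))
```

In this model the first call captures new.txt with size 0 and the second
captures it with size 5, because the flush and the snapshot are no longer
two separate steps that a writer can slip between.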

Is there really a use case for the sort of 'lazy' snapshots with
out-of-sync data and metadata (like 0-byte files)?  If so, we should add
another ioctl for a full-blown snapshot so that users who _do_ want a
fully consistent snapshot can get it.

If not, something like the below should be sufficient to make all
snapshots fully consistent...

sage

---
From: Sage Weil <sage@newdream.net>
Date: Fri, 19 Feb 2010 14:13:50 -0800
Subject: [PATCH] Btrfs: flush data on snapshot creation

Flush any delalloc extents when we create a snapshot, so that recently
written file data is always included in the snapshot.

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/btrfs/transaction.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e83d4e1..f5b7029 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1084,13 +1084,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 		mutex_unlock(&root->fs_info->trans_mutex);
 
-		if (flush_on_commit) {
+		if (flush_on_commit || snap_pending) {
 			btrfs_start_delalloc_inodes(root, 1);
 			ret = btrfs_wait_ordered_extents(root, 0, 1);
 			BUG_ON(ret);
-		} else if (snap_pending) {
-			ret = btrfs_wait_ordered_extents(root, 0, 1);
-			BUG_ON(ret);
 		}
 
 		/*
--
1.6.6.1
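
For readers who want to see exactly which case the hunk above changes, the
branch can be mirrored in a small Python sketch.  This is an illustration
only: the function names and the action strings "start_delalloc" and
"wait_ordered" are made up here and merely stand in for the calls to
btrfs_start_delalloc_inodes() and btrfs_wait_ordered_extents() in the
diff.

```python
from itertools import product


def commit_actions_before(flush_on_commit, snap_pending):
    """Mirror of the branch removed by the patch."""
    if flush_on_commit:
        return ["start_delalloc", "wait_ordered"]
    elif snap_pending:
        # Only waits for ordered extents already in flight;
        # delalloc data that was never started is left behind.
        return ["wait_ordered"]
    return []


def commit_actions_after(flush_on_commit, snap_pending):
    """Mirror of the branch added by the patch."""
    if flush_on_commit or snap_pending:
        return ["start_delalloc", "wait_ordered"]
    return []


def changed_cases():
    """Return the (flush_on_commit, snap_pending) pairs whose behavior differs."""
    return [
        (foc, snap)
        for foc, snap in product([False, True], repeat=2)
        if commit_actions_before(foc, snap) != commit_actions_after(foc, snap)
    ]


if __name__ == "__main__":
    for foc, snap in changed_cases():
        print(foc, snap,
              commit_actions_before(foc, snap),
              commit_actions_after(foc, snap))
```

Under this model the only combination that behaves differently is
flush_on_commit off with a snapshot pending: the commit now starts delalloc
writeback before waiting.  Commits without a pending snapshot are
unaffected, which is the point of not turning on 'flushoncommit' globally.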