linux-btrfs.vger.kernel.org archive mirror
From: Sage Weil <sage@newdream.net>
To: Mike Fedyk <mfedyk@mikefedyk.com>
Cc: sander@humilis.net, Josef Bacik <josef@redhat.com>,
	Chris Ball <cjb@laptop.org>,
	Nickolai Zeldovich <nickolai@csail.mit.edu>,
	linux-btrfs@vger.kernel.org
Subject: Re: zero-length files in snapshots
Date: Fri, 19 Feb 2010 14:22:25 -0800 (PST)
Message-ID: <Pine.LNX.4.64.1002191355490.21676@cobra.newdream.net>
In-Reply-To: <93cdabd21002131126v4b4d3c1ei2a747a7a5de7b1c8@mail.gmail.com>

On Sat, 13 Feb 2010, Mike Fedyk wrote:
> On Sat, Feb 13, 2010 at 3:25 AM, Sander <sander@humilis.net> wrote:
> > Mike Fedyk wrote (ao):
> >> On Fri, Feb 12, 2010 at 8:32 AM, Josef Bacik <josef@redhat.com> wrote:
> >> > Creating a file is a metadata operation, and _any_ metadata operation has to be
> >> > committed to disk when the transaction commits in order to maintain a coherent
> >> > fs.  Thanks,
> >>
> >> What I still don't understand though is that the create could have
> >> taken up to 30 seconds to commit and the same for the few bytes of
> >> data, but a few ms later a snapshot was made and the metadata change
> >> was there and the data change was not.  Could it have happened that
> >> the snapshot would not have the newly created file and this was just a
> >> timing issue that should not be relied upon?
> >>
> >> I'm just wondering why that file was there at all.
> >
> > I would say that is because the moment the file got created, the
> > resulting metadata was committed immediately. The data not yet.
> 
> Josef explained it to me on IRC.  Meta-data changes like file creation
> get added to the current transaction and snapshots start a new
> transaction so that is why the empty file is in the snapshot.
> 
> The file is empty because with delayed allocation, the data has not
> hit the filesystem yet and thus has no representation in filesystem
> operations like snapshots.

You can make btrfs include the file data in the snapshot along with the 
metadata with the 'flushoncommit' mount option.  The problem is that this 
will make _all_ btrfs commits more expensive, as they'll block new 
operations during the commit while old data is being flushed out.
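
To check whether a given mount actually has the option set, one could
look at the options field of its /proc/mounts line.  The following is a
minimal sketch only: has_mount_option() is a hypothetical helper (not
part of btrfs or any library), and it assumes the kernel reports
'flushoncommit' in that field when the option is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the comma-separated mount option string `opts`
 * (for example the fourth field of a /proc/mounts line) contains
 * the exact option `name`.  Hypothetical helper for illustration.
 */
bool has_mount_option(const char *opts, const char *name)
{
	size_t name_len;

	assert(opts != NULL && name != NULL);
	name_len = strlen(name);

	while (*opts != '\0') {
		/* Length of the current option, up to the next comma. */
		size_t len = strcspn(opts, ",");

		/* Match whole options only, so "noflushoncommit" is rejected. */
		if (len == name_len && strncmp(opts, name, len) == 0)
			return true;

		opts += len;
		if (*opts == ',')
			opts++;	/* skip the separator */
	}
	return false;
}
```

Called as has_mount_option("rw,relatime,flushoncommit", "flushoncommit"),
this returns true; comparing whole options rather than substrings keeps
a differently named option from matching by accident.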

We could trivially make this happen only when there is a new snapshot, to 
get the behavior you expect (see patch below).  If the goal is to make a 
perfectly consistent snapshot of the file system, this is better than

	sync ; btrfsctl -s snap whatever

because there wouldn't be a window where metadata changes make it into the 
snapshot but file data does not.

Is there really a use case for the sort of 'lazy' snapshots with 
out-of-sync data and metadata (like 0-byte files)?  If so, we should add 
another ioctl for a full-blown snapshot so that users who _do_ want a 
fully consistent snapshot can get it.

If not, something like the below should be sufficient to make all 
snapshots fully consistent...

sage

---

From: Sage Weil <sage@newdream.net>
Date: Fri, 19 Feb 2010 14:13:50 -0800
Subject: [PATCH] Btrfs: flush data on snapshot creation

Flush any delalloc extents when we create a snapshot, so that recently
written file data is always included in the snapshot.

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/btrfs/transaction.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e83d4e1..f5b7029 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1084,13 +1084,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
                mutex_unlock(&root->fs_info->trans_mutex);
 
-               if (flush_on_commit) {
+               if (flush_on_commit || snap_pending) {
                        btrfs_start_delalloc_inodes(root, 1);
                        ret = btrfs_wait_ordered_extents(root, 0, 1);
                        BUG_ON(ret);
-               } else if (snap_pending) {
-                       ret = btrfs_wait_ordered_extents(root, 0, 1);
-                       BUG_ON(ret);
                }
 
                /*
-- 
1.6.6.1
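
The behavioural change in the hunk above can be summarised as a small
truth table.  The following is a userspace model for illustration only,
not kernel code: the enum and both function names are made up here, and
the three outcomes simply mirror which of btrfs_start_delalloc_inodes()
and btrfs_wait_ordered_extents() the commit path calls in each branch of
the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* What the commit path does with pending file data (illustrative). */
enum commit_flush {
	FLUSH_NONE,		/* neither start writeback nor wait */
	FLUSH_WAIT_ONLY,	/* only wait for ordered extents */
	FLUSH_START_AND_WAIT,	/* start delalloc writeback, then wait */
};

/* Branch structure before the patch (the '-' side of the diff). */
enum commit_flush flush_policy_before(bool flush_on_commit, bool snap_pending)
{
	if (flush_on_commit)
		return FLUSH_START_AND_WAIT;
	else if (snap_pending)
		return FLUSH_WAIT_ONLY;
	return FLUSH_NONE;
}

/* Branch structure after the patch (the '+' side of the diff). */
enum commit_flush flush_policy_after(bool flush_on_commit, bool snap_pending)
{
	if (flush_on_commit || snap_pending)
		return FLUSH_START_AND_WAIT;
	return FLUSH_NONE;
}
```

The only input for which the two functions differ is
flush_on_commit == false with snap_pending == true: a pending snapshot
now also starts writeback of delalloc data instead of only waiting on
ordered extents that were already in flight.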

Thread overview: 17+ messages
2010-02-12  1:49 zero-length files in snapshots Nickolai Zeldovich
2010-02-12  3:11 ` Chris Ball
2010-02-12  4:50   ` Mike Fedyk
2010-02-12 15:19     ` Josef Bacik
2010-02-12 16:18       ` Mike Fedyk
2010-02-12 16:22         ` Josef Bacik
2010-02-12 16:27           ` Mike Fedyk
2010-02-12 16:32             ` Josef Bacik
2010-02-12 17:13               ` Mike Fedyk
2010-02-13 11:25                 ` Sander
2010-02-13 19:26                   ` Mike Fedyk
2010-02-19 22:22                     ` Sage Weil [this message]
2010-02-25 18:57                       ` Goffredo Baroncelli
2010-02-12 18:22       ` Ravi Pinjala
2010-02-12 18:45         ` Josef Bacik
2010-02-12 19:03         ` Chris Ball
2010-02-12 19:10       ` Christoph Hellwig
