On Sat, 13 Feb 2010, Mike Fedyk wrote:
> On Sat, Feb 13, 2010 at 3:25 AM, Sander wrote:
> > Mike Fedyk wrote (ao):
> >> On Fri, Feb 12, 2010 at 8:32 AM, Josef Bacik wrote:
> >> > Creating a file is a metadata operation, and _any_ metadata operation has to be
> >> > committed to disk when the transaction commits in order to maintain a coherent
> >> > fs.  Thanks,
> >>
> >> What I still don't understand though is that the create could have
> >> taken up to 30 seconds to commit and the same for the few bytes of
> >> data, but a few ms later a snapshot was made and the metadata change
> >> was there and the data change was not.  Could it have happened that
> >> the snapshot would not have the newly created file and this was just a
> >> timing issue that should not be relied upon?
> >>
> >> I'm just wondering why that file was there at all.
> >
> > I would say that is because the moment the file got created, the
> > resulting metadata was committed immediately. The data not yet.
>
> Josef explained it to me on IRC. Meta-data changes like file creation
> get added to the current transaction and snapshots start a new
> transaction so that is why the empty file is in the snapshot.
>
> The file is empty because with delayed allocation, the data has not
> hit the filesystem yet and thus has no representation in filesystem
> operations like snapshots.

You can make btrfs include the file data in the snapshot along with the
metadata with the 'flushoncommit' mount option.  The problem is that this
will make _all_ btrfs commits more expensive, as they'll block new
operations during the commit while old data is being flushed out.

We could trivially make this happen only when there is a new snapshot, to
get the behavior you expect (see patch below).  If the goal is to make a
perfectly consistent snapshot of the file system, this is better than

	sync ; btrfsctl -s snap whatever

because there wouldn't be a window where metadata changes make it into the
snapshot but file data does not.

Is there really a use case for the sort of 'lazy' snapshots with
out-of-sync data and metadata (like 0-byte files)?  If so, we should add
another ioctl for a full-blown snapshot so that users who _do_ want a
fully consistent snapshot can get it.  If not, something like the below
should be sufficient to make all snapshots fully consistent...

sage

---
From: Sage Weil
Date: Fri, 19 Feb 2010 14:13:50 -0800
Subject: [PATCH] Btrfs: flush data on snapshot creation

Flush any delalloc extents when we create a snapshot, so that recently
written file data is always included in the snapshot.

Signed-off-by: Sage Weil
---
 fs/btrfs/transaction.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e83d4e1..f5b7029 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1084,13 +1084,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	mutex_unlock(&root->fs_info->trans_mutex);
 
-	if (flush_on_commit) {
+	if (flush_on_commit || snap_pending) {
 		btrfs_start_delalloc_inodes(root, 1);
 		ret = btrfs_wait_ordered_extents(root, 0, 1);
 		BUG_ON(ret);
-	} else if (snap_pending) {
-		ret = btrfs_wait_ordered_extents(root, 0, 1);
-		BUG_ON(ret);
 	}
 
 	/*
-- 
1.6.6.1
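
For anyone skimming the diff, here is a rough userspace model of the
branch it changes.  This is only a sketch, not kernel code: the struct
and function names (commit_steps, steps_before_patch, steps_after_patch,
check_snapshot_case) are made up for illustration, and the two booleans
merely stand in for the calls to btrfs_start_delalloc_inodes() and
btrfs_wait_ordered_extents() seen in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical names, not kernel code) of the two steps
 * the commit path may take before committing the transaction.
 */
struct commit_steps {
	bool start_delalloc;	/* stands in for btrfs_start_delalloc_inodes() */
	bool wait_ordered;	/* stands in for btrfs_wait_ordered_extents() */
};

/* Branch as it reads on the '-' side of the diff. */
struct commit_steps steps_before_patch(bool flush_on_commit, bool snap_pending)
{
	struct commit_steps s = { false, false };

	if (flush_on_commit) {
		s.start_delalloc = true;
		s.wait_ordered = true;
	} else if (snap_pending) {
		/* wait only; delalloc writeback is not started here */
		s.wait_ordered = true;
	}
	return s;
}

/* Branch as it reads on the '+' side of the diff. */
struct commit_steps steps_after_patch(bool flush_on_commit, bool snap_pending)
{
	struct commit_steps s = { false, false };

	if (flush_on_commit || snap_pending) {
		s.start_delalloc = true;
		s.wait_ordered = true;
	}
	return s;
}

/* The case from this thread: snapshot pending, no flushoncommit. */
void check_snapshot_case(void)
{
	assert(!steps_before_patch(false, true).start_delalloc);
	assert(steps_after_patch(false, true).start_delalloc);
}
```

The only input where the two functions differ is the one from this
thread: a snapshot is pending and 'flushoncommit' is not set.  Commits
with no pending snapshot behave the same before and after, so the extra
flush cost is only paid when a snapshot is actually being created.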