All of lore.kernel.org
 help / color / mirror / Atom feed
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES
Date: Tue, 12 Nov 2019 15:13:31 +0000	[thread overview]
Message-ID: <20191112151331.3641-1-fdmanana@kernel.org> (raw)

From: Filipe Manana <fdmanana@suse.com>

When using the NO_HOLES feature, if we punch a hole into a file and then
fsync it, there is a case where a subsequent fsync will miss the fact that
a hole was punched:

1) The extent items of the inode span multiple leafs;

2) The hole covers a range that affects only the extent items of the first
   leaf;

3) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC
   is set in the inode's runtime flags).

That results in the hole not existing after replaying the log tree.

For example, if the fs/subvolume tree has the following layout for a
particular inode:

  Leaf N, generation 10:

  [ ... INODE_ITEM INODE_REF EXTENT_ITEM (0 64K) EXTENT_ITEM (64K 128K) ]

  Leaf N + 1, generation 10:

  [ EXTENT_ITEM (128K 64K) ... ]

If at transaction 11 we punch a hole coverting the range [0, 128K[, we end
up dropping the two extent items from leaf N, but we don't touch the other
leaf, so we end up in the following state:

  Leaf N, generation 11:

  [ ... INODE_ITEM INODE_REF ]

  Leaf N + 1, generation 10:

  [ EXTENT_ITEM (128K 64K) ... ]

A full fsync after punching the hole will only process leaf N because it
was modified in the current transaction, but not leaf N + 1, since it was
not modified in the current transaction (generation 10 and not 11). As
a result the fsync will not log any holes, because it didn't process any
leaf with extent items.

So fix this by detecting any leading hole in the file for a full fsync
when using the NO_HOLES feature if we didn't process any extent items for
the file.

A test case for fstests follows soon.

Fixes: 16e7549f045d33 ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8a6cc600bf18..dfc07d23c903 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4807,6 +4807,68 @@ static int btrfs_log_trailing_hole(struct btrfs_trans_handle *trans,
 }
 
 /*
+ * When using the NO_HOLES feature we need special care to make sure we don't
+ * miss a hole that starts at file offset 0. If we have an inode with extent
+ * items spanning multiple leafs and we punch a hole covering only the extents
+ * in the first leaf, then the hole punching operation deletes those extent
+ * items from the leaf without touching the remaining leafs (unless the leaf
+ * becomes too small in which case we may move items from the first leaf to the
+ * second leaf) - this causes a full fsync to not copy any extent items and
+ * therefore not detect and log a hole starting at offset 0. We check for such
+ * scenario here and log the hole.
+ */
+static int btrfs_log_leading_hole(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  struct btrfs_inode *inode,
+				  struct btrfs_path *path)
+{
+	struct btrfs_key key;
+	const u64 ino = btrfs_ino(inode);
+	u64 hole_len;
+	int ret;
+
+	if (!btrfs_fs_incompat(root->fs_info, NO_HOLES))
+		return 0;
+
+	if (i_size_read(&inode->vfs_inode) == 0)
+		return 0;
+
+	key.objectid = ino;
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	key.offset = 0;
+
+	/*
+	 * If there's an extent item at file offset 0 we don't have anything
+	 * to do, just return.
+	 */
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret <= 0)
+		return ret;
+
+	/*
+	 * Figure out the size of the hole. If there is no next extent, just
+	 * exit without doing anything (if the file consists of a single hole,
+	 * has no extents, we log the hole at btrfs_log_trailing_hole()).
+	 */
+	if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+		ret = btrfs_next_leaf(root, path);
+		if (ret < 0)
+			return ret;
+		else if (ret > 0)
+			return 0;
+	}
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	if (key.objectid != ino || key.type != BTRFS_EXTENT_DATA_KEY)
+		return 0;
+
+	hole_len = key.offset;
+	btrfs_release_path(path);
+	ret = btrfs_insert_file_extent(trans, root->log_root, ino, 0, 0, 0,
+				       hole_len, 0, hole_len, 0, 0, 0);
+	return ret;
+}
+
+/*
  * When we are logging a new inode X, check if it doesn't have a reference that
  * matches the reference from some other inode Y created in a past transaction
  * and that was renamed in the current transaction. If we don't do this, then at
@@ -5393,6 +5455,18 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 	if (err)
 		goto out_unlock;
 	xattrs_logged = true;
+	/*
+	 * If we are doing a full fsync and didn't copy any extent items, we
+	 * may need to log a leading hole extent item when using the NO_HOLES
+	 * feature, otherwise we end up not persisting the hole.
+	 */
+	if (!fast_search && last_extent == 0) {
+		btrfs_release_path(path);
+		btrfs_release_path(dst_path);
+		err = btrfs_log_leading_hole(trans, root, inode, path);
+		if (err)
+			goto out_unlock;
+	}
 	if (max_key.type >= BTRFS_EXTENT_DATA_KEY && !fast_search) {
 		btrfs_release_path(path);
 		btrfs_release_path(dst_path);
-- 
2.11.0


             reply	other threads:[~2019-11-12 15:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-12 15:13 fdmanana [this message]
2019-11-12 17:34 ` [PATCH] Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES Josef Bacik
2019-11-12 17:39   ` Filipe Manana
2019-11-12 17:43     ` Josef Bacik
2019-11-14 15:18 ` David Sterba
2019-11-14 15:45   ` Filipe Manana
2019-11-15  9:42     ` David Sterba
2019-11-19 12:07 ` [PATCH v2] " fdmanana
2019-11-19 15:10   ` Josef Bacik
2019-11-25 13:25   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191112151331.3641-1-fdmanana@kernel.org \
    --to=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.