linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES
@ 2020-01-17 14:02 Josef Bacik
  2020-01-17 14:02 ` [PATCH 1/6] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v2->v3:
- Rebased onto misc-next.
- Added a patch to stop using ->leave_spinning in truncate_inode_items.

v1->v2:
- fixed a bug in 'btrfs: use the file extent tree infrastructure' that would
  result in 0 length files because btrfs_truncate_inode_items() was clearing the
  file extent map when we fsync'ed multiple times.  Validated this with a
  modified fsx and generic/521 that reproduced the problem, those modifications
  were sent up as well.
- dropped the RFC

----------------- Original Message -----------------------
We've historically had this problem where you could flush a targeted section of
an inode and end up with a hole between extents without a hole extent item.
This of course makes fsck complain because this is not ok for a file system that
doesn't have NO_HOLES set.  Because this is a well understood problem I and
others have been ignoring fsck failures during certain xfstests (generic/475 for
example) because they would regularly trigger this edge case.

However this isn't a great behavior to have, we should really be taking all fsck
failures seriously, and we could potentially ignore fsck legitimate fsck errors
because we expect it to be this particular failure.

In order to fix this we need to keep track of where we have valid extent items,
and only update i_size to encompass that area.  This unfortunately means we need
a new per-inode extent_io_tree to keep track of the valid ranges.  This is
relatively straightforward in practice, and helpers have been added to manage
this so that in the case of a NO_HOLES file system we just simply skip this work
altogether.

I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
really like Filipe to take a look and I still have some longer running tests
going on the series.  All of our boxes internally are btrfs and the box I was
testing on ended up with a weird RPM db corruption that was likely from an
earlier, broken version of the patch.  However I cannot be 100% sure that was
the case, so I'm giving it a few more days of testing before I'm satisfied
there's not some weird thing that RPM does that xfstests doesn't cover.

This has gone through several iterations of xfstests already, including many
loops of generic/475 for validation to make sure it was no longer failing.  So
far so good, but for something like this wider testing will definitely be
necessary.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
@ 2020-01-17 14:02 ` Josef Bacik
  2020-01-17 14:02 ` [PATCH 2/6] btrfs: don't use path->leave_spinning for truncate Josef Bacik
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Filipe Manana

We were using btrfs_i_size_write(), which unconditionally jacks up
inode->disk_i_size.  However since clone can operate on ranges we could
have pending ordered extents for a range prior to the start of our clone
operation and thus increase disk_i_size too far and have a hole with no
file extent.

Fix this by using the btrfs_ordered_update_i_size helper which will do
the right thing in the face of pending ordered extents outside of our
clone range.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ef6c5d672860..db95144e4f22 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3332,8 +3332,10 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 	 */
 	if (endoff > destoff + olen)
 		endoff = destoff + olen;
-	if (endoff > inode->i_size)
-		btrfs_i_size_write(BTRFS_I(inode), endoff);
+	if (endoff > inode->i_size) {
+		i_size_write(inode, endoff);
+		btrfs_ordered_update_i_size(inode, endoff, NULL);
+	}
 
 	ret = btrfs_update_inode(trans, root, inode);
 	if (ret) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] btrfs: don't use path->leave_spinning for truncate
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
  2020-01-17 14:02 ` [PATCH 1/6] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
@ 2020-01-17 14:02 ` Josef Bacik
  2020-01-17 15:08   ` Filipe Manana
  2020-01-17 14:02 ` [PATCH 3/6] btrfs: introduce the inode->file_extent_tree Josef Bacik
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The only time we actually leave the path spinning is if we're truncating
a small amount and don't actually free an extent, which is not a common
occurrence.  We have to set the path blocking in order to add the
delayed ref anyway, so the first extent we find we set the path to
blocking and stay blocking for the duration of the operation.  With the
upcoming file extent map stuff there will be another case that we have
to have the path blocking, so just swap to blocking always.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 10087e1a5946..4bdd412182ae 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4066,7 +4066,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
-	path->leave_spinning = 1;
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret < 0)
 		goto out;
@@ -4218,7 +4217,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		     root == fs_info->tree_root)) {
 			struct btrfs_ref ref = { 0 };
 
-			btrfs_set_path_blocking(path);
 			bytes_deleted += extent_num_bytes;
 
 			btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] btrfs: introduce the inode->file_extent_tree
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
  2020-01-17 14:02 ` [PATCH 1/6] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
  2020-01-17 14:02 ` [PATCH 2/6] btrfs: don't use path->leave_spinning for truncate Josef Bacik
@ 2020-01-17 14:02 ` Josef Bacik
  2020-01-30 11:17   ` Nikolay Borisov
  2020-02-03 18:11   ` David Sterba
  2020-01-17 14:02 ` [PATCH 4/6] btrfs: use the file extent tree infrastructure Josef Bacik
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Filipe Manana

In order to keep track of where we have file extents on disk, and thus
where it is safe to adjust the i_size to, we need to have a tree in
place to keep track of the contiguous areas we have file extents for.
Add helpers to use this tree, as it's not required for NO_HOLES file
systems.  We will use this by setting DIRTY for areas we know we have
file extent item's set, and clearing it when we remove file extent items
for truncation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/btrfs_inode.h    |  5 +++
 fs/btrfs/ctree.h          |  5 +++
 fs/btrfs/extent-io-tree.h |  1 +
 fs/btrfs/extent_io.c      | 11 +++++
 fs/btrfs/file-item.c      | 91 +++++++++++++++++++++++++++++++++++++++
 fs/btrfs/inode.c          |  6 +++
 6 files changed, 119 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..d9dcbac513ed 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -60,6 +60,11 @@ struct btrfs_inode {
 	 */
 	struct extent_io_tree io_failure_tree;
 
+	/* keeps track of where we have extent items mapped in order to make
+	 * sure our i_size adjustments are accurate.
+	 */
+	struct extent_io_tree file_extent_tree;
+
 	/* held while logging the inode in tree-log.c */
 	struct mutex log_mutex;
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 00cf1641f1b9..8a2c1665baad 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2851,6 +2851,11 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     struct btrfs_file_extent_item *fi,
 				     const bool new_inline,
 				     struct extent_map *em);
+int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
+					u64 len);
+int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
+				      u64 len);
+void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize);
 
 /* inode.c */
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index a3febe746c79..c8bcd2e3184c 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -44,6 +44,7 @@ enum {
 	IO_TREE_TRANS_DIRTY_PAGES,
 	IO_TREE_ROOT_DIRTY_LOG_PAGES,
 	IO_TREE_SELFTEST,
+	IO_TREE_INODE_FILE_EXTENT,
 };
 
 struct extent_io_tree {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0d40cd7427ba..f9b223d827b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -265,6 +265,15 @@ void __cold extent_io_exit(void)
 	bioset_exit(&btrfs_bioset);
 }
 
+/*
+ * For the file_extent_tree, we want to hold the inode lock when we lookup and
+ * update the disk_i_size, but lockdep will complain because our io_tree we hold
+ * the tree lock and get the inode lock when setting delalloc.  These two things
+ * are unrelated, so make a class for the file_extent_tree so we don't get the
+ * two locking patterns mixed up.
+ */
+static struct lock_class_key file_extent_tree_class;
+
 void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 			 struct extent_io_tree *tree, unsigned int owner,
 			 void *private_data)
@@ -276,6 +285,8 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&tree->lock);
 	tree->private_data = private_data;
 	tree->owner = owner;
+	if (owner == IO_TREE_INODE_FILE_EXTENT)
+		lockdep_set_class(&tree->lock, &file_extent_tree_class);
 }
 
 void extent_io_tree_release(struct extent_io_tree *tree)
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index c2f365662d55..e5dc6c4b2f05 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -23,6 +23,97 @@
 #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
 				       PAGE_SIZE))
 
+/**
+ * @inode - the inode we want to update the disk_i_size for
+ * @new_isize - the isize we want to set to, 0 if we use i_size
+ *
+ * With NO_HOLES set this simply sets the disk_is_size to whatever i_size_read()
+ * returns as it is perfectly fine with a file that has holes without hole file
+ * extent items.
+ *
+ * However for !NO_HOLES we need to only return the area that is contiguous from
+ * the 0 offset of the file.  Otherwise we could end up adjust i_size up to an
+ * extent that has a gap in between.
+ *
+ * Finally new_isize should only be set in the case of truncate where we're not
+ * ready to use i_size_read() as the limiter yet.
+ */
+void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u64 start, end, isize;
+	int ret;
+
+	isize = new_isize ? new_isize : i_size_read(inode);
+	if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
+		BTRFS_I(inode)->disk_i_size = isize;
+		return;
+	}
+
+	spin_lock(&BTRFS_I(inode)->lock);
+	ret = find_first_extent_bit(&BTRFS_I(inode)->file_extent_tree, 0,
+				    &start, &end, EXTENT_DIRTY, NULL);
+	if (!ret && start == 0)
+		isize = min(isize, end + 1);
+	else
+		isize = 0;
+	BTRFS_I(inode)->disk_i_size = isize;
+	spin_unlock(&BTRFS_I(inode)->lock);
+}
+
+/**
+ * @inode - the inode we're modifying
+ * @start - the start file offset of the file extent we've inserted
+ * @len - the logical length of the file extent item
+ *
+ * Call when we are insering a new file extent where there was none before.
+ * Does not need to call this in the case where we're replacing an existing file
+ * extent, however if you're unsure it's fine to call this multiple times.
+ *
+ * The start and len must match the file extent item, so thus must be sectorsize
+ * aligned.
+ */
+int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
+				      u64 len)
+{
+	if (len == 0)
+		return 0;
+
+	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize));
+
+	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
+		return 0;
+	return set_extent_bits(&inode->file_extent_tree, start, start + len - 1,
+			       EXTENT_DIRTY);
+}
+
+/**
+ * @inode - the inode we're modifying
+ * @start - the start file offset of the file extent we've inserted
+ * @len - the logical length of the file extent item
+ *
+ * Called when we drop a file extent, for example when we truncate.  Doesn't
+ * need to be called for cases where we're replacing a file extent, like when
+ * we've cow'ed a file extent.
+ *
+ * The start and len must match the file extent item, so thus must be sectorsize
+ * aligned.
+ */
+int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
+					u64 len)
+{
+	if (len == 0)
+		return 0;
+
+	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize) ||
+	       len == (u64)-1);
+
+	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
+		return 0;
+	return clear_extent_bit(&inode->file_extent_tree, start,
+				start + len - 1, EXTENT_DIRTY, 0, 0, NULL);
+}
+
 static inline u32 max_ordered_sum_bytes(struct btrfs_fs_info *fs_info,
 					u16 csum_size)
 {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4bdd412182ae..fd8f821a3919 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3116,6 +3116,9 @@ static int btrfs_read_locked_inode(struct inode *inode,
 	i_uid_write(inode, btrfs_inode_uid(leaf, inode_item));
 	i_gid_write(inode, btrfs_inode_gid(leaf, inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_inode_size(leaf, inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 
 	inode->i_atime.tv_sec = btrfs_timespec_sec(leaf, &inode_item->atime);
 	inode->i_atime.tv_nsec = btrfs_timespec_nsec(leaf, &inode_item->atime);
@@ -8695,6 +8698,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode);
 	extent_io_tree_init(fs_info, &ei->io_failure_tree,
 			    IO_TREE_INODE_IO_FAILURE, inode);
+	extent_io_tree_init(fs_info, &ei->file_extent_tree,
+			    IO_TREE_INODE_FILE_EXTENT, inode);
 	ei->io_tree.track_uptodate = true;
 	ei->io_failure_tree.track_uptodate = true;
 	atomic_set(&ei->sync_writers, 0);
@@ -8761,6 +8766,7 @@ void btrfs_destroy_inode(struct inode *inode)
 	btrfs_qgroup_check_reserved_leak(inode);
 	inode_tree_del(inode);
 	btrfs_drop_extent_cache(BTRFS_I(inode), 0, (u64)-1, 0);
+	btrfs_inode_clear_file_extent_range(BTRFS_I(inode), 0, (u64)-1);
 	btrfs_put_root(BTRFS_I(inode)->root);
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] btrfs: use the file extent tree infrastructure
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (2 preceding siblings ...)
  2020-01-17 14:02 ` [PATCH 3/6] btrfs: introduce the inode->file_extent_tree Josef Bacik
@ 2020-01-17 14:02 ` Josef Bacik
  2020-03-06 11:51   ` Filipe Manana
  2020-01-17 14:02 ` [PATCH 5/6] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Filipe Manana

We want to use this everywhere we modify the file extent items
permanently.  These include

1) Inserting new file extents for writes and prealloc extents.
2) Truncating inode items.
3) btrfs_cont_expand().
4) Insert inline extents.
5) Insert new extents from log replay.
6) Insert a new extent for clone, as it could be past isize.

We do not however call the clear helper for hole punching because it
simply swaps out an existing file extent for a hole, so there's
effectively no change as far as the i_size is concerned.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-inode.c |  4 +++
 fs/btrfs/file.c          |  6 ++++
 fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/tree-log.c      |  5 ++++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index d3e15e1d4a91..8b4dcf4f6b3e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 {
 	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_inode_item *inode_item;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
 	delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
 	if (!delayed_node)
@@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 	i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
 	i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 	inode->i_mode = btrfs_stack_inode_mode(inode_item);
 	set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
 	inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c6f9029e3d49..f72fb38e9aaa 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2482,6 +2482,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+						clone_info->file_offset,
+						clone_len);
+	if (ret)
+		return ret;
+
 	/* If it's a hole, nothing more needs to be done. */
 	if (clone_info->disk_offset == 0)
 		return 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fd8f821a3919..21fb80292256 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -241,6 +241,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	/*
+	 * We align size to sectorsize for inline extents just for simplicity
+	 * sake.
+	 */
+	size = ALIGN(size, root->fs_info->sectorsize);
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
+	if (ret)
+		goto fail;
+
 	/*
 	 * we're an inline extent, so nobody can
 	 * extend the file past i_size without locking
@@ -2375,6 +2384,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
+						ram_bytes);
+	if (ret)
+		goto out;
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
@@ -4084,6 +4098,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	while (1) {
+		u64 clear_start = 0, clear_len = 0;
+
 		fi = NULL;
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
@@ -4134,6 +4150,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
 			u64 num_dec;
+
+			clear_start = found_key.offset;
 			extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
 			if (!del_item) {
 				u64 orig_num_bytes =
@@ -4141,6 +4159,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				extent_num_bytes = ALIGN(new_size -
 						found_key.offset,
 						fs_info->sectorsize);
+				clear_start = ALIGN(new_size,
+						    fs_info->sectorsize);
 				btrfs_set_file_extent_num_bytes(leaf, fi,
 							 extent_num_bytes);
 				num_dec = (orig_num_bytes -
@@ -4166,6 +4186,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						inode_sub_bytes(inode, num_dec);
 				}
 			}
+			clear_len = num_dec;
 		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
 			/*
 			 * we can't truncate inline items that have had
@@ -4187,12 +4208,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 */
 				ret = NEED_TRUNCATE_BLOCK;
 				break;
+			} else {
+				/*
+				 * Inline extents are special, we just treat
+				 * them as a full sector worth in the file
+				 * extent tree just for simplicity sake.
+				 */
+				clear_len = fs_info->sectorsize;
 			}
 
 			if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 				inode_sub_bytes(inode, item_end + 1 - new_size);
 		}
 delete:
+		/*
+		 * We use btrfs_truncate_inode_items() to clean up log trees for
+		 * multiple fsyncs, and in this case we don't want to clear the
+		 * file extent range because it's just the log.
+		 */
+		if (root == BTRFS_I(inode)->root) {
+			ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+								  clear_start,
+								  clear_len);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				break;
+			}
+		}
+
 		if (del_item)
 			last_size = found_key.offset;
 		else
@@ -4513,14 +4556,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 		}
 		last_byte = min(extent_map_end(em), block_end);
 		last_byte = ALIGN(last_byte, fs_info->sectorsize);
+		hole_size = last_byte - cur_offset;
+
 		if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
 			struct extent_map *hole_em;
-			hole_size = last_byte - cur_offset;
 
 			err = maybe_insert_hole(root, inode, cur_offset,
 						hole_size);
 			if (err)
 				break;
+
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
+
 			btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
@@ -4552,6 +4603,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 							hole_size - 1, 0);
 			}
 			free_extent_map(hole_em);
+		} else {
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
 		}
 next:
 		free_extent_map(em);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 80978ebfdcbb..56278a5b69e4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -830,6 +830,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 			goto out;
 	}
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
+						extent_end - start);
+	if (ret)
+		goto out;
+
 	inode_add_bytes(inode, nbytes);
 update_inode:
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] btrfs: replace all uses of btrfs_ordered_update_i_size
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (3 preceding siblings ...)
  2020-01-17 14:02 ` [PATCH 4/6] btrfs: use the file extent tree infrastructure Josef Bacik
@ 2020-01-17 14:02 ` Josef Bacik
  2020-01-17 14:02 ` [PATCH 6/6] btrfs: delete the ordered isize update code Josef Bacik
  2020-02-03 17:59 ` [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES David Sterba
  6 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Filipe Manana

Now that we have a safe way to update the i_size, replace all uses of
btrfs_ordered_update_i_size with btrfs_inode_safe_disk_i_size_write.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c  |  2 +-
 fs/btrfs/inode.c | 12 ++++++------
 fs/btrfs/ioctl.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f72fb38e9aaa..ac838cb50bea 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2937,7 +2937,7 @@ static int btrfs_fallocate_update_isize(struct inode *inode,
 
 	inode->i_ctime = current_time(inode);
 	i_size_write(inode, end);
-	btrfs_ordered_update_i_size(inode, end, NULL);
+	btrfs_inode_safe_disk_i_size_write(inode, 0);
 	ret = btrfs_update_inode(trans, root, inode);
 	ret2 = btrfs_end_transaction(trans);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 21fb80292256..da31571b150b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2479,7 +2479,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		 */
 		btrfs_qgroup_free_data(inode, NULL, start,
 				       ordered_extent->num_bytes);
-		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 		if (freespace_inode)
 			trans = btrfs_join_transaction_spacecache(root);
 		else
@@ -2550,7 +2550,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
-	btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+	btrfs_inode_safe_disk_i_size_write(inode, 0);
 	ret = btrfs_update_inode_fallback(trans, root, inode);
 	if (ret) { /* -ENOMEM or corruption */
 		btrfs_abort_transaction(trans, ret);
@@ -4337,7 +4337,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		ASSERT(last_size >= new_size);
 		if (!ret && last_size > new_size)
 			last_size = new_size;
-		btrfs_ordered_update_i_size(inode, last_size, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, last_size);
 	}
 
 	btrfs_free_path(path);
@@ -4666,7 +4666,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		}
 
 		i_size_write(inode, newsize);
-		btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 		pagecache_isize_extended(inode, oldsize, newsize);
 		ret = btrfs_update_inode(trans, root, inode);
 		btrfs_end_write_no_snapshotting(root);
@@ -8651,7 +8651,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 			ret = PTR_ERR(trans);
 			goto out;
 		}
-		btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 	}
 
 	if (trans) {
@@ -9907,7 +9907,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			else
 				i_size = cur_offset;
 			i_size_write(inode, i_size);
-			btrfs_ordered_update_i_size(inode, i_size, NULL);
+			btrfs_inode_safe_disk_i_size_write(inode, 0);
 		}
 
 		ret = btrfs_update_inode(trans, root, inode);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index db95144e4f22..ecb6b188df15 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3334,7 +3334,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 		endoff = destoff + olen;
 	if (endoff > inode->i_size) {
 		i_size_write(inode, endoff);
-		btrfs_ordered_update_i_size(inode, endoff, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 	}
 
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] btrfs: delete the ordered isize update code
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (4 preceding siblings ...)
  2020-01-17 14:02 ` [PATCH 5/6] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
@ 2020-01-17 14:02 ` Josef Bacik
  2020-02-03 17:59 ` [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES David Sterba
  6 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2020-01-17 14:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Filipe Manana

Now that we have a safe way to update the isize, remove all of this code
as it's no longer needed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ordered-data.c      | 128 -----------------------------------
 fs/btrfs/ordered-data.h      |   7 --
 include/trace/events/btrfs.h |   1 -
 3 files changed, 136 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 64281247bd18..a68b6d745010 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -780,134 +780,6 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
 	return entry;
 }
 
-/*
- * After an extent is done, call this to conditionally update the on disk
- * i_size.  i_size is updated to cover any fully written part of the file.
- */
-int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
-				struct btrfs_ordered_extent *ordered)
-{
-	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
-	u64 disk_i_size;
-	u64 new_i_size;
-	u64 i_size = i_size_read(inode);
-	struct rb_node *node;
-	struct rb_node *prev = NULL;
-	struct btrfs_ordered_extent *test;
-	int ret = 1;
-	u64 orig_offset = offset;
-
-	spin_lock_irq(&tree->lock);
-	if (ordered) {
-		offset = entry_end(ordered);
-		if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags))
-			offset = min(offset,
-				     ordered->file_offset +
-				     ordered->truncated_len);
-	} else {
-		offset = ALIGN(offset, btrfs_inode_sectorsize(inode));
-	}
-	disk_i_size = BTRFS_I(inode)->disk_i_size;
-
-	/*
-	 * truncate file.
-	 * If ordered is not NULL, then this is called from endio and
-	 * disk_i_size will be updated by either truncate itself or any
-	 * in-flight IOs which are inside the disk_i_size.
-	 *
-	 * Because btrfs_setsize() may set i_size with disk_i_size if truncate
-	 * fails somehow, we need to make sure we have a precise disk_i_size by
-	 * updating it as usual.
-	 *
-	 */
-	if (!ordered && disk_i_size > i_size) {
-		BTRFS_I(inode)->disk_i_size = orig_offset;
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * if the disk i_size is already at the inode->i_size, or
-	 * this ordered extent is inside the disk i_size, we're done
-	 */
-	if (disk_i_size == i_size)
-		goto out;
-
-	/*
-	 * We still need to update disk_i_size if outstanding_isize is greater
-	 * than disk_i_size.
-	 */
-	if (offset <= disk_i_size &&
-	    (!ordered || ordered->outstanding_isize <= disk_i_size))
-		goto out;
-
-	/*
-	 * walk backward from this ordered extent to disk_i_size.
-	 * if we find an ordered extent then we can't update disk i_size
-	 * yet
-	 */
-	if (ordered) {
-		node = rb_prev(&ordered->rb_node);
-	} else {
-		prev = tree_search(tree, offset);
-		/*
-		 * we insert file extents without involving ordered struct,
-		 * so there should be no ordered struct cover this offset
-		 */
-		if (prev) {
-			test = rb_entry(prev, struct btrfs_ordered_extent,
-					rb_node);
-			BUG_ON(offset_in_entry(test, offset));
-		}
-		node = prev;
-	}
-	for (; node; node = rb_prev(node)) {
-		test = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-
-		/* We treat this entry as if it doesn't exist */
-		if (test_bit(BTRFS_ORDERED_UPDATED_ISIZE, &test->flags))
-			continue;
-
-		if (entry_end(test) <= disk_i_size)
-			break;
-		if (test->file_offset >= i_size)
-			break;
-
-		/*
-		 * We don't update disk_i_size now, so record this undealt
-		 * i_size. Or we will not know the real i_size.
-		 */
-		if (test->outstanding_isize < offset)
-			test->outstanding_isize = offset;
-		if (ordered &&
-		    ordered->outstanding_isize > test->outstanding_isize)
-			test->outstanding_isize = ordered->outstanding_isize;
-		goto out;
-	}
-	new_i_size = min_t(u64, offset, i_size);
-
-	/*
-	 * Some ordered extents may completed before the current one, and
-	 * we hold the real i_size in ->outstanding_isize.
-	 */
-	if (ordered && ordered->outstanding_isize > new_i_size)
-		new_i_size = min_t(u64, ordered->outstanding_isize, i_size);
-	BTRFS_I(inode)->disk_i_size = new_i_size;
-	ret = 0;
-out:
-	/*
-	 * We need to do this because we can't remove ordered extents until
-	 * after the i_disk_size has been updated and then the inode has been
-	 * updated to reflect the change, so we need to tell anybody who finds
-	 * this ordered extent that we've already done all the real work, we
-	 * just haven't completed all the other work.
-	 */
-	if (ordered)
-		set_bit(BTRFS_ORDERED_UPDATED_ISIZE, &ordered->flags);
-	spin_unlock_irq(&tree->lock);
-	return ret;
-}
-
 /*
  * search the ordered extents for one corresponding to 'offset' and
  * try to find a checksum.  This is used because we allow pages to
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 3beb4da4ab41..a46f319d9ae0 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -52,11 +52,6 @@ enum {
 	BTRFS_ORDERED_DIRECT,
 	/* We had an io error when writing this out */
 	BTRFS_ORDERED_IOERR,
-	/*
-	 * indicates whether this ordered extent has done its due diligence in
-	 * updating the isize
-	 */
-	BTRFS_ORDERED_UPDATED_ISIZE,
 	/* Set when we have to truncate an extent */
 	BTRFS_ORDERED_TRUNCATED,
 	/* Regular IO for COW */
@@ -182,8 +177,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 		struct btrfs_inode *inode,
 		u64 file_offset,
 		u64 len);
-int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
-				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
 			   u8 *sum, int len);
 u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 17088a112ed0..58b8a8107d7b 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -468,7 +468,6 @@ DEFINE_EVENT(
 		{ (1 << BTRFS_ORDERED_PREALLOC), 	"PREALLOC" 	}, \
 		{ (1 << BTRFS_ORDERED_DIRECT),	 	"DIRECT" 	}, \
 		{ (1 << BTRFS_ORDERED_IOERR), 		"IOERR" 	}, \
-		{ (1 << BTRFS_ORDERED_UPDATED_ISIZE), 	"UPDATED_ISIZE"	}, \
 		{ (1 << BTRFS_ORDERED_TRUNCATED), 	"TRUNCATED"	})
 
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] btrfs: don't use path->leave_spinning for truncate
  2020-01-17 14:02 ` [PATCH 2/6] btrfs: don't use path->leave_spinning for truncate Josef Bacik
@ 2020-01-17 15:08   ` Filipe Manana
  0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2020-01-17 15:08 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jan 17, 2020 at 2:03 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> The only time we actually leave the path spinning is if we're truncating
> a small amount and don't actually free an extent, which is not a common
> occurrence.  We have to set the path blocking in order to add the
> delayed ref anyway, so the first extent we find we set the path to
> blocking and stay blocking for the duration of the operation.  With the
> upcoming file extent map stuff there will be another case that we have
> to have the path blocking, so just swap to blocking always.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/inode.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 10087e1a5946..4bdd412182ae 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4066,7 +4066,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                 goto out;
>         }
>
> -       path->leave_spinning = 1;
>         ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>         if (ret < 0)
>                 goto out;
> @@ -4218,7 +4217,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                      root == fs_info->tree_root)) {
>                         struct btrfs_ref ref = { 0 };
>
> -                       btrfs_set_path_blocking(path);
>                         bytes_deleted += extent_num_bytes;
>
>                         btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF,
> --
> 2.24.1
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] btrfs: introduce the inode->file_extent_tree
  2020-01-17 14:02 ` [PATCH 3/6] btrfs: introduce the inode->file_extent_tree Josef Bacik
@ 2020-01-30 11:17   ` Nikolay Borisov
  2020-01-30 11:25     ` Filipe Manana
  2020-02-03 18:11   ` David Sterba
  1 sibling, 1 reply; 16+ messages in thread
From: Nikolay Borisov @ 2020-01-30 11:17 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Filipe Manana



On 17.01.20 г. 16:02 ч., Josef Bacik wrote:
> In order to keep track of where we have file extents on disk, and thus
> where it is safe to adjust the i_size to, we need to have a tree in
> place to keep track of the contiguous areas we have file extents for.
> Add helpers to use this tree, as it's not required for NO_HOLES file
> systems.  We will use this by setting DIRTY for areas we know we have
> file extent item's set, and clearing it when we remove file extent items
> for truncation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/btrfs_inode.h    |  5 +++
>  fs/btrfs/ctree.h          |  5 +++
>  fs/btrfs/extent-io-tree.h |  1 +
>  fs/btrfs/extent_io.c      | 11 +++++
>  fs/btrfs/file-item.c      | 91 +++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/inode.c          |  6 +++
>  6 files changed, 119 insertions(+)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4e12a477d32e..d9dcbac513ed 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -60,6 +60,11 @@ struct btrfs_inode {
>  	 */
>  	struct extent_io_tree io_failure_tree;
>  
> +	/* keeps track of where we have extent items mapped in order to make
> +	 * sure our i_size adjustments are accurate.
> +	 */
> +	struct extent_io_tree file_extent_tree;
> +
>  	/* held while logging the inode in tree-log.c */
>  	struct mutex log_mutex;
>  
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 00cf1641f1b9..8a2c1665baad 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2851,6 +2851,11 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>  				     struct btrfs_file_extent_item *fi,
>  				     const bool new_inline,
>  				     struct extent_map *em);
> +int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
> +					u64 len);
> +int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
> +				      u64 len);
> +void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize);
>  
>  /* inode.c */
>  struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index a3febe746c79..c8bcd2e3184c 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -44,6 +44,7 @@ enum {
>  	IO_TREE_TRANS_DIRTY_PAGES,
>  	IO_TREE_ROOT_DIRTY_LOG_PAGES,
>  	IO_TREE_SELFTEST,
> +	IO_TREE_INODE_FILE_EXTENT,
>  };
>  
>  struct extent_io_tree {
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 0d40cd7427ba..f9b223d827b3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -265,6 +265,15 @@ void __cold extent_io_exit(void)
>  	bioset_exit(&btrfs_bioset);
>  }
>  
> +/*
> + * For the file_extent_tree, we want to hold the inode lock when we lookup and
> + * update the disk_i_size, but lockdep will complain because our io_tree we hold
> + * the tree lock and get the inode lock when setting delalloc.  These two things
> + * are unrelated, so make a class for the file_extent_tree so we don't get the
> + * two locking patterns mixed up.
> + */
> +static struct lock_class_key file_extent_tree_class;
> +
>  void extent_io_tree_init(struct btrfs_fs_info *fs_info,
>  			 struct extent_io_tree *tree, unsigned int owner,
>  			 void *private_data)
> @@ -276,6 +285,8 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
>  	spin_lock_init(&tree->lock);
>  	tree->private_data = private_data;
>  	tree->owner = owner;
> +	if (owner == IO_TREE_INODE_FILE_EXTENT)
> +		lockdep_set_class(&tree->lock, &file_extent_tree_class);
>  }
>  
>  void extent_io_tree_release(struct extent_io_tree *tree)
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index c2f365662d55..e5dc6c4b2f05 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -23,6 +23,97 @@
>  #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
>  				       PAGE_SIZE))
>  
> +/**
> + * @inode - the inode we want to update the disk_i_size for
> + * @new_isize - the isize we want to set to, 0 if we use i_size
> + *
> + * With NO_HOLES set this simply sets the disk_is_size to whatever i_size_read()
> + * returns as it is perfectly fine with a file that has holes without hole file
> + * extent items.
> + *
> + * However for !NO_HOLES we need to only return the area that is contiguous from
> + * the 0 offset of the file.  Otherwise we could end up adjust i_size up to an
> + * extent that has a gap in between.
> + *
> + * Finally new_isize should only be set in the case of truncate where we're not
> + * ready to use i_size_read() as the limiter yet.
> + */
> +void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize)
> +{
> +	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> +	u64 start, end, isize;
> +	int ret;
> +
> +	isize = new_isize ? new_isize : i_size_read(inode);
> +	if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
> +		BTRFS_I(inode)->disk_i_size = isize;
> +		return;
> +	}
> +
> +	spin_lock(&BTRFS_I(inode)->lock);
> +	ret = find_first_extent_bit(&BTRFS_I(inode)->file_extent_tree, 0,
> +				    &start, &end, EXTENT_DIRTY, NULL);
> +	if (!ret && start == 0)
> +		isize = min(isize, end + 1);
> +	else
> +		isize = 0;
> +	BTRFS_I(inode)->disk_i_size = isize;
> +	spin_unlock(&BTRFS_I(inode)->lock);
> +}
> +


What if we have the following layout for an inode:

[----1M-EXTЕNT---][----2M-HOLE--------][------3M-EXTENT-----------]

In this case the idisk size should be 4m whereas with the above code it
will be 1M. Isn't this wrong?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] btrfs: introduce the inode->file_extent_tree
  2020-01-30 11:17   ` Nikolay Borisov
@ 2020-01-30 11:25     ` Filipe Manana
  0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2020-01-30 11:25 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team, Filipe Manana

On Thu, Jan 30, 2020 at 11:19 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 17.01.20 г. 16:02 ч., Josef Bacik wrote:
> > In order to keep track of where we have file extents on disk, and thus
> > where it is safe to adjust the i_size to, we need to have a tree in
> > place to keep track of the contiguous areas we have file extents for.
> > Add helpers to use this tree, as it's not required for NO_HOLES file
> > systems.  We will use this by setting DIRTY for areas we know we have
> > file extent item's set, and clearing it when we remove file extent items
> > for truncation.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > Reviewed-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >  fs/btrfs/btrfs_inode.h    |  5 +++
> >  fs/btrfs/ctree.h          |  5 +++
> >  fs/btrfs/extent-io-tree.h |  1 +
> >  fs/btrfs/extent_io.c      | 11 +++++
> >  fs/btrfs/file-item.c      | 91 +++++++++++++++++++++++++++++++++++++++
> >  fs/btrfs/inode.c          |  6 +++
> >  6 files changed, 119 insertions(+)
> >
> > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> > index 4e12a477d32e..d9dcbac513ed 100644
> > --- a/fs/btrfs/btrfs_inode.h
> > +++ b/fs/btrfs/btrfs_inode.h
> > @@ -60,6 +60,11 @@ struct btrfs_inode {
> >        */
> >       struct extent_io_tree io_failure_tree;
> >
> > +     /* keeps track of where we have extent items mapped in order to make
> > +      * sure our i_size adjustments are accurate.
> > +      */
> > +     struct extent_io_tree file_extent_tree;
> > +
> >       /* held while logging the inode in tree-log.c */
> >       struct mutex log_mutex;
> >
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 00cf1641f1b9..8a2c1665baad 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -2851,6 +2851,11 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
> >                                    struct btrfs_file_extent_item *fi,
> >                                    const bool new_inline,
> >                                    struct extent_map *em);
> > +int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
> > +                                     u64 len);
> > +int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
> > +                                   u64 len);
> > +void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize);
> >
> >  /* inode.c */
> >  struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
> > diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> > index a3febe746c79..c8bcd2e3184c 100644
> > --- a/fs/btrfs/extent-io-tree.h
> > +++ b/fs/btrfs/extent-io-tree.h
> > @@ -44,6 +44,7 @@ enum {
> >       IO_TREE_TRANS_DIRTY_PAGES,
> >       IO_TREE_ROOT_DIRTY_LOG_PAGES,
> >       IO_TREE_SELFTEST,
> > +     IO_TREE_INODE_FILE_EXTENT,
> >  };
> >
> >  struct extent_io_tree {
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 0d40cd7427ba..f9b223d827b3 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -265,6 +265,15 @@ void __cold extent_io_exit(void)
> >       bioset_exit(&btrfs_bioset);
> >  }
> >
> > +/*
> > + * For the file_extent_tree, we want to hold the inode lock when we lookup and
> > + * update the disk_i_size, but lockdep will complain because our io_tree we hold
> > + * the tree lock and get the inode lock when setting delalloc.  These two things
> > + * are unrelated, so make a class for the file_extent_tree so we don't get the
> > + * two locking patterns mixed up.
> > + */
> > +static struct lock_class_key file_extent_tree_class;
> > +
> >  void extent_io_tree_init(struct btrfs_fs_info *fs_info,
> >                        struct extent_io_tree *tree, unsigned int owner,
> >                        void *private_data)
> > @@ -276,6 +285,8 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
> >       spin_lock_init(&tree->lock);
> >       tree->private_data = private_data;
> >       tree->owner = owner;
> > +     if (owner == IO_TREE_INODE_FILE_EXTENT)
> > +             lockdep_set_class(&tree->lock, &file_extent_tree_class);
> >  }
> >
> >  void extent_io_tree_release(struct extent_io_tree *tree)
> > diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> > index c2f365662d55..e5dc6c4b2f05 100644
> > --- a/fs/btrfs/file-item.c
> > +++ b/fs/btrfs/file-item.c
> > @@ -23,6 +23,97 @@
> >  #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
> >                                      PAGE_SIZE))
> >
> > +/**
> > + * @inode - the inode we want to update the disk_i_size for
> > + * @new_isize - the isize we want to set to, 0 if we use i_size
> > + *
> > + * With NO_HOLES set this simply sets the disk_is_size to whatever i_size_read()
> > + * returns as it is perfectly fine with a file that has holes without hole file
> > + * extent items.
> > + *
> > + * However for !NO_HOLES we need to only return the area that is contiguous from
> > + * the 0 offset of the file.  Otherwise we could end up adjust i_size up to an
> > + * extent that has a gap in between.
> > + *
> > + * Finally new_isize should only be set in the case of truncate where we're not
> > + * ready to use i_size_read() as the limiter yet.
> > + */
> > +void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize)
> > +{
> > +     struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> > +     u64 start, end, isize;
> > +     int ret;
> > +
> > +     isize = new_isize ? new_isize : i_size_read(inode);
> > +     if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
> > +             BTRFS_I(inode)->disk_i_size = isize;
> > +             return;
> > +     }
> > +
> > +     spin_lock(&BTRFS_I(inode)->lock);
> > +     ret = find_first_extent_bit(&BTRFS_I(inode)->file_extent_tree, 0,
> > +                                 &start, &end, EXTENT_DIRTY, NULL);
> > +     if (!ret && start == 0)
> > +             isize = min(isize, end + 1);
> > +     else
> > +             isize = 0;
> > +     BTRFS_I(inode)->disk_i_size = isize;
> > +     spin_unlock(&BTRFS_I(inode)->lock);
> > +}
> > +
>
>
> What if we have the following layout for an inode:
>
> [----1M-EXTЕNT---][----2M-HOLE--------][------3M-EXTENT-----------]
>
> In this case the idisk size should be 4m whereas with the above code it
> will be 1M. Isn't this wrong?

No. That's precisely the point - to set it to 1Mb because there's an
implicit hole after it. For explicit holes, it will be 4M (that's done
in another patch in the series).



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES
  2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (5 preceding siblings ...)
  2020-01-17 14:02 ` [PATCH 6/6] btrfs: delete the ordered isize update code Josef Bacik
@ 2020-02-03 17:59 ` David Sterba
  2020-02-03 19:52   ` Filipe Manana
  6 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-02-03 17:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jan 17, 2020 at 09:02:18AM -0500, Josef Bacik wrote:
> v2->v3:
> - Rebased onto misc-next.
> - Added a patch to stop using ->leave_spinning in truncate_inode_items.
> 
> v1->v2:
> - fixed a bug in 'btrfs: use the file extent tree infrastructure' that would
>   result in 0 length files because btrfs_truncate_inode_items() was clearing the
>   file extent map when we fsync'ed multiple times.  Validated this with a
>   modified fsx and generic/521 that reproduced the problem, those modifications
>   were sent up as well.
> - dropped the RFC
> 
> ----------------- Original Message -----------------------
> We've historically had this problem where you could flush a targeted section of
> an inode and end up with a hole between extents without a hole extent item.
> This of course makes fsck complain because this is not ok for a file system that
> doesn't have NO_HOLES set.  Because this is a well understood problem I and
> others have been ignoring fsck failures during certain xfstests (generic/475 for
> example) because they would regularly trigger this edge case.
> 
> However this isn't a great behavior to have, we should really be taking all fsck
> failures seriously, and we could potentially ignore fsck legitimate fsck errors
> because we expect it to be this particular failure.
> 
> In order to fix this we need to keep track of where we have valid extent items,
> and only update i_size to encompass that area.  This unfortunately means we need
> a new per-inode extent_io_tree to keep track of the valid ranges.  This is
> relatively straightforward in practice, and helpers have been added to manage
> this so that in the case of a NO_HOLES file system we just simply skip this work
> altogether.
> 
> I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
> really like Filipe to take a look and I still have some longer running tests
> going on the series.  All of our boxes internally are btrfs and the box I was
> testing on ended up with a weird RPM db corruption that was likely from an
> earlier, broken version of the patch.  However I cannot be 100% sure that was
> the case, so I'm giving it a few more days of testing before I'm satisfied
> there's not some weird thing that RPM does that xfstests doesn't cover.
> 
> This has gone through several iterations of xfstests already, including many
> loops of generic/475 for validation to make sure it was no longer failing.  So
> far so good, but for something like this wider testing will definitely be
> necessary.  Thanks,

I've reviewed the series and will add it to for-next. The i_size
tracking seems to be an important part that we want to merge before
NO_HOLES is default in mkfs. It would be good to steer more focus on
that during testing. If everything goes fine, the mkfs default can
happen in 5.7. Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] btrfs: introduce the inode->file_extent_tree
  2020-01-17 14:02 ` [PATCH 3/6] btrfs: introduce the inode->file_extent_tree Josef Bacik
  2020-01-30 11:17   ` Nikolay Borisov
@ 2020-02-03 18:11   ` David Sterba
  1 sibling, 0 replies; 16+ messages in thread
From: David Sterba @ 2020-02-03 18:11 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Filipe Manana

On Fri, Jan 17, 2020 at 09:02:21AM -0500, Josef Bacik wrote:
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index a3febe746c79..c8bcd2e3184c 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -44,6 +44,7 @@ enum {
>  	IO_TREE_TRANS_DIRTY_PAGES,
>  	IO_TREE_ROOT_DIRTY_LOG_PAGES,
>  	IO_TREE_SELFTEST,
> +	IO_TREE_INODE_FILE_EXTENT,

I've reordered the two and added the missing enum -> string map for the
tracepoints.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES
  2020-02-03 17:59 ` [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES David Sterba
@ 2020-02-03 19:52   ` Filipe Manana
  0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2020-02-03 19:52 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Mon, Feb 3, 2020 at 7:11 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Fri, Jan 17, 2020 at 09:02:18AM -0500, Josef Bacik wrote:
> > v2->v3:
> > - Rebased onto misc-next.
> > - Added a patch to stop using ->leave_spinning in truncate_inode_items.
> >
> > v1->v2:
> > - fixed a bug in 'btrfs: use the file extent tree infrastructure' that would
> >   result in 0 length files because btrfs_truncate_inode_items() was clearing the
> >   file extent map when we fsync'ed multiple times.  Validated this with a
> >   modified fsx and generic/521 that reproduced the problem, those modifications
> >   were sent up as well.
> > - dropped the RFC
> >
> > ----------------- Original Message -----------------------
> > We've historically had this problem where you could flush a targeted section of
> > an inode and end up with a hole between extents without a hole extent item.
> > This of course makes fsck complain because this is not ok for a file system that
> > doesn't have NO_HOLES set.  Because this is a well understood problem I and
> > others have been ignoring fsck failures during certain xfstests (generic/475 for
> > example) because they would regularly trigger this edge case.
> >
> > However this isn't a great behavior to have, we should really be taking all fsck
> > failures seriously, and we could potentially ignore fsck legitimate fsck errors
> > because we expect it to be this particular failure.
> >
> > In order to fix this we need to keep track of where we have valid extent items,
> > and only update i_size to encompass that area.  This unfortunately means we need
> > a new per-inode extent_io_tree to keep track of the valid ranges.  This is
> > relatively straightforward in practice, and helpers have been added to manage
> > this so that in the case of a NO_HOLES file system we just simply skip this work
> > altogether.
> >
> > I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
> > really like Filipe to take a look and I still have some longer running tests
> > going on the series.  All of our boxes internally are btrfs and the box I was
> > testing on ended up with a weird RPM db corruption that was likely from an
> > earlier, broken version of the patch.  However I cannot be 100% sure that was
> > the case, so I'm giving it a few more days of testing before I'm satisfied
> > there's not some weird thing that RPM does that xfstests doesn't cover.
> >
> > This has gone through several iterations of xfstests already, including many
> > loops of generic/475 for validation to make sure it was no longer failing.  So
> > far so good, but for something like this wider testing will definitely be
> > necessary.  Thanks,
>
> I've reviewed the series and will add it to for-next. The i_size
> tracking seems to be an important part that we want to merge before
> NO_HOLES is default in mkfs.

Can you elaborate a bit? I don't understand why you say this is
important in order to make NO_HOLES a default.
This series fixes a problem that only happens when not using the
NO_HOLES feature.

Thanks.

> It would be good to steer more focus on
> that during testing. If everything goes fine, the mkfs default can
> happen in 5.7. Thanks.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] btrfs: use the file extent tree infrastructure
  2020-01-17 14:02 ` [PATCH 4/6] btrfs: use the file extent tree infrastructure Josef Bacik
@ 2020-03-06 11:51   ` Filipe Manana
  2020-03-06 14:52     ` Josef Bacik
  0 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2020-03-06 11:51 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Filipe Manana

On Fri, Jan 17, 2020 at 2:03 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We want to use this everywhere we modify the file extent items
> permanently.  These include
>
> 1) Inserting new file extents for writes and prealloc extents.
> 2) Truncating inode items.
> 3) btrfs_cont_expand().
> 4) Insert inline extents.
> 5) Insert new extents from log replay.
> 6) Insert a new extent for clone, as it could be past isize.
>
> We do not however call the clear helper for hole punching because it
> simply swaps out an existing file extent for a hole, so there's
> effectively no change as far as the i_size is concerned.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/delayed-inode.c |  4 +++
>  fs/btrfs/file.c          |  6 ++++
>  fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/tree-log.c      |  5 ++++
>  4 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>  {
>         struct btrfs_delayed_node *delayed_node;
>         struct btrfs_inode_item *inode_item;
> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>
>         delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>         if (!delayed_node)
> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>         i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>         i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>         btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> +                                         round_up(i_size_read(inode),
> +                                                  fs_info->sectorsize));
>         inode->i_mode = btrfs_stack_inode_mode(inode_item);
>         set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
>         inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index c6f9029e3d49..f72fb38e9aaa 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2482,6 +2482,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                               clone_info->file_offset,
> +                                               clone_len);
> +       if (ret)
> +               return ret;
> +
>         /* If it's a hole, nothing more needs to be done. */
>         if (clone_info->disk_offset == 0)
>                 return 0;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fd8f821a3919..21fb80292256 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -241,6 +241,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       /*
> +        * We align size to sectorsize for inline extents just for simplicity
> +        * sake.
> +        */
> +       size = ALIGN(size, root->fs_info->sectorsize);
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
> +       if (ret)
> +               goto fail;
> +
>         /*
>          * we're an inline extent, so nobody can
>          * extend the file past i_size without locking
> @@ -2375,6 +2384,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
>         ins.offset = disk_num_bytes;
>         ins.type = BTRFS_EXTENT_ITEM_KEY;
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
> +                                               ram_bytes);
> +       if (ret)
> +               goto out;
> +
>         /*
>          * Release the reserved range from inode dirty range map, as it is
>          * already moved into delayed_ref_head
> @@ -4084,6 +4098,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>         }
>
>         while (1) {
> +               u64 clear_start = 0, clear_len = 0;
> +
>                 fi = NULL;
>                 leaf = path->nodes[0];
>                 btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> @@ -4134,6 +4150,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>
>                 if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
>                         u64 num_dec;
> +
> +                       clear_start = found_key.offset;
>                         extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>                         if (!del_item) {
>                                 u64 orig_num_bytes =
> @@ -4141,6 +4159,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                 extent_num_bytes = ALIGN(new_size -
>                                                 found_key.offset,
>                                                 fs_info->sectorsize);
> +                               clear_start = ALIGN(new_size,
> +                                                   fs_info->sectorsize);
>                                 btrfs_set_file_extent_num_bytes(leaf, fi,
>                                                          extent_num_bytes);
>                                 num_dec = (orig_num_bytes -
> @@ -4166,6 +4186,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                                 inode_sub_bytes(inode, num_dec);
>                                 }
>                         }
> +                       clear_len = num_dec;
>                 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>                         /*
>                          * we can't truncate inline items that have had
> @@ -4187,12 +4208,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                  */
>                                 ret = NEED_TRUNCATE_BLOCK;
>                                 break;
> +                       } else {
> +                               /*
> +                                * Inline extents are special, we just treat
> +                                * them as a full sector worth in the file
> +                                * extent tree just for simplicity sake.
> +                                */
> +                               clear_len = fs_info->sectorsize;
>                         }
>
>                         if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>                                 inode_sub_bytes(inode, item_end + 1 - new_size);
>                 }
>  delete:
> +               /*
> +                * We use btrfs_truncate_inode_items() to clean up log trees for
> +                * multiple fsyncs, and in this case we don't want to clear the
> +                * file extent range because it's just the log.
> +                */
> +               if (root == BTRFS_I(inode)->root) {
> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                                                 clear_start,
> +                                                                 clear_len);
> +                       if (ret) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               break;
> +                       }
> +               }
> +
>                 if (del_item)
>                         last_size = found_key.offset;
>                 else
> @@ -4513,14 +4556,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                 }
>                 last_byte = min(extent_map_end(em), block_end);
>                 last_byte = ALIGN(last_byte, fs_info->sectorsize);
> +               hole_size = last_byte - cur_offset;
> +
>                 if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
>                         struct extent_map *hole_em;
> -                       hole_size = last_byte - cur_offset;
>
>                         err = maybe_insert_hole(root, inode, cur_offset,
>                                                 hole_size);
>                         if (err)
>                                 break;
> +
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
> +
>                         btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
>                                                 cur_offset + hole_size - 1, 0);
>                         hole_em = alloc_extent_map();
> @@ -4552,6 +4603,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                                                         hole_size - 1, 0);
>                         }
>                         free_extent_map(hole_em);
> +               } else {
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
>                 }
>  next:
>                 free_extent_map(em);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 80978ebfdcbb..56278a5b69e4 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -830,6 +830,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>                         goto out;
>         }
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
> +                                               extent_end - start);
> +       if (ret)
> +               goto out;

So while working on making ranged fsyncs for the slow patch (inode has
the 'need full sync' flag set), I uncovered more cases where we end up
with missing file extent items.

When doing a fast fsync, we log only the extents in the given range,
but we log an inode item with the current i_size of the inode.
So after a log replay we can get missing file extent items.

For example this test case I made earlier today (also at
https://pastebin.com/ezFFGEf8 for better readability):

#! /bin/bash
# SPDX-License-Identifier: GPL-2.0
# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved.
#
# FSQA Test No. XXX
#
# TODO description
#
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
_cleanup_flakey
cd /
rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/attr
. ./common/filter
. ./common/dmflakey

# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_target flakey
_require_btrfs_fs_feature "no_holes"
_require_btrfs_mkfs_feature "no-holes"
_require_xfs_io_command "sync_range"

rm -f $seqres.full

_scratch_mkfs -O ^no-holes >>$seqres.full 2>&1
_require_metadata_journaling $SCRATCH_DEV
_init_flakey
_mount_flakey

touch $SCRATCH_MNT/foo
# Clear the full sync bit, so that the msync below triggers the fast fsync path.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

# get rid of the log tree
sync

# Dirty some pages and flush only some of them at the beginning without updating
# the log tree.
$XFS_IO_PROG -c "pwrite -S 0xab 0 1M" \
     -c "sync_range -abw 0 256K" \
     $SCRATCH_MNT/foo >>$seqres.full

$XFS_IO_PROG \
        -c "mmap -w 512K 512K"        \
        -c "mwrite -S 0xcd 512K 512K" \
        -c "msync -s 512K 512K"       \
        -c "munmap"                   \
        $SCRATCH_MNT/foo >>$seqres.full

_flakey_drop_and_remount
_unmount_flakey

# On exit fsck is run and reports missing file extent for range from 0 to 512Kb.

status=0
exit

We end up an unmarked hole at 0-512K.
After thinking a bit, I don't see an easy/simple way to fix this.
The only reliable way I can think of so sar is:  after replaying the
extents of an inode, inserting file extent items to represent holes
(only when not using NO_HOLES of course). That would imply scanning
all extents items in the fs/subvolume tree to check for gaps.

Any better idea?

Thanks.




> +
>         inode_add_bytes(inode, nbytes);
>  update_inode:
>         ret = btrfs_update_inode(trans, root, inode);
> --
> 2.24.1
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] btrfs: use the file extent tree infrastructure
  2020-03-06 11:51   ` Filipe Manana
@ 2020-03-06 14:52     ` Josef Bacik
  2020-03-06 15:01       ` Filipe Manana
  0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-03-06 14:52 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team, Filipe Manana

On 3/6/20 6:51 AM, Filipe Manana wrote:
> On Fri, Jan 17, 2020 at 2:03 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> We want to use this everywhere we modify the file extent items
>> permanently.  These include
>>
>> 1) Inserting new file extents for writes and prealloc extents.
>> 2) Truncating inode items.
>> 3) btrfs_cont_expand().
>> 4) Insert inline extents.
>> 5) Insert new extents from log replay.
>> 6) Insert a new extent for clone, as it could be past isize.
>>
>> We do not however call the clear helper for hole punching because it
>> simply swaps out an existing file extent for a hole, so there's
>> effectively no change as far as the i_size is concerned.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> ---
>>   fs/btrfs/delayed-inode.c |  4 +++
>>   fs/btrfs/file.c          |  6 ++++
>>   fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/tree-log.c      |  5 ++++
>>   4 files changed, 73 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index d3e15e1d4a91..8b4dcf4f6b3e 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>>   {
>>          struct btrfs_delayed_node *delayed_node;
>>          struct btrfs_inode_item *inode_item;
>> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>>
>>          delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>>          if (!delayed_node)
>> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>>          i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>>          i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>>          btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
>> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
>> +                                         round_up(i_size_read(inode),
>> +                                                  fs_info->sectorsize));
>>          inode->i_mode = btrfs_stack_inode_mode(inode_item);
>>          set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
>>          inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index c6f9029e3d49..f72fb38e9aaa 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -2482,6 +2482,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
>>          btrfs_mark_buffer_dirty(leaf);
>>          btrfs_release_path(path);
>>
>> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
>> +                                               clone_info->file_offset,
>> +                                               clone_len);
>> +       if (ret)
>> +               return ret;
>> +
>>          /* If it's a hole, nothing more needs to be done. */
>>          if (clone_info->disk_offset == 0)
>>                  return 0;
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index fd8f821a3919..21fb80292256 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -241,6 +241,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
>>          btrfs_mark_buffer_dirty(leaf);
>>          btrfs_release_path(path);
>>
>> +       /*
>> +        * We align size to sectorsize for inline extents just for simplicity
>> +        * sake.
>> +        */
>> +       size = ALIGN(size, root->fs_info->sectorsize);
>> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
>> +       if (ret)
>> +               goto fail;
>> +
>>          /*
>>           * we're an inline extent, so nobody can
>>           * extend the file past i_size without locking
>> @@ -2375,6 +2384,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
>>          ins.offset = disk_num_bytes;
>>          ins.type = BTRFS_EXTENT_ITEM_KEY;
>>
>> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
>> +                                               ram_bytes);
>> +       if (ret)
>> +               goto out;
>> +
>>          /*
>>           * Release the reserved range from inode dirty range map, as it is
>>           * already moved into delayed_ref_head
>> @@ -4084,6 +4098,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>          }
>>
>>          while (1) {
>> +               u64 clear_start = 0, clear_len = 0;
>> +
>>                  fi = NULL;
>>                  leaf = path->nodes[0];
>>                  btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> @@ -4134,6 +4150,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>
>>                  if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
>>                          u64 num_dec;
>> +
>> +                       clear_start = found_key.offset;
>>                          extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>>                          if (!del_item) {
>>                                  u64 orig_num_bytes =
>> @@ -4141,6 +4159,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>                                  extent_num_bytes = ALIGN(new_size -
>>                                                  found_key.offset,
>>                                                  fs_info->sectorsize);
>> +                               clear_start = ALIGN(new_size,
>> +                                                   fs_info->sectorsize);
>>                                  btrfs_set_file_extent_num_bytes(leaf, fi,
>>                                                           extent_num_bytes);
>>                                  num_dec = (orig_num_bytes -
>> @@ -4166,6 +4186,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>                                                  inode_sub_bytes(inode, num_dec);
>>                                  }
>>                          }
>> +                       clear_len = num_dec;
>>                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>>                          /*
>>                           * we can't truncate inline items that have had
>> @@ -4187,12 +4208,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>                                   */
>>                                  ret = NEED_TRUNCATE_BLOCK;
>>                                  break;
>> +                       } else {
>> +                               /*
>> +                                * Inline extents are special, we just treat
>> +                                * them as a full sector worth in the file
>> +                                * extent tree just for simplicity sake.
>> +                                */
>> +                               clear_len = fs_info->sectorsize;
>>                          }
>>
>>                          if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>                                  inode_sub_bytes(inode, item_end + 1 - new_size);
>>                  }
>>   delete:
>> +               /*
>> +                * We use btrfs_truncate_inode_items() to clean up log trees for
>> +                * multiple fsyncs, and in this case we don't want to clear the
>> +                * file extent range because it's just the log.
>> +                */
>> +               if (root == BTRFS_I(inode)->root) {
>> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
>> +                                                                 clear_start,
>> +                                                                 clear_len);
>> +                       if (ret) {
>> +                               btrfs_abort_transaction(trans, ret);
>> +                               break;
>> +                       }
>> +               }
>> +
>>                  if (del_item)
>>                          last_size = found_key.offset;
>>                  else
>> @@ -4513,14 +4556,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>>                  }
>>                  last_byte = min(extent_map_end(em), block_end);
>>                  last_byte = ALIGN(last_byte, fs_info->sectorsize);
>> +               hole_size = last_byte - cur_offset;
>> +
>>                  if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
>>                          struct extent_map *hole_em;
>> -                       hole_size = last_byte - cur_offset;
>>
>>                          err = maybe_insert_hole(root, inode, cur_offset,
>>                                                  hole_size);
>>                          if (err)
>>                                  break;
>> +
>> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
>> +                                                               cur_offset,
>> +                                                               hole_size);
>> +                       if (err)
>> +                               break;
>> +
>>                          btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
>>                                                  cur_offset + hole_size - 1, 0);
>>                          hole_em = alloc_extent_map();
>> @@ -4552,6 +4603,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>>                                                          hole_size - 1, 0);
>>                          }
>>                          free_extent_map(hole_em);
>> +               } else {
>> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
>> +                                                               cur_offset,
>> +                                                               hole_size);
>> +                       if (err)
>> +                               break;
>>                  }
>>   next:
>>                  free_extent_map(em);
>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
>> index 80978ebfdcbb..56278a5b69e4 100644
>> --- a/fs/btrfs/tree-log.c
>> +++ b/fs/btrfs/tree-log.c
>> @@ -830,6 +830,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>>                          goto out;
>>          }
>>
>> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
>> +                                               extent_end - start);
>> +       if (ret)
>> +               goto out;
> 
> So while working on making ranged fsyncs for the slow patch (inode has
> the 'need full sync' flag set), I uncovered more cases where we end up
> with missing file extent items.
> 
> When doing a fast fsync, we log only the extents in the given range,
> but we log an inode item with the current i_size of the inode.
> So after a log replay we can get missing file extent items.
> 

For this case I think we need to not just log that current range, we need to log 
anything that was changed leading up to that offset.  Range fsync is just an 
optimization, in the !NO_HOLES case we need to make sure we're still leaving the 
fs in a consistent state, so if we have any modified extents that lead up to our 
range those need to be logged as well.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] btrfs: use the file extent tree infrastructure
  2020-03-06 14:52     ` Josef Bacik
@ 2020-03-06 15:01       ` Filipe Manana
  0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2020-03-06 15:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Mar 6, 2020 at 2:52 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 3/6/20 6:51 AM, Filipe Manana wrote:
> > On Fri, Jan 17, 2020 at 2:03 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> We want to use this everywhere we modify the file extent items
> >> permanently.  These include
> >>
> >> 1) Inserting new file extents for writes and prealloc extents.
> >> 2) Truncating inode items.
> >> 3) btrfs_cont_expand().
> >> 4) Insert inline extents.
> >> 5) Insert new extents from log replay.
> >> 6) Insert a new extent for clone, as it could be past isize.
> >>
> >> We do not however call the clear helper for hole punching because it
> >> simply swaps out an existing file extent for a hole, so there's
> >> effectively no change as far as the i_size is concerned.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >> ---
> >>   fs/btrfs/delayed-inode.c |  4 +++
> >>   fs/btrfs/file.c          |  6 ++++
> >>   fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
> >>   fs/btrfs/tree-log.c      |  5 ++++
> >>   4 files changed, 73 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> >> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> >> --- a/fs/btrfs/delayed-inode.c
> >> +++ b/fs/btrfs/delayed-inode.c
> >> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
> >>   {
> >>          struct btrfs_delayed_node *delayed_node;
> >>          struct btrfs_inode_item *inode_item;
> >> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> >>
> >>          delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
> >>          if (!delayed_node)
> >> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
> >>          i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
> >>          i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
> >>          btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> >> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> >> +                                         round_up(i_size_read(inode),
> >> +                                                  fs_info->sectorsize));
> >>          inode->i_mode = btrfs_stack_inode_mode(inode_item);
> >>          set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
> >>          inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> >> index c6f9029e3d49..f72fb38e9aaa 100644
> >> --- a/fs/btrfs/file.c
> >> +++ b/fs/btrfs/file.c
> >> @@ -2482,6 +2482,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
> >>          btrfs_mark_buffer_dirty(leaf);
> >>          btrfs_release_path(path);
> >>
> >> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> >> +                                               clone_info->file_offset,
> >> +                                               clone_len);
> >> +       if (ret)
> >> +               return ret;
> >> +
> >>          /* If it's a hole, nothing more needs to be done. */
> >>          if (clone_info->disk_offset == 0)
> >>                  return 0;
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index fd8f821a3919..21fb80292256 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -241,6 +241,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
> >>          btrfs_mark_buffer_dirty(leaf);
> >>          btrfs_release_path(path);
> >>
> >> +       /*
> >> +        * We align size to sectorsize for inline extents just for simplicity
> >> +        * sake.
> >> +        */
> >> +       size = ALIGN(size, root->fs_info->sectorsize);
> >> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
> >> +       if (ret)
> >> +               goto fail;
> >> +
> >>          /*
> >>           * we're an inline extent, so nobody can
> >>           * extend the file past i_size without locking
> >> @@ -2375,6 +2384,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
> >>          ins.offset = disk_num_bytes;
> >>          ins.type = BTRFS_EXTENT_ITEM_KEY;
> >>
> >> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
> >> +                                               ram_bytes);
> >> +       if (ret)
> >> +               goto out;
> >> +
> >>          /*
> >>           * Release the reserved range from inode dirty range map, as it is
> >>           * already moved into delayed_ref_head
> >> @@ -4084,6 +4098,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
> >>          }
> >>
> >>          while (1) {
> >> +               u64 clear_start = 0, clear_len = 0;
> >> +
> >>                  fi = NULL;
> >>                  leaf = path->nodes[0];
> >>                  btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> >> @@ -4134,6 +4150,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
> >>
> >>                  if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
> >>                          u64 num_dec;
> >> +
> >> +                       clear_start = found_key.offset;
> >>                          extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
> >>                          if (!del_item) {
> >>                                  u64 orig_num_bytes =
> >> @@ -4141,6 +4159,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
> >>                                  extent_num_bytes = ALIGN(new_size -
> >>                                                  found_key.offset,
> >>                                                  fs_info->sectorsize);
> >> +                               clear_start = ALIGN(new_size,
> >> +                                                   fs_info->sectorsize);
> >>                                  btrfs_set_file_extent_num_bytes(leaf, fi,
> >>                                                           extent_num_bytes);
> >>                                  num_dec = (orig_num_bytes -
> >> @@ -4166,6 +4186,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
> >>                                                  inode_sub_bytes(inode, num_dec);
> >>                                  }
> >>                          }
> >> +                       clear_len = num_dec;
> >>                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
> >>                          /*
> >>                           * we can't truncate inline items that have had
> >> @@ -4187,12 +4208,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
> >>                                   */
> >>                                  ret = NEED_TRUNCATE_BLOCK;
> >>                                  break;
> >> +                       } else {
> >> +                               /*
> >> +                                * Inline extents are special, we just treat
> >> +                                * them as a full sector worth in the file
> >> +                                * extent tree just for simplicity sake.
> >> +                                */
> >> +                               clear_len = fs_info->sectorsize;
> >>                          }
> >>
> >>                          if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
> >>                                  inode_sub_bytes(inode, item_end + 1 - new_size);
> >>                  }
> >>   delete:
> >> +               /*
> >> +                * We use btrfs_truncate_inode_items() to clean up log trees for
> >> +                * multiple fsyncs, and in this case we don't want to clear the
> >> +                * file extent range because it's just the log.
> >> +                */
> >> +               if (root == BTRFS_I(inode)->root) {
> >> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> >> +                                                                 clear_start,
> >> +                                                                 clear_len);
> >> +                       if (ret) {
> >> +                               btrfs_abort_transaction(trans, ret);
> >> +                               break;
> >> +                       }
> >> +               }
> >> +
> >>                  if (del_item)
> >>                          last_size = found_key.offset;
> >>                  else
> >> @@ -4513,14 +4556,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
> >>                  }
> >>                  last_byte = min(extent_map_end(em), block_end);
> >>                  last_byte = ALIGN(last_byte, fs_info->sectorsize);
> >> +               hole_size = last_byte - cur_offset;
> >> +
> >>                  if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
> >>                          struct extent_map *hole_em;
> >> -                       hole_size = last_byte - cur_offset;
> >>
> >>                          err = maybe_insert_hole(root, inode, cur_offset,
> >>                                                  hole_size);
> >>                          if (err)
> >>                                  break;
> >> +
> >> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> >> +                                                               cur_offset,
> >> +                                                               hole_size);
> >> +                       if (err)
> >> +                               break;
> >> +
> >>                          btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
> >>                                                  cur_offset + hole_size - 1, 0);
> >>                          hole_em = alloc_extent_map();
> >> @@ -4552,6 +4603,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
> >>                                                          hole_size - 1, 0);
> >>                          }
> >>                          free_extent_map(hole_em);
> >> +               } else {
> >> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> >> +                                                               cur_offset,
> >> +                                                               hole_size);
> >> +                       if (err)
> >> +                               break;
> >>                  }
> >>   next:
> >>                  free_extent_map(em);
> >> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> >> index 80978ebfdcbb..56278a5b69e4 100644
> >> --- a/fs/btrfs/tree-log.c
> >> +++ b/fs/btrfs/tree-log.c
> >> @@ -830,6 +830,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
> >>                          goto out;
> >>          }
> >>
> >> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
> >> +                                               extent_end - start);
> >> +       if (ret)
> >> +               goto out;
> >
> > So while working on making ranged fsyncs for the slow patch (inode has
> > the 'need full sync' flag set), I uncovered more cases where we end up
> > with missing file extent items.
> >
> > When doing a fast fsync, we log only the extents in the given range,
> > but we log an inode item with the current i_size of the inode.
> > So after a log replay we can get missing file extent items.
> >
>
> For this case I think we need to not just log that current range, we need to log
> anything that was changed leading up to that offset.  Range fsync is just an
> optimization, in the !NO_HOLES case we need to make sure we're still leaving the
> fs in a consistent state, so if we have any modified extents that lead up to our
> range those need to be logged as well.  Thanks,

Sounds reasonable, I don't any efficient way in mind to do it either.
So for any type of fsync (full or fast) always reset the start offset
to 0 when not using NO_HOLES, and leave the end offset alone.
I'll introduce a patch in my patchset for that and send the test case.

Thanks.

>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-03-06 15:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17 14:02 [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
2020-01-17 14:02 ` [PATCH 1/6] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
2020-01-17 14:02 ` [PATCH 2/6] btrfs: don't use path->leave_spinning for truncate Josef Bacik
2020-01-17 15:08   ` Filipe Manana
2020-01-17 14:02 ` [PATCH 3/6] btrfs: introduce the inode->file_extent_tree Josef Bacik
2020-01-30 11:17   ` Nikolay Borisov
2020-01-30 11:25     ` Filipe Manana
2020-02-03 18:11   ` David Sterba
2020-01-17 14:02 ` [PATCH 4/6] btrfs: use the file extent tree infrastructure Josef Bacik
2020-03-06 11:51   ` Filipe Manana
2020-03-06 14:52     ` Josef Bacik
2020-03-06 15:01       ` Filipe Manana
2020-01-17 14:02 ` [PATCH 5/6] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
2020-01-17 14:02 ` [PATCH 6/6] btrfs: delete the ordered isize update code Josef Bacik
2020-02-03 17:59 ` [PATCH 0/6][v3] btrfs: fix hole corruption issue with !NO_HOLES David Sterba
2020-02-03 19:52   ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).