linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES
@ 2019-12-30 21:31 Josef Bacik
  2019-12-30 21:31 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We've historically had this problem where you could flush a targeted section of
an inode and end up with a hole between extents without a hole extent item.
This of course makes fsck complain because this is not ok for a file system that
doesn't have NO_HOLES set.  Because this is a well understood problem I and
others have been ignoring fsck failures during certain xfstests (generic/475 for
example) because they would regularly trigger this edge case.

However this isn't a great behavior to have, we should really be taking all fsck
failures seriously, and we could potentially ignore fsck legitimate fsck errors
because we expect it to be this particular failure.

In order to fix this we need to keep track of where we have valid extent items,
and only update i_size to encompass that area.  This unfortunately means we need
a new per-inode extent_io_tree to keep track of the valid ranges.  This is
relatively straightforward in practice, and helpers have been added to manage
this so that in the case of a NO_HOLES file system we just simply skip this work
altogether.

I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
really like Filipe to take a look and I still have some longer running tests
going on the series.  All of our boxes internally are btrfs and the box I was
testing on ended up with a weird RPM db corruption that was likely from an
earlier, broken version of the patch.  However I cannot be 100% sure that was
the case, so I'm giving it a few more days of testing before I'm satisfied
there's not some weird thing that RPM does that xfstests doesn't cover.

This has gone through several iterations of xfstests already, including many
loops of generic/475 for validation to make sure it was no longer failing.  So
far so good, but for something like this wider testing will definitely be
necessary.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update
  2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
@ 2019-12-30 21:31 ` Josef Bacik
  2019-12-30 21:31 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We were using btrfs_i_size_write(), which unconditionally jacks up
inode->disk_i_size.  However since clone can operate on ranges we could
have pending ordered extents for a range prior to the start of our clone
operation and thus increase disk_i_size too far and have a hole with no
file extent.

Fix this by using the btrfs_ordered_update_i_size helper which will do
the right thing in the face of pending ordered extents outside of our
clone range.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8ec61f3f0291..291dda3b6547 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3332,8 +3332,10 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 	 */
 	if (endoff > destoff + olen)
 		endoff = destoff + olen;
-	if (endoff > inode->i_size)
-		btrfs_i_size_write(BTRFS_I(inode), endoff);
+	if (endoff > inode->i_size) {
+		i_size_write(inode, endoff);
+		btrfs_ordered_update_i_size(inode, endoff, NULL);
+	}
 
 	ret = btrfs_update_inode(trans, root, inode);
 	if (ret) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
  2019-12-30 21:31 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
@ 2019-12-30 21:31 ` Josef Bacik
  2020-01-06 17:22   ` David Sterba
  2020-01-07 16:46   ` David Sterba
  2019-12-30 21:31 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

In order to keep track of where we have file extents on disk, and thus
where it is safe to adjust the i_size to, we need to have a tree in
place to keep track of the contiguous areas we have file extents for.
Add helpers to use this tree, as it's not required for NO_HOLES file
systems.  We will use this by setting DIRTY for areas we know we have
file extent item's set, and clearing it when we remove file extent items
for truncation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/btrfs_inode.h    |  5 +++
 fs/btrfs/ctree.h          |  5 +++
 fs/btrfs/extent-io-tree.h |  1 +
 fs/btrfs/extent_io.c      | 11 +++++
 fs/btrfs/file-item.c      | 91 +++++++++++++++++++++++++++++++++++++++
 fs/btrfs/inode.c          |  6 +++
 6 files changed, 119 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..d9dcbac513ed 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -60,6 +60,11 @@ struct btrfs_inode {
 	 */
 	struct extent_io_tree io_failure_tree;
 
+	/* keeps track of where we have extent items mapped in order to make
+	 * sure our i_size adjustments are accurate.
+	 */
+	struct extent_io_tree file_extent_tree;
+
 	/* held while logging the inode in tree-log.c */
 	struct mutex log_mutex;
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a01ff3e0ead4..7142124d03c5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2807,6 +2807,11 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     struct btrfs_file_extent_item *fi,
 				     const bool new_inline,
 				     struct extent_map *em);
+int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
+					u64 len);
+int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
+				      u64 len);
+void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize);
 
 /* inode.c */
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index a3febe746c79..c8bcd2e3184c 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -44,6 +44,7 @@ enum {
 	IO_TREE_TRANS_DIRTY_PAGES,
 	IO_TREE_ROOT_DIRTY_LOG_PAGES,
 	IO_TREE_SELFTEST,
+	IO_TREE_INODE_FILE_EXTENT,
 };
 
 struct extent_io_tree {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e374411ed08c..410f5a64d3a6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -265,6 +265,15 @@ void __cold extent_io_exit(void)
 	bioset_exit(&btrfs_bioset);
 }
 
+/*
+ * For the file_extent_tree, we want to hold the inode lock when we lookup and
+ * update the disk_i_size, but lockdep will complain because our io_tree we hold
+ * the tree lock and get the inode lock when setting delalloc.  These two things
+ * are unrelated, so make a class for the file_extent_tree so we don't get the
+ * two locking patterns mixed up.
+ */
+static struct lock_class_key file_extent_tree_class;
+
 void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 			 struct extent_io_tree *tree, unsigned int owner,
 			 void *private_data)
@@ -276,6 +285,8 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&tree->lock);
 	tree->private_data = private_data;
 	tree->owner = owner;
+	if (owner == IO_TREE_INODE_FILE_EXTENT)
+		lockdep_set_class(&tree->lock, &file_extent_tree_class);
 }
 
 void extent_io_tree_release(struct extent_io_tree *tree)
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 1a599f50837b..b733d85510ed 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -23,6 +23,97 @@
 #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
 				       PAGE_SIZE))
 
+/**
+ * @inode - the inode we want to update the disk_i_size for
+ * @new_isize - the isize we want to set to, 0 if we use i_size
+ *
+ * With NO_HOLES set this simply sets the disk_is_size to whatever i_size_read()
+ * returns as it is perfectly fine with a file that has holes without hole file
+ * extent items.
+ *
+ * However for !NO_HOLES we need to only return the area that is contiguous from
+ * the 0 offset of the file.  Otherwise we could end up adjust i_size up to an
+ * extent that has a gap in between.
+ *
+ * Finally new_isize should only be set in the case of truncate where we're not
+ * ready to use i_size_read() as the limiter yet.
+ */
+void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u64 start, end, isize;
+	int ret;
+
+	isize = new_isize ? new_isize : i_size_read(inode);
+	if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
+		BTRFS_I(inode)->disk_i_size = isize;
+		return;
+	}
+
+	spin_lock(&BTRFS_I(inode)->lock);
+	ret = find_first_extent_bit(&BTRFS_I(inode)->file_extent_tree, 0,
+				    &start, &end, EXTENT_DIRTY, NULL);
+	if (!ret && start == 0)
+		isize = min(isize, end + 1);
+	else
+		isize = 0;
+	BTRFS_I(inode)->disk_i_size = isize;
+	spin_unlock(&BTRFS_I(inode)->lock);
+}
+
+/**
+ * @inode - the inode we're modifying
+ * @start - the start file offset of the file extent we've inserted
+ * @len - the logical length of the file extent item
+ *
+ * Call when we are insering a new file extent where there was none before.
+ * Does not need to call this in the case where we're replacing an existing file
+ * extent, however if you're unsure it's fine to call this multiple times.
+ *
+ * The start and len must match the file extent item, so thus must be sectorsize
+ * aligned.
+ */
+int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
+				      u64 len)
+{
+	if (len == 0)
+		return 0;
+
+	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize));
+
+	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
+		return 0;
+	return set_extent_bits(&inode->file_extent_tree, start, start + len - 1,
+			       EXTENT_DIRTY);
+}
+
+/**
+ * @inode - the inode we're modifying
+ * @start - the start file offset of the file extent we've inserted
+ * @len - the logical length of the file extent item
+ *
+ * Called when we drop a file extent, for example when we truncate.  Doesn't
+ * need to be called for cases where we're replacing a file extent, like when
+ * we've cow'ed a file extent.
+ *
+ * The start and len must match the file extent item, so thus must be sectorsize
+ * aligned.
+ */
+int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
+					u64 len)
+{
+	if (len == 0)
+		return 0;
+
+	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize) ||
+	       len == (u64)-1);
+
+	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
+		return 0;
+	return clear_extent_bit(&inode->file_extent_tree, start,
+				start + len - 1, EXTENT_DIRTY, 0, 0, NULL);
+}
+
 static inline u32 max_ordered_sum_bytes(struct btrfs_fs_info *fs_info,
 					u16 csum_size)
 {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db67e1984c91..ab8b972863b1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3784,6 +3784,9 @@ static int btrfs_read_locked_inode(struct inode *inode,
 	i_uid_write(inode, btrfs_inode_uid(leaf, inode_item));
 	i_gid_write(inode, btrfs_inode_gid(leaf, inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_inode_size(leaf, inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 
 	inode->i_atime.tv_sec = btrfs_timespec_sec(leaf, &inode_item->atime);
 	inode->i_atime.tv_nsec = btrfs_timespec_nsec(leaf, &inode_item->atime);
@@ -9363,6 +9366,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode);
 	extent_io_tree_init(fs_info, &ei->io_failure_tree,
 			    IO_TREE_INODE_IO_FAILURE, inode);
+	extent_io_tree_init(fs_info, &ei->file_extent_tree,
+			    IO_TREE_INODE_FILE_EXTENT, inode);
 	ei->io_tree.track_uptodate = true;
 	ei->io_failure_tree.track_uptodate = true;
 	atomic_set(&ei->sync_writers, 0);
@@ -9429,6 +9434,7 @@ void btrfs_destroy_inode(struct inode *inode)
 	btrfs_qgroup_check_reserved_leak(inode);
 	inode_tree_del(inode);
 	btrfs_drop_extent_cache(BTRFS_I(inode), 0, (u64)-1, 0);
+	btrfs_inode_clear_file_extent_range(BTRFS_I(inode), 0, (u64)-1);
 	btrfs_put_root(BTRFS_I(inode)->root);
 }
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
  2019-12-30 21:31 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
  2019-12-30 21:31 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
@ 2019-12-30 21:31 ` Josef Bacik
  2019-12-30 21:31 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We want to use this everywhere we modify the file extent items
permanently.  These include

1) Inserting new file extents for writes and prealloc extents.
2) Truncating inode items.
3) btrfs_cont_expand().
4) Insert inline extents.
5) Insert new extents from log replay.
6) Insert a new extent for clone, as it could be past isize.

We do not however call the clear helper for hole punching because it
simply swaps out an existing file extent for a hole, so there's
effectively no change as far as the i_size is concerned.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-inode.c |  4 ++++
 fs/btrfs/file.c          |  6 +++++
 fs/btrfs/inode.c         | 52 +++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/tree-log.c      |  5 ++++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index d3e15e1d4a91..8b4dcf4f6b3e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 {
 	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_inode_item *inode_item;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
 	delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
 	if (!delayed_node)
@@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 	i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
 	i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 	inode->i_mode = btrfs_stack_inode_mode(inode_item);
 	set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
 	inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fadb892af24..f1c880c06ca2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+						clone_info->file_offset,
+						clone_len);
+	if (ret)
+		return ret;
+
 	/* If it's a hole, nothing more needs to be done. */
 	if (clone_info->disk_offset == 0)
 		return 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ab8b972863b1..eaf81129817d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	/*
+	 * We align size to sectorsize for inline extents just for simplicity
+	 * sake.
+	 */
+	size = ALIGN(size, root->fs_info->sectorsize);
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
+	if (ret)
+		goto fail;
+
 	/*
 	 * we're an inline extent, so nobody can
 	 * extend the file past i_size without locking
@@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
+						ram_bytes);
+	if (ret)
+		goto out;
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
@@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	while (1) {
+		u64 clear_start = 0, clear_len = 0;
+
 		fi = NULL;
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
@@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
 			u64 num_dec;
+
+			clear_start = found_key.offset;
 			extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
 			if (!del_item) {
 				u64 orig_num_bytes =
@@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				extent_num_bytes = ALIGN(new_size -
 						found_key.offset,
 						fs_info->sectorsize);
+				clear_start = ALIGN(new_size,
+						    fs_info->sectorsize);
 				btrfs_set_file_extent_num_bytes(leaf, fi,
 							 extent_num_bytes);
 				num_dec = (orig_num_bytes -
@@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						inode_sub_bytes(inode, num_dec);
 				}
 			}
+			clear_len = num_dec;
 		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
 			/*
 			 * we can't truncate inline items that have had
@@ -4856,12 +4877,27 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 */
 				ret = NEED_TRUNCATE_BLOCK;
 				break;
+			} else {
+				/*
+				 * Inline extents are special, we just treat
+				 * them as a full sector worth in the file
+				 * extent tree just for simplicity sake.
+				 */
+				clear_len = fs_info->sectorsize;
 			}
 
 			if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 				inode_sub_bytes(inode, item_end + 1 - new_size);
 		}
 delete:
+		ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+							  clear_start,
+							  clear_len);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			break;
+		}
+
 		if (del_item)
 			last_size = found_key.offset;
 		else
@@ -5183,14 +5219,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 		}
 		last_byte = min(extent_map_end(em), block_end);
 		last_byte = ALIGN(last_byte, fs_info->sectorsize);
+		hole_size = last_byte - cur_offset;
+
 		if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
 			struct extent_map *hole_em;
-			hole_size = last_byte - cur_offset;
 
 			err = maybe_insert_hole(root, inode, cur_offset,
 						hole_size);
 			if (err)
 				break;
+
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
+
 			btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
@@ -5223,6 +5267,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 							hole_size - 1, 0);
 			}
 			free_extent_map(hole_em);
+		} else {
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
 		}
 next:
 		free_extent_map(em);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 19364940f9a1..ad25974ff936 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 			goto out;
 	}
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
+						extent_end - start);
+	if (ret)
+		goto out;
+
 	inode_add_bytes(inode, nbytes);
 update_inode:
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size
  2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (2 preceding siblings ...)
  2019-12-30 21:31 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
@ 2019-12-30 21:31 ` Josef Bacik
  2019-12-30 21:31 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
  2019-12-31 12:25 ` [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Qu Wenruo
  5 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Now that we have a safe way to update the i_size, replace all uses of
btrfs_ordered_update_i_size with btrfs_inode_safe_disk_i_size_write.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/file.c  |  2 +-
 fs/btrfs/inode.c | 12 ++++++------
 fs/btrfs/ioctl.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f1c880c06ca2..35fdc5b99804 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2941,7 +2941,7 @@ static int btrfs_fallocate_update_isize(struct inode *inode,
 
 	inode->i_ctime = current_time(inode);
 	i_size_write(inode, end);
-	btrfs_ordered_update_i_size(inode, end, NULL);
+	btrfs_inode_safe_disk_i_size_write(inode, 0);
 	ret = btrfs_update_inode(trans, root, inode);
 	ret2 = btrfs_end_transaction(trans);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eaf81129817d..2df54486d0e4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3119,7 +3119,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		 */
 		btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
 				       ordered_extent->len);
-		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 		if (freespace_inode)
 			trans = btrfs_join_transaction_spacecache(root);
 		else
@@ -3207,7 +3207,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
-	btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+	btrfs_inode_safe_disk_i_size_write(inode, 0);
 	ret = btrfs_update_inode_fallback(trans, root, inode);
 	if (ret) { /* -ENOMEM or corruption */
 		btrfs_abort_transaction(trans, ret);
@@ -5000,7 +5000,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		ASSERT(last_size >= new_size);
 		if (!ret && last_size > new_size)
 			last_size = new_size;
-		btrfs_ordered_update_i_size(inode, last_size, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, last_size);
 	}
 
 	btrfs_free_path(path);
@@ -5330,7 +5330,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		}
 
 		i_size_write(inode, newsize);
-		btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 		pagecache_isize_extended(inode, oldsize, newsize);
 		ret = btrfs_update_inode(trans, root, inode);
 		btrfs_end_write_no_snapshotting(root);
@@ -9312,7 +9312,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 			ret = PTR_ERR(trans);
 			goto out;
 		}
-		btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 	}
 
 	if (trans) {
@@ -10571,7 +10571,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			else
 				i_size = cur_offset;
 			i_size_write(inode, i_size);
-			btrfs_ordered_update_i_size(inode, i_size, NULL);
+			btrfs_inode_safe_disk_i_size_write(inode, 0);
 		}
 
 		ret = btrfs_update_inode(trans, root, inode);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 291dda3b6547..2a02a21cac59 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3334,7 +3334,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 		endoff = destoff + olen;
 	if (endoff > inode->i_size) {
 		i_size_write(inode, endoff);
-		btrfs_ordered_update_i_size(inode, endoff, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 	}
 
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] btrfs: delete the ordered isize update code
  2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (3 preceding siblings ...)
  2019-12-30 21:31 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
@ 2019-12-30 21:31 ` Josef Bacik
  2019-12-31 12:25 ` [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Qu Wenruo
  5 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Now that we have a safe way to update the isize, remove all of this code
as it's no longer needed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ordered-data.c      | 128 -----------------------------------
 fs/btrfs/ordered-data.h      |   7 --
 include/trace/events/btrfs.h |   1 -
 3 files changed, 136 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 3a3c648bb9d3..b8de2aea36b3 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -791,134 +791,6 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
 	return entry;
 }
 
-/*
- * After an extent is done, call this to conditionally update the on disk
- * i_size.  i_size is updated to cover any fully written part of the file.
- */
-int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
-				struct btrfs_ordered_extent *ordered)
-{
-	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
-	u64 disk_i_size;
-	u64 new_i_size;
-	u64 i_size = i_size_read(inode);
-	struct rb_node *node;
-	struct rb_node *prev = NULL;
-	struct btrfs_ordered_extent *test;
-	int ret = 1;
-	u64 orig_offset = offset;
-
-	spin_lock_irq(&tree->lock);
-	if (ordered) {
-		offset = entry_end(ordered);
-		if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags))
-			offset = min(offset,
-				     ordered->file_offset +
-				     ordered->truncated_len);
-	} else {
-		offset = ALIGN(offset, btrfs_inode_sectorsize(inode));
-	}
-	disk_i_size = BTRFS_I(inode)->disk_i_size;
-
-	/*
-	 * truncate file.
-	 * If ordered is not NULL, then this is called from endio and
-	 * disk_i_size will be updated by either truncate itself or any
-	 * in-flight IOs which are inside the disk_i_size.
-	 *
-	 * Because btrfs_setsize() may set i_size with disk_i_size if truncate
-	 * fails somehow, we need to make sure we have a precise disk_i_size by
-	 * updating it as usual.
-	 *
-	 */
-	if (!ordered && disk_i_size > i_size) {
-		BTRFS_I(inode)->disk_i_size = orig_offset;
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * if the disk i_size is already at the inode->i_size, or
-	 * this ordered extent is inside the disk i_size, we're done
-	 */
-	if (disk_i_size == i_size)
-		goto out;
-
-	/*
-	 * We still need to update disk_i_size if outstanding_isize is greater
-	 * than disk_i_size.
-	 */
-	if (offset <= disk_i_size &&
-	    (!ordered || ordered->outstanding_isize <= disk_i_size))
-		goto out;
-
-	/*
-	 * walk backward from this ordered extent to disk_i_size.
-	 * if we find an ordered extent then we can't update disk i_size
-	 * yet
-	 */
-	if (ordered) {
-		node = rb_prev(&ordered->rb_node);
-	} else {
-		prev = tree_search(tree, offset);
-		/*
-		 * we insert file extents without involving ordered struct,
-		 * so there should be no ordered struct cover this offset
-		 */
-		if (prev) {
-			test = rb_entry(prev, struct btrfs_ordered_extent,
-					rb_node);
-			BUG_ON(offset_in_entry(test, offset));
-		}
-		node = prev;
-	}
-	for (; node; node = rb_prev(node)) {
-		test = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-
-		/* We treat this entry as if it doesn't exist */
-		if (test_bit(BTRFS_ORDERED_UPDATED_ISIZE, &test->flags))
-			continue;
-
-		if (entry_end(test) <= disk_i_size)
-			break;
-		if (test->file_offset >= i_size)
-			break;
-
-		/*
-		 * We don't update disk_i_size now, so record this undealt
-		 * i_size. Or we will not know the real i_size.
-		 */
-		if (test->outstanding_isize < offset)
-			test->outstanding_isize = offset;
-		if (ordered &&
-		    ordered->outstanding_isize > test->outstanding_isize)
-			test->outstanding_isize = ordered->outstanding_isize;
-		goto out;
-	}
-	new_i_size = min_t(u64, offset, i_size);
-
-	/*
-	 * Some ordered extents may completed before the current one, and
-	 * we hold the real i_size in ->outstanding_isize.
-	 */
-	if (ordered && ordered->outstanding_isize > new_i_size)
-		new_i_size = min_t(u64, ordered->outstanding_isize, i_size);
-	BTRFS_I(inode)->disk_i_size = new_i_size;
-	ret = 0;
-out:
-	/*
-	 * We need to do this because we can't remove ordered extents until
-	 * after the i_disk_size has been updated and then the inode has been
-	 * updated to reflect the change, so we need to tell anybody who finds
-	 * this ordered extent that we've already done all the real work, we
-	 * just haven't completed all the other work.
-	 */
-	if (ordered)
-		set_bit(BTRFS_ORDERED_UPDATED_ISIZE, &ordered->flags);
-	spin_unlock_irq(&tree->lock);
-	return ret;
-}
-
 /*
  * search the ordered extents for one corresponding to 'offset' and
  * try to find a checksum.  This is used because we allow pages to
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 5204171ea962..7f7f9ad091a6 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -52,11 +52,6 @@ enum {
 	BTRFS_ORDERED_DIRECT,
 	/* We had an io error when writing this out */
 	BTRFS_ORDERED_IOERR,
-	/*
-	 * indicates whether this ordered extent has done its due diligence in
-	 * updating the isize
-	 */
-	BTRFS_ORDERED_UPDATED_ISIZE,
 	/* Set when we have to truncate an extent */
 	BTRFS_ORDERED_TRUNCATED,
 	/* Regular IO for COW */
@@ -180,8 +175,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 		struct btrfs_inode *inode,
 		u64 file_offset,
 		u64 len);
-int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
-				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
 			   u8 *sum, int len);
 u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 620bf1b38fba..02ac28f0e99e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -468,7 +468,6 @@ DEFINE_EVENT(
 		{ (1 << BTRFS_ORDERED_PREALLOC), 	"PREALLOC" 	}, \
 		{ (1 << BTRFS_ORDERED_DIRECT),	 	"DIRECT" 	}, \
 		{ (1 << BTRFS_ORDERED_IOERR), 		"IOERR" 	}, \
-		{ (1 << BTRFS_ORDERED_UPDATED_ISIZE), 	"UPDATED_ISIZE"	}, \
 		{ (1 << BTRFS_ORDERED_TRUNCATED), 	"TRUNCATED"	})
 
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES
  2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (4 preceding siblings ...)
  2019-12-30 21:31 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
@ 2019-12-31 12:25 ` Qu Wenruo
  2020-01-02 16:10   ` Josef Bacik
  5 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2019-12-31 12:25 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team


[-- Attachment #1.1: Type: text/plain, Size: 2300 bytes --]



On 2019/12/31 上午5:31, Josef Bacik wrote:
> We've historically had this problem where you could flush a targeted section of
> an inode and end up with a hole between extents without a hole extent item.
> This of course makes fsck complain because this is not ok for a file system that
> doesn't have NO_HOLES set.  Because this is a well understood problem I and
> others have been ignoring fsck failures during certain xfstests (generic/475 for
> example) because they would regularly trigger this edge case.
> 
> However this isn't a great behavior to have, we should really be taking all fsck
> failures seriously, and we could potentially ignore fsck legitimate fsck errors
> because we expect it to be this particular failure.
> 
> In order to fix this we need to keep track of where we have valid extent items,
> and only update i_size to encompass that area.  This unfortunately means we need
> a new per-inode extent_io_tree to keep track of the valid ranges.  This is
> relatively straightforward in practice, and helpers have been added to manage
> this so that in the case of a NO_HOLES file system we just simply skip this work
> altogether.

Not an expert of this problem, but AFAIK this is caused by mixing
buffered and direct IO, right?

Since that deadly mix is not recommended anyway, can we make things
simpler by just block any buffered IO if the same inode is under going
any direct IO?

Thanks,
Qu

> 
> I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
> really like Filipe to take a look and I still have some longer running tests
> going on the series.  All of our boxes internally are btrfs and the box I was
> testing on ended up with a weird RPM db corruption that was likely from an
> earlier, broken version of the patch.  However I cannot be 100% sure that was
> the case, so I'm giving it a few more days of testing before I'm satisfied
> there's not some weird thing that RPM does that xfstests doesn't cover.
> 
> This has gone through several iterations of xfstests already, including many
> loops of generic/475 for validation to make sure it was no longer failing.  So
> far so good, but for something like this wider testing will definitely be
> necessary.  Thanks,
> 
> Josef
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES
  2019-12-31 12:25 ` [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Qu Wenruo
@ 2020-01-02 16:10   ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2020-01-02 16:10 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, kernel-team

On 12/31/19 7:25 AM, Qu Wenruo wrote:
> 
> 
> On 2019/12/31 上午5:31, Josef Bacik wrote:
>> We've historically had this problem where you could flush a targeted section of
>> an inode and end up with a hole between extents without a hole extent item.
>> This of course makes fsck complain because this is not ok for a file system that
>> doesn't have NO_HOLES set.  Because this is a well understood problem I and
>> others have been ignoring fsck failures during certain xfstests (generic/475 for
>> example) because they would regularly trigger this edge case.
>>
>> However this isn't a great behavior to have, we should really be taking all fsck
>> failures seriously, and we could potentially ignore fsck legitimate fsck errors
>> because we expect it to be this particular failure.
>>
>> In order to fix this we need to keep track of where we have valid extent items,
>> and only update i_size to encompass that area.  This unfortunately means we need
>> a new per-inode extent_io_tree to keep track of the valid ranges.  This is
>> relatively straightforward in practice, and helpers have been added to manage
>> this so that in the case of a NO_HOLES file system we just simply skip this work
>> altogether.
> 
> Not an expert of this problem, but AFAIK this is caused by mixing
> buffered and direct IO, right?
> 
> Since that deadly mix is not recommended anyway, can we make things
> simpler by just block any buffered IO if the same inode is under going
> any direct IO?
>

This can happen if you write 100mb and then sync_file_range 1mb in the middle of 
the file, it's not just restricted to O_DIRECT.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2019-12-30 21:31 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
@ 2020-01-06 17:22   ` David Sterba
  2020-01-06 19:29     ` Josef Bacik
  2020-01-07 16:46   ` David Sterba
  1 sibling, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-01-06 17:22 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Dec 30, 2019 at 04:31:15PM -0500, Josef Bacik wrote:
> @@ -60,6 +60,11 @@ struct btrfs_inode {
>  	 */
>  	struct extent_io_tree io_failure_tree;
>  
> +	/* keeps track of where we have extent items mapped in order to make
> +	 * sure our i_size adjustments are accurate.
> +	 */
> +	struct extent_io_tree file_extent_tree;

This is not exactly lightweight and cut to the minimum needed, the size
is 40 bytes and contains struct members that are unused. At least the
file extents tree seems to be in use unlike that io_failure_tree wasting
the bytes almost 100% of time.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2020-01-06 17:22   ` David Sterba
@ 2020-01-06 19:29     ` Josef Bacik
  2020-01-07 16:17       ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2020-01-06 19:29 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 1/6/20 12:22 PM, David Sterba wrote:
> On Mon, Dec 30, 2019 at 04:31:15PM -0500, Josef Bacik wrote:
>> @@ -60,6 +60,11 @@ struct btrfs_inode {
>>   	 */
>>   	struct extent_io_tree io_failure_tree;
>>   
>> +	/* keeps track of where we have extent items mapped in order to make
>> +	 * sure our i_size adjustments are accurate.
>> +	 */
>> +	struct extent_io_tree file_extent_tree;
> 
> This is not exactly lightweight and cut to the minimum needed, the size
> is 40 bytes and contains struct members that are unused. At least the
> file extents tree seems to be in use unlike that io_failure_tree wasting
> the bytes almost 100% of time.
>

I'm not in love with it either, but I don't want to invent some lighter weight 
range thingy that I'm going to end up messing up in other ways.

Don't take this series yet, there's still something fishy going on that I have 
to figure out.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2020-01-06 19:29     ` Josef Bacik
@ 2020-01-07 16:17       ` David Sterba
  2020-01-07 16:45         ` Filipe Manana
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-01-07 16:17 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, linux-btrfs, kernel-team

On Mon, Jan 06, 2020 at 02:29:01PM -0500, Josef Bacik wrote:
> On 1/6/20 12:22 PM, David Sterba wrote:
> > On Mon, Dec 30, 2019 at 04:31:15PM -0500, Josef Bacik wrote:
> >> @@ -60,6 +60,11 @@ struct btrfs_inode {
> >>   	 */
> >>   	struct extent_io_tree io_failure_tree;
> >>   
> >> +	/* keeps track of where we have extent items mapped in order to make
> >> +	 * sure our i_size adjustments are accurate.
> >> +	 */
> >> +	struct extent_io_tree file_extent_tree;
> > 
> > This is not exactly lightweight and cut to the minimum needed, the size
> > is 40 bytes and contains struct members that are unused. At least the
> > file extents tree seems to be in use unlike that io_failure_tree wasting
> > the bytes almost 100% of time.
> >
> 
> I'm not in love with it either, but I don't want to invent some lighter weight 
> range thingy that I'm going to end up messing up in other ways.

Yeah, the extent_io_tree is now being used for generic range tree, some
cleanups could remove the members that were added for the first specific
use (like the dirty_bytes, or track_updates). For correctness of NOHOLES
the inode size increase shouldn't be a blocker but needs to be addressed
later.

> Don't take this series yet, there's still something fishy going on that I have 
> to figure out.  Thanks,

Ok.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2020-01-07 16:17       ` David Sterba
@ 2020-01-07 16:45         ` Filipe Manana
  0 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2020-01-07 16:45 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 4:18 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Jan 06, 2020 at 02:29:01PM -0500, Josef Bacik wrote:
> > On 1/6/20 12:22 PM, David Sterba wrote:
> > > On Mon, Dec 30, 2019 at 04:31:15PM -0500, Josef Bacik wrote:
> > >> @@ -60,6 +60,11 @@ struct btrfs_inode {
> > >>     */
> > >>    struct extent_io_tree io_failure_tree;
> > >>
> > >> +  /* keeps track of where we have extent items mapped in order to make
> > >> +   * sure our i_size adjustments are accurate.
> > >> +   */
> > >> +  struct extent_io_tree file_extent_tree;
> > >
> > > This is not exactly lightweight and cut to the minimum needed, the size
> > > is 40 bytes and contains struct members that are unused. At least the
> > > file extents tree seems to be in use unlike that io_failure_tree wasting
> > > the bytes almost 100% of time.
> > >
> >
> > I'm not in love with it either, but I don't want to invent some lighter weight
> > range thingy that I'm going to end up messing up in other ways.
>
> Yeah, the extent_io_tree is now being used for generic range tree, some
> cleanups could remove the members that were added for the first specific
> use (like the dirty_bytes, or track_updates). For correctness of NOHOLES
> the inode size increase shouldn't be a blocker but needs to be addressed
> later.

Hum?
This isn't about the NO_HOLES case, it's about unordered i_size update
which can often lead to missing extent items for a file - when using
NO_HOLES fsck simply doesn't report any issue, but when not using it,
it does report about missing extents.

>
> > Don't take this series yet, there's still something fishy going on that I have
> > to figure out.  Thanks,
>
> Ok.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2019-12-30 21:31 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
  2020-01-06 17:22   ` David Sterba
@ 2020-01-07 16:46   ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-01-07 16:46 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Dec 30, 2019 at 04:31:15PM -0500, Josef Bacik wrote:
> +int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
> +				      u64 len)
> +{
> +	if (len == 0)
> +		return 0;
> +
> +	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize));
> +
> +	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
> +		return 0;
> +	return set_extent_bits(&inode->file_extent_tree, start, start + len - 1,
> +			       EXTENT_DIRTY);

set_extent_bits and friends do the allocations for range splits so this
is going to decrease performance. For the allocations and tree
traversals.  With enabled no-holes it's not going to be a problem, but
there are still many filesystems without the feature enabled so this
needs some evaluation.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
  2020-01-15 17:12   ` Filipe Manana
@ 2020-01-16 12:46   ` Filipe Manana
  1 sibling, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2020-01-16 12:46 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We want to use this everywhere we modify the file extent items
> permanently.  These include
>
> 1) Inserting new file extents for writes and prealloc extents.
> 2) Truncating inode items.
> 3) btrfs_cont_expand().
> 4) Insert inline extents.
> 5) Insert new extents from log replay.
> 6) Insert a new extent for clone, as it could be past isize.
>
> We do not however call the clear helper for hole punching because it
> simply swaps out an existing file extent for a hole, so there's
> effectively no change as far as the i_size is concerned.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/delayed-inode.c |  4 +++
>  fs/btrfs/file.c          |  6 ++++
>  fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/tree-log.c      |  5 ++++
>  4 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>  {
>         struct btrfs_delayed_node *delayed_node;
>         struct btrfs_inode_item *inode_item;
> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>
>         delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>         if (!delayed_node)
> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>         i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>         i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>         btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> +                                         round_up(i_size_read(inode),
> +                                                  fs_info->sectorsize));
>         inode->i_mode = btrfs_stack_inode_mode(inode_item);
>         set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
>         inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4fadb892af24..f1c880c06ca2 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                               clone_info->file_offset,
> +                                               clone_len);
> +       if (ret)
> +               return ret;
> +
>         /* If it's a hole, nothing more needs to be done. */
>         if (clone_info->disk_offset == 0)
>                 return 0;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ab8b972863b1..5d34007aa7ec 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       /*
> +        * We align size to sectorsize for inline extents just for simplicity
> +        * sake.
> +        */
> +       size = ALIGN(size, root->fs_info->sectorsize);
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
> +       if (ret)
> +               goto fail;
> +
>         /*
>          * we're an inline extent, so nobody can
>          * extend the file past i_size without locking
> @@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
>         ins.offset = disk_num_bytes;
>         ins.type = BTRFS_EXTENT_ITEM_KEY;
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
> +                                               ram_bytes);
> +       if (ret)
> +               goto out;
> +
>         /*
>          * Release the reserved range from inode dirty range map, as it is
>          * already moved into delayed_ref_head
> @@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>         }
>
>         while (1) {
> +               u64 clear_start = 0, clear_len = 0;
> +
>                 fi = NULL;
>                 leaf = path->nodes[0];
>                 btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> @@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>
>                 if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
>                         u64 num_dec;
> +
> +                       clear_start = found_key.offset;
>                         extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>                         if (!del_item) {
>                                 u64 orig_num_bytes =
> @@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                 extent_num_bytes = ALIGN(new_size -
>                                                 found_key.offset,
>                                                 fs_info->sectorsize);
> +                               clear_start = ALIGN(new_size,
> +                                                   fs_info->sectorsize);
>                                 btrfs_set_file_extent_num_bytes(leaf, fi,
>                                                          extent_num_bytes);
>                                 num_dec = (orig_num_bytes -
> @@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                                 inode_sub_bytes(inode, num_dec);
>                                 }
>                         }
> +                       clear_len = num_dec;
>                 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>                         /*
>                          * we can't truncate inline items that have had
> @@ -4856,12 +4877,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                  */
>                                 ret = NEED_TRUNCATE_BLOCK;
>                                 break;
> +                       } else {
> +                               /*
> +                                * Inline extents are special, we just treat
> +                                * them as a full sector worth in the file
> +                                * extent tree just for simplicity sake.
> +                                */
> +                               clear_len = fs_info->sectorsize;
>                         }
>
>                         if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>                                 inode_sub_bytes(inode, item_end + 1 - new_size);
>                 }
>  delete:
> +               /*
> +                * We use btrfs_truncate_inode_items() to clean up log trees for
> +                * multiple fsyncs, and in this case we don't want to clear the
> +                * file extent range because it's just the log.
> +                */
> +               if (root == BTRFS_I(inode)->root) {
> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                                                 clear_start,
> +                                                                 clear_len);

Trying this out today:

[ 3567.156540] BUG: sleeping function called from invalid context at
mm/slab.h:565
[ 3567.156595] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
2344, name: fsstress
[ 3567.156636] 4 locks held by fsstress/2344:
[ 3567.156638]  #0: ffff9b5f82a7c410 (sb_writers#12){.+.+}, at:
mnt_want_write+0x20/0x50
[ 3567.156646]  #1: ffff9b5f6a4df698
(&sb->s_type->i_mutex_key#14){+.+.}, at: do_truncate+0x66/0xc0
[ 3567.156651]  #2: ffff9b5f82a7c610 (sb_internal#2){.+.+}, at:
start_transaction+0x3c7/0x5c0 [btrfs]
[ 3567.156673]  #3: ffff9b5f7a746ae8 (btrfs-fs-00){++++}, at:
btrfs_try_tree_write_lock+0x2f/0x1c0 [btrfs]
[ 3567.156692] Preemption disabled at:
[ 3567.156693] [<0000000000000000>] 0x0
[ 3567.156709] CPU: 1 PID: 2344 Comm: fsstress Tainted: G        W
    5.5.0-rc6-btrfs-next-52 #1
[ 3567.156710] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 3567.156711] Call Trace:
[ 3567.156715]  dump_stack+0x87/0xcb
[ 3567.156718]  ___might_sleep+0x287/0x2f0
[ 3567.156735]  ? alloc_extent_state+0x23/0x1f0 [btrfs]
[ 3567.156737]  slab_pre_alloc_hook+0x64/0x80
[ 3567.156739]  kmem_cache_alloc+0x33/0x300
[ 3567.156755]  alloc_extent_state+0x23/0x1f0 [btrfs]
[ 3567.156770]  __clear_extent_bit+0x2d6/0x6b0 [btrfs]
[ 3567.156788]  clear_extent_bit+0x15/0x20 [btrfs]
[ 3567.156799]  btrfs_inode_clear_file_extent_range+0x61/0x80 [btrfs]
[ 3567.156812]  btrfs_truncate_inode_items+0x944/0x10a0 [btrfs]
[ 3567.156822]  ? do_raw_spin_unlock+0x49/0xc0
[ 3567.156836]  btrfs_setattr+0x30b/0x5b0 [btrfs]
[ 3567.156840]  notify_change+0x306/0x460
[ 3567.156843]  do_truncate+0x75/0xc0
[ 3567.156846]  ? generic_permission+0x1d/0x1e0
[ 3567.156850]  vfs_truncate+0x1b0/0x250
[ 3567.156853]  do_sys_truncate+0x79/0xc0
[ 3567.156855]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 3567.156857]  do_syscall_64+0x5c/0x280
[ 3567.156860]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3567.156861] RIP: 0033:0x7fe6436a1c57

We are holding the path with spin locks... We must set it to blocking
locks before clearing the io tree.

Thanks.


> +                       if (ret) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               break;
> +                       }
> +               }
> +
>                 if (del_item)
>                         last_size = found_key.offset;
>                 else
> @@ -5183,14 +5226,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                 }
>                 last_byte = min(extent_map_end(em), block_end);
>                 last_byte = ALIGN(last_byte, fs_info->sectorsize);
> +               hole_size = last_byte - cur_offset;
> +
>                 if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
>                         struct extent_map *hole_em;
> -                       hole_size = last_byte - cur_offset;
>
>                         err = maybe_insert_hole(root, inode, cur_offset,
>                                                 hole_size);
>                         if (err)
>                                 break;
> +
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
> +
>                         btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
>                                                 cur_offset + hole_size - 1, 0);
>                         hole_em = alloc_extent_map();
> @@ -5223,6 +5274,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                                                         hole_size - 1, 0);
>                         }
>                         free_extent_map(hole_em);
> +               } else {
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
>                 }
>  next:
>                 free_extent_map(em);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 19364940f9a1..ad25974ff936 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>                         goto out;
>         }
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
> +                                               extent_end - start);
> +       if (ret)
> +               goto out;
> +
>         inode_add_bytes(inode, nbytes);
>  update_inode:
>         ret = btrfs_update_inode(trans, root, inode);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-15 17:20     ` Josef Bacik
@ 2020-01-15 17:34       ` Filipe Manana
  0 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2020-01-15 17:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Jan 15, 2020 at 5:20 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 1/15/20 12:12 PM, Filipe Manana wrote:
> > On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> We want to use this everywhere we modify the file extent items
> >> permanently.  These include
> >>
> >> 1) Inserting new file extents for writes and prealloc extents.
> >> 2) Truncating inode items.
> >> 3) btrfs_cont_expand().
> >> 4) Insert inline extents.
> >> 5) Insert new extents from log replay.
> >> 6) Insert a new extent for clone, as it could be past isize.
> >>
> >> We do not however call the clear helper for hole punching because it
> >> simply swaps out an existing file extent for a hole, so there's
> >> effectively no change as far as the i_size is concerned.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >> ---
> >>   fs/btrfs/delayed-inode.c |  4 +++
> >>   fs/btrfs/file.c          |  6 ++++
> >>   fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
> >>   fs/btrfs/tree-log.c      |  5 ++++
> >>   4 files changed, 73 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> >> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> >> --- a/fs/btrfs/delayed-inode.c
> >> +++ b/fs/btrfs/delayed-inode.c
> >> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
> >>   {
> >>          struct btrfs_delayed_node *delayed_node;
> >>          struct btrfs_inode_item *inode_item;
> >> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> >>
> >>          delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
> >>          if (!delayed_node)
> >> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
> >>          i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
> >>          i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
> >>          btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> >> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> >> +                                         round_up(i_size_read(inode),
> >> +                                                  fs_info->sectorsize));
> >
> > Why set it here when we have already done it at btrfs_read_locked_inode()?
> > This seems duplicated.
> >
>
> Because if btrfs_fill_inode() returns true it means we had a delayed inode in
> memory still and we skip the reading of the inode off of disk, so it's an either
> or case, not a duplicate.  Thanks,

Oh, yes, I missed that.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-15 17:12   ` Filipe Manana
@ 2020-01-15 17:20     ` Josef Bacik
  2020-01-15 17:34       ` Filipe Manana
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2020-01-15 17:20 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team

On 1/15/20 12:12 PM, Filipe Manana wrote:
> On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> We want to use this everywhere we modify the file extent items
>> permanently.  These include
>>
>> 1) Inserting new file extents for writes and prealloc extents.
>> 2) Truncating inode items.
>> 3) btrfs_cont_expand().
>> 4) Insert inline extents.
>> 5) Insert new extents from log replay.
>> 6) Insert a new extent for clone, as it could be past isize.
>>
>> We do not however call the clear helper for hole punching because it
>> simply swaps out an existing file extent for a hole, so there's
>> effectively no change as far as the i_size is concerned.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/delayed-inode.c |  4 +++
>>   fs/btrfs/file.c          |  6 ++++
>>   fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/tree-log.c      |  5 ++++
>>   4 files changed, 73 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index d3e15e1d4a91..8b4dcf4f6b3e 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>>   {
>>          struct btrfs_delayed_node *delayed_node;
>>          struct btrfs_inode_item *inode_item;
>> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>>
>>          delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>>          if (!delayed_node)
>> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>>          i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>>          i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>>          btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
>> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
>> +                                         round_up(i_size_read(inode),
>> +                                                  fs_info->sectorsize));
> 
> Why set it here when we have already done it at btrfs_read_locked_inode()?
> This seems duplicated.
> 

Because if btrfs_fill_inode() returns true it means we had a delayed inode in 
memory still and we skip the reading of the inode off of disk, so it's an either 
or case, not a duplicate.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
@ 2020-01-15 17:12   ` Filipe Manana
  2020-01-15 17:20     ` Josef Bacik
  2020-01-16 12:46   ` Filipe Manana
  1 sibling, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2020-01-15 17:12 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We want to use this everywhere we modify the file extent items
> permanently.  These include
>
> 1) Inserting new file extents for writes and prealloc extents.
> 2) Truncating inode items.
> 3) btrfs_cont_expand().
> 4) Insert inline extents.
> 5) Insert new extents from log replay.
> 6) Insert a new extent for clone, as it could be past isize.
>
> We do not however call the clear helper for hole punching because it
> simply swaps out an existing file extent for a hole, so there's
> effectively no change as far as the i_size is concerned.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/delayed-inode.c |  4 +++
>  fs/btrfs/file.c          |  6 ++++
>  fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/tree-log.c      |  5 ++++
>  4 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>  {
>         struct btrfs_delayed_node *delayed_node;
>         struct btrfs_inode_item *inode_item;
> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>
>         delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>         if (!delayed_node)
> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>         i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>         i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>         btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> +                                         round_up(i_size_read(inode),
> +                                                  fs_info->sectorsize));

Why set it here when we have already done it at btrfs_read_locked_inode()?
This seems duplicated.

Thanks.

>         inode->i_mode = btrfs_stack_inode_mode(inode_item);
>         set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
>         inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4fadb892af24..f1c880c06ca2 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                               clone_info->file_offset,
> +                                               clone_len);
> +       if (ret)
> +               return ret;
> +
>         /* If it's a hole, nothing more needs to be done. */
>         if (clone_info->disk_offset == 0)
>                 return 0;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ab8b972863b1..5d34007aa7ec 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       /*
> +        * We align size to sectorsize for inline extents just for simplicity
> +        * sake.
> +        */
> +       size = ALIGN(size, root->fs_info->sectorsize);
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
> +       if (ret)
> +               goto fail;
> +
>         /*
>          * we're an inline extent, so nobody can
>          * extend the file past i_size without locking
> @@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
>         ins.offset = disk_num_bytes;
>         ins.type = BTRFS_EXTENT_ITEM_KEY;
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
> +                                               ram_bytes);
> +       if (ret)
> +               goto out;
> +
>         /*
>          * Release the reserved range from inode dirty range map, as it is
>          * already moved into delayed_ref_head
> @@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>         }
>
>         while (1) {
> +               u64 clear_start = 0, clear_len = 0;
> +
>                 fi = NULL;
>                 leaf = path->nodes[0];
>                 btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> @@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>
>                 if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
>                         u64 num_dec;
> +
> +                       clear_start = found_key.offset;
>                         extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>                         if (!del_item) {
>                                 u64 orig_num_bytes =
> @@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                 extent_num_bytes = ALIGN(new_size -
>                                                 found_key.offset,
>                                                 fs_info->sectorsize);
> +                               clear_start = ALIGN(new_size,
> +                                                   fs_info->sectorsize);
>                                 btrfs_set_file_extent_num_bytes(leaf, fi,
>                                                          extent_num_bytes);
>                                 num_dec = (orig_num_bytes -
> @@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                                 inode_sub_bytes(inode, num_dec);
>                                 }
>                         }
> +                       clear_len = num_dec;
>                 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>                         /*
>                          * we can't truncate inline items that have had
> @@ -4856,12 +4877,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                  */
>                                 ret = NEED_TRUNCATE_BLOCK;
>                                 break;
> +                       } else {
> +                               /*
> +                                * Inline extents are special, we just treat
> +                                * them as a full sector worth in the file
> +                                * extent tree just for simplicity sake.
> +                                */
> +                               clear_len = fs_info->sectorsize;
>                         }
>
>                         if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>                                 inode_sub_bytes(inode, item_end + 1 - new_size);
>                 }
>  delete:
> +               /*
> +                * We use btrfs_truncate_inode_items() to clean up log trees for
> +                * multiple fsyncs, and in this case we don't want to clear the
> +                * file extent range because it's just the log.
> +                */
> +               if (root == BTRFS_I(inode)->root) {
> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                                                 clear_start,
> +                                                                 clear_len);
> +                       if (ret) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               break;
> +                       }
> +               }
> +
>                 if (del_item)
>                         last_size = found_key.offset;
>                 else
> @@ -5183,14 +5226,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                 }
>                 last_byte = min(extent_map_end(em), block_end);
>                 last_byte = ALIGN(last_byte, fs_info->sectorsize);
> +               hole_size = last_byte - cur_offset;
> +
>                 if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
>                         struct extent_map *hole_em;
> -                       hole_size = last_byte - cur_offset;
>
>                         err = maybe_insert_hole(root, inode, cur_offset,
>                                                 hole_size);
>                         if (err)
>                                 break;
> +
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
> +
>                         btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
>                                                 cur_offset + hole_size - 1, 0);
>                         hole_em = alloc_extent_map();
> @@ -5223,6 +5274,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                                                         hole_size - 1, 0);
>                         }
>                         free_extent_map(hole_em);
> +               } else {
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
>                 }
>  next:
>                 free_extent_map(em);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 19364940f9a1..ad25974ff936 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>                         goto out;
>         }
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
> +                                               extent_end - start);
> +       if (ret)
> +               goto out;
> +
>         inode_add_bytes(inode, nbytes);
>  update_inode:
>         ret = btrfs_update_inode(trans, root, inode);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-07 19:42 [PATCH 0/5][v2] " Josef Bacik
@ 2020-01-07 19:42 ` Josef Bacik
  2020-01-15 17:12   ` Filipe Manana
  2020-01-16 12:46   ` Filipe Manana
  0 siblings, 2 replies; 18+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We want to use this everywhere we modify the file extent items
permanently.  These include

1) Inserting new file extents for writes and prealloc extents.
2) Truncating inode items.
3) btrfs_cont_expand().
4) Insert inline extents.
5) Insert new extents from log replay.
6) Insert a new extent for clone, as it could be past isize.

We do not however call the clear helper for hole punching because it
simply swaps out an existing file extent for a hole, so there's
effectively no change as far as the i_size is concerned.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-inode.c |  4 +++
 fs/btrfs/file.c          |  6 ++++
 fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/tree-log.c      |  5 ++++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index d3e15e1d4a91..8b4dcf4f6b3e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 {
 	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_inode_item *inode_item;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
 	delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
 	if (!delayed_node)
@@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 	i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
 	i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 	inode->i_mode = btrfs_stack_inode_mode(inode_item);
 	set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
 	inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fadb892af24..f1c880c06ca2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+						clone_info->file_offset,
+						clone_len);
+	if (ret)
+		return ret;
+
 	/* If it's a hole, nothing more needs to be done. */
 	if (clone_info->disk_offset == 0)
 		return 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ab8b972863b1..5d34007aa7ec 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	/*
+	 * We align size to sectorsize for inline extents just for simplicity
+	 * sake.
+	 */
+	size = ALIGN(size, root->fs_info->sectorsize);
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
+	if (ret)
+		goto fail;
+
 	/*
 	 * we're an inline extent, so nobody can
 	 * extend the file past i_size without locking
@@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
+						ram_bytes);
+	if (ret)
+		goto out;
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
@@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	while (1) {
+		u64 clear_start = 0, clear_len = 0;
+
 		fi = NULL;
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
@@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
 			u64 num_dec;
+
+			clear_start = found_key.offset;
 			extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
 			if (!del_item) {
 				u64 orig_num_bytes =
@@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				extent_num_bytes = ALIGN(new_size -
 						found_key.offset,
 						fs_info->sectorsize);
+				clear_start = ALIGN(new_size,
+						    fs_info->sectorsize);
 				btrfs_set_file_extent_num_bytes(leaf, fi,
 							 extent_num_bytes);
 				num_dec = (orig_num_bytes -
@@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						inode_sub_bytes(inode, num_dec);
 				}
 			}
+			clear_len = num_dec;
 		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
 			/*
 			 * we can't truncate inline items that have had
@@ -4856,12 +4877,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 */
 				ret = NEED_TRUNCATE_BLOCK;
 				break;
+			} else {
+				/*
+				 * Inline extents are special, we just treat
+				 * them as a full sector worth in the file
+				 * extent tree just for simplicity sake.
+				 */
+				clear_len = fs_info->sectorsize;
 			}
 
 			if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 				inode_sub_bytes(inode, item_end + 1 - new_size);
 		}
 delete:
+		/*
+		 * We use btrfs_truncate_inode_items() to clean up log trees for
+		 * multiple fsyncs, and in this case we don't want to clear the
+		 * file extent range because it's just the log.
+		 */
+		if (root == BTRFS_I(inode)->root) {
+			ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+								  clear_start,
+								  clear_len);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				break;
+			}
+		}
+
 		if (del_item)
 			last_size = found_key.offset;
 		else
@@ -5183,14 +5226,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 		}
 		last_byte = min(extent_map_end(em), block_end);
 		last_byte = ALIGN(last_byte, fs_info->sectorsize);
+		hole_size = last_byte - cur_offset;
+
 		if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
 			struct extent_map *hole_em;
-			hole_size = last_byte - cur_offset;
 
 			err = maybe_insert_hole(root, inode, cur_offset,
 						hole_size);
 			if (err)
 				break;
+
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
+
 			btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
@@ -5223,6 +5274,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 							hole_size - 1, 0);
 			}
 			free_extent_map(hole_em);
+		} else {
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
 		}
 next:
 		free_extent_map(em);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 19364940f9a1..ad25974ff936 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 			goto out;
 	}
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
+						extent_end - start);
+	if (ret)
+		goto out;
+
 	inode_add_bytes(inode, nbytes);
 update_inode:
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-01-16 12:46 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-30 21:31 [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
2019-12-30 21:31 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
2019-12-30 21:31 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
2020-01-06 17:22   ` David Sterba
2020-01-06 19:29     ` Josef Bacik
2020-01-07 16:17       ` David Sterba
2020-01-07 16:45         ` Filipe Manana
2020-01-07 16:46   ` David Sterba
2019-12-30 21:31 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
2019-12-30 21:31 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
2019-12-30 21:31 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
2019-12-31 12:25 ` [RFC][PATCH 0/5] btrfs: fix hole corruption issue with !NO_HOLES Qu Wenruo
2020-01-02 16:10   ` Josef Bacik
2020-01-07 19:42 [PATCH 0/5][v2] " Josef Bacik
2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
2020-01-15 17:12   ` Filipe Manana
2020-01-15 17:20     ` Josef Bacik
2020-01-15 17:34       ` Filipe Manana
2020-01-16 12:46   ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).