Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES
@ 2020-01-07 19:42 Josef Bacik
  2020-01-07 19:42 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- fixed a bug in 'btrfs: use the file extent tree infrastructure' that would
  result in 0 length files because btrfs_truncate_inode_items() was clearing the
  file extent map when we fsync'ed multiple times.  Validated this with a
  modified fsx and generic/521 that reproduced the problem, those modifications
  were sent up as well.
- dropped the RFC

----------------- Original Message -----------------------
We've historically had this problem where you could flush a targeted section of
an inode and end up with a hole between extents without a hole extent item.
This of course makes fsck complain because this is not ok for a file system that
doesn't have NO_HOLES set.  Because this is a well understood problem I and
others have been ignoring fsck failures during certain xfstests (generic/475 for
example) because they would regularly trigger this edge case.

However this isn't a great behavior to have, we should really be taking all fsck
failures seriously, and we could potentially ignore fsck legitimate fsck errors
because we expect it to be this particular failure.

In order to fix this we need to keep track of where we have valid extent items,
and only update i_size to encompass that area.  This unfortunately means we need
a new per-inode extent_io_tree to keep track of the valid ranges.  This is
relatively straightforward in practice, and helpers have been added to manage
this so that in the case of a NO_HOLES file system we just simply skip this work
altogether.

I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
really like Filipe to take a look and I still have some longer running tests
going on the series.  All of our boxes internally are btrfs and the box I was
testing on ended up with a weird RPM db corruption that was likely from an
earlier, broken version of the patch.  However I cannot be 100% sure that was
the case, so I'm giving it a few more days of testing before I'm satisfied
there's not some weird thing that RPM does that xfstests doesn't cover.

This has gone through several iterations of xfstests already, including many
loops of generic/475 for validation to make sure it was no longer failing.  So
far so good, but for something like this wider testing will definitely be
necessary.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update
  2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
@ 2020-01-07 19:42 ` Josef Bacik
  2020-01-15 17:01   ` Filipe Manana
  2020-01-07 19:42 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We were using btrfs_i_size_write(), which unconditionally jacks up
inode->disk_i_size.  However since clone can operate on ranges we could
have pending ordered extents for a range prior to the start of our clone
operation and thus increase disk_i_size too far and have a hole with no
file extent.

Fix this by using the btrfs_ordered_update_i_size helper which will do
the right thing in the face of pending ordered extents outside of our
clone range.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8ec61f3f0291..291dda3b6547 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3332,8 +3332,10 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 	 */
 	if (endoff > destoff + olen)
 		endoff = destoff + olen;
-	if (endoff > inode->i_size)
-		btrfs_i_size_write(BTRFS_I(inode), endoff);
+	if (endoff > inode->i_size) {
+		i_size_write(inode, endoff);
+		btrfs_ordered_update_i_size(inode, endoff, NULL);
+	}
 
 	ret = btrfs_update_inode(trans, root, inode);
 	if (ret) {
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
  2020-01-07 19:42 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
@ 2020-01-07 19:42 ` Josef Bacik
  2020-01-15 17:10   ` Filipe Manana
  2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

In order to keep track of where we have file extents on disk, and thus
where it is safe to adjust the i_size to, we need to have a tree in
place to keep track of the contiguous areas we have file extents for.
Add helpers to use this tree, as it's not required for NO_HOLES file
systems.  We will use this by setting DIRTY for areas we know we have
file extent item's set, and clearing it when we remove file extent items
for truncation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/btrfs_inode.h    |  5 +++
 fs/btrfs/ctree.h          |  5 +++
 fs/btrfs/extent-io-tree.h |  1 +
 fs/btrfs/extent_io.c      | 11 +++++
 fs/btrfs/file-item.c      | 91 +++++++++++++++++++++++++++++++++++++++
 fs/btrfs/inode.c          |  6 +++
 6 files changed, 119 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..d9dcbac513ed 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -60,6 +60,11 @@ struct btrfs_inode {
 	 */
 	struct extent_io_tree io_failure_tree;
 
+	/* keeps track of where we have extent items mapped in order to make
+	 * sure our i_size adjustments are accurate.
+	 */
+	struct extent_io_tree file_extent_tree;
+
 	/* held while logging the inode in tree-log.c */
 	struct mutex log_mutex;
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a01ff3e0ead4..7142124d03c5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2807,6 +2807,11 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     struct btrfs_file_extent_item *fi,
 				     const bool new_inline,
 				     struct extent_map *em);
+int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
+					u64 len);
+int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
+				      u64 len);
+void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize);
 
 /* inode.c */
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index a3febe746c79..c8bcd2e3184c 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -44,6 +44,7 @@ enum {
 	IO_TREE_TRANS_DIRTY_PAGES,
 	IO_TREE_ROOT_DIRTY_LOG_PAGES,
 	IO_TREE_SELFTEST,
+	IO_TREE_INODE_FILE_EXTENT,
 };
 
 struct extent_io_tree {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e374411ed08c..410f5a64d3a6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -265,6 +265,15 @@ void __cold extent_io_exit(void)
 	bioset_exit(&btrfs_bioset);
 }
 
+/*
+ * For the file_extent_tree, we want to hold the inode lock when we lookup and
+ * update the disk_i_size, but lockdep will complain because our io_tree we hold
+ * the tree lock and get the inode lock when setting delalloc.  These two things
+ * are unrelated, so make a class for the file_extent_tree so we don't get the
+ * two locking patterns mixed up.
+ */
+static struct lock_class_key file_extent_tree_class;
+
 void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 			 struct extent_io_tree *tree, unsigned int owner,
 			 void *private_data)
@@ -276,6 +285,8 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&tree->lock);
 	tree->private_data = private_data;
 	tree->owner = owner;
+	if (owner == IO_TREE_INODE_FILE_EXTENT)
+		lockdep_set_class(&tree->lock, &file_extent_tree_class);
 }
 
 void extent_io_tree_release(struct extent_io_tree *tree)
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 1a599f50837b..b733d85510ed 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -23,6 +23,97 @@
 #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
 				       PAGE_SIZE))
 
+/**
+ * @inode - the inode we want to update the disk_i_size for
+ * @new_isize - the isize we want to set to, 0 if we use i_size
+ *
+ * With NO_HOLES set this simply sets the disk_is_size to whatever i_size_read()
+ * returns as it is perfectly fine with a file that has holes without hole file
+ * extent items.
+ *
+ * However for !NO_HOLES we need to only return the area that is contiguous from
+ * the 0 offset of the file.  Otherwise we could end up adjust i_size up to an
+ * extent that has a gap in between.
+ *
+ * Finally new_isize should only be set in the case of truncate where we're not
+ * ready to use i_size_read() as the limiter yet.
+ */
+void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u64 start, end, isize;
+	int ret;
+
+	isize = new_isize ? new_isize : i_size_read(inode);
+	if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
+		BTRFS_I(inode)->disk_i_size = isize;
+		return;
+	}
+
+	spin_lock(&BTRFS_I(inode)->lock);
+	ret = find_first_extent_bit(&BTRFS_I(inode)->file_extent_tree, 0,
+				    &start, &end, EXTENT_DIRTY, NULL);
+	if (!ret && start == 0)
+		isize = min(isize, end + 1);
+	else
+		isize = 0;
+	BTRFS_I(inode)->disk_i_size = isize;
+	spin_unlock(&BTRFS_I(inode)->lock);
+}
+
+/**
+ * @inode - the inode we're modifying
+ * @start - the start file offset of the file extent we've inserted
+ * @len - the logical length of the file extent item
+ *
+ * Call when we are insering a new file extent where there was none before.
+ * Does not need to call this in the case where we're replacing an existing file
+ * extent, however if you're unsure it's fine to call this multiple times.
+ *
+ * The start and len must match the file extent item, so thus must be sectorsize
+ * aligned.
+ */
+int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
+				      u64 len)
+{
+	if (len == 0)
+		return 0;
+
+	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize));
+
+	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
+		return 0;
+	return set_extent_bits(&inode->file_extent_tree, start, start + len - 1,
+			       EXTENT_DIRTY);
+}
+
+/**
+ * @inode - the inode we're modifying
+ * @start - the start file offset of the file extent we've inserted
+ * @len - the logical length of the file extent item
+ *
+ * Called when we drop a file extent, for example when we truncate.  Doesn't
+ * need to be called for cases where we're replacing a file extent, like when
+ * we've cow'ed a file extent.
+ *
+ * The start and len must match the file extent item, so thus must be sectorsize
+ * aligned.
+ */
+int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
+					u64 len)
+{
+	if (len == 0)
+		return 0;
+
+	ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize) ||
+	       len == (u64)-1);
+
+	if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
+		return 0;
+	return clear_extent_bit(&inode->file_extent_tree, start,
+				start + len - 1, EXTENT_DIRTY, 0, 0, NULL);
+}
+
 static inline u32 max_ordered_sum_bytes(struct btrfs_fs_info *fs_info,
 					u16 csum_size)
 {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db67e1984c91..ab8b972863b1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3784,6 +3784,9 @@ static int btrfs_read_locked_inode(struct inode *inode,
 	i_uid_write(inode, btrfs_inode_uid(leaf, inode_item));
 	i_gid_write(inode, btrfs_inode_gid(leaf, inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_inode_size(leaf, inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 
 	inode->i_atime.tv_sec = btrfs_timespec_sec(leaf, &inode_item->atime);
 	inode->i_atime.tv_nsec = btrfs_timespec_nsec(leaf, &inode_item->atime);
@@ -9363,6 +9366,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode);
 	extent_io_tree_init(fs_info, &ei->io_failure_tree,
 			    IO_TREE_INODE_IO_FAILURE, inode);
+	extent_io_tree_init(fs_info, &ei->file_extent_tree,
+			    IO_TREE_INODE_FILE_EXTENT, inode);
 	ei->io_tree.track_uptodate = true;
 	ei->io_failure_tree.track_uptodate = true;
 	atomic_set(&ei->sync_writers, 0);
@@ -9429,6 +9434,7 @@ void btrfs_destroy_inode(struct inode *inode)
 	btrfs_qgroup_check_reserved_leak(inode);
 	inode_tree_del(inode);
 	btrfs_drop_extent_cache(BTRFS_I(inode), 0, (u64)-1, 0);
+	btrfs_inode_clear_file_extent_range(BTRFS_I(inode), 0, (u64)-1);
 	btrfs_put_root(BTRFS_I(inode)->root);
 }
 
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
  2020-01-07 19:42 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
  2020-01-07 19:42 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
@ 2020-01-07 19:42 ` Josef Bacik
  2020-01-15 17:12   ` Filipe Manana
  2020-01-16 12:46   ` Filipe Manana
  2020-01-07 19:42 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We want to use this everywhere we modify the file extent items
permanently.  These include

1) Inserting new file extents for writes and prealloc extents.
2) Truncating inode items.
3) btrfs_cont_expand().
4) Insert inline extents.
5) Insert new extents from log replay.
6) Insert a new extent for clone, as it could be past isize.

We do not however call the clear helper for hole punching because it
simply swaps out an existing file extent for a hole, so there's
effectively no change as far as the i_size is concerned.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-inode.c |  4 +++
 fs/btrfs/file.c          |  6 ++++
 fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/tree-log.c      |  5 ++++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index d3e15e1d4a91..8b4dcf4f6b3e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 {
 	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_inode_item *inode_item;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
 	delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
 	if (!delayed_node)
@@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 	i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
 	i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 	inode->i_mode = btrfs_stack_inode_mode(inode_item);
 	set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
 	inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fadb892af24..f1c880c06ca2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+						clone_info->file_offset,
+						clone_len);
+	if (ret)
+		return ret;
+
 	/* If it's a hole, nothing more needs to be done. */
 	if (clone_info->disk_offset == 0)
 		return 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ab8b972863b1..5d34007aa7ec 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	/*
+	 * We align size to sectorsize for inline extents just for simplicity
+	 * sake.
+	 */
+	size = ALIGN(size, root->fs_info->sectorsize);
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
+	if (ret)
+		goto fail;
+
 	/*
 	 * we're an inline extent, so nobody can
 	 * extend the file past i_size without locking
@@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
+						ram_bytes);
+	if (ret)
+		goto out;
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
@@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	while (1) {
+		u64 clear_start = 0, clear_len = 0;
+
 		fi = NULL;
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
@@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
 			u64 num_dec;
+
+			clear_start = found_key.offset;
 			extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
 			if (!del_item) {
 				u64 orig_num_bytes =
@@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				extent_num_bytes = ALIGN(new_size -
 						found_key.offset,
 						fs_info->sectorsize);
+				clear_start = ALIGN(new_size,
+						    fs_info->sectorsize);
 				btrfs_set_file_extent_num_bytes(leaf, fi,
 							 extent_num_bytes);
 				num_dec = (orig_num_bytes -
@@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						inode_sub_bytes(inode, num_dec);
 				}
 			}
+			clear_len = num_dec;
 		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
 			/*
 			 * we can't truncate inline items that have had
@@ -4856,12 +4877,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 */
 				ret = NEED_TRUNCATE_BLOCK;
 				break;
+			} else {
+				/*
+				 * Inline extents are special, we just treat
+				 * them as a full sector worth in the file
+				 * extent tree just for simplicity sake.
+				 */
+				clear_len = fs_info->sectorsize;
 			}
 
 			if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 				inode_sub_bytes(inode, item_end + 1 - new_size);
 		}
 delete:
+		/*
+		 * We use btrfs_truncate_inode_items() to clean up log trees for
+		 * multiple fsyncs, and in this case we don't want to clear the
+		 * file extent range because it's just the log.
+		 */
+		if (root == BTRFS_I(inode)->root) {
+			ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+								  clear_start,
+								  clear_len);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				break;
+			}
+		}
+
 		if (del_item)
 			last_size = found_key.offset;
 		else
@@ -5183,14 +5226,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 		}
 		last_byte = min(extent_map_end(em), block_end);
 		last_byte = ALIGN(last_byte, fs_info->sectorsize);
+		hole_size = last_byte - cur_offset;
+
 		if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
 			struct extent_map *hole_em;
-			hole_size = last_byte - cur_offset;
 
 			err = maybe_insert_hole(root, inode, cur_offset,
 						hole_size);
 			if (err)
 				break;
+
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
+
 			btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
@@ -5223,6 +5274,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 							hole_size - 1, 0);
 			}
 			free_extent_map(hole_em);
+		} else {
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
 		}
 next:
 		free_extent_map(em);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 19364940f9a1..ad25974ff936 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 			goto out;
 	}
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
+						extent_end - start);
+	if (ret)
+		goto out;
+
 	inode_add_bytes(inode, nbytes);
 update_inode:
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size
  2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (2 preceding siblings ...)
  2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
@ 2020-01-07 19:42 ` Josef Bacik
  2020-01-15 17:13   ` Filipe Manana
  2020-01-07 19:42 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
  2020-01-15 17:32 ` [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Filipe Manana
  5 siblings, 1 reply; 17+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Now that we have a safe way to update the i_size, replace all uses of
btrfs_ordered_update_i_size with btrfs_inode_safe_disk_i_size_write.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/file.c  |  2 +-
 fs/btrfs/inode.c | 12 ++++++------
 fs/btrfs/ioctl.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f1c880c06ca2..35fdc5b99804 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2941,7 +2941,7 @@ static int btrfs_fallocate_update_isize(struct inode *inode,
 
 	inode->i_ctime = current_time(inode);
 	i_size_write(inode, end);
-	btrfs_ordered_update_i_size(inode, end, NULL);
+	btrfs_inode_safe_disk_i_size_write(inode, 0);
 	ret = btrfs_update_inode(trans, root, inode);
 	ret2 = btrfs_end_transaction(trans);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5d34007aa7ec..4a3ef3174d73 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3119,7 +3119,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		 */
 		btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
 				       ordered_extent->len);
-		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 		if (freespace_inode)
 			trans = btrfs_join_transaction_spacecache(root);
 		else
@@ -3207,7 +3207,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
-	btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+	btrfs_inode_safe_disk_i_size_write(inode, 0);
 	ret = btrfs_update_inode_fallback(trans, root, inode);
 	if (ret) { /* -ENOMEM or corruption */
 		btrfs_abort_transaction(trans, ret);
@@ -5007,7 +5007,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		ASSERT(last_size >= new_size);
 		if (!ret && last_size > new_size)
 			last_size = new_size;
-		btrfs_ordered_update_i_size(inode, last_size, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, last_size);
 	}
 
 	btrfs_free_path(path);
@@ -5337,7 +5337,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		}
 
 		i_size_write(inode, newsize);
-		btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 		pagecache_isize_extended(inode, oldsize, newsize);
 		ret = btrfs_update_inode(trans, root, inode);
 		btrfs_end_write_no_snapshotting(root);
@@ -9319,7 +9319,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 			ret = PTR_ERR(trans);
 			goto out;
 		}
-		btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 	}
 
 	if (trans) {
@@ -10578,7 +10578,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			else
 				i_size = cur_offset;
 			i_size_write(inode, i_size);
-			btrfs_ordered_update_i_size(inode, i_size, NULL);
+			btrfs_inode_safe_disk_i_size_write(inode, 0);
 		}
 
 		ret = btrfs_update_inode(trans, root, inode);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 291dda3b6547..2a02a21cac59 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3334,7 +3334,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 		endoff = destoff + olen;
 	if (endoff > inode->i_size) {
 		i_size_write(inode, endoff);
-		btrfs_ordered_update_i_size(inode, endoff, NULL);
+		btrfs_inode_safe_disk_i_size_write(inode, 0);
 	}
 
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 5/5] btrfs: delete the ordered isize update code
  2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (3 preceding siblings ...)
  2020-01-07 19:42 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
@ 2020-01-07 19:42 ` Josef Bacik
  2020-01-15 17:13   ` Filipe Manana
  2020-01-15 17:32 ` [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Filipe Manana
  5 siblings, 1 reply; 17+ messages in thread
From: Josef Bacik @ 2020-01-07 19:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Now that we have a safe way to update the isize, remove all of this code
as it's no longer needed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ordered-data.c      | 128 -----------------------------------
 fs/btrfs/ordered-data.h      |   7 --
 include/trace/events/btrfs.h |   1 -
 3 files changed, 136 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 3a3c648bb9d3..b8de2aea36b3 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -791,134 +791,6 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
 	return entry;
 }
 
-/*
- * After an extent is done, call this to conditionally update the on disk
- * i_size.  i_size is updated to cover any fully written part of the file.
- */
-int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
-				struct btrfs_ordered_extent *ordered)
-{
-	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
-	u64 disk_i_size;
-	u64 new_i_size;
-	u64 i_size = i_size_read(inode);
-	struct rb_node *node;
-	struct rb_node *prev = NULL;
-	struct btrfs_ordered_extent *test;
-	int ret = 1;
-	u64 orig_offset = offset;
-
-	spin_lock_irq(&tree->lock);
-	if (ordered) {
-		offset = entry_end(ordered);
-		if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags))
-			offset = min(offset,
-				     ordered->file_offset +
-				     ordered->truncated_len);
-	} else {
-		offset = ALIGN(offset, btrfs_inode_sectorsize(inode));
-	}
-	disk_i_size = BTRFS_I(inode)->disk_i_size;
-
-	/*
-	 * truncate file.
-	 * If ordered is not NULL, then this is called from endio and
-	 * disk_i_size will be updated by either truncate itself or any
-	 * in-flight IOs which are inside the disk_i_size.
-	 *
-	 * Because btrfs_setsize() may set i_size with disk_i_size if truncate
-	 * fails somehow, we need to make sure we have a precise disk_i_size by
-	 * updating it as usual.
-	 *
-	 */
-	if (!ordered && disk_i_size > i_size) {
-		BTRFS_I(inode)->disk_i_size = orig_offset;
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * if the disk i_size is already at the inode->i_size, or
-	 * this ordered extent is inside the disk i_size, we're done
-	 */
-	if (disk_i_size == i_size)
-		goto out;
-
-	/*
-	 * We still need to update disk_i_size if outstanding_isize is greater
-	 * than disk_i_size.
-	 */
-	if (offset <= disk_i_size &&
-	    (!ordered || ordered->outstanding_isize <= disk_i_size))
-		goto out;
-
-	/*
-	 * walk backward from this ordered extent to disk_i_size.
-	 * if we find an ordered extent then we can't update disk i_size
-	 * yet
-	 */
-	if (ordered) {
-		node = rb_prev(&ordered->rb_node);
-	} else {
-		prev = tree_search(tree, offset);
-		/*
-		 * we insert file extents without involving ordered struct,
-		 * so there should be no ordered struct cover this offset
-		 */
-		if (prev) {
-			test = rb_entry(prev, struct btrfs_ordered_extent,
-					rb_node);
-			BUG_ON(offset_in_entry(test, offset));
-		}
-		node = prev;
-	}
-	for (; node; node = rb_prev(node)) {
-		test = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-
-		/* We treat this entry as if it doesn't exist */
-		if (test_bit(BTRFS_ORDERED_UPDATED_ISIZE, &test->flags))
-			continue;
-
-		if (entry_end(test) <= disk_i_size)
-			break;
-		if (test->file_offset >= i_size)
-			break;
-
-		/*
-		 * We don't update disk_i_size now, so record this undealt
-		 * i_size. Or we will not know the real i_size.
-		 */
-		if (test->outstanding_isize < offset)
-			test->outstanding_isize = offset;
-		if (ordered &&
-		    ordered->outstanding_isize > test->outstanding_isize)
-			test->outstanding_isize = ordered->outstanding_isize;
-		goto out;
-	}
-	new_i_size = min_t(u64, offset, i_size);
-
-	/*
-	 * Some ordered extents may completed before the current one, and
-	 * we hold the real i_size in ->outstanding_isize.
-	 */
-	if (ordered && ordered->outstanding_isize > new_i_size)
-		new_i_size = min_t(u64, ordered->outstanding_isize, i_size);
-	BTRFS_I(inode)->disk_i_size = new_i_size;
-	ret = 0;
-out:
-	/*
-	 * We need to do this because we can't remove ordered extents until
-	 * after the i_disk_size has been updated and then the inode has been
-	 * updated to reflect the change, so we need to tell anybody who finds
-	 * this ordered extent that we've already done all the real work, we
-	 * just haven't completed all the other work.
-	 */
-	if (ordered)
-		set_bit(BTRFS_ORDERED_UPDATED_ISIZE, &ordered->flags);
-	spin_unlock_irq(&tree->lock);
-	return ret;
-}
-
 /*
  * search the ordered extents for one corresponding to 'offset' and
  * try to find a checksum.  This is used because we allow pages to
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 5204171ea962..7f7f9ad091a6 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -52,11 +52,6 @@ enum {
 	BTRFS_ORDERED_DIRECT,
 	/* We had an io error when writing this out */
 	BTRFS_ORDERED_IOERR,
-	/*
-	 * indicates whether this ordered extent has done its due diligence in
-	 * updating the isize
-	 */
-	BTRFS_ORDERED_UPDATED_ISIZE,
 	/* Set when we have to truncate an extent */
 	BTRFS_ORDERED_TRUNCATED,
 	/* Regular IO for COW */
@@ -180,8 +175,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 		struct btrfs_inode *inode,
 		u64 file_offset,
 		u64 len);
-int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
-				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
 			   u8 *sum, int len);
 u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 620bf1b38fba..02ac28f0e99e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -468,7 +468,6 @@ DEFINE_EVENT(
 		{ (1 << BTRFS_ORDERED_PREALLOC), 	"PREALLOC" 	}, \
 		{ (1 << BTRFS_ORDERED_DIRECT),	 	"DIRECT" 	}, \
 		{ (1 << BTRFS_ORDERED_IOERR), 		"IOERR" 	}, \
-		{ (1 << BTRFS_ORDERED_UPDATED_ISIZE), 	"UPDATED_ISIZE"	}, \
 		{ (1 << BTRFS_ORDERED_TRUNCATED), 	"TRUNCATED"	})
 
 
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update
  2020-01-07 19:42 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
@ 2020-01-15 17:01   ` Filipe Manana
  0 siblings, 0 replies; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We were using btrfs_i_size_write(), which unconditionally jacks up
> inode->disk_i_size.  However since clone can operate on ranges we could
> have pending ordered extents for a range prior to the start of our clone
> operation and thus increase disk_i_size too far and have a hole with no
> file extent.
>
> Fix this by using the btrfs_ordered_update_i_size helper which will do
> the right thing in the face of pending ordered extents outside of our
> clone range.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/ioctl.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 8ec61f3f0291..291dda3b6547 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3332,8 +3332,10 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
>          */
>         if (endoff > destoff + olen)
>                 endoff = destoff + olen;
> -       if (endoff > inode->i_size)
> -               btrfs_i_size_write(BTRFS_I(inode), endoff);
> +       if (endoff > inode->i_size) {
> +               i_size_write(inode, endoff);
> +               btrfs_ordered_update_i_size(inode, endoff, NULL);
> +       }
>
>         ret = btrfs_update_inode(trans, root, inode);
>         if (ret) {
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] btrfs: introduce the inode->file_extent_tree
  2020-01-07 19:42 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
@ 2020-01-15 17:10   ` Filipe Manana
  0 siblings, 0 replies; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:10 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> In order to keep track of where we have file extents on disk, and thus
> where it is safe to adjust the i_size to, we need to have a tree in
> place to keep track of the contiguous areas we have file extents for.
> Add helpers to use this tree, as it's not required for NO_HOLES file
> systems.  We will use this by setting DIRTY for areas we know we have
> file extent item's set, and clearing it when we remove file extent items
> for truncation.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Just one comment, it's not a blocker for me, and I doubt it will be
for someone else.

So this effectively changes the semantics of i_size update and it now
behaves differently depending on whether no-holes is enabled or not.
So on power failure cases, under the same conditions, we see different
i_size values - I don't think anyone relies on this or should, as the
fs should be allowed to change the implementation any time.

Thanks, looks good.


> ---
>  fs/btrfs/btrfs_inode.h    |  5 +++
>  fs/btrfs/ctree.h          |  5 +++
>  fs/btrfs/extent-io-tree.h |  1 +
>  fs/btrfs/extent_io.c      | 11 +++++
>  fs/btrfs/file-item.c      | 91 +++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/inode.c          |  6 +++
>  6 files changed, 119 insertions(+)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4e12a477d32e..d9dcbac513ed 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -60,6 +60,11 @@ struct btrfs_inode {
>          */
>         struct extent_io_tree io_failure_tree;
>
> +       /* keeps track of where we have extent items mapped in order to make
> +        * sure our i_size adjustments are accurate.
> +        */
> +       struct extent_io_tree file_extent_tree;
> +
>         /* held while logging the inode in tree-log.c */
>         struct mutex log_mutex;
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a01ff3e0ead4..7142124d03c5 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2807,6 +2807,11 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>                                      struct btrfs_file_extent_item *fi,
>                                      const bool new_inline,
>                                      struct extent_map *em);
> +int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
> +                                       u64 len);
> +int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
> +                                     u64 len);
> +void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize);
>
>  /* inode.c */
>  struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index a3febe746c79..c8bcd2e3184c 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -44,6 +44,7 @@ enum {
>         IO_TREE_TRANS_DIRTY_PAGES,
>         IO_TREE_ROOT_DIRTY_LOG_PAGES,
>         IO_TREE_SELFTEST,
> +       IO_TREE_INODE_FILE_EXTENT,
>  };
>
>  struct extent_io_tree {
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e374411ed08c..410f5a64d3a6 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -265,6 +265,15 @@ void __cold extent_io_exit(void)
>         bioset_exit(&btrfs_bioset);
>  }
>
> +/*
> + * For the file_extent_tree, we want to hold the inode lock when we lookup and
> + * update the disk_i_size, but lockdep will complain because our io_tree we hold
> + * the tree lock and get the inode lock when setting delalloc.  These two things
> + * are unrelated, so make a class for the file_extent_tree so we don't get the
> + * two locking patterns mixed up.
> + */
> +static struct lock_class_key file_extent_tree_class;
> +
>  void extent_io_tree_init(struct btrfs_fs_info *fs_info,
>                          struct extent_io_tree *tree, unsigned int owner,
>                          void *private_data)
> @@ -276,6 +285,8 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
>         spin_lock_init(&tree->lock);
>         tree->private_data = private_data;
>         tree->owner = owner;
> +       if (owner == IO_TREE_INODE_FILE_EXTENT)
> +               lockdep_set_class(&tree->lock, &file_extent_tree_class);
>  }
>
>  void extent_io_tree_release(struct extent_io_tree *tree)
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index 1a599f50837b..b733d85510ed 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -23,6 +23,97 @@
>  #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
>                                        PAGE_SIZE))
>
> +/**
> + * @inode - the inode we want to update the disk_i_size for
> + * @new_isize - the isize we want to set to, 0 if we use i_size
> + *
> + * With NO_HOLES set this simply sets the disk_is_size to whatever i_size_read()
> + * returns as it is perfectly fine with a file that has holes without hole file
> + * extent items.
> + *
> + * However for !NO_HOLES we need to only return the area that is contiguous from
> + * the 0 offset of the file.  Otherwise we could end up adjust i_size up to an
> + * extent that has a gap in between.
> + *
> + * Finally new_isize should only be set in the case of truncate where we're not
> + * ready to use i_size_read() as the limiter yet.
> + */
> +void btrfs_inode_safe_disk_i_size_write(struct inode *inode, u64 new_isize)
> +{
> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> +       u64 start, end, isize;
> +       int ret;
> +
> +       isize = new_isize ? new_isize : i_size_read(inode);
> +       if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
> +               BTRFS_I(inode)->disk_i_size = isize;
> +               return;
> +       }
> +
> +       spin_lock(&BTRFS_I(inode)->lock);
> +       ret = find_first_extent_bit(&BTRFS_I(inode)->file_extent_tree, 0,
> +                                   &start, &end, EXTENT_DIRTY, NULL);
> +       if (!ret && start == 0)
> +               isize = min(isize, end + 1);
> +       else
> +               isize = 0;
> +       BTRFS_I(inode)->disk_i_size = isize;
> +       spin_unlock(&BTRFS_I(inode)->lock);
> +}
> +
> +/**
> + * @inode - the inode we're modifying
> + * @start - the start file offset of the file extent we've inserted
> + * @len - the logical length of the file extent item
> + *
> + * Call when we are insering a new file extent where there was none before.
> + * Does not need to call this in the case where we're replacing an existing file
> + * extent, however if you're unsure it's fine to call this multiple times.
> + *
> + * The start and len must match the file extent item, so thus must be sectorsize
> + * aligned.
> + */
> +int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
> +                                     u64 len)
> +{
> +       if (len == 0)
> +               return 0;
> +
> +       ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize));
> +
> +       if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
> +               return 0;
> +       return set_extent_bits(&inode->file_extent_tree, start, start + len - 1,
> +                              EXTENT_DIRTY);
> +}
> +
> +/**
> + * @inode - the inode we're modifying
> + * @start - the start file offset of the file extent we've inserted
> + * @len - the logical length of the file extent item
> + *
> + * Called when we drop a file extent, for example when we truncate.  Doesn't
> + * need to be called for cases where we're replacing a file extent, like when
> + * we've cow'ed a file extent.
> + *
> + * The start and len must match the file extent item, so thus must be sectorsize
> + * aligned.
> + */
> +int btrfs_inode_clear_file_extent_range(struct btrfs_inode *inode, u64 start,
> +                                       u64 len)
> +{
> +       if (len == 0)
> +               return 0;
> +
> +       ASSERT(IS_ALIGNED(start + len, inode->root->fs_info->sectorsize) ||
> +              len == (u64)-1);
> +
> +       if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
> +               return 0;
> +       return clear_extent_bit(&inode->file_extent_tree, start,
> +                               start + len - 1, EXTENT_DIRTY, 0, 0, NULL);
> +}
> +
>  static inline u32 max_ordered_sum_bytes(struct btrfs_fs_info *fs_info,
>                                         u16 csum_size)
>  {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index db67e1984c91..ab8b972863b1 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3784,6 +3784,9 @@ static int btrfs_read_locked_inode(struct inode *inode,
>         i_uid_write(inode, btrfs_inode_uid(leaf, inode_item));
>         i_gid_write(inode, btrfs_inode_gid(leaf, inode_item));
>         btrfs_i_size_write(BTRFS_I(inode), btrfs_inode_size(leaf, inode_item));
> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> +                                         round_up(i_size_read(inode),
> +                                                  fs_info->sectorsize));
>
>         inode->i_atime.tv_sec = btrfs_timespec_sec(leaf, &inode_item->atime);
>         inode->i_atime.tv_nsec = btrfs_timespec_nsec(leaf, &inode_item->atime);
> @@ -9363,6 +9366,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>         extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode);
>         extent_io_tree_init(fs_info, &ei->io_failure_tree,
>                             IO_TREE_INODE_IO_FAILURE, inode);
> +       extent_io_tree_init(fs_info, &ei->file_extent_tree,
> +                           IO_TREE_INODE_FILE_EXTENT, inode);
>         ei->io_tree.track_uptodate = true;
>         ei->io_failure_tree.track_uptodate = true;
>         atomic_set(&ei->sync_writers, 0);
> @@ -9429,6 +9434,7 @@ void btrfs_destroy_inode(struct inode *inode)
>         btrfs_qgroup_check_reserved_leak(inode);
>         inode_tree_del(inode);
>         btrfs_drop_extent_cache(BTRFS_I(inode), 0, (u64)-1, 0);
> +       btrfs_inode_clear_file_extent_range(BTRFS_I(inode), 0, (u64)-1);
>         btrfs_put_root(BTRFS_I(inode)->root);
>  }
>
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
@ 2020-01-15 17:12   ` Filipe Manana
  2020-01-15 17:20     ` Josef Bacik
  2020-01-16 12:46   ` Filipe Manana
  1 sibling, 1 reply; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:12 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We want to use this everywhere we modify the file extent items
> permanently.  These include
>
> 1) Inserting new file extents for writes and prealloc extents.
> 2) Truncating inode items.
> 3) btrfs_cont_expand().
> 4) Insert inline extents.
> 5) Insert new extents from log replay.
> 6) Insert a new extent for clone, as it could be past isize.
>
> We do not however call the clear helper for hole punching because it
> simply swaps out an existing file extent for a hole, so there's
> effectively no change as far as the i_size is concerned.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/delayed-inode.c |  4 +++
>  fs/btrfs/file.c          |  6 ++++
>  fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/tree-log.c      |  5 ++++
>  4 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>  {
>         struct btrfs_delayed_node *delayed_node;
>         struct btrfs_inode_item *inode_item;
> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>
>         delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>         if (!delayed_node)
> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>         i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>         i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>         btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> +                                         round_up(i_size_read(inode),
> +                                                  fs_info->sectorsize));

Why set it here when we have already done it at btrfs_read_locked_inode()?
This seems duplicated.

Thanks.

>         inode->i_mode = btrfs_stack_inode_mode(inode_item);
>         set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
>         inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4fadb892af24..f1c880c06ca2 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                               clone_info->file_offset,
> +                                               clone_len);
> +       if (ret)
> +               return ret;
> +
>         /* If it's a hole, nothing more needs to be done. */
>         if (clone_info->disk_offset == 0)
>                 return 0;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ab8b972863b1..5d34007aa7ec 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       /*
> +        * We align size to sectorsize for inline extents just for simplicity
> +        * sake.
> +        */
> +       size = ALIGN(size, root->fs_info->sectorsize);
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
> +       if (ret)
> +               goto fail;
> +
>         /*
>          * we're an inline extent, so nobody can
>          * extend the file past i_size without locking
> @@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
>         ins.offset = disk_num_bytes;
>         ins.type = BTRFS_EXTENT_ITEM_KEY;
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
> +                                               ram_bytes);
> +       if (ret)
> +               goto out;
> +
>         /*
>          * Release the reserved range from inode dirty range map, as it is
>          * already moved into delayed_ref_head
> @@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>         }
>
>         while (1) {
> +               u64 clear_start = 0, clear_len = 0;
> +
>                 fi = NULL;
>                 leaf = path->nodes[0];
>                 btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> @@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>
>                 if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
>                         u64 num_dec;
> +
> +                       clear_start = found_key.offset;
>                         extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>                         if (!del_item) {
>                                 u64 orig_num_bytes =
> @@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                 extent_num_bytes = ALIGN(new_size -
>                                                 found_key.offset,
>                                                 fs_info->sectorsize);
> +                               clear_start = ALIGN(new_size,
> +                                                   fs_info->sectorsize);
>                                 btrfs_set_file_extent_num_bytes(leaf, fi,
>                                                          extent_num_bytes);
>                                 num_dec = (orig_num_bytes -
> @@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                                 inode_sub_bytes(inode, num_dec);
>                                 }
>                         }
> +                       clear_len = num_dec;
>                 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>                         /*
>                          * we can't truncate inline items that have had
> @@ -4856,12 +4877,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                  */
>                                 ret = NEED_TRUNCATE_BLOCK;
>                                 break;
> +                       } else {
> +                               /*
> +                                * Inline extents are special, we just treat
> +                                * them as a full sector worth in the file
> +                                * extent tree just for simplicity sake.
> +                                */
> +                               clear_len = fs_info->sectorsize;
>                         }
>
>                         if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>                                 inode_sub_bytes(inode, item_end + 1 - new_size);
>                 }
>  delete:
> +               /*
> +                * We use btrfs_truncate_inode_items() to clean up log trees for
> +                * multiple fsyncs, and in this case we don't want to clear the
> +                * file extent range because it's just the log.
> +                */
> +               if (root == BTRFS_I(inode)->root) {
> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                                                 clear_start,
> +                                                                 clear_len);
> +                       if (ret) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               break;
> +                       }
> +               }
> +
>                 if (del_item)
>                         last_size = found_key.offset;
>                 else
> @@ -5183,14 +5226,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                 }
>                 last_byte = min(extent_map_end(em), block_end);
>                 last_byte = ALIGN(last_byte, fs_info->sectorsize);
> +               hole_size = last_byte - cur_offset;
> +
>                 if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
>                         struct extent_map *hole_em;
> -                       hole_size = last_byte - cur_offset;
>
>                         err = maybe_insert_hole(root, inode, cur_offset,
>                                                 hole_size);
>                         if (err)
>                                 break;
> +
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
> +
>                         btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
>                                                 cur_offset + hole_size - 1, 0);
>                         hole_em = alloc_extent_map();
> @@ -5223,6 +5274,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                                                         hole_size - 1, 0);
>                         }
>                         free_extent_map(hole_em);
> +               } else {
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
>                 }
>  next:
>                 free_extent_map(em);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 19364940f9a1..ad25974ff936 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>                         goto out;
>         }
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
> +                                               extent_end - start);
> +       if (ret)
> +               goto out;
> +
>         inode_add_bytes(inode, nbytes);
>  update_inode:
>         ret = btrfs_update_inode(trans, root, inode);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size
  2020-01-07 19:42 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
@ 2020-01-15 17:13   ` Filipe Manana
  0 siblings, 0 replies; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Now that we have a safe way to update the i_size, replace all uses of
> btrfs_ordered_update_i_size with btrfs_inode_safe_disk_i_size_write.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/file.c  |  2 +-
>  fs/btrfs/inode.c | 12 ++++++------
>  fs/btrfs/ioctl.c |  2 +-
>  3 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index f1c880c06ca2..35fdc5b99804 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2941,7 +2941,7 @@ static int btrfs_fallocate_update_isize(struct inode *inode,
>
>         inode->i_ctime = current_time(inode);
>         i_size_write(inode, end);
> -       btrfs_ordered_update_i_size(inode, end, NULL);
> +       btrfs_inode_safe_disk_i_size_write(inode, 0);
>         ret = btrfs_update_inode(trans, root, inode);
>         ret2 = btrfs_end_transaction(trans);
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5d34007aa7ec..4a3ef3174d73 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3119,7 +3119,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>                  */
>                 btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
>                                        ordered_extent->len);
> -               btrfs_ordered_update_i_size(inode, 0, ordered_extent);
> +               btrfs_inode_safe_disk_i_size_write(inode, 0);
>                 if (freespace_inode)
>                         trans = btrfs_join_transaction_spacecache(root);
>                 else
> @@ -3207,7 +3207,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>                 goto out;
>         }
>
> -       btrfs_ordered_update_i_size(inode, 0, ordered_extent);
> +       btrfs_inode_safe_disk_i_size_write(inode, 0);
>         ret = btrfs_update_inode_fallback(trans, root, inode);
>         if (ret) { /* -ENOMEM or corruption */
>                 btrfs_abort_transaction(trans, ret);
> @@ -5007,7 +5007,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                 ASSERT(last_size >= new_size);
>                 if (!ret && last_size > new_size)
>                         last_size = new_size;
> -               btrfs_ordered_update_i_size(inode, last_size, NULL);
> +               btrfs_inode_safe_disk_i_size_write(inode, last_size);
>         }
>
>         btrfs_free_path(path);
> @@ -5337,7 +5337,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>                 }
>
>                 i_size_write(inode, newsize);
> -               btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
> +               btrfs_inode_safe_disk_i_size_write(inode, 0);
>                 pagecache_isize_extended(inode, oldsize, newsize);
>                 ret = btrfs_update_inode(trans, root, inode);
>                 btrfs_end_write_no_snapshotting(root);
> @@ -9319,7 +9319,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
>                         ret = PTR_ERR(trans);
>                         goto out;
>                 }
> -               btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
> +               btrfs_inode_safe_disk_i_size_write(inode, 0);
>         }
>
>         if (trans) {
> @@ -10578,7 +10578,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>                         else
>                                 i_size = cur_offset;
>                         i_size_write(inode, i_size);
> -                       btrfs_ordered_update_i_size(inode, i_size, NULL);
> +                       btrfs_inode_safe_disk_i_size_write(inode, 0);
>                 }
>
>                 ret = btrfs_update_inode(trans, root, inode);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 291dda3b6547..2a02a21cac59 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3334,7 +3334,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
>                 endoff = destoff + olen;
>         if (endoff > inode->i_size) {
>                 i_size_write(inode, endoff);
> -               btrfs_ordered_update_i_size(inode, endoff, NULL);
> +               btrfs_inode_safe_disk_i_size_write(inode, 0);
>         }
>
>         ret = btrfs_update_inode(trans, root, inode);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] btrfs: delete the ordered isize update code
  2020-01-07 19:42 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
@ 2020-01-15 17:13   ` Filipe Manana
  0 siblings, 0 replies; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Now that we have a safe way to update the isize, remove all of this code
> as it's no longer needed.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/ordered-data.c      | 128 -----------------------------------
>  fs/btrfs/ordered-data.h      |   7 --
>  include/trace/events/btrfs.h |   1 -
>  3 files changed, 136 deletions(-)
>
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 3a3c648bb9d3..b8de2aea36b3 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -791,134 +791,6 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
>         return entry;
>  }
>
> -/*
> - * After an extent is done, call this to conditionally update the on disk
> - * i_size.  i_size is updated to cover any fully written part of the file.
> - */
> -int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
> -                               struct btrfs_ordered_extent *ordered)
> -{
> -       struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
> -       u64 disk_i_size;
> -       u64 new_i_size;
> -       u64 i_size = i_size_read(inode);
> -       struct rb_node *node;
> -       struct rb_node *prev = NULL;
> -       struct btrfs_ordered_extent *test;
> -       int ret = 1;
> -       u64 orig_offset = offset;
> -
> -       spin_lock_irq(&tree->lock);
> -       if (ordered) {
> -               offset = entry_end(ordered);
> -               if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags))
> -                       offset = min(offset,
> -                                    ordered->file_offset +
> -                                    ordered->truncated_len);
> -       } else {
> -               offset = ALIGN(offset, btrfs_inode_sectorsize(inode));
> -       }
> -       disk_i_size = BTRFS_I(inode)->disk_i_size;
> -
> -       /*
> -        * truncate file.
> -        * If ordered is not NULL, then this is called from endio and
> -        * disk_i_size will be updated by either truncate itself or any
> -        * in-flight IOs which are inside the disk_i_size.
> -        *
> -        * Because btrfs_setsize() may set i_size with disk_i_size if truncate
> -        * fails somehow, we need to make sure we have a precise disk_i_size by
> -        * updating it as usual.
> -        *
> -        */
> -       if (!ordered && disk_i_size > i_size) {
> -               BTRFS_I(inode)->disk_i_size = orig_offset;
> -               ret = 0;
> -               goto out;
> -       }
> -
> -       /*
> -        * if the disk i_size is already at the inode->i_size, or
> -        * this ordered extent is inside the disk i_size, we're done
> -        */
> -       if (disk_i_size == i_size)
> -               goto out;
> -
> -       /*
> -        * We still need to update disk_i_size if outstanding_isize is greater
> -        * than disk_i_size.
> -        */
> -       if (offset <= disk_i_size &&
> -           (!ordered || ordered->outstanding_isize <= disk_i_size))
> -               goto out;
> -
> -       /*
> -        * walk backward from this ordered extent to disk_i_size.
> -        * if we find an ordered extent then we can't update disk i_size
> -        * yet
> -        */
> -       if (ordered) {
> -               node = rb_prev(&ordered->rb_node);
> -       } else {
> -               prev = tree_search(tree, offset);
> -               /*
> -                * we insert file extents without involving ordered struct,
> -                * so there should be no ordered struct cover this offset
> -                */
> -               if (prev) {
> -                       test = rb_entry(prev, struct btrfs_ordered_extent,
> -                                       rb_node);
> -                       BUG_ON(offset_in_entry(test, offset));
> -               }
> -               node = prev;
> -       }
> -       for (; node; node = rb_prev(node)) {
> -               test = rb_entry(node, struct btrfs_ordered_extent, rb_node);
> -
> -               /* We treat this entry as if it doesn't exist */
> -               if (test_bit(BTRFS_ORDERED_UPDATED_ISIZE, &test->flags))
> -                       continue;
> -
> -               if (entry_end(test) <= disk_i_size)
> -                       break;
> -               if (test->file_offset >= i_size)
> -                       break;
> -
> -               /*
> -                * We don't update disk_i_size now, so record this undealt
> -                * i_size. Or we will not know the real i_size.
> -                */
> -               if (test->outstanding_isize < offset)
> -                       test->outstanding_isize = offset;
> -               if (ordered &&
> -                   ordered->outstanding_isize > test->outstanding_isize)
> -                       test->outstanding_isize = ordered->outstanding_isize;
> -               goto out;
> -       }
> -       new_i_size = min_t(u64, offset, i_size);
> -
> -       /*
> -        * Some ordered extents may completed before the current one, and
> -        * we hold the real i_size in ->outstanding_isize.
> -        */
> -       if (ordered && ordered->outstanding_isize > new_i_size)
> -               new_i_size = min_t(u64, ordered->outstanding_isize, i_size);
> -       BTRFS_I(inode)->disk_i_size = new_i_size;
> -       ret = 0;
> -out:
> -       /*
> -        * We need to do this because we can't remove ordered extents until
> -        * after the i_disk_size has been updated and then the inode has been
> -        * updated to reflect the change, so we need to tell anybody who finds
> -        * this ordered extent that we've already done all the real work, we
> -        * just haven't completed all the other work.
> -        */
> -       if (ordered)
> -               set_bit(BTRFS_ORDERED_UPDATED_ISIZE, &ordered->flags);
> -       spin_unlock_irq(&tree->lock);
> -       return ret;
> -}
> -
>  /*
>   * search the ordered extents for one corresponding to 'offset' and
>   * try to find a checksum.  This is used because we allow pages to
> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
> index 5204171ea962..7f7f9ad091a6 100644
> --- a/fs/btrfs/ordered-data.h
> +++ b/fs/btrfs/ordered-data.h
> @@ -52,11 +52,6 @@ enum {
>         BTRFS_ORDERED_DIRECT,
>         /* We had an io error when writing this out */
>         BTRFS_ORDERED_IOERR,
> -       /*
> -        * indicates whether this ordered extent has done its due diligence in
> -        * updating the isize
> -        */
> -       BTRFS_ORDERED_UPDATED_ISIZE,
>         /* Set when we have to truncate an extent */
>         BTRFS_ORDERED_TRUNCATED,
>         /* Regular IO for COW */
> @@ -180,8 +175,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
>                 struct btrfs_inode *inode,
>                 u64 file_offset,
>                 u64 len);
> -int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
> -                               struct btrfs_ordered_extent *ordered);
>  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
>                            u8 *sum, int len);
>  u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index 620bf1b38fba..02ac28f0e99e 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -468,7 +468,6 @@ DEFINE_EVENT(
>                 { (1 << BTRFS_ORDERED_PREALLOC),        "PREALLOC"      }, \
>                 { (1 << BTRFS_ORDERED_DIRECT),          "DIRECT"        }, \
>                 { (1 << BTRFS_ORDERED_IOERR),           "IOERR"         }, \
> -               { (1 << BTRFS_ORDERED_UPDATED_ISIZE),   "UPDATED_ISIZE" }, \
>                 { (1 << BTRFS_ORDERED_TRUNCATED),       "TRUNCATED"     })
>
>
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-15 17:12   ` Filipe Manana
@ 2020-01-15 17:20     ` Josef Bacik
  2020-01-15 17:34       ` Filipe Manana
  0 siblings, 1 reply; 17+ messages in thread
From: Josef Bacik @ 2020-01-15 17:20 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team

On 1/15/20 12:12 PM, Filipe Manana wrote:
> On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> We want to use this everywhere we modify the file extent items
>> permanently.  These include
>>
>> 1) Inserting new file extents for writes and prealloc extents.
>> 2) Truncating inode items.
>> 3) btrfs_cont_expand().
>> 4) Insert inline extents.
>> 5) Insert new extents from log replay.
>> 6) Insert a new extent for clone, as it could be past isize.
>>
>> We do not however call the clear helper for hole punching because it
>> simply swaps out an existing file extent for a hole, so there's
>> effectively no change as far as the i_size is concerned.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/delayed-inode.c |  4 +++
>>   fs/btrfs/file.c          |  6 ++++
>>   fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/tree-log.c      |  5 ++++
>>   4 files changed, 73 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index d3e15e1d4a91..8b4dcf4f6b3e 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>>   {
>>          struct btrfs_delayed_node *delayed_node;
>>          struct btrfs_inode_item *inode_item;
>> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>>
>>          delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>>          if (!delayed_node)
>> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>>          i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>>          i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>>          btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
>> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
>> +                                         round_up(i_size_read(inode),
>> +                                                  fs_info->sectorsize));
> 
> Why set it here when we have already done it at btrfs_read_locked_inode()?
> This seems duplicated.
> 

Because if btrfs_fill_inode() returns true it means we had a delayed inode in 
memory still and we skip the reading of the inode off of disk, so it's an either 
or case, not a duplicate.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES
  2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
                   ` (4 preceding siblings ...)
  2020-01-07 19:42 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
@ 2020-01-15 17:32 ` Filipe Manana
  2020-01-15 18:44   ` Josef Bacik
  5 siblings, 1 reply; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:32 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> v1->v2:
> - fixed a bug in 'btrfs: use the file extent tree infrastructure' that would
>   result in 0 length files because btrfs_truncate_inode_items() was clearing the
>   file extent map when we fsync'ed multiple times.  Validated this with a
>   modified fsx and generic/521 that reproduced the problem, those modifications
>   were sent up as well.
> - dropped the RFC
>
> ----------------- Original Message -----------------------
> We've historically had this problem where you could flush a targeted section of
> an inode and end up with a hole between extents without a hole extent item.
> This of course makes fsck complain because this is not ok for a file system that
> doesn't have NO_HOLES set.  Because this is a well understood problem I and
> others have been ignoring fsck failures during certain xfstests (generic/475 for
> example) because they would regularly trigger this edge case.
>
> However this isn't a great behavior to have, we should really be taking all fsck
> failures seriously, and we could potentially ignore fsck legitimate fsck errors
> because we expect it to be this particular failure.
>
> In order to fix this we need to keep track of where we have valid extent items,
> and only update i_size to encompass that area.  This unfortunately means we need
> a new per-inode extent_io_tree to keep track of the valid ranges.  This is
> relatively straightforward in practice, and helpers have been added to manage
> this so that in the case of a NO_HOLES file system we just simply skip this work
> altogether.
>
> I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
> really like Filipe to take a look and I still have some longer running tests
> going on the series.  All of our boxes internally are btrfs and the box I was
> testing on ended up with a weird RPM db corruption that was likely from an
> earlier, broken version of the patch.  However I cannot be 100% sure that was
> the case, so I'm giving it a few more days of testing before I'm satisfied
> there's not some weird thing that RPM does that xfstests doesn't cover.
>
> This has gone through several iterations of xfstests already, including many
> loops of generic/475 for validation to make sure it was no longer failing.  So
> far so good, but for something like this wider testing will definitely be
> necessary.  Thanks,

So a comment that applies to the whole patchset.

On power failures we can now end up with non-prealloc extents beyond
the disk_i_size after mounting the filesystem.

Not entirely sure if it will give any potential problems other then
non-reclaimed space for a long time (unless the file is truncated or
written to or beyond the extent's offset), have you tested this
scenario?

I suppose the test cases from fstests that use dm's log writes target
exercise this easily.

Thanks

>
> Josef
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-15 17:20     ` Josef Bacik
@ 2020-01-15 17:34       ` Filipe Manana
  0 siblings, 0 replies; 17+ messages in thread
From: Filipe Manana @ 2020-01-15 17:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Jan 15, 2020 at 5:20 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 1/15/20 12:12 PM, Filipe Manana wrote:
> > On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> We want to use this everywhere we modify the file extent items
> >> permanently.  These include
> >>
> >> 1) Inserting new file extents for writes and prealloc extents.
> >> 2) Truncating inode items.
> >> 3) btrfs_cont_expand().
> >> 4) Insert inline extents.
> >> 5) Insert new extents from log replay.
> >> 6) Insert a new extent for clone, as it could be past isize.
> >>
> >> We do not however call the clear helper for hole punching because it
> >> simply swaps out an existing file extent for a hole, so there's
> >> effectively no change as far as the i_size is concerned.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >> ---
> >>   fs/btrfs/delayed-inode.c |  4 +++
> >>   fs/btrfs/file.c          |  6 ++++
> >>   fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
> >>   fs/btrfs/tree-log.c      |  5 ++++
> >>   4 files changed, 73 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> >> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> >> --- a/fs/btrfs/delayed-inode.c
> >> +++ b/fs/btrfs/delayed-inode.c
> >> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
> >>   {
> >>          struct btrfs_delayed_node *delayed_node;
> >>          struct btrfs_inode_item *inode_item;
> >> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> >>
> >>          delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
> >>          if (!delayed_node)
> >> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
> >>          i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
> >>          i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
> >>          btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> >> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> >> +                                         round_up(i_size_read(inode),
> >> +                                                  fs_info->sectorsize));
> >
> > Why set it here when we have already done it at btrfs_read_locked_inode()?
> > This seems duplicated.
> >
>
> Because if btrfs_fill_inode() returns true it means we had a delayed inode in
> memory still and we skip the reading of the inode off of disk, so it's an either
> or case, not a duplicate.  Thanks,

Oh, yes, I missed that.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES
  2020-01-15 17:32 ` [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Filipe Manana
@ 2020-01-15 18:44   ` Josef Bacik
  0 siblings, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2020-01-15 18:44 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team

On 1/15/20 12:32 PM, Filipe Manana wrote:
> On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> v1->v2:
>> - fixed a bug in 'btrfs: use the file extent tree infrastructure' that would
>>    result in 0 length files because btrfs_truncate_inode_items() was clearing the
>>    file extent map when we fsync'ed multiple times.  Validated this with a
>>    modified fsx and generic/521 that reproduced the problem, those modifications
>>    were sent up as well.
>> - dropped the RFC
>>
>> ----------------- Original Message -----------------------
>> We've historically had this problem where you could flush a targeted section of
>> an inode and end up with a hole between extents without a hole extent item.
>> This of course makes fsck complain because this is not ok for a file system that
>> doesn't have NO_HOLES set.  Because this is a well understood problem I and
>> others have been ignoring fsck failures during certain xfstests (generic/475 for
>> example) because they would regularly trigger this edge case.
>>
>> However this isn't a great behavior to have, we should really be taking all fsck
>> failures seriously, and we could potentially ignore fsck legitimate fsck errors
>> because we expect it to be this particular failure.
>>
>> In order to fix this we need to keep track of where we have valid extent items,
>> and only update i_size to encompass that area.  This unfortunately means we need
>> a new per-inode extent_io_tree to keep track of the valid ranges.  This is
>> relatively straightforward in practice, and helpers have been added to manage
>> this so that in the case of a NO_HOLES file system we just simply skip this work
>> altogether.
>>
>> I've been hammering on this for a week now and I'm pretty sure its ok, but I'd
>> really like Filipe to take a look and I still have some longer running tests
>> going on the series.  All of our boxes internally are btrfs and the box I was
>> testing on ended up with a weird RPM db corruption that was likely from an
>> earlier, broken version of the patch.  However I cannot be 100% sure that was
>> the case, so I'm giving it a few more days of testing before I'm satisfied
>> there's not some weird thing that RPM does that xfstests doesn't cover.
>>
>> This has gone through several iterations of xfstests already, including many
>> loops of generic/475 for validation to make sure it was no longer failing.  So
>> far so good, but for something like this wider testing will definitely be
>> necessary.  Thanks,
> 
> So a comment that applies to the whole patchset.
> 
> On power failures we can now end up with non-prealloc extents beyond
> the disk_i_size after mounting the filesystem.
> 
> Not entirely sure if it will give any potential problems other then
> non-reclaimed space for a long time (unless the file is truncated or
> written to or beyond the extent's offset), have you tested this
> scenario?
> 
> I suppose the test cases from fstests that use dm's log writes target
> exercise this easily.
> 

Yeah I've run it through xfstests a bunch and none of the log writes things blew up.

Keep in mind that this scenario can already happen, just not as easily.  The 
original btrfs_ordered_update_i_size() would only update i_size if the previous 
ordered extent had completed.  If it hadn't you would end up with a normal 
extent past i_size, and if you crashed at the right time you would be in this 
spot.  This patch only makes that case more likely to happen if you happen to do 
something like sync_file_range() in the middle of the dirty range.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
  2020-01-15 17:12   ` Filipe Manana
@ 2020-01-16 12:46   ` Filipe Manana
  1 sibling, 0 replies; 17+ messages in thread
From: Filipe Manana @ 2020-01-16 12:46 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jan 7, 2020 at 7:43 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We want to use this everywhere we modify the file extent items
> permanently.  These include
>
> 1) Inserting new file extents for writes and prealloc extents.
> 2) Truncating inode items.
> 3) btrfs_cont_expand().
> 4) Insert inline extents.
> 5) Insert new extents from log replay.
> 6) Insert a new extent for clone, as it could be past isize.
>
> We do not however call the clear helper for hole punching because it
> simply swaps out an existing file extent for a hole, so there's
> effectively no change as far as the i_size is concerned.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/delayed-inode.c |  4 +++
>  fs/btrfs/file.c          |  6 ++++
>  fs/btrfs/inode.c         | 59 +++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/tree-log.c      |  5 ++++
>  4 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index d3e15e1d4a91..8b4dcf4f6b3e 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>  {
>         struct btrfs_delayed_node *delayed_node;
>         struct btrfs_inode_item *inode_item;
> +       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>
>         delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
>         if (!delayed_node)
> @@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
>         i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
>         i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
>         btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
> +       btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
> +                                         round_up(i_size_read(inode),
> +                                                  fs_info->sectorsize));
>         inode->i_mode = btrfs_stack_inode_mode(inode_item);
>         set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
>         inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4fadb892af24..f1c880c06ca2 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                               clone_info->file_offset,
> +                                               clone_len);
> +       if (ret)
> +               return ret;
> +
>         /* If it's a hole, nothing more needs to be done. */
>         if (clone_info->disk_offset == 0)
>                 return 0;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ab8b972863b1..5d34007aa7ec 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
>         btrfs_mark_buffer_dirty(leaf);
>         btrfs_release_path(path);
>
> +       /*
> +        * We align size to sectorsize for inline extents just for simplicity
> +        * sake.
> +        */
> +       size = ALIGN(size, root->fs_info->sectorsize);
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
> +       if (ret)
> +               goto fail;
> +
>         /*
>          * we're an inline extent, so nobody can
>          * extend the file past i_size without locking
> @@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
>         ins.offset = disk_num_bytes;
>         ins.type = BTRFS_EXTENT_ITEM_KEY;
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
> +                                               ram_bytes);
> +       if (ret)
> +               goto out;
> +
>         /*
>          * Release the reserved range from inode dirty range map, as it is
>          * already moved into delayed_ref_head
> @@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>         }
>
>         while (1) {
> +               u64 clear_start = 0, clear_len = 0;
> +
>                 fi = NULL;
>                 leaf = path->nodes[0];
>                 btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> @@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>
>                 if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
>                         u64 num_dec;
> +
> +                       clear_start = found_key.offset;
>                         extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>                         if (!del_item) {
>                                 u64 orig_num_bytes =
> @@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                 extent_num_bytes = ALIGN(new_size -
>                                                 found_key.offset,
>                                                 fs_info->sectorsize);
> +                               clear_start = ALIGN(new_size,
> +                                                   fs_info->sectorsize);
>                                 btrfs_set_file_extent_num_bytes(leaf, fi,
>                                                          extent_num_bytes);
>                                 num_dec = (orig_num_bytes -
> @@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                                 inode_sub_bytes(inode, num_dec);
>                                 }
>                         }
> +                       clear_len = num_dec;
>                 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>                         /*
>                          * we can't truncate inline items that have had
> @@ -4856,12 +4877,34 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>                                  */
>                                 ret = NEED_TRUNCATE_BLOCK;
>                                 break;
> +                       } else {
> +                               /*
> +                                * Inline extents are special, we just treat
> +                                * them as a full sector worth in the file
> +                                * extent tree just for simplicity sake.
> +                                */
> +                               clear_len = fs_info->sectorsize;
>                         }
>
>                         if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>                                 inode_sub_bytes(inode, item_end + 1 - new_size);
>                 }
>  delete:
> +               /*
> +                * We use btrfs_truncate_inode_items() to clean up log trees for
> +                * multiple fsyncs, and in this case we don't want to clear the
> +                * file extent range because it's just the log.
> +                */
> +               if (root == BTRFS_I(inode)->root) {
> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                                                 clear_start,
> +                                                                 clear_len);

Trying this out today:

[ 3567.156540] BUG: sleeping function called from invalid context at
mm/slab.h:565
[ 3567.156595] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
2344, name: fsstress
[ 3567.156636] 4 locks held by fsstress/2344:
[ 3567.156638]  #0: ffff9b5f82a7c410 (sb_writers#12){.+.+}, at:
mnt_want_write+0x20/0x50
[ 3567.156646]  #1: ffff9b5f6a4df698
(&sb->s_type->i_mutex_key#14){+.+.}, at: do_truncate+0x66/0xc0
[ 3567.156651]  #2: ffff9b5f82a7c610 (sb_internal#2){.+.+}, at:
start_transaction+0x3c7/0x5c0 [btrfs]
[ 3567.156673]  #3: ffff9b5f7a746ae8 (btrfs-fs-00){++++}, at:
btrfs_try_tree_write_lock+0x2f/0x1c0 [btrfs]
[ 3567.156692] Preemption disabled at:
[ 3567.156693] [<0000000000000000>] 0x0
[ 3567.156709] CPU: 1 PID: 2344 Comm: fsstress Tainted: G        W
    5.5.0-rc6-btrfs-next-52 #1
[ 3567.156710] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 3567.156711] Call Trace:
[ 3567.156715]  dump_stack+0x87/0xcb
[ 3567.156718]  ___might_sleep+0x287/0x2f0
[ 3567.156735]  ? alloc_extent_state+0x23/0x1f0 [btrfs]
[ 3567.156737]  slab_pre_alloc_hook+0x64/0x80
[ 3567.156739]  kmem_cache_alloc+0x33/0x300
[ 3567.156755]  alloc_extent_state+0x23/0x1f0 [btrfs]
[ 3567.156770]  __clear_extent_bit+0x2d6/0x6b0 [btrfs]
[ 3567.156788]  clear_extent_bit+0x15/0x20 [btrfs]
[ 3567.156799]  btrfs_inode_clear_file_extent_range+0x61/0x80 [btrfs]
[ 3567.156812]  btrfs_truncate_inode_items+0x944/0x10a0 [btrfs]
[ 3567.156822]  ? do_raw_spin_unlock+0x49/0xc0
[ 3567.156836]  btrfs_setattr+0x30b/0x5b0 [btrfs]
[ 3567.156840]  notify_change+0x306/0x460
[ 3567.156843]  do_truncate+0x75/0xc0
[ 3567.156846]  ? generic_permission+0x1d/0x1e0
[ 3567.156850]  vfs_truncate+0x1b0/0x250
[ 3567.156853]  do_sys_truncate+0x79/0xc0
[ 3567.156855]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 3567.156857]  do_syscall_64+0x5c/0x280
[ 3567.156860]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3567.156861] RIP: 0033:0x7fe6436a1c57

We are holding the path with spin locks... We must set it to blocking
locks before clearing the io tree.

Thanks.


> +                       if (ret) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               break;
> +                       }
> +               }
> +
>                 if (del_item)
>                         last_size = found_key.offset;
>                 else
> @@ -5183,14 +5226,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                 }
>                 last_byte = min(extent_map_end(em), block_end);
>                 last_byte = ALIGN(last_byte, fs_info->sectorsize);
> +               hole_size = last_byte - cur_offset;
> +
>                 if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
>                         struct extent_map *hole_em;
> -                       hole_size = last_byte - cur_offset;
>
>                         err = maybe_insert_hole(root, inode, cur_offset,
>                                                 hole_size);
>                         if (err)
>                                 break;
> +
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
> +
>                         btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
>                                                 cur_offset + hole_size - 1, 0);
>                         hole_em = alloc_extent_map();
> @@ -5223,6 +5274,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
>                                                         hole_size - 1, 0);
>                         }
>                         free_extent_map(hole_em);
> +               } else {
> +                       err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
> +                                                               cur_offset,
> +                                                               hole_size);
> +                       if (err)
> +                               break;
>                 }
>  next:
>                 free_extent_map(em);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 19364940f9a1..ad25974ff936 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
>                         goto out;
>         }
>
> +       ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
> +                                               extent_end - start);
> +       if (ret)
> +               goto out;
> +
>         inode_add_bytes(inode, nbytes);
>  update_inode:
>         ret = btrfs_update_inode(trans, root, inode);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/5] btrfs: use the file extent tree infrastructure
  2019-12-30 21:31 [RFC][PATCH 0/5] " Josef Bacik
@ 2019-12-30 21:31 ` Josef Bacik
  0 siblings, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2019-12-30 21:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We want to use this everywhere we modify the file extent items
permanently.  These include

1) Inserting new file extents for writes and prealloc extents.
2) Truncating inode items.
3) btrfs_cont_expand().
4) Insert inline extents.
5) Insert new extents from log replay.
6) Insert a new extent for clone, as it could be past isize.

We do not however call the clear helper for hole punching because it
simply swaps out an existing file extent for a hole, so there's
effectively no change as far as the i_size is concerned.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-inode.c |  4 ++++
 fs/btrfs/file.c          |  6 +++++
 fs/btrfs/inode.c         | 52 +++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/tree-log.c      |  5 ++++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index d3e15e1d4a91..8b4dcf4f6b3e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1762,6 +1762,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 {
 	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_inode_item *inode_item;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
 	delayed_node = btrfs_get_delayed_node(BTRFS_I(inode));
 	if (!delayed_node)
@@ -1779,6 +1780,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
 	i_uid_write(inode, btrfs_stack_inode_uid(inode_item));
 	i_gid_write(inode, btrfs_stack_inode_gid(inode_item));
 	btrfs_i_size_write(BTRFS_I(inode), btrfs_stack_inode_size(inode_item));
+	btrfs_inode_set_file_extent_range(BTRFS_I(inode), 0,
+					  round_up(i_size_read(inode),
+						   fs_info->sectorsize));
 	inode->i_mode = btrfs_stack_inode_mode(inode_item);
 	set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
 	inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fadb892af24..f1c880c06ca2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2486,6 +2486,12 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+						clone_info->file_offset,
+						clone_len);
+	if (ret)
+		return ret;
+
 	/* If it's a hole, nothing more needs to be done. */
 	if (clone_info->disk_offset == 0)
 		return 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ab8b972863b1..eaf81129817d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -243,6 +243,15 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
+	/*
+	 * We align size to sectorsize for inline extents just for simplicity
+	 * sake.
+	 */
+	size = ALIGN(size, root->fs_info->sectorsize);
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size);
+	if (ret)
+		goto fail;
+
 	/*
 	 * we're an inline extent, so nobody can
 	 * extend the file past i_size without locking
@@ -2377,6 +2386,11 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos,
+						ram_bytes);
+	if (ret)
+		goto out;
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
@@ -4753,6 +4767,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	}
 
 	while (1) {
+		u64 clear_start = 0, clear_len = 0;
+
 		fi = NULL;
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
@@ -4803,6 +4819,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
 		if (extent_type != BTRFS_FILE_EXTENT_INLINE) {
 			u64 num_dec;
+
+			clear_start = found_key.offset;
 			extent_start = btrfs_file_extent_disk_bytenr(leaf, fi);
 			if (!del_item) {
 				u64 orig_num_bytes =
@@ -4810,6 +4828,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				extent_num_bytes = ALIGN(new_size -
 						found_key.offset,
 						fs_info->sectorsize);
+				clear_start = ALIGN(new_size,
+						    fs_info->sectorsize);
 				btrfs_set_file_extent_num_bytes(leaf, fi,
 							 extent_num_bytes);
 				num_dec = (orig_num_bytes -
@@ -4835,6 +4855,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						inode_sub_bytes(inode, num_dec);
 				}
 			}
+			clear_len = num_dec;
 		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
 			/*
 			 * we can't truncate inline items that have had
@@ -4856,12 +4877,27 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 */
 				ret = NEED_TRUNCATE_BLOCK;
 				break;
+			} else {
+				/*
+				 * Inline extents are special, we just treat
+				 * them as a full sector worth in the file
+				 * extent tree just for simplicity sake.
+				 */
+				clear_len = fs_info->sectorsize;
 			}
 
 			if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 				inode_sub_bytes(inode, item_end + 1 - new_size);
 		}
 delete:
+		ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+							  clear_start,
+							  clear_len);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			break;
+		}
+
 		if (del_item)
 			last_size = found_key.offset;
 		else
@@ -5183,14 +5219,22 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 		}
 		last_byte = min(extent_map_end(em), block_end);
 		last_byte = ALIGN(last_byte, fs_info->sectorsize);
+		hole_size = last_byte - cur_offset;
+
 		if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
 			struct extent_map *hole_em;
-			hole_size = last_byte - cur_offset;
 
 			err = maybe_insert_hole(root, inode, cur_offset,
 						hole_size);
 			if (err)
 				break;
+
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
+
 			btrfs_drop_extent_cache(BTRFS_I(inode), cur_offset,
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
@@ -5223,6 +5267,12 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 							hole_size - 1, 0);
 			}
 			free_extent_map(hole_em);
+		} else {
+			err = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
+								cur_offset,
+								hole_size);
+			if (err)
+				break;
 		}
 next:
 		free_extent_map(em);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 19364940f9a1..ad25974ff936 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -829,6 +829,11 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 			goto out;
 	}
 
+	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start,
+						extent_end - start);
+	if (ret)
+		goto out;
+
 	inode_add_bytes(inode, nbytes);
 update_inode:
 	ret = btrfs_update_inode(trans, root, inode);
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-07 19:42 [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Josef Bacik
2020-01-07 19:42 ` [PATCH 1/5] btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update Josef Bacik
2020-01-15 17:01   ` Filipe Manana
2020-01-07 19:42 ` [PATCH 2/5] btrfs: introduce the inode->file_extent_tree Josef Bacik
2020-01-15 17:10   ` Filipe Manana
2020-01-07 19:42 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik
2020-01-15 17:12   ` Filipe Manana
2020-01-15 17:20     ` Josef Bacik
2020-01-15 17:34       ` Filipe Manana
2020-01-16 12:46   ` Filipe Manana
2020-01-07 19:42 ` [PATCH 4/5] btrfs: replace all uses of btrfs_ordered_update_i_size Josef Bacik
2020-01-15 17:13   ` Filipe Manana
2020-01-07 19:42 ` [PATCH 5/5] btrfs: delete the ordered isize update code Josef Bacik
2020-01-15 17:13   ` Filipe Manana
2020-01-15 17:32 ` [PATCH 0/5][v2] btrfs: fix hole corruption issue with !NO_HOLES Filipe Manana
2020-01-15 18:44   ` Josef Bacik
  -- strict thread matches above, loose matches on Subject: below --
2019-12-30 21:31 [RFC][PATCH 0/5] " Josef Bacik
2019-12-30 21:31 ` [PATCH 3/5] btrfs: use the file extent tree infrastructure Josef Bacik

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git