* [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS
@ 2021-09-03 14:44 Johannes Thumshirn
  2021-09-03 14:44 ` [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root Johannes Thumshirn
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

A while ago David reported a bug in zoned btrfs' relocation code. The bug is
triggered because relocation on a zoned filesystem does not preallocate the
extents it copies, so a writeback process running in parallel can split the
written extent. Splitting extents is currently not allowed during relocation,
as relocation assumes a one-to-one copy of the relocated extents.

This causes transaction aborts, and the filesystem switches to read-only in
order to prevent further damage.

The first patch in this series is just a preparation to avoid overly long
lines in follow-up patches. Patch two adds a dedicated block group for
relocation on a zoned filesystem. Patch three switches relocation from
REQ_OP_ZONE_APPEND to regular REQ_OP_WRITE. Patch four relaxes an ASSERT() so
that we can enter the nocow path on a zoned filesystem under very special
circumstances. With these prerequisites in place, the fifth patch switches the
relocation code for a zoned filesystem to the same code path we use on a
non-zoned filesystem. The last patch in this series is just a simple rename of
a function whose name exists twice in the btrfs codebase, with a different
purpose in each file.

Johannes Thumshirn (6):
  btrfs: introduce btrfs_is_data_reloc_root
  btrfs: zoned: add a dedicated data relocation block group
  btrfs: zoned: use regular writes for relocation
  btrfs: check for relocation inodes on zoned btrfs in should_nocow
  btrfs: zoned: allow preallocation for relocation inodes
  btrfs: rename setup_extent_mapping in relocation code

 fs/btrfs/block-group.c |  1 +
 fs/btrfs/ctree.h       |  7 ++++++
 fs/btrfs/disk-io.c     |  3 ++-
 fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/inode.c       | 22 ++++++++----------
 fs/btrfs/relocation.c  | 43 +++++-----------------------------
 fs/btrfs/zoned.c       |  3 +++
 fs/btrfs/zoned.h       |  9 ++++++++
 8 files changed, 87 insertions(+), 53 deletions(-)

-- 
2.32.0



* [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root
  2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
@ 2021-09-03 14:44 ` Johannes Thumshirn
  2021-09-07 11:36   ` Naohiro Aota
  2021-09-03 14:44 ` [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group Johannes Thumshirn
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

There are several places in our codebase where we check if a root is the
root of the data reloc tree and subsequent patches will introduce more.

Factor out the check into a small helper function instead of open coding
it multiple times.
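
The pattern can be sketched outside the kernel as follows. This is only a
minimal user-space model: the two structs are simplified stand-ins for the
real kernel structures, and the value used for the objectid constant is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the kernel headers are authoritative. */
#define BTRFS_DATA_RELOC_TREE_OBJECTID ((uint64_t)-9)

/* Simplified stand-ins for the kernel structures (illustration only). */
struct btrfs_key {
	uint64_t objectid;
};

struct btrfs_root {
	struct btrfs_key root_key;
};

/* One named predicate instead of an open-coded comparison per call site. */
static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
{
	return root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID;
}
```

Call sites then read as `if (btrfs_is_data_reloc_root(root))`, which also
keeps multi-condition checks within the line-length limit.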

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/ctree.h       |  5 +++++
 fs/btrfs/disk-io.c     |  2 +-
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/inode.c       | 19 ++++++++-----------
 fs/btrfs/relocation.c  |  2 +-
 5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 38870ae46cbb..8cc0b29e24ee 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3846,6 +3846,11 @@ static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info)
 	return fs_info->zoned != 0;
 }
 
+static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
+{
+	return root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID;
+}
+
 /*
  * We use page status Private2 to indicate there is an ordered extent with
  * unfinished IO.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 41ea50f48cfe..9a6be409c1d6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1500,7 +1500,7 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
 		goto fail;
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
-	    root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) {
+	    !btrfs_is_data_reloc_root(root)) {
 		set_bit(BTRFS_ROOT_SHAREABLE, &root->state);
 		btrfs_check_and_init_root_item(&root->root_item);
 	}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7d03ffa04bce..239e09f7239a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2376,7 +2376,7 @@ int btrfs_cross_ref_exist(struct btrfs_root *root, u64 objectid, u64 offset,
 
 out:
 	btrfs_free_path(path);
-	if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
+	if (btrfs_is_data_reloc_root(root))
 		WARN_ON(ret > 0);
 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0517f31a3bed..8e1a46e9c63e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1150,7 +1150,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	 * fails during the stage where it updates the bytenr of file extent
 	 * items.
 	 */
-	if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
+	if (btrfs_is_data_reloc_root(root))
 		min_alloc_size = num_bytes;
 	else
 		min_alloc_size = fs_info->sectorsize;
@@ -1186,8 +1186,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 		if (ret)
 			goto out_drop_extent_cache;
 
-		if (root->root_key.objectid ==
-		    BTRFS_DATA_RELOC_TREE_OBJECTID) {
+		if (btrfs_is_data_reloc_root(root)) {
 			ret = btrfs_reloc_clone_csums(inode, start,
 						      cur_alloc_size);
 			/*
@@ -1503,8 +1502,7 @@ static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page,
 			   int *page_started, unsigned long *nr_written)
 {
 	const bool is_space_ino = btrfs_is_free_space_inode(inode);
-	const bool is_reloc_ino = (inode->root->root_key.objectid ==
-				   BTRFS_DATA_RELOC_TREE_OBJECTID);
+	const bool is_reloc_ino = btrfs_is_data_reloc_root(inode->root);
 	const u64 range_bytes = end + 1 - start;
 	struct extent_io_tree *io_tree = &inode->io_tree;
 	u64 range_start = start;
@@ -1866,8 +1864,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			btrfs_dec_nocow_writers(fs_info, disk_bytenr);
 		nocow = false;
 
-		if (root->root_key.objectid ==
-		    BTRFS_DATA_RELOC_TREE_OBJECTID)
+		if (btrfs_is_data_reloc_root(root))
 			/*
 			 * Error handled later, as we must prevent
 			 * extent_clear_unlock_delalloc() in error handler
@@ -2206,7 +2203,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
 		if (btrfs_is_testing(fs_info))
 			return;
 
-		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		if (!btrfs_is_data_reloc_root(root) &&
 		    do_list && !(state->state & EXTENT_NORESERVE) &&
 		    (*bits & EXTENT_CLEAR_DATA_RESV))
 			btrfs_free_reserved_data_space_noquota(fs_info, len);
@@ -2531,7 +2528,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 		goto mapit;
 	} else if (async && !skip_sum) {
 		/* csum items have already been cloned */
-		if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
+		if (btrfs_is_data_reloc_root(root))
 			goto mapit;
 		/* we're doing a write, do the async checksumming */
 		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
@@ -3307,7 +3304,7 @@ unsigned int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 		u64 file_offset = pg_off + page_offset(page);
 		int ret;
 
-		if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		if (btrfs_is_data_reloc_root(root) &&
 		    test_range_bit(io_tree, file_offset,
 				   file_offset + sectorsize - 1,
 				   EXTENT_NODATASUM, 1, NULL)) {
@@ -4008,7 +4005,7 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
 	 * without delay
 	 */
 	if (!btrfs_is_free_space_inode(inode)
-	    && root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
+	    && !btrfs_is_data_reloc_root(root)
 	    && !test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
 		btrfs_update_root_times(trans, root);
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 63d2b22cf438..3c9c0aab7fc3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4391,7 +4391,7 @@ int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
 		return 0;
 
 	BUG_ON(rc->stage == UPDATE_DATA_PTRS &&
-	       root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID);
+	       btrfs_is_data_reloc_root(root));
 
 	level = btrfs_header_level(buf);
 	if (btrfs_header_generation(buf) <=
-- 
2.32.0



* [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group
  2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
  2021-09-03 14:44 ` [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root Johannes Thumshirn
@ 2021-09-03 14:44 ` Johannes Thumshirn
  2021-09-07 11:37   ` Naohiro Aota
                     ` (2 more replies)
  2021-09-03 14:44 ` [PATCH 3/6] btrfs: zoned: use regular writes for relocation Johannes Thumshirn
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

Relocation in a zoned filesystem can fail with a transaction abort with
error -22 (EINVAL). This happens because the relocation code assumes that
the extents we relocate the data to have the same size as the source
extents and ensures this by preallocating the extents.

But in a zoned filesystem we currently can't preallocate the extents as
this would break the sequential write required rule. Therefore it can
happen that the writeback process kicks in while we're still adding pages
to a delallocation range and starts writing out dirty pages.

This then creates destination extents that are smaller than the source
extents, triggering the following safety check in get_new_location():

	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi)) {
		ret = -EINVAL;
		goto out;
	}
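
To make the failure mode concrete, here is a small user-space model of that
check. It is a sketch under stated assumptions: `struct extent_model`,
`split_at()` and `check_new_location()` are names invented for this example
and do not exist in btrfs; only the size comparison mirrors the snippet
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a file extent item; not a btrfs structure. */
struct extent_model {
	uint64_t disk_num_bytes;
};

/* Model early writeback: the first `written` bytes become their own extent. */
static struct extent_model split_at(const struct extent_model *src,
				    uint64_t written)
{
	assert(written > 0 && written < src->disk_num_bytes);
	return (struct extent_model){ .disk_num_bytes = written };
}

/*
 * Mirrors the comparison in get_new_location(): relocation expects the
 * destination extent to have exactly the size of the source extent.
 */
static int check_new_location(uint64_t num_bytes,
			      const struct extent_model *dst)
{
	if (num_bytes != dst->disk_num_bytes)
		return -EINVAL;
	return 0;
}
```

In this model a destination extent produced by `split_at()` is always smaller
than its source, so `check_new_location()` returns -EINVAL for it, which
corresponds to the transaction abort described above.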

Temporarily create a dedicated block group for the relocation process, so
no non-relocation data writes can interfere with the relocation writes.

This is needed so that we can switch the relocation process on a zoned
filesystem from the REQ_OP_ZONE_APPEND writing we use for data to a scheme
like in a non-zoned filesystem using REQ_OP_WRITE and preallocation.
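
The separation rule this patch adds to do_allocation_zoned() can be modelled
as a pure function. This is a sketch only: `skip_block_group()` is a
hypothetical name, and the locking around fs_info->data_reloc_bg is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether an allocation must skip a candidate block group.
 *
 * data_reloc_bg:  start of the dedicated relocation block group, 0 if unset
 * bg_start:       start of the candidate block group
 * for_data_reloc: true if the allocation is made for data relocation
 */
static bool skip_block_group(uint64_t data_reloc_bg, uint64_t bg_start,
			     bool for_data_reloc)
{
	if (!data_reloc_bg)
		return false;	/* no dedicated group yet, nothing to enforce */
	if (for_data_reloc)
		return bg_start != data_reloc_bg;	/* relocation stays inside */
	return bg_start == data_reloc_bg;	/* everyone else stays out */
}
```

Once a dedicated block group is set, relocation allocations are confined to
it and all other allocations are kept out of it; while none is set, nothing
is skipped.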

Fixes: 32430c614844 ("btrfs: zoned: enable relocation on a zoned filesystem")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/block-group.c |  1 +
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent-tree.c | 50 ++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/zoned.h       |  9 ++++++++
 5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1302bf8d0be1..46fdef7bbe20 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -903,6 +903,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	spin_unlock(&cluster->refill_lock);
 
 	btrfs_clear_treelog_bg(block_group);
+	btrfs_clear_data_reloc_bg(block_group);
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8cc0b29e24ee..344ba70315d8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1017,6 +1017,8 @@ struct btrfs_fs_info {
 	struct mutex zoned_meta_io_lock;
 	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
+	spinlock_t relocation_bg_lock;
+	u64 data_reloc_bg;
 
 	spinlock_t zone_active_bgs_lock;
 	struct list_head zone_active_bgs;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9a6be409c1d6..5541bc6fe8a7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2885,6 +2885,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->unused_bgs_lock);
 	spin_lock_init(&fs_info->treelog_bg_lock);
 	spin_lock_init(&fs_info->zone_active_bgs_lock);
+	spin_lock_init(&fs_info->relocation_bg_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reclaim_bgs_lock);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 239e09f7239a..644791150631 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3497,6 +3497,9 @@ struct find_free_extent_ctl {
 	/* Allocation is called for tree-log */
 	bool for_treelog;
 
+	/* Allocation is called for data relocation */
+	bool for_data_reloc;
+
 	/* RAID index, converted from flags */
 	int index;
 
@@ -3758,6 +3761,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	u64 avail;
 	u64 bytenr = block_group->start;
 	u64 log_bytenr;
+	u64 data_reloc_bytenr;
 	int ret = 0;
 	bool skip;
 
@@ -3775,6 +3779,18 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	if (skip)
 		return 1;
 
+	/*
+	 * Do not allow non-relocation blocks in the dedicated relocation block
+	 * group, and vice versa.
+	 */
+	spin_lock(&fs_info->relocation_bg_lock);
+	data_reloc_bytenr = fs_info->data_reloc_bg;
+	skip = data_reloc_bytenr &&
+		((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
+		 (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr));
+	spin_unlock(&fs_info->relocation_bg_lock);
+	if (skip)
+		return 1;
 	/* Check RO and no space case before trying to activate it */
 	spin_lock(&block_group->lock);
 	if (block_group->ro ||
@@ -3790,10 +3806,14 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	spin_lock(&fs_info->treelog_bg_lock);
+	spin_lock(&fs_info->relocation_bg_lock);
 
 	ASSERT(!ffe_ctl->for_treelog ||
 	       block_group->start == fs_info->treelog_bg ||
 	       fs_info->treelog_bg == 0);
+	ASSERT(!ffe_ctl->for_data_reloc ||
+	       block_group->start == fs_info->data_reloc_bg ||
+	       fs_info->data_reloc_bg == 0);
 
 	if (block_group->ro) {
 		ret = 1;
@@ -3810,6 +3830,16 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		goto out;
 	}
 
+	/*
+	 * Do not allow currently used block group to be the data relocation
+	 * dedicated block group.
+	 */
+	if (ffe_ctl->for_data_reloc && !fs_info->data_reloc_bg &&
+	    (block_group->used || block_group->reserved)) {
+		ret = 1;
+		goto out;
+	}
+
 	WARN_ON_ONCE(block_group->alloc_offset > block_group->zone_capacity);
 	avail = block_group->zone_capacity - block_group->alloc_offset;
 	if (avail < num_bytes) {
@@ -3828,6 +3858,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	if (ffe_ctl->for_treelog && !fs_info->treelog_bg)
 		fs_info->treelog_bg = block_group->start;
 
+	if (ffe_ctl->for_data_reloc && !fs_info->data_reloc_bg)
+		fs_info->data_reloc_bg = block_group->start;
+
 	ffe_ctl->found_offset = start + block_group->alloc_offset;
 	block_group->alloc_offset += num_bytes;
 	spin_lock(&ctl->tree_lock);
@@ -3844,6 +3877,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 out:
 	if (ret && ffe_ctl->for_treelog)
 		fs_info->treelog_bg = 0;
+	if (ret && ffe_ctl->for_data_reloc)
+		fs_info->data_reloc_bg = 0;
+	spin_unlock(&fs_info->relocation_bg_lock);
 	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
@@ -4112,6 +4148,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 				ffe_ctl->hint_byte = fs_info->treelog_bg;
 			spin_unlock(&fs_info->treelog_bg_lock);
 		}
+		if (ffe_ctl->for_data_reloc) {
+			spin_lock(&fs_info->relocation_bg_lock);
+			if (fs_info->data_reloc_bg)
+				ffe_ctl->hint_byte = fs_info->data_reloc_bg;
+			spin_unlock(&fs_info->relocation_bg_lock);
+		}
 		return 0;
 	default:
 		BUG();
@@ -4245,6 +4287,8 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		if (unlikely(block_group->ro)) {
 			if (ffe_ctl->for_treelog)
 				btrfs_clear_treelog_bg(block_group);
+			if (ffe_ctl->for_data_reloc)
+				btrfs_clear_data_reloc_bg(block_group);
 			continue;
 		}
 
@@ -4438,6 +4482,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	u64 flags;
 	int ret;
 	bool for_treelog = (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID);
+	bool for_data_reloc = (btrfs_is_data_reloc_root(root) && is_data);
 
 	flags = get_alloc_profile_by_root(root, is_data);
 again:
@@ -4451,6 +4496,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	ffe_ctl.delalloc = delalloc;
 	ffe_ctl.hint_byte = hint_byte;
 	ffe_ctl.for_treelog = for_treelog;
+	ffe_ctl.for_data_reloc = for_data_reloc;
 
 	ret = find_free_extent(root, ins, &ffe_ctl);
 	if (!ret && !is_data) {
@@ -4470,8 +4516,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 
 			sinfo = btrfs_find_space_info(fs_info, flags);
 			btrfs_err(fs_info,
-			"allocation failed flags %llu, wanted %llu tree-log %d",
-				  flags, num_bytes, for_treelog);
+			"allocation failed flags %llu, wanted %llu tree-log %d, relocation: %d",
+				  flags, num_bytes, for_treelog, for_data_reloc);
 			if (sinfo)
 				btrfs_dump_space_info(fs_info, sinfo,
 						      num_bytes, 1);
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 9c512402d7f4..1cf87bb1db08 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -347,4 +347,13 @@ static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg)
 	spin_unlock(&fs_info->treelog_bg_lock);
 }
 
+static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
+{
+	struct btrfs_fs_info *fs_info = bg->fs_info;
+
+	spin_lock(&fs_info->relocation_bg_lock);
+	if (fs_info->data_reloc_bg == bg->start)
+		fs_info->data_reloc_bg = 0;
+	spin_unlock(&fs_info->relocation_bg_lock);
+}
 #endif
-- 
2.32.0



* [PATCH 3/6] btrfs: zoned: use regular writes for relocation
  2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
  2021-09-03 14:44 ` [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root Johannes Thumshirn
  2021-09-03 14:44 ` [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group Johannes Thumshirn
@ 2021-09-03 14:44 ` Johannes Thumshirn
  2021-09-07 11:39   ` Naohiro Aota
  2021-09-03 14:44 ` [PATCH 4/6] btrfs: check for relocation inodes on zoned btrfs in should_nocow Johannes Thumshirn
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

Now that we have a dedicated block group for relocation, we can use
REQ_OP_WRITE instead of REQ_OP_ZONE_APPEND for writing out the data on
relocation.
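
The difference between the two operations can be illustrated with a toy model
of a sequential-write-required zone. Everything here is an assumption made
for the sketch: `struct zone_model`, `zone_append()` and `zone_write()` are
invented names, and the error codes are placeholders for whatever a real
device or the block layer would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model of one sequential-write-required zone; not a kernel structure. */
struct zone_model {
	uint64_t start;		/* first byte of the zone */
	uint64_t capacity;	/* usable bytes in the zone */
	uint64_t wp;		/* write pointer, relative to start */
};

/*
 * Zone append: the caller only names the zone. The device chooses the
 * location (the current write pointer) and reports it back.
 */
static int zone_append(struct zone_model *z, uint64_t len,
		       uint64_t *written_at)
{
	assert(z->wp <= z->capacity);
	if (len > z->capacity - z->wp)
		return -ENOSPC;
	*written_at = z->start + z->wp;
	z->wp += len;
	return 0;
}

/*
 * Regular write: the caller chooses the location up front, and the write
 * only succeeds if that location is exactly the write pointer.
 */
static int zone_write(struct zone_model *z, uint64_t pos, uint64_t len)
{
	assert(z->wp <= z->capacity);
	if (pos != z->start + z->wp)
		return -EIO;
	if (len > z->capacity - z->wp)
		return -ENOSPC;
	z->wp += len;
	return 0;
}
```

In this model, an append issued by another writer moves the write pointer, so
a location chosen in advance for `zone_write()` is no longer valid. Keeping
unrelated writers out of the relocation block group avoids that situation.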

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 28a06c2d80ad..be82823c9b16 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1490,6 +1490,9 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start)
 	if (!is_data_inode(&inode->vfs_inode))
 		return false;
 
+	if (btrfs_is_data_reloc_root(inode->root))
+		return false;
+
 	cache = btrfs_lookup_block_group(fs_info, start);
 	ASSERT(cache);
 	if (!cache)
-- 
2.32.0



* [PATCH 4/6] btrfs: check for relocation inodes on zoned btrfs in should_nocow
  2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2021-09-03 14:44 ` [PATCH 3/6] btrfs: zoned: use regular writes for relocation Johannes Thumshirn
@ 2021-09-03 14:44 ` Johannes Thumshirn
  2021-09-07 11:40   ` Naohiro Aota
  2021-09-03 14:44 ` [PATCH 5/6] btrfs: zoned: allow preallocation for relocation inodes Johannes Thumshirn
  2021-09-03 14:44 ` [PATCH 6/6] btrfs: rename setup_extent_mapping in relocation code Johannes Thumshirn
  5 siblings, 1 reply; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

Prepare for allowing preallocation for relocation inodes.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8e1a46e9c63e..5f4c8e12ebcc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1944,7 +1944,8 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 	const bool zoned = btrfs_is_zoned(inode->root->fs_info);
 
 	if (should_nocow(inode, start, end)) {
-		ASSERT(!zoned);
+		ASSERT(!zoned ||
+		       (zoned && btrfs_is_data_reloc_root(inode->root)));
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, nr_written);
 	} else if (!inode_can_compress(inode) ||
-- 
2.32.0



* [PATCH 5/6] btrfs: zoned: allow preallocation for relocation inodes
  2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2021-09-03 14:44 ` [PATCH 4/6] btrfs: check for relocation inodes on zoned btrfs in should_nocow Johannes Thumshirn
@ 2021-09-03 14:44 ` Johannes Thumshirn
  2021-09-03 14:44 ` [PATCH 6/6] btrfs: rename setup_extent_mapping in relocation code Johannes Thumshirn
  5 siblings, 0 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

Now that we use a dedicated block group and regular WRITEs for data
relocation, we can preallocate the space needed for a relocated inode, just
as btrfs does on a non-zoned filesystem.

Essentially this reverts commit 32430c614844 ("btrfs: zoned: enable relocation
on a zoned filesystem") as it is not needed anymore.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/relocation.c | 35 ++---------------------------------
 1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 3c9c0aab7fc3..6f668bc01cd1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2853,31 +2853,6 @@ static noinline_for_stack int prealloc_file_extent_cluster(
 	if (ret)
 		return ret;
 
-	/*
-	 * On a zoned filesystem, we cannot preallocate the file region.
-	 * Instead, we dirty and fiemap_write the region.
-	 */
-	if (btrfs_is_zoned(inode->root->fs_info)) {
-		struct btrfs_root *root = inode->root;
-		struct btrfs_trans_handle *trans;
-
-		end = cluster->end - offset + 1;
-		trans = btrfs_start_transaction(root, 1);
-		if (IS_ERR(trans))
-			return PTR_ERR(trans);
-
-		inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode);
-		i_size_write(&inode->vfs_inode, end);
-		ret = btrfs_update_inode(trans, root, inode);
-		if (ret) {
-			btrfs_abort_transaction(trans, ret);
-			btrfs_end_transaction(trans);
-			return ret;
-		}
-
-		return btrfs_end_transaction(trans);
-	}
-
 	btrfs_inode_lock(&inode->vfs_inode, 0);
 	for (nr = 0; nr < cluster->nr; nr++) {
 		start = cluster->boundary[nr] - offset;
@@ -3085,7 +3060,6 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
 static int relocate_file_extent_cluster(struct inode *inode,
 					struct file_extent_cluster *cluster)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 offset = BTRFS_I(inode)->index_cnt;
 	unsigned long index;
 	unsigned long last_index;
@@ -3115,8 +3089,6 @@ static int relocate_file_extent_cluster(struct inode *inode,
 	for (index = (cluster->start - offset) >> PAGE_SHIFT;
 	     index <= last_index && !ret; index++)
 		ret = relocate_one_page(inode, ra, cluster, &cluster_nr, index);
-	if (btrfs_is_zoned(fs_info) && !ret)
-		ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
 	if (ret == 0)
 		WARN_ON(cluster_nr != cluster->nr);
 out:
@@ -3771,12 +3743,8 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_inode_item *item;
 	struct extent_buffer *leaf;
-	u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC;
 	int ret;
 
-	if (btrfs_is_zoned(trans->fs_info))
-		flags &= ~BTRFS_INODE_PREALLOC;
-
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -3791,7 +3759,8 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans,
 	btrfs_set_inode_generation(leaf, item, 1);
 	btrfs_set_inode_size(leaf, item, 0);
 	btrfs_set_inode_mode(leaf, item, S_IFREG | 0600);
-	btrfs_set_inode_flags(leaf, item, flags);
+	btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS |
+					  BTRFS_INODE_PREALLOC);
 	btrfs_mark_buffer_dirty(leaf);
 out:
 	btrfs_free_path(path);
-- 
2.32.0



* [PATCH 6/6] btrfs: rename setup_extent_mapping in relocation code
  2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
                   ` (4 preceding siblings ...)
  2021-09-03 14:44 ` [PATCH 5/6] btrfs: zoned: allow preallocation for relocation inodes Johannes Thumshirn
@ 2021-09-03 14:44 ` Johannes Thumshirn
  5 siblings, 0 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2021-09-03 14:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Filipe Manana, Damien Le Moal,
	Naohiro Aota

In btrfs we have two functions called setup_extent_mapping, one in
the extent_map code and one in the relocation code. While both are private
to their respective implementation, this can still be confusing for the
reader.

So rename the version in relocation.c to setup_relocation_extent_mapping.
No functional change otherwise.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/relocation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 6f668bc01cd1..bf93e11b6d4e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2880,8 +2880,8 @@ static noinline_for_stack int prealloc_file_extent_cluster(
 }
 
 static noinline_for_stack
-int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
-			 u64 block_start)
+int setup_relocation_extent_mapping(struct inode *inode, u64 start, u64 end,
+				    u64 block_start)
 {
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
 	struct extent_map *em;
@@ -3080,7 +3080,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
 
 	file_ra_state_init(ra, inode->i_mapping);
 
-	ret = setup_extent_mapping(inode, cluster->start - offset,
+	ret = setup_relocation_extent_mapping(inode, cluster->start - offset,
 				   cluster->end - offset, cluster->start);
 	if (ret)
 		goto out;
-- 
2.32.0



* Re: [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root
  2021-09-03 14:44 ` [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root Johannes Thumshirn
@ 2021-09-07 11:36   ` Naohiro Aota
  0 siblings, 0 replies; 13+ messages in thread
From: Naohiro Aota @ 2021-09-07 11:36 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Filipe Manana, Damien Le Moal

On Fri, Sep 03, 2021 at 11:44:42PM +0900, Johannes Thumshirn wrote:
> There are several places in our codebase where we check if a root is the
> root of the data reloc tree and subsequent patches will introduce more.
> 
> Factor out the check into a small helper function instead of open coding
> it multiple times.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---

Looks good,
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>


* Re: [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group
  2021-09-03 14:44 ` [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group Johannes Thumshirn
@ 2021-09-07 11:37   ` Naohiro Aota
  2021-09-07 11:52   ` David Sterba
  2021-09-07 12:43   ` David Sterba
  2 siblings, 0 replies; 13+ messages in thread
From: Naohiro Aota @ 2021-09-07 11:37 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Filipe Manana, Damien Le Moal

On Fri, Sep 03, 2021 at 11:44:43PM +0900, Johannes Thumshirn wrote:
> Relocation in a zoned filesystem can fail with a transaction abort with
> error -22 (EINVAL). This happens because the relocation code assumes that
> the extents we relocated the data to have the same size the source extents
> had and ensures this by preallocating the extents.
> 
> But in a zoned filesystem we currently can't preallocate the extents as
> this would break the sequential write required rule. Therefore it can
> happen that the writeback process kicks in while we're still adding pages
> to a delallocation range and starts writing out dirty pages.
> 
> This then creates destination extents that are smaller than the source
> extents, triggering the following safety check in get_new_location():
> 
>  1034         if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi)) {
>  1035                 ret = -EINVAL;
>  1036                 goto out;
>  1037         }
> 
> Temporarily create a dedicated block group for the relocation process, so
> no non-relocation data writes can interfere with the relocation writes.
> 
> This is needed that we can switch the relocation process on a zoned
> filesystem from the REQ_OP_ZONE_APPEND writing we use for data to a scheme
> like in a non-zoned filesystem using REQ_OP_WRITE and preallocation.
> 
> Fixes: 32430c614844 ("btrfs: zoned: enable relocation on a zoned filesystem")
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---

Looks good,
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>


* Re: [PATCH 3/6] btrfs: zoned: use regular writes for relocation
  2021-09-03 14:44 ` [PATCH 3/6] btrfs: zoned: use regular writes for relocation Johannes Thumshirn
@ 2021-09-07 11:39   ` Naohiro Aota
  0 siblings, 0 replies; 13+ messages in thread
From: Naohiro Aota @ 2021-09-07 11:39 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Filipe Manana, Damien Le Moal

On Fri, Sep 03, 2021 at 11:44:44PM +0900, Johannes Thumshirn wrote:
> Now that we have a dedicated block group for relocation, we can use
> REQ_OP_WRITE instead of  REQ_OP_ZONE_APPEND for writing out the data on
> relocation.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/zoned.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 28a06c2d80ad..be82823c9b16 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1490,6 +1490,9 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start)
>  	if (!is_data_inode(&inode->vfs_inode))
>  		return false;
>  
> +	if (btrfs_is_data_reloc_root(inode->root))
> +		return false;
> +

Not using zone append for data relocation is not straightforward, so I'd
like to have a comment explaining why we need to use WRITE here.

Apart from that, looks good,
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

>  	cache = btrfs_lookup_block_group(fs_info, start);
>  	ASSERT(cache);
>  	if (!cache)
> -- 
> 2.32.0
> 


* Re: [PATCH 4/6] btrfs: check for relocation inodes on zoned btrfs in should_nocow
  2021-09-03 14:44 ` [PATCH 4/6] btrfs: check for relocation inodes on zoned btrfs in should_nocow Johannes Thumshirn
@ 2021-09-07 11:40   ` Naohiro Aota
  0 siblings, 0 replies; 13+ messages in thread
From: Naohiro Aota @ 2021-09-07 11:40 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Filipe Manana, Damien Le Moal

On Fri, Sep 03, 2021 at 11:44:45PM +0900, Johannes Thumshirn wrote:
> Prepare for allowing preallocation for relocation inodes.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8e1a46e9c63e..5f4c8e12ebcc 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1944,7 +1944,8 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
>  	const bool zoned = btrfs_is_zoned(inode->root->fs_info);
>  
>  	if (should_nocow(inode, start, end)) {
> -		ASSERT(!zoned);
> +		ASSERT(!zoned ||
> +		       (zoned && btrfs_is_data_reloc_root(inode->root)));

I'd like a comment here explaining why we can allow nocow in this
special case.
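
For illustration, a possible comment together with a slightly shorter form
of the assertion: `!zoned || (zoned && x)` is logically the same as
`!zoned || x`. This is a user-space sketch, the helper names are
hypothetical, and the comment wording is an assumption based on the cover
letter and the changelog of this patch:

```c
#include <stdbool.h>

/* The condition exactly as posted in the patch. */
static bool nocow_allowed_as_posted(bool zoned, bool is_data_reloc)
{
	return !zoned || (zoned && is_data_reloc);
}

/* Equivalent, shorter condition with a comment on the special case. */
static bool nocow_allowed(bool zoned, bool is_data_reloc)
{
	/*
	 * Assumed wording: a zoned filesystem normally never takes the
	 * nocow path. The one exception is the data relocation inode,
	 * which is going to write into preallocated extents.
	 */
	return !zoned || is_data_reloc;
}
```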

With that added,

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

>  		ret = run_delalloc_nocow(inode, locked_page, start, end,
>  					 page_started, nr_written);
>  	} else if (!inode_can_compress(inode) ||
> -- 
> 2.32.0
> 


* Re: [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group
  2021-09-03 14:44 ` [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group Johannes Thumshirn
  2021-09-07 11:37   ` Naohiro Aota
@ 2021-09-07 11:52   ` David Sterba
  2021-09-07 12:43   ` David Sterba
  2 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2021-09-07 11:52 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Filipe Manana, Damien Le Moal, Naohiro Aota

On Fri, Sep 03, 2021 at 11:44:43PM +0900, Johannes Thumshirn wrote:
> index 8cc0b29e24ee..344ba70315d8 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1017,6 +1017,8 @@ struct btrfs_fs_info {
>  	struct mutex zoned_meta_io_lock;
>  	spinlock_t treelog_bg_lock;
>  	u64 treelog_bg;
> +	spinlock_t relocation_bg_lock;
> +	u64 data_reloc_bg;

Please add some comments for the new members.
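
Something along these lines might do, as a sketch only. The struct is a
user-space model (fs_info_model and spinlock_model_t are hypothetical
stand-ins), and the comment wording is inferred from how the patch uses the
two members, so it is an assumption rather than the final text:

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's spinlock_t. */
typedef struct { int locked; } spinlock_model_t;

struct fs_info_model {
	/* Protects data_reloc_bg. */
	spinlock_model_t relocation_bg_lock;

	/*
	 * Start of the block group dedicated to data relocation on a
	 * zoned filesystem, or 0 if there is currently none.
	 */
	uint64_t data_reloc_bg;
};
```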

> --- a/fs/btrfs/zoned.h
> +++ b/fs/btrfs/zoned.h
> @@ -347,4 +347,13 @@ static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg)
>  	spin_unlock(&fs_info->treelog_bg_lock);
>  }
>  
> +static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
> +{
> +	struct btrfs_fs_info *fs_info = bg->fs_info;
> +
> +	spin_lock(&fs_info->relocation_bg_lock);
> +	if (fs_info->data_reloc_bg == bg->start)
> +		fs_info->data_reloc_bg = 0;
> +	spin_unlock(&fs_info->relocation_bg_lock);
> +}

This does not look like it's important enough to be static inline.
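
A minimal sketch of the out-of-line variant: only a prototype would stay in
zoned.h and the unchanged body would move to zoned.c. The lock stubs and
the trimmed-down structs are user-space stand-ins so the example compiles
outside the kernel; they are assumptions, not the real definitions:

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel spinlock API. */
typedef struct { int locked; } spinlock_t;
static void spin_lock(spinlock_t *lock)   { lock->locked = 1; }
static void spin_unlock(spinlock_t *lock) { lock->locked = 0; }

/* Trimmed-down models holding only the members used below. */
struct btrfs_fs_info {
	spinlock_t relocation_bg_lock;
	uint64_t data_reloc_bg;
};

struct btrfs_block_group {
	struct btrfs_fs_info *fs_info;
	uint64_t start;
};

/* zoned.h would then carry just the prototype. */
void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);

/* zoned.c: same body as in the patch, no longer static inline. */
void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
{
	struct btrfs_fs_info *fs_info = bg->fs_info;

	spin_lock(&fs_info->relocation_bg_lock);
	if (fs_info->data_reloc_bg == bg->start)
		fs_info->data_reloc_bg = 0;
	spin_unlock(&fs_info->relocation_bg_lock);
}
```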


* Re: [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group
  2021-09-03 14:44 ` [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group Johannes Thumshirn
  2021-09-07 11:37   ` Naohiro Aota
  2021-09-07 11:52   ` David Sterba
@ 2021-09-07 12:43   ` David Sterba
  2 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2021-09-07 12:43 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Filipe Manana, Damien Le Moal, Naohiro Aota

On Fri, Sep 03, 2021 at 11:44:43PM +0900, Johannes Thumshirn wrote:
> +	/*
> +	 * Do not allow non-relocation blocks in the dedicated relocation block
> +	 * group, and vice versa.
> +	 */
> +	spin_lock(&fs_info->relocation_bg_lock);
> +	data_reloc_bytenr = fs_info->data_reloc_bg;
> +	skip = data_reloc_bytenr &&
> +		((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
> +		 (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr));
> +	spin_unlock(&fs_info->relocation_bg_lock);

Please rewrite the expression into an if (...), the condition is not
trivial like in other cases, so it would be better to make it stand out
a bit.
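
One way this could look, as an untested user-space sketch. struct
ffe_ctl_model and both function names are hypothetical, and the locking
around the read of fs_info->data_reloc_bg is left out to keep the example
short. The first function is the expression as posted, the second spells
out the same decision with if statements:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one ffe_ctl member used here. */
struct ffe_ctl_model {
	bool for_data_reloc;
};

/* The expression exactly as posted in the patch. */
static bool skip_as_posted(const struct ffe_ctl_model *ffe_ctl,
			   uint64_t bytenr, uint64_t data_reloc_bytenr)
{
	return data_reloc_bytenr &&
		((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
		 (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr));
}

/* The same decision written with explicit if statements. */
static bool skip_with_if(const struct ffe_ctl_model *ffe_ctl,
			 uint64_t bytenr, uint64_t data_reloc_bytenr)
{
	/* No dedicated relocation block group is set, nothing to enforce. */
	if (!data_reloc_bytenr)
		return false;

	/* Relocation allocations may only use the dedicated block group. */
	if (ffe_ctl->for_data_reloc)
		return bytenr != data_reloc_bytenr;

	/* All other allocations must stay out of it. */
	return bytenr == data_reloc_bytenr;
}
```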


Thread overview: 13+ messages
2021-09-03 14:44 [PATCH 0/6] btrfs: zoned: unify relocation on a zoned and regular FS Johannes Thumshirn
2021-09-03 14:44 ` [PATCH 1/6] btrfs: introduce btrfs_is_data_reloc_root Johannes Thumshirn
2021-09-07 11:36   ` Naohiro Aota
2021-09-03 14:44 ` [PATCH 2/6] btrfs: zoned: add a dedicated data relocation block group Johannes Thumshirn
2021-09-07 11:37   ` Naohiro Aota
2021-09-07 11:52   ` David Sterba
2021-09-07 12:43   ` David Sterba
2021-09-03 14:44 ` [PATCH 3/6] btrfs: zoned: use regular writes for relocation Johannes Thumshirn
2021-09-07 11:39   ` Naohiro Aota
2021-09-03 14:44 ` [PATCH 4/6] btrfs: check for relocation inodes on zoned btrfs in should_nocow Johannes Thumshirn
2021-09-07 11:40   ` Naohiro Aota
2021-09-03 14:44 ` [PATCH 5/6] btrfs: zoned: allow preallocation for relocation inodes Johannes Thumshirn
2021-09-03 14:44 ` [PATCH 6/6] btrfs: rename setup_extent_mapping in relocation code Johannes Thumshirn
