linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] btrfs: implement metadata DUP for zoned mode
@ 2022-01-26  8:59 Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 1/4] btrfs: zoned: make zone activation multi stripe capable Johannes Thumshirn
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2022-01-26  8:59 UTC (permalink / raw)
  To: David Sterba; +Cc: Johannes Thumshirn, linux-btrfs

Btrfs' default block-group profile for metadata on rotating devices has been
DUP for a long time. Recently the default also changed for non-rotating
devices.

Technically, there is no reason why btrfs on zoned devices can't use DUP for
metadata as well. All I/O to metadata block-groups is serialized via the
zoned_meta_io_lock and written with regular REQ_OP_WRITE operations. Therefore
reordering due to REQ_OP_ZONE_APPEND cannot happen on metadata (as opposed to
data).

The first three patches lay the groundwork by making sure zoned btrfs can work
with more than one stripe and the last patch then implements DUP on metadata
block groups in zoned btrfs.

Johannes Thumshirn (4):
  btrfs: zoned: make zone activation multi stripe capable
  btrfs: zoned: make zone finishing multi stripe capable
  btrfs: zoned: prepare for allowing DUP on zoned
  btrfs: zoned: allow DUP on meta-data block groups

 fs/btrfs/zoned.c | 162 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 105 insertions(+), 57 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] btrfs: zoned: make zone activation multi stripe capable
  2022-01-26  8:59 [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
@ 2022-01-26  8:59 ` Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 2/4] btrfs: zoned: make zone finishing " Johannes Thumshirn
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2022-01-26  8:59 UTC (permalink / raw)
  To: David Sterba; +Cc: Johannes Thumshirn, linux-btrfs

Currently activation of a zone only works if the block group isn't
spanning more than one zone.

This limitation is purely artificial and can be easily expanded to block
groups being places across multiple zones.

This is a preparation for allowing DUP and later more complex block-group
profiles on zoned btrfs.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 57 ++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index f559d517c7c4..e3d198405d39 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1781,50 +1781,55 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 	struct btrfs_device *device;
 	u64 physical;
 	bool ret;
+	int i;
 
 	if (!btrfs_is_zoned(block_group->fs_info))
 		return true;
 
 	map = block_group->physical_map;
-	/* Currently support SINGLE profile only */
-	ASSERT(map->num_stripes == 1);
-	device = map->stripes[0].dev;
-	physical = map->stripes[0].physical;
-
-	if (device->zone_info->max_active_zones == 0)
-		return true;
 
 	spin_lock(&block_group->lock);
-
 	if (block_group->zone_is_active) {
 		ret = true;
 		goto out_unlock;
 	}
 
-	/* No space left */
-	if (block_group->alloc_offset == block_group->zone_capacity) {
-		ret = false;
-		goto out_unlock;
-	}
+	for (i = 0; i < map->num_stripes; i++) {
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
 
-	if (!btrfs_dev_set_active_zone(device, physical)) {
-		/* Cannot activate the zone */
-		ret = false;
-		goto out_unlock;
-	}
+		if (device->zone_info->max_active_zones == 0)
+			continue;
+
+		/* No space left */
+		if (block_group->alloc_offset == block_group->zone_capacity) {
+			ret = false;
+			goto out_unlock;
+		}
+
+		if (!btrfs_dev_set_active_zone(device, physical)) {
+			/* Cannot activate the zone */
+			ret = false;
+			goto out_unlock;
+		}
+
+		/* Successfully activated all the zones */
+		if (i == map->num_stripes - 1)
+			block_group->zone_is_active = 1;
 
-	/* Successfully activated all the zones */
-	block_group->zone_is_active = 1;
 
+	}
 	spin_unlock(&block_group->lock);
 
-	/* For the active block group list */
-	btrfs_get_block_group(block_group);
+	if (block_group->zone_is_active) {
+		/* For the active block group list */
+		btrfs_get_block_group(block_group);
 
-	spin_lock(&fs_info->zone_active_bgs_lock);
-	ASSERT(list_empty(&block_group->active_bg_list));
-	list_add_tail(&block_group->active_bg_list, &fs_info->zone_active_bgs);
-	spin_unlock(&fs_info->zone_active_bgs_lock);
+		spin_lock(&fs_info->zone_active_bgs_lock);
+		list_add_tail(&block_group->active_bg_list,
+			      &fs_info->zone_active_bgs);
+		spin_unlock(&fs_info->zone_active_bgs_lock);
+	}
 
 	return true;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] btrfs: zoned: make zone finishing multi stripe capable
  2022-01-26  8:59 [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 1/4] btrfs: zoned: make zone activation multi stripe capable Johannes Thumshirn
@ 2022-01-26  8:59 ` Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 3/4] btrfs: zoned: prepare for allowing DUP on zoned Johannes Thumshirn
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2022-01-26  8:59 UTC (permalink / raw)
  To: David Sterba; +Cc: Johannes Thumshirn, linux-btrfs

Currently finishing of a zone only works if the block group isn't
spanning more than one zone.

This limitation is purely artificial and can be easily expanded to block
groups being places across multiple zones.

This is a preparation for allowing DUP and later more complex block-group
profiles on zoned btrfs.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index e3d198405d39..987d2456d398 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1845,19 +1845,12 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 	struct btrfs_device *device;
 	u64 physical;
 	int ret = 0;
+	int i;
 
 	if (!btrfs_is_zoned(fs_info))
 		return 0;
 
 	map = block_group->physical_map;
-	/* Currently support SINGLE profile only */
-	ASSERT(map->num_stripes == 1);
-
-	device = map->stripes[0].dev;
-	physical = map->stripes[0].physical;
-
-	if (device->zone_info->max_active_zones == 0)
-		return 0;
 
 	spin_lock(&block_group->lock);
 	if (!block_group->zone_is_active) {
@@ -1909,25 +1902,34 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 	btrfs_clear_data_reloc_bg(block_group);
 	spin_unlock(&block_group->lock);
 
-	ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
-			       physical >> SECTOR_SHIFT,
-			       device->zone_info->zone_size >> SECTOR_SHIFT,
-			       GFP_NOFS);
-	btrfs_dec_block_group_ro(block_group);
+	for (i = 0; i < map->num_stripes; i++) {
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
 
-	if (!ret) {
-		btrfs_dev_clear_active_zone(device, physical);
+		if (device->zone_info->max_active_zones == 0)
+			continue;
 
-		spin_lock(&fs_info->zone_active_bgs_lock);
-		ASSERT(!list_empty(&block_group->active_bg_list));
-		list_del_init(&block_group->active_bg_list);
-		spin_unlock(&fs_info->zone_active_bgs_lock);
+		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+				       physical >> SECTOR_SHIFT,
+				       device->zone_info->zone_size >> SECTOR_SHIFT,
+				       GFP_NOFS);
+
+		if (ret)
+			return ret;
 
-		/* For active_bg_list */
-		btrfs_put_block_group(block_group);
+		btrfs_dev_clear_active_zone(device, physical);
 	}
+	btrfs_dec_block_group_ro(block_group);
 
-	return ret;
+	spin_lock(&fs_info->zone_active_bgs_lock);
+	ASSERT(!list_empty(&block_group->active_bg_list));
+	list_del_init(&block_group->active_bg_list);
+	spin_unlock(&fs_info->zone_active_bgs_lock);
+
+	/* For active_bg_list */
+	btrfs_put_block_group(block_group);
+
+	return 0;
 }
 
 bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] btrfs: zoned: prepare for allowing DUP on zoned
  2022-01-26  8:59 [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 1/4] btrfs: zoned: make zone activation multi stripe capable Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 2/4] btrfs: zoned: make zone finishing " Johannes Thumshirn
@ 2022-01-26  8:59 ` Johannes Thumshirn
  2022-01-26  8:59 ` [PATCH 4/4] btrfs: zoned: allow DUP on meta-data block groups Johannes Thumshirn
  2022-01-26  9:15 ` [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
  4 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2022-01-26  8:59 UTC (permalink / raw)
  To: David Sterba; +Cc: Johannes Thumshirn, linux-btrfs

Allow for a block-group to be placed on more than one physical zone.

This is a preparation for allowing DUP profiles for meta-data on a zoned
file-system.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 987d2456d398..4f6f7afaa5a2 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1215,12 +1215,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	struct btrfs_device *device;
 	u64 logical = cache->start;
 	u64 length = cache->length;
-	u64 physical = 0;
 	int ret;
 	int i;
 	unsigned int nofs_flag;
 	u64 *alloc_offsets = NULL;
 	u64 *caps = NULL;
+	u64 *physical = NULL;
 	unsigned long *active = NULL;
 	u64 last_alloc = 0;
 	u32 num_sequential = 0, num_conventional = 0;
@@ -1264,6 +1264,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		goto out;
 	}
 
+	physical = kcalloc(map->num_stripes, sizeof(*physical), GFP_NOFS);
+	if (!physical) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	active = bitmap_zalloc(map->num_stripes, GFP_NOFS);
 	if (!active) {
 		ret = -ENOMEM;
@@ -1277,14 +1283,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		int dev_replace_is_ongoing = 0;
 
 		device = map->stripes[i].dev;
-		physical = map->stripes[i].physical;
+		physical[i] = map->stripes[i].physical;
 
 		if (device->bdev == NULL) {
 			alloc_offsets[i] = WP_MISSING_DEV;
 			continue;
 		}
 
-		is_sequential = btrfs_dev_is_sequential(device, physical);
+		is_sequential = btrfs_dev_is_sequential(device, physical[i]);
 		if (is_sequential)
 			num_sequential++;
 		else
@@ -1299,21 +1305,21 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		 * This zone will be used for allocation, so mark this zone
 		 * non-empty.
 		 */
-		btrfs_dev_clear_zone_empty(device, physical);
+		btrfs_dev_clear_zone_empty(device, physical[i]);
 
 		down_read(&dev_replace->rwsem);
 		dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
 		if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL)
-			btrfs_dev_clear_zone_empty(dev_replace->tgtdev, physical);
+			btrfs_dev_clear_zone_empty(dev_replace->tgtdev, physical[i]);
 		up_read(&dev_replace->rwsem);
 
 		/*
 		 * The group is mapped to a sequential zone. Get the zone write
 		 * pointer to determine the allocation offset within the zone.
 		 */
-		WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size));
+		WARN_ON(!IS_ALIGNED(physical[i], fs_info->zone_size));
 		nofs_flag = memalloc_nofs_save();
-		ret = btrfs_get_dev_zone(device, physical, &zone);
+		ret = btrfs_get_dev_zone(device, physical[i], &zone);
 		memalloc_nofs_restore(nofs_flag);
 		if (ret == -EIO || ret == -EOPNOTSUPP) {
 			ret = 0;
@@ -1339,7 +1345,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		case BLK_ZONE_COND_READONLY:
 			btrfs_err(fs_info,
 		"zoned: offline/readonly zone %llu on device %s (devid %llu)",
-				  physical >> device->zone_info->zone_size_shift,
+				  physical[i] >> device->zone_info->zone_size_shift,
 				  rcu_str_deref(device->name), device->devid);
 			alloc_offsets[i] = WP_MISSING_DEV;
 			break;
@@ -1404,7 +1410,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		if (alloc_offsets[0] == WP_MISSING_DEV) {
 			btrfs_err(fs_info,
 			"zoned: cannot recover write pointer for zone %llu",
-				physical);
+				physical[0]);
 			ret = -EIO;
 			goto out;
 		}
@@ -1465,6 +1471,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		cache->physical_map = NULL;
 	}
 	bitmap_free(active);
+	kfree(physical);
 	kfree(caps);
 	kfree(alloc_offsets);
 	free_extent_map(em);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] btrfs: zoned: allow DUP on meta-data block groups
  2022-01-26  8:59 [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2022-01-26  8:59 ` [PATCH 3/4] btrfs: zoned: prepare for allowing DUP on zoned Johannes Thumshirn
@ 2022-01-26  8:59 ` Johannes Thumshirn
  2022-01-26  9:15 ` [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
  4 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2022-01-26  8:59 UTC (permalink / raw)
  To: David Sterba; +Cc: Johannes Thumshirn, linux-btrfs

Allow creating or reading block-groups on a zoned device with DUP as a
meta-data profile.

This works because we're using the zoned_meta_io_lock and REQ_OP_WRITE
operations for meta-data on zoned btrfs, so all writes to meta-data zones
are aligned to the zone's write-pointer.

Upon loading of the block-group, it is ensured both zones do have the same
zone capacity and write-pointer offsets, so no extra machinery is needed
to keep the write-pointers in sync for the meta-data zones. If this
prerequisite is not met, loading of the block-group is refused.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 4f6f7afaa5a2..39fc5487e78c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1419,6 +1419,40 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		cache->zone_is_active = test_bit(0, active);
 		break;
 	case BTRFS_BLOCK_GROUP_DUP:
+		if (map->type & BTRFS_BLOCK_GROUP_DATA) {
+			btrfs_err(fs_info, "zoned: profile DUP not yet supported on data bg");
+			ret = -EINVAL;
+			goto out;
+		}
+		if (alloc_offsets[0] == WP_MISSING_DEV) {
+			btrfs_err(fs_info,
+			"zoned: cannot recover write pointer for zone %llu",
+				physical[0]);
+			ret = -EIO;
+			goto out;
+		}
+		if (alloc_offsets[1] == WP_MISSING_DEV) {
+			btrfs_err(fs_info,
+			"zoned: cannot recover write pointer for zone %llu",
+				physical[1]);
+			ret = -EIO;
+			goto out;
+		}
+		if (alloc_offsets[0] != alloc_offsets[1]) {
+			btrfs_err(fs_info,
+			"zoned: write pointer offset mismatch of zones in DUP profile");
+			ret = -EIO;
+			goto out;
+		}
+		if (test_bit(0, active) != test_bit(1, active)) {
+			btrfs_err(fs_info, "zoned: active zone mismatch");
+			ret = -EIO;
+			goto out;
+		}
+		cache->alloc_offset = alloc_offsets[0];
+		cache->zone_capacity = min(caps[0], caps[1]);
+		cache->zone_is_active = test_bit(0, active);
+		break;
 	case BTRFS_BLOCK_GROUP_RAID1:
 	case BTRFS_BLOCK_GROUP_RAID0:
 	case BTRFS_BLOCK_GROUP_RAID10:
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] btrfs: implement metadata DUP for zoned mode
  2022-01-26  8:59 [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2022-01-26  8:59 ` [PATCH 4/4] btrfs: zoned: allow DUP on meta-data block groups Johannes Thumshirn
@ 2022-01-26  9:15 ` Johannes Thumshirn
  4 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2022-01-26  9:15 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs

On Wed, 2022-01-26 at 00:59 -0800, Johannes Thumshirn wrote:
> Btrfs' default block-group profile for metadata on rotating devices
> has been
> DUP for a long time. Recently the default also changed for non-
> rotating
> devices.
> 
> Technically, there is no reason why btrfs on zoned devices can't use
> DUP for
> metadata as well. All I/O to metadata block-groups is serialized via
> the
> zoned_meta_io_lock and written with regular REQ_OP_WRITE operations.
> Therefore
> reordering due to REQ_OP_ZONE_APPEND cannot happen on metadata (as
> opposed to
> data).
> 


Please disregard this series, I've forgot to git commit --amend a part.

Will send a v2 shortly.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-01-26  9:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-26  8:59 [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn
2022-01-26  8:59 ` [PATCH 1/4] btrfs: zoned: make zone activation multi stripe capable Johannes Thumshirn
2022-01-26  8:59 ` [PATCH 2/4] btrfs: zoned: make zone finishing " Johannes Thumshirn
2022-01-26  8:59 ` [PATCH 3/4] btrfs: zoned: prepare for allowing DUP on zoned Johannes Thumshirn
2022-01-26  8:59 ` [PATCH 4/4] btrfs: zoned: allow DUP on meta-data block groups Johannes Thumshirn
2022-01-26  9:15 ` [PATCH 0/4] btrfs: implement metadata DUP for zoned mode Johannes Thumshirn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).