* [PATCH 0/4] btrfs: zoned: fixes for zone finishing
@ 2022-04-28 15:02 Naohiro Aota
  2022-04-28 15:02 ` [PATCH 1/4] btrfs: zoned: consolidate zone finish function Naohiro Aota
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Naohiro Aota @ 2022-04-28 15:02 UTC
  To: linux-btrfs; +Cc: Naohiro Aota

Commit be1a1d7a5d24 ("btrfs: zoned: finish fully written block group")
introduced zone finishing of a block group once the IO reaches the end of
the block group. However, since the zone capacity may not be aligned to
16KB (the node size), we can leave unallocatable space at the end of a
block group. Also, it turned out that metadata zone finishing never
actually works.
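
As a hypothetical illustration of the first issue: if the zone capacity
is, say, 8KB short of a multiple of 16KB, the last 8KB of a metadata
block group can never be allocated. Writes then never reach the end of
the block group, so the "fully written" condition never triggers and the
zone is never finished.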

This series addresses these issues by rewriting the metadata zone
finishing code to do the finishing work in a workqueue.

Patch 1 is a clean-up patch that consolidates the zone finishing code
into a single function for better maintainability.

Patch 2 changes the calculation of the remaining region so that a block
group is finished when there is no more space left for allocation.

Patch 3 fixes metadata block group finishing, which is not actually working.

Patch 4 implements zone finishing of an unused block group and fixes the
active block group accounting. This patch is somewhat unrelated to the
others, but it was tested with the previous patches applied, so let me
keep it in the same series.

Naohiro Aota (4):
  btrfs: zoned: consolidate zone finish function
  btrfs: zoned: finish BG when there are no more allocatable bytes left
  btrfs: zoned: properly finish block group on metadata write
  btrfs: zoned: zone finish unused block group

 fs/btrfs/block-group.c |   8 ++
 fs/btrfs/block-group.h |   2 +
 fs/btrfs/extent_io.c   |   6 +-
 fs/btrfs/extent_io.h   |   1 -
 fs/btrfs/zoned.c       | 163 +++++++++++++++++++++++------------------
 fs/btrfs/zoned.h       |   5 ++
 6 files changed, 109 insertions(+), 76 deletions(-)

-- 
2.35.1


* [PATCH 1/4] btrfs: zoned: consolidate zone finish function
  2022-04-28 15:02 [PATCH 0/4] btrfs: zoned: fixes for zone finishing Naohiro Aota
@ 2022-04-28 15:02 ` Naohiro Aota
  2022-04-28 16:11   ` David Sterba
  2022-04-28 15:02 ` [PATCH 2/4] btrfs: zoned: finish BG when there are no more allocatable bytes left Naohiro Aota
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Naohiro Aota @ 2022-04-28 15:02 UTC
  To: linux-btrfs; +Cc: Naohiro Aota

btrfs_zone_finish() and btrfs_zone_finish_endio() have similar code.
Introduce __btrfs_zone_finish() to consolidate them.
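
The nowait flag distinguishes the two callers: the waiting variant
(btrfs_zone_finish()) marks the block group read-only and waits for
reservations and ordered extents before finishing the zones, while the
endio variant skips all of that, since it runs after the last write to
the zone has already completed.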

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 127 ++++++++++++++++++++---------------------------
 1 file changed, 54 insertions(+), 73 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 997a96d7a3d5..9cddafe78fb1 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1879,20 +1879,14 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 	return ret;
 }
 
-int btrfs_zone_finish(struct btrfs_block_group *block_group)
+static int __btrfs_zone_finish(struct btrfs_block_group *block_group, bool nowait)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct map_lookup *map;
-	struct btrfs_device *device;
-	u64 physical;
+	bool need_zone_finish;
 	int ret = 0;
 	int i;
 
-	if (!btrfs_is_zoned(fs_info))
-		return 0;
-
-	map = block_group->physical_map;
-
 	spin_lock(&block_group->lock);
 	if (!block_group->zone_is_active) {
 		spin_unlock(&block_group->lock);
@@ -1906,36 +1900,42 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 		spin_unlock(&block_group->lock);
 		return -EAGAIN;
 	}
-	spin_unlock(&block_group->lock);
 
-	ret = btrfs_inc_block_group_ro(block_group, false);
-	if (ret)
-		return ret;
+	if (!nowait) {
+		spin_unlock(&block_group->lock);
 
-	/* Ensure all writes in this block group finish */
-	btrfs_wait_block_group_reservations(block_group);
-	/* No need to wait for NOCOW writers. Zoned mode does not allow that. */
-	btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
-				 block_group->length);
+		ret = btrfs_inc_block_group_ro(block_group, false);
+		if (ret)
+			return ret;
 
-	spin_lock(&block_group->lock);
+		/* Ensure all writes in this block group finish */
+		btrfs_wait_block_group_reservations(block_group);
+		/* No need to wait for NOCOW writers. Zoned mode does not allow that. */
+		btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
+					 block_group->length);
 
-	/*
-	 * Bail out if someone already deactivated the block group, or
-	 * allocated space is left in the block group.
-	 */
-	if (!block_group->zone_is_active) {
-		spin_unlock(&block_group->lock);
-		btrfs_dec_block_group_ro(block_group);
-		return 0;
-	}
+		spin_lock(&block_group->lock);
 
-	if (block_group->reserved) {
-		spin_unlock(&block_group->lock);
-		btrfs_dec_block_group_ro(block_group);
-		return -EAGAIN;
+		/*
+		 * Bail out if someone already deactivated the block group, or
+		 * allocated space is left in the block group.
+		 */
+		if (!block_group->zone_is_active) {
+			spin_unlock(&block_group->lock);
+			btrfs_dec_block_group_ro(block_group);
+			return 0;
+		}
+
+		if (block_group->reserved) {
+			spin_unlock(&block_group->lock);
+			btrfs_dec_block_group_ro(block_group);
+			return -EAGAIN;
+		}
 	}
 
+	/* There is unwritten space left. Need to finish the underlying zones. */
+	need_zone_finish = (block_group->zone_capacity - block_group->alloc_offset) > 0;
+
 	block_group->zone_is_active = 0;
 	block_group->alloc_offset = block_group->zone_capacity;
 	block_group->free_space_ctl->free_space = 0;
@@ -1943,24 +1943,29 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 	btrfs_clear_data_reloc_bg(block_group);
 	spin_unlock(&block_group->lock);
 
+	map = block_group->physical_map;
 	for (i = 0; i < map->num_stripes; i++) {
-		device = map->stripes[i].dev;
-		physical = map->stripes[i].physical;
+		struct btrfs_device *device = map->stripes[i].dev;
+		const u64 physical = map->stripes[i].physical;
 
 		if (device->zone_info->max_active_zones == 0)
 			continue;
 
-		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
-				       physical >> SECTOR_SHIFT,
-				       device->zone_info->zone_size >> SECTOR_SHIFT,
-				       GFP_NOFS);
+		if (need_zone_finish) {
+			ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+					       physical >> SECTOR_SHIFT,
+					       device->zone_info->zone_size >> SECTOR_SHIFT,
+					       GFP_NOFS);
 
-		if (ret)
-			return ret;
+			if (ret)
+				return ret;
+		}
 
 		btrfs_dev_clear_active_zone(device, physical);
 	}
-	btrfs_dec_block_group_ro(block_group);
+
+	if (!nowait)
+		btrfs_dec_block_group_ro(block_group);
 
 	spin_lock(&fs_info->zone_active_bgs_lock);
 	ASSERT(!list_empty(&block_group->active_bg_list));
@@ -1973,6 +1978,14 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 	return 0;
 }
 
+int btrfs_zone_finish(struct btrfs_block_group *block_group)
+{
+	if (!btrfs_is_zoned(block_group->fs_info))
+		return 0;
+
+	return __btrfs_zone_finish(block_group, false);
+}
+
 bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 {
 	struct btrfs_fs_info *fs_info = fs_devices->fs_info;
@@ -2004,9 +2017,6 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 length)
 {
 	struct btrfs_block_group *block_group;
-	struct map_lookup *map;
-	struct btrfs_device *device;
-	u64 physical;
 
 	if (!btrfs_is_zoned(fs_info))
 		return;
@@ -2017,36 +2027,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
 	if (logical + length < block_group->start + block_group->zone_capacity)
 		goto out;
 
-	spin_lock(&block_group->lock);
-
-	if (!block_group->zone_is_active) {
-		spin_unlock(&block_group->lock);
-		goto out;
-	}
-
-	block_group->zone_is_active = 0;
-	/* We should have consumed all the free space */
-	ASSERT(block_group->alloc_offset == block_group->zone_capacity);
-	ASSERT(block_group->free_space_ctl->free_space == 0);
-	btrfs_clear_treelog_bg(block_group);
-	btrfs_clear_data_reloc_bg(block_group);
-	spin_unlock(&block_group->lock);
-
-	map = block_group->physical_map;
-	device = map->stripes[0].dev;
-	physical = map->stripes[0].physical;
-
-	if (!device->zone_info->max_active_zones)
-		goto out;
-
-	btrfs_dev_clear_active_zone(device, physical);
-
-	spin_lock(&fs_info->zone_active_bgs_lock);
-	ASSERT(!list_empty(&block_group->active_bg_list));
-	list_del_init(&block_group->active_bg_list);
-	spin_unlock(&fs_info->zone_active_bgs_lock);
-
-	btrfs_put_block_group(block_group);
+	__btrfs_zone_finish(block_group, true);
 
 out:
 	btrfs_put_block_group(block_group);
-- 
2.35.1


* [PATCH 2/4] btrfs: zoned: finish BG when there are no more allocatable bytes left
  2022-04-28 15:02 [PATCH 0/4] btrfs: zoned: fixes for zone finishing Naohiro Aota
  2022-04-28 15:02 ` [PATCH 1/4] btrfs: zoned: consolidate zone finish function Naohiro Aota
@ 2022-04-28 15:02 ` Naohiro Aota
  2022-04-29 11:55   ` Pankaj Raghav
  2022-04-28 15:02 ` [PATCH 3/4] btrfs: zoned: properly finish block group on metadata write Naohiro Aota
  2022-04-28 15:02 ` [PATCH 4/4] btrfs: zoned: zone finish unused block group Naohiro Aota
  3 siblings, 1 reply; 10+ messages in thread
From: Naohiro Aota @ 2022-04-28 15:02 UTC
  To: linux-btrfs; +Cc: Naohiro Aota

Currently, btrfs_zone_finish_endio() finishes a block group only when the
written region reaches the end of the block group. We can also finish the
block group when no more allocation is possible.
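
For illustration, with hypothetical numbers: suppose a write to a
metadata block group ends 8KB before the zone capacity. With a 16KB node
size, nothing more can be allocated there, yet the old check (bail out if
logical + length < start + zone_capacity) never finishes the block group.
With the minimal allocation unit taken into account, the remaining 8KB is
recognized as unusable and the block group gets finished.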

Cc: stable@vger.kernel.org # 5.16+
Fixes: be1a1d7a5d24 ("btrfs: zoned: finish fully written block group")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 9cddafe78fb1..0f6ca3587c3b 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2017,6 +2017,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 length)
 {
 	struct btrfs_block_group *block_group;
+	u64 min_use;
 
 	if (!btrfs_is_zoned(fs_info))
 		return;
@@ -2024,7 +2025,14 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
 	block_group = btrfs_lookup_block_group(fs_info, logical);
 	ASSERT(block_group);
 
-	if (logical + length < block_group->start + block_group->zone_capacity)
+	/* No MIXED BG on zoned btrfs. */
+	if (block_group->flags & BTRFS_BLOCK_GROUP_DATA)
+		min_use = fs_info->sectorsize;
+	else
+		min_use = fs_info->nodesize;
+
+	/* Bail out if we can allocate more data from this BG. */
+	if (logical + length + min_use <= block_group->start + block_group->zone_capacity)
 		goto out;
 
 	__btrfs_zone_finish(block_group, true);
-- 
2.35.1


* [PATCH 3/4] btrfs: zoned: properly finish block group on metadata write
  2022-04-28 15:02 [PATCH 0/4] btrfs: zoned: fixes for zone finishing Naohiro Aota
  2022-04-28 15:02 ` [PATCH 1/4] btrfs: zoned: consolidate zone finish function Naohiro Aota
  2022-04-28 15:02 ` [PATCH 2/4] btrfs: zoned: finish BG when there are no more allocatable bytes left Naohiro Aota
@ 2022-04-28 15:02 ` Naohiro Aota
  2022-04-28 15:02 ` [PATCH 4/4] btrfs: zoned: zone finish unused block group Naohiro Aota
  3 siblings, 0 replies; 10+ messages in thread
From: Naohiro Aota @ 2022-04-28 15:02 UTC
  To: linux-btrfs; +Cc: Naohiro Aota

Commit be1a1d7a5d24 ("btrfs: zoned: finish fully written block group")
introduced zone finishing code for both the data and the metadata end_io
paths. However, the metadata side is not working as it should. First, it
compares a logical address (eb->start + eb->len) with an offset within a
block group (cache->zone_capacity) in submit_eb_page(). Since an absolute
address is compared with a relative length, the condition essentially
never holds, which disabled zone finishing on the metadata end_io path.

Furthermore, fixing the issue above revealed that we cannot call
btrfs_zone_finish_endio() in end_extent_buffer_writeback(): it needs
btrfs_lookup_block_group(), which requires taking a spin lock, and that
is not allowed inside the end_io context.

This commit introduces btrfs_schedule_zone_finish_bg() to wait for the
extent buffer writeback and do the zone finish IO in a workqueue.

Also, drop EXTENT_BUFFER_ZONE_FINISH as it is no longer used.

Cc: stable@vger.kernel.org # 5.16+
Fixes: be1a1d7a5d24 ("btrfs: zoned: finish fully written block group")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.h |  2 ++
 fs/btrfs/extent_io.c   |  6 +-----
 fs/btrfs/extent_io.h   |  1 -
 fs/btrfs/zoned.c       | 34 ++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 5 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index c9bf01dd10e8..3ac668ace50a 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -212,6 +212,8 @@ struct btrfs_block_group {
 	u64 meta_write_pointer;
 	struct map_lookup *physical_map;
 	struct list_head active_bg_list;
+	struct work_struct zone_finish_work;
+	struct extent_buffer *last_eb;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f9d6dd310c42..4778067bc0fa 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4259,9 +4259,6 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 
 static void end_extent_buffer_writeback(struct extent_buffer *eb)
 {
-	if (test_bit(EXTENT_BUFFER_ZONE_FINISH, &eb->bflags))
-		btrfs_zone_finish_endio(eb->fs_info, eb->start, eb->len);
-
 	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
 	smp_mb__after_atomic();
 	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
@@ -4851,8 +4848,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 		/*
 		 * Implies write in zoned mode. Mark the last eb in a block group.
 		 */
-		if (cache->seq_zone && eb->start + eb->len == cache->zone_capacity)
-			set_bit(EXTENT_BUFFER_ZONE_FINISH, &eb->bflags);
+		btrfs_schedule_zone_finish_bg(cache, eb);
 		btrfs_put_block_group(cache);
 	}
 	ret = write_one_eb(eb, wbc, epd);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b390ec79f9a8..89ebb7338d6f 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -32,7 +32,6 @@ enum {
 	/* write IO error */
 	EXTENT_BUFFER_WRITE_ERR,
 	EXTENT_BUFFER_NO_CHECK,
-	EXTENT_BUFFER_ZONE_FINISH,
 };
 
 /* these are flags for __process_pages_contig */
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 0f6ca3587c3b..afad085a589a 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2041,6 +2041,40 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
 	btrfs_put_block_group(block_group);
 }
 
+static void btrfs_zone_finish_endio_workfn(struct work_struct *work)
+{
+	struct btrfs_block_group *bg =
+		container_of(work, struct btrfs_block_group, zone_finish_work);
+
+	wait_on_extent_buffer_writeback(bg->last_eb);
+	free_extent_buffer(bg->last_eb);
+	btrfs_zone_finish_endio(bg->fs_info, bg->start, bg->length);
+	btrfs_put_block_group(bg);
+}
+
+void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
+				   struct extent_buffer *eb)
+{
+	if (!bg->seq_zone ||
+	    eb->start + eb->len * 2 <= bg->start + bg->zone_capacity)
+		return;
+
+	if (WARN_ON(bg->zone_finish_work.func ==
+		    btrfs_zone_finish_endio_workfn)) {
+		btrfs_err(bg->fs_info,
+			  "double scheduling of BG %llu zone finishing",
+			  bg->start);
+		return;
+	}
+
+	/* For the work */
+	btrfs_get_block_group(bg);
+	atomic_inc(&eb->refs);
+	bg->last_eb = eb;
+	INIT_WORK(&bg->zone_finish_work, btrfs_zone_finish_endio_workfn);
+	queue_work(system_unbound_wq, &bg->zone_finish_work);
+}
+
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
 {
 	struct btrfs_fs_info *fs_info = bg->fs_info;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index de923fc8449d..10f31d1c8b0c 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -72,6 +72,8 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group);
 bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags);
 void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
 			     u64 length);
+void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
+				   struct extent_buffer *eb);
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
 void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
 bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info);
@@ -230,6 +232,9 @@ static inline bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
 static inline void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info,
 					   u64 logical, u64 length) { }
 
+static inline void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
+						 struct extent_buffer *eb) { }
+
 static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
 
 static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
-- 
2.35.1


* [PATCH 4/4] btrfs: zoned: zone finish unused block group
  2022-04-28 15:02 [PATCH 0/4] btrfs: zoned: fixes for zone finishing Naohiro Aota
                   ` (2 preceding siblings ...)
  2022-04-28 15:02 ` [PATCH 3/4] btrfs: zoned: properly finish block group on metadata write Naohiro Aota
@ 2022-04-28 15:02 ` Naohiro Aota
  3 siblings, 0 replies; 10+ messages in thread
From: Naohiro Aota @ 2022-04-28 15:02 UTC
  To: linux-btrfs; +Cc: Naohiro Aota

While the active zones within an active block group are reset and their
active resource is released, the block group itself is kept in the active
block group list and stays marked as active. As a result, the list can
contain more than max_active_zones block groups. That itself is not fatal
for the device, as the zones are properly reset.

However, the inflated list is, of course, strange. Also, an upcoming patch
series, which deactivates an active block group on demand, gets confused
by the wrong list.

So, fix the issue by finishing an unused block group once it becomes
read-only, so that we can release the active resource at an early stage.
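
(As the hunk below shows, the call sits right after the block group has
been marked read-only in btrfs_delete_unused_bgs(); an -EAGAIN result,
e.g. when the block group still has reserved space, is treated as
non-fatal and the block group is simply skipped for now.)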

Cc: stable@vger.kernel.org # 5.16+
Fixes: be1a1d7a5d24 ("btrfs: zoned: finish fully written block group")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9739f3e8230a..ede389f2602d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1385,6 +1385,14 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 			goto next;
 		}
 
+		ret = btrfs_zone_finish(block_group);
+		if (ret < 0) {
+			btrfs_dec_block_group_ro(block_group);
+			if (ret == -EAGAIN)
+				ret = 0;
+			goto next;
+		}
+
 		/*
 		 * Want to do this before we do anything else so we can recover
 		 * properly if we fail to join the transaction.
-- 
2.35.1


* Re: [PATCH 1/4] btrfs: zoned: consolidate zone finish function
  2022-04-28 15:02 ` [PATCH 1/4] btrfs: zoned: consolidate zone finish function Naohiro Aota
@ 2022-04-28 16:11   ` David Sterba
  2022-04-29  4:56     ` Naohiro Aota
  0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2022-04-28 16:11 UTC
  To: Naohiro Aota; +Cc: linux-btrfs

On Fri, Apr 29, 2022 at 12:02:15AM +0900, Naohiro Aota wrote:
>  1 file changed, 54 insertions(+), 73 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 997a96d7a3d5..9cddafe78fb1 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1879,20 +1879,14 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
>  	return ret;
>  }
>  
> -int btrfs_zone_finish(struct btrfs_block_group *block_group)
> +static int __btrfs_zone_finish(struct btrfs_block_group *block_group, bool nowait)

Can you please avoid using __ for functions? Eg. the main function can
be btrfs_zone_finish (taking 2 parameters) and the exported one
btrfs_zone_finish_nowait, reflecting the preset parameter and also making
the 'nowait' semantics clear.

>  {
>  	struct btrfs_fs_info *fs_info = block_group->fs_info;
>  	struct map_lookup *map;
> -	struct btrfs_device *device;
> -	u64 physical;
> +	bool need_zone_finish;
>  	int ret = 0;
>  	int i;
>  
> -	if (!btrfs_is_zoned(fs_info))
> -		return 0;
> -
> -	map = block_group->physical_map;
> -
>  	spin_lock(&block_group->lock);
>  	if (!block_group->zone_is_active) {
>  		spin_unlock(&block_group->lock);
> @@ -1906,36 +1900,42 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
>  		spin_unlock(&block_group->lock);
>  		return -EAGAIN;
>  	}
> -	spin_unlock(&block_group->lock);
>  
> -	ret = btrfs_inc_block_group_ro(block_group, false);
> -	if (ret)
> -		return ret;
> +	if (!nowait) {
> +		spin_unlock(&block_group->lock);
>  
> -	/* Ensure all writes in this block group finish */
> -	btrfs_wait_block_group_reservations(block_group);
> -	/* No need to wait for NOCOW writers. Zoned mode does not allow that. */
> -	btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
> -				 block_group->length);
> +		ret = btrfs_inc_block_group_ro(block_group, false);
> +		if (ret)
> +			return ret;
>  
> -	spin_lock(&block_group->lock);
> +		/* Ensure all writes in this block group finish */
> +		btrfs_wait_block_group_reservations(block_group);
> +		/* No need to wait for NOCOW writers. Zoned mode does not allow that. */
> +		btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
> +					 block_group->length);
>  
> -	/*
> -	 * Bail out if someone already deactivated the block group, or
> -	 * allocated space is left in the block group.
> -	 */
> -	if (!block_group->zone_is_active) {
> -		spin_unlock(&block_group->lock);
> -		btrfs_dec_block_group_ro(block_group);
> -		return 0;
> -	}
> +		spin_lock(&block_group->lock);
>  
> -	if (block_group->reserved) {
> -		spin_unlock(&block_group->lock);
> -		btrfs_dec_block_group_ro(block_group);
> -		return -EAGAIN;
> +		/*
> +		 * Bail out if someone already deactivated the block group, or
> +		 * allocated space is left in the block group.
> +		 */
> +		if (!block_group->zone_is_active) {
> +			spin_unlock(&block_group->lock);
> +			btrfs_dec_block_group_ro(block_group);
> +			return 0;
> +		}
> +
> +		if (block_group->reserved) {
> +			spin_unlock(&block_group->lock);
> +			btrfs_dec_block_group_ro(block_group);
> +			return -EAGAIN;
> +		}
>  	}
>  
> +	/* There is unwritten space left. Need to finish the underlying zones. */
> +	need_zone_finish = (block_group->zone_capacity - block_group->alloc_offset) > 0;

This could be simplified to 'bg->zone_capacity > bg->alloc_offset',
but maybe it should be behind a helper, as the expression appears
multiple times.
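
For illustration, such a helper could look like the following (the name
and exact form are made up here):

	/* Hypothetical helper: true if no allocatable space is left. */
	static inline bool btrfs_zoned_bg_is_full(const struct btrfs_block_group *bg)
	{
		return bg->alloc_offset >= bg->zone_capacity;
	}

The assignment above would then read
"need_zone_finish = !btrfs_zoned_bg_is_full(block_group);".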

> +
>  	block_group->zone_is_active = 0;
>  	block_group->alloc_offset = block_group->zone_capacity;
>  	block_group->free_space_ctl->free_space = 0;
> @@ -1943,24 +1943,29 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
>  	btrfs_clear_data_reloc_bg(block_group);
>  	spin_unlock(&block_group->lock);
>  
> +	map = block_group->physical_map;
>  	for (i = 0; i < map->num_stripes; i++) {
> -		device = map->stripes[i].dev;
> -		physical = map->stripes[i].physical;
> +		struct btrfs_device *device = map->stripes[i].dev;
> +		const u64 physical = map->stripes[i].physical;
>  
>  		if (device->zone_info->max_active_zones == 0)
>  			continue;
>  
> -		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
> -				       physical >> SECTOR_SHIFT,
> -				       device->zone_info->zone_size >> SECTOR_SHIFT,
> -				       GFP_NOFS);
> +		if (need_zone_finish) {
> +			ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
> +					       physical >> SECTOR_SHIFT,
> +					       device->zone_info->zone_size >> SECTOR_SHIFT,
> +					       GFP_NOFS);
>  
> -		if (ret)
> -			return ret;
> +			if (ret)
> +				return ret;
> +		}
>  
>  		btrfs_dev_clear_active_zone(device, physical);
>  	}
> -	btrfs_dec_block_group_ro(block_group);
> +
> +	if (!nowait)
> +		btrfs_dec_block_group_ro(block_group);
>  
>  	spin_lock(&fs_info->zone_active_bgs_lock);
>  	ASSERT(!list_empty(&block_group->active_bg_list));

* Re: [PATCH 1/4] btrfs: zoned: consolidate zone finish function
  2022-04-28 16:11   ` David Sterba
@ 2022-04-29  4:56     ` Naohiro Aota
  2022-04-29 18:41       ` David Sterba
  0 siblings, 1 reply; 10+ messages in thread
From: Naohiro Aota @ 2022-04-29  4:56 UTC
  To: dsterba, linux-btrfs

On Thu, Apr 28, 2022 at 06:11:03PM +0200, David Sterba wrote:
> On Fri, Apr 29, 2022 at 12:02:15AM +0900, Naohiro Aota wrote:
> >  1 file changed, 54 insertions(+), 73 deletions(-)
> > 
> > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> > index 997a96d7a3d5..9cddafe78fb1 100644
> > --- a/fs/btrfs/zoned.c
> > +++ b/fs/btrfs/zoned.c
> > @@ -1879,20 +1879,14 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
> >  	return ret;
> >  }
> >  
> > -int btrfs_zone_finish(struct btrfs_block_group *block_group)
> > +static int __btrfs_zone_finish(struct btrfs_block_group *block_group, bool nowait)
> 
> Can you please avoid using __ for functions? Eg. the main function can
> be btrfs_zone_finish (taking 2 parameters) and the exported one
> btrfs_zone_finish_nowait, reflecting the preset parameter and also making
> the 'nowait' semantics clear.

But, we export both btrfs_zone_finish() (the waiting variant) and
btrfs_zone_finish_endio() (the nowait variant, plus checking the space
left after the write). How about "do_zone_finish(block_group, bool
nowait)" as the main function, with "btrfs_zone_finish(block_group)" and
"btrfs_zone_finish_endio(block_group)" as wrappers?

> >  {
> >  	struct btrfs_fs_info *fs_info = block_group->fs_info;
> >  	struct map_lookup *map;
> > -	struct btrfs_device *device;
> > -	u64 physical;
> > +	bool need_zone_finish;
> >  	int ret = 0;
> >  	int i;
> >  
> > -	if (!btrfs_is_zoned(fs_info))
> > -		return 0;
> > -
> > -	map = block_group->physical_map;
> > -
> >  	spin_lock(&block_group->lock);
> >  	if (!block_group->zone_is_active) {
> >  		spin_unlock(&block_group->lock);
> > @@ -1906,36 +1900,42 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
> >  		spin_unlock(&block_group->lock);
> >  		return -EAGAIN;
> >  	}
> > -	spin_unlock(&block_group->lock);
> >  
> > -	ret = btrfs_inc_block_group_ro(block_group, false);
> > -	if (ret)
> > -		return ret;
> > +	if (!nowait) {
> > +		spin_unlock(&block_group->lock);
> >  
> > -	/* Ensure all writes in this block group finish */
> > -	btrfs_wait_block_group_reservations(block_group);
> > -	/* No need to wait for NOCOW writers. Zoned mode does not allow that. */
> > -	btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
> > -				 block_group->length);
> > +		ret = btrfs_inc_block_group_ro(block_group, false);
> > +		if (ret)
> > +			return ret;
> >  
> > -	spin_lock(&block_group->lock);
> > +		/* Ensure all writes in this block group finish */
> > +		btrfs_wait_block_group_reservations(block_group);
> > +		/* No need to wait for NOCOW writers. Zoned mode does not allow that. */
> > +		btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
> > +					 block_group->length);
> >  
> > -	/*
> > -	 * Bail out if someone already deactivated the block group, or
> > -	 * allocated space is left in the block group.
> > -	 */
> > -	if (!block_group->zone_is_active) {
> > -		spin_unlock(&block_group->lock);
> > -		btrfs_dec_block_group_ro(block_group);
> > -		return 0;
> > -	}
> > +		spin_lock(&block_group->lock);
> >  
> > -	if (block_group->reserved) {
> > -		spin_unlock(&block_group->lock);
> > -		btrfs_dec_block_group_ro(block_group);
> > -		return -EAGAIN;
> > +		/*
> > +		 * Bail out if someone already deactivated the block group, or
> > +		 * allocated space is left in the block group.
> > +		 */
> > +		if (!block_group->zone_is_active) {
> > +			spin_unlock(&block_group->lock);
> > +			btrfs_dec_block_group_ro(block_group);
> > +			return 0;
> > +		}
> > +
> > +		if (block_group->reserved) {
> > +			spin_unlock(&block_group->lock);
> > +			btrfs_dec_block_group_ro(block_group);
> > +			return -EAGAIN;
> > +		}
> >  	}
> >  
> > +	/* There is unwritten space left. Need to finish the underlying zones. */
> > +	need_zone_finish = (block_group->zone_capacity - block_group->alloc_offset) > 0;
> 
> This could be simplified to 'bg->zone_capacity > bg->alloc_offset',
> but maybe it should be behind a helper, as the expression appears
> multiple times.

Yep. True. I think extent-tree.c has some of this. Let me check.

> > +
> >  	block_group->zone_is_active = 0;
> >  	block_group->alloc_offset = block_group->zone_capacity;
> >  	block_group->free_space_ctl->free_space = 0;
> > @@ -1943,24 +1943,29 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
> >  	btrfs_clear_data_reloc_bg(block_group);
> >  	spin_unlock(&block_group->lock);
> >  
> > +	map = block_group->physical_map;
> >  	for (i = 0; i < map->num_stripes; i++) {
> > -		device = map->stripes[i].dev;
> > -		physical = map->stripes[i].physical;
> > +		struct btrfs_device *device = map->stripes[i].dev;
> > +		const u64 physical = map->stripes[i].physical;
> >  
> >  		if (device->zone_info->max_active_zones == 0)
> >  			continue;
> >  
> > -		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
> > -				       physical >> SECTOR_SHIFT,
> > -				       device->zone_info->zone_size >> SECTOR_SHIFT,
> > -				       GFP_NOFS);
> > +		if (need_zone_finish) {
> > +			ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
> > +					       physical >> SECTOR_SHIFT,
> > +					       device->zone_info->zone_size >> SECTOR_SHIFT,
> > +					       GFP_NOFS);
> >  
> > -		if (ret)
> > -			return ret;
> > +			if (ret)
> > +				return ret;
> > +		}
> >  
> >  		btrfs_dev_clear_active_zone(device, physical);
> >  	}
> > -	btrfs_dec_block_group_ro(block_group);
> > +
> > +	if (!nowait)
> > +		btrfs_dec_block_group_ro(block_group);
> >  
> >  	spin_lock(&fs_info->zone_active_bgs_lock);
> >  	ASSERT(!list_empty(&block_group->active_bg_list));

* Re: [PATCH 2/4] btrfs: zoned: finish BG when there are no more allocatable bytes left
  2022-04-28 15:02 ` [PATCH 2/4] btrfs: zoned: finish BG when there are no more allocatable bytes left Naohiro Aota
@ 2022-04-29 11:55   ` Pankaj Raghav
  2022-05-02 18:40     ` Naohiro Aota
  0 siblings, 1 reply; 10+ messages in thread
From: Pankaj Raghav @ 2022-04-29 11:55 UTC
  To: Naohiro Aota; +Cc: linux-btrfs, Pankaj Raghav

On Fri, Apr 29, 2022 at 12:02:16AM +0900, Naohiro Aota wrote:
> +++ b/fs/btrfs/zoned.c
> @@ -2017,6 +2017,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>  void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 length)
>  {
>  	struct btrfs_block_group *block_group;
> +	u64 min_use;
minor nit:
Could you rename this variable to `min_alloc_bytes` or something else
more descriptive?
>  
>  	if (!btrfs_is_zoned(fs_info))
>  		return;
> @@ -2024,7 +2025,14 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
>  	block_group = btrfs_lookup_block_group(fs_info, logical);
>  	ASSERT(block_group);
>  
> -	if (logical + length < block_group->start + block_group->zone_capacity)
> +	/* No MIXED BG on zoned btrfs. */
> +	if (block_group->flags & BTRFS_BLOCK_GROUP_DATA)
> +		min_use = fs_info->sectorsize;
> +	else
> +		min_use = fs_info->nodesize;
> +
> +	/* Bail out if we can allocate more data from this BG. */
> +	if (logical + length + min_use <= block_group->start + block_group->zone_capacity)
>  		goto out;
>  
>  	__btrfs_zone_finish(block_group, true);
> -- 
> 2.35.1
> 
Otherwise the changes look good to me.

Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>

-- 
Pankaj Raghav

* Re: [PATCH 1/4] btrfs: zoned: consolidate zone finish function
  2022-04-29  4:56     ` Naohiro Aota
@ 2022-04-29 18:41       ` David Sterba
  0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2022-04-29 18:41 UTC
  To: Naohiro Aota; +Cc: dsterba, linux-btrfs

On Fri, Apr 29, 2022 at 04:56:33AM +0000, Naohiro Aota wrote:
> On Thu, Apr 28, 2022 at 06:11:03PM +0200, David Sterba wrote:
> > On Fri, Apr 29, 2022 at 12:02:15AM +0900, Naohiro Aota wrote:
> > >  1 file changed, 54 insertions(+), 73 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> > > index 997a96d7a3d5..9cddafe78fb1 100644
> > > --- a/fs/btrfs/zoned.c
> > > +++ b/fs/btrfs/zoned.c
> > > @@ -1879,20 +1879,14 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
> > >  	return ret;
> > >  }
> > >  
> > > -int btrfs_zone_finish(struct btrfs_block_group *block_group)
> > > +static int __btrfs_zone_finish(struct btrfs_block_group *block_group, bool nowait)
> > 
> > Can you please avoid using __ for functions? Eg. the main function can
> > be btrfs_zone_finish (taking 2 parameters) and the exported one
> > btrfs_zone_finish_nowait, reflecting the preset parameter and also making
> > the 'nowait' semantics clear.
> 
> But, we export both btrfs_zone_finish() (the waiting variant) and
> btrfs_zone_finish_endio() (the nowait variant, plus checking the space
> left after the write). How about "do_zone_finish(block_group, bool
> nowait)" as the main function, with "btrfs_zone_finish(block_group)" and
> "btrfs_zone_finish_endio(block_group)" as wrappers?

Yeah, do_zone_finish works as well.

* Re: [PATCH 2/4] btrfs: zoned: finish BG when there are no more allocatable bytes left
  2022-04-29 11:55   ` Pankaj Raghav
@ 2022-05-02 18:40     ` Naohiro Aota
  0 siblings, 0 replies; 10+ messages in thread
From: Naohiro Aota @ 2022-05-02 18:40 UTC
  To: Pankaj Raghav; +Cc: linux-btrfs, Pankaj Raghav

On Fri, Apr 29, 2022 at 01:55:07PM +0200, Pankaj Raghav wrote:
> On Fri, Apr 29, 2022 at 12:02:16AM +0900, Naohiro Aota wrote:
> > +++ b/fs/btrfs/zoned.c
> > @@ -2017,6 +2017,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
> >  void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 length)
> >  {
> >  	struct btrfs_block_group *block_group;
> > +	u64 min_use;
> minor nit:
> Could you rename this variable to `min_alloc_bytes` or something else
> more descriptive?

Sure. It looks better.

> >  
> >  	if (!btrfs_is_zoned(fs_info))
> >  		return;
> > @@ -2024,7 +2025,14 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
> >  	block_group = btrfs_lookup_block_group(fs_info, logical);
> >  	ASSERT(block_group);
> >  
> > -	if (logical + length < block_group->start + block_group->zone_capacity)
> > +	/* No MIXED BG on zoned btrfs. */
> > +	if (block_group->flags & BTRFS_BLOCK_GROUP_DATA)
> > +		min_use = fs_info->sectorsize;
> > +	else
> > +		min_use = fs_info->nodesize;
> > +
> > +	/* Bail out if we can allocate more data from this BG. */
> > +	if (logical + length + min_use <= block_group->start + block_group->zone_capacity)
> >  		goto out;
> >  
> >  	__btrfs_zone_finish(block_group, true);
> > -- 
> > 2.35.1
> > 
> Otherwise the changes look good to me.
> 
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> 
> -- 
> Pankaj Raghav
