All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix active zone accounting for zoned
@ 2023-03-01 21:14 Josef Bacik
  2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Josef Bacik @ 2023-03-01 21:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

While trying to get the CI system up and running with the ZNS drives I ran into
this issue where the TEST_DEV would get maxed out on block groups and everything
would start failing with -ENOSPC.

This is because the way we handle active zone tracking isn't correct.  We need
to only take into account the used/pinned/zone_unusable counters for the active
block groups, not the system wide counters.

Unfortunately this is kind of annoying in that we have to mirror these counters
for the active block groups.  At this point this is the most straightforward way
to fix this.  I'm currently running these patches through xfstests, so consider
this as sort of an RFC to make sure the approach seems reasonable.  Thanks,

Josef

Josef Bacik (3):
  btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING
  btrfs: clean up space_info usage in  btrfs_update_block_group
  btrfs: handle active zone accounting properly

 fs/btrfs/block-group.c | 51 ++++++++++++++++++-------
 fs/btrfs/disk-io.c     |  6 +++
 fs/btrfs/extent-tree.c | 13 +++++++
 fs/btrfs/fs.h          |  7 +---
 fs/btrfs/space-info.c  | 85 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/space-info.h  | 20 +++++++++-
 fs/btrfs/zoned.c       | 17 ++++++---
 7 files changed, 168 insertions(+), 31 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING
  2023-03-01 21:14 [PATCH 0/3] Fix active zone accounting for zoned Josef Bacik
@ 2023-03-01 21:14 ` Josef Bacik
  2023-03-02  6:45   ` Anand Jain
                     ` (2 more replies)
  2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
  2023-03-01 21:14 ` [PATCH 3/3] btrfs: handle active zone accounting properly Josef Bacik
  2 siblings, 3 replies; 13+ messages in thread
From: Josef Bacik @ 2023-03-01 21:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This flag only gets set when we're doing active zone tracking, and I'm
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/fs.h         | 7 ++-----
 fs/btrfs/space-info.c | 2 +-
 fs/btrfs/zoned.c      | 3 +--
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 4c477eae6891..24cd49229408 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -120,11 +120,8 @@ enum {
 	/* Indicate that we want to commit the transaction. */
 	BTRFS_FS_NEED_TRANS_COMMIT,
 
-	/*
-	 * Indicate metadata over-commit is disabled. This is set when active
-	 * zone tracking is needed.
-	 */
-	BTRFS_FS_NO_OVERCOMMIT,
+	/* This is set when active zone tracking is needed. */
+	BTRFS_FS_ACTIVE_ZONE_TRACKING,
 
 	/*
 	 * Indicate if we have some features changed, this is mostly for
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 69c09508afb5..2237685d1ed0 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -407,7 +407,7 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
 		return 0;
 
 	used = btrfs_space_info_used(space_info, true);
-	if (test_bit(BTRFS_FS_NO_OVERCOMMIT, &fs_info->flags) &&
+	if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags) &&
 	    (space_info->flags & BTRFS_BLOCK_GROUP_METADATA))
 		avail = 0;
 	else
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index f95b2c94d619..808cfa3091c5 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -524,8 +524,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
 		}
 		atomic_set(&zone_info->active_zones_left,
 			   max_active_zones - nactive);
-		/* Overcommit does not work well with active zone tacking. */
-		set_bit(BTRFS_FS_NO_OVERCOMMIT, &fs_info->flags);
+		set_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags);
 	}
 
 	/* Validate superblock log */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] btrfs: clean up space_info usage in  btrfs_update_block_group
  2023-03-01 21:14 [PATCH 0/3] Fix active zone accounting for zoned Josef Bacik
  2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
@ 2023-03-01 21:14 ` Josef Bacik
  2023-03-02  6:46   ` Anand Jain
                     ` (2 more replies)
  2023-03-01 21:14 ` [PATCH 3/3] btrfs: handle active zone accounting properly Josef Bacik
  2 siblings, 3 replies; 13+ messages in thread
From: Josef Bacik @ 2023-03-01 21:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We do

cache->space_info->counter += num_bytes;

Everywhere in here.  This is makes the lines longer than they need to
be, and will be especially noticeable when I add the active tracking in,
so add a temp variable for the space_info so this is cleaner.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5b10401d803b..3eff0b35e139 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3474,6 +3474,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 	spin_unlock(&info->delalloc_root_lock);
 
 	while (total) {
+		struct btrfs_space_info *space_info;
 		bool reclaim = false;
 
 		cache = btrfs_lookup_block_group(info, bytenr);
@@ -3481,6 +3482,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			ret = -ENOENT;
 			break;
 		}
+		space_info = cache->space_info;
 		factor = btrfs_bg_type_to_factor(cache->flags);
 
 		/*
@@ -3495,7 +3497,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		byte_in_group = bytenr - cache->start;
 		WARN_ON(byte_in_group > cache->length);
 
-		spin_lock(&cache->space_info->lock);
+		spin_lock(&space_info->lock);
 		spin_lock(&cache->lock);
 
 		if (btrfs_test_opt(info, SPACE_CACHE) &&
@@ -3508,24 +3510,24 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			old_val += num_bytes;
 			cache->used = old_val;
 			cache->reserved -= num_bytes;
-			cache->space_info->bytes_reserved -= num_bytes;
-			cache->space_info->bytes_used += num_bytes;
-			cache->space_info->disk_used += num_bytes * factor;
+			space_info->bytes_reserved -= num_bytes;
+			space_info->bytes_used += num_bytes;
+			space_info->disk_used += num_bytes * factor;
 			spin_unlock(&cache->lock);
-			spin_unlock(&cache->space_info->lock);
+			spin_unlock(&space_info->lock);
 		} else {
 			old_val -= num_bytes;
 			cache->used = old_val;
 			cache->pinned += num_bytes;
-			btrfs_space_info_update_bytes_pinned(info,
-					cache->space_info, num_bytes);
-			cache->space_info->bytes_used -= num_bytes;
-			cache->space_info->disk_used -= num_bytes * factor;
+			btrfs_space_info_update_bytes_pinned(info, space_info,
+							     num_bytes);
+			space_info->bytes_used -= num_bytes;
+			space_info->disk_used -= num_bytes * factor;
 
 			reclaim = should_reclaim_block_group(cache, num_bytes);
 
 			spin_unlock(&cache->lock);
-			spin_unlock(&cache->space_info->lock);
+			spin_unlock(&space_info->lock);
 
 			set_extent_dirty(&trans->transaction->pinned_extents,
 					 bytenr, bytenr + num_bytes - 1,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] btrfs: handle active zone accounting properly
  2023-03-01 21:14 [PATCH 0/3] Fix active zone accounting for zoned Josef Bacik
  2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
  2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
@ 2023-03-01 21:14 ` Josef Bacik
  2023-03-02  7:01   ` Naohiro Aota
  2 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2023-03-01 21:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Running xfstests on my ZNS drives uncovered a problem where I was
allocating the entire device with metadata block groups.  The root cause
of this was we would get ENOSPC on mount, and while trying to satisfy
tickets we'd allocate more block groups.

The reason we were getting ENOSPC was because ->bytes_zone_unusable was
set to 40gib, but ->active_total_bytes was set to 8gib, which was the
maximum number of active block groups we're allowed to have at one time.
This was because we always update ->bytes_zone_unusable with the
unusable amount from every block group, but we only update
->active_total_bytes with the active block groups.

This is actually much worse however, because we could potentially have
other counters potentially well above the ->active_total_bytes, which
would lead to this early enospc situation.

Fix this by mirroring the counters for active block groups into their
own counters.  This allows us to keep the full space_info counters
consistent, which is needed for things like sysfs and the space info
ioctl, and then track the actual usage for ENOSPC reasons for only the
active block groups.

Additionally, when de-activating we weren't properly updating the
->active_total_bytes, which could lead to other problems.  Unifying all
of this with the proper helpers will make sure our accounting is
correct.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 29 +++++++++++++--
 fs/btrfs/disk-io.c     |  6 +++
 fs/btrfs/extent-tree.c | 13 +++++++
 fs/btrfs/space-info.c  | 83 ++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/space-info.h  | 20 +++++++++-
 fs/btrfs/zoned.c       | 14 +++++--
 6 files changed, 152 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3eff0b35e139..7d5b73b2689e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1166,6 +1166,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		btrfs_put_caching_control(caching_ctl);
 	}
 
+	ASSERT(!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+			 &block_group->runtime_flags));
+
 	spin_lock(&trans->transaction->dirty_bgs_lock);
 	WARN_ON(!list_empty(&block_group->dirty_list));
 	WARN_ON(!list_empty(&block_group->io_list));
@@ -1191,12 +1194,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			< block_group->length);
 	}
 	block_group->space_info->total_bytes -= block_group->length;
-	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
-		block_group->space_info->active_total_bytes -= block_group->length;
 	block_group->space_info->bytes_readonly -=
 		(block_group->length - block_group->zone_unusable);
-	block_group->space_info->bytes_zone_unusable -=
-		block_group->zone_unusable;
 	block_group->space_info->disk_total -= block_group->length * factor;
 
 	spin_unlock(&block_group->space_info->lock);
@@ -1377,10 +1376,14 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force)
 
 	if (!ret) {
 		sinfo->bytes_readonly += num_bytes;
+		ASSERT(!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+				 &cache->runtime_flags));
+
 		if (btrfs_is_zoned(cache->fs_info)) {
 			/* Migrate zone_unusable bytes to readonly */
 			sinfo->bytes_readonly += cache->zone_unusable;
 			sinfo->bytes_zone_unusable -= cache->zone_unusable;
+
 			cache->zone_unusable = 0;
 		}
 		cache->ro++;
@@ -1575,6 +1578,9 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		}
 		spin_unlock(&fs_info->discard_ctl.lock);
 
+		ASSERT(!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+				 &block_group->runtime_flags));
+
 		/* Reset pinned so btrfs_put_block_group doesn't complain */
 		spin_lock(&space_info->lock);
 		spin_lock(&block_group->lock);
@@ -1582,6 +1588,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		btrfs_space_info_update_bytes_pinned(fs_info, space_info,
 						     -block_group->pinned);
 		space_info->bytes_readonly += block_group->pinned;
+
 		block_group->pinned = 0;
 
 		spin_unlock(&block_group->lock);
@@ -2876,6 +2883,9 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache)
 
 	BUG_ON(!cache->ro);
 
+	ASSERT(!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+			 &cache->runtime_flags));
+
 	spin_lock(&sinfo->lock);
 	spin_lock(&cache->lock);
 	if (!--cache->ro) {
@@ -3513,6 +3523,11 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			space_info->bytes_reserved -= num_bytes;
 			space_info->bytes_used += num_bytes;
 			space_info->disk_used += num_bytes * factor;
+
+			if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+				     &cache->runtime_flags))
+				space_info->active_bytes_used += num_bytes;
+
 			spin_unlock(&cache->lock);
 			spin_unlock(&space_info->lock);
 		} else {
@@ -3524,6 +3539,12 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			space_info->bytes_used -= num_bytes;
 			space_info->disk_used -= num_bytes * factor;
 
+			if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+				     &cache->runtime_flags)) {
+				space_info->active_bytes_pinned += num_bytes;
+				space_info->active_bytes_used -= num_bytes;
+			}
+
 			reclaim = should_reclaim_block_group(cache, num_bytes);
 
 			spin_unlock(&cache->lock);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b53f0e30ce2b..4cec057f4358 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4883,6 +4883,12 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 				cache->space_info, head->num_bytes);
 			cache->reserved -= head->num_bytes;
 			cache->space_info->bytes_reserved -= head->num_bytes;
+
+			if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+				     &cache->runtime_flags))
+				cache->space_info->active_bytes_pinned +=
+					head->num_bytes;
+
 			spin_unlock(&cache->lock);
 			spin_unlock(&cache->space_info->lock);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 824c657f59e8..29fb3b2b7370 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2530,6 +2530,11 @@ static int pin_down_extent(struct btrfs_trans_handle *trans,
 		cache->reserved -= num_bytes;
 		cache->space_info->bytes_reserved -= num_bytes;
 	}
+
+	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+		     &cache->runtime_flags))
+		cache->space_info->active_bytes_pinned += num_bytes;
+
 	spin_unlock(&cache->lock);
 	spin_unlock(&cache->space_info->lock);
 
@@ -2725,6 +2730,14 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		spin_lock(&cache->lock);
 		cache->pinned -= len;
 		btrfs_space_info_update_bytes_pinned(fs_info, space_info, -len);
+
+		if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+			     &cache->runtime_flags)) {
+			ASSERT(!cache->ro);
+			space_info->active_bytes_zone_unusable += len;
+			space_info->active_bytes_pinned -= len;
+		}
+
 		space_info->max_extent_size = 0;
 		if (cache->ro) {
 			space_info->bytes_readonly += len;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 2237685d1ed0..d9a8d8ba38d7 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -166,6 +166,14 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info,
 			  bool may_use_included)
 {
 	ASSERT(s_info);
+
+	if (s_info->active_total_bytes)
+		return s_info->bytes_reserved +
+			s_info->active_bytes_used +
+			s_info->active_bytes_pinned +
+			s_info->active_bytes_zone_unusable +
+			(may_use_included ? s_info->bytes_may_use : 0);
+
 	return s_info->bytes_used + s_info->bytes_reserved +
 		s_info->bytes_pinned + s_info->bytes_readonly +
 		s_info->bytes_zone_unusable +
@@ -296,6 +304,66 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+static void update_block_group_activation(struct btrfs_block_group *block_group,
+					  bool activate)
+{
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct btrfs_space_info *space_info = block_group->space_info;
+
+	lockdep_assert_held(&space_info->lock);
+
+	if (!test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags))
+		return;
+
+	if (!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+		      &block_group->runtime_flags))
+		return;
+
+	if (activate) {
+		space_info->active_total_bytes += block_group->length;
+		space_info->active_bytes_used += block_group->used;
+		space_info->active_bytes_pinned += block_group->pinned;
+		space_info->active_bytes_zone_unusable +=
+			block_group->zone_unusable;
+	} else {
+		space_info->active_total_bytes -= block_group->length;
+		space_info->active_bytes_used -= block_group->used;
+		space_info->active_bytes_pinned -= block_group->pinned;
+		space_info->active_bytes_zone_unusable -=
+			block_group->zone_unusable;
+	}
+}
+
+/*
+ * Set the block group as active and update the counters.
+ *
+ * @block_group: The block group we're activating.
+ *
+ * If we have BTRFS_FS_ACTIVE_ZONE_TRACKING, this will update
+ * ->active_* counters for the space_info and set
+ *  BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE on the block group.
+ */
+void btrfs_activate_block_group(struct btrfs_block_group *block_group)
+{
+	set_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags);
+	update_block_group_activation(block_group, true);
+}
+
+/*
+ * Deactivate the block group and update the counters.
+ *
+ * @block_group: The block group we're deactivating.
+ *
+ * If we have BTRFS_FS_ACTIVE_ZONE_TRACKING, this will update
+ * ->active_* counters for the space_info and clear
+ *  BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE on the block group.
+ */
+void btrfs_deactivate_block_group(struct btrfs_block_group *block_group)
+{
+	update_block_group_activation(block_group, false);
+	clear_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags);
+}
+
 void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
 				struct btrfs_block_group *block_group)
 {
@@ -306,10 +374,10 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
 
 	found = btrfs_find_space_info(info, block_group->flags);
 	ASSERT(found);
+	block_group->space_info = found;
+
 	spin_lock(&found->lock);
 	found->total_bytes += block_group->length;
-	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
-		found->active_total_bytes += block_group->length;
 	found->disk_total += block_group->length * factor;
 	found->bytes_used += block_group->used;
 	found->disk_used += block_group->used * factor;
@@ -317,11 +385,11 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
 	found->bytes_zone_unusable += block_group->zone_unusable;
 	if (block_group->length > 0)
 		found->full = 0;
+
+	update_block_group_activation(block_group, true);
 	btrfs_try_granting_tickets(info, found);
 	spin_unlock(&found->lock);
 
-	block_group->space_info = found;
-
 	index = btrfs_bg_flags_to_raid_index(block_group->flags);
 	down_write(&found->groups_sem);
 	list_add_tail(&block_group->list, &found->block_groups[index]);
@@ -521,6 +589,13 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 		info->total_bytes, info->bytes_used, info->bytes_pinned,
 		info->bytes_reserved, info->bytes_may_use,
 		info->bytes_readonly, info->bytes_zone_unusable);
+	if (info->active_total_bytes == 0)
+		return;
+	btrfs_info(fs_info,
+"space_info active counters total=%llu, used=%llu, pinned=%llu, zone_unusable=%llu",
+		info->active_total_bytes, info->active_bytes_used,
+		info->active_bytes_pinned, info->active_bytes_zone_unusable);
+
 }
 
 void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index fc99ea2b0c34..d072d6c26d3a 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -96,11 +96,25 @@ struct btrfs_space_info {
 	u64 bytes_may_use;	/* number of bytes that may be used for
 				   delalloc/allocations */
 	u64 bytes_readonly;	/* total bytes that are read only */
-	/* Total bytes in the space, but only accounts active block groups. */
-	u64 active_total_bytes;
+
 	u64 bytes_zone_unusable;	/* total bytes that are unusable until
 					   resetting the device zone */
 
+	/*
+	 * This are mirrors of the above countesr.
+	 *
+	 * We need to mirror a lot of the counters for ENOSPC reasons if we're
+	 * doing active block group management.  The only exceptions are
+	 * bytes_reserved, which would be correct as we can only be allocating
+	 * from active block groups, and bytes_may_use which again is uptodate
+	 * based on what is currently being reserved.  Everything else has to be
+	 * mirrored and only accounted for in the active block groups.
+	 * */
+	u64 active_total_bytes;
+	u64 active_bytes_used;
+	u64 active_bytes_pinned;
+	u64 active_bytes_zone_unusable;
+
 	u64 max_extent_size;	/* This will hold the maximum extent size of
 				   the space info if we had an ENOSPC in the
 				   allocator. */
@@ -237,5 +251,7 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
+void btrfs_deactivate_block_group(struct btrfs_block_group *block_group);
+void btrfs_activate_block_group(struct btrfs_block_group *block_group);
 
 #endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 808cfa3091c5..8a863004d750 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1900,8 +1900,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 	}
 
 	/* Successfully activated all the zones */
-	set_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags);
-	space_info->active_total_bytes += block_group->length;
+	btrfs_activate_block_group(block_group);
 	spin_unlock(&block_group->lock);
 	btrfs_try_granting_tickets(fs_info, space_info);
 	spin_unlock(&space_info->lock);
@@ -1956,15 +1955,18 @@ static void wait_eb_writebacks(struct btrfs_block_group *block_group)
 static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct btrfs_space_info *space_info = block_group->space_info;
 	struct map_lookup *map;
 	const bool is_metadata = (block_group->flags &
 			(BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_SYSTEM));
 	int ret = 0;
 	int i;
 
+	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	if (!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags)) {
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
 		return 0;
 	}
 
@@ -1972,6 +1974,7 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	if (is_metadata &&
 	    block_group->start + block_group->alloc_offset > block_group->meta_write_pointer) {
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
 		return -EAGAIN;
 	}
 
@@ -1984,6 +1987,7 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	 */
 	if (!fully_written) {
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
 
 		ret = btrfs_inc_block_group_ro(block_group, false);
 		if (ret)
@@ -1998,6 +2002,7 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 		if (is_metadata)
 			wait_eb_writebacks(block_group);
 
+		spin_lock(&space_info->lock);
 		spin_lock(&block_group->lock);
 
 		/*
@@ -2007,23 +2012,26 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 		if (!test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
 			      &block_group->runtime_flags)) {
 			spin_unlock(&block_group->lock);
+			spin_unlock(&space_info->lock);
 			btrfs_dec_block_group_ro(block_group);
 			return 0;
 		}
 
 		if (block_group->reserved) {
 			spin_unlock(&block_group->lock);
+			spin_unlock(&space_info->lock);
 			btrfs_dec_block_group_ro(block_group);
 			return -EAGAIN;
 		}
 	}
 
-	clear_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags);
+	btrfs_deactivate_block_group(block_group);
 	block_group->alloc_offset = block_group->zone_capacity;
 	block_group->free_space_ctl->free_space = 0;
 	btrfs_clear_treelog_bg(block_group);
 	btrfs_clear_data_reloc_bg(block_group);
 	spin_unlock(&block_group->lock);
+	spin_unlock(&space_info->lock);
 
 	map = block_group->physical_map;
 	for (i = 0; i < map->num_stripes; i++) {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING
  2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
@ 2023-03-02  6:45   ` Anand Jain
  2023-03-02 10:03   ` Johannes Thumshirn
  2023-03-02 14:45   ` Naohiro Aota
  2 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2023-03-02  6:45 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 3/2/23 05:14, Josef Bacik wrote:
> This flag only gets set when we're doing active zone tracking, and I'm
> going to need to use this flag for things related to this behavior.
> Rename the flag to represent what it actually means for the file system
> so it can be used in other ways and still make sense.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>


> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 69c09508afb5..2237685d1ed0 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -407,7 +407,7 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
>   		return 0;
>   
>   	used = btrfs_space_info_used(space_info, true);
> -	if (test_bit(BTRFS_FS_NO_OVERCOMMIT, &fs_info->flags) &&
> +	if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags) &&
>   	    (space_info->flags & BTRFS_BLOCK_GROUP_METADATA))
>   		avail = 0;
>   	else
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index f95b2c94d619..808cfa3091c5 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -524,8 +524,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
>   		}
>   		atomic_set(&zone_info->active_zones_left,
>   			   max_active_zones - nactive);

> -		/* Overcommit does not work well with active zone tacking. */

  Nit:
  Consider moving this comment to the btrfs_can_overcommit() function,
  possibly during the apply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group
  2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
@ 2023-03-02  6:46   ` Anand Jain
  2023-03-02 10:04   ` Johannes Thumshirn
  2023-03-02 14:47   ` Naohiro Aota
  2 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2023-03-02  6:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 3/2/23 05:14, Josef Bacik wrote:
> We do
> 
> cache->space_info->counter += num_bytes;
> 
> Everywhere in here.  This is makes the lines longer than they need to
> be, and will be especially noticeable when I add the active tracking in,
> so add a temp variable for the space_info so this is cleaner.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] btrfs: handle active zone accounting properly
  2023-03-01 21:14 ` [PATCH 3/3] btrfs: handle active zone accounting properly Josef Bacik
@ 2023-03-02  7:01   ` Naohiro Aota
  2023-03-02 14:45     ` Naohiro Aota
  0 siblings, 1 reply; 13+ messages in thread
From: Naohiro Aota @ 2023-03-02  7:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Mar 01, 2023 at 04:14:44PM -0500, Josef Bacik wrote:
> Running xfstests on my ZNS drives uncovered a problem where I was
> allocating the entire device with metadata block groups.  The root cause
> of this was we would get ENOSPC on mount, and while trying to satisfy
> tickets we'd allocate more block groups.
> 
> The reason we were getting ENOSPC was because ->bytes_zone_unusable was
> set to 40gib, but ->active_total_bytes was set to 8gib, which was the
> maximum number of active block groups we're allowed to have at one time.
> This was because we always update ->bytes_zone_unusable with the
> unusable amount from every block group, but we only update
> ->active_total_bytes with the active block groups.
>
> This is actually much worse however, because we could potentially have
> other counters potentially well above the ->active_total_bytes, which
> would lead to this early enospc situation.
> 
> Fix this by mirroring the counters for active block groups into their
> own counters.  This allows us to keep the full space_info counters
> consistent, which is needed for things like sysfs and the space info
> ioctl, and then track the actual usage for ENOSPC reasons for only the
> active block groups.

I think the mirroring the counters approach duplicates the code and
variables and makes them complex. Instead, we can fix the
"active_total_bytes" accounting maybe like this:

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d4dd73c9a701..bf4d96d74efe 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -319,7 +319,8 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
 	ASSERT(found);
 	spin_lock(&found->lock);
 	found->total_bytes += block_group->length;
-	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
+	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags) ||
+	    btrfs_zoned_bg_is_full(block_group))
 		found->active_total_bytes += block_group->length;
 	found->disk_total += block_group->length * factor;
 	found->bytes_used += block_group->used;

Or, we can remove "active_total_bytes" and introduce something like
"preactivated_bytes" to count the bytes of BGs never get activated (BGs #1 below).

There are three kinds of block groups.

1. Block groups never activated
2. Block groups currently active
3. Block groups previously active and currently inactive (due to fully written or zone finish)

What we really want to exclude from "total_bytes" is the total size of BGs
#1. They seem empty and allocatable but since they are not activated, we
cannot rely on them to do the space reservation.

And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.

OTOH, BGs #3 is OK, since they are already full we cannot allocate from them
anyway. For them, "total_bytes == used + reserved + pinned + zone_unusable"
should hold.

> Additionally, when de-activating we weren't properly updating the
> ->active_total_bytes, which could lead to other problems.  Unifying all
> of this with the proper helpers will make sure our accounting is
> correct.

So, de-activating not reducing the "active_total_bytes" should be OK
... but I admit its name is confusing. It should be
"active_and_finished_total_bytes" ? "once_activated_total_bytes" ? I still
don't have a good idea.

> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/block-group.c | 29 +++++++++++++--
>  fs/btrfs/disk-io.c     |  6 +++
>  fs/btrfs/extent-tree.c | 13 +++++++
>  fs/btrfs/space-info.c  | 83 ++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/space-info.h  | 20 +++++++++-
>  fs/btrfs/zoned.c       | 14 +++++--
>  6 files changed, 152 insertions(+), 13 deletions(-)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING
  2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
  2023-03-02  6:45   ` Anand Jain
@ 2023-03-02 10:03   ` Johannes Thumshirn
  2023-03-02 14:45   ` Naohiro Aota
  2 siblings, 0 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2023-03-02 10:03 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group
  2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
  2023-03-02  6:46   ` Anand Jain
@ 2023-03-02 10:04   ` Johannes Thumshirn
  2023-03-02 14:47   ` Naohiro Aota
  2 siblings, 0 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2023-03-02 10:04 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] btrfs: handle active zone accounting properly
  2023-03-02  7:01   ` Naohiro Aota
@ 2023-03-02 14:45     ` Naohiro Aota
  2023-03-02 16:07       ` Josef Bacik
  0 siblings, 1 reply; 13+ messages in thread
From: Naohiro Aota @ 2023-03-02 14:45 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Thu, Mar 02, 2023 at 04:01:07PM +0900, Naohiro Aota wrote:
> On Wed, Mar 01, 2023 at 04:14:44PM -0500, Josef Bacik wrote:
> > Running xfstests on my ZNS drives uncovered a problem where I was
> > allocating the entire device with metadata block groups.  The root cause
> > of this was we would get ENOSPC on mount, and while trying to satisfy
> > tickets we'd allocate more block groups.
> > 
> > The reason we were getting ENOSPC was because ->bytes_zone_unusable was
> > set to 40gib, but ->active_total_bytes was set to 8gib, which was the
> > maximum number of active block groups we're allowed to have at one time.
> > This was because we always update ->bytes_zone_unusable with the
> > unusable amount from every block group, but we only update
> > ->active_total_bytes with the active block groups.
> >
> > This is actually much worse however, because we could potentially have
> > other counters potentially well above the ->active_total_bytes, which
> > would lead to this early enospc situation.
> > 
> > Fix this by mirroring the counters for active block groups into their
> > own counters.  This allows us to keep the full space_info counters
> > consistent, which is needed for things like sysfs and the space info
> > ioctl, and then track the actual usage for ENOSPC reasons for only the
> > active block groups.
> 
> I think the mirroring the counters approach duplicates the code and
> variables and makes them complex. Instead, we can fix the
> "active_total_bytes" accounting maybe like this:
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index d4dd73c9a701..bf4d96d74efe 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -319,7 +319,8 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
>  	ASSERT(found);
>  	spin_lock(&found->lock);
>  	found->total_bytes += block_group->length;
> -	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
> +	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags) ||
> +	    btrfs_zoned_bg_is_full(block_group))
>  		found->active_total_bytes += block_group->length;
>  	found->disk_total += block_group->length * factor;
>  	found->bytes_used += block_group->used;
> 
> Or, we can remove "active_total_bytes" and introduce something like
> "preactivated_bytes" to count the bytes of BGs never get activated (BGs #1 below).

I got a new idea. How about counting all the region in a new block group as
zone_unusable? Then, region [0 .. zone_capacity] will be freed for use once
it gets activated. With this, we can drop "active_total_bytes" so the code
will become similar between the regular and the zoned mode.

We also need to care not to reclaim the fresh BGs but it's trivial
(alloc_offset == 0).

> 
> There are three kinds of block groups.
> 
> 1. Block groups never activated
> 2. Block groups currently active
> 3. Block groups previously active and currently inactive (due to fully written or zone finish)
> 
> What we really want to exclude from "total_bytes" is the total size of BGs
> #1. They seem empty and allocatable but since they are not activated, we
> cannot rely on them to do the space reservation.
> 
> And, since BGs #1 never get activated, they should have no "used",
> "reserved" and "pinned" bytes.
> 
> OTOH, BGs #3 is OK, since they are already full we cannot allocate from them
> anyway. For them, "total_bytes == used + reserved + pinned + zone_unusable"
> should hold.
> 
> > Additionally, when de-activating we weren't properly updating the
> > ->active_total_bytes, which could lead to other problems.  Unifying all
> > of this with the proper helpers will make sure our accounting is
> > correct.
> 
> So, de-activating not reducing the "active_total_bytes" should be OK
> ... but I admit its name is confusing. It should be
> "active_and_finished_total_bytes" ? "once_activated_total_bytes" ? I still
> don't have a good idea.
> 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/block-group.c | 29 +++++++++++++--
> >  fs/btrfs/disk-io.c     |  6 +++
> >  fs/btrfs/extent-tree.c | 13 +++++++
> >  fs/btrfs/space-info.c  | 83 ++++++++++++++++++++++++++++++++++++++++--
> >  fs/btrfs/space-info.h  | 20 +++++++++-
> >  fs/btrfs/zoned.c       | 14 +++++--
> >  6 files changed, 152 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING
  2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
  2023-03-02  6:45   ` Anand Jain
  2023-03-02 10:03   ` Johannes Thumshirn
@ 2023-03-02 14:45   ` Naohiro Aota
  2 siblings, 0 replies; 13+ messages in thread
From: Naohiro Aota @ 2023-03-02 14:45 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Mar 01, 2023 at 04:14:42PM -0500, Josef Bacik wrote:
> This flag only gets set when we're doing active zone tracking, and I'm
> going to need to use this flag for things related to this behavior.
> Rename the flag to represent what it actually means for the file system
> so it can be used in other ways and still make sense.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Looks good,
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group
  2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
  2023-03-02  6:46   ` Anand Jain
  2023-03-02 10:04   ` Johannes Thumshirn
@ 2023-03-02 14:47   ` Naohiro Aota
  2 siblings, 0 replies; 13+ messages in thread
From: Naohiro Aota @ 2023-03-02 14:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Mar 01, 2023 at 04:14:43PM -0500, Josef Bacik wrote:
> We do
> 
> cache->space_info->counter += num_bytes;
> 
> Everywhere in here.  This is makes the lines longer than they need to
> be, and will be especially noticeable when I add the active tracking in,
> so add a temp variable for the space_info so this is cleaner.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Looks good to me.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] btrfs: handle active zone accounting properly
  2023-03-02 14:45     ` Naohiro Aota
@ 2023-03-02 16:07       ` Josef Bacik
  0 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2023-03-02 16:07 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, kernel-team

On Thu, Mar 02, 2023 at 02:45:13PM +0000, Naohiro Aota wrote:
> On Thu, Mar 02, 2023 at 04:01:07PM +0900, Naohiro Aota wrote:
> > On Wed, Mar 01, 2023 at 04:14:44PM -0500, Josef Bacik wrote:
> > > Running xfstests on my ZNS drives uncovered a problem where I was
> > > allocating the entire device with metadata block groups.  The root cause
> > > of this was we would get ENOSPC on mount, and while trying to satisfy
> > > tickets we'd allocate more block groups.
> > > 
> > > The reason we were getting ENOSPC was because ->bytes_zone_unusable was
> > > set to 40gib, but ->active_total_bytes was set to 8gib, which was the
> > > maximum number of active block groups we're allowed to have at one time.
> > > This was because we always update ->bytes_zone_unusable with the
> > > unusable amount from every block group, but we only update
> > > ->active_total_bytes with the active block groups.
> > >
> > > This is actually much worse however, because we could potentially have
> > > other counters potentially well above the ->active_total_bytes, which
> > > would lead to this early enospc situation.
> > > 
> > > Fix this by mirroring the counters for active block groups into their
> > > own counters.  This allows us to keep the full space_info counters
> > > consistent, which is needed for things like sysfs and the space info
> > > ioctl, and then track the actual usage for ENOSPC reasons for only the
> > > active block groups.
> > 
> > I think the mirroring the counters approach duplicates the code and
> > variables and makes them complex. Instead, we can fix the
> > "active_total_bytes" accounting maybe like this:
> > 
> > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > index d4dd73c9a701..bf4d96d74efe 100644
> > --- a/fs/btrfs/space-info.c
> > +++ b/fs/btrfs/space-info.c
> > @@ -319,7 +319,8 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
> >  	ASSERT(found);
> >  	spin_lock(&found->lock);
> >  	found->total_bytes += block_group->length;
> > -	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
> > +	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags) ||
> > +	    btrfs_zoned_bg_is_full(block_group))
> >  		found->active_total_bytes += block_group->length;
> >  	found->disk_total += block_group->length * factor;
> >  	found->bytes_used += block_group->used;
> > 
> > Or, we can remove "active_total_bytes" and introduce something like
> > "preactivated_bytes" to count the bytes of BGs never get activated (BGs #1 below).
> 
> I got a new idea. How about counting all the region in a new block group as
> zone_unusable? Then, region [0 .. zone_capacity] will be freed for use once
> it gets activated. With this, we can drop "active_total_bytes" so the code
> will become similar between the regular and the zoned mode.
> 
> We also need to care not to reclaim the fresh BGs but it's trivial
> (alloc_offset == 0).
> 

I like this idea better than mine for sure.  Will you write it up and send it
and I'll run it through my test-rig?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-03-02 16:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-01 21:14 [PATCH 0/3] Fix active zone accounting for zoned Josef Bacik
2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
2023-03-02  6:45   ` Anand Jain
2023-03-02 10:03   ` Johannes Thumshirn
2023-03-02 14:45   ` Naohiro Aota
2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
2023-03-02  6:46   ` Anand Jain
2023-03-02 10:04   ` Johannes Thumshirn
2023-03-02 14:47   ` Naohiro Aota
2023-03-01 21:14 ` [PATCH 3/3] btrfs: handle active zone accounting properly Josef Bacik
2023-03-02  7:01   ` Naohiro Aota
2023-03-02 14:45     ` Naohiro Aota
2023-03-02 16:07       ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.