* [PATCH 00/17] ZNS Support for Btrfs
@ 2021-08-11 14:16 Naohiro Aota
  2021-08-11 14:16 ` [PATCH 01/17] btrfs: zoned: load zone capacity information from devices Naohiro Aota
                   ` (16 more replies)
  0 siblings, 17 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

This series extends zoned support for Zoned Namespace (ZNS) SSDs [1].

[1] https://zonedstorage.io/introduction/zns/

This series is available on GitHub at
v1    https://github.com/naota/linux/tree/btrfs-zns-v1
HEAD  https://github.com/naota/linux/tree/btrfs-zns

The ZNS specification introduces extra functionalities listed below.

- No conventional zones
- Zone Append write command
- Zone Capacity
- Active Zones

The first two functionalities are already addressed in the current
zoned support on btrfs. We do not rely on conventional zones, and we
use the zone append write command to write data IOs.

This series implements support for the other ones.

While the userland tools need some tweaks (e.g. using the capacity
instead of the length) to be fully precise, they still work fine as is.

* Zone Capacity Support

A zone capacity is an additional per-zone attribute that indicates the
number of usable logical blocks within each zone, starting from the
first logical block of each zone. It is always smaller than or equal to
the zone size.

We can naturally map the capacity to the newly introduced
"zone_capacity" field of a block group. Allocations are then limited by
the zone capacity instead of by the block group's length.
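
For example, the data allocation check conceptually becomes the
following (a sketch based on patch 3; the helper name is illustrative,
the patches open-code this in do_allocation_zoned()):

  /*
   * Sketch: free space in a zoned block group is bounded by the zone
   * capacity, not by the block group length. Bytes in
   * [zone_capacity, length] are never writable.
   */
  static u64 zoned_block_group_avail(const struct btrfs_block_group *bg)
  {
          return bg->zone_capacity - bg->alloc_offset;
  }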

* Active Zones Tracking

The ZNS specification defines a limit on the number of zones that can
be in the implicit open, explicit open or closed conditions. Any zone
in such a condition is defined as an active zone and corresponds to a
zone that is being written or that has been only partially written. If
the maximum number of active zones is reached, we must either reset or
finish some active zones before being able to choose other zones for
storing data.

In order not to exceed the maximum number of active zones, we need to
track which zones are active and how the active zones relate to the
block groups. We mark a block group as "active" if the corresponding
device zones are all active. Allocating an extent will activate a block
group, and allocation from an inactive block group is prohibited. Such
active block groups are tracked in a list. Once a block group is fully
written, we deactivate it and remove it from the list.
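
The device-side bookkeeping is a per-device bitmap of active zones plus
a counter of activations left. Here is a simplified sketch of the helper
added in patch 10 (the real function is btrfs_dev_set_active_zone(),
which takes a device and a physical position; the zone-number conversion
is trimmed here):

  static bool dev_set_active_zone(struct btrfs_zoned_device_info *zinfo,
                                  unsigned int zno)
  {
          /* The device reports no limit: any zone may be active. */
          if (!zinfo->max_active_zones)
                  return true;

          if (test_bit(zno, zinfo->active_zones))
                  return true;    /* already accounted as active */

          /* Reserve one activation; fail if none is left. */
          if (atomic_dec_if_positive(&zinfo->active_zones_left) < 0)
                  return false;

          /* Lost a race to set the bit: give the reservation back. */
          if (test_and_set_bit(zno, zinfo->active_zones))
                  atomic_inc(&zinfo->active_zones_left);

          return true;
  }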
  
* Active Zone Aware Sequential Allocator

Handling active zones makes the allocator more complex. Here is a
summary of how find_free_extent_update_loop() behaves (a simplified
sketch in pseudo-code follows the list).
  
1. If enough space is available in an active block group
   -  allocate from it (end, success)
2. If we can activate another zone on a device
   2.1 Try to allocate a new block group and activate it
   2.2 If the activation succeeds
      - allocation will be satisfied from it in the next iteration
   2.3 If the activation failed
      - Try the next cycle. Some writes may free up an active block group
3. If we cannot activate any zones
   3.1 Try to allocate with a smaller size, down to min_alloc_size
      - btrfs_reserve_extent() will halve the allocation size and
        restart the loop
   3.2 Nothing can be done anymore. Give up. ENOSPC
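
In pseudo-code (illustrative only; the function names are made up, and
the real logic is split between find_free_extent_update_loop() and
btrfs_reserve_extent()):

  for (;;) {
          if (allocate_from_active_block_groups(&ins) == 0)
                  return 0;                       /* step 1 */
          if (can_activate_another_zone()) {      /* step 2 */
                  /* The new block group serves the next iteration. If
                   * activation fails, a later cycle may still succeed
                   * once some writes free up an active block group. */
                  create_and_activate_block_group();
                  continue;
          }
          if (num_bytes > min_alloc_size) {       /* step 3.1 */
                  num_bytes = max(num_bytes / 2, min_alloc_size);
                  continue;
          }
          return -ENOSPC;                         /* step 3.2 */
  }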

* Patch series organization

Note: patches 2 and 14 are preparation patches and can be merged
independently.

Patches 1-6 implement zone capacity support.

Patch 7 implements finishing a superblock zone once there is no space
left for a new superblock.

Patches 8-13 implement the activation side of the active zone
tracking.

Patches 14 and 15 tweak the allocator to retry with a smaller size if
possible (step 3.1 in the list above).

Patches 16 and 17 implement the deactivation side of the active zone
tracking.

Naohiro Aota (17):
  btrfs: zoned: load zone capacity information from devices
  btrfs: zoned: move btrfs_free_excluded_extents out from
    btrfs_calc_zone_unusable
  btrfs: zoned: calculate free space from zone capacity
  btrfs: zoned: tweak reclaim threshold for zone capacity
  btrfs: zoned: consider zone as full when no more SB can be written
  btrfs: zoned: locate superblock position using zone capacity
  btrfs: zoned: finish superblock zone once no space left for new SB
  btrfs: zoned: load active zone information from devices
  btrfs: zoned: introduce physical_map to btrfs_block_group
  btrfs: zoned: implement active zone tracking
  btrfs: zoned: load active zone info for block group
  btrfs: zoned: activate block group on allocation
  btrfs: zoned: activate new block group
  btrfs: move ffe_ctl one level up
  btrfs: zoned: avoid chunk allocation if active block group has enough
    space
  btrfs: zoned: finish fully written block group
  btrfs: zoned: finish relocating block group

 fs/btrfs/block-group.c      |  29 ++-
 fs/btrfs/block-group.h      |   4 +
 fs/btrfs/ctree.h            |   3 +
 fs/btrfs/disk-io.c          |   6 +-
 fs/btrfs/extent-tree.c      | 204 +++++++++------
 fs/btrfs/extent_io.c        |  11 +-
 fs/btrfs/extent_io.h        |   1 +
 fs/btrfs/free-space-cache.c |  19 +-
 fs/btrfs/inode.c            |   6 +-
 fs/btrfs/relocation.c       |   4 +
 fs/btrfs/zoned.c            | 495 +++++++++++++++++++++++++++++++++---
 fs/btrfs/zoned.h            |  39 ++-
 12 files changed, 692 insertions(+), 129 deletions(-)

-- 
2.32.0



* [PATCH 01/17] btrfs: zoned: load zone capacity information from devices
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 02/17] btrfs: zoned: move btrfs_free_excluded_extents out from btrfs_calc_zone_unusable Naohiro Aota
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

The ZNS specification introduces the concept of a Zone Capacity.  A zone
capacity is an additional per-zone attribute that indicates the number of
usable logical blocks within each zone, starting from the first logical
block of each zone. It is always smaller than or equal to the zone size.

With the SINGLE profile, we can set a block group's "capacity" to the same
value as the underlying zone's Zone Capacity. We will limit allocations so
that they do not exceed the capacity in a following commit.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/zoned.c       | 24 +++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index c72a71efcb18..2db40a005512 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -202,6 +202,7 @@ struct btrfs_block_group {
 	 */
 	u64 alloc_offset;
 	u64 zone_unusable;
+	u64 zone_capacity;
 	u64 meta_write_pointer;
 };
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 47af1ab3bf12..32f444c7ec76 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1039,6 +1039,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	int i;
 	unsigned int nofs_flag;
 	u64 *alloc_offsets = NULL;
+	u64 *caps = NULL;
 	u64 last_alloc = 0;
 	u32 num_sequential = 0, num_conventional = 0;
 
@@ -1069,6 +1070,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		return -ENOMEM;
 	}
 
+	caps = kcalloc(map->num_stripes, sizeof(*caps), GFP_NOFS);
+	if (!caps) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	for (i = 0; i < map->num_stripes; i++) {
 		bool is_sequential;
 		struct blk_zone zone;
@@ -1131,6 +1138,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 			goto out;
 		}
 
+		caps[i] = zone.capacity << SECTOR_SHIFT;
+
 		switch (zone.cond) {
 		case BLK_ZONE_COND_OFFLINE:
 		case BLK_ZONE_COND_READONLY:
@@ -1144,7 +1153,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 			alloc_offsets[i] = 0;
 			break;
 		case BLK_ZONE_COND_FULL:
-			alloc_offsets[i] = fs_info->zone_size;
+			alloc_offsets[i] = caps[i];
 			break;
 		default:
 			/* Partially used zone */
@@ -1169,6 +1178,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		 * calculate_alloc_pointer() which takes extent buffer
 		 * locks to avoid deadlock.
 		 */
+
+		/* Zone capacity is always zone size in emulation */
+		cache->zone_capacity = cache->length;
 		if (new) {
 			cache->alloc_offset = 0;
 			goto out;
@@ -1195,6 +1207,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 			goto out;
 		}
 		cache->alloc_offset = alloc_offsets[0];
+		cache->zone_capacity = caps[0];
 		break;
 	case BTRFS_BLOCK_GROUP_DUP:
 	case BTRFS_BLOCK_GROUP_RAID1:
@@ -1218,6 +1231,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		ret = -EIO;
 	}
 
+	if (cache->alloc_offset > cache->zone_capacity) {
+		btrfs_err(fs_info,
+"zoned: invalid write pointer %llu (larger than zone capacity %llu) in block group %llu",
+			  cache->alloc_offset, cache->zone_capacity,
+			  cache->start);
+		ret = -EIO;
+	}
+
 	/* An extent is allocated after the write pointer */
 	if (!ret && num_conventional && last_alloc > cache->alloc_offset) {
 		btrfs_err(fs_info,
@@ -1229,6 +1250,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	if (!ret)
 		cache->meta_write_pointer = cache->alloc_offset + cache->start;
 
+	kfree(caps);
 	kfree(alloc_offsets);
 	free_extent_map(em);
 
-- 
2.32.0



* [PATCH 02/17] btrfs: zoned: move btrfs_free_excluded_extents out from btrfs_calc_zone_unusable
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
  2021-08-11 14:16 ` [PATCH 01/17] btrfs: zoned: load zone capacity information from devices Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 03/17] btrfs: zoned: calculate free space from zone capacity Naohiro Aota
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

btrfs_free_excluded_extents() is not necessary for
btrfs_calc_zone_unusable() and it makes btrfs_calc_zone_unusable()
difficult to reuse. Move it out and call btrfs_free_excluded_extents() in
the proper context.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 5 +++++
 fs/btrfs/zoned.c       | 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9e833d74e8dc..db368518d42c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2037,6 +2037,11 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	 */
 	if (btrfs_is_zoned(info)) {
 		btrfs_calc_zone_unusable(cache);
+		/*
+		 * Should not have any excluded extents. Just in case,
+		 * though
+		 */
+		btrfs_free_excluded_extents(cache);
 	} else if (cache->length == cache->used) {
 		cache->last_byte_to_unpin = (u64)-1;
 		cache->cached = BTRFS_CACHE_FINISHED;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 32f444c7ec76..579fb03ba937 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1273,9 +1273,6 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
 	cache->cached = BTRFS_CACHE_FINISHED;
 	cache->free_space_ctl->free_space = free;
 	cache->zone_unusable = unusable;
-
-	/* Should not have any excluded extents. Just in case, though */
-	btrfs_free_excluded_extents(cache);
 }
 
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
-- 
2.32.0



* [PATCH 03/17] btrfs: zoned: calculate free space from zone capacity
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
  2021-08-11 14:16 ` [PATCH 01/17] btrfs: zoned: load zone capacity information from devices Naohiro Aota
  2021-08-11 14:16 ` [PATCH 02/17] btrfs: zoned: move btrfs_free_excluded_extents out from btrfs_calc_zone_unusable Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 04/17] btrfs: zoned: tweak reclaim threshold for " Naohiro Aota
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

Now that we have introduced the capacity of a block group, we need to
calculate free space using the capacity instead of the length. Thus, we
account the bytes from alloc_offset up to the capacity as free, and account
the bytes in [capacity, length] as zone unusable.
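
For example (illustrative numbers): for a 256MiB block group whose zone
capacity is 192MiB, with alloc_offset = 64MiB and used = 48MiB, the free
space is 192MiB - 64MiB = 128MiB and zone_unusable is
(64MiB - 48MiB) + (256MiB - 192MiB) = 80MiB.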

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c      | 6 ++++--
 fs/btrfs/extent-tree.c      | 3 ++-
 fs/btrfs/free-space-cache.c | 8 +++++++-
 fs/btrfs/zoned.c            | 5 +++--
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index db368518d42c..de22e3c9599e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2486,7 +2486,8 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
 	 */
 	trace_btrfs_add_block_group(fs_info, cache, 1);
 	btrfs_update_space_info(fs_info, cache->flags, size, bytes_used,
-				cache->bytes_super, 0, &cache->space_info);
+				cache->bytes_super, cache->zone_unusable,
+				&cache->space_info);
 	btrfs_update_global_block_rsv(fs_info);
 
 	link_block_group(cache);
@@ -2601,7 +2602,8 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache)
 	if (!--cache->ro) {
 		if (btrfs_is_zoned(cache->fs_info)) {
 			/* Migrate zone_unusable bytes back */
-			cache->zone_unusable = cache->alloc_offset - cache->used;
+			cache->zone_unusable = (cache->alloc_offset - cache->used) +
+				(cache->length - cache->zone_capacity);
 			sinfo->bytes_zone_unusable += cache->zone_unusable;
 			sinfo->bytes_readonly -= cache->zone_unusable;
 		}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc3da7585fb7..8dafb61c4946 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3796,7 +3796,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 		goto out;
 	}
 
-	avail = block_group->length - block_group->alloc_offset;
+	WARN_ON_ONCE(block_group->alloc_offset > block_group->zone_capacity);
+	avail = block_group->zone_capacity - block_group->alloc_offset;
 	if (avail < num_bytes) {
 		if (ffe_ctl->max_extent_size < avail) {
 			/*
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index da0eee7c9e5f..bb2536c745cd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2539,10 +2539,15 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 	u64 offset = bytenr - block_group->start;
 	u64 to_free, to_unusable;
 	const int bg_reclaim_threshold = READ_ONCE(fs_info->bg_reclaim_threshold);
+	bool initial = (size == block_group->length);
+
+	WARN_ON(!initial && offset + size > block_group->zone_capacity);
 
 	spin_lock(&ctl->tree_lock);
 	if (!used)
 		to_free = size;
+	else if (initial)
+		to_free = block_group->zone_capacity;
 	else if (offset >= block_group->alloc_offset)
 		to_free = size;
 	else if (offset + size <= block_group->alloc_offset)
@@ -2755,7 +2760,8 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
 	 */
 	if (btrfs_is_zoned(fs_info)) {
 		btrfs_info(fs_info, "free space %llu",
-			   block_group->length - block_group->alloc_offset);
+			   block_group->zone_capacity -
+			   block_group->alloc_offset);
 		return;
 	}
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 579fb03ba937..0eb8ea4d3542 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1265,8 +1265,9 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
 		return;
 
 	WARN_ON(cache->bytes_super != 0);
-	unusable = cache->alloc_offset - cache->used;
-	free = cache->length - cache->alloc_offset;
+	unusable = (cache->alloc_offset - cache->used) +
+		(cache->length - cache->zone_capacity);
+	free = cache->zone_capacity - cache->alloc_offset;
 
 	/* We only need ->free_space in ALLOC_SEQ block groups */
 	cache->last_byte_to_unpin = (u64)-1;
-- 
2.32.0



* [PATCH 04/17] btrfs: zoned: tweak reclaim threshold for zone capacity
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (2 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 03/17] btrfs: zoned: calculate free space from zone capacity Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 05/17] btrfs: zoned: consider zone as full when no more SB can be written Naohiro Aota
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

With the introduction of zone capacity, the bytes in [capacity, length] are
always zone unusable. Counting this region as a reclaim target will cause
block groups to be reclaimed too early. Reclaim block groups based on the
bytes that can be made usable again by resetting the zone.
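
For example (illustrative numbers, continuing the 256MiB / 192MiB example
from the previous patch): with bg_reclaim_threshold = 75, reclaim now
triggers once zone_unusable - (length - capacity), i.e. the unusable bytes
a zone reset would actually recover, reaches 75% of the 192MiB capacity
(144MiB), rather than once zone_unusable reaches 75% of the 256MiB length.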

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/free-space-cache.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index bb2536c745cd..772485c39e45 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2540,6 +2540,7 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 	u64 to_free, to_unusable;
 	const int bg_reclaim_threshold = READ_ONCE(fs_info->bg_reclaim_threshold);
 	bool initial = (size == block_group->length);
+	u64 reclaimable_unusable;
 
 	WARN_ON(!initial && offset + size > block_group->zone_capacity);
 
@@ -2570,12 +2571,15 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 		spin_unlock(&block_group->lock);
 	}
 
+	reclaimable_unusable = block_group->zone_unusable -
+		(block_group->length - block_group->zone_capacity);
 	/* All the region is now unusable. Mark it as unused and reclaim */
 	if (block_group->zone_unusable == block_group->length) {
 		btrfs_mark_bg_unused(block_group);
 	} else if (bg_reclaim_threshold &&
-		   block_group->zone_unusable >=
-		   div_factor_fine(block_group->length, bg_reclaim_threshold)) {
+		   reclaimable_unusable >=
+		   div_factor_fine(block_group->zone_capacity,
+				   bg_reclaim_threshold)) {
 		btrfs_mark_bg_to_reclaim(block_group);
 	}
 
-- 
2.32.0



* [PATCH 05/17] btrfs: zoned: consider zone as full when no more SB can be written
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (3 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 04/17] btrfs: zoned: tweak reclaim threshold for " Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 06/17] btrfs: zoned: locate superblock position using zone capacity Naohiro Aota
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

We cannot write beyond the zone capacity. So, we should consider a zone as
"full" when the write pointer goes beyond (capacity - the size of the
superblock info).

Also, take this opportunity to replace subtly duplicated code with a loop
and to fix a typo in a comment.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 0eb8ea4d3542..d20c97395a70 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -45,6 +45,14 @@
  */
 #define BTRFS_MAX_ZONE_SIZE		SZ_8G
 
+#define SUPER_INFO_SECTORS	((u64)BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT)
+
+static inline bool sb_zone_is_full(struct blk_zone *zone)
+{
+	return (zone->cond == BLK_ZONE_COND_FULL) ||
+		(zone->wp + SUPER_INFO_SECTORS > zone->start + zone->capacity);
+}
+
 static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
 {
 	struct blk_zone *zones = data;
@@ -60,14 +68,13 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 	bool empty[BTRFS_NR_SB_LOG_ZONES];
 	bool full[BTRFS_NR_SB_LOG_ZONES];
 	sector_t sector;
+	int i;
 
-	ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL &&
-	       zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL);
-
-	empty[0] = (zones[0].cond == BLK_ZONE_COND_EMPTY);
-	empty[1] = (zones[1].cond == BLK_ZONE_COND_EMPTY);
-	full[0] = (zones[0].cond == BLK_ZONE_COND_FULL);
-	full[1] = (zones[1].cond == BLK_ZONE_COND_FULL);
+	for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+		ASSERT(zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL);
+		empty[i] = (zones[i].cond == BLK_ZONE_COND_EMPTY);
+		full[i] = sb_zone_is_full(&zones[i]);
+	}
 
 	/*
 	 * Possible states of log buffer zones
@@ -664,7 +671,7 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 			reset = &zones[1];
 
 		if (reset && reset->cond != BLK_ZONE_COND_EMPTY) {
-			ASSERT(reset->cond == BLK_ZONE_COND_FULL);
+			ASSERT(sb_zone_is_full(reset));
 
 			ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
 					       reset->start, reset->len,
-- 
2.32.0



* [PATCH 06/17] btrfs: zoned: locate superblock position using zone capacity
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (4 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 05/17] btrfs: zoned: consider zone as full when no more SB can be written Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 07/17] btrfs: zoned: finish superblock zone once no space left for new SB Naohiro Aota
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

sb_write_pointer() returns the write position of the next superblock. For
READ, we need the previous location instead. When the write pointer is at
the head of a zone, the previous one is the last superblock of the other
zone. Calculate that last position from the zone capacity.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index d20c97395a70..188a4ebefe59 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -683,9 +683,20 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 			reset->wp = reset->start;
 		}
 	} else if (ret != -ENOENT) {
-		/* For READ, we want the precious one */
+		/*
+		 * For READ, we want the previous one. Move write pointer
+		 * to the end of a zone, if it is at the head of a zone.
+		 */
+		u64 zone_end = 0;
+
 		if (wp == zones[0].start << SECTOR_SHIFT)
-			wp = (zones[1].start + zones[1].len) << SECTOR_SHIFT;
+			zone_end = zones[1].start + zones[1].capacity;
+		else if (wp == zones[1].start << SECTOR_SHIFT)
+			zone_end = zones[0].start + zones[0].capacity;
+		if (zone_end)
+			wp = ALIGN_DOWN(zone_end << SECTOR_SHIFT,
+					BTRFS_SUPER_INFO_SIZE);
+
 		wp -= BTRFS_SUPER_INFO_SIZE;
 	}
 
-- 
2.32.0



* [PATCH 07/17] btrfs: zoned: finish superblock zone once no space left for new SB
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (5 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 06/17] btrfs: zoned: locate superblock position using zone capacity Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 08/17] btrfs: zoned: load active zone information from devices Naohiro Aota
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

If there is no more space left for a new superblock in a superblock zone,
then it is better to ZONE_FINISH the zone and free up the active zone
count.

Since btrfs_advance_sb_log() can now issue REQ_OP_ZONE_FINISH, we also need
to convert it to return int for the error case.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/disk-io.c |  4 +++-
 fs/btrfs/zoned.c   | 53 ++++++++++++++++++++++++++++++++--------------
 fs/btrfs/zoned.h   |  8 ++++---
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2f9515dccce0..74b8848de8da 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3881,7 +3881,9 @@ static int write_dev_supers(struct btrfs_device *device,
 			bio->bi_opf |= REQ_FUA;
 
 		btrfsic_submit_bio(bio);
-		btrfs_advance_sb_log(device, i);
+
+		if (btrfs_advance_sb_log(device, i))
+			errors++;
 	}
 	return errors < i ? 0 : -1;
 }
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 188a4ebefe59..3eb74542a9b1 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -789,36 +789,57 @@ static inline bool is_sb_log_zone(struct btrfs_zoned_device_info *zinfo,
 	return true;
 }
 
-void btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
+int btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
 {
 	struct btrfs_zoned_device_info *zinfo = device->zone_info;
 	struct blk_zone *zone;
+	int i;
 
 	if (!is_sb_log_zone(zinfo, mirror))
-		return;
+		return 0;
 
 	zone = &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror];
-	if (zone->cond != BLK_ZONE_COND_FULL) {
+	for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+		/* Advance the next zone */
+		if (zone->cond == BLK_ZONE_COND_FULL) {
+			zone++;
+			continue;
+		}
+
 		if (zone->cond == BLK_ZONE_COND_EMPTY)
 			zone->cond = BLK_ZONE_COND_IMP_OPEN;
 
-		zone->wp += (BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT);
+		zone->wp += SUPER_INFO_SECTORS;
+
+		if (sb_zone_is_full(zone)) {
+			/*
+			 * No room left to write new superblock. Since
+			 * superblock is written with REQ_SYNC, it is safe
+			 * to finish the zone now.
+			 *
+			 * If the write pointer is exactly at the capacity,
+			 * explicit ZONE_FINISH is not necessary.
+			 */
+			if (zone->wp != zone->start + zone->capacity) {
+				int ret;
+
+				ret = blkdev_zone_mgmt(device->bdev,
+						       REQ_OP_ZONE_FINISH,
+						       zone->start, zone->len,
+						       GFP_NOFS);
+				if (ret)
+					return ret;
+			}
 
-		if (zone->wp == zone->start + zone->len)
+			zone->wp = zone->start + zone->len;
 			zone->cond = BLK_ZONE_COND_FULL;
-
-		return;
+		}
+		return 0;
 	}
 
-	zone++;
-	ASSERT(zone->cond != BLK_ZONE_COND_FULL);
-	if (zone->cond == BLK_ZONE_COND_EMPTY)
-		zone->cond = BLK_ZONE_COND_IMP_OPEN;
-
-	zone->wp += (BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT);
-
-	if (zone->wp == zone->start + zone->len)
-		zone->cond = BLK_ZONE_COND_FULL;
+	/* All the zones are FULL. Should not reach here. */
+	ASSERT(0);
+	return -EIO;
 }
 
 int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror)
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 4b299705bb12..4f30f3bf1886 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -40,7 +40,7 @@ int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
 			       u64 *bytenr_ret);
 int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 			  u64 *bytenr_ret);
-void btrfs_advance_sb_log(struct btrfs_device *device, int mirror);
+int btrfs_advance_sb_log(struct btrfs_device *device, int mirror);
 int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror);
 u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 				 u64 hole_end, u64 num_bytes);
@@ -113,8 +113,10 @@ static inline int btrfs_sb_log_location(struct btrfs_device *device, int mirror,
 	return 0;
 }
 
-static inline void btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
-{ }
+static inline int btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
+{
+	return 0;
+}
 
 static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror)
 {
-- 
2.32.0



* [PATCH 08/17] btrfs: zoned: load active zone information from devices
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (6 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 07/17] btrfs: zoned: finish superblock zone once no space left for new SB Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 09/17] btrfs: zoned: introduce physical_map to btrfs_block_group Naohiro Aota
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

The ZNS specification defines a limit on the number of zones that can be in
the implicit open, explicit open or closed conditions. Any zone in such a
condition is defined as an active zone and corresponds to a zone that is
being written or that has been only partially written. If the maximum
number of active zones is reached, we must either reset or finish some
active zones before being able to choose other zones for storing data.

Load queue_max_active_zones() and track the number of active zones left on
the device.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/zoned.h |  3 +++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 3eb74542a9b1..a198ce073353 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -4,6 +4,7 @@
 #include <linux/slab.h>
 #include <linux/blkdev.h>
 #include <linux/sched/mm.h>
+#include <linux/atomic.h>
 #include "ctree.h"
 #include "volumes.h"
 #include "zoned.h"
@@ -38,6 +39,15 @@
 /* Number of superblock log zones */
 #define BTRFS_NR_SB_LOG_ZONES 2
 
+/* Minimum number of active zones we want.
+ *
+ * - BTRFS_SUPER_MIRROR_MAX zones for superblock mirrors
+ * - 3 zones to ensure at least one zone per SYSTEM, META and DATA block group
+ * - 1 zone for tree-log dedicated block group
+ * - 1 zone for relocation
+ */
+#define BTRFS_MIN_ACTIVE_ZONES (BTRFS_SUPER_MIRROR_MAX + 5)
+
 /*
  * Maximum supported zone size. Currently, SMR disks have a zone size of
  * 256MiB, and we are expecting ZNS drives to be in the 1-4GiB range. We do not
@@ -303,6 +313,9 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	struct btrfs_fs_info *fs_info = device->fs_info;
 	struct btrfs_zoned_device_info *zone_info = NULL;
 	struct block_device *bdev = device->bdev;
+	struct request_queue *queue = bdev_get_queue(bdev);
+	unsigned int max_active_zones;
+	unsigned int nactive;
 	sector_t nr_sectors;
 	sector_t sector = 0;
 	struct blk_zone *zones = NULL;
@@ -358,6 +371,17 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	if (!IS_ALIGNED(nr_sectors, zone_sectors))
 		zone_info->nr_zones++;
 
+	max_active_zones = queue_max_active_zones(queue);
+	if (max_active_zones && max_active_zones < BTRFS_MIN_ACTIVE_ZONES) {
+		btrfs_err_in_rcu(fs_info,
+"zoned: %s: max active zones %u is too small. Need at least %u active zones",
+				 rcu_str_deref(device->name), max_active_zones,
+				 BTRFS_MIN_ACTIVE_ZONES);
+		ret = -EINVAL;
+		goto out;
+	}
+	zone_info->max_active_zones = max_active_zones;
+
 	zone_info->seq_zones = bitmap_zalloc(zone_info->nr_zones, GFP_KERNEL);
 	if (!zone_info->seq_zones) {
 		ret = -ENOMEM;
@@ -370,6 +394,12 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 		goto out;
 	}
 
+	zone_info->active_zones = bitmap_zalloc(zone_info->nr_zones, GFP_KERNEL);
+	if (!zone_info->active_zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	zones = kcalloc(BTRFS_REPORT_NR_ZONES, sizeof(struct blk_zone), GFP_KERNEL);
 	if (!zones) {
 		ret = -ENOMEM;
@@ -377,6 +407,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	}
 
 	/* Get zones type */
+	nactive = 0;
 	while (sector < nr_sectors) {
 		nr_zones = BTRFS_REPORT_NR_ZONES;
 		ret = btrfs_get_dev_zones(device, sector << SECTOR_SHIFT, zones,
@@ -387,8 +418,17 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 		for (i = 0; i < nr_zones; i++) {
 			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ)
 				__set_bit(nreported, zone_info->seq_zones);
-			if (zones[i].cond == BLK_ZONE_COND_EMPTY)
+			switch (zones[i].cond) {
+			case BLK_ZONE_COND_EMPTY:
 				__set_bit(nreported, zone_info->empty_zones);
+				break;
+			case BLK_ZONE_COND_IMP_OPEN:
+			case BLK_ZONE_COND_EXP_OPEN:
+			case BLK_ZONE_COND_CLOSED:
+				__set_bit(nreported, zone_info->active_zones);
+				nactive++;
+				break;
+			}
 			nreported++;
 		}
 		sector = zones[nr_zones - 1].start + zones[nr_zones - 1].len;
@@ -403,6 +443,19 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 		goto out;
 	}
 
+	if (max_active_zones) {
+		if (nactive > max_active_zones) {
+			btrfs_err_in_rcu(device->fs_info,
+			"zoned: %d active zones on %s exceeds max_active_zones %d",
+					 nactive, rcu_str_deref(device->name),
+					 max_active_zones);
+			ret = -EIO;
+			goto out;
+		}
+		atomic_set(&zone_info->active_zones_left,
+			   max_active_zones - nactive);
+	}
+
 	/* Validate superblock log */
 	nr_zones = BTRFS_NR_SB_LOG_ZONES;
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
@@ -485,6 +538,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 out:
 	kfree(zones);
 out_free_zone_info:
+	bitmap_free(zone_info->active_zones);
 	bitmap_free(zone_info->empty_zones);
 	bitmap_free(zone_info->seq_zones);
 	kfree(zone_info);
@@ -500,6 +554,7 @@ void btrfs_destroy_dev_zone_info(struct btrfs_device *device)
 	if (!zone_info)
 		return;
 
+	bitmap_free(zone_info->active_zones);
 	bitmap_free(zone_info->seq_zones);
 	bitmap_free(zone_info->empty_zones);
 	kfree(zone_info);
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 4f30f3bf1886..48628782e4b8 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -23,8 +23,11 @@ struct btrfs_zoned_device_info {
 	u64 zone_size;
 	u8  zone_size_shift;
 	u32 nr_zones;
+	unsigned int max_active_zones;
+	atomic_t active_zones_left;
 	unsigned long *seq_zones;
 	unsigned long *empty_zones;
+	unsigned long *active_zones;
 	struct blk_zone sb_zones[2 * BTRFS_SUPER_MIRROR_MAX];
 };
 
-- 
2.32.0



* [PATCH 09/17] btrfs: zoned: introduce physical_map to btrfs_block_group
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (7 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 08/17] btrfs: zoned: load active zone information from devices Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 10/17] btrfs: zoned: implement active zone tracking Naohiro Aota
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

We will use a block group's physical location to track active zones and
finish fully written zones in the following commits. Since the zone
activation is done in the extent allocation context, which is already
holding the tree locks, we can't query the chunk tree for the physical
locations. So, copy the location info into the block group and use it for
activation.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c |  1 +
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/zoned.c       | 17 +++++++++++++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index de22e3c9599e..c94815cbd136 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -144,6 +144,7 @@ void btrfs_put_block_group(struct btrfs_block_group *cache)
 		 */
 		WARN_ON(!RB_EMPTY_ROOT(&cache->full_stripe_locks_root.root));
 		kfree(cache->free_space_ctl);
+		kfree(cache->physical_map);
 		kfree(cache);
 	}
 }
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 2db40a005512..265db2c316d3 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -204,6 +204,7 @@ struct btrfs_block_group {
 	u64 zone_unusable;
 	u64 zone_capacity;
 	u64 meta_write_pointer;
+	struct map_lookup *physical_map;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index a198ce073353..cfc6d4337473 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1158,10 +1158,19 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 
 	map = em->map_lookup;
 
+	cache->physical_map = kmalloc(map_lookup_size(map->num_stripes),
+				      GFP_NOFS);
+	if (!cache->physical_map) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	memcpy(cache->physical_map, map, map_lookup_size(map->num_stripes));
+
 	alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), GFP_NOFS);
 	if (!alloc_offsets) {
-		free_extent_map(em);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	caps = kcalloc(map->num_stripes, sizeof(*caps), GFP_NOFS);
@@ -1344,6 +1353,10 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	if (!ret)
 		cache->meta_write_pointer = cache->alloc_offset + cache->start;
 
+	if (ret) {
+		kfree(cache->physical_map);
+		cache->physical_map = NULL;
+	}
 	kfree(caps);
 	kfree(alloc_offsets);
 	free_extent_map(em);
-- 
2.32.0



* [PATCH 10/17] btrfs: zoned: implement active zone tracking
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (8 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 09/17] btrfs: zoned: introduce physical_map to btrfs_block_group Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 11/17] btrfs: zoned: load active zone info for block group Naohiro Aota
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

Add a zone_is_active flag to btrfs_block_group. This flag indicates that
the underlying zones are all active. Such active block groups are tracked
in the fs_info->zone_active_bgs list.

btrfs_dev_{set,clear}_active_zone() take responsibility for the underlying
device part. They set/clear the bitmap to indicate zone activeness and
count the number of zones we can still activate.

btrfs_zone_{activate,finish}() take responsibility for the logical part and
the list management. In addition, btrfs_zone_finish() waits for any writes
on the block group to finish and sends REQ_OP_ZONE_FINISH to the zone.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c      |  11 +++
 fs/btrfs/block-group.h      |   2 +
 fs/btrfs/ctree.h            |   3 +
 fs/btrfs/disk-io.c          |   2 +
 fs/btrfs/free-space-cache.c |   5 +-
 fs/btrfs/zoned.c            | 183 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h            |  12 +++
 7 files changed, 216 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c94815cbd136..90c4279592a0 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1898,6 +1898,7 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(
 	INIT_LIST_HEAD(&cache->discard_list);
 	INIT_LIST_HEAD(&cache->dirty_list);
 	INIT_LIST_HEAD(&cache->io_list);
+	INIT_LIST_HEAD(&cache->active_bg_list);
 	btrfs_init_free_space_ctl(cache, cache->free_space_ctl);
 	atomic_set(&cache->frozen, 0);
 	mutex_init(&cache->free_space_lock);
@@ -3843,6 +3844,16 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 	}
 	spin_unlock(&info->unused_bgs_lock);
 
+	spin_lock(&info->zone_active_bgs_lock);
+	while (!list_empty(&info->zone_active_bgs)) {
+		block_group = list_first_entry(&info->zone_active_bgs,
+					       struct btrfs_block_group,
+					       active_bg_list);
+		list_del_init(&block_group->active_bg_list);
+		btrfs_put_block_group(block_group);
+	}
+	spin_unlock(&info->zone_active_bgs_lock);
+
 	spin_lock(&info->block_group_cache_lock);
 	while ((n = rb_last(&info->block_group_cache_tree)) != NULL) {
 		block_group = rb_entry(n, struct btrfs_block_group,
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 265db2c316d3..f751b802b173 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -98,6 +98,7 @@ struct btrfs_block_group {
 	unsigned int to_copy:1;
 	unsigned int relocating_repair:1;
 	unsigned int chunk_item_inserted:1;
+	unsigned int zone_is_active:1;
 
 	int disk_cache_state;
 
@@ -205,6 +206,7 @@ struct btrfs_block_group {
 	u64 zone_capacity;
 	u64 meta_write_pointer;
 	struct map_lookup *physical_map;
+	struct list_head active_bg_list;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a898257ad2b5..bbe1a93a5d6e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1017,6 +1017,9 @@ struct btrfs_fs_info {
 	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
 
+	spinlock_t zone_active_bgs_lock;
+	struct list_head zone_active_bgs;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 74b8848de8da..88c7b2dcf78d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2883,6 +2883,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
 	spin_lock_init(&fs_info->treelog_bg_lock);
+	spin_lock_init(&fs_info->zone_active_bgs_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reclaim_bgs_lock);
@@ -2896,6 +2897,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
 	INIT_LIST_HEAD(&fs_info->unused_bgs);
 	INIT_LIST_HEAD(&fs_info->reclaim_bgs);
+	INIT_LIST_HEAD(&fs_info->zone_active_bgs);
 #ifdef CONFIG_BTRFS_DEBUG
 	INIT_LIST_HEAD(&fs_info->allocated_roots);
 	INIT_LIST_HEAD(&fs_info->allocated_ebs);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 772485c39e45..e74e0ec1e3cc 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2763,9 +2763,10 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
 	 * out the free space after the allocation offset.
 	 */
 	if (btrfs_is_zoned(fs_info)) {
-		btrfs_info(fs_info, "free space %llu",
+		btrfs_info(fs_info, "free space %llu active %d",
 			   block_group->zone_capacity -
-			   block_group->alloc_offset);
+			   block_group->alloc_offset,
+			   block_group->zone_is_active);
 		return;
 	}
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index cfc6d4337473..39468989a203 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -989,6 +989,41 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 	return pos;
 }
 
+static bool btrfs_dev_set_active_zone(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+	unsigned int zno = pos >> zone_info->zone_size_shift;
+
+	/* We can use any number of zones */
+	if (!zone_info->max_active_zones)
+		return true;
+
+	if (!test_bit(zno, zone_info->active_zones)) {
+		/* Active zone left? */
+		if (atomic_dec_if_positive(&zone_info->active_zones_left) < 0)
+			return false;
+		if (test_and_set_bit(zno, zone_info->active_zones)) {
+			/* Someone already set the bit */
+			atomic_inc(&zone_info->active_zones_left);
+		}
+	}
+
+	return true;
+}
+
+static void btrfs_dev_clear_active_zone(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+	unsigned int zno = pos >> zone_info->zone_size_shift;
+
+	/* We can use any number of zones */
+	if (!zone_info->max_active_zones)
+		return;
+
+	if (test_and_clear_bit(zno, zone_info->active_zones))
+		atomic_inc(&zone_info->active_zones_left);
+}
+
 int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 			    u64 length, u64 *bytes)
 {
@@ -1004,6 +1039,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 	*bytes = length;
 	while (length) {
 		btrfs_dev_set_zone_empty(device, physical);
+		btrfs_dev_clear_active_zone(device, physical);
 		physical += device->zone_info->zone_size;
 		length -= device->zone_info->zone_size;
 	}
@@ -1657,3 +1693,150 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
 
 	return device;
 }
+
+/**
+ * btrfs_zone_activate - activate block group and underlying device zones
+ *
+ * @block_group: the block group to activate
+ *
+ * @return: true on success, false otherwise
+ */
+bool btrfs_zone_activate(struct btrfs_block_group *block_group)
+{
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct map_lookup *map;
+	struct btrfs_device *device;
+	u64 physical;
+	bool ret;
+
+	if (!btrfs_is_zoned(block_group->fs_info))
+		return true;
+
+	map = block_group->physical_map;
+	/* Currently support SINGLE profile only */
+	ASSERT(map->num_stripes == 1);
+	device = map->stripes[0].dev;
+	physical = map->stripes[0].physical;
+
+	if (!device->zone_info->max_active_zones)
+		return true;
+
+	spin_lock(&block_group->lock);
+
+	if (block_group->zone_is_active) {
+		ret = true;
+		goto out_unlock;
+	}
+
+	/* No space left */
+	if (block_group->alloc_offset == block_group->zone_capacity) {
+		ret = false;
+		goto out_unlock;
+	}
+
+	if (!btrfs_dev_set_active_zone(device, physical)) {
+		/* Cannot activate the zone */
+		ret = false;
+		goto out_unlock;
+	}
+
+	/* Successfully activated all the zones */
+	block_group->zone_is_active = 1;
+
+	spin_unlock(&block_group->lock);
+
+	/* for the active BG list */
+	btrfs_get_block_group(block_group);
+
+	spin_lock(&fs_info->zone_active_bgs_lock);
+	ASSERT(list_empty(&block_group->active_bg_list));
+	list_add_tail(&block_group->active_bg_list, &fs_info->zone_active_bgs);
+	spin_unlock(&fs_info->zone_active_bgs_lock);
+
+	return true;
+
+out_unlock:
+	spin_unlock(&block_group->lock);
+	return ret;
+}
+
+int btrfs_zone_finish(struct btrfs_block_group *block_group)
+{
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct map_lookup *map;
+	struct btrfs_device *device;
+	u64 physical;
+	int ret = 0;
+
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	map = block_group->physical_map;
+	/* Currently support SINGLE profile only */
+	ASSERT(map->num_stripes == 1);
+
+	device = map->stripes[0].dev;
+	physical = map->stripes[0].physical;
+
+	if (!device->zone_info->max_active_zones)
+		return 0;
+
+	spin_lock(&block_group->lock);
+	if (!block_group->zone_is_active) {
+		spin_unlock(&block_group->lock);
+		return 0;
+	}
+
+	/* Check if we have unwritten allocated space */
+	if ((block_group->flags &
+	     (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_SYSTEM)) &&
+	    block_group->alloc_offset > block_group->meta_write_pointer) {
+		spin_unlock(&block_group->lock);
+		return -EAGAIN;
+	}
+	spin_unlock(&block_group->lock);
+
+	ret = btrfs_inc_block_group_ro(block_group, false);
+	if (ret)
+		return ret;
+
+	/* Ensure all writes in this block group finish */
+	btrfs_wait_block_group_reservations(block_group);
+	/*
+	 * No need to wait nocow writers. Zoned btrfs does not allow
+	 * nocow anyway.
+	 */
+	btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
+				 block_group->length);
+
+	spin_lock(&block_group->lock);
+	if (!block_group->zone_is_active) {
+		spin_unlock(&block_group->lock);
+		return 0;
+	}
+	block_group->zone_is_active = 0;
+	block_group->alloc_offset = block_group->zone_capacity;
+	block_group->free_space_ctl->free_space = 0;
+	btrfs_clear_treelog_bg(block_group);
+	spin_unlock(&block_group->lock);
+
+	ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+			       physical >> SECTOR_SHIFT,
+			       device->zone_info->zone_size >> SECTOR_SHIFT,
+			       GFP_NOFS);
+	btrfs_dec_block_group_ro(block_group);
+
+	if (!ret) {
+		btrfs_dev_clear_active_zone(device, physical);
+
+		spin_lock(&fs_info->zone_active_bgs_lock);
+		ASSERT(!list_empty(&block_group->active_bg_list));
+		list_del_init(&block_group->active_bg_list);
+		spin_unlock(&fs_info->zone_active_bgs_lock);
+
+		/* for active_bg_list */
+		btrfs_put_block_group(block_group);
+	}
+
+	return ret;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 48628782e4b8..2345ecfa1805 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -69,6 +69,8 @@ int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical,
 				  u64 physical_start, u64 physical_pos);
 struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
 					    u64 logical, u64 length);
+bool btrfs_zone_activate(struct btrfs_block_group *block_group);
+int btrfs_zone_finish(struct btrfs_block_group *block_group);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -204,6 +206,16 @@ static inline struct btrfs_device *btrfs_zoned_get_device(
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
+static inline bool btrfs_zone_activate(struct btrfs_block_group *block_group)
+{
+	return true;
+}
+
+static inline int btrfs_zone_finish(struct btrfs_block_group *block_group)
+{
+	return 0;
+}
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.32.0



* [PATCH 11/17] btrfs: zoned: load active zone info for block group
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (9 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 10/17] btrfs: zoned: implement active zone tracking Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 12/17] btrfs: zoned: activate block group on allocation Naohiro Aota
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

Load the activeness of the underlying zones of a block group. When the
underlying zones are all active, we add the block group to the
fs_info->zone_active_bgs list.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 39468989a203..f3c31159fb2e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1170,6 +1170,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	unsigned int nofs_flag;
 	u64 *alloc_offsets = NULL;
 	u64 *caps = NULL;
+	unsigned long *active = NULL;
 	u64 last_alloc = 0;
 	u32 num_sequential = 0, num_conventional = 0;
 
@@ -1215,6 +1216,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		goto out;
 	}
 
+	active = bitmap_zalloc(map->num_stripes, GFP_NOFS);
+	if (!active) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	for (i = 0; i < map->num_stripes; i++) {
 		bool is_sequential;
 		struct blk_zone zone;
@@ -1298,8 +1305,16 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 			/* Partially used zone */
 			alloc_offsets[i] =
 					((zone.wp - zone.start) << SECTOR_SHIFT);
+			__set_bit(i, active);
 			break;
 		}
+
+		/*
+		 * Consider a zone as active if we can allow any number of
+		 * active zones.
+		 */
+		if (!device->zone_info->max_active_zones)
+			__set_bit(i, active);
 	}
 
 	if (num_sequential > 0)
@@ -1347,6 +1362,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 		cache->alloc_offset = alloc_offsets[0];
 		cache->zone_capacity = caps[0];
+		cache->zone_is_active = test_bit(0, active);
 		break;
 	case BTRFS_BLOCK_GROUP_DUP:
 	case BTRFS_BLOCK_GROUP_RAID1:
@@ -1362,6 +1378,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		goto out;
 	}
 
+	if (cache->zone_is_active) {
+		btrfs_get_block_group(cache);
+		spin_lock(&fs_info->zone_active_bgs_lock);
+		list_add_tail(&cache->active_bg_list,
+			      &fs_info->zone_active_bgs);
+		spin_unlock(&fs_info->zone_active_bgs_lock);
+	}
+
 out:
 	if (cache->alloc_offset > fs_info->zone_size) {
 		btrfs_err(fs_info,
@@ -1393,6 +1417,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		kfree(cache->physical_map);
 		cache->physical_map = NULL;
 	}
+	bitmap_free(active);
 	kfree(caps);
 	kfree(alloc_offsets);
 	free_extent_map(em);
-- 
2.32.0



* [PATCH 12/17] btrfs: zoned: activate block group on allocation
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (10 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 11/17] btrfs: zoned: load active zone info for block group Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 13/17] btrfs: zoned: activate new block group Naohiro Aota
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

Activate a block group when trying to allocate an extent from it. We check
the read-only case and the no-space-left case before trying to activate a
block group, so as not to uselessly consume the limited number of active
zones.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8dafb61c4946..ac367f1dc4e6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3773,6 +3773,18 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	if (skip)
 		return 1;
 
+	/* Check RO and no space case before trying to activate it */
+	spin_lock(&block_group->lock);
+	if (block_group->ro ||
+	    block_group->alloc_offset == block_group->zone_capacity) {
+		spin_unlock(&block_group->lock);
+		return 1;
+	}
+	spin_unlock(&block_group->lock);
+
+	if (!btrfs_zone_activate(block_group))
+		return 1;
+
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	spin_lock(&fs_info->treelog_bg_lock);
-- 
2.32.0



* [PATCH 13/17] btrfs: zoned: activate new block group
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (11 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 12/17] btrfs: zoned: activate block group on allocation Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 14/17] btrfs: move ffe_ctl one level up Naohiro Aota
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

Activate a new block group at btrfs_make_block_group(). We do not check
the return value; if the activation fails, we can try again later at the
actual extent allocation phase.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 90c4279592a0..f4fa65438eed 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2447,6 +2447,12 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
 		return ERR_PTR(ret);
 	}
 
+	/*
+	 * New block group is likely to be used soon. Try to activate it now.
+	 * Failure is OK for now.
+	 */
+	btrfs_zone_activate(cache);
+
 	ret = exclude_super_stripes(cache);
 	if (ret) {
 		/* We may have excluded something, so call this just in case */
-- 
2.32.0



* [PATCH 14/17] btrfs: move ffe_ctl one level up
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (12 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 13/17] btrfs: zoned: activate new block group Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 15/17] btrfs: zoned: avoid chunk allocation if active block group has enough space Naohiro Aota
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

We are already passing too many variables from btrfs_reserve_extent() to
find_free_extent(). The next commit will add min_alloc_size to ffe_ctl,
which would mean yet another pass-through argument. Take this opportunity
to move ffe_ctl one level up and drop the now redundant arguments.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 162 ++++++++++++++++++++++-------------------
 1 file changed, 88 insertions(+), 74 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ac367f1dc4e6..1daa432673c4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3476,6 +3476,7 @@ enum btrfs_extent_allocation_policy {
  */
 struct find_free_extent_ctl {
 	/* Basic allocation info */
+	u64 ram_bytes;
 	u64 num_bytes;
 	u64 empty_size;
 	u64 flags;
@@ -4130,65 +4131,62 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
  *    |- If not found, re-iterate all block groups
  */
 static noinline int find_free_extent(struct btrfs_root *root,
-				u64 ram_bytes, u64 num_bytes, u64 empty_size,
-				u64 hint_byte_orig, struct btrfs_key *ins,
-				u64 flags, int delalloc)
+				     struct btrfs_key *ins,
+				     struct find_free_extent_ctl *ffe_ctl)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	int ret = 0;
 	int cache_block_group_error = 0;
 	struct btrfs_block_group *block_group = NULL;
-	struct find_free_extent_ctl ffe_ctl = {0};
 	struct btrfs_space_info *space_info;
 	bool full_search = false;
-	bool for_treelog = (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID);
 
-	WARN_ON(num_bytes < fs_info->sectorsize);
-
-	ffe_ctl.num_bytes = num_bytes;
-	ffe_ctl.empty_size = empty_size;
-	ffe_ctl.flags = flags;
-	ffe_ctl.search_start = 0;
-	ffe_ctl.delalloc = delalloc;
-	ffe_ctl.index = btrfs_bg_flags_to_raid_index(flags);
-	ffe_ctl.have_caching_bg = false;
-	ffe_ctl.orig_have_caching_bg = false;
-	ffe_ctl.found_offset = 0;
-	ffe_ctl.hint_byte = hint_byte_orig;
-	ffe_ctl.for_treelog = for_treelog;
-	ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
+	WARN_ON(ffe_ctl->num_bytes < fs_info->sectorsize);
 
+	ffe_ctl->search_start = 0;
+	/* For clustered allocation */
+	ffe_ctl->empty_cluster = 0;
+	ffe_ctl->last_ptr = NULL;
+	ffe_ctl->use_cluster = true;
+	ffe_ctl->have_caching_bg = false;
+	ffe_ctl->orig_have_caching_bg = false;
+	ffe_ctl->index = btrfs_bg_flags_to_raid_index(ffe_ctl->flags);
+	ffe_ctl->loop = 0;
 	/* For clustered allocation */
-	ffe_ctl.retry_clustered = false;
-	ffe_ctl.retry_unclustered = false;
-	ffe_ctl.last_ptr = NULL;
-	ffe_ctl.use_cluster = true;
+	ffe_ctl->retry_clustered = false;
+	ffe_ctl->retry_unclustered = false;
+	ffe_ctl->cached = 0;
+	ffe_ctl->max_extent_size = 0;
+	ffe_ctl->total_free_space = 0;
+	ffe_ctl->found_offset = 0;
+	ffe_ctl->policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
 
 	if (btrfs_is_zoned(fs_info))
-		ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED;
+		ffe_ctl->policy = BTRFS_EXTENT_ALLOC_ZONED;
 
 	ins->type = BTRFS_EXTENT_ITEM_KEY;
 	ins->objectid = 0;
 	ins->offset = 0;
 
-	trace_find_free_extent(root, num_bytes, empty_size, flags);
+	trace_find_free_extent(root, ffe_ctl->num_bytes, ffe_ctl->empty_size,
+			       ffe_ctl->flags);
 
-	space_info = btrfs_find_space_info(fs_info, flags);
+	space_info = btrfs_find_space_info(fs_info, ffe_ctl->flags);
 	if (!space_info) {
-		btrfs_err(fs_info, "No space info for %llu", flags);
+		btrfs_err(fs_info, "No space info for %llu", ffe_ctl->flags);
 		return -ENOSPC;
 	}
 
-	ret = prepare_allocation(fs_info, &ffe_ctl, space_info, ins);
+	ret = prepare_allocation(fs_info, ffe_ctl, space_info, ins);
 	if (ret < 0)
 		return ret;
 
-	ffe_ctl.search_start = max(ffe_ctl.search_start,
-				   first_logical_byte(fs_info, 0));
-	ffe_ctl.search_start = max(ffe_ctl.search_start, ffe_ctl.hint_byte);
-	if (ffe_ctl.search_start == ffe_ctl.hint_byte) {
+	ffe_ctl->search_start = max(ffe_ctl->search_start,
+				    first_logical_byte(fs_info, 0));
+	ffe_ctl->search_start = max(ffe_ctl->search_start, ffe_ctl->hint_byte);
+	if (ffe_ctl->search_start == ffe_ctl->hint_byte) {
 		block_group = btrfs_lookup_block_group(fs_info,
-						       ffe_ctl.search_start);
+						       ffe_ctl->search_start);
 		/*
 		 * we don't want to use the block group if it doesn't match our
 		 * allocation bits, or if its not cached.
@@ -4196,7 +4194,7 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		 * However if we are re-searching with an ideal block group
 		 * picked out then we don't care that the block group is cached.
 		 */
-		if (block_group && block_group_bits(block_group, flags) &&
+		if (block_group && block_group_bits(block_group, ffe_ctl->flags) &&
 		    block_group->cached != BTRFS_CACHE_NO) {
 			down_read(&space_info->groups_sem);
 			if (list_empty(&block_group->list) ||
@@ -4210,9 +4208,10 @@ static noinline int find_free_extent(struct btrfs_root *root,
 				btrfs_put_block_group(block_group);
 				up_read(&space_info->groups_sem);
 			} else {
-				ffe_ctl.index = btrfs_bg_flags_to_raid_index(
+				ffe_ctl->index = btrfs_bg_flags_to_raid_index(
 						block_group->flags);
-				btrfs_lock_block_group(block_group, delalloc);
+				btrfs_lock_block_group(block_group,
+						       ffe_ctl->delalloc);
 				goto have_block_group;
 			}
 		} else if (block_group) {
@@ -4220,31 +4219,31 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		}
 	}
 search:
-	ffe_ctl.have_caching_bg = false;
-	if (ffe_ctl.index == btrfs_bg_flags_to_raid_index(flags) ||
-	    ffe_ctl.index == 0)
+	ffe_ctl->have_caching_bg = false;
+	if (ffe_ctl->index == btrfs_bg_flags_to_raid_index(ffe_ctl->flags) ||
+	    ffe_ctl->index == 0)
 		full_search = true;
 	down_read(&space_info->groups_sem);
 	list_for_each_entry(block_group,
-			    &space_info->block_groups[ffe_ctl.index], list) {
+			    &space_info->block_groups[ffe_ctl->index], list) {
 		struct btrfs_block_group *bg_ret;
 
 		/* If the block group is read-only, we can skip it entirely. */
 		if (unlikely(block_group->ro)) {
-			if (for_treelog)
+			if (ffe_ctl->for_treelog)
 				btrfs_clear_treelog_bg(block_group);
 			continue;
 		}
 
-		btrfs_grab_block_group(block_group, delalloc);
-		ffe_ctl.search_start = block_group->start;
+		btrfs_grab_block_group(block_group, ffe_ctl->delalloc);
+		ffe_ctl->search_start = block_group->start;
 
 		/*
 		 * this can happen if we end up cycling through all the
 		 * raid types, but we want to make sure we only allocate
 		 * for the proper type.
 		 */
-		if (!block_group_bits(block_group, flags)) {
+		if (!block_group_bits(block_group, ffe_ctl->flags)) {
 			u64 extra = BTRFS_BLOCK_GROUP_DUP |
 				BTRFS_BLOCK_GROUP_RAID1_MASK |
 				BTRFS_BLOCK_GROUP_RAID56_MASK |
@@ -4255,7 +4254,8 @@ static noinline int find_free_extent(struct btrfs_root *root,
 			 * doesn't provide them, bail.  This does allow us to
 			 * fill raid0 from raid1.
 			 */
-			if ((flags & extra) && !(block_group->flags & extra))
+			if ((ffe_ctl->flags & extra) &&
+			    !(block_group->flags & extra))
 				goto loop;
 
 			/*
@@ -4263,14 +4263,15 @@ static noinline int find_free_extent(struct btrfs_root *root,
 			 * It's possible that we have MIXED_GROUP flag but no
 			 * block group is mixed.  Just skip such block group.
 			 */
-			btrfs_release_block_group(block_group, delalloc);
+			btrfs_release_block_group(block_group,
+						  ffe_ctl->delalloc);
 			continue;
 		}
 
 have_block_group:
-		ffe_ctl.cached = btrfs_block_group_done(block_group);
-		if (unlikely(!ffe_ctl.cached)) {
-			ffe_ctl.have_caching_bg = true;
+		ffe_ctl->cached = btrfs_block_group_done(block_group);
+		if (unlikely(!ffe_ctl->cached)) {
+			ffe_ctl->have_caching_bg = true;
 			ret = btrfs_cache_block_group(block_group, 0);
 
 			/*
@@ -4293,10 +4294,11 @@ static noinline int find_free_extent(struct btrfs_root *root,
 			goto loop;
 
 		bg_ret = NULL;
-		ret = do_allocation(block_group, &ffe_ctl, &bg_ret);
+		ret = do_allocation(block_group, ffe_ctl, &bg_ret);
 		if (ret == 0) {
 			if (bg_ret && bg_ret != block_group) {
-				btrfs_release_block_group(block_group, delalloc);
+				btrfs_release_block_group(block_group,
+							  ffe_ctl->delalloc);
 				block_group = bg_ret;
 			}
 		} else if (ret == -EAGAIN) {
@@ -4306,46 +4308,49 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		}
 
 		/* Checks */
-		ffe_ctl.search_start = round_up(ffe_ctl.found_offset,
-					     fs_info->stripesize);
+		ffe_ctl->search_start = round_up(ffe_ctl->found_offset,
+						 fs_info->stripesize);
 
 		/* move on to the next group */
-		if (ffe_ctl.search_start + num_bytes >
+		if (ffe_ctl->search_start + ffe_ctl->num_bytes >
 		    block_group->start + block_group->length) {
 			btrfs_add_free_space_unused(block_group,
-					    ffe_ctl.found_offset, num_bytes);
+					    ffe_ctl->found_offset,
+					    ffe_ctl->num_bytes);
 			goto loop;
 		}
 
-		if (ffe_ctl.found_offset < ffe_ctl.search_start)
+		if (ffe_ctl->found_offset < ffe_ctl->search_start)
 			btrfs_add_free_space_unused(block_group,
-					ffe_ctl.found_offset,
-					ffe_ctl.search_start - ffe_ctl.found_offset);
+					ffe_ctl->found_offset,
+					ffe_ctl->search_start - ffe_ctl->found_offset);
 
-		ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
-				num_bytes, delalloc);
+		ret = btrfs_add_reserved_bytes(block_group, ffe_ctl->ram_bytes,
+					       ffe_ctl->num_bytes,
+					       ffe_ctl->delalloc);
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space_unused(block_group,
-					ffe_ctl.found_offset, num_bytes);
+					ffe_ctl->found_offset,
+					ffe_ctl->num_bytes);
 			goto loop;
 		}
 		btrfs_inc_block_group_reservations(block_group);
 
 		/* we are all good, lets return */
-		ins->objectid = ffe_ctl.search_start;
-		ins->offset = num_bytes;
+		ins->objectid = ffe_ctl->search_start;
+		ins->offset = ffe_ctl->num_bytes;
 
-		trace_btrfs_reserve_extent(block_group, ffe_ctl.search_start,
-					   num_bytes);
-		btrfs_release_block_group(block_group, delalloc);
+		trace_btrfs_reserve_extent(block_group, ffe_ctl->search_start,
+					   ffe_ctl->num_bytes);
+		btrfs_release_block_group(block_group, ffe_ctl->delalloc);
 		break;
 loop:
-		release_block_group(block_group, &ffe_ctl, delalloc);
+		release_block_group(block_group, ffe_ctl, ffe_ctl->delalloc);
 		cond_resched();
 	}
 	up_read(&space_info->groups_sem);
 
-	ret = find_free_extent_update_loop(fs_info, ins, &ffe_ctl, full_search);
+	ret = find_free_extent_update_loop(fs_info, ins, ffe_ctl, full_search);
 	if (ret > 0)
 		goto search;
 
@@ -4354,12 +4359,12 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		 * Use ffe_ctl->total_free_space as fallback if we can't find
 		 * any contiguous hole.
 		 */
-		if (!ffe_ctl.max_extent_size)
-			ffe_ctl.max_extent_size = ffe_ctl.total_free_space;
+		if (!ffe_ctl->max_extent_size)
+			ffe_ctl->max_extent_size = ffe_ctl->total_free_space;
 		spin_lock(&space_info->lock);
-		space_info->max_extent_size = ffe_ctl.max_extent_size;
+		space_info->max_extent_size = ffe_ctl->max_extent_size;
 		spin_unlock(&space_info->lock);
-		ins->offset = ffe_ctl.max_extent_size;
+		ins->offset = ffe_ctl->max_extent_size;
 	} else if (ret == -ENOSPC) {
 		ret = cache_block_group_error;
 	}
@@ -4417,6 +4422,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			 struct btrfs_key *ins, int is_data, int delalloc)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct find_free_extent_ctl ffe_ctl = {0};
 	bool final_tried = num_bytes == min_alloc_size;
 	u64 flags;
 	int ret;
@@ -4425,8 +4431,16 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	flags = get_alloc_profile_by_root(root, is_data);
 again:
 	WARN_ON(num_bytes < fs_info->sectorsize);
-	ret = find_free_extent(root, ram_bytes, num_bytes, empty_size,
-			       hint_byte, ins, flags, delalloc);
+
+	ffe_ctl.ram_bytes = ram_bytes;
+	ffe_ctl.num_bytes = num_bytes;
+	ffe_ctl.empty_size = empty_size;
+	ffe_ctl.flags = flags;
+	ffe_ctl.delalloc = delalloc;
+	ffe_ctl.hint_byte = hint_byte;
+	ffe_ctl.for_treelog = for_treelog;
+
+	ret = find_free_extent(root, ins, &ffe_ctl);
 	if (!ret && !is_data) {
 		btrfs_dec_block_group_reservations(fs_info, ins->objectid);
 	} else if (ret == -ENOSPC) {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 15/17] btrfs: zoned: avoid chunk allocation if active block group has enough space
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (13 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 14/17] btrfs: move ffe_ctl one level up Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 14:16 ` [PATCH 16/17] btrfs: zoned: finish fully written block group Naohiro Aota
  2021-08-11 14:16 ` [PATCH 17/17] btrfs: zoned: finish relocating " Naohiro Aota
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

The current extent allocator tries to allocate a new block group when the
existing block groups do not have enough space. On a ZNS device, a new
block group means a new active zone. If the number of active zones has
already reached max_active_zones, activating a new zone requires finishing
an existing zone first, which wastes the free space remaining there.

So, instead, we should reuse the existing active block groups as much as
possible when we cannot activate any other zone without sacrificing an
already activated block group.

While at it, I converted find_free_extent_update_loop() to check the
found_extent() case early, which also makes the other conditions simpler.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 27 ++++++++++++++++++++-------
 fs/btrfs/zoned.c       | 32 ++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  8 ++++++++
 3 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1daa432673c4..b11097f557f8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3478,6 +3478,7 @@ struct find_free_extent_ctl {
 	/* Basic allocation info */
 	u64 ram_bytes;
 	u64 num_bytes;
+	u64 min_alloc_size;
 	u64 empty_size;
 	u64 flags;
 	int delalloc;
@@ -3946,18 +3947,29 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 	    ffe_ctl->have_caching_bg && !ffe_ctl->orig_have_caching_bg)
 		ffe_ctl->orig_have_caching_bg = true;
 
-	if (!ins->objectid && ffe_ctl->loop >= LOOP_CACHING_WAIT &&
-	    ffe_ctl->have_caching_bg)
-		return 1;
-
-	if (!ins->objectid && ++(ffe_ctl->index) < BTRFS_NR_RAID_TYPES)
-		return 1;
-
 	if (ins->objectid) {
 		found_extent(ffe_ctl, ins);
 		return 0;
 	}
 
+	if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size &&
+	    !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->index)) {
+		/*
+		 * If we have enough free space left in an already active
+		 * block group and we can't activate any other zone now,
+		 * retry the active ones with a smaller allocation size.
+		 * Returning early from here will tell
+		 * btrfs_reserve_extent() to halve the size.
+		 */
+		return -ENOSPC;
+	}
+
+	if (ffe_ctl->loop >= LOOP_CACHING_WAIT && ffe_ctl->have_caching_bg)
+		return 1;
+
+	if (++(ffe_ctl->index) < BTRFS_NR_RAID_TYPES)
+		return 1;
+
 	/*
 	 * LOOP_CACHING_NOWAIT, search partially cached block groups, kicking
 	 *			caching kthreads as we move along
@@ -4434,6 +4446,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 
 	ffe_ctl.ram_bytes = ram_bytes;
 	ffe_ctl.num_bytes = num_bytes;
+	ffe_ctl.min_alloc_size = min_alloc_size;
 	ffe_ctl.empty_size = empty_size;
 	ffe_ctl.flags = flags;
 	ffe_ctl.delalloc = delalloc;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index f3c31159fb2e..850662d103e9 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1865,3 +1865,35 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 
 	return ret;
 }
+
+bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
+			     int raid_index)
+{
+	struct btrfs_device *device;
+	bool ret = false;
+
+	if (!btrfs_is_zoned(fs_devices->fs_info))
+		return true;
+
+	/* Non-single profiles are not supported yet */
+	if (raid_index != BTRFS_RAID_SINGLE)
+		return false;
+
+	/* Check if there is a device with active zones left */
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		struct btrfs_zoned_device_info *zinfo = device->zone_info;
+
+		if (!device->bdev)
+			continue;
+
+		if (!zinfo->max_active_zones ||
+		    atomic_read(&zinfo->active_zones_left)) {
+			ret = true;
+			break;
+		}
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	return ret;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 2345ecfa1805..ade6588c4ccd 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -71,6 +71,8 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
 					    u64 logical, u64 length);
 bool btrfs_zone_activate(struct btrfs_block_group *block_group);
 int btrfs_zone_finish(struct btrfs_block_group *block_group);
+bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
+			     int raid_index);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -216,6 +218,12 @@ static inline int btrfs_zone_finish(struct btrfs_block_group *block_group)
 	return 0;
 }
 
+static inline bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
+					   int raid_index)
+{
+	return true;
+}
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 16/17] btrfs: zoned: finish fully written block group
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (14 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 15/17] btrfs: zoned: avoid chunk allocation if active block group has enough space Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  2021-08-11 17:26     ` kernel test robot
  2021-08-11 18:37     ` kernel test robot
  2021-08-11 14:16 ` [PATCH 17/17] btrfs: zoned: finish relocating " Naohiro Aota
  16 siblings, 2 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

If we have written up to the zone capacity, the device automatically
deactivates the zone. Sync up the block group side (the active BG list
and the zone_is_active flag) with it.

We need to do this for both data BGs and metadata BGs. On the data side,
we add a hook to btrfs_finish_ordered_io(). On the metadata side, we use
end_extent_buffer_writeback().

To reduce excess block group lookups, we mark the last extent buffer in a
block group with the EXTENT_BUFFER_ZONE_FINISH flag. This cannot be done
for data (ordered_extent), because the address may change due to
REQ_OP_ZONE_APPEND.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent_io.c | 11 ++++++++-
 fs/btrfs/extent_io.h |  1 +
 fs/btrfs/inode.c     |  6 ++++-
 fs/btrfs/zoned.c     | 58 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h     |  8 ++++++
 5 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index aaddd7225348..c353bfd89dfc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4155,6 +4155,9 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 
 static void end_extent_buffer_writeback(struct extent_buffer *eb)
 {
+	if (test_bit(EXTENT_BUFFER_ZONE_FINISH, &eb->bflags))
+		btrfs_zone_finish_endio(eb->fs_info, eb->start, eb->len);
+
 	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
 	smp_mb__after_atomic();
 	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
@@ -4756,8 +4759,14 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 		free_extent_buffer(eb);
 		return ret;
 	}
-	if (cache)
+	if (cache) {
+		/* Implies a write in zoned btrfs */
 		btrfs_put_block_group(cache);
+		/* Mark the last eb in a block group */
+		if (cache->seq_zone &&
+		    eb->start + eb->len == cache->zone_capacity)
+			set_bit(EXTENT_BUFFER_ZONE_FINISH, &eb->bflags);
+	}
 	ret = write_one_eb(eb, wbc, epd);
 	free_extent_buffer(eb);
 	if (ret < 0)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 53abdc280451..9f3e0a45a5e4 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -32,6 +32,7 @@ enum {
 	/* write IO error */
 	EXTENT_BUFFER_WRITE_ERR,
 	EXTENT_BUFFER_NO_CHECK,
+	EXTENT_BUFFER_ZONE_FINISH,
 };
 
 /* these are flags for __process_pages_contig */
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index afe9dcda860b..1697f745ba5c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3010,8 +3010,12 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
-	if (ordered_extent->bdev)
+	/* Non-null bdev implies a write on a sequential zone */
+	if (ordered_extent->bdev) {
 		btrfs_rewrite_logical_zoned(ordered_extent);
+		btrfs_zone_finish_endio(fs_info, ordered_extent->disk_bytenr,
+					logical_len);
+	}
 
 	btrfs_free_io_failure_record(inode, start, end);
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 850662d103e9..dd92e48b7f56 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1897,3 +1897,61 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
 
 	return ret;
 }
+
+int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
+			    u64 length)
+{
+	struct btrfs_block_group *block_group;
+	struct map_lookup *map;
+	struct btrfs_device *device;
+	u64 physical;
+	int ret;
+
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	block_group = btrfs_lookup_block_group(fs_info, logical);
+	ASSERT(block_group);
+
+	if (logical + length < block_group->start + block_group->zone_capacity) {
+		ret = 0;
+		goto out;
+	}
+
+	spin_lock(&block_group->lock);
+
+	if (!block_group->zone_is_active) {
+		spin_unlock(&block_group->lock);
+		ret = 0;
+		goto out;
+	}
+
+	block_group->zone_is_active = 0;
+	/* We should have consumed all the free space */
+	ASSERT(block_group->alloc_offset == block_group->zone_capacity);
+	ASSERT(block_group->free_space_ctl->free_space == 0);
+	btrfs_clear_treelog_bg(block_group);
+	spin_unlock(&block_group->lock);
+
+	map = block_group->physical_map;
+	device = map->stripes[0].dev;
+	physical = map->stripes[0].physical;
+
+	if (!device->zone_info->max_active_zones) {
+		ret = 0;
+		goto out;
+	}
+
+	btrfs_dev_clear_active_zone(device, physical);
+
+	spin_lock(&fs_info->zone_active_bgs_lock);
+	ASSERT(!list_empty(&block_group->active_bg_list));
+	list_del_init(&block_group->active_bg_list);
+	spin_unlock(&fs_info->zone_active_bgs_lock);
+
+	btrfs_put_block_group(block_group);
+
+out:
+	btrfs_put_block_group(block_group);
+	return ret;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index ade6588c4ccd..04a3ea884f3b 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -73,6 +73,8 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group);
 int btrfs_zone_finish(struct btrfs_block_group *block_group);
 bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
 			     int raid_index);
+int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
+			    u64 length);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -224,6 +226,12 @@ static inline bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
 	return true;
 }
 
+static inline int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info,
+					  u64 logical, u64 length)
+{
+	return 0;
+}
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 17/17] btrfs: zoned: finish relocating block group
  2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
                   ` (15 preceding siblings ...)
  2021-08-11 14:16 ` [PATCH 16/17] btrfs: zoned: finish fully written block group Naohiro Aota
@ 2021-08-11 14:16 ` Naohiro Aota
  16 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

We will no longer write to a relocating block group. So, we can finish it
now.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/relocation.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 914d403b4415..63d2b22cf438 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -25,6 +25,7 @@
 #include "backref.h"
 #include "misc.h"
 #include "subpage.h"
+#include "zoned.h"
 
 /*
  * Relocation overview
@@ -4063,6 +4064,9 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
 				 rc->block_group->start,
 				 rc->block_group->length);
 
+	ret = btrfs_zone_finish(rc->block_group);
+	WARN_ON(ret && ret != -EAGAIN);
+
 	while (1) {
 		int finishes_stage;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
  2021-08-11 14:16 ` [PATCH 16/17] btrfs: zoned: finish fully written block group Naohiro Aota
@ 2021-08-11 17:26     ` kernel test robot
  2021-08-11 18:37     ` kernel test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kernel test robot @ 2021-08-11 17:26 UTC (permalink / raw)
  To: Naohiro Aota, Josef Bacik, David Sterba
  Cc: clang-built-linux, kbuild-all, linux-btrfs, Naohiro Aota

[-- Attachment #1: Type: text/plain, Size: 4395 bytes --]

Hi Naohiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[cannot apply to v5.14-rc5 next-20210811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: hexagon-randconfig-r041-20210810 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ccecd271dc2436fe587af8d2083c3c96ab86e309
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
        git checkout ccecd271dc2436fe587af8d2083c3c96ab86e309
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/btrfs/zoned.c:1948:2: warning: variable 'ret' is used uninitialized whenever '?:' condition is true [-Wsometimes-uninitialized]
           ASSERT(!list_empty(&block_group->active_bg_list));
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
           (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
            ^~~~~~~~~~~~
   include/linux/compiler.h:77:20: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.c:1956:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   fs/btrfs/zoned.c:1948:2: note: remove the '?:' if its condition is always false
           ASSERT(!list_empty(&block_group->active_bg_list));
           ^
   fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
           (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
            ^
   include/linux/compiler.h:77:20: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                           ^
   fs/btrfs/zoned.c:1908:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.


vim +1948 fs/btrfs/zoned.c

  1900	
  1901	int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
  1902				    u64 length)
  1903	{
  1904		struct btrfs_block_group *block_group;
  1905		struct map_lookup *map;
  1906		struct btrfs_device *device;
  1907		u64 physical;
  1908		int ret;
  1909	
  1910		if (!btrfs_is_zoned(fs_info))
  1911			return 0;
  1912	
  1913		block_group = btrfs_lookup_block_group(fs_info, logical);
  1914		ASSERT(block_group);
  1915	
  1916		if (logical + length < block_group->start + block_group->zone_capacity) {
  1917			ret = 0;
  1918			goto out;
  1919		}
  1920	
  1921		spin_lock(&block_group->lock);
  1922	
  1923		if (!block_group->zone_is_active) {
  1924			spin_unlock(&block_group->lock);
  1925			ret = 0;
  1926			goto out;
  1927		}
  1928	
  1929		block_group->zone_is_active = 0;
  1930		/* We should have consumed all the free space */
  1931		ASSERT(block_group->alloc_offset == block_group->zone_capacity);
  1932		ASSERT(block_group->free_space_ctl->free_space == 0);
  1933		btrfs_clear_treelog_bg(block_group);
  1934		spin_unlock(&block_group->lock);
  1935	
  1936		map = block_group->physical_map;
  1937		device = map->stripes[0].dev;
  1938		physical = map->stripes[0].physical;
  1939	
  1940		if (!device->zone_info->max_active_zones) {
  1941			ret = 0;
  1942			goto out;
  1943		}
  1944	
  1945		btrfs_dev_clear_active_zone(device, physical);
  1946	
  1947		spin_lock(&fs_info->zone_active_bgs_lock);
> 1948		ASSERT(!list_empty(&block_group->active_bg_list));

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25180 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
@ 2021-08-11 17:26     ` kernel test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2021-08-11 17:26 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4509 bytes --]

Hi Naohiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[cannot apply to v5.14-rc5 next-20210811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: hexagon-randconfig-r041-20210810 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ccecd271dc2436fe587af8d2083c3c96ab86e309
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
        git checkout ccecd271dc2436fe587af8d2083c3c96ab86e309
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/btrfs/zoned.c:1948:2: warning: variable 'ret' is used uninitialized whenever '?:' condition is true [-Wsometimes-uninitialized]
           ASSERT(!list_empty(&block_group->active_bg_list));
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
           (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
            ^~~~~~~~~~~~
   include/linux/compiler.h:77:20: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.c:1956:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   fs/btrfs/zoned.c:1948:2: note: remove the '?:' if its condition is always false
           ASSERT(!list_empty(&block_group->active_bg_list));
           ^
   fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
           (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
            ^
   include/linux/compiler.h:77:20: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                           ^
   fs/btrfs/zoned.c:1908:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.


vim +1948 fs/btrfs/zoned.c

  1900	
  1901	int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
  1902				    u64 length)
  1903	{
  1904		struct btrfs_block_group *block_group;
  1905		struct map_lookup *map;
  1906		struct btrfs_device *device;
  1907		u64 physical;
  1908		int ret;
  1909	
  1910		if (!btrfs_is_zoned(fs_info))
  1911			return 0;
  1912	
  1913		block_group = btrfs_lookup_block_group(fs_info, logical);
  1914		ASSERT(block_group);
  1915	
  1916		if (logical + length < block_group->start + block_group->zone_capacity) {
  1917			ret = 0;
  1918			goto out;
  1919		}
  1920	
  1921		spin_lock(&block_group->lock);
  1922	
  1923		if (!block_group->zone_is_active) {
  1924			spin_unlock(&block_group->lock);
  1925			ret = 0;
  1926			goto out;
  1927		}
  1928	
  1929		block_group->zone_is_active = 0;
  1930		/* We should have consumed all the free space */
  1931		ASSERT(block_group->alloc_offset == block_group->zone_capacity);
  1932		ASSERT(block_group->free_space_ctl->free_space == 0);
  1933		btrfs_clear_treelog_bg(block_group);
  1934		spin_unlock(&block_group->lock);
  1935	
  1936		map = block_group->physical_map;
  1937		device = map->stripes[0].dev;
  1938		physical = map->stripes[0].physical;
  1939	
  1940		if (!device->zone_info->max_active_zones) {
  1941			ret = 0;
  1942			goto out;
  1943		}
  1944	
  1945		btrfs_dev_clear_active_zone(device, physical);
  1946	
  1947		spin_lock(&fs_info->zone_active_bgs_lock);
> 1948		ASSERT(!list_empty(&block_group->active_bg_list));

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 25180 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
  2021-08-11 14:16 ` [PATCH 16/17] btrfs: zoned: finish fully written block group Naohiro Aota
@ 2021-08-11 18:37     ` kernel test robot
  2021-08-11 18:37     ` kernel test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kernel test robot @ 2021-08-11 18:37 UTC (permalink / raw)
  To: Naohiro Aota, Josef Bacik, David Sterba
  Cc: clang-built-linux, kbuild-all, linux-btrfs, Naohiro Aota

[-- Attachment #1: Type: text/plain, Size: 3615 bytes --]

Hi Naohiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[cannot apply to v5.14-rc5 next-20210811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-a012-20210810 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d39ebdae674c8efc84ebe8dc32716ec353220530)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ccecd271dc2436fe587af8d2083c3c96ab86e309
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
        git checkout ccecd271dc2436fe587af8d2083c3c96ab86e309
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/btrfs/zoned.c:1940:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (!device->zone_info->max_active_zones) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.c:1956:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   fs/btrfs/zoned.c:1940:2: note: remove the 'if' if its condition is always true
           if (!device->zone_info->max_active_zones) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.c:1908:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.


vim +1940 fs/btrfs/zoned.c

  1900	
  1901	int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
  1902				    u64 length)
  1903	{
  1904		struct btrfs_block_group *block_group;
  1905		struct map_lookup *map;
  1906		struct btrfs_device *device;
  1907		u64 physical;
  1908		int ret;
  1909	
  1910		if (!btrfs_is_zoned(fs_info))
  1911			return 0;
  1912	
  1913		block_group = btrfs_lookup_block_group(fs_info, logical);
  1914		ASSERT(block_group);
  1915	
  1916		if (logical + length < block_group->start + block_group->zone_capacity) {
  1917			ret = 0;
  1918			goto out;
  1919		}
  1920	
  1921		spin_lock(&block_group->lock);
  1922	
  1923		if (!block_group->zone_is_active) {
  1924			spin_unlock(&block_group->lock);
  1925			ret = 0;
  1926			goto out;
  1927		}
  1928	
  1929		block_group->zone_is_active = 0;
  1930		/* We should have consumed all the free space */
  1931		ASSERT(block_group->alloc_offset == block_group->zone_capacity);
  1932		ASSERT(block_group->free_space_ctl->free_space == 0);
  1933		btrfs_clear_treelog_bg(block_group);
  1934		spin_unlock(&block_group->lock);
  1935	
  1936		map = block_group->physical_map;
  1937		device = map->stripes[0].dev;
  1938		physical = map->stripes[0].physical;
  1939	
> 1940		if (!device->zone_info->max_active_zones) {

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35768 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
@ 2021-08-11 18:37     ` kernel test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2021-08-11 18:37 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3709 bytes --]

Hi Naohiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[cannot apply to v5.14-rc5 next-20210811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-a012-20210810 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d39ebdae674c8efc84ebe8dc32716ec353220530)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ccecd271dc2436fe587af8d2083c3c96ab86e309
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
        git checkout ccecd271dc2436fe587af8d2083c3c96ab86e309
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/btrfs/zoned.c:1940:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (!device->zone_info->max_active_zones) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.c:1956:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   fs/btrfs/zoned.c:1940:2: note: remove the 'if' if its condition is always true
           if (!device->zone_info->max_active_zones) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.c:1908:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.


vim +1940 fs/btrfs/zoned.c

  1900	
  1901	int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
  1902				    u64 length)
  1903	{
  1904		struct btrfs_block_group *block_group;
  1905		struct map_lookup *map;
  1906		struct btrfs_device *device;
  1907		u64 physical;
  1908		int ret;
  1909	
  1910		if (!btrfs_is_zoned(fs_info))
  1911			return 0;
  1912	
  1913		block_group = btrfs_lookup_block_group(fs_info, logical);
  1914		ASSERT(block_group);
  1915	
  1916		if (logical + length < block_group->start + block_group->zone_capacity) {
  1917			ret = 0;
  1918			goto out;
  1919		}
  1920	
  1921		spin_lock(&block_group->lock);
  1922	
  1923		if (!block_group->zone_is_active) {
  1924			spin_unlock(&block_group->lock);
  1925			ret = 0;
  1926			goto out;
  1927		}
  1928	
  1929		block_group->zone_is_active = 0;
  1930		/* We should have consumed all the free space */
  1931		ASSERT(block_group->alloc_offset == block_group->zone_capacity);
  1932		ASSERT(block_group->free_space_ctl->free_space == 0);
  1933		btrfs_clear_treelog_bg(block_group);
  1934		spin_unlock(&block_group->lock);
  1935	
  1936		map = block_group->physical_map;
  1937		device = map->stripes[0].dev;
  1938		physical = map->stripes[0].physical;
  1939	
> 1940		if (!device->zone_info->max_active_zones) {

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35768 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
  2021-08-11 17:26     ` kernel test robot
@ 2021-08-16  4:34       ` Naohiro Aota
  -1 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-16  4:34 UTC (permalink / raw)
  To: kernel test robot
  Cc: Josef Bacik, David Sterba, clang-built-linux, kbuild-all, linux-btrfs

On Thu, Aug 12, 2021 at 01:26:02AM +0800, kernel test robot wrote:
> Hi Naohiro,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on kdave/for-next]
> [cannot apply to v5.14-rc5 next-20210811]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> config: hexagon-randconfig-r041-20210810 (attached as .config)
> compiler: clang version 12.0.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/0day-ci/linux/commit/ccecd271dc2436fe587af8d2083c3c96ab86e309
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
>         git checkout ccecd271dc2436fe587af8d2083c3c96ab86e309
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
> >> fs/btrfs/zoned.c:1948:2: warning: variable 'ret' is used uninitialized whenever '?:' condition is true [-Wsometimes-uninitialized]
>            ASSERT(!list_empty(&block_group->active_bg_list));
>            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
>            (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
>             ^~~~~~~~~~~~
>    include/linux/compiler.h:77:20: note: expanded from macro 'likely'
>    # define likely(x)      __builtin_expect(!!(x), 1)
>                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>    fs/btrfs/zoned.c:1956:9: note: uninitialized use occurs here
>            return ret;
>                   ^~~
>    fs/btrfs/zoned.c:1948:2: note: remove the '?:' if its condition is always false
>            ASSERT(!list_empty(&block_group->active_bg_list));
>            ^
>    fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
>            (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
>             ^
>    include/linux/compiler.h:77:20: note: expanded from macro 'likely'
>    # define likely(x)      __builtin_expect(!!(x), 1)
>                            ^
>    fs/btrfs/zoned.c:1908:9: note: initialize the variable 'ret' to silence this warning
>            int ret;
>                   ^
>                    = 0
>    1 warning generated.

This looks like a false positive to me, but I still noticed that this
"ret" can never be non-zero on any path. So, I'll convert the function
to "void" in the next version.
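
FWIW, a rough sketch of that conversion, based only on the function as
posted above (untested; the declaration in zoned.h and the
!CONFIG_BLK_DEV_ZONED stub would of course need the same change):

/* Sketch only: same logic as the posted code, with the always-zero return dropped */
void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
			     u64 length)
{
	struct btrfs_block_group *block_group;
	struct map_lookup *map;
	struct btrfs_device *device;
	u64 physical;

	if (!btrfs_is_zoned(fs_info))
		return;

	block_group = btrfs_lookup_block_group(fs_info, logical);
	ASSERT(block_group);

	/* Nothing to do unless the write reaches the end of the zone capacity */
	if (logical + length < block_group->start + block_group->zone_capacity)
		goto out;

	spin_lock(&block_group->lock);

	if (!block_group->zone_is_active) {
		spin_unlock(&block_group->lock);
		goto out;
	}

	block_group->zone_is_active = 0;
	/* We should have consumed all the free space */
	ASSERT(block_group->alloc_offset == block_group->zone_capacity);
	ASSERT(block_group->free_space_ctl->free_space == 0);
	btrfs_clear_treelog_bg(block_group);
	spin_unlock(&block_group->lock);

	map = block_group->physical_map;
	device = map->stripes[0].dev;
	physical = map->stripes[0].physical;

	if (!device->zone_info->max_active_zones)
		goto out;

	btrfs_dev_clear_active_zone(device, physical);

	spin_lock(&fs_info->zone_active_bgs_lock);
	ASSERT(!list_empty(&block_group->active_bg_list));
	list_del_init(&block_group->active_bg_list);
	spin_unlock(&fs_info->zone_active_bgs_lock);

	/* Same two puts as in the posted version */
	btrfs_put_block_group(block_group);

out:
	btrfs_put_block_group(block_group);
}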

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
@ 2021-08-16  4:34       ` Naohiro Aota
  0 siblings, 0 replies; 24+ messages in thread
From: Naohiro Aota @ 2021-08-16  4:34 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3130 bytes --]

On Thu, Aug 12, 2021 at 01:26:02AM +0800, kernel test robot wrote:
> Hi Naohiro,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on kdave/for-next]
> [cannot apply to v5.14-rc5 next-20210811]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> config: hexagon-randconfig-r041-20210810 (attached as .config)
> compiler: clang version 12.0.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/0day-ci/linux/commit/ccecd271dc2436fe587af8d2083c3c96ab86e309
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
>         git checkout ccecd271dc2436fe587af8d2083c3c96ab86e309
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
> >> fs/btrfs/zoned.c:1948:2: warning: variable 'ret' is used uninitialized whenever '?:' condition is true [-Wsometimes-uninitialized]
>            ASSERT(!list_empty(&block_group->active_bg_list));
>            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
>            (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
>             ^~~~~~~~~~~~
>    include/linux/compiler.h:77:20: note: expanded from macro 'likely'
>    # define likely(x)      __builtin_expect(!!(x), 1)
>                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>    fs/btrfs/zoned.c:1956:9: note: uninitialized use occurs here
>            return ret;
>                   ^~~
>    fs/btrfs/zoned.c:1948:2: note: remove the '?:' if its condition is always false
>            ASSERT(!list_empty(&block_group->active_bg_list));
>            ^
>    fs/btrfs/ctree.h:3458:3: note: expanded from macro 'ASSERT'
>            (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
>             ^
>    include/linux/compiler.h:77:20: note: expanded from macro 'likely'
>    # define likely(x)      __builtin_expect(!!(x), 1)
>                            ^
>    fs/btrfs/zoned.c:1908:9: note: initialize the variable 'ret' to silence this warning
>            int ret;
>                   ^
>                    = 0
>    1 warning generated.

This looks like a false positive to me, but I still noticed that this
"ret" can never be non-zero on any path. So, I'll convert the function
to "void" in the next version.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-08-16  4:34 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
2021-08-11 14:16 ` [PATCH 01/17] btrfs: zoned: load zone capacity information from devices Naohiro Aota
2021-08-11 14:16 ` [PATCH 02/17] btrfs: zoned: move btrfs_free_excluded_extents out from btrfs_calc_zone_unusable Naohiro Aota
2021-08-11 14:16 ` [PATCH 03/17] btrfs: zoned: calculate free space from zone capacity Naohiro Aota
2021-08-11 14:16 ` [PATCH 04/17] btrfs: zoned: tweak reclaim threshold for " Naohiro Aota
2021-08-11 14:16 ` [PATCH 05/17] btrfs: zoned: consider zone as full when no more SB can be written Naohiro Aota
2021-08-11 14:16 ` [PATCH 06/17] btrfs: zoned: locate superblock position using zone capacity Naohiro Aota
2021-08-11 14:16 ` [PATCH 07/17] btrfs: zoned: finish superblock zone once no space left for new SB Naohiro Aota
2021-08-11 14:16 ` [PATCH 08/17] btrfs: zoned: load active zone information from devices Naohiro Aota
2021-08-11 14:16 ` [PATCH 09/17] btrfs: zoned: introduce physical_map to btrfs_block_group Naohiro Aota
2021-08-11 14:16 ` [PATCH 10/17] btrfs: zoned: implement active zone tracking Naohiro Aota
2021-08-11 14:16 ` [PATCH 11/17] btrfs: zoned: load active zone info for block group Naohiro Aota
2021-08-11 14:16 ` [PATCH 12/17] btrfs: zoned: activate block group on allocation Naohiro Aota
2021-08-11 14:16 ` [PATCH 13/17] btrfs: zoned: activate new block group Naohiro Aota
2021-08-11 14:16 ` [PATCH 14/17] btrfs: move ffe_ctl one level up Naohiro Aota
2021-08-11 14:16 ` [PATCH 15/17] btrfs: zoned: avoid chunk allocation if active block group has enough space Naohiro Aota
2021-08-11 14:16 ` [PATCH 16/17] btrfs: zoned: finish fully written block group Naohiro Aota
2021-08-11 17:26   ` kernel test robot
2021-08-11 17:26     ` kernel test robot
2021-08-16  4:34     ` Naohiro Aota
2021-08-16  4:34       ` Naohiro Aota
2021-08-11 18:37   ` kernel test robot
2021-08-11 18:37     ` kernel test robot
2021-08-11 14:16 ` [PATCH 17/17] btrfs: zoned: finish relocating " Naohiro Aota

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.