* [PATCH 00/26] btrfs-progs: zoned: zoned block device support
@ 2021-04-26  6:27 Naohiro Aota
  2021-04-26  6:27 ` [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
                   ` (26 more replies)
  0 siblings, 27 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

This series implements userland-side support for zoned btrfs.

This series is based on misc-next plus the preparation series below.
https://lore.kernel.org/linux-btrfs/cover.1617694997.git.naohiro.aota@wdc.com/

The userland tools depend on a patched util-linux (libblkid and wipefs) to
handle the log-structured superblock. The patches are available on the
util-linux list.
https://lore.kernel.org/util-linux/20210426055036.2103620-1-naohiro.aota@wdc.com/T/

Followup work will address several areas that can be improved.


* Patch series organization

Patches 1 and 2 are preparation patches. They add a helper function
queue_param() and provide fs_info in struct btrfs_device, as the kernel
code does.

Patch 3 adds a check for a header file of zoned block device support.

Patches 4 to 16 implement zoned btrfs features (loading zone info,
chunk/extent allocator, zone emulation for a non-zoned device, etc.) like
in the kernel code.

Patches 17 to 19 extend btrfs_prepare_device() for a zoned device.

Patches 20 to 24 implement zoned support for mkfs.btrfs.

Patches 25 and 26 add zoned support for other commands ("device add" and
"device replace").

Naohiro Aota (26):
  btrfs-progs: utils: Introduce queue_param helper function
  btrfs-progs: provide fs_info from btrfs_device
  btrfs-progs: build: zoned: Check zoned block device support
  btrfs-progs: zoned: add new ZONED feature flag
  btrfs-progs: zoned: get zone information of zoned block devices
  btrfs-progs: zoned: check and enable ZONED mode
  btrfs-progs: zoned: introduce max_zone_append_size
  btrfs-progs: zoned: disallow mixed-bg in ZONED mode
  btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices
  btrfs-progs: zoned: implement log-structured superblock for ZONED mode
  btrfs-progs: zoned: implement zoned chunk allocator
  btrfs-progs: zoned: load zone's allocation offset
  btrfs-progs: zoned: implement sequential extent allocation
  btrfs-progs: zoned: calculate allocation offset for conventional zones
  btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs
  btrfs-progs: zoned: reset zone of freed block group
  btrfs-progs: zoned: support resetting zoned device
  btrfs-progs: zoned: support zero out on zoned block device
  btrfs-progs: zoned: support wiping SB on sequential write zone
  btrfs-progs: mkfs: zoned: detect and enable zoned feature flag
  btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs
  btrfs-progs: mkfs: zoned: tweak initial system block group placement
  btrfs-progs: mkfs: zoned: use sbwrite to update superblock
  btrfs-progs: zoned: wipe temporary superblocks in superblock log zone
  btrfs-progs: zoned: device-add: support ZONED device
  btrfs-progs: zoned: introduce zoned support for device replace

 Makefile                    |    2 +-
 cmds/device.c               |   21 +-
 cmds/inspect-dump-super.c   |    3 +-
 cmds/replace.c              |   13 +-
 cmds/rescue-chunk-recover.c |    2 +-
 common/device-scan.c        |    7 +-
 common/device-utils.c       |  127 +++-
 common/device-utils.h       |    4 +
 common/fsfeatures.c         |    8 +
 common/fsfeatures.h         |    3 +-
 configure.ac                |   13 +
 kerncompat.h                |   23 +
 kernel-shared/ctree.h       |   28 +-
 kernel-shared/disk-io.c     |   39 +-
 kernel-shared/extent-tree.c |   26 +
 kernel-shared/print-tree.c  |    1 +
 kernel-shared/transaction.c |    6 +
 kernel-shared/volumes.c     |  153 ++++-
 kernel-shared/volumes.h     |    8 +-
 kernel-shared/zoned.c       | 1181 +++++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h       |  170 +++++
 mkfs/common.c               |   38 +-
 mkfs/common.h               |    1 +
 mkfs/main.c                 |  112 ++--
 24 files changed, 1887 insertions(+), 102 deletions(-)
 create mode 100644 kernel-shared/zoned.c
 create mode 100644 kernel-shared/zoned.h

-- 
2.31.1



* [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:26   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device Naohiro Aota
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota, Damien Le Moal

Introduce the queue_param helper function to get a device request queue
parameter. This helper will be used later to query information about a
zoned device.

Furthermore, rewrite is_ssd() using the helper function.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 46 +++++++++++++++++++++++++++++++++++++++++++
 common/device-utils.h |  1 +
 mkfs/main.c           | 40 ++-----------------------------------
 3 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index c860b94661c4..f5d5277e8fce 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -252,3 +252,49 @@ u64 get_partition_size(const char *dev)
 	return result;
 }
 
+/*
+ * Get a device request queue parameter.
+ */
+int queue_param(const char *file, const char *param, char *buf, size_t len)
+{
+	blkid_probe probe;
+	char wholedisk[PATH_MAX];
+	char sysfs_path[PATH_MAX];
+	dev_t devno;
+	int fd;
+	int ret;
+
+	probe = blkid_new_probe_from_filename(file);
+	if (!probe)
+		return 0;
+
+	/* Device number of this disk (possibly a partition) */
+	devno = blkid_probe_get_devno(probe);
+	if (!devno) {
+		blkid_free_probe(probe);
+		return 0;
+	}
+
+	/* Get whole disk name (not full path) for this devno */
+	ret = blkid_devno_to_wholedisk(devno,
+			wholedisk, sizeof(wholedisk), NULL);
+	if (ret) {
+		blkid_free_probe(probe);
+		return 0;
+	}
+
+	snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/%s",
+		 wholedisk, param);
+
+	blkid_free_probe(probe);
+
+	fd = open(sysfs_path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	len = read(fd, buf, len);
+	close(fd);
+
+	return len;
+}
+
diff --git a/common/device-utils.h b/common/device-utils.h
index 70d19cae3e50..d1799323d002 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -29,5 +29,6 @@ u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		u64 max_block_count, unsigned opflags);
+int queue_param(const char *file, const char *param, char *buf, size_t len);
 
 #endif
diff --git a/mkfs/main.c b/mkfs/main.c
index c910369cbf94..a903896289fa 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -434,49 +434,13 @@ static int zero_output_file(int out_fd, u64 size)
 
 static int is_ssd(const char *file)
 {
-	blkid_probe probe;
-	char wholedisk[PATH_MAX];
-	char sysfs_path[PATH_MAX];
-	dev_t devno;
-	int fd;
 	char rotational;
 	int ret;
 
-	probe = blkid_new_probe_from_filename(file);
-	if (!probe)
+	ret = queue_param(file, "rotational", &rotational, 1);
+	if (ret < 1)
 		return 0;
 
-	/* Device number of this disk (possibly a partition) */
-	devno = blkid_probe_get_devno(probe);
-	if (!devno) {
-		blkid_free_probe(probe);
-		return 0;
-	}
-
-	/* Get whole disk name (not full path) for this devno */
-	ret = blkid_devno_to_wholedisk(devno,
-			wholedisk, sizeof(wholedisk), NULL);
-	if (ret) {
-		blkid_free_probe(probe);
-		return 0;
-	}
-
-	snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/rotational",
-		 wholedisk);
-
-	blkid_free_probe(probe);
-
-	fd = open(sysfs_path, O_RDONLY);
-	if (fd < 0) {
-		return 0;
-	}
-
-	if (read(fd, &rotational, 1) < 1) {
-		close(fd);
-		return 0;
-	}
-	close(fd);
-
 	return rotational == '0';
 }
 
-- 
2.31.1
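
Editor's sketch (not part of the patch): the two steps that queue_param()
and is_ssd() perform around the blkid lookup can be illustrated without any
hardware. The helper names build_queue_path() and rotational_is_ssd() are
hypothetical and exist only for this illustration; the patch does the path
construction inline with snprintf() and does not check for truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<wholedisk>/queue/<param>" into out.
 * Returns 0 on success, -1 if the result would not fit.
 * (Hypothetical helper; the patch does this inline with snprintf.)
 */
static int build_queue_path(char *out, size_t outlen,
			    const char *wholedisk, const char *param)
{
	int n;

	assert(out && wholedisk && param);
	n = snprintf(out, outlen, "/sys/block/%s/queue/%s", wholedisk, param);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}

/*
 * Interpret the bytes read from queue/rotational the way is_ssd() does:
 * at least one byte must have been read, and it must be '0'.
 */
static int rotational_is_ssd(const char *buf, int nread)
{
	if (nread < 1)
		return 0;
	return buf[0] == '0';
}
```

Note that queue_param() as posted returns the raw byte count from read()
and does not NUL-terminate the buffer, so a caller that wants a C string
would have to terminate it itself.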



* [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
  2021-04-26  6:27 ` [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:25   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 03/26] btrfs-progs: build: zoned: Check zoned block device support Naohiro Aota
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

As in the kernel code, provide fs_info access from struct btrfs_device.
This will help to unify the code between the kernel and the userland.

Since fs_info can be NULL at the time of btrfs_add_to_fsid(), let's use
btrfs_open_devices() to set fs_info on the devices.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/rescue-chunk-recover.c | 2 +-
 common/device-scan.c        | 1 +
 kernel-shared/disk-io.c     | 2 +-
 kernel-shared/volumes.c     | 8 ++++++--
 kernel-shared/volumes.h     | 5 +++--
 5 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/cmds/rescue-chunk-recover.c b/cmds/rescue-chunk-recover.c
index 5f21672b9d3e..216a6226b0f7 100644
--- a/cmds/rescue-chunk-recover.c
+++ b/cmds/rescue-chunk-recover.c
@@ -1446,7 +1446,7 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 	fs_info->is_chunk_recover = 1;
 
 	fs_info->fs_devices = rc->fs_devices;
-	ret = btrfs_open_devices(fs_info->fs_devices, O_RDWR);
+	ret = btrfs_open_devices(fs_info, fs_info->fs_devices, O_RDWR);
 	if (ret)
 		goto out;
 
diff --git a/common/device-scan.c b/common/device-scan.c
index cd4c12821078..01d2e0656583 100644
--- a/common/device-scan.c
+++ b/common/device-scan.c
@@ -141,6 +141,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
 	dev_item = &disk_super->dev_item;
 
 	uuid_generate(device->uuid);
+	device->fs_info = fs_info;
 	device->devid = 0;
 	device->type = 0;
 	device->io_width = io_width;
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 5555a406321b..a78be1e7a692 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1271,7 +1271,7 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 	if (flags & OPEN_CTREE_EXCLUSIVE)
 		oflags |= O_EXCL;
 
-	ret = btrfs_open_devices(fs_devices, oflags);
+	ret = btrfs_open_devices(fs_info, fs_devices, oflags);
 	if (ret)
 		goto out;
 
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index f7dd879398d4..cbcf7bfa371d 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -389,13 +389,17 @@ void btrfs_close_all_devices(void)
 	}
 }
 
-int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags)
+int btrfs_open_devices(struct btrfs_fs_info *fs_info,
+		       struct btrfs_fs_devices *fs_devices, int flags)
 {
 	int fd;
 	struct btrfs_device *device;
 	int ret;
 
 	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (!device->fs_info)
+			device->fs_info = fs_info;
+
 		if (!device->name) {
 			printk("no name for device %llu, skip it now\n", device->devid);
 			continue;
@@ -2106,7 +2110,7 @@ static int open_seed_devices(struct btrfs_fs_info *fs_info, u8 *fsid)
 		memcpy(fs_devices->fsid, fsid, BTRFS_FSID_SIZE);
 	}
 
-	ret = btrfs_open_devices(fs_devices, O_RDONLY);
+	ret = btrfs_open_devices(fs_info, fs_devices, O_RDONLY);
 	if (ret)
 		goto out;
 
diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
index e1d7918dd30b..faaa285dbf11 100644
--- a/kernel-shared/volumes.h
+++ b/kernel-shared/volumes.h
@@ -28,6 +28,7 @@ struct btrfs_device {
 	struct list_head dev_list;
 	struct btrfs_root *dev_root;
 	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_fs_info *fs_info;
 
 	u64 total_ios;
 
@@ -282,8 +283,8 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      u64 *num_bytes, u64 type);
 int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
 			   struct btrfs_fs_info *fs_info, u64 *start, u64 num_bytes);
-int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
-		       int flags);
+int btrfs_open_devices(struct btrfs_fs_info *fs_info,
+		       struct btrfs_fs_devices *fs_devices, int flags);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
 void btrfs_close_all_devices(void);
 int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans,
-- 
2.31.1
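
Editor's sketch (not part of the patch): the "fill in fs_info only where it
is still unset" rule that btrfs_open_devices() applies can be shown with
simplified stand-in types. struct fs_info_sketch, struct device_sketch, and
attach_fs_info() are hypothetical names; the real code walks
fs_devices->devices with list_for_each_entry().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the btrfs-progs types (hypothetical). */
struct fs_info_sketch {
	int id;
};

struct device_sketch {
	struct fs_info_sketch *fs_info;
	struct device_sketch *next;
};

/*
 * Attach fs_info to every device that does not have one yet, leaving
 * already-initialized back-pointers untouched.
 * Returns the number of devices that were updated.
 */
static int attach_fs_info(struct device_sketch *head,
			  struct fs_info_sketch *fs_info)
{
	int updated = 0;

	assert(fs_info != NULL);
	for (struct device_sketch *d = head; d; d = d->next) {
		if (!d->fs_info) {
			d->fs_info = fs_info;
			updated++;
		}
	}
	return updated;
}
```

A device that was already given an fs_info keeps it, which matches the
"if (!device->fs_info)" guard in the hunk above.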



* [PATCH 03/26] btrfs-progs: build: zoned: Check zoned block device support
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
  2021-04-26  6:27 ` [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
  2021-04-26  6:27 ` [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag Naohiro Aota
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota, Damien Le Moal

If the kernel supports zoned block devices, the file
/usr/include/linux/blkzoned.h will be present. Check for it and define
BTRFS_ZONED if the file is present.

If it is present, enable the ZONED feature; if not, disable it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 configure.ac | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/configure.ac b/configure.ac
index 6ea29e0a5a06..5ad95d662b47 100644
--- a/configure.ac
+++ b/configure.ac
@@ -250,6 +250,18 @@ AX_CHECK_DEFINE([ext2fs/ext2_fs.h], [EXT4_EPOCH_MASK],
 			   [Define to 1 if e2fsprogs defines EXT4_EPOCH_MASK])],
 		[AC_MSG_WARN([no definition of EXT4_EPOCH_MASK found, probably old e2fsprogs, no 64bit time precision of converted images])])
 
+AC_CHECK_HEADER(linux/blkzoned.h, [blkzoned_found=yes], [blkzoned_found=no])
+AC_ARG_ENABLE([zoned],
+  AS_HELP_STRING([--disable-zoned], [disable zoned block device support]),
+  [], [enable_zoned=$blkzoned_found]
+)
+
+AS_IF([test "x$enable_zoned" = xyes], [
+	AC_CHECK_HEADER(linux/blkzoned.h, [],
+		[AC_MSG_ERROR([Couldn't find linux/blkzoned.h])])
+	AC_DEFINE([BTRFS_ZONED], [1], [enable zoned block device support])
+])
+
 dnl Define <NAME>_LIBS= and <NAME>_CFLAGS= by pkg-config
 dnl
 dnl The default PKG_CHECK_MODULES() action-if-not-found is end the
@@ -367,6 +379,7 @@ AC_MSG_RESULT([
 	Python bindings:    ${enable_python}
 	Python interpreter: ${PYTHON}
 	crypto provider:    ${cryptoprovider} ${cryptoproviderversion}
+	zoned device:       ${enable_zoned}
 
 	Type 'make' to compile.
 ])
-- 
2.31.1



* [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (2 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 03/26] btrfs-progs: build: zoned: Check zoned block device support Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:45   ` Johannes Thumshirn
  2021-04-27 15:46   ` David Sterba
  2021-04-26  6:27 ` [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices Naohiro Aota
                   ` (22 subsequent siblings)
  26 siblings, 2 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

With the zoned feature enabled, a zoned block device-aware btrfs allocates
block groups aligned to the device zones and always writes to sequential
zones at the zone write pointer position.

It also supports "emulated" zoned mode on a non-zoned device. In the
emulated mode, btrfs emulates conventional zones by slicing the device with
a fixed size.

We don't support conversion from an ext4 volume with the zoned feature
because we can't be sure that all the converted block groups are aligned to
zone boundaries.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/fsfeatures.c        | 8 ++++++++
 common/fsfeatures.h        | 3 ++-
 kernel-shared/ctree.h      | 4 +++-
 kernel-shared/print-tree.c | 1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index 569208a9e5b1..c0793339b531 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -100,6 +100,14 @@ static const struct btrfs_feature mkfs_features[] = {
 		NULL, 0,
 		NULL, 0,
 		"RAID1 with 3 or 4 copies" },
+#ifdef BTRFS_ZONED
+	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
+		"zoned",
+		NULL, 0,
+		NULL, 0,
+		NULL, 0,
+		"support Zoned devices" },
+#endif
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
diff --git a/common/fsfeatures.h b/common/fsfeatures.h
index 74ec2a21caf6..1a7d7f62897f 100644
--- a/common/fsfeatures.h
+++ b/common/fsfeatures.h
@@ -25,7 +25,8 @@
 		| BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 
 /*
- * Avoid multi-device features (RAID56) and mixed block groups
+ * Avoid multi-device features (RAID56), mixed block groups, and zoned
+ * btrfs
  */
 #define BTRFS_CONVERT_ALLOWED_FEATURES				\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF			\
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 7683b8bbf0b4..77a5ad488104 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -495,6 +495,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID    (1ULL << 10)
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
+#define BTRFS_FEATURE_INCOMPAT_ZONED		(1ULL << 12)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -519,7 +520,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
 	 BTRFS_FEATURE_INCOMPAT_RAID1C34 |		\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID |		\
+	 BTRFS_FEATURE_INCOMPAT_ZONED)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
index 92df05c15d68..76853aee8634 100644
--- a/kernel-shared/print-tree.c
+++ b/kernel-shared/print-tree.c
@@ -1614,6 +1614,7 @@ static struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
 	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID),
 	DEF_INCOMPAT_FLAG_ENTRY(RAID1C34),
+	DEF_INCOMPAT_FLAG_ENTRY(ZONED),
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				      sizeof(struct readable_flag_entry);
-- 
2.31.1
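
Editor's sketch (not part of the patch): a minimal illustration of the
three ideas in the commit message, namely testing the new incompat bit,
slicing a non-zoned device into fixed-size emulated zones, and checking
that a block group starts on a zone boundary. The bit value (1ULL << 12)
comes from the patch; the function names and the SKETCH_ prefix are
hypothetical, and the commit message does not say which zone size the
emulation uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value taken from the patch (kernel-shared/ctree.h). */
#define SKETCH_INCOMPAT_ZONED	(1ULL << 12)

/* True if the superblock incompat_flags word has the ZONED bit set. */
static bool sb_is_zoned(uint64_t incompat_flags)
{
	return (incompat_flags & SKETCH_INCOMPAT_ZONED) != 0;
}

/*
 * Emulated mode: map a byte offset to the index of the fixed-size
 * slice ("zone") that contains it. zone_size must be non-zero.
 */
static uint64_t emulated_zone_index(uint64_t offset, uint64_t zone_size)
{
	assert(zone_size != 0);
	return offset / zone_size;
}

/* A block group start is acceptable only on a zone boundary. */
static bool is_zone_aligned(uint64_t start, uint64_t zone_size)
{
	assert(zone_size != 0);
	return start % zone_size == 0;
}
```

The alignment check is the property that cannot be guaranteed for block
groups produced by an ext4 conversion, which is why the commit message
rules that combination out.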



* [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (3 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:32   ` Su Yue
  2021-04-26  6:27 ` [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode Naohiro Aota
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Get the zone information (number of zones and zone size) from all the
devices if the volume contains a zoned block device. To avoid costly
run-time zone report commands to test the device zone types during block
allocation, it also records the status of every zone (zone type, write
pointer position, etc.).

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 Makefile                |   2 +-
 common/device-scan.c    |   2 +
 kerncompat.h            |   4 +
 kernel-shared/disk-io.c |  12 ++
 kernel-shared/volumes.c |   2 +
 kernel-shared/volumes.h |   2 +
 kernel-shared/zoned.c   | 242 ++++++++++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h   |  42 +++++++
 8 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 kernel-shared/zoned.c
 create mode 100644 kernel-shared/zoned.h

diff --git a/Makefile b/Makefile
index e288a336c81e..3dc0543982b2 100644
--- a/Makefile
+++ b/Makefile
@@ -169,7 +169,7 @@ libbtrfs_objects = common/send-stream.o common/send-utils.o kernel-lib/rbtree.o
 		   kernel-shared/free-space-cache.o kernel-shared/root-tree.o \
 		   kernel-shared/volumes.o kernel-shared/transaction.o \
 		   kernel-shared/free-space-tree.o repair.o kernel-shared/inode-item.o \
-		   kernel-shared/file-item.o \
+		   kernel-shared/file-item.o kernel-shared/zoned.o \
 		   kernel-lib/raid56.o kernel-lib/tables.o \
 		   common/device-scan.o common/path-utils.o \
 		   common/utils.o libbtrfsutil/subvolume.o libbtrfsutil/stubs.o \
diff --git a/common/device-scan.c b/common/device-scan.c
index 01d2e0656583..74d7853afccb 100644
--- a/common/device-scan.c
+++ b/common/device-scan.c
@@ -35,6 +35,7 @@
 #include "kernel-shared/ctree.h"
 #include "kernel-shared/volumes.h"
 #include "kernel-shared/disk-io.h"
+#include "kernel-shared/zoned.h"
 #include "ioctl.h"
 
 static int btrfs_scan_done = 0;
@@ -198,6 +199,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
 	return 0;
 
 out:
+	free(device->zone_info);
 	free(device);
 	free(buf);
 	return ret;
diff --git a/kerncompat.h b/kerncompat.h
index 7060326fe4f4..a39b79cba767 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -76,6 +76,10 @@
 #define ULONG_MAX       (~0UL)
 #endif
 
+#ifndef SECTOR_SHIFT
+#define SECTOR_SHIFT 9
+#endif
+
 #define __token_glue(a,b,c)	___token_glue(a,b,c)
 #define ___token_glue(a,b,c)	a ## b ## c
 #ifdef DEBUG_BUILD_CHECKS
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index a78be1e7a692..0519cb2358b5 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -29,6 +29,7 @@
 #include "kernel-shared/disk-io.h"
 #include "kernel-shared/volumes.h"
 #include "kernel-shared/transaction.h"
+#include "zoned.h"
 #include "crypto/crc32c.h"
 #include "common/utils.h"
 #include "kernel-shared/print-tree.h"
@@ -1314,6 +1315,17 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 	if (!fs_info->chunk_root)
 		return fs_info;
 
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of a zoned filesystem if a regular device has the
+	 * zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		error("zoned: failed to read device zone info: %d", ret);
+		goto out_chunk;
+	}
+
 	eb = fs_info->chunk_root->node;
 	read_extent_buffer(eb, fs_info->chunk_tree_uuid,
 			   btrfs_header_chunk_tree_uuid(eb),
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index cbcf7bfa371d..63530a99b41c 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -27,6 +27,7 @@
 #include "kernel-shared/transaction.h"
 #include "kernel-shared/print-tree.h"
 #include "kernel-shared/volumes.h"
+#include "zoned.h"
 #include "common/utils.h"
 #include "kernel-lib/raid56.h"
 
@@ -357,6 +358,7 @@ again:
 		/* free the memory */
 		free(device->name);
 		free(device->label);
+		free(device->zone_info);
 		free(device);
 	}
 
diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
index faaa285dbf11..a64288d566d8 100644
--- a/kernel-shared/volumes.h
+++ b/kernel-shared/volumes.h
@@ -45,6 +45,8 @@ struct btrfs_device {
 
 	u64 generation;
 
+	struct btrfs_zoned_device_info *zone_info;
+
 	/* the internal btrfs device id */
 	u64 devid;
 
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
new file mode 100644
index 000000000000..370d93915c6e
--- /dev/null
+++ b/kernel-shared/zoned.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+
+#include "kernel-lib/list.h"
+#include "kernel-shared/volumes.h"
+#include "kernel-shared/zoned.h"
+#include "common/utils.h"
+#include "common/device-utils.h"
+#include "common/messages.h"
+#include "mkfs/common.h"
+
+/* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */
+#define BTRFS_REPORT_NR_ZONES   4096
+
+static int btrfs_get_dev_zone_info(struct btrfs_device *device);
+
+enum btrfs_zoned_model zoned_model(const char *file)
+{
+	const char *host_aware = "host-aware";
+	const char *host_managed = "host-managed";
+	struct stat st;
+	char model[32];
+	int ret;
+
+	ret = stat(file, &st);
+	if (ret < 0) {
+		error("zoned: unable to stat %s", file);
+		return -ENOENT;
+	}
+
+	/* Consider a regular file as non-zoned device */
+	if (!S_ISBLK(st.st_mode))
+		return ZONED_NONE;
+
+	ret = queue_param(file, "zoned", model, sizeof(model));
+	if (ret <= 0)
+		return ZONED_NONE;
+
+	if (strncmp(model, host_aware, strlen(host_aware)) == 0)
+		return ZONED_HOST_AWARE;
+	if (strncmp(model, host_managed, strlen(host_managed)) == 0)
+		return ZONED_HOST_MANAGED;
+
+	return ZONED_NONE;
+}
+
+u64 zone_size(const char *file)
+{
+	char chunk[32];
+	int ret;
+
+	ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk));
+	if (ret <= 0)
+		return 0;
+
+	return strtoull((const char *)chunk, NULL, 10) << SECTOR_SHIFT;
+}
+
+#ifdef BTRFS_ZONED
+static int report_zones(int fd, const char *file,
+			struct btrfs_zoned_device_info *zinfo)
+{
+	u64 device_size;
+	u64 zone_bytes = zone_size(file);
+	size_t rep_size;
+	u64 sector = 0;
+	struct blk_zone_report *rep;
+	struct blk_zone *zone;
+	unsigned int i, n = 0;
+	int ret;
+
+	/*
+	 * Zones are guaranteed (by the kernel) to be a power of 2 number of
+	 * sectors. Check this here and make sure that zones are not too
+	 * small.
+	 */
+	if (!zone_bytes || !is_power_of_2(zone_bytes)) {
+		error("zoned: illegal zone size %llu (not a power of 2)",
+		      zone_bytes);
+		exit(1);
+	}
+	/*
+	 * The zone size must be large enough to hold the initial system
+	 * block group for mkfs time.
+	 */
+	if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+		error("zoned: illegal zone size %llu (smaller than %d)",
+		      zone_bytes, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+		exit(1);
+	}
+
+	/*
+	 * No need to use btrfs_device_size() here, since the file is known
+	 * to be a block device.
+	 */
+	if (ioctl(fd, BLKGETSIZE64, &device_size) < 0) {
+		error("zoned: ioctl(BLKGETSIZE64) failed on %s (%m)", file);
+		exit(1);
+	}
+
+	/* Allocate the zone information array */
+	zinfo->zone_size = zone_bytes;
+	zinfo->nr_zones = device_size / zone_bytes;
+	if (device_size & (zone_bytes - 1))
+		zinfo->nr_zones++;
+	zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct blk_zone));
+	if (!zinfo->zones) {
+		error("zoned: no memory for zone information");
+		exit(1);
+	}
+
+	/* Allocate a zone report */
+	rep_size = sizeof(struct blk_zone_report) +
+		sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES;
+	rep = malloc(rep_size);
+	if (!rep) {
+		error("zoned: no memory for zones report");
+		exit(1);
+	}
+
+	/* Get zone information */
+	zone = (struct blk_zone *)(rep + 1);
+	while (n < zinfo->nr_zones) {
+		memset(rep, 0, rep_size);
+		rep->sector = sector;
+		rep->nr_zones = BTRFS_REPORT_NR_ZONES;
+
+		ret = ioctl(fd, BLKREPORTZONE, rep);
+		if (ret != 0) {
+			error("zoned: ioctl BLKREPORTZONE failed (%m)");
+			exit(1);
+		}
+
+		if (!rep->nr_zones)
+			break;
+
+		for (i = 0; i < rep->nr_zones; i++) {
+			if (n >= zinfo->nr_zones)
+				break;
+			memcpy(&zinfo->zones[n], &zone[i],
+			       sizeof(struct blk_zone));
+			n++;
+		}
+
+		sector = zone[rep->nr_zones - 1].start +
+			 zone[rep->nr_zones - 1].len;
+	}
+
+	free(rep);
+
+	return 0;
+}
+
+#endif
+
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int ret = 0;
+
+	/* fs_info->zone_size may not be set yet. Use the incompat flag here. */
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		/* We can skip reading of zone info for missing devices */
+		if (device->fd == -1)
+			continue;
+
+		ret = btrfs_get_dev_zone_info(device);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static int btrfs_get_dev_zone_info(struct btrfs_device *device)
+{
+	struct btrfs_fs_info *fs_info = device->fs_info;
+
+	/*
+	 * Cannot use btrfs_is_zoned here, since fs_info::zone_size might not
+	 * yet be set.
+	 */
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	if (device->zone_info)
+		return 0;
+
+	return btrfs_get_zone_info(device->fd, device->name,
+				   &device->zone_info);
+}
+
+int btrfs_get_zone_info(int fd, const char *file,
+			struct btrfs_zoned_device_info **zinfo_ret)
+{
+#ifdef BTRFS_ZONED
+	struct btrfs_zoned_device_info *zinfo;
+	int ret;
+#endif
+	enum btrfs_zoned_model model;
+
+	*zinfo_ret = NULL;
+
+	/* Check zone model */
+	model = zoned_model(file);
+	if (model == ZONED_NONE)
+		return 0;
+
+#ifdef BTRFS_ZONED
+	zinfo = calloc(1, sizeof(*zinfo));
+	if (!zinfo) {
+		error("zoned: no memory for zone information");
+		exit(1);
+	}
+
+	zinfo->model = model;
+
+	/* Get zone information */
+	ret = report_zones(fd, file, zinfo);
+	if (ret != 0) {
+		kfree(zinfo);
+		return ret;
+	}
+	*zinfo_ret = zinfo;
+#else
+	error("zoned: %s: Unsupported host-%s zoned block device", file,
+	      model == ZONED_HOST_MANAGED ? "managed" : "aware");
+	if (model == ZONED_HOST_MANAGED)
+		return -EOPNOTSUPP;
+
+	error("zoned: %s: handling host-aware block device as a regular disk",
+	      file);
+#endif
+
+	return 0;
+}
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
new file mode 100644
index 000000000000..461a2d624c67
--- /dev/null
+++ b/kernel-shared/zoned.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __BTRFS_ZONED_H__
+#define __BTRFS_ZONED_H__
+
+#include <stdbool.h>
+#include "kerncompat.h"
+
+#ifdef BTRFS_ZONED
+#include <linux/blkzoned.h>
+#else
+struct blk_zone {
+	int dummy;
+};
+#endif /* BTRFS_ZONED */
+
+/*
+ * Zoned block device models.
+ */
+enum btrfs_zoned_model {
+	ZONED_NONE = 0,
+	ZONED_HOST_AWARE,
+	ZONED_HOST_MANAGED,
+};
+
+/*
+ * Zone information for a zoned block device.
+ */
+struct btrfs_zoned_device_info {
+	enum btrfs_zoned_model	model;
+	u64			zone_size;
+	u32			nr_zones;
+	struct blk_zone		*zones;
+};
+
+enum btrfs_zoned_model zoned_model(const char *file);
+u64 zone_size(const char *file);
+int btrfs_get_zone_info(int fd, const char *file,
+			struct btrfs_zoned_device_info **zinfo);
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
+
+#endif /* __BTRFS_ZONED_H__ */
-- 
2.31.1
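
Editor's sketch (not part of the patch): the zone geometry arithmetic in
zone_size() and report_zones() can be exercised without a zoned device.
The sketch converts the text of queue/chunk_sectors to bytes, checks the
power-of-2 requirement, and counts zones with a trailing partial zone
rounded up. The function names and the SKETCH_ prefix are hypothetical.
Unlike the patch, which calls exit(1) on an invalid zone size, the sketch
returns 0 so the behavior is easy to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_SECTOR_SHIFT	9	/* 512-byte sectors, as in the patch */

/*
 * Convert the text of queue/chunk_sectors into a zone size in bytes.
 * The text must be NUL-terminated.
 */
static uint64_t chunk_sectors_to_bytes(const char *text)
{
	assert(text != NULL);
	return strtoull(text, NULL, 10) << SKETCH_SECTOR_SHIFT;
}

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Number of zones covering device_size bytes; a trailing partial zone
 * counts as one more zone. Returns 0 if zone_bytes is not a power of 2.
 */
static uint64_t count_zones(uint64_t device_size, uint64_t zone_bytes)
{
	uint64_t n;

	if (!is_pow2(zone_bytes))
		return 0;
	n = device_size / zone_bytes;
	if (device_size & (zone_bytes - 1))
		n++;
	return n;
}
```

For example, a chunk_sectors value of 524288 corresponds to 256 MiB zones,
and a 1 GiB device with that zone size has four zones.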



* [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (4 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:48   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size Naohiro Aota
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Introduce btrfs_check_zoned_mode() to check whether the ZONED flag is
enabled on the file system and whether the file system consists of zoned
devices with equal zone size.
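
For illustration only (this is not part of the patch), the decision made by
the check can be sketched as below. The demo_* names and the simplified
device struct are hypothetical; the stripe-length alignment check done by
the real function is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device description; not the btrfs-progs struct. */
enum demo_model { DEMO_NONE, DEMO_HOST_AWARE, DEMO_HOST_MANAGED };

struct demo_dev {
	enum demo_model model;
	uint64_t zone_size;	/* bytes; emulated size for DEMO_NONE */
};

/*
 * Return 0 and store the common zone size in *zone_size_ret (0 means "not
 * zoned"), or -1 if the device/flag combination has to be rejected.
 */
static int demo_check_zoned_mode(const struct demo_dev *devs, size_t nr,
				 bool incompat_zoned, uint64_t *zone_size_ret)
{
	uint64_t zone_size = 0;
	size_t zoned = 0;
	size_t i;

	*zone_size_ret = 0;
	for (i = 0; i < nr; i++) {
		/* Host-managed is always zoned; others only with the flag. */
		if (devs[i].model != DEMO_HOST_MANAGED && !incompat_zoned)
			continue;
		zoned++;
		if (!zone_size)
			zone_size = devs[i].zone_size;
		else if (devs[i].zone_size != zone_size)
			return -1;	/* unequal zone sizes */
	}

	if (!zoned && !incompat_zoned)
		return 0;		/* regular filesystem */
	if (!zoned || !incompat_zoned)
		return -1;		/* flag and devices disagree */
	if (zoned != nr)
		return -1;		/* mixed zoned and regular devices */

	*zone_size_ret = zone_size;
	return 0;
}
```

In this sketch, two host-managed devices with the same zone size are
accepted only when the ZONED flag is set, and the common zone size is what
ends up in fs_info->zone_size.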

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h   | 14 +++++++
 kernel-shared/disk-io.c |  6 +++
 kernel-shared/zoned.c   | 85 +++++++++++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h   |  1 +
 4 files changed, 106 insertions(+)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 77a5ad488104..aab631a44785 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -1213,8 +1213,22 @@ struct btrfs_fs_info {
 	u32 nodesize;
 	u32 sectorsize;
 	u32 stripesize;
+
+	/*
+	 * Zone size > 0 when in ZONED mode, otherwise it's used for a check
+	 * if the mode is enabled
+	 */
+	union {
+		u64 zone_size;
+		u64 zoned;
+	};
 };
 
+static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info)
+{
+	return fs_info->zoned != 0;
+}
+
 /*
  * in ram representation of the tree.  extent_root is used for all allocations
  * and for the extent tree extent_root root.
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 0519cb2358b5..4aba237f5a5c 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1326,6 +1326,12 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 		goto out_chunk;
 	}
 
+	ret = btrfs_check_zoned_mode(fs_info);
+	if (ret) {
+		error("zoned: failed to initialize zoned mode: %d", ret);
+		goto out_chunk;
+	}
+
 	eb = fs_info->chunk_root->node;
 	read_extent_buffer(eb, fs_info->chunk_tree_uuid,
 			   btrfs_header_chunk_tree_uuid(eb),
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 370d93915c6e..7cb5262ba481 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -240,3 +240,88 @@ int btrfs_get_zone_info(int fd, const char *file,
 
 	return 0;
 }
+
+int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 zoned_devices = 0;
+	u64 nr_devices = 0;
+	u64 zone_size = 0;
+	const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
+	int ret = 0;
+
+	/* Count zoned devices */
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		enum btrfs_zoned_model model;
+
+		if (device->fd == -1)
+			continue;
+
+		model = zoned_model(device->name);
+		/*
+		 * A Host-Managed zoned device must be used as a zoned device.
+		 * A Host-Aware zoned device and a non-zoned device can be
+		 * treated as a zoned device, if ZONED flag is enabled in the
+		 * superblock.
+		 */
+		if (model == ZONED_HOST_MANAGED ||
+		    (model == ZONED_HOST_AWARE && incompat_zoned) ||
+		    (model == ZONED_NONE && incompat_zoned)) {
+			struct btrfs_zoned_device_info *zone_info =
+				device->zone_info;
+
+			zoned_devices++;
+			if (!zone_size) {
+				zone_size = zone_info->zone_size;
+			} else if (zone_info->zone_size != zone_size) {
+				error(
+		"zoned: unequal block device zone sizes: have %llu found %llu",
+				      device->zone_info->zone_size,
+				      zone_size);
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+		nr_devices++;
+	}
+
+	if (!zoned_devices && !incompat_zoned)
+		goto out;
+
+	if (!zoned_devices && incompat_zoned) {
+		/* No zoned block device found on ZONED filesystem */
+		error("zoned: no zoned devices found on a zoned filesystem");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (zoned_devices && !incompat_zoned) {
+		error("zoned: mode not enabled but zoned device found");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (zoned_devices != nr_devices) {
+		error("zoned: cannot mix zoned and regular devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * stripe_size is always aligned to BTRFS_STRIPE_LEN in
+	 * __btrfs_alloc_chunk(). Since we want stripe_len == zone_size,
+	 * check the alignment here.
+	 */
+	if (!IS_ALIGNED(zone_size, BTRFS_STRIPE_LEN)) {
+		error("zoned: zone size %llu not aligned to stripe %u",
+		      zone_size, BTRFS_STRIPE_LEN);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	fs_info->zone_size = zone_size;
+
+out:
+	return ret;
+}
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 461a2d624c67..a6134babdf41 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -38,5 +38,6 @@ u64 zone_size(const char *file);
 int btrfs_get_zone_info(int fd, const char *file,
 			struct btrfs_zoned_device_info **zinfo);
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
+int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
 
 #endif /* __BTRFS_ZONED_H__ */
-- 
2.31.1



* [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (5 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:51   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode Naohiro Aota
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

The zone append write command has a maximum I/O size that it accepts. This
is because a zone append write command cannot be split: we ask the device
to place the data into a specific target zone, and the device responds with
the actual written location of the data.

Introduce max_zone_append_size to zone_info and fs_info to track the
value, so we can limit all I/O to a zoned block device that we want to
write using the zone append command to the device's limits.

The zone append command is mandatory for zoned btrfs, so reject a zoned
device with max_zone_append_size == 0.
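
As a rough sketch (not the patch code itself), the filesystem-wide limit is
the smallest non-zero per-device limit. demo_fs_max_zone_append() is a
hypothetical helper name used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine per-device zone append limits (in bytes) into one
 * filesystem-wide limit: the smallest non-zero value, or 0 if no device
 * reports a limit.
 */
static uint64_t demo_fs_max_zone_append(const uint64_t *dev_limits, size_t nr)
{
	uint64_t limit = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t dev = dev_limits[i];

		if (dev == 0)
			continue;	/* no limit reported by this device */
		if (limit == 0 || dev < limit)
			limit = dev;
	}

	return limit;
}
```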

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h |  3 +++
 kernel-shared/zoned.c | 29 +++++++++++++++++++++++++++++
 kernel-shared/zoned.h |  1 +
 3 files changed, 33 insertions(+)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index aab631a44785..5023db474784 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -1222,6 +1222,9 @@ struct btrfs_fs_info {
 		u64 zone_size;
 		u64 zoned;
 	};
+
+	/* Max size to emit ZONE_APPEND write command */
+	u64 max_zone_append_size;
 };
 
 static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 7cb5262ba481..ee879a57b716 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -58,6 +58,18 @@ u64 zone_size(const char *file)
 	return strtoull((const char *)chunk, NULL, 10) << SECTOR_SHIFT;
 }
 
+u64 max_zone_append_size(const char *file)
+{
+	char chunk[32];
+	int ret;
+
+	ret = queue_param(file, "zone_append_max_bytes", chunk, sizeof(chunk));
+	if (ret <= 0)
+		return 0;
+
+	return strtoull((const char *)chunk, NULL, 10);
+}
+
 #ifdef BTRFS_ZONED
 static int report_zones(int fd, const char *file,
 			struct btrfs_zoned_device_info *zinfo)
@@ -102,9 +114,19 @@ static int report_zones(int fd, const char *file,
 
 	/* Allocate the zone information array */
 	zinfo->zone_size = zone_bytes;
+	zinfo->max_zone_append_size = max_zone_append_size(file);
 	zinfo->nr_zones = device_size / zone_bytes;
 	if (device_size & (zone_bytes - 1))
 		zinfo->nr_zones++;
+
+	if (zoned_model(file) != ZONED_NONE &&
+	    zinfo->max_zone_append_size == 0) {
+		error(
+		"zoned: zoned device %s does not support ZONE_APPEND command",
+		      file);
+		exit(1);
+	}
+
 	zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct blk_zone));
 	if (!zinfo->zones) {
 		error("zoned: no memory for zone information");
@@ -248,6 +270,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	u64 zoned_devices = 0;
 	u64 nr_devices = 0;
 	u64 zone_size = 0;
+	u64 max_zone_append_size = 0;
 	const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
 	int ret = 0;
 
@@ -282,6 +305,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 				ret = -EINVAL;
 				goto out;
 			}
+			if (!max_zone_append_size ||
+			    (zone_info->max_zone_append_size &&
+			     zone_info->max_zone_append_size < max_zone_append_size))
+				max_zone_append_size =
+					zone_info->max_zone_append_size;
 		}
 		nr_devices++;
 	}
@@ -321,6 +349,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	}
 
 	fs_info->zone_size = zone_size;
+	fs_info->max_zone_append_size = max_zone_append_size;
 
 out:
 	return ret;
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index a6134babdf41..fcf2ccf34f26 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -29,6 +29,7 @@ enum btrfs_zoned_model {
 struct btrfs_zoned_device_info {
 	enum btrfs_zoned_model	model;
 	u64			zone_size;
+	u64			max_zone_append_size;
 	u32			nr_zones;
 	struct blk_zone		*zones;
 };
-- 
2.31.1



* [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (6 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  7:56   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Naohiro Aota
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Placing both data and metadata in a block group is impossible in ZONED
mode. For data, we can allocate space and write it immediately after the
allocation. For metadata, however, we cannot do that, because the logical
addresses are recorded in other metadata buffers to build up the trees. As
a result, a data buffer can be placed after a metadata buffer that has not
been written yet. Writing out the data buffer would then break the
sequential write rule.

Check and disallow MIXED_BG with ZONED mode.
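
The problem can be illustrated with a toy model of a sequential write
required zone. This is only a sketch: demo_zone, demo_alloc() and
demo_write() are hypothetical names, and a real device tracks the write
pointer itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one sequential write required zone (offsets in bytes). */
struct demo_zone {
	uint64_t alloc_offset;	/* next unallocated byte */
	uint64_t wp;		/* write pointer: only position that may be written */
};

/* Reserve len bytes and return where they start. */
static uint64_t demo_alloc(struct demo_zone *z, uint64_t len)
{
	uint64_t pos = z->alloc_offset;

	z->alloc_offset += len;
	return pos;
}

/* A write is accepted only if it starts exactly at the write pointer. */
static bool demo_write(struct demo_zone *z, uint64_t pos, uint64_t len)
{
	if (pos != z->wp)
		return false;
	z->wp += len;
	return true;
}
```

If a metadata buffer is allocated first but written late, a data buffer
allocated right after it cannot be written immediately: its position is
ahead of the write pointer until the metadata buffer has been written.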

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index ee879a57b716..7b05fe6cc70f 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -348,6 +348,12 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	if (btrfs_fs_incompat(fs_info, MIXED_GROUPS)) {
+		error("zoned: mixed block groups not supported");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
 
-- 
2.31.1



* [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (7 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26 13:43   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode Naohiro Aota
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Run a zoned filesystem on non-zoned devices. This is done by "slicing up"
the block device into statically sized chunks and faking a conventional
zone on each of them. The emulated zone size is determined from the size of
a device extent.

This is mainly aimed at testing of zoned filesystems, i.e. the zoned chunk
allocator, on regular block devices.

Currently, we always use EMULATED_ZONE_SIZE (= 256MB) for the emulated zone
size. In the future, this will be configurable through a mkfs option.
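
A minimal sketch of the slicing, assuming 512-byte sectors and the fixed
256MB zone size. The demo_* names are hypothetical and only mirror what
emulate_report_zones() and report_zones() compute.

```c
#include <stdint.h>

#define DEMO_SECTOR_SHIFT	9
#define DEMO_EMULATED_ZONE_SIZE	(256ULL << 20)

/* Start and length of an emulated zone, in 512-byte sectors. */
struct demo_emulated_zone {
	uint64_t start;
	uint64_t len;
};

/* Number of zones needed to cover the device; a partial tail counts too. */
static uint64_t demo_nr_zones(uint64_t device_bytes)
{
	uint64_t nr = device_bytes / DEMO_EMULATED_ZONE_SIZE;

	if (device_bytes % DEMO_EMULATED_ZONE_SIZE)
		nr++;
	return nr;
}

/* Describe the index-th emulated conventional zone. */
static struct demo_emulated_zone demo_emulated_zone_at(uint64_t index)
{
	const uint64_t zone_sectors =
		DEMO_EMULATED_ZONE_SIZE >> DEMO_SECTOR_SHIFT;
	struct demo_emulated_zone z = {
		.start = index * zone_sectors,
		.len = zone_sectors,
	};

	return z;
}
```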

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kerncompat.h          |  1 +
 kernel-shared/zoned.c | 67 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/kerncompat.h b/kerncompat.h
index a39b79cba767..b2983ed60c4a 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -166,6 +166,7 @@ typedef long long s64;
 typedef int s32;
 #endif
 
+typedef u64 sector_t;
 
 struct vma_shared { int prio_tree_node; };
 struct vm_area_struct {
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 7b05fe6cc70f..ebaa2a81b2c8 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -14,6 +14,8 @@
 /* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */
 #define BTRFS_REPORT_NR_ZONES   4096
 
+#define EMULATED_ZONE_SIZE SZ_256M
+
 static int btrfs_get_dev_zone_info(struct btrfs_device *device);
 
 enum btrfs_zoned_model zoned_model(const char *file)
@@ -51,6 +53,10 @@ u64 zone_size(const char *file)
 	char chunk[32];
 	int ret;
 
+	/* zoned emulation on regular device */
+	if (zoned_model(file) == ZONED_NONE)
+		return EMULATED_ZONE_SIZE;
+
 	ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk));
 	if (ret <= 0)
 		return 0;
@@ -71,6 +77,46 @@ u64 max_zone_append_size(const char *file)
 }
 
 #ifdef BTRFS_ZONED
+/*
+ * Emulate blkdev_report_zones() for a non-zoned device. It slices up the block
+ * device into static sized chunks and fakes a conventional zone on each of
+ * them.
+ */
+static int emulate_report_zones(const char *file, int fd, u64 pos,
+				struct blk_zone *zones, unsigned int nr_zones)
+{
+	const sector_t zone_sectors = EMULATED_ZONE_SIZE >> SECTOR_SHIFT;
+	struct stat st;
+	sector_t bdev_size;
+	unsigned int i;
+	int ret;
+
+	ret = fstat(fd, &st);
+	if (ret < 0) {
+		error("unable to stat %s: %m", file);
+		return -EIO;
+	}
+
+	bdev_size = btrfs_device_size(fd, &st) >> SECTOR_SHIFT;
+
+	pos >>= SECTOR_SHIFT;
+	for (i = 0; i < nr_zones; i++) {
+		zones[i].start = i * zone_sectors + pos;
+		zones[i].len = zone_sectors;
+		zones[i].capacity = zone_sectors;
+		zones[i].wp = zones[i].start + zone_sectors;
+		zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL;
+		zones[i].cond = BLK_ZONE_COND_NOT_WP;
+
+		if (zones[i].wp >= bdev_size) {
+			i++;
+			break;
+		}
+	}
+
+	return i;
+}
+
 static int report_zones(int fd, const char *file,
 			struct btrfs_zoned_device_info *zinfo)
 {
@@ -149,12 +195,23 @@ static int report_zones(int fd, const char *file,
 		rep->sector = sector;
 		rep->nr_zones = BTRFS_REPORT_NR_ZONES;
 
-		ret = ioctl(fd, BLKREPORTZONE, rep);
-		if (ret != 0) {
-			error("zoned: ioctl BLKREPORTZONE failed (%m)");
-			exit(1);
+		if (zinfo->model != ZONED_NONE) {
+			ret = ioctl(fd, BLKREPORTZONE, rep);
+			if (ret != 0) {
+				error("zoned: ioctl BLKREPORTZONE failed (%m)");
+				exit(1);
+			}
+		} else {
+			ret = emulate_report_zones(file, fd,
+						   sector << SECTOR_SHIFT,
+						   zone, BTRFS_REPORT_NR_ZONES);
+			if (ret < 0) {
+				error("zoned: failed to emulate BLKREPORTZONE");
+				exit(1);
+			}
 		}
 
+
 		if (!rep->nr_zones)
 			break;
 
@@ -231,8 +288,6 @@ int btrfs_get_zone_info(int fd, const char *file,
 
 	/* Check zone model */
 	model = zoned_model(file);
-	if (model == ZONED_NONE)
-		return 0;
 
 #ifdef BTRFS_ZONED
 	zinfo = calloc(1, sizeof(*zinfo));
-- 
2.31.1



* [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (8 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26 16:04   ` Johannes Thumshirn
  2021-04-26  6:27 ` [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator Naohiro Aota
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

The superblock (and its copies) is the only data structure in btrfs which
has a fixed location on a device. Since we cannot overwrite data in a
sequential write required zone, we cannot place the superblock in such a
zone. One easy solution is to limit the superblock and its copies to
conventional zones. However, this method has two downsides. The first is a
reduced number of superblock copies: the second copy of the superblock is
located at 256GB, which is in a sequential write required zone on typical
devices on the market today, so the number of superblocks (primary plus
copies) is limited to two. The second downside is that we cannot support
devices which have no conventional zones at all.

To solve these two problems, we employ superblock log writing. It uses two
adjacent zones as a circular buffer to write updated superblocks. Once the
first zone is filled up, writing continues in the second one. Then, when
both zones are filled up and before starting to write to the first zone
again, the first zone is reset.

We can determine the position of the latest superblock by reading write
pointer information from a device. One corner case is when both zones are
full. For this situation, we read out the last superblock of each zone, and
compare them to determine which zone is older.
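
The selection of the zone that receives the next superblock can be sketched
as follows. This is an illustration of the rules in sb_write_pointer() with
hypothetical demo_* names; gen0 and gen1 stand for the generation of the
last superblock in each zone and only matter when both zones are full.

```c
#include <stdint.h>

enum demo_cond { DEMO_EMPTY, DEMO_IN_USE, DEMO_FULL };

/*
 * Return the index (0 or 1) of the zone that takes the next superblock,
 * or -1 for an invalid combination of zone states. When both zones are
 * empty, zone 0 is returned and there is no superblock to read yet.
 */
static int demo_next_log_zone(enum demo_cond c0, enum demo_cond c1,
			      uint64_t gen0, uint64_t gen1)
{
	if (c0 == DEMO_EMPTY && c1 == DEMO_EMPTY)
		return 0;
	if (c0 == DEMO_FULL && c1 == DEMO_FULL)
		/* Overwrite (after a reset) the zone holding the older copy. */
		return gen0 > gen1 ? 1 : 0;
	if (c0 != DEMO_FULL && c1 != DEMO_IN_USE)
		return 0;	/* zone 0 still has room */
	if (c0 == DEMO_FULL)
		return 1;	/* zone 0 is full, continue in zone 1 */
	return -1;		/* zone 1 in use while zone 0 is not full */
}
```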

The following zones are reserved as the circular buffer on ZONED btrfs.

- primary superblock: offset   0B (and the following zone)
- first copy:         offset 512G (and the following zone)
- second copy:        offset   4T (4096G, and the following zone)
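
As an example of what these offsets mean in zone numbers, the first zone of
each pair is the offset divided by the zone size. The sketch below uses
hypothetical demo_* helpers and a plain division instead of the shift
arithmetic in sb_zone_number().

```c
#include <stdint.h>

/* Byte offset of the first log zone for a mirror; UINT64_MAX if invalid. */
static uint64_t demo_sb_log_offset(int mirror)
{
	switch (mirror) {
	case 0: return 0;		/* primary superblock */
	case 1: return 512ULL << 30;	/* first copy: 512G */
	case 2: return 4096ULL << 30;	/* second copy: 4T */
	}
	return UINT64_MAX;
}

/* First zone number of the log zone pair for a mirror. */
static uint64_t demo_sb_log_zone(int mirror, uint64_t zone_size)
{
	return demo_sb_log_offset(mirror) / zone_size;
}
```

With a 256MB zone size, this gives zones 0-1, 2048-2049 and 16384-16385
for the three zone pairs.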

If these reserved zones are conventional, the superblock is written at a
fixed position at the start of the zone, without logging.

Currently, superblock reading/writing is done by pread/pwrite. This commit
replaces the call sites with sbread/sbwrite, which wrap those functions.
For zoned btrfs, btrfs_sb_io(), which is called from sbread/sbwrite,
reverses the I/O position back to a mirror number, maps the mirror number
to the superblock logging position, and does the I/O.
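
The reverse mapping from an I/O position to a mirror number can be sketched
as below. This assumes the usual btrfs superblock offsets (64K for the
primary, then 64M and 256G for the copies); demo_sb_offset() is a
hypothetical stand-in for btrfs_sb_offset().

```c
#include <stdint.h>

#define DEMO_SUPER_MIRROR_MAX	3

/* Assumed regular (non-zoned) superblock offsets: 64K, 64M, 256G. */
static uint64_t demo_sb_offset(int mirror)
{
	if (mirror)
		return (16ULL << 10) << (12 * mirror);
	return 64ULL << 10;
}

/* Map a superblock I/O offset back to its mirror number, or -1. */
static int demo_offset_to_mirror(uint64_t offset)
{
	int i;

	for (i = 0; i < DEMO_SUPER_MIRROR_MAX; i++) {
		if (offset == demo_sb_offset(i))
			return i;
	}
	return -1;
}
```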

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/inspect-dump-super.c |   3 +-
 common/device-scan.c      |   4 +-
 kerncompat.h              |  16 +++
 kernel-shared/disk-io.c   |  13 +-
 kernel-shared/zoned.c     | 280 ++++++++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h     |  29 ++++
 6 files changed, 335 insertions(+), 10 deletions(-)

diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c
index f8d1506a6afd..04e81d8c3b60 100644
--- a/cmds/inspect-dump-super.c
+++ b/cmds/inspect-dump-super.c
@@ -25,6 +25,7 @@
 #include "kernel-shared/ctree.h"
 #include "kernel-shared/disk-io.h"
 #include "kernel-shared/print-tree.h"
+#include "kernel-shared/zoned.h"
 #include "common/utils.h"
 #include "cmds/commands.h"
 #include "common/help.h"
@@ -38,7 +39,7 @@ static int load_and_dump_sb(char *filename, int fd, u64 sb_bytenr, int full,
 
 	sb = (struct btrfs_super_block *)super_block_data;
 
-	ret = pread64(fd, super_block_data, BTRFS_SUPER_INFO_SIZE, sb_bytenr);
+	ret = sbread(fd, super_block_data, sb_bytenr);
 	if (ret != BTRFS_SUPER_INFO_SIZE) {
 		/* check if the disk if too short for further superblock */
 		if (ret == 0 && errno == 0)
diff --git a/common/device-scan.c b/common/device-scan.c
index 74d7853afccb..659f48c4dedb 100644
--- a/common/device-scan.c
+++ b/common/device-scan.c
@@ -190,7 +190,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
 	btrfs_set_stack_device_bytes_used(dev_item, device->bytes_used);
 	memcpy(&dev_item->uuid, device->uuid, BTRFS_UUID_SIZE);
 
-	ret = pwrite(fd, buf, sectorsize, BTRFS_SUPER_INFO_OFFSET);
+	ret = sbwrite(fd, buf, BTRFS_SUPER_INFO_OFFSET);
 	BUG_ON(ret != sectorsize);
 
 	free(buf);
@@ -267,7 +267,7 @@ int btrfs_device_already_in_root(struct btrfs_root *root, int fd,
 		ret = -ENOMEM;
 		goto out;
 	}
-	ret = pread(fd, buf, BTRFS_SUPER_INFO_SIZE, super_offset);
+	ret = sbread(fd, buf, super_offset);
 	if (ret != BTRFS_SUPER_INFO_SIZE)
 		goto brelse;
 
diff --git a/kerncompat.h b/kerncompat.h
index b2983ed60c4a..d37edfe7fdac 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -364,6 +364,19 @@ static inline int is_power_of_2(unsigned long n)
 	return (n != 0 && ((n & (n - 1)) == 0));
 }
 
+static inline int ilog2(u64 num)
+{
+	int l = 0;
+
+	num >>= 1;
+	while (num) {
+		l++;
+		num >>= 1;
+	}
+
+	return l;
+}
+
 typedef u16 __bitwise __le16;
 typedef u16 __bitwise __be16;
 typedef u32 __bitwise __le32;
@@ -371,6 +384,9 @@ typedef u32 __bitwise __be32;
 typedef u64 __bitwise __le64;
 typedef u64 __bitwise __be64;
 
+#define U64_MAX UINT64_MAX
+#define U32_MAX UINT32_MAX
+
 /* Macros to generate set/get funcs for the struct fields
  * assume there is a lefoo_to_cpu for every type, so lets make a simple
  * one for u8:
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 4aba237f5a5c..d79d6a00cdf8 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1615,7 +1615,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 	u64 bytenr;
 
 	if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
-		ret = pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, sb_bytenr);
+		ret = sbread(fd, buf, sb_bytenr);
 		/* real error */
 		if (ret < 0)
 			return -errno;
@@ -1643,7 +1643,8 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 
 	for (i = 0; i < max_super; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, bytenr);
+		ret = sbread(fd, buf, bytenr);
+
 		if (ret < BTRFS_SUPER_INFO_SIZE)
 			break;
 
@@ -1715,9 +1716,8 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 		 * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is
 		 * zero filled, we can use it directly
 		 */
-		ret = pwrite64(device->fd, fs_info->super_copy,
-				BTRFS_SUPER_INFO_SIZE,
-				fs_info->super_bytenr);
+		ret = sbwrite(device->fd, fs_info->super_copy,
+			      fs_info->super_bytenr);
 		if (ret != BTRFS_SUPER_INFO_SIZE) {
 			errno = EIO;
 			error(
@@ -1750,8 +1750,7 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 		 * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is
 		 * zero filled, we can use it directly
 		 */
-		ret = pwrite64(device->fd, fs_info->super_copy,
-				BTRFS_SUPER_INFO_SIZE, bytenr);
+		ret = sbwrite(device->fd, fs_info->super_copy, bytenr);
 		if (ret != BTRFS_SUPER_INFO_SIZE) {
 			errno = EIO;
 			error(
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index ebaa2a81b2c8..1b235dc0a1c9 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -2,6 +2,7 @@
 
 #include <sys/ioctl.h>
 #include <linux/fs.h>
+#include <unistd.h>
 
 #include "kernel-lib/list.h"
 #include "kernel-shared/volumes.h"
@@ -14,6 +15,20 @@
 /* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */
 #define BTRFS_REPORT_NR_ZONES   4096
 
+/*
+ * Location of the first zone of superblock logging zone pairs.
+ *
+ * - primary superblock:    0B (zone 0)
+ * - first copy:          512G (zone starting at that offset)
+ * - second copy:           4T (zone starting at that offset)
+ */
+#define BTRFS_SB_LOG_PRIMARY_OFFSET	(0ULL)
+#define BTRFS_SB_LOG_FIRST_OFFSET	(512ULL * SZ_1G)
+#define BTRFS_SB_LOG_SECOND_OFFSET	(4096ULL * SZ_1G)
+
+#define BTRFS_SB_LOG_FIRST_SHIFT	ilog2(BTRFS_SB_LOG_FIRST_OFFSET)
+#define BTRFS_SB_LOG_SECOND_SHIFT	ilog2(BTRFS_SB_LOG_SECOND_OFFSET)
+
 #define EMULATED_ZONE_SIZE SZ_256M
 
 static int btrfs_get_dev_zone_info(struct btrfs_device *device);
@@ -117,6 +132,116 @@ static int emulate_report_zones(const char *file, int fd, u64 pos,
 	return i;
 }
 
+static int sb_write_pointer(int fd, struct blk_zone *zones, u64 *wp_ret)
+{
+	bool empty[BTRFS_NR_SB_LOG_ZONES];
+	bool full[BTRFS_NR_SB_LOG_ZONES];
+	sector_t sector;
+
+	ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL &&
+	       zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL);
+
+	empty[0] = zones[0].cond == BLK_ZONE_COND_EMPTY;
+	empty[1] = zones[1].cond == BLK_ZONE_COND_EMPTY;
+	full[0] = zones[0].cond == BLK_ZONE_COND_FULL;
+	full[1] = zones[1].cond == BLK_ZONE_COND_FULL;
+
+	/*
+	 * Possible states of log buffer zones
+	 *
+	 *           Empty[0]  In use[0]  Full[0]
+	 * Empty[1]         *          0        1
+	 * In use[1]        x          x        1
+	 * Full[1]          0          0        C
+	 *
+	 * Log position:
+	 *   *: Special case, no superblock is written
+	 *   0: Use write pointer of zones[0]
+	 *   1: Use write pointer of zones[1]
+	 *   C: Compare super blocks from zones[0] and zones[1], use the latest
+	 *      one determined by generation
+	 *   x: Invalid state
+	 */
+
+	if (empty[0] && empty[1]) {
+		/* Special case to distinguish no superblock to read */
+		*wp_ret = zones[0].start << SECTOR_SHIFT;
+		return -ENOENT;
+	} else if (full[0] && full[1]) {
+		/* Compare two super blocks */
+		u8 buf[BTRFS_NR_SB_LOG_ZONES][BTRFS_SUPER_INFO_SIZE];
+		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
+		int i;
+		int ret;
+
+		for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+			u64 bytenr;
+
+			bytenr = ((zones[i].start + zones[i].len)
+				   << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE;
+
+			ret = pread64(fd, buf[i], BTRFS_SUPER_INFO_SIZE,
+				      bytenr);
+			if (ret != BTRFS_SUPER_INFO_SIZE)
+				return -EIO;
+			super[i] = (struct btrfs_super_block *)&buf[i];
+		}
+
+		if (super[0]->generation > super[1]->generation)
+			sector = zones[1].start;
+		else
+			sector = zones[0].start;
+	} else if (!full[0] && (empty[1] || full[1])) {
+		sector = zones[0].wp;
+	} else if (full[0]) {
+		sector = zones[1].wp;
+	} else {
+		return -EUCLEAN;
+	}
+	*wp_ret = sector << SECTOR_SHIFT;
+	return 0;
+}
+
+/*
+ * Get the first zone number of the superblock mirror
+ */
+static inline u32 sb_zone_number(int shift, int mirror)
+{
+	u64 zone = 0;
+
+	ASSERT(0 <= mirror && mirror < BTRFS_SUPER_MIRROR_MAX);
+	switch (mirror) {
+	case 0: zone = 0; break;
+	case 1: zone = 1ULL << (BTRFS_SB_LOG_FIRST_SHIFT - shift); break;
+	case 2: zone = 1ULL << (BTRFS_SB_LOG_SECOND_SHIFT - shift); break;
+	}
+
+	ASSERT(zone <= U32_MAX);
+
+	return (u32)zone;
+}
+
+int btrfs_reset_dev_zone(int fd, struct blk_zone *zone)
+{
+	struct blk_zone_range range;
+
+	/* Nothing to do if it is already empty */
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL ||
+	    zone->cond == BLK_ZONE_COND_EMPTY)
+		return 0;
+
+	range.sector = zone->start;
+	range.nr_sectors = zone->len;
+
+	if (ioctl(fd, BLKRESETZONE, &range) < 0)
+		return -errno;
+
+	zone->cond = BLK_ZONE_COND_EMPTY;
+	zone->wp = zone->start;
+
+	return 0;
+}
+
 static int report_zones(int fd, const char *file,
 			struct btrfs_zoned_device_info *zinfo)
 {
@@ -232,6 +357,161 @@ static int report_zones(int fd, const char *file,
 	return 0;
 }
 
+static int sb_log_location(int fd, struct blk_zone *zones, int rw,
+			   u64 *bytenr_ret)
+{
+	u64 wp;
+	int ret;
+
+	/* Use the head of the zones if either zone is conventional */
+	if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		*bytenr_ret = zones[0].start << SECTOR_SHIFT;
+		return 0;
+	} else if (zones[1].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		*bytenr_ret = zones[1].start << SECTOR_SHIFT;
+		return 0;
+	}
+
+	ret = sb_write_pointer(fd, zones, &wp);
+	if (ret != -ENOENT && ret < 0)
+		return ret;
+
+	if (rw == WRITE) {
+		struct blk_zone *reset = NULL;
+
+		if (wp == zones[0].start << SECTOR_SHIFT)
+			reset = &zones[0];
+		else if (wp == zones[1].start << SECTOR_SHIFT)
+			reset = &zones[1];
+
+		if (reset && reset->cond != BLK_ZONE_COND_EMPTY) {
+			ASSERT(reset->cond == BLK_ZONE_COND_FULL);
+
+			ret = btrfs_reset_dev_zone(fd, reset);
+			if (ret)
+				return ret;
+		}
+	} else if (ret != -ENOENT) {
+		/* For READ, we want the previous one */
+		if (wp == zones[0].start << SECTOR_SHIFT)
+			wp = (zones[1].start + zones[1].len) << SECTOR_SHIFT;
+		wp -= BTRFS_SUPER_INFO_SIZE;
+	}
+
+	*bytenr_ret = wp;
+	return 0;
+
+}
+
+size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw)
+{
+	size_t count = BTRFS_SUPER_INFO_SIZE;
+	struct stat stat_buf;
+	struct blk_zone_report *rep;
+	struct blk_zone *zones;
+	const u64 sb_size_sector = BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT;
+	u64 mapped = U64_MAX;
+	u32 zone_num;
+	unsigned int zone_size_sector;
+	size_t rep_size;
+	int mirror = -1;
+	int i;
+	int ret;
+	size_t ret_sz;
+
+	ASSERT(rw == READ || rw == WRITE);
+
+	if (fstat(fd, &stat_buf) == -1) {
+		error("fstat failed (%s)", strerror(errno));
+		exit(1);
+	}
+
+	/* Do not call ioctl(BLKGETZONESZ) on a regular file. */
+	if ((stat_buf.st_mode & S_IFMT) == S_IFBLK) {
+		ret = ioctl(fd, BLKGETZONESZ, &zone_size_sector);
+		if (ret) {
+			error("zoned: ioctl BLKGETZONESZ failed (%m)");
+			exit(1);
+		}
+	} else {
+		zone_size_sector = 0;
+	}
+
+	/* We can call pread/pwrite if 'fd' is non-zoned device/file. */
+	if (zone_size_sector == 0) {
+		if (rw == READ)
+			return pread64(fd, buf, count, offset);
+		return pwrite64(fd, buf, count, offset);
+	}
+
+	ASSERT(IS_ALIGNED(zone_size_sector, sb_size_sector));
+
+	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+		if (offset == btrfs_sb_offset(i)) {
+			mirror = i;
+			break;
+		}
+	}
+	ASSERT(mirror != -1);
+
+	zone_num = sb_zone_number(ilog2(zone_size_sector) + SECTOR_SHIFT,
+				  mirror);
+
+	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2;
+	rep = calloc(1, rep_size);
+	if (!rep) {
+		error("zoned: no memory for zones report");
+		exit(1);
+	}
+
+	rep->sector = zone_num * (sector_t)zone_size_sector;
+	rep->nr_zones = 2;
+
+	ret = ioctl(fd, BLKREPORTZONE, rep);
+	if (ret) {
+		error("zoned: ioctl BLKREPORTZONE failed (%m)");
+		exit(1);
+	}
+	if (rep->nr_zones != 2) {
+		if (errno == ENOENT || errno == 0)
+			return (rw == WRITE ? count : 0);
+		error("zoned: failed to read zone info of %u and %u: %m",
+		      zone_num, zone_num + 1);
+		free(rep);
+		return 0;
+	}
+
+	zones = (struct blk_zone *)(rep + 1);
+
+	ret = sb_log_location(fd, zones, rw, &mapped);
+	/*
+	 * Special case: no superblock found in the zones. This case happens
+	 * when initializing a file-system.
+	 */
+	if (rw == READ && ret == -ENOENT) {
+		memset(buf, 0, count);
+		return count;
+	}
+	if (ret)
+		return ret;
+
+	if (rw == READ)
+		ret_sz = pread64(fd, buf, count, mapped);
+	else
+		ret_sz = pwrite64(fd, buf, count, mapped);
+
+	if (ret_sz != count)
+		return ret_sz;
+
+	/* Call fsync() to force the write order */
+	if (rw == WRITE && fsync(fd)) {
+		error("failed to synchronize superblock: %s", strerror(errno));
+		exit(1);
+	}
+
+	return ret_sz;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index fcf2ccf34f26..82e3096eab8a 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -3,9 +3,14 @@
 #ifndef __BTRFS_ZONED_H__
 #define __BTRFS_ZONED_H__
 
+#include "kernel-shared/disk-io.h"
+
 #include <stdbool.h>
 #include "kerncompat.h"
 
+/* Number of superblock log zones */
+#define BTRFS_NR_SB_LOG_ZONES 2
+
 #ifdef BTRFS_ZONED
 #include <linux/blkzoned.h>
 #else
@@ -41,4 +46,28 @@ int btrfs_get_zone_info(int fd, const char *file,
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
 
+#ifdef BTRFS_ZONED
+size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw);
+static inline size_t sbread(int fd, void *buf, off_t offset)
+{
+	return btrfs_sb_io(fd, buf, offset, READ);
+}
+static inline size_t sbwrite(int fd, void *buf, off_t offset)
+{
+	return btrfs_sb_io(fd, buf, offset, WRITE);
+}
+int btrfs_reset_dev_zone(int fd, struct blk_zone *zone);
+#else
+#define sbread(fd, buf, offset) \
+	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
+#define sbwrite(fd, buf, offset) \
+	pwrite64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
+
+static inline int btrfs_reset_dev_zone(int fd, struct blk_zone *zone)
+{
+	return 0;
+}
+
+#endif /* BTRFS_ZONED */
+
 #endif /* __BTRFS_ZONED_H__ */
-- 
2.31.1



* [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (9 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-27 17:19   ` David Sterba
  2021-04-26  6:27 ` [PATCH 12/26] btrfs-progs: zoned: load zone's allocation offset Naohiro Aota
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Implement a zoned chunk and device extent allocator. One device zone
becomes a device extent so that a zone reset affects only this device
extent and does not change the state of blocks in the neighbor device
extents.

To implement the allocator, we need to extend the following functions for
a zoned filesystem.

- init_alloc_chunk_ctl
- dev_extent_search_start
- dev_extent_hole_check
- decide_stripe_size

Here, dev_extent_hole_check() is newly introduced to check the validity of
a hole found.

init_alloc_chunk_ctl_policy_zoned() is mostly the same as the regular one.
It always sets the stripe_size to the zone size and aligns the parameters to
the zone size.

dev_extent_search_start() only aligns the start offset to zone boundaries.
Unlike in a regular filesystem, we do not need to skip the first 1MB
explicitly, because the first two zones are reserved for superblock logging
anyway.
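
The alignment step can be sketched in isolation as follows. This is only an
illustration of the BTRFS_CHUNK_ALLOC_ZONED case above, not code from the
patch: align_up() and zoned_search_start() are hypothetical names, and the
zone size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical stand-in for the zoned search-start policy: never start
 * before the end of the first zone, and always start on a zone boundary.
 */
static uint64_t zoned_search_start(uint64_t start, uint64_t zone_size)
{
	uint64_t lowest = start > zone_size ? start : zone_size;

	return align_up(lowest, zone_size);
}
```

With a 256MiB zone, a requested start of 0 moves to the second zone, and an
unaligned start moves up to the next zone boundary.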

dev_extent_hole_check_zoned() checks if zones in given hole are either
conventional or empty sequential zones. Also, it skips zones reserved for
superblock logging.
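
The superblock-zone skip boils down to an interval overlap test on zone
numbers. A minimal sketch, assuming the zone number of a superblock log
(sb_zone) is already known; overlaps_sb_log_zones() and skip_sb_log_zones()
are hypothetical helpers, and NR_SB_LOG_ZONES mirrors BTRFS_NR_SB_LOG_ZONES:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SB_LOG_ZONES 2	/* mirrors BTRFS_NR_SB_LOG_ZONES in the patch */

/* True if zones [begin, end) overlap the log zones starting at sb_zone. */
static bool overlaps_sb_log_zones(uint64_t begin, uint64_t end,
				  uint64_t sb_zone)
{
	assert(begin <= end);
	return !(end <= sb_zone || sb_zone + NR_SB_LOG_ZONES <= begin);
}

/* First zone number past the superblock log zones, where a search resumes. */
static uint64_t skip_sb_log_zones(uint64_t sb_zone)
{
	return sb_zone + NR_SB_LOG_ZONES;
}
```

When a candidate range overlaps, the search position is moved to the zone
returned by skip_sb_log_zones() and the range is checked again.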

If the check moves the hole, the new hole may now contain pending extents.
In that case, loop again to check the new hole.

Finally, decide_stripe_size_zoned() should shrink the number of devices
instead of stripe size because we need to honor stripe_size == zone_size.
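
As a rough illustration of the arithmetic, the sketch below reduces the
stripe count while keeping the stripe size fixed at the zone size. It is not
the patch's code: zoned_num_stripes() is a hypothetical helper, and the
chunk size is assumed to be zone_size * (num_stripes - nparity) / ncopies,
which stands in for chunk_bytes_by_type().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Return the number of stripes to use for a zoned chunk, or -ENOSPC if the
 * size limit leaves fewer stripes than the profile requires.
 */
static int zoned_num_stripes(uint64_t zone_size, uint64_t max_chunk_size,
			     int num_stripes, int min_stripes,
			     int nparity, int ncopies)
{
	uint64_t data_stripes;

	assert(zone_size > 0 && ncopies > 0 && num_stripes >= nparity);

	/* Assumed chunk size model, see the lead-in above. */
	data_stripes = (uint64_t)(num_stripes - nparity) / ncopies;
	if (data_stripes * zone_size > max_chunk_size) {
		/* stripe_size is fixed to zone_size, so shrink num_stripes. */
		num_stripes = (int)(max_chunk_size * ncopies / zone_size);
		if (num_stripes < min_stripes)
			return -ENOSPC;
	}

	return num_stripes;
}
```

For example, with 256MiB zones and a 1GiB chunk limit, a single-profile
request for 8 stripes is cut down to 4 stripes.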

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kerncompat.h            |   2 +
 kernel-shared/volumes.c | 143 ++++++++++++++++++++++++++++++++++++++--
 kernel-shared/volumes.h |   1 +
 kernel-shared/zoned.c   | 139 ++++++++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h   |  51 ++++++++++++++
 5 files changed, 332 insertions(+), 4 deletions(-)

diff --git a/kerncompat.h b/kerncompat.h
index d37edfe7fdac..c58e8a27430f 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -28,6 +28,7 @@
 #include <assert.h>
 #include <stddef.h>
 #include <linux/types.h>
+#include <linux/kernel.h>
 #include <stdint.h>
 
 #include <features.h>
@@ -358,6 +359,7 @@ do {					\
 
 /* Alignment check */
 #define IS_ALIGNED(x, a)                (((x) & ((typeof(x))(a) - 1)) == 0)
+#define ALIGN(x, a)		__ALIGN_KERNEL((x), (a))
 
 static inline int is_power_of_2(unsigned long n)
 {
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index 63530a99b41c..ecfc63265f35 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -162,6 +162,8 @@ struct alloc_chunk_ctl {
 	u64 max_chunk_size;
 	int total_devs;
 	u64 dev_offset;
+	int nparity;
+	int ncopies;
 };
 
 struct stripe {
@@ -457,6 +459,8 @@ int btrfs_scan_one_device(int fd, const char *path,
 
 static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
 {
+	u64 zone_size;
+
 	switch (device->fs_devices->chunk_alloc_policy) {
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		/*
@@ -465,11 +469,72 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
 		 * make sure to start at an offset of at least 1MB.
 		 */
 		return max(start, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		zone_size = device->zone_info->zone_size;
+		return ALIGN(max_t(u64, start, zone_size), zone_size);
 	default:
 		BUG();
 	}
 }
 
+static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
+					u64 *hole_start, u64 *hole_size,
+					u64 num_bytes)
+{
+	u64 zone_size = device->zone_info->zone_size;
+	u64 pos;
+	bool changed = false;
+
+	ASSERT(IS_ALIGNED(*hole_start, zone_size));
+
+	while (*hole_size > 0) {
+		pos = btrfs_find_allocatable_zones(device, *hole_start,
+						   *hole_start + *hole_size,
+						   num_bytes);
+		if (pos != *hole_start) {
+			*hole_size = *hole_start + *hole_size - pos;
+			*hole_start = pos;
+			changed = true;
+			if (*hole_size < num_bytes)
+				break;
+		}
+
+		*hole_start += zone_size;
+		*hole_size -= zone_size;
+		changed = true;
+	}
+
+	return changed;
+}
+
+/**
+ * dev_extent_hole_check - check if specified hole is suitable for allocation
+ * @device:	the device which we have the hole
+ * @hole_start: starting position of the hole
+ * @hole_size:	the size of the hole
+ * @num_bytes:	the size of the free space that we need
+ *
+ * This function may modify @hole_start and @hole_size to reflect the suitable
+ * position for allocation. Returns true if hole position is updated, false
+ * otherwise.
+ */
+static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
+				  u64 *hole_size, u64 num_bytes)
+{
+	switch (device->fs_devices->chunk_alloc_policy) {
+	case BTRFS_CHUNK_ALLOC_REGULAR:
+		/* No check */
+		break;
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		return dev_extent_hole_check_zoned(device, hole_start,
+						   hole_size, num_bytes);
+	default:
+		BUG();
+	}
+
+	return false;
+}
+
 /*
  * find_free_dev_extent_start - find free space in the specified device
  * @device:	  the device which we search the free space in
@@ -507,6 +572,10 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	int ret;
 	int slot;
 	struct extent_buffer *l;
+	u64 zone_size = 0;
+
+	if (device->zone_info)
+		zone_size = device->zone_info->zone_size;
 
 	search_start = dev_extent_search_start(device, search_start);
 
@@ -517,6 +586,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	max_hole_start = search_start;
 	max_hole_size = 0;
 
+again:
 	if (search_start >= search_end) {
 		ret = -ENOSPC;
 		goto out;
@@ -562,11 +632,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 
 		if (key.offset > search_start) {
 			hole_size = key.offset - search_start;
+			dev_extent_hole_check(device, &search_start, &hole_size,
+					      num_bytes);
 
-			/*
-			 * Have to check before we set max_hole_start, otherwise
-			 * we could end up sending back this offset anyway.
-			 */
 			if (hole_size > max_hole_size) {
 				max_hole_start = search_start;
 				max_hole_size = hole_size;
@@ -603,6 +671,12 @@ next:
 	 * search_end may be smaller than search_start.
 	 */
 	if (search_end > search_start) {
+		if (dev_extent_hole_check(device, &search_start, &hole_size,
+					  num_bytes)) {
+			btrfs_release_path(path);
+			goto again;
+		}
+
 		hole_size = search_end - search_start;
 
 		if (hole_size > max_hole_size) {
@@ -618,6 +692,7 @@ next:
 		ret = 0;
 
 out:
+	ASSERT(zone_size == 0 || IS_ALIGNED(max_hole_start, zone_size));
 	btrfs_free_path(path);
 	*start = max_hole_start;
 	if (len)
@@ -646,6 +721,11 @@ int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans,
 	struct extent_buffer *leaf;
 	struct btrfs_key key;
 
+	/* Check alignment to zone for a zoned block device */
+	ASSERT(!device->zone_info ||
+	       device->zone_info->model != ZONED_HOST_MANAGED ||
+	       IS_ALIGNED(start, device->zone_info->zone_size));
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -1052,6 +1132,38 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info,
 	ctl->max_chunk_size = min(percent_max, ctl->max_chunk_size);
 }
 
+static void init_alloc_chunk_ctl_policy_zoned(struct btrfs_fs_info *info,
+					      struct alloc_chunk_ctl *ctl)
+{
+	u64 type = ctl->type;
+	u64 zone_size = info->zone_size;
+	int min_num_stripes = ctl->min_stripes * ctl->num_stripes;
+	int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies;
+	u64 min_chunk_size = min_data_stripes * zone_size;
+
+	ctl->stripe_size = zone_size;
+	ctl->min_stripe_size = zone_size;
+	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+			ctl->max_chunk_size = SZ_16M;
+			ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
+		} else if (type & BTRFS_BLOCK_GROUP_DATA) {
+			ctl->max_chunk_size = 10ULL * SZ_1G;
+			ctl->max_stripes = BTRFS_MAX_DEVS(info);
+		} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+			/* for larger filesystems, use larger metadata chunks */
+			if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
+				ctl->max_chunk_size = SZ_1G;
+			else
+				ctl->max_chunk_size = SZ_256M;
+			ctl->max_stripes = BTRFS_MAX_DEVS(info);
+		}
+	}
+
+	ctl->max_chunk_size = round_down(ctl->max_chunk_size, zone_size);
+	ctl->max_chunk_size = max(ctl->max_chunk_size, min_chunk_size);
+}
+
 static void init_alloc_chunk_ctl(struct btrfs_fs_info *info,
 				 struct alloc_chunk_ctl *ctl)
 {
@@ -1066,11 +1178,16 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_info *info,
 	ctl->max_chunk_size = 4 * ctl->stripe_size;
 	ctl->total_devs = btrfs_super_num_devices(info->super_copy);
 	ctl->dev_offset = 0;
+	ctl->nparity = btrfs_raid_array[type].nparity;
+	ctl->ncopies = btrfs_raid_array[type].ncopies;
 
 	switch (info->fs_devices->chunk_alloc_policy) {
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		init_alloc_chunk_ctl_policy_regular(info, ctl);
 		break;
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		init_alloc_chunk_ctl_policy_zoned(info, ctl);
+		break;
 	default:
 		BUG();
 	}
@@ -1113,12 +1230,27 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl)
 	return 0;
 }
 
+static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl)
+{
+	if (chunk_bytes_by_type(ctl) > ctl->max_chunk_size) {
+		/* stripe_size is fixed in ZONED. Reduce num_stripes instead. */
+		ctl->num_stripes = ctl->max_chunk_size * ctl->ncopies /
+			ctl->stripe_size;
+		if (ctl->num_stripes < ctl->min_stripes)
+			return -ENOSPC;
+	}
+
+	return 0;
+}
+
 static int decide_stripe_size(struct btrfs_fs_info *info,
 			      struct alloc_chunk_ctl *ctl)
 {
 	switch (info->fs_devices->chunk_alloc_policy) {
 	case BTRFS_CHUNK_ALLOC_REGULAR:
 		return decide_stripe_size_regular(ctl);
+	case BTRFS_CHUNK_ALLOC_ZONED:
+		return decide_stripe_size_zoned(ctl);
 	default:
 		BUG();
 	}
@@ -1140,6 +1272,7 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 	int index;
 	struct btrfs_key key;
 	u64 offset;
+	u64 zone_size = info->zone_size;
 
 	if (!ctl->start) {
 		ret = find_next_chunk(info, &offset);
@@ -1192,6 +1325,8 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 			BUG_ON(ret);
 		}
 
+		ASSERT(!zone_size || IS_ALIGNED(dev_offset, zone_size));
+
 		device->bytes_used += ctl->stripe_size;
 		ret = btrfs_update_device(trans, device);
 		if (ret < 0)
diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
index a64288d566d8..5a85a6c0bc6f 100644
--- a/kernel-shared/volumes.h
+++ b/kernel-shared/volumes.h
@@ -74,6 +74,7 @@ struct btrfs_device {
 
 enum btrfs_chunk_allocation_policy {
 	BTRFS_CHUNK_ALLOC_REGULAR,
+	BTRFS_CHUNK_ALLOC_ZONED,
 };
 
 struct btrfs_fs_devices {
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 1b235dc0a1c9..e828d633619a 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -512,6 +512,144 @@ size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw)
 	return ret_sz;
 }
 
+/*
+ * btrfs_check_allocatable_zones - check if specified region is
+ *                                 suitable for allocation
+ * @device:	the device to allocate a region
+ * @pos:	the position of the region
+ * @num_bytes:	the size of the region
+ *
+ * On a non-ZONED device, any region is suitable for allocation. On a ZONED
+ * device, check that
+ * 1) the region is not on non-empty sequential zones,
+ * 2) all zones in the region have the same zone type,
+ * 3) it does not contain super block location
+ */
+bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
+				   u64 num_bytes)
+{
+	struct btrfs_zoned_device_info *zinfo = device->zone_info;
+	u64 nzones, begin, end;
+	u64 sb_pos;
+	bool is_sequential;
+	int shift;
+	int i;
+
+	if (!zinfo || zinfo->model == ZONED_NONE)
+		return true;
+
+	nzones = num_bytes / zinfo->zone_size;
+	begin = pos / zinfo->zone_size;
+	end = begin + nzones;
+
+	ASSERT(IS_ALIGNED(pos, zinfo->zone_size));
+	ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size));
+
+	if (end > zinfo->nr_zones)
+		return false;
+
+	shift = ilog2(zinfo->zone_size);
+	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+		sb_pos = sb_zone_number(shift, i);
+		if (!(end < sb_pos || sb_pos + 1 < begin))
+			return false;
+	}
+
+	is_sequential = btrfs_dev_is_sequential(device, pos);
+
+	while (num_bytes) {
+		if (is_sequential && !btrfs_dev_is_empty_zone(device, pos))
+			return false;
+		if (is_sequential != btrfs_dev_is_sequential(device, pos))
+			return false;
+
+		pos += zinfo->zone_size;
+		num_bytes -= zinfo->zone_size;
+	}
+
+	return true;
+}
+
+/**
+ * btrfs_find_allocatable_zones - find allocatable zones within a given region
+ *
+ * @device:	the device to allocate a region on
+ * @hole_start: the position of the hole to allocate the region
+ * @hole_end:	the end of the hole
+ * @num_bytes:	size of wanted region
+ * @return:	position of allocatable zones
+ *
+ * Allocatable region should not contain any superblock locations.
+ */
+u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
+				 u64 hole_end, u64 num_bytes)
+{
+	struct btrfs_zoned_device_info *zinfo = device->zone_info;
+	int shift = ilog2(zinfo->zone_size);
+	u64 nzones = num_bytes >> shift;
+	u64 pos = hole_start;
+	u64 begin, end;
+	bool is_sequential;
+	bool have_sb;
+	int i;
+
+	ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size));
+	ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size));
+
+	while (pos < hole_end) {
+		begin = pos >> shift;
+		end = begin + nzones;
+
+		if (end > zinfo->nr_zones)
+			return hole_end;
+
+		/*
+		 * The zones must be all sequential (and empty), or
+		 * conventional zones
+		 */
+		is_sequential = btrfs_dev_is_sequential(device, pos);
+		for (i = 0; i < end - begin; i++) {
+			u64 zone_offset = pos + ((u64)i << shift);
+
+			if ((is_sequential &&
+			     !btrfs_dev_is_empty_zone(device, zone_offset)) ||
+			    (is_sequential !=
+			     btrfs_dev_is_sequential(device, zone_offset))) {
+				pos += zinfo->zone_size;
+				continue;
+			}
+		}
+
+		have_sb = false;
+		for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+			u32 sb_zone;
+			u64 sb_pos;
+
+			sb_zone = sb_zone_number(shift, i);
+			if (!(end <= sb_zone ||
+			      sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) {
+				have_sb = true;
+				pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift;
+				break;
+			}
+
+			/* We also need to exclude regular superblock positions */
+			sb_pos = btrfs_sb_offset(i);
+			if (!(pos + num_bytes <= sb_pos ||
+			      sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) {
+				have_sb = true;
+				pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE,
+					    zinfo->zone_size);
+				break;
+			}
+		}
+		if (!have_sb)
+			break;
+	}
+
+	return pos;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
@@ -691,6 +829,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
+	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
 
 out:
 	return ret;
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 82e3096eab8a..29c203f45ada 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -7,6 +7,7 @@
 
 #include <stdbool.h>
 #include "kerncompat.h"
+#include "kernel-shared/volumes.h"
 
 /* Number of superblock log zones */
 #define BTRFS_NR_SB_LOG_ZONES 2
@@ -56,7 +57,34 @@ static inline size_t sbwrite(int fd, void *buf, off_t offset)
 {
 	return btrfs_sb_io(fd, buf, offset, WRITE);
 }
+
+static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo,
+				      u64 bytenr)
+{
+	unsigned int zno;
+
+	if (!zinfo || zinfo->model == ZONED_NONE)
+		return false;
+
+	zno = bytenr / zinfo->zone_size;
+	return zinfo->zones[zno].type == BLK_ZONE_TYPE_SEQWRITE_REQ;
+}
+
+static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zinfo = device->zone_info;
+	unsigned int zno;
+
+	if (!zone_is_sequential(zinfo, pos))
+		return true;
+
+	zno = pos / zinfo->zone_size;
+	return zinfo->zones[zno].cond == BLK_ZONE_COND_EMPTY;
+}
+
 int btrfs_reset_dev_zone(int fd, struct blk_zone *zone);
+u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
+				 u64 hole_end, u64 num_bytes);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -68,6 +96,29 @@ static inline int btrfs_reset_dev_zone(int fd, struct blk_zone *zone)
 	return 0;
 }
 
+static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo,
+				      u64 bytenr)
+{
+	return false;
+}
+
+static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device,
+					       u64 hole_start, u64 hole_end,
+					       u64 num_bytes)
+{
+	return hole_start;
+}
+
+static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	return true;
+}
+
 #endif /* BTRFS_ZONED */
 
+static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
+{
+	return zone_is_sequential(device->zone_info, pos);
+}
+
 #endif /* __BTRFS_ZONED_H__ */
-- 
2.31.1



* [PATCH 12/26] btrfs-progs: zoned: load zone's allocation offset
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (10 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 13/26] btrfs-progs: zoned: implement sequential extent allocation Naohiro Aota
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

A zoned filesystem must allocate blocks at the zones' write pointer. The
device's write pointer position can be mapped to a logical address within a
block group. To facilitate this, add an "alloc_offset" to the block-group
to track the logical addresses of the write pointer.

This logical address is populated in btrfs_load_block_group_zone_info()
from the write pointers of corresponding zones.
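
The mapping from a zone's state to an allocation offset can be sketched as
below. This is a simplified illustration under stated assumptions: enum
zone_cond is a hypothetical stand-in for the BLK_ZONE_COND_* values,
WP_INVALID plays the role of WP_MISSING_DEV, and the zone start and write
pointer are given in 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define WP_INVALID UINT64_MAX	/* stand-in for WP_MISSING_DEV */

/* Hypothetical, reduced set of zone conditions. */
enum zone_cond { ZONE_EMPTY, ZONE_PARTIAL, ZONE_FULL, ZONE_UNUSABLE };

/* Byte offset within the zone at which the next write must land. */
static uint64_t zone_alloc_offset(enum zone_cond cond, uint64_t start_sect,
				  uint64_t wp_sect, uint64_t zone_size)
{
	assert(wp_sect >= start_sect);

	switch (cond) {
	case ZONE_EMPTY:
		return 0;
	case ZONE_FULL:
		return zone_size;
	case ZONE_UNUSABLE:	/* offline or read-only zone */
		return WP_INVALID;
	default:		/* partially written zone */
		return (wp_sect - start_sect) << SECTOR_SHIFT;
	}
}
```

For the single profile, the block group's alloc_offset is then simply the
offset computed for its only stripe.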

For now, zoned filesystems support only the single profile. Supporting
non-single profiles with zone append writing is not trivial. For example, in
the DUP profile, we send a zone append write IO to two zones on a device. The
device replies with the written LBAs for the IOs. If the offsets of the
returned addresses from the beginning of the zone differ, then they result in
different logical addresses.

We need a fine-grained logical-to-physical mapping to handle such diverging
physical addresses. Since that would require an additional metadata type,
disable non-single profiles for now.

This commit supports the case where all the zones in a block group are
sequential. A later patch will handle block groups that contain a
conventional zone.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h       |   6 ++
 kernel-shared/extent-tree.c |   8 +++
 kernel-shared/zoned.c       | 133 ++++++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h       |   8 +++
 4 files changed, 155 insertions(+)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 5023db474784..a68c8bd38bd2 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -1134,6 +1134,12 @@ struct btrfs_block_group {
 
 	/* For dirty block groups */
 	struct list_head dirty_list;
+
+	/*
+	 * Allocation offset for the block group to implement sequential
+	 * allocation. This is used only with ZONED mode enabled.
+	 */
+	u64 alloc_offset;
 };
 
 struct btrfs_device;
diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 5b1fbe10283a..ec5ea9a8e090 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -31,6 +31,7 @@
 #include "kernel-shared/volumes.h"
 #include "kernel-shared/free-space-cache.h"
 #include "kernel-shared/free-space-tree.h"
+#include "kernel-shared/zoned.h"
 #include "common/utils.h"
 
 #define PENDING_EXTENT_INSERT 0
@@ -2704,6 +2705,10 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 	}
 	cache->space_info = space_info;
 
+	ret = btrfs_load_block_group_zone_info(fs_info, cache);
+	if (ret)
+		return ret;
+
 	btrfs_add_block_group_cache(fs_info, cache);
 	return 0;
 }
@@ -2761,6 +2766,9 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type,
 	cache->start = chunk_offset;
 	cache->length = size;
 
+	ret = btrfs_load_block_group_zone_info(fs_info, cache);
+	BUG_ON(ret);
+
 	cache->used = bytes_used;
 	cache->flags = type;
 	INIT_LIST_HEAD(&cache->dirty_list);
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index e828d633619a..8b51115e667f 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -14,6 +14,10 @@
 
 /* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */
 #define BTRFS_REPORT_NR_ZONES   4096
+/* Invalid allocation pointer value for missing devices */
+#define WP_MISSING_DEV ((u64)-1)
+/* Pseudo write pointer value for conventional zone */
+#define WP_CONVENTIONAL ((u64)-2)
 
 /*
  * Location of the first zone of superblock logging zone pairs.
@@ -650,6 +654,135 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 	return pos;
 }
 
+int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_group *cache)
+{
+	struct btrfs_device *device;
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct cache_extent *ce;
+	struct map_lookup *map;
+	u64 logical = cache->start;
+	u64 length = cache->length;
+	u64 physical = 0;
+	int ret = 0;
+	int i;
+	u64 *alloc_offsets = NULL;
+	u32 num_sequential = 0, num_conventional = 0;
+
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	/* Sanity check */
+	if (logical == BTRFS_BLOCK_RESERVED_1M_FOR_SUPER) {
+		if (length + SZ_1M != fs_info->zone_size) {
+			error("zoned: unaligned initial system block group");
+			return -EIO;
+		}
+	} else if (!IS_ALIGNED(length, fs_info->zone_size)) {
+		error("zoned: unaligned block group at %llu + %llu", logical,
+		      length);
+		return -EIO;
+	}
+
+	/* Get the chunk mapping */
+	ce = search_cache_extent(&map_tree->cache_tree, logical);
+	if (!ce) {
+		error("zoned: failed to find block group at %llu", logical);
+		return -ENOENT;
+	}
+	map = container_of(ce, struct map_lookup, ce);
+
+	alloc_offsets = calloc(map->num_stripes, sizeof(*alloc_offsets));
+	if (!alloc_offsets) {
+		error("zoned: failed to allocate alloc_offsets");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < map->num_stripes; i++) {
+		bool is_sequential;
+		struct blk_zone zone;
+
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
+
+		if (device->fd == -1) {
+			alloc_offsets[i] = WP_MISSING_DEV;
+			continue;
+		}
+
+		is_sequential = btrfs_dev_is_sequential(device, physical);
+		if (is_sequential)
+			num_sequential++;
+		else
+			num_conventional++;
+
+		if (!is_sequential) {
+			alloc_offsets[i] = WP_CONVENTIONAL;
+			continue;
+		}
+
+		/*
+		 * The group is mapped to a sequential zone. Get the zone write
+		 * pointer to determine the allocation offset within the zone.
+		 */
+		WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size));
+		zone = device->zone_info->zones[physical / fs_info->zone_size];
+
+		switch (zone.cond) {
+		case BLK_ZONE_COND_OFFLINE:
+		case BLK_ZONE_COND_READONLY:
+			error(
+		"zoned: offline/readonly zone %llu on device %s (devid %llu)",
+			      physical / fs_info->zone_size, device->name,
+			      device->devid);
+			alloc_offsets[i] = WP_MISSING_DEV;
+			break;
+		case BLK_ZONE_COND_EMPTY:
+			alloc_offsets[i] = 0;
+			break;
+		case BLK_ZONE_COND_FULL:
+			alloc_offsets[i] = fs_info->zone_size;
+			break;
+		default:
+			/* Partially used zone */
+			alloc_offsets[i] =
+					((zone.wp - zone.start) << SECTOR_SHIFT);
+			break;
+		}
+	}
+
+	if (num_conventional > 0) {
+		/*
+		 * Since conventional zones do not have a write pointer, we
+		 * cannot determine alloc_offset from the pointer
+		 */
+		ret = -EINVAL;
+		goto out;
+	}
+
+	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case 0: /* single */
+		cache->alloc_offset = alloc_offsets[0];
+		break;
+	case BTRFS_BLOCK_GROUP_DUP:
+	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID0:
+	case BTRFS_BLOCK_GROUP_RAID10:
+	case BTRFS_BLOCK_GROUP_RAID5:
+	case BTRFS_BLOCK_GROUP_RAID6:
+		/* non-single profiles are not supported yet */
+	default:
+		error("zoned: profile %s not yet supported",
+		      btrfs_group_profile_str(map->type));
+		ret = -EINVAL;
+		goto out;
+	}
+
+out:
+	free(alloc_offsets);
+	return ret;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 29c203f45ada..45d77c8daa69 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -85,6 +85,8 @@ static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
 int btrfs_reset_dev_zone(int fd, struct blk_zone *zone);
 u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 				 u64 hole_end, u64 num_bytes);
+int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_group *cache);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -114,6 +116,12 @@ static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
 	return true;
 }
 
+static inline int btrfs_load_block_group_zone_info(
+	struct btrfs_fs_info *fs_info, struct btrfs_block_group *cache)
+{
+	return 0;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.31.1



* [PATCH 13/26] btrfs-progs: zoned: implement sequential extent allocation
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (11 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 12/26] btrfs-progs: zoned: load zone's allocation offset Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 14/26] btrfs-progs: zoned: calculate allocation offset for conventional zones Naohiro Aota
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Implement a sequential extent allocator for zoned filesystems. This
allocator only needs to check if there is enough space in the block group
after the allocation pointer to satisfy the extent allocation request.

Since the allocator is really simple, we implement it directly in
find_search_start().
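
A stand-alone sketch of the same bump-pointer logic is shown below. The
struct and function names are hypothetical and only model the three fields
the check needs; a return value of -1 corresponds to "goto new_group" in the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of a zoned block group. */
struct block_group {
	uint64_t start;		/* logical start address */
	uint64_t length;	/* size of the block group in bytes */
	uint64_t alloc_offset;	/* bytes already allocated from the start */
};

/*
 * Allocate num bytes at the allocation pointer. Returns 0 and stores the
 * logical address in *start_ret, or -1 if the block group has no room left
 * and the caller should try another one.
 */
static int seq_alloc(struct block_group *bg, uint64_t num, uint64_t *start_ret)
{
	assert(bg->alloc_offset <= bg->length);

	if (bg->length - bg->alloc_offset < num)
		return -1;

	*start_ret = bg->start + bg->alloc_offset;
	bg->alloc_offset += num;
	return 0;
}
```

Freed space is never reused here: the pointer only moves forward until the
zone backing the block group is reset.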

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/extent-tree.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index ec5ea9a8e090..7453bf9f49b6 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -284,6 +284,14 @@ again:
 	if (cache->ro || !block_group_bits(cache, data))
 		goto new_group;
 
+	if (btrfs_is_zoned(root->fs_info)) {
+		if (cache->length - cache->alloc_offset < num)
+			goto new_group;
+		*start_ret = cache->start + cache->alloc_offset;
+		cache->alloc_offset += num;
+		return 0;
+	}
+
 	while(1) {
 		ret = find_first_extent_bit(&root->fs_info->free_space_cache,
 					    last, &start, &end, EXTENT_DIRTY);
-- 
2.31.1



* [PATCH 14/26] btrfs-progs: zoned: calculate allocation offset for conventional zones
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (12 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 13/26] btrfs-progs: zoned: implement sequential extent allocation Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 15/26] btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs Naohiro Aota
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Conventional zones do not have a write pointer, so we cannot use it to
determine the allocation offset for sequential allocation if a block group
contains a conventional zone.

But instead, we can consider the end of the highest addressed extent in the
block group for the allocation offset.
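
The idea can be illustrated without the extent tree. In the sketch below a
plain array of extents stands in for the tree lookup, which is an assumption
made for brevity; struct extent and conventional_alloc_offset() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: logical start and length in bytes. */
struct extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Set *offset to the end of the highest addressed extent that starts inside
 * the block group, relative to the block group start. Returns 0 on success,
 * or -1 if that extent crosses the end of the block group.
 */
static int conventional_alloc_offset(uint64_t bg_start, uint64_t bg_len,
				     const struct extent *ext, size_t n,
				     uint64_t *offset)
{
	const struct extent *best = NULL;
	uint64_t bg_end = bg_start + bg_len;
	size_t i;

	assert(ext != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (ext[i].start < bg_start || ext[i].start >= bg_end)
			continue;
		if (!best || ext[i].start > best->start)
			best = &ext[i];
	}

	if (!best) {
		*offset = 0;	/* no extent yet: allocate from the start */
		return 0;
	}
	if (best->start + best->len > bg_end)
		return -1;

	*offset = best->start + best->len - bg_start;
	return 0;
}
```

An empty block group yields an offset of 0, which matches the treatment of
new block groups described below.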

For a new block group, we cannot calculate the allocation offset by
consulting the extent tree, because it can cause a deadlock by taking an
extent buffer lock after the chunk mutex, which is already taken in
btrfs_make_block_group(). Since it is a new block group anyway, we can
simply set the allocation offset to 0.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/zoned.c | 85 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 79 insertions(+), 6 deletions(-)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 8b51115e667f..715a7881328c 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -654,6 +654,67 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 	return pos;
 }
 
+/*
+ * Calculate an allocation pointer from the extent allocation information
+ * for a block group consisting of conventional zones. The pointer is set to
+ * the end of the highest addressed extent in the block group and is used as
+ * the allocation offset.
+ */
+static int calculate_alloc_pointer(struct btrfs_fs_info *fs_info,
+				   struct btrfs_block_group *cache,
+				   u64 *offset_ret)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	int ret;
+	u64 length;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = cache->start + cache->length;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	/* We should not find the exact match */
+	if (!ret)
+		ret = -EUCLEAN;
+	if (ret < 0)
+		goto out;
+
+	ret = btrfs_previous_extent_item(root, path, cache->start);
+	if (ret) {
+		if (ret == 1) {
+			ret = 0;
+			*offset_ret = 0;
+		}
+		goto out;
+	}
+
+	btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]);
+
+	if (found_key.type == BTRFS_EXTENT_ITEM_KEY)
+		length = found_key.offset;
+	else
+		length = fs_info->nodesize;
+
+	if (!(found_key.objectid >= cache->start &&
+	       found_key.objectid + length <= cache->start + cache->length)) {
+		ret = -EUCLEAN;
+		goto out;
+	}
+	*offset_ret = found_key.objectid + length - cache->start;
+	ret = 0;
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache)
 {
@@ -667,6 +728,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 	int i;
 	u64 *alloc_offsets = NULL;
+	u64 last_alloc = 0;
 	u32 num_sequential = 0, num_conventional = 0;
 
 	if (!btrfs_is_zoned(fs_info))
@@ -752,12 +814,16 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	}
 
 	if (num_conventional > 0) {
-		/*
-		 * Since conventional zones do not have a write pointer, we
-		 * cannot determine alloc_offset from the pointer
-		 */
-		ret = -EINVAL;
-		goto out;
+		ret = calculate_alloc_pointer(fs_info, cache, &last_alloc);
+		if (ret || map->num_stripes == num_conventional) {
+			if (!ret)
+				cache->alloc_offset = last_alloc;
+			else
+				error(
+		"zoned: failed to determine allocation offset of bg %llu",
+					  cache->start);
+			goto out;
+		}
 	}
 
 	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
@@ -779,6 +845,13 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 	}
 
 out:
+	/* An extent is allocated after the write pointer */
+	if (!ret && num_conventional && last_alloc > cache->alloc_offset) {
+		error("zoned: got wrong write pointer in BG %llu: %llu > %llu",
+		      logical, last_alloc, cache->alloc_offset);
+		ret = -EIO;
+	}
+
 	free(alloc_offsets);
 	return ret;
 }
-- 
2.31.1



* [PATCH 15/26] btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (13 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 14/26] btrfs-progs: zoned: calculate allocation offset for conventional zones Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 16/26] btrfs-progs: zoned: reset zone of freed block group Naohiro Aota
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Tree manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that pages in the
node are not uselessly written out. On ZONED drives, however, this
optimization blocks subsequent I/O, because cancelling the write-out of
the freed blocks breaks the sequential write order expected by the
device.

This patch checks whether the next dirty extent buffer is contiguous with
the previously written one. If not, it redirties the extent buffers between
the previous one and the next one, so that all dirty buffers are written
sequentially.
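
The gap check described above can be modelled in isolation. The following
is only a sketch under stated assumptions: struct bg_model and
needs_redirty() are hypothetical names for illustration, not the helpers
added by this patch, and the real code works on extent buffers rather than
plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the zoned fields of struct btrfs_block_group. */
struct bg_model {
	uint64_t start;        /* logical start of the block group */
	uint64_t write_offset; /* bytes already accepted for write-out */
};

/*
 * Given the next dirty range [start, end], report whether a hole precedes
 * it. On a hole, *redirty_at receives the address of the first node to
 * redirty; the caller marks it dirty and searches for dirty ranges again.
 * Otherwise the range is accepted and write_offset advances past it.
 */
static bool needs_redirty(struct bg_model *bg, uint64_t nodesize,
			  uint64_t start, uint64_t end, uint64_t *redirty_at)
{
	uint64_t next = bg->start + bg->write_offset;

	if (next < start) {
		/* A hole is expected to span at least one whole node. */
		assert(next + nodesize <= start);
		*redirty_at = next;
		return true;
	}

	bg->write_offset += end + 1 - start;
	return false;
}
```

A caller would loop: while needs_redirty() returns true, dirty the node at
*redirty_at and look up the first dirty range again, so that the range
eventually starts exactly at the write offset.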

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/ctree.h       |  1 +
 kernel-shared/transaction.c |  6 ++++++
 kernel-shared/zoned.c       | 30 ++++++++++++++++++++++++++++++
 kernel-shared/zoned.h       |  8 ++++++++
 4 files changed, 45 insertions(+)

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index a68c8bd38bd2..3cca60323e3d 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -1140,6 +1140,7 @@ struct btrfs_block_group {
 	 * allocation. This is used only with ZONED mode enabled.
 	 */
 	u64 alloc_offset;
+	u64 write_offset;
 };
 
 struct btrfs_device;
diff --git a/kernel-shared/transaction.c b/kernel-shared/transaction.c
index a2e53fb8dfca..5b991651c28e 100644
--- a/kernel-shared/transaction.c
+++ b/kernel-shared/transaction.c
@@ -18,6 +18,7 @@
 #include "kernel-shared/disk-io.h"
 #include "kernel-shared/transaction.h"
 #include "kernel-shared/delayed-ref.h"
+#include "kernel-shared/zoned.h"
 #include "common/messages.h"
 
 struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
@@ -138,10 +139,15 @@ int __commit_transaction(struct btrfs_trans_handle *trans,
 	int ret;
 
 	while(1) {
+again:
 		ret = find_first_extent_bit(tree, 0, &start, &end,
 					    EXTENT_DIRTY);
 		if (ret)
 			break;
+
+		if (btrfs_redirty_extent_buffer_for_zoned(fs_info, start, end))
+			goto again;
+
 		while(start <= end) {
 			eb = find_first_extent_buffer(tree, start);
 			BUG_ON(!eb || eb->start != start);
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 715a7881328c..793c524ed66f 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -852,10 +852,40 @@ out:
 		ret = -EIO;
 	}
 
+	if (!ret)
+		cache->write_offset = cache->alloc_offset;
+
 	free(alloc_offsets);
 	return ret;
 }
 
+bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
+					   u64 start, u64 end)
+{
+	u64 next;
+	struct btrfs_block_group *cache;
+	struct extent_buffer *eb;
+
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	cache = btrfs_lookup_first_block_group(fs_info, start);
+	BUG_ON(!cache);
+
+	if (cache->start + cache->write_offset < start) {
+		next = cache->start + cache->write_offset;
+		BUG_ON(next + fs_info->nodesize > start);
+		eb = btrfs_find_create_tree_block(fs_info, next);
+		btrfs_mark_buffer_dirty(eb);
+		free_extent_buffer(eb);
+		return true;
+	}
+
+	cache->write_offset += (end + 1 - start);
+
+	return false;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 45d77c8daa69..1ba5a9939a3c 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -87,6 +87,8 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 				 u64 hole_end, u64 num_bytes);
 int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache);
+bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
+					   u64 start, u64 end);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -122,6 +124,12 @@ static inline int btrfs_load_block_group_zone_info(
 	return 0;
 }
 
+static inline bool btrfs_redirty_extent_buffer_for_zoned(
+	struct btrfs_fs_info *fs_info, u64 start, u64 end)
+{
+	return false;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.31.1



* [PATCH 16/26] btrfs-progs: zoned: reset zone of freed block group
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (14 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 15/26] btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 17/26] btrfs-progs: zoned: support resetting zoned device Naohiro Aota
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

When freeing a chunk, we can and should reset the underlying device zones
of the chunk. This commit introduces btrfs_reset_chunk_zones() and calls it
to reset those zones.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/extent-tree.c | 10 ++++++++++
 kernel-shared/zoned.c       | 28 ++++++++++++++++++++++++++++
 kernel-shared/zoned.h       |  8 ++++++++
 3 files changed, 46 insertions(+)

diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 7453bf9f49b6..e3ffe146606f 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -21,6 +21,7 @@
 #include <stdint.h>
 #include <math.h>
 #include "kerncompat.h"
+#include "kernel-lib/list.h"
 #include "kernel-lib/radix-tree.h"
 #include "kernel-lib/rbtree.h"
 #include "kernel-shared/ctree.h"
@@ -3013,6 +3014,15 @@ static int free_chunk_dev_extent_items(struct btrfs_trans_handle *trans,
 			       struct btrfs_chunk);
 	num_stripes = btrfs_chunk_num_stripes(path->nodes[0], chunk);
 	for (i = 0; i < num_stripes; i++) {
+		u64 devid = btrfs_stripe_devid_nr(path->nodes[0], chunk, i);
+		u64 offset = btrfs_stripe_offset_nr(path->nodes[0], chunk, i);
+		u64 length = btrfs_stripe_length(fs_info, path->nodes[0],
+						 chunk);
+
+		ret = btrfs_reset_chunk_zones(fs_info, devid, offset, length);
+		if (ret < 0)
+			goto out;
+
 		ret = free_dev_extent_item(trans, fs_info,
 			btrfs_stripe_devid_nr(path->nodes[0], chunk, i),
 			btrfs_stripe_offset_nr(path->nodes[0], chunk, i));
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 793c524ed66f..22e0245abaf6 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -886,6 +886,34 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
 	return false;
 }
 
+int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
+			    u64 offset, u64 length)
+{
+	struct btrfs_device *device;
+
+	list_for_each_entry(device, &fs_info->fs_devices->devices,
+			    dev_list) {
+		struct btrfs_zoned_device_info *zinfo;
+		struct blk_zone *reset;
+
+		if (device->devid != devid)
+			continue;
+
+		zinfo = device->zone_info;
+		if (!zone_is_sequential(zinfo, offset))
+			continue;
+
+		reset = &zinfo->zones[offset / zinfo->zone_size];
+		if (btrfs_reset_dev_zone(device->fd, reset)) {
+			error("zoned: failed to reset zone %llu: %m",
+			      offset / zinfo->zone_size);
+			return -EIO;
+		}
+	}
+
+	return 0;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 1ba5a9939a3c..70044acc4d94 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -89,6 +89,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group *cache);
 bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
 					   u64 start, u64 end);
+int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
+			    u64 offset, u64 length);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -130,6 +132,12 @@ static inline bool btrfs_redirty_extent_buffer_for_zoned(
 	return false;
 }
 
+static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info,
+					  u64 devid, u64 offset, u64 length)
+{
+	return 0;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.31.1



* [PATCH 17/26] btrfs-progs: zoned: support resetting zoned device
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (15 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 16/26] btrfs-progs: zoned: reset zone of freed block group Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 18/26] btrfs-progs: zoned: support zero out on zoned block device Naohiro Aota
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

All zones of a zoned block device should be reset before writing. Support
this by introducing PREP_DEVICE_ZONED.

btrfs_reset_all_zones() walks all the zones on a device. It discards the
range of each conventional zone and resets every other zone that is not
already empty.
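
The per-zone decision can be sketched without touching a device. The enums
and names below are hypothetical simplifications for illustration (the real
code uses struct blk_zone with BLK_ZONE_TYPE_* and BLK_ZONE_COND_* from
<linux/blkzoned.h>), and only two zone types are modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified zone description. */
enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_WRITE_REQUIRED };
enum zone_cond { ZONE_EMPTY, ZONE_NOT_EMPTY };
enum zone_action { ACTION_DISCARD, ACTION_RESET, ACTION_SKIP };

struct zone_model {
	enum zone_type type;
	enum zone_cond cond;
};

/* Conventional zones are discarded, non-empty sequential zones are reset. */
static enum zone_action pick_action(const struct zone_model *z)
{
	assert(z);

	if (z->type == ZONE_CONVENTIONAL)
		return ACTION_DISCARD;
	if (z->cond != ZONE_EMPTY)
		return ACTION_RESET;
	/* An empty sequential zone needs no work. */
	return ACTION_SKIP;
}

/* Count how many zones in the array would need a zone reset. */
static size_t count_resets(const struct zone_model *zones, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (pick_action(&zones[i]) == ACTION_RESET)
			n++;
	return n;
}
```

Skipping empty zones avoids issuing a reset that would have no effect.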

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 35 +++++++++++++++++++++++++++++++----
 common/device-utils.h |  2 ++
 kernel-shared/zoned.c | 33 +++++++++++++++++++++++++++++++++
 kernel-shared/zoned.h |  7 +++++++
 4 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index f5d5277e8fce..2687f1884619 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -25,6 +25,7 @@
 #include <blkid/blkid.h>
 #include "kernel-lib/sizes.h"
 #include "kernel-shared/disk-io.h"
+#include "kernel-shared/zoned.h"
 #include "common/device-utils.h"
 #include "common/internal.h"
 #include "common/messages.h"
@@ -49,7 +50,7 @@ static int discard_range(int fd, u64 start, u64 len)
 /*
  * Discard blocks in the given range in 1G chunks, the process is interruptible
  */
-static int discard_blocks(int fd, u64 start, u64 len)
+int discard_blocks(int fd, u64 start, u64 len)
 {
 	while (len > 0) {
 		/* 1G granularity */
@@ -155,6 +156,7 @@ out:
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		u64 max_block_count, unsigned opflags)
 {
+	struct btrfs_zoned_device_info *zinfo = NULL;
 	u64 block_count;
 	struct stat st;
 	int i, ret;
@@ -173,7 +175,27 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 	if (max_block_count)
 		block_count = min(block_count, max_block_count);
 
-	if (opflags & PREP_DEVICE_DISCARD) {
+	if (opflags & PREP_DEVICE_ZONED) {
+		ret = btrfs_get_zone_info(fd, file, &zinfo);
+		if (ret < 0 || !zinfo) {
+			error("zoned: unable to load zone information of %s",
+			      file);
+			return 1;
+		}
+		if (opflags & PREP_DEVICE_VERBOSE)
+			printf("Resetting device zones %s (%u zones) ...\n",
+			       file, zinfo->nr_zones);
+		/*
+		 * We cannot ignore zone reset errors for a zoned block
+		 * device as this could result in the inability to write to
+		 * non-empty sequential zones of the device.
+		 */
+		if (btrfs_reset_all_zones(fd, zinfo)) {
+			error("zoned: failed to reset device '%s' zones: %m",
+			      file);
+			goto err;
+		}
+	} else if (opflags & PREP_DEVICE_DISCARD) {
 		/*
 		 * We intentionally ignore errors from the discard ioctl.  It
 		 * is not necessary for the mkfs functionality but just an
@@ -198,17 +220,22 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 	if (ret < 0) {
 		errno = -ret;
 		error("failed to zero device '%s': %m", file);
-		return 1;
+		goto err;
 	}
 
 	ret = btrfs_wipe_existing_sb(fd);
 	if (ret < 0) {
 		error("cannot wipe superblocks on %s", file);
-		return 1;
+		goto err;
 	}
 
+	free(zinfo);
 	*block_count_ret = block_count;
 	return 0;
+
+err:
+	free(zinfo);
+	return 1;
 }
 
 u64 btrfs_device_size(int fd, struct stat *st)
diff --git a/common/device-utils.h b/common/device-utils.h
index d1799323d002..e7e638a57eb2 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -23,7 +23,9 @@
 #define	PREP_DEVICE_ZERO_END	(1U << 0)
 #define	PREP_DEVICE_DISCARD	(1U << 1)
 #define	PREP_DEVICE_VERBOSE	(1U << 2)
+#define	PREP_DEVICE_ZONED	(1U << 3)
 
+int discard_blocks(int fd, u64 start, u64 len);
 u64 get_partition_size(const char *dev);
 u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 22e0245abaf6..ba1399cce04d 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -361,6 +361,39 @@ static int report_zones(int fd, const char *file,
 	return 0;
 }
 
+/*
+ * Discard blocks in the zones of a zoned block device. Process this with
+ * zone size granularity so that blocks in conventional zones are discarded
+ * using discard_range and blocks in sequential zones are reset through a
+ * zone reset.
+ */
+int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
+{
+	unsigned int i;
+	int ret = 0;
+
+	ASSERT(zinfo);
+
+	/* Zone size granularity */
+	for (i = 0; i < zinfo->nr_zones; i++) {
+		if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+			ret = discard_blocks(fd,
+					     zinfo->zones[i].start << SECTOR_SHIFT,
+					     zinfo->zone_size);
+			if (ret == EOPNOTSUPP)
+				ret = 0;
+		} else if (zinfo->zones[i].cond != BLK_ZONE_COND_EMPTY) {
+			ret = btrfs_reset_dev_zone(fd, &zinfo->zones[i]);
+		} else {
+			ret = 0;
+		}
+
+		if (ret)
+			return ret;
+	}
+	return fsync(fd);
+}
+
 static int sb_log_location(int fd, struct blk_zone *zones, int rw,
 			   u64 *bytenr_ret)
 {
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 70044acc4d94..88831d2d787c 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -91,6 +91,7 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
 					   u64 start, u64 end);
 int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
 			    u64 offset, u64 length);
+int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -138,6 +139,12 @@ static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static inline int btrfs_reset_all_zones(int fd,
+					struct btrfs_zoned_device_info *zinfo)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.31.1



* [PATCH 18/26] btrfs-progs: zoned: support zero out on zoned block device
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (16 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 17/26] btrfs-progs: zoned: support resetting zoned device Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 19/26] btrfs-progs: zoned: support wiping SB on sequential write zone Naohiro Aota
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

If we zero out a region in a sequential write required zone, we cannot
write to the region until we reset the zone. Thus, we must prohibit zeroing
out to a sequential write required zone.

zero_dev_clamped() is modified to take the zone information and it calls
zero_zone_blocks() if the device is host managed to avoid writing to
sequential write required zones.
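
To avoid touching a sequential zone, the range has to be processed one zone
at a time. The sketch below shows only the boundary arithmetic;
chunk_within_zone() is a hypothetical helper name, and it assumes the zone
size is a power of two, as the masking in zero_zone_blocks() does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many of the 'len' bytes starting at 'ofst' fit before the
 * next zone boundary. Assumes zone_size is a power of two.
 */
static uint64_t chunk_within_zone(uint64_t ofst, uint64_t len,
				  uint64_t zone_size)
{
	uint64_t room;

	assert(zone_size && (zone_size & (zone_size - 1)) == 0);

	/* Bytes left between ofst and the end of its zone. */
	room = zone_size - (ofst & (zone_size - 1));
	return len < room ? len : room;
}
```

A caller would zero the returned number of bytes only when the zone at ofst
is not sequential, then advance ofst and shrink len by that amount until
len reaches zero.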

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 14 +++++++++-----
 common/device-utils.h |  1 +
 kernel-shared/zoned.c | 28 ++++++++++++++++++++++++++++
 kernel-shared/zoned.h |  9 +++++++++
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index 2687f1884619..c1006c501555 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -67,7 +67,7 @@ int discard_blocks(int fd, u64 start, u64 len)
 	return 0;
 }
 
-static int zero_blocks(int fd, off_t start, size_t len)
+int zero_blocks(int fd, off_t start, size_t len)
 {
 	char *buf = malloc(len);
 	int ret = 0;
@@ -86,7 +86,8 @@ static int zero_blocks(int fd, off_t start, size_t len)
 #define ZERO_DEV_BYTES SZ_2M
 
 /* don't write outside the device by clamping the region to the device size */
-static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size)
+static int zero_dev_clamped(int fd, struct btrfs_zoned_device_info *zinfo,
+			    off_t start, ssize_t len, u64 dev_size)
 {
 	off_t end = max(start, start + len);
 
@@ -99,6 +100,9 @@ static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size)
 	start = min_t(u64, start, dev_size);
 	end = min_t(u64, end, dev_size);
 
+	if (zinfo && zinfo->model == ZONED_HOST_MANAGED)
+		return zero_zone_blocks(fd, zinfo, start, end - start);
+
 	return zero_blocks(fd, start, end - start);
 }
 
@@ -209,12 +213,12 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		}
 	}
 
-	ret = zero_dev_clamped(fd, 0, ZERO_DEV_BYTES, block_count);
+	ret = zero_dev_clamped(fd, zinfo, 0, ZERO_DEV_BYTES, block_count);
 	for (i = 0 ; !ret && i < BTRFS_SUPER_MIRROR_MAX; i++)
-		ret = zero_dev_clamped(fd, btrfs_sb_offset(i),
+		ret = zero_dev_clamped(fd, zinfo, btrfs_sb_offset(i),
 				       BTRFS_SUPER_INFO_SIZE, block_count);
 	if (!ret && (opflags & PREP_DEVICE_ZERO_END))
-		ret = zero_dev_clamped(fd, block_count - ZERO_DEV_BYTES,
+		ret = zero_dev_clamped(fd, zinfo, block_count - ZERO_DEV_BYTES,
 				       ZERO_DEV_BYTES, block_count);
 
 	if (ret < 0) {
diff --git a/common/device-utils.h b/common/device-utils.h
index e7e638a57eb2..6eee3270e0c7 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -26,6 +26,7 @@
 #define	PREP_DEVICE_ZONED	(1U << 3)
 
 int discard_blocks(int fd, u64 start, u64 len);
+int zero_blocks(int fd, off_t start, size_t len);
 u64 get_partition_size(const char *dev);
 u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index ba1399cce04d..3c476eebf004 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -394,6 +394,34 @@ int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo)
 	return fsync(fd);
 }
 
+int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
+		     size_t len)
+{
+	size_t zone_len = zinfo->zone_size;
+	off_t ofst = start;
+	size_t count;
+	int ret;
+
+	/* Make sure that zero_blocks does not write sequential zones */
+	while (len > 0) {
+		/* Limit zero_blocks to a single zone */
+		count = min_t(size_t, len, zone_len);
+		if (count > zone_len - (ofst & (zone_len - 1)))
+			count = zone_len - (ofst & (zone_len - 1));
+
+		if (!zone_is_sequential(zinfo, ofst)) {
+			ret = zero_blocks(fd, ofst, count);
+			if (ret != 0)
+				return ret;
+		}
+
+		len -= count;
+		ofst += count;
+	}
+
+	return 0;
+}
+
 static int sb_log_location(int fd, struct blk_zone *zones, int rw,
 			   u64 *bytenr_ret)
 {
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 88831d2d787c..9e1ce3ae103f 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -92,6 +92,8 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info,
 int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
 			    u64 offset, u64 length);
 int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
+int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
+		     size_t len);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -145,6 +147,13 @@ static inline int btrfs_reset_all_zones(int fd,
 	return -EOPNOTSUPP;
 }
 
+static inline int zero_zone_blocks(int fd,
+				   struct btrfs_zoned_device_info *zinfo,
+				   off_t start, size_t len)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.31.1



* [PATCH 19/26] btrfs-progs: zoned: support wiping SB on sequential write zone
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (17 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 18/26] btrfs-progs: zoned: support zero out on zoned block device Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 20/26] btrfs-progs: mkfs: zoned: detect and enable zoned feature flag Naohiro Aota
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

We cannot overwrite the superblock magic in a sequential write required
zone. Instead, we can reset the zone to wipe it.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index c1006c501555..4230654653aa 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -106,7 +106,7 @@ static int zero_dev_clamped(int fd, struct btrfs_zoned_device_info *zinfo,
 	return zero_blocks(fd, start, end - start);
 }
 
-static int btrfs_wipe_existing_sb(int fd)
+static int btrfs_wipe_existing_sb(int fd, struct btrfs_zoned_device_info *zinfo)
 {
 	const char *off = NULL;
 	size_t len = 0;
@@ -141,14 +141,26 @@ static int btrfs_wipe_existing_sb(int fd)
 	if (len > sizeof(buf))
 		len = sizeof(buf);
 
-	memset(buf, 0, len);
-	ret = pwrite(fd, buf, len, offset);
-	if (ret < 0) {
-		error("cannot wipe existing superblock: %m");
-		ret = -1;
-	} else if (ret != len) {
-		error("cannot wipe existing superblock: wrote %d of %zd", ret, len);
-		ret = -1;
+	if (!zone_is_sequential(zinfo, offset)) {
+		memset(buf, 0, len);
+		ret = pwrite(fd, buf, len, offset);
+		if (ret < 0) {
+			error("cannot wipe existing superblock: %m");
+			ret = -1;
+		} else if (ret != len) {
+			error("cannot wipe existing superblock: wrote %d of %zd",
+			      ret, len);
+			ret = -1;
+		}
+	} else {
+		struct blk_zone *zone = &zinfo->zones[offset / zinfo->zone_size];
+
+		ret = btrfs_reset_dev_zone(fd, zone);
+		if (ret < 0) {
+			error(
+		"zoned: failed to wipe zones containing superblock: %m");
+			ret = -1;
+		}
 	}
 	fsync(fd);
 
@@ -227,7 +239,7 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		goto err;
 	}
 
-	ret = btrfs_wipe_existing_sb(fd);
+	ret = btrfs_wipe_existing_sb(fd, zinfo);
 	if (ret < 0) {
 		error("cannot wipe superblocks on %s", file);
 		goto err;
-- 
2.31.1



* [PATCH 20/26] btrfs-progs: mkfs: zoned: detect and enable zoned feature flag
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (18 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 19/26] btrfs-progs: zoned: support wiping SB on sequential write zone Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 21/26] btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs Naohiro Aota
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

This commit makes mkfs.btrfs aware of the "zoned" feature flag and prepares
the disks for mkfs.btrfs. It automatically detects a host-managed zoned
device and enables the feature.

It also adds "zone_size" to struct btrfs_mkfs_config to track the zone size.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 mkfs/common.h |  1 +
 mkfs/main.c   | 28 ++++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/mkfs/common.h b/mkfs/common.h
index cc88db7183fb..4d86f5ef4ccc 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -65,6 +65,7 @@ struct btrfs_mkfs_config {
 	u64 num_bytes;
 	/* checksum algorithm to use */
 	enum btrfs_csum_type csum_type;
+	u64 zone_size;
 
 	/* Output fields, set during creation */
 
diff --git a/mkfs/main.c b/mkfs/main.c
index a903896289fa..42e6e6b58b04 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -37,6 +37,7 @@
 #include "kernel-shared/free-space-tree.h"
 #include "kernel-shared/volumes.h"
 #include "kernel-shared/transaction.h"
+#include "kernel-shared/zoned.h"
 #include "common/utils.h"
 #include "common/path-utils.h"
 #include "common/device-utils.h"
@@ -900,6 +901,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	int metadata_profile_opt = 0;
 	int discard = 1;
 	int ssd = 0;
+	int zoned = 0;
 	int force_overwrite = 0;
 	char *source_dir = NULL;
 	bool source_dir_set = false;
@@ -1069,6 +1071,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	if (dev_cnt == 0)
 		print_usage(1);
 
+	zoned = features & BTRFS_FEATURE_INCOMPAT_ZONED;
+
 	if (source_dir_set && dev_cnt > 1) {
 		error("the option -r is limited to a single device");
 		goto error;
@@ -1109,6 +1113,19 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 
 	file = argv[optind++];
 	ssd = is_ssd(file);
+	if (zoned) {
+		if (!zone_size(file)) {
+			error("zoned: %s: zone size undefined", file);
+			exit(1);
+		}
+	} else if (zoned_model(file) == ZONED_HOST_MANAGED) {
+		if (verbose)
+			printf(
+	"Zoned: %s: host-managed device detected, setting zoned feature\n",
+			       file);
+		zoned = 1;
+		features |= BTRFS_FEATURE_INCOMPAT_ZONED;
+	}
 
 	/*
 	* Set default profiles according to number of added devices.
@@ -1278,7 +1295,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	ret = btrfs_prepare_device(fd, file, &dev_block_count, block_count,
 			(zero_end ? PREP_DEVICE_ZERO_END : 0) |
 			(discard ? PREP_DEVICE_DISCARD : 0) |
-			(verbose ? PREP_DEVICE_VERBOSE : 0));
+			(verbose ? PREP_DEVICE_VERBOSE : 0) |
+			(zoned ? PREP_DEVICE_ZONED : 0));
 	if (ret)
 		goto error;
 	if (block_count && block_count > dev_block_count) {
@@ -1309,6 +1327,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	mkfs_cfg.stripesize = stripesize;
 	mkfs_cfg.features = features;
 	mkfs_cfg.csum_type = csum_type;
+	mkfs_cfg.zone_size = zone_size(file);
 
 	ret = make_btrfs(fd, &mkfs_cfg);
 	if (ret) {
@@ -1391,7 +1410,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 				block_count,
 				(verbose ? PREP_DEVICE_VERBOSE : 0) |
 				(zero_end ? PREP_DEVICE_ZERO_END : 0) |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(zoned ? PREP_DEVICE_ZONED : 0));
 		if (ret) {
 			goto error;
 		}
@@ -1502,6 +1522,10 @@ raid_groups:
 			btrfs_group_profile_str(metadata_profile),
 			pretty_size(allocation.system));
 		printf("SSD detected:       %s\n", ssd ? "yes" : "no");
+		printf("Zoned device:       %s\n", zoned ? "yes" : "no");
+		if (zoned)
+			printf("Zone size:          %s\n",
+			       pretty_size(fs_info->zone_size));
 		btrfs_parse_fs_features_to_string(features_buf, features);
 		printf("Incompat features:  %s\n", features_buf);
 		btrfs_parse_runtime_features_to_string(features_buf,
-- 
2.31.1



* [PATCH 21/26] btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (19 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 20/26] btrfs-progs: mkfs: zoned: detect and enable zoned feature flag Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 22/26] btrfs-progs: mkfs: zoned: tweak initial system block group placement Naohiro Aota
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

This commit disables some features which are incompatible with zoned btrfs.

RAID/DUP is disabled because we cannot handle two zone append writes to
different zones in the kernel. MIXED_BG is disabled because the allocated
metadata regions would become write holes for data writes. Space-cache (v1)
requires in-place updates.

It also disables the "--rootdir" option for now. Copying from a
directory needs some tweaks for zoned btrfs (e.g. zone size aware space
calculation), and we do not implement them yet.
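
The profile restriction boils down to requiring that neither the data nor
the metadata profile has any RAID/DUP bit set, i.e. both are "single". A
minimal sketch follows; the BG_* constants and profiles_ok_for_zoned() are
hypothetical names for illustration, with bit positions assumed to mirror a
subset of the BTRFS_BLOCK_GROUP_* flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative profile bits; assumed to mirror BTRFS_BLOCK_GROUP_*. */
#define BG_RAID0	(1ULL << 3)
#define BG_RAID1	(1ULL << 4)
#define BG_DUP		(1ULL << 5)
#define BG_RAID10	(1ULL << 6)
/* Only a subset of the real profile mask is modelled here. */
#define BG_PROFILE_MASK	(BG_RAID0 | BG_RAID1 | BG_DUP | BG_RAID10)

/* The "single" profile has no profile bit set, so the mask test passes. */
static bool profiles_ok_for_zoned(uint64_t data_profile,
				  uint64_t metadata_profile)
{
	return ((data_profile | metadata_profile) & BG_PROFILE_MASK) == 0;
}
```

ORing the two profiles first means a single mask test rejects the
configuration when either of them asks for redundancy or striping.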

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 mkfs/common.c |  5 ++++-
 mkfs/main.c   | 23 +++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index 368f3b06f75e..6b0c434fbd6a 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -204,7 +204,10 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_super_stripesize(&super, cfg->stripesize);
 	btrfs_set_super_csum_type(&super, cfg->csum_type);
 	btrfs_set_super_chunk_root_generation(&super, 1);
-	btrfs_set_super_cache_generation(&super, -1);
+	if (cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)
+		btrfs_set_super_cache_generation(&super, 0);
+	else
+		btrfs_set_super_cache_generation(&super, -1);
 	btrfs_set_super_incompat_flags(&super, cfg->features);
 	if (cfg->label)
 		__strncpy_null(super.label, cfg->label, BTRFS_LABEL_SIZE - 1);
diff --git a/mkfs/main.c b/mkfs/main.c
index 42e6e6b58b04..9407cdfa8fe7 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1191,6 +1191,23 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 		features |= BTRFS_FEATURE_INCOMPAT_RAID1C34;
 	}
 
+	if (zoned) {
+		if (source_dir_set) {
+			error("the option -r and zoned feature are incompatible");
+			exit(1);
+		}
+
+		if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) {
+			error("cannot enable mixed-bg with zoned feature");
+			exit(1);
+		}
+
+		if (features & BTRFS_FEATURE_INCOMPAT_RAID56) {
+			error("cannot enable RAID5/6 with zoned feature");
+			exit(1);
+		}
+	}
+
 	if (btrfs_check_nodesize(nodesize, sectorsize,
 				 features))
 		goto error;
@@ -1280,6 +1297,12 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	if (ret)
 		goto error;
 
+	if (zoned && ((metadata_profile | data_profile) &
+			      BTRFS_BLOCK_GROUP_PROFILE_MASK)) {
+		error("cannot use RAID/DUP profile on zoned mode");
+		goto error;
+	}
+
 	dev_cnt--;
 
 	/*
-- 
2.31.1
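
The checks added above can be read as one ordered list of conflicts. A
minimal sketch, assuming hypothetical names (struct mkfs_request,
zoned_conflict() and the SKETCH_* constants are not part of btrfs-progs, and
the bit values are illustrative rather than the real on-disk flags):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the btrfs-progs flags; values are illustrative. */
#define SKETCH_FEATURE_MIXED_GROUPS	(1ULL << 0)
#define SKETCH_FEATURE_RAID56		(1ULL << 1)
#define SKETCH_PROFILE_MASK		(0x3fULL << 3)	/* any RAID/DUP profile bit */

struct mkfs_request {
	bool zoned;
	bool rootdir_set;		/* --rootdir / -r was given */
	uint64_t features;		/* incompat feature bits */
	uint64_t metadata_profile;
	uint64_t data_profile;
};

/*
 * Return NULL if the request is acceptable, otherwise a message naming the
 * first conflict, in the same order as the checks in the patch.
 */
static const char *zoned_conflict(const struct mkfs_request *req)
{
	if (!req->zoned)
		return NULL;
	if (req->rootdir_set)
		return "the option -r and zoned feature are incompatible";
	if (req->features & SKETCH_FEATURE_MIXED_GROUPS)
		return "cannot enable mixed-bg with zoned feature";
	if (req->features & SKETCH_FEATURE_RAID56)
		return "cannot enable RAID5/6 with zoned feature";
	if ((req->metadata_profile | req->data_profile) & SKETCH_PROFILE_MASK)
		return "cannot use RAID/DUP profile on zoned mode";
	return NULL;
}
```

A non-zoned request passes through untouched, so none of the new
restrictions affect regular devices.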



* [PATCH 22/26] btrfs-progs: mkfs: zoned: tweak initial system block group placement
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (20 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 21/26] btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 23/26] btrfs-progs: mkfs: zoned: use sbwrite to update superblock Naohiro Aota
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

On zoned btrfs, chunks must be aligned to zone size to ensure sequential
writing to a block group maps to sequential writing to a device zone. Thus,
we need to tweak the position and the size of the initial system block
group.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 mkfs/common.c | 26 ++++++++++++++++----------
 mkfs/main.c   | 21 ++++++++++++++++-----
 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index 6b0c434fbd6a..3d10ad086754 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -22,6 +22,7 @@
 #include "kernel-shared/ctree.h"
 #include "kernel-shared/disk-io.h"
 #include "kernel-shared/volumes.h"
+#include "kernel-shared/zoned.h"
 #include "common/utils.h"
 #include "common/path-utils.h"
 #include "common/device-utils.h"
@@ -155,6 +156,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	int skinny_metadata = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
 	u64 num_bytes;
+	u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER;
+	u64 system_group_size =  BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+
+	if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) {
+		system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES;
+		system_group_size = cfg->zone_size;
+	}
 
 	buf = malloc(sizeof(*buf) + max(cfg->sectorsize, cfg->nodesize));
 	if (!buf)
@@ -186,7 +194,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 
 	cfg->blocks[MKFS_SUPER_BLOCK] = BTRFS_SUPER_INFO_OFFSET;
 	for (i = 1; i < MKFS_BLOCK_COUNT; i++) {
-		cfg->blocks[i] = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER +
+		cfg->blocks[i] = system_group_offset +
 			cfg->nodesize * (i - 1);
 	}
 
@@ -323,8 +331,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_device_id(buf, dev_item, 1);
 	btrfs_set_device_generation(buf, dev_item, 0);
 	btrfs_set_device_total_bytes(buf, dev_item, num_bytes);
-	btrfs_set_device_bytes_used(buf, dev_item,
-				    BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_device_bytes_used(buf, dev_item, system_group_size);
 	btrfs_set_device_io_align(buf, dev_item, cfg->sectorsize);
 	btrfs_set_device_io_width(buf, dev_item, cfg->sectorsize);
 	btrfs_set_device_sector_size(buf, dev_item, cfg->sectorsize);
@@ -345,14 +352,14 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 
 	/* then we have chunk 0 */
 	btrfs_set_disk_key_objectid(&disk_key, BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-	btrfs_set_disk_key_offset(&disk_key, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+	btrfs_set_disk_key_offset(&disk_key, system_group_offset);
 	btrfs_set_disk_key_type(&disk_key, BTRFS_CHUNK_ITEM_KEY);
 	btrfs_set_item_key(buf, &disk_key, nritems);
 	btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
 	btrfs_set_item_size(buf, btrfs_item_nr(nritems), item_size);
 
 	chunk = btrfs_item_ptr(buf, nritems, struct btrfs_chunk);
-	btrfs_set_chunk_length(buf, chunk, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_chunk_length(buf, chunk, system_group_size);
 	btrfs_set_chunk_owner(buf, chunk, BTRFS_EXTENT_TREE_OBJECTID);
 	btrfs_set_chunk_stripe_len(buf, chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_chunk_type(buf, chunk, BTRFS_BLOCK_GROUP_SYSTEM);
@@ -362,7 +369,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_chunk_num_stripes(buf, chunk, 1);
 	btrfs_set_stripe_devid_nr(buf, chunk, 0, 1);
 	btrfs_set_stripe_offset_nr(buf, chunk, 0,
-				   BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+				   system_group_offset);
 	nritems++;
 
 	write_extent_buffer(buf, super.dev_item.uuid,
@@ -401,7 +408,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 		sizeof(struct btrfs_dev_extent);
 
 	btrfs_set_disk_key_objectid(&disk_key, 1);
-	btrfs_set_disk_key_offset(&disk_key, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+	btrfs_set_disk_key_offset(&disk_key, system_group_offset);
 	btrfs_set_disk_key_type(&disk_key, BTRFS_DEV_EXTENT_KEY);
 	btrfs_set_item_key(buf, &disk_key, nritems);
 	btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
@@ -413,14 +420,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_dev_extent_chunk_objectid(buf, dev_extent,
 					BTRFS_FIRST_CHUNK_TREE_OBJECTID);
 	btrfs_set_dev_extent_chunk_offset(buf, dev_extent,
-					  BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+					  system_group_offset);
 
 	write_extent_buffer(buf, chunk_tree_uuid,
 		    (unsigned long)btrfs_dev_extent_chunk_tree_uuid(dev_extent),
 		    BTRFS_UUID_SIZE);
 
-	btrfs_set_dev_extent_length(buf, dev_extent,
-				    BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_dev_extent_length(buf, dev_extent, system_group_size);
 	nritems++;
 
 	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_DEV_TREE]);
diff --git a/mkfs/main.c b/mkfs/main.c
index 9407cdfa8fe7..915e42b7f9cd 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -71,8 +71,17 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	u64 bytes_used;
 	u64 chunk_start = 0;
 	u64 chunk_size = 0;
+	u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER;
+	u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE;
 	int ret;
 
+	if (btrfs_is_zoned(fs_info)) {
+		/* Two zones are reserved for superblock */
+		system_group_offset = fs_info->zone_size *
+			BTRFS_NR_SB_LOG_ZONES;
+		system_group_size = fs_info->zone_size;
+	}
+
 	if (mixed)
 		flags |= BTRFS_BLOCK_GROUP_DATA;
 
@@ -92,9 +101,8 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	 */
 	ret = btrfs_make_block_group(trans, fs_info, bytes_used,
 				     BTRFS_BLOCK_GROUP_SYSTEM,
-				     BTRFS_BLOCK_RESERVED_1M_FOR_SUPER,
-				     BTRFS_MKFS_SYSTEM_GROUP_SIZE);
-	allocation->system += BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+				     system_group_offset, system_group_size);
+	allocation->system += system_group_size;
 	if (ret)
 		return ret;
 
@@ -917,6 +925,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	struct mkfs_allocation allocation = { 0 };
 	struct btrfs_mkfs_config mkfs_cfg;
 	enum btrfs_csum_type csum_type = BTRFS_CSUM_TYPE_CRC32;
+	u64 system_group_size;
 
 	crc32c_optimization_init();
 
@@ -1330,9 +1339,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	}
 
 	/* To create the first block group and chunk 0 in make_btrfs */
-	if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+	system_group_size = zoned ?
+		zone_size(file) : BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+	if (dev_block_count < system_group_size) {
 		error("device is too small to make filesystem, must be at least %llu",
-				(unsigned long long)BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+				(unsigned long long)system_group_size);
 		goto error;
 	}
 
-- 
2.31.1
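
The placement rule from the commit message boils down to a small piece of
arithmetic. A sketch under stated assumptions: the SKETCH_* constants are
assumed to mirror BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, BTRFS_MKFS_SYSTEM_GROUP_SIZE
and BTRFS_NR_SB_LOG_ZONES (1 MiB, 4 MiB and 2 respectively), and
system_group_layout() is a hypothetical helper, not a btrfs-progs function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; the real constants live in the btrfs-progs headers. */
#define SKETCH_RESERVED_FOR_SUPER	(1024ULL * 1024)	/* 1 MiB */
#define SKETCH_SYSTEM_GROUP_SIZE	(4ULL * 1024 * 1024)	/* 4 MiB */
#define SKETCH_NR_SB_LOG_ZONES		2	/* zones reserved for the superblock log */

struct sys_group {
	uint64_t offset;
	uint64_t size;
};

/* Compute where the initial system block group starts and how large it is. */
static struct sys_group system_group_layout(bool zoned, uint64_t zone_size)
{
	struct sys_group g = {
		.offset = SKETCH_RESERVED_FOR_SUPER,
		.size = SKETCH_SYSTEM_GROUP_SIZE,
	};

	if (zoned) {
		/* Skip the superblock log zones, then occupy exactly one zone. */
		g.offset = zone_size * SKETCH_NR_SB_LOG_ZONES;
		g.size = zone_size;
	}
	return g;
}
```

With a 256 MiB zone size this places the system block group at 512 MiB and
makes it 256 MiB long, so both its start and its end fall on zone boundaries.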



* [PATCH 23/26] btrfs-progs: mkfs: zoned: use sbwrite to update superblock
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (21 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 22/26] btrfs-progs: mkfs: zoned: tweak initial system block group placement Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 24/26] btrfs-progs: zoned: wipe temporary superblocks in superblock log zone Naohiro Aota
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Use sbwrite instead of pwrite to support superblock logging on zoned btrfs.
In addition, call fsync() to persist the superblock to ensure the write
order. It also helps us to detect an unaligned write (write to a position
other than the write pointer) error.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 mkfs/common.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index 3d10ad086754..cee6a54ae7a5 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -473,13 +473,16 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	buf->len = BTRFS_SUPER_INFO_SIZE;
 	csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0,
 			     cfg->csum_type);
-	ret = pwrite(fd, buf->data, BTRFS_SUPER_INFO_SIZE,
-			cfg->blocks[MKFS_SUPER_BLOCK]);
+	ret = sbwrite(fd, buf->data, cfg->blocks[MKFS_SUPER_BLOCK]);
 	if (ret != BTRFS_SUPER_INFO_SIZE) {
 		ret = (ret < 0 ? -errno : -EIO);
 		goto out;
 	}
 
+	ret = fsync(fd);
+	if (ret)
+		goto out;
+
 	ret = 0;
 
 out:
-- 
2.31.1
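
The write-then-fsync pattern described above can be illustrated without the
zoned helpers. A minimal sketch using plain pwrite() in place of sbwrite()
(write_block_durably() is a hypothetical name; unlike the hunk above, which
passes fsync()'s raw return value on, this version converts every failure to
a negative errno):

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes at offset and flush them to stable storage.
 * Returns 0 on success or a negative errno value.
 */
static int write_block_durably(int fd, const void *buf, size_t len,
			       off_t offset)
{
	ssize_t written = pwrite(fd, buf, len, offset);

	if (written < 0)
		return -errno;
	if ((size_t)written != len)
		return -EIO;	/* short write */

	/* Deferred I/O errors for this fd can surface here. */
	if (fsync(fd) < 0)
		return -errno;

	return 0;
}
```

Flushing right after the write means a failure is reported at the point
where the superblock was written, rather than at some later, unrelated call.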



* [PATCH 24/26] btrfs-progs: zoned: wipe temporary superblocks in superblock log zone
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (22 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 23/26] btrfs-progs: mkfs: zoned: use sbwrite to update superblock Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 25/26] btrfs-progs: zoned: device-add: support ZONED device Naohiro Aota
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

Mkfs.btrfs uses a temporary superblock during the initialization process.
The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is
different from a regular superblock. As a result, libblkid, which only
supports the usual magic, cannot recognize the volume as btrfs. So, let's
wipe the temporary magic before writing out the usual superblock.

Technically, we can add the temporary magic to the libblkid's table. But,
it will result in recognizing a half-baked filesystem as btrfs, which is
not ideal.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 kernel-shared/disk-io.c |  6 ++++++
 kernel-shared/zoned.c   | 20 ++++++++++++++++++++
 kernel-shared/zoned.h   |  6 ++++++
 3 files changed, 32 insertions(+)

diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index d79d6a00cdf8..355010277ca9 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1951,6 +1951,12 @@ int close_ctree_fs_info(struct btrfs_fs_info *fs_info)
 	}
 
 	if (fs_info->finalize_on_close) {
+		ret = btrfs_wipe_temporary_sb(fs_info->fs_devices);
+		if (ret) {
+			error("zoned: failed to wipe temporary super blocks: %m");
+			goto skip_commit;
+		}
+
 		btrfs_set_super_magic(fs_info->super_copy, BTRFS_MAGIC);
 		root->fs_info->finalize_on_close = 0;
 		ret = write_all_supers(fs_info);
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 3c476eebf004..8801ed43157e 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -975,6 +975,26 @@ int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
 	return 0;
 }
 
+int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices)
+{
+	struct list_head *head = &fs_devices->devices;
+	struct btrfs_device *dev;
+	int ret = 0;
+
+	list_for_each_entry(dev, head, dev_list) {
+		struct btrfs_zoned_device_info *zinfo = dev->zone_info;
+
+		if (!zinfo)
+			continue;
+
+		ret = btrfs_reset_dev_zone(dev->fd, &zinfo->zones[0]);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 9e1ce3ae103f..a2e84464a221 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -94,6 +94,7 @@ int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
 int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
 int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
 		     size_t len);
+int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -154,6 +155,11 @@ static inline int zero_zone_blocks(int fd,
 	return -EOPNOTSUPP;
 }
 
+static inline int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices)
+{
+	return 0;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.31.1
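
To make the libblkid problem concrete, here is a sketch of a prober that
knows only the regular magic. The two constants are assumptions: they are
believed to match BTRFS_MAGIC ("_BHRfS_M") and BTRFS_MAGIC_TEMPORARY
("!BHRfS_M") read as little-endian 64-bit values. The enum and both
functions are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of the regular and temporary superblock magic. */
#define SKETCH_MAGIC_REGULAR	0x4D5F53665248425FULL	/* "_BHRfS_M" */
#define SKETCH_MAGIC_TEMPORARY	0x4D5F536652484221ULL	/* "!BHRfS_M" */

enum sb_kind {
	SB_NOT_BTRFS,
	SB_TEMPORARY,	/* written by mkfs before the filesystem is finalized */
	SB_REGULAR,
};

/* Classify a superblock by its magic field. */
static enum sb_kind classify_magic(uint64_t magic)
{
	if (magic == SKETCH_MAGIC_REGULAR)
		return SB_REGULAR;
	if (magic == SKETCH_MAGIC_TEMPORARY)
		return SB_TEMPORARY;
	return SB_NOT_BTRFS;
}

/* A prober whose magic table holds only the regular value. */
static bool probe_recognizes(uint64_t magic)
{
	return classify_magic(magic) == SB_REGULAR;
}
```

Such a prober rejects a superblock log entry that still carries the
temporary magic, which is why the patch resets the zone before the final
superblock is written.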



* [PATCH 25/26] btrfs-progs: zoned: device-add: support ZONED device
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (23 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 24/26] btrfs-progs: zoned: wipe temporary superblocks in superblock log zone Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-26  6:27 ` [PATCH 26/26] btrfs-progs: zoned: introduce zoned support for device replace Naohiro Aota
  2021-04-29 15:53 ` [PATCH 00/26] btrfs-progs: zoned: zoned block device support David Sterba
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

This patch checks if the target file system is flagged as ZONED. If it is,
the device to be added is flagged PREP_DEVICE_ZONED.  Also add checks to
prevent mixing non-zoned devices and zoned devices.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/device.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/cmds/device.c b/cmds/device.c
index adc21053fbc8..4cc104b788bb 100644
--- a/cmds/device.c
+++ b/cmds/device.c
@@ -29,6 +29,7 @@
 #include "ioctl.h"
 #include "common/utils.h"
 #include "kernel-shared/volumes.h"
+#include "kernel-shared/zoned.h"
 #include "cmds/filesystem-usage.h"
 
 #include "cmds/commands.h"
@@ -65,6 +66,8 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 	int force = 0;
 	int last_dev;
 	bool enqueue = false;
+	int zoned;
+	struct btrfs_ioctl_feature_flags feature_flags;
 
 	optind = 0;
 	while (1) {
@@ -113,12 +116,27 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 		return 1;
 	}
 
+	ret = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (ret) {
+		error("error getting feature flags '%s': %m", mntpnt);
+		return 1;
+	}
+	zoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_ZONED;
+
 	for (i = optind; i < last_dev; i++){
 		struct btrfs_ioctl_vol_args ioctl_args;
 		int	devfd, res;
 		u64 dev_block_count = 0;
 		char *path;
 
+		if (!zoned && zoned_model(argv[i]) == ZONED_HOST_MANAGED) {
+			error(
+"zoned: cannot add host managed zoned device to non-ZONED file system '%s'",
+			      argv[i]);
+			ret++;
+			continue;
+		}
+
 		res = test_dev_for_mkfs(argv[i], force);
 		if (res) {
 			ret++;
@@ -134,7 +152,8 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 
 		res = btrfs_prepare_device(devfd, argv[i], &dev_block_count, 0,
 				PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(zoned ? PREP_DEVICE_ZONED : 0));
 		close(devfd);
 		if (res) {
 			ret++;
-- 
2.31.1
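
The decision made per device in the hunk above can be written as a pure
function. A sketch only: the enum mirrors the model names used in the
series, while device_add_allowed(), prep_flags() and the SKETCH_PREP_*
values are hypothetical and do not reflect the real PREP_DEVICE_* bits:

```c
#include <stdbool.h>

enum zoned_model {
	ZONED_NONE = 0,
	ZONED_HOST_AWARE,
	ZONED_HOST_MANAGED,
};

/* Hypothetical flag values for illustration. */
#define SKETCH_PREP_ZERO_END	(1U << 0)
#define SKETCH_PREP_ZONED	(1U << 1)

/*
 * As in the hunk, the only rejected combination is a host-managed device
 * being added to a file system without the ZONED flag.
 */
static bool device_add_allowed(bool fs_zoned, enum zoned_model model)
{
	return fs_zoned || model != ZONED_HOST_MANAGED;
}

/* The ZONED prepare flag follows the file system, not the device. */
static unsigned int prep_flags(bool fs_zoned)
{
	return SKETCH_PREP_ZERO_END | (fs_zoned ? SKETCH_PREP_ZONED : 0);
}
```

Note that this check works in one direction only: a regular device added to
a ZONED file system is not rejected here.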



* [PATCH 26/26] btrfs-progs: zoned: introduce zoned support for device replace
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (24 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 25/26] btrfs-progs: zoned: device-add: support ZONED device Naohiro Aota
@ 2021-04-26  6:27 ` Naohiro Aota
  2021-04-29 15:53 ` [PATCH 00/26] btrfs-progs: zoned: zoned block device support David Sterba
  26 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-26  6:27 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, Josef Bacik, Naohiro Aota

This patch checks if the target file system is flagged as ZONED. If it is,
the replacement target device is flagged PREP_DEVICE_ZONED.  Also add checks
to prevent mixing non-zoned devices and zoned devices.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/replace.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/cmds/replace.c b/cmds/replace.c
index 53af8ca61898..1de4a6d3ca9f 100644
--- a/cmds/replace.c
+++ b/cmds/replace.c
@@ -122,12 +122,14 @@ static const char *const cmd_replace_start_usage[] = {
 static int cmd_replace_start(const struct cmd_struct *cmd,
 			     int argc, char **argv)
 {
+	struct btrfs_ioctl_feature_flags feature_flags;
 	struct btrfs_ioctl_dev_replace_args start_args = {0};
 	struct btrfs_ioctl_dev_replace_args status_args = {0};
 	int ret;
 	int i;
 	int fdmnt = -1;
 	int fddstdev = -1;
+	int zoned;
 	char *path;
 	char *srcdev;
 	char *dstdev = NULL;
@@ -182,6 +184,14 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	if (fdmnt < 0)
 		goto leave_with_error;
 
+	ret = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (ret) {
+		error("zoned: ioctl(GET_FEATURES) on '%s' returns error: %m",
+		      path);
+		goto leave_with_error;
+	}
+	zoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_ZONED;
+
 	/* check for possible errors before backgrounding */
 	status_args.cmd = BTRFS_IOCTL_DEV_REPLACE_CMD_STATUS;
 	status_args.result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT;
@@ -286,7 +296,8 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	strncpy((char *)start_args.start.tgtdev_name, dstdev,
 		BTRFS_DEVICE_PATH_NAME_MAX);
 	ret = btrfs_prepare_device(fddstdev, dstdev, &dstdev_block_count, 0,
-			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE);
+			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
+			(zoned ? PREP_DEVICE_ZONED : 0));
 	if (ret)
 		goto leave_with_error;
 
-- 
2.31.1



* Re: [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device
  2021-04-26  6:27 ` [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device Naohiro Aota
@ 2021-04-26  7:25   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26  7:25 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

On 26/04/2021 08:28, Naohiro Aota wrote:
> -int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags)
> +int btrfs_open_devices(struct btrfs_fs_info *fs_info,
> +		       struct btrfs_fs_devices *fs_devices, int flags)
>  {
>  	int fd;
>  	struct btrfs_device *device;
>  	int ret;
>  

Why not pass only fs_info and then have:
	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;


* Re: [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function
  2021-04-26  6:27 ` [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
@ 2021-04-26  7:26   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26  7:26 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik, Damien Le Moal

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices
  2021-04-26  6:27 ` [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices Naohiro Aota
@ 2021-04-26  7:32   ` Su Yue
  2021-04-27 16:45     ` David Sterba
  0 siblings, 1 reply; 44+ messages in thread
From: Su Yue @ 2021-04-26  7:32 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: David Sterba, linux-btrfs, Josef Bacik


On Mon 26 Apr 2021 at 14:27, Naohiro Aota <naohiro.aota@wdc.com> wrote:

> Get the zone information (number of zones and zone size) from 
> all the
> devices, if the volume contains a zoned block device. To avoid 
> costly
> run-time zone report commands to test the device zones type 
> during block
> allocation, it also records all the zone status (zone type, 
> write pointer
> position, etc.).
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  Makefile                |   2 +-
>  common/device-scan.c    |   2 +
>  kerncompat.h            |   4 +
>  kernel-shared/disk-io.c |  12 ++
>  kernel-shared/volumes.c |   2 +
>  kernel-shared/volumes.h |   2 +
>  kernel-shared/zoned.c   | 242 
>  ++++++++++++++++++++++++++++++++++++++++
>  kernel-shared/zoned.h   |  42 +++++++
>  8 files changed, 307 insertions(+), 1 deletion(-)
>  create mode 100644 kernel-shared/zoned.c
>  create mode 100644 kernel-shared/zoned.h
>
> diff --git a/Makefile b/Makefile
> index e288a336c81e..3dc0543982b2 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -169,7 +169,7 @@ libbtrfs_objects = common/send-stream.o 
> common/send-utils.o kernel-lib/rbtree.o
>  		   kernel-shared/free-space-cache.o 
>  kernel-shared/root-tree.o \
>  		   kernel-shared/volumes.o kernel-shared/transaction.o \
>  		   kernel-shared/free-space-tree.o repair.o 
>  kernel-shared/inode-item.o \
> -		   kernel-shared/file-item.o \
> +		   kernel-shared/file-item.o kernel-shared/zoned.o \
>  		   kernel-lib/raid56.o kernel-lib/tables.o \
>  		   common/device-scan.o common/path-utils.o \
>  		   common/utils.o libbtrfsutil/subvolume.o 
>  libbtrfsutil/stubs.o \
> diff --git a/common/device-scan.c b/common/device-scan.c
> index 01d2e0656583..74d7853afccb 100644
> --- a/common/device-scan.c
> +++ b/common/device-scan.c
> @@ -35,6 +35,7 @@
>  #include "kernel-shared/ctree.h"
>  #include "kernel-shared/volumes.h"
>  #include "kernel-shared/disk-io.h"
> +#include "kernel-shared/zoned.h"
>  #include "ioctl.h"
>
>  static int btrfs_scan_done = 0;
> @@ -198,6 +199,7 @@ int btrfs_add_to_fsid(struct 
> btrfs_trans_handle *trans,
>  	return 0;
>
>  out:
> +	free(device->zone_info);
>  	free(device);
>  	free(buf);
>  	return ret;
> diff --git a/kerncompat.h b/kerncompat.h
> index 7060326fe4f4..a39b79cba767 100644
> --- a/kerncompat.h
> +++ b/kerncompat.h
> @@ -76,6 +76,10 @@
>  #define ULONG_MAX       (~0UL)
>  #endif
>
> +#ifndef SECTOR_SHIFT
> +#define SECTOR_SHIFT 9
> +#endif
> +
>  #define __token_glue(a,b,c)	___token_glue(a,b,c)
>  #define ___token_glue(a,b,c)	a ## b ## c
>  #ifdef DEBUG_BUILD_CHECKS
> diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
> index a78be1e7a692..0519cb2358b5 100644
> --- a/kernel-shared/disk-io.c
> +++ b/kernel-shared/disk-io.c
> @@ -29,6 +29,7 @@
>  #include "kernel-shared/disk-io.h"
>  #include "kernel-shared/volumes.h"
>  #include "kernel-shared/transaction.h"
> +#include "zoned.h"
>  #include "crypto/crc32c.h"
>  #include "common/utils.h"
>  #include "kernel-shared/print-tree.h"
> @@ -1314,6 +1315,17 @@ static struct btrfs_fs_info 
> *__open_ctree_fd(int fp, const char *path,
>  	if (!fs_info->chunk_root)
>  		return fs_info;
>
> +	/*
> +	 * Get zone type information of zoned block devices. This will 
> also
> +	 * handle emulation of a zoned filesystem if a regular device 
> has the
> +	 * zoned incompat feature flag set.
> +	 */
> +	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
> +	if (ret) {
> +		error("zoned: failed to read device zone info: %d", ret);
> +		goto out_chunk;
> +	}
> +
>  	eb = fs_info->chunk_root->node;
>  	read_extent_buffer(eb, fs_info->chunk_tree_uuid,
>  			   btrfs_header_chunk_tree_uuid(eb),
> diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
> index cbcf7bfa371d..63530a99b41c 100644
> --- a/kernel-shared/volumes.c
> +++ b/kernel-shared/volumes.c
> @@ -27,6 +27,7 @@
>  #include "kernel-shared/transaction.h"
>  #include "kernel-shared/print-tree.h"
>  #include "kernel-shared/volumes.h"
> +#include "zoned.h"
>  #include "common/utils.h"
>  #include "kernel-lib/raid56.h"
>
> @@ -357,6 +358,7 @@ again:
>  		/* free the memory */
>  		free(device->name);
>  		free(device->label);
> +		free(device->zone_info);
>  		free(device);
>  	}
>
> diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
> index faaa285dbf11..a64288d566d8 100644
> --- a/kernel-shared/volumes.h
> +++ b/kernel-shared/volumes.h
> @@ -45,6 +45,8 @@ struct btrfs_device {
>
>  	u64 generation;
>
> +	struct btrfs_zoned_device_info *zone_info;
> +
>  	/* the internal btrfs device id */
>  	u64 devid;
>
> diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
> new file mode 100644
> index 000000000000..370d93915c6e
> --- /dev/null
> +++ b/kernel-shared/zoned.c
> @@ -0,0 +1,242 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +
> +#include "kernel-lib/list.h"
> +#include "kernel-shared/volumes.h"
> +#include "kernel-shared/zoned.h"
> +#include "common/utils.h"
> +#include "common/device-utils.h"
> +#include "common/messages.h"
> +#include "mkfs/common.h"
> +
> +/* Maximum number of zones to report per ioctl(BLKREPORTZONE) 
> call */
> +#define BTRFS_REPORT_NR_ZONES   4096
> +
> +static int btrfs_get_dev_zone_info(struct btrfs_device 
> *device);
> +
> +enum btrfs_zoned_model zoned_model(const char *file)
> +{
> +	const char *host_aware = "host-aware";
> +	const char *host_managed = "host-managed";
> +	struct stat st;
> +	char model[32];
> +	int ret;
> +
> +	ret = stat(file, &st);
> +	if (ret < 0) {
> +		error("zoned: unable to stat %s", file);
> +		return -ENOENT;
> +	}
> +
> +	/* Consider a regular file as non-zoned device */
> +	if (!S_ISBLK(st.st_mode))
> +		return ZONED_NONE;
> +
> +	ret = queue_param(file, "zoned", model, sizeof(model));
> +	if (ret <= 0)
> +		return ZONED_NONE;
> +
> +	if (strncmp(model, host_aware, strlen(host_aware)) == 0)
> +		return ZONED_HOST_AWARE;
> +	if (strncmp(model, host_managed, strlen(host_managed)) == 0)
> +		return ZONED_HOST_MANAGED;
> +
> +	return ZONED_NONE;
> +}
> +
> +u64 zone_size(const char *file)
> +{
> +	char chunk[32];
> +	int ret;
> +
> +	ret = queue_param(file, "chunk_sectors", chunk, 
> sizeof(chunk));
> +	if (ret <= 0)
> +		return 0;
> +
> +	return strtoull((const char *)chunk, NULL, 10) << 
> SECTOR_SHIFT;
> +}
> +
> +#ifdef BTRFS_ZONED
> +static int report_zones(int fd, const char *file,
> +			struct btrfs_zoned_device_info *zinfo)
> +{
> +	u64 device_size;
> +	u64 zone_bytes = zone_size(file);
> +	size_t rep_size;
> +	u64 sector = 0;
> +	struct blk_zone_report *rep;
> +	struct blk_zone *zone;
> +	unsigned int i, n = 0;
> +	int ret;
> +
> +	/*
> +	 * Zones are guaranteed (by the kernel) to be a power of 2 
> number of
> +	 * sectors. Check this here and make sure that zones are not 
> too
> +	 * small.
> +	 */
> +	if (!zone_bytes || !is_power_of_2(zone_bytes)) {
> +		error("zoned: illegal zone size %llu (not a power of 2)",
> +		      zone_bytes);
> +		exit(1);
> +	}
> +	/*
> +	 * The zone size must be large enough to hold the initial 
> system
> +	 * block group for mkfs time.
> +	 */
> +	if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
> +		error("zoned: illegal zone size %llu (smaller than %d)",
> +		      zone_bytes, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
> +		exit(1);

I see many exit() calls in this patch and other patches.
Any special reason?

--
Su
> +	}
> +
> +	/*
> +	 * No need to use btrfs_device_size() here, since it is 
> ensured
> +	 * that the file is block device.
> +	 */
> +	if (ioctl(fd, BLKGETSIZE64, &device_size) < 0) {
> +		error("zoned: ioctl(BLKGETSIZE64) failed on %s (%m)", 
> file);
> +		exit(1);
> +	}
> +
> +	/* Allocate the zone information array */
> +	zinfo->zone_size = zone_bytes;
> +	zinfo->nr_zones = device_size / zone_bytes;
> +	if (device_size & (zone_bytes - 1))
> +		zinfo->nr_zones++;
> +	zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct 
> blk_zone));
> +	if (!zinfo->zones) {
> +		error("zoned: no memory for zone information");
> +		exit(1);
> +	}
> +
> +	/* Allocate a zone report */
> +	rep_size = sizeof(struct blk_zone_report) +
> +		sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES;
> +	rep = malloc(rep_size);
> +	if (!rep) {
> +		error("zoned: no memory for zones report");
> +		exit(1);
> +	}
> +
> +	/* Get zone information */
> +	zone = (struct blk_zone *)(rep + 1);
> +	while (n < zinfo->nr_zones) {
> +		memset(rep, 0, rep_size);
> +		rep->sector = sector;
> +		rep->nr_zones = BTRFS_REPORT_NR_ZONES;
> +
> +		ret = ioctl(fd, BLKREPORTZONE, rep);
> +		if (ret != 0) {
> +			error("zoned: ioctl BLKREPORTZONE failed (%m)");
> +			exit(1);
> +		}
> +
> +		if (!rep->nr_zones)
> +			break;
> +
> +		for (i = 0; i < rep->nr_zones; i++) {
> +			if (n >= zinfo->nr_zones)
> +				break;
> +			memcpy(&zinfo->zones[n], &zone[i],
> +			       sizeof(struct blk_zone));
> +			n++;
> +		}
> +
> +		sector = zone[rep->nr_zones - 1].start +
> +			 zone[rep->nr_zones - 1].len;
> +	}
> +
> +	free(rep);
> +
> +	return 0;
> +}
> +
> +#endif
> +
> +int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info 
> *fs_info)
> +{
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> +	struct btrfs_device *device;
> +	int ret = 0;
> +
> +	/* fs_info->zone_size might not be set yet. Use the incompat flag
> here. */
> +	if (!btrfs_fs_incompat(fs_info, ZONED))
> +		return 0;
> +
> +	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +		/* We can skip reading of zone info for missing devices */
> +		if (device->fd == -1)
> +			continue;
> +
> +		ret = btrfs_get_dev_zone_info(device);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static int btrfs_get_dev_zone_info(struct btrfs_device *device)
> +{
> +	struct btrfs_fs_info *fs_info = device->fs_info;
> +
> +	/*
> +	 * Cannot use btrfs_is_zoned here, since fs_info::zone_size 
> might not
> +	 * yet be set.
> +	 */
> +	if (!btrfs_fs_incompat(fs_info, ZONED))
> +		return 0;
> +
> +	if (device->zone_info)
> +		return 0;
> +
> +	return btrfs_get_zone_info(device->fd, device->name,
> +				   &device->zone_info);
> +}
> +
> +int btrfs_get_zone_info(int fd, const char *file,
> +			struct btrfs_zoned_device_info **zinfo_ret)
> +{
> +#ifdef BTRFS_ZONED
> +	struct btrfs_zoned_device_info *zinfo;
> +	int ret;
> +#endif
> +	enum btrfs_zoned_model model;
> +
> +	*zinfo_ret = NULL;
> +
> +	/* Check zone model */
> +	model = zoned_model(file);
> +	if (model == ZONED_NONE)
> +		return 0;
> +
> +#ifdef BTRFS_ZONED
> +	zinfo = calloc(1, sizeof(*zinfo));
> +	if (!zinfo) {
> +		error("zoned: no memory for zone information");
> +		exit(1);
> +	}
> +
> +	zinfo->model = model;
> +
> +	/* Get zone information */
> +	ret = report_zones(fd, file, zinfo);
> +	if (ret != 0) {
> +		kfree(zinfo);
> +		return ret;
> +	}
> +	*zinfo_ret = zinfo;
> +#else
> +	error("zoned: %s: Unsupported host-%s zoned block device", 
> file,
> +	      model == ZONED_HOST_MANAGED ? "managed" : "aware");
> +	if (model == ZONED_HOST_MANAGED)
> +		return -EOPNOTSUPP;
> +
> +	error("zoned: %s: handling host-aware block device as a 
> regular disk",
> +	      file);
> +#endif
> +
> +	return 0;
> +}
> diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
> new file mode 100644
> index 000000000000..461a2d624c67
> --- /dev/null
> +++ b/kernel-shared/zoned.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __BTRFS_ZONED_H__
> +#define __BTRFS_ZONED_H__
> +
> +#include <stdbool.h>
> +#include "kerncompat.h"
> +
> +#ifdef BTRFS_ZONED
> +#include <linux/blkzoned.h>
> +#else
> +struct blk_zone {
> +	int dummy;
> +};
> +#endif /* BTRFS_ZONED */
> +
> +/*
> + * Zoned block device models.
> + */
> +enum btrfs_zoned_model {
> +	ZONED_NONE = 0,
> +	ZONED_HOST_AWARE,
> +	ZONED_HOST_MANAGED,
> +};
> +
> +/*
> + * Zone information for a zoned block device.
> + */
> +struct btrfs_zoned_device_info {
> +	enum btrfs_zoned_model	model;
> +	u64			zone_size;
> +	u32			nr_zones;
> +	struct blk_zone		*zones;
> +};
> +
> +enum btrfs_zoned_model zoned_model(const char *file);
> +u64 zone_size(const char *file);
> +int btrfs_get_zone_info(int fd, const char *file,
> +			struct btrfs_zoned_device_info **zinfo);
> +int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
> +
> +#endif /* __BTRFS_ZONED_H__ */


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag
  2021-04-26  6:27 ` [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag Naohiro Aota
@ 2021-04-26  7:45   ` Johannes Thumshirn
  2021-04-27 15:45     ` David Sterba
  2021-04-27 15:46   ` David Sterba
  1 sibling, 1 reply; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26  7:45 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

On 26/04/2021 08:28, Naohiro Aota wrote:
> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> index 569208a9e5b1..c0793339b531 100644
> --- a/common/fsfeatures.c
> +++ b/common/fsfeatures.c
> @@ -100,6 +100,14 @@ static const struct btrfs_feature mkfs_features[] = {
>  		NULL, 0,
>  		NULL, 0,
>  		"RAID1 with 3 or 4 copies" },
> +#ifdef BTRFS_ZONED
> +	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
> +		"zoned",
> +		NULL, 0,
> +		NULL, 0,
> +		NULL, 0,
> +		"support Zoned devices" },
> +#endif

Shouldn't we set the compat version to 5.12?
I.e.:
#ifdef BTRFS_ZONED
	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
		"zoned",
		VERSION_TO_STRING2(5,12),
		NULL, 0,
		NULL, 0,
		"support Zoned devices" },
#endif


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode
  2021-04-26  6:27 ` [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode Naohiro Aota
@ 2021-04-26  7:48   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26  7:48 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size
  2021-04-26  6:27 ` [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size Naohiro Aota
@ 2021-04-26  7:51   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26  7:51 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode
  2021-04-26  6:27 ` [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode Naohiro Aota
@ 2021-04-26  7:56   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26  7:56 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices
  2021-04-26  6:27 ` [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Naohiro Aota
@ 2021-04-26 13:43   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26 13:43 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

On 26/04/2021 08:28, Naohiro Aota wrote:
> +			}
>  		}
>  
> +
>  		if (!rep->nr_zones)
>  			break;

Nit: double newline

Otherwise
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode
  2021-04-26  6:27 ` [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode Naohiro Aota
@ 2021-04-26 16:04   ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-26 16:04 UTC (permalink / raw)
  To: Naohiro Aota, David Sterba; +Cc: linux-btrfs, Josef Bacik

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag
  2021-04-26  7:45   ` Johannes Thumshirn
@ 2021-04-27 15:45     ` David Sterba
  0 siblings, 0 replies; 44+ messages in thread
From: David Sterba @ 2021-04-27 15:45 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Naohiro Aota, David Sterba, linux-btrfs, Josef Bacik

On Mon, Apr 26, 2021 at 07:45:23AM +0000, Johannes Thumshirn wrote:
> On 26/04/2021 08:28, Naohiro Aota wrote:
> > diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> > index 569208a9e5b1..c0793339b531 100644
> > --- a/common/fsfeatures.c
> > +++ b/common/fsfeatures.c
> > @@ -100,6 +100,14 @@ static const struct btrfs_feature mkfs_features[] = {
> >  		NULL, 0,
> >  		NULL, 0,
> >  		"RAID1 with 3 or 4 copies" },
> > +#ifdef BTRFS_ZONED
> > +	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
> > +		"zoned",
> > +		NULL, 0,
> > +		NULL, 0,
> > +		NULL, 0,
> > +		"support Zoned devices" },
> > +#endif
> 
> Shouldn't we set the compat version to 5.12?
> I.e.:
> #ifdef BTRFS_ZONED
> 	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
> 		"zoned",
> 		VERSION_TO_STRING2(5,12),
> 		NULL, 0,
> 		NULL, 0,
> 		"support Zoned devices" },
> #endif

Folded in, thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag
  2021-04-26  6:27 ` [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag Naohiro Aota
  2021-04-26  7:45   ` Johannes Thumshirn
@ 2021-04-27 15:46   ` David Sterba
  2021-04-28  0:07     ` Naohiro Aota
  1 sibling, 1 reply; 44+ messages in thread
From: David Sterba @ 2021-04-27 15:46 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: David Sterba, linux-btrfs, Josef Bacik

On Mon, Apr 26, 2021 at 03:27:20PM +0900, Naohiro Aota wrote:
> With the zoned feature enabled, a zoned block device-aware btrfs allocates
> block groups aligned to the device zones and always write in sequential
> zones at the zone write pointer position.
> 
> It also supports "emulated" zoned mode on a non-zoned device. In the
> emulated mode, btrfs emulates conventional zones by slicing the device with
> a fixed size.
> 
> We don't support conversion from the ext4 volume with the zoned feature
> because we can't be sure all the converted block groups are aligned to zone
> boundaries.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  common/fsfeatures.c        | 8 ++++++++
>  common/fsfeatures.h        | 3 ++-
>  kernel-shared/ctree.h      | 4 +++-
>  kernel-shared/print-tree.c | 1 +
>  4 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> index 569208a9e5b1..c0793339b531 100644
> --- a/common/fsfeatures.c
> +++ b/common/fsfeatures.c
> @@ -100,6 +100,14 @@ static const struct btrfs_feature mkfs_features[] = {
>  		NULL, 0,
>  		NULL, 0,
>  		"RAID1 with 3 or 4 copies" },
> +#ifdef BTRFS_ZONED
> +	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
> +		"zoned",
> +		NULL, 0,
> +		NULL, 0,
> +		NULL, 0,
> +		"support Zoned devices" },
> +#endif
>  	/* Keep this one last */
>  	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
>  };
> diff --git a/common/fsfeatures.h b/common/fsfeatures.h
> index 74ec2a21caf6..1a7d7f62897f 100644
> --- a/common/fsfeatures.h
> +++ b/common/fsfeatures.h
> @@ -25,7 +25,8 @@
>  		| BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
>  
>  /*
> - * Avoid multi-device features (RAID56) and mixed block groups
> + * Avoid multi-device features (RAID56), mixed block groups, and zoned
> + * btrfs
>   */
>  #define BTRFS_CONVERT_ALLOWED_FEATURES				\
>  	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF			\

Looks like BTRFS_FEATURE_INCOMPAT_ZONED should be here.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices
  2021-04-26  7:32   ` Su Yue
@ 2021-04-27 16:45     ` David Sterba
  2021-04-28  0:09       ` Naohiro Aota
  0 siblings, 1 reply; 44+ messages in thread
From: David Sterba @ 2021-04-27 16:45 UTC (permalink / raw)
  To: Su Yue; +Cc: Naohiro Aota, David Sterba, linux-btrfs, Josef Bacik

On Mon, Apr 26, 2021 at 03:32:23PM +0800, Su Yue wrote:
> 
> On Mon 26 Apr 2021 at 14:27, Naohiro Aota <naohiro.aota@wdc.com> wrote:
> 
> > Get the zone information (number of zones and zone size) from all the
> > devices, if the volume contains a zoned block device. To avoid costly
> > run-time zone report commands to test the device zones type during block
> > allocation, it also records all the zone status (zone type, write pointer
> > position, etc.).
> >
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  Makefile                |   2 +-
> >  common/device-scan.c    |   2 +
> >  kerncompat.h            |   4 +
> >  kernel-shared/disk-io.c |  12 ++
> >  kernel-shared/volumes.c |   2 +
> >  kernel-shared/volumes.h |   2 +
> >  kernel-shared/zoned.c   | 242 ++++++++++++++++++++++++++++++++++++++++
> >  kernel-shared/zoned.h   |  42 +++++++
> >  8 files changed, 307 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel-shared/zoned.c
> >  create mode 100644 kernel-shared/zoned.h
> >
> > diff --git a/Makefile b/Makefile
> > index e288a336c81e..3dc0543982b2 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -169,7 +169,7 @@ libbtrfs_objects = common/send-stream.o common/send-utils.o kernel-lib/rbtree.o
> >  		   kernel-shared/free-space-cache.o kernel-shared/root-tree.o \
> >  		   kernel-shared/volumes.o kernel-shared/transaction.o \
> >  		   kernel-shared/free-space-tree.o repair.o kernel-shared/inode-item.o \
> > -		   kernel-shared/file-item.o \
> > +		   kernel-shared/file-item.o kernel-shared/zoned.o \
> >  		   kernel-lib/raid56.o kernel-lib/tables.o \
> >  		   common/device-scan.o common/path-utils.o \
> >  		   common/utils.o libbtrfsutil/subvolume.o libbtrfsutil/stubs.o \
> > diff --git a/common/device-scan.c b/common/device-scan.c
> > index 01d2e0656583..74d7853afccb 100644
> > --- a/common/device-scan.c
> > +++ b/common/device-scan.c
> > @@ -35,6 +35,7 @@
> >  #include "kernel-shared/ctree.h"
> >  #include "kernel-shared/volumes.h"
> >  #include "kernel-shared/disk-io.h"
> > +#include "kernel-shared/zoned.h"
> >  #include "ioctl.h"
> >
> >  static int btrfs_scan_done = 0;
> > @@ -198,6 +199,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
> >  	return 0;
> >
> >  out:
> > +	free(device->zone_info);
> >  	free(device);
> >  	free(buf);
> >  	return ret;
> > diff --git a/kerncompat.h b/kerncompat.h
> > index 7060326fe4f4..a39b79cba767 100644
> > --- a/kerncompat.h
> > +++ b/kerncompat.h
> > @@ -76,6 +76,10 @@
> >  #define ULONG_MAX       (~0UL)
> >  #endif
> >
> > +#ifndef SECTOR_SHIFT
> > +#define SECTOR_SHIFT 9
> > +#endif
> > +
> >  #define __token_glue(a,b,c)	___token_glue(a,b,c)
> >  #define ___token_glue(a,b,c)	a ## b ## c
> >  #ifdef DEBUG_BUILD_CHECKS
> > diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
> > index a78be1e7a692..0519cb2358b5 100644
> > --- a/kernel-shared/disk-io.c
> > +++ b/kernel-shared/disk-io.c
> > @@ -29,6 +29,7 @@
> >  #include "kernel-shared/disk-io.h"
> >  #include "kernel-shared/volumes.h"
> >  #include "kernel-shared/transaction.h"
> > +#include "zoned.h"
> >  #include "crypto/crc32c.h"
> >  #include "common/utils.h"
> >  #include "kernel-shared/print-tree.h"
> > @@ -1314,6 +1315,17 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
> >  	if (!fs_info->chunk_root)
> >  		return fs_info;
> >
> > +	/*
> > +	 * Get zone type information of zoned block devices. This will also
> > +	 * handle emulation of a zoned filesystem if a regular device has the
> > +	 * zoned incompat feature flag set.
> > +	 */
> > +	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
> > +	if (ret) {
> > +		error("zoned: failed to read device zone info: %d", ret);
> > +		goto out_chunk;
> > +	}
> > +
> >  	eb = fs_info->chunk_root->node;
> >  	read_extent_buffer(eb, fs_info->chunk_tree_uuid,
> >  			   btrfs_header_chunk_tree_uuid(eb),
> > diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
> > index cbcf7bfa371d..63530a99b41c 100644
> > --- a/kernel-shared/volumes.c
> > +++ b/kernel-shared/volumes.c
> > @@ -27,6 +27,7 @@
> >  #include "kernel-shared/transaction.h"
> >  #include "kernel-shared/print-tree.h"
> >  #include "kernel-shared/volumes.h"
> > +#include "zoned.h"
> >  #include "common/utils.h"
> >  #include "kernel-lib/raid56.h"
> >
> > @@ -357,6 +358,7 @@ again:
> >  		/* free the memory */
> >  		free(device->name);
> >  		free(device->label);
> > +		free(device->zone_info);
> >  		free(device);
> >  	}
> >
> > diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
> > index faaa285dbf11..a64288d566d8 100644
> > --- a/kernel-shared/volumes.h
> > +++ b/kernel-shared/volumes.h
> > @@ -45,6 +45,8 @@ struct btrfs_device {
> >
> >  	u64 generation;
> >
> > +	struct btrfs_zoned_device_info *zone_info;
> > +
> >  	/* the internal btrfs device id */
> >  	u64 devid;
> >
> > diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
> > new file mode 100644
> > index 000000000000..370d93915c6e
> > --- /dev/null
> > +++ b/kernel-shared/zoned.c
> > @@ -0,0 +1,242 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <sys/ioctl.h>
> > +#include <linux/fs.h>
> > +
> > +#include "kernel-lib/list.h"
> > +#include "kernel-shared/volumes.h"
> > +#include "kernel-shared/zoned.h"
> > +#include "common/utils.h"
> > +#include "common/device-utils.h"
> > +#include "common/messages.h"
> > +#include "mkfs/common.h"
> > +
> > +/* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */
> > +#define BTRFS_REPORT_NR_ZONES   4096
> > +
> > +static int btrfs_get_dev_zone_info(struct btrfs_device *device);
> > +
> > +enum btrfs_zoned_model zoned_model(const char *file)
> > +{
> > +	const char *host_aware = "host-aware";
> > +	const char *host_managed = "host-managed";
> > +	struct stat st;
> > +	char model[32];
> > +	int ret;
> > +
> > +	ret = stat(file, &st);
> > +	if (ret < 0) {
> > +		error("zoned: unable to stat %s", file);
> > +		return -ENOENT;
> > +	}
> > +
> > +	/* Consider a regular file as non-zoned device */
> > +	if (!S_ISBLK(st.st_mode))
> > +		return ZONED_NONE;
> > +
> > +	ret = queue_param(file, "zoned", model, sizeof(model));
> > +	if (ret <= 0)
> > +		return ZONED_NONE;
> > +
> > +	if (strncmp(model, host_aware, strlen(host_aware)) == 0)
> > +		return ZONED_HOST_AWARE;
> > +	if (strncmp(model, host_managed, strlen(host_managed)) == 0)
> > +		return ZONED_HOST_MANAGED;
> > +
> > +	return ZONED_NONE;
> > +}
> > +
> > +u64 zone_size(const char *file)
> > +{
> > +	char chunk[32];
> > +	int ret;
> > +
> > +	ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk));
> > +	if (ret <= 0)
> > +		return 0;
> > +
> > +	return strtoull((const char *)chunk, NULL, 10) << SECTOR_SHIFT;
> > +}
> > +
> > +#ifdef BTRFS_ZONED
> > +static int report_zones(int fd, const char *file,
> > +			struct btrfs_zoned_device_info *zinfo)
> > +{
> > +	u64 device_size;
> > +	u64 zone_bytes = zone_size(file);
> > +	size_t rep_size;
> > +	u64 sector = 0;
> > +	struct blk_zone_report *rep;
> > +	struct blk_zone *zone;
> > +	unsigned int i, n = 0;
> > +	int ret;
> > +
> > +	/*
> > +	 * Zones are guaranteed (by the kernel) to be a power of 2 number of
> > +	 * sectors. Check this here and make sure that zones are not too
> > +	 * small.
> > +	 */
> > +	if (!zone_bytes || !is_power_of_2(zone_bytes)) {
> > +		error("zoned: illegal zone size %llu (not a power of 2)",
> > +		      zone_bytes);
> > +		exit(1);
> > +	}
> > +	/*
> > +	 * The zone size must be large enough to hold the initial system
> > +	 * block group for mkfs time.
> > +	 */
> > +	if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
> > +		error("zoned: illegal zone size %llu (smaller than %d)",
> > +		      zone_bytes, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
> > +		exit(1);
> 
> I see many exit() calls in this patch and other patches.
> Any special reason?

Yeah, it should be turned into normal error returns and handling in the
callers.
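
As a rough illustration of that direction (a minimal sketch, not code from
the series): the zone size checks could return a negative errno and leave
the unwinding to the caller. The names check_zone_size() and
EXAMPLE_MIN_ZONE_BYTES are hypothetical; the latter only stands in for
BTRFS_MKFS_SYSTEM_GROUP_SIZE, and its value here is a placeholder.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder for BTRFS_MKFS_SYSTEM_GROUP_SIZE; the value is an assumption. */
#define EXAMPLE_MIN_ZONE_BYTES (4ULL * 1024 * 1024)

static int is_power_of_2_u64(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: validate a zone size and report failure through the
 * return value instead of calling exit(), so the caller can free what it
 * allocated and propagate the error.
 */
int check_zone_size(uint64_t zone_bytes)
{
	if (!is_power_of_2_u64(zone_bytes)) {
		fprintf(stderr,
			"zoned: illegal zone size %llu (not a power of 2)\n",
			(unsigned long long)zone_bytes);
		return -EINVAL;
	}
	if (zone_bytes < EXAMPLE_MIN_ZONE_BYTES) {
		fprintf(stderr,
			"zoned: illegal zone size %llu (smaller than %llu)\n",
			(unsigned long long)zone_bytes,
			(unsigned long long)EXAMPLE_MIN_ZONE_BYTES);
		return -EINVAL;
	}
	return 0;
}
```

A caller such as report_zones() would then do "ret = check_zone_size(...);
if (ret) return ret;" and btrfs_get_zone_info() would free zinfo on that
path, as it already does for other report_zones() failures.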

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator
  2021-04-26  6:27 ` [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator Naohiro Aota
@ 2021-04-27 17:19   ` David Sterba
  2021-04-27 19:58     ` Johannes Thumshirn
  0 siblings, 1 reply; 44+ messages in thread
From: David Sterba @ 2021-04-27 17:19 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: David Sterba, linux-btrfs, Josef Bacik

On Mon, Apr 26, 2021 at 03:27:27PM +0900, Naohiro Aota wrote:
> Implement a zoned chunk and device extent allocator. One device zone
> becomes a device extent so that a zone reset affects only this device
> extent and does not change the state of blocks in the neighbor device
> extents.
> 
> To implement the allocator, we need to extend the following functions for
> a zoned filesystem.
> 
> - init_alloc_chunk_ctl
> - dev_extent_search_start

This function is not present in current btrfs-progs codebase

>  static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
>  {
> +	u64 zone_size;
> +

So this does not apply. Looks like some intermediate patches are
missing. There's more missing code and several other conflicts.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator
  2021-04-27 17:19   ` David Sterba
@ 2021-04-27 19:58     ` Johannes Thumshirn
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2021-04-27 19:58 UTC (permalink / raw)
  To: dsterba, Naohiro Aota; +Cc: David Sterba, linux-btrfs, Josef Bacik

On 27/04/2021 19:21, David Sterba wrote:
> On Mon, Apr 26, 2021 at 03:27:27PM +0900, Naohiro Aota wrote:
>> Implement a zoned chunk and device extent allocator. One device zone
>> becomes a device extent so that a zone reset affects only this device
>> extent and does not change the state of blocks in the neighbor device
>> extents.
>>
>> To implement the allocator, we need to extend the following functions for
>> a zoned filesystem.
>>
>> - init_alloc_chunk_ctl
>> - dev_extent_search_start
> 
> This function is not present in current btrfs-progs codebase
> 
>>  static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
>>  {
>> +	u64 zone_size;
>> +
> 
> So this does not apply. Looks like some intermediate patches are
> missing. There's more missing code and several other conflicts.
> 

That's probably this series, aligning the user space code more to the kernel code:
https://lore.kernel.org/linux-btrfs/cover.1617694997.git.naohiro.aota@wdc.com/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag
  2021-04-27 15:46   ` David Sterba
@ 2021-04-28  0:07     ` Naohiro Aota
  0 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-28  0:07 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs, Josef Bacik

On Tue, Apr 27, 2021 at 05:46:36PM +0200, David Sterba wrote:
> On Mon, Apr 26, 2021 at 03:27:20PM +0900, Naohiro Aota wrote:
> > With the zoned feature enabled, a zoned block device-aware btrfs allocates
> > block groups aligned to the device zones and always write in sequential
> > zones at the zone write pointer position.
> > 
> > It also supports "emulated" zoned mode on a non-zoned device. In the
> > emulated mode, btrfs emulates conventional zones by slicing the device with
> > a fixed size.
> > 
> > We don't support conversion from the ext4 volume with the zoned feature
> > because we can't be sure all the converted block groups are aligned to zone
> > boundaries.
> > 
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  common/fsfeatures.c        | 8 ++++++++
> >  common/fsfeatures.h        | 3 ++-
> >  kernel-shared/ctree.h      | 4 +++-
> >  kernel-shared/print-tree.c | 1 +
> >  4 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> > index 569208a9e5b1..c0793339b531 100644
> > --- a/common/fsfeatures.c
> > +++ b/common/fsfeatures.c
> > @@ -100,6 +100,14 @@ static const struct btrfs_feature mkfs_features[] = {
> >  		NULL, 0,
> >  		NULL, 0,
> >  		"RAID1 with 3 or 4 copies" },
> > +#ifdef BTRFS_ZONED
> > +	{ "zoned", BTRFS_FEATURE_INCOMPAT_ZONED,
> > +		"zoned",
> > +		NULL, 0,
> > +		NULL, 0,
> > +		NULL, 0,
> > +		"support Zoned devices" },
> > +#endif
> >  	/* Keep this one last */
> >  	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
> >  };
> > diff --git a/common/fsfeatures.h b/common/fsfeatures.h
> > index 74ec2a21caf6..1a7d7f62897f 100644
> > --- a/common/fsfeatures.h
> > +++ b/common/fsfeatures.h
> > @@ -25,7 +25,8 @@
> >  		| BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
> >  
> >  /*
> > - * Avoid multi-device features (RAID56) and mixed block groups
> > + * Avoid multi-device features (RAID56), mixed block groups, and zoned
> > + * btrfs
> >   */
> >  #define BTRFS_CONVERT_ALLOWED_FEATURES				\
> >  	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF			\
> 
> Looks like BTRFS_FEATURE_INCOMPAT_ZONED should be here.

Since we do not support converting ext4 to zoned btrfs, I didn't list
it here. I do not think we can support the conversion easily, since
ext4's data might not be aligned to the zone boundaries.
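
To make the alignment concern concrete, here is a minimal sketch of the
kind of check a converted block group would have to pass. The helper name
block_group_zone_aligned() is hypothetical and not part of btrfs-progs, and
treating both start and length as multiples of the zone size is an
assumption about the constraint.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative helper (an assumption, not btrfs-progs code): report whether
 * a block group covering [start, start + length) begins and ends on zone
 * boundaries. zone_size is assumed to be a nonzero power of 2, which is
 * what the zone size check in report_zones() enforces.
 */
bool block_group_zone_aligned(uint64_t start, uint64_t length,
			      uint64_t zone_size)
{
	uint64_t mask = zone_size - 1;

	return (start & mask) == 0 && (length & mask) == 0;
}
```

Block groups created by the convert tool follow the existing ext4 data
layout, so nothing guarantees that such a check would pass for them.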

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices
  2021-04-27 16:45     ` David Sterba
@ 2021-04-28  0:09       ` Naohiro Aota
  0 siblings, 0 replies; 44+ messages in thread
From: Naohiro Aota @ 2021-04-28  0:09 UTC (permalink / raw)
  To: dsterba, Su Yue, David Sterba, linux-btrfs, Josef Bacik

On Tue, Apr 27, 2021 at 06:45:55PM +0200, David Sterba wrote:
> On Mon, Apr 26, 2021 at 03:32:23PM +0800, Su Yue wrote:
> > 
> > On Mon 26 Apr 2021 at 14:27, Naohiro Aota <naohiro.aota@wdc.com> wrote:
> > 
> > > Get the zone information (number of zones and zone size) from all the
> > > devices, if the volume contains a zoned block device. To avoid costly
> > > run-time zone report commands to test the device zones type during block
> > > allocation, it also records all the zone status (zone type, write pointer
> > > position, etc.).
> > >
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > >  Makefile                |   2 +-
> > >  common/device-scan.c    |   2 +
> > >  kerncompat.h            |   4 +
> > >  kernel-shared/disk-io.c |  12 ++
> > >  kernel-shared/volumes.c |   2 +
> > >  kernel-shared/volumes.h |   2 +
> > >  kernel-shared/zoned.c   | 242 ++++++++++++++++++++++++++++++++++++++++
> > >  kernel-shared/zoned.h   |  42 +++++++
> > >  8 files changed, 307 insertions(+), 1 deletion(-)
> > >  create mode 100644 kernel-shared/zoned.c
> > >  create mode 100644 kernel-shared/zoned.h
> > >
> > > diff --git a/Makefile b/Makefile
> > > index e288a336c81e..3dc0543982b2 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -169,7 +169,7 @@ libbtrfs_objects = common/send-stream.o common/send-utils.o kernel-lib/rbtree.o
> > >  		   kernel-shared/free-space-cache.o kernel-shared/root-tree.o \
> > >  		   kernel-shared/volumes.o kernel-shared/transaction.o \
> > >  		   kernel-shared/free-space-tree.o repair.o kernel-shared/inode-item.o \
> > > -		   kernel-shared/file-item.o \
> > > +		   kernel-shared/file-item.o kernel-shared/zoned.o \
> > >  		   kernel-lib/raid56.o kernel-lib/tables.o \
> > >  		   common/device-scan.o common/path-utils.o \
> > >  		   common/utils.o libbtrfsutil/subvolume.o libbtrfsutil/stubs.o \
> > > diff --git a/common/device-scan.c b/common/device-scan.c
> > > index 01d2e0656583..74d7853afccb 100644
> > > --- a/common/device-scan.c
> > > +++ b/common/device-scan.c
> > > @@ -35,6 +35,7 @@
> > >  #include "kernel-shared/ctree.h"
> > >  #include "kernel-shared/volumes.h"
> > >  #include "kernel-shared/disk-io.h"
> > > +#include "kernel-shared/zoned.h"
> > >  #include "ioctl.h"
> > >
> > >  static int btrfs_scan_done = 0;
> > > @@ -198,6 +199,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
> > >  	return 0;
> > >
> > >  out:
> > > +	free(device->zone_info);
> > >  	free(device);
> > >  	free(buf);
> > >  	return ret;
> > > diff --git a/kerncompat.h b/kerncompat.h
> > > index 7060326fe4f4..a39b79cba767 100644
> > > --- a/kerncompat.h
> > > +++ b/kerncompat.h
> > > @@ -76,6 +76,10 @@
> > >  #define ULONG_MAX       (~0UL)
> > >  #endif
> > >
> > > +#ifndef SECTOR_SHIFT
> > > +#define SECTOR_SHIFT 9
> > > +#endif
> > > +
> > >  #define __token_glue(a,b,c)	___token_glue(a,b,c)
> > >  #define ___token_glue(a,b,c)	a ## b ## c
> > >  #ifdef DEBUG_BUILD_CHECKS
> > > diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
> > > index a78be1e7a692..0519cb2358b5 100644
> > > --- a/kernel-shared/disk-io.c
> > > +++ b/kernel-shared/disk-io.c
> > > @@ -29,6 +29,7 @@
> > >  #include "kernel-shared/disk-io.h"
> > >  #include "kernel-shared/volumes.h"
> > >  #include "kernel-shared/transaction.h"
> > > +#include "zoned.h"
> > >  #include "crypto/crc32c.h"
> > >  #include "common/utils.h"
> > >  #include "kernel-shared/print-tree.h"
> > > @@ -1314,6 +1315,17 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
> > >  	if (!fs_info->chunk_root)
> > >  		return fs_info;
> > >
> > > +	/*
> > > +	 * Get zone type information of zoned block devices. This will also
> > > +	 * handle emulation of a zoned filesystem if a regular device has the
> > > +	 * zoned incompat feature flag set.
> > > +	 */
> > > +	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
> > > +	if (ret) {
> > > +		error("zoned: failed to read device zone info: %d", ret);
> > > +		goto out_chunk;
> > > +	}
> > > +
> > >  	eb = fs_info->chunk_root->node;
> > >  	read_extent_buffer(eb, fs_info->chunk_tree_uuid,
> > >  			   btrfs_header_chunk_tree_uuid(eb),
> > > diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
> > > index cbcf7bfa371d..63530a99b41c 100644
> > > --- a/kernel-shared/volumes.c
> > > +++ b/kernel-shared/volumes.c
> > > @@ -27,6 +27,7 @@
> > >  #include "kernel-shared/transaction.h"
> > >  #include "kernel-shared/print-tree.h"
> > >  #include "kernel-shared/volumes.h"
> > > +#include "zoned.h"
> > >  #include "common/utils.h"
> > >  #include "kernel-lib/raid56.h"
> > >
> > > @@ -357,6 +358,7 @@ again:
> > >  		/* free the memory */
> > >  		free(device->name);
> > >  		free(device->label);
> > > +		free(device->zone_info);
> > >  		free(device);
> > >  	}
> > >
> > > diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
> > > index faaa285dbf11..a64288d566d8 100644
> > > --- a/kernel-shared/volumes.h
> > > +++ b/kernel-shared/volumes.h
> > > @@ -45,6 +45,8 @@ struct btrfs_device {
> > >
> > >  	u64 generation;
> > >
> > > +	struct btrfs_zoned_device_info *zone_info;
> > > +
> > >  	/* the internal btrfs device id */
> > >  	u64 devid;
> > >
> > > diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
> > > new file mode 100644
> > > index 000000000000..370d93915c6e
> > > --- /dev/null
> > > +++ b/kernel-shared/zoned.c
> > > @@ -0,0 +1,242 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +#include <sys/ioctl.h>
> > > +#include <linux/fs.h>
> > > +
> > > +#include "kernel-lib/list.h"
> > > +#include "kernel-shared/volumes.h"
> > > +#include "kernel-shared/zoned.h"
> > > +#include "common/utils.h"
> > > +#include "common/device-utils.h"
> > > +#include "common/messages.h"
> > > +#include "mkfs/common.h"
> > > +
> > > +/* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */
> > > +#define BTRFS_REPORT_NR_ZONES   4096
> > > +
> > > +static int btrfs_get_dev_zone_info(struct btrfs_device *device);
> > > +
> > > +enum btrfs_zoned_model zoned_model(const char *file)
> > > +{
> > > +	const char *host_aware = "host-aware";
> > > +	const char *host_managed = "host-managed";
> > > +	struct stat st;
> > > +	char model[32];
> > > +	int ret;
> > > +
> > > +	ret = stat(file, &st);
> > > +	if (ret < 0) {
> > > +		error("zoned: unable to stat %s", file);
> > > +		return -ENOENT;
> > > +	}
> > > +
> > > +	/* Consider a regular file as non-zoned device */
> > > +	if (!S_ISBLK(st.st_mode))
> > > +		return ZONED_NONE;
> > > +
> > > +	ret = queue_param(file, "zoned", model, sizeof(model));
> > > +	if (ret <= 0)
> > > +		return ZONED_NONE;
> > > +
> > > +	if (strncmp(model, host_aware, strlen(host_aware)) == 0)
> > > +		return ZONED_HOST_AWARE;
> > > +	if (strncmp(model, host_managed, strlen(host_managed)) == 0)
> > > +		return ZONED_HOST_MANAGED;
> > > +
> > > +	return ZONED_NONE;
> > > +}
> > > +
> > > +u64 zone_size(const char *file)
> > > +{
> > > +	char chunk[32];
> > > +	int ret;
> > > +
> > > +	ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk));
> > > +	if (ret <= 0)
> > > +		return 0;
> > > +
> > > +	return strtoull((const char *)chunk, NULL, 10) << SECTOR_SHIFT;
> > > +}
> > > +
> > > +#ifdef BTRFS_ZONED
> > > +static int report_zones(int fd, const char *file,
> > > +			struct btrfs_zoned_device_info *zinfo)
> > > +{
> > > +	u64 device_size;
> > > +	u64 zone_bytes = zone_size(file);
> > > +	size_t rep_size;
> > > +	u64 sector = 0;
> > > +	struct blk_zone_report *rep;
> > > +	struct blk_zone *zone;
> > > +	unsigned int i, n = 0;
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * Zones are guaranteed (by the kernel) to be a power of 2 number of
> > > +	 * sectors. Check this here and make sure that zones are not too
> > > +	 * small.
> > > +	 */
> > > +	if (!zone_bytes || !is_power_of_2(zone_bytes)) {
> > > +		error("zoned: illegal zone size %llu (not a power of 2)",
> > > +		      zone_bytes);
> > > +		exit(1);
> > > +	}
> > > +	/*
> > > +	 * The zone size must be large enough to hold the initial system
> > > +	 * block group for mkfs time.
> > > +	 */
> > > +	if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
> > > +		error("zoned: illegal zone size %llu (smaller than %d)",
> > > +		      zone_bytes, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
> > > +		exit(1);
> > 
> > I see many exit() calls in this patch and other patches.
> > Any special reason?
> 
> Yeah, it should be turned into normal error returns and handling in the
> callers.

I'd like to abort the execution in the error case. But, yeah, it's
better to let the callers handle the cases. I'll fix it in the next
version.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/26] btrfs-progs: zoned: zoned block device support
  2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
                   ` (25 preceding siblings ...)
  2021-04-26  6:27 ` [PATCH 26/26] btrfs-progs: zoned: introduce zoned support for device replace Naohiro Aota
@ 2021-04-29 15:53 ` David Sterba
  26 siblings, 0 replies; 44+ messages in thread
From: David Sterba @ 2021-04-29 15:53 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: David Sterba, linux-btrfs, Josef Bacik

On Mon, Apr 26, 2021 at 03:27:16PM +0900, Naohiro Aota wrote:
> This series implements user-land side support for zoned btrfs.
> 
> This series is based on misc-next + preparation series below.
> https://lore.kernel.org/linux-btrfs/cover.1617694997.git.naohiro.aota@wdc.com/

The prep patchset has been merged.

> Userland tool depends on patched util-linux (libblkid and wipefs) to handle
> log-structured superblock. The patches are available in the util-linux list.
> https://lore.kernel.org/util-linux/20210426055036.2103620-1-naohiro.aota@wdc.com/T/

I was wondering if we should implement some workarounds in case the
blkid utils don't have the zoned support. It will inevitably happen
that not all the tools (progs/kernel/blkid) have the support, at
least temporarily.

We'd need only the detection and eventually lookup of the most recent
superblock.
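
One possible shape for such a lookup, as a sketch only: given the
superblock copies read from the log zones, keep those with the btrfs
magic and take the one with the highest generation. struct sb_candidate
and pick_latest_sb() are hypothetical names, and this is not how blkid
or the kernel locates the superblock; a real fallback would also have to
read the copies and verify their checksums.

```c
#include <stddef.h>
#include <stdint.h>

/* btrfs superblock magic, "_BHRfS_M" when stored little-endian. */
#define BTRFS_MAGIC 0x4D5F53665248425FULL

/* Hypothetical, reduced view of a superblock copy: only the fields used here. */
struct sb_candidate {
	uint64_t magic;
	uint64_t generation;
};

/*
 * Return the index of the candidate with the btrfs magic and the highest
 * generation, or -1 if no candidate carries the magic.
 */
static int pick_latest_sb(const struct sb_candidate *sb, size_t count)
{
	int best = -1;

	for (size_t i = 0; i < count; i++) {
		if (sb[i].magic != BTRFS_MAGIC)
			continue;
		if (best < 0 || sb[i].generation > sb[best].generation)
			best = (int)i;
	}
	return best;
}
```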

> Naohiro Aota (26):
>   btrfs-progs: utils: Introduce queue_param helper function
>   btrfs-progs: provide fs_info from btrfs_device
>   btrfs-progs: build: zoned: Check zoned block device support
>   btrfs-progs: zoned: add new ZONED feature flag
>   btrfs-progs: zoned: get zone information of zoned block devices
>   btrfs-progs: zoned: check and enable ZONED mode
>   btrfs-progs: zoned: introduce max_zone_append_size
>   btrfs-progs: zoned: disallow mixed-bg in ZONED mode
>   btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices
>   btrfs-progs: zoned: implement log-structured superblock for ZONED mode
>   btrfs-progs: zoned: implement zoned chunk allocator
>   btrfs-progs: zoned: load zone's allocation offset
>   btrfs-progs: zoned: implement sequential extent allocation
>   btrfs-progs: zoned: calculate allocation offset for conventional zones
>   btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs
>   btrfs-progs: zoned: reset zone of freed block group
>   btrfs-progs: zoned: support resetting zoned device
>   btrfs-progs: zoned: support zero out on zoned block device
>   btrfs-progs: zoned: support wiping SB on sequential write zone
>   btrfs-progs: mkfs: zoned: detect and enable zoned feature flag
>   btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs
>   btrfs-progs: mkfs: zoned: tweak initial system block group placement
>   btrfs-progs: mkfs: zoned: use sbwrite to update superblock
>   btrfs-progs: zoned: wipe temporary superblocks in superblock log zone
>   btrfs-progs: zoned: device-add: support ZONED device
>   btrfs-progs: zoned: introduce zoned support for device replace

Now in devel. I did some fixups on the way but only minor ones. There
are still cleanups needed that we'll do as followup patches. I'd like to
also have some zoned tests inside progs testsuite so eg. mkfs can be
verified to work.

Kernel 5.12 is out, so my plan for the progs 5.12 release is sometime
next week. I'll probably do an rc1 with the current devel so we have a
checkpoint before the full release.


Thread overview: 44+ messages
2021-04-26  6:27 [PATCH 00/26] btrfs-progs: zoned: zoned block device support Naohiro Aota
2021-04-26  6:27 ` [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
2021-04-26  7:26   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device Naohiro Aota
2021-04-26  7:25   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 03/26] btrfs-progs: build: zoned: Check zoned block device support Naohiro Aota
2021-04-26  6:27 ` [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag Naohiro Aota
2021-04-26  7:45   ` Johannes Thumshirn
2021-04-27 15:45     ` David Sterba
2021-04-27 15:46   ` David Sterba
2021-04-28  0:07     ` Naohiro Aota
2021-04-26  6:27 ` [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices Naohiro Aota
2021-04-26  7:32   ` Su Yue
2021-04-27 16:45     ` David Sterba
2021-04-28  0:09       ` Naohiro Aota
2021-04-26  6:27 ` [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode Naohiro Aota
2021-04-26  7:48   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size Naohiro Aota
2021-04-26  7:51   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode Naohiro Aota
2021-04-26  7:56   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Naohiro Aota
2021-04-26 13:43   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode Naohiro Aota
2021-04-26 16:04   ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator Naohiro Aota
2021-04-27 17:19   ` David Sterba
2021-04-27 19:58     ` Johannes Thumshirn
2021-04-26  6:27 ` [PATCH 12/26] btrfs-progs: zoned: load zone's allocation offset Naohiro Aota
2021-04-26  6:27 ` [PATCH 13/26] btrfs-progs: zoned: implement sequential extent allocation Naohiro Aota
2021-04-26  6:27 ` [PATCH 14/26] btrfs-progs: zoned: calculate allocation offset for conventional zones Naohiro Aota
2021-04-26  6:27 ` [PATCH 15/26] btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs Naohiro Aota
2021-04-26  6:27 ` [PATCH 16/26] btrfs-progs: zoned: reset zone of freed block group Naohiro Aota
2021-04-26  6:27 ` [PATCH 17/26] btrfs-progs: zoned: support resetting zoned device Naohiro Aota
2021-04-26  6:27 ` [PATCH 18/26] btrfs-progs: zoned: support zero out on zoned block device Naohiro Aota
2021-04-26  6:27 ` [PATCH 19/26] btrfs-progs: zoned: support wiping SB on sequential write zone Naohiro Aota
2021-04-26  6:27 ` [PATCH 20/26] btrfs-progs: mkfs: zoned: detect and enable zoned feature flag Naohiro Aota
2021-04-26  6:27 ` [PATCH 21/26] btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs Naohiro Aota
2021-04-26  6:27 ` [PATCH 22/26] btrfs-progs: mkfs: zoned: tweak initial system block group placement Naohiro Aota
2021-04-26  6:27 ` [PATCH 23/26] btrfs-progs: mkfs: zoned: use sbwrite to update superblock Naohiro Aota
2021-04-26  6:27 ` [PATCH 24/26] btrfs-progs: zoned: wipe temporary superblocks in superblock log zone Naohiro Aota
2021-04-26  6:27 ` [PATCH 25/26] btrfs-progs: zoned: device-add: support ZONED device Naohiro Aota
2021-04-26  6:27 ` [PATCH 26/26] btrfs-progs: zoned: introduce zoned support for device replace Naohiro Aota
2021-04-29 15:53 ` [PATCH 00/26] btrfs-progs: zoned: zoned block device support David Sterba
