Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 00/15] btrfs-progs: zoned block device support
@ 2019-08-20  4:52 Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 01/15] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

This is a userland part of zoned block device support for btrfs.

Kernel side patch series:
https://lore.kernel.org/linux-btrfs/20190808093038.4163421-1-naohiro.aota@wdc.com/T/

Please see the kernel side for general description of zoned block device
support.

Patches 1 and 2 introduce some modification to prepare for the later
patches.

Patch 3 enable zoned support for the configuration script.

Patches 4 to 6 introduce functions to retrieve zone information from a
device and call them when opening a device.

Patches 7 to 12 adopts current implementations e.g., extent allocation for
HMZONED mode.

Patches 13 to 15 enables HMZONED mode in various sub-commands.

v2 https://lore.kernel.org/linux-btrfs/20190607131751.5359-1-naohiro.aota@wdc.com/
v1 https://lore.kernel.org/linux-btrfs/20180809181105.12856-1-naota@elisp.net/

Changelog:
v3:
 - Unified userland code and kernel code
 - Introduce common/hmzoned.c and put hmzoned related code there

Naohiro Aota (15):
  btrfs-progs: utils: Introduce queue_param helper function
  btrfs-progs: introduce raid parameters variables
  btrfs-progs: build: Check zoned block device support
  btrfs-progs: add new HMZONED feature flag
  btrfs-progs: Introduce zone block device helper functions
  btrfs-progs: load and check zone information
  btrfs-progs: avoid writing super block to sequential zones
  btrfs-progs: support discarding zoned device
  btrfs-progs: support zero out on zoned block device
  btrfs-progs: align device extent allocation to zone boundary
  btrfs-progs: do sequential allocation in HMZONED mode
  btrfs-progs: redirty clean extent buffers in seq
  btrfs-progs: mkfs: Zoned block device support
  btrfs-progs: device-add: support HMZONED device
  btrfs-progs: introduce support for device replace HMZONED device

 Makefile                  |   2 +-
 cmds/device.c             |  29 +-
 cmds/inspect-dump-super.c |   3 +-
 cmds/replace.c            |  12 +-
 common/device-scan.c      |  10 +
 common/device-utils.c     |  85 +++++-
 common/device-utils.h     |   4 +
 common/fsfeatures.c       |   8 +
 common/fsfeatures.h       |   2 +-
 common/hmzoned.c          | 590 ++++++++++++++++++++++++++++++++++++++
 common/hmzoned.h          |  90 ++++++
 configure.ac              |  13 +
 ctree.h                   |  21 +-
 disk-io.c                 |  10 +
 extent-tree.c             |  15 +
 kerncompat.h              |   2 +
 libbtrfsutil/btrfs.h      |   2 +
 mkfs/common.c             |  20 +-
 mkfs/common.h             |   1 +
 mkfs/main.c               | 107 +++----
 transaction.c             |   7 +
 volumes.c                 | 101 ++++++-
 volumes.h                 |  17 ++
 23 files changed, 1081 insertions(+), 70 deletions(-)
 create mode 100644 common/hmzoned.c
 create mode 100644 common/hmzoned.h

-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 01/15] btrfs-progs: utils: Introduce queue_param helper function
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 02/15] btrfs-progs: introduce raid parameters variables Naohiro Aota
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

Introduce the queue_param helper function to get a device request queue
parameter.  This helper will be used later to query information of a zoned
device.

Furthermore, rewrite is_ssd() using the helper function.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 46 +++++++++++++++++++++++++++++++++++++++++++
 common/device-utils.h |  1 +
 mkfs/main.c           | 40 ++-----------------------------------
 3 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index b03d62faaf21..7fa9386f4677 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -252,3 +252,49 @@ u64 get_partition_size(const char *dev)
 	return result;
 }
 
+/*
+ * Get a device request queue parameter.
+ */
+int queue_param(const char *file, const char *param, char *buf, size_t len)
+{
+	blkid_probe probe;
+	char wholedisk[PATH_MAX];
+	char sysfs_path[PATH_MAX];
+	dev_t devno;
+	int fd;
+	int ret;
+
+	probe = blkid_new_probe_from_filename(file);
+	if (!probe)
+		return 0;
+
+	/* Device number of this disk (possibly a partition) */
+	devno = blkid_probe_get_devno(probe);
+	if (!devno) {
+		blkid_free_probe(probe);
+		return 0;
+	}
+
+	/* Get whole disk name (not full path) for this devno */
+	ret = blkid_devno_to_wholedisk(devno,
+			wholedisk, sizeof(wholedisk), NULL);
+	if (ret) {
+		blkid_free_probe(probe);
+		return 0;
+	}
+
+	snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/%s",
+		 wholedisk, param);
+
+	blkid_free_probe(probe);
+
+	fd = open(sysfs_path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	len = read(fd, buf, len);
+	close(fd);
+
+	return len;
+}
+
diff --git a/common/device-utils.h b/common/device-utils.h
index 70d19cae3e50..d1799323d002 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -29,5 +29,6 @@ u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		u64 max_block_count, unsigned opflags);
+int queue_param(const char *file, const char *param, char *buf, size_t len);
 
 #endif
diff --git a/mkfs/main.c b/mkfs/main.c
index 971cb39534fd..948c84be5f39 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -426,49 +426,13 @@ static int zero_output_file(int out_fd, u64 size)
 
 static int is_ssd(const char *file)
 {
-	blkid_probe probe;
-	char wholedisk[PATH_MAX];
-	char sysfs_path[PATH_MAX];
-	dev_t devno;
-	int fd;
 	char rotational;
 	int ret;
 
-	probe = blkid_new_probe_from_filename(file);
-	if (!probe)
+	ret = queue_param(file, "rotational", &rotational, 1);
+	if (ret < 1)
 		return 0;
 
-	/* Device number of this disk (possibly a partition) */
-	devno = blkid_probe_get_devno(probe);
-	if (!devno) {
-		blkid_free_probe(probe);
-		return 0;
-	}
-
-	/* Get whole disk name (not full path) for this devno */
-	ret = blkid_devno_to_wholedisk(devno,
-			wholedisk, sizeof(wholedisk), NULL);
-	if (ret) {
-		blkid_free_probe(probe);
-		return 0;
-	}
-
-	snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/rotational",
-		 wholedisk);
-
-	blkid_free_probe(probe);
-
-	fd = open(sysfs_path, O_RDONLY);
-	if (fd < 0) {
-		return 0;
-	}
-
-	if (read(fd, &rotational, 1) < 1) {
-		close(fd);
-		return 0;
-	}
-	close(fd);
-
 	return rotational == '0';
 }
 
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 02/15] btrfs-progs: introduce raid parameters variables
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 01/15] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 03/15] btrfs-progs: build: Check zoned block device support Naohiro Aota
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

Userland btrfs_alloc_chunk() and its kernel side counterpart
__btrfs_alloc_chunk() is so diverged that it's difficult to use the kernel
code as is.

This commit introduces some RAID parameter variables and read them from
btrfs_raid_array as the same as in kernel land.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 volumes.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/volumes.c b/volumes.c
index 0e6fb1dbce15..f99fddc7cf6f 100644
--- a/volumes.c
+++ b/volumes.c
@@ -993,7 +993,19 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int num_stripes = 1;
 	int max_stripes = 0;
 	int min_stripes = 1;
-	int sub_stripes = 0;
+	int sub_stripes;	/* sub_stripes info for map */
+	int dev_stripes __attribute__((unused));
+				/* stripes per dev */
+	int devs_max;		/* max devs to use */
+	int devs_min __attribute__((unused));
+				/* min devs needed */
+	int devs_increment __attribute__((unused));
+				/* ndevs has to be a multiple of this */
+	int ncopies __attribute__((unused));
+				/* how many copies to data has */
+	int nparity __attribute__((unused));
+				/* number of stripes worth of bytes to
+				   store parity information */
 	int looped = 0;
 	int ret;
 	int index;
@@ -1005,6 +1017,18 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		return -ENOSPC;
 	}
 
+	index = btrfs_bg_flags_to_raid_index(type);
+
+	sub_stripes = btrfs_raid_array[index].sub_stripes;
+	dev_stripes = btrfs_raid_array[index].dev_stripes;
+	devs_max = btrfs_raid_array[index].devs_max;
+	if (!devs_max)
+		devs_max = BTRFS_MAX_DEVS(info);
+	devs_min = btrfs_raid_array[index].devs_min;
+	devs_increment = btrfs_raid_array[index].devs_increment;
+	ncopies = btrfs_raid_array[index].ncopies;
+	nparity = btrfs_raid_array[index].nparity;
+
 	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
 		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
 			calc_size = SZ_8M;
@@ -1051,7 +1075,6 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (num_stripes < 4)
 			return -ENOSPC;
 		num_stripes &= ~(u32)1;
-		sub_stripes = 2;
 		min_stripes = 4;
 	}
 	if (type & (BTRFS_BLOCK_GROUP_RAID5)) {
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 03/15] btrfs-progs: build: Check zoned block device support
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 01/15] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 02/15] btrfs-progs: introduce raid parameters variables Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 04/15] btrfs-progs: add new HMZONED feature flag Naohiro Aota
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

If the kernel supports zoned block devices, the file
/usr/include/linux/blkzoned.h will be present. Check this and define
BTRFS_ZONED if the file is present.

If it present, enables HMZONED feature, if not disable it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 configure.ac | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/configure.ac b/configure.ac
index cf792eb5488b..c637f72a8fe6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -206,6 +206,18 @@ else
 AC_DEFINE([HAVE_OWN_FIEMAP_EXTENT_SHARED_DEFINE], [0], [We did not define FIEMAP_EXTENT_SHARED])
 fi
 
+AC_CHECK_HEADER(linux/blkzoned.h, [blkzoned_found=yes], [blkzoned_found=no])
+AC_ARG_ENABLE([zoned],
+  AS_HELP_STRING([--disable-zoned], [disable zoned block device support]),
+  [], [enable_zoned=$blkzoned_found]
+)
+
+AS_IF([test "x$enable_zoned" = xyes], [
+	AC_CHECK_HEADER(linux/blkzoned.h, [],
+		[AC_MSG_ERROR([Couldn't find linux/blkzoned.h])])
+	AC_DEFINE([BTRFS_ZONED], [1], [enable zoned block device support])
+])
+
 dnl Define <NAME>_LIBS= and <NAME>_CFLAGS= by pkg-config
 dnl
 dnl The default PKG_CHECK_MODULES() action-if-not-found is end the
@@ -307,6 +319,7 @@ AC_MSG_RESULT([
 	btrfs-restore zstd: ${enable_zstd}
 	Python bindings:    ${enable_python}
 	Python interpreter: ${PYTHON}
+	zoned device:       ${enable_zoned}
 
 	Type 'make' to compile.
 ])
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 04/15] btrfs-progs: add new HMZONED feature flag
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (2 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 03/15] btrfs-progs: build: Check zoned block device support Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 05/15] btrfs-progs: Introduce zone block device helper functions Naohiro Aota
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

With this feature enabled, a zoned block device aware btrfs allocates block
groups aligned to the device zones and always write in sequential zones at
the zone write pointer position.

Enabling this feature also force disable conversion from ext4 volumes.

Note: this flag can be moved to COMPAT_RO, so that older kernel can read
but not write zoned block devices formatted with btrfs.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/inspect-dump-super.c | 3 ++-
 common/fsfeatures.c       | 8 ++++++++
 common/fsfeatures.h       | 2 +-
 ctree.h                   | 4 +++-
 libbtrfsutil/btrfs.h      | 2 ++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c
index 65fb3506eac6..e942d37f8c9b 100644
--- a/cmds/inspect-dump-super.c
+++ b/cmds/inspect-dump-super.c
@@ -229,7 +229,8 @@ static struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(RAID56),
 	DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
 	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
-	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID)
+	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID),
+	DEF_INCOMPAT_FLAG_ENTRY(HMZONED)
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				      sizeof(struct readable_flag_entry);
diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index 50934bd161b0..b5bbecd8cf62 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -86,6 +86,14 @@ static const struct btrfs_fs_feature {
 		VERSION_TO_STRING2(4,0),
 		NULL, 0,
 		"no explicit hole extents for files" },
+#ifdef BTRFS_ZONED
+	{ "hmzoned", BTRFS_FEATURE_INCOMPAT_HMZONED,
+		"hmzoned",
+		NULL, 0,
+		NULL, 0,
+		NULL, 0,
+		"support Host-Managed Zoned devices" },
+#endif
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
diff --git a/common/fsfeatures.h b/common/fsfeatures.h
index 3cc9452a3327..0918ee1aa113 100644
--- a/common/fsfeatures.h
+++ b/common/fsfeatures.h
@@ -25,7 +25,7 @@
 		| BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 
 /*
- * Avoid multi-device features (RAID56) and mixed block groups
+ * Avoid multi-device features (RAID56), mixed block groups, and hmzoned device
  */
 #define BTRFS_CONVERT_ALLOWED_FEATURES				\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF			\
diff --git a/ctree.h b/ctree.h
index 0d12563b7261..a56e18119069 100644
--- a/ctree.h
+++ b/ctree.h
@@ -490,6 +490,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID    (1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_HMZONED		(1ULL << 11)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -513,7 +514,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID |		\
+	 BTRFS_FEATURE_INCOMPAT_HMZONED)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
diff --git a/libbtrfsutil/btrfs.h b/libbtrfsutil/btrfs.h
index 944d50132456..5c415240f74c 100644
--- a/libbtrfsutil/btrfs.h
+++ b/libbtrfsutil/btrfs.h
@@ -268,6 +268,8 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+/* Missing */
+#define BTRFS_FEATURE_INCOMPAT_HMZONED		(1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 05/15] btrfs-progs: Introduce zone block device helper functions
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (3 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 04/15] btrfs-progs: add new HMZONED feature flag Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 06/15] btrfs-progs: load and check zone information Naohiro Aota
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

This patch introduce several zone related functions: btrfs_get_zone_info()
to get zone information from the specified device and put the information
in zinfo, and zone_is_sequential() to check if a zone is a sequential
required zone.

btrfs_get_zone_info() is intentionaly works with "struct btrfs_zone_info"
instead of "struct btrfs_device". We need to load zone information at
btrfs_prepare_device(), but there are no "struct btrfs_device" at that
time.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 Makefile         |   2 +-
 common/hmzoned.c | 218 +++++++++++++++++++++++++++++++++++++++++++++++
 common/hmzoned.h |  67 +++++++++++++++
 3 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 common/hmzoned.c
 create mode 100644 common/hmzoned.h

diff --git a/Makefile b/Makefile
index 82417d19a9f8..60a9e8992864 100644
--- a/Makefile
+++ b/Makefile
@@ -140,7 +140,7 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o \
 	  inode.o file.o find-root.o free-space-tree.o common/help.o send-dump.o \
 	  common/fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
 	  delayed-ref.o common/format-output.o common/path-utils.o \
-	  common/device-utils.o common/device-scan.o
+	  common/device-utils.o common/device-scan.o common/hmzoned.o
 cmds_objects = cmds/subvolume.o cmds/filesystem.o cmds/device.o cmds/scrub.o \
 	       cmds/inspect.o cmds/balance.o cmds/send.o cmds/receive.o \
 	       cmds/quota.o cmds/qgroup.o cmds/replace.o check/main.o \
diff --git a/common/hmzoned.c b/common/hmzoned.c
new file mode 100644
index 000000000000..7114943458ef
--- /dev/null
+++ b/common/hmzoned.c
@@ -0,0 +1,218 @@
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Authors:
+ *      Naohiro Aota    <naohiro.aota@wdc.com>
+ *      Damien Le Moal  <damien.lemoal@wdc.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <sys/ioctl.h>
+
+#include "common/utils.h"
+#include "common/device-utils.h"
+#include "common/messages.h"
+#include "mkfs/common.h"
+#include "common/hmzoned.h"
+
+#define BTRFS_REPORT_NR_ZONES	8192
+
+enum btrfs_zoned_model zoned_model(const char *file)
+{
+	char model[32];
+	int ret;
+
+	ret = queue_param(file, "zoned", model, sizeof(model));
+	if (ret <= 0)
+		return ZONED_NONE;
+
+	if (strncmp(model, "host-aware", 10) == 0)
+		return ZONED_HOST_AWARE;
+	if (strncmp(model, "host-managed", 12) == 0)
+		return ZONED_HOST_MANAGED;
+
+	return ZONED_NONE;
+}
+
+size_t zone_size(const char *file)
+{
+	char chunk[32];
+	int ret;
+
+	ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk));
+	if (ret <= 0)
+		return 0;
+
+	return strtoul((const char *)chunk, NULL, 10) << 9;
+}
+
+#ifdef BTRFS_ZONED
+bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr)
+{
+	unsigned int zno;
+
+	if (zinfo->model == ZONED_NONE)
+		return false;
+
+	zno = bytenr / zinfo->zone_size;
+
+	/*
+	 * Only sequential write required zones on host-managed
+	 * devices cannot be written randomly.
+	 */
+	return zinfo->zones[zno].type == BLK_ZONE_TYPE_SEQWRITE_REQ;
+}
+
+static int report_zones(int fd, const char *file, u64 block_count,
+			struct btrfs_zone_info *zinfo)
+{
+	size_t zone_bytes = zone_size(file);
+	size_t rep_size;
+	u64 sector = 0;
+	struct blk_zone_report *rep;
+	struct blk_zone *zone;
+	unsigned int i, n = 0;
+	int ret;
+
+	/*
+	 * Zones are guaranteed (by the kernel) to be a power of 2 number of
+	 * sectors. Check this here and make sure that zones are not too
+	 * small.
+	 */
+	if (!zone_bytes || !is_power_of_2(zone_bytes)) {
+		error("Illegal zone size %zu (not a power of 2)", zone_bytes);
+		exit(1);
+	}
+	if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+		error("Illegal zone size %zu (smaller than %d)", zone_bytes,
+		      BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+		exit(1);
+	}
+
+	/* Allocate the zone information array */
+	zinfo->zone_size = zone_bytes;
+	zinfo->nr_zones = block_count / zone_bytes;
+	if (block_count & (zone_bytes - 1))
+		zinfo->nr_zones++;
+	zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct blk_zone));
+	if (!zinfo->zones) {
+		error("No memory for zone information");
+		exit(1);
+	}
+
+	/* Allocate a zone report */
+	rep_size = sizeof(struct blk_zone_report) +
+		sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES;
+	rep = malloc(rep_size);
+	if (!rep) {
+		error("No memory for zones report");
+		exit(1);
+	}
+
+	/* Get zone information */
+	zone = (struct blk_zone *)(rep + 1);
+	while (n < zinfo->nr_zones) {
+		memset(rep, 0, rep_size);
+		rep->sector = sector;
+		rep->nr_zones = BTRFS_REPORT_NR_ZONES;
+
+		ret = ioctl(fd, BLKREPORTZONE, rep);
+		if (ret != 0) {
+			error("ioctl BLKREPORTZONE failed (%s)",
+			      strerror(errno));
+			exit(1);
+		}
+
+		if (!rep->nr_zones)
+			break;
+
+		for (i = 0; i < rep->nr_zones; i++) {
+			if (n >= zinfo->nr_zones)
+				break;
+			memcpy(&zinfo->zones[n], &zone[i],
+			       sizeof(struct blk_zone));
+			n++;
+		}
+
+		sector = zone[rep->nr_zones - 1].start +
+			zone[rep->nr_zones - 1].len;
+	}
+
+	/*
+	 * We need at least one random write zone (a conventional zone or
+	 * a sequential write preferred zone on a host-aware device).
+	 */
+	if (zone_is_sequential(zinfo, 0)) {
+		error("No conventional zone at block 0");
+		exit(1);
+	}
+
+	free(rep);
+
+	return 0;
+}
+
+#endif
+
+int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
+			struct btrfs_zone_info *zinfo)
+{
+	struct stat st;
+	int ret;
+
+	memset(zinfo, 0, sizeof(struct btrfs_zone_info));
+
+	ret = fstat(fd, &st);
+	if (ret < 0) {
+		error("unable to stat %s", file);
+		return 1;
+	}
+
+	if (!S_ISBLK(st.st_mode))
+		return 0;
+
+	/* Check zone model */
+	zinfo->model = zoned_model(file);
+	if (zinfo->model == ZONED_NONE)
+		return 0;
+
+	if (zinfo->model == ZONED_HOST_MANAGED && !hmzoned) {
+		error(
+"%s: host-managed zoned block device (enable zone block device support with -O hmzoned)",
+		      file);
+		return 1;
+	}
+
+	if (!hmzoned) {
+		/* Treat host-aware devices as regular devices */
+		zinfo->model = ZONED_NONE;
+		return 0;
+	}
+
+#ifdef BTRFS_ZONED
+	/* Get zone information */
+	ret = report_zones(fd, file, btrfs_device_size(fd, &st), zinfo);
+	if (ret != 0)
+		return ret;
+#else
+	error("%s: Unsupported host-%s zoned block device", file,
+	      zinfo->model == ZONED_HOST_MANAGED ? "managed" : "aware");
+	if (zinfo->model == ZONED_HOST_MANAGED)
+		return 1;
+
+	error("%s: handling host-aware block device as a regular disk", file);
+#endif
+	return 0;
+}
diff --git a/common/hmzoned.h b/common/hmzoned.h
new file mode 100644
index 000000000000..fbcaaf2da20e
--- /dev/null
+++ b/common/hmzoned.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Authors:
+ *      Naohiro Aota    <naohiro.aota@wdc.com>
+ *      Damien Le Moal  <damien.lemoal@wdc.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_HMZONED_H__
+#define __BTRFS_HMZONED_H__
+
+#ifdef BTRFS_ZONED
+#include <linux/blkzoned.h>
+#else
+struct blk_zone {
+	int dummy;
+};
+#endif /* BTRFS_ZONED */
+
+/*
+ * Zoned block device models.
+ */
+enum btrfs_zoned_model {
+	ZONED_NONE = 0,
+	ZONED_HOST_AWARE,
+	ZONED_HOST_MANAGED,
+};
+
+/*
+ * Zone information for a zoned block device.
+ */
+struct btrfs_zone_info {
+	enum btrfs_zoned_model	model;
+	size_t			zone_size;
+	struct blk_zone	*zones;
+	unsigned int		nr_zones;
+};
+
+enum btrfs_zoned_model zoned_model(const char *file);
+size_t zone_size(const char *file);
+int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
+			struct btrfs_zone_info *zinfo);
+
+#ifdef BTRFS_ZONED
+bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr);
+#else
+static inline bool zone_is_sequential(struct btrfs_zone_info *zinfo,
+				      u64 bytenr)
+{
+	return true;
+}
+#endif /* BTRFS_ZONED */
+
+#endif /* __BTRFS_HMZONED_H__ */
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 06/15] btrfs-progs: load and check zone information
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (4 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 05/15] btrfs-progs: Introduce zone block device helper functions Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 07/15] btrfs-progs: avoid writing super block to sequential zones Naohiro Aota
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

This patch checks if a device added to btrfs is a zoned block device. If it
is, load zones information and the zone size for the device.

For a btrfs volume composed of multiple zoned block devices, all devices
must have the same zone size.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-scan.c | 10 ++++++++++
 volumes.c            | 18 ++++++++++++++++++
 volumes.h            |  6 ++++++
 3 files changed, 34 insertions(+)

diff --git a/common/device-scan.c b/common/device-scan.c
index 2c5ae225f710..5df77b9b68d7 100644
--- a/common/device-scan.c
+++ b/common/device-scan.c
@@ -135,6 +135,16 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
+	ret = btrfs_get_zone_info(fd, path, fs_info->fs_devices->hmzoned,
+				  &device->zone_info);
+	if (ret)
+		goto out;
+	if (device->zone_info.zone_size != fs_info->fs_devices->zone_size) {
+		error("Device zone size differ");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	disk_super = (struct btrfs_super_block *)buf;
 	dev_item = &disk_super->dev_item;
 
diff --git a/volumes.c b/volumes.c
index f99fddc7cf6f..a0ebed547faa 100644
--- a/volumes.c
+++ b/volumes.c
@@ -196,6 +196,8 @@ static int device_list_add(const char *path,
 	u64 found_transid = btrfs_super_generation(disk_super);
 	bool metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+	bool hmzoned = btrfs_super_incompat_flags(disk_super) &
+		BTRFS_FEATURE_INCOMPAT_HMZONED;
 
 	if (metadata_uuid)
 		fs_devices = find_fsid(disk_super->fsid,
@@ -285,6 +287,7 @@ static int device_list_add(const char *path,
 	if (fs_devices->lowest_devid > devid) {
 		fs_devices->lowest_devid = devid;
 	}
+	fs_devices->hmzoned = hmzoned;
 	*fs_devices_ret = fs_devices;
 	return 0;
 }
@@ -355,6 +358,8 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags)
 	struct btrfs_device *device;
 	int ret;
 
+	fs_devices->zone_size = 0;
+
 	list_for_each_entry(device, &fs_devices->devices, dev_list) {
 		if (!device->name) {
 			printk("no name for device %llu, skip it now\n", device->devid);
@@ -378,6 +383,19 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags)
 		device->fd = fd;
 		if (flags & O_RDWR)
 			device->writeable = 1;
+
+		ret = btrfs_get_zone_info(fd, device->name, fs_devices->hmzoned,
+					  &device->zone_info);
+		if (ret != 0)
+			goto fail;
+		if (!fs_devices->zone_size) {
+			fs_devices->zone_size = device->zone_info.zone_size;
+		} else if (device->zone_info.zone_size !=
+			   fs_devices->zone_size) {
+			error("Device zone size differ");
+			ret = -EINVAL;
+			goto fail;
+		}
 	}
 	return 0;
 fail:
diff --git a/volumes.h b/volumes.h
index 586588c871ab..edbb0f36aa75 100644
--- a/volumes.h
+++ b/volumes.h
@@ -22,12 +22,15 @@
 #include "kerncompat.h"
 #include "ctree.h"
 
+#include "common/hmzoned.h"
+
 #define BTRFS_STRIPE_LEN	SZ_64K
 
 struct btrfs_device {
 	struct list_head dev_list;
 	struct btrfs_root *dev_root;
 	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_zone_info zone_info;
 
 	u64 total_ios;
 
@@ -87,6 +90,9 @@ struct btrfs_fs_devices {
 
 	int seeding;
 	struct btrfs_fs_devices *seed;
+
+	u64 zone_size;
+	bool hmzoned;
 };
 
 struct btrfs_bio_stripe {
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 07/15] btrfs-progs: avoid writing super block to sequential zones
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (5 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 06/15] btrfs-progs: load and check zone information Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 08/15] btrfs-progs: support discarding zoned device Naohiro Aota
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

It is not possible to write a super block copy in sequential write required
zones as this prevents in-place updates required for super blocks.  This
patch limits super block possible locations to zones accepting random
writes. In particular, the zone containing the first block of the device or
partition being formatted must accept random writes.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 disk-io.c | 10 ++++++++++
 volumes.h |  4 ++++
 2 files changed, 14 insertions(+)

diff --git a/disk-io.c b/disk-io.c
index be44eead5cef..5dd55723b9b7 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1632,6 +1632,14 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 				      BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
 		btrfs_csum_final(crc, &sb->csum[0]);
 
+		if (btrfs_dev_is_sequential(device, fs_info->super_bytenr)) {
+			errno = EIO;
+			error(
+"failed to write super block for devid %llu: require random write zone: %m",
+				device->devid);
+			return -EIO;
+		}
+
 		/*
 		 * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is
 		 * zero filled, we can use it directly
@@ -1660,6 +1668,8 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info,
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr + BTRFS_SUPER_INFO_SIZE > device->total_bytes)
 			break;
+		if (btrfs_dev_is_sequential(device, bytenr))
+			continue;
 
 		btrfs_set_super_bytenr(sb, bytenr);
 
diff --git a/volumes.h b/volumes.h
index edbb0f36aa75..b5e7a07df5a8 100644
--- a/volumes.h
+++ b/volumes.h
@@ -319,4 +319,8 @@ int btrfs_fix_device_size(struct btrfs_fs_info *fs_info,
 			  struct btrfs_device *device);
 int btrfs_fix_super_size(struct btrfs_fs_info *fs_info);
 int btrfs_fix_device_and_super_size(struct btrfs_fs_info *fs_info);
+static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
+{
+	return zone_is_sequential(&device->zone_info, pos);
+}
 #endif
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 08/15] btrfs-progs: support discarding zoned device
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (6 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 07/15] btrfs-progs: avoid writing super block to sequential zones Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 09/15] btrfs-progs: support zero out on zoned block device Naohiro Aota
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

All zones of zoned block devices should be reset before writing. Support
this by introducing PREP_DEVICE_HMZONED.

This commit export discard_blocks() and use it from
btrfs_discard_all_zones().

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 23 +++++++++++++++++++++--
 common/device-utils.h |  2 ++
 common/hmzoned.c      | 27 +++++++++++++++++++++++++++
 common/hmzoned.h      |  5 +++++
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index 7fa9386f4677..c7046c22a9fb 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -29,6 +29,7 @@
 #include "common/internal.h"
 #include "common/messages.h"
 #include "common/utils.h"
+#include "common/hmzoned.h"
 
 #ifndef BLKDISCARD
 #define BLKDISCARD	_IO(0x12,119)
@@ -49,7 +50,7 @@ static int discard_range(int fd, u64 start, u64 len)
 /*
  * Discard blocks in the given range in 1G chunks, the process is interruptible
  */
-static int discard_blocks(int fd, u64 start, u64 len)
+int discard_blocks(int fd, u64 start, u64 len)
 {
 	while (len > 0) {
 		/* 1G granularity */
@@ -155,6 +156,7 @@ out:
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		u64 max_block_count, unsigned opflags)
 {
+	struct btrfs_zone_info zinfo;
 	u64 block_count;
 	struct stat st;
 	int i, ret;
@@ -173,7 +175,24 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 	if (max_block_count)
 		block_count = min(block_count, max_block_count);
 
-	if (opflags & PREP_DEVICE_DISCARD) {
+	ret = btrfs_get_zone_info(fd, file, opflags & PREP_DEVICE_HMZONED,
+				  &zinfo);
+	if (ret < 0)
+		return 1;
+
+	if (opflags & PREP_DEVICE_HMZONED) {
+		printf("Resetting device zones %s (%u zones) ...\n",
+			file, zinfo.nr_zones);
+		/*
+		 * We cannot ignore zone discard (reset) errors for a zoned
+		 * block device as this could result in the inability to
+		 * write to non-empty sequential zones of the device.
+		 */
+		if (btrfs_discard_all_zones(fd, &zinfo)) {
+			error("failed to reset device '%s' zones", file);
+			return 1;
+		}
+	} else if (opflags & PREP_DEVICE_DISCARD) {
 		/*
 		 * We intentionally ignore errors from the discard ioctl.  It
 		 * is not necessary for the mkfs functionality but just an
diff --git a/common/device-utils.h b/common/device-utils.h
index d1799323d002..885a46937e0d 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -23,7 +23,9 @@
 #define	PREP_DEVICE_ZERO_END	(1U << 0)
 #define	PREP_DEVICE_DISCARD	(1U << 1)
 #define	PREP_DEVICE_VERBOSE	(1U << 2)
+#define	PREP_DEVICE_HMZONED	(1U << 3)
 
+int discard_blocks(int fd, u64 start, u64 len);
 u64 get_partition_size(const char *dev);
 u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
diff --git a/common/hmzoned.c b/common/hmzoned.c
index 7114943458ef..70de111f22da 100644
--- a/common/hmzoned.c
+++ b/common/hmzoned.c
@@ -216,3 +216,30 @@ int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
 #endif
 	return 0;
 }
+
+/*
+ * Discard blocks in the zones of a zoned block device.
+ * Process this with zone size granularity so that blocks in
+ * conventional zones are discarded using discard_range and
+ * blocks in sequential zones are discarded though a zone reset.
+ */
+int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo)
+{
+	unsigned int i;
+
+	/* Zone size granularity */
+	for (i = 0; i < zinfo->nr_zones; i++) {
+		if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+			discard_blocks(fd, zinfo->zones[i].start << 9,
+				       zinfo->zone_size);
+		} else if (zinfo->zones[i].cond != BLK_ZONE_COND_EMPTY) {
+			struct blk_zone_range range = {
+				zinfo->zones[i].start,
+				zinfo->zone_size >> 9 };
+			if (ioctl(fd, BLKRESETZONE, &range) < 0)
+				return errno;
+		}
+	}
+
+	return 0;
+}
diff --git a/common/hmzoned.h b/common/hmzoned.h
index fbcaaf2da20e..c4e20ae71d21 100644
--- a/common/hmzoned.h
+++ b/common/hmzoned.h
@@ -56,12 +56,17 @@ int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
 
 #ifdef BTRFS_ZONED
 bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr);
+int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo);
 #else
 static inline bool zone_is_sequential(struct btrfs_zone_info *zinfo,
 				      u64 bytenr)
 {
 	return true;
 }
+static inline int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* BTRFS_ZONED */
 
 #endif /* __BTRFS_HMZONED_H__ */
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 09/15] btrfs-progs: support zero out on zoned block device
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (7 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 08/15] btrfs-progs: support discarding zoned device Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 10/15] btrfs-progs: align device extent allocation to zone boundary Naohiro Aota
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

If we zero out a region in a sequential write required zone, we cannot
write to the region until we reset the zone. Thus, we must prohibit zeroing
out to a sequential write required zone.

zero_dev_clamped() is modified to take the zone information and it calls
zero_zone_blocks() if the device is host managed to avoid writing to
sequential write required zones.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/device-utils.c | 14 +++++++++-----
 common/device-utils.h |  1 +
 common/hmzoned.c      | 29 +++++++++++++++++++++++++++++
 common/hmzoned.h      |  7 +++++++
 4 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index c7046c22a9fb..840399c860c3 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -67,7 +67,7 @@ int discard_blocks(int fd, u64 start, u64 len)
 	return 0;
 }
 
-static int zero_blocks(int fd, off_t start, size_t len)
+int zero_blocks(int fd, off_t start, size_t len)
 {
 	char *buf = malloc(len);
 	int ret = 0;
@@ -86,7 +86,8 @@ static int zero_blocks(int fd, off_t start, size_t len)
 #define ZERO_DEV_BYTES SZ_2M
 
 /* don't write outside the device by clamping the region to the device size */
-static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size)
+static int zero_dev_clamped(int fd, struct btrfs_zone_info *zinfo, off_t start,
+			    ssize_t len, u64 dev_size)
 {
 	off_t end = max(start, start + len);
 
@@ -99,6 +100,9 @@ static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size)
 	start = min_t(u64, start, dev_size);
 	end = min_t(u64, end, dev_size);
 
+	if (zinfo->model == ZONED_HOST_MANAGED)
+		return zero_zone_blocks(fd, zinfo, start, end - start);
+
 	return zero_blocks(fd, start, end - start);
 }
 
@@ -206,12 +210,12 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		}
 	}
 
-	ret = zero_dev_clamped(fd, 0, ZERO_DEV_BYTES, block_count);
+	ret = zero_dev_clamped(fd, &zinfo, 0, ZERO_DEV_BYTES, block_count);
 	for (i = 0 ; !ret && i < BTRFS_SUPER_MIRROR_MAX; i++)
-		ret = zero_dev_clamped(fd, btrfs_sb_offset(i),
+		ret = zero_dev_clamped(fd, &zinfo, btrfs_sb_offset(i),
 				       BTRFS_SUPER_INFO_SIZE, block_count);
 	if (!ret && (opflags & PREP_DEVICE_ZERO_END))
-		ret = zero_dev_clamped(fd, block_count - ZERO_DEV_BYTES,
+		ret = zero_dev_clamped(fd, &zinfo, block_count - ZERO_DEV_BYTES,
 				       ZERO_DEV_BYTES, block_count);
 
 	if (ret < 0) {
diff --git a/common/device-utils.h b/common/device-utils.h
index 885a46937e0d..7d5b622b8957 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -26,6 +26,7 @@
 #define	PREP_DEVICE_HMZONED	(1U << 3)
 
 int discard_blocks(int fd, u64 start, u64 len);
+int zero_blocks(int fd, off_t start, size_t len);
 u64 get_partition_size(const char *dev);
 u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
diff --git a/common/hmzoned.c b/common/hmzoned.c
index 70de111f22da..12eb8f551853 100644
--- a/common/hmzoned.c
+++ b/common/hmzoned.c
@@ -243,3 +243,32 @@ int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo)
 
 	return 0;
 }
+
+int zero_zone_blocks(int fd, struct btrfs_zone_info *zinfo, off_t start,
+		     size_t len)
+{
+	size_t zone_len = zinfo->zone_size;
+	off_t ofst = start;
+	size_t count;
+	int ret;
+
+	/* Make sure that zero_blocks does not write sequential zones */
+	while (len > 0) {
+
+		/* Limit zero_blocks to a single zone */
+		count = min_t(size_t, len, zone_len);
+		if (count > zone_len - (ofst & (zone_len - 1)))
+			count = zone_len - (ofst & (zone_len - 1));
+
+		if (!zone_is_sequential(zinfo, ofst)) {
+			ret = zero_blocks(fd, ofst, count);
+			if (ret != 0)
+				return ret;
+		}
+
+		len -= count;
+		ofst += count;
+	}
+
+	return 0;
+}
diff --git a/common/hmzoned.h b/common/hmzoned.h
index c4e20ae71d21..75812716ffd9 100644
--- a/common/hmzoned.h
+++ b/common/hmzoned.h
@@ -57,6 +57,8 @@ int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
 #ifdef BTRFS_ZONED
 bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr);
 int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo);
+int zero_zone_blocks(int fd, struct btrfs_zone_info *zinfo, off_t start,
+		     size_t len);
 #else
 static inline bool zone_is_sequential(struct btrfs_zone_info *zinfo,
 				      u64 bytenr)
@@ -67,6 +69,11 @@ static inline int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo)
 {
 	return -EOPNOTSUPP;
 }
+static int zero_zone_blocks(int fd, struct btrfs_zone_info *zinfo, off_t start,
+			    size_t len)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* BTRFS_ZONED */
 
 #endif /* __BTRFS_HMZONED_H__ */
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 10/15] btrfs-progs: align device extent allocation to zone boundary
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (8 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 09/15] btrfs-progs: support zero out on zoned block device Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 11/15] btrfs-progs: do sequential allocation in HMZONED mode Naohiro Aota
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

In HMZONED mode, align the device extents to zone boundaries so that a zone
reset affects only the device extent and does not change the state of
blocks in the neighbor device extents. Also, check that a region allocation
is always over empty same-type zones and it is not over any locations of
super block copies.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/hmzoned.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++
 common/hmzoned.h |  2 ++
 kerncompat.h     |  2 ++
 volumes.c        | 72 ++++++++++++++++++++++++++++++++++++++++------
 volumes.h        |  7 +++++
 5 files changed, 149 insertions(+), 8 deletions(-)

diff --git a/common/hmzoned.c b/common/hmzoned.c
index 12eb8f551853..b1d9f5574d35 100644
--- a/common/hmzoned.c
+++ b/common/hmzoned.c
@@ -26,6 +26,8 @@
 #include "common/messages.h"
 #include "mkfs/common.h"
 #include "common/hmzoned.h"
+#include "volumes.h"
+#include "disk-io.h"
 
 #define BTRFS_REPORT_NR_ZONES	8192
 
@@ -272,3 +274,75 @@ int zero_zone_blocks(int fd, struct btrfs_zone_info *zinfo, off_t start,
 
 	return 0;
 }
+
+static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zone_info *zinfo = &device->zone_info;
+	unsigned int zno;
+
+	if (!zone_is_sequential(zinfo, pos))
+		return true;
+
+	zno = pos / zinfo->zone_size;
+	return zinfo->zones[zno].cond == BLK_ZONE_COND_EMPTY;
+}
+
+
+/*
+ * btrfs_check_allocatable_zones - check if spcecifeid region is
+ *                                 suitable for allocation
+ * @device:	the device to allocate a region
+ * @pos:	the position of the region
+ * @num_bytes:	the size of the region
+ *
+ * In non-ZONED device, anywhere is suitable for allocation. In ZONED
+ * device, check if
+ * 1) the region is not on non-empty zones,
+ * 2) all zones in the region have the same zone type,
+ * 3) it does not contain super block location, if the zones are
+ *    sequential.
+ */
+bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
+				   u64 num_bytes)
+{
+	struct btrfs_zone_info *zinfo = &device->zone_info;
+	u64 nzones, begin, end;
+	u64 sb_pos;
+	bool is_sequential;
+	int i;
+
+	if (zinfo->model == ZONED_NONE)
+		return true;
+
+	nzones = num_bytes / zinfo->zone_size;
+	begin = pos / zinfo->zone_size;
+	end = begin + nzones;
+
+	ASSERT(IS_ALIGNED(pos, zinfo->zone_size));
+	ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size));
+
+	if (end > zinfo->nr_zones)
+		return false;
+
+	is_sequential = btrfs_dev_is_sequential(device, pos);
+	if (is_sequential) {
+		for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+			sb_pos = btrfs_sb_offset(i);
+			if (!(sb_pos + BTRFS_SUPER_INFO_SIZE <= pos ||
+			      pos + end <= sb_pos))
+				return false;
+		}
+	}
+
+	while (num_bytes) {
+		if (!btrfs_dev_is_empty_zone(device, pos))
+			return false;
+		if (is_sequential != btrfs_dev_is_sequential(device, pos))
+			return false;
+
+		pos += zinfo->zone_size;
+		num_bytes -= zinfo->zone_size;
+	}
+
+	return true;
+}
diff --git a/common/hmzoned.h b/common/hmzoned.h
index 75812716ffd9..93759291871f 100644
--- a/common/hmzoned.h
+++ b/common/hmzoned.h
@@ -53,6 +53,8 @@ enum btrfs_zoned_model zoned_model(const char *file);
 size_t zone_size(const char *file);
 int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
 			struct btrfs_zone_info *zinfo);
+bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
+				   u64 num_bytes);
 
 #ifdef BTRFS_ZONED
 bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr);
diff --git a/kerncompat.h b/kerncompat.h
index 9fdc58e25d43..b5105df20a3e 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -28,6 +28,7 @@
 #include <assert.h>
 #include <stddef.h>
 #include <linux/types.h>
+#include <linux/kernel.h>
 #include <stdint.h>
 
 #include <features.h>
@@ -345,6 +346,7 @@ static inline void assert_trace(const char *assertion, const char *filename,
 
 /* Alignment check */
 #define IS_ALIGNED(x, a)                (((x) & ((typeof(x))(a) - 1)) == 0)
+#define ALIGN(x, a)		__ALIGN_KERNEL((x), (a))
 
 static inline int is_power_of_2(unsigned long n)
 {
diff --git a/volumes.c b/volumes.c
index a0ebed547faa..14ca3df0efdb 100644
--- a/volumes.c
+++ b/volumes.c
@@ -465,6 +465,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	int slot;
 	struct extent_buffer *l;
 	u64 min_search_start;
+	u64 zone_size;
 
 	/*
 	 * We don't want to overwrite the superblock on the drive nor any area
@@ -473,6 +474,13 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	 */
 	min_search_start = max(root->fs_info->alloc_start, (u64)SZ_1M);
 	search_start = max(search_start, min_search_start);
+	/*
+	 * For a zoned block device, skip the first zone of the device
+	 * entirely.
+	 */
+	zone_size = device->zone_info.zone_size;
+	search_start = max_t(u64, search_start, zone_size);
+	search_start = btrfs_zone_align(device, search_start);
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -481,6 +489,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	max_hole_start = search_start;
 	max_hole_size = 0;
 
+again:
 	if (search_start >= search_end) {
 		ret = -ENOSPC;
 		goto out;
@@ -525,6 +534,13 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 			goto next;
 
 		if (key.offset > search_start) {
+			if (!btrfs_check_allocatable_zones(device, search_start,
+							   num_bytes)) {
+				search_start += zone_size;
+				btrfs_release_path(path);
+				goto again;
+			}
+
 			hole_size = key.offset - search_start;
 
 			/*
@@ -567,6 +583,13 @@ next:
 	 * search_end may be smaller than search_start.
 	 */
 	if (search_end > search_start) {
+		if (!btrfs_check_allocatable_zones(device, search_start,
+						   num_bytes)) {
+			search_start += zone_size;
+			btrfs_release_path(path);
+			goto again;
+		}
+
 		hole_size = search_end - search_start;
 
 		if (hole_size > max_hole_size) {
@@ -610,6 +633,10 @@ int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans,
 	struct extent_buffer *leaf;
 	struct btrfs_key key;
 
+	/* Check alignment to zone for a zoned block device */
+	ASSERT(device->zone_info.model != ZONED_HOST_MANAGED ||
+	       IS_ALIGNED(start, device->zone_info.zone_size));
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -1012,17 +1039,13 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int max_stripes = 0;
 	int min_stripes = 1;
 	int sub_stripes;	/* sub_stripes info for map */
-	int dev_stripes __attribute__((unused));
-				/* stripes per dev */
+	int dev_stripes;	/* stripes per dev */
 	int devs_max;		/* max devs to use */
-	int devs_min __attribute__((unused));
-				/* min devs needed */
+	int devs_min;		/* min devs needed */
 	int devs_increment __attribute__((unused));
 				/* ndevs has to be a multiple of this */
-	int ncopies __attribute__((unused));
-				/* how many copies to data has */
-	int nparity __attribute__((unused));
-				/* number of stripes worth of bytes to
+	int ncopies;		/* how many copies to data has */
+	int nparity;		/* number of stripes worth of bytes to
 				   store parity information */
 	int looped = 0;
 	int ret;
@@ -1030,6 +1053,8 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int stripe_len = BTRFS_STRIPE_LEN;
 	struct btrfs_key key;
 	u64 offset;
+	bool hmzoned = info->fs_devices->hmzoned;
+	u64 zone_size = info->fs_devices->zone_size;
 
 	if (list_empty(dev_list)) {
 		return -ENOSPC;
@@ -1116,13 +1141,39 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				    btrfs_super_stripesize(info->super_copy));
 	}
 
+	if (hmzoned) {
+		calc_size = zone_size;
+		max_chunk_size = round_down(max_chunk_size, zone_size);
+	}
+
 	/* we don't want a chunk larger than 10% of the FS */
 	percent_max = div_factor(btrfs_super_total_bytes(info->super_copy), 1);
 	max_chunk_size = min(percent_max, max_chunk_size);
 
+	if (hmzoned) {
+		int min_num_stripes = devs_min * dev_stripes;
+		int min_data_stripes = (min_num_stripes - nparity) / ncopies;
+		u64 min_chunk_size = min_data_stripes * zone_size;
+
+		max_chunk_size = max(round_down(max_chunk_size,
+						zone_size),
+				     min_chunk_size);
+	}
+
 again:
 	if (chunk_bytes_by_type(type, calc_size, num_stripes, sub_stripes) >
 	    max_chunk_size) {
+		if (hmzoned) {
+			/*
+			 * calc_size is fixed in HMZONED. Reduce
+			 * num_stripes instead.
+			 */
+			num_stripes = max_chunk_size / calc_size;
+			if (num_stripes < min_stripes)
+				return -ENOSPC;
+			goto again;
+		}
+
 		calc_size = max_chunk_size;
 		calc_size /= num_stripes;
 		calc_size /= stripe_len;
@@ -1133,6 +1184,9 @@ again:
 
 	calc_size /= stripe_len;
 	calc_size *= stripe_len;
+
+	ASSERT(!hmzoned || calc_size == zone_size);
+
 	INIT_LIST_HEAD(&private_devs);
 	cur = dev_list->next;
 	index = 0;
@@ -1214,6 +1268,8 @@ again:
 		if (ret < 0)
 			goto out_chunk_map;
 
+		ASSERT(!zone_size || IS_ALIGNED(dev_offset, zone_size));
+
 		device->bytes_used += calc_size;
 		ret = btrfs_update_device(trans, device);
 		if (ret < 0)
diff --git a/volumes.h b/volumes.h
index b5e7a07df5a8..d1326d068ca3 100644
--- a/volumes.h
+++ b/volumes.h
@@ -323,4 +323,11 @@ static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
 	return zone_is_sequential(&device->zone_info, pos);
 }
+static inline u64 btrfs_zone_align(struct btrfs_device *device, u64 pos)
+{
+	if (device->zone_info.model == ZONED_NONE)
+		return pos;
+
+	return ALIGN(pos, device->zone_info.zone_size);
+}
 #endif
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 11/15] btrfs-progs: do sequential allocation in HMZONED mode
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (9 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 10/15] btrfs-progs: align device extent allocation to zone boundary Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 12/15] btrfs-progs: redirty clean extent buffers in seq Naohiro Aota
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

On HMZONED drives, writes must always be sequential and directed at a block
group zone write pointer position. Thus, block allocation in a block group
must also be done sequentially using an allocation pointer equal to the
block group zone write pointer plus the number of blocks allocated but not
yet written.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/hmzoned.c | 212 +++++++++++++++++++++++++++++++++++++++++++++++
 common/hmzoned.h |   7 ++
 ctree.h          |  16 ++++
 extent-tree.c    |  15 ++++
 4 files changed, 250 insertions(+)

diff --git a/common/hmzoned.c b/common/hmzoned.c
index b1d9f5574d35..0e54144259b7 100644
--- a/common/hmzoned.c
+++ b/common/hmzoned.c
@@ -31,6 +31,9 @@
 
 #define BTRFS_REPORT_NR_ZONES	8192
 
+/* Invalid allocation pointer value for missing devices */
+#define WP_MISSING_DEV ((u64)-1)
+
 enum btrfs_zoned_model zoned_model(const char *file)
 {
 	char model[32];
@@ -346,3 +349,212 @@ bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
 
 	return true;
 }
+
+int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_group_cache *cache)
+{
+	struct btrfs_device *device;
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct cache_extent *ce;
+	struct map_lookup *map;
+	u64 logical = cache->key.objectid;
+	u64 length = cache->key.offset;
+	u64 physical = 0;
+	int ret = 0;
+	int alloc_type;
+	int i, j;
+	u64 zone_size = fs_info->fs_devices->zone_size;
+	u64 *alloc_offsets = NULL;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
+
+	/* Sanity check */
+	if (logical == BTRFS_BLOCK_RESERVED_1M_FOR_SUPER) {
+		if (length + SZ_1M != zone_size) {
+			error("unaligned initial system block group");
+			return -EIO;
+		}
+	} else if (!IS_ALIGNED(length, zone_size)) {
+		error("unaligned block group at %llu + %llu", logical, length);
+		return -EIO;
+	}
+
+	/* Get the chunk mapping */
+	ce = search_cache_extent(&map_tree->cache_tree, logical);
+	if (!ce) {
+		error("failed to find block group at %llu", logical);
+		return -ENOENT;
+	}
+	map = container_of(ce, struct map_lookup, ce);
+
+	/*
+	 * Get the zone type: if the group is mapped to a non-sequential zone,
+	 * there is no need for the allocation offset (fit allocation is OK).
+	 */
+	alloc_type = -1;
+	alloc_offsets = calloc(map->num_stripes, sizeof(*alloc_offsets));
+	if (!alloc_offsets) {
+		error("failed to allocate alloc_offsets");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < map->num_stripes; i++) {
+		bool is_sequential;
+		struct blk_zone zone;
+
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
+
+		is_sequential = btrfs_dev_is_sequential(device, physical);
+		if (alloc_type == -1)
+			alloc_type = is_sequential ?
+					BTRFS_ALLOC_SEQ : BTRFS_ALLOC_FIT;
+
+		if ((is_sequential && alloc_type != BTRFS_ALLOC_SEQ) ||
+		    (!is_sequential && alloc_type == BTRFS_ALLOC_SEQ)) {
+			error("found block group of mixed zone types");
+			ret = -EIO;
+			goto out;
+		}
+
+		if (!is_sequential)
+			continue;
+
+		/*
+		 * The group is mapped to a sequential zone. Get the zone write
+		 * pointer to determine the allocation offset within the zone.
+		 */
+		WARN_ON(!IS_ALIGNED(physical, zone_size));
+		zone = device->zone_info.zones[physical / zone_size];
+
+		switch (zone.cond) {
+		case BLK_ZONE_COND_OFFLINE:
+		case BLK_ZONE_COND_READONLY:
+			error("Offline/readonly zone %llu",
+			      physical / fs_info->fs_devices->zone_size);
+			ret = -EIO;
+			goto out;
+		case BLK_ZONE_COND_EMPTY:
+			alloc_offsets[i] = 0;
+			break;
+		case BLK_ZONE_COND_FULL:
+			alloc_offsets[i] = zone_size;
+			break;
+		default:
+			/* Partially used zone */
+			alloc_offsets[i] = ((zone.wp - zone.start) << 9);
+			break;
+		}
+	}
+
+	if (alloc_type == BTRFS_ALLOC_FIT)
+		goto out;
+
+	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case 0: /* single */
+	case BTRFS_BLOCK_GROUP_DUP:
+	case BTRFS_BLOCK_GROUP_RAID1:
+		cache->alloc_offset = WP_MISSING_DEV;
+		for (i = 0; i < map->num_stripes; i++) {
+			if (alloc_offsets[i] == WP_MISSING_DEV)
+				continue;
+			if (cache->alloc_offset == WP_MISSING_DEV)
+				cache->alloc_offset = alloc_offsets[i];
+			if (alloc_offsets[i] == cache->alloc_offset)
+				continue;
+
+			error("write pointer mismatch: block group %llu",
+			      logical);
+			ret = -EIO;
+			goto out;
+		}
+		break;
+	case BTRFS_BLOCK_GROUP_RAID0:
+		cache->alloc_offset = 0;
+		for (i = 0; i < map->num_stripes; i++) {
+			if (alloc_offsets[i] == WP_MISSING_DEV) {
+				error("cannot recover write pointer: block group %llu",
+				      logical);
+				ret = -EIO;
+				goto out;
+			}
+
+			if (alloc_offsets[0] < alloc_offsets[i]) {
+				error(
+				"write pointer mismatch: block group %llu",
+				      logical);
+				ret = -EIO;
+				goto out;
+
+			}
+
+			cache->alloc_offset += alloc_offsets[i];
+		}
+		break;
+	case BTRFS_BLOCK_GROUP_RAID10:
+		/*
+		 * Pass1: check write pointer of RAID1 level: each pointer
+		 * should be equal.
+		 */
+		for (i = 0; i < map->num_stripes / map->sub_stripes; i++) {
+			int base = i * map->sub_stripes;
+			u64 offset = WP_MISSING_DEV;
+
+			for (j = 0; j < map->sub_stripes; j++) {
+				if (alloc_offsets[base + j] == WP_MISSING_DEV)
+					continue;
+				if (offset == WP_MISSING_DEV)
+					offset = alloc_offsets[base+j];
+				if (alloc_offsets[base + j] == offset)
+					continue;
+
+				error(
+				"write pointer mismatch: block group %llu",
+				      logical);
+				ret = -EIO;
+				goto out;
+			}
+			for (j = 0; j < map->sub_stripes; j++)
+				alloc_offsets[base + j] = offset;
+		}
+
+		/* Pass2: check write pointer of RAID1 level */
+		cache->alloc_offset = 0;
+		for (i = 0; i < map->num_stripes / map->sub_stripes; i++) {
+			int base = i * map->sub_stripes;
+
+			if (alloc_offsets[base] == WP_MISSING_DEV) {
+				error(
+			"cannot recover write pointer: block group %llu",
+				      logical);
+				ret = -EIO;
+				goto out;
+			}
+
+			if (alloc_offsets[0] < alloc_offsets[base]) {
+				error(
+				"write pointer mismatch: block group %llu",
+				      logical);
+				ret = -EIO;
+				goto out;
+			}
+
+			cache->alloc_offset += alloc_offsets[base];
+		}
+		break;
+	case BTRFS_BLOCK_GROUP_RAID5:
+	case BTRFS_BLOCK_GROUP_RAID6:
+		/* RAID5/6 is not supported yet */
+	default:
+		error("Unsupported profile %llu",
+		      map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK);
+		ret = -EINVAL;
+		goto out;
+	}
+
+out:
+	cache->alloc_type = alloc_type;
+	free(alloc_offsets);
+	return ret;
+}
diff --git a/common/hmzoned.h b/common/hmzoned.h
index 93759291871f..dca7588f840b 100644
--- a/common/hmzoned.h
+++ b/common/hmzoned.h
@@ -61,6 +61,8 @@ bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr);
 int btrfs_discard_all_zones(int fd, struct btrfs_zone_info *zinfo);
 int zero_zone_blocks(int fd, struct btrfs_zone_info *zinfo, off_t start,
 		     size_t len);
+int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_group_cache *cache);
 #else
 static inline bool zone_is_sequential(struct btrfs_zone_info *zinfo,
 				      u64 bytenr)
@@ -76,6 +78,11 @@ static int zero_zone_blocks(int fd, struct btrfs_zone_info *zinfo, off_t start,
 {
 	return -EOPNOTSUPP;
 }
+static inline int btrfs_load_block_group_zone_info(
+	struct btrfs_fs_info *fs_info, struct btrfs_block_group_cache *cache)
+{
+	return 0;
+}
 #endif /* BTRFS_ZONED */
 
 #endif /* __BTRFS_HMZONED_H__ */
diff --git a/ctree.h b/ctree.h
index a56e18119069..d38708b8a6c5 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1087,6 +1087,20 @@ struct btrfs_space_info {
 	struct list_head list;
 };
 
+/* Block group allocation types */
+enum btrfs_alloc_type {
+
+	/* Regular first fit allocation */
+	BTRFS_ALLOC_FIT		= 0,
+
+	/*
+	 * Sequential allocation: this is for HMZONED mode and
+	 * will result in ignoring free space before a block
+	 * group allocation offset.
+	 */
+	BTRFS_ALLOC_SEQ		= 1,
+};
+
 struct btrfs_block_group_cache {
 	struct cache_extent cache;
 	struct btrfs_key key;
@@ -1109,6 +1123,8 @@ struct btrfs_block_group_cache {
          */
         u32 bitmap_low_thresh;
 
+	enum btrfs_alloc_type alloc_type;
+	u64 alloc_offset;
 };
 
 struct btrfs_device;
diff --git a/extent-tree.c b/extent-tree.c
index 932af2c644bd..35fddfbd9acc 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -251,6 +251,14 @@ again:
 	if (cache->ro || !block_group_bits(cache, data))
 		goto new_group;
 
+	if (cache->alloc_type == BTRFS_ALLOC_SEQ) {
+		if (cache->key.offset - cache->alloc_offset < num)
+			goto new_group;
+		*start_ret = cache->key.objectid + cache->alloc_offset;
+		cache->alloc_offset += num;
+		return 0;
+	}
+
 	while(1) {
 		ret = find_first_extent_bit(&root->fs_info->free_space_cache,
 					    last, &start, &end, EXTENT_DIRTY);
@@ -2724,6 +2732,10 @@ int btrfs_read_block_groups(struct btrfs_root *root)
 		BUG_ON(ret);
 		cache->space_info = space_info;
 
+		ret = btrfs_load_block_group_zone_info(info, cache);
+		if (ret)
+			goto error;
+
 		/* use EXTENT_LOCKED to prevent merging */
 		set_extent_bits(block_group_cache, found_key.objectid,
 				found_key.objectid + found_key.offset - 1,
@@ -2753,6 +2765,9 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type,
 	cache->key.objectid = chunk_offset;
 	cache->key.offset = size;
 
+	ret = btrfs_load_block_group_zone_info(fs_info, cache);
+	BUG_ON(ret);
+
 	cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 	btrfs_set_block_group_used(&cache->item, bytes_used);
 	btrfs_set_block_group_chunk_objectid(&cache->item,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 12/15] btrfs-progs: redirty clean extent buffers in seq
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (10 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 11/15] btrfs-progs: do sequential allocation in HMZONED mode Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 13/15] btrfs-progs: mkfs: Zoned block device support Naohiro Aota
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

Tree manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that pages in the
node are not uselessly written out. On HMZONED drives, however, such
optimization blocks the following IOs as the cancellation of the write out
of the freed blocks breaks the sequential write sequence expected by the
device.

This patch check if next dirty extent buffer is continuous to a previously
written one. If not, it redirty extent buffers between the previous one and
the next one, so that all dirty buffers are written sequentially.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/hmzoned.c | 28 ++++++++++++++++++++++++++++
 common/hmzoned.h |  2 ++
 ctree.h          |  1 +
 transaction.c    |  7 +++++++
 4 files changed, 38 insertions(+)

diff --git a/common/hmzoned.c b/common/hmzoned.c
index 0e54144259b7..1b3830b429ab 100644
--- a/common/hmzoned.c
+++ b/common/hmzoned.c
@@ -555,6 +555,34 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 
 out:
 	cache->alloc_type = alloc_type;
+	cache->write_offset = cache->alloc_offset;
 	free(alloc_offsets);
 	return ret;
 }
+
+bool btrfs_redirty_extent_buffer_for_hmzoned(struct btrfs_fs_info *fs_info,
+					     u64 start, u64 end)
+{
+	u64 next;
+	struct btrfs_block_group_cache *cache;
+	struct extent_buffer *eb;
+
+	cache = btrfs_lookup_first_block_group(fs_info, start);
+	BUG_ON(!cache);
+
+	if (cache->alloc_type != BTRFS_ALLOC_SEQ)
+		return false;
+
+	if (cache->key.objectid + cache->write_offset < start) {
+		next = cache->key.objectid + cache->write_offset;
+		BUG_ON(next + fs_info->nodesize > start);
+		eb = btrfs_find_create_tree_block(fs_info, next);
+		btrfs_mark_buffer_dirty(eb);
+		free_extent_buffer(eb);
+		return true;
+	}
+
+	cache->write_offset += (end + 1 - start);
+
+	return false;
+}
diff --git a/common/hmzoned.h b/common/hmzoned.h
index dca7588f840b..bcbf6ea15c0b 100644
--- a/common/hmzoned.h
+++ b/common/hmzoned.h
@@ -55,6 +55,8 @@ int btrfs_get_zone_info(int fd, const char *file, bool hmzoned,
 			struct btrfs_zone_info *zinfo);
 bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
 				   u64 num_bytes);
+bool btrfs_redirty_extent_buffer_for_hmzoned(struct btrfs_fs_info *fs_info,
+					     u64 start, u64 end);
 
 #ifdef BTRFS_ZONED
 bool zone_is_sequential(struct btrfs_zone_info *zinfo, u64 bytenr);
diff --git a/ctree.h b/ctree.h
index d38708b8a6c5..cd315814614a 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1125,6 +1125,7 @@ struct btrfs_block_group_cache {
 
 	enum btrfs_alloc_type alloc_type;
 	u64 alloc_offset;
+	u64 write_offset;
 };
 
 struct btrfs_device;
diff --git a/transaction.c b/transaction.c
index 45bb9e1f9de6..7b37f12f118f 100644
--- a/transaction.c
+++ b/transaction.c
@@ -18,6 +18,7 @@
 #include "disk-io.h"
 #include "transaction.h"
 #include "delayed-ref.h"
+#include "common/hmzoned.h"
 
 #include "common/messages.h"
 
@@ -136,10 +137,16 @@ int __commit_transaction(struct btrfs_trans_handle *trans,
 	int ret;
 
 	while(1) {
+again:
 		ret = find_first_extent_bit(tree, 0, &start, &end,
 					    EXTENT_DIRTY);
 		if (ret)
 			break;
+
+		if (btrfs_redirty_extent_buffer_for_hmzoned(fs_info, start,
+							    end))
+			goto again;
+
 		while(start <= end) {
 			eb = find_first_extent_buffer(tree, start);
 			BUG_ON(!eb || eb->start != start);
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 13/15] btrfs-progs: mkfs: Zoned block device support
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (11 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 12/15] btrfs-progs: redirty clean extent buffers in seq Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 14/15] btrfs-progs: device-add: support HMZONED device Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 15/15] btrfs-progs: introduce support for device replace " Naohiro Aota
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

This patch makes the size of the temporary system group chunk equal to the
device zone size. It also enables PREP_DEVICE_HMZONED if the user enables
the HMZONED feature.

Enabling HMZONED feature is done using option "-O hmzoned". This feature is
incompatible for now with source directory setup.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 mkfs/common.c | 20 ++++++++++-----
 mkfs/common.h |  1 +
 mkfs/main.c   | 67 +++++++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index caca5e707233..6b5c5500da67 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -154,6 +154,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	int skinny_metadata = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
 	u64 num_bytes;
+	u64 system_group_size;
 
 	buf = malloc(sizeof(*buf) + max(cfg->sectorsize, cfg->nodesize));
 	if (!buf)
@@ -203,7 +204,10 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_super_stripesize(&super, cfg->stripesize);
 	btrfs_set_super_csum_type(&super, BTRFS_CSUM_TYPE_CRC32);
 	btrfs_set_super_chunk_root_generation(&super, 1);
-	btrfs_set_super_cache_generation(&super, -1);
+	if (cfg->features & BTRFS_FEATURE_INCOMPAT_HMZONED)
+		btrfs_set_super_cache_generation(&super, 0);
+	else
+		btrfs_set_super_cache_generation(&super, -1);
 	btrfs_set_super_incompat_flags(&super, cfg->features);
 	if (cfg->label)
 		__strncpy_null(super.label, cfg->label, BTRFS_LABEL_SIZE - 1);
@@ -314,12 +318,17 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
 	btrfs_set_item_size(buf, btrfs_item_nr(nritems), item_size);
 
+	if (cfg->features & BTRFS_FEATURE_INCOMPAT_HMZONED)
+		system_group_size = cfg->zone_size -
+			BTRFS_BLOCK_RESERVED_1M_FOR_SUPER;
+	else
+		system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+
 	dev_item = btrfs_item_ptr(buf, nritems, struct btrfs_dev_item);
 	btrfs_set_device_id(buf, dev_item, 1);
 	btrfs_set_device_generation(buf, dev_item, 0);
 	btrfs_set_device_total_bytes(buf, dev_item, num_bytes);
-	btrfs_set_device_bytes_used(buf, dev_item,
-				    BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_device_bytes_used(buf, dev_item, system_group_size);
 	btrfs_set_device_io_align(buf, dev_item, cfg->sectorsize);
 	btrfs_set_device_io_width(buf, dev_item, cfg->sectorsize);
 	btrfs_set_device_sector_size(buf, dev_item, cfg->sectorsize);
@@ -347,7 +356,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_item_size(buf, btrfs_item_nr(nritems), item_size);
 
 	chunk = btrfs_item_ptr(buf, nritems, struct btrfs_chunk);
-	btrfs_set_chunk_length(buf, chunk, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_chunk_length(buf, chunk, system_group_size);
 	btrfs_set_chunk_owner(buf, chunk, BTRFS_EXTENT_TREE_OBJECTID);
 	btrfs_set_chunk_stripe_len(buf, chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_chunk_type(buf, chunk, BTRFS_BLOCK_GROUP_SYSTEM);
@@ -413,8 +422,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 		    (unsigned long)btrfs_dev_extent_chunk_tree_uuid(dev_extent),
 		    BTRFS_UUID_SIZE);
 
-	btrfs_set_dev_extent_length(buf, dev_extent,
-				    BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_dev_extent_length(buf, dev_extent, system_group_size);
 	nritems++;
 
 	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_DEV_TREE]);
diff --git a/mkfs/common.h b/mkfs/common.h
index 28912906d0a9..d0e4c7b2c906 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -53,6 +53,7 @@ struct btrfs_mkfs_config {
 	u64 features;
 	/* Size of the filesystem in bytes */
 	u64 num_bytes;
+	u64 zone_size;
 
 	/* Output fields, set during creation */
 
diff --git a/mkfs/main.c b/mkfs/main.c
index 948c84be5f39..83463f8d819a 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -68,8 +68,13 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	u64 bytes_used;
 	u64 chunk_start = 0;
 	u64 chunk_size = 0;
+	u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE;
 	int ret;
 
+	if (fs_info->fs_devices->hmzoned)
+		system_group_size = fs_info->fs_devices->zone_size -
+			BTRFS_BLOCK_RESERVED_1M_FOR_SUPER;
+
 	if (mixed)
 		flags |= BTRFS_BLOCK_GROUP_DATA;
 
@@ -90,8 +95,8 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	ret = btrfs_make_block_group(trans, fs_info, bytes_used,
 				     BTRFS_BLOCK_GROUP_SYSTEM,
 				     BTRFS_BLOCK_RESERVED_1M_FOR_SUPER,
-				     BTRFS_MKFS_SYSTEM_GROUP_SIZE);
-	allocation->system += BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+				     system_group_size);
+	allocation->system += system_group_size;
 	if (ret)
 		return ret;
 
@@ -297,11 +302,19 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,
 
 static int create_raid_groups(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root, u64 data_profile,
-			      u64 metadata_profile, int mixed,
+			      u64 metadata_profile, int mixed, int hmzoned,
 			      struct mkfs_allocation *allocation)
 {
 	int ret;
 
+	if (!metadata_profile && hmzoned) {
+		ret = create_one_raid_group(trans, root,
+					    BTRFS_BLOCK_GROUP_SYSTEM,
+					    allocation);
+		if (ret)
+			return ret;
+	}
+
 	if (metadata_profile) {
 		u64 meta_flags = BTRFS_BLOCK_GROUP_METADATA;
 
@@ -548,6 +561,7 @@ out:
 /* This function will cleanup  */
 static int cleanup_temp_chunks(struct btrfs_fs_info *fs_info,
 			       struct mkfs_allocation *alloc,
+			       int hmzoned,
 			       u64 data_profile, u64 meta_profile,
 			       u64 sys_profile)
 {
@@ -599,7 +613,11 @@ static int cleanup_temp_chunks(struct btrfs_fs_info *fs_info,
 				     struct btrfs_block_group_item);
 		if (is_temp_block_group(path.nodes[0], bgi,
 					data_profile, meta_profile,
-					sys_profile)) {
+					sys_profile) ||
+		    /* need to remove the first sys chunk */
+		    (hmzoned && found_key.objectid ==
+		     BTRFS_BLOCK_RESERVED_1M_FOR_SUPER)) {
+
 			u64 flags = btrfs_disk_block_group_flags(path.nodes[0],
 							     bgi);
 
@@ -783,6 +801,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	int metadata_profile_opt = 0;
 	int discard = 1;
 	int ssd = 0;
+	int hmzoned = 0;
 	int force_overwrite = 0;
 	char *source_dir = NULL;
 	bool source_dir_set = false;
@@ -796,6 +815,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
 	struct mkfs_allocation allocation = { 0 };
 	struct btrfs_mkfs_config mkfs_cfg;
+	u64 system_group_size;
 
 	crc32c_optimization_init();
 
@@ -920,6 +940,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	if (dev_cnt == 0)
 		print_usage(1);
 
+	hmzoned = features & BTRFS_FEATURE_INCOMPAT_HMZONED;
+
 	if (source_dir_set && dev_cnt > 1) {
 		error("the option -r is limited to a single device");
 		goto error;
@@ -929,6 +951,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 		goto error;
 	}
 
+	if (source_dir_set && hmzoned) {
+		error("The -r and hmzoned feature are incompatible");
+		exit(1);
+	}
+
 	if (*fs_uuid) {
 		uuid_t dummy_uuid;
 
@@ -960,6 +987,16 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 
 	file = argv[optind++];
 	ssd = is_ssd(file);
+	if (hmzoned) {
+		if (zoned_model(file) == ZONED_NONE) {
+			error("%s: not a zoned block device", file);
+			exit(1);
+		}
+		if (!zone_size(file)) {
+			error("%s: zone size undefined", file);
+			exit(1);
+		}
+	}
 
 	/*
 	* Set default profiles according to number of added devices.
@@ -1111,7 +1148,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	ret = btrfs_prepare_device(fd, file, &dev_block_count, block_count,
 			(zero_end ? PREP_DEVICE_ZERO_END : 0) |
 			(discard ? PREP_DEVICE_DISCARD : 0) |
-			(verbose ? PREP_DEVICE_VERBOSE : 0));
+			(verbose ? PREP_DEVICE_VERBOSE : 0) |
+			(hmzoned ? PREP_DEVICE_HMZONED : 0));
 	if (ret)
 		goto error;
 	if (block_count && block_count > dev_block_count) {
@@ -1122,9 +1160,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	}
 
 	/* To create the first block group and chunk 0 in make_btrfs */
-	if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+	system_group_size = hmzoned ?
+		zone_size(file) : BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+	if (dev_block_count < system_group_size) {
 		error("device is too small to make filesystem, must be at least %llu",
-				(unsigned long long)BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+				(unsigned long long)system_group_size);
 		goto error;
 	}
 
@@ -1140,6 +1180,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	mkfs_cfg.sectorsize = sectorsize;
 	mkfs_cfg.stripesize = stripesize;
 	mkfs_cfg.features = features;
+	mkfs_cfg.zone_size = zone_size(file);
 
 	ret = make_btrfs(fd, &mkfs_cfg);
 	if (ret) {
@@ -1150,6 +1191,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 
 	fs_info = open_ctree_fs_info(file, 0, 0, 0,
 			OPEN_CTREE_WRITES | OPEN_CTREE_TEMPORARY_SUPER);
+
 	if (!fs_info) {
 		error("open ctree failed");
 		goto error;
@@ -1223,7 +1265,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 				block_count,
 				(verbose ? PREP_DEVICE_VERBOSE : 0) |
 				(zero_end ? PREP_DEVICE_ZERO_END : 0) |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(hmzoned ? PREP_DEVICE_HMZONED : 0));
 		if (ret) {
 			goto error;
 		}
@@ -1246,7 +1289,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 
 raid_groups:
 	ret = create_raid_groups(trans, root, data_profile,
-			 metadata_profile, mixed, &allocation);
+			 metadata_profile, mixed, hmzoned, &allocation);
 	if (ret) {
 		error("unable to create raid groups: %d", ret);
 		goto out;
@@ -1269,7 +1312,7 @@ raid_groups:
 		goto out;
 	}
 
-	ret = cleanup_temp_chunks(fs_info, &allocation, data_profile,
+	ret = cleanup_temp_chunks(fs_info, &allocation, hmzoned, data_profile,
 				  metadata_profile, metadata_profile);
 	if (ret < 0) {
 		error("failed to cleanup temporary chunks: %d", ret);
@@ -1320,6 +1363,10 @@ raid_groups:
 			btrfs_group_profile_str(metadata_profile),
 			pretty_size(allocation.system));
 		printf("SSD detected:       %s\n", ssd ? "yes" : "no");
+		printf("Zoned device:       %s\n", hmzoned ? "yes" : "no");
+		if (hmzoned)
+			printf("Zone size:          %s\n",
+			       pretty_size(fs_info->fs_devices->zone_size));
 		btrfs_parse_features_to_string(features_buf, features);
 		printf("Incompat features:  %s", features_buf);
 		printf("\n");
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 14/15] btrfs-progs: device-add: support HMZONED device
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (12 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 13/15] btrfs-progs: mkfs: Zoned block device support Naohiro Aota
@ 2019-08-20  4:52 ` Naohiro Aota
  2019-08-20  4:52 ` [PATCH v3 15/15] btrfs-progs: introduce support for device replace " Naohiro Aota
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

This patch check if the target file system is flagged as HMZONED. If it is,
the device to be added is flagged PREP_DEVICE_HMZONED.  Also add checks to
prevent mixing non-zoned devices and zoned devices.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/device.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/cmds/device.c b/cmds/device.c
index 24158308a41b..9caa77efd049 100644
--- a/cmds/device.c
+++ b/cmds/device.c
@@ -61,6 +61,9 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 	int discard = 1;
 	int force = 0;
 	int last_dev;
+	int res;
+	int hmzoned;
+	struct btrfs_ioctl_feature_flags feature_flags;
 
 	optind = 0;
 	while (1) {
@@ -96,12 +99,35 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 	if (fdmnt < 0)
 		return 1;
 
+	res = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (res) {
+		error("error getting feature flags '%s': %m", mntpnt);
+		return 1;
+	}
+	hmzoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_HMZONED;
+
 	for (i = optind; i < last_dev; i++){
 		struct btrfs_ioctl_vol_args ioctl_args;
-		int	devfd, res;
+		int	devfd;
 		u64 dev_block_count = 0;
 		char *path;
 
+		if (hmzoned && zoned_model(argv[i]) == ZONED_NONE) {
+			error(
+		"cannot add non-zoned device to HMZONED file system '%s'",
+			      argv[i]);
+			ret++;
+			continue;
+		}
+
+		if (!hmzoned && zoned_model(argv[i]) == ZONED_HOST_MANAGED) {
+			error(
+	"cannot add host managed zoned device to non-HMZONED file system '%s'",
+			      argv[i]);
+			ret++;
+			continue;
+		}
+
 		res = test_dev_for_mkfs(argv[i], force);
 		if (res) {
 			ret++;
@@ -117,7 +143,8 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 
 		res = btrfs_prepare_device(devfd, argv[i], &dev_block_count, 0,
 				PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(hmzoned ? PREP_DEVICE_HMZONED : 0));
 		close(devfd);
 		if (res) {
 			ret++;
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 15/15] btrfs-progs: introduce support for device replace HMZONED device
  2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
                   ` (13 preceding siblings ...)
  2019-08-20  4:52 ` [PATCH v3 14/15] btrfs-progs: device-add: support HMZONED device Naohiro Aota
@ 2019-08-20  4:52 ` " Naohiro Aota
  14 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2019-08-20  4:52 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel, Naohiro Aota

This patch check if the target file system is flagged as HMZONED. If it is,
the device to be added is flagged PREP_DEVICE_HMZONED.  Also add checks to
prevent mixing non-zoned devices and zoned devices.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 cmds/replace.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/cmds/replace.c b/cmds/replace.c
index 2321aa156fe2..670df68a93f7 100644
--- a/cmds/replace.c
+++ b/cmds/replace.c
@@ -119,6 +119,7 @@ static const char *const cmd_replace_start_usage[] = {
 static int cmd_replace_start(const struct cmd_struct *cmd,
 			     int argc, char **argv)
 {
+	struct btrfs_ioctl_feature_flags feature_flags;
 	struct btrfs_ioctl_dev_replace_args start_args = {0};
 	struct btrfs_ioctl_dev_replace_args status_args = {0};
 	int ret;
@@ -126,6 +127,7 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	int c;
 	int fdmnt = -1;
 	int fddstdev = -1;
+	int hmzoned;
 	char *path;
 	char *srcdev;
 	char *dstdev = NULL;
@@ -166,6 +168,13 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	if (fdmnt < 0)
 		goto leave_with_error;
 
+	ret = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (ret) {
+		error("ioctl(GET_FEATURES) on '%s' returns error: %m", path);
+		goto leave_with_error;
+	}
+	hmzoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_HMZONED;
+
 	/* check for possible errors before backgrounding */
 	status_args.cmd = BTRFS_IOCTL_DEV_REPLACE_CMD_STATUS;
 	status_args.result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT;
@@ -260,7 +269,8 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	strncpy((char *)start_args.start.tgtdev_name, dstdev,
 		BTRFS_DEVICE_PATH_NAME_MAX);
 	ret = btrfs_prepare_device(fddstdev, dstdev, &dstdev_block_count, 0,
-			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE);
+			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
+			(hmzoned ? PREP_DEVICE_HMZONED : 0));
 	if (ret)
 		goto leave_with_error;
 
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, back to index

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20  4:52 [PATCH v3 00/15] btrfs-progs: zoned block device support Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 01/15] btrfs-progs: utils: Introduce queue_param helper function Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 02/15] btrfs-progs: introduce raid parameters variables Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 03/15] btrfs-progs: build: Check zoned block device support Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 04/15] btrfs-progs: add new HMZONED feature flag Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 05/15] btrfs-progs: Introduce zone block device helper functions Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 06/15] btrfs-progs: load and check zone information Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 07/15] btrfs-progs: avoid writing super block to sequential zones Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 08/15] btrfs-progs: support discarding zoned device Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 09/15] btrfs-progs: support zero out on zoned block device Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 10/15] btrfs-progs: align device extent allocation to zone boundary Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 11/15] btrfs-progs: do sequential allocation in HMZONED mode Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 12/15] btrfs-progs: redirty clean extent buffers in seq Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 13/15] btrfs-progs: mkfs: Zoned block device support Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 14/15] btrfs-progs: device-add: support HMZONED device Naohiro Aota
2019-08-20  4:52 ` [PATCH v3 15/15] btrfs-progs: introduce support for device replace " Naohiro Aota

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox