util-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs
@ 2021-04-14  1:33 Naohiro Aota
  2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-14  1:33 UTC (permalink / raw)
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

This series implements probing and wiping of the superblock of zoned btrfs.

Changes:
  - v2:
     - Fix zone alignment calculation
     - Fix the build without HAVE_LINUX_BLKZONED_H

Zoned btrfs is merged with this series:
https://lore.kernel.org/linux-btrfs/20210222160049.GR1993@twin.jikos.cz/T/

And, superblock locations are finalized with this patch:
https://lore.kernel.org/linux-btrfs/BL0PR04MB651442E6ACBF48342BD00FEBE7719@BL0PR04MB6514.namprd04.prod.outlook.com/T/

A zoned block device consists of a number of zones. Zones are either
conventional and accepting random writes or sequential and requiring that
writes be issued in LBA order from each zone write pointer position.

Superblock (and its copies) is the only data structure in btrfs with a
fixed location on a device. Since we cannot overwrite in a sequential write
required zone, we cannot place superblock in the zone.

Thus, zoned btrfs use superblock log writing to update superblock on
sequential write required zones. It uses two zones as a circular buffer to
write updated superblocks. Once the first zone is filled up, start writing
into the second buffer. When both zones are filled up and before start
writing to the first zone again, it reset the first zone.

This series first implements zone based detection of the magic location.
Then, it adds magics for zoned btrfs and implements a probing function to
detect the latest superblock. Finally, this series also implements
zone-aware wiping by zone resetting.

Naohiro Aota (3):
  blkid: implement zone-aware probing
  blkid: add magic and probing for zoned btrfs
  blkid: support zone reset for wipefs

 configure.ac                     |   1 +
 libblkid/src/blkidP.h            |   5 +
 libblkid/src/probe.c             | 108 ++++++++++++++++--
 libblkid/src/superblocks/btrfs.c | 185 ++++++++++++++++++++++++++++++-
 4 files changed, 290 insertions(+), 9 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] blkid: implement zone-aware probing
  2021-04-14  1:33 [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
@ 2021-04-14  1:33 ` Naohiro Aota
  2021-04-14  9:14   ` Damien Le Moal
                     ` (2 more replies)
  2021-04-14  1:33 ` [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-14  1:33 UTC (permalink / raw)
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

This patch makes libblkid zone-aware. It can probe the magic located at
some offset from the beginning of some specific zone of a device.

Ths patch introduces some new fields to struct blkid_idmag. They indicate
the magic location is placed related to a zone, and the offset in the zone.

Also, this commit introduces `zone_size` to struct blkid_struct_probe. It
stores the size of zones of a device.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 configure.ac          |  1 +
 libblkid/src/blkidP.h |  5 +++++
 libblkid/src/probe.c  | 29 +++++++++++++++++++++++++++--
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index bebb4085425a..52b164e834db 100644
--- a/configure.ac
+++ b/configure.ac
@@ -302,6 +302,7 @@ AC_CHECK_HEADERS([ \
 	lastlog.h \
 	libutil.h \
 	linux/btrfs.h \
+	linux/blkzoned.h \
 	linux/capability.h \
 	linux/cdrom.h \
 	linux/falloc.h \
diff --git a/libblkid/src/blkidP.h b/libblkid/src/blkidP.h
index a3fe6748a969..e3a160aa97c0 100644
--- a/libblkid/src/blkidP.h
+++ b/libblkid/src/blkidP.h
@@ -150,6 +150,10 @@ struct blkid_idmag
 	const char	*hoff;		/* hint which contains byte offset to kboff */
 	long		kboff;		/* kilobyte offset of superblock */
 	unsigned int	sboff;		/* byte offset within superblock */
+
+	int		is_zoned;	/* indicate magic location is calcluated based on zone position  */
+	long		zonenum;	/* zone number which has superblock */
+	long		kboff_inzone;	/* kilobyte offset of superblock in a zone */
 };
 
 /*
@@ -206,6 +210,7 @@ struct blkid_struct_probe
 	dev_t			disk_devno;	/* devno of the whole-disk or 0 */
 	unsigned int		blkssz;		/* sector size (BLKSSZGET ioctl) */
 	mode_t			mode;		/* struct stat.sb_mode */
+	uint64_t		zone_size;	/* zone size (BLKGETZONESZ ioctl) */
 
 	int			flags;		/* private library flags */
 	int			prob_flags;	/* always zeroized by blkid_do_*() */
diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
index a47a8720d4ac..9d180aab5242 100644
--- a/libblkid/src/probe.c
+++ b/libblkid/src/probe.c
@@ -94,6 +94,9 @@
 #ifdef HAVE_LINUX_CDROM_H
 #include <linux/cdrom.h>
 #endif
+#ifdef HAVE_LINUX_BLKZONED_H
+#include <linux/blkzoned.h>
+#endif
 #ifdef HAVE_SYS_STAT_H
 #include <sys/stat.h>
 #endif
@@ -897,6 +900,7 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
 	pr->wipe_off = 0;
 	pr->wipe_size = 0;
 	pr->wipe_chain = NULL;
+	pr->zone_size = 0;
 
 	if (fd < 0)
 		return 1;
@@ -996,6 +1000,15 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
 #endif
 	free(dm_uuid);
 
+# ifdef HAVE_LINUX_BLKZONED_H
+	if (S_ISBLK(sb.st_mode)) {
+		uint32_t zone_size_sector;
+
+		if (!ioctl(pr->fd, BLKGETZONESZ, &zone_size_sector))
+			pr->zone_size = zone_size_sector << 9;
+	}
+# endif
+
 	DBG(LOWPROBE, ul_debug("ready for low-probing, offset=%"PRIu64", size=%"PRIu64"",
 				pr->off, pr->size));
 	DBG(LOWPROBE, ul_debug("whole-disk: %s, regfile: %s",
@@ -1064,12 +1077,24 @@ int blkid_probe_get_idmag(blkid_probe pr, const struct blkid_idinfo *id,
 	/* try to detect by magic string */
 	while(mag && mag->magic) {
 		unsigned char *buf;
+		uint64_t kboff;
 		uint64_t hint_offset;
 
 		if (!mag->hoff || blkid_probe_get_hint(pr, mag->hoff, &hint_offset) < 0)
 			hint_offset = 0;
 
-		off = hint_offset + ((mag->kboff + (mag->sboff >> 10)) << 10);
+		/* If the magic is for zoned device, skip non-zoned device */
+		if (mag->is_zoned && !pr->zone_size) {
+			mag++;
+			continue;
+		}
+
+		if (!mag->is_zoned)
+			kboff = mag->kboff;
+		else
+			kboff = ((mag->zonenum * pr->zone_size) >> 10) + mag->kboff_inzone;
+
+		off = hint_offset + ((kboff + (mag->sboff >> 10)) << 10);
 		buf = blkid_probe_get_buffer(pr, off, 1024);
 
 		if (!buf && errno)
@@ -1079,7 +1104,7 @@ int blkid_probe_get_idmag(blkid_probe pr, const struct blkid_idinfo *id,
 				buf + (mag->sboff & 0x3ff), mag->len)) {
 
 			DBG(LOWPROBE, ul_debug("\tmagic sboff=%u, kboff=%ld",
-				mag->sboff, mag->kboff));
+				mag->sboff, kboff));
 			if (offset)
 				*offset = off + (mag->sboff & 0x3ff);
 			if (res)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-14  1:33 [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
  2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
@ 2021-04-14  1:33 ` Naohiro Aota
  2021-04-14  9:49   ` Johannes Thumshirn
                     ` (2 more replies)
  2021-04-14  1:33 ` [PATCH v2 3/3] blkid: support zone reset for wipefs Naohiro Aota
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-14  1:33 UTC (permalink / raw)
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

This commit adds zone-aware magics and probing functions for zoned btrfs.

Superblock (and its copies) is the only data structure in btrfs with a
fixed location on a device. Since we cannot overwrite in a sequential write
required zone, we cannot place superblock in the zone.

Thus, zoned btrfs use superblock log writing to update superblock on
sequential write required zones. It uses two zones as a circular buffer to
write updated superblocks. Once the first zone is filled up, start writing
into the second buffer. When both zones are filled up and before start
writing to the first zone again, it reset the first zone.

We can determine the position of the latest superblock by reading write
pointer information from a device. One corner case is when both zones are
full. For this situation, we read out the last superblock of each zone and
compare them to determine which zone is older.

The magics can detect a superblock magic ("_BHRfs_M") at the beginning of
zone #0 or zone #1 to see if it is zoned btrfs. When both zones are filled
up, zoned btrfs reset the first zone to write a new superblock. If btrfs
crash at the moment, we do not see a superblock at zone #0. Thus, we need
to check not only zone #0 but also zone #1.

It also supports temporary magic ("!BHRfS_M") in zone #0. The mkfs.btrfs
first writes the temporary superblock to the zone during the mkfs process.
It will survive there until the zones are filled up and reset. So, we also
need to detect this temporary magic.

Finally, this commit extends probe_btrfs() to load the latest superblock
determined by the write pointers.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 libblkid/src/superblocks/btrfs.c | 185 ++++++++++++++++++++++++++++++-
 1 file changed, 184 insertions(+), 1 deletion(-)

diff --git a/libblkid/src/superblocks/btrfs.c b/libblkid/src/superblocks/btrfs.c
index f0fde700d896..812918ac1f42 100644
--- a/libblkid/src/superblocks/btrfs.c
+++ b/libblkid/src/superblocks/btrfs.c
@@ -9,6 +9,12 @@
 #include <unistd.h>
 #include <string.h>
 #include <stdint.h>
+#include <stdbool.h>
+#include <assert.h>
+
+#ifdef HAVE_LINUX_BLKZONED_H
+#include <linux/blkzoned.h>
+#endif
 
 #include "superblocks.h"
 
@@ -59,11 +65,176 @@ struct btrfs_super_block {
 	uint8_t label[256];
 } __attribute__ ((__packed__));
 
+#define BTRFS_SUPER_INFO_SIZE 4096
+
+/* Number of superblock log zones */
+#define BTRFS_NR_SB_LOG_ZONES 2
+
+/* Introduce some macros and types to unify the code with kernel side */
+#define SECTOR_SHIFT 9
+
+#define ASSERT(x) assert(x)
+
+typedef uint64_t u64;
+typedef uint64_t sector_t;
+typedef uint8_t u8;
+
+#ifdef HAVE_LINUX_BLKZONED_H
+static int sb_write_pointer(int fd, struct blk_zone *zones, u64 *wp_ret)
+{
+	bool empty[BTRFS_NR_SB_LOG_ZONES];
+	bool full[BTRFS_NR_SB_LOG_ZONES];
+	sector_t sector;
+
+	ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL &&
+	       zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL);
+
+	empty[0] = zones[0].cond == BLK_ZONE_COND_EMPTY;
+	empty[1] = zones[1].cond == BLK_ZONE_COND_EMPTY;
+	full[0] = zones[0].cond == BLK_ZONE_COND_FULL;
+	full[1] = zones[1].cond == BLK_ZONE_COND_FULL;
+
+	/*
+	 * Possible states of log buffer zones
+	 *
+	 *           Empty[0]  In use[0]  Full[0]
+	 * Empty[1]         *          x        0
+	 * In use[1]        0          x        0
+	 * Full[1]          1          1        C
+	 *
+	 * Log position:
+	 *   *: Special case, no superblock is written
+	 *   0: Use write pointer of zones[0]
+	 *   1: Use write pointer of zones[1]
+	 *   C: Compare super blcoks from zones[0] and zones[1], use the latest
+	 *      one determined by generation
+	 *   x: Invalid state
+	 */
+
+	if (empty[0] && empty[1]) {
+		/* Special case to distinguish no superblock to read */
+		*wp_ret = zones[0].start << SECTOR_SHIFT;
+		return -ENOENT;
+	} else if (full[0] && full[1]) {
+		/* Compare two super blocks */
+		u8 buf[BTRFS_NR_SB_LOG_ZONES][BTRFS_SUPER_INFO_SIZE];
+		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
+		int i;
+		int ret;
+
+		for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+			u64 bytenr;
+
+			bytenr = ((zones[i].start + zones[i].len)
+				   << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE;
+
+			ret = pread64(fd, buf[i], BTRFS_SUPER_INFO_SIZE,
+				      bytenr);
+			if (ret != BTRFS_SUPER_INFO_SIZE)
+				return -EIO;
+			super[i] = (struct btrfs_super_block *)&buf[i];
+		}
+
+		if (super[0]->generation > super[1]->generation)
+			sector = zones[1].start;
+		else
+			sector = zones[0].start;
+	} else if (!full[0] && (empty[1] || full[1])) {
+		sector = zones[0].wp;
+	} else if (full[0]) {
+		sector = zones[1].wp;
+	} else {
+		return -EUCLEAN;
+	}
+	*wp_ret = sector << SECTOR_SHIFT;
+	return 0;
+}
+
+static int sb_log_offset(blkid_probe pr, uint64_t *bytenr_ret)
+{
+	uint32_t zone_num = 0;
+	uint32_t zone_size_sector;
+	struct blk_zone_report *rep;
+	struct blk_zone *zones;
+	size_t rep_size;
+	int ret;
+	uint64_t wp;
+
+	zone_size_sector = pr->zone_size >> SECTOR_SHIFT;
+
+	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2;
+	rep = malloc(rep_size);
+	if (!rep)
+		return -errno;
+
+	memset(rep, 0, rep_size);
+	rep->sector = zone_num * zone_size_sector;
+	rep->nr_zones = 2;
+
+	ret = ioctl(pr->fd, BLKREPORTZONE, rep);
+	if (ret) {
+		ret = -errno;
+		goto out;
+	}
+	if (rep->nr_zones != 2) {
+		ret = 1;
+		goto out;
+	}
+
+	zones = (struct blk_zone *)(rep + 1);
+
+	if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		*bytenr_ret = zones[0].start << SECTOR_SHIFT;
+		ret = 0;
+		goto out;
+	} else if (zones[1].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		*bytenr_ret = zones[1].start << SECTOR_SHIFT;
+		ret = 0;
+		goto out;
+	}
+
+	ret = sb_write_pointer(pr->fd, zones, &wp);
+	if (ret != -ENOENT && ret) {
+		ret = 1;
+		goto out;
+	}
+	if (ret != -ENOENT) {
+		if (wp == zones[0].start << SECTOR_SHIFT)
+			wp = (zones[1].start + zones[1].len) << SECTOR_SHIFT;
+		wp -= BTRFS_SUPER_INFO_SIZE;
+	}
+	*bytenr_ret = wp;
+
+	ret = 0;
+out:
+	free(rep);
+
+	return ret;
+}
+#endif
+
 static int probe_btrfs(blkid_probe pr, const struct blkid_idmag *mag)
 {
 	struct btrfs_super_block *bfs;
 
-	bfs = blkid_probe_get_sb(pr, mag, struct btrfs_super_block);
+	if (pr->zone_size) {
+#ifdef HAVE_LINUX_BLKZONED_H
+		uint64_t offset = 0;
+		int ret;
+
+		ret = sb_log_offset(pr, &offset);
+		if (ret)
+			return ret;
+		bfs = (struct btrfs_super_block *)
+			blkid_probe_get_buffer(pr, offset,
+					       sizeof(struct btrfs_super_block));
+#else
+		/* Nothing can be done */
+		return 1;
+#endif
+	} else {
+		bfs = blkid_probe_get_sb(pr, mag, struct btrfs_super_block);
+	}
 	if (!bfs)
 		return errno ? -errno : 1;
 
@@ -88,6 +259,18 @@ const struct blkid_idinfo btrfs_idinfo =
 	.magics		=
 	{
 	  { .magic = "_BHRfS_M", .len = 8, .sboff = 0x40, .kboff = 64 },
+	  /* For zoned btrfs */
+	  { .magic = "_BHRfS_M", .len = 8, .sboff = 0x40,
+	    .is_zoned = 1, .zonenum = 0, .kboff_inzone = 0 },
+	  { .magic = "_BHRfS_M", .len = 8, .sboff = 0x40,
+	    .is_zoned = 1, .zonenum = 1, .kboff_inzone = 0 },
+	  /*
+	   * For zoned btrfs, we also need to detect a temporary superblock
+	   * at zone #0. Mkfs.btrfs creates it in the initialize process.
+	   * It persits until both zones are filled up then reset.
+	   */
+	  { .magic = "!BHRfS_M", .len = 8, .sboff = 0x40,
+	    .is_zoned = 1, .zonenum = 0, .kboff_inzone = 0 },
 	  { NULL }
 	}
 };
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/3] blkid: support zone reset for wipefs
  2021-04-14  1:33 [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
  2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
  2021-04-14  1:33 ` [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
@ 2021-04-14  1:33 ` Naohiro Aota
  2021-04-14  9:13   ` Damien Le Moal
                     ` (2 more replies)
  2021-04-14 14:09 ` [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Karel Zak
  2021-04-22 14:41 ` Karel Zak
  4 siblings, 3 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-14  1:33 UTC (permalink / raw)
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

We cannot overwrite superblock magic in a sequential required zone. So,
wipefs cannot work as it is. Instead, this commit implements the wiping by
zone resetting.

Zone resetting must be done only for a sequential write zone. This is
checked by is_conventional().

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 libblkid/src/probe.c | 79 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 6 deletions(-)

diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
index 9d180aab5242..23c3621627d4 100644
--- a/libblkid/src/probe.c
+++ b/libblkid/src/probe.c
@@ -107,6 +107,7 @@
 #include <stdint.h>
 #include <stdarg.h>
 #include <limits.h>
+#include <stdbool.h>
 
 #include "blkidP.h"
 #include "all-io.h"
@@ -1228,6 +1229,48 @@ int blkid_do_probe(blkid_probe pr)
 	return rc;
 }
 
+#ifdef HAVE_LINUX_BLKZONED_H
+static int is_conventional(blkid_probe pr, uint64_t offset)
+{
+	struct blk_zone_report *rep = NULL;
+	size_t rep_size;
+	int ret;
+	uint64_t zone_mask;
+
+	if (!pr->zone_size)
+		return 1;
+
+	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone);
+	rep = calloc(1, rep_size);
+	if (!rep)
+		return -1;
+
+	zone_mask = ~(pr->zone_size - 1);
+	rep->sector = (offset & zone_mask) >> 9;
+	rep->nr_zones = 1;
+	ret = ioctl(blkid_probe_get_fd(pr), BLKREPORTZONE, rep);
+	if (ret) {
+		free(rep);
+		return -1;
+	}
+
+	if (rep->zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL)
+		ret = 1;
+	else
+		ret = 0;
+
+	free(rep);
+
+	return ret;
+}
+#else
+static inline int is_conventional(blkid_probe pr __attribute__((__unused__)),
+				  uint64_t offset __attribute__((__unused__)))
+{
+	return 1;
+}
+#endif
+
 /**
  * blkid_do_wipe:
  * @pr: prober
@@ -1267,6 +1310,7 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
 	const char *off = NULL;
 	size_t len = 0;
 	uint64_t offset, magoff;
+	bool conventional;
 	char buf[BUFSIZ];
 	int fd, rc = 0;
 	struct blkid_chain *chn;
@@ -1302,6 +1346,11 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
 	if (len > sizeof(buf))
 		len = sizeof(buf);
 
+	rc = is_conventional(pr, offset);
+	if (rc < 0)
+		return rc;
+	conventional = rc == 1;
+
 	DBG(LOWPROBE, ul_debug(
 	    "do_wipe [offset=0x%"PRIx64" (%"PRIu64"), len=%zu, chain=%s, idx=%d, dryrun=%s]\n",
 	    offset, offset, len, chn->driver->name, chn->idx, dryrun ? "yes" : "not"));
@@ -1309,13 +1358,31 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
 	if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
 		return -1;
 
-	memset(buf, 0, len);
-
 	if (!dryrun && len) {
-		/* wipen on device */
-		if (write_all(fd, buf, len))
-			return -1;
-		fsync(fd);
+		if (conventional) {
+			memset(buf, 0, len);
+
+			/* wipen on device */
+			if (write_all(fd, buf, len))
+				return -1;
+			fsync(fd);
+		} else {
+#ifdef HAVE_LINUX_BLKZONED_H
+			uint64_t zone_mask = ~(pr->zone_size - 1);
+			struct blk_zone_range range = {
+				.sector = (offset & zone_mask) >> 9,
+				.nr_sectors = pr->zone_size >> 9,
+			};
+
+			rc = ioctl(fd, BLKRESETZONE, &range);
+			if (rc < 0)
+				return -1;
+#else
+			/* Should not reach here */
+			assert(0);
+#endif
+		}
+
 		pr->flags &= ~BLKID_FL_MODIF_BUFF;	/* be paranoid */
 
 		return blkid_probe_step_back(pr);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] blkid: support zone reset for wipefs
  2021-04-14  1:33 ` [PATCH v2 3/3] blkid: support zone reset for wipefs Naohiro Aota
@ 2021-04-14  9:13   ` Damien Le Moal
  2021-04-14  9:50   ` Johannes Thumshirn
  2021-04-14 13:57   ` Karel Zak
  2 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2021-04-14  9:13 UTC (permalink / raw)
  To: Naohiro Aota, Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Johannes Thumshirn

On 2021/04/14 10:33, Naohiro Aota wrote:
> We cannot overwrite superblock magic in a sequential required zone. So,
> wipefs cannot work as it is. Instead, this commit implements the wiping by
> zone resetting.
> 
> Zone resetting must be done only for a sequential write zone. This is
> checked by is_conventional().
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  libblkid/src/probe.c | 79 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 73 insertions(+), 6 deletions(-)
> 
> diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
> index 9d180aab5242..23c3621627d4 100644
> --- a/libblkid/src/probe.c
> +++ b/libblkid/src/probe.c
> @@ -107,6 +107,7 @@
>  #include <stdint.h>
>  #include <stdarg.h>
>  #include <limits.h>
> +#include <stdbool.h>
>  
>  #include "blkidP.h"
>  #include "all-io.h"
> @@ -1228,6 +1229,48 @@ int blkid_do_probe(blkid_probe pr)
>  	return rc;
>  }
>  
> +#ifdef HAVE_LINUX_BLKZONED_H
> +static int is_conventional(blkid_probe pr, uint64_t offset)
> +{
> +	struct blk_zone_report *rep = NULL;
> +	size_t rep_size;
> +	int ret;
> +	uint64_t zone_mask;
> +
> +	if (!pr->zone_size)
> +		return 1;
> +
> +	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone);
> +	rep = calloc(1, rep_size);
> +	if (!rep)
> +		return -1;
> +
> +	zone_mask = ~(pr->zone_size - 1);
> +	rep->sector = (offset & zone_mask) >> 9;
> +	rep->nr_zones = 1;
> +	ret = ioctl(blkid_probe_get_fd(pr), BLKREPORTZONE, rep);
> +	if (ret) {
> +		free(rep);
> +		return -1;
> +	}
> +
> +	if (rep->zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL)
> +		ret = 1;
> +	else
> +		ret = 0;
> +
> +	free(rep);
> +
> +	return ret;
> +}
> +#else
> +static inline int is_conventional(blkid_probe pr __attribute__((__unused__)),
> +				  uint64_t offset __attribute__((__unused__)))
> +{
> +	return 1;
> +}
> +#endif
> +
>  /**
>   * blkid_do_wipe:
>   * @pr: prober
> @@ -1267,6 +1310,7 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
>  	const char *off = NULL;
>  	size_t len = 0;
>  	uint64_t offset, magoff;
> +	bool conventional;
>  	char buf[BUFSIZ];
>  	int fd, rc = 0;
>  	struct blkid_chain *chn;
> @@ -1302,6 +1346,11 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
>  	if (len > sizeof(buf))
>  		len = sizeof(buf);
>  
> +	rc = is_conventional(pr, offset);
> +	if (rc < 0)
> +		return rc;
> +	conventional = rc == 1;
> +
>  	DBG(LOWPROBE, ul_debug(
>  	    "do_wipe [offset=0x%"PRIx64" (%"PRIu64"), len=%zu, chain=%s, idx=%d, dryrun=%s]\n",
>  	    offset, offset, len, chn->driver->name, chn->idx, dryrun ? "yes" : "not"));
> @@ -1309,13 +1358,31 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
>  	if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
>  		return -1;
>  
> -	memset(buf, 0, len);
> -
>  	if (!dryrun && len) {
> -		/* wipen on device */
> -		if (write_all(fd, buf, len))
> -			return -1;
> -		fsync(fd);
> +		if (conventional) {
> +			memset(buf, 0, len);
> +
> +			/* wipen on device */
> +			if (write_all(fd, buf, len))
> +				return -1;
> +			fsync(fd);
> +		} else {
> +#ifdef HAVE_LINUX_BLKZONED_H
> +			uint64_t zone_mask = ~(pr->zone_size - 1);
> +			struct blk_zone_range range = {
> +				.sector = (offset & zone_mask) >> 9,
> +				.nr_sectors = pr->zone_size >> 9,
> +			};
> +
> +			rc = ioctl(fd, BLKRESETZONE, &range);
> +			if (rc < 0)
> +				return -1;
> +#else
> +			/* Should not reach here */
> +			assert(0);
> +#endif
> +		}
> +
>  		pr->flags &= ~BLKID_FL_MODIF_BUFF;	/* be paranoid */
>  
>  		return blkid_probe_step_back(pr);
> 

Looks OK to me.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] blkid: implement zone-aware probing
  2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
@ 2021-04-14  9:14   ` Damien Le Moal
  2021-04-14  9:17   ` Johannes Thumshirn
  2021-04-14 13:31   ` Karel Zak
  2 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2021-04-14  9:14 UTC (permalink / raw)
  To: Naohiro Aota, Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Johannes Thumshirn

On 2021/04/14 10:33, Naohiro Aota wrote:
> This patch makes libblkid zone-aware. It can probe the magic located at
> some offset from the beginning of some specific zone of a device.
> 
> Ths patch introduces some new fields to struct blkid_idmag. They indicate
> the magic location is placed related to a zone, and the offset in the zone.
> 
> Also, this commit introduces `zone_size` to struct blkid_struct_probe. It
> stores the size of zones of a device.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  configure.ac          |  1 +
>  libblkid/src/blkidP.h |  5 +++++
>  libblkid/src/probe.c  | 29 +++++++++++++++++++++++++++--
>  3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index bebb4085425a..52b164e834db 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -302,6 +302,7 @@ AC_CHECK_HEADERS([ \
>  	lastlog.h \
>  	libutil.h \
>  	linux/btrfs.h \
> +	linux/blkzoned.h \
>  	linux/capability.h \
>  	linux/cdrom.h \
>  	linux/falloc.h \
> diff --git a/libblkid/src/blkidP.h b/libblkid/src/blkidP.h
> index a3fe6748a969..e3a160aa97c0 100644
> --- a/libblkid/src/blkidP.h
> +++ b/libblkid/src/blkidP.h
> @@ -150,6 +150,10 @@ struct blkid_idmag
>  	const char	*hoff;		/* hint which contains byte offset to kboff */
>  	long		kboff;		/* kilobyte offset of superblock */
>  	unsigned int	sboff;		/* byte offset within superblock */
> +
> +	int		is_zoned;	/* indicate magic location is calcluated based on zone position  */
> +	long		zonenum;	/* zone number which has superblock */
> +	long		kboff_inzone;	/* kilobyte offset of superblock in a zone */
>  };
>  
>  /*
> @@ -206,6 +210,7 @@ struct blkid_struct_probe
>  	dev_t			disk_devno;	/* devno of the whole-disk or 0 */
>  	unsigned int		blkssz;		/* sector size (BLKSSZGET ioctl) */
>  	mode_t			mode;		/* struct stat.sb_mode */
> +	uint64_t		zone_size;	/* zone size (BLKGETZONESZ ioctl) */
>  
>  	int			flags;		/* private library flags */
>  	int			prob_flags;	/* always zeroized by blkid_do_*() */
> diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
> index a47a8720d4ac..9d180aab5242 100644
> --- a/libblkid/src/probe.c
> +++ b/libblkid/src/probe.c
> @@ -94,6 +94,9 @@
>  #ifdef HAVE_LINUX_CDROM_H
>  #include <linux/cdrom.h>
>  #endif
> +#ifdef HAVE_LINUX_BLKZONED_H
> +#include <linux/blkzoned.h>
> +#endif
>  #ifdef HAVE_SYS_STAT_H
>  #include <sys/stat.h>
>  #endif
> @@ -897,6 +900,7 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
>  	pr->wipe_off = 0;
>  	pr->wipe_size = 0;
>  	pr->wipe_chain = NULL;
> +	pr->zone_size = 0;
>  
>  	if (fd < 0)
>  		return 1;
> @@ -996,6 +1000,15 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
>  #endif
>  	free(dm_uuid);
>  
> +# ifdef HAVE_LINUX_BLKZONED_H
> +	if (S_ISBLK(sb.st_mode)) {
> +		uint32_t zone_size_sector;
> +
> +		if (!ioctl(pr->fd, BLKGETZONESZ, &zone_size_sector))
> +			pr->zone_size = zone_size_sector << 9;
> +	}
> +# endif
> +
>  	DBG(LOWPROBE, ul_debug("ready for low-probing, offset=%"PRIu64", size=%"PRIu64"",
>  				pr->off, pr->size));
>  	DBG(LOWPROBE, ul_debug("whole-disk: %s, regfile: %s",
> @@ -1064,12 +1077,24 @@ int blkid_probe_get_idmag(blkid_probe pr, const struct blkid_idinfo *id,
>  	/* try to detect by magic string */
>  	while(mag && mag->magic) {
>  		unsigned char *buf;
> +		uint64_t kboff;
>  		uint64_t hint_offset;
>  
>  		if (!mag->hoff || blkid_probe_get_hint(pr, mag->hoff, &hint_offset) < 0)
>  			hint_offset = 0;
>  
> -		off = hint_offset + ((mag->kboff + (mag->sboff >> 10)) << 10);
> +		/* If the magic is for zoned device, skip non-zoned device */
> +		if (mag->is_zoned && !pr->zone_size) {
> +			mag++;
> +			continue;
> +		}
> +
> +		if (!mag->is_zoned)
> +			kboff = mag->kboff;
> +		else
> +			kboff = ((mag->zonenum * pr->zone_size) >> 10) + mag->kboff_inzone;
> +
> +		off = hint_offset + ((kboff + (mag->sboff >> 10)) << 10);
>  		buf = blkid_probe_get_buffer(pr, off, 1024);
>  
>  		if (!buf && errno)
> @@ -1079,7 +1104,7 @@ int blkid_probe_get_idmag(blkid_probe pr, const struct blkid_idinfo *id,
>  				buf + (mag->sboff & 0x3ff), mag->len)) {
>  
>  			DBG(LOWPROBE, ul_debug("\tmagic sboff=%u, kboff=%ld",
> -				mag->sboff, mag->kboff));
> +				mag->sboff, kboff));
>  			if (offset)
>  				*offset = off + (mag->sboff & 0x3ff);
>  			if (res)
> 

Looks OK to me.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] blkid: implement zone-aware probing
  2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
  2021-04-14  9:14   ` Damien Le Moal
@ 2021-04-14  9:17   ` Johannes Thumshirn
  2021-04-14 13:31   ` Karel Zak
  2 siblings, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2021-04-14  9:17 UTC (permalink / raw)
  To: Naohiro Aota, Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-14  1:33 ` [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
@ 2021-04-14  9:49   ` Johannes Thumshirn
  2021-04-14 13:47   ` Karel Zak
  2021-04-16 15:52   ` David Sterba
  2 siblings, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2021-04-14  9:49 UTC (permalink / raw)
  To: Naohiro Aota, Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal

On 14/04/2021 03:33, Naohiro Aota wrote:
> This commit adds zone-aware magics and probing functions for zoned btrfs.
> 
> Superblock (and its copies) is the only data structure in btrfs with a

The superblock?

> fixed location on a device. Since we cannot overwrite in a sequential write

cannot do overwrites

> required zone, we cannot place superblock in the zone.

the superblock

> 
> Thus, zoned btrfs use superblock log writing to update superblock on

Thus, zoned btrfs uses superblock log writing to update superblocks on

> sequential write required zones. It uses two zones as a circular buffer to
> write updated superblocks. Once the first zone is filled up, start writing
> into the second buffer. When both zones are filled up and before start

starting to write

> writing to the first zone again, it reset the first zone.
> 
> We can determine the position of the latest superblock by reading write

reading the write pointer

> pointer information from a device. One corner case is when both zones are
> full. For this situation, we read out the last superblock of each zone and
> compare them to determine which zone is older.
> 
> The magics can detect a superblock magic ("_BHRfs_M") at the beginning of
> zone #0 or zone #1 to see if it is zoned btrfs. When both zones are filled
> up, zoned btrfs reset the first zone to write a new superblock. If btrfs


resets

> crash at the moment, we do not see a superblock at zone #0. Thus, we need

crashes

> to check not only zone #0 but also zone #1.
> 
> It also supports temporary magic ("!BHRfS_M") in zone #0. The mkfs.btrfs

the temporary magic [...]. Mkfs.btrfs

[...]

> +	 * Log position:
> +	 *   *: Special case, no superblock is written
> +	 *   0: Use write pointer of zones[0]
> +	 *   1: Use write pointer of zones[1]
> +	 *   C: Compare super blcoks from zones[0] and zones[1], use the latest
                        blocks ~^

[...]

> +	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2;
> +	rep = malloc(rep_size);
> +	if (!rep)
> +		return -errno;
> +
> +	memset(rep, 0, rep_size);

I think Damien already pointed this out, but that's a good opportunity for calloc().

Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] blkid: support zone reset for wipefs
  2021-04-14  1:33 ` [PATCH v2 3/3] blkid: support zone reset for wipefs Naohiro Aota
  2021-04-14  9:13   ` Damien Le Moal
@ 2021-04-14  9:50   ` Johannes Thumshirn
  2021-04-14 13:57   ` Karel Zak
  2 siblings, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2021-04-14  9:50 UTC (permalink / raw)
  To: Naohiro Aota, Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] blkid: implement zone-aware probing
  2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
  2021-04-14  9:14   ` Damien Le Moal
  2021-04-14  9:17   ` Johannes Thumshirn
@ 2021-04-14 13:31   ` Karel Zak
  2021-04-14 15:03     ` Naohiro Aota
  2 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2021-04-14 13:31 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Wed, Apr 14, 2021 at 10:33:37AM +0900, Naohiro Aota wrote:
> --- a/configure.ac
> +++ b/configure.ac
> @@ -302,6 +302,7 @@ AC_CHECK_HEADERS([ \
>  	lastlog.h \
>  	libutil.h \
>  	linux/btrfs.h \
> +	linux/blkzoned.h \

unnecessary, there is already AC_CHECK_HEADERS([linux/blkzoned.h]) on
another place.

>  	linux/capability.h \
>  	linux/cdrom.h \
>  	linux/falloc.h \
> diff --git a/libblkid/src/blkidP.h b/libblkid/src/blkidP.h
> index a3fe6748a969..e3a160aa97c0 100644
> --- a/libblkid/src/blkidP.h
> +++ b/libblkid/src/blkidP.h
> @@ -150,6 +150,10 @@ struct blkid_idmag
>  	const char	*hoff;		/* hint which contains byte offset to kboff */
>  	long		kboff;		/* kilobyte offset of superblock */
>  	unsigned int	sboff;		/* byte offset within superblock */
> +
> +	int		is_zoned;	/* indicate magic location is calcluated based on zone position  */
> +	long		zonenum;	/* zone number which has superblock */
> +	long		kboff_inzone;	/* kilobyte offset of superblock in a zone */

It would be better to use 'flags' struct field and

  #define BLKID_FL_ZONED_DEV (1 << 6)

like we use for another stuff.

> diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
> index a47a8720d4ac..9d180aab5242 100644
> --- a/libblkid/src/probe.c
> +++ b/libblkid/src/probe.c
> @@ -94,6 +94,9 @@
>  #ifdef HAVE_LINUX_CDROM_H
>  #include <linux/cdrom.h>
>  #endif
> +#ifdef HAVE_LINUX_BLKZONED_H
> +#include <linux/blkzoned.h>
> +#endif
>  #ifdef HAVE_SYS_STAT_H
>  #include <sys/stat.h>
>  #endif
> @@ -897,6 +900,7 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
>  	pr->wipe_off = 0;
>  	pr->wipe_size = 0;
>  	pr->wipe_chain = NULL;
> +	pr->zone_size = 0;

you also need to update blkid_clone_probe() function

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-14  1:33 ` [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
  2021-04-14  9:49   ` Johannes Thumshirn
@ 2021-04-14 13:47   ` Karel Zak
  2021-04-14 15:08     ` Naohiro Aota
  2021-04-16 15:52   ` David Sterba
  2 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2021-04-14 13:47 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Wed, Apr 14, 2021 at 10:33:38AM +0900, Naohiro Aota wrote:
> +#define ASSERT(x) assert(x)

Really? ;-)

> +typedef uint64_t u64;
> +typedef uint64_t sector_t;
> +typedef uint8_t u8;

I do not see a reason for u64 and u8 here.

> +
> +#ifdef HAVE_LINUX_BLKZONED_H
> +static int sb_write_pointer(int fd, struct blk_zone *zones, u64 *wp_ret)
> +{
> +	bool empty[BTRFS_NR_SB_LOG_ZONES];
> +	bool full[BTRFS_NR_SB_LOG_ZONES];
> +	sector_t sector;
> +
> +	ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL &&
> +	       zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL);

assert()

 ...
> +		for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
> +			u64 bytenr;
> +
> +			bytenr = ((zones[i].start + zones[i].len)
> +				   << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE;
> +
> +			ret = pread64(fd, buf[i], BTRFS_SUPER_INFO_SIZE,
> +				      bytenr);

 please, use  

     ptr = blkid_probe_get_buffer(pr, BTRFS_SUPER_INFO_SIZE, bytenr);

 the library will care about the buffer and reuse it. It's also
 important to keep blkid_do_wipe() usable.

> +			if (ret != BTRFS_SUPER_INFO_SIZE)
> +				return -EIO;
> +			super[i] = (struct btrfs_super_block *)&buf[i];

  super[i] = (struct btrfs_super_block *) ptr;

> +		}
> +
> +		if (super[0]->generation > super[1]->generation)
> +			sector = zones[1].start;
> +		else
> +			sector = zones[0].start;
> +	} else if (!full[0] && (empty[1] || full[1])) {
> +		sector = zones[0].wp;
> +	} else if (full[0]) {
> +		sector = zones[1].wp;
> +	} else {
> +		return -EUCLEAN;
> +	}
> +	*wp_ret = sector << SECTOR_SHIFT;
> +	return 0;
> +}
> +
> +static int sb_log_offset(blkid_probe pr, uint64_t *bytenr_ret)
> +{
> +	uint32_t zone_num = 0;
> +	uint32_t zone_size_sector;
> +	struct blk_zone_report *rep;
> +	struct blk_zone *zones;
> +	size_t rep_size;
> +	int ret;
> +	uint64_t wp;
> +
> +	zone_size_sector = pr->zone_size >> SECTOR_SHIFT;
> +
> +	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2;
> +	rep = malloc(rep_size);
> +	if (!rep)
> +		return -errno;
> +
> +	memset(rep, 0, rep_size);
> +	rep->sector = zone_num * zone_size_sector;
> +	rep->nr_zones = 2;

what about to add to lib/blkdev.c a new function:

   struct blk_zone_report *blkdev_get_zonereport(int fd, uint64 sector, int nzones);

and call this function from your sb_log_offset() as well as from blkid_do_wipe()?

Anyway, calloc() is better than malloc()+memset().

> +	if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +		*bytenr_ret = zones[0].start << SECTOR_SHIFT;
> +		ret = 0;
> +		goto out;
> +	} else if (zones[1].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +		*bytenr_ret = zones[1].start << SECTOR_SHIFT;
> +		ret = 0;
> +		goto out;
> +	}

what about:

 for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
   if (zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
      *bytenr_ret = zones[i].start << SECTOR_SHIFT;
      ret = 0;
      goto out;
   }
 }




 Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] blkid: support zone reset for wipefs
  2021-04-14  1:33 ` [PATCH v2 3/3] blkid: support zone reset for wipefs Naohiro Aota
  2021-04-14  9:13   ` Damien Le Moal
  2021-04-14  9:50   ` Johannes Thumshirn
@ 2021-04-14 13:57   ` Karel Zak
  2021-04-16 16:04     ` David Sterba
  2 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2021-04-14 13:57 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Wed, Apr 14, 2021 at 10:33:39AM +0900, Naohiro Aota wrote:
> +static int is_conventional(blkid_probe pr, uint64_t offset)
> +{
> +	struct blk_zone_report *rep = NULL;
> +	size_t rep_size;
> +	int ret;
> +	uint64_t zone_mask;
> +
> +	if (!pr->zone_size)
> +		return 1;
> +
> +	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone);
> +	rep = calloc(1, rep_size);
> +	if (!rep)
> +		return -1;
> +
> +	zone_mask = ~(pr->zone_size - 1);
> +	rep->sector = (offset & zone_mask) >> 9;
> +	rep->nr_zones = 1;
> +	ret = ioctl(blkid_probe_get_fd(pr), BLKREPORTZONE, rep);
> +	if (ret) {
> +		free(rep);
> +		return -1;
> +	}

  ret = blkdev_get_zonereport()

:-)

>  /**
>   * blkid_do_wipe:
>   * @pr: prober
> @@ -1267,6 +1310,7 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
>  	const char *off = NULL;
>  	size_t len = 0;
>  	uint64_t offset, magoff;
> +	bool conventional;

BTW, nowhere in libblkid we use "bool". It would be probably better to include
<stdbool.h> to blkidP.h.

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs
  2021-04-14  1:33 [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
                   ` (2 preceding siblings ...)
  2021-04-14  1:33 ` [PATCH v2 3/3] blkid: support zone reset for wipefs Naohiro Aota
@ 2021-04-14 14:09 ` Karel Zak
  2021-04-14 22:04   ` Damien Le Moal
  2021-04-22 14:41 ` Karel Zak
  4 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2021-04-14 14:09 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Wed, Apr 14, 2021 at 10:33:36AM +0900, Naohiro Aota wrote:
> This series implements probing and wiping of the superblock of zoned btrfs.

I have no disk with zones support, but it seems scsi_debug supports
it. Do you have any step by step example how to test with btrfs? If
yes, I will add a test to our test suite.

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] blkid: implement zone-aware probing
  2021-04-14 13:31   ` Karel Zak
@ 2021-04-14 15:03     ` Naohiro Aota
  0 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-14 15:03 UTC (permalink / raw)
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Wed, Apr 14, 2021 at 03:31:01PM +0200, Karel Zak wrote:
> On Wed, Apr 14, 2021 at 10:33:37AM +0900, Naohiro Aota wrote:
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -302,6 +302,7 @@ AC_CHECK_HEADERS([ \
> >  	lastlog.h \
> >  	libutil.h \
> >  	linux/btrfs.h \
> > +	linux/blkzoned.h \
> 
> unnecessary, there is already AC_CHECK_HEADERS([linux/blkzoned.h]) on
> another place.

Ah, I missed that. I will drop it.

> >  	linux/capability.h \
> >  	linux/cdrom.h \
> >  	linux/falloc.h \
> > diff --git a/libblkid/src/blkidP.h b/libblkid/src/blkidP.h
> > index a3fe6748a969..e3a160aa97c0 100644
> > --- a/libblkid/src/blkidP.h
> > +++ b/libblkid/src/blkidP.h
> > @@ -150,6 +150,10 @@ struct blkid_idmag
> >  	const char	*hoff;		/* hint which contains byte offset to kboff */
> >  	long		kboff;		/* kilobyte offset of superblock */
> >  	unsigned int	sboff;		/* byte offset within superblock */
> > +
> > +	int		is_zoned;	/* indicate magic location is calcluated based on zone position  */
> > +	long		zonenum;	/* zone number which has superblock */
> > +	long		kboff_inzone;	/* kilobyte offset of superblock in a zone */
> 
> It would be better to use 'flags' struct field and
> 
>   #define BLKID_FL_ZONED_DEV (1 << 6)
> 
> like we use for another stuff.

BLKID_FL_* flags looks like to indicate a device's property. Instead,
this one indicates a magic is placed relative to a zone. I do not see
blkid_idmag is currently using "flags" filed. Should we really add it
and follow the flag style? I thought, we can do it later when other
use case exists.

> > diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
> > index a47a8720d4ac..9d180aab5242 100644
> > --- a/libblkid/src/probe.c
> > +++ b/libblkid/src/probe.c
> > @@ -94,6 +94,9 @@
> >  #ifdef HAVE_LINUX_CDROM_H
> >  #include <linux/cdrom.h>
> >  #endif
> > +#ifdef HAVE_LINUX_BLKZONED_H
> > +#include <linux/blkzoned.h>
> > +#endif
> >  #ifdef HAVE_SYS_STAT_H
> >  #include <sys/stat.h>
> >  #endif
> > @@ -897,6 +900,7 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
> >  	pr->wipe_off = 0;
> >  	pr->wipe_size = 0;
> >  	pr->wipe_chain = NULL;
> > +	pr->zone_size = 0;
> 
> you also need to update blkid_clone_probe() function

Will do. I completely missed that. Thanks.

>   Karel
> 
> -- 
>  Karel Zak  <kzak@redhat.com>
>  http://karelzak.blogspot.com
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-14 13:47   ` Karel Zak
@ 2021-04-14 15:08     ` Naohiro Aota
  0 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-14 15:08 UTC (permalink / raw)
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Wed, Apr 14, 2021 at 03:47:08PM +0200, Karel Zak wrote:
> On Wed, Apr 14, 2021 at 10:33:38AM +0900, Naohiro Aota wrote:
> > +#define ASSERT(x) assert(x)
> 
> Really? ;-)
> 
> > +typedef uint64_t u64;
> > +typedef uint64_t sector_t;
> > +typedef uint8_t u8;
> 
> I do not see a reason for u64 and u8 here.

Yep, these are here just to make it easy to copy the code from
kernel. But this code won't change so much, so I can drop these.

> > +
> > +#ifdef HAVE_LINUX_BLKZONED_H
> > +static int sb_write_pointer(int fd, struct blk_zone *zones, u64 *wp_ret)
> > +{
> > +	bool empty[BTRFS_NR_SB_LOG_ZONES];
> > +	bool full[BTRFS_NR_SB_LOG_ZONES];
> > +	sector_t sector;
> > +
> > +	ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL &&
> > +	       zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL);
> 
> assert()

I will use it.

>  ...
> > +		for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
> > +			u64 bytenr;
> > +
> > +			bytenr = ((zones[i].start + zones[i].len)
> > +				   << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE;
> > +
> > +			ret = pread64(fd, buf[i], BTRFS_SUPER_INFO_SIZE,
> > +				      bytenr);
> 
>  please, use  
> 
>      ptr = blkid_probe_get_buffer(pr, BTRFS_SUPER_INFO_SIZE, bytenr);
> 
>  the library will care about the buffer and reuse it. It's also
>  important to keep blkid_do_wipe() usable.

Sure. I'll use it.

> > +			if (ret != BTRFS_SUPER_INFO_SIZE)
> > +				return -EIO;
> > +			super[i] = (struct btrfs_super_block *)&buf[i];
> 
>   super[i] = (struct btrfs_super_block *) ptr;
> 
> > +		}
> > +
> > +		if (super[0]->generation > super[1]->generation)
> > +			sector = zones[1].start;
> > +		else
> > +			sector = zones[0].start;
> > +	} else if (!full[0] && (empty[1] || full[1])) {
> > +		sector = zones[0].wp;
> > +	} else if (full[0]) {
> > +		sector = zones[1].wp;
> > +	} else {
> > +		return -EUCLEAN;
> > +	}
> > +	*wp_ret = sector << SECTOR_SHIFT;
> > +	return 0;
> > +}
> > +
> > +static int sb_log_offset(blkid_probe pr, uint64_t *bytenr_ret)
> > +{
> > +	uint32_t zone_num = 0;
> > +	uint32_t zone_size_sector;
> > +	struct blk_zone_report *rep;
> > +	struct blk_zone *zones;
> > +	size_t rep_size;
> > +	int ret;
> > +	uint64_t wp;
> > +
> > +	zone_size_sector = pr->zone_size >> SECTOR_SHIFT;
> > +
> > +	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2;
> > +	rep = malloc(rep_size);
> > +	if (!rep)
> > +		return -errno;
> > +
> > +	memset(rep, 0, rep_size);
> > +	rep->sector = zone_num * zone_size_sector;
> > +	rep->nr_zones = 2;
> 
> what about to add to lib/blkdev.c a new function:
> 
>    struct blk_zone_report *blkdev_get_zonereport(int fd, uint64 sector, int nzones);
> 
> and call this function from your sb_log_offset() as well as from blkid_do_wipe()?
> 
> Anyway, calloc() is better than malloc()+memset().

Indeed. I will do so.

> > +	if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> > +		*bytenr_ret = zones[0].start << SECTOR_SHIFT;
> > +		ret = 0;
> > +		goto out;
> > +	} else if (zones[1].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> > +		*bytenr_ret = zones[1].start << SECTOR_SHIFT;
> > +		ret = 0;
> > +		goto out;
> > +	}
> 
> what about:
> 
>  for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
>    if (zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
>       *bytenr_ret = zones[i].start << SECTOR_SHIFT;
>       ret = 0;
>       goto out;
>    }
>  }

Yes, this looks cleaner. Thanks.

> 
> 
> 
>  Karel
> 
> -- 
>  Karel Zak  <kzak@redhat.com>
>  http://karelzak.blogspot.com
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs
  2021-04-14 14:09 ` [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Karel Zak
@ 2021-04-14 22:04   ` Damien Le Moal
  0 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2021-04-14 22:04 UTC (permalink / raw)
  To: Karel Zak, Naohiro Aota
  Cc: util-linux, linux-btrfs, linux-fsdevel, Johannes Thumshirn

[-- Attachment #1: Type: text/plain, Size: 2591 bytes --]

On 2021/04/14 23:11, Karel Zak wrote:
> On Wed, Apr 14, 2021 at 10:33:36AM +0900, Naohiro Aota wrote:
>> This series implements probing and wiping of the superblock of zoned btrfs.
> 
> I have no disk with zones support, but it seems scsi_debug supports
> it. Do you have any step by step example how to test with btrfs? If
> yes, I will add a test to our test suite.

Yes, scsi_debug does support emulating a ZBC disk. You can setup a disk like this:

modprobe scsi_debug \
	max_luns=1 \
	sector_size=4096 \
	zbc=host-managed \
	dev_size_mb=2048 \
	zone_size_mb=64 \
	zone_nr_conv=0

This will create a 2GB capacity disk with 64 MB zones.
Another solution, may be simpler, is to use null_blk. I am attaching a script
that I use to create zoned null block devices.

# nullblk-create.sh --help
Usage: nullblk-create.sh [options]
Options:
    -h | --help      : Display this help message and exit
    -v               : Be verbose (display final config)
    -cap <size (GB)> : set device capacity (default: 8)
                       For zoned devices, capacity is determined
                       with zone size and total number of zones
    -bs <size (B)>   : set sector size (default: 512)
    -m               : enable memory backing (default: false)
    -z               : create a zoned device (default: false)
    -qm <mode>       : set queue mode (default: 2)
                       0=bio, 1=rq, 2=multiqueue
    -sq <num>        : set number of submission queues
                       (default: nproc)
    -qd <depth>      : set queue depth (default: 64)
    -im <mode>       : set IRQ mode (default: 0)
                       0=none, 1=softirq, 2=timer
    -c <nsecs>       : set completion time for timer completion
                       (default: 10000 ns)
Options for zoned devices:
    -zs <size (MB)>  : set zone size (default: 8 MB)
    -zc <size (MB)>  : set zone capacity (default: zone size)
    -znc <num>       : set number of conv zones (default: 0)
    -zns <num>       : set number of swr zones (default: 8)
    -zr              : add a smaller runt swr zone (default: none)
    -zmo <num>       : set max open zones (default: no limit)
    -zma <num>       : set max active zones (default: no limit)

Something like this:

# nullblk-create.sh -m -cap 2 -z -zs 64 -znc 0
Created /dev/nullb0

Will also create a 2GB capacity disk with memory backing and 64 MB zones.

For the correct setup of the drive to test btrfs, I will let Naohiro and
Johannes comment.

-- 
Damien Le Moal
Western Digital Research

[-- Attachment #2: nullblk-create.sh --]
[-- Type: application/x-sh, Size: 3574 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-14  1:33 ` [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
  2021-04-14  9:49   ` Johannes Thumshirn
  2021-04-14 13:47   ` Karel Zak
@ 2021-04-16 15:52   ` David Sterba
  2021-04-26  1:38     ` Naohiro Aota
  2 siblings, 1 reply; 20+ messages in thread
From: David Sterba @ 2021-04-16 15:52 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: Karel Zak, util-linux, linux-btrfs, linux-fsdevel,
	Damien Le Moal, Johannes Thumshirn

On Wed, Apr 14, 2021 at 10:33:38AM +0900, Naohiro Aota wrote:
> It also supports temporary magic ("!BHRfS_M") in zone #0. The mkfs.btrfs
> first writes the temporary superblock to the zone during the mkfs process.
> It will survive there until the zones are filled up and reset. So, we also
> need to detect this temporary magic.

> +	  /*
> +	   * For zoned btrfs, we also need to detect a temporary superblock
> +	   * at zone #0. Mkfs.btrfs creates it in the initialize process.
> +	   * It persits until both zones are filled up then reset.
> +	   */
> +	  { .magic = "!BHRfS_M", .len = 8, .sboff = 0x40,
> +	    .is_zoned = 1, .zonenum = 0, .kboff_inzone = 0 },

Should we rather reset the zone twice so the initial superblock does not
have the temporary signature?

For the primary superblock at offset 64K and as a fallback the signature
should be valid for detection purposes (ie. not necessarily to get the
latest superblock).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] blkid: support zone reset for wipefs
  2021-04-14 13:57   ` Karel Zak
@ 2021-04-16 16:04     ` David Sterba
  0 siblings, 0 replies; 20+ messages in thread
From: David Sterba @ 2021-04-16 16:04 UTC (permalink / raw)
  To: Karel Zak
  Cc: Naohiro Aota, util-linux, linux-btrfs, linux-fsdevel,
	Damien Le Moal, Johannes Thumshirn

On Wed, Apr 14, 2021 at 03:57:42PM +0200, Karel Zak wrote:
> On Wed, Apr 14, 2021 at 10:33:39AM +0900, Naohiro Aota wrote:
> >  /**
> >   * blkid_do_wipe:
> >   * @pr: prober
> > @@ -1267,6 +1310,7 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
> >  	const char *off = NULL;
> >  	size_t len = 0;
> >  	uint64_t offset, magoff;
> > +	bool conventional;
> 
> BTW, nowhere in libblkid we use "bool". It would be probably better to include
> <stdbool.h> to blkidP.h.

Pulling a whole new header just for one local variable that can be int
seems too much.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs
  2021-04-14  1:33 [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
                   ` (3 preceding siblings ...)
  2021-04-14 14:09 ` [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Karel Zak
@ 2021-04-22 14:41 ` Karel Zak
  4 siblings, 0 replies; 20+ messages in thread
From: Karel Zak @ 2021-04-22 14:41 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: util-linux, Damien Le Moal, Johannes Thumshirn

On Wed, Apr 14, 2021 at 10:33:36AM +0900, Naohiro Aota wrote:
>  configure.ac                     |   1 +
>  libblkid/src/blkidP.h            |   5 +
>  libblkid/src/probe.c             | 108 ++++++++++++++++--
>  libblkid/src/superblocks/btrfs.c | 185 ++++++++++++++++++++++++++++++-
>  4 files changed, 290 insertions(+), 9 deletions(-)

Note that I'll merge it to v2.38, it seems too late for v2.37 where we
have already -rc1 and I guess Naohiro will prepare v3 of the patchs too.

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-16 15:52   ` David Sterba
@ 2021-04-26  1:38     ` Naohiro Aota
  0 siblings, 0 replies; 20+ messages in thread
From: Naohiro Aota @ 2021-04-26  1:38 UTC (permalink / raw)
  To: dsterba
  Cc: Karel Zak, util-linux, linux-btrfs, linux-fsdevel,
	Damien Le Moal, Johannes Thumshirn

On Fri, Apr 16, 2021 at 05:52:41PM +0200, David Sterba wrote:
> On Wed, Apr 14, 2021 at 10:33:38AM +0900, Naohiro Aota wrote:
> > It also supports temporary magic ("!BHRfS_M") in zone #0. The mkfs.btrfs
> > first writes the temporary superblock to the zone during the mkfs process.
> > It will survive there until the zones are filled up and reset. So, we also
> > need to detect this temporary magic.
> 
> > +	  /*
> > +	   * For zoned btrfs, we also need to detect a temporary superblock
> > +	   * at zone #0. Mkfs.btrfs creates it in the initialize process.
> > +	   * It persits until both zones are filled up then reset.
> > +	   */
> > +	  { .magic = "!BHRfS_M", .len = 8, .sboff = 0x40,
> > +	    .is_zoned = 1, .zonenum = 0, .kboff_inzone = 0 },
> 
> Should we rather reset the zone twice so the initial superblock does not
> have the temporary signature?

OK, sure. I'll modify the mkfs.btrfs to reset the superblock log zone
before writing out final superblock.

> For the primary superblock at offset 64K and as a fallback the signature
> should be valid for detection purposes (ie. not necessarily to get the
> latest superblock).

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-04-26  1:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-14  1:33 [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
2021-04-14  1:33 ` [PATCH v2 1/3] blkid: implement zone-aware probing Naohiro Aota
2021-04-14  9:14   ` Damien Le Moal
2021-04-14  9:17   ` Johannes Thumshirn
2021-04-14 13:31   ` Karel Zak
2021-04-14 15:03     ` Naohiro Aota
2021-04-14  1:33 ` [PATCH v2 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
2021-04-14  9:49   ` Johannes Thumshirn
2021-04-14 13:47   ` Karel Zak
2021-04-14 15:08     ` Naohiro Aota
2021-04-16 15:52   ` David Sterba
2021-04-26  1:38     ` Naohiro Aota
2021-04-14  1:33 ` [PATCH v2 3/3] blkid: support zone reset for wipefs Naohiro Aota
2021-04-14  9:13   ` Damien Le Moal
2021-04-14  9:50   ` Johannes Thumshirn
2021-04-14 13:57   ` Karel Zak
2021-04-16 16:04     ` David Sterba
2021-04-14 14:09 ` [PATCH v2 0/3] implement zone-aware probing/wiping for zoned btrfs Karel Zak
2021-04-14 22:04   ` Damien Le Moal
2021-04-22 14:41 ` Karel Zak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).