util-linux.vger.kernel.org archive mirror
* [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs
@ 2021-04-26  5:50 Naohiro Aota
  2021-04-26  5:50 ` [PATCH v3 1/3] blkid: implement zone-aware probing Naohiro Aota
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Naohiro Aota @ 2021-04-26  5:50 UTC
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

This series implements probing and wiping of the superblock of zoned btrfs.

Changes:
  - v3:
     - Implement and use blkdev_get_zonereport()
     - Also modify blkid_clone_probe() for completeness
     - Drop temporary btrfs magic from the table
     - Do not try to aggressively copy-paste the kernel side code
     - Fix commit log
  - v2:
     - Fix zone alignment calculation
     - Fix the build without HAVE_LINUX_BLKZONED_H

Zoned btrfs is merged with this series:
https://lore.kernel.org/linux-btrfs/20210222160049.GR1993@twin.jikos.cz/T/

And, superblock locations are finalized with this patch:
https://lore.kernel.org/linux-btrfs/BL0PR04MB651442E6ACBF48342BD00FEBE7719@BL0PR04MB6514.namprd04.prod.outlook.com/T/

Corresponding btrfs-progs is available here:
https://github.com/naota/btrfs-progs/tree/btrfs-zoned

A zoned block device consists of a number of zones. Zones are either
conventional and accepting random writes or sequential and requiring that
writes be issued in LBA order from each zone write pointer position.

The superblock (and its copies) is the only data structure in btrfs with a
fixed location on a device. Since we cannot overwrite data in a sequential
write required zone, we cannot keep the superblock at a fixed location in
such a zone.

Thus, zoned btrfs uses superblock log writing to update the superblock on
sequential write required zones. It uses two zones as a circular buffer for
updated superblocks. Once the first zone is filled up, it starts writing
into the second zone. When both zones are filled up, it resets the first
zone before writing to it again.
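As a rough illustration of this circular buffer, the following sketch
simulates the log in memory. struct sb_log and sb_log_append() are
hypothetical names, and the model assumes the roles of the two zones simply
alternate; it is not the btrfs implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SB_LOG_ZONES 2	/* two zones form the circular buffer */

/* Hypothetical in-memory model of the superblock log; not btrfs code. */
struct sb_log {
	uint32_t capacity;		/* superblocks that fit in one zone */
	uint32_t used[NR_SB_LOG_ZONES];	/* superblocks written to each zone */
	int active;			/* zone currently receiving writes */
};

/* Append one superblock; return the index of the zone that received it. */
static int sb_log_append(struct sb_log *log)
{
	assert(log->capacity > 0);

	if (log->used[log->active] == log->capacity) {
		/* The current zone is full: move on to the other zone. */
		log->active = (log->active + 1) % NR_SB_LOG_ZONES;
		/* A full zone cannot be overwritten; reset it before reuse. */
		if (log->used[log->active] == log->capacity)
			log->used[log->active] = 0;
	}
	log->used[log->active]++;
	return log->active;
}
```

With the 4 MiB zones used in the test setup of this letter and a 4096-byte
superblock, capacity would be 1024 entries per zone.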

This series first implements zone based detection of the magic location.
Then, it adds magics for zoned btrfs and implements a probing function to
detect the latest superblock. Finally, this series also implements
zone-aware wiping by zone resetting.

* Testing device

To run zoned btrfs, you need a device that supports the zone append write
command.

Besides real devices, null_blk supports the zone append write command, so
you can run the tests on a memory-backed null_blk device. The following
script creates a 12800 MB /dev/nullb0 divided into 4 MB zones.

    sysfs=/sys/kernel/config/nullb/nullb0
    size=12800 # MB
    zone_size=4 # MB
    
    # drop nullb0
    if [[ -d $sysfs ]]; then
            echo 0 > "${sysfs}"/power
            rmdir $sysfs
    fi
    lsmod | grep -q null_blk && rmmod null_blk
    modprobe null_blk nr_devices=0
    
    mkdir "${sysfs}"
    
    echo "${size}" > "${sysfs}"/size
    echo 1 > "${sysfs}"/zoned
    echo "${zone_size}" > "${sysfs}"/zone_size
    echo 0 > "${sysfs}"/zone_nr_conv
    echo 1 > "${sysfs}"/memory_backed
    
    echo 1 > "${sysfs}"/power
    udevadm settle

Zoned SCSI devices such as SMR HDDs or scsi_debug also support the zone
append command as an emulated command within the SCSI sd driver. This
emulation is completely transparent to the user and provides the same
semantics as native NVMe ZNS drive support.

Also, a qemu patch is available to enable NVMe ZNS devices.

Then, you can create zoned btrfs with the above btrfs-progs.

    $ mkfs.btrfs -d single -m single /dev/nullb0
    btrfs-progs v5.11
    See http://btrfs.wiki.kernel.org for more information.
    
    ERROR: superblock magic doesn't match
    /dev/nullb0: host-managed device detected, setting zoned feature
    Resetting device zones /dev/nullb0 (3200 zones) ...
    Label:              (null)
    UUID:               1e5912a2-b5c3-46fb-aa9a-ee3d073ff600
    Node size:          16384
    Sector size:        4096
    Filesystem size:    12.50GiB
    Block group profiles:
      Data:             single            4.00MiB
      Metadata:         single            4.00MiB
      System:           single            4.00MiB
    SSD detected:       yes
    Zoned device:       yes
    Zone size:          4.00MiB
    Incompat features:  extref, skinny-metadata, zoned
    Runtime features:   
    Checksum:           crc32c
    Number of devices:  1
    Devices:
       ID        SIZE  PATH
        1    12.50GiB  /dev/nullb0
    $ mount /dev/nullb0 /mnt/somewhere
    $ dmesg | tail
    ...
    [272816.682461] BTRFS: device fsid 1e5912a2-b5c3-46fb-aa9a-ee3d073ff600 devid 1 transid 5 /dev/nullb0 scanned by mkfs.btrfs (44367)
    [272883.678401] BTRFS info (device nullb0): has skinny extents
    [272883.686373] BTRFS info (device nullb0): flagging fs with big metadata feature
    [272883.699020] BTRFS info (device nullb0): host-managed zoned block device /dev/nullb0, 3200 zones of 4194304 bytes
    [272883.711736] BTRFS info (device nullb0): zoned mode enabled with zone size 4194304
    [272883.722388] BTRFS info (device nullb0): enabling ssd optimizations
    [272883.731332] BTRFS info (device nullb0): checking UUID tree

Naohiro Aota (3):
  blkid: implement zone-aware probing
  blkid: add magic and probing for zoned btrfs
  blkid: support zone reset for wipefs

 include/blkdev.h                 |   9 ++
 lib/blkdev.c                     |  29 ++++++
 libblkid/src/blkidP.h            |   5 +
 libblkid/src/probe.c             |  99 +++++++++++++++++--
 libblkid/src/superblocks/btrfs.c | 159 ++++++++++++++++++++++++++++++-
 5 files changed, 292 insertions(+), 9 deletions(-)

-- 
2.31.1



* [PATCH v3 1/3] blkid: implement zone-aware probing
  2021-04-26  5:50 [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
@ 2021-04-26  5:50 ` Naohiro Aota
  2021-04-26  5:50 ` [PATCH v3 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Naohiro Aota @ 2021-04-26  5:50 UTC
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

This patch makes libblkid zone-aware. It can probe the magic located at
some offset from the beginning of some specific zone of a device.

This patch introduces new fields to struct blkid_idmag. They indicate that
the magic location is relative to a zone, and give the zone number and the
offset within that zone.

Also, this commit introduces `zone_size` to struct blkid_struct_probe. It
stores the size of zones of a device.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 libblkid/src/blkidP.h |  5 +++++
 libblkid/src/probe.c  | 30 ++++++++++++++++++++++++++++--
 2 files changed, 33 insertions(+), 2 deletions(-)
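
For reviewers, the offset arithmetic that blkid_probe_get_idmag() performs
after this patch can be checked in isolation with the sketch below.
struct magic_loc and magic_offset() are hypothetical names; the struct only
mirrors the offset-related fields of struct blkid_idmag. Unlike the patch,
which skips a zoned magic on a non-zoned device, this sketch asserts on that
case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct blkid_idmag, reduced to the offset fields. */
struct magic_loc {
	long kboff;		/* kilobyte offset of superblock */
	unsigned int sboff;	/* byte offset within superblock */
	int is_zoned;		/* location is relative to a zone */
	long zonenum;		/* zone number which has superblock */
	long kboff_inzone;	/* kilobyte offset of superblock in the zone */
};

/*
 * Compute the absolute byte offset of the magic string.
 * zone_size is in bytes; hint_offset is 0 when no hint applies.
 */
static uint64_t magic_offset(const struct magic_loc *mag, uint64_t zone_size,
			     uint64_t hint_offset)
{
	uint64_t kboff, off;

	/* A zoned magic needs a device that reports a zone size. */
	assert(!mag->is_zoned || zone_size > 0);

	if (!mag->is_zoned)
		kboff = mag->kboff;
	else
		kboff = (((uint64_t) mag->zonenum * zone_size) >> 10)
			+ mag->kboff_inzone;

	/* Start of the 1 KiB buffer that holds the magic ... */
	off = hint_offset + ((kboff + (mag->sboff >> 10)) << 10);
	/* ... plus the position of the magic inside that buffer. */
	return off + (mag->sboff & 0x3ff);
}
```

For the regular btrfs magic (kboff = 64, sboff = 0x40) this gives byte
65600. For a magic in zone #1 of a device with 4 MiB zones it gives
4194304 + 64.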

diff --git a/libblkid/src/blkidP.h b/libblkid/src/blkidP.h
index a3fe6748a969..e3a160aa97c0 100644
--- a/libblkid/src/blkidP.h
+++ b/libblkid/src/blkidP.h
@@ -150,6 +150,10 @@ struct blkid_idmag
 	const char	*hoff;		/* hint which contains byte offset to kboff */
 	long		kboff;		/* kilobyte offset of superblock */
 	unsigned int	sboff;		/* byte offset within superblock */
+
+	int		is_zoned;	/* indicate magic location is calculated based on zone position  */
+	long		zonenum;	/* zone number which has superblock */
+	long		kboff_inzone;	/* kilobyte offset of superblock in a zone */
 };
 
 /*
@@ -206,6 +210,7 @@ struct blkid_struct_probe
 	dev_t			disk_devno;	/* devno of the whole-disk or 0 */
 	unsigned int		blkssz;		/* sector size (BLKSSZGET ioctl) */
 	mode_t			mode;		/* struct stat.sb_mode */
+	uint64_t		zone_size;	/* zone size (BLKGETZONESZ ioctl) */
 
 	int			flags;		/* private library flags */
 	int			prob_flags;	/* always zeroized by blkid_do_*() */
diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
index a47a8720d4ac..219cceea0f94 100644
--- a/libblkid/src/probe.c
+++ b/libblkid/src/probe.c
@@ -94,6 +94,9 @@
 #ifdef HAVE_LINUX_CDROM_H
 #include <linux/cdrom.h>
 #endif
+#ifdef HAVE_LINUX_BLKZONED_H
+#include <linux/blkzoned.h>
+#endif
 #ifdef HAVE_SYS_STAT_H
 #include <sys/stat.h>
 #endif
@@ -177,6 +180,7 @@ blkid_probe blkid_clone_probe(blkid_probe parent)
 	pr->disk_devno = parent->disk_devno;
 	pr->blkssz = parent->blkssz;
 	pr->flags = parent->flags;
+	pr->zone_size = parent->zone_size;
 	pr->parent = parent;
 
 	pr->flags &= ~BLKID_FL_PRIVATE_FD;
@@ -897,6 +901,7 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
 	pr->wipe_off = 0;
 	pr->wipe_size = 0;
 	pr->wipe_chain = NULL;
+	pr->zone_size = 0;
 
 	if (fd < 0)
 		return 1;
@@ -996,6 +1001,15 @@ int blkid_probe_set_device(blkid_probe pr, int fd,
 #endif
 	free(dm_uuid);
 
+# ifdef HAVE_LINUX_BLKZONED_H
+	if (S_ISBLK(sb.st_mode)) {
+		uint32_t zone_size_sector;
+
+		if (!ioctl(pr->fd, BLKGETZONESZ, &zone_size_sector))
+			pr->zone_size = (uint64_t) zone_size_sector << 9;
+	}
+# endif
+
 	DBG(LOWPROBE, ul_debug("ready for low-probing, offset=%"PRIu64", size=%"PRIu64"",
 				pr->off, pr->size));
 	DBG(LOWPROBE, ul_debug("whole-disk: %s, regfile: %s",
@@ -1064,12 +1078,24 @@ int blkid_probe_get_idmag(blkid_probe pr, const struct blkid_idinfo *id,
 	/* try to detect by magic string */
 	while(mag && mag->magic) {
 		unsigned char *buf;
+		uint64_t kboff;
 		uint64_t hint_offset;
 
 		if (!mag->hoff || blkid_probe_get_hint(pr, mag->hoff, &hint_offset) < 0)
 			hint_offset = 0;
 
-		off = hint_offset + ((mag->kboff + (mag->sboff >> 10)) << 10);
+		/* If the magic is for zoned device, skip non-zoned device */
+		if (mag->is_zoned && !pr->zone_size) {
+			mag++;
+			continue;
+		}
+
+		if (!mag->is_zoned)
+			kboff = mag->kboff;
+		else
+			kboff = ((mag->zonenum * pr->zone_size) >> 10) + mag->kboff_inzone;
+
+		off = hint_offset + ((kboff + (mag->sboff >> 10)) << 10);
 		buf = blkid_probe_get_buffer(pr, off, 1024);
 
 		if (!buf && errno)
@@ -1079,7 +1105,7 @@ int blkid_probe_get_idmag(blkid_probe pr, const struct blkid_idinfo *id,
 				buf + (mag->sboff & 0x3ff), mag->len)) {
 
 			DBG(LOWPROBE, ul_debug("\tmagic sboff=%u, kboff=%ld",
-				mag->sboff, mag->kboff));
+				mag->sboff, kboff));
 			if (offset)
 				*offset = off + (mag->sboff & 0x3ff);
 			if (res)
-- 
2.31.1



* [PATCH v3 2/3] blkid: add magic and probing for zoned btrfs
  2021-04-26  5:50 [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
  2021-04-26  5:50 ` [PATCH v3 1/3] blkid: implement zone-aware probing Naohiro Aota
@ 2021-04-26  5:50 ` Naohiro Aota
  2021-04-26  5:50 ` [PATCH v3 3/3] blkid: support zone reset for wipefs Naohiro Aota
  2021-04-28 11:36 ` [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs Karel Zak
  3 siblings, 0 replies; 5+ messages in thread
From: Naohiro Aota @ 2021-04-26  5:50 UTC
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

This commit adds zone-aware magics and probing functions for zoned btrfs.

The superblock (and its copies) are the only data structure in btrfs with a
fixed location on a device. Since we cannot do overwrites in a sequential
write required zone, we cannot place the superblock in the zone.

Thus, zoned btrfs uses superblock log writing to update superblocks on
sequential write required zones. It uses two zones as a circular buffer to
write updated superblocks. Once the first zone is filled up, it starts
writing into the second zone. When both zones are filled up, and before
starting to write to the first zone again, it resets the first zone.

We can determine the position of the latest superblock by reading the write
pointer information from a device. One corner case is when both zones are
full. For this situation, we read out the last superblock of each zone and
compare them to determine which zone is older.

The magics can detect a superblock magic ("_BHRfS_M") at the beginning of
zone #0 or zone #1 to see if it is zoned btrfs. When both zones are filled
up, zoned btrfs resets the first zone to write a new superblock. If btrfs
crashes at that moment, we do not see a superblock at zone #0. Thus, we need
to check not only zone #0 but also zone #1.

It also supports the temporary magic ("!BHRfS_M") in zone #0. Mkfs.btrfs
first writes the temporary superblock to the zone during the mkfs process.
It will survive there until the zones are filled up and reset. So, we also
need to detect this temporary magic.

Finally, this commit extends probe_btrfs() to load the latest superblock
determined by the write pointers.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 include/blkdev.h                 |   9 ++
 lib/blkdev.c                     |  29 ++++++
 libblkid/src/superblocks/btrfs.c | 159 ++++++++++++++++++++++++++++++-
 3 files changed, 196 insertions(+), 1 deletion(-)

diff --git a/include/blkdev.h b/include/blkdev.h
index 6cbecbb65f82..43a5f5224857 100644
--- a/include/blkdev.h
+++ b/include/blkdev.h
@@ -15,6 +15,7 @@
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/stat.h>
+#include <stdint.h>
 
 #ifdef HAVE_SYS_MKDEV_H
 # include <sys/mkdev.h>		/* major and minor on Solaris */
@@ -147,5 +148,13 @@ int blkdev_get_geometry(int fd, unsigned int *h, unsigned int *s);
 const char *blkdev_scsi_type_to_name(int type);
 
 int blkdev_lock(int fd, const char *devname, const char *lockmode);
+#ifdef HAVE_LINUX_BLKZONED_H
+struct blk_zone_report *blkdev_get_zonereport(int fd, uint64_t sector, uint32_t nzones);
+#else
+static inline struct blk_zone_report *blkdev_get_zonereport(int fd, uint64_t sector, uint32_t nzones)
+{
+	return NULL;
+}
+#endif
 
 #endif /* BLKDEV_H */
diff --git a/lib/blkdev.c b/lib/blkdev.c
index c22853ddcbb0..9de8512917a9 100644
--- a/lib/blkdev.c
+++ b/lib/blkdev.c
@@ -15,6 +15,10 @@
 #include <linux/fd.h>
 #endif
 
+#ifdef HAVE_LINUX_BLKZONED_H
+#include <linux/blkzoned.h>
+#endif
+
 #ifdef HAVE_SYS_DISKLABEL_H
 #include <sys/disklabel.h>
 #endif
@@ -412,6 +416,31 @@ int blkdev_lock(int fd, const char *devname, const char *lockmode)
 	return rc;
 }
 
+#ifdef HAVE_LINUX_BLKZONED_H
+struct blk_zone_report *blkdev_get_zonereport(int fd, uint64_t sector, uint32_t nzones)
+{
+	struct blk_zone_report *rep;
+	size_t rep_size;
+	int ret;
+
+	rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * nzones;
+	rep = calloc(1, rep_size);
+	if (!rep)
+		return NULL;
+
+	rep->sector = sector;
+	rep->nr_zones = nzones;
+
+	ret = ioctl(fd, BLKREPORTZONE, rep);
+	if (ret || rep->nr_zones != nzones) {
+		free(rep);
+		return NULL;
+	}
+
+	return rep;
+}
+#endif
+
 
 #ifdef TEST_PROGRAM_BLKDEV
 #include <stdio.h>
diff --git a/libblkid/src/superblocks/btrfs.c b/libblkid/src/superblocks/btrfs.c
index f0fde700d896..03aa7e979298 100644
--- a/libblkid/src/superblocks/btrfs.c
+++ b/libblkid/src/superblocks/btrfs.c
@@ -9,6 +9,12 @@
 #include <unistd.h>
 #include <string.h>
 #include <stdint.h>
+#include <stdbool.h>
+#include <assert.h>
+
+#ifdef HAVE_LINUX_BLKZONED_H
+#include <linux/blkzoned.h>
+#endif
 
 #include "superblocks.h"
 
@@ -59,11 +65,157 @@ struct btrfs_super_block {
 	uint8_t label[256];
 } __attribute__ ((__packed__));
 
+#define BTRFS_SUPER_INFO_SIZE 4096
+
+/* Number of superblock log zones */
+#define BTRFS_NR_SB_LOG_ZONES 2
+
+/* Introduce some macros and types to unify the code with kernel side */
+#define SECTOR_SHIFT 9
+
+typedef uint64_t sector_t;
+
+#ifdef HAVE_LINUX_BLKZONED_H
+static int sb_write_pointer(blkid_probe pr, struct blk_zone *zones, uint64_t *wp_ret)
+{
+	bool empty[BTRFS_NR_SB_LOG_ZONES];
+	bool full[BTRFS_NR_SB_LOG_ZONES];
+	sector_t sector;
+
+	assert(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL &&
+	       zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL);
+
+	empty[0] = zones[0].cond == BLK_ZONE_COND_EMPTY;
+	empty[1] = zones[1].cond == BLK_ZONE_COND_EMPTY;
+	full[0] = zones[0].cond == BLK_ZONE_COND_FULL;
+	full[1] = zones[1].cond == BLK_ZONE_COND_FULL;
+
+	/*
+	 * Possible states of log buffer zones
+	 *
+	 *           Empty[0]  In use[0]  Full[0]
+	 * Empty[1]         *          0        1
+	 * In use[1]        x          x        1
+	 * Full[1]          0          0        C
+	 *
+	 * Log position:
+	 *   *: Special case, no superblock is written
+	 *   0: Use write pointer of zones[0]
+	 *   1: Use write pointer of zones[1]
+	 *   C: Compare super blocks from zones[0] and zones[1], use the latest
+	 *      one determined by generation
+	 *   x: Invalid state
+	 */
+
+	if (empty[0] && empty[1]) {
+		/* Special case to distinguish no superblock to read */
+		*wp_ret = zones[0].start << SECTOR_SHIFT;
+		return -ENOENT;
+	} else if (full[0] && full[1]) {
+		/* Compare two super blocks */
+		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
+		int i;
+
+		for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+			uint64_t bytenr;
+
+			bytenr = ((zones[i].start + zones[i].len)
+				   << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE;
+
+			super[i] = (struct btrfs_super_block *)
+				blkid_probe_get_buffer(pr, bytenr, BTRFS_SUPER_INFO_SIZE);
+			if (!super[i])
+				return -EIO;
+		}
+
+		if (super[0]->generation > super[1]->generation)
+			sector = zones[1].start;
+		else
+			sector = zones[0].start;
+	} else if (!full[0] && (empty[1] || full[1])) {
+		sector = zones[0].wp;
+	} else if (full[0]) {
+		sector = zones[1].wp;
+	} else {
+		return -EUCLEAN;
+	}
+	*wp_ret = sector << SECTOR_SHIFT;
+	return 0;
+}
+
+static int sb_log_offset(blkid_probe pr, uint64_t *bytenr_ret)
+{
+	uint32_t zone_num = 0;
+	uint32_t zone_size_sector;
+	struct blk_zone_report *rep;
+	struct blk_zone *zones;
+	int ret;
+	int i;
+	uint64_t wp;
+
+
+	zone_size_sector = pr->zone_size >> SECTOR_SHIFT;
+	rep = blkdev_get_zonereport(pr->fd, zone_num * zone_size_sector, 2);
+	if (!rep) {
+		ret = -errno;
+		goto out;
+	}
+	zones = (struct blk_zone *)(rep + 1);
+
+	/*
+	 * Use the head of the first conventional zone, if the zones
+	 * contain one.
+	 */
+	for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
+		if (zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+			*bytenr_ret = zones[i].start << SECTOR_SHIFT;
+			ret = 0;
+			goto out;
+		}
+	}
+
+	ret = sb_write_pointer(pr, zones, &wp);
+	if (ret != -ENOENT && ret) {
+		ret = 1;
+		goto out;
+	}
+	if (ret != -ENOENT) {
+		if (wp == zones[0].start << SECTOR_SHIFT)
+			wp = (zones[1].start + zones[1].len) << SECTOR_SHIFT;
+		wp -= BTRFS_SUPER_INFO_SIZE;
+	}
+	*bytenr_ret = wp;
+
+	ret = 0;
+out:
+	free(rep);
+
+	return ret;
+}
+#endif
+
 static int probe_btrfs(blkid_probe pr, const struct blkid_idmag *mag)
 {
 	struct btrfs_super_block *bfs;
 
-	bfs = blkid_probe_get_sb(pr, mag, struct btrfs_super_block);
+	if (pr->zone_size) {
+#ifdef HAVE_LINUX_BLKZONED_H
+		uint64_t offset = 0;
+		int ret;
+
+		ret = sb_log_offset(pr, &offset);
+		if (ret)
+			return ret;
+		bfs = (struct btrfs_super_block *)
+			blkid_probe_get_buffer(pr, offset,
+					       sizeof(struct btrfs_super_block));
+#else
+		/* Nothing can be done */
+		return 1;
+#endif
+	} else {
+		bfs = blkid_probe_get_sb(pr, mag, struct btrfs_super_block);
+	}
 	if (!bfs)
 		return errno ? -errno : 1;
 
@@ -88,6 +240,11 @@ const struct blkid_idinfo btrfs_idinfo =
 	.magics		=
 	{
 	  { .magic = "_BHRfS_M", .len = 8, .sboff = 0x40, .kboff = 64 },
+	  /* For zoned btrfs */
+	  { .magic = "_BHRfS_M", .len = 8, .sboff = 0x40,
+	    .is_zoned = 1, .zonenum = 0, .kboff_inzone = 0 },
+	  { .magic = "_BHRfS_M", .len = 8, .sboff = 0x40,
+	    .is_zoned = 1, .zonenum = 1, .kboff_inzone = 0 },
 	  { NULL }
 	}
 };
-- 
2.31.1



* [PATCH v3 3/3] blkid: support zone reset for wipefs
  2021-04-26  5:50 [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
  2021-04-26  5:50 ` [PATCH v3 1/3] blkid: implement zone-aware probing Naohiro Aota
  2021-04-26  5:50 ` [PATCH v3 2/3] blkid: add magic and probing for zoned btrfs Naohiro Aota
@ 2021-04-26  5:50 ` Naohiro Aota
  2021-04-28 11:36 ` [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs Karel Zak
  3 siblings, 0 replies; 5+ messages in thread
From: Naohiro Aota @ 2021-04-26  5:50 UTC
  To: Karel Zak
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn, Naohiro Aota

We cannot overwrite the superblock magic in a sequential write required
zone, so wipefs cannot work as it is. Instead, this commit implements
wiping by zone resetting.

Zone resetting must be done only for a sequential write required zone. This
is checked by is_conventional().

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 libblkid/src/probe.c | 69 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 63 insertions(+), 6 deletions(-)
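
The range handed to BLKRESETZONE is derived from the magic offset by
rounding down to the zone boundary. The sketch below isolates that
arithmetic. struct zone_range and zone_range_for() are hypothetical names;
the struct only mirrors the two fields of struct blk_zone_range that the
patch fills in. The mask trick assumes a power-of-two zone size, which the
sketch asserts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct blk_zone_range; both fields are in sectors. */
struct zone_range {
	uint64_t sector;	/* first sector of the zone */
	uint64_t nr_sectors;	/* zone length */
};

/* Return the zone containing byte offset, expressed in 512-byte sectors. */
static struct zone_range zone_range_for(uint64_t offset, uint64_t zone_size)
{
	struct zone_range range;
	uint64_t zone_mask;

	/* The mask only works for a non-zero power-of-two zone size. */
	assert(zone_size && (zone_size & (zone_size - 1)) == 0);

	zone_mask = ~(zone_size - 1);
	range.sector = (offset & zone_mask) >> 9;
	range.nr_sectors = zone_size >> 9;
	return range;
}
```

For a magic found at byte 4194368 on a device with 4 MiB zones, this selects
the zone starting at sector 8192 with a length of 8192 sectors, so the whole
of zone #1 is reset rather than only the bytes of the magic.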

diff --git a/libblkid/src/probe.c b/libblkid/src/probe.c
index 219cceea0f94..d4ca47c6dbed 100644
--- a/libblkid/src/probe.c
+++ b/libblkid/src/probe.c
@@ -1229,6 +1229,39 @@ int blkid_do_probe(blkid_probe pr)
 	return rc;
 }
 
+#ifdef HAVE_LINUX_BLKZONED_H
+static int is_conventional(blkid_probe pr, uint64_t offset)
+{
+	struct blk_zone_report *rep = NULL;
+	int ret;
+	uint64_t zone_mask;
+
+	if (!pr->zone_size)
+		return 1;
+
+	zone_mask = ~(pr->zone_size - 1);
+	rep = blkdev_get_zonereport(blkid_probe_get_fd(pr),
+				    (offset & zone_mask) >> 9, 1);
+	if (!rep)
+		return -1;
+
+	if (rep->zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL)
+		ret = 1;
+	else
+		ret = 0;
+
+	free(rep);
+
+	return ret;
+}
+#else
+static inline int is_conventional(blkid_probe pr __attribute__((__unused__)),
+				  uint64_t offset __attribute__((__unused__)))
+{
+	return 1;
+}
+#endif
+
 /**
  * blkid_do_wipe:
  * @pr: prober
@@ -1268,6 +1301,7 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
 	const char *off = NULL;
 	size_t len = 0;
 	uint64_t offset, magoff;
+	int conventional;
 	char buf[BUFSIZ];
 	int fd, rc = 0;
 	struct blkid_chain *chn;
@@ -1303,6 +1337,11 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
 	if (len > sizeof(buf))
 		len = sizeof(buf);
 
+	rc = is_conventional(pr, offset);
+	if (rc < 0)
+		return rc;
+	conventional = rc == 1;
+
 	DBG(LOWPROBE, ul_debug(
 	    "do_wipe [offset=0x%"PRIx64" (%"PRIu64"), len=%zu, chain=%s, idx=%d, dryrun=%s]\n",
 	    offset, offset, len, chn->driver->name, chn->idx, dryrun ? "yes" : "not"));
@@ -1310,13 +1349,31 @@ int blkid_do_wipe(blkid_probe pr, int dryrun)
 	if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
 		return -1;
 
-	memset(buf, 0, len);
-
 	if (!dryrun && len) {
-		/* wipen on device */
-		if (write_all(fd, buf, len))
-			return -1;
-		fsync(fd);
+		if (conventional) {
+			memset(buf, 0, len);
+
+			/* wipen on device */
+			if (write_all(fd, buf, len))
+				return -1;
+			fsync(fd);
+		} else {
+#ifdef HAVE_LINUX_BLKZONED_H
+			uint64_t zone_mask = ~(pr->zone_size - 1);
+			struct blk_zone_range range = {
+				.sector = (offset & zone_mask) >> 9,
+				.nr_sectors = pr->zone_size >> 9,
+			};
+
+			rc = ioctl(fd, BLKRESETZONE, &range);
+			if (rc < 0)
+				return -1;
+#else
+			/* Should not reach here */
+			assert(0);
+#endif
+		}
+
 		pr->flags &= ~BLKID_FL_MODIF_BUFF;	/* be paranoid */
 
 		return blkid_probe_step_back(pr);
-- 
2.31.1



* Re: [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs
  2021-04-26  5:50 [PATCH v3 0/3] implement zone-aware probing/wiping for zoned btrfs Naohiro Aota
                   ` (2 preceding siblings ...)
  2021-04-26  5:50 ` [PATCH v3 3/3] blkid: support zone reset for wipefs Naohiro Aota
@ 2021-04-28 11:36 ` Karel Zak
  3 siblings, 0 replies; 5+ messages in thread
From: Karel Zak @ 2021-04-28 11:36 UTC
  To: Naohiro Aota
  Cc: util-linux, linux-btrfs, linux-fsdevel, Damien Le Moal,
	Johannes Thumshirn

On Mon, Apr 26, 2021 at 02:50:33PM +0900, Naohiro Aota wrote:
> Naohiro Aota (3):
>   blkid: implement zone-aware probing
>   blkid: add magic and probing for zoned btrfs
>   blkid: support zone reset for wipefs
> 
>  include/blkdev.h                 |   9 ++
>  lib/blkdev.c                     |  29 ++++++
>  libblkid/src/blkidP.h            |   5 +
>  libblkid/src/probe.c             |  99 +++++++++++++++++--
>  libblkid/src/superblocks/btrfs.c | 159 ++++++++++++++++++++++++++++++-
>  5 files changed, 292 insertions(+), 9 deletions(-)

Merged to the "next" branch (on github); it will be merged to "master"
later, after the v2.37 release.

Thanks! (and extra thanks for the examples :-)

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


