qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC v4 0/9] Add support for zoned device
@ 2022-07-12  2:13 Sam Li
  2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

This patch series adds support for zoned device to virtio-blk emulation. Zoned
Storage can support sequential writes, which reduces write amplification in SSD,
leading to higher write throughput and increased capacity.

v4:
- add block layer APIs (revision)
- add configurations for zoned block device

Sam Li (9):
  block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  qemu-io: add zoned block device operations.
  file-posix: introduce get_sysfs_long_val for a block queue of sysfs
    attribute
  file-posix: introduce get_sysfs_str_val for device zoned model.
  qemu-iotests: test new zone operations.
  raw-format: add zone operations
  config: add check to block layer
  include: add support for zoned block devices
  qapi: add support for zoned host device

 block.c                                     |   7 +
 block/block-backend.c                       |  41 +++
 block/coroutines.h                          |   5 +
 block/file-posix.c                          | 334 +++++++++++++++++++-
 block/io.c                                  |  57 ++++
 block/raw-format.c                          |  13 +
 include/block/block-common.h                |  43 ++-
 include/block/block-io.h                    |  13 +
 include/block/block_int-common.h            |  25 ++
 include/standard-headers/linux/virtio_blk.h | 157 ++++++++-
 qapi/block-core.json                        |   7 +-
 qemu-io-cmds.c                              | 143 +++++++++
 tests/qemu-iotests/tests/zoned.sh           |  69 ++++
 13 files changed, 888 insertions(+), 26 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

-- 
2.36.1



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-12  6:10   ` Hannes Reinecke
                     ` (2 more replies)
  2022-07-12  2:13 ` [RFC v4 2/9] qemu-io: add zoned block device operations Sam Li
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

By adding zone management operations in BlockDriver, storage
controller emulation can use the new block layer APIs including
zone_report and zone_mgmt(open, close, finish, reset).

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/block-backend.c            |  41 ++++++
 block/coroutines.h               |   5 +
 block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
 include/block/block-common.h     |  43 +++++-
 include/block/block_int-common.h |  20 +++
 5 files changed, 344 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f425b00793..0a05247ae4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
     return ret;
 }
 
+/*
+ * Send a zone_report command.
+ * offset can be any number within the zone size. No alignment for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    int64_t *nr_zones,
+                                    BlockZoneDescriptor *zones)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk); /* increase before waiting */
+    blk_wait_while_drained(blk);
+    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * Offset is the start of a zone and len is aligned to zones.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
+        int64_t offset, int64_t len)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    ret = blk_check_byte_request(blk, offset, len);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/coroutines.h b/block/coroutines.h
index 830ecaa733..19aa96cc56 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -80,6 +80,11 @@ int coroutine_fn
 blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
 
 int coroutine_fn blk_co_do_flush(BlockBackend *blk);
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    int64_t *nr_zones,
+                                    BlockZoneDescriptor *zones);
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
+                                  int64_t offset, int64_t len);
 
 
 /*
diff --git a/block/file-posix.c b/block/file-posix.c
index 48cd096624..e7523ae2ed 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -67,6 +67,7 @@
 #include <sys/param.h>
 #include <sys/syscall.h>
 #include <sys/vfs.h>
+#include <linux/blkzoned.h>
 #include <linux/cdrom.h>
 #include <linux/fd.h>
 #include <linux/fs.h>
@@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
             PreallocMode prealloc;
             Error **errp;
         } truncate;
+        struct {
+            int64_t *nr_zones;
+            BlockZoneDescriptor *zones;
+        } zone_report;
+        struct {
+            zone_op op;
+        } zone_mgmt;
     };
 } RawPosixAIOData;
 
@@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
 }
 #endif
 
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+static inline void parse_zone(struct BlockZoneDescriptor *zone,
+                              struct blk_zone *blkz) {
+    zone->start = blkz->start;
+    zone->length = blkz->len;
+    zone->cap = blkz->capacity;
+    zone->wp = blkz->wp - blkz->start;
+    zone->type = blkz->type;
+    zone->cond = blkz->cond;
+}
+
+static int handle_aiocb_zone_report(void *opaque) {
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    int64_t *nr_zones = aiocb->zone_report.nr_zones;
+    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
+    int64_t offset = aiocb->aio_offset;
+
+    struct blk_zone *blkz;
+    int64_t rep_size, nrz;
+    int ret, n = 0, i = 0;
+
+    nrz = *nr_zones;
+    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+    g_autofree struct blk_zone_report *rep = NULL;
+    rep = g_malloc(rep_size);
+    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
+    printf("start to report zone with offset: 0x%lx\n", offset);
+
+    blkz = (struct blk_zone *)(rep + 1);
+    while (n < nrz) {
+        memset(rep, 0, rep_size);
+        rep->sector = offset;
+        rep->nr_zones = nrz;
+
+        ret = ioctl(fd, BLKREPORTZONE, rep);
+        if (ret != 0) {
+            ret = -errno;
+            error_report("%d: ioctl BLKREPORTZONE at %ld failed %d",
+                         fd, offset, errno);
+            return ret;
+        }
+
+        if (!rep->nr_zones) {
+            break;
+        }
+
+        for (i = 0; i < rep->nr_zones; i++, n++) {
+            parse_zone(&zones[n], &blkz[i]);
+            /* The next report should start after the last zone reported */
+            offset = blkz[i].start + blkz[i].len;
+        }
+    }
+
+    *nr_zones = n;
+    return 0;
+}
+
+static int handle_aiocb_zone_mgmt(void *opaque) {
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    int64_t offset = aiocb->aio_offset;
+    int64_t len = aiocb->aio_nbytes;
+    zone_op op = aiocb->zone_mgmt.op;
+
+    struct blk_zone_range range;
+    const char *ioctl_name;
+    unsigned long ioctl_op;
+    int64_t zone_size;
+    int64_t zone_size_mask;
+    int ret;
+
+    g_autofree struct stat *file = NULL;
+    file = g_new(struct stat, 1);
+    stat(s->filename, file);
+    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
+    zone_size_mask = zone_size - 1;
+    if (offset & zone_size_mask) {
+        error_report("offset is not the start of a zone");
+        return -EINVAL;
+    }
+
+    if (len & zone_size_mask) {
+        error_report("len is not aligned to zones");
+        return -EINVAL;
+    }
+
+    switch (op) {
+    case zone_open:
+        ioctl_name = "BLKOPENZONE";
+        ioctl_op = BLKOPENZONE;
+        break;
+    case zone_close:
+        ioctl_name = "BLKCLOSEZONE";
+        ioctl_op = BLKCLOSEZONE;
+        break;
+    case zone_finish:
+        ioctl_name = "BLKFINISHZONE";
+        ioctl_op = BLKFINISHZONE;
+        break;
+    case zone_reset:
+        ioctl_name = "BLKRESETZONE";
+        ioctl_op = BLKRESETZONE;
+        break;
+    default:
+        error_report("Invalid zone operation 0x%x", op);
+        return -EINVAL;
+    }
+
+    /* Execute the operation */
+    range.sector = offset;
+    range.nr_sectors = len;
+    ret = ioctl(fd, ioctl_op, &range);
+    if (ret != 0) {
+        error_report("ioctl %s failed %d",
+                     ioctl_name, errno);
+        return -errno;
+    }
+
+    return 0;
+}
+
 static int handle_aiocb_copy_range(void *opaque)
 {
     RawPosixAIOData *aiocb = opaque;
@@ -2973,6 +3105,59 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
     }
 }
 
+/*
+ * zone report - Get a zone block device's information in the form
+ * of an array of zone descriptors.
+ *
+ * @param bs: passing zone block device file descriptor
+ * @param zones: an array of zone descriptors to hold zone
+ * information on reply
+ * @param offset: offset can be any byte within the zone size.
+ * @param len: (not sure yet.
+ * @return 0 on success, -1 on failure
+ */
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                           int64_t *nr_zones,
+                                           BlockZoneDescriptor *zones) {
+    BDRVRawState *s = bs->opaque;
+    RawPosixAIOData acb;
+
+    acb = (RawPosixAIOData) {
+        .bs         = bs,
+        .aio_fildes = s->fd,
+        .aio_type   = QEMU_AIO_IOCTL,
+        .aio_offset = offset,
+        .zone_report    = {
+                .nr_zones       = nr_zones,
+                .zones          = zones,
+        },
+    };
+
+    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
+}
+
+/*
+ * zone management operations - Execute an operation on a zone
+ */
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
+        int64_t offset, int64_t len) {
+    BDRVRawState *s = bs->opaque;
+    RawPosixAIOData acb;
+
+    acb = (RawPosixAIOData) {
+        .bs             = bs,
+        .aio_fildes     = s->fd,
+        .aio_type       = QEMU_AIO_IOCTL,
+        .aio_offset     = offset,
+        .aio_nbytes     = len,
+        .zone_mgmt  = {
+                .op = op,
+        },
+    };
+
+    return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
+}
+
 static coroutine_fn int
 raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
                 bool blkdev)
@@ -3324,6 +3509,9 @@ BlockDriver bdrv_file = {
     .bdrv_abort_perm_update = raw_abort_perm_update,
     .create_opts = &raw_create_opts,
     .mutable_opts = mutable_opts,
+
+    .bdrv_co_zone_report = raw_co_zone_report,
+    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
 };
 
 /***********************************************/
@@ -3703,6 +3891,53 @@ static BlockDriver bdrv_host_device = {
 #endif
 };
 
+static BlockDriver bdrv_zoned_host_device = {
+        .format_name = "zoned_host_device",
+        .protocol_name = "zoned_host_device",
+        .instance_size = sizeof(BDRVRawState),
+        .bdrv_needs_filename = true,
+        .bdrv_probe_device  = hdev_probe_device,
+        .bdrv_parse_filename = hdev_parse_filename,
+        .bdrv_file_open     = hdev_open,
+        .bdrv_close         = raw_close,
+        .bdrv_reopen_prepare = raw_reopen_prepare,
+        .bdrv_reopen_commit  = raw_reopen_commit,
+        .bdrv_reopen_abort   = raw_reopen_abort,
+        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
+        .create_opts         = &bdrv_create_opts_simple,
+        .mutable_opts        = mutable_opts,
+        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
+        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
+
+        .bdrv_co_preadv         = raw_co_preadv,
+        .bdrv_co_pwritev        = raw_co_pwritev,
+        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
+        .bdrv_co_pdiscard       = hdev_co_pdiscard,
+        .bdrv_co_copy_range_from = raw_co_copy_range_from,
+        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
+        .bdrv_refresh_limits = raw_refresh_limits,
+        .bdrv_io_plug = raw_aio_plug,
+        .bdrv_io_unplug = raw_aio_unplug,
+        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
+
+        .bdrv_co_truncate       = raw_co_truncate,
+        .bdrv_getlength = raw_getlength,
+        .bdrv_get_info = raw_get_info,
+        .bdrv_get_allocated_file_size
+                            = raw_get_allocated_file_size,
+        .bdrv_get_specific_stats = hdev_get_specific_stats,
+        .bdrv_check_perm = raw_check_perm,
+        .bdrv_set_perm   = raw_set_perm,
+        .bdrv_abort_perm_update = raw_abort_perm_update,
+        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
+        .bdrv_probe_geometry = hdev_probe_geometry,
+        .bdrv_co_ioctl = hdev_co_ioctl,
+
+        /* zone management operations */
+        .bdrv_co_zone_report = raw_co_zone_report,
+        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
+};
+
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 static void cdrom_parse_filename(const char *filename, QDict *options,
                                  Error **errp)
@@ -3964,6 +4199,7 @@ static void bdrv_file_init(void)
 #if defined(HAVE_HOST_BLOCK_DEVICE)
     bdrv_register(&bdrv_host_device);
 #ifdef __linux__
+    bdrv_register(&bdrv_zoned_host_device);
     bdrv_register(&bdrv_host_cdrom);
 #endif
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..78cddeeda5 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -23,7 +23,6 @@
  */
 #ifndef BLOCK_COMMON_H
 #define BLOCK_COMMON_H
-
 #include "block/aio.h"
 #include "block/aio-wait.h"
 #include "qemu/iov.h"
@@ -49,6 +48,48 @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum zone_op {
+    zone_open,
+    zone_close,
+    zone_finish,
+    zone_reset,
+} zone_op;
+
+typedef enum zone_model {
+    BLK_Z_HM,
+    BLK_Z_HA,
+} zone_model;
+
+typedef enum BlkZoneCondition {
+    BLK_ZS_NOT_WP = 0x0,
+    BLK_ZS_EMPTY = 0x1,
+    BLK_ZS_IOPEN = 0x2,
+    BLK_ZS_EOPEN = 0x3,
+    BLK_ZS_CLOSED = 0x4,
+    BLK_ZS_RDONLY = 0xD,
+    BLK_ZS_FULL = 0xE,
+    BLK_ZS_OFFLINE = 0xF,
+} BlkZoneCondition;
+
+typedef enum BlkZoneType {
+    BLK_ZT_CONV = 0x1,
+    BLK_ZT_SWR = 0x2,
+    BLK_ZT_SWP = 0x3,
+} BlkZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provide information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+    uint64_t start;
+    uint64_t length;
+    uint64_t cap;
+    uint64_t wp;
+    BlkZoneType type;
+    BlkZoneCondition cond;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..6037871089 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -94,6 +94,20 @@ typedef struct BdrvTrackedRequest {
     struct BdrvTrackedRequest *waiting_for;
 } BdrvTrackedRequest;
 
+/**
+ * Zone device information data structure.
+ * Provide information on a device.
+ */
+typedef struct zbd_dev {
+    uint32_t zone_size;
+    zone_model model;
+    uint32_t block_size;
+    uint32_t write_granularity;
+    uint32_t nr_zones;
+    struct BlockZoneDescriptor *zones; /* array of zones */
+    uint32_t max_nr_open_zones; /* maximum number of explicitly open zones */
+    uint32_t max_nr_active_zones;
+} zbd_dev;
 
 struct BlockDriver {
     /*
@@ -691,6 +705,12 @@ struct BlockDriver {
                                           QEMUIOVector *qiov,
                                           int64_t pos);
 
+    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
+            int64_t offset, int64_t *nr_zones,
+            BlockZoneDescriptor *zones);
+    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, enum zone_op op,
+            int64_t offset, int64_t len);
+
     /* removable device specific */
     bool (*bdrv_is_inserted)(BlockDriverState *bs);
     void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 2/9] qemu-io: add zoned block device operations.
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
  2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-12  6:14   ` Hannes Reinecke
                     ` (2 more replies)
  2022-07-12  2:13 ` [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute Sam Li
                   ` (9 subsequent siblings)
  11 siblings, 3 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

Add zoned storage commands of the device: zone_open(zo), zone_close(zc),
zone_reset(zs), zone_report(zp), zone_finish(zf).

For example, it can be called by:
./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
-c "zone_report 0 0 1"

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/io.c               |  57 ++++++++++++++++
 include/block/block-io.h |  13 ++++
 qemu-io-cmds.c           | 143 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 213 insertions(+)

diff --git a/block/io.c b/block/io.c
index 1e9bf09a49..a760be0131 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3243,6 +3243,63 @@ out:
     return co.ret;
 }
 
+int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
+                        int64_t *nr_zones,
+                        BlockZoneDescriptor *zones)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || (!drv->bdrv_co_zone_report)) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+
+    if (drv->bdrv_co_zone_report) {
+        co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
+    } else {
+        co.ret = -ENOTSUP;
+        goto out;
+        qemu_coroutine_yield();
+    }
+
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
+int bdrv_co_zone_mgmt(BlockDriverState *bs, enum zone_op op,
+        int64_t offset, int64_t len)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || (!drv->bdrv_co_zone_mgmt)) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+
+    if (drv->bdrv_co_zone_mgmt) {
+        co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
+    } else {
+        co.ret = -ENOTSUP;
+        goto out;
+        qemu_coroutine_yield();
+    }
+
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
     IO_CODE();
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 053a27141a..a0ae140452 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -80,6 +80,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 /* Ensure contents are flushed to disk.  */
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
 
+/* Report zone information of zone block device. */
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                     int64_t *nr_zones,
+                                     BlockZoneDescriptor *zones);
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, zone_op op,
+                                   int64_t offset, int64_t len);
+
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
@@ -289,6 +296,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int generated_co_wrapper
 bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 
+int generated_co_wrapper
+blk_zone_report(BlockBackend *blk, int64_t offset, int64_t *nr_zones,
+                BlockZoneDescriptor *zones);
+int generated_co_wrapper
+blk_zone_mgmt(BlockBackend *blk, enum zone_op op, int64_t offset, int64_t len);
+
 /**
  * bdrv_parent_drained_begin_single:
  *
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 2f0d8ac25a..a88fa322d2 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1706,6 +1706,144 @@ static const cmdinfo_t flush_cmd = {
     .oneline    = "flush all in-core file state to disk",
 };
 
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, nr_zones;
+
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    nr_zones = cvtnum(argv[optind]);
+
+    g_autofree BlockZoneDescriptor *zones = NULL;
+    zones = g_new(BlockZoneDescriptor, nr_zones);
+    ret = blk_zone_report(blk, offset, &nr_zones, zones);
+    if (ret < 0) {
+        printf("zone report failed: %s\n", strerror(-ret));
+    } else {
+        for (int i = 0; i < nr_zones; ++i) {
+            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
+                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
+                   "zcond:%u, [type: %u]\n",
+                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
+                   zones[i].cond, zones[i].type);
+        }
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_report_cmd = {
+        .name = "zone_report",
+        .altname = "zp",
+        .cfunc = zone_report_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset number",
+        .oneline = "report zone information",
+};
+
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, zone_open, offset, len);
+    if (ret < 0) {
+        printf("zone open failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_open_cmd = {
+        .name = "zone_open",
+        .altname = "zo",
+        .cfunc = zone_open_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "explicit open a range of zones in zone block device",
+};
+
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, zone_close, offset, len);
+    if (ret < 0) {
+        printf("zone close failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_close_cmd = {
+        .name = "zone_close",
+        .altname = "zc",
+        .cfunc = zone_close_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "close a range of zones in zone block device",
+};
+
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, zone_finish, offset, len);
+    if (ret < 0) {
+        printf("zone finish failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_finish_cmd = {
+        .name = "zone_finish",
+        .altname = "zf",
+        .cfunc = zone_finish_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "finish a range of zones in zone block device",
+};
+
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, zone_reset, offset, len);
+    if (ret < 0) {
+        printf("zone reset failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_reset_cmd = {
+        .name = "zone_reset",
+        .altname = "zrs",
+        .cfunc = zone_reset_f,
+        .argmin = 2,
+        .argmax = 2,
+        .args = "offset len",
+        .oneline = "reset a zone write pointer in zone block device",
+};
+
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
     .name       = "truncate",
@@ -2498,6 +2636,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
     qemuio_add_command(&aio_write_cmd);
     qemuio_add_command(&aio_flush_cmd);
     qemuio_add_command(&flush_cmd);
+    qemuio_add_command(&zone_report_cmd);
+    qemuio_add_command(&zone_open_cmd);
+    qemuio_add_command(&zone_close_cmd);
+    qemuio_add_command(&zone_finish_cmd);
+    qemuio_add_command(&zone_reset_cmd);
     qemuio_add_command(&truncate_cmd);
     qemuio_add_command(&length_cmd);
     qemuio_add_command(&info_cmd);
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
  2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
  2022-07-12  2:13 ` [RFC v4 2/9] qemu-io: add zoned block device operations Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-12  6:16   ` Hannes Reinecke
  2022-07-12  7:37   ` Damien Le Moal
  2022-07-12  2:13 ` [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

Use sysfs attribute files to get the zoned device information in case
that ioctl() commands of zone management interface won't work.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/file-posix.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index e7523ae2ed..3161d39ea4 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1218,15 +1218,19 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
-{
+/*
+ * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
+ * max_open_zones, max_active_zones) through sysfs attribute files.
+ */
+static long get_sysfs_long_val(int fd, struct stat *st,
+                               const char *attribute) {
 #ifdef CONFIG_LINUX
     char buf[32];
     const char *end;
     char *sysfspath = NULL;
     int ret;
     int sysfd = -1;
-    long max_segments;
+    long val;
 
     if (S_ISCHR(st->st_mode)) {
         if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
@@ -1239,8 +1243,9 @@ static int hdev_get_max_segments(int fd, struct stat *st)
         return -ENOTSUP;
     }
 
-    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st->st_rdev), minor(st->st_rdev));
+    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+                                major(st->st_rdev), minor(st->st_rdev),
+                                attribute);
     sysfd = open(sysfspath, O_RDONLY);
     if (sysfd == -1) {
         ret = -errno;
@@ -1258,9 +1263,9 @@ static int hdev_get_max_segments(int fd, struct stat *st)
     }
     buf[ret] = 0;
     /* The file is ended with '\n', pass 'end' to accept that. */
-    ret = qemu_strtol(buf, &end, 10, &max_segments);
+    ret = qemu_strtol(buf, &end, 10, &val);
     if (ret == 0 && end && *end == '\n') {
-        ret = max_segments;
+        ret = val;
     }
 
 out:
@@ -1274,6 +1279,10 @@ out:
 #endif
 }
 
+static int hdev_get_max_segments(int fd, struct stat *st) {
+    return get_sysfs_long_val(fd, st, "max_segments");
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -1883,10 +1892,17 @@ static int handle_aiocb_zone_mgmt(void *opaque) {
     int64_t zone_size_mask;
     int ret;
 
-    g_autofree struct stat *file = NULL;
-    file = g_new(struct stat, 1);
-    stat(s->filename, file);
-    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
+    struct stat file;
+    if (fstat(fd, &file) < 0) {
+        return -errno;
+    }
+    mod = get_sysfs_str_val(fd, &file);
+    if (mod != BLK_Z_HM) {
+        ret = -ENOTSUP;
+        return ret;
+    }
+
+    zone_size = get_sysfs_long_val(fd, &file, "chunk_sectors");
     zone_size_mask = zone_size - 1;
     if (offset & zone_size_mask) {
         error_report("offset is not the start of a zone");
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model.
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (2 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-12  6:17   ` Hannes Reinecke
  2022-07-12  7:42   ` Damien Le Moal
  2022-07-12  2:13 ` [RFC v4 5/9] qemu-iotests: test new zone operations Sam Li
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/file-posix.c           | 60 ++++++++++++++++++++++++++++++++++++
 include/block/block-common.h |  4 +--
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3161d39ea4..42708012ff 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1279,6 +1279,65 @@ out:
 #endif
 }
 
+/*
+ * Convert the zoned attribute file in sysfs to internal value.
+ */
+static zone_model get_sysfs_str_val(int fd, struct stat *st) {
+#ifdef CONFIG_LINUX
+    char buf[32];
+    char *sysfspath = NULL;
+    int ret, offset;
+    int sysfd = -1;
+
+    if (S_ISCHR(st->st_mode)) {
+        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+            return ret;
+        }
+        return -ENOTSUP;
+    }
+
+    if (!S_ISBLK(st->st_mode)) {
+        return -ENOTSUP;
+    }
+
+    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/zoned",
+                                major(st->st_rdev), minor(st->st_rdev));
+    sysfd = open(sysfspath, O_RDONLY);
+    if (sysfd == -1) {
+        ret = -errno;
+        goto out;
+    }
+    offset = 0;
+    do {
+        ret = read(sysfd, buf + offset, sizeof(buf) - 1 + offset);
+        if (ret > 0) {
+            offset += ret;
+        }
+    } while (ret == -1);
+    /* The file is ended with '\n' */
+    if (buf[ret - 1] == '\n') {
+        buf[ret - 1] = '\0';
+    }
+
+    if (strcmp(buf, "host-managed") == 0) {
+        return BLK_Z_HM;
+    } else if (strcmp(buf, "host-aware") == 0) {
+        return BLK_Z_HA;
+    } else {
+        return -ENOTSUP;
+    }
+
+out:
+    if (sysfd != -1) {
+        close(sysfd);
+    }
+    g_free(sysfspath);
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static int hdev_get_max_segments(int fd, struct stat *st) {
     return get_sysfs_long_val(fd, st, "max_segments");
 }
@@ -1885,6 +1944,7 @@ static int handle_aiocb_zone_mgmt(void *opaque) {
     int64_t len = aiocb->aio_nbytes;
     zone_op op = aiocb->zone_mgmt.op;
 
+    zone_model mod;
     struct blk_zone_range range;
     const char *ioctl_name;
     unsigned long ioctl_op;
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 78cddeeda5..35e00afe8e 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -56,8 +56,8 @@ typedef enum zone_op {
 } zone_op;
 
 typedef enum zone_model {
-    BLK_Z_HM,
-    BLK_Z_HA,
+    BLK_Z_HM = 0x1,
+    BLK_Z_HA = 0x2,
 } zone_model;
 
 typedef enum BlkZoneCondition {
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 5/9] qemu-iotests: test new zone operations.
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (3 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-27 14:34   ` Stefan Hajnoczi
  2022-07-12  2:13 ` [RFC v4 6/9] raw-format: add " Sam Li
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

We have added new block layer APIs of zoned block devices. Test it with:
(1) Create a null_blk device, run each zone operation on it and see
whether reporting right zone information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 tests/qemu-iotests/tests/zoned.sh | 69 +++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
new file mode 100755
index 0000000000..e14a3a420e
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+  _cleanup_test_img
+  sudo rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+QEMU_IO="build/qemu-io"
+IMG="--image-opts driver=zoned_host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device"
+echo "Simple cases: if the operations work"
+sudo modprobe null_blk nr_devices=1 zoned=1
+# hidden issues:
+# 1. memory allocation error of "unaligned tcache chunk detected" when the nr_zone=1 in zone report
+# 2. qemu-io: after report 10 zones, the program failed at double free error and exited.
+echo "report the first zone"
+sudo $QEMU_IO $IMG -c "zr 0 0 1"
+echo "report: the first 10 zones"
+sudo $QEMU_IO $IMG -c "zr 0 0 10"
+
+echo "open the first zone"
+sudo $QEMU_IO $IMG -c "zo 0 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zr 0 0 1"
+echo "open the last zone"
+sudo $QEMU_IO $IMG -c "zo 0x3e70000000 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zr 0x3e70000000 0 2"
+
+echo "close the first zone"
+sudo $QEMU_IO $IMG -c "zc 0 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zr 0 0 1"
+echo "close the last zone"
+sudo $QEMU_IO $IMG -c "zc 0x3e70000000 0x80000"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zr 0x3e70000000 0 2"
+
+
+echo "reset the second zone"
+sudo $QEMU_IO $IMG -c "zrs 0x80000 0x80000"
+echo "After resetting a zone:"
+sudo $QEMU_IO $IMG -c "zr 0x80000 0 5"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 6/9] raw-format: add zone operations
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (4 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 5/9] qemu-iotests: test new zone operations Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-27 14:39   ` Stefan Hajnoczi
  2022-07-12  2:13 ` [RFC v4 7/9] config: add check to block layer Sam Li
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/raw-format.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 69fd650eaf..96bdb6c1e2 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
     return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                           int64_t *nr_zones,
+                                           BlockZoneDescriptor *zones) {
+    return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
+}
+
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
+                                         int64_t offset, int64_t len) {
+    return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
+}
+
 static int64_t raw_getlength(BlockDriverState *bs)
 {
     int64_t len;
@@ -614,6 +625,8 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pwritev      = &raw_co_pwritev,
     .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
     .bdrv_co_pdiscard     = &raw_co_pdiscard,
+    .bdrv_co_zone_report  = &raw_co_zone_report,
+    .bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
     .bdrv_co_block_status = &raw_co_block_status,
     .bdrv_co_copy_range_from = &raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 7/9] config: add check to block layer
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (5 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 6/9] raw-format: add " Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-27 14:50   ` Stefan Hajnoczi
  2022-07-12  2:13 ` [RFC v4 8/9] include: add support for zoned block devices Sam Li
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block.c                          | 7 +++++++
 block/file-posix.c               | 2 ++
 include/block/block_int-common.h | 5 +++++
 3 files changed, 14 insertions(+)

diff --git a/block.c b/block.c
index 2c00dddd80..0e24582c7d 100644
--- a/block.c
+++ b/block.c
@@ -7945,6 +7945,13 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
         return;
     }
 
+    if (parent_bs->drv->is_zoned != child_bs->drv->is_zoned) {
+        error_setg(errp, "Cannot add a %s child to a %s parent",
+                   child_bs->drv->is_zoned ? "zoned" : "non-zoned",
+                   parent_bs->drv->is_zoned ? "zoned" : "non-zoned");
+        return;
+    }
+
     if (!QLIST_EMPTY(&child_bs->parents)) {
         error_setg(errp, "The node %s already has a parent",
                    child_bs->node_name);
diff --git a/block/file-posix.c b/block/file-posix.c
index 42708012ff..e9ad1d8e1e 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3924,6 +3924,7 @@ static BlockDriver bdrv_host_device = {
     .format_name        = "host_device",
     .protocol_name        = "host_device",
     .instance_size      = sizeof(BDRVRawState),
+    .is_zoned = false,
     .bdrv_needs_filename = true,
     .bdrv_probe_device  = hdev_probe_device,
     .bdrv_parse_filename = hdev_parse_filename,
@@ -3971,6 +3972,7 @@ static BlockDriver bdrv_zoned_host_device = {
         .format_name = "zoned_host_device",
         .protocol_name = "zoned_host_device",
         .instance_size = sizeof(BDRVRawState),
+        .is_zoned = true,
         .bdrv_needs_filename = true,
         .bdrv_probe_device  = hdev_probe_device,
         .bdrv_parse_filename = hdev_parse_filename,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 6037871089..29f1ec9184 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -141,6 +141,11 @@ struct BlockDriver {
      */
     bool is_format;
 
+    /*
+     * Set to true if the BlockDriver is a zoned block driver.
+     */
+    bool is_zoned;
+
     /*
      * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
      * this field set to true, except ones that are defined only by their
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 8/9] include: add support for zoned block devices
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (6 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 7/9] config: add check to block layer Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-27 14:52   ` Stefan Hajnoczi
  2022-07-12  2:13 ` [RFC v4 9/9] qapi: add support for zoned host device Sam Li
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

This is the virtio_blk.h header file from Dmitry's "virtio-blk: add
support for zoned block devices" patch. It introduces
virtio_blk_zoned_characteristics struct from Dmitry's virtio-blk zoned
storage spec.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 include/standard-headers/linux/virtio_blk.h | 157 ++++++++++++++++++--
 1 file changed, 141 insertions(+), 16 deletions(-)

diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
index 2dcc90826a..f07fbe1b9b 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -25,10 +25,10 @@
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE. */
-#include "standard-headers/linux/types.h"
-#include "standard-headers/linux/virtio_ids.h"
-#include "standard-headers/linux/virtio_config.h"
-#include "standard-headers/linux/virtio_types.h"
+#include <linux/types.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
 
 /* Feature bits */
 #define VIRTIO_BLK_F_SIZE_MAX	1	/* Indicates maximum segment size */
@@ -40,6 +40,7 @@
 #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
 #define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
 #define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
+#define VIRTIO_BLK_F_ZONED		17	/* Zoned block device */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -47,8 +48,10 @@
 #define VIRTIO_BLK_F_SCSI	7	/* Supports scsi command passthru */
 #define VIRTIO_BLK_F_FLUSH	9	/* Flush command supported */
 #define VIRTIO_BLK_F_CONFIG_WCE	11	/* Writeback mode available in config */
+#ifndef __KERNEL__
 /* Old (deprecated) name for VIRTIO_BLK_F_FLUSH. */
 #define VIRTIO_BLK_F_WCE VIRTIO_BLK_F_FLUSH
+#endif
 #endif /* !VIRTIO_BLK_NO_LEGACY */
 
 #define VIRTIO_BLK_ID_BYTES	20	/* ID string length */
@@ -63,8 +66,8 @@ struct virtio_blk_config {
 	/* geometry of the device (if VIRTIO_BLK_F_GEOMETRY) */
 	struct virtio_blk_geometry {
 		__virtio16 cylinders;
-		uint8_t heads;
-		uint8_t sectors;
+		__u8 heads;
+		__u8 sectors;
 	} geometry;
 
 	/* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */
@@ -72,17 +75,17 @@ struct virtio_blk_config {
 
 	/* the next 4 entries are guarded by VIRTIO_BLK_F_TOPOLOGY  */
 	/* exponent for physical block per logical block. */
-	uint8_t physical_block_exp;
+	__u8 physical_block_exp;
 	/* alignment offset in logical blocks. */
-	uint8_t alignment_offset;
+	__u8 alignment_offset;
 	/* minimum I/O size without performance penalty in logical blocks. */
 	__virtio16 min_io_size;
 	/* optimal sustained I/O size in logical blocks. */
 	__virtio32 opt_io_size;
 
 	/* writeback mode (if VIRTIO_BLK_F_CONFIG_WCE) */
-	uint8_t wce;
-	uint8_t unused;
+	__u8 wce;
+	__u8 unused;
 
 	/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
 	__virtio16 num_queues;
@@ -116,10 +119,24 @@ struct virtio_blk_config {
 	 * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the
 	 * deallocation of one or more of the sectors.
 	 */
-	uint8_t write_zeroes_may_unmap;
+	__u8 write_zeroes_may_unmap;
 
-	uint8_t unused1[3];
-} QEMU_PACKED;
+	__u8 unused1[3];
+
+	/* Secure erase fields that are defined in the virtio spec */
+	__u8 sec_erase[12];
+
+	/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
+	struct virtio_blk_zoned_characteristics {
+		__virtio32 zone_sectors;
+		__virtio32 max_open_zones;
+		__virtio32 max_active_zones;
+		__virtio32 max_append_sectors;
+		__virtio32 write_granularity;
+		__u8 model;
+		__u8 unused2[3];
+	} zoned;
+} __attribute__((packed));
 
 /*
  * Command types
@@ -153,6 +170,24 @@ struct virtio_blk_config {
 /* Write zeroes command */
 #define VIRTIO_BLK_T_WRITE_ZEROES	13
 
+/* Zone append command */
+#define VIRTIO_BLK_T_ZONE_APPEND    15
+
+/* Report zones command */
+#define VIRTIO_BLK_T_ZONE_REPORT    16
+
+/* Open zone command */
+#define VIRTIO_BLK_T_ZONE_OPEN      18
+
+/* Close zone command */
+#define VIRTIO_BLK_T_ZONE_CLOSE     20
+
+/* Finish zone command */
+#define VIRTIO_BLK_T_ZONE_FINISH    22
+
+/* Reset zone command */
+#define VIRTIO_BLK_T_ZONE_RESET     24
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x80000000
@@ -172,17 +207,100 @@ struct virtio_blk_outhdr {
 	__virtio64 sector;
 };
 
+/*
+ * Supported zoned device models.
+ */
+
+/* Host-managed zoned device */
+#define VIRTIO_BLK_Z_HM        1
+/* Host-aware zoned device */
+#define VIRTIO_BLK_Z_HA        2
+
+/* ZBD Management Out ALL flag */
+#define VIRTIO_BLK_ZONED_FLAG_ALL	(1 << 0)
+
+/*
+ * Request header for zoned devices.
+ * The first three fields are identical in layout to
+ * struct virtio_blk_outhdr.
+ */
+struct virtio_blk_zoned_outhdr {
+	/* VIRTIO_BLK_T* */
+	__virtio32 type;
+	/* io priority. */
+	__virtio32 ioprio;
+	/* Sector (ie. 512 byte offset) */
+	__virtio64 sector;
+	/* Zoned request flags */
+	__virtio32 flags;
+};
+
+/*
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
+ */
+struct virtio_blk_zone_descriptor {
+	/* Zone capacity */
+	__virtio64 z_cap;
+	/* The starting sector of the zone */
+	__virtio64 z_start;
+	/* Zone write pointer position in sectors */
+	__virtio64 z_wp;
+	/* Zone type */
+	__u8 z_type;
+	/* Zone state */
+	__u8 z_state;
+	__u8 reserved[38];
+};
+
+struct virtio_blk_zone_report {
+	__virtio64 nr_zones;
+	__u8 reserved[56];
+	struct virtio_blk_zone_descriptor zones[];
+};
+
+/*
+ * Supported zone types.
+ */
+
+/* Conventional zone */
+#define VIRTIO_BLK_ZT_CONV         1
+/* Sequential Write Required zone */
+#define VIRTIO_BLK_ZT_SWR          2
+/* Sequential Write Preferred zone */
+#define VIRTIO_BLK_ZT_SWP          3
+
+/*
+ * Zone states that are available for zones of all types.
+ */
+
+/* Not a write pointer (conventional zones only) */
+#define VIRTIO_BLK_ZS_NOT_WP       0
+/* Empty */
+#define VIRTIO_BLK_ZS_EMPTY        1
+/* Implicitly Open */
+#define VIRTIO_BLK_ZS_IOPEN        2
+/* Explicitly Open */
+#define VIRTIO_BLK_ZS_EOPEN        3
+/* Closed */
+#define VIRTIO_BLK_ZS_CLOSED       4
+/* Read-Only */
+#define VIRTIO_BLK_ZS_RDONLY       13
+/* Full */
+#define VIRTIO_BLK_ZS_FULL         14
+/* Offline */
+#define VIRTIO_BLK_ZS_OFFLINE      15
+
 /* Unmap this range (only valid for write zeroes command) */
 #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP	0x00000001
 
 /* Discard/write zeroes range for each request. */
 struct virtio_blk_discard_write_zeroes {
 	/* discard/write zeroes start sector */
-	uint64_t sector;
+	__le64 sector;
 	/* number of discard/write zeroes sectors */
-	uint32_t num_sectors;
+	__le32 num_sectors;
 	/* flags for this range */
-	uint32_t flags;
+	__le32 flags;
 };
 
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -198,4 +316,11 @@ struct virtio_scsi_inhdr {
 #define VIRTIO_BLK_S_OK		0
 #define VIRTIO_BLK_S_IOERR	1
 #define VIRTIO_BLK_S_UNSUPP	2
+
+/* Error codes that are specific to zoned block devices */
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD     3
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP    4
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE   5
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
+
 #endif /* _LINUX_VIRTIO_BLK_H */
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC v4 9/9] qapi: add support for zoned host device
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (7 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 8/9] include: add support for zoned block devices Sam Li
@ 2022-07-12  2:13 ` Sam Li
  2022-07-12  7:48   ` Damien Le Moal
  2022-07-27 14:53   ` Stefan Hajnoczi
  2022-07-12  5:47 ` [RFC v4 0/9] Add support for zoned device Markus Armbruster
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  2:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng, hare, Sam Li

---
 block/file-posix.c   | 8 +++++++-
 qapi/block-core.json | 7 +++++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index e9ad1d8e1e..4e0aa02acf 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3737,6 +3737,12 @@ static void hdev_parse_filename(const char *filename, QDict *options,
     bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
 }
 
+static void zoned_host_device_parse_filename(const char *filename, QDict *options,
+                                Error **errp)
+{
+    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
+}
+
 static bool hdev_is_sg(BlockDriverState *bs)
 {
 
@@ -3975,7 +3981,7 @@ static BlockDriver bdrv_zoned_host_device = {
         .is_zoned = true,
         .bdrv_needs_filename = true,
         .bdrv_probe_device  = hdev_probe_device,
-        .bdrv_parse_filename = hdev_parse_filename,
+        .bdrv_parse_filename = zoned_host_device_parse_filename,
         .bdrv_file_open     = hdev_open,
         .bdrv_close         = raw_close,
         .bdrv_reopen_prepare = raw_reopen_prepare,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2173e7734a..ab05c2ef99 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2955,7 +2955,8 @@
             'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
             'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
-            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
+            { 'name': 'zoned_host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' } ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -4329,7 +4330,9 @@
       'vhdx':       'BlockdevOptionsGenericFormat',
       'vmdk':       'BlockdevOptionsGenericCOWFormat',
       'vpc':        'BlockdevOptionsGenericFormat',
-      'vvfat':      'BlockdevOptionsVVFAT'
+      'vvfat':      'BlockdevOptionsVVFAT',
+      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
+                             'if': 'HAVE_HOST_BLOCK_DEVICE' }
   } }
 
 ##
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC v4 0/9] Add support for zoned device
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (8 preceding siblings ...)
  2022-07-12  2:13 ` [RFC v4 9/9] qapi: add support for zoned host device Sam Li
@ 2022-07-12  5:47 ` Markus Armbruster
  2022-07-12  5:59   ` Sam Li
  2022-07-27 14:55 ` Stefan Hajnoczi
  2022-07-27 15:06 ` Stefan Hajnoczi
  11 siblings, 1 reply; 42+ messages in thread
From: Markus Armbruster @ 2022-07-12  5:47 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, damien.lemoal, dmitry.fomichev, Stefan Hajnoczi,
	Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

Sam Li <faithilikerun@gmail.com> writes:

> This patch series adds support for zoned device to virtio-blk emulation. Zoned
> Storage can support sequential writes, which reduces write amplification in SSD,
> leading to higher write throughput and increased capacity.

Forgive me if this has already been discussed, or is explained deeper in
the patch series...

The commit message sounds like you're extending virtio-blk to optionally
emulate zoned storage.  Correct?

PATCH 1 adds a new block block device driver 'zoned_host_device', and
PATCH 9 exposes it in QAPI.  This is for passing through a zoned host
device, correct?



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 0/9] Add support for zoned device
  2022-07-12  5:47 ` [RFC v4 0/9] Add support for zoned device Markus Armbruster
@ 2022-07-12  5:59   ` Sam Li
  2022-07-18 10:53     ` Markus Armbruster
  0 siblings, 1 reply; 42+ messages in thread
From: Sam Li @ 2022-07-12  5:59 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Damien Le Moal, Dmitry Fomichev, Stefan Hajnoczi,
	Hanna Reitz, qemu block, Eric Blake, Kevin Wolf, Fam Zheng,
	Hannes Reinecke

Markus Armbruster <armbru@redhat.com> 于2022年7月12日周二 13:47写道:
>
> Sam Li <faithilikerun@gmail.com> writes:
>
> > This patch series adds support for zoned device to virtio-blk emulation. Zoned
> > Storage can support sequential writes, which reduces write amplification in SSD,
> > leading to higher write throughput and increased capacity.
>
> Forgive me if this has already been discussed, or is explained deeper in
> the patch series...
>
> The commit message sounds like you're extending virtio-blk to optionally
> emulate zoned storage.  Correct?

Yes! The main purpose is to emulate zoned storage only for the zoned
device files. Right now, QEMU sees those as regular block devices.

> PATCH 1 adds a new block block device driver 'zoned_host_device', and
> PATCH 9 exposes it in QAPI.  This is for passing through a zoned host
> device, correct?

Yes! It allows the guest os see zoned host device. It is still in
development. Maybe the implementations will change later.


Best regards,
Sam


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
@ 2022-07-12  6:10   ` Hannes Reinecke
  2022-07-12  6:17     ` Sam Li
  2022-07-12  7:35   ` Damien Le Moal
  2022-07-12 15:49   ` Stefan Hajnoczi
  2 siblings, 1 reply; 42+ messages in thread
From: Hannes Reinecke @ 2022-07-12  6:10 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng

On 7/12/22 04:13, Sam Li wrote:
> By adding zone management operations in BlockDriver, storage
> controller emulation can use the new block layer APIs including
> zone_report and zone_mgmt(open, close, finish, reset).
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>   block/block-backend.c            |  41 ++++++
>   block/coroutines.h               |   5 +
>   block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
>   include/block/block-common.h     |  43 +++++-
>   include/block/block_int-common.h |  20 +++
>   5 files changed, 344 insertions(+), 1 deletion(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f425b00793..0a05247ae4 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
>       return ret;
>   }
>   
> +/*
> + * Send a zone_report command.
> + * offset can be any number within the zone size. No alignment for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    int64_t *nr_zones,
> +                                    BlockZoneDescriptor *zones)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk); /* increase before waiting */
> +    blk_wait_while_drained(blk);
> +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * Offset is the start of a zone and len is aligned to zones.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> +        int64_t offset, int64_t len)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    ret = blk_check_byte_request(blk, offset, len);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>   void blk_drain(BlockBackend *blk)
>   {
>       BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/coroutines.h b/block/coroutines.h
> index 830ecaa733..19aa96cc56 100644
> --- a/block/coroutines.h
> +++ b/block/coroutines.h
> @@ -80,6 +80,11 @@ int coroutine_fn
>   blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
>   
>   int coroutine_fn blk_co_do_flush(BlockBackend *blk);
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    int64_t *nr_zones,
> +                                    BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> +                                  int64_t offset, int64_t len);
>   
>   
>   /*
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 48cd096624..e7523ae2ed 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,7 @@
>   #include <sys/param.h>
>   #include <sys/syscall.h>
>   #include <sys/vfs.h>
> +#include <linux/blkzoned.h>
>   #include <linux/cdrom.h>
>   #include <linux/fd.h>
>   #include <linux/fs.h>
> @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
>               PreallocMode prealloc;
>               Error **errp;
>           } truncate;
> +        struct {
> +            int64_t *nr_zones;

Why is this a pointer?
I'd rather use a number here, seeing that it's the number
of zones in the *zones array ...

But the remainder looks good.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 2/9] qemu-io: add zoned block device operations.
  2022-07-12  2:13 ` [RFC v4 2/9] qemu-io: add zoned block device operations Sam Li
@ 2022-07-12  6:14   ` Hannes Reinecke
  2022-07-12  7:44   ` Damien Le Moal
  2022-07-27 14:13   ` Stefan Hajnoczi
  2 siblings, 0 replies; 42+ messages in thread
From: Hannes Reinecke @ 2022-07-12  6:14 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng

On 7/12/22 04:13, Sam Li wrote:
> Add zoned storage commands of the device: zone_open(zo), zone_close(zc),
> zone_reset(zs), zone_report(zp), zone_finish(zf).
> 
> For example, it can be called by:
> ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zone_report 0 0 1"
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>   block/io.c               |  57 ++++++++++++++++
>   include/block/block-io.h |  13 ++++
>   qemu-io-cmds.c           | 143 +++++++++++++++++++++++++++++++++++++++
>   3 files changed, 213 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 1e9bf09a49..a760be0131 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3243,6 +3243,63 @@ out:
>       return co.ret;
>   }
>   
> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                        int64_t *nr_zones,
> +                        BlockZoneDescriptor *zones)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || (!drv->bdrv_co_zone_report)) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +
> +    if (drv->bdrv_co_zone_report) {
> +        co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> +    } else {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +        qemu_coroutine_yield();
> +    }
> +
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
> +int bdrv_co_zone_mgmt(BlockDriverState *bs, enum zone_op op,
> +        int64_t offset, int64_t len)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || (!drv->bdrv_co_zone_mgmt)) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +
> +    if (drv->bdrv_co_zone_mgmt) {
> +        co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> +    } else {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +        qemu_coroutine_yield();
> +    }
> +
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
>   void *qemu_blockalign(BlockDriverState *bs, size_t size)
>   {
>       IO_CODE();
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index 053a27141a..a0ae140452 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -80,6 +80,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>   /* Ensure contents are flushed to disk.  */
>   int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>   
> +/* Report zone information of zone block device. */
> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                     int64_t *nr_zones,
> +                                     BlockZoneDescriptor *zones);
> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> +                                   int64_t offset, int64_t len);
> +
>   int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>   bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>   int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> @@ -289,6 +296,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>   int generated_co_wrapper
>   bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>   
> +int generated_co_wrapper
> +blk_zone_report(BlockBackend *blk, int64_t offset, int64_t *nr_zones,
> +                BlockZoneDescriptor *zones);
> +int generated_co_wrapper
> +blk_zone_mgmt(BlockBackend *blk, enum zone_op op, int64_t offset, int64_t len);
> +
>   /**
>    * bdrv_parent_drained_begin_single:
>    *
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 2f0d8ac25a..a88fa322d2 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1706,6 +1706,144 @@ static const cmdinfo_t flush_cmd = {
>       .oneline    = "flush all in-core file state to disk",
>   };
>   
> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, nr_zones;
> +
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    nr_zones = cvtnum(argv[optind]);
> +
> +    g_autofree BlockZoneDescriptor *zones = NULL;
> +    zones = g_new(BlockZoneDescriptor, nr_zones);
> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> +    if (ret < 0) {
> +        printf("zone report failed: %s\n", strerror(-ret));
> +    } else {
> +        for (int i = 0; i < nr_zones; ++i) {
> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> +                   "zcond:%u, [type: %u]\n",
> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> +                   zones[i].cond, zones[i].type);
> +        }
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_report_cmd = {
> +        .name = "zone_report",
> +        .altname = "zp",
> +        .cfunc = zone_report_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset number",
> +        .oneline = "report zone information",
> +};
> +
'zp' is a bit odd; I'd rather go with 'zrp'.

Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute
  2022-07-12  2:13 ` [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute Sam Li
@ 2022-07-12  6:16   ` Hannes Reinecke
  2022-07-12  7:37   ` Damien Le Moal
  1 sibling, 0 replies; 42+ messages in thread
From: Hannes Reinecke @ 2022-07-12  6:16 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng

On 7/12/22 04:13, Sam Li wrote:
> Use sysfs attribute files to get the zoned device information in case
> that ioctl() commands of zone management interface won't work.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>   block/file-posix.c | 38 +++++++++++++++++++++++++++-----------
>   1 file changed, 27 insertions(+), 11 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12  6:10   ` Hannes Reinecke
@ 2022-07-12  6:17     ` Sam Li
  0 siblings, 0 replies; 42+ messages in thread
From: Sam Li @ 2022-07-12  6:17 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng

Hannes Reinecke <hare@suse.de> 于2022年7月12日周二 14:10写道:
>
> On 7/12/22 04:13, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage
> > controller emulation can use the new block layer APIs including
> > zone_report and zone_mgmt(open, close, finish, reset).
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >   block/block-backend.c            |  41 ++++++
> >   block/coroutines.h               |   5 +
> >   block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
> >   include/block/block-common.h     |  43 +++++-
> >   include/block/block_int-common.h |  20 +++
> >   5 files changed, 344 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index f425b00793..0a05247ae4 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
> >       return ret;
> >   }
> >
> > +/*
> > + * Send a zone_report command.
> > + * offset can be any number within the zone size. No alignment for offset.
> > + * nr_zones represents IN maximum and OUT actual.
> > + */
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    int64_t *nr_zones,
> > +                                    BlockZoneDescriptor *zones)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk); /* increase before waiting */
> > +    blk_wait_while_drained(blk);
> > +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> > +/*
> > + * Send a zone_management command.
> > + * Offset is the start of a zone and len is aligned to zones.
> > + */
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> > +        int64_t offset, int64_t len)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk);
> > +    blk_wait_while_drained(blk);
> > +    ret = blk_check_byte_request(blk, offset, len);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> >   void blk_drain(BlockBackend *blk)
> >   {
> >       BlockDriverState *bs = blk_bs(blk);
> > diff --git a/block/coroutines.h b/block/coroutines.h
> > index 830ecaa733..19aa96cc56 100644
> > --- a/block/coroutines.h
> > +++ b/block/coroutines.h
> > @@ -80,6 +80,11 @@ int coroutine_fn
> >   blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
> >
> >   int coroutine_fn blk_co_do_flush(BlockBackend *blk);
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    int64_t *nr_zones,
> > +                                    BlockZoneDescriptor *zones);
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> > +                                  int64_t offset, int64_t len);
> >
> >
> >   /*
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 48cd096624..e7523ae2ed 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -67,6 +67,7 @@
> >   #include <sys/param.h>
> >   #include <sys/syscall.h>
> >   #include <sys/vfs.h>
> > +#include <linux/blkzoned.h>
> >   #include <linux/cdrom.h>
> >   #include <linux/fd.h>
> >   #include <linux/fs.h>
> > @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
> >               PreallocMode prealloc;
> >               Error **errp;
> >           } truncate;
> > +        struct {
> > +            int64_t *nr_zones;
>
> Why is this a pointer?
> I'd rather use a number here, seeing that it's the number
> of zones in the *zones array ...

I see. The pointer is a little redundant. Will change it.

> But the remainder looks good.

Thanks for reviewing!


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model.
  2022-07-12  2:13 ` [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
@ 2022-07-12  6:17   ` Hannes Reinecke
  2022-07-12  6:35     ` Damien Le Moal
  2022-07-12  7:42   ` Damien Le Moal
  1 sibling, 1 reply; 42+ messages in thread
From: Hannes Reinecke @ 2022-07-12  6:17 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf,
	Fam Zheng

On 7/12/22 04:13, Sam Li wrote:
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>   block/file-posix.c           | 60 ++++++++++++++++++++++++++++++++++++
>   include/block/block-common.h |  4 +--
>   2 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 3161d39ea4..42708012ff 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1279,6 +1279,65 @@ out:
>   #endif
>   }
>   
> +/*
> + * Convert the zoned attribute file in sysfs to internal value.
> + */
> +static zone_model get_sysfs_str_val(int fd, struct stat *st) {
> +#ifdef CONFIG_LINUX
> +    char buf[32];
> +    char *sysfspath = NULL;
> +    int ret, offset;
> +    int sysfd = -1;
> +
> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;
> +        }
> +        return -ENOTSUP;
> +    }
> +
> +    if (!S_ISBLK(st->st_mode)) {
> +        return -ENOTSUP;
> +    }
> +
> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/zoned",
> +                                major(st->st_rdev), minor(st->st_rdev));
> +    sysfd = open(sysfspath, O_RDONLY);
> +    if (sysfd == -1) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    offset = 0;
> +    do {
> +        ret = read(sysfd, buf + offset, sizeof(buf) - 1 + offset);
> +        if (ret > 0) {
> +            offset += ret;
> +        }
> +    } while (ret == -1);
> +    /* The file is ended with '\n' */
> +    if (buf[ret - 1] == '\n') {
> +        buf[ret - 1] = '\0';
> +    }
> +
> +    if (strcmp(buf, "host-managed") == 0) {
> +        return BLK_Z_HM;
> +    } else if (strcmp(buf, "host-aware") == 0) {
> +        return BLK_Z_HA;
> +    } else {
> +        return -ENOTSUP;
> +    }
> +
> +out:
> +    if (sysfd != -1) {
> +        close(sysfd);
> +    }
> +    g_free(sysfspath);
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>   static int hdev_get_max_segments(int fd, struct stat *st) {
>       return get_sysfs_long_val(fd, st, "max_segments");
>   }
> @@ -1885,6 +1944,7 @@ static int handle_aiocb_zone_mgmt(void *opaque) {
>       int64_t len = aiocb->aio_nbytes;
>       zone_op op = aiocb->zone_mgmt.op;
>   
> +    zone_model mod;
>       struct blk_zone_range range;
>       const char *ioctl_name;
>       unsigned long ioctl_op;
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 78cddeeda5..35e00afe8e 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -56,8 +56,8 @@ typedef enum zone_op {
>   } zone_op;
>   
>   typedef enum zone_model {
> -    BLK_Z_HM,
> -    BLK_Z_HA,
> +    BLK_Z_HM = 0x1,
> +    BLK_Z_HA = 0x2,
>   } zone_model;
>   
>   typedef enum BlkZoneCondition {

This hunk is unrelated, please move it into a different patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model.
  2022-07-12  6:17   ` Hannes Reinecke
@ 2022-07-12  6:35     ` Damien Le Moal
  0 siblings, 0 replies; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12  6:35 UTC (permalink / raw)
  To: Hannes Reinecke, Sam Li, qemu-devel
  Cc: Markus Armbruster, dmitry.fomichev, Stefan Hajnoczi, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng

On 7/12/22 15:17, Hannes Reinecke wrote:
> On 7/12/22 04:13, Sam Li wrote:
>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>> ---
>>   block/file-posix.c           | 60 ++++++++++++++++++++++++++++++++++++
>>   include/block/block-common.h |  4 +--
>>   2 files changed, 62 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 3161d39ea4..42708012ff 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -1279,6 +1279,65 @@ out:
>>   #endif
>>   }
>>   
>> +/*
>> + * Convert the zoned attribute file in sysfs to internal value.
>> + */
>> +static zone_model get_sysfs_str_val(int fd, struct stat *st) {
>> +#ifdef CONFIG_LINUX
>> +    char buf[32];
>> +    char *sysfspath = NULL;
>> +    int ret, offset;
>> +    int sysfd = -1;
>> +
>> +    if (S_ISCHR(st->st_mode)) {
>> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
>> +            return ret;
>> +        }
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if (!S_ISBLK(st->st_mode)) {
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/zoned",
>> +                                major(st->st_rdev), minor(st->st_rdev));
>> +    sysfd = open(sysfspath, O_RDONLY);
>> +    if (sysfd == -1) {
>> +        ret = -errno;
>> +        goto out;
>> +    }
>> +    offset = 0;
>> +    do {
>> +        ret = read(sysfd, buf + offset, sizeof(buf) - 1 + offset);
>> +        if (ret > 0) {
>> +            offset += ret;
>> +        }
>> +    } while (ret == -1);
>> +    /* The file is ended with '\n' */
>> +    if (buf[ret - 1] == '\n') {
>> +        buf[ret - 1] = '\0';
>> +    }
>> +
>> +    if (strcmp(buf, "host-managed") == 0) {
>> +        return BLK_Z_HM;
>> +    } else if (strcmp(buf, "host-aware") == 0) {
>> +        return BLK_Z_HA;
>> +    } else {
>> +        return -ENOTSUP;
>> +    }
>> +
>> +out:
>> +    if (sysfd != -1) {
>> +        close(sysfd);
>> +    }
>> +    g_free(sysfspath);
>> +    return ret;
>> +#else
>> +    return -ENOTSUP;
>> +#endif
>> +}
>> +
>>   static int hdev_get_max_segments(int fd, struct stat *st) {
>>       return get_sysfs_long_val(fd, st, "max_segments");
>>   }
>> @@ -1885,6 +1944,7 @@ static int handle_aiocb_zone_mgmt(void *opaque) {
>>       int64_t len = aiocb->aio_nbytes;
>>       zone_op op = aiocb->zone_mgmt.op;
>>   
>> +    zone_model mod;
>>       struct blk_zone_range range;
>>       const char *ioctl_name;
>>       unsigned long ioctl_op;
>> diff --git a/include/block/block-common.h b/include/block/block-common.h
>> index 78cddeeda5..35e00afe8e 100644
>> --- a/include/block/block-common.h
>> +++ b/include/block/block-common.h
>> @@ -56,8 +56,8 @@ typedef enum zone_op {
>>   } zone_op;
>>   
>>   typedef enum zone_model {
>> -    BLK_Z_HM,
>> -    BLK_Z_HA,
>> +    BLK_Z_HM = 0x1,
>> +    BLK_Z_HA = 0x2,
>>   } zone_model;
>>   
>>   typedef enum BlkZoneCondition {
> 
> This hunk is unrelated, please move it into a different patch.

This actually belong to patch 1 where this enum is introduced. No need for
a different patch.

> 
> Cheers,
> 
> Hannes


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
  2022-07-12  6:10   ` Hannes Reinecke
@ 2022-07-12  7:35   ` Damien Le Moal
  2022-07-13  0:54     ` Sam Li
  2022-07-12 15:49   ` Stefan Hajnoczi
  2 siblings, 1 reply; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12  7:35 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: Markus Armbruster, dmitry.fomichev, Stefan Hajnoczi, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

On 7/12/22 11:13, Sam Li wrote:
> By adding zone management operations in BlockDriver, storage
> controller emulation can use the new block layer APIs including
> zone_report and zone_mgmt(open, close, finish, reset).
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/block-backend.c            |  41 ++++++
>  block/coroutines.h               |   5 +
>  block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
>  include/block/block-common.h     |  43 +++++-
>  include/block/block_int-common.h |  20 +++
>  5 files changed, 344 insertions(+), 1 deletion(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f425b00793..0a05247ae4 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
>      return ret;
>  }
>  
> +/*
> + * Send a zone_report command.
> + * offset can be any number within the zone size. No alignment for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    int64_t *nr_zones,
> +                                    BlockZoneDescriptor *zones)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk); /* increase before waiting */
> +    blk_wait_while_drained(blk);
> +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * Offset is the start of a zone and len is aligned to zones.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> +        int64_t offset, int64_t len)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    ret = blk_check_byte_request(blk, offset, len);
> +    if (ret < 0) {
> +        return ret;

You missed adding "blk_dec_in_flight(blk);" before return here. But I
think you can move the call to blk_check_byte_request() before
blk_inc_in_flight() to avoid having to call blk_dec_in_flight().

> +    }
> +
> +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/coroutines.h b/block/coroutines.h
> index 830ecaa733..19aa96cc56 100644
> --- a/block/coroutines.h
> +++ b/block/coroutines.h
> @@ -80,6 +80,11 @@ int coroutine_fn
>  blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
>  
>  int coroutine_fn blk_co_do_flush(BlockBackend *blk);
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    int64_t *nr_zones,
> +                                    BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> +                                  int64_t offset, int64_t len);
>  
>  
>  /*
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 48cd096624..e7523ae2ed 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,7 @@
>  #include <sys/param.h>
>  #include <sys/syscall.h>
>  #include <sys/vfs.h>
> +#include <linux/blkzoned.h>

You need to conditionally include this because not all kernels provide
this file. Old kernels will not have it. So you need something like:

#if defined(CONFIG_BLKZONED)
#include <linux/blkzoned.h>
#endif

And adding this to meson.build should do the trick:

diff --git a/meson.build b/meson.build
index 65a885ea69..31d8852a35 100644
--- a/meson.build
+++ b/meson.build
@@ -1869,6 +1869,7 @@ config_host_data.set('CONFIG_REPLICATION',
get_option('live_block_migration').al

 # has_header
 config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
 config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
 config_host_data.set('CONFIG_VALGRIND_H',
cc.has_header('valgrind/valgrind.h'))
 config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))

Then in build/config-host.h, you will see "#define CONFIG_BLKZONED". You
then can use "#if defined(CONFIG_BLKZONED)" to conditionally define the
code related to zoned devices.

To test all this, temporarily rename your host
/usr/include/linux/blkzoned.h file to some other name, configure qemu and
see if it compiles.

>  #include <linux/cdrom.h>
>  #include <linux/fd.h>
>  #include <linux/fs.h>
> @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
>              PreallocMode prealloc;
>              Error **errp;
>          } truncate;
> +        struct {
> +            int64_t *nr_zones;
> +            BlockZoneDescriptor *zones;
> +        } zone_report;
> +        struct {
> +            zone_op op;
> +        } zone_mgmt;
>      };
>  } RawPosixAIOData;
>  
> @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
>  }
>  #endif
>  
> +/*
> + * parse_zone - Fill a zone descriptor
> + */
> +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> +                              struct blk_zone *blkz) {
> +    zone->start = blkz->start;
> +    zone->length = blkz->len;
> +    zone->cap = blkz->capacity;
> +    zone->wp = blkz->wp - blkz->start;
> +    zone->type = blkz->type;
> +    zone->cond = blkz->cond;
> +}
> +
> +static int handle_aiocb_zone_report(void *opaque) {
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    int64_t *nr_zones = aiocb->zone_report.nr_zones;
> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> +    int64_t offset = aiocb->aio_offset;
> +
> +    struct blk_zone *blkz;
> +    int64_t rep_size, nrz;
> +    int ret, n = 0, i = 0;
> +
> +    nrz = *nr_zones;
> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> +    g_autofree struct blk_zone_report *rep = NULL;
> +    rep = g_malloc(rep_size);
> +    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
> +    printf("start to report zone with offset: 0x%lx\n", offset);
> +
> +    blkz = (struct blk_zone *)(rep + 1);
> +    while (n < nrz) {
> +        memset(rep, 0, rep_size);
> +        rep->sector = offset;
> +        rep->nr_zones = nrz;
> +
> +        ret = ioctl(fd, BLKREPORTZONE, rep);
> +        if (ret != 0) {
> +            ret = -errno;
> +            error_report("%d: ioctl BLKREPORTZONE at %ld failed %d",
> +                         fd, offset, errno);
> +            return ret;
> +        }
> +
> +        if (!rep->nr_zones) {
> +            break;
> +        }
> +
> +        for (i = 0; i < rep->nr_zones; i++, n++) {
> +            parse_zone(&zones[n], &blkz[i]);
> +            /* The next report should start after the last zone reported */
> +            offset = blkz[i].start + blkz[i].len;
> +        }
> +    }
> +
> +    *nr_zones = n;
> +    return 0;
> +}
> +
> +static int handle_aiocb_zone_mgmt(void *opaque) {
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    int64_t offset = aiocb->aio_offset;
> +    int64_t len = aiocb->aio_nbytes;
> +    zone_op op = aiocb->zone_mgmt.op;
> +
> +    struct blk_zone_range range;
> +    const char *ioctl_name;
> +    unsigned long ioctl_op;
> +    int64_t zone_size;
> +    int64_t zone_size_mask;
> +    int ret;
> +
> +    g_autofree struct stat *file = NULL;
> +    file = g_new(struct stat, 1);
> +    stat(s->filename, file);
> +    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
> +    zone_size_mask = zone_size - 1;
> +    if (offset & zone_size_mask) {
> +        error_report("offset is not the start of a zone");
> +        return -EINVAL;
> +    }
> +
> +    if (len & zone_size_mask) {
> +        error_report("len is not aligned to zones");
> +        return -EINVAL;
> +    }
> +
> +    switch (op) {
> +    case zone_open:
> +        ioctl_name = "BLKOPENZONE";
> +        ioctl_op = BLKOPENZONE;
> +        break;
> +    case zone_close:
> +        ioctl_name = "BLKCLOSEZONE";
> +        ioctl_op = BLKCLOSEZONE;
> +        break;
> +    case zone_finish:
> +        ioctl_name = "BLKFINISHZONE";
> +        ioctl_op = BLKFINISHZONE;
> +        break;
> +    case zone_reset:
> +        ioctl_name = "BLKRESETZONE";
> +        ioctl_op = BLKRESETZONE;
> +        break;
> +    default:
> +        error_report("Invalid zone operation 0x%x", op);
> +        return -EINVAL;
> +    }
> +
> +    /* Execute the operation */
> +    range.sector = offset;
> +    range.nr_sectors = len;
> +    ret = ioctl(fd, ioctl_op, &range);
> +    if (ret != 0) {
> +        error_report("ioctl %s failed %d",
> +                     ioctl_name, errno);
> +        return -errno;
> +    }
> +
> +    return 0;
> +}
> +
>  static int handle_aiocb_copy_range(void *opaque)
>  {
>      RawPosixAIOData *aiocb = opaque;
> @@ -2973,6 +3105,59 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
>      }
>  }
>  
> +/*
> + * zone report - Get a zone block device's information in the form
> + * of an array of zone descriptors.
> + *
> + * @param bs: passing zone block device file descriptor
> + * @param zones: an array of zone descriptors to hold zone
> + * information on reply
> + * @param offset: offset can be any byte within the zone size.
> + * @param len: (not sure yet.
> + * @return 0 on success, -1 on failure
> + */
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                           int64_t *nr_zones,
> +                                           BlockZoneDescriptor *zones) {
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +
> +    acb = (RawPosixAIOData) {
> +        .bs         = bs,
> +        .aio_fildes = s->fd,
> +        .aio_type   = QEMU_AIO_IOCTL,
> +        .aio_offset = offset,
> +        .zone_report    = {
> +                .nr_zones       = nr_zones,
> +                .zones          = zones,
> +        },
> +    };
> +
> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> +}
> +
> +/*
> + * zone management operations - Execute an operation on a zone
> + */
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> +        int64_t offset, int64_t len) {
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +
> +    acb = (RawPosixAIOData) {
> +        .bs             = bs,
> +        .aio_fildes     = s->fd,
> +        .aio_type       = QEMU_AIO_IOCTL,
> +        .aio_offset     = offset,
> +        .aio_nbytes     = len,
> +        .zone_mgmt  = {
> +                .op = op,
> +        },
> +    };
> +
> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> +}
> +
>  static coroutine_fn int
>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                  bool blkdev)
> @@ -3324,6 +3509,9 @@ BlockDriver bdrv_file = {
>      .bdrv_abort_perm_update = raw_abort_perm_update,
>      .create_opts = &raw_create_opts,
>      .mutable_opts = mutable_opts,
> +
> +    .bdrv_co_zone_report = raw_co_zone_report,
> +    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>  };
>  
>  /***********************************************/
> @@ -3703,6 +3891,53 @@ static BlockDriver bdrv_host_device = {
>  #endif
>  };
>  
> +static BlockDriver bdrv_zoned_host_device = {
> +        .format_name = "zoned_host_device",
> +        .protocol_name = "zoned_host_device",
> +        .instance_size = sizeof(BDRVRawState),
> +        .bdrv_needs_filename = true,
> +        .bdrv_probe_device  = hdev_probe_device,
> +        .bdrv_parse_filename = hdev_parse_filename,
> +        .bdrv_file_open     = hdev_open,
> +        .bdrv_close         = raw_close,
> +        .bdrv_reopen_prepare = raw_reopen_prepare,
> +        .bdrv_reopen_commit  = raw_reopen_commit,
> +        .bdrv_reopen_abort   = raw_reopen_abort,
> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> +        .create_opts         = &bdrv_create_opts_simple,
> +        .mutable_opts        = mutable_opts,
> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> +
> +        .bdrv_co_preadv         = raw_co_preadv,
> +        .bdrv_co_pwritev        = raw_co_pwritev,
> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> +        .bdrv_refresh_limits = raw_refresh_limits,
> +        .bdrv_io_plug = raw_aio_plug,
> +        .bdrv_io_unplug = raw_aio_unplug,
> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> +
> +        .bdrv_co_truncate       = raw_co_truncate,
> +        .bdrv_getlength = raw_getlength,
> +        .bdrv_get_info = raw_get_info,
> +        .bdrv_get_allocated_file_size
> +                            = raw_get_allocated_file_size,
> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> +        .bdrv_check_perm = raw_check_perm,
> +        .bdrv_set_perm   = raw_set_perm,
> +        .bdrv_abort_perm_update = raw_abort_perm_update,
> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> +        .bdrv_probe_geometry = hdev_probe_geometry,
> +        .bdrv_co_ioctl = hdev_co_ioctl,
> +
> +        /* zone management operations */
> +        .bdrv_co_zone_report = raw_co_zone_report,
> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +};
> +
>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>  static void cdrom_parse_filename(const char *filename, QDict *options,
>                                   Error **errp)
> @@ -3964,6 +4199,7 @@ static void bdrv_file_init(void)
>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>      bdrv_register(&bdrv_host_device);
>  #ifdef __linux__
> +    bdrv_register(&bdrv_zoned_host_device);
>      bdrv_register(&bdrv_host_cdrom);
>  #endif
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index fdb7306e78..78cddeeda5 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -23,7 +23,6 @@
>   */
>  #ifndef BLOCK_COMMON_H
>  #define BLOCK_COMMON_H
> -
>  #include "block/aio.h"
>  #include "block/aio-wait.h"
>  #include "qemu/iov.h"
> @@ -49,6 +48,48 @@ typedef struct BlockDriver BlockDriver;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildClass BdrvChildClass;
>  
> +typedef enum zone_op {
> +    zone_open,
> +    zone_close,
> +    zone_finish,
> +    zone_reset,
> +} zone_op;
> +
> +typedef enum zone_model {
> +    BLK_Z_HM,
> +    BLK_Z_HA,
> +} zone_model;
> +
> +typedef enum BlkZoneCondition {
> +    BLK_ZS_NOT_WP = 0x0,
> +    BLK_ZS_EMPTY = 0x1,
> +    BLK_ZS_IOPEN = 0x2,
> +    BLK_ZS_EOPEN = 0x3,
> +    BLK_ZS_CLOSED = 0x4,
> +    BLK_ZS_RDONLY = 0xD,
> +    BLK_ZS_FULL = 0xE,
> +    BLK_ZS_OFFLINE = 0xF,
> +} BlkZoneCondition;
> +
> +typedef enum BlkZoneType {
> +    BLK_ZT_CONV = 0x1,
> +    BLK_ZT_SWR = 0x2,
> +    BLK_ZT_SWP = 0x3,
> +} BlkZoneType;
> +
> +/*
> + * Zone descriptor data structure.
> + * Provide information on a zone with all position and size values in bytes.
> + */
> +typedef struct BlockZoneDescriptor {
> +    uint64_t start;
> +    uint64_t length;
> +    uint64_t cap;
> +    uint64_t wp;
> +    BlkZoneType type;
> +    BlkZoneCondition cond;
> +} BlockZoneDescriptor;
> +
>  typedef struct BlockDriverInfo {
>      /* in bytes, 0 if irrelevant */
>      int cluster_size;
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 8947abab76..6037871089 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -94,6 +94,20 @@ typedef struct BdrvTrackedRequest {
>      struct BdrvTrackedRequest *waiting_for;
>  } BdrvTrackedRequest;
>  
> +/**
> + * Zone device information data structure.
> + * Provide information on a device.
> + */
> +typedef struct zbd_dev {
> +    uint32_t zone_size;
> +    zone_model model;
> +    uint32_t block_size;
> +    uint32_t write_granularity;
> +    uint32_t nr_zones;
> +    struct BlockZoneDescriptor *zones; /* array of zones */
> +    uint32_t max_nr_open_zones; /* maximum number of explicitly open zones */
> +    uint32_t max_nr_active_zones;
> +} zbd_dev;
>  
>  struct BlockDriver {
>      /*
> @@ -691,6 +705,12 @@ struct BlockDriver {
>                                            QEMUIOVector *qiov,
>                                            int64_t pos);
>  
> +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> +            int64_t offset, int64_t *nr_zones,
> +            BlockZoneDescriptor *zones);
> +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, enum zone_op op,
> +            int64_t offset, int64_t len);
> +
>      /* removable device specific */
>      bool (*bdrv_is_inserted)(BlockDriverState *bs);
>      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute
  2022-07-12  2:13 ` [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute Sam Li
  2022-07-12  6:16   ` Hannes Reinecke
@ 2022-07-12  7:37   ` Damien Le Moal
  1 sibling, 0 replies; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12  7:37 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: Markus Armbruster, dmitry.fomichev, Stefan Hajnoczi, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

On 7/12/22 11:13, Sam Li wrote:
> Use sysfs attribute files to get the zoned device information in case
> that ioctl() commands of zone management interface won't work.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/file-posix.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index e7523ae2ed..3161d39ea4 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1218,15 +1218,19 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
>  #endif
>  }
>  
> -static int hdev_get_max_segments(int fd, struct stat *st)
> -{
> +/*
> + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
> + * max_open_zones, max_active_zones) through sysfs attribute files.
> + */
> +static long get_sysfs_long_val(int fd, struct stat *st,
> +                               const char *attribute) {
>  #ifdef CONFIG_LINUX
>      char buf[32];
>      const char *end;
>      char *sysfspath = NULL;
>      int ret;
>      int sysfd = -1;
> -    long max_segments;
> +    long val;
>  
>      if (S_ISCHR(st->st_mode)) {
>          if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> @@ -1239,8 +1243,9 @@ static int hdev_get_max_segments(int fd, struct stat *st)
>          return -ENOTSUP;
>      }
>  
> -    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -                                major(st->st_rdev), minor(st->st_rdev));
> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
> +                                major(st->st_rdev), minor(st->st_rdev),
> +                                attribute);
>      sysfd = open(sysfspath, O_RDONLY);
>      if (sysfd == -1) {
>          ret = -errno;
> @@ -1258,9 +1263,9 @@ static int hdev_get_max_segments(int fd, struct stat *st)
>      }
>      buf[ret] = 0;
>      /* The file is ended with '\n', pass 'end' to accept that. */
> -    ret = qemu_strtol(buf, &end, 10, &max_segments);
> +    ret = qemu_strtol(buf, &end, 10, &val);
>      if (ret == 0 && end && *end == '\n') {
> -        ret = max_segments;
> +        ret = val;
>      }
>  
>  out:
> @@ -1274,6 +1279,10 @@ out:
>  #endif
>  }
>  
> +static int hdev_get_max_segments(int fd, struct stat *st) {
> +    return get_sysfs_long_val(fd, st, "max_segments");
> +}
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
> @@ -1883,10 +1892,17 @@ static int handle_aiocb_zone_mgmt(void *opaque) {
>      int64_t zone_size_mask;
>      int ret;
>  
> -    g_autofree struct stat *file = NULL;
> -    file = g_new(struct stat, 1);
> -    stat(s->filename, file);
> -    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
> +    struct stat file;
> +    if (fstat(fd, &file) < 0) {
> +        return -errno;
> +    }
> +    mod = get_sysfs_str_val(fd, &file);
> +    if (mod != BLK_Z_HM) {
> +        ret = -ENOTSUP;
> +        return ret;
> +    }
> +
> +    zone_size = get_sysfs_long_val(fd, &file, "chunk_sectors");
>      zone_size_mask = zone_size - 1;
>      if (offset & zone_size_mask) {
>          error_report("offset is not the start of a zone");

This hunk that modifies raw_refresh_limits() should go into another patch.
Likely, you want that as part of patch 1, so this patch to introduce the
get_sysfs_long_val() helper function should come before patch 1.


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model.
  2022-07-12  2:13 ` [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
  2022-07-12  6:17   ` Hannes Reinecke
@ 2022-07-12  7:42   ` Damien Le Moal
  1 sibling, 0 replies; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12  7:42 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: Markus Armbruster, dmitry.fomichev, Stefan Hajnoczi, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

On 7/12/22 11:13, Sam Li wrote:
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/file-posix.c           | 60 ++++++++++++++++++++++++++++++++++++
>  include/block/block-common.h |  4 +--
>  2 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 3161d39ea4..42708012ff 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1279,6 +1279,65 @@ out:
>  #endif
>  }
>  
> +/*
> + * Convert the zoned attribute file in sysfs to internal value.
> + */
> +static zone_model get_sysfs_str_val(int fd, struct stat *st) {

This is not a generic definition of a helper function: as-is, this
function can only get the zoned attribute string. It would be better to do
the same as what you did for get_sysfs_long_val() and pass the attribute
name you want to look at and add another argument to return the string, e.g.:

static void get_sysfs_str_val(int fd, struct stat *st,
			      const char *attribute, char **val)

And if the attribute does not exist, or this is not linux, always return
"none" in str.

> +#ifdef CONFIG_LINUX
> +    char buf[32];
> +    char *sysfspath = NULL;
> +    int ret, offset;
> +    int sysfd = -1;
> +
> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;
> +        }
> +        return -ENOTSUP;
> +    }
> +
> +    if (!S_ISBLK(st->st_mode)) {
> +        return -ENOTSUP;
> +    }
> +
> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/zoned",
> +                                major(st->st_rdev), minor(st->st_rdev));
> +    sysfd = open(sysfspath, O_RDONLY);
> +    if (sysfd == -1) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    offset = 0;
> +    do {
> +        ret = read(sysfd, buf + offset, sizeof(buf) - 1 + offset);
> +        if (ret > 0) {
> +            offset += ret;
> +        }
> +    } while (ret == -1);
> +    /* The file is ended with '\n' */
> +    if (buf[ret - 1] == '\n') {
> +        buf[ret - 1] = '\0';
> +    }
> +
> +    if (strcmp(buf, "host-managed") == 0) {
> +        return BLK_Z_HM;
> +    } else if (strcmp(buf, "host-aware") == 0) {
> +        return BLK_Z_HA;
> +    } else {
> +        return -ENOTSUP;
> +    }

Then all this code actually looking at the string value can go into a
different patch, or in patch 1 (same comment as for patch 3).

> +
> +out:
> +    if (sysfd != -1) {
> +        close(sysfd);
> +    }
> +    g_free(sysfspath);
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static int hdev_get_max_segments(int fd, struct stat *st) {
>      return get_sysfs_long_val(fd, st, "max_segments");
>  }
> @@ -1885,6 +1944,7 @@ static int handle_aiocb_zone_mgmt(void *opaque) {
>      int64_t len = aiocb->aio_nbytes;
>      zone_op op = aiocb->zone_mgmt.op;
>  
> +    zone_model mod;
>      struct blk_zone_range range;
>      const char *ioctl_name;
>      unsigned long ioctl_op;
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 78cddeeda5..35e00afe8e 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -56,8 +56,8 @@ typedef enum zone_op {
>  } zone_op;
>  
>  typedef enum zone_model {
> -    BLK_Z_HM,
> -    BLK_Z_HA,
> +    BLK_Z_HM = 0x1,
> +    BLK_Z_HA = 0x2,
>  } zone_model;
>  
>  typedef enum BlkZoneCondition {


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 2/9] qemu-io: add zoned block device operations.
  2022-07-12  2:13 ` [RFC v4 2/9] qemu-io: add zoned block device operations Sam Li
  2022-07-12  6:14   ` Hannes Reinecke
@ 2022-07-12  7:44   ` Damien Le Moal
  2022-07-27 14:13   ` Stefan Hajnoczi
  2 siblings, 0 replies; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12  7:44 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: Markus Armbruster, dmitry.fomichev, Stefan Hajnoczi, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

On 7/12/22 11:13, Sam Li wrote:
> Add zoned storage commands of the device: zone_open(zo), zone_close(zc),
> zone_reset(zs), zone_report(zp), zone_finish(zf).
> 
> For example, it can be called by:
> ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zone_report 0 0 1"

I would move this patch and also patch 5 last in the series, so that the
tests are added once all the code is in place.

> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/io.c               |  57 ++++++++++++++++
>  include/block/block-io.h |  13 ++++
>  qemu-io-cmds.c           | 143 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 213 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 1e9bf09a49..a760be0131 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3243,6 +3243,63 @@ out:
>      return co.ret;
>  }
>  
> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                        int64_t *nr_zones,
> +                        BlockZoneDescriptor *zones)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || (!drv->bdrv_co_zone_report)) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +
> +    if (drv->bdrv_co_zone_report) {
> +        co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> +    } else {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +        qemu_coroutine_yield();
> +    }
> +
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
> +int bdrv_co_zone_mgmt(BlockDriverState *bs, enum zone_op op,
> +        int64_t offset, int64_t len)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || (!drv->bdrv_co_zone_mgmt)) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +
> +    if (drv->bdrv_co_zone_mgmt) {
> +        co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> +    } else {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +        qemu_coroutine_yield();
> +    }
> +
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
>      IO_CODE();
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index 053a27141a..a0ae140452 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -80,6 +80,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>  /* Ensure contents are flushed to disk.  */
>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>  
> +/* Report zone information of zone block device. */
> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                     int64_t *nr_zones,
> +                                     BlockZoneDescriptor *zones);
> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> +                                   int64_t offset, int64_t len);
> +
>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> @@ -289,6 +296,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  int generated_co_wrapper
>  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  
> +int generated_co_wrapper
> +blk_zone_report(BlockBackend *blk, int64_t offset, int64_t *nr_zones,
> +                BlockZoneDescriptor *zones);
> +int generated_co_wrapper
> +blk_zone_mgmt(BlockBackend *blk, enum zone_op op, int64_t offset, int64_t len);
> +
>  /**
>   * bdrv_parent_drained_begin_single:
>   *
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 2f0d8ac25a..a88fa322d2 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1706,6 +1706,144 @@ static const cmdinfo_t flush_cmd = {
>      .oneline    = "flush all in-core file state to disk",
>  };
>  
> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, nr_zones;
> +
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    nr_zones = cvtnum(argv[optind]);
> +
> +    g_autofree BlockZoneDescriptor *zones = NULL;
> +    zones = g_new(BlockZoneDescriptor, nr_zones);
> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> +    if (ret < 0) {
> +        printf("zone report failed: %s\n", strerror(-ret));
> +    } else {
> +        for (int i = 0; i < nr_zones; ++i) {
> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> +                   "zcond:%u, [type: %u]\n",
> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> +                   zones[i].cond, zones[i].type);
> +        }
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_report_cmd = {
> +        .name = "zone_report",
> +        .altname = "zp",
> +        .cfunc = zone_report_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset number",
> +        .oneline = "report zone information",
> +};
> +
> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, zone_open, offset, len);
> +    if (ret < 0) {
> +        printf("zone open failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_open_cmd = {
> +        .name = "zone_open",
> +        .altname = "zo",
> +        .cfunc = zone_open_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "explicit open a range of zones in zone block device",
> +};
> +
> +static int zone_close_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, zone_close, offset, len);
> +    if (ret < 0) {
> +        printf("zone close failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_close_cmd = {
> +        .name = "zone_close",
> +        .altname = "zc",
> +        .cfunc = zone_close_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "close a range of zones in zone block device",
> +};
> +
> +static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, zone_finish, offset, len);
> +    if (ret < 0) {
> +        printf("zone finish failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_finish_cmd = {
> +        .name = "zone_finish",
> +        .altname = "zf",
> +        .cfunc = zone_finish_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "finish a range of zones in zone block device",
> +};
> +
> +static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, zone_reset, offset, len);
> +    if (ret < 0) {
> +        printf("zone reset failed: %s\n", strerror(-ret));
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_reset_cmd = {
> +        .name = "zone_reset",
> +        .altname = "zrs",
> +        .cfunc = zone_reset_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset len",
> +        .oneline = "reset a zone write pointer in zone block device",
> +};
> +
> +
>  static int truncate_f(BlockBackend *blk, int argc, char **argv);
>  static const cmdinfo_t truncate_cmd = {
>      .name       = "truncate",
> @@ -2498,6 +2636,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
>      qemuio_add_command(&aio_write_cmd);
>      qemuio_add_command(&aio_flush_cmd);
>      qemuio_add_command(&flush_cmd);
> +    qemuio_add_command(&zone_report_cmd);
> +    qemuio_add_command(&zone_open_cmd);
> +    qemuio_add_command(&zone_close_cmd);
> +    qemuio_add_command(&zone_finish_cmd);
> +    qemuio_add_command(&zone_reset_cmd);
>      qemuio_add_command(&truncate_cmd);
>      qemuio_add_command(&length_cmd);
>      qemuio_add_command(&info_cmd);


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 9/9] qapi: add support for zoned host device
  2022-07-12  2:13 ` [RFC v4 9/9] qapi: add support for zoned host device Sam Li
@ 2022-07-12  7:48   ` Damien Le Moal
  2022-07-27 14:53   ` Stefan Hajnoczi
  1 sibling, 0 replies; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12  7:48 UTC (permalink / raw)
  To: Sam Li, qemu-devel
  Cc: Markus Armbruster, dmitry.fomichev, Stefan Hajnoczi, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

On 7/12/22 11:13, Sam Li wrote:
> ---
>  block/file-posix.c   | 8 +++++++-
>  qapi/block-core.json | 7 +++++--
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index e9ad1d8e1e..4e0aa02acf 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3737,6 +3737,12 @@ static void hdev_parse_filename(const char *filename, QDict *options,
>      bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
>  }
>  
> +static void zoned_host_device_parse_filename(const char *filename, QDict *options,
> +                                Error **errp)
> +{
> +    bdrv_parse_filename_strip_prefix(filename, "zoned_host_device:", options);
> +}
> +
>  static bool hdev_is_sg(BlockDriverState *bs)
>  {
>  
> @@ -3975,7 +3981,7 @@ static BlockDriver bdrv_zoned_host_device = {
>          .is_zoned = true,
>          .bdrv_needs_filename = true,
>          .bdrv_probe_device  = hdev_probe_device,
> -        .bdrv_parse_filename = hdev_parse_filename,
> +        .bdrv_parse_filename = zoned_host_device_parse_filename,
>          .bdrv_file_open     = hdev_open,
>          .bdrv_close         = raw_close,
>          .bdrv_reopen_prepare = raw_reopen_prepare,
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2173e7734a..ab05c2ef99 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2955,7 +2955,8 @@
>              'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>              'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>              { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> -            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +            'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> +            { 'name': 'zoned_host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' } ] }

This needs to be something like:

{ 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }

And we need to make sure CONFIG_BLKZONED is defined if and only if we also
have HAVE_HOST_BLOCK_DEVICE.

>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -4329,7 +4330,9 @@
>        'vhdx':       'BlockdevOptionsGenericFormat',
>        'vmdk':       'BlockdevOptionsGenericCOWFormat',
>        'vpc':        'BlockdevOptionsGenericFormat',
> -      'vvfat':      'BlockdevOptionsVVFAT'
> +      'vvfat':      'BlockdevOptionsVVFAT',
> +      'zoned_host_device': { 'type': 'BlockdevOptionsFile',
> +                             'if': 'HAVE_HOST_BLOCK_DEVICE' }

Same here I think.

>    } }
>  
>  ##


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
  2022-07-12  6:10   ` Hannes Reinecke
  2022-07-12  7:35   ` Damien Le Moal
@ 2022-07-12 15:49   ` Stefan Hajnoczi
  2022-07-12 22:12     ` Damien Le Moal
  2022-07-13  0:51     ` Sam Li
  2 siblings, 2 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-12 15:49 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, damien.lemoal, Markus Armbruster, dmitry.fomichev,
	Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

[-- Attachment #1: Type: text/plain, Size: 19665 bytes --]

On Tue, Jul 12, 2022 at 10:13:37AM +0800, Sam Li wrote:
> By adding zone management operations in BlockDriver, storage
> controller emulation can use the new block layer APIs including
> zone_report and zone_mgmt(open, close, finish, reset).
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/block-backend.c            |  41 ++++++
>  block/coroutines.h               |   5 +
>  block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
>  include/block/block-common.h     |  43 +++++-
>  include/block/block_int-common.h |  20 +++
>  5 files changed, 344 insertions(+), 1 deletion(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f425b00793..0a05247ae4 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
>      return ret;
>  }
>  
> +/*
> + * Send a zone_report command.
> + * offset can be any number within the zone size. No alignment for offset.

I think offset is a byte offset from the start of the device and its
range is [0, total_sectors * BDRV_SECTOR_SIZE)?

"any number within the zone size" gives the impression that the value
must be [0, zone_size_in_bytes), which is not right.

I suggest changing the text to "offset can be any number of bytes from
the start of the device" or similar.

> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    int64_t *nr_zones,
> +                                    BlockZoneDescriptor *zones)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk); /* increase before waiting */
> +    blk_wait_while_drained(blk);
> +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);

The !blk_is_available(blk) case needs to return -ENOMEDIUM before we can
safely dereference blk->root->bs (which can also be written as
blk_bs(blk)).

> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * Offset is the start of a zone and len is aligned to zones.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,

Please define typedef enum { ... } BlockZoneOp instead of enum { ... }
zone_op and then use a BlockZoneOp op argument instead of enum zone_op.
QEMU coding style uses typedefs instead of struct foo or enum foo when
possible.

> +        int64_t offset, int64_t len)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    ret = blk_check_byte_request(blk, offset, len);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/coroutines.h b/block/coroutines.h
> index 830ecaa733..19aa96cc56 100644
> --- a/block/coroutines.h
> +++ b/block/coroutines.h
> @@ -80,6 +80,11 @@ int coroutine_fn
>  blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
>  
>  int coroutine_fn blk_co_do_flush(BlockBackend *blk);
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +                                    int64_t *nr_zones,
> +                                    BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> +                                  int64_t offset, int64_t len);
>  
>  
>  /*
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 48cd096624..e7523ae2ed 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,7 @@
>  #include <sys/param.h>
>  #include <sys/syscall.h>
>  #include <sys/vfs.h>
> +#include <linux/blkzoned.h>
>  #include <linux/cdrom.h>
>  #include <linux/fd.h>
>  #include <linux/fs.h>
> @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
>              PreallocMode prealloc;
>              Error **errp;
>          } truncate;
> +        struct {
> +            int64_t *nr_zones;
> +            BlockZoneDescriptor *zones;
> +        } zone_report;
> +        struct {
> +            zone_op op;
> +        } zone_mgmt;
>      };
>  } RawPosixAIOData;
>  
> @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
>  }
>  #endif
>  

Are the functions below within #ifdef __linux__?

> +/*
> + * parse_zone - Fill a zone descriptor
> + */
> +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> +                              struct blk_zone *blkz) {
> +    zone->start = blkz->start;
> +    zone->length = blkz->len;
> +    zone->cap = blkz->capacity;
> +    zone->wp = blkz->wp - blkz->start;
> +    zone->type = blkz->type;
> +    zone->cond = blkz->cond;
> +}
> +
> +static int handle_aiocb_zone_report(void *opaque) {
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    int64_t *nr_zones = aiocb->zone_report.nr_zones;
> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> +    int64_t offset = aiocb->aio_offset;

The code is easier to read if it's clear this value is in sector units:

  int64_t sector = aiocb->aio_offset / 512; /* BLKREPORTZONE uses 512B sectors */

Then there's no confusion about whether 'offset' is bytes or sectors.

> +
> +    struct blk_zone *blkz;
> +    int64_t rep_size, nrz;
> +    int ret, n = 0, i = 0;
> +
> +    nrz = *nr_zones;
> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> +    g_autofree struct blk_zone_report *rep = NULL;
> +    rep = g_malloc(rep_size);
> +    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
> +    printf("start to report zone with offset: 0x%lx\n", offset);
> +
> +    blkz = (struct blk_zone *)(rep + 1);
> +    while (n < nrz) {
> +        memset(rep, 0, rep_size);
> +        rep->sector = offset;
> +        rep->nr_zones = nrz;

After the first time around the while loop zones[] no longer has space
for nrz elements. This must be taken into account to avoid overflowing
zones[]:

  rep->nr_zones = nrz - n;

> +
> +        ret = ioctl(fd, BLKREPORTZONE, rep);
> +        if (ret != 0) {

Does this need to retry when ret != 0 && errno == EINTR is encountered?
Damien/Hannes/Dmitry might know the answer. See handle_aiocb_rw_vector()
for an example of retrying when EINTR is returned from a blocking
syscall.

> +            ret = -errno;
> +            error_report("%d: ioctl BLKREPORTZONE at %ld failed %d",
> +                         fd, offset, errno);

Please use "... at %" PRId64 " failed ..." instead of %ld for int64_t
values. %ld is not portable because sizeof(long) varies on different
machines.

> +            return ret;
> +        }
> +
> +        if (!rep->nr_zones) {
> +            break;
> +        }
> +
> +        for (i = 0; i < rep->nr_zones; i++, n++) {
> +            parse_zone(&zones[n], &blkz[i]);
> +            /* The next report should start after the last zone reported */
> +            offset = blkz[i].start + blkz[i].len;
> +        }
> +    }
> +
> +    *nr_zones = n;
> +    return 0;
> +}
> +
> +static int handle_aiocb_zone_mgmt(void *opaque) {
> +    RawPosixAIOData *aiocb = opaque;
> +    int fd = aiocb->aio_fildes;
> +    int64_t offset = aiocb->aio_offset;
> +    int64_t len = aiocb->aio_nbytes;
> +    zone_op op = aiocb->zone_mgmt.op;
> +
> +    struct blk_zone_range range;
> +    const char *ioctl_name;
> +    unsigned long ioctl_op;
> +    int64_t zone_size;
> +    int64_t zone_size_mask;
> +    int ret;
> +
> +    g_autofree struct stat *file = NULL;
> +    file = g_new(struct stat, 1);
> +    stat(s->filename, file);

Heap allocation is not needed. It's easier to put the struct stat
variable on the stack:

  struct stat st;
  if (fstat(fd, &st) < 0) {
      return -errno;
  }

fstat(2) is preferred over stat(2) when a file descriptor is available
because stat(2) suffers from race conditions when file system paths have
changed.

> +    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");

The name "zone_sectors" would be clearer since size doesn't indicate the
units (bytes vs sectors).

> +    zone_size_mask = zone_size - 1;
> +    if (offset & zone_size_mask) {

Bytes and sectors units are being mixed here:

  int64_t offset = aiocb->aio_offset; <-- bytes

I suggest changing it to:

  int64_t sector = aiocb->aio_offset / 512; /* BLK*ZONE ioctls use 512B sectors */

> +        error_report("offset is not the start of a zone");
> +        return -EINVAL;
> +    }
> +
> +    if (len & zone_size_mask) {

Bytes and sectors are mixed here too. I think len is in bytes:

  int64_t len = aiocb->aio_nbytes;

Maybe change it to:

  int64_t nr_sectors = aiocb->aio_nbytes / 512;

> +        error_report("len is not aligned to zones");
> +        return -EINVAL;
> +    }
> +
> +    switch (op) {
> +    case zone_open:
> +        ioctl_name = "BLKOPENZONE";
> +        ioctl_op = BLKOPENZONE;
> +        break;
> +    case zone_close:
> +        ioctl_name = "BLKCLOSEZONE";
> +        ioctl_op = BLKCLOSEZONE;
> +        break;
> +    case zone_finish:
> +        ioctl_name = "BLKFINISHZONE";
> +        ioctl_op = BLKFINISHZONE;
> +        break;
> +    case zone_reset:
> +        ioctl_name = "BLKRESETZONE";
> +        ioctl_op = BLKRESETZONE;
> +        break;
> +    default:
> +        error_report("Invalid zone operation 0x%x", op);
> +        return -EINVAL;
> +    }
> +
> +    /* Execute the operation */
> +    range.sector = offset;
> +    range.nr_sectors = len;
> +    ret = ioctl(fd, ioctl_op, &range);
> +    if (ret != 0) {
> +        error_report("ioctl %s failed %d",
> +                     ioctl_name, errno);
> +        return -errno;
> +    }
> +
> +    return 0;
> +}
> +
>  static int handle_aiocb_copy_range(void *opaque)
>  {
>      RawPosixAIOData *aiocb = opaque;
> @@ -2973,6 +3105,59 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
>      }
>  }
>  
> +/*
> + * zone report - Get a zone block device's information in the form
> + * of an array of zone descriptors.
> + *
> + * @param bs: passing zone block device file descriptor
> + * @param zones: an array of zone descriptors to hold zone
> + * information on reply
> + * @param offset: offset can be any byte within the zone size.
> + * @param len: (not sure yet.
> + * @return 0 on success, -1 on failure
> + */
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                           int64_t *nr_zones,
> +                                           BlockZoneDescriptor *zones) {
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +
> +    acb = (RawPosixAIOData) {
> +        .bs         = bs,
> +        .aio_fildes = s->fd,
> +        .aio_type   = QEMU_AIO_IOCTL,

Please define QEMU_AIO_ZONE_REPORT, QEMU_AIO_ZONE_MGMT, etc values for
these new operations instead of reusing QEMU_AIO_IOCTL, which is meant
for generic passthrough ioctls.

> +        .aio_offset = offset,
> +        .zone_report    = {
> +                .nr_zones       = nr_zones,
> +                .zones          = zones,
> +        },
> +    };
> +
> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> +}
> +
> +/*
> + * zone management operations - Execute an operation on a zone
> + */
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> +        int64_t offset, int64_t len) {
> +    BDRVRawState *s = bs->opaque;
> +    RawPosixAIOData acb;
> +
> +    acb = (RawPosixAIOData) {
> +        .bs             = bs,
> +        .aio_fildes     = s->fd,
> +        .aio_type       = QEMU_AIO_IOCTL,
> +        .aio_offset     = offset,
> +        .aio_nbytes     = len,
> +        .zone_mgmt  = {
> +                .op = op,
> +        },
> +    };
> +
> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> +}
> +
>  static coroutine_fn int
>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                  bool blkdev)
> @@ -3324,6 +3509,9 @@ BlockDriver bdrv_file = {
>      .bdrv_abort_perm_update = raw_abort_perm_update,
>      .create_opts = &raw_create_opts,
>      .mutable_opts = mutable_opts,
> +
> +    .bdrv_co_zone_report = raw_co_zone_report,
> +    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>  };
>  
>  /***********************************************/
> @@ -3703,6 +3891,53 @@ static BlockDriver bdrv_host_device = {
>  #endif
>  };
>  

#ifdef __linux__

> +static BlockDriver bdrv_zoned_host_device = {
> +        .format_name = "zoned_host_device",
> +        .protocol_name = "zoned_host_device",
> +        .instance_size = sizeof(BDRVRawState),
> +        .bdrv_needs_filename = true,
> +        .bdrv_probe_device  = hdev_probe_device,
> +        .bdrv_parse_filename = hdev_parse_filename,

This function hardcodes "host_device:". zoned_host_device needs a
separate function that uses "zoned_host_device:".

> +        .bdrv_file_open     = hdev_open,
> +        .bdrv_close         = raw_close,
> +        .bdrv_reopen_prepare = raw_reopen_prepare,
> +        .bdrv_reopen_commit  = raw_reopen_commit,
> +        .bdrv_reopen_abort   = raw_reopen_abort,
> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> +        .create_opts         = &bdrv_create_opts_simple,
> +        .mutable_opts        = mutable_opts,
> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> +
> +        .bdrv_co_preadv         = raw_co_preadv,
> +        .bdrv_co_pwritev        = raw_co_pwritev,
> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> +        .bdrv_refresh_limits = raw_refresh_limits,
> +        .bdrv_io_plug = raw_aio_plug,
> +        .bdrv_io_unplug = raw_aio_unplug,
> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> +
> +        .bdrv_co_truncate       = raw_co_truncate,
> +        .bdrv_getlength = raw_getlength,
> +        .bdrv_get_info = raw_get_info,
> +        .bdrv_get_allocated_file_size
> +                            = raw_get_allocated_file_size,
> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> +        .bdrv_check_perm = raw_check_perm,
> +        .bdrv_set_perm   = raw_set_perm,
> +        .bdrv_abort_perm_update = raw_abort_perm_update,
> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> +        .bdrv_probe_geometry = hdev_probe_geometry,
> +        .bdrv_co_ioctl = hdev_co_ioctl,
> +
> +        /* zone management operations */
> +        .bdrv_co_zone_report = raw_co_zone_report,
> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +};

#endif /* __linux__ */

> +
>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>  static void cdrom_parse_filename(const char *filename, QDict *options,
>                                   Error **errp)
> @@ -3964,6 +4199,7 @@ static void bdrv_file_init(void)
>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>      bdrv_register(&bdrv_host_device);
>  #ifdef __linux__
> +    bdrv_register(&bdrv_zoned_host_device);
>      bdrv_register(&bdrv_host_cdrom);
>  #endif
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index fdb7306e78..78cddeeda5 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -23,7 +23,6 @@
>   */
>  #ifndef BLOCK_COMMON_H
>  #define BLOCK_COMMON_H
> -
>  #include "block/aio.h"
>  #include "block/aio-wait.h"
>  #include "qemu/iov.h"

Please avoid making whitespace changes that are unrelated to the patch.

> @@ -49,6 +48,48 @@ typedef struct BlockDriver BlockDriver;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildClass BdrvChildClass;
>  
> +typedef enum zone_op {
> +    zone_open,
> +    zone_close,
> +    zone_finish,
> +    zone_reset,
> +} zone_op;

QEMU coding style:

  typedef enum {
      BLK_ZO_OPEN,
      BLK_ZO_CLOSE,
      BLK_ZO_FINISH,
      BLK_ZO_RESET,
  } BlockZoneOp;

Please also reformat the enums below.

> +
> +typedef enum zone_model {
> +    BLK_Z_HM,
> +    BLK_Z_HA,
> +} zone_model;
> +
> +typedef enum BlkZoneCondition {
> +    BLK_ZS_NOT_WP = 0x0,
> +    BLK_ZS_EMPTY = 0x1,
> +    BLK_ZS_IOPEN = 0x2,
> +    BLK_ZS_EOPEN = 0x3,
> +    BLK_ZS_CLOSED = 0x4,
> +    BLK_ZS_RDONLY = 0xD,
> +    BLK_ZS_FULL = 0xE,
> +    BLK_ZS_OFFLINE = 0xF,
> +} BlkZoneCondition;
> +
> +typedef enum BlkZoneType {
> +    BLK_ZT_CONV = 0x1,
> +    BLK_ZT_SWR = 0x2,
> +    BLK_ZT_SWP = 0x3,
> +} BlkZoneType;
> +
> +/*
> + * Zone descriptor data structure.
> + * Provide information on a zone with all position and size values in bytes.
> + */
> +typedef struct BlockZoneDescriptor {
> +    uint64_t start;
> +    uint64_t length;
> +    uint64_t cap;
> +    uint64_t wp;
> +    BlkZoneType type;
> +    BlkZoneCondition cond;
> +} BlockZoneDescriptor;
> +
>  typedef struct BlockDriverInfo {
>      /* in bytes, 0 if irrelevant */
>      int cluster_size;
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 8947abab76..6037871089 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -94,6 +94,20 @@ typedef struct BdrvTrackedRequest {
>      struct BdrvTrackedRequest *waiting_for;
>  } BdrvTrackedRequest;
>  
> +/**
> + * Zone device information data structure.
> + * Provide information on a device.
> + */
> +typedef struct zbd_dev {
> +    uint32_t zone_size;
> +    zone_model model;
> +    uint32_t block_size;
> +    uint32_t write_granularity;
> +    uint32_t nr_zones;
> +    struct BlockZoneDescriptor *zones; /* array of zones */
> +    uint32_t max_nr_open_zones; /* maximum number of explicitly open zones */
> +    uint32_t max_nr_active_zones;
> +} zbd_dev;

This struct isn't use by this patch. Please move this change into the
patch that uses struct zbd_dev.

QEMU coding style naming would be BlockZoneDev instead of zbd_dev.

I think this struct contains the block limits fields that could be added
to QEMU's BlockLimits? A new struct may not be necessary.

>  
>  struct BlockDriver {
>      /*
> @@ -691,6 +705,12 @@ struct BlockDriver {
>                                            QEMUIOVector *qiov,
>                                            int64_t pos);
>  
> +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> +            int64_t offset, int64_t *nr_zones,
> +            BlockZoneDescriptor *zones);
> +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, enum zone_op op,
> +            int64_t offset, int64_t len);
> +
>      /* removable device specific */
>      bool (*bdrv_is_inserted)(BlockDriverState *bs);
>      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> -- 
> 2.36.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12 15:49   ` Stefan Hajnoczi
@ 2022-07-12 22:12     ` Damien Le Moal
  2022-07-13  6:22       ` Stefan Hajnoczi
  2022-07-13  0:51     ` Sam Li
  1 sibling, 1 reply; 42+ messages in thread
From: Damien Le Moal @ 2022-07-12 22:12 UTC (permalink / raw)
  To: Stefan Hajnoczi, Sam Li
  Cc: qemu-devel, Markus Armbruster, dmitry.fomichev, Hanna Reitz,
	qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

On 7/13/22 00:49, Stefan Hajnoczi wrote:
> On Tue, Jul 12, 2022 at 10:13:37AM +0800, Sam Li wrote:
>> By adding zone management operations in BlockDriver, storage
>> controller emulation can use the new block layer APIs including
>> zone_report and zone_mgmt(open, close, finish, reset).
>>
>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>> ---
>>  block/block-backend.c            |  41 ++++++
>>  block/coroutines.h               |   5 +
>>  block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
>>  include/block/block-common.h     |  43 +++++-
>>  include/block/block_int-common.h |  20 +++
>>  5 files changed, 344 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index f425b00793..0a05247ae4 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
>>      return ret;
>>  }
>>  
>> +/*
>> + * Send a zone_report command.
>> + * offset can be any number within the zone size. No alignment for offset.
> 
> I think offset is a byte offset from the start of the device and its
> range is [0, total_sectors * BDRV_SECTOR_SIZE)?
> 
> "any number within the zone size" gives the impression that the value
> must be [0, zone_size_in_bytes), which is not right.
> 
> I suggest changing the text to "offset can be any number of bytes from
> the start of the device" or similar.
> 
>> + * nr_zones represents IN maximum and OUT actual.
>> + */
>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
>> +                                    int64_t *nr_zones,
>> +                                    BlockZoneDescriptor *zones)
>> +{
>> +    int ret;
>> +    IO_CODE();
>> +
>> +    blk_inc_in_flight(blk); /* increase before waiting */
>> +    blk_wait_while_drained(blk);
>> +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
> 
> The !blk_is_available(blk) case needs to return -ENOMEDIUM before we can
> safely dereference blk->root->bs (which can also be written as
> blk_bs(blk)).
> 
>> +    blk_dec_in_flight(blk);
>> +    return ret;
>> +}
>> +
>> +/*
>> + * Send a zone_management command.
>> + * Offset is the start of a zone and len is aligned to zones.
>> + */
>> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> 
> Please define typedef enum { ... } BlockZoneOp instead of enum { ... }
> zone_op and then use a BlockZoneOp op argument instead of enum zone_op.
> QEMU coding style uses typedefs instead of struct foo or enum foo when
> possible.
> 
>> +        int64_t offset, int64_t len)
>> +{
>> +    int ret;
>> +    IO_CODE();
>> +
>> +    blk_inc_in_flight(blk);
>> +    blk_wait_while_drained(blk);
>> +    ret = blk_check_byte_request(blk, offset, len);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
>> +    blk_dec_in_flight(blk);
>> +    return ret;
>> +}
>> +
>>  void blk_drain(BlockBackend *blk)
>>  {
>>      BlockDriverState *bs = blk_bs(blk);
>> diff --git a/block/coroutines.h b/block/coroutines.h
>> index 830ecaa733..19aa96cc56 100644
>> --- a/block/coroutines.h
>> +++ b/block/coroutines.h
>> @@ -80,6 +80,11 @@ int coroutine_fn
>>  blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
>>  
>>  int coroutine_fn blk_co_do_flush(BlockBackend *blk);
>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
>> +                                    int64_t *nr_zones,
>> +                                    BlockZoneDescriptor *zones);
>> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
>> +                                  int64_t offset, int64_t len);
>>  
>>  
>>  /*
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 48cd096624..e7523ae2ed 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -67,6 +67,7 @@
>>  #include <sys/param.h>
>>  #include <sys/syscall.h>
>>  #include <sys/vfs.h>
>> +#include <linux/blkzoned.h>
>>  #include <linux/cdrom.h>
>>  #include <linux/fd.h>
>>  #include <linux/fs.h>
>> @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
>>              PreallocMode prealloc;
>>              Error **errp;
>>          } truncate;
>> +        struct {
>> +            int64_t *nr_zones;
>> +            BlockZoneDescriptor *zones;
>> +        } zone_report;
>> +        struct {
>> +            zone_op op;
>> +        } zone_mgmt;
>>      };
>>  } RawPosixAIOData;
>>  
>> @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
>>  }
>>  #endif
>>  
> 
> Are the functions below within #ifdef __linux__?

We need more than that: linux AND blkzoned.h header present (meaning a
recent kernel). So the ifdef should be "#if defined(CONFIG_BLKZONED)" or
something like it, with CONFIG_BLKZONED defined for linux AND
/usr/include/linux/blkzoned.h present.

> 
>> +/*
>> + * parse_zone - Fill a zone descriptor
>> + */
>> +static inline void parse_zone(struct BlockZoneDescriptor *zone,
>> +                              struct blk_zone *blkz) {
>> +    zone->start = blkz->start;
>> +    zone->length = blkz->len;
>> +    zone->cap = blkz->capacity;
>> +    zone->wp = blkz->wp - blkz->start;
>> +    zone->type = blkz->type;
>> +    zone->cond = blkz->cond;
>> +}
>> +
>> +static int handle_aiocb_zone_report(void *opaque) {
>> +    RawPosixAIOData *aiocb = opaque;
>> +    int fd = aiocb->aio_fildes;
>> +    int64_t *nr_zones = aiocb->zone_report.nr_zones;
>> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
>> +    int64_t offset = aiocb->aio_offset;
> 
> The code is easier to read if it's clear this value is in sector units:
> 
>   int64_t sector = aiocb->aio_offset / 512; /* BLKREPORTZONE uses 512B sectors */
> 
> Then there's no confusion about whether 'offset' is bytes or sectors.
> 
>> +
>> +    struct blk_zone *blkz;
>> +    int64_t rep_size, nrz;
>> +    int ret, n = 0, i = 0;
>> +
>> +    nrz = *nr_zones;
>> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
>> +    g_autofree struct blk_zone_report *rep = NULL;
>> +    rep = g_malloc(rep_size);
>> +    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
>> +    printf("start to report zone with offset: 0x%lx\n", offset);
>> +
>> +    blkz = (struct blk_zone *)(rep + 1);
>> +    while (n < nrz) {
>> +        memset(rep, 0, rep_size);
>> +        rep->sector = offset;
>> +        rep->nr_zones = nrz;
> 
> After the first time around the while loop zones[] no longer has space
> for nrz elements. This must be taken into account to avoid overflowing
> zones[]:
> 
>   rep->nr_zones = nrz - n;
> 
>> +
>> +        ret = ioctl(fd, BLKREPORTZONE, rep);
>> +        if (ret != 0) {
> 
> Does this need to retry when ret != 0 && errno == EINTR is encountered?
> Damien/Hannes/Dmitry might know the answer. See handle_aiocb_rw_vector()
> for an example of retrying when EINTR is returned from a blocking
> syscall.
> 
>> +            ret = -errno;
>> +            error_report("%d: ioctl BLKREPORTZONE at %ld failed %d",
>> +                         fd, offset, errno);
> 
> Please use "... at %" PRId64 " failed ..." instead of %ld for int64_t
> values. %ld is not portable because sizeof(long) varies on different
> machines.
> 
>> +            return ret;
>> +        }
>> +
>> +        if (!rep->nr_zones) {
>> +            break;
>> +        }
>> +
>> +        for (i = 0; i < rep->nr_zones; i++, n++) {
>> +            parse_zone(&zones[n], &blkz[i]);
>> +            /* The next report should start after the last zone reported */
>> +            offset = blkz[i].start + blkz[i].len;
>> +        }
>> +    }
>> +
>> +    *nr_zones = n;
>> +    return 0;
>> +}
>> +
>> +static int handle_aiocb_zone_mgmt(void *opaque) {
>> +    RawPosixAIOData *aiocb = opaque;
>> +    int fd = aiocb->aio_fildes;
>> +    int64_t offset = aiocb->aio_offset;
>> +    int64_t len = aiocb->aio_nbytes;
>> +    zone_op op = aiocb->zone_mgmt.op;
>> +
>> +    struct blk_zone_range range;
>> +    const char *ioctl_name;
>> +    unsigned long ioctl_op;
>> +    int64_t zone_size;
>> +    int64_t zone_size_mask;
>> +    int ret;
>> +
>> +    g_autofree struct stat *file = NULL;
>> +    file = g_new(struct stat, 1);
>> +    stat(s->filename, file);
> 
> Heap allocation is not needed. It's easier to put the struct stat
> variable on the stack:
> 
>   struct stat st;
>   if (fstat(fd, &st) < 0) {
>       return -errno;
>   }
> 
> fstat(2) is preferred over stat(2) when a file descriptor is available
> because stat(2) suffers from race conditions when file system paths have
> changed.
> 
>> +    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
> 
> The name "zone_sectors" would be clearer since size doesn't indicate the
> units (bytes vs sectors).
> 
>> +    zone_size_mask = zone_size - 1;
>> +    if (offset & zone_size_mask) {
> 
> Bytes and sectors units are being mixed here:
> 
>   int64_t offset = aiocb->aio_offset; <-- bytes
> 
> I suggest changing it to:
> 
>   int64_t sector = aiocb->aio_offset / 512; /* BLK*ZONE ioctls use 512B sectors */
> 
>> +        error_report("offset is not the start of a zone");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (len & zone_size_mask) {
> 
> Bytes and sectors are mixed here too. I think len is in bytes:
> 
>   int64_t len = aiocb->aio_nbytes;
> 
> Maybe change it to:
> 
>   int64_t nr_sectors = aiocb->aio_nbytes / 512;
> 
>> +        error_report("len is not aligned to zones");
>> +        return -EINVAL;
>> +    }
>> +
>> +    switch (op) {
>> +    case zone_open:
>> +        ioctl_name = "BLKOPENZONE";
>> +        ioctl_op = BLKOPENZONE;
>> +        break;
>> +    case zone_close:
>> +        ioctl_name = "BLKCLOSEZONE";
>> +        ioctl_op = BLKCLOSEZONE;
>> +        break;
>> +    case zone_finish:
>> +        ioctl_name = "BLKFINISHZONE";
>> +        ioctl_op = BLKFINISHZONE;
>> +        break;
>> +    case zone_reset:
>> +        ioctl_name = "BLKRESETZONE";
>> +        ioctl_op = BLKRESETZONE;
>> +        break;
>> +    default:
>> +        error_report("Invalid zone operation 0x%x", op);
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Execute the operation */
>> +    range.sector = offset;
>> +    range.nr_sectors = len;
>> +    ret = ioctl(fd, ioctl_op, &range);
>> +    if (ret != 0) {
>> +        error_report("ioctl %s failed %d",
>> +                     ioctl_name, errno);
>> +        return -errno;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static int handle_aiocb_copy_range(void *opaque)
>>  {
>>      RawPosixAIOData *aiocb = opaque;
>> @@ -2973,6 +3105,59 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
>>      }
>>  }
>>  
>> +/*
>> + * zone report - Get a zone block device's information in the form
>> + * of an array of zone descriptors.
>> + *
>> + * @param bs: passing zone block device file descriptor
>> + * @param zones: an array of zone descriptors to hold zone
>> + * information on reply
>> + * @param offset: offset can be any byte within the zone size.
>> + * @param len: (not sure yet.
>> + * @return 0 on success, -1 on failure
>> + */
>> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
>> +                                           int64_t *nr_zones,
>> +                                           BlockZoneDescriptor *zones) {
>> +    BDRVRawState *s = bs->opaque;
>> +    RawPosixAIOData acb;
>> +
>> +    acb = (RawPosixAIOData) {
>> +        .bs         = bs,
>> +        .aio_fildes = s->fd,
>> +        .aio_type   = QEMU_AIO_IOCTL,
> 
> Please define QEMU_AIO_ZONE_REPORT, QEMU_AIO_ZONE_MGMT, etc values for
> these new operations instead of reusing QEMU_AIO_IOCTL, which is meant
> for generic passthrough ioctls.
> 
>> +        .aio_offset = offset,
>> +        .zone_report    = {
>> +                .nr_zones       = nr_zones,
>> +                .zones          = zones,
>> +        },
>> +    };
>> +
>> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
>> +}
>> +
>> +/*
>> + * zone management operations - Execute an operation on a zone
>> + */
>> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
>> +        int64_t offset, int64_t len) {
>> +    BDRVRawState *s = bs->opaque;
>> +    RawPosixAIOData acb;
>> +
>> +    acb = (RawPosixAIOData) {
>> +        .bs             = bs,
>> +        .aio_fildes     = s->fd,
>> +        .aio_type       = QEMU_AIO_IOCTL,
>> +        .aio_offset     = offset,
>> +        .aio_nbytes     = len,
>> +        .zone_mgmt  = {
>> +                .op = op,
>> +        },
>> +    };
>> +
>> +    return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
>> +}
>> +
>>  static coroutine_fn int
>>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>>                  bool blkdev)
>> @@ -3324,6 +3509,9 @@ BlockDriver bdrv_file = {
>>      .bdrv_abort_perm_update = raw_abort_perm_update,
>>      .create_opts = &raw_create_opts,
>>      .mutable_opts = mutable_opts,
>> +
>> +    .bdrv_co_zone_report = raw_co_zone_report,
>> +    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>>  };
>>  
>>  /***********************************************/
>> @@ -3703,6 +3891,53 @@ static BlockDriver bdrv_host_device = {
>>  #endif
>>  };
>>  
> 
> #ifdef __linux__
> 
>> +static BlockDriver bdrv_zoned_host_device = {
>> +        .format_name = "zoned_host_device",
>> +        .protocol_name = "zoned_host_device",
>> +        .instance_size = sizeof(BDRVRawState),
>> +        .bdrv_needs_filename = true,
>> +        .bdrv_probe_device  = hdev_probe_device,
>> +        .bdrv_parse_filename = hdev_parse_filename,
> 
> This function hardcodes "host_device:". zoned_host_device needs a
> separate function that uses "zoned_host_device:".
> 
>> +        .bdrv_file_open     = hdev_open,
>> +        .bdrv_close         = raw_close,
>> +        .bdrv_reopen_prepare = raw_reopen_prepare,
>> +        .bdrv_reopen_commit  = raw_reopen_commit,
>> +        .bdrv_reopen_abort   = raw_reopen_abort,
>> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
>> +        .create_opts         = &bdrv_create_opts_simple,
>> +        .mutable_opts        = mutable_opts,
>> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>> +
>> +        .bdrv_co_preadv         = raw_co_preadv,
>> +        .bdrv_co_pwritev        = raw_co_pwritev,
>> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
>> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
>> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>> +        .bdrv_refresh_limits = raw_refresh_limits,
>> +        .bdrv_io_plug = raw_aio_plug,
>> +        .bdrv_io_unplug = raw_aio_unplug,
>> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>> +
>> +        .bdrv_co_truncate       = raw_co_truncate,
>> +        .bdrv_getlength = raw_getlength,
>> +        .bdrv_get_info = raw_get_info,
>> +        .bdrv_get_allocated_file_size
>> +                            = raw_get_allocated_file_size,
>> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
>> +        .bdrv_check_perm = raw_check_perm,
>> +        .bdrv_set_perm   = raw_set_perm,
>> +        .bdrv_abort_perm_update = raw_abort_perm_update,
>> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
>> +        .bdrv_probe_geometry = hdev_probe_geometry,
>> +        .bdrv_co_ioctl = hdev_co_ioctl,
>> +
>> +        /* zone management operations */
>> +        .bdrv_co_zone_report = raw_co_zone_report,
>> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>> +};
> 
> #endif /* __linux__ */
> 
>> +
>>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>>  static void cdrom_parse_filename(const char *filename, QDict *options,
>>                                   Error **errp)
>> @@ -3964,6 +4199,7 @@ static void bdrv_file_init(void)
>>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>>      bdrv_register(&bdrv_host_device);
>>  #ifdef __linux__
>> +    bdrv_register(&bdrv_zoned_host_device);
>>      bdrv_register(&bdrv_host_cdrom);
>>  #endif
>>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>> diff --git a/include/block/block-common.h b/include/block/block-common.h
>> index fdb7306e78..78cddeeda5 100644
>> --- a/include/block/block-common.h
>> +++ b/include/block/block-common.h
>> @@ -23,7 +23,6 @@
>>   */
>>  #ifndef BLOCK_COMMON_H
>>  #define BLOCK_COMMON_H
>> -
>>  #include "block/aio.h"
>>  #include "block/aio-wait.h"
>>  #include "qemu/iov.h"
> 
> Please avoid making whitespace changes that are unrelated to the patch.
> 
>> @@ -49,6 +48,48 @@ typedef struct BlockDriver BlockDriver;
>>  typedef struct BdrvChild BdrvChild;
>>  typedef struct BdrvChildClass BdrvChildClass;
>>  
>> +typedef enum zone_op {
>> +    zone_open,
>> +    zone_close,
>> +    zone_finish,
>> +    zone_reset,
>> +} zone_op;
> 
> QEMU coding style:
> 
>   typedef enum {
>       BLK_ZO_OPEN,
>       BLK_ZO_CLOSE,
>       BLK_ZO_FINISH,
>       BLK_ZO_RESET,
>   } BlockZoneOp;
> 
> Please also reformat the enums below.
> 
>> +
>> +typedef enum zone_model {
>> +    BLK_Z_HM,
>> +    BLK_Z_HA,
>> +} zone_model;
>> +
>> +typedef enum BlkZoneCondition {
>> +    BLK_ZS_NOT_WP = 0x0,
>> +    BLK_ZS_EMPTY = 0x1,
>> +    BLK_ZS_IOPEN = 0x2,
>> +    BLK_ZS_EOPEN = 0x3,
>> +    BLK_ZS_CLOSED = 0x4,
>> +    BLK_ZS_RDONLY = 0xD,
>> +    BLK_ZS_FULL = 0xE,
>> +    BLK_ZS_OFFLINE = 0xF,
>> +} BlkZoneCondition;
>> +
>> +typedef enum BlkZoneType {
>> +    BLK_ZT_CONV = 0x1,
>> +    BLK_ZT_SWR = 0x2,
>> +    BLK_ZT_SWP = 0x3,
>> +} BlkZoneType;
>> +
>> +/*
>> + * Zone descriptor data structure.
>> + * Provide information on a zone with all position and size values in bytes.
>> + */
>> +typedef struct BlockZoneDescriptor {
>> +    uint64_t start;
>> +    uint64_t length;
>> +    uint64_t cap;
>> +    uint64_t wp;
>> +    BlkZoneType type;
>> +    BlkZoneCondition cond;
>> +} BlockZoneDescriptor;
>> +
>>  typedef struct BlockDriverInfo {
>>      /* in bytes, 0 if irrelevant */
>>      int cluster_size;
>> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
>> index 8947abab76..6037871089 100644
>> --- a/include/block/block_int-common.h
>> +++ b/include/block/block_int-common.h
>> @@ -94,6 +94,20 @@ typedef struct BdrvTrackedRequest {
>>      struct BdrvTrackedRequest *waiting_for;
>>  } BdrvTrackedRequest;
>>  
>> +/**
>> + * Zone device information data structure.
>> + * Provide information on a device.
>> + */
>> +typedef struct zbd_dev {
>> +    uint32_t zone_size;
>> +    zone_model model;
>> +    uint32_t block_size;
>> +    uint32_t write_granularity;
>> +    uint32_t nr_zones;
>> +    struct BlockZoneDescriptor *zones; /* array of zones */
>> +    uint32_t max_nr_open_zones; /* maximum number of explicitly open zones */
>> +    uint32_t max_nr_active_zones;
>> +} zbd_dev;
> 
> This struct isn't use by this patch. Please move this change into the
> patch that uses struct zbd_dev.
> 
> QEMU coding style naming would be BlockZoneDev instead of zbd_dev.
> 
> I think this struct contains the block limits fields that could be added
> to QEMU's BlockLimits? A new struct may not be necessary.
> 
>>  
>>  struct BlockDriver {
>>      /*
>> @@ -691,6 +705,12 @@ struct BlockDriver {
>>                                            QEMUIOVector *qiov,
>>                                            int64_t pos);
>>  
>> +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
>> +            int64_t offset, int64_t *nr_zones,
>> +            BlockZoneDescriptor *zones);
>> +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, enum zone_op op,
>> +            int64_t offset, int64_t len);
>> +
>>      /* removable device specific */
>>      bool (*bdrv_is_inserted)(BlockDriverState *bs);
>>      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
>> -- 
>> 2.36.1
>>


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12 15:49   ` Stefan Hajnoczi
  2022-07-12 22:12     ` Damien Le Moal
@ 2022-07-13  0:51     ` Sam Li
  2022-07-13  6:19       ` Stefan Hajnoczi
  1 sibling, 1 reply; 42+ messages in thread
From: Sam Li @ 2022-07-13  0:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Hanna Reitz, qemu block, Eric Blake, Kevin Wolf, Fam Zheng,
	Hannes Reinecke

Stefan Hajnoczi <stefanha@redhat.com> 于2022年7月12日周二 23:49写道:
>
> On Tue, Jul 12, 2022 at 10:13:37AM +0800, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage
> > controller emulation can use the new block layer APIs including
> > zone_report and zone_mgmt(open, close, finish, reset).
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  block/block-backend.c            |  41 ++++++
> >  block/coroutines.h               |   5 +
> >  block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
> >  include/block/block-common.h     |  43 +++++-
> >  include/block/block_int-common.h |  20 +++
> >  5 files changed, 344 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index f425b00793..0a05247ae4 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
> >      return ret;
> >  }
> >
> > +/*
> > + * Send a zone_report command.
> > + * offset can be any number within the zone size. No alignment for offset.
>
> I think offset is a byte offset from the start of the device and its
> range is [0, total_sectors * BDRV_SECTOR_SIZE)?
>
> "any number within the zone size" gives the impression that the value
> must be [0, zone_size_in_bytes), which is not right.
>
> I suggest changing the text to "offset can be any number of bytes from
> the start of the device" or similar.
>
> > + * nr_zones represents IN maximum and OUT actual.
> > + */
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    int64_t *nr_zones,
> > +                                    BlockZoneDescriptor *zones)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk); /* increase before waiting */
> > +    blk_wait_while_drained(blk);
> > +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
>
> The !blk_is_available(blk) case needs to return -ENOMEDIUM before we can
> safely dereference blk->root->bs (which can also be written as
> blk_bs(blk)).
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> > +/*
> > + * Send a zone_management command.
> > + * Offset is the start of a zone and len is aligned to zones.
> > + */
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
>
> Please define typedef enum { ... } BlockZoneOp instead of enum { ... }
> zone_op and then use a BlockZoneOp op argument instead of enum zone_op.
> QEMU coding style uses typedefs instead of struct foo or enum foo when
> possible.

Yes, it must be a type.

> > +        int64_t offset, int64_t len)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk);
> > +    blk_wait_while_drained(blk);
> > +    ret = blk_check_byte_request(blk, offset, len);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> >  void blk_drain(BlockBackend *blk)
> >  {
> >      BlockDriverState *bs = blk_bs(blk);
> > diff --git a/block/coroutines.h b/block/coroutines.h
> > index 830ecaa733..19aa96cc56 100644
> > --- a/block/coroutines.h
> > +++ b/block/coroutines.h
> > @@ -80,6 +80,11 @@ int coroutine_fn
> >  blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
> >
> >  int coroutine_fn blk_co_do_flush(BlockBackend *blk);
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    int64_t *nr_zones,
> > +                                    BlockZoneDescriptor *zones);
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> > +                                  int64_t offset, int64_t len);
> >
> >
> >  /*
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 48cd096624..e7523ae2ed 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -67,6 +67,7 @@
> >  #include <sys/param.h>
> >  #include <sys/syscall.h>
> >  #include <sys/vfs.h>
> > +#include <linux/blkzoned.h>
> >  #include <linux/cdrom.h>
> >  #include <linux/fd.h>
> >  #include <linux/fs.h>
> > @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
> >              PreallocMode prealloc;
> >              Error **errp;
> >          } truncate;
> > +        struct {
> > +            int64_t *nr_zones;
> > +            BlockZoneDescriptor *zones;
> > +        } zone_report;
> > +        struct {
> > +            zone_op op;
> > +        } zone_mgmt;
> >      };
> >  } RawPosixAIOData;
> >
> > @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> >  }
> >  #endif
> >
>
> Are the functions below within #ifdef __linux__?

Maybe add them later?

> > +/*
> > + * parse_zone - Fill a zone descriptor
> > + */
> > +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> > +                              struct blk_zone *blkz) {
> > +    zone->start = blkz->start;
> > +    zone->length = blkz->len;
> > +    zone->cap = blkz->capacity;
> > +    zone->wp = blkz->wp - blkz->start;
> > +    zone->type = blkz->type;
> > +    zone->cond = blkz->cond;
> > +}
> > +
> > +static int handle_aiocb_zone_report(void *opaque) {
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    int64_t *nr_zones = aiocb->zone_report.nr_zones;
> > +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> > +    int64_t offset = aiocb->aio_offset;
>
> The code is easier to read if it's clear this value is in sector units:
>
>   int64_t sector = aiocb->aio_offset / 512; /* BLKREPORTZONE uses 512B sectors */
>
> Then there's no confusion about whether 'offset' is bytes or sectors.

I see. It is more readable.

> > +
> > +    struct blk_zone *blkz;
> > +    int64_t rep_size, nrz;
> > +    int ret, n = 0, i = 0;
> > +
> > +    nrz = *nr_zones;
> > +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> > +    g_autofree struct blk_zone_report *rep = NULL;
> > +    rep = g_malloc(rep_size);
> > +    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
> > +    printf("start to report zone with offset: 0x%lx\n", offset);
> > +
> > +    blkz = (struct blk_zone *)(rep + 1);
> > +    while (n < nrz) {
> > +        memset(rep, 0, rep_size);
> > +        rep->sector = offset;
> > +        rep->nr_zones = nrz;
>
> After the first time around the while loop zones[] no longer has space
> for nrz elements. This must be taken into account to avoid overflowing
> zones[]:
>
>   rep->nr_zones = nrz - n;
>
> > +
> > +        ret = ioctl(fd, BLKREPORTZONE, rep);
> > +        if (ret != 0) {
>
> Does this need to retry when ret != 0 && errno == EINTR is encountered?
> Damien/Hannes/Dmitry might know the answer. See handle_aiocb_rw_vector()
> for an example of retrying when EINTR is returned from a blocking
> syscall.
>
> > +            ret = -errno;
> > +            error_report("%d: ioctl BLKREPORTZONE at %ld failed %d",
> > +                         fd, offset, errno);
>
> Please use "... at %" PRId64 " failed ..." instead of %ld for int64_t
> values. %ld is not portable because sizeof(long) varies on different
> machines.

Right! I've missed other part than qemuio-cmds to use PRI64.

> > +            return ret;
> > +        }
> > +
> > +        if (!rep->nr_zones) {
> > +            break;
> > +        }
> > +
> > +        for (i = 0; i < rep->nr_zones; i++, n++) {
> > +            parse_zone(&zones[n], &blkz[i]);
> > +            /* The next report should start after the last zone reported */
> > +            offset = blkz[i].start + blkz[i].len;
> > +        }
> > +    }
> > +
> > +    *nr_zones = n;
> > +    return 0;
> > +}
> > +
> > +static int handle_aiocb_zone_mgmt(void *opaque) {
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    int64_t offset = aiocb->aio_offset;
> > +    int64_t len = aiocb->aio_nbytes;
> > +    zone_op op = aiocb->zone_mgmt.op;
> > +
> > +    struct blk_zone_range range;
> > +    const char *ioctl_name;
> > +    unsigned long ioctl_op;
> > +    int64_t zone_size;
> > +    int64_t zone_size_mask;
> > +    int ret;
> > +
> > +    g_autofree struct stat *file = NULL;
> > +    file = g_new(struct stat, 1);
> > +    stat(s->filename, file);
>
> Heap allocation is not needed. It's easier to put the struct stat
> variable on the stack:
>
>   struct stat st;
>   if (fstat(fd, &st) < 0) {
>       return -errno;
>   }
>
> fstat(2) is preferred over stat(2) when a file descriptor is available
> because stat(2) suffers from race conditions when file system paths have
> changed.

Yes, it was mentioned before. I have included this change in patch 3.
I'll move them into this patch for better reviewing later.

> > +    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
>
> The name "zone_sectors" would be clearer since size doesn't indicate the
> units (bytes vs sectors).
> > +    zone_size_mask = zone_size - 1;
> > +    if (offset & zone_size_mask) {
>
> Bytes and sectors units are being mixed here:
>
>   int64_t offset = aiocb->aio_offset; <-- bytes
>
> I suggest changing it to:
>
>   int64_t sector = aiocb->aio_offset / 512; /* BLK*ZONE ioctls use 512B sectors */

I see. I missed considering the difference between sector and byte.

> > +        error_report("offset is not the start of a zone");
> > +        return -EINVAL;
> > +    }
> > +
> > +    if (len & zone_size_mask) {
>
> Bytes and sectors are mixed here too. I think len is in bytes:
>
>   int64_t len = aiocb->aio_nbytes;
>
> Maybe change it to:
>
>   int64_t nr_sectors = aiocb->aio_nbytes / 512;
> > +        error_report("len is not aligned to zones");
> > +        return -EINVAL;
> > +    }
> > +
> > +    switch (op) {
> > +    case zone_open:
> > +        ioctl_name = "BLKOPENZONE";
> > +        ioctl_op = BLKOPENZONE;
> > +        break;
> > +    case zone_close:
> > +        ioctl_name = "BLKCLOSEZONE";
> > +        ioctl_op = BLKCLOSEZONE;
> > +        break;
> > +    case zone_finish:
> > +        ioctl_name = "BLKFINISHZONE";
> > +        ioctl_op = BLKFINISHZONE;
> > +        break;
> > +    case zone_reset:
> > +        ioctl_name = "BLKRESETZONE";
> > +        ioctl_op = BLKRESETZONE;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone operation 0x%x", op);
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* Execute the operation */
> > +    range.sector = offset;
> > +    range.nr_sectors = len;
> > +    ret = ioctl(fd, ioctl_op, &range);
> > +    if (ret != 0) {
> > +        error_report("ioctl %s failed %d",
> > +                     ioctl_name, errno);
> > +        return -errno;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static int handle_aiocb_copy_range(void *opaque)
> >  {
> >      RawPosixAIOData *aiocb = opaque;
> > @@ -2973,6 +3105,59 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
> >      }
> >  }
> >
> > +/*
> > + * zone report - Get a zone block device's information in the form
> > + * of an array of zone descriptors.
> > + *
> > + * @param bs: passing zone block device file descriptor
> > + * @param zones: an array of zone descriptors to hold zone
> > + * information on reply
> > + * @param offset: offset can be any byte within the zone size.
> > + * @param len: (not sure yet.
> > + * @return 0 on success, -1 on failure
> > + */
> > +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                                           int64_t *nr_zones,
> > +                                           BlockZoneDescriptor *zones) {
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs         = bs,
> > +        .aio_fildes = s->fd,
> > +        .aio_type   = QEMU_AIO_IOCTL,
>
> Please define QEMU_AIO_ZONE_REPORT, QEMU_AIO_ZONE_MGMT, etc values for
> these new operations instead of reusing QEMU_AIO_IOCTL, which is meant
> for generic passthrough ioctls.
> > +        .aio_offset = offset,
> > +        .zone_report    = {
> > +                .nr_zones       = nr_zones,
> > +                .zones          = zones,
> > +        },
> > +    };
> > +
> > +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> > +}
> > +
> > +/*
> > + * zone management operations - Execute an operation on a zone
> > + */
> > +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> > +        int64_t offset, int64_t len) {
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs             = bs,
> > +        .aio_fildes     = s->fd,
> > +        .aio_type       = QEMU_AIO_IOCTL,
> > +        .aio_offset     = offset,
> > +        .aio_nbytes     = len,
> > +        .zone_mgmt  = {
> > +                .op = op,
> > +        },
> > +    };
> > +
> > +    return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> > +}
> > +
> >  static coroutine_fn int
> >  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >                  bool blkdev)
> > @@ -3324,6 +3509,9 @@ BlockDriver bdrv_file = {
> >      .bdrv_abort_perm_update = raw_abort_perm_update,
> >      .create_opts = &raw_create_opts,
> >      .mutable_opts = mutable_opts,
> > +
> > +    .bdrv_co_zone_report = raw_co_zone_report,
> > +    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> >  };
> >
> >  /***********************************************/
> > @@ -3703,6 +3891,53 @@ static BlockDriver bdrv_host_device = {
> >  #endif
> >  };
> >
>
> #ifdef __linux__
>
> > +static BlockDriver bdrv_zoned_host_device = {
> > +        .format_name = "zoned_host_device",
> > +        .protocol_name = "zoned_host_device",
> > +        .instance_size = sizeof(BDRVRawState),
> > +        .bdrv_needs_filename = true,
> > +        .bdrv_probe_device  = hdev_probe_device,
> > +        .bdrv_parse_filename = hdev_parse_filename,
>
> This function hardcodes "host_device:". zoned_host_device needs a
> separate function that uses "zoned_host_device:".
>
> > +        .bdrv_file_open     = hdev_open,
> > +        .bdrv_close         = raw_close,
> > +        .bdrv_reopen_prepare = raw_reopen_prepare,
> > +        .bdrv_reopen_commit  = raw_reopen_commit,
> > +        .bdrv_reopen_abort   = raw_reopen_abort,
> > +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> > +        .create_opts         = &bdrv_create_opts_simple,
> > +        .mutable_opts        = mutable_opts,
> > +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> > +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> > +
> > +        .bdrv_co_preadv         = raw_co_preadv,
> > +        .bdrv_co_pwritev        = raw_co_pwritev,
> > +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> > +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> > +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> > +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> > +        .bdrv_refresh_limits = raw_refresh_limits,
> > +        .bdrv_io_plug = raw_aio_plug,
> > +        .bdrv_io_unplug = raw_aio_unplug,
> > +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> > +
> > +        .bdrv_co_truncate       = raw_co_truncate,
> > +        .bdrv_getlength = raw_getlength,
> > +        .bdrv_get_info = raw_get_info,
> > +        .bdrv_get_allocated_file_size
> > +                            = raw_get_allocated_file_size,
> > +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> > +        .bdrv_check_perm = raw_check_perm,
> > +        .bdrv_set_perm   = raw_set_perm,
> > +        .bdrv_abort_perm_update = raw_abort_perm_update,
> > +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> > +        .bdrv_probe_geometry = hdev_probe_geometry,
> > +        .bdrv_co_ioctl = hdev_co_ioctl,
> > +
> > +        /* zone management operations */
> > +        .bdrv_co_zone_report = raw_co_zone_report,
> > +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> > +};
>
> #endif /* __linux__ */
>
> > +
> >  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> >  static void cdrom_parse_filename(const char *filename, QDict *options,
> >                                   Error **errp)
> > @@ -3964,6 +4199,7 @@ static void bdrv_file_init(void)
> >  #if defined(HAVE_HOST_BLOCK_DEVICE)
> >      bdrv_register(&bdrv_host_device);
> >  #ifdef __linux__
> > +    bdrv_register(&bdrv_zoned_host_device);
> >      bdrv_register(&bdrv_host_cdrom);
> >  #endif
> >  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > index fdb7306e78..78cddeeda5 100644
> > --- a/include/block/block-common.h
> > +++ b/include/block/block-common.h
> > @@ -23,7 +23,6 @@
> >   */
> >  #ifndef BLOCK_COMMON_H
> >  #define BLOCK_COMMON_H
> > -
> >  #include "block/aio.h"
> >  #include "block/aio-wait.h"
> >  #include "qemu/iov.h"
>
> Please avoid making whitespace changes that are unrelated to the patch.

Sorry, I will be more careful.

> > @@ -49,6 +48,48 @@ typedef struct BlockDriver BlockDriver;
> >  typedef struct BdrvChild BdrvChild;
> >  typedef struct BdrvChildClass BdrvChildClass;
> >
> > +typedef enum zone_op {
> > +    zone_open,
> > +    zone_close,
> > +    zone_finish,
> > +    zone_reset,
> > +} zone_op;
>
> QEMU coding style:
>
>   typedef enum {
>       BLK_ZO_OPEN,
>       BLK_ZO_CLOSE,
>       BLK_ZO_FINISH,
>       BLK_ZO_RESET,
>   } BlockZoneOp;
>
> Please also reformat the enums below.
>
> > +
> > +typedef enum zone_model {
> > +    BLK_Z_HM,
> > +    BLK_Z_HA,
> > +} zone_model;
> > +
> > +typedef enum BlkZoneCondition {
> > +    BLK_ZS_NOT_WP = 0x0,
> > +    BLK_ZS_EMPTY = 0x1,
> > +    BLK_ZS_IOPEN = 0x2,
> > +    BLK_ZS_EOPEN = 0x3,
> > +    BLK_ZS_CLOSED = 0x4,
> > +    BLK_ZS_RDONLY = 0xD,
> > +    BLK_ZS_FULL = 0xE,
> > +    BLK_ZS_OFFLINE = 0xF,
> > +} BlkZoneCondition;
> > +
> > +typedef enum BlkZoneType {
> > +    BLK_ZT_CONV = 0x1,
> > +    BLK_ZT_SWR = 0x2,
> > +    BLK_ZT_SWP = 0x3,
> > +} BlkZoneType;
> > +
> > +/*
> > + * Zone descriptor data structure.
> > + * Provide information on a zone with all position and size values in bytes.
> > + */
> > +typedef struct BlockZoneDescriptor {
> > +    uint64_t start;
> > +    uint64_t length;
> > +    uint64_t cap;
> > +    uint64_t wp;
> > +    BlkZoneType type;
> > +    BlkZoneCondition cond;
> > +} BlockZoneDescriptor;
> > +
> >  typedef struct BlockDriverInfo {
> >      /* in bytes, 0 if irrelevant */
> >      int cluster_size;
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> > index 8947abab76..6037871089 100644
> > --- a/include/block/block_int-common.h
> > +++ b/include/block/block_int-common.h
> > @@ -94,6 +94,20 @@ typedef struct BdrvTrackedRequest {
> >      struct BdrvTrackedRequest *waiting_for;
> >  } BdrvTrackedRequest;
> >
> > +/**
> > + * Zone device information data structure.
> > + * Provide information on a device.
> > + */
> > +typedef struct zbd_dev {
> > +    uint32_t zone_size;
> > +    zone_model model;
> > +    uint32_t block_size;
> > +    uint32_t write_granularity;
> > +    uint32_t nr_zones;
> > +    struct BlockZoneDescriptor *zones; /* array of zones */
> > +    uint32_t max_nr_open_zones; /* maximum number of explicitly open zones */
> > +    uint32_t max_nr_active_zones;
> > +} zbd_dev;
>
> This struct isn't use by this patch. Please move this change into the
> patch that uses struct zbd_dev.
>
> QEMU coding style naming would be BlockZoneDev instead of zbd_dev.
>
> I think this struct contains the block limits fields that could be added
> to QEMU's BlockLimits? A new struct may not be necessary.

Yes! It will in the next patch series.

> >
> >  struct BlockDriver {
> >      /*
> > @@ -691,6 +705,12 @@ struct BlockDriver {
> >                                            QEMUIOVector *qiov,
> >                                            int64_t pos);
> >
> > +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> > +            int64_t offset, int64_t *nr_zones,
> > +            BlockZoneDescriptor *zones);
> > +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, enum zone_op op,
> > +            int64_t offset, int64_t len);
> > +
> >      /* removable device specific */
> >      bool (*bdrv_is_inserted)(BlockDriverState *bs);
> >      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> > --
> > 2.36.1
> >


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12  7:35   ` Damien Le Moal
@ 2022-07-13  0:54     ` Sam Li
  0 siblings, 0 replies; 42+ messages in thread
From: Sam Li @ 2022-07-13  0:54 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: qemu-devel, Markus Armbruster, Dmitry Fomichev, Stefan Hajnoczi,
	Hanna Reitz, qemu block, Eric Blake, Kevin Wolf, Fam Zheng,
	Hannes Reinecke

Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2022年7月12日周二 15:35写道:
>
> On 7/12/22 11:13, Sam Li wrote:
> > By adding zone management operations in BlockDriver, storage
> > controller emulation can use the new block layer APIs including
> > zone_report and zone_mgmt(open, close, finish, reset).
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  block/block-backend.c            |  41 ++++++
> >  block/coroutines.h               |   5 +
> >  block/file-posix.c               | 236 +++++++++++++++++++++++++++++++
> >  include/block/block-common.h     |  43 +++++-
> >  include/block/block_int-common.h |  20 +++
> >  5 files changed, 344 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index f425b00793..0a05247ae4 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1806,6 +1806,47 @@ int blk_flush(BlockBackend *blk)
> >      return ret;
> >  }
> >
> > +/*
> > + * Send a zone_report command.
> > + * offset can be any number within the zone size. No alignment for offset.
> > + * nr_zones represents IN maximum and OUT actual.
> > + */
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    int64_t *nr_zones,
> > +                                    BlockZoneDescriptor *zones)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk); /* increase before waiting */
> > +    blk_wait_while_drained(blk);
> > +    ret = bdrv_co_zone_report(blk->root->bs, offset, nr_zones, zones);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> > +/*
> > + * Send a zone_management command.
> > + * Offset is the start of a zone and len is aligned to zones.
> > + */
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> > +        int64_t offset, int64_t len)
> > +{
> > +    int ret;
> > +    IO_CODE();
> > +
> > +    blk_inc_in_flight(blk);
> > +    blk_wait_while_drained(blk);
> > +    ret = blk_check_byte_request(blk, offset, len);
> > +    if (ret < 0) {
> > +        return ret;
>
> You missed adding "blk_dec_in_flight(blk);" before return here. But I
> think you can move the call to blk_check_byte_request() before
> blk_inc_in_flight() to avoid having to call blk_dec_in_flight().

Yes, I'll just remove them.

> > +    }
> > +
> > +    ret = bdrv_co_zone_mgmt(blk->root->bs, op, offset, len);
> > +    blk_dec_in_flight(blk);
> > +    return ret;
> > +}
> > +
> >  void blk_drain(BlockBackend *blk)
> >  {
> >      BlockDriverState *bs = blk_bs(blk);
> > diff --git a/block/coroutines.h b/block/coroutines.h
> > index 830ecaa733..19aa96cc56 100644
> > --- a/block/coroutines.h
> > +++ b/block/coroutines.h
> > @@ -80,6 +80,11 @@ int coroutine_fn
> >  blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes);
> >
> >  int coroutine_fn blk_co_do_flush(BlockBackend *blk);
> > +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> > +                                    int64_t *nr_zones,
> > +                                    BlockZoneDescriptor *zones);
> > +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, enum zone_op op,
> > +                                  int64_t offset, int64_t len);
> >
> >
> >  /*
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 48cd096624..e7523ae2ed 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -67,6 +67,7 @@
> >  #include <sys/param.h>
> >  #include <sys/syscall.h>
> >  #include <sys/vfs.h>
> > +#include <linux/blkzoned.h>
>
> You need to conditionally include this because not all kernels provide
> this file. Old kernels will not have it. So you need something like:
>
> #if defined(CONFIG_BLKZONED)
> #include <linux/blkzoned.h>
> #endif
>
> And adding this to meson.build should do the trick:
>
> diff --git a/meson.build b/meson.build
> index 65a885ea69..31d8852a35 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1869,6 +1869,7 @@ config_host_data.set('CONFIG_REPLICATION',
> get_option('live_block_migration').al
>
>  # has_header
>  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
> +config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
>  config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
>  config_host_data.set('CONFIG_VALGRIND_H',
> cc.has_header('valgrind/valgrind.h'))
>  config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
>
> Then in build/config-host.h, you will see "#define CONFIG_BLKZONED". You
> then can use "#if defined(CONFIG_BLKZONED)" to conditionally define the
> code related to zoned devices.
>
> To test all this, temporarily rename your host
> /usr/include/linux/blkzoned.h file to some other name, configure qemu and
> see if it compiles.

Thanks!

> >  #include <linux/cdrom.h>
> >  #include <linux/fd.h>
> >  #include <linux/fs.h>
> > @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
> >              PreallocMode prealloc;
> >              Error **errp;
> >          } truncate;
> > +        struct {
> > +            int64_t *nr_zones;
> > +            BlockZoneDescriptor *zones;
> > +        } zone_report;
> > +        struct {
> > +            zone_op op;
> > +        } zone_mgmt;
> >      };
> >  } RawPosixAIOData;
> >
> > @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> >  }
> >  #endif
> >
> > +/*
> > + * parse_zone - Fill a zone descriptor
> > + */
> > +static inline void parse_zone(struct BlockZoneDescriptor *zone,
> > +                              struct blk_zone *blkz) {
> > +    zone->start = blkz->start;
> > +    zone->length = blkz->len;
> > +    zone->cap = blkz->capacity;
> > +    zone->wp = blkz->wp - blkz->start;
> > +    zone->type = blkz->type;
> > +    zone->cond = blkz->cond;
> > +}
> > +
> > +static int handle_aiocb_zone_report(void *opaque) {
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    int64_t *nr_zones = aiocb->zone_report.nr_zones;
> > +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> > +    int64_t offset = aiocb->aio_offset;
> > +
> > +    struct blk_zone *blkz;
> > +    int64_t rep_size, nrz;
> > +    int ret, n = 0, i = 0;
> > +
> > +    nrz = *nr_zones;
> > +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> > +    g_autofree struct blk_zone_report *rep = NULL;
> > +    rep = g_malloc(rep_size);
> > +    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
> > +    printf("start to report zone with offset: 0x%lx\n", offset);
> > +
> > +    blkz = (struct blk_zone *)(rep + 1);
> > +    while (n < nrz) {
> > +        memset(rep, 0, rep_size);
> > +        rep->sector = offset;
> > +        rep->nr_zones = nrz;
> > +
> > +        ret = ioctl(fd, BLKREPORTZONE, rep);
> > +        if (ret != 0) {
> > +            ret = -errno;
> > +            error_report("%d: ioctl BLKREPORTZONE at %ld failed %d",
> > +                         fd, offset, errno);
> > +            return ret;
> > +        }
> > +
> > +        if (!rep->nr_zones) {
> > +            break;
> > +        }
> > +
> > +        for (i = 0; i < rep->nr_zones; i++, n++) {
> > +            parse_zone(&zones[n], &blkz[i]);
> > +            /* The next report should start after the last zone reported */
> > +            offset = blkz[i].start + blkz[i].len;
> > +        }
> > +    }
> > +
> > +    *nr_zones = n;
> > +    return 0;
> > +}
> > +
> > +static int handle_aiocb_zone_mgmt(void *opaque) {
> > +    RawPosixAIOData *aiocb = opaque;
> > +    int fd = aiocb->aio_fildes;
> > +    int64_t offset = aiocb->aio_offset;
> > +    int64_t len = aiocb->aio_nbytes;
> > +    zone_op op = aiocb->zone_mgmt.op;
> > +
> > +    struct blk_zone_range range;
> > +    const char *ioctl_name;
> > +    unsigned long ioctl_op;
> > +    int64_t zone_size;
> > +    int64_t zone_size_mask;
> > +    int ret;
> > +
> > +    g_autofree struct stat *file = NULL;
> > +    file = g_new(struct stat, 1);
> > +    stat(s->filename, file);
> > +    zone_size = get_sysfs_long_val(fd, file, "chunk_sectors");
> > +    zone_size_mask = zone_size - 1;
> > +    if (offset & zone_size_mask) {
> > +        error_report("offset is not the start of a zone");
> > +        return -EINVAL;
> > +    }
> > +
> > +    if (len & zone_size_mask) {
> > +        error_report("len is not aligned to zones");
> > +        return -EINVAL;
> > +    }
> > +
> > +    switch (op) {
> > +    case zone_open:
> > +        ioctl_name = "BLKOPENZONE";
> > +        ioctl_op = BLKOPENZONE;
> > +        break;
> > +    case zone_close:
> > +        ioctl_name = "BLKCLOSEZONE";
> > +        ioctl_op = BLKCLOSEZONE;
> > +        break;
> > +    case zone_finish:
> > +        ioctl_name = "BLKFINISHZONE";
> > +        ioctl_op = BLKFINISHZONE;
> > +        break;
> > +    case zone_reset:
> > +        ioctl_name = "BLKRESETZONE";
> > +        ioctl_op = BLKRESETZONE;
> > +        break;
> > +    default:
> > +        error_report("Invalid zone operation 0x%x", op);
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* Execute the operation */
> > +    range.sector = offset;
> > +    range.nr_sectors = len;
> > +    ret = ioctl(fd, ioctl_op, &range);
> > +    if (ret != 0) {
> > +        error_report("ioctl %s failed %d",
> > +                     ioctl_name, errno);
> > +        return -errno;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static int handle_aiocb_copy_range(void *opaque)
> >  {
> >      RawPosixAIOData *aiocb = opaque;
> > @@ -2973,6 +3105,59 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
> >      }
> >  }
> >
> > +/*
> > + * zone report - Get a zone block device's information in the form
> > + * of an array of zone descriptors.
> > + *
> > + * @param bs: passing zone block device file descriptor
> > + * @param zones: an array of zone descriptors to hold zone
> > + * information on reply
> > + * @param offset: offset can be any byte within the zone size.
> > + * @param len: (not sure yet.
> > + * @return 0 on success, -1 on failure
> > + */
> > +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> > +                                           int64_t *nr_zones,
> > +                                           BlockZoneDescriptor *zones) {
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs         = bs,
> > +        .aio_fildes = s->fd,
> > +        .aio_type   = QEMU_AIO_IOCTL,
> > +        .aio_offset = offset,
> > +        .zone_report    = {
> > +                .nr_zones       = nr_zones,
> > +                .zones          = zones,
> > +        },
> > +    };
> > +
> > +    return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
> > +}
> > +
> > +/*
> > + * zone management operations - Execute an operation on a zone
> > + */
> > +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> > +        int64_t offset, int64_t len) {
> > +    BDRVRawState *s = bs->opaque;
> > +    RawPosixAIOData acb;
> > +
> > +    acb = (RawPosixAIOData) {
> > +        .bs             = bs,
> > +        .aio_fildes     = s->fd,
> > +        .aio_type       = QEMU_AIO_IOCTL,
> > +        .aio_offset     = offset,
> > +        .aio_nbytes     = len,
> > +        .zone_mgmt  = {
> > +                .op = op,
> > +        },
> > +    };
> > +
> > +    return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> > +}
> > +
> >  static coroutine_fn int
> >  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >                  bool blkdev)
> > @@ -3324,6 +3509,9 @@ BlockDriver bdrv_file = {
> >      .bdrv_abort_perm_update = raw_abort_perm_update,
> >      .create_opts = &raw_create_opts,
> >      .mutable_opts = mutable_opts,
> > +
> > +    .bdrv_co_zone_report = raw_co_zone_report,
> > +    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> >  };
> >
> >  /***********************************************/
> > @@ -3703,6 +3891,53 @@ static BlockDriver bdrv_host_device = {
> >  #endif
> >  };
> >
> > +static BlockDriver bdrv_zoned_host_device = {
> > +        .format_name = "zoned_host_device",
> > +        .protocol_name = "zoned_host_device",
> > +        .instance_size = sizeof(BDRVRawState),
> > +        .bdrv_needs_filename = true,
> > +        .bdrv_probe_device  = hdev_probe_device,
> > +        .bdrv_parse_filename = hdev_parse_filename,
> > +        .bdrv_file_open     = hdev_open,
> > +        .bdrv_close         = raw_close,
> > +        .bdrv_reopen_prepare = raw_reopen_prepare,
> > +        .bdrv_reopen_commit  = raw_reopen_commit,
> > +        .bdrv_reopen_abort   = raw_reopen_abort,
> > +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
> > +        .create_opts         = &bdrv_create_opts_simple,
> > +        .mutable_opts        = mutable_opts,
> > +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> > +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> > +
> > +        .bdrv_co_preadv         = raw_co_preadv,
> > +        .bdrv_co_pwritev        = raw_co_pwritev,
> > +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> > +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
> > +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
> > +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> > +        .bdrv_refresh_limits = raw_refresh_limits,
> > +        .bdrv_io_plug = raw_aio_plug,
> > +        .bdrv_io_unplug = raw_aio_unplug,
> > +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> > +
> > +        .bdrv_co_truncate       = raw_co_truncate,
> > +        .bdrv_getlength = raw_getlength,
> > +        .bdrv_get_info = raw_get_info,
> > +        .bdrv_get_allocated_file_size
> > +                            = raw_get_allocated_file_size,
> > +        .bdrv_get_specific_stats = hdev_get_specific_stats,
> > +        .bdrv_check_perm = raw_check_perm,
> > +        .bdrv_set_perm   = raw_set_perm,
> > +        .bdrv_abort_perm_update = raw_abort_perm_update,
> > +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> > +        .bdrv_probe_geometry = hdev_probe_geometry,
> > +        .bdrv_co_ioctl = hdev_co_ioctl,
> > +
> > +        /* zone management operations */
> > +        .bdrv_co_zone_report = raw_co_zone_report,
> > +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> > +};
> > +
> >  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> >  static void cdrom_parse_filename(const char *filename, QDict *options,
> >                                   Error **errp)
> > @@ -3964,6 +4199,7 @@ static void bdrv_file_init(void)
> >  #if defined(HAVE_HOST_BLOCK_DEVICE)
> >      bdrv_register(&bdrv_host_device);
> >  #ifdef __linux__
> > +    bdrv_register(&bdrv_zoned_host_device);
> >      bdrv_register(&bdrv_host_cdrom);
> >  #endif
> >  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > index fdb7306e78..78cddeeda5 100644
> > --- a/include/block/block-common.h
> > +++ b/include/block/block-common.h
> > @@ -23,7 +23,6 @@
> >   */
> >  #ifndef BLOCK_COMMON_H
> >  #define BLOCK_COMMON_H
> > -
> >  #include "block/aio.h"
> >  #include "block/aio-wait.h"
> >  #include "qemu/iov.h"
> > @@ -49,6 +48,48 @@ typedef struct BlockDriver BlockDriver;
> >  typedef struct BdrvChild BdrvChild;
> >  typedef struct BdrvChildClass BdrvChildClass;
> >
> > +typedef enum zone_op {
> > +    zone_open,
> > +    zone_close,
> > +    zone_finish,
> > +    zone_reset,
> > +} zone_op;
> > +
> > +typedef enum zone_model {
> > +    BLK_Z_HM,
> > +    BLK_Z_HA,
> > +} zone_model;
> > +
> > +typedef enum BlkZoneCondition {
> > +    BLK_ZS_NOT_WP = 0x0,
> > +    BLK_ZS_EMPTY = 0x1,
> > +    BLK_ZS_IOPEN = 0x2,
> > +    BLK_ZS_EOPEN = 0x3,
> > +    BLK_ZS_CLOSED = 0x4,
> > +    BLK_ZS_RDONLY = 0xD,
> > +    BLK_ZS_FULL = 0xE,
> > +    BLK_ZS_OFFLINE = 0xF,
> > +} BlkZoneCondition;
> > +
> > +typedef enum BlkZoneType {
> > +    BLK_ZT_CONV = 0x1,
> > +    BLK_ZT_SWR = 0x2,
> > +    BLK_ZT_SWP = 0x3,
> > +} BlkZoneType;
> > +
> > +/*
> > + * Zone descriptor data structure.
> > + * Provide information on a zone with all position and size values in bytes.
> > + */
> > +typedef struct BlockZoneDescriptor {
> > +    uint64_t start;
> > +    uint64_t length;
> > +    uint64_t cap;
> > +    uint64_t wp;
> > +    BlkZoneType type;
> > +    BlkZoneCondition cond;
> > +} BlockZoneDescriptor;
> > +
> >  typedef struct BlockDriverInfo {
> >      /* in bytes, 0 if irrelevant */
> >      int cluster_size;
> > diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> > index 8947abab76..6037871089 100644
> > --- a/include/block/block_int-common.h
> > +++ b/include/block/block_int-common.h
> > @@ -94,6 +94,20 @@ typedef struct BdrvTrackedRequest {
> >      struct BdrvTrackedRequest *waiting_for;
> >  } BdrvTrackedRequest;
> >
> > +/**
> > + * Zone device information data structure.
> > + * Provide information on a device.
> > + */
> > +typedef struct zbd_dev {
> > +    uint32_t zone_size;
> > +    zone_model model;
> > +    uint32_t block_size;
> > +    uint32_t write_granularity;
> > +    uint32_t nr_zones;
> > +    struct BlockZoneDescriptor *zones; /* array of zones */
> > +    uint32_t max_nr_open_zones; /* maximum number of explicitly open zones */
> > +    uint32_t max_nr_active_zones;
> > +} zbd_dev;
> >
> >  struct BlockDriver {
> >      /*
> > @@ -691,6 +705,12 @@ struct BlockDriver {
> >                                            QEMUIOVector *qiov,
> >                                            int64_t pos);
> >
> > +    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
> > +            int64_t offset, int64_t *nr_zones,
> > +            BlockZoneDescriptor *zones);
> > +    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, enum zone_op op,
> > +            int64_t offset, int64_t len);
> > +
> >      /* removable device specific */
> >      bool (*bdrv_is_inserted)(BlockDriverState *bs);
> >      void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
>
>
> --
> Damien Le Moal
> Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-13  0:51     ` Sam Li
@ 2022-07-13  6:19       ` Stefan Hajnoczi
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-13  6:19 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Hanna Reitz, qemu block, Eric Blake, Kevin Wolf, Fam Zheng,
	Hannes Reinecke

[-- Attachment #1: Type: text/plain, Size: 3096 bytes --]

On Wed, Jul 13, 2022 at 08:51:45AM +0800, Sam Li wrote:
> Stefan Hajnoczi <stefanha@redhat.com> 于2022年7月12日周二 23:49写道:
> >
> > On Tue, Jul 12, 2022 at 10:13:37AM +0800, Sam Li wrote:
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 48cd096624..e7523ae2ed 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -67,6 +67,7 @@
> > >  #include <sys/param.h>
> > >  #include <sys/syscall.h>
> > >  #include <sys/vfs.h>
> > > +#include <linux/blkzoned.h>
> > >  #include <linux/cdrom.h>
> > >  #include <linux/fd.h>
> > >  #include <linux/fs.h>
> > > @@ -216,6 +217,13 @@ typedef struct RawPosixAIOData {
> > >              PreallocMode prealloc;
> > >              Error **errp;
> > >          } truncate;
> > > +        struct {
> > > +            int64_t *nr_zones;
> > > +            BlockZoneDescriptor *zones;
> > > +        } zone_report;
> > > +        struct {
> > > +            zone_op op;
> > > +        } zone_mgmt;
> > >      };
> > >  } RawPosixAIOData;
> > >
> > > @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> > >  }
> > >  #endif
> > >
> >
> > Are the functions below within #ifdef __linux__?
> 
> Maybe add them later?

When using #ifdefs you usually need to apply them consistently to avoid
compiler errors or warnings.

For example:

  static void foo(void) { ... }

  #ifdef __linux__
  static BlockDriver ... = {
      .foo = foo,
  };
  #endif /* __linux__ */

On non-Linux hosts the compiler only sees foo() but it's unused, so it a
warning about an unused function will be displayed.

And the other way around results in a compiler error:

  #ifdef __linux__
  static void foo(void) { ... }
  #endif /* __linux__ */

  static BlockDriver ... = {
      .foo = foo,
  };

On non-Linux hosts the compiler doesn't see the definition of foo() that
the BlockDriver struct requires and compilation fails with an error.

There will be no errors on your machine if you avoid #ifdefs everywhere,
but reviewers will be worried about it and Continuous Integration tests
will fail to build on non-Linux hosts.

I would put the #ifdefs in place from the start.

> > > +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> > > +                                           int64_t *nr_zones,
> > > +                                           BlockZoneDescriptor *zones) {
> > > +    BDRVRawState *s = bs->opaque;
> > > +    RawPosixAIOData acb;
> > > +
> > > +    acb = (RawPosixAIOData) {
> > > +        .bs         = bs,
> > > +        .aio_fildes = s->fd,
> > > +        .aio_type   = QEMU_AIO_IOCTL,
> >
> > Please define QEMU_AIO_ZONE_REPORT, QEMU_AIO_ZONE_MGMT, etc values for
> > these new operations instead of reusing QEMU_AIO_IOCTL, which is meant
> > for generic passthrough ioctls.

After looking again I think .aio_type isn't used in this code path. You
can skip initializing this field if you want and no new enum constants
are needed.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
  2022-07-12 22:12     ` Damien Le Moal
@ 2022-07-13  6:22       ` Stefan Hajnoczi
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-13  6:22 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Sam Li, qemu-devel, Markus Armbruster, dmitry.fomichev,
	Hanna Reitz, qemu-block, Eric Blake, Kevin Wolf, Fam Zheng, hare

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

On Wed, Jul 13, 2022 at 07:12:17AM +0900, Damien Le Moal wrote:
> On 7/13/22 00:49, Stefan Hajnoczi wrote:
> > On Tue, Jul 12, 2022 at 10:13:37AM +0800, Sam Li wrote:
> >> @@ -1801,6 +1809,130 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> >>  }
> >>  #endif
> >>  
> > 
> > Are the functions below within #ifdef __linux__?
> 
> We need more than that: linux AND blkzoned.h header present (meaning a
> recent kernel). So the ifdef should be "#if defined(CONFIG_BLKZONED)" or
> something like it, with CONFIG_BLKZONED defined for linux AND
> /usr/include/linux/blkzoned.h present.

Okay. Try adding the following to meson.build:

  config_host_data.set('HAVE_LINUX_BLKZONED_H', cc.has_header('linux/blkzoned.h'))

Then:

  #ifdef HAVE_LINUX_BLKZONED_H
  ...
  #endif /* HAVE_LINUX_BLKZONED_H */

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 0/9] Add support for zoned device
  2022-07-12  5:59   ` Sam Li
@ 2022-07-18 10:53     ` Markus Armbruster
  0 siblings, 0 replies; 42+ messages in thread
From: Markus Armbruster @ 2022-07-18 10:53 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Dmitry Fomichev, Stefan Hajnoczi,
	Hanna Reitz, qemu block, Eric Blake, Kevin Wolf, Fam Zheng,
	Hannes Reinecke

Sam Li <faithilikerun@gmail.com> writes:

> Markus Armbruster <armbru@redhat.com> 于2022年7月12日周二 13:47写道:
>>
>> Sam Li <faithilikerun@gmail.com> writes:
>>
>> > This patch series adds support for zoned device to virtio-blk emulation. Zoned
>> > Storage can support sequential writes, which reduces write amplification in SSD,
>> > leading to higher write throughput and increased capacity.
>>
>> Forgive me if this has already been discussed, or is explained deeper in
>> the patch series...
>>
>> The commit message sounds like you're extending virtio-blk to optionally
>> emulate zoned storage.  Correct?
>
> Yes! The main purpose is to emulate zoned storage only for the zoned
> device files. Right now, QEMU sees those as regular block devices.
>
>> PATCH 1 adds a new block block device driver 'zoned_host_device', and
>> PATCH 9 exposes it in QAPI.  This is for passing through a zoned host
>> device, correct?
>
> Yes! It allows the guest os see zoned host device. It is still in
> development. Maybe the implementations will change later.

Your cover letter only mentions the virtio-blk part, not the
pass-through part.  Please correct that if you need to respin.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 2/9] qemu-io: add zoned block device operations.
  2022-07-12  2:13 ` [RFC v4 2/9] qemu-io: add zoned block device operations Sam Li
  2022-07-12  6:14   ` Hannes Reinecke
  2022-07-12  7:44   ` Damien Le Moal
@ 2022-07-27 14:13   ` Stefan Hajnoczi
  2022-07-28  1:57     ` Damien Le Moal
  2 siblings, 1 reply; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:13 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

On Mon, 11 Jul 2022 at 22:17, Sam Li <faithilikerun@gmail.com> wrote:
> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                        int64_t *nr_zones,
> +                        BlockZoneDescriptor *zones)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || (!drv->bdrv_co_zone_report)) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +
> +    if (drv->bdrv_co_zone_report) {

At this point we know drv->bdrv_co_zone_report is non-NULL because it
has been checked already. The if statement can be dropped.

> +        co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
> +    } else {
> +        co.ret = -ENOTSUP;

This case is already handled by if (... ||
(!drv->bdrv_co_zone_report)) above. The else body can be dropped.

> +        goto out;
> +        qemu_coroutine_yield();
> +    }
> +
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
> +int bdrv_co_zone_mgmt(BlockDriverState *bs, enum zone_op op,

Please follow QEMU coding style and typedef BdrvZoneOp instead of
writing out enum zone_op.

> +        int64_t offset, int64_t len)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || (!drv->bdrv_co_zone_mgmt)) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +
> +    if (drv->bdrv_co_zone_mgmt) {
> +        co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
> +    } else {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +        qemu_coroutine_yield();
> +    }

Same comments here.

> +
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
>      IO_CODE();
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index 053a27141a..a0ae140452 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -80,6 +80,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>  /* Ensure contents are flushed to disk.  */
>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>
> +/* Report zone information of zone block device. */
> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                     int64_t *nr_zones,
> +                                     BlockZoneDescriptor *zones);
> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> +                                   int64_t offset, int64_t len);
> +
>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> @@ -289,6 +296,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>  int generated_co_wrapper
>  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>
> +int generated_co_wrapper
> +blk_zone_report(BlockBackend *blk, int64_t offset, int64_t *nr_zones,
> +                BlockZoneDescriptor *zones);
> +int generated_co_wrapper
> +blk_zone_mgmt(BlockBackend *blk, enum zone_op op, int64_t offset, int64_t len);
> +
>  /**
>   * bdrv_parent_drained_begin_single:
>   *
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 2f0d8ac25a..a88fa322d2 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1706,6 +1706,144 @@ static const cmdinfo_t flush_cmd = {
>      .oneline    = "flush all in-core file state to disk",
>  };
>
> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, nr_zones;
> +
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    nr_zones = cvtnum(argv[optind]);
> +
> +    g_autofree BlockZoneDescriptor *zones = NULL;
> +    zones = g_new(BlockZoneDescriptor, nr_zones);
> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
> +    if (ret < 0) {
> +        printf("zone report failed: %s\n", strerror(-ret));
> +    } else {
> +        for (int i = 0; i < nr_zones; ++i) {
> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
> +                   "zcond:%u, [type: %u]\n",
> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
> +                   zones[i].cond, zones[i].type);
> +        }
> +    }
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_report_cmd = {
> +        .name = "zone_report",
> +        .altname = "zp",
> +        .cfunc = zone_report_f,
> +        .argmin = 2,
> +        .argmax = 2,
> +        .args = "offset number",
> +        .oneline = "report zone information",
> +};
> +
> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int64_t offset, len;
> +    ++optind;
> +    offset = cvtnum(argv[optind]);
> +    ++optind;
> +    len = cvtnum(argv[optind]);
> +    ret = blk_zone_mgmt(blk, zone_open, offset, len);

Please name enum constants in all caps: BDRV_ZONE_OPEN or BDRV_ZO_OPEN.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 5/9] qemu-iotests: test new zone operations.
  2022-07-12  2:13 ` [RFC v4 5/9] qemu-iotests: test new zone operations Sam Li
@ 2022-07-27 14:34   ` Stefan Hajnoczi
  2022-07-27 14:59     ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:34 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke, Jens Axboe, Ming Lei

On Mon, 11 Jul 2022 at 22:21, Sam Li <faithilikerun@gmail.com> wrote:
>
> We have added new block layer APIs of zoned block devices. Test it with:
> (1) Create a null_blk device, run each zone operation on it and see
> whether reporting right zone information.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  tests/qemu-iotests/tests/zoned.sh | 69 +++++++++++++++++++++++++++++++
>  1 file changed, 69 insertions(+)
>  create mode 100755 tests/qemu-iotests/tests/zoned.sh
>
> diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
> new file mode 100755
> index 0000000000..e14a3a420e
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/zoned.sh
> @@ -0,0 +1,69 @@
> +#!/usr/bin/env bash
> +#
> +# Test zone management operations.
> +#
> +
> +seq="$(basename $0)"
> +echo "QA output created by $seq"
> +status=1 # failure is the default!
> +
> +_cleanup()
> +{
> +  _cleanup_test_img
> +  sudo rmmod null_blk
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +. ./common.qemu
> +
> +# This test only runs on Linux hosts with raw image files.
> +_supported_fmt raw
> +_supported_proto file
> +_supported_os Linux
> +
> +QEMU_IO="build/qemu-io"
> +IMG="--image-opts driver=zoned_host_device,filename=/dev/nullb0"
> +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> +
> +echo "Testing a null_blk device"
> +echo "Simple cases: if the operations work"
> +sudo modprobe null_blk nr_devices=1 zoned=1

Jens & Ming Lei:

null_blk is not ideal for automated test cases because it requires
root and is global. If two tests are run at the same time they will
interfere with each other. There is no way to have independent
instances of the null_blk kernel module and the /dev/nullb0 device on
a single Linux system. I think this test case can be merged for now
but if there's time I suggest investigating alternatives.

Maybe the new Linux ublk_drv driver is a better choice if it supports
unprivileged operation with multiple independent block devices (plus
zoned storage!). It would be necessary to write a userspace null block
device that supports zoned storage if that doesn't exist already. I
have CCed Ming Lei and Jens Axboe for thoughts.

> +# hidden issues:
> +# 1. memory allocation error of "unaligned tcache chunk detected" when the nr_zone=1 in zone report
> +# 2. qemu-io: after report 10 zones, the program failed at double free error and exited.

What is this? It looks like a TODO list of bugs you hit? Have they
already been solved?

> +echo "report the first zone"
> +sudo $QEMU_IO $IMG -c "zr 0 0 1"
> +echo "report: the first 10 zones"
> +sudo $QEMU_IO $IMG -c "zr 0 0 10"
> +
> +echo "open the first zone"
> +sudo $QEMU_IO $IMG -c "zo 0 0x80000"
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zr 0 0 1"
> +echo "open the last zone"
> +sudo $QEMU_IO $IMG -c "zo 0x3e70000000 0x80000"
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zr 0x3e70000000 0 2"
> +
> +echo "close the first zone"
> +sudo $QEMU_IO $IMG -c "zc 0 0x80000"
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zr 0 0 1"
> +echo "close the last zone"
> +sudo $QEMU_IO $IMG -c "zc 0x3e70000000 0x80000"
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zr 0x3e70000000 0 2"
> +
> +
> +echo "reset the second zone"
> +sudo $QEMU_IO $IMG -c "zrs 0x80000 0x80000"
> +echo "After resetting a zone:"
> +sudo $QEMU_IO $IMG -c "zr 0x80000 0 5"
> +
> +# success, all done
> +echo "*** done"
> +rm -f $seq.full
> +status=0

This patch is missing the .out reference file. Did you forget to git-add(1) it?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 6/9] raw-format: add zone operations
  2022-07-12  2:13 ` [RFC v4 6/9] raw-format: add " Sam Li
@ 2022-07-27 14:39   ` Stefan Hajnoczi
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:39 UTC (permalink / raw)
  To: Sam Li, Markus Armbruster, Hanna Reitz, Kevin Wolf
  Cc: qemu-devel, Damien Le Moal, Dmitry Fomichev, Stefan Hajnoczi,
	qemu block, Eric Blake, Fam Zheng, Hannes Reinecke

On Mon, 11 Jul 2022 at 22:21, Sam Li <faithilikerun@gmail.com> wrote:
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/raw-format.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/block/raw-format.c b/block/raw-format.c
> index 69fd650eaf..96bdb6c1e2 100644
> --- a/block/raw-format.c
> +++ b/block/raw-format.c
> @@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
>      return bdrv_co_pdiscard(bs->file, offset, bytes);
>  }
>
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
> +                                           int64_t *nr_zones,
> +                                           BlockZoneDescriptor *zones) {
> +    return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
> +}
> +
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, zone_op op,
> +                                         int64_t offset, int64_t len) {
> +    return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
> +}
> +

Kevin, Markus, or Hanna: bdrv_*() APIs take a mix of BlockDriverState
*bs and BdrvChild *child arguments. Should these new APIs take bs or
child?

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 7/9] config: add check to block layer
  2022-07-12  2:13 ` [RFC v4 7/9] config: add check to block layer Sam Li
@ 2022-07-27 14:50   ` Stefan Hajnoczi
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:50 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

On Mon, 11 Jul 2022 at 22:22, Sam Li <faithilikerun@gmail.com> wrote:
>
> Putting zoned/non-zoned BlockDrivers on top of each other is not
> allowed.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block.c                          | 7 +++++++
>  block/file-posix.c               | 2 ++
>  include/block/block_int-common.h | 5 +++++
>  3 files changed, 14 insertions(+)
>
> diff --git a/block.c b/block.c
> index 2c00dddd80..0e24582c7d 100644
> --- a/block.c
> +++ b/block.c
> @@ -7945,6 +7945,13 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>          return;
>      }
>
> +    if (parent_bs->drv->is_zoned != child_bs->drv->is_zoned) {
> +        error_setg(errp, "Cannot add a %s child to a %s parent",
> +                   child_bs->drv->is_zoned ? "zoned" : "non-zoned",
> +                   parent_bs->drv->is_zoned ? "zoned" : "non-zoned");
> +        return;
> +    }

Please explain the rationale:

/*
 * Non-zoned block drivers do not follow zoned storage constraints
(i.e. sequential writes
 * to zones). Refuse mixing zoned and non-zoned drivers in a graph.
 */

> +
>      if (!QLIST_EMPTY(&child_bs->parents)) {
>          error_setg(errp, "The node %s already has a parent",
>                     child_bs->node_name);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 42708012ff..e9ad1d8e1e 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3924,6 +3924,7 @@ static BlockDriver bdrv_host_device = {
>      .format_name        = "host_device",
>      .protocol_name        = "host_device",
>      .instance_size      = sizeof(BDRVRawState),
> +    .is_zoned = false,

In C static struct fields are automatically initialized to 0/false.
This line can be omitted.

>      .bdrv_needs_filename = true,
>      .bdrv_probe_device  = hdev_probe_device,
>      .bdrv_parse_filename = hdev_parse_filename,
> @@ -3971,6 +3972,7 @@ static BlockDriver bdrv_zoned_host_device = {
>          .format_name = "zoned_host_device",
>          .protocol_name = "zoned_host_device",
>          .instance_size = sizeof(BDRVRawState),
> +        .is_zoned = true,
>          .bdrv_needs_filename = true,
>          .bdrv_probe_device  = hdev_probe_device,
>          .bdrv_parse_filename = hdev_parse_filename,
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 6037871089..29f1ec9184 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -141,6 +141,11 @@ struct BlockDriver {
>       */
>      bool is_format;
>
> +    /*
> +     * Set to true if the BlockDriver is a zoned block driver.
> +     */
> +    bool is_zoned;

This isn't powerful enough to express the constraints.
block/raw-format.c supports both non-zoned and zoned children (after
your patch) but it won't pass the check.

Perhaps add bool supports_zoned_children and change the check to:

if (child_bs->drv->is_zoned &&
!parent_bs->drv->supports_zoned_children) { ...refuse... }

Then raw-format would work on top of zoned_host_device as well as
still working on non-zoned BDSes.

Stefan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 8/9] include: add support for zoned block devices
  2022-07-12  2:13 ` [RFC v4 8/9] include: add support for zoned block devices Sam Li
@ 2022-07-27 14:52   ` Stefan Hajnoczi
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:52 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

On Mon, 11 Jul 2022 at 22:21, Sam Li <faithilikerun@gmail.com> wrote:
>
> This is the virtio_blk.h header file from Dmitry's "virtio-blk: add
> support for zoned block devices" patch. It introduces
> virtio_blk_zoned_characteristics struct from Dmitry's virtio-blk zoned
> storage spec.
>
> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  include/standard-headers/linux/virtio_blk.h | 157 ++++++++++++++++++--
>  1 file changed, 141 insertions(+), 16 deletions(-)
>
> diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
> index 2dcc90826a..f07fbe1b9b 100644
> --- a/include/standard-headers/linux/virtio_blk.h
> +++ b/include/standard-headers/linux/virtio_blk.h
> @@ -25,10 +25,10 @@
>   * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>   * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>   * SUCH DAMAGE. */
> -#include "standard-headers/linux/types.h"
> -#include "standard-headers/linux/virtio_ids.h"
> -#include "standard-headers/linux/virtio_config.h"
> -#include "standard-headers/linux/virtio_types.h"
> +#include <linux/types.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_types.h>

This file can't be copied from Linux verbatim. It needs to be
converted using scripts/update-linux-headers.sh.

Stefan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 9/9] qapi: add support for zoned host device
  2022-07-12  2:13 ` [RFC v4 9/9] qapi: add support for zoned host device Sam Li
  2022-07-12  7:48   ` Damien Le Moal
@ 2022-07-27 14:53   ` Stefan Hajnoczi
  1 sibling, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:53 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

On Mon, 11 Jul 2022 at 22:31, Sam Li <faithilikerun@gmail.com> wrote:
>
> ---
>  block/file-posix.c   | 8 +++++++-
>  qapi/block-core.json | 7 +++++--
>  2 files changed, 12 insertions(+), 3 deletions(-)

Please squash this into the patch that adds zoned_host_device.

Stefan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 0/9] Add support for zoned device
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (9 preceding siblings ...)
  2022-07-12  5:47 ` [RFC v4 0/9] Add support for zoned device Markus Armbruster
@ 2022-07-27 14:55 ` Stefan Hajnoczi
  2022-07-27 15:06 ` Stefan Hajnoczi
  11 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 14:55 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

On Mon, 11 Jul 2022 at 22:17, Sam Li <faithilikerun@gmail.com> wrote:
>
> This patch series adds support for zoned device to virtio-blk emulation. Zoned
> Storage can support sequential writes, which reduces write amplification in SSD,
> leading to higher write throughput and increased capacity.

Please use "git rebase -i master" and add "x make" lines after each
commit to check that the code still builds after each commit. It is
important to order patches so that each commit still builds
successfully. That way git-bisect(1) works and patches can be
backported/cherry-picked.

Stefan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 5/9] qemu-iotests: test new zone operations.
  2022-07-27 14:34   ` Stefan Hajnoczi
@ 2022-07-27 14:59     ` Ming Lei
  2022-07-27 15:12       ` Stefan Hajnoczi
  0 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2022-07-27 14:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Sam Li, qemu-devel, Damien Le Moal, Markus Armbruster,
	Dmitry Fomichev, Stefan Hajnoczi, Hanna Reitz, qemu block,
	Eric Blake, Kevin Wolf, Fam Zheng, Hannes Reinecke, Jens Axboe,
	ming.lei

On Wed, Jul 27, 2022 at 10:34:56AM -0400, Stefan Hajnoczi wrote:
> On Mon, 11 Jul 2022 at 22:21, Sam Li <faithilikerun@gmail.com> wrote:
> >
> > We have added new block layer APIs of zoned block devices. Test it with:
> > (1) Create a null_blk device, run each zone operation on it and see
> > whether reporting right zone information.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  tests/qemu-iotests/tests/zoned.sh | 69 +++++++++++++++++++++++++++++++
> >  1 file changed, 69 insertions(+)
> >  create mode 100755 tests/qemu-iotests/tests/zoned.sh
> >
> > diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
> > new file mode 100755
> > index 0000000000..e14a3a420e
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/zoned.sh
> > @@ -0,0 +1,69 @@
> > +#!/usr/bin/env bash
> > +#
> > +# Test zone management operations.
> > +#
> > +
> > +seq="$(basename $0)"
> > +echo "QA output created by $seq"
> > +status=1 # failure is the default!
> > +
> > +_cleanup()
> > +{
> > +  _cleanup_test_img
> > +  sudo rmmod null_blk
> > +}
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +# get standard environment, filters and checks
> > +. ./common.rc
> > +. ./common.filter
> > +. ./common.qemu
> > +
> > +# This test only runs on Linux hosts with raw image files.
> > +_supported_fmt raw
> > +_supported_proto file
> > +_supported_os Linux
> > +
> > +QEMU_IO="build/qemu-io"
> > +IMG="--image-opts driver=zoned_host_device,filename=/dev/nullb0"
> > +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> > +
> > +echo "Testing a null_blk device"
> > +echo "Simple cases: if the operations work"
> > +sudo modprobe null_blk nr_devices=1 zoned=1
> 
> Jens & Ming Lei:
> 
> null_blk is not ideal for automated test cases because it requires
> root and is global. If two tests are run at the same time they will
> interfere with each other. There is no way to have independent
> instances of the null_blk kernel module and the /dev/nullb0 device on
> a single Linux system. I think this test case can be merged for now
> but if there's time I suggest investigating alternatives.
> 
> Maybe the new Linux ublk_drv driver is a better choice if it supports
> unprivileged operation with multiple independent block devices (plus

This can be one big topic, and IMO, the io path needs to be reviewed
wrt. security risk. Also if one block device is stuck, it shouldn't
affect others, so far it can be one problem at least in sync/writeback,
if one device is hang, writeback on all block device may not move on.

> zoned storage!). It would be necessary to write a userspace null block
> device that supports zoned storage if that doesn't exist already. I
> have CCed Ming Lei and Jens Axboe for thoughts.

IMO, it shouldn't be hard to simulate zoned from userspace, the
patches[1] for setting device parameters are just sent out, then zoned
parameter could be sent to driver with the added commands easily.

Even ublk can be used to implement zoned target, such as, ublk is
exposed as one normal disk, but the backing disk could be one real
zoned/zns device.

[1] https://lore.kernel.org/linux-block/20220727141628.985429-1-ming.lei@redhat.com/T/#mb5518cf418b63fb6ca649f0df199137e50907a29

thanks,
Ming



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 0/9] Add support for zoned device
  2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
                   ` (10 preceding siblings ...)
  2022-07-27 14:55 ` Stefan Hajnoczi
@ 2022-07-27 15:06 ` Stefan Hajnoczi
  2022-07-27 15:14   ` Sam Li
  11 siblings, 1 reply; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 15:06 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

This patch series introduces the concept of zoned storage to the QEMU
block layer. Documentation is needed so that users and developers know
how to use and maintain this feature.

As a minimum, let's document how to pass through zoned block devices on Linux:

diff --git a/docs/system/qemu-block-drivers.rst.inc
b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..f6ba05710a 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command
   line option or modify the device permissions accordingly).

+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated
+  storage controller supports zoned storage. Use ``--blockdev
+  zoned_host_device,node-name=drive0,filename=/dev/nullb0`` to pass through
+  ``/dev/nullb0`` as ``drive0``.
+
 Windows
 ^^^^^^^

For developers there should be an explanation of the zoned storage
APIs and how BlockDrivers declare support. It should also mention the
status of pass through (implemented in the zoned_host_device driver)
vs zone emulation (not implemented in the QEMU block layer) so
developers understand the block layer's zoned storage capabilities.
You can add a docs/devel/zoned-storage.rst file to document this or
let me know if you want me to write it.

Stefan


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC v4 5/9] qemu-iotests: test new zone operations.
  2022-07-27 14:59     ` Ming Lei
@ 2022-07-27 15:12       ` Stefan Hajnoczi
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Hajnoczi @ 2022-07-27 15:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sam Li, qemu-devel, Damien Le Moal, Markus Armbruster,
	Dmitry Fomichev, Stefan Hajnoczi, Hanna Reitz, qemu block,
	Eric Blake, Kevin Wolf, Fam Zheng, Hannes Reinecke, Jens Axboe

On Wed, 27 Jul 2022 at 11:00, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Jul 27, 2022 at 10:34:56AM -0400, Stefan Hajnoczi wrote:
> > On Mon, 11 Jul 2022 at 22:21, Sam Li <faithilikerun@gmail.com> wrote:
> > >
> > > We have added new block layer APIs of zoned block devices. Test it with:
> > > (1) Create a null_blk device, run each zone operation on it and see
> > > whether reporting right zone information.
> > >
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > ---
> > >  tests/qemu-iotests/tests/zoned.sh | 69 +++++++++++++++++++++++++++++++
> > >  1 file changed, 69 insertions(+)
> > >  create mode 100755 tests/qemu-iotests/tests/zoned.sh
> > >
> > > diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
> > > new file mode 100755
> > > index 0000000000..e14a3a420e
> > > --- /dev/null
> > > +++ b/tests/qemu-iotests/tests/zoned.sh
> > > @@ -0,0 +1,69 @@
> > > +#!/usr/bin/env bash
> > > +#
> > > +# Test zone management operations.
> > > +#
> > > +
> > > +seq="$(basename $0)"
> > > +echo "QA output created by $seq"
> > > +status=1 # failure is the default!
> > > +
> > > +_cleanup()
> > > +{
> > > +  _cleanup_test_img
> > > +  sudo rmmod null_blk
> > > +}
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common.rc
> > > +. ./common.filter
> > > +. ./common.qemu
> > > +
> > > +# This test only runs on Linux hosts with raw image files.
> > > +_supported_fmt raw
> > > +_supported_proto file
> > > +_supported_os Linux
> > > +
> > > +QEMU_IO="build/qemu-io"
> > > +IMG="--image-opts driver=zoned_host_device,filename=/dev/nullb0"
> > > +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> > > +
> > > +echo "Testing a null_blk device"
> > > +echo "Simple cases: if the operations work"
> > > +sudo modprobe null_blk nr_devices=1 zoned=1
> >
> > Jens & Ming Lei:
> >
> > null_blk is not ideal for automated test cases because it requires
> > root and is global. If two tests are run at the same time they will
> > interfere with each other. There is no way to have independent
> > instances of the null_blk kernel module and the /dev/nullb0 device on
> > a single Linux system. I think this test case can be merged for now
> > but if there's time I suggest investigating alternatives.
> >
> > Maybe the new Linux ublk_drv driver is a better choice if it supports
> > unprivileged operation with multiple independent block devices (plus
>
> This can be one big topic, and IMO, the io path needs to be reviewed
> wrt. security risk. Also if one block device is stuck, it shouldn't
> affect others, so far it can be one problem at least in sync/writeback,
> if one device is hang, writeback on all block device may not move on.
>
> > zoned storage!). It would be necessary to write a userspace null block
> > device that supports zoned storage if that doesn't exist already. I
> > have CCed Ming Lei and Jens Axboe for thoughts.
>
> IMO, it shouldn't be hard to simulate zoned from userspace, the
> patches[1] for setting device parameters are just sent out, then zoned
> parameter could be sent to driver with the added commands easily.
>
> Even ublk can be used to implement zoned target, such as, ublk is
> exposed as one normal disk, but the backing disk could be one real
> zoned/zns device.
>
> [1] https://lore.kernel.org/linux-block/20220727141628.985429-1-ming.lei@redhat.com/T/#mb5518cf418b63fb6ca649f0df199137e50907a29

Thanks for replying!

For QEMU testing purposes a null block driver with zone emulation is
ideal. It does not need to persist data or zone metadata. It shouldn't
require a backing device so that testing can be performed even without
real zoned storage devices.

It should also be possible to extend Linux null_blk to make it more
friendly for parallel tests that are run without knowledge of each
other/cooperation. But I was thinking ublk may have that
infrastructure already.

Stefan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 0/9] Add support for zoned device
  2022-07-27 15:06 ` Stefan Hajnoczi
@ 2022-07-27 15:14   ` Sam Li
  0 siblings, 0 replies; 42+ messages in thread
From: Sam Li @ 2022-07-27 15:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Damien Le Moal, Markus Armbruster, Dmitry Fomichev,
	Stefan Hajnoczi, Hanna Reitz, qemu block, Eric Blake, Kevin Wolf,
	Fam Zheng, Hannes Reinecke

Stefan Hajnoczi <stefanha@gmail.com> 于2022年7月27日周三 23:06写道:
>
> This patch series introduces the concept of zoned storage to the QEMU
> block layer. Documentation is needed so that users and developers know
> how to use and maintain this feature.
>
> As a minimum, let's document how to pass through zoned block devices on Linux:
>
> diff --git a/docs/system/qemu-block-drivers.rst.inc
> b/docs/system/qemu-block-drivers.rst.inc
> index dfe5d2293d..f6ba05710a 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -430,6 +430,12 @@ Hard disks
>    you may corrupt your host data (use the ``-snapshot`` command
>    line option or modify the device permissions accordingly).
>
> +Zoned block devices
> +  Zoned block devices can be passed through to the guest if the emulated
> +  storage controller supports zoned storage. Use ``--blockdev
> +  zoned_host_device,node-name=drive0,filename=/dev/nullb0`` to pass through
> +  ``/dev/nullb0`` as ``drive0``.
> +
>  Windows
>  ^^^^^^^
>
> For developers there should be an explanation of the zoned storage
> APIs and how BlockDrivers declare support. It should also mention the
> status of pass through (implemented in the zoned_host_device driver)
> vs zone emulation (not implemented in the QEMU block layer) so
> developers understand the block layer's zoned storage capabilities.
> You can add a docs/devel/zoned-storage.rst file to document this or
> let me know if you want me to write it.

I will write the document and address the issues in the reviews, which
should be in the next revision.
Thanks for reviewing!

Have a good day!
Sam


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC v4 2/9] qemu-io: add zoned block device operations.
  2022-07-27 14:13   ` Stefan Hajnoczi
@ 2022-07-28  1:57     ` Damien Le Moal
  0 siblings, 0 replies; 42+ messages in thread
From: Damien Le Moal @ 2022-07-28  1:57 UTC (permalink / raw)
  To: Stefan Hajnoczi, Sam Li
  Cc: qemu-devel, Markus Armbruster, Dmitry Fomichev, Stefan Hajnoczi,
	Hanna Reitz, qemu block, Eric Blake, Kevin Wolf, Fam Zheng,
	Hannes Reinecke

On 7/27/22 23:13, Stefan Hajnoczi wrote:
> On Mon, 11 Jul 2022 at 22:17, Sam Li <faithilikerun@gmail.com> wrote:
>> +int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
>> +                        int64_t *nr_zones,
>> +                        BlockZoneDescriptor *zones)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +    CoroutineIOCompletion co = {
>> +            .coroutine = qemu_coroutine_self(),
>> +    };
>> +    IO_CODE();
>> +
>> +    bdrv_inc_in_flight(bs);
>> +    if (!drv || (!drv->bdrv_co_zone_report)) {

You can drop the inner parenthesis for "(!drv->bdrv_co_zone_report)".

>> +        co.ret = -ENOTSUP;
>> +        goto out;
>> +    }
>> +
>> +    if (drv->bdrv_co_zone_report) {
> 
> At this point we know drv->bdrv_co_zone_report is non-NULL because it
> has been checked already. The if statement can be dropped.
> 
>> +        co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
>> +    } else {
>> +        co.ret = -ENOTSUP;
> 
> This case is already handled by if (... ||
> (!drv->bdrv_co_zone_report)) above. The else body can be dropped.
> 
>> +        goto out;
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +out:
>> +    bdrv_dec_in_flight(bs);
>> +    return co.ret;
>> +}
>> +
>> +int bdrv_co_zone_mgmt(BlockDriverState *bs, enum zone_op op,
> 
> Please follow QEMU coding style and typedef BdrvZoneOp instead of
> writing out enum zone_op.
> 
>> +        int64_t offset, int64_t len)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +    CoroutineIOCompletion co = {
>> +            .coroutine = qemu_coroutine_self(),
>> +    };
>> +    IO_CODE();
>> +
>> +    bdrv_inc_in_flight(bs);
>> +    if (!drv || (!drv->bdrv_co_zone_mgmt)) {
>> +        co.ret = -ENOTSUP;
>> +        goto out;
>> +    }
>> +
>> +    if (drv->bdrv_co_zone_mgmt) {
>> +        co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
>> +    } else {
>> +        co.ret = -ENOTSUP;
>> +        goto out;
>> +        qemu_coroutine_yield();
>> +    }
> 
> Same comments here.
> 
>> +
>> +out:
>> +    bdrv_dec_in_flight(bs);
>> +    return co.ret;
>> +}
>> +
>>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>>  {
>>      IO_CODE();
>> diff --git a/include/block/block-io.h b/include/block/block-io.h
>> index 053a27141a..a0ae140452 100644
>> --- a/include/block/block-io.h
>> +++ b/include/block/block-io.h
>> @@ -80,6 +80,13 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
>>  /* Ensure contents are flushed to disk.  */
>>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
>>
>> +/* Report zone information of zone block device. */
>> +int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
>> +                                     int64_t *nr_zones,
>> +                                     BlockZoneDescriptor *zones);
>> +int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, zone_op op,
>> +                                   int64_t offset, int64_t len);
>> +
>>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
>> @@ -289,6 +296,12 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>>  int generated_co_wrapper
>>  bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>>
>> +int generated_co_wrapper
>> +blk_zone_report(BlockBackend *blk, int64_t offset, int64_t *nr_zones,
>> +                BlockZoneDescriptor *zones);
>> +int generated_co_wrapper
>> +blk_zone_mgmt(BlockBackend *blk, enum zone_op op, int64_t offset, int64_t len);
>> +
>>  /**
>>   * bdrv_parent_drained_begin_single:
>>   *
>> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
>> index 2f0d8ac25a..a88fa322d2 100644
>> --- a/qemu-io-cmds.c
>> +++ b/qemu-io-cmds.c
>> @@ -1706,6 +1706,144 @@ static const cmdinfo_t flush_cmd = {
>>      .oneline    = "flush all in-core file state to disk",
>>  };
>>
>> +static int zone_report_f(BlockBackend *blk, int argc, char **argv)
>> +{
>> +    int ret;
>> +    int64_t offset, nr_zones;
>> +
>> +    ++optind;
>> +    offset = cvtnum(argv[optind]);
>> +    ++optind;
>> +    nr_zones = cvtnum(argv[optind]);
>> +
>> +    g_autofree BlockZoneDescriptor *zones = NULL;
>> +    zones = g_new(BlockZoneDescriptor, nr_zones);
>> +    ret = blk_zone_report(blk, offset, &nr_zones, zones);
>> +    if (ret < 0) {
>> +        printf("zone report failed: %s\n", strerror(-ret));
>> +    } else {
>> +        for (int i = 0; i < nr_zones; ++i) {
>> +            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
>> +                   "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", "
>> +                   "zcond:%u, [type: %u]\n",
>> +                   zones[i].start, zones[i].length, zones[i].cap, zones[i].wp,
>> +                   zones[i].cond, zones[i].type);
>> +        }
>> +    }
>> +    return ret;
>> +}
>> +
>> +static const cmdinfo_t zone_report_cmd = {
>> +        .name = "zone_report",
>> +        .altname = "zp",
>> +        .cfunc = zone_report_f,
>> +        .argmin = 2,
>> +        .argmax = 2,
>> +        .args = "offset number",
>> +        .oneline = "report zone information",
>> +};
>> +
>> +static int zone_open_f(BlockBackend *blk, int argc, char **argv)
>> +{
>> +    int ret;
>> +    int64_t offset, len;
>> +    ++optind;
>> +    offset = cvtnum(argv[optind]);
>> +    ++optind;
>> +    len = cvtnum(argv[optind]);
>> +    ret = blk_zone_mgmt(blk, zone_open, offset, len);
> 
> Please name enum constants in all caps: BDRV_ZONE_OPEN or BDRV_ZO_OPEN.


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2022-07-28  2:01 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-12  2:13 [RFC v4 0/9] Add support for zoned device Sam Li
2022-07-12  2:13 ` [RFC v4 1/9] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
2022-07-12  6:10   ` Hannes Reinecke
2022-07-12  6:17     ` Sam Li
2022-07-12  7:35   ` Damien Le Moal
2022-07-13  0:54     ` Sam Li
2022-07-12 15:49   ` Stefan Hajnoczi
2022-07-12 22:12     ` Damien Le Moal
2022-07-13  6:22       ` Stefan Hajnoczi
2022-07-13  0:51     ` Sam Li
2022-07-13  6:19       ` Stefan Hajnoczi
2022-07-12  2:13 ` [RFC v4 2/9] qemu-io: add zoned block device operations Sam Li
2022-07-12  6:14   ` Hannes Reinecke
2022-07-12  7:44   ` Damien Le Moal
2022-07-27 14:13   ` Stefan Hajnoczi
2022-07-28  1:57     ` Damien Le Moal
2022-07-12  2:13 ` [RFC v4 3/9] file-posix: introduce get_sysfs_long_val for a block queue of sysfs attribute Sam Li
2022-07-12  6:16   ` Hannes Reinecke
2022-07-12  7:37   ` Damien Le Moal
2022-07-12  2:13 ` [RFC v4 4/9] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
2022-07-12  6:17   ` Hannes Reinecke
2022-07-12  6:35     ` Damien Le Moal
2022-07-12  7:42   ` Damien Le Moal
2022-07-12  2:13 ` [RFC v4 5/9] qemu-iotests: test new zone operations Sam Li
2022-07-27 14:34   ` Stefan Hajnoczi
2022-07-27 14:59     ` Ming Lei
2022-07-27 15:12       ` Stefan Hajnoczi
2022-07-12  2:13 ` [RFC v4 6/9] raw-format: add " Sam Li
2022-07-27 14:39   ` Stefan Hajnoczi
2022-07-12  2:13 ` [RFC v4 7/9] config: add check to block layer Sam Li
2022-07-27 14:50   ` Stefan Hajnoczi
2022-07-12  2:13 ` [RFC v4 8/9] include: add support for zoned block devices Sam Li
2022-07-27 14:52   ` Stefan Hajnoczi
2022-07-12  2:13 ` [RFC v4 9/9] qapi: add support for zoned host device Sam Li
2022-07-12  7:48   ` Damien Le Moal
2022-07-27 14:53   ` Stefan Hajnoczi
2022-07-12  5:47 ` [RFC v4 0/9] Add support for zoned device Markus Armbruster
2022-07-12  5:59   ` Sam Li
2022-07-18 10:53     ` Markus Armbruster
2022-07-27 14:55 ` Stefan Hajnoczi
2022-07-27 15:06 ` Stefan Hajnoczi
2022-07-27 15:14   ` Sam Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).