All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/4] Add zone append write for zoned device
@ 2023-03-10 10:31 Sam Li
  2023-03-10 10:31 ` [PATCH v6 1/4] file-posix: add tracking of the zone write pointers Sam Li
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Sam Li @ 2023-03-10 10:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block, stefanha,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta, Sam Li

This patch series add zone append operation based on the previous
zoned device support part. The file-posix driver is modified to
add zone append emulation using regular writes.

v6:
- add small fixes

v5:
- fix locking conditions and error handling
- drop some trival optimizations
- add tracing points for zone append

v4:
- fix lock related issues[Damien]
- drop all field in zone_mgmt op [Damien]
- fix state checks in zong_mgmt command [Damien]
- return start sector of wp when issuing zap req [Damien]

v3:
- only read wps when it is locked [Damien]
- allow last smaller zone case [Damien]
- add zone type and state checks in zone_mgmt command [Damien]
- fix RESET_ALL related problems

v2:
- split patch to two patches for better reviewing
- change BlockZoneWps's structure to an array of integers
- use only mutex lock on locking conditions of zone wps
- coding styles and clean-ups

v1:
- introduce zone append write

Sam Li (4):
  file-posix: add tracking of the zone write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation
  block: add some trace events for zone append

 block/block-backend.c              |  60 ++++++++
 block/file-posix.c                 | 212 ++++++++++++++++++++++++++++-
 block/io.c                         |  21 +++
 block/io_uring.c                   |   4 +
 block/linux-aio.c                  |   3 +
 block/raw-format.c                 |   8 ++
 block/trace-events                 |   2 +
 include/block/block-common.h       |  14 ++
 include/block/block-io.h           |   4 +
 include/block/block_int-common.h   |   8 ++
 include/block/raw-aio.h            |   4 +-
 include/sysemu/block-backend-io.h  |   9 ++
 qemu-io-cmds.c                     |  65 +++++++++
 tests/qemu-iotests/tests/zoned.out |   7 +
 tests/qemu-iotests/tests/zoned.sh  |   9 ++
 15 files changed, 422 insertions(+), 8 deletions(-)

-- 
2.39.2



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v6 1/4] file-posix: add tracking of the zone write pointers
  2023-03-10 10:31 [PATCH v6 0/4] Add zone append write for zoned device Sam Li
@ 2023-03-10 10:31 ` Sam Li
  2023-03-14  2:23   ` Dmitry Fomichev
  2023-03-16 18:51   ` Stefan Hajnoczi
  2023-03-10 10:31 ` [PATCH v6 2/4] block: introduce zone append write for zoned devices Sam Li
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Sam Li @ 2023-03-10 10:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block, stefanha,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta, Sam Li

Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
if the zone type is conventional zones.

The zones wp can be changed due to the following operations issued:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/file-posix.c               | 159 ++++++++++++++++++++++++++++++-
 include/block/block-common.h     |  14 +++
 include/block/block_int-common.h |   3 +
 3 files changed, 172 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 563acc76ae..61ed769ac8 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1324,6 +1324,77 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
+static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+                        unsigned int nrz) {
+    struct blk_zone *blkz;
+    size_t rep_size;
+    uint64_t sector = offset >> BDRV_SECTOR_BITS;
+    int ret, n = 0, i = 0;
+    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+    g_autofree struct blk_zone_report *rep = NULL;
+
+    rep = g_malloc(rep_size);
+    blkz = (struct blk_zone *)(rep + 1);
+    while (n < nrz) {
+        memset(rep, 0, rep_size);
+        rep->sector = sector;
+        rep->nr_zones = nrz - n;
+
+        do {
+            ret = ioctl(fd, BLKREPORTZONE, rep);
+        } while (ret != 0 && errno == EINTR);
+        if (ret != 0) {
+            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+                    fd, offset, errno);
+            return -errno;
+        }
+
+        if (!rep->nr_zones) {
+            break;
+        }
+
+        for (i = 0; i < rep->nr_zones; i++, n++) {
+            /*
+             * The wp tracking cares only about sequential writes required and
+             * sequential write preferred zones so that the wp can advance to
+             * the right location.
+             * Use the most significant bit of the wp location to indicate the
+             * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
+             */
+            if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+                wps->wp[i] = 1ULL << 63;
+            } else {
+                switch(blkz[i].cond) {
+                case BLK_ZONE_COND_FULL:
+                case BLK_ZONE_COND_READONLY:
+                    /* Zone not writable */
+                    wps->wp[i] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
+                    break;
+                case BLK_ZONE_COND_OFFLINE:
+                    /* Zone not writable nor readable */
+                    wps->wp[i] = (blkz[i].start) << BDRV_SECTOR_BITS;
+                    break;
+                default:
+                    wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
+                    break;
+                }
+            }
+        }
+        sector = blkz[i - 1].start + blkz[i - 1].len;
+    }
+
+    return 0;
+}
+
+static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+                            unsigned int nrz) {
+    if (get_zones_wp(fd, wps, offset, nrz) < 0) {
+        error_report("update zone wp failed");
+    }
+}
+#endif
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -1413,6 +1484,21 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
         if (ret >= 0) {
             bs->bl.max_active_zones = ret;
         }
+
+        ret = get_sysfs_long_val(&st, "physical_block_size");
+        if (ret >= 0) {
+            bs->bl.write_granularity = ret;
+        }
+
+        bs->bl.wps = g_malloc(sizeof(BlockZoneWps) +
+                sizeof(int64_t) * bs->bl.nr_zones);
+        ret = get_zones_wp(s->fd, bs->bl.wps, 0, bs->bl.nr_zones);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "report wps failed");
+            g_free(bs->bl.wps);
+            return;
+        }
+        qemu_co_mutex_init(&bs->bl.wps->colock);
         return;
     }
 out:
@@ -2338,9 +2424,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 {
     BDRVRawState *s = bs->opaque;
     RawPosixAIOData acb;
+    int ret;
 
     if (fd_open(bs) < 0)
         return -EIO;
+#if defined(CONFIG_BLKZONED)
+    if (bs->bl.wps) {
+        qemu_co_mutex_lock(&bs->bl.wps->colock);
+    }
+#endif
 
     /*
      * When using O_DIRECT, the request must be aligned to be able to use
@@ -2354,14 +2446,16 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
     } else if (s->use_linux_io_uring) {
         LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
         assert(qiov->size == bytes);
-        return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
+        ret = luring_co_submit(bs, aio, s->fd, offset, qiov, type);
+        goto out;
 #endif
 #ifdef CONFIG_LINUX_AIO
     } else if (s->use_linux_aio) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         assert(qiov->size == bytes);
-        return laio_co_submit(bs, aio, s->fd, offset, qiov, type,
+        ret = laio_co_submit(bs, aio, s->fd, offset, qiov, type,
                               s->aio_max_batch);
+        goto out;
 #endif
     }
 
@@ -2378,7 +2472,32 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
     };
 
     assert(qiov->size == bytes);
-    return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
+    ret = raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
+
+out:
+#if defined(CONFIG_BLKZONED)
+    BlockZoneWps *wps = bs->bl.wps;
+    if (ret == 0) {
+        if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
+            int index = offset / bs->bl.zone_size;
+            if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
+                /* Advance the wp if needed */
+                if (offset + bytes > wps->wp[index]) {
+                    wps->wp[index] = offset + bytes;
+                }
+            }
+        }
+    } else {
+        if (type & QEMU_AIO_WRITE) {
+            update_zones_wp(s->fd, bs->bl.wps, 0, 1);
+        }
+    }
+
+    if (wps) {
+        qemu_co_mutex_unlock(&wps->colock);
+    }
+#endif
+    return ret;
 }
 
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
@@ -2486,6 +2605,11 @@ static void raw_close(BlockDriverState *bs)
     BDRVRawState *s = bs->opaque;
 
     if (s->fd >= 0) {
+#if defined(CONFIG_BLKZONED)
+        if (bs->bl.wps) {
+            g_free(bs->bl.wps);
+        }
+#endif
         qemu_close(s->fd);
         s->fd = -1;
     }
@@ -3285,6 +3409,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     const char *op_name;
     unsigned long zo;
     int ret;
+    BlockZoneWps *wps = bs->bl.wps;
     int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
 
     zone_size = bs->bl.zone_size;
@@ -3302,6 +3427,14 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
         return -EINVAL;
     }
 
+    qemu_co_mutex_lock(&wps->colock);
+    uint32_t index = offset / bs->bl.zone_size;
+    if (BDRV_ZT_IS_CONV(wps->wp[index]) && len != capacity) {
+        error_report("zone mgmt operations are not allowed for conventional zones");
+        ret = -EIO;
+        goto out;
+    }
+
     switch (op) {
     case BLK_ZO_OPEN:
         op_name = "BLKOPENZONE";
@@ -3321,7 +3454,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
         break;
     default:
         error_report("Unsupported zone op: 0x%x", op);
-        return -ENOTSUP;
+        ret = -ENOTSUP;
+        goto out;
     }
 
     acb = (RawPosixAIOData) {
@@ -3339,10 +3473,27 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
                         len >> BDRV_SECTOR_BITS);
     ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
     if (ret != 0) {
+        update_zones_wp(s->fd, wps, offset, index);
         ret = -errno;
         error_report("ioctl %s failed %d", op_name, ret);
+        goto out;
     }
 
+    if (zo == BLKRESETZONE && len == capacity) {
+        for (int i = 0; i < bs->bl.nr_zones; ++i) {
+            if (!BDRV_ZT_IS_CONV(wps->wp[i])) {
+                wps->wp[i] = i * bs->bl.zone_size;
+            }
+        }
+    } else if (zo == BLKRESETZONE) {
+        wps->wp[index] = offset;
+    } else if (zo == BLKFINISHZONE) {
+        /* The zoned device allows the last zone smaller that the zone size. */
+        wps->wp[index] = offset + len;
+    }
+
+out:
+    qemu_co_mutex_unlock(&wps->colock);
     return ret;
 }
 #endif
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 1576fcf2ed..93196229ac 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -118,6 +118,14 @@ typedef struct BlockZoneDescriptor {
     BlockZoneState state;
 } BlockZoneDescriptor;
 
+/*
+ * Track write pointers of a zone in bytes.
+ */
+typedef struct BlockZoneWps {
+    CoMutex colock;
+    uint64_t wp[];
+} BlockZoneWps;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
@@ -240,6 +248,12 @@ typedef enum {
 #define BDRV_SECTOR_BITS   9
 #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
 
+/*
+ * Get the first most significant bit of wp. If it is zero, then
+ * the zone type is SWR.
+ */
+#define BDRV_ZT_IS_CONV(wp)    (wp & (1ULL << 63))
+
 #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                            INT_MAX >> BDRV_SECTOR_BITS)
 #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 1bd2aef4d5..19915b34af 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -884,6 +884,9 @@ typedef struct BlockLimits {
 
     /* maximum number of active zones */
     int64_t max_active_zones;
+
+    /* array of write pointers' location of each zone in the zoned device. */
+    BlockZoneWps *wps;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 2/4] block: introduce zone append write for zoned devices
  2023-03-10 10:31 [PATCH v6 0/4] Add zone append write for zoned device Sam Li
  2023-03-10 10:31 ` [PATCH v6 1/4] file-posix: add tracking of the zone write pointers Sam Li
@ 2023-03-10 10:31 ` Sam Li
  2023-03-14  2:55   ` Dmitry Fomichev
  2023-03-16 18:56   ` Stefan Hajnoczi
  2023-03-10 10:31 ` [PATCH v6 3/4] qemu-iotests: test zone append operation Sam Li
  2023-03-10 10:31 ` [PATCH v6 4/4] block: add some trace events for zone append Sam Li
  3 siblings, 2 replies; 14+ messages in thread
From: Sam Li @ 2023-03-10 10:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block, stefanha,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta, Sam Li

A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of writes is pointing
to the write pointer of that zone. Upon completion the device will
respond with the position the data has been written in the zone.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/block-backend.c             | 60 +++++++++++++++++++++++++++++++
 block/file-posix.c                | 54 +++++++++++++++++++++++++---
 block/io.c                        | 21 +++++++++++
 block/io_uring.c                  |  4 +++
 block/linux-aio.c                 |  3 ++
 block/raw-format.c                |  8 +++++
 include/block/block-io.h          |  4 +++
 include/block/block_int-common.h  |  5 +++
 include/block/raw-aio.h           |  4 ++-
 include/sysemu/block-backend-io.h |  9 +++++
 10 files changed, 166 insertions(+), 6 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f70b08e3f6..28e8f5d778 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1888,6 +1888,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
     return &acb->common;
 }
 
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_append(rwco->blk, &acb->bytes,
+                                   rwco->iobuf, rwco->flags);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+                                QEMUIOVector *qiov, BdrvRequestFlags flags,
+                                BlockCompletionFunc *cb, void *opaque) {
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk    = blk,
+        .ret    = NOT_DONE,
+        .flags  = flags,
+        .iobuf  = qiov,
+    };
+    acb->bytes = *offset;
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
 /*
  * Send a zone_report command.
  * offset is a byte offset from the start of the device. No alignment
@@ -1939,6 +1978,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
     return ret;
 }
 
+/*
+ * Send a zone_append command.
+ */
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+        QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+
+    ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 61ed769ac8..2ba9174778 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -160,6 +160,7 @@ typedef struct BDRVRawState {
     bool has_write_zeroes:1;
     bool use_linux_aio:1;
     bool use_linux_io_uring:1;
+    int64_t *offset; /* offset of zone append operation */
     int page_cache_inconsistent; /* errno from fdatasync failure */
     bool has_fallocate;
     bool needs_alignment;
@@ -1672,7 +1673,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
     ssize_t len;
 
     len = RETRY_ON_EINTR(
-        (aiocb->aio_type & QEMU_AIO_WRITE) ?
+        (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
             qemu_pwritev(aiocb->aio_fildes,
                            aiocb->io.iov,
                            aiocb->io.niov,
@@ -1701,7 +1702,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
     ssize_t len;
 
     while (offset < aiocb->aio_nbytes) {
-        if (aiocb->aio_type & QEMU_AIO_WRITE) {
+        if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
             len = pwrite(aiocb->aio_fildes,
                          (const char *)buf + offset,
                          aiocb->aio_nbytes - offset,
@@ -1794,7 +1795,7 @@ static int handle_aiocb_rw(void *opaque)
     }
 
     nbytes = handle_aiocb_rw_linear(aiocb, buf);
-    if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
+    if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
         char *p = buf;
         size_t count = aiocb->aio_nbytes, copy;
         int i;
@@ -2431,6 +2432,10 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 #if defined(CONFIG_BLKZONED)
     if (bs->bl.wps) {
         qemu_co_mutex_lock(&bs->bl.wps->colock);
+        if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
+            int index = offset / bs->bl.zone_size;
+            offset = bs->bl.wps->wp[index];
+        }
     }
 #endif
 
@@ -2478,9 +2483,13 @@ out:
 #if defined(CONFIG_BLKZONED)
     BlockZoneWps *wps = bs->bl.wps;
     if (ret == 0) {
-        if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
+        if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
+            && wps && bs->bl.zone_size) {
             int index = offset / bs->bl.zone_size;
             if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
+                if (type & QEMU_AIO_ZONE_APPEND) {
+                    *s->offset = wps->wp[index];
+                }
                 /* Advance the wp if needed */
                 if (offset + bytes > wps->wp[index]) {
                     wps->wp[index] = offset + bytes;
@@ -2488,7 +2497,7 @@ out:
             }
         }
     } else {
-        if (type & QEMU_AIO_WRITE) {
+        if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
             update_zones_wp(s->fd, bs->bl.wps, 0, 1);
         }
     }
@@ -3498,6 +3507,40 @@ out:
 }
 #endif
 
+#if defined(CONFIG_BLKZONED)
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
+                                           int64_t *offset,
+                                           QEMUIOVector *qiov,
+                                           BdrvRequestFlags flags) {
+    assert(flags == 0);
+    int64_t zone_size_mask = bs->bl.zone_size - 1;
+    int64_t iov_len = 0;
+    int64_t len = 0;
+    BDRVRawState *s = bs->opaque;
+    s->offset = offset;
+
+    if (*offset & zone_size_mask) {
+        error_report("sector offset %" PRId64 " is not aligned to zone size "
+                     "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
+        return -EINVAL;
+    }
+
+    int64_t wg = bs->bl.write_granularity;
+    int64_t wg_mask = wg - 1;
+    for (int i = 0; i < qiov->niov; i++) {
+        iov_len = qiov->iov[i].iov_len;
+        if (iov_len & wg_mask) {
+            error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
+                         "block size %" PRId64 "", i, iov_len, wg);
+            return -EINVAL;
+        }
+        len += iov_len;
+    }
+
+    return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
+}
+#endif
+
 static coroutine_fn int
 raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
                 bool blkdev)
@@ -4259,6 +4302,7 @@ static BlockDriver bdrv_host_device = {
     /* zone management operations */
     .bdrv_co_zone_report = raw_co_zone_report,
     .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
+    .bdrv_co_zone_append = raw_co_zone_append,
 #endif
 };
 
diff --git a/block/io.c b/block/io.c
index 5dbf1e50f2..fe9cabaaf6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3152,6 +3152,27 @@ out:
     return co.ret;
 }
 
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
+                        QEMUIOVector *qiov,
+                        BdrvRequestFlags flags)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+    co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
     IO_CODE();
diff --git a/block/io_uring.c b/block/io_uring.c
index 973e15d876..f7488c241a 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -345,6 +345,10 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
         io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
                              luringcb->qiov->niov, offset);
         break;
+    case QEMU_AIO_ZONE_APPEND:
+        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
+                             luringcb->qiov->niov, offset);
+        break;
     case QEMU_AIO_READ:
         io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
                             luringcb->qiov->niov, offset);
diff --git a/block/linux-aio.c b/block/linux-aio.c
index d2cfb7f523..1959834156 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -389,6 +389,9 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
     case QEMU_AIO_WRITE:
         io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
         break;
+    case QEMU_AIO_ZONE_APPEND:
+        io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
+        break;
     case QEMU_AIO_READ:
         io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
         break;
diff --git a/block/raw-format.c b/block/raw-format.c
index 72e23e7b55..64e7d48d04 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -332,6 +332,13 @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_append(BlockDriverState *bs,int64_t *offset, QEMUIOVector *qiov,
+                   BdrvRequestFlags flags)
+{
+    return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
+}
+
 static int64_t coroutine_fn GRAPH_RDLOCK
 raw_co_getlength(BlockDriverState *bs)
 {
@@ -635,6 +642,7 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pdiscard     = &raw_co_pdiscard,
     .bdrv_co_zone_report  = &raw_co_zone_report,
     .bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
+    .bdrv_co_zone_append = &raw_co_zone_append,
     .bdrv_co_block_status = &raw_co_block_status,
     .bdrv_co_copy_range_from = &raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 19d1fad9cf..55fca02991 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -120,6 +120,10 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
 int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
                                                 BlockZoneOp op,
                                                 int64_t offset, int64_t len);
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
+                                                  int64_t *offset,
+                                                  QEMUIOVector *qiov,
+                                                  BdrvRequestFlags flags);
 
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 19915b34af..ccd8811919 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -724,6 +724,9 @@ struct BlockDriver {
             BlockZoneDescriptor *zones);
     int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
             int64_t offset, int64_t len);
+    int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
+            int64_t *offset, QEMUIOVector *qiov,
+            BdrvRequestFlags flags);
 
     /* removable device specific */
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
@@ -887,6 +890,8 @@ typedef struct BlockLimits {
 
     /* array of write pointers' location of each zone in the zoned device. */
     BlockZoneWps *wps;
+
+    int64_t write_granularity;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index eda6a7a253..fb9c9f5a01 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -30,6 +30,7 @@
 #define QEMU_AIO_TRUNCATE     0x0080
 #define QEMU_AIO_ZONE_REPORT  0x0100
 #define QEMU_AIO_ZONE_MGMT    0x0200
+#define QEMU_AIO_ZONE_APPEND  0x0400
 #define QEMU_AIO_TYPE_MASK \
         (QEMU_AIO_READ | \
          QEMU_AIO_WRITE | \
@@ -40,7 +41,8 @@
          QEMU_AIO_COPY_RANGE | \
          QEMU_AIO_TRUNCATE | \
          QEMU_AIO_ZONE_REPORT | \
-         QEMU_AIO_ZONE_MGMT)
+         QEMU_AIO_ZONE_MGMT | \
+         QEMU_AIO_ZONE_APPEND)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED   0x1000
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index f575ab5b6b..e716591a1a 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -53,6 +53,9 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
 BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
                               int64_t offset, int64_t len,
                               BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+                                QEMUIOVector *qiov, BdrvRequestFlags flags,
+                                BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
                              BlockCompletionFunc *cb, void *opaque);
 void blk_aio_cancel_async(BlockAIOCB *acb);
@@ -201,6 +204,12 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
                                   int64_t offset, int64_t len);
 int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
                                        int64_t offset, int64_t len);
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+                                    QEMUIOVector *qiov,
+                                    BdrvRequestFlags flags);
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
+                                         QEMUIOVector *qiov,
+                                         BdrvRequestFlags flags);
 
 int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
                                   int64_t bytes);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 3/4] qemu-iotests: test zone append operation
  2023-03-10 10:31 [PATCH v6 0/4] Add zone append write for zoned device Sam Li
  2023-03-10 10:31 ` [PATCH v6 1/4] file-posix: add tracking of the zone write pointers Sam Li
  2023-03-10 10:31 ` [PATCH v6 2/4] block: introduce zone append write for zoned devices Sam Li
@ 2023-03-10 10:31 ` Sam Li
  2023-03-16 18:59   ` Stefan Hajnoczi
  2023-03-10 10:31 ` [PATCH v6 4/4] block: add some trace events for zone append Sam Li
  3 siblings, 1 reply; 14+ messages in thread
From: Sam Li @ 2023-03-10 10:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block, stefanha,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta, Sam Li

This tests is mainly a helper to indicate append writes in block layer
behaves as expected.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 qemu-io-cmds.c                     | 65 ++++++++++++++++++++++++++++++
 tests/qemu-iotests/tests/zoned.out |  7 ++++
 tests/qemu-iotests/tests/zoned.sh  |  9 +++++
 3 files changed, 81 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index f35ea627d7..4159f41ab9 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1874,6 +1874,70 @@ static const cmdinfo_t zone_reset_cmd = {
     .oneline = "reset a zone write pointer in zone block device",
 };
 
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+                              int64_t *offset, int flags, int *total)
+{
+    int async_ret = NOT_DONE;
+
+    blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+    while (async_ret == NOT_DONE) {
+        main_loop_wait(false);
+    }
+
+    *total = qiov->size;
+    return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int flags = 0;
+    int total = 0;
+    int64_t offset;
+    char *buf;
+    int nr_iov;
+    int pattern = 0xcd;
+    QEMUIOVector qiov;
+
+    if (optind > argc - 2) {
+        return -EINVAL;
+    }
+    optind++;
+    offset = cvtnum(argv[optind]);
+    if (offset < 0) {
+        print_cvtnum_err(offset, argv[optind]);
+        return offset;
+    }
+    optind++;
+    nr_iov = argc - optind;
+    buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
+                       flags & BDRV_REQ_REGISTERED_BUF);
+    if (buf == NULL) {
+        return -EINVAL;
+    }
+    ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+    if (ret < 0) {
+        printf("zone append failed: %s\n", strerror(-ret));
+        goto out;
+    }
+
+out:
+    qemu_io_free(blk, buf, qiov.size,
+                 flags & BDRV_REQ_REGISTERED_BUF);
+    qemu_iovec_destroy(&qiov);
+    return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+    .name = "zone_append",
+    .altname = "zap",
+    .cfunc = zone_append_f,
+    .argmin = 3,
+    .argmax = 3,
+    .args = "offset len [len..]",
+    .oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
     .name       = "truncate",
@@ -2672,6 +2736,7 @@ static void __attribute((constructor)) init_qemuio_commands(void)
     qemuio_add_command(&zone_close_cmd);
     qemuio_add_command(&zone_finish_cmd);
     qemuio_add_command(&zone_reset_cmd);
+    qemuio_add_command(&zone_append_cmd);
     qemuio_add_command(&truncate_cmd);
     qemuio_add_command(&length_cmd);
     qemuio_add_command(&info_cmd);
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
index 0c8f96deb9..b3b139b4ec 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,11 @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+
+
+(6) append write
+After appending the first zone:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
+After appending the second zone:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
 *** done
diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
index 9d7c15dde6..6c3ded6c4c 100755
--- a/tests/qemu-iotests/tests/zoned.sh
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -79,6 +79,15 @@ echo "(5) resetting the second zone"
 sudo $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # physical block size of the device is 4096
+sudo $QEMU_IO $IMG -c "zap 0 0x1000 0x2000"
+echo "After appending the first zone:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+sudo $QEMU_IO $IMG -c "zap 268435456 0x1000 0x2000"
+echo "After appending the second zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
 
 # success, all done
 echo "*** done"
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 4/4] block: add some trace events for zone append
  2023-03-10 10:31 [PATCH v6 0/4] Add zone append write for zoned device Sam Li
                   ` (2 preceding siblings ...)
  2023-03-10 10:31 ` [PATCH v6 3/4] qemu-iotests: test zone append operation Sam Li
@ 2023-03-10 10:31 ` Sam Li
  2023-03-14  2:28   ` Dmitry Fomichev
  3 siblings, 1 reply; 14+ messages in thread
From: Sam Li @ 2023-03-10 10:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block, stefanha,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta, Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2ba9174778..5187f810e5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2489,6 +2489,8 @@ out:
             if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
                 if (type & QEMU_AIO_ZONE_APPEND) {
                     *s->offset = wps->wp[index];
+                    trace_zbd_zone_append_complete(bs, *s->offset
+                        >> BDRV_SECTOR_BITS);
                 }
                 /* Advance the wp if needed */
                 if (offset + bytes > wps->wp[index]) {
@@ -3537,6 +3539,7 @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
         len += iov_len;
     }
 
+    trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
     return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
 }
 #endif
diff --git a/block/trace-events b/block/trace-events
index 3f4e1d088a..32665158d6 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
 zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
 zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/4] file-posix: add tracking of the zone write pointers
  2023-03-10 10:31 ` [PATCH v6 1/4] file-posix: add tracking of the zone write pointers Sam Li
@ 2023-03-14  2:23   ` Dmitry Fomichev
  2023-03-14  3:49     ` Damien Le Moal
  2023-03-16 18:51   ` Stefan Hajnoczi
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Fomichev @ 2023-03-14  2:23 UTC (permalink / raw)
  To: faithilikerun, qemu-devel
  Cc: hreitz, hare, sgarzare, mehta.aaru20, stefanha, fam, qemu-block,
	kwolf, jusual, damien.lemoal

On Fri, 2023-03-10 at 18:31 +0800, Sam Li wrote:
> Since Linux doesn't have a user API to issue zone append operations to
> zoned devices from user space, the file-posix driver is modified to add
> zone append emulation using regular writes. To do this, the file-posix
> driver tracks the wp location of all zones of the device. It uses an
> array of uint64_t. The most significant bit of each wp location indicates
> if the zone type is conventional zones.
> 
> The zones wp can be changed due to the following operations issued:
> - zone reset: change the wp to the start offset of that zone
> - zone finish: change to the end location of that zone
> - write to a zone
> - zone append
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/file-posix.c               | 159 ++++++++++++++++++++++++++++++-
>  include/block/block-common.h     |  14 +++
>  include/block/block_int-common.h |   3 +
>  3 files changed, 172 insertions(+), 4 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 563acc76ae..61ed769ac8 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1324,6 +1324,77 @@ static int hdev_get_max_segments(int fd, struct stat
> *st)
>  #endif
>  }
>  
> +#if defined(CONFIG_BLKZONED)
> +static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +                        unsigned int nrz) {
> +    struct blk_zone *blkz;
> +    size_t rep_size;
> +    uint64_t sector = offset >> BDRV_SECTOR_BITS;
> +    int ret, n = 0, i = 0;
> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
> +    g_autofree struct blk_zone_report *rep = NULL;
> +
> +    rep = g_malloc(rep_size);
> +    blkz = (struct blk_zone *)(rep + 1);
> +    while (n < nrz) {
> +        memset(rep, 0, rep_size);
> +        rep->sector = sector;
> +        rep->nr_zones = nrz - n;
> +
> +        do {
> +            ret = ioctl(fd, BLKREPORTZONE, rep);
> +        } while (ret != 0 && errno == EINTR);
> +        if (ret != 0) {
> +            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> +                    fd, offset, errno);
> +            return -errno;
> +        }
> +
> +        if (!rep->nr_zones) {
> +            break;
> +        }
> +
> +        for (i = 0; i < rep->nr_zones; i++, n++) {
> +            /*
> +             * The wp tracking cares only about sequential writes required and
> +             * sequential write preferred zones so that the wp can advance to
> +             * the right location.
> +             * Use the most significant bit of the wp location to indicate the
> +             * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
> +             */
> +            if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +                wps->wp[i] = 1ULL << 63;
> +            } else {
> +                switch(blkz[i].cond) {
> +                case BLK_ZONE_COND_FULL:
> +                case BLK_ZONE_COND_READONLY:
> +                    /* Zone not writable */
> +                    wps->wp[i] = (blkz[i].start + blkz[i].len) <<
> BDRV_SECTOR_BITS;
> +                    break;
> +                case BLK_ZONE_COND_OFFLINE:
> +                    /* Zone not writable nor readable */
> +                    wps->wp[i] = (blkz[i].start) << BDRV_SECTOR_BITS;
> +                    break;
> +                default:
> +                    wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
> +                    break;
> +                }
> +            }
> +        }
> +        sector = blkz[i - 1].start + blkz[i - 1].len;
> +    }
> +
> +    return 0;
> +}
> +
> +static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +                            unsigned int nrz) {
> +    if (get_zones_wp(fd, wps, offset, nrz) < 0) {
> +        error_report("update zone wp failed");
> +    }
> +}
> +#endif
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
> @@ -1413,6 +1484,21 @@ static void raw_refresh_limits(BlockDriverState *bs,
> Error **errp)
>          if (ret >= 0) {
>              bs->bl.max_active_zones = ret;
>          }
> +
> +        ret = get_sysfs_long_val(&st, "physical_block_size");
> +        if (ret >= 0) {
> +            bs->bl.write_granularity = ret;
> +        }
> +
> +        bs->bl.wps = g_malloc(sizeof(BlockZoneWps) +
> +                sizeof(int64_t) * bs->bl.nr_zones);
> +        ret = get_zones_wp(s->fd, bs->bl.wps, 0, bs->bl.nr_zones);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "report wps failed");
> +            g_free(bs->bl.wps);
> +            return;
> +        }
> +        qemu_co_mutex_init(&bs->bl.wps->colock);
>          return;
>      }
>  out:
> @@ -2338,9 +2424,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs,
> uint64_t offset,
>  {
>      BDRVRawState *s = bs->opaque;
>      RawPosixAIOData acb;
> +    int ret;
>  
>      if (fd_open(bs) < 0)
>          return -EIO;
> +#if defined(CONFIG_BLKZONED)
> +    if (bs->bl.wps) {
> +        qemu_co_mutex_lock(&bs->bl.wps->colock);
> +    }
> +#endif
>  
>      /*
>       * When using O_DIRECT, the request must be aligned to be able to use
> @@ -2354,14 +2446,16 @@ static int coroutine_fn raw_co_prw(BlockDriverState
> *bs, uint64_t offset,
>      } else if (s->use_linux_io_uring) {
>          LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
>          assert(qiov->size == bytes);
> -        return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
> +        ret = luring_co_submit(bs, aio, s->fd, offset, qiov, type);
> +        goto out;
>  #endif
>  #ifdef CONFIG_LINUX_AIO
>      } else if (s->use_linux_aio) {
>          LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
>          assert(qiov->size == bytes);
> -        return laio_co_submit(bs, aio, s->fd, offset, qiov, type,
> +        ret = laio_co_submit(bs, aio, s->fd, offset, qiov, type,
>                                s->aio_max_batch);
> +        goto out;
>  #endif
>      }
>  
> @@ -2378,7 +2472,32 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs,
> uint64_t offset,
>      };
>  
>      assert(qiov->size == bytes);
> -    return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
> +    ret = raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
> +
> +out:
> +#if defined(CONFIG_BLKZONED)
> +    BlockZoneWps *wps = bs->bl.wps;
> +    if (ret == 0) {
> +        if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
> +            int index = offset / bs->bl.zone_size;

It might be cleaner to define
int64_t *wp = &wps->wp[offset / bs->bl.zone_size];
here instead of the index and use *wp in the subsequent code.

> +            if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
> +                /* Advance the wp if needed */
> +                if (offset + bytes > wps->wp[index]) {
> +                    wps->wp[index] = offset + bytes;
> +                }
> +            }
> +        }
> +    } else {
> +        if (type & QEMU_AIO_WRITE) {
> +            update_zones_wp(s->fd, bs->bl.wps, 0, 1);
> +        }
> +    }
> +
> +    if (wps) {
> +        qemu_co_mutex_unlock(&wps->colock);
> +    }
> +#endif
> +    return ret;
>  }
>  
>  static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
> @@ -2486,6 +2605,11 @@ static void raw_close(BlockDriverState *bs)
>      BDRVRawState *s = bs->opaque;
>  
>      if (s->fd >= 0) {
> +#if defined(CONFIG_BLKZONED)
> +        if (bs->bl.wps) {
> +            g_free(bs->bl.wps);
> +        }
> +#endif
>          qemu_close(s->fd);
>          s->fd = -1;
>      }
> @@ -3285,6 +3409,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState
> *bs, BlockZoneOp op,
>      const char *op_name;
>      unsigned long zo;
>      int ret;
> +    BlockZoneWps *wps = bs->bl.wps;
>      int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
>  
>      zone_size = bs->bl.zone_size;
> @@ -3302,6 +3427,14 @@ static int coroutine_fn
> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>          return -EINVAL;
>      }
>  
> +    qemu_co_mutex_lock(&wps->colock);
> +    uint32_t index = offset / bs->bl.zone_size;
> +    if (BDRV_ZT_IS_CONV(wps->wp[index]) && len != capacity) {

The wps->wp[index] expression is used a lot in this function. Consider defining
int64_t *wp = &wps->wp[index]; to simplify the code.

> +        error_report("zone mgmt operations are not allowed for conventional
> zones");
> +        ret = -EIO;
> +        goto out;
> +    }
> +
>      switch (op) {
>      case BLK_ZO_OPEN:
>          op_name = "BLKOPENZONE";
> @@ -3321,7 +3454,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState
> *bs, BlockZoneOp op,
>          break;
>      default:
>          error_report("Unsupported zone op: 0x%x", op);
> -        return -ENOTSUP;
> +        ret = -ENOTSUP;
> +        goto out;
>      }
>  
>      acb = (RawPosixAIOData) {
> @@ -3339,10 +3473,27 @@ static int coroutine_fn
> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>                          len >> BDRV_SECTOR_BITS);
>      ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
>      if (ret != 0) {
> +        update_zones_wp(s->fd, wps, offset, index);
>          ret = -errno;
>          error_report("ioctl %s failed %d", op_name, ret);
> +        goto out;
>      }
>  
> +    if (zo == BLKRESETZONE && len == capacity) {
> +        for (int i = 0; i < bs->bl.nr_zones; ++i) {
> +            if (!BDRV_ZT_IS_CONV(wps->wp[i])) {
> +                wps->wp[i] = i * bs->bl.zone_size;

This will reset write pointers of all read-only zones that may exist on the
device and make the data stored in those zones unreadable. R/O zones need to be
skipped in this loop.

> +            }
> +        }
> +    } else if (zo == BLKRESETZONE) {
> +        wps->wp[index] = offset;
> +    } else if (zo == BLKFINISHZONE) {
> +        /* The zoned device allows the last zone smaller that the zone size.
> */
> +        wps->wp[index] = offset + len;
> +    }
> +
> +out:
> +    qemu_co_mutex_unlock(&wps->colock);
>      return ret;
>  }
>  #endif
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 1576fcf2ed..93196229ac 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -118,6 +118,14 @@ typedef struct BlockZoneDescriptor {
>      BlockZoneState state;
>  } BlockZoneDescriptor;
>  
> +/*
> + * Track write pointers of a zone in bytes.
> + */
> +typedef struct BlockZoneWps {
> +    CoMutex colock;
> +    uint64_t wp[];
> +} BlockZoneWps;
> +
>  typedef struct BlockDriverInfo {
>      /* in bytes, 0 if irrelevant */
>      int cluster_size;
> @@ -240,6 +248,12 @@ typedef enum {
>  #define BDRV_SECTOR_BITS   9
>  #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
>  
> +/*
> + * Get the first most significant bit of wp. If it is zero, then
> + * the zone type is SWR.
> + */
> +#define BDRV_ZT_IS_CONV(wp)    (wp & (1ULL << 63))
> +
>  #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
>                                             INT_MAX >> BDRV_SECTOR_BITS)
>  #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
> diff --git a/include/block/block_int-common.h b/include/block/block_int-
> common.h
> index 1bd2aef4d5..19915b34af 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -884,6 +884,9 @@ typedef struct BlockLimits {
>  
>      /* maximum number of active zones */
>      int64_t max_active_zones;
> +
> +    /* array of write pointers' location of each zone in the zoned device. */
> +    BlockZoneWps *wps;
>  } BlockLimits;
>  
>  typedef struct BdrvOpBlocker BdrvOpBlocker;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/4] block: add some trace events for zone append
  2023-03-10 10:31 ` [PATCH v6 4/4] block: add some trace events for zone append Sam Li
@ 2023-03-14  2:28   ` Dmitry Fomichev
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Fomichev @ 2023-03-14  2:28 UTC (permalink / raw)
  To: faithilikerun, qemu-devel
  Cc: hreitz, hare, sgarzare, mehta.aaru20, stefanha, fam, qemu-block,
	kwolf, jusual, damien.lemoal

On Fri, 2023-03-10 at 18:31 +0800, Sam Li wrote:
> Signed-off-by: Sam Li <faithilikerun@gmail.com>

Looks good,

Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>

>  block/file-posix.c | 3 +++
>  block/trace-events | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 2ba9174778..5187f810e5 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2489,6 +2489,8 @@ out:
>              if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
>                  if (type & QEMU_AIO_ZONE_APPEND) {
>                      *s->offset = wps->wp[index];
> +                    trace_zbd_zone_append_complete(bs, *s->offset
> +                        >> BDRV_SECTOR_BITS);
>                  }
>                  /* Advance the wp if needed */
>                  if (offset + bytes > wps->wp[index]) {
> @@ -3537,6 +3539,7 @@ static int coroutine_fn
> raw_co_zone_append(BlockDriverState *bs,
>          len += iov_len;
>      }
>  
> +    trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
>      return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
>  }
>  #endif
> diff --git a/block/trace-events b/block/trace-events
> index 3f4e1d088a..32665158d6 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found:
> type=%d, version=%d"
>  file_flush_fdatasync_failed(int err) "errno %d"
>  zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report
> %d zones starting at sector offset 0x%" PRIx64 ""
>  zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs
> %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 "
> sectors"
> +zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%"
> PRIx64 ""
> +zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append
> sector 0x%" PRIx64 ""
>  
>  # ssh.c
>  sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int
> sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/4] block: introduce zone append write for zoned devices
  2023-03-10 10:31 ` [PATCH v6 2/4] block: introduce zone append write for zoned devices Sam Li
@ 2023-03-14  2:55   ` Dmitry Fomichev
  2023-03-16 18:56   ` Stefan Hajnoczi
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Fomichev @ 2023-03-14  2:55 UTC (permalink / raw)
  To: faithilikerun, qemu-devel
  Cc: hreitz, hare, sgarzare, mehta.aaru20, stefanha, fam, qemu-block,
	kwolf, jusual, damien.lemoal

On Fri, 2023-03-10 at 18:31 +0800, Sam Li wrote:
> A zone append command is a write operation that specifies the first
> logical block of a zone as the write position. When writing to a zoned
> block device using zone append, the byte offset of writes is pointing
> to the write pointer of that zone.

s/writes is pointing to the write pointer of that zone/the call may point at any
position within the zone to which the data is being appended/

>  Upon completion the device will
> respond with the position the data

s/position the data/position where the data/

>  has been written in the zone.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>

With nits above,

Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>

> ---
>  block/block-backend.c             | 60 +++++++++++++++++++++++++++++++
>  block/file-posix.c                | 54 +++++++++++++++++++++++++---
>  block/io.c                        | 21 +++++++++++
>  block/io_uring.c                  |  4 +++
>  block/linux-aio.c                 |  3 ++
>  block/raw-format.c                |  8 +++++
>  include/block/block-io.h          |  4 +++
>  include/block/block_int-common.h  |  5 +++
>  include/block/raw-aio.h           |  4 ++-
>  include/sysemu/block-backend-io.h |  9 +++++
>  10 files changed, 166 insertions(+), 6 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f70b08e3f6..28e8f5d778 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1888,6 +1888,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk,
> BlockZoneOp op,
>      return &acb->common;
>  }
>  
> +static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
> +{
> +    BlkAioEmAIOCB *acb = opaque;
> +    BlkRwCo *rwco = &acb->rwco;
> +
> +    rwco->ret = blk_co_zone_append(rwco->blk, &acb->bytes,
> +                                   rwco->iobuf, rwco->flags);
> +    blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
> +                                QEMUIOVector *qiov, BdrvRequestFlags flags,
> +                                BlockCompletionFunc *cb, void *opaque) {
> +    BlkAioEmAIOCB *acb;
> +    Coroutine *co;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +    acb->rwco = (BlkRwCo) {
> +        .blk    = blk,
> +        .ret    = NOT_DONE,
> +        .flags  = flags,
> +        .iobuf  = qiov,
> +    };
> +    acb->bytes = *offset;
> +    acb->has_returned = false;
> +
> +    co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
> +    aio_co_enter(blk_get_aio_context(blk), co);
> +    acb->has_returned = true;
> +    if (acb->rwco.ret != NOT_DONE) {
> +        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> +                                         blk_aio_complete_bh, acb);
> +    }
> +
> +    return &acb->common;
> +}
> +
>  /*
>   * Send a zone_report command.
>   * offset is a byte offset from the start of the device. No alignment
> @@ -1939,6 +1978,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk,
> BlockZoneOp op,
>      return ret;
>  }
>  
> +/*
> + * Send a zone_append command.
> + */
> +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
> +        QEMUIOVector *qiov, BdrvRequestFlags flags)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    if (!blk_is_available(blk)) {
> +        blk_dec_in_flight(blk);
> +        return -ENOMEDIUM;
> +    }
> +
> +    ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 61ed769ac8..2ba9174778 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -160,6 +160,7 @@ typedef struct BDRVRawState {
>      bool has_write_zeroes:1;
>      bool use_linux_aio:1;
>      bool use_linux_io_uring:1;
> +    int64_t *offset; /* offset of zone append operation */
>      int page_cache_inconsistent; /* errno from fdatasync failure */
>      bool has_fallocate;
>      bool needs_alignment;
> @@ -1672,7 +1673,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData
> *aiocb)
>      ssize_t len;
>  
>      len = RETRY_ON_EINTR(
> -        (aiocb->aio_type & QEMU_AIO_WRITE) ?
> +        (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
>              qemu_pwritev(aiocb->aio_fildes,
>                             aiocb->io.iov,
>                             aiocb->io.niov,
> @@ -1701,7 +1702,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData
> *aiocb, char *buf)
>      ssize_t len;
>  
>      while (offset < aiocb->aio_nbytes) {
> -        if (aiocb->aio_type & QEMU_AIO_WRITE) {
> +        if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
>              len = pwrite(aiocb->aio_fildes,
>                           (const char *)buf + offset,
>                           aiocb->aio_nbytes - offset,
> @@ -1794,7 +1795,7 @@ static int handle_aiocb_rw(void *opaque)
>      }
>  
>      nbytes = handle_aiocb_rw_linear(aiocb, buf);
> -    if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
> +    if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
>          char *p = buf;
>          size_t count = aiocb->aio_nbytes, copy;
>          int i;
> @@ -2431,6 +2432,10 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs,
> uint64_t offset,
>  #if defined(CONFIG_BLKZONED)
>      if (bs->bl.wps) {
>          qemu_co_mutex_lock(&bs->bl.wps->colock);
> +        if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
> +            int index = offset / bs->bl.zone_size;
> +            offset = bs->bl.wps->wp[index];
> +        }
>      }
>  #endif
>  
> @@ -2478,9 +2483,13 @@ out:
>  #if defined(CONFIG_BLKZONED)
>      BlockZoneWps *wps = bs->bl.wps;
>      if (ret == 0) {
> -        if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
> +        if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
> +            && wps && bs->bl.zone_size) {
>              int index = offset / bs->bl.zone_size;
>              if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
> +                if (type & QEMU_AIO_ZONE_APPEND) {
> +                    *s->offset = wps->wp[index];
> +                }
>                  /* Advance the wp if needed */
>                  if (offset + bytes > wps->wp[index]) {
>                      wps->wp[index] = offset + bytes;
> @@ -2488,7 +2497,7 @@ out:
>              }
>          }
>      } else {
> -        if (type & QEMU_AIO_WRITE) {
> +        if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
>              update_zones_wp(s->fd, bs->bl.wps, 0, 1);
>          }
>      }
> @@ -3498,6 +3507,40 @@ out:
>  }
>  #endif
>  
> +#if defined(CONFIG_BLKZONED)
> +static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
> +                                           int64_t *offset,
> +                                           QEMUIOVector *qiov,
> +                                           BdrvRequestFlags flags) {
> +    assert(flags == 0);
> +    int64_t zone_size_mask = bs->bl.zone_size - 1;
> +    int64_t iov_len = 0;
> +    int64_t len = 0;
> +    BDRVRawState *s = bs->opaque;
> +    s->offset = offset;
> +
> +    if (*offset & zone_size_mask) {
> +        error_report("sector offset %" PRId64 " is not aligned to zone size "
> +                     "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
> +        return -EINVAL;
> +    }
> +
> +    int64_t wg = bs->bl.write_granularity;
> +    int64_t wg_mask = wg - 1;
> +    for (int i = 0; i < qiov->niov; i++) {
> +        iov_len = qiov->iov[i].iov_len;
> +        if (iov_len & wg_mask) {
> +            error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
> +                         "block size %" PRId64 "", i, iov_len, wg);
> +            return -EINVAL;
> +        }
> +        len += iov_len;
> +    }
> +
> +    return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
> +}
> +#endif
> +
>  static coroutine_fn int
>  raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                  bool blkdev)
> @@ -4259,6 +4302,7 @@ static BlockDriver bdrv_host_device = {
>      /* zone management operations */
>      .bdrv_co_zone_report = raw_co_zone_report,
>      .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +    .bdrv_co_zone_append = raw_co_zone_append,
>  #endif
>  };
>  
> diff --git a/block/io.c b/block/io.c
> index 5dbf1e50f2..fe9cabaaf6 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3152,6 +3152,27 @@ out:
>      return co.ret;
>  }
>  
> +int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
> +                        QEMUIOVector *qiov,
> +                        BdrvRequestFlags flags)
> +{
> +    BlockDriver *drv = bs->drv;
> +    CoroutineIOCompletion co = {
> +            .coroutine = qemu_coroutine_self(),
> +    };
> +    IO_CODE();
> +
> +    bdrv_inc_in_flight(bs);
> +    if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
> +        co.ret = -ENOTSUP;
> +        goto out;
> +    }
> +    co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
> +out:
> +    bdrv_dec_in_flight(bs);
> +    return co.ret;
> +}
> +
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
>      IO_CODE();
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 973e15d876..f7488c241a 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -345,6 +345,10 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb,
> LuringState *s,
>          io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
>                               luringcb->qiov->niov, offset);
>          break;
> +    case QEMU_AIO_ZONE_APPEND:
> +        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
> +                             luringcb->qiov->niov, offset);
> +        break;
>      case QEMU_AIO_READ:
>          io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
>                              luringcb->qiov->niov, offset);
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index d2cfb7f523..1959834156 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -389,6 +389,9 @@ static int laio_do_submit(int fd, struct qemu_laiocb
> *laiocb, off_t offset,
>      case QEMU_AIO_WRITE:
>          io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
>          break;
> +    case QEMU_AIO_ZONE_APPEND:
> +        io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
> +        break;
>      case QEMU_AIO_READ:
>          io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
>          break;
> diff --git a/block/raw-format.c b/block/raw-format.c
> index 72e23e7b55..64e7d48d04 100644
> --- a/block/raw-format.c
> +++ b/block/raw-format.c
> @@ -332,6 +332,13 @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>      return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
>  }
>  
> +static int coroutine_fn GRAPH_RDLOCK
> +raw_co_zone_append(BlockDriverState *bs,int64_t *offset, QEMUIOVector *qiov,
> +                   BdrvRequestFlags flags)
> +{
> +    return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
> +}
> +
>  static int64_t coroutine_fn GRAPH_RDLOCK
>  raw_co_getlength(BlockDriverState *bs)
>  {
> @@ -635,6 +642,7 @@ BlockDriver bdrv_raw = {
>      .bdrv_co_pdiscard     = &raw_co_pdiscard,
>      .bdrv_co_zone_report  = &raw_co_zone_report,
>      .bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
> +    .bdrv_co_zone_append = &raw_co_zone_append,
>      .bdrv_co_block_status = &raw_co_block_status,
>      .bdrv_co_copy_range_from = &raw_co_copy_range_from,
>      .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index 19d1fad9cf..55fca02991 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -120,6 +120,10 @@ int coroutine_fn GRAPH_RDLOCK
> bdrv_co_zone_report(BlockDriverState *bs,
>  int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
>                                                  BlockZoneOp op,
>                                                  int64_t offset, int64_t len);
> +int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
> +                                                  int64_t *offset,
> +                                                  QEMUIOVector *qiov,
> +                                                  BdrvRequestFlags flags);
>  
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> diff --git a/include/block/block_int-common.h b/include/block/block_int-
> common.h
> index 19915b34af..ccd8811919 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -724,6 +724,9 @@ struct BlockDriver {
>              BlockZoneDescriptor *zones);
>      int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp
> op,
>              int64_t offset, int64_t len);
> +    int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
> +            int64_t *offset, QEMUIOVector *qiov,
> +            BdrvRequestFlags flags);
>  
>      /* removable device specific */
>      bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
> @@ -887,6 +890,8 @@ typedef struct BlockLimits {
>  
>      /* array of write pointers' location of each zone in the zoned device. */
>      BlockZoneWps *wps;
> +
> +    int64_t write_granularity;
>  } BlockLimits;
>  
>  typedef struct BdrvOpBlocker BdrvOpBlocker;
> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> index eda6a7a253..fb9c9f5a01 100644
> --- a/include/block/raw-aio.h
> +++ b/include/block/raw-aio.h
> @@ -30,6 +30,7 @@
>  #define QEMU_AIO_TRUNCATE     0x0080
>  #define QEMU_AIO_ZONE_REPORT  0x0100
>  #define QEMU_AIO_ZONE_MGMT    0x0200
> +#define QEMU_AIO_ZONE_APPEND  0x0400
>  #define QEMU_AIO_TYPE_MASK \
>          (QEMU_AIO_READ | \
>           QEMU_AIO_WRITE | \
> @@ -40,7 +41,8 @@
>           QEMU_AIO_COPY_RANGE | \
>           QEMU_AIO_TRUNCATE | \
>           QEMU_AIO_ZONE_REPORT | \
> -         QEMU_AIO_ZONE_MGMT)
> +         QEMU_AIO_ZONE_MGMT | \
> +         QEMU_AIO_ZONE_APPEND)
>  
>  /* AIO flags */
>  #define QEMU_AIO_MISALIGNED   0x1000
> diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-
> io.h
> index f575ab5b6b..e716591a1a 100644
> --- a/include/sysemu/block-backend-io.h
> +++ b/include/sysemu/block-backend-io.h
> @@ -53,6 +53,9 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t
> offset,
>  BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
>                                int64_t offset, int64_t len,
>                                BlockCompletionFunc *cb, void *opaque);
> +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
> +                                QEMUIOVector *qiov, BdrvRequestFlags flags,
> +                                BlockCompletionFunc *cb, void *opaque);
>  BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
>                               BlockCompletionFunc *cb, void *opaque);
>  void blk_aio_cancel_async(BlockAIOCB *acb);
> @@ -201,6 +204,12 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk,
> BlockZoneOp op,
>                                    int64_t offset, int64_t len);
>  int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
>                                         int64_t offset, int64_t len);
> +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
> +                                    QEMUIOVector *qiov,
> +                                    BdrvRequestFlags flags);
> +int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
> +                                         QEMUIOVector *qiov,
> +                                         BdrvRequestFlags flags);
>  
>  int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
>                                    int64_t bytes);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/4] file-posix: add tracking of the zone write pointers
  2023-03-14  2:23   ` Dmitry Fomichev
@ 2023-03-14  3:49     ` Damien Le Moal
  2023-03-15 12:59       ` Sam Li
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2023-03-14  3:49 UTC (permalink / raw)
  To: Dmitry Fomichev, faithilikerun, qemu-devel
  Cc: hreitz, hare, sgarzare, mehta.aaru20, stefanha, fam, qemu-block,
	kwolf, jusual

On 3/14/23 11:23, Dmitry Fomichev wrote:
>> @@ -3339,10 +3473,27 @@ static int coroutine_fn
>> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>>                          len >> BDRV_SECTOR_BITS);
>>      ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
>>      if (ret != 0) {
>> +        update_zones_wp(s->fd, wps, offset, index);
>>          ret = -errno;
>>          error_report("ioctl %s failed %d", op_name, ret);
>> +        goto out;
>>      }
>>  
>> +    if (zo == BLKRESETZONE && len == capacity) {
>> +        for (int i = 0; i < bs->bl.nr_zones; ++i) {
>> +            if (!BDRV_ZT_IS_CONV(wps->wp[i])) {
>> +                wps->wp[i] = i * bs->bl.zone_size;
> 
> This will reset write pointers of all read-only zones that may exist on the
> device and make the data stored in those zones unreadable. R/O zones need to be
> skipped in this loop.

And offline zones need to be skipped as well.

-- 
Damien Le Moal
Western Digital Research



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/4] file-posix: add tracking of the zone write pointers
  2023-03-14  3:49     ` Damien Le Moal
@ 2023-03-15 12:59       ` Sam Li
  2023-03-15 21:23         ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Sam Li @ 2023-03-15 12:59 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Dmitry Fomichev, qemu-devel, hreitz, hare, sgarzare,
	mehta.aaru20, stefanha, fam, qemu-block, kwolf, jusual

Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2023年3月14日周二 11:49写道:
>
> On 3/14/23 11:23, Dmitry Fomichev wrote:
> >> @@ -3339,10 +3473,27 @@ static int coroutine_fn
> >> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> >>                          len >> BDRV_SECTOR_BITS);
> >>      ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> >>      if (ret != 0) {
> >> +        update_zones_wp(s->fd, wps, offset, index);
> >>          ret = -errno;
> >>          error_report("ioctl %s failed %d", op_name, ret);
> >> +        goto out;
> >>      }
> >>
> >> +    if (zo == BLKRESETZONE && len == capacity) {
> >> +        for (int i = 0; i < bs->bl.nr_zones; ++i) {
> >> +            if (!BDRV_ZT_IS_CONV(wps->wp[i])) {
> >> +                wps->wp[i] = i * bs->bl.zone_size;
> >
> > This will reset write pointers of all read-only zones that may exist on the
> > device and make the data stored in those zones unreadable. R/O zones need to be
> > skipped in this loop.
>
> And offline zones need to be skipped as well.

I see. That can be done thanks to get_zones_wp() which can show the
state of the zone at specific position.

Sam


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/4] file-posix: add tracking of the zone write pointers
  2023-03-15 12:59       ` Sam Li
@ 2023-03-15 21:23         ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2023-03-15 21:23 UTC (permalink / raw)
  To: Sam Li
  Cc: Dmitry Fomichev, qemu-devel, hreitz, hare, sgarzare,
	mehta.aaru20, stefanha, fam, qemu-block, kwolf, jusual

On 3/15/23 21:59, Sam Li wrote:
> Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2023年3月14日周二 11:49写道:
>>
>> On 3/14/23 11:23, Dmitry Fomichev wrote:
>>>> @@ -3339,10 +3473,27 @@ static int coroutine_fn
>>>> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>>>>                          len >> BDRV_SECTOR_BITS);
>>>>      ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
>>>>      if (ret != 0) {
>>>> +        update_zones_wp(s->fd, wps, offset, index);
>>>>          ret = -errno;
>>>>          error_report("ioctl %s failed %d", op_name, ret);
>>>> +        goto out;
>>>>      }
>>>>
>>>> +    if (zo == BLKRESETZONE && len == capacity) {
>>>> +        for (int i = 0; i < bs->bl.nr_zones; ++i) {
>>>> +            if (!BDRV_ZT_IS_CONV(wps->wp[i])) {
>>>> +                wps->wp[i] = i * bs->bl.zone_size;
>>>
>>> This will reset write pointers of all read-only zones that may exist on the
>>> device and make the data stored in those zones unreadable. R/O zones need to be
>>> skipped in this loop.
>>
>> And offline zones need to be skipped as well.
> 
> I see. That can be done thanks to get_zones_wp() which can show the
> state of the zone at specific position.

I do not think so: a zone wp is invalid for read-only and offline zones. So you
cannot rely on the wp value to detect these states. Even a valid wp value would
not tell you if the zone is read only or offline anyway. You need to track these
states with flags set when doing the first report zone on startup and when doing
a report zone after an IO error.

> 
> Sam

-- 
Damien Le Moal
Western Digital Research



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/4] file-posix: add tracking of the zone write pointers
  2023-03-10 10:31 ` [PATCH v6 1/4] file-posix: add tracking of the zone write pointers Sam Li
  2023-03-14  2:23   ` Dmitry Fomichev
@ 2023-03-16 18:51   ` Stefan Hajnoczi
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-03-16 18:51 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On Fri, Mar 10, 2023 at 06:31:03PM +0800, Sam Li wrote:
> @@ -2338,9 +2424,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
>  {
>      BDRVRawState *s = bs->opaque;
>      RawPosixAIOData acb;
> +    int ret;
>  
>      if (fd_open(bs) < 0)
>          return -EIO;
> +#if defined(CONFIG_BLKZONED)
> +    if (bs->bl.wps) {
> +        qemu_co_mutex_lock(&bs->bl.wps->colock);
> +    }
> +#endif

Is the lock only needed by QEMU_AIO_WRITE requests? If yes, can we skip
it for other request types to avoid serializing those requests?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/4] block: introduce zone append write for zoned devices
  2023-03-10 10:31 ` [PATCH v6 2/4] block: introduce zone append write for zoned devices Sam Li
  2023-03-14  2:55   ` Dmitry Fomichev
@ 2023-03-16 18:56   ` Stefan Hajnoczi
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-03-16 18:56 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta

[-- Attachment #1: Type: text/plain, Size: 2985 bytes --]

On Fri, Mar 10, 2023 at 06:31:04PM +0800, Sam Li wrote:
> A zone append command is a write operation that specifies the first
> logical block of a zone as the write position. When writing to a zoned
> block device using zone append, the byte offset of writes is pointing
> to the write pointer of that zone. Upon completion the device will
> respond with the position the data has been written in the zone.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/block-backend.c             | 60 +++++++++++++++++++++++++++++++
>  block/file-posix.c                | 54 +++++++++++++++++++++++++---
>  block/io.c                        | 21 +++++++++++
>  block/io_uring.c                  |  4 +++
>  block/linux-aio.c                 |  3 ++
>  block/raw-format.c                |  8 +++++
>  include/block/block-io.h          |  4 +++
>  include/block/block_int-common.h  |  5 +++
>  include/block/raw-aio.h           |  4 ++-
>  include/sysemu/block-backend-io.h |  9 +++++
>  10 files changed, 166 insertions(+), 6 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f70b08e3f6..28e8f5d778 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1888,6 +1888,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
>      return &acb->common;
>  }
>  
> +static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
> +{
> +    BlkAioEmAIOCB *acb = opaque;
> +    BlkRwCo *rwco = &acb->rwco;
> +
> +    rwco->ret = blk_co_zone_append(rwco->blk, &acb->bytes,
> +                                   rwco->iobuf, rwco->flags);
> +    blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
> +                                QEMUIOVector *qiov, BdrvRequestFlags flags,
> +                                BlockCompletionFunc *cb, void *opaque) {
> +    BlkAioEmAIOCB *acb;
> +    Coroutine *co;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +    acb->rwco = (BlkRwCo) {
> +        .blk    = blk,
> +        .ret    = NOT_DONE,
> +        .flags  = flags,
> +        .iobuf  = qiov,
> +    };
> +    acb->bytes = *offset;
> +    acb->has_returned = false;
> +
> +    co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
> +    aio_co_enter(blk_get_aio_context(blk), co);
> +    acb->has_returned = true;
> +    if (acb->rwco.ret != NOT_DONE) {
> +        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> +                                         blk_aio_complete_bh, acb);
> +    }
> +
> +    return &acb->common;
> +}

How is the resulting offset value communicated back to the caller? I
see offset being read (dereferenced) but there is no write (assignment).
Maybe this function should pass through acb->bytes = (int64_t)offset
instead so that blk_co_zone_append() can modify the offset?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 3/4] qemu-iotests: test zone append operation
  2023-03-10 10:31 ` [PATCH v6 3/4] qemu-iotests: test zone append operation Sam Li
@ 2023-03-16 18:59   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-03-16 18:59 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, Kevin Wolf, Fam Zheng, dmitry.fomichev, qemu-block,
	hare, Julia Suvorova, Stefano Garzarella, Hanna Reitz,
	damien.lemoal, Aarushi Mehta

[-- Attachment #1: Type: text/plain, Size: 5001 bytes --]

On Fri, Mar 10, 2023 at 06:31:05PM +0800, Sam Li wrote:
> This tests is mainly a helper to indicate append writes in block layer
> behaves as expected.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  qemu-io-cmds.c                     | 65 ++++++++++++++++++++++++++++++
>  tests/qemu-iotests/tests/zoned.out |  7 ++++
>  tests/qemu-iotests/tests/zoned.sh  |  9 +++++
>  3 files changed, 81 insertions(+)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index f35ea627d7..4159f41ab9 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1874,6 +1874,70 @@ static const cmdinfo_t zone_reset_cmd = {
>      .oneline = "reset a zone write pointer in zone block device",
>  };
>  
> +static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
> +                              int64_t *offset, int flags, int *total)
> +{
> +    int async_ret = NOT_DONE;
> +
> +    blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
> +    while (async_ret == NOT_DONE) {
> +        main_loop_wait(false);
> +    }
> +
> +    *total = qiov->size;
> +    return async_ret < 0 ? async_ret : 1;
> +}
> +
> +static int zone_append_f(BlockBackend *blk, int argc, char **argv)
> +{
> +    int ret;
> +    int flags = 0;
> +    int total = 0;
> +    int64_t offset;
> +    char *buf;
> +    int nr_iov;
> +    int pattern = 0xcd;
> +    QEMUIOVector qiov;
> +
> +    if (optind > argc - 2) {
> +        return -EINVAL;
> +    }
> +    optind++;
> +    offset = cvtnum(argv[optind]);
> +    if (offset < 0) {
> +        print_cvtnum_err(offset, argv[optind]);
> +        return offset;
> +    }
> +    optind++;
> +    nr_iov = argc - optind;
> +    buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
> +                       flags & BDRV_REQ_REGISTERED_BUF);
> +    if (buf == NULL) {
> +        return -EINVAL;
> +    }
> +    ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
> +    if (ret < 0) {
> +        printf("zone append failed: %s\n", strerror(-ret));
> +        goto out;
> +    }

How about a -p option that prints the value of offset after the
operation completes? That way the test case can check that
blk_aio_zone_append() produces the right offset value.

(The tests should also check zone_report output, but they should verify
that offset is correctly updated by zone_append too.)

> +
> +out:
> +    qemu_io_free(blk, buf, qiov.size,
> +                 flags & BDRV_REQ_REGISTERED_BUF);
> +    qemu_iovec_destroy(&qiov);
> +    return ret;
> +}
> +
> +static const cmdinfo_t zone_append_cmd = {
> +    .name = "zone_append",
> +    .altname = "zap",
> +    .cfunc = zone_append_f,
> +    .argmin = 3,
> +    .argmax = 3,
> +    .args = "offset len [len..]",
> +    .oneline = "append write a number of bytes at a specified offset",
> +};
> +
>  static int truncate_f(BlockBackend *blk, int argc, char **argv);
>  static const cmdinfo_t truncate_cmd = {
>      .name       = "truncate",
> @@ -2672,6 +2736,7 @@ static void __attribute((constructor)) init_qemuio_commands(void)
>      qemuio_add_command(&zone_close_cmd);
>      qemuio_add_command(&zone_finish_cmd);
>      qemuio_add_command(&zone_reset_cmd);
> +    qemuio_add_command(&zone_append_cmd);
>      qemuio_add_command(&truncate_cmd);
>      qemuio_add_command(&length_cmd);
>      qemuio_add_command(&info_cmd);
> diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
> index 0c8f96deb9..b3b139b4ec 100644
> --- a/tests/qemu-iotests/tests/zoned.out
> +++ b/tests/qemu-iotests/tests/zoned.out
> @@ -50,4 +50,11 @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
>  (5) resetting the second zone
>  After resetting a zone:
>  start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
> +
> +
> +(6) append write
> +After appending the first zone:
> +start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
> +After appending the second zone:
> +start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
>  *** done
> diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
> index 9d7c15dde6..6c3ded6c4c 100755
> --- a/tests/qemu-iotests/tests/zoned.sh
> +++ b/tests/qemu-iotests/tests/zoned.sh
> @@ -79,6 +79,15 @@ echo "(5) resetting the second zone"
>  sudo $QEMU_IO $IMG -c "zrs 268435456 268435456"
>  echo "After resetting a zone:"
>  sudo $QEMU_IO $IMG -c "zrp 268435456 1"
> +echo
> +echo
> +echo "(6) append write" # physical block size of the device is 4096
> +sudo $QEMU_IO $IMG -c "zap 0 0x1000 0x2000"
> +echo "After appending the first zone:"
> +sudo $QEMU_IO $IMG -c "zrp 0 1"
> +sudo $QEMU_IO $IMG -c "zap 268435456 0x1000 0x2000"
> +echo "After appending the second zone:"
> +sudo $QEMU_IO $IMG -c "zrp 268435456 1"
>  
>  # success, all done
>  echo "*** done"
> -- 
> 2.39.2
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-03-16 19:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-10 10:31 [PATCH v6 0/4] Add zone append write for zoned device Sam Li
2023-03-10 10:31 ` [PATCH v6 1/4] file-posix: add tracking of the zone write pointers Sam Li
2023-03-14  2:23   ` Dmitry Fomichev
2023-03-14  3:49     ` Damien Le Moal
2023-03-15 12:59       ` Sam Li
2023-03-15 21:23         ` Damien Le Moal
2023-03-16 18:51   ` Stefan Hajnoczi
2023-03-10 10:31 ` [PATCH v6 2/4] block: introduce zone append write for zoned devices Sam Li
2023-03-14  2:55   ` Dmitry Fomichev
2023-03-16 18:56   ` Stefan Hajnoczi
2023-03-10 10:31 ` [PATCH v6 3/4] qemu-iotests: test zone append operation Sam Li
2023-03-16 18:59   ` Stefan Hajnoczi
2023-03-10 10:31 ` [PATCH v6 4/4] block: add some trace events for zone append Sam Li
2023-03-14  2:28   ` Dmitry Fomichev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.