* [RFC v2 0/7] Add persistence to NVMe ZNS emulation
@ 2023-11-27  8:56 Sam Li
  2023-11-27  8:56 ` [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature Sam Li
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

ZNS emulation follows the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This patch series makes
the metadata of ZNS emulation persistent, using new block layer APIs and
a qcow2 image as the backing file. It is the second part, following the
series that adds full zoned storage emulation to the qcow2 driver:
https://patchwork.kernel.org/project/qemu-devel/cover/20231127043703.49489-1-faithilikerun@gmail.com/

The metadata of ZNS emulation consists of two parts: zone metadata and
zone descriptor extension data. The zone metadata is composed of the
zone state, zone type, wp and zone attributes. This information is
packed into a single uint64_t per zone to save space and allow easy
access. The layout of each zone's wp entry is as follows:
|0000 (4)| zone type (1)| zone attr (8)| wp (51)|

The zone descriptor extension data is relatively small compared to the
overall metadata size, so we store the zded of all zones in an array
regardless of whether the valid bit is set.

Creating a zoned qcow2 image file adds one more option,
zd_extension_size, to the zoned device configuration.

To attach this file as an emulated ZNS drive on the QEMU command line, use:
  -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
  -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx \

Sorry for sending this one more time due to network problems.

v1->v2:
- split the [v1 2/5] patch into three (doc, config, block layer API)
- adapt to qcow2 v6

Sam Li (7):
  docs/qcow2: add zd_extension_size option to the zoned format feature
  qcow2: add zd_extension configurations to zoned metadata
  hw/nvme: use blk_get_*() to access zone info in the block layer
  hw/nvme: add blk_get_zone_extension to access zd_extensions
  hw/nvme: make the metadata of ZNS emulation persistent
  hw/nvme: refactor zone append write using block layer APIs
  hw/nvme: make ZDED persistent

 block/block-backend.c             |   88 ++
 block/qcow2.c                     |  119 ++-
 block/qcow2.h                     |    2 +
 docs/interop/qcow2.txt            |    3 +
 hw/nvme/ctrl.c                    | 1247 ++++++++---------------------
 hw/nvme/ns.c                      |  162 +---
 hw/nvme/nvme.h                    |   95 +--
 include/block/block-common.h      |    9 +
 include/block/block_int-common.h  |    8 +
 include/sysemu/block-backend-io.h |   11 +
 include/sysemu/dma.h              |    3 +
 qapi/block-core.json              |    4 +
 system/dma-helpers.c              |   17 +
 13 files changed, 647 insertions(+), 1121 deletions(-)

-- 
2.40.1




* [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-30 10:05   ` Markus Armbruster
  2023-11-27  8:56 ` [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata Sam Li
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 docs/interop/qcow2.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 0f1938f056..458d05371a 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -428,6 +428,9 @@ The fields of the zoned extension are:
                    The offset of zoned metadata structure in the contained
                    image, in bytes.
 
+          44 - 47:  zd_extension_size
+                    The size of zone descriptor extension data in bytes.
+
 == Full disk encryption header pointer ==
 
 The full disk encryption header must be present if, and only if, the
-- 
2.40.1




* [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
  2023-11-27  8:56 ` [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-30 10:12   ` Markus Armbruster
  2023-11-27  8:56 ` [RFC v2 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer Sam Li
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

The zone descriptor extension is host-defined data that is associated
with each zone. Add the zone descriptor extensions to the zonedmeta
struct.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/qcow2.c                    | 69 +++++++++++++++++++++++++++++---
 block/qcow2.h                    |  2 +
 include/block/block_int-common.h |  6 +++
 qapi/block-core.json             |  4 ++
 4 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 26f2bb4a87..75dff27216 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -354,7 +354,8 @@ static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
 {
     int ret;
     BDRVQcow2State *s = bs->opaque;
-    uint64_t wps_size = s->zoned_header.zonedmeta_size;
+    uint64_t wps_size = s->zoned_header.zonedmeta_size -
+        s->zded_size;
     g_autofree uint64_t *temp = NULL;
     temp = g_new(uint64_t, wps_size);
     ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
@@ -364,7 +365,17 @@ static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
         return ret;
     }
 
+    g_autofree uint8_t *zded = NULL;
+    zded = g_try_malloc0(s->zded_size);
+    ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset + wps_size,
+                     s->zded_size, zded, 0);
+    if (ret < 0) {
+        error_report("Cannot read zded");
+        return ret;
+    }
+
     memcpy(bs->wps->wp, temp, wps_size);
+    memcpy(bs->zd_extensions, zded, s->zded_size);
     return 0;
 }
 
@@ -390,6 +401,19 @@ qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt)
             return false;
         }
 
+        if (zone_opt->zd_extension_size) {
+            if (zone_opt->zd_extension_size & 0x3f) {
+                error_report("zone descriptor extension size must be a "
+                             "multiple of 64B");
+                return false;
+            }
+
+            if ((zone_opt->zd_extension_size >> 6) > 0xff) {
+                error_report("Zone descriptor extension size is too large");
+                return false;
+            }
+        }
+
         if (zone_opt->max_active_zones > zone_opt->nr_zones) {
             error_report("Max_active_zones %" PRIu32 " exceeds "
                          "nr_zones %" PRIu32". Set it to nr_zones.",
@@ -676,6 +700,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             zoned_ext.conventional_zones =
                 be32_to_cpu(zoned_ext.conventional_zones);
             zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+            zoned_ext.zd_extension_size =
+                be32_to_cpu(zoned_ext.zd_extension_size);
             zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
             zoned_ext.max_active_zones =
                 be32_to_cpu(zoned_ext.max_active_zones);
@@ -686,7 +712,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
             s->zoned_header = zoned_ext;
             bs->wps = g_malloc(sizeof(BlockZoneWps)
-                + s->zoned_header.zonedmeta_size);
+                + zoned_ext.zonedmeta_size - s->zded_size);
+            bs->zd_extensions = g_malloc0(s->zded_size);
             ret = qcow2_refresh_zonedmeta(bs);
             if (ret < 0) {
                 error_setg_errno(errp, -ret, "zonedmeta: "
@@ -2264,6 +2291,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.zone_size = s->zoned_header.zone_size;
     bs->bl.zone_capacity = s->zoned_header.zone_capacity;
     bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+    bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }
 
 static int GRAPH_UNLOCKED
@@ -3534,6 +3562,8 @@ int qcow2_update_header(BlockDriverState *bs)
             .conventional_zones =
                 cpu_to_be32(s->zoned_header.conventional_zones),
             .nr_zones           = cpu_to_be32(s->zoned_header.nr_zones),
+            .zd_extension_size  =
+                cpu_to_be32(s->zoned_header.zd_extension_size),
             .max_open_zones     = cpu_to_be32(s->zoned_header.max_open_zones),
             .max_active_zones   =
                 cpu_to_be32(s->zoned_header.max_active_zones),
@@ -4287,6 +4317,15 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         }
         s->zoned_header.max_append_bytes = zone_host_managed->max_append_bytes;
 
+        uint64_t zded_size = 0;
+        if (zone_host_managed->has_descriptor_extension_size) {
+            s->zoned_header.zd_extension_size =
+                zone_host_managed->descriptor_extension_size;
+            zded_size = s->zoned_header.zd_extension_size *
+                bs->bl.nr_zones;
+        }
+        s->zded_size = zded_size;
+
         if (!qcow2_check_zone_options(&s->zoned_header)) {
             s->zoned_header.zoned = BLK_Z_NONE;
             ret = -EINVAL;
@@ -4294,7 +4333,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         }
 
         uint32_t nrz = s->zoned_header.nr_zones;
-        zoned_meta_size =  sizeof(uint64_t) * nrz;
+        zoned_meta_size =  sizeof(uint64_t) * nrz + zded_size;
         g_autofree uint64_t *meta = NULL;
         meta = g_new0(uint64_t, nrz);
 
@@ -4326,11 +4365,24 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             error_setg_errno(errp, -ret, "Could not zero fill zoned metadata");
             goto out;
         }
-        ret = bdrv_pwrite(blk_bs(blk)->file, offset, zoned_meta_size, meta, 0);
+
+        ret = bdrv_pwrite(blk_bs(blk)->file, offset,
+                          zoned_meta_size - zded_size, meta, 0);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not write zoned metadata "
                                          "to disk");
         }
+
+        if (zone_host_managed->has_descriptor_extension_size) {
+            /* Initialize zone descriptor extensions */
+            ret = bdrv_co_pwrite_zeroes(blk_bs(blk)->file, offset +
+                                        zoned_meta_size - zded_size, zded_size, 0);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Could not write zone descriptor "
+                                             "extensions to disk");
+                goto out;
+            }
+        }
     } else {
         s->zoned_header.zoned = BLK_Z_NONE;
     }
@@ -4472,6 +4524,7 @@ qcow2_co_create_opts(BlockDriver *drv, const char *filename, QemuOpts *opts,
         { BLOCK_OPT_MAX_OPEN_ZONES,     "zone.max-open-zones" },
         { BLOCK_OPT_MAX_ACTIVE_ZONES,   "zone.max-active-zones" },
         { BLOCK_OPT_MAX_APPEND_BYTES,   "zone.max-append-bytes" },
+        { BLOCK_OPT_ZD_EXT_SIZE,        "zone.descriptor-extension-size" },
         { NULL, NULL },
     };
 
@@ -7061,7 +7114,13 @@ static QemuOptsList qcow2_create_opts = {
             .name = BLOCK_OPT_MAX_OPEN_ZONES,                           \
             .type = QEMU_OPT_NUMBER,                                    \
             .help = "max open zones",                                   \
-        },
+        },                                                              \
+            {                                                           \
+            .name = BLOCK_OPT_ZD_EXT_SIZE,                              \
+            .type = QEMU_OPT_SIZE,                                      \
+            .help = "zone descriptor extension size (defaults "         \
+                    "to 0, must be a multiple of 64 bytes)",            \
+        },                                                              \
         QCOW_COMMON_OPTIONS,
         { /* end of list */ }
     }
diff --git a/block/qcow2.h b/block/qcow2.h
index 7f37bb4034..b7a8f4f4b6 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -249,6 +249,7 @@ typedef struct Qcow2ZonedHeaderExtension {
     uint32_t max_append_bytes;
     uint64_t zonedmeta_size;
     uint64_t zonedmeta_offset;
+    uint32_t zd_extension_size; /* must be multiple of 64 B */
 } QEMU_PACKED Qcow2ZonedHeaderExtension;
 
 typedef struct Qcow2ZoneListEntry {
@@ -456,6 +457,7 @@ typedef struct BDRVQcow2State {
     uint32_t nr_zones_exp_open;
     uint32_t nr_zones_imp_open;
     uint32_t nr_zones_closed;
+    uint64_t zded_size;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 0d231bd1f7..c649f1ca75 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -64,6 +64,7 @@
 #define BLOCK_OPT_MAX_APPEND_BYTES      "zone.max_append_bytes"
 #define BLOCK_OPT_MAX_ACTIVE_ZONES      "zone.max_active_zones"
 #define BLOCK_OPT_MAX_OPEN_ZONES        "zone.max_open_zones"
+#define BLOCK_OPT_ZD_EXT_SIZE           "zd_extension_size"
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
@@ -912,6 +913,9 @@ typedef struct BlockLimits {
     uint32_t max_active_zones;
 
     uint32_t write_granularity;
+
+    /* size of data that is associated with a zone in bytes */
+    uint32_t zd_extension_size;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
@@ -1270,6 +1274,8 @@ struct BlockDriverState {
 
     /* array of write pointers' location of each zone in the zoned device. */
     BlockZoneWps *wps;
+
+    uint8_t *zd_extensions;
 };
 
 struct BlockBackendRootState {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ef98dc83a0..a7f238371c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5074,12 +5074,16 @@
 #     append request that can be issued to the device.  It must be
 #     512-byte aligned
 #
+# @descriptor-extension-size: The size of zone descriptor extension
+#     data. Must be a multiple of 64 bytes (since 8.2)
+#
 # Since 8.2
 ##
 { 'struct': 'Qcow2ZoneHostManaged',
   'data': { '*size':          'size',
             '*capacity':      'size',
             '*conventional-zones': 'uint32',
+            '*descriptor-extension-size':  'size',
             '*max-open-zones':     'uint32',
             '*max-active-zones':   'uint32',
             '*max-append-bytes':   'uint32' } }
-- 
2.40.1




* [RFC v2 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
  2023-11-27  8:56 ` [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature Sam Li
  2023-11-27  8:56 ` [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-27  8:56 ` [RFC v2 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions Sam Li
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

The zone information is contained in the BlockLimits fields. Add
blk_get_*() functions to access it through the block layer, and update
the NVMe device emulation to use them for zone info.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/block-backend.c             | 72 +++++++++++++++++++++++++++++++
 hw/nvme/ctrl.c                    | 34 +++++----------
 hw/nvme/ns.c                      | 61 ++++++++------------------
 hw/nvme/nvme.h                    |  3 --
 include/sysemu/block-backend-io.h |  9 ++++
 5 files changed, 111 insertions(+), 68 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index ec21148806..666df9cfea 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2380,6 +2380,78 @@ int blk_get_max_iov(BlockBackend *blk)
     return blk->root->bs->bl.max_iov;
 }
 
+uint8_t blk_get_zone_model(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+    return bs ? bs->bl.zoned : 0;
+}
+
+uint32_t blk_get_zone_size(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zone_size : 0;
+}
+
+uint32_t blk_get_zone_capacity(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zone_capacity : 0;
+}
+
+uint32_t blk_get_max_open_zones(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.max_open_zones : 0;
+}
+
+uint32_t blk_get_max_active_zones(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.max_active_zones : 0;
+}
+
+uint32_t blk_get_max_append_sectors(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.max_append_sectors : 0;
+}
+
+uint32_t blk_get_nr_zones(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.nr_zones : 0;
+}
+
+uint32_t blk_get_write_granularity(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.write_granularity : 0;
+}
+
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->wps : NULL;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
     IO_CODE();
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e..e64b021454 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -417,18 +417,6 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
 static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
                                          uint32_t opn, uint32_t zrwa)
 {
-    if (ns->params.max_active_zones != 0 &&
-        ns->nr_active_zones + act > ns->params.max_active_zones) {
-        trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
-        return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
-    }
-
-    if (ns->params.max_open_zones != 0 &&
-        ns->nr_open_zones + opn > ns->params.max_open_zones) {
-        trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
-        return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR;
-    }
-
     if (zrwa > ns->zns.numzrwa) {
         return NVME_NOZRWA | NVME_DNR;
     }
@@ -1988,9 +1976,9 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone)
 static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
 {
     NvmeZone *zone;
+    int moz = blk_get_max_open_zones(ns->blkconf.blk);
 
-    if (ns->params.max_open_zones &&
-        ns->nr_open_zones == ns->params.max_open_zones) {
+    if (moz && ns->nr_open_zones == moz) {
         zone = QTAILQ_FIRST(&ns->imp_open_zones);
         if (zone) {
             /*
@@ -2160,7 +2148,7 @@ void nvme_rw_complete_cb(void *opaque, int ret)
         block_acct_done(stats, acct);
     }
 
-    if (ns->params.zoned && nvme_is_write(req)) {
+    if (blk_get_zone_model(blk) && nvme_is_write(req)) {
         nvme_finalize_zoned_write(ns, req);
     }
 
@@ -2882,7 +2870,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
         goto out;
     }
 
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         nvme_advance_zone_wp(ns, iocb->zone, nlb);
     }
 
@@ -2994,7 +2982,7 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret)
         goto invalid;
     }
 
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
         if (status) {
             goto invalid;
@@ -3088,7 +3076,7 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb)
         }
     }
 
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         status = nvme_check_zone_read(ns, slba, nlb);
         if (status) {
             goto invalid;
@@ -3164,7 +3152,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
     iocb->slba = le64_to_cpu(copy->sdlba);
 
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         iocb->zone = nvme_get_zone_by_slba(ns, iocb->slba);
         if (!iocb->zone) {
             status = NVME_LBA_RANGE | NVME_DNR;
@@ -3434,7 +3422,7 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
         goto invalid;
     }
 
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(blk)) {
         status = nvme_check_zone_read(ns, slba, nlb);
         if (status) {
             trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status);
@@ -3549,7 +3537,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
         goto invalid;
     }
 
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(blk)) {
         zone = nvme_get_zone_by_slba(ns, slba);
         assert(zone);
 
@@ -3667,7 +3655,7 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c,
     uint32_t dw10 = le32_to_cpu(c->cdw10);
     uint32_t dw11 = le32_to_cpu(c->cdw11);
 
-    if (!ns->params.zoned) {
+    if (!blk_get_zone_model(ns->blkconf.blk)) {
         trace_pci_nvme_err_invalid_opc(c->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
@@ -6527,7 +6515,7 @@ done:
 
 static uint16_t nvme_format_check(NvmeNamespace *ns, uint8_t lbaf, uint8_t pi)
 {
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         return NVME_INVALID_FORMAT | NVME_DNR;
     }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 0eabcf5cf5..82d4f7932d 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -25,7 +25,6 @@
 #include "trace.h"
 
 #define MIN_DISCARD_GRANULARITY (4 * KiB)
-#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
 
 void nvme_ns_init_format(NvmeNamespace *ns)
 {
@@ -177,19 +176,11 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
 
 static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
 {
-    uint64_t zone_size, zone_cap;
+    BlockBackend *blk = ns->blkconf.blk;
+    uint64_t zone_size = blk_get_zone_size(blk);
+    uint64_t zone_cap = blk_get_zone_capacity(blk);
 
     /* Make sure that the values of ZNS properties are sane */
-    if (ns->params.zone_size_bs) {
-        zone_size = ns->params.zone_size_bs;
-    } else {
-        zone_size = NVME_DEFAULT_ZONE_SIZE;
-    }
-    if (ns->params.zone_cap_bs) {
-        zone_cap = ns->params.zone_cap_bs;
-    } else {
-        zone_cap = zone_size;
-    }
     if (zone_cap > zone_size) {
         error_setg(errp, "zone capacity %"PRIu64"B exceeds "
                    "zone size %"PRIu64"B", zone_cap, zone_size);
@@ -266,6 +257,7 @@ static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 
 static void nvme_ns_init_zoned(NvmeNamespace *ns)
 {
+    BlockBackend *blk = ns->blkconf.blk;
     NvmeIdNsZoned *id_ns_z;
     int i;
 
@@ -274,8 +266,8 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
     id_ns_z = g_new0(NvmeIdNsZoned, 1);
 
     /* MAR/MOR are zeroes-based, FFFFFFFFFh means no limit */
-    id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1);
-    id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1);
+    id_ns_z->mar = cpu_to_le32(blk_get_max_active_zones(blk) - 1);
+    id_ns_z->mor = cpu_to_le32(blk_get_max_open_zones(blk) - 1);
     id_ns_z->zoc = 0;
     id_ns_z->ozcs = ns->params.cross_zone_read ?
         NVME_ID_NS_ZONED_OZCS_RAZB : 0x00;
@@ -539,6 +531,7 @@ static bool nvme_ns_init_fdp(NvmeNamespace *ns, Error **errp)
 
 static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 {
+    BlockBackend *blk = ns->blkconf.blk;
     unsigned int pi_size;
 
     if (!ns->blkconf.blk) {
@@ -577,25 +570,12 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
         return -1;
     }
 
-    if (ns->params.zoned && ns->endgrp && ns->endgrp->fdp.enabled) {
+    if (blk_get_zone_model(blk) && ns->endgrp && ns->endgrp->fdp.enabled) {
         error_setg(errp, "cannot be a zoned- in an FDP configuration");
         return -1;
     }
 
-    if (ns->params.zoned) {
-        if (ns->params.max_active_zones) {
-            if (ns->params.max_open_zones > ns->params.max_active_zones) {
-                error_setg(errp, "max_open_zones (%u) exceeds "
-                           "max_active_zones (%u)", ns->params.max_open_zones,
-                           ns->params.max_active_zones);
-                return -1;
-            }
-
-            if (!ns->params.max_open_zones) {
-                ns->params.max_open_zones = ns->params.max_active_zones;
-            }
-        }
-
+    if (blk_get_zone_model(blk)) {
         if (ns->params.zd_extension_size) {
             if (ns->params.zd_extension_size & 0x3f) {
                 error_setg(errp, "zone descriptor extension size must be a "
@@ -630,14 +610,14 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
                 return -1;
             }
 
-            if (ns->params.max_active_zones) {
-                if (ns->params.numzrwa > ns->params.max_active_zones) {
+            int maz = blk_get_max_active_zones(blk);
+            if (maz) {
+                if (ns->params.numzrwa > maz) {
                     error_setg(errp, "number of zone random write area "
                                "resources (zoned.numzrwa, %d) must be less "
                                "than or equal to maximum active resources "
                                "(zoned.max_active_zones, %d)",
-                               ns->params.numzrwa,
-                               ns->params.max_active_zones);
+                               ns->params.numzrwa, maz);
                     return -1;
                 }
             }
@@ -660,7 +640,7 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
     if (nvme_ns_init(ns, errp)) {
         return -1;
     }
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         if (nvme_ns_zoned_check_calc_geometry(ns, errp) != 0) {
             return -1;
         }
@@ -683,15 +663,17 @@ void nvme_ns_drain(NvmeNamespace *ns)
 
 void nvme_ns_shutdown(NvmeNamespace *ns)
 {
-    blk_flush(ns->blkconf.blk);
-    if (ns->params.zoned) {
+    BlockBackend *blk = ns->blkconf.blk;
+
+    blk_flush(blk);
+    if (blk_get_zone_model(blk)) {
         nvme_zoned_ns_shutdown(ns);
     }
 }
 
 void nvme_ns_cleanup(NvmeNamespace *ns)
 {
-    if (ns->params.zoned) {
+    if (blk_get_zone_model(ns->blkconf.blk)) {
         g_free(ns->id_ns_zoned);
         g_free(ns->zone_array);
         g_free(ns->zd_extensions);
@@ -806,11 +788,6 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
     DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
     DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
-    DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
-    DEFINE_PROP_SIZE("zoned.zone_size", NvmeNamespace, params.zone_size_bs,
-                     NVME_DEFAULT_ZONE_SIZE),
-    DEFINE_PROP_SIZE("zoned.zone_capacity", NvmeNamespace, params.zone_cap_bs,
-                     0),
     DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace,
                      params.cross_zone_read, false),
     DEFINE_PROP_UINT32("zoned.max_active", NvmeNamespace,
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5f2ae7b28b..76677a86e9 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -189,10 +189,7 @@ typedef struct NvmeNamespaceParams {
     uint32_t mcl;
     uint8_t  msrc;
 
-    bool     zoned;
     bool     cross_zone_read;
-    uint64_t zone_size_bs;
-    uint64_t zone_cap_bs;
     uint32_t max_active_zones;
     uint32_t max_open_zones;
     uint32_t zd_extension_size;
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index d174275a5c..44e44954fa 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -99,6 +99,15 @@ void blk_error_action(BlockBackend *blk, BlockErrorAction action,
 void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
+uint8_t blk_get_zone_model(BlockBackend *blk);
+uint32_t blk_get_zone_size(BlockBackend *blk);
+uint32_t blk_get_zone_capacity(BlockBackend *blk);
+uint32_t blk_get_max_open_zones(BlockBackend *blk);
+uint32_t blk_get_max_active_zones(BlockBackend *blk);
+uint32_t blk_get_max_append_sectors(BlockBackend *blk);
+uint32_t blk_get_nr_zones(BlockBackend *blk);
+uint32_t blk_get_write_granularity(BlockBackend *blk);
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
-- 
2.40.1




* [RFC v2 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
                   ` (2 preceding siblings ...)
  2023-11-27  8:56 ` [RFC v2 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-27  8:56 ` [RFC v2 5/7] hw/nvme: make the metadata of ZNS emulation persistent Sam Li
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/block-backend.c             | 16 ++++++++++++++++
 hw/nvme/ctrl.c                    | 20 ++++++++++++++------
 hw/nvme/ns.c                      | 24 ++++--------------------
 hw/nvme/nvme.h                    |  7 -------
 include/sysemu/block-backend-io.h |  2 ++
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 666df9cfea..fcdcbe28bf 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2452,6 +2452,22 @@ BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
     return bs ? bs->wps : NULL;
 }
 
+uint8_t *blk_get_zone_extension(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->zd_extensions : NULL;
+}
+
+uint32_t blk_get_zd_ext_size(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+
+    return bs ? bs->bl.zd_extension_size : 0;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
     IO_CODE();
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e64b021454..dae6f00e4f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4004,6 +4004,12 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone,
     return NVME_SUCCESS;
 }
 
+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+                                             uint32_t zone_idx)
+{
+    return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)];
+}
+
 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd;
@@ -4088,11 +4094,11 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 
     case NVME_ZONE_ACTION_SET_ZD_EXT:
         trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
-        if (all || !ns->params.zd_extension_size) {
+        if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) {
             return NVME_INVALID_FIELD | NVME_DNR;
         }
         zd_ext = nvme_get_zd_extension(ns, zone_idx);
-        status = nvme_h2c(n, zd_ext, ns->params.zd_extension_size, req);
+        status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), req);
         if (status) {
             trace_pci_nvme_err_zd_extension_map_error(zone_idx);
             return status;
@@ -4183,7 +4189,8 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
     if (zra != NVME_ZONE_REPORT && zra != NVME_ZONE_REPORT_EXTENDED) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
-    if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) {
+    if (zra == NVME_ZONE_REPORT_EXTENDED &&
+        !blk_get_zd_ext_size(ns->blkconf.blk)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
@@ -4205,7 +4212,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
 
     zone_entry_sz = sizeof(NvmeZoneDescr);
     if (zra == NVME_ZONE_REPORT_EXTENDED) {
-        zone_entry_sz += ns->params.zd_extension_size;
+        zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk);
     }
 
     max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
@@ -4243,11 +4250,12 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
             }
 
             if (zra == NVME_ZONE_REPORT_EXTENDED) {
+                int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk);
                 if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
                     memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx),
-                           ns->params.zd_extension_size);
+                           zd_ext_size);
                 }
-                buf_p += ns->params.zd_extension_size;
+                buf_p += zd_ext_size;
             }
 
             max_zones--;
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 82d4f7932d..45c08391f5 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -218,15 +218,15 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
 
 static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 {
+    BlockBackend *blk = ns->blkconf.blk;
     uint64_t start = 0, zone_size = ns->zone_size;
     uint64_t capacity = ns->num_zones * zone_size;
     NvmeZone *zone;
     int i;
 
     ns->zone_array = g_new0(NvmeZone, ns->num_zones);
-    if (ns->params.zd_extension_size) {
-        ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
-                                      ns->num_zones);
+    if (blk_get_zone_extension(blk)) {
+        ns->zd_extensions = blk_get_zone_extension(blk);
     }
 
     QTAILQ_INIT(&ns->exp_open_zones);
@@ -275,7 +275,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
     for (i = 0; i <= ns->id_ns.nlbaf; i++) {
         id_ns_z->lbafe[i].zsze = cpu_to_le64(ns->zone_size);
         id_ns_z->lbafe[i].zdes =
-            ns->params.zd_extension_size >> 6; /* Units of 64B */
+            blk_get_zd_ext_size(blk) >> 6; /* Units of 64B */
     }
 
     if (ns->params.zrwas) {
@@ -576,19 +576,6 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     }
 
     if (blk_get_zone_model(blk)) {
-        if (ns->params.zd_extension_size) {
-            if (ns->params.zd_extension_size & 0x3f) {
-                error_setg(errp, "zone descriptor extension size must be a "
-                           "multiple of 64B");
-                return -1;
-            }
-            if ((ns->params.zd_extension_size >> 6) > 0xff) {
-                error_setg(errp,
-                           "zone descriptor extension size is too large");
-                return -1;
-            }
-        }
-
         if (ns->params.zrwas) {
             if (ns->params.zrwas % ns->blkconf.logical_block_size) {
                 error_setg(errp, "zone random write area size (zoned.zrwas "
@@ -676,7 +663,6 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
     if (blk_get_zone_model(ns->blkconf.blk)) {
         g_free(ns->id_ns_zoned);
         g_free(ns->zone_array);
-        g_free(ns->zd_extensions);
     }
 
     if (ns->endgrp && ns->endgrp->fdp.enabled) {
@@ -794,8 +780,6 @@ static Property nvme_ns_props[] = {
                        params.max_active_zones, 0),
     DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace,
                        params.max_open_zones, 0),
-    DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace,
-                       params.zd_extension_size, 0),
     DEFINE_PROP_UINT32("zoned.numzrwa", NvmeNamespace, params.numzrwa, 0),
     DEFINE_PROP_SIZE("zoned.zrwas", NvmeNamespace, params.zrwas, 0),
     DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 76677a86e9..37007952fc 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -192,7 +192,6 @@ typedef struct NvmeNamespaceParams {
     bool     cross_zone_read;
     uint32_t max_active_zones;
     uint32_t max_open_zones;
-    uint32_t zd_extension_size;
 
     uint32_t numzrwa;
     uint64_t zrwas;
@@ -315,12 +314,6 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
            st != NVME_ZONE_STATE_OFFLINE;
 }
 
-static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
-                                             uint32_t zone_idx)
-{
-    return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size];
-}
-
 static inline void nvme_aor_inc_open(NvmeNamespace *ns)
 {
     assert(ns->nr_open_zones >= 0);
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 44e44954fa..ab388801b1 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -108,6 +108,8 @@ uint32_t blk_get_max_append_sectors(BlockBackend *blk);
 uint32_t blk_get_nr_zones(BlockBackend *blk);
 uint32_t blk_get_write_granularity(BlockBackend *blk);
 BlockZoneWps *blk_get_zone_wps(BlockBackend *blk);
+uint8_t *blk_get_zone_extension(BlockBackend *blk);
+uint32_t blk_get_zd_ext_size(BlockBackend *blk);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
-- 
2.40.1
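The accessors added above expose the zone descriptor extension buffer that, per the cover letter, stores the zded of all zones in one flat array regardless of the valid bit. The indexing contract — one contiguous buffer, zd_extension_size bytes per zone, zone i's extension at byte i * zd_extension_size — can be sketched in isolation. All names below are hypothetical stand-ins, not the QEMU API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the namespace state: the zone descriptor
 * extensions of all zones live in one flat buffer, zd_ext_size bytes
 * per zone, whether or not a zone's ZD_EXT_VALID attribute is set. */
typedef struct {
    uint32_t zd_ext_size;    /* bytes per zone, a multiple of 64 */
    uint32_t num_zones;
    uint8_t *zd_extensions;  /* num_zones * zd_ext_size bytes */
} ZdExtArray;

/* Same indexing rule as the patch's nvme_get_zd_extension():
 * the extension of zone i starts at byte i * zd_ext_size. */
static uint8_t *zd_ext_for_zone(ZdExtArray *a, uint32_t zone_idx)
{
    assert(zone_idx < a->num_zones);
    return &a->zd_extensions[(size_t)zone_idx * a->zd_ext_size];
}
```

With a 64-byte extension size, for example, the extension of zone 2 starts at byte 128 of the buffer, and writing it never touches a neighboring zone's slot.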




* [RFC v2 5/7] hw/nvme: make the metadata of ZNS emulation persistent
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
                   ` (3 preceding siblings ...)
  2023-11-27  8:56 ` [RFC v2 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-27  8:56 ` [RFC v2 6/7] hw/nvme: refactor zone append write using block layer APIs Sam Li
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

The emulated ZNS device follows the NVMe ZNS spec, but the state of
namespace zones does not persist across restarts of QEMU. This patch
makes the metadata of ZNS emulation persistent by using the new block
layer APIs. The ZNS device calls the zone report and zone mgmt APIs of
the block layer, which handle zone state transitions and manage zone
resources.
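The division of labor described here — the device issues zone ops, the block layer owns the state transition and write pointer bookkeeping (note the removal of nvme_advance_zone_wp(), nvme_zrm_reset() and friends below) — can be sketched minimally. This is NOT the QEMU block layer API; all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal sketch of the bookkeeping that moves out of hw/nvme and
 * behind the zone mgmt call: state transitions and the write pointer
 * are tracked in one place, not by the device model. */
typedef enum { ZONE_EMPTY, ZONE_OPEN, ZONE_FULL } ZoneState;

typedef struct {
    uint64_t zslba;   /* zone start LBA */
    uint64_t wp;      /* write pointer */
    uint64_t cap;     /* writable capacity in LBAs */
    ZoneState state;
} Zone;

/* A write advances the wp; reaching capacity finishes the zone.
 * The device no longer performs this transition itself. */
static int zone_advance_wp(Zone *z, uint32_t nlb)
{
    if (z->state == ZONE_FULL || z->wp + nlb > z->zslba + z->cap) {
        return -1;  /* would cross the zone's writable boundary */
    }
    z->wp += nlb;
    z->state = (z->wp == z->zslba + z->cap) ? ZONE_FULL : ZONE_OPEN;
    return 0;
}

/* A reset rewinds the wp to the zone start and empties the zone. */
static void zone_reset(Zone *z)
{
    z->wp = z->zslba;
    z->state = ZONE_EMPTY;
}
```

Persistence then follows from the block layer (here, the qcow2 driver) writing this per-zone state back to the image instead of keeping it in device memory.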

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/qcow2.c                    |    3 +
 hw/nvme/ctrl.c                   | 1106 +++++++-----------------------
 hw/nvme/ns.c                     |   77 +--
 hw/nvme/nvme.h                   |   85 +--
 include/block/block-common.h     |    8 +
 include/block/block_int-common.h |    2 +
 6 files changed, 264 insertions(+), 1017 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 75dff27216..dfaf5566e2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5043,6 +5043,9 @@ static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     case BLK_ZO_RESET:
         ret = qcow2_reset_zone(bs, index, len);
         break;
+    case BLK_ZO_OFFLINE:
+        /* There are no transitions from the offline state to any other state */
+        break;
     default:
         error_report("Unsupported zone op: 0x%x", op);
         ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dae6f00e4f..b9ed3495e1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -372,67 +372,6 @@ static inline bool nvme_parse_pid(NvmeNamespace *ns, uint16_t pid,
     return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg);
 }
 
-static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
-                                   NvmeZoneState state)
-{
-    if (QTAILQ_IN_USE(zone, entry)) {
-        switch (nvme_get_zone_state(zone)) {
-        case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-            QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
-            break;
-        case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-            QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-            break;
-        case NVME_ZONE_STATE_CLOSED:
-            QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
-            break;
-        case NVME_ZONE_STATE_FULL:
-            QTAILQ_REMOVE(&ns->full_zones, zone, entry);
-        default:
-            ;
-        }
-    }
-
-    nvme_set_zone_state(zone, state);
-
-    switch (state) {
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry);
-        break;
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry);
-        break;
-    case NVME_ZONE_STATE_CLOSED:
-        QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
-        break;
-    case NVME_ZONE_STATE_FULL:
-        QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry);
-    case NVME_ZONE_STATE_READ_ONLY:
-        break;
-    default:
-        zone->d.za = 0;
-    }
-}
-
-static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
-                                         uint32_t opn, uint32_t zrwa)
-{
-    if (zrwa > ns->zns.numzrwa) {
-        return NVME_NOZRWA | NVME_DNR;
-    }
-
-    return NVME_SUCCESS;
-}
-
-/*
- * Check if we can open a zone without exceeding open/active limits.
- * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
- */
-static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
-{
-    return nvme_zns_check_resources(ns, act, opn, 0);
-}
-
 static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer *ebuf)
 {
     NvmeFdpEvent *ret = NULL;
@@ -1769,346 +1708,11 @@ static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba)
                                     slba / ns->zone_size;
 }
 
-static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba)
-{
-    uint32_t zone_idx = nvme_zone_idx(ns, slba);
-
-    if (zone_idx >= ns->num_zones) {
-        return NULL;
-    }
-
-    return &ns->zone_array[zone_idx];
-}
-
-static uint16_t nvme_check_zone_state_for_write(NvmeZone *zone)
-{
-    uint64_t zslba = zone->d.zslba;
-
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EMPTY:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_CLOSED:
-        return NVME_SUCCESS;
-    case NVME_ZONE_STATE_FULL:
-        trace_pci_nvme_err_zone_is_full(zslba);
-        return NVME_ZONE_FULL;
-    case NVME_ZONE_STATE_OFFLINE:
-        trace_pci_nvme_err_zone_is_offline(zslba);
-        return NVME_ZONE_OFFLINE;
-    case NVME_ZONE_STATE_READ_ONLY:
-        trace_pci_nvme_err_zone_is_read_only(zslba);
-        return NVME_ZONE_READ_ONLY;
-    default:
-        assert(false);
-    }
-
-    return NVME_INTERNAL_DEV_ERROR;
-}
-
-static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
-                                      uint64_t slba, uint32_t nlb)
-{
-    uint64_t zcap = nvme_zone_wr_boundary(zone);
-    uint16_t status;
-
-    status = nvme_check_zone_state_for_write(zone);
-    if (status) {
-        return status;
-    }
-
-    if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-        uint64_t ezrwa = zone->w_ptr + 2 * ns->zns.zrwas;
-
-        if (slba < zone->w_ptr || slba + nlb > ezrwa) {
-            trace_pci_nvme_err_zone_invalid_write(slba, zone->w_ptr);
-            return NVME_ZONE_INVALID_WRITE;
-        }
-    } else {
-        if (unlikely(slba != zone->w_ptr)) {
-            trace_pci_nvme_err_write_not_at_wp(slba, zone->d.zslba,
-                                               zone->w_ptr);
-            return NVME_ZONE_INVALID_WRITE;
-        }
-    }
-
-    if (unlikely((slba + nlb) > zcap)) {
-        trace_pci_nvme_err_zone_boundary(slba, nlb, zcap);
-        return NVME_ZONE_BOUNDARY_ERROR;
-    }
-
-    return NVME_SUCCESS;
-}
-
-static uint16_t nvme_check_zone_state_for_read(NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EMPTY:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_FULL:
-    case NVME_ZONE_STATE_CLOSED:
-    case NVME_ZONE_STATE_READ_ONLY:
-        return NVME_SUCCESS;
-    case NVME_ZONE_STATE_OFFLINE:
-        trace_pci_nvme_err_zone_is_offline(zone->d.zslba);
-        return NVME_ZONE_OFFLINE;
-    default:
-        assert(false);
-    }
-
-    return NVME_INTERNAL_DEV_ERROR;
-}
-
-static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba,
-                                     uint32_t nlb)
-{
-    NvmeZone *zone;
-    uint64_t bndry, end;
-    uint16_t status;
-
-    zone = nvme_get_zone_by_slba(ns, slba);
-    assert(zone);
-
-    bndry = nvme_zone_rd_boundary(ns, zone);
-    end = slba + nlb;
-
-    status = nvme_check_zone_state_for_read(zone);
-    if (status) {
-        ;
-    } else if (unlikely(end > bndry)) {
-        if (!ns->params.cross_zone_read) {
-            status = NVME_ZONE_BOUNDARY_ERROR;
-        } else {
-            /*
-             * Read across zone boundary - check that all subsequent
-             * zones that are being read have an appropriate state.
-             */
-            do {
-                zone++;
-                status = nvme_check_zone_state_for_read(zone);
-                if (status) {
-                    break;
-                }
-            } while (end > nvme_zone_rd_boundary(ns, zone));
-        }
-    }
-
-    return status;
-}
-
-static uint16_t nvme_zrm_finish(NvmeNamespace *ns, NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_FULL:
-        return NVME_SUCCESS;
-
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        nvme_aor_dec_open(ns);
-        /* fallthrough */
-    case NVME_ZONE_STATE_CLOSED:
-        nvme_aor_dec_active(ns);
-
-        if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-            zone->d.za &= ~NVME_ZA_ZRWA_VALID;
-            if (ns->params.numzrwa) {
-                ns->zns.numzrwa++;
-            }
-        }
-
-        /* fallthrough */
-    case NVME_ZONE_STATE_EMPTY:
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL);
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static uint16_t nvme_zrm_close(NvmeNamespace *ns, NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        nvme_aor_dec_open(ns);
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
-        /* fall through */
-    case NVME_ZONE_STATE_CLOSED:
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone)
-{
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        nvme_aor_dec_open(ns);
-        /* fallthrough */
-    case NVME_ZONE_STATE_CLOSED:
-        nvme_aor_dec_active(ns);
-
-        if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-            if (ns->params.numzrwa) {
-                ns->zns.numzrwa++;
-            }
-        }
-
-        /* fallthrough */
-    case NVME_ZONE_STATE_FULL:
-        zone->w_ptr = zone->d.zslba;
-        zone->d.wp = zone->w_ptr;
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY);
-        /* fallthrough */
-    case NVME_ZONE_STATE_EMPTY:
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
-{
-    NvmeZone *zone;
-    int moz = blk_get_max_open_zones(ns->blkconf.blk);
-
-    if (moz && ns->nr_open_zones == moz) {
-        zone = QTAILQ_FIRST(&ns->imp_open_zones);
-        if (zone) {
-            /*
-             * Automatically close this implicitly open zone.
-             */
-            QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-            nvme_zrm_close(ns, zone);
-        }
-    }
-}
-
 enum {
     NVME_ZRM_AUTO = 1 << 0,
     NVME_ZRM_ZRWA = 1 << 1,
 };
 
-static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns,
-                                    NvmeZone *zone, int flags)
-{
-    int act = 0;
-    uint16_t status;
-
-    switch (nvme_get_zone_state(zone)) {
-    case NVME_ZONE_STATE_EMPTY:
-        act = 1;
-
-        /* fallthrough */
-
-    case NVME_ZONE_STATE_CLOSED:
-        if (n->params.auto_transition_zones) {
-            nvme_zrm_auto_transition_zone(ns);
-        }
-        status = nvme_zns_check_resources(ns, act, 1,
-                                          (flags & NVME_ZRM_ZRWA) ? 1 : 0);
-        if (status) {
-            return status;
-        }
-
-        if (act) {
-            nvme_aor_inc_active(ns);
-        }
-
-        nvme_aor_inc_open(ns);
-
-        if (flags & NVME_ZRM_AUTO) {
-            nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN);
-            return NVME_SUCCESS;
-        }
-
-        /* fallthrough */
-
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        if (flags & NVME_ZRM_AUTO) {
-            return NVME_SUCCESS;
-        }
-
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN);
-
-        /* fallthrough */
-
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        if (flags & NVME_ZRM_ZRWA) {
-            ns->zns.numzrwa--;
-
-            zone->d.za |= NVME_ZA_ZRWA_VALID;
-        }
-
-        return NVME_SUCCESS;
-
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static inline uint16_t nvme_zrm_auto(NvmeCtrl *n, NvmeNamespace *ns,
-                                     NvmeZone *zone)
-{
-    return nvme_zrm_open_flags(n, ns, zone, NVME_ZRM_AUTO);
-}
-
-static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
-                                 uint32_t nlb)
-{
-    zone->d.wp += nlb;
-
-    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
-        nvme_zrm_finish(ns, zone);
-    }
-}
-
-static void nvme_zoned_zrwa_implicit_flush(NvmeNamespace *ns, NvmeZone *zone,
-                                           uint32_t nlbc)
-{
-    uint16_t nzrwafgs = DIV_ROUND_UP(nlbc, ns->zns.zrwafg);
-
-    nlbc = nzrwafgs * ns->zns.zrwafg;
-
-    trace_pci_nvme_zoned_zrwa_implicit_flush(zone->d.zslba, nlbc);
-
-    zone->w_ptr += nlbc;
-
-    nvme_advance_zone_wp(ns, zone, nlbc);
-}
-
-static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req)
-{
-    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
-    NvmeZone *zone;
-    uint64_t slba;
-    uint32_t nlb;
-
-    slba = le64_to_cpu(rw->slba);
-    nlb = le16_to_cpu(rw->nlb) + 1;
-    zone = nvme_get_zone_by_slba(ns, slba);
-    assert(zone);
-
-    if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-        uint64_t ezrwa = zone->w_ptr + ns->zns.zrwas - 1;
-        uint64_t elba = slba + nlb - 1;
-
-        if (elba > ezrwa) {
-            nvme_zoned_zrwa_implicit_flush(ns, zone, elba - ezrwa);
-        }
-
-        return;
-    }
-
-    nvme_advance_zone_wp(ns, zone, nlb);
-}
-
 static inline bool nvme_is_write(NvmeRequest *req)
 {
     NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
@@ -2148,10 +1752,6 @@ void nvme_rw_complete_cb(void *opaque, int ret)
         block_acct_done(stats, acct);
     }
 
-    if (blk_get_zone_model(blk) && nvme_is_write(req)) {
-        nvme_finalize_zoned_write(ns, req);
-    }
-
     nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
@@ -2856,8 +2456,6 @@ static inline uint16_t nvme_check_copy_mcl(NvmeNamespace *ns,
 static void nvme_copy_out_completed_cb(void *opaque, int ret)
 {
     NvmeCopyAIOCB *iocb = opaque;
-    NvmeRequest *req = iocb->req;
-    NvmeNamespace *ns = req->ns;
     uint32_t nlb;
 
     nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, NULL,
@@ -2870,10 +2468,6 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
         goto out;
     }
 
-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        nvme_advance_zone_wp(ns, iocb->zone, nlb);
-    }
-
     iocb->idx++;
     iocb->slba += nlb;
 out:
@@ -2982,17 +2576,6 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret)
         goto invalid;
     }
 
-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
-        if (status) {
-            goto invalid;
-        }
-
-        if (!(iocb->zone->d.za & NVME_ZA_ZRWA_VALID)) {
-            iocb->zone->w_ptr += nlb;
-        }
-    }
-
     qemu_iovec_reset(&iocb->iov);
     qemu_iovec_add(&iocb->iov, iocb->bounce, len);
 
@@ -3076,13 +2659,6 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb)
         }
     }
 
-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        status = nvme_check_zone_read(ns, slba, nlb);
-        if (status) {
-            goto invalid;
-        }
-    }
-
     qemu_iovec_reset(&iocb->iov);
     qemu_iovec_add(&iocb->iov, iocb->bounce, len);
 
@@ -3152,19 +2728,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
     iocb->slba = le64_to_cpu(copy->sdlba);
 
-    if (blk_get_zone_model(ns->blkconf.blk)) {
-        iocb->zone = nvme_get_zone_by_slba(ns, iocb->slba);
-        if (!iocb->zone) {
-            status = NVME_LBA_RANGE | NVME_DNR;
-            goto invalid;
-        }
-
-        status = nvme_zrm_auto(n, ns, iocb->zone);
-        if (status) {
-            goto invalid;
-        }
-    }
-
     status = nvme_check_copy_mcl(ns, iocb, nr);
     if (status) {
         goto invalid;
@@ -3422,14 +2985,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
         goto invalid;
     }
 
-    if (blk_get_zone_model(blk)) {
-        status = nvme_check_zone_read(ns, slba, nlb);
-        if (status) {
-            trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status);
-            goto invalid;
-        }
-    }
-
     if (NVME_ERR_REC_DULBE(ns->features.err_rec)) {
         status = nvme_check_dulbe(ns, slba, nlb);
         if (status) {
@@ -3505,8 +3060,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     uint64_t data_size = nvme_l2b(ns, nlb);
     uint64_t mapped_size = data_size;
     uint64_t data_offset;
-    NvmeZone *zone;
-    NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe;
     BlockBackend *blk = ns->blkconf.blk;
     uint16_t status;
 
@@ -3538,32 +3091,20 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     }
 
     if (blk_get_zone_model(blk)) {
-        zone = nvme_get_zone_by_slba(ns, slba);
-        assert(zone);
+        uint32_t zone_size = blk_get_zone_size(blk);
+        uint32_t zone_idx = slba / zone_size;
+        int64_t zone_start = zone_idx * zone_size;
 
         if (append) {
             bool piremap = !!(ctrl & NVME_RW_PIREMAP);
 
-            if (unlikely(zone->d.za & NVME_ZA_ZRWA_VALID)) {
-                return NVME_INVALID_ZONE_OP | NVME_DNR;
-            }
-
-            if (unlikely(slba != zone->d.zslba)) {
-                trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba);
-                status = NVME_INVALID_FIELD;
-                goto invalid;
-            }
-
             if (n->params.zasl &&
                 data_size > (uint64_t)n->page_size << n->params.zasl) {
                 trace_pci_nvme_err_zasl(data_size);
                 return NVME_INVALID_FIELD | NVME_DNR;
             }
 
-            slba = zone->w_ptr;
             rw->slba = cpu_to_le64(slba);
-            res->slba = cpu_to_le64(slba);
-
             switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
             case NVME_ID_NS_DPS_TYPE_1:
                 if (!piremap) {
@@ -3575,7 +3116,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
             case NVME_ID_NS_DPS_TYPE_2:
                 if (piremap) {
                     uint32_t reftag = le32_to_cpu(rw->reftag);
-                    rw->reftag = cpu_to_le32(reftag + (slba - zone->d.zslba));
+                    rw->reftag = cpu_to_le32(reftag + (slba - zone_start));
                 }
 
                 break;
@@ -3589,19 +3130,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
             }
         }
 
-        status = nvme_check_zone_write(ns, zone, slba, nlb);
-        if (status) {
-            goto invalid;
-        }
-
-        status = nvme_zrm_auto(n, ns, zone);
-        if (status) {
-            goto invalid;
-        }
-
-        if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) {
-            zone->w_ptr += nlb;
-        }
     } else if (ns->endgrp && ns->endgrp->fdp.enabled) {
         nvme_do_write_fdp(n, req, slba, nlb);
     }
@@ -3644,6 +3172,23 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
     return nvme_do_write(n, req, false, true);
 }
 
+typedef struct NvmeZoneCmdAIOCB {
+    NvmeRequest *req;
+    NvmeCmd *cmd;
+    NvmeCtrl *n;
+
+    union {
+        struct {
+          uint32_t partial;
+          unsigned int nr_zones;
+          BlockZoneDescriptor *zones;
+        } zone_report_data;
+        struct {
+          int64_t offset;
+        } zone_append_data;
+    };
+} NvmeZoneCmdAIOCB;
+
 static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
 {
     return nvme_do_write(n, req, true, false);
@@ -3655,7 +3200,7 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c,
     uint32_t dw10 = le32_to_cpu(c->cdw10);
     uint32_t dw11 = le32_to_cpu(c->cdw11);
 
-    if (blk_get_zone_model(ns->blkconf.blk)) {
+    if (!blk_get_zone_model(ns->blkconf.blk)) {
         trace_pci_nvme_err_invalid_opc(c->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
@@ -3673,198 +3218,21 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c,
     return NVME_SUCCESS;
 }
 
-typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, NvmeZoneState,
-                                 NvmeRequest *);
-
-enum NvmeZoneProcessingMask {
-    NVME_PROC_CURRENT_ZONE    = 0,
-    NVME_PROC_OPENED_ZONES    = 1 << 0,
-    NVME_PROC_CLOSED_ZONES    = 1 << 1,
-    NVME_PROC_READ_ONLY_ZONES = 1 << 2,
-    NVME_PROC_FULL_ZONES      = 1 << 3,
-};
-
-static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone,
-                               NvmeZoneState state, NvmeRequest *req)
-{
-    NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd;
-    int flags = 0;
-
-    if (cmd->zsflags & NVME_ZSFLAG_ZRWA_ALLOC) {
-        uint16_t ozcs = le16_to_cpu(ns->id_ns_zoned->ozcs);
-
-        if (!(ozcs & NVME_ID_NS_ZONED_OZCS_ZRWASUP)) {
-            return NVME_INVALID_ZONE_OP | NVME_DNR;
-        }
-
-        if (zone->w_ptr % ns->zns.zrwafg) {
-            return NVME_NOZRWA | NVME_DNR;
-        }
-
-        flags = NVME_ZRM_ZRWA;
-    }
-
-    return nvme_zrm_open_flags(nvme_ctrl(req), ns, zone, flags);
-}
-
-static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone,
-                                NvmeZoneState state, NvmeRequest *req)
-{
-    return nvme_zrm_close(ns, zone);
-}
-
-static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone,
-                                 NvmeZoneState state, NvmeRequest *req)
-{
-    return nvme_zrm_finish(ns, zone);
-}
-
-static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone,
-                                  NvmeZoneState state, NvmeRequest *req)
-{
-    switch (state) {
-    case NVME_ZONE_STATE_READ_ONLY:
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE);
-        /* fall through */
-    case NVME_ZONE_STATE_OFFLINE:
-        return NVME_SUCCESS;
-    default:
-        return NVME_ZONE_INVAL_TRANSITION;
-    }
-}
-
-static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone)
-{
-    uint16_t status;
-    uint8_t state = nvme_get_zone_state(zone);
-
-    if (state == NVME_ZONE_STATE_EMPTY) {
-        status = nvme_aor_check(ns, 1, 0);
-        if (status) {
-            return status;
-        }
-        nvme_aor_inc_active(ns);
-        zone->d.za |= NVME_ZA_ZD_EXT_VALID;
-        nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
-        return NVME_SUCCESS;
-    }
-
-    return NVME_ZONE_INVAL_TRANSITION;
-}
-
-static uint16_t nvme_bulk_proc_zone(NvmeNamespace *ns, NvmeZone *zone,
-                                    enum NvmeZoneProcessingMask proc_mask,
-                                    op_handler_t op_hndlr, NvmeRequest *req)
-{
-    uint16_t status = NVME_SUCCESS;
-    NvmeZoneState zs = nvme_get_zone_state(zone);
-    bool proc_zone;
-
-    switch (zs) {
-    case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-    case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        proc_zone = proc_mask & NVME_PROC_OPENED_ZONES;
-        break;
-    case NVME_ZONE_STATE_CLOSED:
-        proc_zone = proc_mask & NVME_PROC_CLOSED_ZONES;
-        break;
-    case NVME_ZONE_STATE_READ_ONLY:
-        proc_zone = proc_mask & NVME_PROC_READ_ONLY_ZONES;
-        break;
-    case NVME_ZONE_STATE_FULL:
-        proc_zone = proc_mask & NVME_PROC_FULL_ZONES;
-        break;
-    default:
-        proc_zone = false;
-    }
-
-    if (proc_zone) {
-        status = op_hndlr(ns, zone, zs, req);
-    }
-
-    return status;
-}
-
-static uint16_t nvme_do_zone_op(NvmeNamespace *ns, NvmeZone *zone,
-                                enum NvmeZoneProcessingMask proc_mask,
-                                op_handler_t op_hndlr, NvmeRequest *req)
-{
-    NvmeZone *next;
-    uint16_t status = NVME_SUCCESS;
-    int i;
-
-    if (!proc_mask) {
-        status = op_hndlr(ns, zone, nvme_get_zone_state(zone), req);
-    } else {
-        if (proc_mask & NVME_PROC_CLOSED_ZONES) {
-            QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) {
-                status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr,
-                                             req);
-                if (status && status != NVME_NO_COMPLETE) {
-                    goto out;
-                }
-            }
-        }
-        if (proc_mask & NVME_PROC_OPENED_ZONES) {
-            QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) {
-                status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr,
-                                             req);
-                if (status && status != NVME_NO_COMPLETE) {
-                    goto out;
-                }
-            }
-
-            QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) {
-                status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr,
-                                             req);
-                if (status && status != NVME_NO_COMPLETE) {
-                    goto out;
-                }
-            }
-        }
-        if (proc_mask & NVME_PROC_FULL_ZONES) {
-            QTAILQ_FOREACH_SAFE(zone, &ns->full_zones, entry, next) {
-                status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr,
-                                             req);
-                if (status && status != NVME_NO_COMPLETE) {
-                    goto out;
-                }
-            }
-        }
-
-        if (proc_mask & NVME_PROC_READ_ONLY_ZONES) {
-            for (i = 0; i < ns->num_zones; i++, zone++) {
-                status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr,
-                                             req);
-                if (status && status != NVME_NO_COMPLETE) {
-                    goto out;
-                }
-            }
-        }
-    }
-
-out:
-    return status;
-}
-
-typedef struct NvmeZoneResetAIOCB {
+typedef struct NvmeZoneMgmtAIOCB {
     BlockAIOCB common;
     BlockAIOCB *aiocb;
     NvmeRequest *req;
     int ret;
 
     bool all;
-    int idx;
-    NvmeZone *zone;
-} NvmeZoneResetAIOCB;
+    uint64_t offset;
+    uint64_t len;
+    BlockZoneOp op;
+} NvmeZoneMgmtAIOCB;
 
-static void nvme_zone_reset_cancel(BlockAIOCB *aiocb)
+static void nvme_zone_mgmt_send_cancel(BlockAIOCB *aiocb)
 {
-    NvmeZoneResetAIOCB *iocb = container_of(aiocb, NvmeZoneResetAIOCB, common);
-    NvmeRequest *req = iocb->req;
-    NvmeNamespace *ns = req->ns;
-
-    iocb->idx = ns->num_zones;
+    NvmeZoneMgmtAIOCB *iocb = container_of(aiocb, NvmeZoneMgmtAIOCB, common);
 
     iocb->ret = -ECANCELED;
 
@@ -3874,117 +3242,66 @@ static void nvme_zone_reset_cancel(BlockAIOCB *aiocb)
     }
 }
 
-static const AIOCBInfo nvme_zone_reset_aiocb_info = {
-    .aiocb_size = sizeof(NvmeZoneResetAIOCB),
-    .cancel_async = nvme_zone_reset_cancel,
+static const AIOCBInfo nvme_zone_mgmt_aiocb_info = {
+    .aiocb_size = sizeof(NvmeZoneMgmtAIOCB),
+    .cancel_async = nvme_zone_mgmt_send_cancel,
 };
 
-static void nvme_zone_reset_cb(void *opaque, int ret);
+static void nvme_zone_mgmt_send_cb(void *opaque, int ret);
 
-static void nvme_zone_reset_epilogue_cb(void *opaque, int ret)
+static void nvme_zone_mgmt_send_epilogue_cb(void *opaque, int ret)
 {
-    NvmeZoneResetAIOCB *iocb = opaque;
-    NvmeRequest *req = iocb->req;
-    NvmeNamespace *ns = req->ns;
-    int64_t moff;
-    int count;
+    NvmeZoneMgmtAIOCB *iocb = opaque;
+    NvmeNamespace *ns = iocb->req->ns;
 
     if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) {
-        goto out;
+        if (ret < 0) {
+            iocb->ret = ret;
+            error_report("zone mgmt operation failed: %d", ret);
+        }
+        goto done;
     }
 
-    moff = nvme_moff(ns, iocb->zone->d.zslba);
-    count = nvme_m2b(ns, ns->zone_size);
-
-    iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, moff, count,
-                                        BDRV_REQ_MAY_UNMAP,
-                                        nvme_zone_reset_cb, iocb);
     return;
 
-out:
-    nvme_zone_reset_cb(iocb, ret);
+done:
+    iocb->aiocb = NULL;
+    iocb->common.cb(iocb->common.opaque, iocb->ret);
+    qemu_aio_unref(iocb);
 }
 
-static void nvme_zone_reset_cb(void *opaque, int ret)
+static void nvme_zone_mgmt_send_cb(void *opaque, int ret)
 {
-    NvmeZoneResetAIOCB *iocb = opaque;
+    NvmeZoneMgmtAIOCB *iocb = opaque;
     NvmeRequest *req = iocb->req;
     NvmeNamespace *ns = req->ns;
+    BlockBackend *blk = ns->blkconf.blk;
 
-    if (iocb->ret < 0) {
-        goto done;
-    } else if (ret < 0) {
-        iocb->ret = ret;
-        goto done;
-    }
-
-    if (iocb->zone) {
-        nvme_zrm_reset(ns, iocb->zone);
-
-        if (!iocb->all) {
-            goto done;
-        }
-    }
-
-    while (iocb->idx < ns->num_zones) {
-        NvmeZone *zone = &ns->zone_array[iocb->idx++];
-
-        switch (nvme_get_zone_state(zone)) {
-        case NVME_ZONE_STATE_EMPTY:
-            if (!iocb->all) {
-                goto done;
-            }
-
-            continue;
-
-        case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-        case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-        case NVME_ZONE_STATE_CLOSED:
-        case NVME_ZONE_STATE_FULL:
-            iocb->zone = zone;
-            break;
-
-        default:
-            continue;
-        }
-
-        trace_pci_nvme_zns_zone_reset(zone->d.zslba);
-
-        iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk,
-                                            nvme_l2b(ns, zone->d.zslba),
-                                            nvme_l2b(ns, ns->zone_size),
-                                            BDRV_REQ_MAY_UNMAP,
-                                            nvme_zone_reset_epilogue_cb,
-                                            iocb);
-        return;
-    }
-
-done:
-    iocb->aiocb = NULL;
-
-    iocb->common.cb(iocb->common.opaque, iocb->ret);
-    qemu_aio_unref(iocb);
+    iocb->aiocb = blk_aio_zone_mgmt(blk, iocb->op, iocb->offset,
+                                    iocb->len,
+                                    nvme_zone_mgmt_send_epilogue_cb, iocb);
+    return;
 }
 
-static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone,
+static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, uint32_t zidx,
                                                uint64_t elba, NvmeRequest *req)
 {
     NvmeNamespace *ns = req->ns;
     uint16_t ozcs = le16_to_cpu(ns->id_ns_zoned->ozcs);
-    uint64_t wp = zone->d.wp;
-    uint32_t nlb = elba - wp + 1;
-    uint16_t status;
-
+    BlockZoneWps *wps = blk_get_zone_wps(ns->blkconf.blk);
+    uint64_t *wp = &wps->wp[zidx];
+    uint64_t raw_wpv = *wp;
+    /* the attribute field sits at bits 51-58 of the packed wp */
+    uint8_t za = BDRV_ZP_GET_ZA(raw_wpv) >> 51;
+    uint64_t wpv = BDRV_ZP_GET_WP(raw_wpv);
+    uint32_t nlb = elba - wpv + 1;
 
     if (!(ozcs & NVME_ID_NS_ZONED_OZCS_ZRWASUP)) {
         return NVME_INVALID_ZONE_OP | NVME_DNR;
     }
 
-    if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) {
+    if (!(za & NVME_ZA_ZRWA_VALID)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    if (elba < wp || elba > wp + ns->zns.zrwas) {
+    if (elba < wpv || elba > wpv + ns->zns.zrwas) {
         return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR;
     }
 
@@ -3992,37 +3309,36 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone,
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    status = nvme_zrm_auto(n, ns, zone);
-    if (status) {
-        return status;
-    }
-
-    zone->w_ptr += nlb;
-
-    nvme_advance_zone_wp(ns, zone, nlb);
+    *wp += nlb;
 
     return NVME_SUCCESS;
 }
 
 static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
-                                        uint32_t zone_idx)
+                                             uint32_t zone_idx)
 {
     return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)];
 }
 
+/* Sentinel for zone send actions with no block-layer zone op equivalent */
+#define BLK_ZO_UNSUP 0x22
 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd;
     NvmeNamespace *ns = req->ns;
-    NvmeZone *zone;
-    NvmeZoneResetAIOCB *iocb;
-    uint8_t *zd_ext;
+    NvmeZoneMgmtAIOCB *iocb;
     uint64_t slba = 0;
     uint32_t zone_idx = 0;
     uint16_t status;
     uint8_t action = cmd->zsa;
+    uint8_t *zd_ext;
+    uint64_t offset, len;
+    BlockBackend *blk = ns->blkconf.blk;
+    uint32_t zone_size = blk_get_zone_size(blk);
+    /* widen before multiplying so large devices do not overflow 32 bits */
+    uint64_t size = (uint64_t)zone_size * blk_get_nr_zones(blk);
+    BlockZoneOp op = BLK_ZO_UNSUP;
+    /* true when the requested action maps onto a block-layer zone op */
+    bool flag = true;
     bool all;
-    enum NvmeZoneProcessingMask proc_mask = NVME_PROC_CURRENT_ZONE;
 
     all = cmd->zsflags & NVME_ZSFLAG_SELECT_ALL;
 
@@ -4033,82 +3349,51 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
         if (status) {
             return status;
         }
-    }
-
-    zone = &ns->zone_array[zone_idx];
-    if (slba != zone->d.zslba && action != NVME_ZONE_ACTION_ZRWA_FLUSH) {
-        trace_pci_nvme_err_unaligned_zone_cmd(action, slba, zone->d.zslba);
-        return NVME_INVALID_FIELD | NVME_DNR;
+        len = zone_size;
+    } else {
+        len = size;
     }
 
     switch (action) {
 
     case NVME_ZONE_ACTION_OPEN:
-        if (all) {
-            proc_mask = NVME_PROC_CLOSED_ZONES;
-        }
+        op = BLK_ZO_OPEN;
         trace_pci_nvme_open_zone(slba, zone_idx, all);
-        status = nvme_do_zone_op(ns, zone, proc_mask, nvme_open_zone, req);
         break;
 
     case NVME_ZONE_ACTION_CLOSE:
-        if (all) {
-            proc_mask = NVME_PROC_OPENED_ZONES;
-        }
+        op = BLK_ZO_CLOSE;
         trace_pci_nvme_close_zone(slba, zone_idx, all);
-        status = nvme_do_zone_op(ns, zone, proc_mask, nvme_close_zone, req);
         break;
 
     case NVME_ZONE_ACTION_FINISH:
-        if (all) {
-            proc_mask = NVME_PROC_OPENED_ZONES | NVME_PROC_CLOSED_ZONES;
-        }
+        op = BLK_ZO_FINISH;
         trace_pci_nvme_finish_zone(slba, zone_idx, all);
-        status = nvme_do_zone_op(ns, zone, proc_mask, nvme_finish_zone, req);
         break;
 
     case NVME_ZONE_ACTION_RESET:
+        op = BLK_ZO_RESET;
         trace_pci_nvme_reset_zone(slba, zone_idx, all);
-
-        iocb = blk_aio_get(&nvme_zone_reset_aiocb_info, ns->blkconf.blk,
-                           nvme_misc_cb, req);
-
-        iocb->req = req;
-        iocb->ret = 0;
-        iocb->all = all;
-        iocb->idx = zone_idx;
-        iocb->zone = NULL;
-
-        req->aiocb = &iocb->common;
-        nvme_zone_reset_cb(iocb, 0);
-
-        return NVME_NO_COMPLETE;
+        break;
 
     case NVME_ZONE_ACTION_OFFLINE:
-        if (all) {
-            proc_mask = NVME_PROC_READ_ONLY_ZONES;
-        }
+        op = BLK_ZO_OFFLINE;
         trace_pci_nvme_offline_zone(slba, zone_idx, all);
-        status = nvme_do_zone_op(ns, zone, proc_mask, nvme_offline_zone, req);
         break;
 
     case NVME_ZONE_ACTION_SET_ZD_EXT:
+        int zd_ext_size = blk_get_zd_ext_size(blk);
         trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
-        if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) {
+        if (all || !zd_ext_size) {
             return NVME_INVALID_FIELD | NVME_DNR;
         }
         zd_ext = nvme_get_zd_extension(ns, zone_idx);
-        status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), req);
+        status = nvme_h2c(n, zd_ext, zd_ext_size, req);
         if (status) {
             trace_pci_nvme_err_zd_extension_map_error(zone_idx);
             return status;
         }
-
-        status = nvme_set_zd_ext(ns, zone);
-        if (status == NVME_SUCCESS) {
-            trace_pci_nvme_zd_extension_set(zone_idx);
-            return status;
-        }
+        trace_pci_nvme_zd_extension_set(zone_idx);
         break;
 
     case NVME_ZONE_ACTION_ZRWA_FLUSH:
@@ -4116,16 +3401,34 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
             return NVME_INVALID_FIELD | NVME_DNR;
         }
 
-        return nvme_zone_mgmt_send_zrwa_flush(n, zone, slba, req);
+        return nvme_zone_mgmt_send_zrwa_flush(n, zone_idx, slba, req);
 
     default:
         trace_pci_nvme_err_invalid_mgmt_action(action);
         status = NVME_INVALID_FIELD;
     }
 
+    if (flag && (op != BLK_ZO_UNSUP)) {
+        iocb = blk_aio_get(&nvme_zone_mgmt_aiocb_info, ns->blkconf.blk,
+                           nvme_misc_cb, req);
+        iocb->req = req;
+        iocb->ret = 0;
+        iocb->all = all;
+        /* Convert the SLBA to a byte offset for the block layer */
+        offset = nvme_l2b(ns, slba);
+        iocb->offset = offset;
+        iocb->len = len;
+        iocb->op = op;
+
+        req->aiocb = &iocb->common;
+        nvme_zone_mgmt_send_cb(iocb, 0);
+
+        return NVME_NO_COMPLETE;
+    }
+
     if (status == NVME_ZONE_INVAL_TRANSITION) {
         trace_pci_nvme_err_invalid_zone_state_transition(action, slba,
-                                                         zone->d.za);
+                                                         TO_DO_ZA);
     }
     if (status) {
         status |= NVME_DNR;
@@ -4134,50 +3437,144 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
     return status;
 }
 
-static bool nvme_zone_matches_filter(uint32_t zafs, NvmeZone *zl)
+static bool nvme_zone_matches_filter(uint32_t zafs, BlockZoneState zs)
 {
-    NvmeZoneState zs = nvme_get_zone_state(zl);
-
     switch (zafs) {
     case NVME_ZONE_REPORT_ALL:
         return true;
     case NVME_ZONE_REPORT_EMPTY:
-        return zs == NVME_ZONE_STATE_EMPTY;
+        return zs == BLK_ZS_EMPTY;
     case NVME_ZONE_REPORT_IMPLICITLY_OPEN:
-        return zs == NVME_ZONE_STATE_IMPLICITLY_OPEN;
+        return zs == BLK_ZS_IOPEN;
     case NVME_ZONE_REPORT_EXPLICITLY_OPEN:
-        return zs == NVME_ZONE_STATE_EXPLICITLY_OPEN;
+        return zs == BLK_ZS_EOPEN;
     case NVME_ZONE_REPORT_CLOSED:
-        return zs == NVME_ZONE_STATE_CLOSED;
+        return zs == BLK_ZS_CLOSED;
     case NVME_ZONE_REPORT_FULL:
-        return zs == NVME_ZONE_STATE_FULL;
+        return zs == BLK_ZS_FULL;
     case NVME_ZONE_REPORT_READ_ONLY:
-        return zs == NVME_ZONE_STATE_READ_ONLY;
+        return zs == BLK_ZS_RDONLY;
     case NVME_ZONE_REPORT_OFFLINE:
-        return zs == NVME_ZONE_STATE_OFFLINE;
+        return zs == BLK_ZS_OFFLINE;
     default:
         return false;
     }
 }
 
+static void nvme_zone_mgmt_recv_completed_cb(void *opaque, int ret)
+{
+    NvmeZoneCmdAIOCB *iocb = opaque;
+    NvmeRequest *req = iocb->req;
+    NvmeCmd *cmd = iocb->cmd;
+    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
+    int64_t zrp_size, j = 0;
+    uint32_t zrasf;
+    g_autofree void *buf = NULL;
+    void *buf_p;
+    NvmeZoneReportHeader *zrp_hdr;
+    uint64_t nz = iocb->zone_report_data.nr_zones;
+    BlockZoneDescriptor *in_zone = iocb->zone_report_data.zones;
+    NvmeZoneDescr *out_zone;
+
+    if (ret < 0) {
+        error_report("zone report failed: %d", ret);
+        goto out;
+    }
+
+    zrasf = (dw13 >> 8) & 0xff;
+    if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
+        error_report("invalid zone report filter: %u", zrasf);
+        /* free the report buffers instead of leaking them */
+        goto out;
+    }
+
+    zrp_size = sizeof(NvmeZoneReportHeader) + sizeof(NvmeZoneDescr) * nz;
+    buf = g_malloc0(zrp_size);
+
+    zrp_hdr = buf;
+    zrp_hdr->nr_zones = cpu_to_le64(nz);
+    buf_p = buf + sizeof(NvmeZoneReportHeader);
+
+    for (; j < nz; j++) {
+        BlockZoneState zs = in_zone[j].state;
+
+        /* Filter before claiming a slot so skipped zones do not leave
+         * zero-filled descriptors in the report buffer. */
+        if (!nvme_zone_matches_filter(zrasf, zs)) {
+            continue;
+        }
+
+        out_zone = buf_p;
+        buf_p += sizeof(NvmeZoneDescr);
+
+        *out_zone = (NvmeZoneDescr) {
+            .zslba = nvme_b2l(req->ns, in_zone[j].start),
+            .zcap = nvme_b2l(req->ns, in_zone[j].cap),
+            .wp = nvme_b2l(req->ns, in_zone[j].wp),
+        };
+
+        switch (in_zone[j].type) {
+        case BLK_ZT_CONV:
+            out_zone->zt = NVME_ZONE_TYPE_RESERVED;
+            break;
+        case BLK_ZT_SWR:
+            out_zone->zt = NVME_ZONE_TYPE_SEQ_WRITE;
+            break;
+        case BLK_ZT_SWP:
+            out_zone->zt = NVME_ZONE_TYPE_RESERVED;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        switch (zs) {
+        case BLK_ZS_RDONLY:
+            out_zone->zs = NVME_ZONE_STATE_READ_ONLY << 4;
+            break;
+        case BLK_ZS_OFFLINE:
+            out_zone->zs = NVME_ZONE_STATE_OFFLINE << 4;
+            break;
+        case BLK_ZS_EMPTY:
+            out_zone->zs = NVME_ZONE_STATE_EMPTY << 4;
+            break;
+        case BLK_ZS_CLOSED:
+            out_zone->zs = NVME_ZONE_STATE_CLOSED << 4;
+            break;
+        case BLK_ZS_FULL:
+            out_zone->zs = NVME_ZONE_STATE_FULL << 4;
+            break;
+        case BLK_ZS_EOPEN:
+            out_zone->zs = NVME_ZONE_STATE_EXPLICITLY_OPEN << 4;
+            break;
+        case BLK_ZS_IOPEN:
+            out_zone->zs = NVME_ZONE_STATE_IMPLICITLY_OPEN << 4;
+            break;
+        case BLK_ZS_NOT_WP:
+            out_zone->zs = NVME_ZONE_STATE_RESERVED << 4;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    nvme_c2h(iocb->n, (uint8_t *)buf, zrp_size, req);
+
+out:
+    g_free(iocb->zone_report_data.zones);
+    g_free(iocb);
+    return;
+}
+
 static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeCmd *cmd = (NvmeCmd *)&req->cmd;
     NvmeNamespace *ns = req->ns;
+    BlockBackend *blk = ns->blkconf.blk;
+    NvmeZoneCmdAIOCB *iocb;
     /* cdw12 is zero-based number of dwords to return. Convert to bytes */
     uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
     uint32_t dw13 = le32_to_cpu(cmd->cdw13);
-    uint32_t zone_idx, zra, zrasf, partial;
-    uint64_t max_zones, nr_zones = 0;
+    uint32_t zone_idx, zra, zrasf, partial, nr_zones;
     uint16_t status;
     uint64_t slba;
-    NvmeZoneDescr *z;
-    NvmeZone *zone;
-    NvmeZoneReportHeader *header;
-    void *buf, *buf_p;
     size_t zone_entry_sz;
-    int i;
-
+    int64_t offset;
     req->status = NVME_SUCCESS;
 
     status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx);
@@ -4208,64 +3605,31 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
         return status;
     }
 
-    partial = (dw13 >> 16) & 0x01;
-
     zone_entry_sz = sizeof(NvmeZoneDescr);
     if (zra == NVME_ZONE_REPORT_EXTENDED) {
-        zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk) ;
-    }
-
-    max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
-    buf = g_malloc0(data_size);
-
-    zone = &ns->zone_array[zone_idx];
-    for (i = zone_idx; i < ns->num_zones; i++) {
-        if (partial && nr_zones >= max_zones) {
-            break;
-        }
-        if (nvme_zone_matches_filter(zrasf, zone++)) {
-            nr_zones++;
-        }
+        zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk);
     }
-    header = buf;
-    header->nr_zones = cpu_to_le64(nr_zones);
-
-    buf_p = buf + sizeof(NvmeZoneReportHeader);
-    for (; zone_idx < ns->num_zones && max_zones > 0; zone_idx++) {
-        zone = &ns->zone_array[zone_idx];
-        if (nvme_zone_matches_filter(zrasf, zone)) {
-            z = buf_p;
-            buf_p += sizeof(NvmeZoneDescr);
-
-            z->zt = zone->d.zt;
-            z->zs = zone->d.zs;
-            z->zcap = cpu_to_le64(zone->d.zcap);
-            z->zslba = cpu_to_le64(zone->d.zslba);
-            z->za = zone->d.za;
-
-            if (nvme_wp_is_valid(zone)) {
-                z->wp = cpu_to_le64(zone->d.wp);
-            } else {
-                z->wp = cpu_to_le64(~0ULL);
-            }
 
-            if (zra == NVME_ZONE_REPORT_EXTENDED) {
-                int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk);
-                if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
-                    memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx),
-                           zd_ext_size);
-                }
-                buf_p += zd_ext_size;
-            }
-
-            max_zones--;
-        }
+    offset = nvme_l2b(ns, slba);
+    nr_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
+    partial = (dw13 >> 16) & 0x01;
+    if (!partial) {
+        nr_zones = blk_get_nr_zones(blk);
+        offset = 0;
     }
 
-    status = nvme_c2h(n, (uint8_t *)buf, data_size, req);
-
-    g_free(buf);
-
+    iocb = g_malloc0(sizeof(NvmeZoneCmdAIOCB));
+    iocb->req = req;
+    iocb->n = n;
+    iocb->cmd = cmd;
+    iocb->zone_report_data.nr_zones = nr_zones;
+    iocb->zone_report_data.zones = g_malloc0(
+        sizeof(BlockZoneDescriptor) * nr_zones);
+
+    blk_aio_zone_report(blk, offset,
+                        &iocb->zone_report_data.nr_zones,
+                        iocb->zone_report_data.zones,
+                        nvme_zone_mgmt_recv_completed_cb, iocb);
     return status;
 }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 45c08391f5..63106a0f27 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -219,36 +219,10 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
 static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 {
     BlockBackend *blk = ns->blkconf.blk;
-    uint64_t start = 0, zone_size = ns->zone_size;
-    uint64_t capacity = ns->num_zones * zone_size;
-    NvmeZone *zone;
-    int i;
-
-    ns->zone_array = g_new0(NvmeZone, ns->num_zones);
     if (blk_get_zone_extension(blk)) {
         ns->zd_extensions = blk_get_zone_extension(blk);
     }
 
-    QTAILQ_INIT(&ns->exp_open_zones);
-    QTAILQ_INIT(&ns->imp_open_zones);
-    QTAILQ_INIT(&ns->closed_zones);
-    QTAILQ_INIT(&ns->full_zones);
-
-    zone = ns->zone_array;
-    for (i = 0; i < ns->num_zones; i++, zone++) {
-        if (start + zone_size > capacity) {
-            zone_size = capacity - start;
-        }
-        zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE;
-        nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY);
-        zone->d.za = 0;
-        zone->d.zcap = ns->zone_capacity;
-        zone->d.zslba = start;
-        zone->d.wp = start;
-        zone->w_ptr = start;
-        start += zone_size;
-    }
-
     ns->zone_size_log2 = 0;
     if (is_power_of_2(ns->zone_size)) {
         ns->zone_size_log2 = 63 - clz64(ns->zone_size);
@@ -319,56 +293,12 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
     ns->id_ns_zoned = id_ns_z;
 }
 
-static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone)
-{
-    uint8_t state;
-
-    zone->w_ptr = zone->d.wp;
-    state = nvme_get_zone_state(zone);
-    if (zone->d.wp != zone->d.zslba ||
-        (zone->d.za & NVME_ZA_ZD_EXT_VALID)) {
-        if (state != NVME_ZONE_STATE_CLOSED) {
-            trace_pci_nvme_clear_ns_close(state, zone->d.zslba);
-            nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED);
-        }
-        nvme_aor_inc_active(ns);
-        QTAILQ_INSERT_HEAD(&ns->closed_zones, zone, entry);
-    } else {
-        trace_pci_nvme_clear_ns_reset(state, zone->d.zslba);
-        if (zone->d.za & NVME_ZA_ZRWA_VALID) {
-            zone->d.za &= ~NVME_ZA_ZRWA_VALID;
-            ns->zns.numzrwa++;
-        }
-        nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY);
-    }
-}
-
 /*
  * Close all the zones that are currently open.
  */
 static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)
 {
-    NvmeZone *zone, *next;
-
-    QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) {
-        QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
-        nvme_aor_dec_active(ns);
-        nvme_clear_zone(ns, zone);
-    }
-    QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) {
-        QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-        nvme_aor_dec_open(ns);
-        nvme_aor_dec_active(ns);
-        nvme_clear_zone(ns, zone);
-    }
-    QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) {
-        QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
-        nvme_aor_dec_open(ns);
-        nvme_aor_dec_active(ns);
-        nvme_clear_zone(ns, zone);
-    }
-
-    assert(ns->nr_open_zones == 0);
+    /* Set states (exp/imp_open/closed/full) to empty */
 }
 
 static NvmeRuHandle *nvme_find_ruh_by_attr(NvmeEnduranceGroup *endgrp,
@@ -662,7 +592,6 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
 {
     if (blk_get_zone_model(ns->blkconf.blk)) {
         g_free(ns->id_ns_zoned);
-        g_free(ns->zone_array);
     }
 
     if (ns->endgrp && ns->endgrp->fdp.enabled) {
@@ -776,10 +705,6 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
     DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace,
                      params.cross_zone_read, false),
-    DEFINE_PROP_UINT32("zoned.max_active", NvmeNamespace,
-                       params.max_active_zones, 0),
-    DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace,
-                       params.max_open_zones, 0),
     DEFINE_PROP_UINT32("zoned.numzrwa", NvmeNamespace, params.numzrwa, 0),
     DEFINE_PROP_SIZE("zoned.zrwas", NvmeNamespace, params.zrwas, 0),
     DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 37007952fc..c2d1b07f88 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -150,6 +150,9 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys,
 #define NVME_NS(obj) \
     OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
 
+#define TO_DO_STATE 0
+#define TO_DO_ZA 0
+
 typedef struct NvmeZone {
     NvmeZoneDescr   d;
     uint64_t        w_ptr;
@@ -190,8 +193,6 @@ typedef struct NvmeNamespaceParams {
     uint8_t  msrc;
 
     bool     cross_zone_read;
-    uint32_t max_active_zones;
-    uint32_t max_open_zones;
 
     uint32_t numzrwa;
     uint64_t zrwas;
@@ -228,11 +229,10 @@ typedef struct NvmeNamespace {
     QTAILQ_ENTRY(NvmeNamespace) entry;
 
     NvmeIdNsZoned   *id_ns_zoned;
-    NvmeZone        *zone_array;
-    QTAILQ_HEAD(, NvmeZone) exp_open_zones;
-    QTAILQ_HEAD(, NvmeZone) imp_open_zones;
-    QTAILQ_HEAD(, NvmeZone) closed_zones;
-    QTAILQ_HEAD(, NvmeZone) full_zones;
+    uint32_t        *exp_open_zones;
+    uint32_t        *imp_open_zones;
+    uint32_t        *closed_zones;
+    uint32_t        *full_zones;
     uint32_t        num_zones;
     uint64_t        zone_size;
     uint64_t        zone_capacity;
@@ -265,6 +265,12 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
     return 0;
 }
 
+/* Bytes to LBAs */
+static inline uint64_t nvme_b2l(NvmeNamespace *ns, uint64_t bytes)
+{
+    return bytes >> ns->lbaf.ds;
+}
+
 static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
 {
     return lba << ns->lbaf.ds;
@@ -285,70 +291,9 @@ static inline bool nvme_ns_ext(NvmeNamespace *ns)
     return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas);
 }
 
-static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone)
+static inline NvmeZoneState nvme_get_zone_state(uint64_t wp)
 {
-    return zone->d.zs >> 4;
-}
-
-static inline void nvme_set_zone_state(NvmeZone *zone, NvmeZoneState state)
-{
-    zone->d.zs = state << 4;
-}
-
-static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone)
-{
-    return zone->d.zslba + ns->zone_size;
-}
-
-static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone)
-{
-    return zone->d.zslba + zone->d.zcap;
-}
-
-static inline bool nvme_wp_is_valid(NvmeZone *zone)
-{
-    uint8_t st = nvme_get_zone_state(zone);
-
-    return st != NVME_ZONE_STATE_FULL &&
-           st != NVME_ZONE_STATE_READ_ONLY &&
-           st != NVME_ZONE_STATE_OFFLINE;
-}
-
-static inline void nvme_aor_inc_open(NvmeNamespace *ns)
-{
-    assert(ns->nr_open_zones >= 0);
-    if (ns->params.max_open_zones) {
-        ns->nr_open_zones++;
-        assert(ns->nr_open_zones <= ns->params.max_open_zones);
-    }
-}
-
-static inline void nvme_aor_dec_open(NvmeNamespace *ns)
-{
-    if (ns->params.max_open_zones) {
-        assert(ns->nr_open_zones > 0);
-        ns->nr_open_zones--;
-    }
-    assert(ns->nr_open_zones >= 0);
-}
-
-static inline void nvme_aor_inc_active(NvmeNamespace *ns)
-{
-    assert(ns->nr_active_zones >= 0);
-    if (ns->params.max_active_zones) {
-        ns->nr_active_zones++;
-        assert(ns->nr_active_zones <= ns->params.max_active_zones);
-    }
-}
-
-static inline void nvme_aor_dec_active(NvmeNamespace *ns)
-{
-    if (ns->params.max_active_zones) {
-        assert(ns->nr_active_zones > 0);
-        ns->nr_active_zones--;
-        assert(ns->nr_active_zones >= ns->nr_open_zones);
-    }
-    assert(ns->nr_active_zones >= 0);
+    return wp >> 60;
 }
 
 static inline void nvme_fdp_stat_inc(uint64_t *a, uint64_t b)
diff --git a/include/block/block-common.h b/include/block/block-common.h
index d7599564db..ea213c3887 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -90,6 +90,7 @@ typedef enum BlockZoneOp {
     BLK_ZO_CLOSE,
     BLK_ZO_FINISH,
     BLK_ZO_RESET,
+    BLK_ZO_OFFLINE,
 } BlockZoneOp;
 
 typedef enum BlockZoneModel {
@@ -269,6 +270,13 @@ typedef enum {
  */
 #define BDRV_ZT_IS_CONV(wp)    (wp & (1ULL << 63))
 
+/*
+ * Accessors for the packed write pointer: GET_WP clears the zone
+ * state/attribute bits, GET_ZS extracts the zone state nibble and
+ * GET_ZA masks out the zone attribute bits (left in place).
+ */
+#define BDRV_ZP_GET_WP(wp)     (((wp) << 6) >> 6)
+#define BDRV_ZP_GET_ZS(wp)     ((wp) >> 60)
+#define BDRV_ZP_GET_ZA(wp)     ((wp) & (((1ULL << 8) - 1ULL) << 51))
+
 #define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                            INT_MAX >> BDRV_SECTOR_BITS)
 #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index c649f1ca75..ad983ad243 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -916,6 +916,8 @@ typedef struct BlockLimits {
 
     /* size of data that is associated with a zone in bytes */
     uint32_t zd_extension_size;
+
+    uint8_t zone_attribute;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC v2 6/7] hw/nvme: refactor zone append write using block layer APIs
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
                   ` (4 preceding siblings ...)
  2023-11-27  8:56 ` [RFC v2 5/7] hw/nvme: make the metadata of ZNS emulation persistent Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-27  8:56 ` [RFC v2 7/7] hw/nvme: make ZDED persistent Sam Li
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/qcow2.c        |   2 +-
 hw/nvme/ctrl.c       | 190 ++++++++++++++++++++++++++++++++-----------
 include/sysemu/dma.h |   3 +
 system/dma-helpers.c |  17 ++++
 4 files changed, 162 insertions(+), 50 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dfaf5566e2..74d2e2bf39 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2290,7 +2290,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_open_zones = s->zoned_header.max_open_zones;
     bs->bl.zone_size = s->zoned_header.zone_size;
     bs->bl.zone_capacity = s->zoned_header.zone_capacity;
-    bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+    bs->bl.write_granularity = BDRV_SECTOR_SIZE; /* physical block size */
     bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index b9ed3495e1..f65a87646e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1735,6 +1735,95 @@ static void nvme_misc_cb(void *opaque, int ret)
     nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+typedef struct NvmeZoneCmdAIOCB {
+    NvmeRequest *req;
+    NvmeCmd *cmd;
+    NvmeCtrl *n;
+
+    union {
+        struct {
+          uint32_t partial;
+          unsigned int nr_zones;
+          BlockZoneDescriptor *zones;
+        } zone_report_data;
+        struct {
+          int64_t offset;
+        } zone_append_data;
+    };
+} NvmeZoneCmdAIOCB;
+
+static void nvme_blk_zone_append_complete_cb(void *opaque, int ret)
+{
+    NvmeZoneCmdAIOCB *cb = opaque;
+    NvmeRequest *req = cb->req;
+    int64_t *offset = (int64_t *)&req->cqe;
+
+    if (ret) {
+        nvme_aio_err(req, ret);
+    }
+
+    *offset = nvme_b2l(req->ns, cb->zone_append_data.offset);
+    nvme_enqueue_req_completion(nvme_cq(req), req);
+    g_free(cb);
+}
+
+static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset,
+                                  uint32_t align,
+                                  BlockCompletionFunc *cb,
+                                  NvmeZoneCmdAIOCB *aiocb)
+{
+    NvmeRequest *req = aiocb->req;
+    assert(req->sg.flags & NVME_SG_ALLOC);
+
+    if (req->sg.flags & NVME_SG_DMA) {
+        req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset,
+                                         align, cb, aiocb);
+    } else {
+        req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0,
+                                         cb, aiocb);
+    }
+}
+
+static void nvme_zone_append_cb(void *opaque, int ret)
+{
+    NvmeZoneCmdAIOCB *aiocb = opaque;
+    NvmeRequest *req = aiocb->req;
+    NvmeNamespace *ns = req->ns;
+
+    BlockBackend *blk = ns->blkconf.blk;
+
+    trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
+
+    if (ret) {
+        goto out;
+    }
+
+    if (ns->lbaf.ms) {
+        NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+        uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+        int64_t offset = aiocb->zone_append_data.offset;
+
+        if (nvme_ns_ext(ns) || req->cmd.mptr) {
+            uint16_t status;
+
+            nvme_sg_unmap(&req->sg);
+            status = nvme_map_mdata(nvme_ctrl(req), nlb, req);
+            if (status) {
+                ret = -EFAULT;
+                goto out;
+            }
+
+            return nvme_blk_zone_append(blk, &offset, 1,
+                                        nvme_blk_zone_append_complete_cb,
+                                        aiocb);
+        }
+    }
+
+out:
+    nvme_blk_zone_append_complete_cb(aiocb, ret);
+}
+
+
 void nvme_rw_complete_cb(void *opaque, int ret)
 {
     NvmeRequest *req = opaque;
@@ -3061,6 +3150,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     uint64_t mapped_size = data_size;
     uint64_t data_offset;
     BlockBackend *blk = ns->blkconf.blk;
+    BlockZoneWps *wps = blk_get_zone_wps(blk);
+    uint32_t zone_size = blk_get_zone_size(blk);
+    uint32_t zone_idx;
     uint16_t status;
 
     if (nvme_ns_ext(ns)) {
@@ -3091,42 +3183,47 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     }
 
     if (blk_get_zone_model(blk)) {
-        uint32_t zone_size = blk_get_zone_size(blk);
-        uint32_t zone_idx = slba / zone_size;
-        int64_t zone_start = zone_idx * zone_size;
+        assert(wps);
+        if (zone_size) {
+            zone_idx = slba / zone_size;
+            int64_t zone_start = zone_idx * zone_size;
+
+            if (append) {
+                bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+
+                if (n->params.zasl &&
+                    data_size > (uint64_t)
+                    n->page_size << n->params.zasl) {
+                    trace_pci_nvme_err_zasl(data_size);
+                    return NVME_INVALID_FIELD | NVME_DNR;
+                }
 
-        if (append) {
-            bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+                rw->slba = cpu_to_le64(slba);
 
-            if (n->params.zasl &&
-                data_size > (uint64_t)n->page_size << n->params.zasl) {
-                trace_pci_nvme_err_zasl(data_size);
-                return NVME_INVALID_FIELD | NVME_DNR;
-            }
+                switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+                case NVME_ID_NS_DPS_TYPE_1:
+                    if (!piremap) {
+                        return NVME_INVALID_PROT_INFO | NVME_DNR;
+                    }
 
-            rw->slba = cpu_to_le64(slba);
-            switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
-            case NVME_ID_NS_DPS_TYPE_1:
-                if (!piremap) {
-                    return NVME_INVALID_PROT_INFO | NVME_DNR;
-                }
+                    /* fallthrough */
 
-                /* fallthrough */
+                case NVME_ID_NS_DPS_TYPE_2:
+                    if (piremap) {
+                        uint32_t reftag = le32_to_cpu(rw->reftag);
+                        rw->reftag =
+                            cpu_to_le32(reftag + (slba - zone_start));
+                    }
 
-            case NVME_ID_NS_DPS_TYPE_2:
-                if (piremap) {
-                    uint32_t reftag = le32_to_cpu(rw->reftag);
-                    rw->reftag = cpu_to_le32(reftag + (slba - zone_start));
-                }
+                    break;
 
-                break;
+                case NVME_ID_NS_DPS_TYPE_3:
+                    if (piremap) {
+                        return NVME_INVALID_PROT_INFO | NVME_DNR;
+                    }
 
-            case NVME_ID_NS_DPS_TYPE_3:
-                if (piremap) {
-                    return NVME_INVALID_PROT_INFO | NVME_DNR;
+                    break;
                 }
-
-                break;
             }
         }
 
@@ -3146,9 +3243,21 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
             goto invalid;
         }
 
-        block_acct_start(blk_get_stats(blk), &req->acct, data_size,
-                         BLOCK_ACCT_WRITE);
-        nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+        if (append) {
+            NvmeZoneCmdAIOCB *cb = g_malloc(sizeof(NvmeZoneCmdAIOCB));
+            cb->req = req;
+            cb->zone_append_data.offset = data_offset;
+
+            block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+                             BLOCK_ACCT_ZONE_APPEND);
+            nvme_blk_zone_append(blk, &cb->zone_append_data.offset,
+                                 blk_get_write_granularity(blk),
+                                 nvme_zone_append_cb, cb);
+        } else {
+            block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+                             BLOCK_ACCT_WRITE);
+            nvme_blk_write(blk, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+        }
     } else {
         req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
                                            BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
@@ -3172,24 +3281,7 @@ static inline uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
     return nvme_do_write(n, req, false, true);
 }
 
-typedef struct NvmeZoneCmdAIOCB {
-    NvmeRequest *req;
-    NvmeCmd *cmd;
-    NvmeCtrl *n;
-
-    union {
-        struct {
-          uint32_t partial;
-          unsigned int nr_zones;
-          BlockZoneDescriptor *zones;
-        } zone_report_data;
-        struct {
-          int64_t offset;
-        } zone_append_data;
-    };
-} NvmeZoneCmdAIOCB;
-
-static inline uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
 {
     return nvme_do_write(n, req, true, false);
 }
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index a1ac5bc1b5..680e0b5477 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -301,6 +301,9 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
                           QEMUSGList *sg, uint64_t offset, uint32_t align,
                           BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *dma_blk_zone_append(BlockBackend *blk,
+                          QEMUSGList *sg, int64_t offset, uint32_t align,
+                          void (*cb)(void *opaque, int ret), void *opaque);
 MemTxResult dma_buf_read(void *ptr, dma_addr_t len, dma_addr_t *residual,
                          QEMUSGList *sg, MemTxAttrs attrs);
 MemTxResult dma_buf_write(void *ptr, dma_addr_t len, dma_addr_t *residual,
diff --git a/system/dma-helpers.c b/system/dma-helpers.c
index 36211acc7e..98c97a165d 100644
--- a/system/dma-helpers.c
+++ b/system/dma-helpers.c
@@ -274,6 +274,23 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
                       DMA_DIRECTION_TO_DEVICE);
 }
 
+static
+BlockAIOCB *dma_blk_zone_append_io_func(int64_t offset, QEMUIOVector *iov,
+                                  BlockCompletionFunc *cb, void *cb_opaque,
+                                  void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_zone_append(blk, (int64_t *)offset, iov, 0, cb, cb_opaque);
+}
+
+BlockAIOCB *dma_blk_zone_append(BlockBackend *blk,
+                          QEMUSGList *sg, int64_t offset, uint32_t align,
+                          void (*cb)(void *opaque, int ret), void *opaque)
+{
+    return dma_blk_io(blk_get_aio_context(blk), sg, offset, align,
+                      dma_blk_zone_append_io_func, blk, cb, opaque,
+                      DMA_DIRECTION_TO_DEVICE);
+}
 
 static MemTxResult dma_buf_rw(void *buf, dma_addr_t len, dma_addr_t *residual,
                               QEMUSGList *sg, DMADirection dir,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC v2 7/7] hw/nvme: make ZDED persistent
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
                   ` (5 preceding siblings ...)
  2023-11-27  8:56 ` [RFC v2 6/7] hw/nvme: refactor zone append write using block layer APIs Sam Li
@ 2023-11-27  8:56 ` Sam Li
  2023-11-30 10:11 ` [RFC v2 0/7] Add persistence to NVMe ZNS emulation Markus Armbruster
  2024-01-10  6:52 ` Klaus Jensen
  8 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, Klaus Jensen, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal,
	Sam Li

Zone descriptor extension data (ZDED) is not persistent across QEMU
restarts. The zone descriptor extension valid bit (ZDEV) is part of the
zone attributes, and is set to one when ZDED is associated with the
zone.

With a qcow2 image as the backing file, the NVMe ZNS device stores the
zone attributes of each zone in the eight bits following the zone type
bit of the write pointer word. The ZDED itself is stored as part of the
zoned metadata, alongside the write pointers.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/qcow2.c                | 45 ++++++++++++++++++++++++++++++++++++
 hw/nvme/ctrl.c               |  1 +
 include/block/block-common.h |  1 +
 3 files changed, 47 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 74d2e2bf39..861a8f9f06 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "block/qdict.h"
+#include "block/nvme.h"
 #include "sysemu/block-backend.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -235,6 +236,17 @@ static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
     return BLK_ZS_NOT_WP;
 }
 
+static inline void qcow2_set_za(uint64_t *wp, uint8_t za)
+{
+    /*
+     * The zone attribute takes up one byte. Store it after the zoned
+     * bit.
+     */
+    uint64_t addr = *wp;
+    addr |= ((uint64_t)za << 51);
+    *wp = addr;
+}
+
 /*
  * Write the new wp value to the dedicated location of the image file.
  */
@@ -4990,6 +5002,36 @@ unlock:
     return ret;
 }
 
+static int qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+
+    qemu_co_mutex_lock(&bs->wps->colock);
+    uint64_t *wp = &bs->wps->wp[index];
+    BlockZoneState zs = qcow2_get_zone_state(bs, index);
+    if (zs == BLK_ZS_EMPTY) {
+        ret = qcow2_check_zone_resources(bs, zs);
+        if (ret < 0) {
+            goto unlock;
+        }
+
+        qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID);
+        ret = qcow2_write_wp_at(bs, wp, index);
+        if (ret < 0) {
+            error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp);
+            goto unlock;
+        }
+        s->nr_zones_closed++;
+        qemu_co_mutex_unlock(&bs->wps->colock);
+        return ret;
+    }
+
+unlock:
+    qemu_co_mutex_unlock(&bs->wps->colock);
+    return NVME_ZONE_INVAL_TRANSITION;
+}
+
 static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
                                            int64_t offset, int64_t len)
 {
@@ -5046,6 +5088,9 @@ static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
     case BLK_ZO_OFFLINE:
         /* There are no transitions from the offline state to any other state */
         break;
+    case BLK_ZO_SET_ZDED:
+        ret = qcow2_zns_set_zded(bs, index);
+        break;
     default:
         error_report("Unsupported zone op: 0x%x", op);
         ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f65a87646e..c33e24e303 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3474,6 +3474,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
         break;
 
     case NVME_ZONE_ACTION_SET_ZD_EXT:
+        op = BLK_ZO_SET_ZDED;
         int zd_ext_size = blk_get_zd_ext_size(blk);
         trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
         if (all || !zd_ext_size) {
diff --git a/include/block/block-common.h b/include/block/block-common.h
index ea213c3887..b61541599f 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -91,6 +91,7 @@ typedef enum BlockZoneOp {
     BLK_ZO_FINISH,
     BLK_ZO_RESET,
     BLK_ZO_OFFLINE,
+    BLK_ZO_SET_ZDED,
 } BlockZoneOp;
 
 typedef enum BlockZoneModel {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature
  2023-11-27  8:56 ` [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature Sam Li
@ 2023-11-30 10:05   ` Markus Armbruster
  0 siblings, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2023-11-30 10:05 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, stefanha, Klaus Jensen, qemu-block, hare,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Eric Blake, Peter Xu, Paolo Bonzini, dlemoal

Sam Li <faithilikerun@gmail.com> writes:

> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  docs/interop/qcow2.txt | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index 0f1938f056..458d05371a 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -428,6 +428,9 @@ The fields of the zoned extension are:
>                     The offset of zoned metadata structure in the contained
>                     image, in bytes.
>  
> +          44 - 51:  zd_extension_size
> +                    The size of zone descriptor extension data in bytes.
> +

Indentation is off.

>  == Full disk encryption header pointer ==
>  
>  The full disk encryption header must be present if, and only if, the



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC v2 0/7] Add persistence to NVMe ZNS emulation
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
                   ` (6 preceding siblings ...)
  2023-11-27  8:56 ` [RFC v2 7/7] hw/nvme: make ZDED persistent Sam Li
@ 2023-11-30 10:11 ` Markus Armbruster
  2023-11-30 10:20   ` Sam Li
  2024-01-10  6:52 ` Klaus Jensen
  8 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2023-11-30 10:11 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, stefanha, Klaus Jensen, qemu-block, hare,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Eric Blake, Peter Xu, Paolo Bonzini, dlemoal

Sam Li <faithilikerun@gmail.com> writes:

> ZNS emulation follows the NVMe ZNS spec, but the state of namespace
> zones does not persist across restarts of QEMU. This patch series
> makes the metadata of ZNS emulation persistent by using new block
> layer APIs and a qcow2 image as the backing file. It is the second
> part, following the series adding full zoned storage emulation to the
> qcow2 driver:
> https://patchwork.kernel.org/project/qemu-devel/cover/20231127043703.49489-1-faithilikerun@gmail.com/

In the future, please also add this information in a machine-readable
way, i.e. like

  Based-on: <20231127043703.49489-1-faithilikerun@gmail.com>

However, it doesn't apply on top of that series for me.  Got something I
could pull?

[...]



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata
  2023-11-27  8:56 ` [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata Sam Li
@ 2023-11-30 10:12   ` Markus Armbruster
  0 siblings, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2023-11-30 10:12 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, stefanha, Klaus Jensen, qemu-block, hare,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Eric Blake, Peter Xu, Paolo Bonzini, dlemoal

Sam Li <faithilikerun@gmail.com> writes:

> Zone descriptor data is host-defined data that is associated with
> each zone. Add zone descriptor extensions to the zonedmeta struct.
>
> Signed-off-by: Sam Li <faithilikerun@gmail.com>

[...]

>  struct BlockBackendRootState {
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ef98dc83a0..a7f238371c 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5074,12 +5074,16 @@
>  #     append request that can be issued to the device.  It must be
>  #     512-byte aligned
>  #
> +# @descriptor-extension-size: The size of zone descriptor extension
> +#     data. Must be a multiple of 64 bytes (since 8.2)

Two spaces between sentences for consistency, please.

What's the default?

> +#
>  # Since 8.2
>  ##
>  { 'struct': 'Qcow2ZoneHostManaged',
>    'data': { '*size':          'size',
>              '*capacity':      'size',
>              '*conventional-zones': 'uint32',
> +            '*descriptor-extension-size':  'size',
>              '*max-open-zones':     'uint32',
>              '*max-active-zones':   'uint32',
>              '*max-append-bytes':   'uint32' } }



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC v2 0/7] Add persistence to NVMe ZNS emulation
  2023-11-30 10:11 ` [RFC v2 0/7] Add persistence to NVMe ZNS emulation Markus Armbruster
@ 2023-11-30 10:20   ` Sam Li
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-30 10:20 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, stefanha, Klaus Jensen, qemu-block, hare,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Eric Blake, Peter Xu, Paolo Bonzini, dlemoal

Markus Armbruster <armbru@redhat.com> 于2023年11月30日周四 18:11写道:
>
> Sam Li <faithilikerun@gmail.com> writes:
>
> > ZNS emulation follows the NVMe ZNS spec, but the state of namespace
> > zones does not persist across restarts of QEMU. This patch series
> > makes the metadata of ZNS emulation persistent by using new block
> > layer APIs and a qcow2 image as the backing file. It is the second
> > part, following the series adding full zoned storage emulation to
> > the qcow2 driver:
> > https://patchwork.kernel.org/project/qemu-devel/cover/20231127043703.49489-1-faithilikerun@gmail.com/
>
> In the future, please also add this information in a machine-readable
> way, i.e. like
>
>   Based-on: <20231127043703.49489-1-faithilikerun@gmail.com>
>
> However, it doesn't apply on top of that series for me.  Got something I
> could pull?

Weird, I built this on top of the v6 qcow2 patches. I'll check that
after settling down; I am in the middle of moving to another city.

Thanks,
Sam


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC v2 0/7] Add persistence to NVMe ZNS emulation
  2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
                   ` (7 preceding siblings ...)
  2023-11-30 10:11 ` [RFC v2 0/7] Add persistence to NVMe ZNS emulation Markus Armbruster
@ 2024-01-10  6:52 ` Klaus Jensen
  2024-01-22 18:43   ` Sam Li
  8 siblings, 1 reply; 15+ messages in thread
From: Klaus Jensen @ 2024-01-10  6:52 UTC (permalink / raw)
  To: Sam Li
  Cc: qemu-devel, stefanha, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal


On Nov 27 16:56, Sam Li wrote:
> ZNS emulation follows the NVMe ZNS spec, but the state of namespace
> zones does not persist across restarts of QEMU. This patch series
> makes the metadata of ZNS emulation persistent by using new block
> layer APIs and a qcow2 image as the backing file. It is the second
> part, following the series adding full zoned storage emulation to the
> qcow2 driver:
> 
> The metadata of ZNS emulation is divided into two parts: zone metadata
> and zone descriptor extension data. The zone metadata is composed of
> the zone state, zone type, write pointer (wp) and zone attributes.
> This information can be packed into a single uint64_t wp word per zone
> to save space and allow easy access. The layout of the wp word for
> each zone is as follows:
> |0000 (4)| zone type (1)| zone attr (8)| wp (51)|
> 
> The zone descriptor extension data is relatively small compared to the
> overall size, so we store the ZDED of all zones in an array regardless
> of whether the valid bit is set.
> 
> Creating a ZNS-format qcow2 image file adds one more option,
> zd_extension_size, to the zoned device configuration.
> 
> To attach this file as an emulated ZNS drive on the QEMU command line, use:
>   -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
>   -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx \
> 
> Sorry, sending this one more time due to network problems.
> 
> v1->v2:
> - split [v1 2/5] patch to three (doc, config, block layer API)
> - adapt qcow2 v6
> 
> Sam Li (7):
>   docs/qcow2: add zd_extension_size option to the zoned format feature
>   qcow2: add zd_extension configurations to zoned metadata
>   hw/nvme: use blk_get_*() to access zone info in the block layer
>   hw/nvme: add blk_get_zone_extension to access zd_extensions
>   hw/nvme: make the metadata of ZNS emulation persistent
>   hw/nvme: refactor zone append write using block layer APIs
>   hw/nvme: make ZDED persistent
> 
>  block/block-backend.c             |   88 ++
>  block/qcow2.c                     |  119 ++-
>  block/qcow2.h                     |    2 +
>  docs/interop/qcow2.txt            |    3 +
>  hw/nvme/ctrl.c                    | 1247 ++++++++---------------------
>  hw/nvme/ns.c                      |  162 +---
>  hw/nvme/nvme.h                    |   95 +--
>  include/block/block-common.h      |    9 +
>  include/block/block_int-common.h  |    8 +
>  include/sysemu/block-backend-io.h |   11 +
>  include/sysemu/dma.h              |    3 +
>  qapi/block-core.json              |    4 +
>  system/dma-helpers.c              |   17 +
>  13 files changed, 647 insertions(+), 1121 deletions(-)
> 
> -- 
> 2.40.1
> 

Hi Sam,

This is awesome. For the hw/nvme parts,

Acked-by: Klaus Jensen <k.jensen@samsung.com>

I'll give it a proper R-b when you drop the RFC status.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC v2 0/7] Add persistence to NVMe ZNS emulation
  2024-01-10  6:52 ` Klaus Jensen
@ 2024-01-22 18:43   ` Sam Li
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2024-01-22 18:43 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: qemu-devel, stefanha, qemu-block, hare, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Keith Busch, Hanna Reitz, dmitry.fomichev, Kevin Wolf,
	Markus Armbruster, Eric Blake, Peter Xu, Paolo Bonzini, dlemoal

Klaus Jensen <its@irrelevant.dk> 于2024年1月10日周三 07:52写道:
>
> Hi Sam,
>
> This is awesome. For the hw/nvme parts,
>
> Acked-by: Klaus Jensen <k.jensen@samsung.com>
>
> I'll give it a proper R-b when you drop the RFC status.

Hi Klaus,

Sorry for the late response. I will submit a new RFC patch series very
soon.

Now the zone states should persist. The following is the result of
regression tests on zonefs. It's been a while since I worked on this
series, so please let me know if I made any mistakes.

Thanks,
Sam

[root@guest tests]# ./zonefs-tests.sh /dev/nvme0n1
Gathering information on /dev/nvme0n1...
zonefs-tests on /dev/nvme0n1:
  12 zones (0 conventional zones, 12 sequential zones)
  131072 512B sectors zone size (64 MiB)
  6 max open zones
  8 max active zones
Running tests
...
75 / 112 tests passed (37 skipped, 0 failures)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC v2 0/7] Add persistence to NVMe ZNS emulation
@ 2023-11-27  8:33 Sam Li
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Li @ 2023-11-27  8:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eric Blake, Klaus Jensen, Markus Armbruster, Peter Xu,
	qemu-block, dlemoal, Hanna Reitz, Keith Busch, stefanha, hare,
	Philippe Mathieu-Daudé,
	Kevin Wolf, dmitry.fomichev, David Hildenbrand, Paolo Bonzini,
	Sam Li

ZNS emulation follows the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This patch series makes
the metadata of ZNS emulation persistent by using new block layer APIs
and a qcow2 image as the backing file. It is the second part, following
the series adding full zoned storage emulation to the qcow2 driver:

The metadata of ZNS emulation is divided into two parts: zone metadata
and zone descriptor extension data. The zone metadata is composed of the
zone state, zone type, write pointer (wp) and zone attributes. This
information can be packed into a single uint64_t wp word per zone to
save space and allow easy access. The layout of the wp word for each
zone is as follows:
|0000 (4)| zone type (1)| zone attr (8)| wp (51)|

The zone descriptor extension data is relatively small compared to the
overall size, so we store the ZDED of all zones in an array regardless
of whether the valid bit is set.

Creating a ZNS-format qcow2 image file adds one more option,
zd_extension_size, to the zoned device configuration.
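
For illustration, image creation might then look like this (a hedged
sketch: aside from zd_extension_size, the zoned option names are
assumed from the related qcow2 zoned-format series and may differ in
the final version):

```shell
# Hypothetical example -- option names other than zd_extension_size
# are assumptions, not confirmed by this series.
qemu-img create -f qcow2 zns.qcow2 16G \
    -o zoned=host-managed,zone_size=64M,zone_capacity=64M \
    -o max_open_zones=6,max_active_zones=8,zd_extension_size=64
```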

To attach this file as an emulated ZNS drive on the QEMU command line, use:
  -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
  -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx \

v1->v2:
- split [v1 2/5] patch to three (doc, config, block layer API)
- adapt qcow2 v6

Sam Li (7):
  docs/qcow2: add zd_extension_size option to the zoned format feature
  qcow2: add zd_extension configurations to zoned metadata
  hw/nvme: use blk_get_*() to access zone info in the block layer
  hw/nvme: add blk_get_zone_extension to access zd_extensions
  hw/nvme: make the metadata of ZNS emulation persistent
  hw/nvme: refactor zone append write using block layer APIs
  hw/nvme: make ZDED persistent

 block/block-backend.c             |   88 ++
 block/qcow2.c                     |  119 ++-
 block/qcow2.h                     |    2 +
 docs/interop/qcow2.txt            |    3 +
 hw/nvme/ctrl.c                    | 1247 ++++++++---------------------
 hw/nvme/ns.c                      |  162 +---
 hw/nvme/nvme.h                    |   95 +--
 include/block/block-common.h      |    9 +
 include/block/block_int-common.h  |    8 +
 include/sysemu/block-backend-io.h |   11 +
 include/sysemu/dma.h              |    3 +
 qapi/block-core.json              |    4 +
 system/dma-helpers.c              |   17 +
 13 files changed, 647 insertions(+), 1121 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-01-22 18:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-27  8:56 [RFC v2 0/7] Add persistence to NVMe ZNS emulation Sam Li
2023-11-27  8:56 ` [RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature Sam Li
2023-11-30 10:05   ` Markus Armbruster
2023-11-27  8:56 ` [RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata Sam Li
2023-11-30 10:12   ` Markus Armbruster
2023-11-27  8:56 ` [RFC v2 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer Sam Li
2023-11-27  8:56 ` [RFC v2 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions Sam Li
2023-11-27  8:56 ` [RFC v2 5/7] hw/nvme: make the metadata of ZNS emulation persistent Sam Li
2023-11-27  8:56 ` [RFC v2 6/7] hw/nvme: refactor zone append write using block layer APIs Sam Li
2023-11-27  8:56 ` [RFC v2 7/7] hw/nvme: make ZDED persistent Sam Li
2023-11-30 10:11 ` [RFC v2 0/7] Add persistence to NVMe ZNS emulation Markus Armbruster
2023-11-30 10:20   ` Sam Li
2024-01-10  6:52 ` Klaus Jensen
2024-01-22 18:43   ` Sam Li
  -- strict thread matches above, loose matches on Subject: below --
2023-11-27  8:33 Sam Li
