* [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
@ 2020-06-30 10:01 Klaus Jensen
  2020-06-30 10:01 ` [PATCH 01/10] hw/block/nvme: support I/O Command Sets Klaus Jensen
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

From: Klaus Jensen <k.jensen@samsung.com>

Hi all,

This series adds support for TP 4056 ("Namespace Types") and TP 4053
("Zoned Namespaces") and is an alternative implementation to the one
submitted by Dmitry[1].

While I don't want this to end up as a discussion about the merits of
each version, I want to point out a couple of differences from Dmitry's
version. At a glance, my version

  * builds on my patch series that adds fairly complete NVMe v1.4
    mandatory support, as well as nice-to-have features such as SGLs,
    multiple namespaces and mostly just overall cleanup. This finally
    brings the nvme device into a fairly compliant state on which we can
    add new features. I've tried hard to get these compliance and
    clean-up patches merged for a long time (in parallel with developing
    the emulation of NST and ZNS) and I would be really sad to see them
    bypassed since they have already been through many iterations and
    already carry Acked- and Reviewed-by's for the bulk of the patches.
    I think the nvme device is already in a "frankenstate" wrt. the
    implemented nvme version and the features it currently supports, so
    I think this kind of cleanup is long overdue.

  * uses an attached blockdev and standard blk_aio for persistent zone
    info. This is the same method used in our patches for Write
    Uncorrectable and (separate and extended lba) metadata support, but
    I've left those optional features out for now to ease the review
    process.

  * relies on the universal dulbe support added in ("hw/block/nvme: add
    support for dulbe") and sparse images for handling reads in gaps
    (above the write pointer and below ZSZE); that is, the size of the
    underlying blockdev is in terms of ZSZE, not ZCAP (see the sizing
    sketch below this list)

  * the controller uses timers to autonomously finish zones (wrt. FRL)
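
As a sizing sketch of the ZSZE/ZCAP point above (example numbers only):
with zns.zcap=3000 the zone size is rounded up to the next power of
two, i.e. ZSZE=4096 LBAs (per the zone size computation in patch 3), so
a namespace blockdev holding 2097152 LBAs fits 512 zones and NCAP is
reported as 512 * 3000 = 1536000 LBAs; the 1096 LBAs per zone between
ZCAP and ZSZE are gaps backed by the sparse image.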

I've been on paternity leave for a month, so I haven't been around to
review Dmitry's patches, but I have started that process now. I would
also be happy to work with Dmitry & Friends on merging our versions to
get the best of both worlds if it makes sense.

This series and all preparatory patch sets (the ones I've been posting
yesterday and today) are available on my GitHub[2]. Unfortunately
Patchew got screwed up in the middle of me sending patches and it never
picked up v2 of "hw/block/nvme: support multiple namespaces" because it
was getting late and I made a mistake with the CC's. So my posted series
don't apply according to Patchew, but they actually do if you follow the
Based-on's (... or just grab [2]).


  [1]: Message-Id: <20200617213415.22417-1-dmitry.fomichev@wdc.com>
  [2]: https://github.com/birkelund/qemu/tree/for-master/nvme


Based-on: <20200630043122.1307043-1-its@irrelevant.dk>
("[PATCH 0/3] hw/block/nvme: bump to v1.4")

Klaus Jensen (10):
  hw/block/nvme: support I/O Command Sets
  hw/block/nvme: add zns specific fields and types
  hw/block/nvme: add basic read/write for zoned namespaces
  hw/block/nvme: add the zone management receive command
  hw/block/nvme: add the zone management send command
  hw/block/nvme: add the zone append command
  hw/block/nvme: track and enforce zone resources
  hw/block/nvme: allow open to close transitions by controller
  hw/block/nvme: allow zone excursions
  hw/block/nvme: support reset/finish recommended limits

 block/nvme.c          |    6 +-
 hw/block/nvme-ns.c    |  397 +++++++++-
 hw/block/nvme-ns.h    |  148 +++-
 hw/block/nvme.c       | 1676 +++++++++++++++++++++++++++++++++++++++--
 hw/block/nvme.h       |   76 +-
 hw/block/trace-events |   43 +-
 include/block/nvme.h  |  252 ++++++-
 7 files changed, 2469 insertions(+), 129 deletions(-)

-- 
2.27.0




* [PATCH 01/10] hw/block/nvme: support I/O Command Sets
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 02/10] hw/block/nvme: add zns specific fields and types Klaus Jensen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

From: Klaus Jensen <k.jensen@samsung.com>

Implement support for TP 4056 ("Namespace Types"). This adds the 'iocs'
(I/O Command Set) device parameter to the nvme-ns device.
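
A hypothetical configuration sketch (the 'iocs' property is introduced
by this patch and 0x0 selects the NVM command set; the namespace
'drive' property and controller wiring are assumed from the preceding
"support multiple namespaces" series):

    -drive id=nvm,file=nvm.img,if=none,format=raw
    -device nvme,serial=deadbeef
    -device nvme-ns,drive=nvm,iocs=0x0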

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 block/nvme.c          |   6 +-
 hw/block/nvme-ns.c    |  24 +++--
 hw/block/nvme-ns.h    |  11 +-
 hw/block/nvme.c       | 226 +++++++++++++++++++++++++++++++++---------
 hw/block/nvme.h       |  52 ++++++----
 hw/block/trace-events |   6 +-
 include/block/nvme.h  |  53 ++++++++--
 7 files changed, 285 insertions(+), 93 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 05485fdd1189..e7fe0c7accd1 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
 {
     uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
     if (status) {
-        trace_nvme_error(le32_to_cpu(c->result),
+        trace_nvme_error(le32_to_cpu(c->dw0),
                          le16_to_cpu(c->sq_head),
                          le16_to_cpu(c->sq_id),
                          le16_to_cpu(c->cid),
@@ -495,7 +495,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
 {
     BDRVNVMeState *s = bs->opaque;
     NvmeIdCtrl *idctrl;
-    NvmeIdNs *idns;
+    NvmeIdNsNvm *idns;
     NvmeLBAF *lbaf;
     uint8_t *resp;
     uint16_t oncs;
@@ -512,7 +512,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         goto out;
     }
     idctrl = (NvmeIdCtrl *)resp;
-    idns = (NvmeIdNs *)resp;
+    idns = (NvmeIdNsNvm *)resp;
     r = qemu_vfio_dma_map(s->vfio, resp, sizeof(NvmeIdCtrl), true, &iova);
     if (r) {
         error_setg(errp, "Cannot map buffer for DMA");
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7c825c38c69d..ae051784caaf 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -59,8 +59,16 @@ static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp)
 
 static void nvme_ns_init(NvmeNamespace *ns)
 {
-    NvmeIdNs *id_ns = &ns->id_ns;
+    NvmeIdNsNvm *id_ns;
 
+    int unmap = blk_get_flags(ns->blk) & BDRV_O_UNMAP;
+
+    ns->id_ns[NVME_IOCS_NVM] = g_new0(NvmeIdNsNvm, 1);
+    id_ns = nvme_ns_id_nvm(ns);
+
+    ns->iocs = ns->params.iocs;
+
+    id_ns->dlfeat = unmap ? 0x9 : 0x0;
     id_ns->lbaf[0].ds = ns->params.lbads;
 
     id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
@@ -130,8 +138,7 @@ static int nvme_ns_init_blk_state(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
-static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id,
-                            Error **errp)
+static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 {
     uint64_t perm, shared_perm;
 
@@ -174,7 +181,8 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id,
     return 0;
 }
 
-static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
+static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, Error
+                                     **errp)
 {
     if (!ns->blk) {
         error_setg(errp, "block backend not configured");
@@ -191,11 +199,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 {
-    if (nvme_ns_check_constraints(ns, errp)) {
+    if (nvme_ns_check_constraints(n, ns, errp)) {
         return -1;
     }
 
-    if (nvme_ns_init_blk(n, ns, &n->id_ctrl, errp)) {
+    if (nvme_ns_init_blk(n, ns, errp)) {
         return -1;
     }
 
@@ -210,7 +218,8 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
          * With a state file in place we can enable the Deallocated or
          * Unwritten Logical Block Error feature.
          */
-        ns->id_ns.nsfeat |= 0x4;
+        NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
+        id_ns->nsfeat |= 0x4;
     }
 
     if (nvme_register_namespace(n, ns, errp)) {
@@ -239,6 +248,7 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
     DEFINE_PROP_UINT8("lbads", NvmeNamespace, params.lbads, BDRV_SECTOR_BITS),
     DEFINE_PROP_DRIVE("state", NvmeNamespace, blk_state),
+    DEFINE_PROP_UINT8("iocs", NvmeNamespace, params.iocs, 0x0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index eb901acc912b..4124f20f1cef 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
     uint32_t nsid;
+    uint8_t  iocs;
     uint8_t  lbads;
 } NvmeNamespaceParams;
 
@@ -30,8 +31,9 @@ typedef struct NvmeNamespace {
     BlockBackend *blk_state;
     int32_t      bootindex;
     int64_t      size;
+    uint8_t      iocs;
 
-    NvmeIdNs            id_ns;
+    void         *id_ns[256];
     NvmeNamespaceParams params;
 
     unsigned long *utilization;
@@ -50,9 +52,14 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
     return -1;
 }
 
+static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns)
+{
+    return ns->id_ns[NVME_IOCS_NVM];
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
-    NvmeIdNs *id_ns = &ns->id_ns;
+    NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
     return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 25d79bcd0bc9..1662c11a4cf3 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -854,7 +854,7 @@ static void nvme_process_aers(void *opaque)
 
         req = n->aer_reqs[n->outstanding_aers];
 
-        result = (NvmeAerResult *) &req->cqe.result;
+        result = (NvmeAerResult *) &req->cqe.dw0;
         result->event_type = event->result.event_type;
         result->event_info = event->result.event_info;
         result->log_page = event->result.log_page;
@@ -916,7 +916,8 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len)
 static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
                                          uint64_t slba, uint32_t nlb)
 {
-    uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
+    NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
+    uint64_t nsze = le64_to_cpu(id_ns->nsze);
 
     if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
         return NVME_LBA_RANGE | NVME_DNR;
@@ -951,8 +952,9 @@ static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
 
     status = nvme_check_bounds(n, ns, req->slba, req->nlb);
     if (status) {
+        NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
         trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb,
-                                             ns->id_ns.nsze);
+                                             id_ns->nsze);
         return status;
     }
 
@@ -1154,8 +1156,9 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
 
     status = nvme_check_bounds(n, ns, req->slba, req->nlb);
     if (status) {
+        NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
         trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb,
-                                             ns->id_ns.nsze);
+                                             id_ns->nsze);
         return status;
     }
 
@@ -1481,14 +1484,19 @@ static uint16_t nvme_effects_log(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
     NvmeRequest *req)
 {
     uint32_t trans_len;
+    uint8_t csi = le32_to_cpu(req->cmd.cdw14) >> 24;
 
-    if (off > sizeof(nvme_effects)) {
+    if (!(n->iocscs[n->features.iocsci] & (1 << csi))) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    trans_len = MIN(sizeof(nvme_effects) - off, buf_len);
+    if (off > sizeof(NvmeEffectsLog)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
 
-    return nvme_dma(n, (uint8_t *)&nvme_effects + off, trans_len,
+    trans_len = MIN(sizeof(NvmeEffectsLog) - off, buf_len);
+
+    return nvme_dma(n, (uint8_t *)&nvme_effects[csi] + off, trans_len,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
@@ -1648,69 +1656,129 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ctrl(NvmeCtrl *n, uint8_t cns, uint8_t csi,
+                                   NvmeRequest *req)
 {
+    NvmeIdCtrl empty = { 0 };
+    NvmeIdCtrl *id_ctrl = &empty;
+
     trace_pci_nvme_identify_ctrl();
 
-    return nvme_dma(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
+    switch (cns) {
+    case NVME_ID_CNS_CTRL:
+        id_ctrl = &n->id_ctrl;
+
+        break;
+
+    case NVME_ID_CNS_CTRL_IOCS:
+        if (!(n->iocscs[n->features.iocsci] & (1 << csi))) {
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+
+        if (n->id_ctrl_iocss[csi]) {
+            id_ctrl = n->id_ctrl_iocss[csi];
+        }
+
+        break;
+
+    default:
+        assert(cns);
+    }
+
+    return nvme_dma(n, (uint8_t *)id_ctrl, sizeof(*id_ctrl),
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, uint8_t cns, uint8_t csi,
+                                 NvmeRequest *req)
 {
+    NvmeIdNsNvm empty = { 0 };
+    void *id_ns = &empty;
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
     NvmeNamespace *ns;
-    NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
-    NvmeIdNs *id_ns, inactive = { 0 };
-    uint32_t nsid = le32_to_cpu(c->nsid);
 
-    trace_pci_nvme_identify_ns(nsid);
+    trace_pci_nvme_identify_ns(nsid, csi);
 
     if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) {
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
     ns = nvme_ns(n, nsid);
-    if (unlikely(!ns)) {
-        id_ns = &inactive;
-    } else {
-        id_ns = &ns->id_ns;
+    if (ns) {
+        switch (cns) {
+        case NVME_ID_CNS_NS:
+            id_ns = ns->id_ns[NVME_IOCS_NVM];
+            if (!id_ns) {
+                return NVME_INVALID_IOCS | NVME_DNR;
+            }
+
+            break;
+
+        case NVME_ID_CNS_NS_IOCS:
+            if (csi == NVME_IOCS_NVM) {
+                break;
+            }
+
+            id_ns = ns->id_ns[csi];
+            if (!id_ns) {
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            break;
+
+        default:
+            assert(cns);
+        }
     }
 
-    return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs),
+    return nvme_dma(n, (uint8_t *)id_ns, NVME_IDENTIFY_DATA_SIZE,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, uint8_t cns, uint8_t csi,
+                                     NvmeRequest *req)
 {
-    NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
-    static const int data_len = NVME_IDENTIFY_DATA_SIZE;
-    uint32_t min_nsid = le32_to_cpu(c->nsid);
+    static const int len = NVME_IDENTIFY_DATA_SIZE;
+    uint32_t min_nsid = le32_to_cpu(req->cmd.nsid);
     uint32_t *list;
     uint16_t ret;
     int j = 0;
 
-    trace_pci_nvme_identify_nslist(min_nsid);
+    trace_pci_nvme_identify_nslist(min_nsid, csi);
 
-    list = g_malloc0(data_len);
+    if (min_nsid == 0xfffffffe || min_nsid == 0xffffffff) {
+        return NVME_INVALID_NSID | NVME_DNR;
+    }
+
+    if (cns == NVME_ID_CNS_NS_ACTIVE_LIST_IOCS && !csi) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    list = g_malloc0(len);
     for (int i = 1; i <= n->num_namespaces; i++) {
-        if (i <= min_nsid || !nvme_ns(n, i)) {
+        NvmeNamespace *ns = nvme_ns(n, i);
+        if (i <= min_nsid || !ns) {
             continue;
         }
+
+        if (cns == NVME_ID_CNS_NS_ACTIVE_LIST_IOCS && csi && csi != ns->iocs) {
+            continue;
+        }
+
         list[j++] = cpu_to_le32(i);
-        if (j == data_len / sizeof(uint32_t)) {
+        if (j == len / sizeof(uint32_t)) {
             break;
         }
     }
-    ret = nvme_dma(n, (uint8_t *)list, data_len, DMA_DIRECTION_FROM_DEVICE,
-                   req);
+    ret = nvme_dma(n, (uint8_t *)list, len, DMA_DIRECTION_FROM_DEVICE, req);
     g_free(list);
     return ret;
 }
 
 static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
-    uint32_t nsid = le32_to_cpu(c->nsid);
+    NvmeNamespace *ns;
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
     uint8_t list[NVME_IDENTIFY_DATA_SIZE];
 
     struct data {
@@ -1718,6 +1786,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
             NvmeIdNsDescr hdr;
             uint8_t v[16];
         } uuid;
+
+        struct {
+            NvmeIdNsDescr hdr;
+            uint8_t v;
+        } iocs;
     };
 
     struct data *ns_descrs = (struct data *)list;
@@ -1728,7 +1801,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
-    if (unlikely(!nvme_ns(n, nsid))) {
+    ns = nvme_ns(n, nsid);
+    if (unlikely(!ns)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
@@ -1744,25 +1818,45 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
     ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN;
     stl_be_p(&ns_descrs->uuid.v, nsid);
 
+    ns_descrs->iocs.hdr.nidt = NVME_NIDT_CSI;
+    ns_descrs->iocs.hdr.nidl = NVME_NIDT_CSI_LEN;
+    stb_p(&ns_descrs->iocs.v, ns->iocs);
+
     return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_identify_iocs(NvmeCtrl *n, uint16_t cntid,
+                                   NvmeRequest *req)
+{
+    return nvme_dma(n, (uint8_t *) n->iocscs, sizeof(n->iocscs),
+                    DMA_DIRECTION_FROM_DEVICE, req);
+}
+
 static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+    NvmeIdentify *id = (NvmeIdentify *) &req->cmd;
 
-    switch (le32_to_cpu(c->cns)) {
+    trace_pci_nvme_identify(nvme_cid(req), le32_to_cpu(req->cmd.nsid),
+                            le16_to_cpu(id->cntid), id->cns, id->csi,
+                            le16_to_cpu(id->nvmsetid));
+
+    switch (le32_to_cpu(id->cns)) {
     case NVME_ID_CNS_NS:
-        return nvme_identify_ns(n, req);
+    case NVME_ID_CNS_NS_IOCS:
+        return nvme_identify_ns(n, id->cns, id->csi, req);
     case NVME_ID_CNS_CTRL:
-        return nvme_identify_ctrl(n, req);
+    case NVME_ID_CNS_CTRL_IOCS:
+        return nvme_identify_ctrl(n, id->cns, id->csi, req);
     case NVME_ID_CNS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist(n, req);
+    case NVME_ID_CNS_NS_ACTIVE_LIST_IOCS:
+        return nvme_identify_nslist(n, id->cns, id->csi, req);
     case NVME_ID_CNS_NS_DESCR_LIST:
         return nvme_identify_ns_descr_list(n, req);
+    case NVME_ID_CNS_IOCS:
+        return nvme_identify_iocs(n, id->cntid, req);
     default:
-        trace_pci_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns));
+        trace_pci_nvme_err_invalid_identify_cns(id->cns);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 }
@@ -1771,7 +1865,7 @@ static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req)
 {
     uint16_t sqid = le32_to_cpu(req->cmd.cdw10) & 0xffff;
 
-    req->cqe.result = 1;
+    req->cqe.dw0 = 1;
     if (nvme_check_sqid(n, sqid)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
@@ -1954,13 +2048,17 @@ defaults:
 
         result = cpu_to_le32(result);
         break;
+    case NVME_COMMAND_SET_PROFILE:
+        result = cpu_to_le32(n->features.iocsci & 0x1ff);
+        break;
     default:
         result = cpu_to_le32(nvme_feature_default[fid]);
         break;
     }
 
 out:
-    req->cqe.result = result;
+    req->cqe.dw0 = result;
+
     return NVME_SUCCESS;
 }
 
@@ -1983,6 +2081,7 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeRequest *req)
 static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeNamespace *ns = NULL;
+    NvmeIdNsNvm *id_ns;
 
     NvmeCmd *cmd = &req->cmd;
     uint32_t dw10 = le32_to_cpu(cmd->cdw10);
@@ -2059,7 +2158,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
                     continue;
                 }
 
-                if (NVME_ID_NS_NSFEAT_DULBE(ns->id_ns.nsfeat)) {
+                id_ns = nvme_ns_id_nvm(ns);
+                if (NVME_ID_NS_NSFEAT_DULBE(id_ns->nsfeat)) {
                     ns->features.err_rec = dw11;
                 }
             }
@@ -2075,6 +2175,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
 
         for (int i = 1; i <= n->num_namespaces; i++) {
             ns = nvme_ns(n, i);
+
             if (!ns) {
                 continue;
             }
@@ -2105,14 +2206,34 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
                                     ((dw11 >> 16) & 0xFFFF) + 1,
                                     n->params.max_ioqpairs,
                                     n->params.max_ioqpairs);
-        req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
-                                      ((n->params.max_ioqpairs - 1) << 16));
+        req->cqe.dw0 = cpu_to_le32((n->params.max_ioqpairs - 1) |
+                                   ((n->params.max_ioqpairs - 1) << 16));
         break;
     case NVME_ASYNCHRONOUS_EVENT_CONF:
         n->features.async_config = dw11;
         break;
     case NVME_TIMESTAMP:
         return nvme_set_feature_timestamp(n, req);
+    case NVME_COMMAND_SET_PROFILE:
+        if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ALL) {
+            uint16_t iocsci = dw11 & 0x1ff;
+            uint64_t iocsc = n->iocscs[iocsci];
+
+            for (int i = 1; i <= n->num_namespaces; i++) {
+                ns = nvme_ns(n, i);
+                if (!ns) {
+                    continue;
+                }
+
+                if (!(iocsc & (1 << ns->iocs))) {
+                    return NVME_IOCS_COMB_REJECTED | NVME_DNR;
+                }
+            }
+
+            n->features.iocsci = iocsci;
+        }
+
+        break;
     default:
         return NVME_FEAT_NOT_CHANGABLE | NVME_DNR;
     }
@@ -2265,6 +2386,8 @@ static int nvme_start_ctrl(NvmeCtrl *n)
     uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12;
     uint32_t page_size = 1 << page_bits;
 
+    NvmeIdCtrl *id_ctrl = &n->id_ctrl;
+
     if (unlikely(n->cq[0])) {
         trace_pci_nvme_err_startfail_cq();
         return -1;
@@ -2304,28 +2427,28 @@ static int nvme_start_ctrl(NvmeCtrl *n)
         return -1;
     }
     if (unlikely(NVME_CC_IOCQES(n->bar.cc) <
-                 NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
+                 NVME_CTRL_CQES_MIN(id_ctrl->cqes))) {
         trace_pci_nvme_err_startfail_cqent_too_small(
                     NVME_CC_IOCQES(n->bar.cc),
                     NVME_CTRL_CQES_MIN(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOCQES(n->bar.cc) >
-                 NVME_CTRL_CQES_MAX(n->id_ctrl.cqes))) {
+                 NVME_CTRL_CQES_MAX(id_ctrl->cqes))) {
         trace_pci_nvme_err_startfail_cqent_too_large(
                     NVME_CC_IOCQES(n->bar.cc),
                     NVME_CTRL_CQES_MAX(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOSQES(n->bar.cc) <
-                 NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
+                 NVME_CTRL_SQES_MIN(id_ctrl->sqes))) {
         trace_pci_nvme_err_startfail_sqent_too_small(
                     NVME_CC_IOSQES(n->bar.cc),
                     NVME_CTRL_SQES_MIN(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOSQES(n->bar.cc) >
-                 NVME_CTRL_SQES_MAX(n->id_ctrl.sqes))) {
+                 NVME_CTRL_SQES_MAX(id_ctrl->sqes))) {
         trace_pci_nvme_err_startfail_sqent_too_large(
                     NVME_CC_IOSQES(n->bar.cc),
                     NVME_CTRL_SQES_MAX(n->bar.cap));
@@ -2774,6 +2897,8 @@ static void nvme_init_state(NvmeCtrl *n)
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
     n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+    n->iocscs[0] = 1 << NVME_IOCS_NVM;
+    n->features.iocsci = 0;
 }
 
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
@@ -2977,7 +3102,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
     NVME_CAP_SET_CQR(n->bar.cap, 1);
     NVME_CAP_SET_TO(n->bar.cap, 0xf);
-    NVME_CAP_SET_CSS(n->bar.cap, 1);
+    NVME_CAP_SET_CSS(n->bar.cap, (NVME_CAP_CSS_NVM | NVME_CAP_CSS_CSI));
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
     n->bar.vs = NVME_SPEC_VER;
@@ -3037,6 +3162,11 @@ static void nvme_exit(PCIDevice *pci_dev)
     if (n->pmrdev) {
         host_memory_backend_set_mapped(n->pmrdev, false);
     }
+
+    for (int i = 0; i < 256; i++) {
+        g_free(n->id_ctrl_iocss[i]);
+    }
+
     msix_uninit_exclusive_bar(pci_dev);
 }
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e62bcd12a7a8..69be47963f5d 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -18,28 +18,33 @@ typedef struct NvmeParams {
     bool     use_intel_id;
 } NvmeParams;
 
-static const NvmeEffectsLog nvme_effects = {
-    .acs = {
-        [NVME_ADM_CMD_DELETE_SQ]    = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_CREATE_SQ]    = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_DELETE_CQ]    = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_CREATE_CQ]    = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_IDENTIFY]     = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_ABORT]        = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_SET_FEATURES] = NVME_EFFECTS_CSUPP | NVME_EFFECTS_CCC |
-            NVME_EFFECTS_NIC | NVME_EFFECTS_NCC,
-        [NVME_ADM_CMD_GET_FEATURES] = NVME_EFFECTS_CSUPP,
-        [NVME_ADM_CMD_FORMAT_NVM]   = NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC |
-            NVME_EFFECTS_NCC | NVME_EFFECTS_NIC | NVME_EFFECTS_CSE_MULTI,
-        [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_EFFECTS_CSUPP,
-    },
+static const NvmeEffectsLog nvme_effects[] = {
+    [NVME_IOCS_NVM] = {
+        .acs = {
+            [NVME_ADM_CMD_DELETE_SQ]    = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_CREATE_SQ]    = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_DELETE_CQ]    = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_CREATE_CQ]    = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_IDENTIFY]     = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_ABORT]        = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_SET_FEATURES] = NVME_EFFECTS_CSUPP |
+                NVME_EFFECTS_CCC | NVME_EFFECTS_NIC | NVME_EFFECTS_NCC,
+            [NVME_ADM_CMD_GET_FEATURES] = NVME_EFFECTS_CSUPP,
+            [NVME_ADM_CMD_FORMAT_NVM]   = NVME_EFFECTS_CSUPP |
+                NVME_EFFECTS_LBCC | NVME_EFFECTS_NCC | NVME_EFFECTS_NIC |
+                NVME_EFFECTS_CSE_MULTI,
+            [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_EFFECTS_CSUPP,
+        },
 
-    .iocs = {
-        [NVME_CMD_FLUSH]            = NVME_EFFECTS_CSUPP,
-        [NVME_CMD_WRITE]            = NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC,
-        [NVME_CMD_READ]             = NVME_EFFECTS_CSUPP,
-        [NVME_CMD_WRITE_ZEROES]     = NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC,
+        .iocs = {
+            [NVME_CMD_FLUSH]            = NVME_EFFECTS_CSUPP,
+            [NVME_CMD_WRITE]            = NVME_EFFECTS_CSUPP |
+                NVME_EFFECTS_LBCC,
+            [NVME_CMD_READ]             = NVME_EFFECTS_CSUPP,
+            [NVME_CMD_WRITE_ZEROES]     = NVME_EFFECTS_CSUPP |
+                NVME_EFFECTS_LBCC,
+        },
     },
 };
 
@@ -193,6 +198,7 @@ typedef struct NvmeFeatureVal {
     };
     uint32_t    async_config;
     uint32_t    vwc;
+    uint32_t    iocsci;
 } NvmeFeatureVal;
 
 static const uint32_t nvme_feature_cap[0x100] = {
@@ -202,6 +208,7 @@ static const uint32_t nvme_feature_cap[0x100] = {
     [NVME_NUMBER_OF_QUEUES]         = NVME_FEAT_CAP_CHANGE,
     [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
     [NVME_TIMESTAMP]                = NVME_FEAT_CAP_CHANGE,
+    [NVME_COMMAND_SET_PROFILE]      = NVME_FEAT_CAP_CHANGE,
 };
 
 static const uint32_t nvme_feature_default[0x100] = {
@@ -220,6 +227,7 @@ static const bool nvme_feature_support[0x100] = {
     [NVME_WRITE_ATOMICITY]          = true,
     [NVME_ASYNCHRONOUS_EVENT_CONF]  = true,
     [NVME_TIMESTAMP]                = true,
+    [NVME_COMMAND_SET_PROFILE]      = true,
 };
 
 typedef struct NvmeCtrl {
@@ -247,6 +255,7 @@ typedef struct NvmeCtrl {
     uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
     uint64_t    starttime_ms;
     uint16_t    temperature;
+    uint64_t    iocscs[512];
 
     HostMemoryBackend *pmrdev;
 
@@ -262,6 +271,7 @@ typedef struct NvmeCtrl {
     NvmeSQueue      admin_sq;
     NvmeCQueue      admin_cq;
     NvmeIdCtrl      id_ctrl;
+    void            *id_ctrl_iocss[256];
     NvmeFeatureVal  features;
 } NvmeCtrl;
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index ed21609f1a4f..4cf0236631d2 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -51,10 +51,12 @@ pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize,
 pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
 pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
 pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16""
+pci_nvme_identify(uint16_t cid, uint32_t nsid, uint16_t cntid, uint8_t cns, uint8_t csi, uint16_t nvmsetid) "cid %"PRIu16" nsid %"PRIu32" cntid 0x%"PRIx16" cns 0x%"PRIx8" csi 0x%"PRIx8" nvmsetid %"PRIu16""
 pci_nvme_identify_ctrl(void) "identify controller"
-pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
-pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
+pci_nvme_identify_ns(uint32_t ns, uint8_t csi) "nsid %"PRIu32" csi 0x%"PRIx8""
+pci_nvme_identify_nslist(uint32_t ns, uint8_t csi) "nsid %"PRIu32" csi 0x%"PRIx8""
 pci_nvme_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32""
+pci_nvme_identify_io_cmd_set(uint16_t cid) "cid %"PRIu16""
 pci_nvme_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
 pci_nvme_getfeat(uint16_t cid, uint8_t fid, uint8_t sel, uint32_t cdw11) "cid %"PRIu16" fid 0x%"PRIx8" sel 0x%"PRIx8" cdw11 0x%"PRIx32""
 pci_nvme_setfeat(uint16_t cid, uint8_t fid, uint8_t save, uint32_t cdw11) "cid %"PRIu16" fid 0x%"PRIx8" save 0x%"PRIx8" cdw11 0x%"PRIx32""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 040e4ef36ddc..637be0ddd2fc 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -93,6 +93,11 @@ enum NvmeCapMask {
 #define NVME_CAP_SET_CMBS(cap, val)   (cap |= (uint64_t)(val & CAP_CMBS_MASK)\
                                                             << CAP_CMBS_SHIFT)
 
+enum NvmeCapCss {
+    NVME_CAP_CSS_NVM = 1 << 0,
+    NVME_CAP_CSS_CSI = 1 << 6,
+};
+
 enum NvmeCcShift {
     CC_EN_SHIFT     = 0,
     CC_CSS_SHIFT    = 4,
@@ -121,6 +126,11 @@ enum NvmeCcMask {
 #define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK)
 #define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK)
 
+enum NvmeCcCss {
+    NVME_CC_CSS_NVM = 0x0,
+    NVME_CC_CSS_ALL = 0x6,
+};
+
 enum NvmeCstsShift {
     CSTS_RDY_SHIFT      = 0,
     CSTS_CFS_SHIFT      = 1,
@@ -454,6 +464,10 @@ enum NvmeCmbmscMask {
 
 #define NVME_CMBSTS_CBAI(cmbsts) (cmsts & 0x1)
 
+enum NvmeCommandSet {
+    NVME_IOCS_NVM = 0x0,
+};
+
 enum NvmeSglDescriptorType {
     NVME_SGL_DESCR_TYPE_DATA_BLOCK          = 0x0,
     NVME_SGL_DESCR_TYPE_BIT_BUCKET          = 0x1,
@@ -604,7 +618,8 @@ typedef struct NvmeIdentify {
     uint8_t     rsvd3;
     uint16_t    cntid;
     uint16_t    nvmsetid;
-    uint16_t    rsvd4;
+    uint8_t     rsvd4;
+    uint8_t     csi;
     uint32_t    rsvd11[4];
 } NvmeIdentify;
 
@@ -697,8 +712,15 @@ typedef struct NvmeAerResult {
 } NvmeAerResult;
 
 typedef struct NvmeCqe {
-    uint32_t    result;
-    uint32_t    rsvd;
+    union {
+        struct {
+            uint32_t    dw0;
+            uint32_t    dw1;
+        };
+
+        uint64_t qw0;
+    };
+
     uint16_t    sq_head;
     uint16_t    sq_id;
     uint16_t    cid;
@@ -746,6 +768,10 @@ enum NvmeStatusCodes {
     NVME_FEAT_NOT_CHANGABLE     = 0x010e,
     NVME_FEAT_NOT_NS_SPEC       = 0x010f,
     NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
+    NVME_IOCS_NOT_SUPPORTED     = 0x0127,
+    NVME_IOCS_NOT_ENABLED       = 0x0128,
+    NVME_IOCS_COMB_REJECTED     = 0x0129,
+    NVME_INVALID_IOCS           = 0x0126,
     NVME_CONFLICTING_ATTRS      = 0x0180,
     NVME_INVALID_PROT_INFO      = 0x0181,
     NVME_WRITE_TO_RO            = 0x0182,
@@ -890,10 +916,14 @@ typedef struct NvmePSD {
 #define NVME_IDENTIFY_DATA_SIZE 4096
 
 enum {
-    NVME_ID_CNS_NS             = 0x0,
-    NVME_ID_CNS_CTRL           = 0x1,
-    NVME_ID_CNS_NS_ACTIVE_LIST = 0x2,
-    NVME_ID_CNS_NS_DESCR_LIST  = 0x3,
+    NVME_ID_CNS_NS                     = 0x00,
+    NVME_ID_CNS_CTRL                   = 0x01,
+    NVME_ID_CNS_NS_ACTIVE_LIST         = 0x02,
+    NVME_ID_CNS_NS_DESCR_LIST          = 0x03,
+    NVME_ID_CNS_NS_IOCS                = 0x05,
+    NVME_ID_CNS_CTRL_IOCS              = 0x06,
+    NVME_ID_CNS_NS_ACTIVE_LIST_IOCS    = 0x07,
+    NVME_ID_CNS_IOCS                   = 0x1c,
 };
 
 typedef struct NvmeIdCtrl {
@@ -1058,6 +1088,7 @@ enum NvmeFeatureIds {
     NVME_WRITE_ATOMICITY            = 0xa,
     NVME_ASYNCHRONOUS_EVENT_CONF    = 0xb,
     NVME_TIMESTAMP                  = 0xe,
+    NVME_COMMAND_SET_PROFILE        = 0x19,
     NVME_SOFTWARE_PROGRESS_MARKER   = 0x80
 };
 
@@ -1105,7 +1136,7 @@ typedef struct NvmeLBAF {
 
 #define NVME_NSID_BROADCAST 0xffffffff
 
-typedef struct NvmeIdNs {
+typedef struct NvmeIdNsNvm {
     uint64_t    nsze;
     uint64_t    ncap;
     uint64_t    nuse;
@@ -1143,7 +1174,7 @@ typedef struct NvmeIdNs {
     NvmeLBAF    lbaf[16];
     uint8_t     rsvd192[192];
     uint8_t     vs[3712];
-} NvmeIdNs;
+} NvmeIdNsNvm;
 
 typedef struct NvmeIdNsDescr {
     uint8_t nidt;
@@ -1154,11 +1185,13 @@ typedef struct NvmeIdNsDescr {
 #define NVME_NIDT_EUI64_LEN 8
 #define NVME_NIDT_NGUID_LEN 16
 #define NVME_NIDT_UUID_LEN  16
+#define NVME_NIDT_CSI_LEN   1
 
 enum {
     NVME_NIDT_EUI64 = 0x1,
     NVME_NIDT_NGUID = 0x2,
     NVME_NIDT_UUID  = 0x3,
+    NVME_NIDT_CSI   = 0x4,
 };
 
 /*Deallocate Logical Block Features*/
@@ -1211,7 +1244,7 @@ static inline void _nvme_check_size(void)
     QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512);
     QEMU_BUILD_BUG_ON(sizeof(NvmeEnduranceGroupLog) != 512);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096);
-    QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeNvmSetAttributes) != 128);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNvmSetList) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096);
-- 
2.27.0




* [PATCH 02/10] hw/block/nvme: add zns specific fields and types
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
  2020-06-30 10:01 ` [PATCH 01/10] hw/block/nvme: support I/O Command Sets Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 03/10] hw/block/nvme: add basic read/write for zoned namespaces Klaus Jensen
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Add new fields, types and data structures for TP 4053 ("Zoned Namespaces").

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 include/block/nvme.h | 186 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 180 insertions(+), 6 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 637be0ddd2fc..ddf948132272 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -465,7 +465,8 @@ enum NvmeCmbmscMask {
 #define NVME_CMBSTS_CBAI(cmbsts) (cmsts & 0x1)
 
 enum NvmeCommandSet {
-    NVME_IOCS_NVM = 0x0,
+    NVME_IOCS_NVM   = 0x0,
+    NVME_IOCS_ZONED = 0x2,
 };
 
 enum NvmeSglDescriptorType {
@@ -552,6 +553,11 @@ enum NvmeIoCommands {
     NVME_CMD_COMPARE            = 0x05,
     NVME_CMD_WRITE_ZEROES       = 0x08,
     NVME_CMD_DSM                = 0x09,
+
+    /* Zoned Command Set */
+    NVME_CMD_ZONE_MGMT_SEND     = 0x79,
+    NVME_CMD_ZONE_MGMT_RECV     = 0x7a,
+    NVME_CMD_ZONE_APPEND        = 0x7d,
 };
 
 typedef struct NvmeDeleteQ {
@@ -664,6 +670,82 @@ enum {
     NVME_RW_PRINFO_PRCHK_REF    = 1 << 10,
 };
 
+typedef struct NvmeZoneAppendCmd {
+    uint8_t     opcode;
+    uint8_t     flags;
+    uint16_t    cid;
+    uint32_t    nsid;
+    uint32_t    rsvd8[2];
+    uint64_t    mptr;
+    NvmeCmdDptr dptr;
+    uint64_t    zslba;
+    uint16_t    nlb;
+    uint8_t     rsvd50;
+    uint8_t     control;
+    uint32_t    ilbrt;
+    uint16_t    lbat;
+    uint16_t    lbatm;
+} NvmeZoneAppendCmd;
+
+typedef struct NvmeZoneManagementSendCmd {
+    uint8_t     opcode;
+    uint8_t     flags;
+    uint16_t    cid;
+    uint32_t    nsid;
+    uint32_t    rsvd8[4];
+    NvmeCmdDptr dptr;
+    uint64_t    slba;
+    uint32_t    rsvd48;
+    uint8_t     zsa;
+    uint8_t     zsflags;
+    uint16_t    rsvd54;
+    uint32_t    rsvd56[2];
+} NvmeZoneManagementSendCmd;
+
+#define NVME_CMD_ZONE_MGMT_SEND_SELECT_ALL(zsflags) ((zsflags) & 0x1)
+
+typedef enum NvmeZoneManagementSendAction {
+    NVME_CMD_ZONE_MGMT_SEND_CLOSE   = 0x1,
+    NVME_CMD_ZONE_MGMT_SEND_FINISH  = 0x2,
+    NVME_CMD_ZONE_MGMT_SEND_OPEN    = 0x3,
+    NVME_CMD_ZONE_MGMT_SEND_RESET   = 0x4,
+    NVME_CMD_ZONE_MGMT_SEND_OFFLINE = 0x5,
+    NVME_CMD_ZONE_MGMT_SEND_SET_ZDE = 0x10,
+} NvmeZoneManagementSendAction;
+
+typedef struct NvmeZoneManagementRecvCmd {
+    uint8_t     opcode;
+    uint8_t     flags;
+    uint16_t    cid;
+    uint32_t    nsid;
+    uint8_t     rsvd8[16];
+    NvmeCmdDptr dptr;
+    uint64_t    slba;
+    uint32_t    numdw;
+    uint8_t     zra;
+    uint8_t     zrasp;
+    uint8_t     zrasf;
+    uint8_t     rsvd55[9];
+} NvmeZoneManagementRecvCmd;
+
+typedef enum NvmeZoneManagementRecvAction {
+    NVME_CMD_ZONE_MGMT_RECV_REPORT_ZONES          = 0x0,
+    NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES = 0x1,
+} NvmeZoneManagementRecvAction;
+
+typedef enum NvmeZoneManagementRecvActionSpecificField {
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ALL  = 0x0,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSE  = 0x1,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSIO = 0x2,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSEO = 0x3,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSC  = 0x4,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSF  = 0x5,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSRO = 0x6,
+    NVME_CMD_ZONE_MGMT_RECV_LIST_ZSO  = 0x7,
+} NvmeZoneManagementRecvActionSpecificField;
+
+#define NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL 0x1
+
 typedef struct NvmeDsmCmd {
     uint8_t     opcode;
     uint8_t     flags;
@@ -702,13 +784,15 @@ enum NvmeAsyncEventRequest {
     NVME_AER_INFO_SMART_RELIABILITY         = 0,
     NVME_AER_INFO_SMART_TEMP_THRESH         = 1,
     NVME_AER_INFO_SMART_SPARE_THRESH        = 2,
+    NVME_AER_INFO_NOTICE_ZONE_DESCR_CHANGED = 0xef,
 };
 
 typedef struct NvmeAerResult {
-    uint8_t event_type;
-    uint8_t event_info;
-    uint8_t log_page;
-    uint8_t resv;
+    uint8_t  event_type;
+    uint8_t  event_info;
+    uint8_t  log_page;
+    uint8_t  resv;
+    uint32_t nsid;
 } NvmeAerResult;
 
 typedef struct NvmeCqe {
@@ -775,6 +859,14 @@ enum NvmeStatusCodes {
     NVME_CONFLICTING_ATTRS      = 0x0180,
     NVME_INVALID_PROT_INFO      = 0x0181,
     NVME_WRITE_TO_RO            = 0x0182,
+    NVME_ZONE_BOUNDARY_ERROR    = 0x01b8,
+    NVME_ZONE_IS_FULL           = 0x01b9,
+    NVME_ZONE_IS_READ_ONLY      = 0x01ba,
+    NVME_ZONE_IS_OFFLINE        = 0x01bb,
+    NVME_ZONE_INVALID_WRITE     = 0x01bc,
+    NVME_TOO_MANY_ACTIVE_ZONES  = 0x01bd,
+    NVME_TOO_MANY_OPEN_ZONES    = 0x01be,
+    NVME_INVALID_ZONE_STATE_TRANSITION = 0x01bf,
     NVME_WRITE_FAULT            = 0x0280,
     NVME_UNRECOVERED_READ       = 0x0281,
     NVME_E2E_GUARD_ERROR        = 0x0282,
@@ -868,6 +960,46 @@ enum {
     NVME_EFFECTS_UUID_SEL   = 1 << 19,
 };
 
+typedef enum NvmeZoneType {
+    NVME_ZT_SEQ = 0x2,
+} NvmeZoneType;
+
+typedef enum NvmeZoneState {
+    NVME_ZS_ZSE  = 0x1,
+    NVME_ZS_ZSIO = 0x2,
+    NVME_ZS_ZSEO = 0x3,
+    NVME_ZS_ZSC  = 0x4,
+    NVME_ZS_ZSRO = 0xd,
+    NVME_ZS_ZSF  = 0xe,
+    NVME_ZS_ZSO  = 0xf,
+} NvmeZoneState;
+
+typedef struct NvmeZoneDescriptor {
+    uint8_t  zt;
+    uint8_t  zs;
+    uint8_t  za;
+    uint8_t  rsvd3[5];
+    uint64_t zcap;
+    uint64_t zslba;
+    uint64_t wp;
+    uint8_t  rsvd32[32];
+} NvmeZoneDescriptor;
+
+#define NVME_ZS(zs) (((zs) >> 4) & 0xf)
+#define NVME_ZS_SET(zs, state) ((zs) = ((state) << 4))
+
+#define NVME_ZA_ZFC(za)  ((za) & (1 << 0))
+#define NVME_ZA_FZR(za)  ((za) & (1 << 1))
+#define NVME_ZA_RZR(za)  ((za) & (1 << 2))
+#define NVME_ZA_ZDEV(za) ((za) & (1 << 7))
+
+#define NVME_ZA_SET_ZFC(za, val)  ((za) |= (((val) & 1) << 0))
+#define NVME_ZA_SET_FZR(za, val)  ((za) |= (((val) & 1) << 1))
+#define NVME_ZA_SET_RZR(za, val)  ((za) |= (((val) & 1) << 2))
+#define NVME_ZA_SET_ZDEV(za, val) ((za) |= (((val) & 1) << 7))
+
+#define NVME_ZA_CLEAR(za) ((za) = 0x0)
+
 enum NvmeSmartWarn {
     NVME_SMART_SPARE                  = 1 << 0,
     NVME_SMART_TEMPERATURE            = 1 << 1,
@@ -899,6 +1031,7 @@ enum NvmeLogIdentifier {
     NVME_LOG_SMART_INFO     = 0x02,
     NVME_LOG_FW_SLOT_INFO   = 0x03,
     NVME_LOG_EFFECTS        = 0x05,
+    NVME_LOG_CHANGED_ZONE_LIST = 0xbf,
 };
 
 typedef struct NvmePSD {
@@ -1008,6 +1141,10 @@ typedef struct NvmeIdCtrl {
     uint8_t     vs[1024];
 } NvmeIdCtrl;
 
+enum NvmeIdCtrlOaes {
+    NVME_OAES_ZDCN = 1 << 27,
+};
+
 enum NvmeIdCtrlOacs {
     NVME_OACS_SECURITY  = 1 << 0,
     NVME_OACS_FORMAT    = 1 << 1,
@@ -1048,6 +1185,11 @@ enum NvmeIdCtrlLpa {
 #define NVME_CTRL_SGLS_MPTR_SGL                  (0x1 << 19)
 #define NVME_CTRL_SGLS_ADDR_OFFSET               (0x1 << 20)
 
+typedef struct NvmeIdCtrlZns {
+    uint8_t zasl;
+    uint8_t rsvd1[4095];
+} NvmeIdCtrlZns;
+
 #define NVME_ARB_AB(arb)    (arb & 0x7)
 #define NVME_ARB_AB_NOLIMIT 0x7
 #define NVME_ARB_LPW(arb)   ((arb >> 8) & 0xff)
@@ -1071,6 +1213,7 @@ enum NvmeIdCtrlLpa {
 #define NVME_AEC_SMART(aec)         (aec & 0xff)
 #define NVME_AEC_NS_ATTR(aec)       ((aec >> 8) & 0x1)
 #define NVME_AEC_FW_ACTIVATION(aec) ((aec >> 9) & 0x1)
+#define NVME_AEC_ZDCN(aec)          ((aec >> 27) & 0x1)
 
 #define NVME_ERR_REC_TLER(err_rec)  (err_rec & 0xffff)
 #define NVME_ERR_REC_DULBE(err_rec) (err_rec & 0x10000)
@@ -1226,9 +1369,33 @@ enum NvmeIdNsDps {
     DPS_FIRST_EIGHT = 8,
 };
 
+typedef struct NvmeLBAFE {
+    uint64_t    zsze;
+    uint8_t     zdes;
+    uint8_t     rsvd9[7];
+} NvmeLBAFE;
+
+typedef struct NvmeIdNsZns {
+    uint16_t    zoc;
+    uint16_t    ozcs;
+    uint32_t    mar;
+    uint32_t    mor;
+    uint32_t    rrl;
+    uint32_t    frl;
+    uint8_t     rsvd20[2796];
+    NvmeLBAFE   lbafe[16];
+    uint8_t     rsvd3072[768];
+    uint8_t     vs[256];
+} NvmeIdNsZns;
+
+#define NVME_ID_NS_ZNS_ZOC_VZC (1 << 0)
+#define NVME_ID_NS_ZNS_ZOC_ZAE (1 << 1)
+
+#define NVME_ID_NS_ZNS_OZCS_RAZB (1 << 0)
+
 static inline void _nvme_check_size(void)
 {
-    QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 4);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 8);
     QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16);
     QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16);
     QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64);
@@ -1237,17 +1404,24 @@ static inline void _nvme_check_size(void)
     QEMU_BUILD_BUG_ON(sizeof(NvmeCreateSq) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeZoneAppendCmd) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementSendCmd) != 64);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementRecvCmd) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
     QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512);
     QEMU_BUILD_BUG_ON(sizeof(NvmeEnduranceGroupLog) != 512);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZns) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZns) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeNvmSetAttributes) != 128);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNvmSetList) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescriptor) != 64);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16);
 }
 #endif
-- 
2.27.0




* [PATCH 03/10] hw/block/nvme: add basic read/write for zoned namespaces
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
  2020-06-30 10:01 ` [PATCH 01/10] hw/block/nvme: support I/O Command Sets Klaus Jensen
  2020-06-30 10:01 ` [PATCH 02/10] hw/block/nvme: add zns specific fields and types Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 04/10] hw/block/nvme: add the zone management receive command Klaus Jensen
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

This adds basic read and write for zoned namespaces.

A zoned namespace is created by setting the iocs parameter to 0x2,
supplying a zero-sized blockdev for persistent zone info state (the
zns.zoneinfo parameter) and setting the zns.zcap parameter to the
individual zone capacity. The namespace device computes the resulting
zone size as the next power of two of the zone capacity and fits as
many zones as possible on the underlying namespace blockdev.

If the zone info blockdev pointed to by zns.zoneinfo is non-zero in
size, it is assumed to contain existing zone state.
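
A hypothetical invocation sketch (the parameter names are those defined
by this patch; the namespace 'drive' property and controller wiring are
assumed from the preceding series, and the zone info image starts out
zero-sized):

    -drive id=nvm,file=zns.img,if=none,format=raw
    -drive id=zoneinfo,file=zns-zoneinfo.img,if=none,format=raw
    -device nvme-ns,drive=nvm,iocs=0x2,zns.zoneinfo=zoneinfo,zns.zcap=4096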

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme-ns.c    | 227 +++++++++++++++++++++++++-
 hw/block/nvme-ns.h    | 103 ++++++++++++
 hw/block/nvme.c       | 361 +++++++++++++++++++++++++++++++++++++++---
 hw/block/nvme.h       |   1 +
 hw/block/trace-events |  10 ++
 5 files changed, 677 insertions(+), 25 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index ae051784caaf..9a08b2ba0fb2 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -28,6 +28,26 @@
 #include "nvme.h"
 #include "nvme-ns.h"
 
+const char *nvme_zs_str(NvmeZone *zone)
+{
+    return nvme_zs_to_str(nvme_zs(zone));
+}
+
+const char *nvme_zs_to_str(NvmeZoneState zs)
+{
+    switch (zs) {
+    case NVME_ZS_ZSE:  return "ZSE";
+    case NVME_ZS_ZSIO: return "ZSIO";
+    case NVME_ZS_ZSEO: return "ZSEO";
+    case NVME_ZS_ZSC:  return "ZSC";
+    case NVME_ZS_ZSRO: return "ZSRO";
+    case NVME_ZS_ZSF:  return "ZSF";
+    case NVME_ZS_ZSO:  return "ZSO";
+    }
+
+    return NULL;
+}
+
 static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp)
 {
 	Error *local_err = NULL;
@@ -57,6 +77,171 @@ static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp)
 	return 0;
 }
 
+static int nvme_ns_init_blk_zoneinfo(NvmeNamespace *ns, size_t len,
+                                     Error **errp)
+{
+    NvmeZone *zone;
+    NvmeZoneDescriptor *zd;
+    uint64_t zslba;
+    int ret;
+
+    BlockBackend *blk = ns->zns.info.blk;
+
+    Error *local_err = NULL;
+
+    for (int i = 0; i < ns->zns.info.num_zones; i++) {
+        zslba = i * nvme_ns_zsze(ns);
+        zone = nvme_ns_get_zone(ns, zslba);
+        zd = &zone->zd;
+
+        zd->zt = NVME_ZT_SEQ;
+        nvme_zs_set(zone, NVME_ZS_ZSE);
+        zd->zcap = ns->params.zns.zcap;
+        zone->wp_staging = zslba;
+        zd->wp = zd->zslba = cpu_to_le64(zslba);
+    }
+
+    ret = nvme_ns_blk_resize(blk, len, &local_err);
+    if (ret) {
+        error_propagate_prepend(errp, local_err,
+                                "could not resize zoneinfo blockdev: ");
+        return ret;
+    }
+
+    for (int i = 0; i < ns->zns.info.num_zones; i++) {
+        zd = &ns->zns.info.zones[i].zd;
+
+        ret = blk_pwrite(blk, i * sizeof(NvmeZoneDescriptor), zd,
+                         sizeof(NvmeZoneDescriptor), 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "blk_pwrite: ");
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp)
+{
+    NvmeZone *zone;
+    NvmeZoneDescriptor *zd;
+    BlockBackend *blk = ns->zns.info.blk;
+    uint64_t perm, shared_perm;
+    int64_t len, zoneinfo_len;
+
+    Error *local_err = NULL;
+    int ret;
+
+    perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
+    shared_perm = BLK_PERM_ALL;
+
+    ret = blk_set_perm(blk, perm, shared_perm, &local_err);
+    if (ret) {
+        error_propagate_prepend(errp, local_err, "blk_set_perm: ");
+        return ret;
+    }
+
+    zoneinfo_len = ROUND_UP(ns->zns.info.num_zones *
+                            sizeof(NvmeZoneDescriptor), BDRV_SECTOR_SIZE);
+
+    len = blk_getlength(blk);
+    if (len < 0) {
+        error_setg_errno(errp, -len, "blk_getlength: ");
+        return len;
+    }
+
+    if (len) {
+        if (len != zoneinfo_len) {
+            error_setg(errp, "zoneinfo size mismatch "
+                       "(expected %"PRIu64" bytes; was %"PRIu64" bytes)",
+                       zoneinfo_len, len);
+            error_append_hint(errp, "Did you change the zone size or "
+                              "zone descriptor size?\n");
+            return -1;
+        }
+
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zone = &ns->zns.info.zones[i];
+            zd = &zone->zd;
+
+            ret = blk_pread(blk, i * sizeof(NvmeZoneDescriptor), zd,
+                            sizeof(NvmeZoneDescriptor));
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "blk_pread: ");
+                return ret;
+            } else if (ret != sizeof(NvmeZoneDescriptor)) {
+                error_setg(errp, "blk_pread: short read");
+                return -1;
+            }
+
+            zone->wp_staging = nvme_wp(zone);
+
+            switch (nvme_zs(zone)) {
+            case NVME_ZS_ZSE:
+            case NVME_ZS_ZSF:
+            case NVME_ZS_ZSRO:
+            case NVME_ZS_ZSO:
+                continue;
+
+            case NVME_ZS_ZSC:
+                if (nvme_wp(zone) == nvme_zslba(zone)) {
+                    nvme_zs_set(zone, NVME_ZS_ZSE);
+                    continue;
+                }
+
+                /* fallthrough */
+
+            case NVME_ZS_ZSIO:
+            case NVME_ZS_ZSEO:
+                nvme_zs_set(zone, NVME_ZS_ZSF);
+                NVME_ZA_SET_ZFC(zd->za, 0x1);
+            }
+        }
+
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zd = &ns->zns.info.zones[i].zd;
+
+            ret = blk_pwrite(blk, i * sizeof(NvmeZoneDescriptor), zd,
+                             sizeof(NvmeZoneDescriptor), 0);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "blk_pwrite: ");
+                return ret;
+            }
+        }
+
+        return 0;
+    }
+
+    if (nvme_ns_init_blk_zoneinfo(ns, zoneinfo_len, &local_err)) {
+        error_propagate_prepend(errp, local_err,
+                                "could not initialize zoneinfo blockdev: ");
+    }
+
+    return 0;
+}
+
+static void nvme_ns_init_zoned(NvmeNamespace *ns)
+{
+    NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
+    NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns);
+
+    id_ns_zns->zoc = cpu_to_le16(ns->params.zns.zoc);
+    id_ns_zns->ozcs = cpu_to_le16(ns->params.zns.ozcs);
+
+    for (int i = 0; i <= id_ns->nlbaf; i++) {
+        id_ns_zns->lbafe[i].zsze = cpu_to_le64(pow2ceil(ns->params.zns.zcap));
+    }
+
+    ns->zns.info.num_zones = nvme_ns_nlbas(ns) / nvme_ns_zsze(ns);
+    ns->zns.info.zones = g_malloc0_n(ns->zns.info.num_zones, sizeof(NvmeZone));
+
+    id_ns->ncap = ns->zns.info.num_zones * ns->params.zns.zcap;
+
+    id_ns_zns->mar = 0xffffffff;
+    id_ns_zns->mor = 0xffffffff;
+}
+
 static void nvme_ns_init(NvmeNamespace *ns)
 {
     NvmeIdNsNvm *id_ns;
@@ -69,12 +254,20 @@ static void nvme_ns_init(NvmeNamespace *ns)
     ns->iocs = ns->params.iocs;
 
     id_ns->dlfeat = unmap ? 0x9 : 0x0;
+    if (!nvme_ns_zoned(ns)) {
+        id_ns->dlfeat = unmap ? 0x9 : 0x0;
+    }
     id_ns->lbaf[0].ds = ns->params.lbads;
 
     id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
+    id_ns->ncap = id_ns->nsze;
+
+    if (ns->iocs == NVME_IOCS_ZONED) {
+        ns->id_ns[NVME_IOCS_ZONED] = g_new0(NvmeIdNsZns, 1);
+        nvme_ns_init_zoned(ns);
+    }
 
     /* no thin provisioning */
-    id_ns->ncap = id_ns->nsze;
     id_ns->nuse = id_ns->ncap;
 }
 
@@ -194,6 +387,28 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, Error
         return -1;
     }
 
+    switch (ns->params.iocs) {
+    case NVME_IOCS_NVM:
+        break;
+
+    case NVME_IOCS_ZONED:
+        if (!ns->zns.info.blk) {
+            error_setg(errp, "zone info block backend not configured");
+            return -1;
+        }
+
+        if (!ns->params.zns.zcap) {
+            error_setg(errp, "zero zone capacity");
+            return -1;
+        }
+
+        break;
+
+    default:
+        error_setg(errp, "unsupported I/O command set");
+        return -1;
+    }
+
     return 0;
 }
 
@@ -222,6 +437,12 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
         id_ns->nsfeat |= 0x4;
     }
 
+    if (nvme_ns_zoned(ns)) {
+        if (nvme_ns_setup_blk_zoneinfo(ns, errp)) {
+            return -1;
+        }
+    }
+
     if (nvme_register_namespace(n, ns, errp)) {
         return -1;
     }
@@ -249,6 +470,10 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT8("lbads", NvmeNamespace, params.lbads, BDRV_SECTOR_BITS),
     DEFINE_PROP_DRIVE("state", NvmeNamespace, blk_state),
     DEFINE_PROP_UINT8("iocs", NvmeNamespace, params.iocs, 0x0),
+    DEFINE_PROP_DRIVE("zns.zoneinfo", NvmeNamespace, zns.info.blk),
+    DEFINE_PROP_UINT64("zns.zcap", NvmeNamespace, params.zns.zcap, 0),
+    DEFINE_PROP_UINT16("zns.zoc", NvmeNamespace, params.zns.zoc, 0),
+    DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 4124f20f1cef..7dcf0f02a07f 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -23,8 +23,20 @@ typedef struct NvmeNamespaceParams {
     uint32_t nsid;
     uint8_t  iocs;
     uint8_t  lbads;
+
+    struct {
+        uint64_t zcap;
+        uint16_t zoc;
+        uint16_t ozcs;
+    } zns;
 } NvmeNamespaceParams;
 
+typedef struct NvmeZone {
+    NvmeZoneDescriptor zd;
+
+    uint64_t wp_staging;
+} NvmeZone;
+
 typedef struct NvmeNamespace {
     DeviceState  parent_obj;
     BlockBackend *blk;
@@ -41,8 +53,22 @@ typedef struct NvmeNamespace {
     struct {
         uint32_t err_rec;
     } features;
+
+    struct {
+        struct {
+            BlockBackend *blk;
+
+            uint64_t  num_zones;
+            NvmeZone *zones;
+        } info;
+    } zns;
 } NvmeNamespace;
 
+static inline bool nvme_ns_zoned(NvmeNamespace *ns)
+{
+    return ns->iocs == NVME_IOCS_ZONED;
+}
+
 static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 {
     if (ns) {
@@ -57,17 +83,39 @@ static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns)
     return ns->id_ns[NVME_IOCS_NVM];
 }
 
+static inline NvmeIdNsZns *nvme_ns_id_zoned(NvmeNamespace *ns)
+{
+    return ns->id_ns[NVME_IOCS_ZONED];
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
     NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
     return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 }
 
+static inline NvmeLBAFE *nvme_ns_lbafe(NvmeNamespace *ns)
+{
+    NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
+    NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns);
+    return &id_ns_zns->lbafe[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
 static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
 {
     return nvme_ns_lbaf(ns)->ds;
 }
 
+static inline uint64_t nvme_ns_zsze(NvmeNamespace *ns)
+{
+    return le64_to_cpu(nvme_ns_lbafe(ns)->zsze);
+}
+
+static inline uint64_t nvme_ns_zsze_bytes(NvmeNamespace *ns)
+{
+    return nvme_ns_zsze(ns) << nvme_ns_lbads(ns);
+}
+
 /* calculate the number of LBAs that the namespace can accommodate */
 static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
 {
@@ -79,8 +127,63 @@ static inline size_t nvme_ns_blk_state_len(NvmeNamespace *ns)
     return ROUND_UP(DIV_ROUND_UP(nvme_ns_nlbas(ns), 8), BDRV_SECTOR_SIZE);
 }
 
+static inline uint64_t nvme_ns_zone_idx(NvmeNamespace *ns, uint64_t lba)
+{
+    return lba / nvme_ns_zsze(ns);
+}
+
+static inline NvmeZone *nvme_ns_get_zone(NvmeNamespace *ns, uint64_t lba)
+{
+    uint64_t idx = nvme_ns_zone_idx(ns, lba);
+    if (unlikely(idx >= ns->zns.info.num_zones)) {
+        return NULL;
+    }
+
+    return &ns->zns.info.zones[idx];
+}
+
+static inline NvmeZoneState nvme_zs(NvmeZone *zone)
+{
+    return (zone->zd.zs >> 4) & 0xf;
+}
+
+static inline void nvme_zs_set(NvmeZone *zone, NvmeZoneState zs)
+{
+    zone->zd.zs = zs << 4;
+}
+
+static inline bool nvme_ns_zone_wp_valid(NvmeZone *zone)
+{
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSF:
+    case NVME_ZS_ZSRO:
+    case NVME_ZS_ZSO:
+        return false;
+    default:
+        return true;
+    }
+}
+
+static inline uint64_t nvme_zslba(NvmeZone *zone)
+{
+    return le64_to_cpu(zone->zd.zslba);
+}
+
+static inline uint64_t nvme_zcap(NvmeZone *zone)
+{
+    return le64_to_cpu(zone->zd.zcap);
+}
+
+static inline uint64_t nvme_wp(NvmeZone *zone)
+{
+    return le64_to_cpu(zone->zd.wp);
+}
+
 typedef struct NvmeCtrl NvmeCtrl;
 
+const char *nvme_zs_str(NvmeZone *zone);
+const char *nvme_zs_to_str(NvmeZoneState zs);
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 
 #endif /* NVME_NS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1662c11a4cf3..4ec3b3029388 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -902,6 +902,115 @@ static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
     }
 }
 
+static uint16_t nvme_check_zone_readable(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeZoneState zs = nvme_zs(zone);
+    uint64_t zslba = nvme_zslba(zone);
+
+    if (zs == NVME_ZS_ZSO) {
+        trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), zslba,
+                                                  NVME_ZS_ZSO);
+        return NVME_ZONE_IS_OFFLINE | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
+static uint16_t nvme_check_zone_read(NvmeCtrl *n, uint64_t slba, uint32_t nlb,
+    NvmeRequest *req, NvmeZone *zone)
+{
+    NvmeNamespace *ns = req->ns;
+    NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns);
+    uint64_t zslba = nvme_zslba(zone);
+    uint64_t zsze = nvme_ns_zsze(ns);
+    uint16_t status;
+
+    status = nvme_check_zone_readable(n, req, zone);
+    if (status) {
+        return status;
+    }
+
+    if ((slba + nlb) > (zslba + zsze)) {
+        if (!(le16_to_cpu(id_ns_zns->ozcs) & NVME_ID_NS_ZNS_OZCS_RAZB)) {
+            trace_pci_nvme_err_zone_boundary(nvme_cid(req), slba, nlb, zsze);
+            return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR;
+        }
+    }
+
+    return NVME_SUCCESS;
+}
+
+static uint16_t nvme_check_zone_writeable(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeZoneState zs = nvme_zs(zone);
+    uint64_t zslba = nvme_zslba(zone);
+
+    if (zs == NVME_ZS_ZSO) {
+        trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), zslba,
+                                                  NVME_ZS_ZSO);
+        return NVME_ZONE_IS_OFFLINE | NVME_DNR;
+    }
+
+    switch (zs) {
+    case NVME_ZS_ZSE:
+    case NVME_ZS_ZSC:
+    case NVME_ZS_ZSIO:
+    case NVME_ZS_ZSEO:
+        return NVME_SUCCESS;
+    case NVME_ZS_ZSF:
+        trace_pci_nvme_err_zone_is_full(nvme_cid(req), req->slba);
+        return NVME_ZONE_IS_FULL | NVME_DNR;
+    case NVME_ZS_ZSRO:
+        trace_pci_nvme_err_zone_is_read_only(nvme_cid(req), req->slba);
+        return NVME_ZONE_IS_READ_ONLY | NVME_DNR;
+    default:
+        break;
+    }
+
+    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), zslba, zs);
+    return NVME_INTERNAL_DEV_ERROR | NVME_DNR;
+}
+
+static uint16_t nvme_check_zone_write(NvmeCtrl *n, uint64_t slba, uint32_t nlb,
+    NvmeRequest *req, NvmeZone *zone)
+{
+    uint64_t zslba, wp, zcap;
+    uint16_t status;
+
+    zslba = nvme_zslba(zone);
+    wp = zone->wp_staging;
+    zcap = nvme_zcap(zone);
+
+    status = nvme_check_zone_writeable(n, req, zone);
+    if (status) {
+        return status;
+    }
+
+    if ((wp - zslba) + nlb > zcap) {
+        trace_pci_nvme_err_zone_boundary(nvme_cid(req), slba, nlb, zcap);
+        return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR;
+    }
+
+    if (slba != wp) {
+        trace_pci_nvme_err_zone_invalid_write(nvme_cid(req), slba, wp);
+        return NVME_ZONE_INVALID_WRITE | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
+static inline uint16_t nvme_check_rwz_zone(NvmeCtrl *n, uint64_t slba,
+    uint32_t nlb, NvmeRequest *req, NvmeZone *zone)
+{
+    if (nvme_req_is_write(req)) {
+        return nvme_check_zone_write(n, slba, nlb, req, zone);
+    }
+
+    return nvme_check_zone_read(n, slba, nlb, req, zone);
+}
+
 static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len)
 {
     uint8_t mdts = n->params.mdts;
@@ -995,6 +1104,44 @@ static void nvme_ns_update_util(NvmeNamespace *ns, uint64_t slba,
     nvme_req_add_aio(req, aio);
 }
 
+static void nvme_update_zone_info(NvmeNamespace *ns, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    uint64_t zslba = -1;
+
+    QEMUIOVector *iov = g_new0(QEMUIOVector, 1);
+    NvmeAIO *aio = g_new0(NvmeAIO, 1);
+
+    *aio = (NvmeAIO) {
+        .opc = NVME_AIO_OPC_WRITE,
+        .blk = ns->zns.info.blk,
+        .payload = iov,
+        .req = req,
+        .flags = NVME_AIO_INTERNAL,
+    };
+
+    qemu_iovec_init(iov, 1);
+
+    if (zone) {
+        zslba = nvme_zslba(zone);
+        trace_pci_nvme_update_zone_info(nvme_cid(req), ns->params.nsid, zslba);
+
+        aio->offset = nvme_ns_zone_idx(ns, zslba) * sizeof(NvmeZoneDescriptor);
+        qemu_iovec_add(iov, &zone->zd, sizeof(NvmeZoneDescriptor));
+    } else {
+        trace_pci_nvme_update_zone_info(nvme_cid(req), ns->params.nsid, zslba);
+
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            qemu_iovec_add(iov, &ns->zns.info.zones[i].zd,
+                           sizeof(NvmeZoneDescriptor));
+        }
+    }
+
+    aio->len = iov->size;
+
+    nvme_req_add_aio(req, aio);
+}
+
 static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret)
 {
     NvmeRequest *req = aio->req;
@@ -1009,6 +1156,44 @@ static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret)
     }
 }
 
+static void nvme_zone_advance_wp(NvmeZone *zone, uint32_t nlb,
+    NvmeRequest *req)
+{
+    NvmeZoneDescriptor *zd = &zone->zd;
+    uint64_t wp = nvme_wp(zone);
+    uint64_t zslba = nvme_zslba(zone);
+
+    trace_pci_nvme_zone_advance_wp(nvme_cid(req), zslba, nlb, wp, wp + nlb);
+
+    wp += nlb;
+    if (wp == zslba + nvme_zcap(zone)) {
+        nvme_zs_set(zone, NVME_ZS_ZSF);
+    }
+
+    zd->wp = cpu_to_le64(wp);
+}
+
+static void nvme_aio_zone_write_cb(NvmeAIO *aio, void *opaque, int ret)
+{
+    NvmeZone *zone = opaque;
+    NvmeRequest *req = aio->req;
+    NvmeNamespace *ns = req->ns;
+    uint32_t nlb = req->nlb;
+    uint64_t zslba = nvme_zslba(zone);
+    uint64_t wp = nvme_wp(zone);
+
+    trace_pci_nvme_aio_zone_write_cb(nvme_cid(req), zslba, nlb, wp);
+
+    if (ret) {
+        return;
+    }
+
+    nvme_aio_write_cb(aio, opaque, ret);
+    nvme_zone_advance_wp(zone, nlb, req);
+
+    nvme_update_zone_info(ns, req, zone);
+}
+
 static void nvme_rw_cb(NvmeRequest *req, void *opaque)
 {
     NvmeNamespace *ns = req->ns;
@@ -1045,6 +1230,7 @@ static void nvme_aio_cb(void *opaque, int ret)
         block_acct_failed(stats, acct);
 
         if (req) {
+            NvmeNamespace *ns = req->ns;
             uint16_t status;
 
             switch (aio->opc) {
@@ -1075,6 +1261,16 @@ static void nvme_aio_cb(void *opaque, int ret)
             if (!req->status || (status & 0xfff) == NVME_INTERNAL_DEV_ERROR) {
                 req->status = status;
             }
+
+            /* transition the zone to offline state */
+            if (nvme_ns_zoned(ns)) {
+                NvmeZone *zone = nvme_ns_get_zone(ns, req->slba);
+
+                nvme_zs_set(zone, NVME_ZS_ZSO);
+                NVME_ZA_CLEAR(zone->zd.za);
+
+                nvme_update_zone_info(ns, req, zone);
+            }
         }
     }
 
@@ -1098,7 +1294,8 @@ static void nvme_aio_cb(void *opaque, int ret)
 }
 
 static void nvme_aio_rw(NvmeNamespace *ns, NvmeAIOOp opc,
-                        NvmeAIOCompletionFunc *cb, NvmeRequest *req)
+                        NvmeAIOCompletionFunc *cb, void *cb_arg,
+                        NvmeRequest *req)
 {
     NvmeAIO *aio = g_new(NvmeAIO, 1);
 
@@ -1108,6 +1305,7 @@ static void nvme_aio_rw(NvmeNamespace *ns, NvmeAIOOp opc,
         .offset = req->slba << nvme_ns_lbads(ns),
         .req = req,
         .cb = cb,
+        .cb_arg = cb_arg,
     };
 
     if (req->qsg.sg) {
@@ -1138,33 +1336,59 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_do_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
-    NvmeNamespace *ns = req->ns;
     NvmeAIO *aio;
+    NvmeAIOCompletionFunc *cb = nvme_aio_write_cb;
+    void *cb_arg = NULL;
+
+    NvmeNamespace *ns = req->ns;
 
     int64_t offset;
     size_t count;
     uint16_t status;
 
-    req->slba = le64_to_cpu(rw->slba);
-    req->nlb  = le16_to_cpu(rw->nlb) + 1;
-
     trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), req->slba,
                                 req->nlb);
 
     status = nvme_check_bounds(n, ns, req->slba, req->nlb);
     if (status) {
         NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
-        trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb,
-                                             id_ns->nsze);
-        return status;
+        trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb, id_ns->nsze);
+
+        goto invalid;
     }
 
     offset = req->slba << nvme_ns_lbads(ns);
     count = req->nlb << nvme_ns_lbads(ns);
 
+    if (nvme_ns_zoned(ns)) {
+        NvmeZone *zone = nvme_ns_get_zone(ns, req->slba);
+        if (!zone) {
+            trace_pci_nvme_err_invalid_zone(nvme_cid(req), req->slba);
+            status = NVME_INVALID_FIELD | NVME_DNR;
+            goto invalid;
+        }
+
+        status = nvme_check_zone_write(n, req->slba, req->nlb, req, zone);
+        if (status) {
+            goto invalid;
+        }
+
+        switch (nvme_zs(zone)) {
+        case NVME_ZS_ZSE:
+        case NVME_ZS_ZSC:
+            nvme_zs_set(zone, NVME_ZS_ZSIO);
+        default:
+            break;
+        }
+
+        cb = nvme_aio_zone_write_cb;
+        cb_arg = zone;
+
+        zone->wp_staging += req->nlb;
+    }
+
     aio = g_new0(NvmeAIO, 1);
 
     *aio = (NvmeAIO) {
@@ -1173,25 +1397,33 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
         .offset = offset,
         .len = count,
         .req = req,
-        .cb = nvme_aio_write_cb,
+        .cb = cb,
+        .cb_arg = cb_arg,
     };
 
     nvme_req_add_aio(req, aio);
 
+    nvme_req_set_cb(req, nvme_rw_cb, NULL);
+
     return NVME_NO_COMPLETE;
+
+invalid:
+    block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE);
+    return status;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_do_rw(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+    NvmeAIOCompletionFunc *cb = NULL;
+    void *cb_arg = NULL;
+
     NvmeNamespace *ns = req->ns;
 
-    uint32_t len;
-    int status;
+    size_t len;
+    uint16_t status;
 
     enum BlockAcctType acct = BLOCK_ACCT_READ;
     NvmeAIOOp opc = NVME_AIO_OPC_READ;
-    NvmeAIOCompletionFunc *cb = NULL;
 
     if (nvme_req_is_write(req)) {
         acct = BLOCK_ACCT_WRITE;
@@ -1199,8 +1431,6 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
         cb = nvme_aio_write_cb;
     }
 
-    req->nlb  = le16_to_cpu(rw->nlb) + 1;
-    req->slba = le64_to_cpu(rw->slba);
     len = req->nlb << nvme_ns_lbads(ns);
 
     trace_pci_nvme_rw(nvme_cid(req), nvme_req_is_write(req) ? "write" : "read",
@@ -1216,7 +1446,38 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
         goto invalid;
     }
 
-    nvme_aio_rw(ns, opc, cb, req);
+    if (nvme_ns_zoned(ns)) {
+        NvmeZone *zone = nvme_ns_get_zone(ns, req->slba);
+        if (!zone) {
+            trace_pci_nvme_err_invalid_zone(nvme_cid(req), req->slba);
+            status = NVME_INVALID_FIELD | NVME_DNR;
+            goto invalid;
+        }
+
+        status = nvme_check_rwz_zone(n, req->slba, req->nlb, req, zone);
+        if (status) {
+            goto invalid;
+        }
+
+        if (nvme_req_is_write(req)) {
+            switch (nvme_zs(zone)) {
+            case NVME_ZS_ZSE:
+            case NVME_ZS_ZSC:
+                nvme_zs_set(zone, NVME_ZS_ZSIO);
+            default:
+                break;
+            }
+
+            cb = nvme_aio_zone_write_cb;
+            cb_arg = zone;
+
+            zone->wp_staging += req->nlb;
+        }
+    } else if (nvme_req_is_write(req)) {
+        cb = nvme_aio_write_cb;
+    }
+
+    nvme_aio_rw(ns, opc, cb, cb_arg, req);
     nvme_req_set_cb(req, nvme_rw_cb, NULL);
 
     return NVME_NO_COMPLETE;
@@ -1226,6 +1487,47 @@ invalid:
     return status;
 }
 
+static uint16_t nvme_rwz(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
+    NvmeNamespace *ns = req->ns;
+    NvmeZone *zone;
+
+    req->nlb  = le16_to_cpu(rw->nlb) + 1;
+    req->slba = le64_to_cpu(rw->slba);
+
+    if (nvme_ns_zoned(ns) && nvme_req_is_write(req)) {
+        zone = nvme_ns_get_zone(ns, req->slba);
+        if (!zone) {
+            trace_pci_nvme_err_invalid_zone(nvme_cid(req), req->slba);
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+
+        if (zone->wp_staging != nvme_wp(zone)) {
+            NVME_GUEST_ERR(pci_nvme_zone_pending_writes,
+                           "cid %"PRIu16"; zone (zslba 0x%"PRIx64") has "
+                           "pending writes "
+                           "(wp 0x%"PRIx64" wp_staging 0x%"PRIx64"; "
+                           "additional writes should not be submitted",
+                           nvme_cid(req), nvme_zslba(zone), nvme_wp(zone),
+                           zone->wp_staging);
+
+            if (n->params.defensive) {
+                return NVME_ZONE_INVALID_WRITE;
+            }
+        }
+    }
+
+    switch (req->cmd.opcode) {
+    case NVME_CMD_WRITE_ZEROES:
+        return nvme_do_write_zeroes(n, req);
+    default:
+        break;
+    }
+
+    return nvme_do_rw(n, req);
+}
+
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
     uint32_t nsid = le32_to_cpu(req->cmd.nsid);
@@ -1245,11 +1547,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
     switch (req->cmd.opcode) {
     case NVME_CMD_FLUSH:
         return nvme_flush(n, req);
-    case NVME_CMD_WRITE_ZEROES:
-        return nvme_write_zeroes(n, req);
-    case NVME_CMD_WRITE:
     case NVME_CMD_READ:
-        return nvme_rw(n, req);
+    case NVME_CMD_WRITE:
+    case NVME_CMD_WRITE_ZEROES:
+        return nvme_rwz(n, req);
     default:
         trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
@@ -2342,6 +2643,10 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
         if (ns->blk_state) {
             blk_drain(ns->blk_state);
         }
+
+        if (nvme_ns_zoned(ns)) {
+            blk_drain(ns->zns.info.blk);
+        }
     }
 
     for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
@@ -2376,6 +2681,10 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
         if (ns->blk_state) {
             blk_flush(ns->blk_state);
         }
+
+        if (nvme_ns_zoned(ns)) {
+            blk_flush(ns->zns.info.blk);
+        }
     }
 
     n->bar.cc = 0;
@@ -2897,7 +3206,7 @@ static void nvme_init_state(NvmeCtrl *n)
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
     n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
-    n->iocscs[0] = 1 << NVME_IOCS_NVM;
+    n->iocscs[0] = (1 << NVME_IOCS_NVM) | (1 << NVME_IOCS_ZONED);
     n->features.iocsci = 0;
 }
 
@@ -3047,6 +3356,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     NvmeIdCtrl *id = &n->id_ctrl;
     uint8_t *pci_conf = pci_dev->config;
 
+    n->id_ctrl_iocss[NVME_IOCS_NVM] = g_new0(NvmeIdCtrl, 1);
+    n->id_ctrl_iocss[NVME_IOCS_ZONED] = g_new0(NvmeIdCtrl, 1);
+
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
     strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
@@ -3183,6 +3495,7 @@ static Property nvme_props[] = {
     DEFINE_PROP_UINT8("aerl", NvmeCtrl, params.aerl, 3),
     DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64),
     DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7),
+    DEFINE_PROP_BOOL("defensive", NvmeCtrl, params.defensive, false),
     DEFINE_PROP_BOOL("x-use-intel-id", NvmeCtrl, params.use_intel_id, false),
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 69be47963f5d..1ec1af8d6291 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -7,6 +7,7 @@
 #define NVME_MAX_NAMESPACES 256
 
 typedef struct NvmeParams {
+    bool     defensive;
     char     *serial;
     uint32_t num_queues; /* deprecated since 5.1 */
     uint32_t max_ioqpairs;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 4cf0236631d2..9e0b848186c8 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -42,6 +42,8 @@ pci_nvme_req_add_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offs
 pci_nvme_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
 pci_nvme_aio_discard_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32""
 pci_nvme_aio_write_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32""
+pci_nvme_aio_zone_write_cb(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp 0x%"PRIx64""
+pci_nvme_zone_advance_wp(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp_old, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp_old 0x%"PRIx64" wp 0x%"PRIx64""
 pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
 pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8""
 pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
@@ -80,6 +82,8 @@ pci_nvme_mmio_write(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" data 0x%"PRI
 pci_nvme_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_head %"PRIu16""
 pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_tail %"PRIu16""
 pci_nvme_ns_update_util(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
+pci_nvme_zone_pending_writes(uint16_t cid, uint64_t zslba, uint64_t wp, uint64_t wp_staging) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" wp_staging 0x%"PRIx64""
+pci_nvme_update_zone_info(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
@@ -99,6 +103,10 @@ pci_nvme_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset,
 pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8""
 pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
+pci_nvme_err_zone_is_full(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PRIx64""
+pci_nvme_err_zone_is_read_only(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PRIx64""
+pci_nvme_err_zone_invalid_write(uint16_t cid, uint64_t slba, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" wp 0x%"PRIx64""
+pci_nvme_err_zone_boundary(uint16_t cid, uint64_t slba, uint32_t nlb, uint64_t zcap) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" zcap 0x%"PRIx64""
 pci_nvme_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
 pci_nvme_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
 pci_nvme_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16""
@@ -127,6 +135,8 @@ pci_nvme_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx1
 pci_nvme_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
 pci_nvme_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
 pci_nvme_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 0x%"PRIx16""
+pci_nvme_err_invalid_zone(uint16_t cid, uint64_t lba) "cid %"PRIu16" lba 0x%"PRIx64""
+pci_nvme_err_invalid_zone_condition(uint16_t cid, uint64_t zslba, uint8_t condition) "cid %"PRIu16" zslba 0x%"PRIx64" condition 0x%"PRIx8""
 pci_nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
 pci_nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
 pci_nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 04/10] hw/block/nvme: add the zone management receive command
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (2 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 03/10] hw/block/nvme: add basic read/write for zoned namespaces Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 05/10] hw/block/nvme: add the zone management send command Klaus Jensen
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Add the Zone Management Receive command.
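
For reference, the data buffer returned by the command is a 64 byte
report header carrying the number of zones, followed by packed zone
descriptors (and, for the Extended Report Zones action, a zone
descriptor extension after each descriptor). Below is a minimal sketch
of how a host might walk such a buffer; the structs are simplified
stand-ins for the definitions in include/block/nvme.h and a
little-endian host is assumed, so this is illustration only, not part
of the patch.

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* simplified stand-ins for the layouts in include/block/nvme.h */
    struct zone_report_header {
        uint64_t num_zones;
        uint8_t  rsvd[56];
    };

    struct zone_descriptor {
        uint8_t  zt;
        uint8_t  zs;        /* zone state lives in bits 7:4 */
        uint8_t  za;
        uint8_t  rsvd[5];
        uint64_t zcap;
        uint64_t zslba;
        uint64_t wp;
        uint8_t  rsvd2[32];
    };

    /*
     * Walk a report buffer of 'len' bytes; 'zdes' is the zone descriptor
     * extension size in bytes (0 for a plain Report Zones).
     */
    static void walk_report(const uint8_t *buf, size_t len, size_t zdes)
    {
        const struct zone_report_header *hdr = (const void *)buf;
        size_t entry = sizeof(struct zone_descriptor) + zdes;
        size_t off = sizeof(*hdr);
        uint64_t i = 0;

        printf("zones reported: %" PRIu64 "\n", hdr->num_zones);

        while (off + entry <= len && i++ < hdr->num_zones) {
            const struct zone_descriptor *zd = (const void *)(buf + off);

            printf("zslba 0x%" PRIx64 " wp 0x%" PRIx64 " zs 0x%x\n",
                   zd->zslba, zd->wp, zd->zs >> 4);
            off += entry;
        }
    }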

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme-ns.c    |  33 +++++++++--
 hw/block/nvme-ns.h    |   9 ++-
 hw/block/nvme.c       | 130 ++++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme.h       |   6 ++
 hw/block/trace-events |   1 +
 include/block/nvme.h  |   5 ++
 6 files changed, 179 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 9a08b2ba0fb2..68996c2f0e72 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -99,6 +99,10 @@ static int nvme_ns_init_blk_zoneinfo(NvmeNamespace *ns, size_t len,
         zd->zcap = ns->params.zns.zcap;
         zone->wp_staging = zslba;
         zd->wp = zd->zslba = cpu_to_le64(zslba);
+
+        if (ns->params.zns.zdes) {
+            zone->zde = g_malloc0(nvme_ns_zdes_bytes(ns));
+        }
     }
 
     ret = nvme_ns_blk_resize(blk, len, &local_err);
@@ -128,7 +132,7 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp)
     NvmeZoneDescriptor *zd;
     BlockBackend *blk = ns->zns.info.blk;
     uint64_t perm, shared_perm;
-    int64_t len, zoneinfo_len;
+    int64_t len, zoneinfo_len, zone_len;
 
     Error *local_err = NULL;
     int ret;
@@ -142,8 +146,9 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp)
         return ret;
     }
 
-    zoneinfo_len = ROUND_UP(ns->zns.info.num_zones *
-                            sizeof(NvmeZoneDescriptor), BDRV_SECTOR_SIZE);
+    zone_len = sizeof(NvmeZoneDescriptor) + nvme_ns_zdes_bytes(ns);
+    zoneinfo_len = ROUND_UP(ns->zns.info.num_zones * zone_len,
+                            BDRV_SECTOR_SIZE);
 
     len = blk_getlength(blk);
     if (len < 0) {
@@ -177,6 +182,23 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp)
 
             zone->wp_staging = nvme_wp(zone);
 
+            if (ns->params.zns.zdes) {
+                uint16_t zde_bytes = nvme_ns_zdes_bytes(ns);
+                int64_t offset = ns->zns.info.num_zones *
+                    sizeof(NvmeZoneDescriptor);
+                ns->zns.info.zones[i].zde = g_malloc(zde_bytes);
+
+                ret = blk_pread(blk, offset + i * zde_bytes,
+                                ns->zns.info.zones[i].zde, zde_bytes);
+                if (ret < 0) {
+                    error_setg_errno(errp, -ret, "blk_pread: ");
+                    return ret;
+                } else if (ret != zde_bytes) {
+                    error_setg(errp, "blk_pread: short read");
+                    return -1;
+                }
+            }
+
             switch (nvme_zs(zone)) {
             case NVME_ZS_ZSE:
             case NVME_ZS_ZSF:
@@ -185,7 +207,8 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp)
                 continue;
 
             case NVME_ZS_ZSC:
-                if (nvme_wp(zone) == nvme_zslba(zone)) {
+                if (nvme_wp(zone) == nvme_zslba(zone) &&
+                    !NVME_ZA_ZDEV(zd->za)) {
                     nvme_zs_set(zone, NVME_ZS_ZSE);
                     continue;
                 }
@@ -231,6 +254,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 
     for (int i = 0; i <= id_ns->nlbaf; i++) {
         id_ns_zns->lbafe[i].zsze = cpu_to_le64(pow2ceil(ns->params.zns.zcap));
+        id_ns_zns->lbafe[i].zdes = ns->params.zns.zdes;
     }
 
     ns->zns.info.num_zones = nvme_ns_nlbas(ns) / nvme_ns_zsze(ns);
@@ -472,6 +496,7 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT8("iocs", NvmeNamespace, params.iocs, 0x0),
     DEFINE_PROP_DRIVE("zns.zoneinfo", NvmeNamespace, zns.info.blk),
     DEFINE_PROP_UINT64("zns.zcap", NvmeNamespace, params.zns.zcap, 0),
+    DEFINE_PROP_UINT8("zns.zdes", NvmeNamespace, params.zns.zdes, 0),
     DEFINE_PROP_UINT16("zns.zoc", NvmeNamespace, params.zns.zoc, 0),
     DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7dcf0f02a07f..5940fb73e72b 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -26,13 +26,15 @@ typedef struct NvmeNamespaceParams {
 
     struct {
         uint64_t zcap;
+        uint8_t  zdes;
         uint16_t zoc;
         uint16_t ozcs;
     } zns;
 } NvmeNamespaceParams;
 
 typedef struct NvmeZone {
-    NvmeZoneDescriptor zd;
+    NvmeZoneDescriptor  zd;
+    uint8_t             *zde;
 
     uint64_t wp_staging;
 } NvmeZone;
@@ -152,6 +154,11 @@ static inline void nvme_zs_set(NvmeZone *zone, NvmeZoneState zs)
     zone->zd.zs = zs << 4;
 }
 
+static inline size_t nvme_ns_zdes_bytes(NvmeNamespace *ns)
+{
+    return ns->params.zns.zdes << 6;
+}
+
 static inline bool nvme_ns_zone_wp_valid(NvmeZone *zone)
 {
     switch (nvme_zs(zone)) {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 4ec3b3029388..7e943dece352 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1528,6 +1528,134 @@ static uint16_t nvme_rwz(NvmeCtrl *n, NvmeRequest *req)
     return nvme_do_rw(n, req);
 }
 
+static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeZoneManagementRecvCmd *recv;
+    NvmeZoneManagementRecvAction zra;
+    NvmeZoneManagementRecvActionSpecificField zrasp;
+    NvmeNamespace *ns = req->ns;
+    NvmeZone *zone;
+
+    uint8_t *buf, *bufp, zs_list;
+    uint64_t slba, num_zones = 0, zidx = 0, zidx_begin;
+    uint16_t zes, status;
+    size_t len;
+
+    recv = (NvmeZoneManagementRecvCmd *) &req->cmd;
+
+    zra = recv->zra;
+    zrasp = recv->zrasp;
+    slba = le64_to_cpu(recv->slba);
+    len = (le32_to_cpu(recv->numdw) + 1) << 2;
+
+    if (!nvme_ns_zoned(ns)) {
+        return NVME_INVALID_OPCODE | NVME_DNR;
+    }
+
+    trace_pci_nvme_zone_mgmt_recv(nvme_cid(req), nvme_nsid(ns), slba, len,
+                                  zra, zrasp, recv->zrasf);
+
+    if (!len) {
+        return NVME_SUCCESS;
+    }
+
+    switch (zrasp) {
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ALL:
+        zs_list = 0;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSE:
+        zs_list = NVME_ZS_ZSE;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSIO:
+        zs_list = NVME_ZS_ZSIO;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSEO:
+        zs_list = NVME_ZS_ZSEO;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSC:
+        zs_list = NVME_ZS_ZSC;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSF:
+        zs_list = NVME_ZS_ZSF;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSRO:
+        zs_list = NVME_ZS_ZSRO;
+        break;
+
+    case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSO:
+        zs_list = NVME_ZS_ZSO;
+        break;
+    default:
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    status = nvme_check_mdts(n, len);
+    if (status) {
+        return status;
+    }
+
+    if (!nvme_ns_get_zone(ns, slba)) {
+        trace_pci_nvme_err_invalid_zone(nvme_cid(req), slba);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    zidx_begin = zidx = nvme_ns_zone_idx(ns, slba);
+    zes = sizeof(NvmeZoneDescriptor);
+    if (zra == NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES) {
+        zes += nvme_ns_zdes_bytes(ns);
+    }
+
+    buf = bufp = g_malloc0(len);
+    bufp += sizeof(NvmeZoneReportHeader);
+
+    while ((bufp + zes) - buf <= len && zidx < ns->zns.info.num_zones) {
+        zone = &ns->zns.info.zones[zidx++];
+
+        if (zs_list && zs_list != nvme_zs(zone)) {
+            continue;
+        }
+
+        num_zones++;
+
+        memcpy(bufp, &zone->zd, sizeof(NvmeZoneDescriptor));
+
+        if (zra == NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES) {
+            memcpy(bufp + sizeof(NvmeZoneDescriptor), zone->zde,
+                   nvme_ns_zdes_bytes(ns));
+        }
+
+        bufp += zes;
+    }
+
+    if (!(recv->zrasf & NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL)) {
+        if (!zs_list) {
+            num_zones = ns->zns.info.num_zones - zidx_begin;
+        } else {
+            num_zones = 0;
+            for (int i = zidx_begin; i < ns->zns.info.num_zones; i++) {
+                zone = &ns->zns.info.zones[i];
+
+                if (zs_list == nvme_zs(zone)) {
+                    num_zones++;
+                }
+            }
+        }
+    }
+
+    stq_le_p(buf, num_zones);
+
+    status = nvme_dma(n, buf, len, DMA_DIRECTION_FROM_DEVICE, req);
+    g_free(buf);
+
+    return status;
+}
+
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
     uint32_t nsid = le32_to_cpu(req->cmd.nsid);
@@ -1551,6 +1679,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
     case NVME_CMD_WRITE:
     case NVME_CMD_WRITE_ZEROES:
         return nvme_rwz(n, req);
+    case NVME_CMD_ZONE_MGMT_RECV:
+        return nvme_zone_mgmt_recv(n, req);
     default:
         trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1ec1af8d6291..92aebb6a6416 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -47,6 +47,12 @@ static const NvmeEffectsLog nvme_effects[] = {
                 NVME_EFFECTS_LBCC,
         },
     },
+
+    [NVME_IOCS_ZONED] = {
+        .iocs = {
+            [NVME_CMD_ZONE_MGMT_RECV]   = NVME_EFFECTS_CSUPP,
+        }
+    },
 };
 
 typedef struct NvmeAsyncEvent {
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 9e0b848186c8..9d2a7c2766b6 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -49,6 +49,7 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" s
 pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
 pci_nvme_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
+pci_nvme_zone_mgmt_recv(uint16_t cid, uint32_t nsid, uint64_t slba, uint64_t len, uint8_t zra, uint8_t zrasp, uint8_t zrasf) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" len %"PRIu64" zra 0x%"PRIx8" zrasp 0x%"PRIx8" zrasf 0x%"PRIx8""
 pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
 pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
 pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index ddf948132272..68dac2582b06 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -746,6 +746,11 @@ typedef enum NvmeZoneManagementRecvActionSpecificField {
 
 #define NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL 0x1
 
+typedef struct NvmeZoneReportHeader {
+    uint64_t num_zones;
+    uint8_t  rsvd[56];
+} NvmeZoneReportHeader;
+
 typedef struct NvmeDsmCmd {
     uint8_t     opcode;
     uint8_t     flags;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 05/10] hw/block/nvme: add the zone management send command
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (3 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 04/10] hw/block/nvme: add the zone management receive command Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 06/10] hw/block/nvme: add the zone append command Klaus Jensen
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Add the Zone Management Send command.
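
As a quick reference, the sketch below summarizes the zone state
transitions that the individual actions in this patch perform (Set Zone
Descriptor Extension, which additionally transfers the extension data,
is left out). The enum values are placeholders for the NvmeZoneState
and NvmeZoneManagementSendAction definitions in include/block/nvme.h;
this is an illustrative summary, not code from the patch.

    /* illustrative placeholders for the real enums */
    enum zs  { ZSE, ZSIO, ZSEO, ZSC, ZSF, ZSRO, ZSO };
    enum zsa { ZSA_CLOSE, ZSA_FINISH, ZSA_OPEN, ZSA_RESET, ZSA_OFFLINE };

    /*
     * Return the resulting zone state for an action, or -1 when the
     * device would answer with Invalid Zone State Transition. This
     * mirrors the per-action handlers added by this patch.
     */
    static int zsa_transition(enum zsa action, enum zs cur)
    {
        switch (action) {
        case ZSA_CLOSE:
            return (cur == ZSIO || cur == ZSEO || cur == ZSC) ? ZSC : -1;
        case ZSA_FINISH:
            return (cur == ZSE || cur == ZSIO || cur == ZSEO ||
                    cur == ZSC || cur == ZSF) ? ZSF : -1;
        case ZSA_OPEN:
            return (cur == ZSE || cur == ZSC || cur == ZSIO ||
                    cur == ZSEO) ? ZSEO : -1;
        case ZSA_RESET:
            if (cur == ZSRO) {
                return ZSO;
            }
            /* the write pointer is also rewound to the zone start LBA */
            return (cur == ZSE || cur == ZSIO || cur == ZSEO ||
                    cur == ZSC || cur == ZSF) ? ZSE : -1;
        case ZSA_OFFLINE:
            return (cur == ZSRO || cur == ZSO) ? ZSO : -1;
        }

        return -1;
    }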

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 461 ++++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme.h       |   4 +
 hw/block/trace-events |  12 ++
 3 files changed, 477 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7e943dece352..a4527ad9840e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -748,6 +748,11 @@ static void nvme_submit_aio(NvmeAIO *aio)
         }
 
         break;
+
+    case NVME_AIO_OPC_DISCARD:
+        aio->aiocb = blk_aio_pdiscard(blk, aio->offset, aio->len, nvme_aio_cb,
+                                      aio);
+        break;
     }
 }
 
@@ -1142,6 +1147,46 @@ static void nvme_update_zone_info(NvmeNamespace *ns, NvmeRequest *req,
     nvme_req_add_aio(req, aio);
 }
 
+static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    uint64_t zslba = -1;
+    QEMUIOVector *iov = g_new0(QEMUIOVector, 1);
+    NvmeAIO *aio = g_new0(NvmeAIO, 1);
+
+    *aio = (NvmeAIO) {
+        .opc = NVME_AIO_OPC_WRITE,
+        .blk = ns->zns.info.blk,
+        .payload = iov,
+        .offset = ns->zns.info.num_zones * sizeof(NvmeZoneDescriptor),
+        .req = req,
+        .flags = NVME_AIO_INTERNAL,
+    };
+
+    qemu_iovec_init(iov, 1);
+
+    if (zone) {
+        zslba = nvme_zslba(zone);
+        trace_pci_nvme_update_zone_descr(nvme_cid(req), ns->params.nsid,
+                                         zslba);
+
+        aio->offset += nvme_ns_zone_idx(ns, zslba) * nvme_ns_zdes_bytes(ns);
+        qemu_iovec_add(iov, zone->zde, nvme_ns_zdes_bytes(ns));
+    } else {
+        trace_pci_nvme_update_zone_descr(nvme_cid(req), ns->params.nsid,
+                                         zslba);
+
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            qemu_iovec_add(iov, ns->zns.info.zones[i].zde,
+                nvme_ns_zdes_bytes(ns));
+        }
+    }
+
+    aio->len = iov->size;
+
+    nvme_req_add_aio(req, aio);
+}
+
 static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret)
 {
     NvmeRequest *req = aio->req;
@@ -1206,6 +1251,49 @@ static void nvme_rw_cb(NvmeRequest *req, void *opaque)
     nvme_enqueue_req_completion(cq, req);
 }
 
+static void nvme_zone_mgmt_send_reset_cb(NvmeRequest *req, void *opaque)
+{
+    NvmeSQueue *sq = req->sq;
+    NvmeCtrl *n = sq->ctrl;
+    NvmeCQueue *cq = n->cq[sq->cqid];
+    NvmeNamespace *ns = req->ns;
+
+    trace_pci_nvme_zone_mgmt_send_reset_cb(nvme_cid(req), nvme_nsid(ns));
+
+    g_free(opaque);
+
+    nvme_enqueue_req_completion(cq, req);
+}
+
+static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret)
+{
+    NvmeRequest *req = aio->req;
+    NvmeZone *zone = opaque;
+    NvmeNamespace *ns = req->ns;
+
+    uint64_t zslba = nvme_zslba(zone);
+    uint64_t zcap = nvme_zcap(zone);
+
+    if (ret) {
+        return;
+    }
+
+    trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), ns->params.nsid, zslba);
+
+    nvme_zs_set(zone, NVME_ZS_ZSE);
+    NVME_ZA_CLEAR(zone->zd.za);
+
+    zone->zd.wp = zone->zd.zslba;
+    zone->wp_staging = zslba;
+
+    nvme_update_zone_info(ns, req, zone);
+
+    if (ns->blk_state) {
+        bitmap_clear(ns->utilization, zslba, zcap);
+        nvme_ns_update_util(ns, zslba, zcap, req);
+    }
+}
+
 static void nvme_aio_cb(void *opaque, int ret)
 {
     NvmeAIO *aio = opaque;
@@ -1336,6 +1424,377 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
+static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeNamespace *ns = req->ns;
+
+    trace_pci_nvme_zone_mgmt_send_close(nvme_cid(req), nvme_nsid(ns),
+                                        nvme_zslba(zone), nvme_zs_str(zone));
+
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSIO:
+    case NVME_ZS_ZSEO:
+        nvme_zs_set(zone, NVME_ZS_ZSC);
+
+        nvme_update_zone_info(ns, req, zone);
+
+        return NVME_NO_COMPLETE;
+
+    case NVME_ZS_ZSC:
+        return NVME_SUCCESS;
+
+    default:
+        break;
+    }
+
+    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
+                                              nvme_zs(zone));
+    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+}
+
+static uint16_t nvme_zone_mgmt_send_finish(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeNamespace *ns = req->ns;
+
+    trace_pci_nvme_zone_mgmt_send_finish(nvme_cid(req), nvme_nsid(ns),
+                                         nvme_zslba(zone), nvme_zs_str(zone));
+
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSIO:
+    case NVME_ZS_ZSEO:
+    case NVME_ZS_ZSC:
+    case NVME_ZS_ZSE:
+        nvme_zs_set(zone, NVME_ZS_ZSF);
+
+        nvme_update_zone_info(ns, req, zone);
+
+        return NVME_NO_COMPLETE;
+
+    case NVME_ZS_ZSF:
+        return NVME_SUCCESS;
+
+    default:
+        break;
+    }
+
+    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
+                                              nvme_zs(zone));
+    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+}
+
+static uint16_t nvme_zone_mgmt_send_open(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeNamespace *ns = req->ns;
+
+    trace_pci_nvme_zone_mgmt_send_open(nvme_cid(req), nvme_nsid(ns),
+                                       nvme_zslba(zone), nvme_zs_str(zone));
+
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSE:
+    case NVME_ZS_ZSC:
+    case NVME_ZS_ZSIO:
+        nvme_zs_set(zone, NVME_ZS_ZSEO);
+
+        nvme_update_zone_info(ns, req, zone);
+        return NVME_NO_COMPLETE;
+
+    case NVME_ZS_ZSEO:
+        return NVME_SUCCESS;
+
+    default:
+        break;
+    }
+
+    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
+                                              nvme_zs(zone));
+    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+}
+
+static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeAIO *aio;
+    NvmeNamespace *ns = req->ns;
+    uint64_t zslba = nvme_zslba(zone);
+    uint64_t zcap = nvme_zcap(zone);
+    uint8_t lbads = nvme_ns_lbads(ns);
+
+    trace_pci_nvme_zone_mgmt_send_reset(nvme_cid(req), nvme_nsid(ns),
+                                        nvme_zslba(zone), nvme_zs_str(zone));
+
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSIO:
+    case NVME_ZS_ZSEO:
+    case NVME_ZS_ZSC:
+    case NVME_ZS_ZSF:
+        aio = g_new0(NvmeAIO, 1);
+
+        *aio = (NvmeAIO) {
+            .opc = NVME_AIO_OPC_DISCARD,
+            .blk = ns->blk,
+            .offset = zslba << lbads,
+            .len = zcap << lbads,
+            .req = req,
+            .cb = nvme_aio_zone_reset_cb,
+            .cb_arg = zone,
+        };
+
+        nvme_req_add_aio(req, aio);
+        nvme_req_set_cb(req, nvme_zone_mgmt_send_reset_cb, NULL);
+
+        return NVME_NO_COMPLETE;
+
+    case NVME_ZS_ZSE:
+        return NVME_SUCCESS;
+
+    case NVME_ZS_ZSRO:
+        nvme_zs_set(zone, NVME_ZS_ZSO);
+
+        nvme_update_zone_info(ns, req, zone);
+
+        return NVME_NO_COMPLETE;
+
+    default:
+        break;
+    }
+
+    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
+                                              nvme_zs(zone));
+    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+}
+
+static uint16_t nvme_zone_mgmt_send_offline(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeNamespace *ns = req->ns;
+
+    trace_pci_nvme_zone_mgmt_send_offline(nvme_cid(req), nvme_nsid(ns),
+                                          nvme_zslba(zone), nvme_zs_str(zone));
+
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSRO:
+        nvme_zs_set(zone, NVME_ZS_ZSO);
+
+        nvme_update_zone_info(ns, req, zone);
+        return NVME_NO_COMPLETE;
+
+    case NVME_ZS_ZSO:
+        return NVME_SUCCESS;
+
+    default:
+        break;
+    }
+
+    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
+                                              nvme_zs(zone));
+    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+}
+
+static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeNamespace *ns = req->ns;
+    uint16_t status;
+
+    trace_pci_nvme_zone_mgmt_send_set_zde(nvme_cid(req), nvme_nsid(ns),
+                                          nvme_zslba(zone), nvme_zs_str(zone));
+
+    if (nvme_zs(zone) != NVME_ZS_ZSE) {
+        trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req),
+                                                  nvme_zslba(zone),
+                                                  nvme_zs(zone));
+        return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+    }
+
+    nvme_zs_set(zone, NVME_ZS_ZSEO);
+
+    status = nvme_dma(n, zone->zde, nvme_ns_zdes_bytes(ns),
+                      DMA_DIRECTION_TO_DEVICE, req);
+    if (status) {
+        return status;
+    }
+
+    NVME_ZA_SET_ZDEV(zone->zd.za, 0x1);
+    nvme_update_zone_descr(ns, req, zone);
+    nvme_update_zone_info(ns, req, zone);
+
+    return NVME_NO_COMPLETE;
+}
+
+static uint16_t nvme_zone_mgmt_send_all(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeZoneManagementSendCmd *send = (NvmeZoneManagementSendCmd *) &req->cmd;
+    NvmeNamespace *ns = req->ns;
+    NvmeZone *zone;
+    NvmeZoneState zs;
+
+    uint16_t status = NVME_SUCCESS;
+
+    trace_pci_nvme_zone_mgmt_send_all(nvme_cid(req), nvme_nsid(ns), send->zsa);
+
+    switch (send->zsa) {
+    case NVME_CMD_ZONE_MGMT_SEND_SET_ZDE:
+        return NVME_INVALID_FIELD | NVME_DNR;
+
+    case NVME_CMD_ZONE_MGMT_SEND_CLOSE:
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zone = &ns->zns.info.zones[i];
+            zs = nvme_zs(zone);
+
+            switch (zs) {
+            case NVME_ZS_ZSIO:
+            case NVME_ZS_ZSEO:
+                status = nvme_zone_mgmt_send_close(n, req, zone);
+                if (status && status != NVME_NO_COMPLETE) {
+                    goto err_out;
+                }
+
+            default:
+                continue;
+            }
+        }
+
+        break;
+
+    case NVME_CMD_ZONE_MGMT_SEND_FINISH:
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zone = &ns->zns.info.zones[i];
+            zs = nvme_zs(zone);
+
+            switch (zs) {
+            case NVME_ZS_ZSIO:
+            case NVME_ZS_ZSEO:
+            case NVME_ZS_ZSC:
+                status = nvme_zone_mgmt_send_finish(n, req, zone);
+                if (status && status != NVME_NO_COMPLETE) {
+                    goto err_out;
+                }
+
+            default:
+                continue;
+            }
+        }
+
+        break;
+
+    case NVME_CMD_ZONE_MGMT_SEND_OPEN:
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zone = &ns->zns.info.zones[i];
+            zs = nvme_zs(zone);
+
+            if (zs == NVME_ZS_ZSC) {
+                status = nvme_zone_mgmt_send_open(n, req, zone);
+                if (status && status != NVME_NO_COMPLETE) {
+                    goto err_out;
+                }
+            }
+        }
+
+        break;
+
+    case NVME_CMD_ZONE_MGMT_SEND_RESET:
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zone = &ns->zns.info.zones[i];
+            zs = nvme_zs(zone);
+
+            switch (zs) {
+            case NVME_ZS_ZSIO:
+            case NVME_ZS_ZSEO:
+            case NVME_ZS_ZSC:
+            case NVME_ZS_ZSF:
+                status = nvme_zone_mgmt_send_reset(n, req, zone);
+                if (status && status != NVME_NO_COMPLETE) {
+                    goto err_out;
+                }
+
+            default:
+                continue;
+            }
+        }
+
+        break;
+
+    case NVME_CMD_ZONE_MGMT_SEND_OFFLINE:
+        for (int i = 0; i < ns->zns.info.num_zones; i++) {
+            zone = &ns->zns.info.zones[i];
+            zs = nvme_zs(zone);
+
+            if (zs == NVME_ZS_ZSRO) {
+                status = nvme_zone_mgmt_send_offline(n, req, zone);
+                if (status && status != NVME_NO_COMPLETE) {
+                    goto err_out;
+                }
+            }
+        }
+
+        break;
+    }
+
+    return status;
+
+err_out:
+    req->status = status;
+
+    if (!QTAILQ_EMPTY(&req->aio_tailq)) {
+        return NVME_NO_COMPLETE;
+    }
+
+    return status;
+}
+
+static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeZoneManagementSendCmd *send = (NvmeZoneManagementSendCmd *) &req->cmd;
+    NvmeZoneManagementSendAction zsa = send->zsa;
+    NvmeNamespace *ns = req->ns;
+    NvmeZone *zone;
+    uint64_t zslba = le64_to_cpu(send->slba);
+
+    if (!nvme_ns_zoned(ns)) {
+        return NVME_INVALID_OPCODE | NVME_DNR;
+    }
+
+    trace_pci_nvme_zone_mgmt_send(nvme_cid(req), ns->params.nsid, zslba, zsa,
+                                  send->zsflags);
+
+    if (NVME_CMD_ZONE_MGMT_SEND_SELECT_ALL(send->zsflags)) {
+        return nvme_zone_mgmt_send_all(n, req);
+    }
+
+    if (zslba & (nvme_ns_zsze(ns) - 1)) {
+        trace_pci_nvme_err_invalid_zslba(nvme_cid(req), zslba);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    zone = nvme_ns_get_zone(ns, zslba);
+    if (!zone) {
+        trace_pci_nvme_err_invalid_zone(nvme_cid(req), zslba);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    switch (zsa) {
+    case NVME_CMD_ZONE_MGMT_SEND_CLOSE:
+        return nvme_zone_mgmt_send_close(n, req, zone);
+    case NVME_CMD_ZONE_MGMT_SEND_FINISH:
+        return nvme_zone_mgmt_send_finish(n, req, zone);
+    case NVME_CMD_ZONE_MGMT_SEND_OPEN:
+        return nvme_zone_mgmt_send_open(n, req, zone);
+    case NVME_CMD_ZONE_MGMT_SEND_RESET:
+        return nvme_zone_mgmt_send_reset(n, req, zone);
+    case NVME_CMD_ZONE_MGMT_SEND_OFFLINE:
+        return nvme_zone_mgmt_send_offline(n, req, zone);
+    case NVME_CMD_ZONE_MGMT_SEND_SET_ZDE:
+        return nvme_zone_mgmt_send_set_zde(n, req, zone);
+    }
+
+    return NVME_INVALID_FIELD | NVME_DNR;
+}
+
 static uint16_t nvme_do_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeAIO *aio;
@@ -1679,6 +2138,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
     case NVME_CMD_WRITE:
     case NVME_CMD_WRITE_ZEROES:
         return nvme_rwz(n, req);
+    case NVME_CMD_ZONE_MGMT_SEND:
+        return nvme_zone_mgmt_send(n, req);
     case NVME_CMD_ZONE_MGMT_RECV:
         return nvme_zone_mgmt_recv(n, req);
     default:
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 92aebb6a6416..757277d339bf 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -51,6 +51,8 @@ static const NvmeEffectsLog nvme_effects[] = {
     [NVME_IOCS_ZONED] = {
         .iocs = {
             [NVME_CMD_ZONE_MGMT_RECV]   = NVME_EFFECTS_CSUPP,
+            [NVME_CMD_ZONE_MGMT_SEND]   = NVME_EFFECTS_CSUPP |
+                NVME_EFFECTS_LBCC,
         }
     },
 };
@@ -127,6 +129,7 @@ typedef enum NvmeAIOOp {
     NVME_AIO_OPC_READ         = 0x2,
     NVME_AIO_OPC_WRITE        = 0x3,
     NVME_AIO_OPC_WRITE_ZEROES = 0x4,
+    NVME_AIO_OPC_DISCARD      = 0x5,
 } NvmeAIOOp;
 
 typedef enum NvmeAIOFlags {
@@ -164,6 +167,7 @@ static inline const char *nvme_aio_opc_str(NvmeAIO *aio)
     case NVME_AIO_OPC_READ:         return "NVME_AIO_OP_READ";
     case NVME_AIO_OPC_WRITE:        return "NVME_AIO_OP_WRITE";
     case NVME_AIO_OPC_WRITE_ZEROES: return "NVME_AIO_OP_WRITE_ZEROES";
+    case NVME_AIO_OPC_DISCARD:      return "NVME_AIO_OP_DISCARD";
     default:                        return "NVME_AIO_OP_UNKNOWN";
     }
 }
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 9d2a7c2766b6..1da48d1c29d0 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -43,12 +43,22 @@ pci_nvme_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, c
 pci_nvme_aio_discard_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32""
 pci_nvme_aio_write_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32""
 pci_nvme_aio_zone_write_cb(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp 0x%"PRIx64""
+pci_nvme_aio_zone_reset_cb(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64""
 pci_nvme_zone_advance_wp(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp_old, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp_old 0x%"PRIx64" wp 0x%"PRIx64""
 pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
 pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8""
 pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
 pci_nvme_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
+pci_nvme_zone_mgmt_send(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t zsa, uint8_t zsflags) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zsa 0x%"PRIx8" zsflags 0x%"PRIx8""
+pci_nvme_zone_mgmt_send_all(uint16_t cid, uint32_t nsid, uint8_t za) "cid %"PRIu16" nsid %"PRIu32" za 0x%"PRIx8""
+pci_nvme_zone_mgmt_send_close(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_zone_mgmt_send_finish(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_zone_mgmt_send_open(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_zone_mgmt_send_reset(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_zone_mgmt_send_reset_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
+pci_nvme_zone_mgmt_send_offline(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_zone_mgmt_send_set_zde(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
 pci_nvme_zone_mgmt_recv(uint16_t cid, uint32_t nsid, uint64_t slba, uint64_t len, uint8_t zra, uint8_t zrasp, uint8_t zrasf) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" len %"PRIu64" zra 0x%"PRIx8" zrasp 0x%"PRIx8" zrasf 0x%"PRIx8""
 pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
 pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
@@ -85,6 +95,7 @@ pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
 pci_nvme_ns_update_util(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_zone_pending_writes(uint16_t cid, uint64_t zslba, uint64_t wp, uint64_t wp_staging) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" wp_staging 0x%"PRIx64""
 pci_nvme_update_zone_info(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64""
+pci_nvme_update_zone_descr(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
@@ -138,6 +149,7 @@ pci_nvme_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx
 pci_nvme_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 0x%"PRIx16""
 pci_nvme_err_invalid_zone(uint16_t cid, uint64_t lba) "cid %"PRIu16" lba 0x%"PRIx64""
 pci_nvme_err_invalid_zone_condition(uint16_t cid, uint64_t zslba, uint8_t condition) "cid %"PRIu16" zslba 0x%"PRIx64" condition 0x%"PRIx8""
+pci_nvme_err_invalid_zslba(uint16_t cid, uint64_t zslba) "cid %"PRIu16" zslba 0x%"PRIx64""
 pci_nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
 pci_nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
 pci_nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 06/10] hw/block/nvme: add the zone append command
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (4 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 05/10] hw/block/nvme: add the zone management send command Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 07/10] hw/block/nvme: track and enforce zone resources Klaus Jensen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Add the Zone Append command.
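
In short (condensed from nvme_do_zone_append() in the diff below, with the
surrounding request and zone context assumed): the append is staged at the
zone's current write pointer and the assigned LBA is reported back to the
host in the first eight bytes of the completion queue entry.

    uint64_t wp = zone->wp_staging;     /* next unwritten LBA in the zone */

    req->cqe.qw0 = cpu_to_le64(wp);     /* tell the host where the data lands */
    req->slba = wp;                     /* issue the write at the write pointer */

    /* ... checks and aio setup elided ... */

    zone->wp_staging += req->nlb;       /* stage the next append */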

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 106 ++++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme.h       |   3 ++
 hw/block/trace-events |   2 +
 3 files changed, 111 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a4527ad9840e..6b394d374c8e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1294,6 +1294,12 @@ static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret)
     }
 }
 
+static void nvme_zone_append_cb(NvmeRequest *req, void *opaque)
+{
+    trace_pci_nvme_zone_append_cb(nvme_cid(req), le64_to_cpu(req->cqe.qw0));
+    nvme_rw_cb(req, opaque);
+}
+
 static void nvme_aio_cb(void *opaque, int ret)
 {
     NvmeAIO *aio = opaque;
@@ -1424,6 +1430,104 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
+static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req,
+    NvmeZone *zone)
+{
+    NvmeAIO *aio;
+    NvmeNamespace *ns = req->ns;
+
+    uint64_t zslba = nvme_zslba(zone);
+    uint64_t wp = zone->wp_staging;
+
+    size_t len;
+    uint16_t status;
+
+    req->cqe.qw0 = cpu_to_le64(wp);
+    req->slba = wp;
+
+    len = req->nlb << nvme_ns_lbads(ns);
+
+    trace_pci_nvme_zone_append(nvme_cid(req), zslba, wp, req->nlb);
+
+    status = nvme_check_rw(n, req);
+    if (status) {
+        goto invalid;
+    }
+
+    status = nvme_check_zone_write(n, req->slba, req->nlb, req, zone);
+    if (status) {
+        goto invalid;
+    }
+
+    switch (nvme_zs(zone)) {
+    case NVME_ZS_ZSE:
+    case NVME_ZS_ZSC:
+        nvme_zs_set(zone, NVME_ZS_ZSIO);
+    default:
+        break;
+    }
+
+    status = nvme_map(n, len, req);
+    if (status) {
+        goto invalid;
+    }
+
+    aio = g_new0(NvmeAIO, 1);
+    *aio = (NvmeAIO) {
+        .opc = NVME_AIO_OPC_WRITE,
+        .blk = ns->blk,
+        .offset = req->slba << nvme_ns_lbads(ns),
+        .req = req,
+        .cb = nvme_aio_zone_write_cb,
+        .cb_arg = zone,
+    };
+
+    if (req->qsg.sg) {
+        aio->len = req->qsg.size;
+        aio->flags |= NVME_AIO_DMA;
+    } else {
+        aio->len = req->iov.size;
+    }
+
+    nvme_req_add_aio(req, aio);
+    nvme_req_set_cb(req, nvme_zone_append_cb, zone);
+
+    zone->wp_staging += req->nlb;
+
+    return NVME_NO_COMPLETE;
+
+invalid:
+    block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE);
+    return status;
+}
+
+static uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeZone *zone;
+    NvmeZoneAppendCmd *zappend = (NvmeZoneAppendCmd *) &req->cmd;
+    NvmeNamespace *ns = req->ns;
+    uint64_t zslba = le64_to_cpu(zappend->zslba);
+
+    if (!nvme_ns_zoned(ns)) {
+        return NVME_INVALID_OPCODE | NVME_DNR;
+    }
+
+    if (zslba & (nvme_ns_zsze(ns) - 1)) {
+        trace_pci_nvme_err_invalid_zslba(nvme_cid(req), zslba);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    req->nlb = le16_to_cpu(zappend->nlb) + 1;
+
+    zone = nvme_ns_get_zone(ns, zslba);
+    if (!zone) {
+        trace_pci_nvme_err_invalid_zone(nvme_cid(req), zslba);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    return nvme_do_zone_append(n, req, zone);
+}
+
 static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req,
     NvmeZone *zone)
 {
@@ -2142,6 +2246,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
         return nvme_zone_mgmt_send(n, req);
     case NVME_CMD_ZONE_MGMT_RECV:
         return nvme_zone_mgmt_recv(n, req);
+    case NVME_CMD_ZONE_APPEND:
+        return nvme_zone_append(n, req);
     default:
         trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 757277d339bf..6b4eb0098450 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -53,6 +53,8 @@ static const NvmeEffectsLog nvme_effects[] = {
             [NVME_CMD_ZONE_MGMT_RECV]   = NVME_EFFECTS_CSUPP,
             [NVME_CMD_ZONE_MGMT_SEND]   = NVME_EFFECTS_CSUPP |
                 NVME_EFFECTS_LBCC,
+            [NVME_CMD_ZONE_APPEND]      = NVME_EFFECTS_CSUPP |
+                NVME_EFFECTS_LBCC,
         }
     },
 };
@@ -177,6 +179,7 @@ static inline bool nvme_req_is_write(NvmeRequest *req)
     switch (req->cmd.opcode) {
     case NVME_CMD_WRITE:
     case NVME_CMD_WRITE_ZEROES:
+    case NVME_CMD_ZONE_APPEND:
         return true;
     default:
         return false;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 1da48d1c29d0..0dfc6e22008e 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -50,6 +50,8 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" s
 pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
 pci_nvme_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
+pci_nvme_zone_append(uint16_t cid, uint64_t zslba, uint64_t wp, uint16_t nlb) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" nlb %"PRIu16""
+pci_nvme_zone_append_cb(uint16_t cid, uint64_t slba) "cid %"PRIu16" slba 0x%"PRIx64""
 pci_nvme_zone_mgmt_send(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t zsa, uint8_t zsflags) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zsa 0x%"PRIx8" zsflags 0x%"PRIx8""
 pci_nvme_zone_mgmt_send_all(uint16_t cid, uint32_t nsid, uint8_t za) "cid %"PRIu16" nsid %"PRIu32" za 0x%"PRIx8""
 pci_nvme_zone_mgmt_send_close(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 07/10] hw/block/nvme: track and enforce zone resources
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (5 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 06/10] hw/block/nvme: add the zone append command Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 08/10] hw/block/nvme: allow open to close transitions by controller Klaus Jensen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Move all zone transition rules to a single state machine that also
manages zone resources.
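
The gist of the accounting, condensed from nvme_zrm_transition() in the diff
below (other states and error paths elided; ns and zone come from the
surrounding context): opening a zone from the Empty state consumes one active
and one open resource, and the transition is refused if either is exhausted.

    if (!ns->zns.resources.active) {
        return NVME_TOO_MANY_ACTIVE_ZONES;
    }

    if (!ns->zns.resources.open) {
        return NVME_TOO_MANY_OPEN_ZONES;
    }

    ns->zns.resources.active--;
    ns->zns.resources.open--;

    nvme_zs_set(zone, NVME_ZS_ZSIO);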

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme-ns.c |  17 ++-
 hw/block/nvme-ns.h |   7 ++
 hw/block/nvme.c    | 304 ++++++++++++++++++++++++++++++++-------------
 3 files changed, 242 insertions(+), 86 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 68996c2f0e72..5a55a0191f55 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -262,8 +262,13 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 
     id_ns->ncap = ns->zns.info.num_zones * ns->params.zns.zcap;
 
-    id_ns_zns->mar = 0xffffffff;
-    id_ns_zns->mor = 0xffffffff;
+    id_ns_zns->mar = cpu_to_le32(ns->params.zns.mar);
+    id_ns_zns->mor = cpu_to_le32(ns->params.zns.mor);
+
+    ns->zns.resources.active = ns->params.zns.mar != 0xffffffff ?
+        ns->params.zns.mar + 1 : ns->zns.info.num_zones;
+    ns->zns.resources.open = ns->params.zns.mor != 0xffffffff ?
+        ns->params.zns.mor + 1 : ns->zns.info.num_zones;
 }
 
 static void nvme_ns_init(NvmeNamespace *ns)
@@ -426,6 +431,12 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, Error
             return -1;
         }
 
+        if (ns->params.zns.mor > ns->params.zns.mar) {
+            error_setg(errp, "maximum open resources (MOR) must be less "
+                       "than or equal to maximum active resources (MAR)");
+            return -1;
+        }
+
         break;
 
     default:
@@ -499,6 +510,8 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT8("zns.zdes", NvmeNamespace, params.zns.zdes, 0),
     DEFINE_PROP_UINT16("zns.zoc", NvmeNamespace, params.zns.zoc, 0),
     DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0),
+    DEFINE_PROP_UINT32("zns.mar", NvmeNamespace, params.zns.mar, 0xffffffff),
+    DEFINE_PROP_UINT32("zns.mor", NvmeNamespace, params.zns.mor, 0xffffffff),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 5940fb73e72b..5660934d6199 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,8 @@ typedef struct NvmeNamespaceParams {
         uint8_t  zdes;
         uint16_t zoc;
         uint16_t ozcs;
+        uint32_t mar;
+        uint32_t mor;
     } zns;
 } NvmeNamespaceParams;
 
@@ -63,6 +65,11 @@ typedef struct NvmeNamespace {
             uint64_t  num_zones;
             NvmeZone *zones;
         } info;
+
+        struct {
+            uint32_t open;
+            uint32_t active;
+        } resources;
     } zns;
 } NvmeNamespace;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b394d374c8e..d5d521954cfc 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1187,6 +1187,155 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
     nvme_req_add_aio(req, aio);
 }
 
+/*
+ * nvme_zrm_transition validates zone state transitions under the constraint of
+ * the Maximum Active and Open Resources (MAR and MOR) limits as reported by
+ * the Identify Namespace Data Structure.
+ *
+ * The function does NOT change the Zone Attribute field; this must be done by
+ * the caller.
+ */
+static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
+                                    NvmeZoneState to)
+{
+    NvmeZoneState from = nvme_zs(zone);
+
+    /* fast path */
+    if (from == to) {
+        return NVME_SUCCESS;
+    }
+
+    switch (from) {
+    case NVME_ZS_ZSE:
+        switch (to) {
+        case NVME_ZS_ZSRO:
+        case NVME_ZS_ZSO:
+        case NVME_ZS_ZSF:
+            nvme_zs_set(zone, to);
+            return NVME_SUCCESS;
+
+        case NVME_ZS_ZSC:
+            if (!ns->zns.resources.active) {
+                return NVME_TOO_MANY_ACTIVE_ZONES;
+            }
+
+            ns->zns.resources.active--;
+
+            nvme_zs_set(zone, to);
+
+            return NVME_SUCCESS;
+
+        case NVME_ZS_ZSIO:
+        case NVME_ZS_ZSEO:
+            if (!ns->zns.resources.active) {
+                return NVME_TOO_MANY_ACTIVE_ZONES;
+            }
+
+            if (!ns->zns.resources.open) {
+                return NVME_TOO_MANY_OPEN_ZONES;
+            }
+
+            ns->zns.resources.active--;
+            ns->zns.resources.open--;
+
+            nvme_zs_set(zone, to);
+
+            return NVME_SUCCESS;
+
+        default:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        }
+
+    case NVME_ZS_ZSEO:
+        switch (to) {
+        case NVME_ZS_ZSIO:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        default:
+            break;
+        }
+
+        /* fallthrough */
+
+    case NVME_ZS_ZSIO:
+        switch (to) {
+        case NVME_ZS_ZSEO:
+            nvme_zs_set(zone, to);
+            return NVME_SUCCESS;
+
+        case NVME_ZS_ZSE:
+        case NVME_ZS_ZSF:
+        case NVME_ZS_ZSRO:
+        case NVME_ZS_ZSO:
+            ns->zns.resources.active++;
+
+            /* fallthrough */
+
+        case NVME_ZS_ZSC:
+            ns->zns.resources.open++;
+
+            nvme_zs_set(zone, to);
+
+            return NVME_SUCCESS;
+
+        default:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        }
+
+    case NVME_ZS_ZSC:
+        switch (to) {
+        case NVME_ZS_ZSE:
+        case NVME_ZS_ZSF:
+        case NVME_ZS_ZSRO:
+        case NVME_ZS_ZSO:
+            ns->zns.resources.active++;
+            nvme_zs_set(zone, to);
+
+            return NVME_SUCCESS;
+
+        case NVME_ZS_ZSIO:
+        case NVME_ZS_ZSEO:
+            if (!ns->zns.resources.open) {
+                return NVME_TOO_MANY_OPEN_ZONES;
+            }
+
+            ns->zns.resources.open--;
+
+            nvme_zs_set(zone, to);
+
+            return NVME_SUCCESS;
+
+        default:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        }
+
+    case NVME_ZS_ZSRO:
+        switch (to) {
+        case NVME_ZS_ZSO:
+            nvme_zs_set(zone, to);
+            return NVME_SUCCESS;
+        default:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        }
+
+    case NVME_ZS_ZSF:
+        switch (to) {
+        case NVME_ZS_ZSE:
+        case NVME_ZS_ZSRO:
+        case NVME_ZS_ZSO:
+            nvme_zs_set(zone, to);
+            return NVME_SUCCESS;
+        default:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        }
+
+    case NVME_ZS_ZSO:
+        return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+
+    default:
+        return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+    }
+}
+
 static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret)
 {
     NvmeRequest *req = aio->req;
@@ -1212,7 +1361,8 @@ static void nvme_zone_advance_wp(NvmeZone *zone, uint32_t nlb,
 
     wp += nlb;
     if (wp == zslba + nvme_zcap(zone)) {
-        nvme_zs_set(zone, NVME_ZS_ZSF);
+        /* if we cannot transition to ZSF something is horribly wrong */
+        assert(nvme_zrm_transition(req->ns, zone, NVME_ZS_ZSF) == NVME_SUCCESS);
     }
 
     zd->wp = cpu_to_le64(wp);
@@ -1280,7 +1430,8 @@ static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret)
 
     trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), ns->params.nsid, zslba);
 
-    nvme_zs_set(zone, NVME_ZS_ZSE);
+    /* if we cannot transition to ZSE something is horribly wrong */
+    assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSE) == NVME_SUCCESS);
     NVME_ZA_CLEAR(zone->zd.za);
 
     zone->zd.wp = zone->zd.zslba;
@@ -1360,7 +1511,7 @@ static void nvme_aio_cb(void *opaque, int ret)
             if (nvme_ns_zoned(ns)) {
                 NvmeZone *zone = nvme_ns_get_zone(ns, req->slba);
 
-                nvme_zs_set(zone, NVME_ZS_ZSO);
+                assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO));
                 NVME_ZA_CLEAR(zone->zd.za);
 
                 nvme_update_zone_info(ns, req, zone);
@@ -1431,10 +1582,11 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 }
 
 static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req,
-    NvmeZone *zone)
+                                    NvmeZone *zone)
 {
     NvmeAIO *aio;
     NvmeNamespace *ns = req->ns;
+    NvmeZoneState zs_orig = nvme_zs(zone);
 
     uint64_t zslba = nvme_zslba(zone);
     uint64_t wp = zone->wp_staging;
@@ -1459,17 +1611,20 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req,
         goto invalid;
     }
 
-    switch (nvme_zs(zone)) {
-    case NVME_ZS_ZSE:
-    case NVME_ZS_ZSC:
-        nvme_zs_set(zone, NVME_ZS_ZSIO);
-    default:
+    switch (zs_orig) {
+    case NVME_ZS_ZSIO:
+    case NVME_ZS_ZSEO:
         break;
+    default:
+        status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO);
+        if (status) {
+            goto invalid;
+        }
     }
 
     status = nvme_map(n, len, req);
     if (status) {
-        goto invalid;
+        goto zrm_revert;
     }
 
     aio = g_new0(NvmeAIO, 1);
@@ -1496,6 +1651,10 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req,
 
     return NVME_NO_COMPLETE;
 
+zrm_revert:
+    /* if we cannot revert the transition something is horribly wrong */
+    assert(nvme_zrm_transition(ns, zone, zs_orig) == NVME_SUCCESS);
+
 invalid:
     block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE);
     return status;
@@ -1532,91 +1691,66 @@ static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req,
     NvmeZone *zone)
 {
     NvmeNamespace *ns = req->ns;
+    NvmeZoneState zs = nvme_zs(zone);
+    uint16_t status;
 
     trace_pci_nvme_zone_mgmt_send_close(nvme_cid(req), nvme_nsid(ns),
                                         nvme_zslba(zone), nvme_zs_str(zone));
 
-
-    switch (nvme_zs(zone)) {
-    case NVME_ZS_ZSIO:
-    case NVME_ZS_ZSEO:
-        nvme_zs_set(zone, NVME_ZS_ZSC);
-
-        nvme_update_zone_info(ns, req, zone);
-
-        return NVME_NO_COMPLETE;
-
-    case NVME_ZS_ZSC:
-        return NVME_SUCCESS;
-
-    default:
-        break;
+    /*
+     * The state machine in nvme_zrm_transition allows zones to transition from
+     * ZSE to ZSC. That transition is only valid if done as part of Set Zone
+     * Descriptor, so do an early check here.
+     */
+    if (zs == NVME_ZS_ZSE) {
+        return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
-    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
-                                              nvme_zs(zone));
-    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC);
+    if (status) {
+        return status;
+    }
+
+    nvme_update_zone_info(ns, req, zone);
+
+    return NVME_NO_COMPLETE;
 }
 
 static uint16_t nvme_zone_mgmt_send_finish(NvmeCtrl *n, NvmeRequest *req,
     NvmeZone *zone)
 {
     NvmeNamespace *ns = req->ns;
+    uint16_t status;
 
     trace_pci_nvme_zone_mgmt_send_finish(nvme_cid(req), nvme_nsid(ns),
                                          nvme_zslba(zone), nvme_zs_str(zone));
 
-
-    switch (nvme_zs(zone)) {
-    case NVME_ZS_ZSIO:
-    case NVME_ZS_ZSEO:
-    case NVME_ZS_ZSC:
-    case NVME_ZS_ZSE:
-        nvme_zs_set(zone, NVME_ZS_ZSF);
-
-        nvme_update_zone_info(ns, req, zone);
-
-        return NVME_NO_COMPLETE;
-
-    case NVME_ZS_ZSF:
-        return NVME_SUCCESS;
-
-    default:
-        break;
+    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSF);
+    if (status) {
+        return status;
     }
 
-    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
-                                              nvme_zs(zone));
-    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+    nvme_update_zone_info(ns, req, zone);
+    return NVME_NO_COMPLETE;
 }
 
 static uint16_t nvme_zone_mgmt_send_open(NvmeCtrl *n, NvmeRequest *req,
     NvmeZone *zone)
 {
     NvmeNamespace *ns = req->ns;
+    uint16_t status;
 
     trace_pci_nvme_zone_mgmt_send_open(nvme_cid(req), nvme_nsid(ns),
                                        nvme_zslba(zone), nvme_zs_str(zone));
 
-    switch (nvme_zs(zone)) {
-    case NVME_ZS_ZSE:
-    case NVME_ZS_ZSC:
-    case NVME_ZS_ZSIO:
-        nvme_zs_set(zone, NVME_ZS_ZSEO);
-
-        nvme_update_zone_info(ns, req, zone);
-        return NVME_NO_COMPLETE;
-
-    case NVME_ZS_ZSEO:
-        return NVME_SUCCESS;
-
-    default:
-        break;
+    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSEO);
+    if (status) {
+        return status;
     }
 
-    trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
-                                              nvme_zs(zone));
-    return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+    nvme_update_zone_info(ns, req, zone);
+
+    return NVME_NO_COMPLETE;
 }
 
 static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
@@ -1624,6 +1758,7 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
 {
     NvmeAIO *aio;
     NvmeNamespace *ns = req->ns;
+    NvmeZoneState zs = nvme_zs(zone);
     uint64_t zslba = nvme_zslba(zone);
     uint64_t zcap = nvme_zcap(zone);
     uint8_t lbads = nvme_ns_lbads(ns);
@@ -1631,7 +1766,10 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
     trace_pci_nvme_zone_mgmt_send_reset(nvme_cid(req), nvme_nsid(ns),
                                         nvme_zslba(zone), nvme_zs_str(zone));
 
-    switch (nvme_zs(zone)) {
+    switch (zs) {
+    case NVME_ZS_ZSE:
+        return NVME_SUCCESS;
+
     case NVME_ZS_ZSIO:
     case NVME_ZS_ZSEO:
     case NVME_ZS_ZSC:
@@ -1653,18 +1791,13 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
 
         return NVME_NO_COMPLETE;
 
-    case NVME_ZS_ZSE:
-        return NVME_SUCCESS;
-
     case NVME_ZS_ZSRO:
-        nvme_zs_set(zone, NVME_ZS_ZSO);
-
+        assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSO) == NVME_SUCCESS);
         nvme_update_zone_info(ns, req, zone);
-
         return NVME_NO_COMPLETE;
 
-    default:
-        break;
+    case NVME_ZS_ZSO:
+        return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
     trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone),
@@ -1682,14 +1815,10 @@ static uint16_t nvme_zone_mgmt_send_offline(NvmeCtrl *n, NvmeRequest *req,
 
     switch (nvme_zs(zone)) {
     case NVME_ZS_ZSRO:
-        nvme_zs_set(zone, NVME_ZS_ZSO);
-
+        assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO));
         nvme_update_zone_info(ns, req, zone);
         return NVME_NO_COMPLETE;
 
-    case NVME_ZS_ZSO:
-        return NVME_SUCCESS;
-
     default:
         break;
     }
@@ -1715,11 +1844,15 @@ static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req,
         return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
-    nvme_zs_set(zone, NVME_ZS_ZSEO);
+    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC);
+    if (status) {
+        return status;
+    }
 
     status = nvme_dma(n, zone->zde, nvme_ns_zdes_bytes(ns),
                       DMA_DIRECTION_TO_DEVICE, req);
     if (status) {
+        assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSE));
         return status;
     }
 
@@ -2024,11 +2157,14 @@ static uint16_t nvme_do_rw(NvmeCtrl *n, NvmeRequest *req)
 
         if (nvme_req_is_write(req)) {
             switch (nvme_zs(zone)) {
-            case NVME_ZS_ZSE:
-            case NVME_ZS_ZSC:
-                nvme_zs_set(zone, NVME_ZS_ZSIO);
-            default:
+            case NVME_ZS_ZSIO:
+            case NVME_ZS_ZSEO:
                 break;
+            default:
+                status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO);
+                if (status) {
+                    return status;
+                }
             }
 
             cb = nvme_aio_zone_write_cb;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/10] hw/block/nvme: allow open to close transitions by controller
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (6 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 07/10] hw/block/nvme: track and enforce zone resources Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 09/10] hw/block/nvme: allow zone excursions Klaus Jensen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Allow the controller to release open resources by transitioning
implicitly and explicitly opened zones to closed. This is done using a
naive "least recently opened" strategy. Some workloads may behave very
badly with this, but for the purpose of testing how software deals with
this it is acceptable for now.
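
The release path, condensed from nvme_zrm_release_open() in the diff below,
walks the open zones in least-recently-opened order and closes the first one
that is neither explicitly opened nor currently writing:

    NvmeZone *candidate;

    QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) {
        if (nvme_zs(candidate) == NVME_ZS_ZSEO) {
            continue;   /* never close an explicitly opened zone */
        }

        if (candidate->wp_staging != nvme_wp(candidate)) {
            continue;   /* writes are still in flight */
        }

        return nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req);
    }

    return NVME_TOO_MANY_OPEN_ZONES;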

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme-ns.c    |   3 +
 hw/block/nvme-ns.h    |   5 ++
 hw/block/nvme.c       | 176 +++++++++++++++++++++++++++++++-----------
 hw/block/nvme.h       |   5 ++
 hw/block/trace-events |   5 ++
 5 files changed, 147 insertions(+), 47 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 5a55a0191f55..3b9fa91c7af8 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -269,6 +269,9 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
         ns->params.zns.mar + 1 : ns->zns.info.num_zones;
     ns->zns.resources.open = ns->params.zns.mor != 0xffffffff ?
         ns->params.zns.mor + 1 : ns->zns.info.num_zones;
+
+    QTAILQ_INIT(&ns->zns.resources.lru_open);
+    QTAILQ_INIT(&ns->zns.resources.lru_active);
 }
 
 static void nvme_ns_init(NvmeNamespace *ns)
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 5660934d6199..6d3a6dc07cd8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -39,6 +39,8 @@ typedef struct NvmeZone {
     uint8_t             *zde;
 
     uint64_t wp_staging;
+
+    QTAILQ_ENTRY(NvmeZone) lru_entry;
 } NvmeZone;
 
 typedef struct NvmeNamespace {
@@ -69,6 +71,9 @@ typedef struct NvmeNamespace {
         struct {
             uint32_t open;
             uint32_t active;
+
+            QTAILQ_HEAD(, NvmeZone) lru_open;
+            QTAILQ_HEAD(, NvmeZone) lru_active;
         } resources;
     } zns;
 } NvmeNamespace;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d5d521954cfc..f7b4618bc805 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1187,6 +1187,41 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
     nvme_req_add_aio(req, aio);
 }
 
+static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
+                                    NvmeZone *zone, NvmeZoneState to,
+                                    NvmeRequest *req);
+
+static uint16_t nvme_zrm_release_open(NvmeCtrl *n, NvmeNamespace *ns,
+                                      NvmeRequest *req)
+{
+    NvmeZone *candidate;
+    NvmeZoneState zs;
+
+    trace_pci_nvme_zone_zrm_release_open(nvme_cid(req), ns->params.nsid);
+
+    QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) {
+        zs = nvme_zs(candidate);
+
+        trace_pci_nvme_zone_zrm_candidate(nvme_cid(req), ns->params.nsid,
+                                          nvme_zslba(candidate),
+                                          nvme_wp(candidate), zs);
+
+        /* skip explicitly opened zones */
+        if (zs == NVME_ZS_ZSEO) {
+            continue;
+        }
+
+        /* the zone cannot be closed if it is currently writing */
+        if (candidate->wp_staging != nvme_wp(candidate)) {
+            continue;
+        }
+
+        return nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req);
+    }
+
+    return NVME_TOO_MANY_OPEN_ZONES;
+}
+
 /*
  * nvme_zrm_transition validates zone state transitions under the constraint of
  * the Maximum Active and Open Resources (MAR and MOR) limits as reported by
@@ -1195,52 +1230,59 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
  * The function does NOT change the Zone Attribute field; this must be done by
  * the caller.
  */
-static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
-                                    NvmeZoneState to)
+static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
+                                    NvmeZone *zone, NvmeZoneState to,
+                                    NvmeRequest *req)
 {
     NvmeZoneState from = nvme_zs(zone);
+    uint16_t status;
 
-    /* fast path */
-    if (from == to) {
-        return NVME_SUCCESS;
-    }
+    trace_pci_nvme_zone_zrm_transition(nvme_cid(req), ns->params.nsid,
+                                       nvme_zslba(zone), nvme_zs(zone), to);
 
     switch (from) {
     case NVME_ZS_ZSE:
         switch (to) {
+        case NVME_ZS_ZSE:
+            return NVME_SUCCESS;
+
         case NVME_ZS_ZSRO:
         case NVME_ZS_ZSO:
         case NVME_ZS_ZSF:
-            nvme_zs_set(zone, to);
-            return NVME_SUCCESS;
+            goto out;
 
         case NVME_ZS_ZSC:
             if (!ns->zns.resources.active) {
+                trace_pci_nvme_err_too_many_active_zones(nvme_cid(req));
                 return NVME_TOO_MANY_ACTIVE_ZONES;
             }
 
             ns->zns.resources.active--;
 
-            nvme_zs_set(zone, to);
+            QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
-            return NVME_SUCCESS;
+            goto out;
 
         case NVME_ZS_ZSIO:
         case NVME_ZS_ZSEO:
             if (!ns->zns.resources.active) {
+                trace_pci_nvme_err_too_many_active_zones(nvme_cid(req));
                 return NVME_TOO_MANY_ACTIVE_ZONES;
             }
 
             if (!ns->zns.resources.open) {
-                return NVME_TOO_MANY_OPEN_ZONES;
+                status = nvme_zrm_release_open(n, ns, req);
+                if (status) {
+                    return status;
+                }
             }
 
             ns->zns.resources.active--;
             ns->zns.resources.open--;
 
-            nvme_zs_set(zone, to);
+            QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry);
 
-            return NVME_SUCCESS;
+            goto out;
 
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
@@ -1248,6 +1290,9 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 
     case NVME_ZS_ZSEO:
         switch (to) {
+        case NVME_ZS_ZSEO:
+            return NVME_SUCCESS;
+
         case NVME_ZS_ZSIO:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
         default:
@@ -1258,24 +1303,30 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 
     case NVME_ZS_ZSIO:
         switch (to) {
-        case NVME_ZS_ZSEO:
-            nvme_zs_set(zone, to);
+        case NVME_ZS_ZSIO:
             return NVME_SUCCESS;
 
+        case NVME_ZS_ZSEO:
+            goto out;
+
         case NVME_ZS_ZSE:
         case NVME_ZS_ZSF:
         case NVME_ZS_ZSRO:
         case NVME_ZS_ZSO:
             ns->zns.resources.active++;
+            ns->zns.resources.open++;
 
-            /* fallthrough */
+            QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry);
+
+            goto out;
 
         case NVME_ZS_ZSC:
             ns->zns.resources.open++;
 
-            nvme_zs_set(zone, to);
+            QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry);
+            QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
-            return NVME_SUCCESS;
+            goto out;
 
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
@@ -1283,26 +1334,33 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 
     case NVME_ZS_ZSC:
         switch (to) {
+        case NVME_ZS_ZSC:
+            return NVME_SUCCESS;
+
         case NVME_ZS_ZSE:
         case NVME_ZS_ZSF:
         case NVME_ZS_ZSRO:
         case NVME_ZS_ZSO:
             ns->zns.resources.active++;
-            nvme_zs_set(zone, to);
 
-            return NVME_SUCCESS;
+            QTAILQ_REMOVE(&ns->zns.resources.lru_active, zone, lru_entry);
+
+            goto out;
 
         case NVME_ZS_ZSIO:
         case NVME_ZS_ZSEO:
             if (!ns->zns.resources.open) {
-                return NVME_TOO_MANY_OPEN_ZONES;
+                status = nvme_zrm_release_open(n, ns, req);
+                if (status) {
+                    return status;
+                }
             }
 
             ns->zns.resources.open--;
+            QTAILQ_REMOVE(&ns->zns.resources.lru_active, zone, lru_entry);
+            QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry);
 
-            nvme_zs_set(zone, to);
-
-            return NVME_SUCCESS;
+            goto out;
 
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
@@ -1310,30 +1368,46 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 
     case NVME_ZS_ZSRO:
         switch (to) {
-        case NVME_ZS_ZSO:
-            nvme_zs_set(zone, to);
+        case NVME_ZS_ZSRO:
             return NVME_SUCCESS;
+
+        case NVME_ZS_ZSO:
+            goto out;
+
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
         }
 
     case NVME_ZS_ZSF:
         switch (to) {
+        case NVME_ZS_ZSF:
+            return NVME_SUCCESS;
+
         case NVME_ZS_ZSE:
         case NVME_ZS_ZSRO:
         case NVME_ZS_ZSO:
-            nvme_zs_set(zone, to);
-            return NVME_SUCCESS;
+            goto out;
+
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
         }
 
     case NVME_ZS_ZSO:
-        return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        switch (to) {
+        case NVME_ZS_ZSO:
+            return NVME_SUCCESS;
+
+        default:
+            return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
+        }
 
     default:
         return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
+
+out:
+    nvme_zs_set(zone, to);
+    return NVME_SUCCESS;
 }
 
 static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret)
@@ -1361,8 +1435,11 @@ static void nvme_zone_advance_wp(NvmeZone *zone, uint32_t nlb,
 
     wp += nlb;
     if (wp == zslba + nvme_zcap(zone)) {
-        /* if we cannot transition to ZSF something is horribly wrong */
-        assert(nvme_zrm_transition(req->ns, zone, NVME_ZS_ZSF) == NVME_SUCCESS);
+        NvmeCtrl *n = nvme_ctrl(req);
+
+        /* if we cannot transition to ZSF something is horribly wrong */
+        assert(nvme_zrm_transition(n, req->ns, zone, NVME_ZS_ZSF, req) ==
+               NVME_SUCCESS);
     }
 
     zd->wp = cpu_to_le64(wp);
@@ -1418,6 +1495,7 @@ static void nvme_zone_mgmt_send_reset_cb(NvmeRequest *req, void *opaque)
 static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret)
 {
     NvmeRequest *req = aio->req;
+    NvmeCtrl *n = nvme_ctrl(req);
     NvmeZone *zone = opaque;
     NvmeNamespace *ns = req->ns;
 
@@ -1431,7 +1509,7 @@ static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret)
     trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), ns->params.nsid, zslba);
 
     /* if we cannot transition to ZSE something is horribly wrong */
-    assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSE) == NVME_SUCCESS);
+    assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSE, req) == NVME_SUCCESS);
     NVME_ZA_CLEAR(zone->zd.za);
 
     zone->zd.wp = zone->zd.zslba;
@@ -1476,6 +1554,7 @@ static void nvme_aio_cb(void *opaque, int ret)
 
         if (req) {
             NvmeNamespace *ns = req->ns;
+            NvmeCtrl *n = nvme_ctrl(req);
             uint16_t status;
 
             switch (aio->opc) {
@@ -1511,7 +1590,7 @@ static void nvme_aio_cb(void *opaque, int ret)
             if (nvme_ns_zoned(ns)) {
                 NvmeZone *zone = nvme_ns_get_zone(ns, req->slba);
 
-                assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO));
+                assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req));
                 NVME_ZA_CLEAR(zone->zd.za);
 
                 nvme_update_zone_info(ns, req, zone);
@@ -1616,7 +1695,7 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req,
     case NVME_ZS_ZSEO:
         break;
     default:
-        status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO);
+        status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSIO, req);
         if (status) {
             goto invalid;
         }
@@ -1653,7 +1732,7 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req,
 
 zrm_revert:
     /* if we cannot revert the transition something is horribly wrong */
-    assert(nvme_zrm_transition(ns, zone, zs_orig) == NVME_SUCCESS);
+    assert(nvme_zrm_transition(n, ns, zone, zs_orig, req) == NVME_SUCCESS);
 
 invalid:
     block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE);
@@ -1706,7 +1785,7 @@ static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req,
         return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
-    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC);
+    status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSC, req);
     if (status) {
         return status;
     }
@@ -1725,7 +1804,7 @@ static uint16_t nvme_zone_mgmt_send_finish(NvmeCtrl *n, NvmeRequest *req,
     trace_pci_nvme_zone_mgmt_send_finish(nvme_cid(req), nvme_nsid(ns),
                                          nvme_zslba(zone), nvme_zs_str(zone));
 
-    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSF);
+    status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req);
     if (status) {
         return status;
     }
@@ -1743,7 +1822,7 @@ static uint16_t nvme_zone_mgmt_send_open(NvmeCtrl *n, NvmeRequest *req,
     trace_pci_nvme_zone_mgmt_send_open(nvme_cid(req), nvme_nsid(ns),
                                        nvme_zslba(zone), nvme_zs_str(zone));
 
-    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSEO);
+    status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSEO, req);
     if (status) {
         return status;
     }
@@ -1792,7 +1871,7 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
         return NVME_NO_COMPLETE;
 
     case NVME_ZS_ZSRO:
-        assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSO) == NVME_SUCCESS);
+        assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req));
         nvme_update_zone_info(ns, req, zone);
         return NVME_NO_COMPLETE;
 
@@ -1815,7 +1894,7 @@ static uint16_t nvme_zone_mgmt_send_offline(NvmeCtrl *n, NvmeRequest *req,
 
     switch (nvme_zs(zone)) {
     case NVME_ZS_ZSRO:
-        assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO));
+        assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req));
         nvme_update_zone_info(ns, req, zone);
         return NVME_NO_COMPLETE;
 
@@ -1844,7 +1923,7 @@ static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req,
         return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
-    status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC);
+    status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSC, req);
     if (status) {
         return status;
     }
@@ -1852,7 +1931,7 @@ static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req,
     status = nvme_dma(n, zone->zde, nvme_ns_zdes_bytes(ns),
                       DMA_DIRECTION_TO_DEVICE, req);
     if (status) {
-        assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSE));
+        assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSE, req));
         return status;
     }
 
@@ -2072,11 +2151,14 @@ static uint16_t nvme_do_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
         }
 
         switch (nvme_zs(zone)) {
-        case NVME_ZS_ZSE:
-        case NVME_ZS_ZSC:
-            nvme_zs_set(zone, NVME_ZS_ZSIO);
-        default:
+        case NVME_ZS_ZSIO:
+        case NVME_ZS_ZSEO:
             break;
+        default:
+            status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSIO, req);
+            if (status) {
+                return status;
+            }
         }
 
         cb = nvme_aio_zone_write_cb;
@@ -2161,7 +2243,7 @@ static uint16_t nvme_do_rw(NvmeCtrl *n, NvmeRequest *req)
             case NVME_ZS_ZSEO:
                 break;
             default:
-                status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO);
+                status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSIO, req);
                 if (status) {
                     return status;
                 }
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 6b4eb0098450..309fb1b94ecb 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -312,6 +312,11 @@ static inline uint16_t nvme_sqid(NvmeRequest *req)
     return le16_to_cpu(req->sq->sqid);
 }
 
+static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
+{
+    return req->sq->ctrl;
+}
+
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 
 #endif /* HW_NVME_H */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 0dfc6e22008e..4b4f2ed7605f 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -98,6 +98,9 @@ pci_nvme_ns_update_util(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu3
 pci_nvme_zone_pending_writes(uint16_t cid, uint64_t zslba, uint64_t wp, uint64_t wp_staging) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" wp_staging 0x%"PRIx64""
 pci_nvme_update_zone_info(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64""
 pci_nvme_update_zone_descr(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64""
+pci_nvme_zone_zrm_transition(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t from, uint8_t to) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" from 0x%"PRIx8" to 0x%"PRIx8""
+pci_nvme_zone_zrm_candidate(uint16_t cid, uint32_t nsid, uint64_t zslba, uint64_t wp, uint8_t zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" wp 0x%"PRIx64" zc 0x%"PRIx8""
+pci_nvme_zone_zrm_release_open(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
@@ -121,6 +124,8 @@ pci_nvme_err_zone_is_full(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PR
 pci_nvme_err_zone_is_read_only(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PRIx64""
 pci_nvme_err_zone_invalid_write(uint16_t cid, uint64_t slba, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" wp 0x%"PRIx64""
 pci_nvme_err_zone_boundary(uint16_t cid, uint64_t slba, uint32_t nlb, uint64_t zcap) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" zcap 0x%"PRIx64""
+pci_nvme_err_too_many_active_zones(uint16_t cid) "cid %"PRIu16""
+pci_nvme_err_too_many_open_zones(uint16_t cid) "cid %"PRIu16""
 pci_nvme_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
 pci_nvme_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
 pci_nvme_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16""
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 09/10] hw/block/nvme: allow zone excursions
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (7 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 08/10] hw/block/nvme: allow open to close transitions by controller Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 10:01 ` [PATCH 10/10] hw/block/nvme: support reset/finish recommended limits Klaus Jensen
  2020-06-30 12:59 ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Niklas Cassel
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Allow the controller to release active resources by transitioning zones
to the full state.
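
The excursion itself, condensed from nvme_zone_excursion() in the diff below,
finishes the zone, sets the Zone Finished by Controller attribute and records
the zone in the Changed Zone List so the host can be notified through an
asynchronous event:

    assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req) == NVME_SUCCESS);

    NVME_ZA_SET_ZFC(zone->zd.za, 0x1);   /* Zone Finished by Controller */

    nvme_zone_changed(n, ns, zone);      /* add to the changed zone list */
    nvme_update_zone_info(ns, req, zone);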

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme-ns.h    |   2 +
 hw/block/nvme.c       | 171 ++++++++++++++++++++++++++++++++++++++----
 hw/block/trace-events |   4 +
 include/block/nvme.h  |  10 +++
 4 files changed, 174 insertions(+), 13 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 6d3a6dc07cd8..6acda5c2cf3f 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -75,6 +75,8 @@ typedef struct NvmeNamespace {
             QTAILQ_HEAD(, NvmeZone) lru_open;
             QTAILQ_HEAD(, NvmeZone) lru_active;
         } resources;
+
+        NvmeChangedZoneList changed_list;
     } zns;
 } NvmeNamespace;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f7b4618bc805..6db6daa62bc5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -859,10 +859,11 @@ static void nvme_process_aers(void *opaque)
 
         req = n->aer_reqs[n->outstanding_aers];
 
-        result = (NvmeAerResult *) &req->cqe.dw0;
+        result = (NvmeAerResult *) &req->cqe.qw0;
         result->event_type = event->result.event_type;
         result->event_info = event->result.event_info;
         result->log_page = event->result.log_page;
+        result->nsid = event->result.nsid;
         g_free(event);
 
         req->status = NVME_SUCCESS;
@@ -874,8 +875,9 @@ static void nvme_process_aers(void *opaque)
     }
 }
 
-static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
-                               uint8_t event_info, uint8_t log_page)
+static void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns,
+                               uint8_t event_type, uint8_t event_info,
+                               uint8_t log_page)
 {
     NvmeAsyncEvent *event;
 
@@ -893,6 +895,11 @@ static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
         .log_page   = log_page,
     };
 
+    if (event_info == NVME_AER_INFO_NOTICE_ZONE_DESCR_CHANGED) {
+        assert(ns);
+        event->result.nsid = ns->params.nsid;
+    }
+
     QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry);
     n->aer_queued++;
 
@@ -1187,15 +1194,50 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
     nvme_req_add_aio(req, aio);
 }
 
+static void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
+{
+    uint16_t num_ids = le16_to_cpu(ns->zns.changed_list.num_ids);
+
+    trace_pci_nvme_zone_changed(ns->params.nsid, nvme_zslba(zone));
+
+    if (num_ids < NVME_CHANGED_ZONE_LIST_MAX_IDS) {
+        ns->zns.changed_list.ids[num_ids] = zone->zd.zslba;
+        ns->zns.changed_list.num_ids = cpu_to_le16(num_ids + 1);
+    } else {
+        memset(&ns->zns.changed_list, 0x0, sizeof(NvmeChangedZoneList));
+        ns->zns.changed_list.num_ids = cpu_to_le16(0xffff);
+    }
+
+    nvme_enqueue_event(n, ns, NVME_AER_TYPE_NOTICE,
+                       NVME_AER_INFO_NOTICE_ZONE_DESCR_CHANGED,
+                       NVME_LOG_CHANGED_ZONE_LIST);
+}
+
 static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
                                     NvmeZone *zone, NvmeZoneState to,
                                     NvmeRequest *req);
 
+static void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+    NvmeRequest *req)
+{
+    trace_pci_nvme_zone_excursion(ns->params.nsid, nvme_zslba(zone),
+                                  nvme_zs_str(zone));
+
+    assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req) == NVME_SUCCESS);
+
+    NVME_ZA_SET_ZFC(zone->zd.za, 0x1);
+
+    nvme_zone_changed(n, ns, zone);
+
+    nvme_update_zone_info(ns, req, zone);
+}
+
 static uint16_t nvme_zrm_release_open(NvmeCtrl *n, NvmeNamespace *ns,
                                       NvmeRequest *req)
 {
     NvmeZone *candidate;
     NvmeZoneState zs;
+    uint16_t status;
 
     trace_pci_nvme_zone_zrm_release_open(nvme_cid(req), ns->params.nsid);
 
@@ -1216,12 +1258,73 @@ static uint16_t nvme_zrm_release_open(NvmeCtrl *n, NvmeNamespace *ns,
             continue;
         }
 
-        return nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req);
+        status = nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req);
+        if (status) {
+            return status;
+        }
+
+        nvme_update_zone_info(ns, req, candidate);
+        return NVME_SUCCESS;
     }
 
     return NVME_TOO_MANY_OPEN_ZONES;
 }
 
+static uint16_t nvme_zrm_release_active(NvmeCtrl *n, NvmeNamespace *ns,
+    NvmeRequest *req)
+{
+    NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns);
+    NvmeZone *candidate = NULL;
+    NvmeZoneDescriptor *zd;
+    NvmeZoneState zs;
+
+    trace_pci_nvme_zone_zrm_release_active(nvme_cid(req), ns->params.nsid);
+
+    /* bail out if Zone Active Excursions are not permitted */
+    if (!(le16_to_cpu(id_ns_zns->zoc) & NVME_ID_NS_ZNS_ZOC_ZAE)) {
+        trace_pci_nvme_zone_zrm_excursion_not_allowed(nvme_cid(req),
+                                                      ns->params.nsid);
+        return NVME_TOO_MANY_ACTIVE_ZONES;
+    }
+
+    QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_active, lru_entry) {
+        zd = &candidate->zd;
+        zs = nvme_zs(candidate);
+
+        trace_pci_nvme_zone_zrm_candidate(nvme_cid(req), ns->params.nsid,
+                                          nvme_zslba(candidate),
+                                          nvme_wp(candidate), zs);
+
+        goto out;
+    }
+
+    /*
+     * If all zone resources are tied up on open zones we have to transition
+     * one of those to full.
+     */
+    QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) {
+        zd = &candidate->zd;
+        zs = nvme_zs(candidate);
+
+        trace_pci_nvme_zone_zrm_candidate(nvme_cid(req), ns->params.nsid,
+                                          nvme_zslba(candidate),
+                                          nvme_wp(candidate), zs);
+
+        /* the zone cannot be finished if it is currently writing */
+        if (candidate->wp_staging != le64_to_cpu(zd->wp)) {
+            continue;
+        }
+
+        break;
+    }
+
+    assert(candidate);
+
+out:
+    nvme_zone_excursion(n, ns, candidate, req);
+    return NVME_SUCCESS;
+}
+
 /*
  * nvme_zrm_transition validates zone state transitions under the constraint of
  * the Maximum Active and Open Resources (MAR and MOR) limits as reported by
@@ -1253,8 +1356,10 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
 
         case NVME_ZS_ZSC:
             if (!ns->zns.resources.active) {
-                trace_pci_nvme_err_too_many_active_zones(nvme_cid(req));
-                return NVME_TOO_MANY_ACTIVE_ZONES;
+                status = nvme_zrm_release_active(n, ns, req);
+                if (status) {
+                    return status;
+                }
             }
 
             ns->zns.resources.active--;
@@ -1266,8 +1371,10 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
         case NVME_ZS_ZSIO:
         case NVME_ZS_ZSEO:
             if (!ns->zns.resources.active) {
-                trace_pci_nvme_err_too_many_active_zones(nvme_cid(req));
-                return NVME_TOO_MANY_ACTIVE_ZONES;
+                status = nvme_zrm_release_active(n, ns, req);
+                if (status) {
+                    return status;
+                }
             }
 
             if (!ns->zns.resources.open) {
@@ -2716,6 +2823,41 @@ static uint16_t nvme_effects_log(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_changed_zone_info(NvmeCtrl *n, uint32_t buf_len,
+    uint64_t off, NvmeRequest *req)
+{
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+    NvmeNamespace *ns = nvme_ns(n, nsid);
+    uint32_t trans_len;
+    uint16_t status;
+
+    if (unlikely(!ns)) {
+        return NVME_INVALID_NSID | NVME_DNR;
+    }
+
+    if (!nvme_ns_zoned(ns)) {
+        return NVME_INVALID_LOG_ID | NVME_DNR;
+    }
+
+    if (off > 4096) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    trans_len = MIN(4096 - off, buf_len);
+
+    status = nvme_dma(n, (uint8_t *) &ns->zns.changed_list + off, trans_len,
+                      DMA_DIRECTION_FROM_DEVICE, req);
+    if (status) {
+        return status;
+    }
+
+    memset(&ns->zns.changed_list, 0x0, sizeof(NvmeChangedZoneList));
+
+    nvme_clear_events(n, NVME_AER_TYPE_NOTICE);
+
+    return NVME_SUCCESS;
+}
+
 static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeCmd *cmd = &req->cmd;
@@ -2761,6 +2903,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
         return nvme_fw_log_info(n, len, off, req);
     case NVME_LOG_EFFECTS:
         return nvme_effects_log(n, len, off, req);
+    case NVME_LOG_CHANGED_ZONE_LIST:
+        return nvme_changed_zone_info(n, len, off, req);
     default:
         trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -3359,7 +3503,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
         if (((n->temperature >= n->features.temp_thresh_hi) ||
             (n->temperature <= n->features.temp_thresh_low)) &&
             NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) {
-            nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
+            nvme_enqueue_event(n, NULL, NVME_AER_TYPE_SMART,
                                NVME_AER_INFO_SMART_TEMP_THRESH,
                                NVME_LOG_SMART_INFO);
         }
@@ -3924,7 +4068,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            " sqid=%"PRIu32", ignoring", qid);
 
             if (n->outstanding_aers) {
-                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR,
                                    NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
                                    NVME_LOG_ERROR_INFO);
             }
@@ -3941,7 +4085,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            qid, new_head);
 
             if (n->outstanding_aers) {
-                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR,
                                    NVME_AER_INFO_ERR_INVALID_DB_VALUE,
                                    NVME_LOG_ERROR_INFO);
             }
@@ -3978,7 +4122,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            " sqid=%"PRIu32", ignoring", qid);
 
             if (n->outstanding_aers) {
-                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR,
                                    NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
                                    NVME_LOG_ERROR_INFO);
             }
@@ -3995,7 +4139,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            qid, new_tail);
 
             if (n->outstanding_aers) {
-                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR,
                                    NVME_AER_INFO_ERR_INVALID_DB_VALUE,
                                    NVME_LOG_ERROR_INFO);
             }
@@ -4286,6 +4430,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     id->mdts = n->params.mdts;
     id->ver = cpu_to_le32(NVME_SPEC_VER);
     id->cntrltype = 0x1;
+    id->oaes = cpu_to_le32(NVME_OAES_ZDCN);
     id->oacs = cpu_to_le16(0);
 
     /*
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 4b4f2ed7605f..c4c80644f782 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -101,6 +101,10 @@ pci_nvme_update_zone_descr(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"P
 pci_nvme_zone_zrm_transition(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t from, uint8_t to) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" from 0x%"PRIx8" to 0x%"PRIx8""
 pci_nvme_zone_zrm_candidate(uint16_t cid, uint32_t nsid, uint64_t zslba, uint64_t wp, uint8_t zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" wp 0x%"PRIx64" zc 0x%"PRIx8""
 pci_nvme_zone_zrm_release_open(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
+pci_nvme_zone_zrm_release_active(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
+pci_nvme_zone_zrm_excursion_not_allowed(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
+pci_nvme_zone_changed(uint32_t nsid, uint64_t zslba) "nsid %"PRIu32" zslba 0x%"PRIx64""
+pci_nvme_zone_excursion(uint32_t nsid, uint64_t zslba, const char *zc) "nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 68dac2582b06..688ee5496168 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -778,6 +778,7 @@ typedef struct NvmeDsmRange {
 enum NvmeAsyncEventRequest {
     NVME_AER_TYPE_ERROR                     = 0,
     NVME_AER_TYPE_SMART                     = 1,
+    NVME_AER_TYPE_NOTICE                    = 2,
     NVME_AER_TYPE_IO_SPECIFIC               = 6,
     NVME_AER_TYPE_VENDOR_SPECIFIC           = 7,
     NVME_AER_INFO_ERR_INVALID_DB_REGISTER   = 0,
@@ -993,6 +994,14 @@ typedef struct NvmeZoneDescriptor {
 #define NVME_ZS(zs) (((zs) >> 4) & 0xf)
 #define NVME_ZS_SET(zs, state) ((zs) = ((state) << 4))
 
+#define NVME_CHANGED_ZONE_LIST_MAX_IDS 511
+
+typedef struct NvmeChangedZoneList {
+    uint16_t num_ids;
+    uint8_t  rsvd2[6];
+    uint64_t ids[NVME_CHANGED_ZONE_LIST_MAX_IDS];
+} NvmeChangedZoneList;
+
 #define NVME_ZA_ZFC(za)  ((za) & (1 << 0))
 #define NVME_ZA_FZR(za)  ((za) & (1 << 1))
 #define NVME_ZA_RZR(za)  ((za) & (1 << 2))
@@ -1428,5 +1437,6 @@ static inline void _nvme_check_size(void)
     QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescriptor) != 64);
     QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeChangedZoneList) != 4096);
 }
 #endif
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 10/10] hw/block/nvme: support reset/finish recommended limits
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (8 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 09/10] hw/block/nvme: allow zone excursions Klaus Jensen
@ 2020-06-30 10:01 ` Klaus Jensen
  2020-06-30 12:59 ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Niklas Cassel
  10 siblings, 0 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 10:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Klaus Jensen, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Add the rrl and frl device parameters. The parameters specify the number
of seconds that may pass before the device performs an internal
operation to "clear" the Reset Zone Recommended and Finish Zone
Recommended attributes, respectively.

When the attributes are set is governed by the rrld and frld parameters
(Reset/Finish Recommended Limit Delay). The Reset Zone Recommended Delay
starts when a zone becomes full. The Finish Zone Recommended Delay
starts when the zone is first activated. When the limits are reached,
the attributes are cleared again and the process is restarted.

If zone excursions are enabled (they are by default), the device will
finish the zone when the Finish Recommended Limit is reached.
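
For illustration only - the zns.rrld/rrl/frld/frl properties (values in
seconds) are the ones added by this patch; the drive/bus/nsid wiring and
any other zoned namespace properties are assumptions based on the rest
of the series and may not match the exact syntax:

    -drive id=zns1,file=zns.img,format=raw,if=none \
    -device nvme,serial=deadbeef,id=nvme0 \
    -device nvme-ns,drive=zns1,bus=nvme0,nsid=1,zns.rrld=30,zns.rrl=60,zns.frld=30,zns.frl=60

With these example values, a zone activated at t=0 gets the Finish Zone
Recommended attribute set at roughly t=30s and, with excursions enabled,
is finished by the device at roughly t=90s; similarly, a zone that
becomes full at t=0 gets the Reset Zone Recommended attribute set at
roughly t=30s and cleared again at roughly t=90s.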

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme-ns.c    | 105 ++++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme-ns.h    |  13 ++++++
 hw/block/nvme.c       |  49 +++++++++++++-------
 hw/block/nvme.h       |   7 +++
 hw/block/trace-events |   3 +-
 5 files changed, 160 insertions(+), 17 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 3b9fa91c7af8..7f9b1d526197 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -25,6 +25,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
 
+#include "trace.h"
 #include "nvme.h"
 #include "nvme-ns.h"
 
@@ -48,6 +49,91 @@ const char *nvme_zs_to_str(NvmeZoneState zs)
     return NULL;
 }
 
+static void nvme_ns_process_timer(void *opaque)
+{
+    NvmeNamespace *ns = opaque;
+    BusState *s = qdev_get_parent_bus(&ns->parent_obj);
+    NvmeCtrl *n = NVME(s->parent);
+    NvmeZone *zone;
+
+    trace_pci_nvme_ns_process_timer(ns->params.nsid);
+
+    int64_t next_timer = INT64_MAX, now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+    QTAILQ_FOREACH(zone, &ns->zns.resources.lru_open, lru_entry) {
+        int64_t activated_ns = now - zone->stats.activated_ns;
+        if (activated_ns < ns->zns.frld_ns) {
+            next_timer = MIN(next_timer, zone->stats.activated_ns +
+                             ns->zns.frld_ns);
+
+            break;
+        }
+
+        if (activated_ns < ns->zns.frld_ns + ns->zns.frl_ns) {
+            NVME_ZA_SET_FZR(zone->zd.za, 0x1);
+            nvme_zone_changed(n, ns, zone);
+
+            next_timer = MIN(next_timer, now + ns->zns.frl_ns);
+
+            continue;
+        }
+
+        if (zone->wp_staging != le64_to_cpu(zone->zd.wp)) {
+            next_timer = now + 500;
+            continue;
+        }
+
+        nvme_zone_excursion(n, ns, zone, NULL);
+    }
+
+    QTAILQ_FOREACH(zone, &ns->zns.resources.lru_active, lru_entry) {
+        int64_t activated_ns = now - zone->stats.activated_ns;
+        if (activated_ns < ns->zns.frld_ns) {
+            next_timer = MIN(next_timer, zone->stats.activated_ns +
+                             ns->zns.frld_ns);
+
+            break;
+        }
+
+        if (activated_ns < ns->zns.frld_ns + ns->zns.frl_ns) {
+            NVME_ZA_SET_FZR(zone->zd.za, 0x1);
+            nvme_zone_changed(n, ns, zone);
+
+            next_timer = MIN(next_timer, now + ns->zns.frl_ns);
+
+            continue;
+        }
+
+        nvme_zone_excursion(n, ns, zone, NULL);
+    }
+
+    QTAILQ_FOREACH(zone, &ns->zns.lru_finished, lru_entry) {
+        int64_t finished_ns = now - zone->stats.finished_ns;
+        if (finished_ns < ns->zns.rrld_ns) {
+            next_timer = MIN(next_timer, zone->stats.finished_ns +
+                             ns->zns.rrld_ns);
+
+            break;
+        }
+
+        if (finished_ns < ns->zns.rrld_ns + ns->zns.rrl_ns) {
+            NVME_ZA_SET_RZR(zone->zd.za, 0x1);
+            nvme_zone_changed(n, ns, zone);
+
+            next_timer = MIN(next_timer, now + ns->zns.rrl_ns);
+
+            nvme_zone_changed(n, ns, zone);
+            continue;
+        }
+
+        NVME_ZA_SET_RZR(zone->zd.za, 0x0);
+    }
+
+    if (next_timer != INT64_MAX) {
+        timer_mod(ns->zns.timer, next_timer);
+    }
+}
+
 static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp)
 {
 	Error *local_err = NULL;
@@ -262,6 +348,21 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 
     id_ns->ncap = ns->zns.info.num_zones * ns->params.zns.zcap;
 
+    id_ns_zns->rrl = ns->params.zns.rrl;
+    id_ns_zns->frl = ns->params.zns.frl;
+
+    if (ns->params.zns.rrl || ns->params.zns.frl) {
+        ns->zns.rrl_ns = ns->params.zns.rrl * NANOSECONDS_PER_SECOND;
+        ns->zns.rrld_ns = ns->params.zns.rrld * NANOSECONDS_PER_SECOND;
+        ns->zns.frl_ns = ns->params.zns.frl * NANOSECONDS_PER_SECOND;
+        ns->zns.frld_ns = ns->params.zns.frld * NANOSECONDS_PER_SECOND;
+
+        ns->zns.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                     nvme_ns_process_timer, ns);
+
+        QTAILQ_INIT(&ns->zns.lru_finished);
+    }
+
     id_ns_zns->mar = cpu_to_le32(ns->params.zns.mar);
     id_ns_zns->mor = cpu_to_le32(ns->params.zns.mor);
 
@@ -515,6 +616,10 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0),
     DEFINE_PROP_UINT32("zns.mar", NvmeNamespace, params.zns.mar, 0xffffffff),
     DEFINE_PROP_UINT32("zns.mor", NvmeNamespace, params.zns.mor, 0xffffffff),
+    DEFINE_PROP_UINT32("zns.rrl", NvmeNamespace, params.zns.rrl, 0),
+    DEFINE_PROP_UINT32("zns.frl", NvmeNamespace, params.zns.frl, 0),
+    DEFINE_PROP_UINT32("zns.rrld", NvmeNamespace, params.zns.rrld, 0),
+    DEFINE_PROP_UINT32("zns.frld", NvmeNamespace, params.zns.frld, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 6acda5c2cf3f..f92045f19948 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,10 @@ typedef struct NvmeNamespaceParams {
         uint16_t ozcs;
         uint32_t mar;
         uint32_t mor;
+        uint32_t rrl;
+        uint32_t frl;
+        uint32_t rrld;
+        uint32_t frld;
     } zns;
 } NvmeNamespaceParams;
 
@@ -40,6 +44,11 @@ typedef struct NvmeZone {
 
     uint64_t wp_staging;
 
+    struct {
+        int64_t activated_ns;
+        int64_t finished_ns;
+    } stats;
+
     QTAILQ_ENTRY(NvmeZone) lru_entry;
 } NvmeZone;
 
@@ -77,6 +86,10 @@ typedef struct NvmeNamespace {
         } resources;
 
         NvmeChangedZoneList changed_list;
+
+        QTAILQ_HEAD(, NvmeZone) lru_finished;
+        QEMUTimer *timer;
+        int64_t rrl_ns, rrld_ns, frl_ns, frld_ns;
     } zns;
 } NvmeNamespace;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6db6daa62bc5..f28373feb887 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -875,13 +875,13 @@ static void nvme_process_aers(void *opaque)
     }
 }
 
-static void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns,
-                               uint8_t event_type, uint8_t event_info,
-                               uint8_t log_page)
+void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns, uint8_t event_type,
+                        uint8_t event_info, uint8_t log_page)
 {
     NvmeAsyncEvent *event;
 
-    trace_pci_nvme_enqueue_event(event_type, event_info, log_page);
+    trace_pci_nvme_enqueue_event(ns ? ns->params.nsid : -1, event_type,
+                                 event_info, log_page);
 
     if (n->aer_queued == n->params.aer_max_queued) {
         trace_pci_nvme_enqueue_event_noqueue(n->aer_queued);
@@ -1194,7 +1194,7 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
     nvme_req_add_aio(req, aio);
 }
 
-static void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
+void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
 {
     uint16_t num_ids = le16_to_cpu(ns->zns.changed_list.num_ids);
 
@@ -1213,12 +1213,8 @@ static void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
                        NVME_LOG_CHANGED_ZONE_LIST);
 }
 
-static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
-                                    NvmeZone *zone, NvmeZoneState to,
-                                    NvmeRequest *req);
-
-static void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
-    NvmeRequest *req)
+void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                         NvmeRequest *req)
 {
     trace_pci_nvme_zone_excursion(ns->params.nsid, nvme_zslba(zone),
                                   nvme_zs_str(zone));
@@ -1226,6 +1222,7 @@ static void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
     assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req) == NVME_SUCCESS);
 
     NVME_ZA_SET_ZFC(zone->zd.za, 0x1);
+    NVME_ZA_SET_FZR(zone->zd.za, 0x0);
 
     nvme_zone_changed(n, ns, zone);
 
@@ -1333,9 +1330,8 @@ out:
  * The function does NOT change the Zone Attribute field; this must be done by
  * the caller.
  */
-static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
-                                    NvmeZone *zone, NvmeZoneState to,
-                                    NvmeRequest *req)
+uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                             NvmeZoneState to, NvmeRequest *req)
 {
     NvmeZoneState from = nvme_zs(zone);
     uint16_t status;
@@ -1366,7 +1362,7 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
 
             QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
-            goto out;
+            goto activated;
 
         case NVME_ZS_ZSIO:
         case NVME_ZS_ZSEO:
@@ -1389,7 +1385,7 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
 
             QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry);
 
-            goto out;
+            goto activated;
 
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
@@ -1512,8 +1508,28 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
         return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
+activated:
+    zone->stats.activated_ns = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+    if (ns->params.zns.frld && !timer_pending(ns->zns.timer)) {
+        int64_t next_timer = zone->stats.activated_ns + ns->zns.frld_ns;
+        timer_mod(ns->zns.timer, next_timer);
+    }
+
 out:
     nvme_zs_set(zone, to);
+
+    if (to == NVME_ZS_ZSF && ns->params.zns.rrld) {
+        QTAILQ_INSERT_TAIL(&ns->zns.lru_finished, zone, lru_entry);
+
+        zone->stats.finished_ns = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+        if (!timer_pending(ns->zns.timer)) {
+            int64_t next_timer = zone->stats.finished_ns + ns->zns.rrld_ns;
+            timer_mod(ns->zns.timer, next_timer);
+        }
+    }
+
     return NVME_SUCCESS;
 }
 
@@ -1979,6 +1995,7 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
 
     case NVME_ZS_ZSRO:
         assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req));
+
         nvme_update_zone_info(ns, req, zone);
         return NVME_NO_COMPLETE;
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 309fb1b94ecb..e51a38546080 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -318,5 +318,12 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
 }
 
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                             NvmeZoneState to, NvmeRequest *req);
+void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns, uint8_t event_type,
+                        uint8_t event_info, uint8_t log_page);
+void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                         NvmeRequest *req);
+void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone);
 
 #endif /* HW_NVME_H */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index c4c80644f782..249487ae79fc 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -85,7 +85,7 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
 pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
-pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_enqueue_event(uint32_t nsid, uint8_t typ, uint8_t info, uint8_t log_page) "nsid 0x%"PRIx32" type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 pci_nvme_enqueue_event_noqueue(int queued) "queued %d"
 pci_nvme_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
 pci_nvme_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
@@ -105,6 +105,7 @@ pci_nvme_zone_zrm_release_active(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsi
 pci_nvme_zone_zrm_excursion_not_allowed(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_zone_changed(uint32_t nsid, uint64_t zslba) "nsid %"PRIu32" zslba 0x%"PRIx64""
 pci_nvme_zone_excursion(uint32_t nsid, uint64_t zslba, const char *zc) "nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_ns_process_timer(uint32_t nsid) "nsid %"PRIu32""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
  2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
                   ` (9 preceding siblings ...)
  2020-06-30 10:01 ` [PATCH 10/10] hw/block/nvme: support reset/finish recommended limits Klaus Jensen
@ 2020-06-30 12:59 ` Niklas Cassel
  2020-06-30 14:09   ` Philippe Mathieu-Daudé
  2020-06-30 20:29   ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
  10 siblings, 2 replies; 24+ messages in thread
From: Niklas Cassel @ 2020-06-30 12:59 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: Kevin Wolf, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

On Tue, Jun 30, 2020 at 12:01:29PM +0200, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Hi all,

Hello Klaus,

> 
> This series adds support for TP 4056 ("Namespace Types") and TP 4053
> ("Zoned Namespaces") and is an alternative implementation to the one
> submitted by Dmitry[1].
> 
> While I don't want this to end up as a discussion about the merits of
> each version, I want to point out a couple of differences from Dmitry's
> version. At a glance, my version
> 
>   * builds on my patch series that adds fairly complete NVMe v1.4
>     mandatory support, as well as nice-to-have feature such as SGLs,
>     multiple namespaces and mostly just overall clean up. This finally
>     brings the nvme device into a fairly compliant state on which we can
>     add new features. I've tried hard to get these compliance and
>     clean-up patches merged for a long time (in parallel with developing
>     the emulation of NST and ZNS) and I would be really sad to see them
>     by-passed since they have already been through many iterations and
>     already carries Acked- and Reviewed-by's for the bulk of the
>     patches. I think the nvme device is already in a "frankenstate" wrt.
>     the implemented nvme version and the features it currently supports,
>     so I think this kind of cleanup is long overdue.
> 
>   * uses an attached blockdev and standard blk_aio for persistent zone
>     info. This is the same method used in our patches for Write
>     Uncorrectable and (separate and extended lba) metadata support, but
>     I've left those optional features out for now to ease the review
>     process.
> 
>   * relies on the universal dulbe support added in ("hw/block/nvme: add
>     support for dulbe") and sparse images for handling reads in gaps
>     (above write pointer and below ZSZE); that is - the size of the
>     underlying blockdev is in terms of ZSZE, not ZCAP
> 
>   * the controller uses timers to autonomously finish zones (wrt. FRL)

AFAICT, Dmitry's patches do this as well.

> 
> I've been on paternity leave for a month, so I havn't been around to
> review Dmitry's patches, but I have started that process now. I would
> also be happy to work with Dmitry & Friends on merging our versions to
> get the best of both worlds if it makes sense.
> 
> This series and all preparatory patch sets (the ones I've been posting
> yesterday and today) are available on my GitHub[2]. Unfortunately
> Patchew got screwed up in the middle of me sending patches and it never
> picked up v2 of "hw/block/nvme: support multiple namespaces" because it
> was getting late and I made a mistake with the CC's. So my posted series
> don't apply according to Patchew, but they actually do if you follow the
> Based-on's (... or just grab [2]).
> 
> 
>   [1]: Message-Id: <20200617213415.22417-1-dmitry.fomichev@wdc.com>
>   [2]: https://github.com/birkelund/qemu/tree/for-master/nvme
> 
> 
> Based-on: <20200630043122.1307043-1-its@irrelevant.dk>
> ("[PATCH 0/3] hw/block/nvme: bump to v1.4")

Is this the only patch series that this series depends on?

In the beginning of the cover letter, you mentioned
"NVMe v1.4 mandatory support", "SGLs", "multiple namespaces",
and "and mostly just overall clean up".

> 
> Klaus Jensen (10):
>   hw/block/nvme: support I/O Command Sets
>   hw/block/nvme: add zns specific fields and types
>   hw/block/nvme: add basic read/write for zoned namespaces
>   hw/block/nvme: add the zone management receive command
>   hw/block/nvme: add the zone management send command
>   hw/block/nvme: add the zone append command
>   hw/block/nvme: track and enforce zone resources
>   hw/block/nvme: allow open to close transitions by controller
>   hw/block/nvme: allow zone excursions
>   hw/block/nvme: support reset/finish recommended limits
> 
>  block/nvme.c          |    6 +-
>  hw/block/nvme-ns.c    |  397 +++++++++-
>  hw/block/nvme-ns.h    |  148 +++-
>  hw/block/nvme.c       | 1676 +++++++++++++++++++++++++++++++++++++++--
>  hw/block/nvme.h       |   76 +-
>  hw/block/trace-events |   43 +-
>  include/block/nvme.h  |  252 ++++++-
>  7 files changed, 2469 insertions(+), 129 deletions(-)
> 
> -- 
> 2.27.0
> 

I think that you have done a great job getting the NVMe
driver out of a frankenstate, and made it compliant with
a proper spec (NVMe 1.4).

I'm also a big fan of the refactoring so that the driver
handles more than one namespace, and the new bus model.

I know that you first sent your
"nvme: support NVMe v1.3d, SGLs and multiple namespaces"
patch series in July last year.

Looking at your outstanding patch series on patchwork:
https://patchwork.kernel.org/project/qemu-devel/list/?submitter=188679

(Feel free to correct me if I have misunderstood anything.)

I see that these are related to your patch series from July last year:
hw/block/nvme: bump to v1.3
hw/block/nvme: support scatter gather lists
hw/block/nvme: support multiple namespaces
hw/block/nvme: bump to v1.4


This patch series seems minor and could probably be merged immediately:
hw/block/nvme: handle transient dma errors


This patch series looks a bit weird:
hw/block/nvme: AIO and address mapping refactoring

Since it looks like a V1 post, and was first posted yesterday.
However, 2 out of the 17 patches in it are Acked-by: Keith.
(Perhaps some of your previously posted patches was put inside
this new patch series?)


This patch series:
hw/block/nvme: namespace types and zoned namespaces

Which was first posted today. Up until earlier today, we haven't seen
any patches from you related to ZNS (only overall NVMe cleanups).
Dmitry's ZNS patches have been on the list since 2020-06-16.


Just a friendly suggestion, how about:

1) We get your

hw/block/nvme: bump to v1.3
hw/block/nvme: support scatter gather lists
hw/block/nvme: support multiple namespaces
hw/block/nvme: bump to v1.4

patch series merged.

2) We get Dmitry's patch series merged.

Shared 4:th) If there is any feature that you miss in Dmitry's patch series,
perhaps you could send patches to add what you are missing.

Shared 4:th) Your other patch series:
hw/block/nvme: AIO and address mapping refactoring could get merged.


Please don't take this suggestion the wrong way, I'm simply trying
to come up with a way to move forward from here.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
  2020-06-30 12:59 ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Niklas Cassel
@ 2020-06-30 14:09   ` Philippe Mathieu-Daudé
  2020-06-30 15:42     ` Keith Busch
  2020-06-30 20:29   ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
  1 sibling, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-06-30 14:09 UTC (permalink / raw)
  To: Niklas Cassel, Klaus Jensen
  Cc: Kevin Wolf, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Matias Bjorling

On 6/30/20 2:59 PM, Niklas Cassel wrote:
> On Tue, Jun 30, 2020 at 12:01:29PM +0200, Klaus Jensen wrote:
>> From: Klaus Jensen <k.jensen@samsung.com>
>>
>> Hi all,
> 
> Hello Klaus,
> 
>>
>> This series adds support for TP 4056 ("Namespace Types") and TP 4053
>> ("Zoned Namespaces") and is an alternative implementation to the one
>> submitted by Dmitry[1].
>>
>> While I don't want this to end up as a discussion about the merits of
>> each version, I want to point out a couple of differences from Dmitry's
>> version. At a glance, my version
>>
>>   * builds on my patch series that adds fairly complete NVMe v1.4
>>     mandatory support, as well as nice-to-have feature such as SGLs,
>>     multiple namespaces and mostly just overall clean up. This finally
>>     brings the nvme device into a fairly compliant state on which we can
>>     add new features. I've tried hard to get these compliance and
>>     clean-up patches merged for a long time (in parallel with developing
>>     the emulation of NST and ZNS) and I would be really sad to see them
>>     by-passed since they have already been through many iterations and
>>     already carries Acked- and Reviewed-by's for the bulk of the
>>     patches. I think the nvme device is already in a "frankenstate" wrt.
>>     the implemented nvme version and the features it currently supports,
>>     so I think this kind of cleanup is long overdue.
>>
>>   * uses an attached blockdev and standard blk_aio for persistent zone
>>     info. This is the same method used in our patches for Write
>>     Uncorrectable and (separate and extended lba) metadata support, but
>>     I've left those optional features out for now to ease the review
>>     process.
>>
>>   * relies on the universal dulbe support added in ("hw/block/nvme: add
>>     support for dulbe") and sparse images for handling reads in gaps
>>     (above write pointer and below ZSZE); that is - the size of the
>>     underlying blockdev is in terms of ZSZE, not ZCAP
>>
>>   * the controller uses timers to autonomously finish zones (wrt. FRL)
> 
> AFAICT, Dmitry's patches does this as well.
> 
>>
>> I've been on paternity leave for a month, so I havn't been around to
>> review Dmitry's patches, but I have started that process now. I would
>> also be happy to work with Dmitry & Friends on merging our versions to
>> get the best of both worlds if it makes sense.
>>
>> This series and all preparatory patch sets (the ones I've been posting
>> yesterday and today) are available on my GitHub[2]. Unfortunately
>> Patchew got screwed up in the middle of me sending patches and it never
>> picked up v2 of "hw/block/nvme: support multiple namespaces" because it
>> was getting late and I made a mistake with the CC's. So my posted series
>> don't apply according to Patchew, but they actually do if you follow the
>> Based-on's (... or just grab [2]).
>>
>>
>>   [1]: Message-Id: <20200617213415.22417-1-dmitry.fomichev@wdc.com>
>>   [2]: https://github.com/birkelund/qemu/tree/for-master/nvme
>>
>>
>> Based-on: <20200630043122.1307043-1-its@irrelevant.dk>
>> ("[PATCH 0/3] hw/block/nvme: bump to v1.4")
> 
> Is this the only patch series that this series depends on?
> 
> In the beginning of the cover letter, you mentioned
> "NVMe v1.4 mandatory support", "SGLs", "multiple namespaces",
> and "and mostly just overall clean up".
> 
>>
>> Klaus Jensen (10):
>>   hw/block/nvme: support I/O Command Sets
>>   hw/block/nvme: add zns specific fields and types
>>   hw/block/nvme: add basic read/write for zoned namespaces
>>   hw/block/nvme: add the zone management receive command
>>   hw/block/nvme: add the zone management send command
>>   hw/block/nvme: add the zone append command
>>   hw/block/nvme: track and enforce zone resources
>>   hw/block/nvme: allow open to close transitions by controller
>>   hw/block/nvme: allow zone excursions
>>   hw/block/nvme: support reset/finish recommended limits
>>
>>  block/nvme.c          |    6 +-
>>  hw/block/nvme-ns.c    |  397 +++++++++-
>>  hw/block/nvme-ns.h    |  148 +++-
>>  hw/block/nvme.c       | 1676 +++++++++++++++++++++++++++++++++++++++--
>>  hw/block/nvme.h       |   76 +-
>>  hw/block/trace-events |   43 +-
>>  include/block/nvme.h  |  252 ++++++-
>>  7 files changed, 2469 insertions(+), 129 deletions(-)
>>
>> -- 
>> 2.27.0
>>
> 
> I think that you have done a great job getting the NVMe
> driver out of a frankenstate, and made it compliant with
> a proper spec (NVMe 1.4).
> 
> I'm also a big fan of the refactoring so that the driver
> handles more than one namespace, and the new bus model.
> 
> I know that you first sent your
> "nvme: support NVMe v1.3d, SGLs and multiple namespaces"
> patch series July, last year.
> 
> Looking at your outstanding patch series on patchwork:
> https://patchwork.kernel.org/project/qemu-devel/list/?submitter=188679
> 
> (Feel free to correct me if I have misunderstood anything.)
> 
> I see that these are related to your patch series from July last year:
> hw/block/nvme: bump to v1.3
> hw/block/nvme: support scatter gather lists
> hw/block/nvme: support multiple namespaces
> hw/block/nvme: bump to v1.4
> 
> 
> This patch series seems minor and could probably be merged immediately:
> hw/block/nvme: handle transient dma errors
> 
> 
> This patch series looks a bit weird:
> hw/block/nvme: AIO and address mapping refactoring
> 
> Since it looks like a V1 post, and was first posted yesterday.
> However, 2 out of the 17 patches in are Acked-by: Keith.
> (Perhaps some of your previously posted patches was put inside
> this new patch series?)
> 
> 
> This patch series:
> hw/block/nvme: namespace types and zoned namespaces
> 
> Which was first posted today. Up until earlier today, we haven't seen
> any patches from you related to ZNS (only overall NVMe cleanups).
> Dmitry's ZNS patches have been on the list since 2020-06-16.
> 
> 
> Just a friendly suggestion, how about:
> 
> 1) We get your
> 
> hw/block/nvme: bump to v1.3
> hw/block/nvme: support scatter gather lists
> hw/block/nvme: support multiple namespaces
> hw/block/nvme: bump to v1.4
> 
> patch series merged.
> 
> 2) We get Dmitry's patch series merged.
> 
> Shared 4:th) If there is any feature that you miss in Dmitry's patch series,
> perhaps you could send patches to add what you are missing.
> 
> Shared 4:th) Your other patch series:
> hw/block/nvme: AIO and address mapping refactoring could get merged.
> 
> 
> Please don't take this suggestion the wrong way, I'm simply trying
> to come up with a way to move forward from here.

A few months ago Klaus sent a bomb series with ~80 patches; we asked
him to split it into digestible series of ~20 patches.

Earlier in this cover Klaus provided a link to his git repository
with all the patches sorted [2]:
https://github.com/birkelund/qemu/tree/for-master/nvme

This seems enough to get the big picture.

Niklas Cassel, it would be helpful if you or Dmitry could review
Klaus's patches. I see Klaus is already reviewing Dmitry's.

Both Keith and Kevin are quite busy recently.

To help them, I suggest that once you have reviewed each other's
patches, one of you sends the big series with all patches together.

Anyway soft-freeze is next week, so you have to decide what is
critical.

What I see doable for the following days is:
- hw/block/nvme: Fix I/O BAR structure [3]
- hw/block/nvme: handle transient dma errors
- hw/block/nvme: bump to v1.3

[3] https://www.mail-archive.com/qemu-devel@nongnu.org/msg718086.html

> 
> 
> Kind regards,
> Niklas
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
  2020-06-30 14:09   ` Philippe Mathieu-Daudé
@ 2020-06-30 15:42     ` Keith Busch
  2020-06-30 20:36       ` Klaus Jensen
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2020-06-30 15:42 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, qemu-block,
	Dmitry Fomichev, Klaus Jensen, qemu-devel, Max Reitz,
	Klaus Jensen, Javier Gonzalez, Maxim Levitsky, Matias Bjorling

On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
> What I see doable for the following days is:
> - hw/block/nvme: Fix I/O BAR structure [3]
> - hw/block/nvme: handle transient dma errors
> - hw/block/nvme: bump to v1.3


These look like sensible patches to rebase future work on, IMO. The 1.3
updates had been prepared a while ago, at least.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
  2020-06-30 12:59 ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Niklas Cassel
  2020-06-30 14:09   ` Philippe Mathieu-Daudé
@ 2020-06-30 20:29   ` Klaus Jensen
  2020-07-01  1:10     ` Dmitry Fomichev
  1 sibling, 1 reply; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 20:29 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Kevin Wolf, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

On Jun 30 12:59, Niklas Cassel wrote:
> On Tue, Jun 30, 2020 at 12:01:29PM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Hi all,
> 
> Hello Klaus,
> 

Hi Niklas,

> > 
> >   * the controller uses timers to autonomously finish zones (wrt. FRL)
> 
> AFAICT, Dmitry's patches does this as well.
> 

Hmm, yeah. Something is going on at least. It's not really clear to me
why it works or what is happening with that admin completion queue
timer, but I'll dig through it.

> > 
> > I've been on paternity leave for a month, so I havn't been around to
> > review Dmitry's patches, but I have started that process now. I would
> > also be happy to work with Dmitry & Friends on merging our versions to
> > get the best of both worlds if it makes sense.
> > 
> > This series and all preparatory patch sets (the ones I've been posting
> > yesterday and today) are available on my GitHub[2]. Unfortunately
> > Patchew got screwed up in the middle of me sending patches and it never
> > picked up v2 of "hw/block/nvme: support multiple namespaces" because it
> > was getting late and I made a mistake with the CC's. So my posted series
> > don't apply according to Patchew, but they actually do if you follow the
> > Based-on's (... or just grab [2]).
> > 
> > 
> >   [1]: Message-Id: <20200617213415.22417-1-dmitry.fomichev@wdc.com>
> >   [2]: https://github.com/birkelund/qemu/tree/for-master/nvme
> > 
> > 
> > Based-on: <20200630043122.1307043-1-its@irrelevant.dk>
> > ("[PATCH 0/3] hw/block/nvme: bump to v1.4")
> 
> Is this the only patch series that this series depends on?
> 
> In the beginning of the cover letter, you mentioned
> "NVMe v1.4 mandatory support", "SGLs", "multiple namespaces",
> and "and mostly just overall clean up".
> 

No, it's a string of series that each has a Based-on tag (that is,
"[PATCH 0/3] hw/block/nvme: bump to v1.4" has another Based-on tag that
points to its own dependency). The point was to have Patchew nicely
apply everything, but it broke midway...

As Philippe pointed out, all of the patch sets are integrated in the
GitHub tree, applied to QEMU master.

> 
> I think that you have done a great job getting the NVMe
> driver out of a frankenstate, and made it compliant with
> a proper spec (NVMe 1.4).
> 
> I'm also a big fan of the refactoring so that the driver
> handles more than one namespace, and the new bus model.
> 

Well, thanks! :)

> I know that you first sent your
> "nvme: support NVMe v1.3d, SGLs and multiple namespaces"
> patch series July, last year.
> 
> Looking at your outstanding patch series on patchwork:
> https://patchwork.kernel.org/project/qemu-devel/list/?submitter=188679
> 
> (Feel free to correct me if I have misunderstood anything.)
> 
> I see that these are related to your patch series from July last year:
> hw/block/nvme: bump to v1.3
> hw/block/nvme: support scatter gather lists
> hw/block/nvme: support multiple namespaces
> hw/block/nvme: bump to v1.4
> 

Yeah this stuff has been around for a while so the history on patchwork
is a mess.

> 
> This patch series seems minor and could probably be merged immediately:
> hw/block/nvme: handle transient dma errors
> 

Sure, but it's nicer in combination with the previous series
("hw/block/nvme: AIO and address mapping refactoring"). What I /can/ do
is rip out "hw/block/nvme: allow multiple aios per command" as that
patch might require more time for reviews. The rest of that series is
clean-ups and a couple of bug fixes.

> 
> This patch series looks a bit weird:
> hw/block/nvme: AIO and address mapping refactoring
> 
> Since it looks like a V1 post, and was first posted yesterday.
> However, 2 out of the 17 patches in are Acked-by: Keith.
> (Perhaps some of your previously posted patches was put inside
> this new patch series?)
> 

Yes, that, and reviewers requested a lot of separation, so basically
the patch set ballooned.

> 
> This patch series:
> hw/block/nvme: namespace types and zoned namespaces
> 
> Which was first posted today. Up until earlier today, we haven't seen
> any patches from you related to ZNS (only overall NVMe cleanups).
> Dmitry's ZNS patches have been on the list since 2020-06-16.
> 

Yeah, as I mentioned in my cover letter, I was on leave, so I wasn't
around for the big ZNS release day either. But, honestly, I think this
is irrelevant - code should be merged based on technical reasons (not
technicalities).

> 
> Just a friendly suggestion, how about:
> 
> 1) We get your
> 
> hw/block/nvme: bump to v1.3
> hw/block/nvme: support scatter gather lists
> hw/block/nvme: support multiple namespaces
> hw/block/nvme: bump to v1.4
> 
> patch series merged.
> 

Blowing my own horn here, but yeah, it seems like everyone would like to
see this merged.

> 2) We get Dmitry's patch series merged.
> 
> Shared 4:th) If there is any feature that you miss in Dmitry's patch series,
> perhaps you could send patches to add what you are missing.
>

Looks like the two versions are pretty much on par in terms of features.

> Shared 4:th) Your other patch series:
> hw/block/nvme: AIO and address mapping refactoring could get merged.
> 

As stated above, I think it's only a single commit ("hw/block/nvme:
allow multiple aios per command") that is controversial in that series.

> 
> Please don't take this suggestion the wrong way, I'm simply trying
> to come up with a way to move forward from here.
> 

Absolutely - I totally get that you want to move forward with Dmitry's
series, but I'd like to finish my review before committing to anything.


Cheers,
Klaus


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
  2020-06-30 15:42     ` Keith Busch
@ 2020-06-30 20:36       ` Klaus Jensen
  2020-07-01 10:34         ` nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces) Kevin Wolf
  0 siblings, 1 reply; 24+ messages in thread
From: Klaus Jensen @ 2020-06-30 20:36 UTC (permalink / raw)
  To: Keith Busch
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, qemu-block,
	Dmitry Fomichev, Klaus Jensen, qemu-devel, Max Reitz,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

On Jun 30 08:42, Keith Busch wrote:
> On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
> > What I see doable for the following days is:
> > - hw/block/nvme: Fix I/O BAR structure [3]
> > - hw/block/nvme: handle transient dma errors
> > - hw/block/nvme: bump to v1.3
> 
> 
> These look like sensible patches to rebase future work on, IMO. The 1.3
> updates had been prepared a while ago, at least.

I think Philippe's "hw/block/nvme: Fix I/O BAR structure" series is a
no-brainer. It just needs to get in asap.

The "hw/block/nvme: handle transient dma errors" series would really
benefit from most of the patches in my "hw/block/nvme: AIO and address
mapping refactoring" series. The elephant in the room is the AIO part
(the "hw/block/nvme: allow multiple aios per command" patch), so I will
drop it, keep the cleanup patches, and post the rest as a new series
together with the "handle transient dma errors" fixes. This would make
it a series of around 17-18 patches, but I think they are all
quite reviewable.

The bump to v1.3 should also pretty much be ready for merging.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces
  2020-06-30 20:29   ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
@ 2020-07-01  1:10     ` Dmitry Fomichev
  0 siblings, 0 replies; 24+ messages in thread
From: Dmitry Fomichev @ 2020-07-01  1:10 UTC (permalink / raw)
  To: Klaus Jensen, Niklas Cassel
  Cc: Kevin Wolf, Damien Le Moal, qemu-block, Klaus Jensen, qemu-devel,
	Max Reitz, Keith Busch, Javier Gonzalez, Maxim Levitsky,
	Philippe Mathieu-Daudé,
	Matias Bjorling



> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Tuesday, June 30, 2020 4:30 PM
> To: Niklas Cassel <Niklas.Cassel@wdc.com>
> Cc: qemu-block@nongnu.org; Klaus Jensen <k.jensen@samsung.com>;
> qemu-devel@nongnu.org; Keith Busch <kbusch@kernel.org>; Max Reitz
> <mreitz@redhat.com>; Kevin Wolf <kwolf@redhat.com>; Javier Gonzalez
> <javier.gonz@samsung.com>; Maxim Levitsky <mlevitsk@redhat.com>;
> Philippe Mathieu-Daudé <philmd@redhat.com>; Dmitry Fomichev
> <Dmitry.Fomichev@wdc.com>; Damien Le Moal
> <Damien.LeMoal@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned
> namespaces
> 
> On Jun 30 12:59, Niklas Cassel wrote:
> > On Tue, Jun 30, 2020 at 12:01:29PM +0200, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > >
> > > Hi all,
> >
> > Hello Klaus,
> >
> 
> Hi Niklas,
> 
> > >
> > >   * the controller uses timers to autonomously finish zones (wrt. FRL)
> >
> > AFAICT, Dmitry's patches does this as well.
> >
> 
> Hmm, yeah. Something is going on at least. It's not really clear to me
> why it works or what is happening with that admin completion queue
> timer, but I'll dig through it.
> 
> > >
> > > I've been on paternity leave for a month, so I havn't been around to
> > > review Dmitry's patches, but I have started that process now. I would
> > > also be happy to work with Dmitry & Friends on merging our versions to
> > > get the best of both worlds if it makes sense.
> > >
> > > This series and all preparatory patch sets (the ones I've been posting
> > > yesterday and today) are available on my GitHub[2]. Unfortunately
> > > Patchew got screwed up in the middle of me sending patches and it
> never
> > > picked up v2 of "hw/block/nvme: support multiple namespaces" because
> it
> > > was getting late and I made a mistake with the CC's. So my posted series
> > > don't apply according to Patchew, but they actually do if you follow the
> > > Based-on's (... or just grab [2]).
> > >
> > >
> > >   [1]: Message-Id: <20200617213415.22417-1-dmitry.fomichev@wdc.com>
> > >   [2]: https://github.com/birkelund/qemu/tree/for-master/nvme
> > >
> > >
> > > Based-on: <20200630043122.1307043-1-its@irrelevant.dk>
> > > ("[PATCH 0/3] hw/block/nvme: bump to v1.4")
> >
> > Is this the only patch series that this series depends on?
> >
> > In the beginning of the cover letter, you mentioned
> > "NVMe v1.4 mandatory support", "SGLs", "multiple namespaces",
> > and "and mostly just overall clean up".
> >
> 
> No, its a string of series that all has a Based-on tag (that is, "[PATCH
> 0/3] hw/block/nvme: bump to v1.4" has another Based-on tag that points
> to the dependency of that). The point was to have patchew nicely apply
> everything, but it broke midway...
> 
> As Philippe pointed out, all of the patch sets are integrated in the
> GitHub tree, applied to QEMU master.
> 
> >
> > I think that you have done a great job getting the NVMe
> > driver out of a frankenstate, and made it compliant with
> > a proper spec (NVMe 1.4).
> >
> > I'm also a big fan of the refactoring so that the driver
> > handles more than one namespace, and the new bus model.
> >
> 
> Well, thanks! :)
> 
> > I know that you first sent your
> > "nvme: support NVMe v1.3d, SGLs and multiple namespaces"
> > patch series July, last year.
> >
> > Looking at your outstanding patch series on patchwork:
> > https://patchwork.kernel.org/project/qemu-devel/list/?submitter=188679
> >
> > (Feel free to correct me if I have misunderstood anything.)
> >
> > I see that these are related to your patch series from July last year:
> > hw/block/nvme: bump to v1.3
> > hw/block/nvme: support scatter gather lists
> > hw/block/nvme: support multiple namespaces
> > hw/block/nvme: bump to v1.4
> >
> 
> Yeah this stuff has been around for a while so the history on patchwork
> is a mess.
> 
> >
> > This patch series seems minor and could probably be merged immediately:
> > hw/block/nvme: handle transient dma errors
> >
> 
> Sure, but it's nicer in combination with the previous series
> ("hw/block/nvme: AIO and address mapping refactoring"). What I /can/ do
> is rip out "hw/block/nvme: allow multiple aios per command" as that
> patch might require more time for reviews. The rest of that series are
> clean ups and a couple of bug fixes.
> 
> >
> > This patch series looks a bit weird:
> > hw/block/nvme: AIO and address mapping refactoring
> >
> > Since it looks like a V1 post, and was first posted yesterday.
> > However, 2 out of the 17 patches in are Acked-by: Keith.
> > (Perhaps some of your previously posted patches was put inside
> > this new patch series?)
> >
> 
> Yes that and reviewers requested a lot of separation, so basically the
> patch set ballooned.
> 
> >
> > This patch series:
> > hw/block/nvme: namespace types and zoned namespaces
> >
> > Which was first posted today. Up until earlier today, we haven't seen
> > any patches from you related to ZNS (only overall NVMe cleanups).
> > Dmitry's ZNS patches have been on the list since 2020-06-16.
> >
> 
> Yeah, as I mentioned in my cover letter, I was on leave, so I wasn't
> around for the big ZNS release day either. But, honestly, I think this
> is irrelevant - code should be merged based on technical reasons (not
> technicalities).
> 
> >
> > Just a friendly suggestion, how about:
> >
> > 1) We get your
> >
> > hw/block/nvme: bump to v1.3
> > hw/block/nvme: support scatter gather lists
> > hw/block/nvme: support multiple namespaces
> > hw/block/nvme: bump to v1.4
> >
> > patch series merged.
> >
> 
> Blowing my own horn here, but yeah, it seems like everyone would like to
> see this merged.
> 
> > 2) We get Dmitry's patch series merged.
> >
> > Shared 4:th) If there is any feature that you miss in Dmitry's patch series,
> > perhaps you could send patches to add what you are missing.
> >
> 
> Looks like the two version are pretty much on par in terms of features.
> 
> > Shared 4:th) Your other patch series:
> > hw/block/nvme: AIO and address mapping refactoring could get merged.
> >
> 
> As stated above I think its only a single commit ("hw/block/nvme: allow
> multiple aios per command") that is controversial in that series.
> 
> >
> > Please don't take this suggestion the wrong way, I'm simply trying
> > to come up with a way to move forward from here.
> >
> 
> Absolutely - I totally get that you want to move forward with Dmitry's
> series, but I'd like to finish my review before committing to anything.
> 

Klaus,

I see that the ZNS series from WDC is pretty much orthogonal to most of
your patches from the "bump to 1.3"/"bump to 1.4" series... with a few
notable exceptions - I had to add support for the Get Log Page command,
the Command Effects Log and AERs to fulfill ZNS protocol requirements.
I understand that you already have that functionality staged in your
"bump to 1.3" patchset and beyond, and I don't insist that the
implementations of these features from our ZNS series necessarily be
used. Niklas' plan appears to be quite productive - for us to rebase on
top of "bump to 1.3..1.4", review those patches in the process, apply
our series, and then incorporate some of the ideas from your
ZNS/NSTypes patchset on top of that. This way, very little work will go
to waste.

Cheers,
Dmitry

> 
> Cheers,
> Klaus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces)
  2020-06-30 20:36       ` Klaus Jensen
@ 2020-07-01 10:34         ` Kevin Wolf
  2020-07-01 13:18           ` Klaus Jensen
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2020-07-01 10:34 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: Niklas Cassel, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Philippe Mathieu-Daudé,
	Matias Bjorling

Am 30.06.2020 um 22:36 hat Klaus Jensen geschrieben:
> On Jun 30 08:42, Keith Busch wrote:
> > On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
> > > What I see doable for the following days is:
> > > - hw/block/nvme: Fix I/O BAR structure [3]
> > > - hw/block/nvme: handle transient dma errors
> > > - hw/block/nvme: bump to v1.3
> > 
> > 
> > These look like sensible patches to rebase future work on, IMO. The 1.3
> > updates had been prepared a while ago, at least.
> 
> I think Philippe's "hw/block/nvme: Fix I/O BAR structure" series is a
> no-brainer. It just needs to get in asap.

I think we need to talk about how nvme patches are supposed to get
merged. I'm not familiar with the hardware nor the code, so the model
was that I just blindly merge patches that Keith has reviewed/acked,
just to spare him the work to prepare a pull request. But obviously, we
started doing things this way when there was a lot less activity around
the nvme emulation.

If we find that this doesn't scale any more, maybe we need to change
something. Depending on how much time Keith can spend on review in the
near future and how much control he wants to keep over the development,
I could imagine adding Klaus to MAINTAINERS, either as a co-maintainer
or as a reviewer. Then I could rely on reviews/acks from either of you
for merging series.

Of course, the patches don't necessarily have to go through my tree
either if this only serves to complicate things these days. If sending
separate pull requests directly to Peter would make things easier, I
certainly wouldn't object.

Kevin



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces)
  2020-07-01 10:34         ` nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces) Kevin Wolf
@ 2020-07-01 13:18           ` Klaus Jensen
  2020-07-01 13:29             ` Maxim Levitsky
  2020-07-01 13:57             ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 24+ messages in thread
From: Klaus Jensen @ 2020-07-01 13:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Niklas Cassel, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Andrzej Jakowski,
	Keith Busch, Javier Gonzalez, Maxim Levitsky,
	Philippe Mathieu-Daudé,
	Matias Bjorling

On Jul  1 12:34, Kevin Wolf wrote:
> Am 30.06.2020 um 22:36 hat Klaus Jensen geschrieben:
> > On Jun 30 08:42, Keith Busch wrote:
> > > On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
> > > > What I see doable for the following days is:
> > > > - hw/block/nvme: Fix I/O BAR structure [3]
> > > > - hw/block/nvme: handle transient dma errors
> > > > - hw/block/nvme: bump to v1.3
> > > 
> > > 
> > > These look like sensible patches to rebase future work on, IMO. The 1.3
> > > updates had been prepared a while ago, at least.
> > 
> > I think Philippe's "hw/block/nvme: Fix I/O BAR structure" series is a
> > no-brainer. It just needs to get in asap.
> 
> I think we need to talk about how nvme patches are supposed to get
> merged. I'm not familiar with the hardware nor the code, so the model
> was that I just blindly merge patches that Keith has reviewed/acked,
> just to spare him the work to prepare a pull request. But obviously, we
> started doing things this way when there was a lot less activity around
> the nvme emulation.
> 
> If we find that this doesn't scale any more, maybe we need to change
> something.

Honestly, I do not think the current model has worked very well for some
time, especially for larger series, where I, for one, have felt that my
work was largely ignored due to a lack of designated reviewers. Things
only picked up when Beata, Maxim and Philippe started reviewing my
series - maybe out of pity or because I was bombing the list, I don't
know ;)

We've also seen good patches from Andrzej linger on the list for quite a
while, prompting a number of RESENDs. I only recently allocated more
time and upped my review game, but I hope that contributors feel that
stuff gets reviewed in a timely fashion by now.

Please understand that this is in NO WAY a criticism of Keith, who
already made it very clear to me that he did not have a lot of time to
review, but could only ack the odd patch.

> Depending on how much time Keith can spend on review in the
> near future and how much control he wants to keep over the development,
> I could imagine adding Klaus to MAINTAINERS, either as a co-maintainer
> or as a reviewer. Then I could rely on reviews/acks from either of you
> for merging series.
> 

I would be happy to step up (officially) to help maintain the device
with Keith and review on a daily basis, and my position can support
this.

> Of course, the patches don't necessarily have to go through my tree
> either if this only serves to complicate things these days. If sending
> separate pull requests directly to Peter would make things easier, I
> certainly wouldn't object.
> 

I don't think there is any reason to by-pass your tree. I think the
volume would need to increase even further for that to make sense.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces)
  2020-07-01 13:18           ` Klaus Jensen
@ 2020-07-01 13:29             ` Maxim Levitsky
  2020-07-01 13:57             ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 24+ messages in thread
From: Maxim Levitsky @ 2020-07-01 13:29 UTC (permalink / raw)
  To: Klaus Jensen, Kevin Wolf
  Cc: Niklas Cassel, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Andrzej Jakowski,
	Keith Busch, Javier Gonzalez, Philippe Mathieu-Daudé,
	Matias Bjorling

On Wed, 2020-07-01 at 15:18 +0200, Klaus Jensen wrote:
> On Jul  1 12:34, Kevin Wolf wrote:
> > Am 30.06.2020 um 22:36 hat Klaus Jensen geschrieben:
> > > On Jun 30 08:42, Keith Busch wrote:
> > > > On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
> > > > > What I see doable for the following days is:
> > > > > - hw/block/nvme: Fix I/O BAR structure [3]
> > > > > - hw/block/nvme: handle transient dma errors
> > > > > - hw/block/nvme: bump to v1.3
> > > > 
> > > > These look like sensible patches to rebase future work on, IMO. The 1.3
> > > > updates had been prepared a while ago, at least.
> > > 
> > > I think Philippe's "hw/block/nvme: Fix I/O BAR structure" series is a
> > > no-brainer. It just needs to get in asap.
> > 
> > I think we need to talk about how nvme patches are supposed to get
> > merged. I'm not familiar with the hardware nor the code, so the model
> > was that I just blindly merge patches that Keith has reviewed/acked,
> > just to spare him the work to prepare a pull request. But obviously, we
> > started doing things this way when there was a lot less activity around
> > the nvme emulation.
> > 
> > If we find that this doesn't scale any more, maybe we need to change
> > something.
> 
> Honestly, I do not think the current model has worked very well for some
> time; especially for larger series where I, for one, has felt that my
> work was largely ignored due to a lack of designated reviewers. Things
> only picked up when Beata, Maxim and Philippe started reviewing my
> series - maybe out of pity or because I was bombing the list, I don't
> know ;)
> 
> We've also seen good patches from Andrzej linger on the list for quite a
> while, prompting a number of RESENDs. I only recently allocated more
> time and upped my review game, but I hope that contributors feel that
> stuff gets reviewed in a timely fashion by now.
> 
> Please understand that this is in NO WAY a criticism of Keith, who
> already made it very clear to me that he did not have a lot of time to
> review, but only to ack the odd patch.
> 
> > Depending on how much time Keith can spend on review in the
> > near future and how much control he wants to keep over the development,
> > I could imagine adding Klaus to MAINTAINERS, either as a co-maintainer
> > or as a reviewer. Then I could rely on reviews/acks from either of you
> > for merging series.
> > 
> 
> I would be happy to step up (officially) to help maintain the device
> with Keith and review on a daily basis, and my position can support
> this.
> 
> > Of course, the patches don't necessarily have to go through my tree
> > either if this only serves to complicate things these days. If sending
> > separate pull requests directly to Peter would make things easier, I
> > certainly wouldn't object.
> > 
> 
> I don't think there is any reason to by-pass your tree. I think the
> volume would need to increase even further for that to make sense.
> 
It's my fault as well - I need to get back to reviewing these.
(I'll review a few of them today, I hope)

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces)
  2020-07-01 13:18           ` Klaus Jensen
  2020-07-01 13:29             ` Maxim Levitsky
@ 2020-07-01 13:57             ` Philippe Mathieu-Daudé
  2020-07-01 14:21               ` Keith Busch
  2020-07-02 20:29               ` nvme emulation merge process Andrzej Jakowski
  1 sibling, 2 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-01 13:57 UTC (permalink / raw)
  To: Klaus Jensen, Kevin Wolf
  Cc: Niklas Cassel, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Andrzej Jakowski,
	Keith Busch, Javier Gonzalez, Maxim Levitsky, Matias Bjorling

On 7/1/20 3:18 PM, Klaus Jensen wrote:
> On Jul  1 12:34, Kevin Wolf wrote:
>> Am 30.06.2020 um 22:36 hat Klaus Jensen geschrieben:
>>> On Jun 30 08:42, Keith Busch wrote:
>>>> On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
>>>>> What I see doable for the following days is:
>>>>> - hw/block/nvme: Fix I/O BAR structure [3]
>>>>> - hw/block/nvme: handle transient dma errors
>>>>> - hw/block/nvme: bump to v1.3
>>>>
>>>>
>>>> These look like sensible patches to rebase future work on, IMO. The 1.3
>>>> updates had been prepared a while ago, at least.
>>>
>>> I think Philippe's "hw/block/nvme: Fix I/O BAR structure" series is a
>>> no-brainer. It just needs to get in asap.
>>
>> I think we need to talk about how nvme patches are supposed to get
>> merged. I'm not familiar with the hardware nor the code, so the model
>> was that I just blindly merge patches that Keith has reviewed/acked,
>> just to spare him the work to prepare a pull request. But obviously, we
>> started doing things this way when there was a lot less activity around
>> the nvme emulation.
>>
>> If we find that this doesn't scale any more, maybe we need to change
>> something.
> 
> Honestly, I do not think the current model has worked very well for some
> time; especially for larger series where I, for one, have felt that my
> work was largely ignored due to a lack of designated reviewers. Things
> only picked up when Beata, Maxim and Philippe started reviewing my
> series - maybe out of pity or because I was bombing the list, I don't
> know ;)

I have no interest in the NVMe device emulation, but one of the first
things I noticed when I looked at the wiki, back when I wanted to send my
first patch, is the "Return the favor" paragraph:
https://wiki.qemu.org/Contribute/SubmitAPatch#Return_the_favor

 "Peer review only works if everyone chips in a bit of review time.
  If everyone submitted more patches than they reviewed, we would
  have a patch backlog. A good goal is to try to review at least as
  many patches from others as what you submit. Don't worry if you
  don't know the code base as well as a maintainer; it's perfectly
  fine to admit when your review is weak because you are unfamiliar
  with the code."

So, as some have reviewed my patches, I try to return the favor to the
community, in particular when I see someone is stuck waiting for
review and the patch topic is an area I can understand.

I don't see that as an "out of pity" reaction.

Note, it is true that bomb series scare reviewers. You learned that
the hard way. But you can see that, after resending the first part of
your "bomb", even if it took 10 versions, the result is a great
improvement!

> We've also seen good patches from Andrzej linger on the list for quite a
> while, prompting a number of RESENDs. I only recently allocated more
> time and upped my review game, but I hope that contributors feel that
> stuff gets reviewed in a timely fashion by now.
> 
> Please understand that this is in NO WAY a criticism of Keith, who
> already made it very clear to me that he did not have a lot of time to
> review, but only to ack the odd patch.
> 
>> Depending on how much time Keith can spend on review in the
>> near future and how much control he wants to keep over the development,
>> I could imagine adding Klaus to MAINTAINERS, either as a co-maintainer
>> or as a reviewer. Then I could rely on reviews/acks from either of you
>> for merging series.
>>
> 
> I would be happy to step up (officially) to help maintain the device
> with Keith and review on a daily basis, and my position can support
> this.

Sounds good to me, but it is up to Keith Busch to accept.

It would be nice to have at least one developer from WDC listed as a
designated reviewer too.

Maxim is a candidate for designated reviewer, but I think he doesn't
have the time.

It would also be nice to have Andrzej Jakowski listed, if he is interested.

> 
>> Of course, the patches don't necessarily have to go through my tree
>> either if this only serves to complicate things these days. If sending
>> separate pull requests directly to Peter would make things easier, I
>> certainly wouldn't object.
>>
> 
> I don't think there is any reason to by-pass your tree. I think the
> volume would need to increase even further for that to make sense.
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces)
  2020-07-01 13:57             ` Philippe Mathieu-Daudé
@ 2020-07-01 14:21               ` Keith Busch
  2020-07-02 20:29               ` nvme emulation merge process Andrzej Jakowski
  1 sibling, 0 replies; 24+ messages in thread
From: Keith Busch @ 2020-07-01 14:21 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, qemu-block,
	Dmitry Fomichev, Klaus Jensen, qemu-devel, Max Reitz,
	Andrzej Jakowski, Klaus Jensen, Javier Gonzalez, Maxim Levitsky,
	Matias Bjorling

On Wed, Jul 01, 2020 at 03:57:27PM +0200, Philippe Mathieu-Daudé wrote:
> On 7/1/20 3:18 PM, Klaus Jensen wrote:
> > We've also seen good patches from Andrzej linger on the list for quite a
> > while, prompting a number of RESENDs. I only recently allocated more
> > time and upped my review game, but I hope that contributors feel that
> > stuff gets reviewed in a timely fashion by now.
> > 
> > Please understand that this is in NO WAY a criticism of Keith, who
> > already made it very clear to me that he did not have a lot of time to
> > review, but only to ack the odd patch.
> > 
> >> Depending on how much time Keith can spend on review in the
> >> near future and how much control he wants to keep over the development,
> >> I could imagine adding Klaus to MAINTAINERS, either as a co-maintainer
> >> or as a reviewer. Then I could rely on reviews/acks from either of you
> >> for merging series.
> >>
> > 
> > I would be happy to step up (officially) to help maintain the device
> > with Keith and review on a daily basis, and my position can support
> > this.
> 
> Sounds good to me, but it is up to Keith Busch to accept.

I definitely want to continue at least having the opportunity to review,
though you may have noticed I am a bit low on time for the more thorough
maintenance this project deserves. The recent development pace for nvme
would benefit from having its own tree, so I'm open to either
co-maintenance or handing this off to others. Please allow me to send a
few queries off-list today to check in with potentially interested parties.
 
> It would be nice to have at least one developer from WDC listed as a
> designated reviewer too.
> 
> Maxim is a candidate for designated reviewer, but I think he doesn't
> have the time.
> 
> It would also be nice to have Andrzej Jakowski listed, if he is interested.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: nvme emulation merge process
  2020-07-01 13:57             ` Philippe Mathieu-Daudé
  2020-07-01 14:21               ` Keith Busch
@ 2020-07-02 20:29               ` Andrzej Jakowski
  2020-07-02 21:13                 ` Keith Busch
  1 sibling, 1 reply; 24+ messages in thread
From: Andrzej Jakowski @ 2020-07-02 20:29 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Klaus Jensen, Kevin Wolf
  Cc: Niklas Cassel, Damien Le Moal, qemu-block, Dmitry Fomichev,
	Klaus Jensen, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez, Maxim Levitsky, Matias Bjorling

On 7/1/20 6:57 AM, Philippe Mathieu-Daudé wrote:
> On 7/1/20 3:18 PM, Klaus Jensen wrote:
>> On Jul  1 12:34, Kevin Wolf wrote:
>>> Am 30.06.2020 um 22:36 hat Klaus Jensen geschrieben:
>>>> On Jun 30 08:42, Keith Busch wrote:
>>>>> On Tue, Jun 30, 2020 at 04:09:46PM +0200, Philippe Mathieu-Daudé wrote:
>>>>>> What I see doable for the following days is:
>>>>>> - hw/block/nvme: Fix I/O BAR structure [3]
>>>>>> - hw/block/nvme: handle transient dma errors
>>>>>> - hw/block/nvme: bump to v1.3
>>>>>
>>>>>
>>>>> These look like sensible patches to rebase future work on, IMO. The 1.3
>>>>> updates had been prepared a while ago, at least.
>>>>
>>>> I think Philippe's "hw/block/nvme: Fix I/O BAR structure" series is a
>>>> no-brainer. It just needs to get in asap.
>>>
>>> I think we need to talk about how nvme patches are supposed to get
>>> merged. I'm not familiar with the hardware nor the code, so the model
>>> was that I just blindly merge patches that Keith has reviewed/acked,
>>> just to spare him the work to prepare a pull request. But obviously, we
>>> started doing things this way when there was a lot less activity around
>>> the nvme emulation.
>>>
>>> If we find that this doesn't scale any more, maybe we need to change
>>> something.
>>
>> Honestly, I do not think the current model has worked very well for some
>> time; especially for larger series where I, for one, have felt that my
>> work was largely ignored due to a lack of designated reviewers. Things
>> only picked up when Beata, Maxim and Philippe started reviewing my
>> series - maybe out of pity or because I was bombing the list, I don't
>> know ;)
> 
> I have no interest in the NVMe device emulation, but one of the first
> things I noticed when I looked at the wiki, back when I wanted to send my
> first patch, is the "Return the favor" paragraph:
> https://wiki.qemu.org/Contribute/SubmitAPatch#Return_the_favor
> 
>  "Peer review only works if everyone chips in a bit of review time.
>   If everyone submitted more patches than they reviewed, we would
>   have a patch backlog. A good goal is to try to review at least as
>   many patches from others as what you submit. Don't worry if you
>   don't know the code base as well as a maintainer; it's perfectly
>   fine to admit when your review is weak because you are unfamiliar
>   with the code."
> 
> So, as some have reviewed my patches, I try to return the favor to the
> community, in particular when I see someone is stuck waiting for
> review and the patch topic is an area I can understand.
> 
> I don't see that as an "out of pity" reaction.
> 
> Note, it is true that bomb series scare reviewers. You learned that
> the hard way. But you can see that, after resending the first part of
> your "bomb", even if it took 10 versions, the result is a great
> improvement!
> 
>> We've also seen good patches from Andrzej linger on the list for quite a
>> while, prompting a number of RESENDs. I only recently allocated more
>> time and upped my review game, but I hope that contributors feel that
>> stuff gets reviewed in a timely fashion by now.
>>
>> Please understand that this is in NO WAY a criticism of Keith, who
>> already made it very clear to me that he did not have a lot of time to
>> review, but only to ack the odd patch.
>>
>>> Depending on how much time Keith can spend on review in the
>>> near future and how much control he wants to keep over the development,
>>> I could imagine adding Klaus to MAINTAINERS, either as a co-maintainer
>>> or as a reviewer. Then I could rely on reviews/acks from either of you
>>> for merging series.
>>>
>>
>> I would be happy to step up (officially) to help maintain the device
>> with Keith and review on a daily basis, and my position can support
>> this.
> 
> Sounds good to me, but it is up to Keith Busch to accept.
> 
> It would be nice to have at least one developer from WDC listed as a
> designated reviewer too.
> 
> Maxim is a candidate for designated reviewer, but I think he doesn't
> have the time.
> 
> It would also be nice to have Andrzej Jakowski listed, if he is interested.

Thx! Of course I am interested in helping, and I think it is actually a great
idea to have a couple of designated maintainers/reviewers, as it would be easier
for folks to receive feedback rather than having to request it in a polling manner :)
And please don't get me wrong -- I'm not complaining about anything -- I
think it is just reality that everybody is stretched in multiple directions,
struggling to allocate time for multiple things. Having many people will
actually increase the likelihood of introducing high-quality improvements.

Also, +1 on a separate tree for nvme emulation.

> 
>>
>>> Of course, the patches don't necessarily have to go through my tree
>>> either if this only serves to complicate things these days. If sending
>>> separate pull requests directly to Peter would make things easier, I
>>> certainly wouldn't object.
>>>
>>
>> I don't think there is any reason to by-pass your tree. I think the
>> volume would need to increase even further for that to make sense.
>>
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: nvme emulation merge process
  2020-07-02 20:29               ` nvme emulation merge process Andrzej Jakowski
@ 2020-07-02 21:13                 ` Keith Busch
  0 siblings, 0 replies; 24+ messages in thread
From: Keith Busch @ 2020-07-02 21:13 UTC (permalink / raw)
  To: Andrzej Jakowski
  Cc: Kevin Wolf, Niklas Cassel, Damien Le Moal, qemu-block,
	Dmitry Fomichev, Klaus Jensen, qemu-devel, Max Reitz,
	Klaus Jensen, Javier Gonzalez, Maxim Levitsky,
	Philippe Mathieu-Daudé,
	Matias Bjorling

On Thu, Jul 02, 2020 at 01:29:26PM -0700, Andrzej Jakowski wrote:
> 
> Thx! Of course I am interested in helping, and I think it is actually a great
> idea to have a couple of designated maintainers/reviewers, as it would be easier
> for folks to receive feedback rather than having to request it in a polling manner :)
> And please don't get me wrong -- I'm not complaining about anything -- I
> think it is just reality that everybody is stretched in multiple directions,
> struggling to allocate time for multiple things. Having many people will
> actually increase the likelihood of introducing high-quality improvements.
> 
> Also, +1 on a separate tree for nvme emulation.

Thanks for your help.

Klaus and I will be setting up an external tree for qemu-nvme
development (tentatively on git.infradead.org) and pull requests. I'm
just waiting for the server admin to upload our public keys. If I don't
hear back by Monday, I will use an alternate server in the interim.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-07-02 21:14 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-30 10:01 [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
2020-06-30 10:01 ` [PATCH 01/10] hw/block/nvme: support I/O Command Sets Klaus Jensen
2020-06-30 10:01 ` [PATCH 02/10] hw/block/nvme: add zns specific fields and types Klaus Jensen
2020-06-30 10:01 ` [PATCH 03/10] hw/block/nvme: add basic read/write for zoned namespaces Klaus Jensen
2020-06-30 10:01 ` [PATCH 04/10] hw/block/nvme: add the zone management receive command Klaus Jensen
2020-06-30 10:01 ` [PATCH 05/10] hw/block/nvme: add the zone management send command Klaus Jensen
2020-06-30 10:01 ` [PATCH 06/10] hw/block/nvme: add the zone append command Klaus Jensen
2020-06-30 10:01 ` [PATCH 07/10] hw/block/nvme: track and enforce zone resources Klaus Jensen
2020-06-30 10:01 ` [PATCH 08/10] hw/block/nvme: allow open to close transitions by controller Klaus Jensen
2020-06-30 10:01 ` [PATCH 09/10] hw/block/nvme: allow zone excursions Klaus Jensen
2020-06-30 10:01 ` [PATCH 10/10] hw/block/nvme: support reset/finish recommended limits Klaus Jensen
2020-06-30 12:59 ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Niklas Cassel
2020-06-30 14:09   ` Philippe Mathieu-Daudé
2020-06-30 15:42     ` Keith Busch
2020-06-30 20:36       ` Klaus Jensen
2020-07-01 10:34         ` nvme emulation merge process (was: Re: [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces) Kevin Wolf
2020-07-01 13:18           ` Klaus Jensen
2020-07-01 13:29             ` Maxim Levitsky
2020-07-01 13:57             ` Philippe Mathieu-Daudé
2020-07-01 14:21               ` Keith Busch
2020-07-02 20:29               ` nvme emulation merge process Andrzej Jakowski
2020-07-02 21:13                 ` Keith Busch
2020-06-30 20:29   ` [PATCH 00/10] hw/block/nvme: namespace types and zoned namespaces Klaus Jensen
2020-07-01  1:10     ` Dmitry Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).