qemu-devel.nongnu.org archive mirror
* [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces
@ 2020-03-16 14:28 Klaus Jensen
  2020-03-16 14:28 ` [PATCH v6 01/42] nvme: rename trace events to nvme_dev Klaus Jensen
From: Klaus Jensen @ 2020-03-16 14:28 UTC
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Hi,

So this patchset kinda blew up in size (in number of patches, 26 -> 42)
after Maxim's review, but his comments about splitting up a bunch of the
patches made a lot of sense.

v6 primarily splits up the big nasty patches into more digestible parts.
Specifically, the 'nvme: refactor prp mapping' and 'nvme: allow multiple
aios per command' patches have been split up according to Maxim's
comments. Most additions to the shared include/block/nvme.h have also
been consolidated into a single patch (also according to Maxim's
comments). A lot of the patches still carry a 'Reviewed-by', but
git-backport-diff reports some changes due to changes/additions in some
of the early patches.

The only real "addition" is a new "max_ioqpairs" parameter for the
device. It addresses some confusion about the current "num_queues"
parameter, which counts the admin queue pair in addition to the I/O
queue pairs. See "nvme: add max_ioqpairs device parameter".
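For reference, nvme_get_feature() in patch 01 below reports the Number
of Queues feature as (n->num_queues - 2) | ((n->num_queues - 2) << 16),
i.e. num_queues - 1 I/O queue pairs encoded 0's based. A minimal sketch
of that relationship, assuming max_ioqpairs corresponds to
num_queues - 1; both helper names are hypothetical and this is not the
code from the series:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the legacy "num_queues" value also counts the
 * admin queue pair, so the number of I/O queue pairs is one less. */
static uint32_t ioqpairs_from_num_queues(uint32_t num_queues)
{
    assert(num_queues >= 2); /* admin pair plus at least one I/O pair */
    return num_queues - 1;
}

/* Hypothetical helper: build the Number of Queues feature result
 * (NSQA in bits 15:0, NCQA in bits 31:16, both 0's based) from a
 * count of I/O queue pairs. */
static uint32_t numq_result(uint32_t ioqpairs)
{
    uint32_t zero_based;

    assert(ioqpairs >= 1 && ioqpairs <= 0x10000);
    zero_based = ioqpairs - 1;
    return zero_based | (zero_based << 16);
}
```

With num_queues=64 this yields 63 I/O queue pairs and the value 62 in
both halves of the result, which matches what the current code returns.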

Maxim, I responded to your comments in the original thread and I believe
that all of them have been addressed.

Also, I *did* change the line indentation style - I hope I caught 'em
all :)


Klaus Jensen (42):
  nvme: rename trace events to nvme_dev
  nvme: remove superfluous breaks
  nvme: move device parameters to separate struct
  nvme: bump spec data structures to v1.3
  nvme: use constant for identify data size
  nvme: add identify cns values in header
  nvme: refactor nvme_addr_read
  nvme: add support for the abort command
  nvme: add max_ioqpairs device parameter
  nvme: refactor device realization
  nvme: add temperature threshold feature
  nvme: add support for the get log page command
  nvme: add support for the asynchronous event request command
  nvme: add missing mandatory features
  nvme: additional tracing
  nvme: make sure ncqr and nsqr is valid
  nvme: add log specific field to trace events
  nvme: support identify namespace descriptor list
  nvme: enforce valid queue creation sequence
  nvme: provide the mandatory subnqn field
  nvme: bump supported version to v1.3
  nvme: memset preallocated requests structures
  nvme: add mapping helpers
  nvme: remove redundant has_sg member
  nvme: refactor dma read/write
  nvme: pass request along for tracing
  nvme: add request mapping helper
  nvme: verify validity of prp lists in the cmb
  nvme: refactor request bounds checking
  nvme: add check for mdts
  nvme: add check for prinfo
  nvme: allow multiple aios per command
  nvme: use preallocated qsg/iov in nvme_dma_prp
  pci: pass along the return value of dma_memory_rw
  nvme: handle dma errors
  nvme: add support for scatter gather lists
  nvme: refactor identify active namespace id list
  nvme: support multiple namespaces
  pci: allocate pci id for nvme
  nvme: change controller pci id
  nvme: remove redundant NvmeCmd pointer parameter
  nvme: make lba data size configurable

 MAINTAINERS            |    1 +
 block/nvme.c           |   18 +-
 docs/specs/nvme.txt    |   25 +
 docs/specs/pci-ids.txt |    1 +
 hw/block/Makefile.objs |    2 +-
 hw/block/nvme-ns.c     |  162 ++++
 hw/block/nvme-ns.h     |   62 ++
 hw/block/nvme.c        | 2041 ++++++++++++++++++++++++++++++++--------
 hw/block/nvme.h        |  205 +++-
 hw/block/trace-events  |  206 ++--
 hw/core/machine.c      |    1 +
 include/block/nvme.h   |  178 +++-
 include/hw/pci/pci.h   |    4 +-
 13 files changed, 2347 insertions(+), 559 deletions(-)
 create mode 100644 docs/specs/nvme.txt
 create mode 100644 hw/block/nvme-ns.c
 create mode 100644 hw/block/nvme-ns.h

-- 
2.25.1



* [PATCH v6 01/42] nvme: rename trace events to nvme_dev
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:36   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 02/42] nvme: remove superfluous breaks Klaus Jensen
From: Klaus Jensen @ 2020-03-16 14:28 UTC
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Change the prefix of all nvme device related trace events to 'nvme_dev'
so that they do not clash with trace events from the nvme block driver.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c       | 188 +++++++++++++++++++++---------------------
 hw/block/trace-events | 172 +++++++++++++++++++-------------------
 2 files changed, 180 insertions(+), 180 deletions(-)
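To illustrate what the prefix buys in practice: QEMU can enable trace
events by name pattern (e.g. -trace 'nvme_*'), so with a shared nvme_
prefix a pattern aimed at the device also enables the block driver's
events. The sketch below uses POSIX fnmatch(3) as a stand-in for QEMU's
own pattern matching (an assumption, not QEMU's implementation). The
device event names come from this patch; "nvme_kick" is only an
illustrative stand-in for a block driver event.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Device event names from this patch, plus one illustrative name
 * standing in for an event from the nvme block driver. */
static const char *const events[] = {
    "nvme_dev_irq_msix",
    "nvme_dev_dma_read",
    "nvme_kick", /* illustrative block driver event, not from this patch */
};

/* Return how many of the events above a glob-style pattern selects. */
static int count_enabled(const char *pattern)
{
    int n = 0;

    assert(pattern != NULL);
    for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
        if (fnmatch(pattern, events[i], 0) == 0) {
            n++;
        }
    }
    return n;
}
```

Here count_enabled("nvme_*") selects all three events, while
count_enabled("nvme_dev_*") selects only the two device events.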

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf377..3e4b18956ed2 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -112,16 +112,16 @@ static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
         if (msix_enabled(&(n->parent_obj))) {
-            trace_nvme_irq_msix(cq->vector);
+            trace_nvme_dev_irq_msix(cq->vector);
             msix_notify(&(n->parent_obj), cq->vector);
         } else {
-            trace_nvme_irq_pin();
+            trace_nvme_dev_irq_pin();
             assert(cq->cqid < 64);
             n->irq_status |= 1 << cq->cqid;
             nvme_irq_check(n);
         }
     } else {
-        trace_nvme_irq_masked();
+        trace_nvme_dev_irq_masked();
     }
 }
 
@@ -146,7 +146,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
     int num_prps = (len >> n->page_bits) + 1;
 
     if (unlikely(!prp1)) {
-        trace_nvme_err_invalid_prp();
+        trace_nvme_dev_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
     } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
                prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
@@ -160,7 +160,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
     len -= trans_len;
     if (len) {
         if (unlikely(!prp2)) {
-            trace_nvme_err_invalid_prp2_missing();
+            trace_nvme_dev_err_invalid_prp2_missing();
             goto unmap;
         }
         if (len > n->page_size) {
@@ -176,7 +176,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
 
                 if (i == n->max_prp_ents - 1 && len > n->page_size) {
                     if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
-                        trace_nvme_err_invalid_prplist_ent(prp_ent);
+                        trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
                         goto unmap;
                     }
 
@@ -189,7 +189,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                 }
 
                 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
-                    trace_nvme_err_invalid_prplist_ent(prp_ent);
+                    trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
                     goto unmap;
                 }
 
@@ -204,7 +204,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
             }
         } else {
             if (unlikely(prp2 & (n->page_size - 1))) {
-                trace_nvme_err_invalid_prp2_align(prp2);
+                trace_nvme_dev_err_invalid_prp2_align(prp2);
                 goto unmap;
             }
             if (qsg->nsg) {
@@ -252,20 +252,20 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
     QEMUIOVector iov;
     uint16_t status = NVME_SUCCESS;
 
-    trace_nvme_dma_read(prp1, prp2);
+    trace_nvme_dev_dma_read(prp1, prp2);
 
     if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
     if (qsg.nsg > 0) {
         if (unlikely(dma_buf_read(ptr, len, &qsg))) {
-            trace_nvme_err_invalid_dma();
+            trace_nvme_dev_err_invalid_dma();
             status = NVME_INVALID_FIELD | NVME_DNR;
         }
         qemu_sglist_destroy(&qsg);
     } else {
         if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
-            trace_nvme_err_invalid_dma();
+            trace_nvme_dev_err_invalid_dma();
             status = NVME_INVALID_FIELD | NVME_DNR;
         }
         qemu_iovec_destroy(&iov);
@@ -354,7 +354,7 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     uint32_t count = nlb << data_shift;
 
     if (unlikely(slba + nlb > ns->id_ns.nsze)) {
-        trace_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
+        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
         return NVME_LBA_RANGE | NVME_DNR;
     }
 
@@ -382,11 +382,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
     enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 
-    trace_nvme_rw(is_write ? "write" : "read", nlb, data_size, slba);
+    trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
 
     if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
         block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-        trace_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
+        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
         return NVME_LBA_RANGE | NVME_DNR;
     }
 
@@ -421,7 +421,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     uint32_t nsid = le32_to_cpu(cmd->nsid);
 
     if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
-        trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
+        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
@@ -435,7 +435,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_CMD_READ:
         return nvme_rw(n, ns, cmd, req);
     default:
-        trace_nvme_err_invalid_opc(cmd->opcode);
+        trace_nvme_dev_err_invalid_opc(cmd->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
 }
@@ -460,11 +460,11 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
     uint16_t qid = le16_to_cpu(c->qid);
 
     if (unlikely(!qid || nvme_check_sqid(n, qid))) {
-        trace_nvme_err_invalid_del_sq(qid);
+        trace_nvme_dev_err_invalid_del_sq(qid);
         return NVME_INVALID_QID | NVME_DNR;
     }
 
-    trace_nvme_del_sq(qid);
+    trace_nvme_dev_del_sq(qid);
 
     sq = n->sq[qid];
     while (!QTAILQ_EMPTY(&sq->out_req_list)) {
@@ -528,26 +528,26 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
     uint16_t qflags = le16_to_cpu(c->sq_flags);
     uint64_t prp1 = le64_to_cpu(c->prp1);
 
-    trace_nvme_create_sq(prp1, sqid, cqid, qsize, qflags);
+    trace_nvme_dev_create_sq(prp1, sqid, cqid, qsize, qflags);
 
     if (unlikely(!cqid || nvme_check_cqid(n, cqid))) {
-        trace_nvme_err_invalid_create_sq_cqid(cqid);
+        trace_nvme_dev_err_invalid_create_sq_cqid(cqid);
         return NVME_INVALID_CQID | NVME_DNR;
     }
     if (unlikely(!sqid || !nvme_check_sqid(n, sqid))) {
-        trace_nvme_err_invalid_create_sq_sqid(sqid);
+        trace_nvme_dev_err_invalid_create_sq_sqid(sqid);
         return NVME_INVALID_QID | NVME_DNR;
     }
     if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
-        trace_nvme_err_invalid_create_sq_size(qsize);
+        trace_nvme_dev_err_invalid_create_sq_size(qsize);
         return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
     }
     if (unlikely(!prp1 || prp1 & (n->page_size - 1))) {
-        trace_nvme_err_invalid_create_sq_addr(prp1);
+        trace_nvme_dev_err_invalid_create_sq_addr(prp1);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
     if (unlikely(!(NVME_SQ_FLAGS_PC(qflags)))) {
-        trace_nvme_err_invalid_create_sq_qflags(NVME_SQ_FLAGS_PC(qflags));
+        trace_nvme_dev_err_invalid_create_sq_qflags(NVME_SQ_FLAGS_PC(qflags));
         return NVME_INVALID_FIELD | NVME_DNR;
     }
     sq = g_malloc0(sizeof(*sq));
@@ -573,17 +573,17 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd)
     uint16_t qid = le16_to_cpu(c->qid);
 
     if (unlikely(!qid || nvme_check_cqid(n, qid))) {
-        trace_nvme_err_invalid_del_cq_cqid(qid);
+        trace_nvme_dev_err_invalid_del_cq_cqid(qid);
         return NVME_INVALID_CQID | NVME_DNR;
     }
 
     cq = n->cq[qid];
     if (unlikely(!QTAILQ_EMPTY(&cq->sq_list))) {
-        trace_nvme_err_invalid_del_cq_notempty(qid);
+        trace_nvme_dev_err_invalid_del_cq_notempty(qid);
         return NVME_INVALID_QUEUE_DEL;
     }
     nvme_irq_deassert(n, cq);
-    trace_nvme_del_cq(qid);
+    trace_nvme_dev_del_cq(qid);
     nvme_free_cq(cq, n);
     return NVME_SUCCESS;
 }
@@ -616,27 +616,27 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
     uint16_t qflags = le16_to_cpu(c->cq_flags);
     uint64_t prp1 = le64_to_cpu(c->prp1);
 
-    trace_nvme_create_cq(prp1, cqid, vector, qsize, qflags,
-                         NVME_CQ_FLAGS_IEN(qflags) != 0);
+    trace_nvme_dev_create_cq(prp1, cqid, vector, qsize, qflags,
+                             NVME_CQ_FLAGS_IEN(qflags) != 0);
 
     if (unlikely(!cqid || !nvme_check_cqid(n, cqid))) {
-        trace_nvme_err_invalid_create_cq_cqid(cqid);
+        trace_nvme_dev_err_invalid_create_cq_cqid(cqid);
         return NVME_INVALID_CQID | NVME_DNR;
     }
     if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
-        trace_nvme_err_invalid_create_cq_size(qsize);
+        trace_nvme_dev_err_invalid_create_cq_size(qsize);
         return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
     }
     if (unlikely(!prp1)) {
-        trace_nvme_err_invalid_create_cq_addr(prp1);
+        trace_nvme_dev_err_invalid_create_cq_addr(prp1);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
     if (unlikely(vector > n->num_queues)) {
-        trace_nvme_err_invalid_create_cq_vector(vector);
+        trace_nvme_dev_err_invalid_create_cq_vector(vector);
         return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
     }
     if (unlikely(!(NVME_CQ_FLAGS_PC(qflags)))) {
-        trace_nvme_err_invalid_create_cq_qflags(NVME_CQ_FLAGS_PC(qflags));
+        trace_nvme_dev_err_invalid_create_cq_qflags(NVME_CQ_FLAGS_PC(qflags));
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
@@ -651,7 +651,7 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
     uint64_t prp1 = le64_to_cpu(c->prp1);
     uint64_t prp2 = le64_to_cpu(c->prp2);
 
-    trace_nvme_identify_ctrl();
+    trace_nvme_dev_identify_ctrl();
 
     return nvme_dma_read_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
         prp1, prp2);
@@ -664,10 +664,10 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
     uint64_t prp1 = le64_to_cpu(c->prp1);
     uint64_t prp2 = le64_to_cpu(c->prp2);
 
-    trace_nvme_identify_ns(nsid);
+    trace_nvme_dev_identify_ns(nsid);
 
     if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
-        trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
+        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
@@ -687,7 +687,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
     uint16_t ret;
     int i, j = 0;
 
-    trace_nvme_identify_nslist(min_nsid);
+    trace_nvme_dev_identify_nslist(min_nsid);
 
     list = g_malloc0(data_len);
     for (i = 0; i < n->num_namespaces; i++) {
@@ -716,14 +716,14 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
     case 0x02:
         return nvme_identify_nslist(n, c);
     default:
-        trace_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns));
+        trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 }
 
 static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
 {
-    trace_nvme_setfeat_timestamp(ts);
+    trace_nvme_dev_setfeat_timestamp(ts);
 
     n->host_timestamp = le64_to_cpu(ts);
     n->timestamp_set_qemu_clock_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
@@ -756,7 +756,7 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
     /* If the host timestamp is non-zero, set the timestamp origin */
     ts.origin = n->host_timestamp ? 0x01 : 0x00;
 
-    trace_nvme_getfeat_timestamp(ts.all);
+    trace_nvme_dev_getfeat_timestamp(ts.all);
 
     return cpu_to_le64(ts.all);
 }
@@ -780,17 +780,17 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     switch (dw10) {
     case NVME_VOLATILE_WRITE_CACHE:
         result = blk_enable_write_cache(n->conf.blk);
-        trace_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
+        trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
         break;
     case NVME_NUMBER_OF_QUEUES:
         result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
-        trace_nvme_getfeat_numq(result);
+        trace_nvme_dev_getfeat_numq(result);
         break;
     case NVME_TIMESTAMP:
         return nvme_get_feature_timestamp(n, cmd);
         break;
     default:
-        trace_nvme_err_invalid_getfeat(dw10);
+        trace_nvme_dev_err_invalid_getfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
@@ -826,9 +826,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
         break;
     case NVME_NUMBER_OF_QUEUES:
-        trace_nvme_setfeat_numq((dw11 & 0xFFFF) + 1,
-                                ((dw11 >> 16) & 0xFFFF) + 1,
-                                n->num_queues - 1, n->num_queues - 1);
+        trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
+                                    ((dw11 >> 16) & 0xFFFF) + 1,
+                                    n->num_queues - 1, n->num_queues - 1);
         req->cqe.result =
             cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
         break;
@@ -838,7 +838,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         break;
 
     default:
-        trace_nvme_err_invalid_setfeat(dw10);
+        trace_nvme_dev_err_invalid_setfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
     return NVME_SUCCESS;
@@ -862,7 +862,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_ADM_CMD_GET_FEATURES:
         return nvme_get_feature(n, cmd, req);
     default:
-        trace_nvme_err_invalid_admin_opc(cmd->opcode);
+        trace_nvme_dev_err_invalid_admin_opc(cmd->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
 }
@@ -925,77 +925,77 @@ static int nvme_start_ctrl(NvmeCtrl *n)
     uint32_t page_size = 1 << page_bits;
 
     if (unlikely(n->cq[0])) {
-        trace_nvme_err_startfail_cq();
+        trace_nvme_dev_err_startfail_cq();
         return -1;
     }
     if (unlikely(n->sq[0])) {
-        trace_nvme_err_startfail_sq();
+        trace_nvme_dev_err_startfail_sq();
         return -1;
     }
     if (unlikely(!n->bar.asq)) {
-        trace_nvme_err_startfail_nbarasq();
+        trace_nvme_dev_err_startfail_nbarasq();
         return -1;
     }
     if (unlikely(!n->bar.acq)) {
-        trace_nvme_err_startfail_nbaracq();
+        trace_nvme_dev_err_startfail_nbaracq();
         return -1;
     }
     if (unlikely(n->bar.asq & (page_size - 1))) {
-        trace_nvme_err_startfail_asq_misaligned(n->bar.asq);
+        trace_nvme_dev_err_startfail_asq_misaligned(n->bar.asq);
         return -1;
     }
     if (unlikely(n->bar.acq & (page_size - 1))) {
-        trace_nvme_err_startfail_acq_misaligned(n->bar.acq);
+        trace_nvme_dev_err_startfail_acq_misaligned(n->bar.acq);
         return -1;
     }
     if (unlikely(NVME_CC_MPS(n->bar.cc) <
                  NVME_CAP_MPSMIN(n->bar.cap))) {
-        trace_nvme_err_startfail_page_too_small(
+        trace_nvme_dev_err_startfail_page_too_small(
                     NVME_CC_MPS(n->bar.cc),
                     NVME_CAP_MPSMIN(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_MPS(n->bar.cc) >
                  NVME_CAP_MPSMAX(n->bar.cap))) {
-        trace_nvme_err_startfail_page_too_large(
+        trace_nvme_dev_err_startfail_page_too_large(
                     NVME_CC_MPS(n->bar.cc),
                     NVME_CAP_MPSMAX(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOCQES(n->bar.cc) <
                  NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
-        trace_nvme_err_startfail_cqent_too_small(
+        trace_nvme_dev_err_startfail_cqent_too_small(
                     NVME_CC_IOCQES(n->bar.cc),
                     NVME_CTRL_CQES_MIN(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOCQES(n->bar.cc) >
                  NVME_CTRL_CQES_MAX(n->id_ctrl.cqes))) {
-        trace_nvme_err_startfail_cqent_too_large(
+        trace_nvme_dev_err_startfail_cqent_too_large(
                     NVME_CC_IOCQES(n->bar.cc),
                     NVME_CTRL_CQES_MAX(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOSQES(n->bar.cc) <
                  NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
-        trace_nvme_err_startfail_sqent_too_small(
+        trace_nvme_dev_err_startfail_sqent_too_small(
                     NVME_CC_IOSQES(n->bar.cc),
                     NVME_CTRL_SQES_MIN(n->bar.cap));
         return -1;
     }
     if (unlikely(NVME_CC_IOSQES(n->bar.cc) >
                  NVME_CTRL_SQES_MAX(n->id_ctrl.sqes))) {
-        trace_nvme_err_startfail_sqent_too_large(
+        trace_nvme_dev_err_startfail_sqent_too_large(
                     NVME_CC_IOSQES(n->bar.cc),
                     NVME_CTRL_SQES_MAX(n->bar.cap));
         return -1;
     }
     if (unlikely(!NVME_AQA_ASQS(n->bar.aqa))) {
-        trace_nvme_err_startfail_asqent_sz_zero();
+        trace_nvme_dev_err_startfail_asqent_sz_zero();
         return -1;
     }
     if (unlikely(!NVME_AQA_ACQS(n->bar.aqa))) {
-        trace_nvme_err_startfail_acqent_sz_zero();
+        trace_nvme_dev_err_startfail_acqent_sz_zero();
         return -1;
     }
 
@@ -1018,14 +1018,14 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
     unsigned size)
 {
     if (unlikely(offset & (sizeof(uint32_t) - 1))) {
-        NVME_GUEST_ERR(nvme_ub_mmiowr_misaligned32,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_misaligned32,
                        "MMIO write not 32-bit aligned,"
                        " offset=0x%"PRIx64"", offset);
         /* should be ignored, fall through for now */
     }
 
     if (unlikely(size < sizeof(uint32_t))) {
-        NVME_GUEST_ERR(nvme_ub_mmiowr_toosmall,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_toosmall,
                        "MMIO write smaller than 32-bits,"
                        " offset=0x%"PRIx64", size=%u",
                        offset, size);
@@ -1035,32 +1035,32 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
     switch (offset) {
     case 0xc:   /* INTMS */
         if (unlikely(msix_enabled(&(n->parent_obj)))) {
-            NVME_GUEST_ERR(nvme_ub_mmiowr_intmask_with_msix,
+            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_intmask_with_msix,
                            "undefined access to interrupt mask set"
                            " when MSI-X is enabled");
             /* should be ignored, fall through for now */
         }
         n->bar.intms |= data & 0xffffffff;
         n->bar.intmc = n->bar.intms;
-        trace_nvme_mmio_intm_set(data & 0xffffffff,
+        trace_nvme_dev_mmio_intm_set(data & 0xffffffff,
                                  n->bar.intmc);
         nvme_irq_check(n);
         break;
     case 0x10:  /* INTMC */
         if (unlikely(msix_enabled(&(n->parent_obj)))) {
-            NVME_GUEST_ERR(nvme_ub_mmiowr_intmask_with_msix,
+            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_intmask_with_msix,
                            "undefined access to interrupt mask clr"
                            " when MSI-X is enabled");
             /* should be ignored, fall through for now */
         }
         n->bar.intms &= ~(data & 0xffffffff);
         n->bar.intmc = n->bar.intms;
-        trace_nvme_mmio_intm_clr(data & 0xffffffff,
+        trace_nvme_dev_mmio_intm_clr(data & 0xffffffff,
                                  n->bar.intmc);
         nvme_irq_check(n);
         break;
     case 0x14:  /* CC */
-        trace_nvme_mmio_cfg(data & 0xffffffff);
+        trace_nvme_dev_mmio_cfg(data & 0xffffffff);
         /* Windows first sends data, then sends enable bit */
         if (!NVME_CC_EN(data) && !NVME_CC_EN(n->bar.cc) &&
             !NVME_CC_SHN(data) && !NVME_CC_SHN(n->bar.cc))
@@ -1071,42 +1071,42 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
         if (NVME_CC_EN(data) && !NVME_CC_EN(n->bar.cc)) {
             n->bar.cc = data;
             if (unlikely(nvme_start_ctrl(n))) {
-                trace_nvme_err_startfail();
+                trace_nvme_dev_err_startfail();
                 n->bar.csts = NVME_CSTS_FAILED;
             } else {
-                trace_nvme_mmio_start_success();
+                trace_nvme_dev_mmio_start_success();
                 n->bar.csts = NVME_CSTS_READY;
             }
         } else if (!NVME_CC_EN(data) && NVME_CC_EN(n->bar.cc)) {
-            trace_nvme_mmio_stopped();
+            trace_nvme_dev_mmio_stopped();
             nvme_clear_ctrl(n);
             n->bar.csts &= ~NVME_CSTS_READY;
         }
         if (NVME_CC_SHN(data) && !(NVME_CC_SHN(n->bar.cc))) {
-            trace_nvme_mmio_shutdown_set();
+            trace_nvme_dev_mmio_shutdown_set();
             nvme_clear_ctrl(n);
             n->bar.cc = data;
             n->bar.csts |= NVME_CSTS_SHST_COMPLETE;
         } else if (!NVME_CC_SHN(data) && NVME_CC_SHN(n->bar.cc)) {
-            trace_nvme_mmio_shutdown_cleared();
+            trace_nvme_dev_mmio_shutdown_cleared();
             n->bar.csts &= ~NVME_CSTS_SHST_COMPLETE;
             n->bar.cc = data;
         }
         break;
     case 0x1C:  /* CSTS */
         if (data & (1 << 4)) {
-            NVME_GUEST_ERR(nvme_ub_mmiowr_ssreset_w1c_unsupported,
+            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_ssreset_w1c_unsupported,
                            "attempted to W1C CSTS.NSSRO"
                            " but CAP.NSSRS is zero (not supported)");
         } else if (data != 0) {
-            NVME_GUEST_ERR(nvme_ub_mmiowr_ro_csts,
+            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_ro_csts,
                            "attempted to set a read only bit"
                            " of controller status");
         }
         break;
     case 0x20:  /* NSSR */
         if (data == 0x4E564D65) {
-            trace_nvme_ub_mmiowr_ssreset_unsupported();
+            trace_nvme_dev_ub_mmiowr_ssreset_unsupported();
         } else {
             /* The spec says that writes of other values have no effect */
             return;
@@ -1114,35 +1114,35 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
         break;
     case 0x24:  /* AQA */
         n->bar.aqa = data & 0xffffffff;
-        trace_nvme_mmio_aqattr(data & 0xffffffff);
+        trace_nvme_dev_mmio_aqattr(data & 0xffffffff);
         break;
     case 0x28:  /* ASQ */
         n->bar.asq = data;
-        trace_nvme_mmio_asqaddr(data);
+        trace_nvme_dev_mmio_asqaddr(data);
         break;
     case 0x2c:  /* ASQ hi */
         n->bar.asq |= data << 32;
-        trace_nvme_mmio_asqaddr_hi(data, n->bar.asq);
+        trace_nvme_dev_mmio_asqaddr_hi(data, n->bar.asq);
         break;
     case 0x30:  /* ACQ */
-        trace_nvme_mmio_acqaddr(data);
+        trace_nvme_dev_mmio_acqaddr(data);
         n->bar.acq = data;
         break;
     case 0x34:  /* ACQ hi */
         n->bar.acq |= data << 32;
-        trace_nvme_mmio_acqaddr_hi(data, n->bar.acq);
+        trace_nvme_dev_mmio_acqaddr_hi(data, n->bar.acq);
         break;
     case 0x38:  /* CMBLOC */
-        NVME_GUEST_ERR(nvme_ub_mmiowr_cmbloc_reserved,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_cmbloc_reserved,
                        "invalid write to reserved CMBLOC"
                        " when CMBSZ is zero, ignored");
         return;
     case 0x3C:  /* CMBSZ */
-        NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_cmbsz_readonly,
                        "invalid write to read only CMBSZ, ignored");
         return;
     default:
-        NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_invalid,
                        "invalid MMIO write,"
                        " offset=0x%"PRIx64", data=%"PRIx64"",
                        offset, data);
@@ -1157,12 +1157,12 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
     uint64_t val = 0;
 
     if (unlikely(addr & (sizeof(uint32_t) - 1))) {
-        NVME_GUEST_ERR(nvme_ub_mmiord_misaligned32,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiord_misaligned32,
                        "MMIO read not 32-bit aligned,"
                        " offset=0x%"PRIx64"", addr);
         /* should RAZ, fall through for now */
     } else if (unlikely(size < sizeof(uint32_t))) {
-        NVME_GUEST_ERR(nvme_ub_mmiord_toosmall,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiord_toosmall,
                        "MMIO read smaller than 32-bits,"
                        " offset=0x%"PRIx64"", addr);
         /* should RAZ, fall through for now */
@@ -1171,7 +1171,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
     if (addr < sizeof(n->bar)) {
         memcpy(&val, ptr + addr, size);
     } else {
-        NVME_GUEST_ERR(nvme_ub_mmiord_invalid_ofs,
+        NVME_GUEST_ERR(nvme_dev_ub_mmiord_invalid_ofs,
                        "MMIO read beyond last register,"
                        " offset=0x%"PRIx64", returning 0", addr);
     }
@@ -1184,7 +1184,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
     uint32_t qid;
 
     if (unlikely(addr & ((1 << 2) - 1))) {
-        NVME_GUEST_ERR(nvme_ub_db_wr_misaligned,
+        NVME_GUEST_ERR(nvme_dev_ub_db_wr_misaligned,
                        "doorbell write not 32-bit aligned,"
                        " offset=0x%"PRIx64", ignoring", addr);
         return;
@@ -1199,7 +1199,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 
         qid = (addr - (0x1000 + (1 << 2))) >> 3;
         if (unlikely(nvme_check_cqid(n, qid))) {
-            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_cq,
+            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_cq,
                            "completion queue doorbell write"
                            " for nonexistent queue,"
                            " sqid=%"PRIu32", ignoring", qid);
@@ -1208,7 +1208,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 
         cq = n->cq[qid];
         if (unlikely(new_head >= cq->size)) {
-            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_cqhead,
+            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_cqhead,
                            "completion queue doorbell write value"
                            " beyond queue size, sqid=%"PRIu32","
                            " new_head=%"PRIu16", ignoring",
@@ -1237,7 +1237,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 
         qid = (addr - 0x1000) >> 3;
         if (unlikely(nvme_check_sqid(n, qid))) {
-            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_sq,
+            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_sq,
                            "submission queue doorbell write"
                            " for nonexistent queue,"
                            " sqid=%"PRIu32", ignoring", qid);
@@ -1246,7 +1246,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 
         sq = n->sq[qid];
         if (unlikely(new_tail >= sq->size)) {
-            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_sqtail,
+            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_sqtail,
                            "submission queue doorbell write value"
                            " beyond queue size, sqid=%"PRIu32","
                            " new_tail=%"PRIu16", ignoring",
diff --git a/hw/block/trace-events b/hw/block/trace-events
index c03e80c2c9c9..ade506ea2bb2 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -29,96 +29,96 @@ hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int t
 
 # nvme.c
 # nvme traces for successful events
-nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
-nvme_irq_pin(void) "pulsing IRQ pin"
-nvme_irq_masked(void) "IRQ is masked"
-nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
-nvme_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
-nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
-nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
-nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
-nvme_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
-nvme_identify_ctrl(void) "identify controller"
-nvme_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
-nvme_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
-nvme_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
-nvme_getfeat_numq(int result) "get feature number of queues, result=%d"
-nvme_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
-nvme_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
-nvme_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
-nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
-nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
-nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
-nvme_mmio_aqattr(uint64_t data) "wrote MMIO, admin queue attributes=0x%"PRIx64""
-nvme_mmio_asqaddr(uint64_t data) "wrote MMIO, admin submission queue address=0x%"PRIx64""
-nvme_mmio_acqaddr(uint64_t data) "wrote MMIO, admin completion queue address=0x%"PRIx64""
-nvme_mmio_asqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin submission queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
-nvme_mmio_acqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin completion queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
-nvme_mmio_start_success(void) "setting controller enable bit succeeded"
-nvme_mmio_stopped(void) "cleared controller enable bit"
-nvme_mmio_shutdown_set(void) "shutdown bit set"
-nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
+nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
+nvme_dev_irq_pin(void) "pulsing IRQ pin"
+nvme_dev_irq_masked(void) "IRQ is masked"
+nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
+nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
+nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
+nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
+nvme_dev_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
+nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
+nvme_dev_identify_ctrl(void) "identify controller"
+nvme_dev_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
+nvme_dev_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
+nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
+nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
+nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
+nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
+nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
+nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
+nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
+nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
+nvme_dev_mmio_aqattr(uint64_t data) "wrote MMIO, admin queue attributes=0x%"PRIx64""
+nvme_dev_mmio_asqaddr(uint64_t data) "wrote MMIO, admin submission queue address=0x%"PRIx64""
+nvme_dev_mmio_acqaddr(uint64_t data) "wrote MMIO, admin completion queue address=0x%"PRIx64""
+nvme_dev_mmio_asqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin submission queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
+nvme_dev_mmio_acqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin completion queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
+nvme_dev_mmio_start_success(void) "setting controller enable bit succeeded"
+nvme_dev_mmio_stopped(void) "cleared controller enable bit"
+nvme_dev_mmio_shutdown_set(void) "shutdown bit set"
+nvme_dev_mmio_shutdown_cleared(void) "shutdown bit cleared"
 
 # nvme traces for error conditions
-nvme_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
-nvme_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
-nvme_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
-nvme_err_invalid_prp2_missing(void) "PRP2 is null and more data to be transferred"
-nvme_err_invalid_prp(void) "invalid PRP"
-nvme_err_invalid_ns(uint32_t ns, uint32_t limit) "invalid namespace %u not within 1-%u"
-nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8""
-nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8""
-nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64""
-nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16""
-nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16""
-nvme_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16""
-nvme_err_invalid_create_sq_size(uint16_t qsize) "failed creating submission queue, invalid qsize=%"PRIu16""
-nvme_err_invalid_create_sq_addr(uint64_t addr) "failed creating submission queue, addr=0x%"PRIx64""
-nvme_err_invalid_create_sq_qflags(uint16_t qflags) "failed creating submission queue, qflags=%"PRIu16""
-nvme_err_invalid_del_cq_cqid(uint16_t cqid) "failed deleting completion queue, cqid=%"PRIu16""
-nvme_err_invalid_del_cq_notempty(uint16_t cqid) "failed deleting completion queue, it is not empty, cqid=%"PRIu16""
-nvme_err_invalid_create_cq_cqid(uint16_t cqid) "failed creating completion queue, cqid=%"PRIu16""
-nvme_err_invalid_create_cq_size(uint16_t size) "failed creating completion queue, size=%"PRIu16""
-nvme_err_invalid_create_cq_addr(uint64_t addr) "failed creating completion queue, addr=0x%"PRIx64""
-nvme_err_invalid_create_cq_vector(uint16_t vector) "failed creating completion queue, vector=%"PRIu16""
-nvme_err_invalid_create_cq_qflags(uint16_t qflags) "failed creating completion queue, qflags=%"PRIu16""
-nvme_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx16""
-nvme_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
-nvme_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
-nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
-nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
-nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
-nvme_err_startfail_nbaracq(void) "nvme_start_ctrl failed because the admin completion queue address is null"
-nvme_err_startfail_asq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin submission queue address is misaligned: 0x%"PRIx64""
-nvme_err_startfail_acq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin completion queue address is misaligned: 0x%"PRIx64""
-nvme_err_startfail_page_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too small: log2size=%u, min=%u"
-nvme_err_startfail_page_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too large: log2size=%u, max=%u"
-nvme_err_startfail_cqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too small: log2size=%u, min=%u"
-nvme_err_startfail_cqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too large: log2size=%u, max=%u"
-nvme_err_startfail_sqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too small: log2size=%u, min=%u"
-nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too large: log2size=%u, max=%u"
-nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero"
-nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero"
-nvme_err_startfail(void) "setting controller enable bit failed"
+nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
+nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
+nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
+nvme_dev_err_invalid_prp2_missing(void) "PRP2 is null and more data to be transferred"
+nvme_dev_err_invalid_prp(void) "invalid PRP"
+nvme_dev_err_invalid_ns(uint32_t ns, uint32_t limit) "invalid namespace %u not within 1-%u"
+nvme_dev_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8""
+nvme_dev_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8""
+nvme_dev_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64""
+nvme_dev_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16""
+nvme_dev_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16""
+nvme_dev_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16""
+nvme_dev_err_invalid_create_sq_size(uint16_t qsize) "failed creating submission queue, invalid qsize=%"PRIu16""
+nvme_dev_err_invalid_create_sq_addr(uint64_t addr) "failed creating submission queue, addr=0x%"PRIx64""
+nvme_dev_err_invalid_create_sq_qflags(uint16_t qflags) "failed creating submission queue, qflags=%"PRIu16""
+nvme_dev_err_invalid_del_cq_cqid(uint16_t cqid) "failed deleting completion queue, cqid=%"PRIu16""
+nvme_dev_err_invalid_del_cq_notempty(uint16_t cqid) "failed deleting completion queue, it is not empty, cqid=%"PRIu16""
+nvme_dev_err_invalid_create_cq_cqid(uint16_t cqid) "failed creating completion queue, cqid=%"PRIu16""
+nvme_dev_err_invalid_create_cq_size(uint16_t size) "failed creating completion queue, size=%"PRIu16""
+nvme_dev_err_invalid_create_cq_addr(uint64_t addr) "failed creating completion queue, addr=0x%"PRIx64""
+nvme_dev_err_invalid_create_cq_vector(uint16_t vector) "failed creating completion queue, vector=%"PRIu16""
+nvme_dev_err_invalid_create_cq_qflags(uint16_t qflags) "failed creating completion queue, qflags=%"PRIu16""
+nvme_dev_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx16""
+nvme_dev_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
+nvme_dev_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
+nvme_dev_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
+nvme_dev_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
+nvme_dev_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
+nvme_dev_err_startfail_nbaracq(void) "nvme_start_ctrl failed because the admin completion queue address is null"
+nvme_dev_err_startfail_asq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin submission queue address is misaligned: 0x%"PRIx64""
+nvme_dev_err_startfail_acq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin completion queue address is misaligned: 0x%"PRIx64""
+nvme_dev_err_startfail_page_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too small: log2size=%u, min=%u"
+nvme_dev_err_startfail_page_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too large: log2size=%u, max=%u"
+nvme_dev_err_startfail_cqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too small: log2size=%u, min=%u"
+nvme_dev_err_startfail_cqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too large: log2size=%u, max=%u"
+nvme_dev_err_startfail_sqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too small: log2size=%u, min=%u"
+nvme_dev_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too large: log2size=%u, max=%u"
+nvme_dev_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero"
+nvme_dev_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero"
+nvme_dev_err_startfail(void) "setting controller enable bit failed"
 
 # Traces for undefined behavior
-nvme_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64""
-nvme_ub_mmiowr_toosmall(uint64_t offset, unsigned size) "MMIO write smaller than 32 bits, offset=0x%"PRIx64", size=%u"
-nvme_ub_mmiowr_intmask_with_msix(void) "undefined access to interrupt mask set when MSI-X is enabled"
-nvme_ub_mmiowr_ro_csts(void) "attempted to set a read only bit of controller status"
-nvme_ub_mmiowr_ssreset_w1c_unsupported(void) "attempted to W1C CSTS.NSSRO but CAP.NSSRS is zero (not supported)"
-nvme_ub_mmiowr_ssreset_unsupported(void) "attempted NVM subsystem reset but CAP.NSSRS is zero (not supported)"
-nvme_ub_mmiowr_cmbloc_reserved(void) "invalid write to reserved CMBLOC when CMBSZ is zero, ignored"
-nvme_ub_mmiowr_cmbsz_readonly(void) "invalid write to read only CMBSZ, ignored"
-nvme_ub_mmiowr_invalid(uint64_t offset, uint64_t data) "invalid MMIO write, offset=0x%"PRIx64", data=0x%"PRIx64""
-nvme_ub_mmiord_misaligned32(uint64_t offset) "MMIO read not 32-bit aligned, offset=0x%"PRIx64""
-nvme_ub_mmiord_toosmall(uint64_t offset) "MMIO read smaller than 32-bits, offset=0x%"PRIx64""
-nvme_ub_mmiord_invalid_ofs(uint64_t offset) "MMIO read beyond last register, offset=0x%"PRIx64", returning 0"
-nvme_ub_db_wr_misaligned(uint64_t offset) "doorbell write not 32-bit aligned, offset=0x%"PRIx64", ignoring"
-nvme_ub_db_wr_invalid_cq(uint32_t qid) "completion queue doorbell write for nonexistent queue, cqid=%"PRIu32", ignoring"
-nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion queue doorbell write value beyond queue size, cqid=%"PRIu32", new_head=%"PRIu16", ignoring"
-nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring"
-nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring"
+nvme_dev_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64""
+nvme_dev_ub_mmiowr_toosmall(uint64_t offset, unsigned size) "MMIO write smaller than 32 bits, offset=0x%"PRIx64", size=%u"
+nvme_dev_ub_mmiowr_intmask_with_msix(void) "undefined access to interrupt mask set when MSI-X is enabled"
+nvme_dev_ub_mmiowr_ro_csts(void) "attempted to set a read only bit of controller status"
+nvme_dev_ub_mmiowr_ssreset_w1c_unsupported(void) "attempted to W1C CSTS.NSSRO but CAP.NSSRS is zero (not supported)"
+nvme_dev_ub_mmiowr_ssreset_unsupported(void) "attempted NVM subsystem reset but CAP.NSSRS is zero (not supported)"
+nvme_dev_ub_mmiowr_cmbloc_reserved(void) "invalid write to reserved CMBLOC when CMBSZ is zero, ignored"
+nvme_dev_ub_mmiowr_cmbsz_readonly(void) "invalid write to read only CMBSZ, ignored"
+nvme_dev_ub_mmiowr_invalid(uint64_t offset, uint64_t data) "invalid MMIO write, offset=0x%"PRIx64", data=0x%"PRIx64""
+nvme_dev_ub_mmiord_misaligned32(uint64_t offset) "MMIO read not 32-bit aligned, offset=0x%"PRIx64""
+nvme_dev_ub_mmiord_toosmall(uint64_t offset) "MMIO read smaller than 32-bits, offset=0x%"PRIx64""
+nvme_dev_ub_mmiord_invalid_ofs(uint64_t offset) "MMIO read beyond last register, offset=0x%"PRIx64", returning 0"
+nvme_dev_ub_db_wr_misaligned(uint64_t offset) "doorbell write not 32-bit aligned, offset=0x%"PRIx64", ignoring"
+nvme_dev_ub_db_wr_invalid_cq(uint32_t qid) "completion queue doorbell write for nonexistent queue, cqid=%"PRIu32", ignoring"
+nvme_dev_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion queue doorbell write value beyond queue size, cqid=%"PRIu32", new_head=%"PRIu16", ignoring"
+nvme_dev_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring"
+nvme_dev_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring"
 
 # xen-block.c
 xen_block_realize(const char *type, uint32_t disk, uint32_t partition) "%s d%up%u"
-- 
2.25.1
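The doorbell checks in nvme_process_db() derive the queue ID from the MMIO offset. Here is a minimal standalone sketch of that arithmetic. It assumes CAP.DSTRD = 0 (a 4-byte doorbell stride), which the offsets in the hunks imply. `decode_doorbell` and `DbTarget` are hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded doorbell write (assumes CAP.DSTRD = 0). */
typedef struct DbTarget {
    bool     is_cq;   /* true: CQ head doorbell; false: SQ tail doorbell */
    uint32_t qid;     /* queue identifier */
} DbTarget;

/* Returns false for writes that are not 32-bit aligned or that fall
 * below the doorbell region at 0x1000; the device ignores those. */
static bool decode_doorbell(uint64_t addr, DbTarget *out)
{
    assert(out != NULL);

    if (addr & ((1 << 2) - 1)) {   /* same alignment test as the hunk */
        return false;
    }
    if (addr < 0x1000) {
        return false;
    }

    /* SQ y tail doorbell is at 0x1000 + 8y, CQ y head at 0x1000 + 8y + 4. */
    out->is_cq = ((addr - 0x1000) >> 2) & 1;
    if (out->is_cq) {
        out->qid = (uint32_t)((addr - (0x1000 + (1 << 2))) >> 3);
    } else {
        out->qid = (uint32_t)((addr - 0x1000) >> 3);
    }
    return true;
}
```

For example, offset 0x100c decodes to the head doorbell of completion queue 1. Offset 0x1002 is rejected as misaligned.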




* [PATCH v6 02/42] nvme: remove superfluous breaks
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
  2020-03-16 14:28 ` [PATCH v6 01/42] nvme: rename trace events to nvme_dev Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-16 14:28 ` [PATCH v6 03/42] nvme: move device parameters to separate struct Klaus Jensen
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

These break statements were left over when commit 3036a626e9ef ("nvme:
add Get/Set Feature Timestamp support") was merged. Each one follows a
return and can never be reached.
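The sketch below shows the cleaned-up shape of such a switch. The feature IDs, status values and handlers are hypothetical placeholders, not the NVMe or QEMU definitions. A case that ends in `return` needs no `break`, because control never reaches the statement after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature IDs and status codes, for illustration only. */
enum { FEAT_VWC, FEAT_NUMQ, FEAT_TIMESTAMP };
enum { STATUS_OK = 0, STATUS_INVALID_FIELD = 2 };

/* Stand-in for a dedicated feature handler. */
static uint16_t get_timestamp(uint32_t *result)
{
    *result = 42;
    return STATUS_OK;
}

static uint16_t get_feature(uint32_t fid, uint32_t *result)
{
    assert(result != NULL);

    switch (fid) {
    case FEAT_VWC:
        *result = 1;
        break;                          /* continue to the common return */
    case FEAT_NUMQ:
        *result = 0x003e003e;
        break;
    case FEAT_TIMESTAMP:
        return get_timestamp(result);   /* a break here would be dead code */
    default:
        return STATUS_INVALID_FIELD;
    }
    return STATUS_OK;
}
```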

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3e4b18956ed2..9740948b354a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -788,7 +788,6 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         break;
     case NVME_TIMESTAMP:
         return nvme_get_feature_timestamp(n, cmd);
-        break;
     default:
         trace_nvme_dev_err_invalid_getfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -832,11 +831,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         req->cqe.result =
             cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
         break;
-
     case NVME_TIMESTAMP:
         return nvme_set_feature_timestamp(n, cmd);
-        break;
-
     default:
         trace_nvme_dev_err_invalid_setfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
-- 
2.25.1




* [PATCH v6 03/42] nvme: move device parameters to separate struct
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
  2020-03-16 14:28 ` [PATCH v6 01/42] nvme: rename trace events to nvme_dev Klaus Jensen
  2020-03-16 14:28 ` [PATCH v6 02/42] nvme: remove superfluous breaks Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:36   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 04/42] nvme: bump spec data structures to v1.3 Klaus Jensen
                   ` (40 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Move the device configuration parameters to a separate struct to make
it explicit what is configurable and what is set internally.
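Here is a minimal sketch of the pattern. `DevParams`, `DevState` and `dev_validate_params` are hypothetical names, not QEMU types. The user-settable knobs live in one embedded struct, and the realize-time checks read only from that struct. This mirrors the `num_queues` and `serial` checks in nvme_realize().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: user-configurable knobs, grouped like NvmeParams. */
typedef struct DevParams {
    const char *serial;
    uint32_t    num_queues;
    uint32_t    cmb_size_mb;
} DevParams;

/* Hypothetical device state: configuration is kept apart from the
 * fields the device computes itself. */
typedef struct DevState {
    DevParams params;     /* set by the user */
    uint32_t  reg_size;   /* derived internally */
} DevState;

/* Same checks as nvme_realize(): zero queues or a missing serial is
 * rejected. Returns an error message, or NULL if the params are valid. */
static const char *dev_validate_params(const DevParams *p)
{
    assert(p != NULL);

    if (p->num_queues == 0) {
        return "num_queues can't be zero";
    }
    if (p->serial == NULL) {
        return "serial property not set";
    }
    return NULL;
}
```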

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c | 44 ++++++++++++++++++++++----------------------
 hw/block/nvme.h | 16 +++++++++++++---
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9740948b354a..b532818b4b76 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -64,12 +64,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-    return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1;
+    return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-    return cqid < n->num_queues && n->cq[cqid] != NULL ? 0 : -1;
+    return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -631,7 +631,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
         trace_nvme_dev_err_invalid_create_cq_addr(prp1);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
-    if (unlikely(vector > n->num_queues)) {
+    if (unlikely(vector > n->params.num_queues)) {
         trace_nvme_dev_err_invalid_create_cq_vector(vector);
         return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
     }
@@ -783,7 +783,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
         break;
     case NVME_NUMBER_OF_QUEUES:
-        result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+        result = cpu_to_le32((n->params.num_queues - 2) |
+                             ((n->params.num_queues - 2) << 16));
         trace_nvme_dev_getfeat_numq(result);
         break;
     case NVME_TIMESTAMP:
@@ -827,9 +828,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_NUMBER_OF_QUEUES:
         trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
                                     ((dw11 >> 16) & 0xFFFF) + 1,
-                                    n->num_queues - 1, n->num_queues - 1);
-        req->cqe.result =
-            cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+                                    n->params.num_queues - 1,
+                                    n->params.num_queues - 1);
+        req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
+                                      ((n->params.num_queues - 2) << 16));
         break;
     case NVME_TIMESTAMP:
         return nvme_set_feature_timestamp(n, cmd);
@@ -900,12 +902,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
     blk_drain(n->conf.blk);
 
-    for (i = 0; i < n->num_queues; i++) {
+    for (i = 0; i < n->params.num_queues; i++) {
         if (n->sq[i] != NULL) {
             nvme_free_sq(n->sq[i], n);
         }
     }
-    for (i = 0; i < n->num_queues; i++) {
+    for (i = 0; i < n->params.num_queues; i++) {
         if (n->cq[i] != NULL) {
             nvme_free_cq(n->cq[i], n);
         }
@@ -1308,7 +1310,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     int64_t bs_size;
     uint8_t *pci_conf;
 
-    if (!n->num_queues) {
+    if (!n->params.num_queues) {
         error_setg(errp, "num_queues can't be zero");
         return;
     }
@@ -1324,7 +1326,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         return;
     }
 
-    if (!n->serial) {
+    if (!n->params.serial) {
         error_setg(errp, "serial property not set");
         return;
     }
@@ -1341,25 +1343,25 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     pcie_endpoint_cap_init(pci_dev, 0x80);
 
     n->num_namespaces = 1;
-    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
+    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
     n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-    n->sq = g_new0(NvmeSQueue *, n->num_queues);
-    n->cq = g_new0(NvmeCQueue *, n->num_queues);
+    n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
+    n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
 
     memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
                           "nvme", n->reg_size);
     pci_register_bar(pci_dev, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
-    msix_init_exclusive_bar(pci_dev, n->num_queues, 4, NULL);
+    msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
     strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
     strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
-    strpadcpy((char *)id->sn, sizeof(id->sn), n->serial, ' ');
+    strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
     id->rab = 6;
     id->ieee[0] = 0x00;
     id->ieee[1] = 0x02;
@@ -1388,7 +1390,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     n->bar.vs = 0x00010200;
     n->bar.intmc = n->bar.intms = 0;
 
-    if (n->cmb_size_mb) {
+    if (n->params.cmb_size_mb) {
 
         NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
         NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
@@ -1399,7 +1401,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
         NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
         NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
-        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->cmb_size_mb);
+        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
 
         n->cmbloc = n->bar.cmbloc;
         n->cmbsz = n->bar.cmbsz;
@@ -1438,7 +1440,7 @@ static void nvme_exit(PCIDevice *pci_dev)
     g_free(n->cq);
     g_free(n->sq);
 
-    if (n->cmb_size_mb) {
+    if (n->params.cmb_size_mb) {
         g_free(n->cmbuf);
     }
     msix_uninit_exclusive_bar(pci_dev);
@@ -1446,9 +1448,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 
 static Property nvme_props[] = {
     DEFINE_BLOCK_PROPERTIES(NvmeCtrl, conf),
-    DEFINE_PROP_STRING("serial", NvmeCtrl, serial),
-    DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, cmb_size_mb, 0),
-    DEFINE_PROP_UINT32("num_queues", NvmeCtrl, num_queues, 64),
+    DEFINE_NVME_PROPERTIES(NvmeCtrl, params),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 557194ee1954..9957c4a200e2 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -1,7 +1,19 @@
 #ifndef HW_NVME_H
 #define HW_NVME_H
+
 #include "block/nvme.h"
 
+#define DEFINE_NVME_PROPERTIES(_state, _props) \
+    DEFINE_PROP_STRING("serial", _state, _props.serial), \
+    DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
+    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64)
+
+typedef struct NvmeParams {
+    char     *serial;
+    uint32_t num_queues;
+    uint32_t cmb_size_mb;
+} NvmeParams;
+
 typedef struct NvmeAsyncEvent {
     QSIMPLEQ_ENTRY(NvmeAsyncEvent) entry;
     NvmeAerResult result;
@@ -63,6 +75,7 @@ typedef struct NvmeCtrl {
     MemoryRegion ctrl_mem;
     NvmeBar      bar;
     BlockConf    conf;
+    NvmeParams   params;
 
     uint32_t    page_size;
     uint16_t    page_bits;
@@ -71,10 +84,8 @@ typedef struct NvmeCtrl {
     uint16_t    sqe_size;
     uint32_t    reg_size;
     uint32_t    num_namespaces;
-    uint32_t    num_queues;
     uint32_t    max_q_ents;
     uint64_t    ns_size;
-    uint32_t    cmb_size_mb;
     uint32_t    cmbsz;
     uint32_t    cmbloc;
     uint8_t     *cmbuf;
@@ -82,7 +93,6 @@ typedef struct NvmeCtrl {
     uint64_t    host_timestamp;                 /* Timestamp sent by the host */
     uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
 
-    char            *serial;
     NvmeNamespace   *namespaces;
     NvmeSQueue      **sq;
     NvmeCQueue      **cq;
-- 
2.25.1
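This patch now reads two derived values from params.num_queues. The first is the register size in nvme_realize(). The second is the Number of Queues feature result. The sketch below computes both standalone.

The helpers `round_up_pow2`, `nvme_bar_size` and `nvme_numq_result` are hypothetical. `round_up_pow2` is a local stand-in for QEMU's pow2ceil(), and the cpu_to_le32() conversion is left out.

Per the NVMe specification, dword 0 of the completion carries NSQA in bits 15:0 and NCQA in bits 31:16, both 0's based. Since num_queues also counts the admin queue, the code subtracts 2.

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the next power of two; value must be nonzero.
 * Local stand-in for QEMU's pow2ceil(). */
static uint64_t round_up_pow2(uint64_t value)
{
    uint64_t p = 1;

    assert(value != 0);
    while (p < value) {
        p <<= 1;
    }
    return p;
}

/* Same expression as reg_size in nvme_realize(): doorbells start at
 * 0x1000 and each queue needs two 4-byte doorbells. */
static uint64_t nvme_bar_size(uint32_t num_queues)
{
    return round_up_pow2(0x1004 + 2 * ((uint64_t)num_queues + 1) * 4);
}

/* Number of Queues result: the I/O queue count is num_queues - 1,
 * so its 0's based encoding is num_queues - 2 in each half. */
static uint32_t nvme_numq_result(uint32_t num_queues)
{
    assert(num_queues >= 2);   /* num_queues == 1 would wrap around */
    return (num_queues - 2) | ((num_queues - 2) << 16);
}
```

With the default num_queues=64, the register size is 0x1004 + 520 = 4620 bytes, which rounds up to 8 KiB. The feature result is 0x003e003e, meaning 63 I/O submission queues and 63 I/O completion queues.

As far as these hunks show, nvme_realize() only rejects num_queues == 0. With num_queues == 1, the unsigned subtraction would wrap, so the sketch asserts instead.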




* [PATCH v6 04/42] nvme: bump spec data structures to v1.3
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (2 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 03/42] nvme: move device parameters to separate struct Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:37   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 05/42] nvme: use constant for identify data size Klaus Jensen
                   ` (39 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Add missing fields in the Identify Controller and Identify Namespace
data structures to bring them in line with NVMe v1.3.

This also adds data structures and defines for SGL support. Because
PRP1 and PRP2 now live in the new dptr union, the nvme block driver
needs a couple of trivial changes as well.
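The union below follows the NVMe 1.3 Data Pointer (DPTR): 16 bytes that hold either PRP Entry 1 and 2 or a single SGL descriptor. `CmdDptr`, `SglDesc` and `fill_prps` are hypothetical names. They are modeled on, but not copied from, include/block/nvme.h.

`fill_prps` follows the same case split as the block/nvme.c hunk of this patch (one page, two pages, or a PRP list). It assumes the page list array itself sits at `list_iova`, so PRP2 points at its second entry. It also assumes a little-endian host and a PRP list that fits in one page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SGL descriptor layout (16 bytes). */
typedef struct SglDesc {
    uint64_t addr;
    uint32_t len;
    uint8_t  rsvd[3];
    uint8_t  type;   /* SGL Identifier: type in bits 7:4, subtype in 3:0 */
} SglDesc;

/* Hypothetical DPTR: PRP1/PRP2 or one SGL descriptor (C11 anonymous struct). */
typedef union CmdDptr {
    struct {
        uint64_t prp1;
        uint64_t prp2;
    };
    SglDesc sgl;
} CmdDptr;

static_assert(sizeof(SglDesc) == 16, "SGL descriptor is 16 bytes");
static_assert(sizeof(CmdDptr) == 16, "DPTR is 16 bytes");

/* Fill PRP1/PRP2 from the IOVAs of the pages backing a transfer. */
static void fill_prps(CmdDptr *dptr, const uint64_t *pages, size_t npages,
                      uint64_t list_iova)
{
    assert(dptr != NULL && pages != NULL && npages > 0);

    dptr->prp1 = pages[0];
    if (npages == 1) {
        dptr->prp2 = 0;
    } else if (npages == 2) {
        dptr->prp2 = pages[1];
    } else {
        /* PRP2 points at a PRP list holding pages[1..npages-1]. */
        dptr->prp2 = list_iova + sizeof(uint64_t);
    }
}
```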

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Fam Zheng <fam@euphon.net>
---
 block/nvme.c         |  18 ++---
 hw/block/nvme.c      |  12 ++--
 include/block/nvme.h | 153 ++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 151 insertions(+), 32 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index d41c4bda6e39..99b9bb3dac96 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -446,7 +446,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         error_setg(errp, "Cannot map buffer for DMA");
         goto out;
     }
-    cmd.prp1 = cpu_to_le64(iova);
+    cmd.dptr.prp1 = cpu_to_le64(iova);
 
     if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
         error_setg(errp, "Failed to identify controller");
@@ -545,7 +545,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
     }
     cmd = (NvmeCmd) {
         .opcode = NVME_ADM_CMD_CREATE_CQ,
-        .prp1 = cpu_to_le64(q->cq.iova),
+        .dptr.prp1 = cpu_to_le64(q->cq.iova),
         .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
         .cdw11 = cpu_to_le32(0x3),
     };
@@ -556,7 +556,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
     }
     cmd = (NvmeCmd) {
         .opcode = NVME_ADM_CMD_CREATE_SQ,
-        .prp1 = cpu_to_le64(q->sq.iova),
+        .dptr.prp1 = cpu_to_le64(q->sq.iova),
         .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
         .cdw11 = cpu_to_le32(0x1 | (n << 16)),
     };
@@ -906,16 +906,16 @@ try_map:
     case 0:
         abort();
     case 1:
-        cmd->prp1 = pagelist[0];
-        cmd->prp2 = 0;
+        cmd->dptr.prp1 = pagelist[0];
+        cmd->dptr.prp2 = 0;
         break;
     case 2:
-        cmd->prp1 = pagelist[0];
-        cmd->prp2 = pagelist[1];
+        cmd->dptr.prp1 = pagelist[0];
+        cmd->dptr.prp2 = pagelist[1];
         break;
     default:
-        cmd->prp1 = pagelist[0];
-        cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
+        cmd->dptr.prp1 = pagelist[0];
+        cmd->dptr.prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
         break;
     }
     trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b532818b4b76..40cb176dea3c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -372,8 +372,8 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
     uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
     uint64_t slba = le64_to_cpu(rw->slba);
-    uint64_t prp1 = le64_to_cpu(rw->prp1);
-    uint64_t prp2 = le64_to_cpu(rw->prp2);
+    uint64_t prp1 = le64_to_cpu(rw->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(rw->dptr.prp2);
 
     uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
     uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
@@ -763,8 +763,8 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
 
 static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
 {
-    uint64_t prp1 = le64_to_cpu(cmd->prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->prp2);
+    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
 
     uint64_t timestamp = nvme_get_timestamp(n);
 
@@ -802,8 +802,8 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
 {
     uint16_t ret;
     uint64_t timestamp;
-    uint64_t prp1 = le64_to_cpu(cmd->prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->prp2);
+    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
 
     ret = nvme_dma_write_prp(n, (uint8_t *)&timestamp,
                                 sizeof(timestamp), prp1, prp2);
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8fb941c6537c..a083c1b3a613 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -205,15 +205,53 @@ enum NvmeCmbszMask {
 #define NVME_CMBSZ_GETSIZE(cmbsz) \
     (NVME_CMBSZ_SZ(cmbsz) * (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz))))
 
+enum NvmeSglDescriptorType {
+    NVME_SGL_DESCR_TYPE_DATA_BLOCK          = 0x0,
+    NVME_SGL_DESCR_TYPE_BIT_BUCKET          = 0x1,
+    NVME_SGL_DESCR_TYPE_SEGMENT             = 0x2,
+    NVME_SGL_DESCR_TYPE_LAST_SEGMENT        = 0x3,
+    NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK    = 0x4,
+
+    NVME_SGL_DESCR_TYPE_VENDOR_SPECIFIC     = 0xf,
+};
+
+enum NvmeSglDescriptorSubtype {
+    NVME_SGL_DESCR_SUBTYPE_ADDRESS = 0x0,
+};
+
+typedef struct NvmeSglDescriptor {
+    uint64_t addr;
+    uint32_t len;
+    uint8_t  rsvd[3];
+    uint8_t  type;
+} NvmeSglDescriptor;
+
+#define NVME_SGL_TYPE(type)     ((type >> 4) & 0xf)
+#define NVME_SGL_SUBTYPE(type)  (type & 0xf)
+
+typedef union NvmeCmdDptr {
+    struct {
+        uint64_t    prp1;
+        uint64_t    prp2;
+    };
+
+    NvmeSglDescriptor sgl;
+} NvmeCmdDptr;
+
+enum NvmePsdt {
+    PSDT_PRP                 = 0x0,
+    PSDT_SGL_MPTR_CONTIGUOUS = 0x1,
+    PSDT_SGL_MPTR_SGL        = 0x2,
+};
+
 typedef struct NvmeCmd {
     uint8_t     opcode;
-    uint8_t     fuse;
+    uint8_t     flags;
     uint16_t    cid;
     uint32_t    nsid;
     uint64_t    res1;
     uint64_t    mptr;
-    uint64_t    prp1;
-    uint64_t    prp2;
+    NvmeCmdDptr dptr;
     uint32_t    cdw10;
     uint32_t    cdw11;
     uint32_t    cdw12;
@@ -222,6 +260,9 @@ typedef struct NvmeCmd {
     uint32_t    cdw15;
 } NvmeCmd;
 
+#define NVME_CMD_FLAGS_FUSE(flags) (flags & 0x3)
+#define NVME_CMD_FLAGS_PSDT(flags) ((flags >> 6) & 0x3)
+
 enum NvmeAdminCommands {
     NVME_ADM_CMD_DELETE_SQ      = 0x00,
     NVME_ADM_CMD_CREATE_SQ      = 0x01,
@@ -321,8 +362,7 @@ typedef struct NvmeRwCmd {
     uint32_t    nsid;
     uint64_t    rsvd2;
     uint64_t    mptr;
-    uint64_t    prp1;
-    uint64_t    prp2;
+    NvmeCmdDptr dptr;
     uint64_t    slba;
     uint16_t    nlb;
     uint16_t    control;
@@ -362,8 +402,7 @@ typedef struct NvmeDsmCmd {
     uint16_t    cid;
     uint32_t    nsid;
     uint64_t    rsvd2[2];
-    uint64_t    prp1;
-    uint64_t    prp2;
+    NvmeCmdDptr dptr;
     uint32_t    nr;
     uint32_t    attributes;
     uint32_t    rsvd12[4];
@@ -427,6 +466,12 @@ enum NvmeStatusCodes {
     NVME_CMD_ABORT_MISSING_FUSE = 0x000a,
     NVME_INVALID_NSID           = 0x000b,
     NVME_CMD_SEQ_ERROR          = 0x000c,
+    NVME_INVALID_SGL_SEG_DESCR  = 0x000d,
+    NVME_INVALID_NUM_SGL_DESCRS = 0x000e,
+    NVME_DATA_SGL_LEN_INVALID   = 0x000f,
+    NVME_MD_SGL_LEN_INVALID     = 0x0010,
+    NVME_SGL_DESCR_TYPE_INVALID = 0x0011,
+    NVME_INVALID_USE_OF_CMB     = 0x0012,
     NVME_LBA_RANGE              = 0x0080,
     NVME_CAP_EXCEEDED           = 0x0081,
     NVME_NS_NOT_READY           = 0x0082,
@@ -515,7 +560,7 @@ enum NvmeSmartWarn {
     NVME_SMART_FAILED_VOLATILE_MEDIA  = 1 << 4,
 };
 
-enum LogIdentifier {
+enum NvmeLogIdentifier {
     NVME_LOG_ERROR_INFO     = 0x01,
     NVME_LOG_SMART_INFO     = 0x02,
     NVME_LOG_FW_SLOT_INFO   = 0x03,
@@ -533,6 +578,15 @@ typedef struct NvmePSD {
     uint8_t     resv[16];
 } NvmePSD;
 
+#define NVME_IDENTIFY_DATA_SIZE 4096
+
+enum {
+    NVME_ID_CNS_NS             = 0x0,
+    NVME_ID_CNS_CTRL           = 0x1,
+    NVME_ID_CNS_NS_ACTIVE_LIST = 0x2,
+    NVME_ID_CNS_NS_DESCR_LIST  = 0x3,
+};
+
 typedef struct NvmeIdCtrl {
     uint16_t    vid;
     uint16_t    ssvid;
@@ -543,7 +597,15 @@ typedef struct NvmeIdCtrl {
     uint8_t     ieee[3];
     uint8_t     cmic;
     uint8_t     mdts;
-    uint8_t     rsvd255[178];
+    uint16_t    cntlid;
+    uint32_t    ver;
+    uint32_t    rtd3r;
+    uint32_t    rtd3e;
+    uint32_t    oaes;
+    uint32_t    ctratt;
+    uint8_t     rsvd100[12];
+    uint8_t     fguid[16];
+    uint8_t     rsvd128[128];
     uint16_t    oacs;
     uint8_t     acl;
     uint8_t     aerl;
@@ -551,10 +613,28 @@ typedef struct NvmeIdCtrl {
     uint8_t     lpa;
     uint8_t     elpe;
     uint8_t     npss;
-    uint8_t     rsvd511[248];
+    uint8_t     avscc;
+    uint8_t     apsta;
+    uint16_t    wctemp;
+    uint16_t    cctemp;
+    uint16_t    mtfa;
+    uint32_t    hmpre;
+    uint32_t    hmmin;
+    uint8_t     tnvmcap[16];
+    uint8_t     unvmcap[16];
+    uint32_t    rpmbs;
+    uint16_t    edstt;
+    uint8_t     dsto;
+    uint8_t     fwug;
+    uint16_t    kas;
+    uint16_t    hctma;
+    uint16_t    mntmt;
+    uint16_t    mxtmt;
+    uint32_t    sanicap;
+    uint8_t     rsvd332[180];
     uint8_t     sqes;
     uint8_t     cqes;
-    uint16_t    rsvd515;
+    uint16_t    maxcmd;
     uint32_t    nn;
     uint16_t    oncs;
     uint16_t    fuses;
@@ -562,8 +642,14 @@ typedef struct NvmeIdCtrl {
     uint8_t     vwc;
     uint16_t    awun;
     uint16_t    awupf;
-    uint8_t     rsvd703[174];
-    uint8_t     rsvd2047[1344];
+    uint8_t     nvscc;
+    uint8_t     rsvd531;
+    uint16_t    acwu;
+    uint8_t     rsvd534[2];
+    uint32_t    sgls;
+    uint8_t     rsvd540[228];
+    uint8_t     subnqn[256];
+    uint8_t     rsvd1024[1024];
     NvmePSD     psd[32];
     uint8_t     vs[1024];
 } NvmeIdCtrl;
@@ -589,6 +675,16 @@ enum NvmeIdCtrlOncs {
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
 #define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf)
 
+#define NVME_CTRL_SGLS_SUPPORTED_MASK            (0x3 <<  0)
+#define NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT    (0x1 <<  0)
+#define NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT (0x1 <<  1)
+#define NVME_CTRL_SGLS_KEYED                     (0x1 <<  2)
+#define NVME_CTRL_SGLS_BITBUCKET                 (0x1 << 16)
+#define NVME_CTRL_SGLS_MPTR_CONTIGUOUS           (0x1 << 17)
+#define NVME_CTRL_SGLS_EXCESS_LENGTH             (0x1 << 18)
+#define NVME_CTRL_SGLS_MPTR_SGL                  (0x1 << 19)
+#define NVME_CTRL_SGLS_ADDR_OFFSET               (0x1 << 20)
+
 typedef struct NvmeFeatureVal {
     uint32_t    arbitration;
     uint32_t    power_mgmt;
@@ -611,6 +707,10 @@ typedef struct NvmeFeatureVal {
 #define NVME_INTC_THR(intc)     (intc & 0xff)
 #define NVME_INTC_TIME(intc)    ((intc >> 8) & 0xff)
 
+#define NVME_TEMP_THSEL(temp)  ((temp >> 20) & 0x3)
+#define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
+#define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
+
 enum NvmeFeatureIds {
     NVME_ARBITRATION                = 0x1,
     NVME_POWER_MANAGEMENT           = 0x2,
@@ -653,18 +753,37 @@ typedef struct NvmeIdNs {
     uint8_t     mc;
     uint8_t     dpc;
     uint8_t     dps;
-
     uint8_t     nmic;
     uint8_t     rescap;
     uint8_t     fpi;
     uint8_t     dlfeat;
-
-    uint8_t     res34[94];
+    uint16_t    nawun;
+    uint16_t    nawupf;
+    uint16_t    nacwu;
+    uint16_t    nabsn;
+    uint16_t    nabo;
+    uint16_t    nabspf;
+    uint16_t    noiob;
+    uint8_t     nvmcap[16];
+    uint8_t     rsvd64[40];
+    uint8_t     nguid[16];
+    uint64_t    eui64;
     NvmeLBAF    lbaf[16];
-    uint8_t     res192[192];
+    uint8_t     rsvd192[192];
     uint8_t     vs[3712];
 } NvmeIdNs;
 
+typedef struct NvmeIdNsDescr {
+    uint8_t nidt;
+    uint8_t nidl;
+    uint8_t rsvd2[2];
+} NvmeIdNsDescr;
+
+#define NVME_NIDT_UUID_LEN 16
+
+enum {
+    NVME_NIDT_UUID = 0x3,
+};
 
 /*Deallocate Logical Block Features*/
 #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)       ((dlfeat) & 0x10)
-- 
2.25.1




* [PATCH v6 05/42] nvme: use constant for identify data size
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (3 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 04/42] nvme: bump spec data structures to v1.3 Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:37   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 06/42] nvme: add identify cns values in header Klaus Jensen
                   ` (38 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 40cb176dea3c..f716f690a594 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -679,7 +679,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
 
 static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
 {
-    static const int data_len = 4 * KiB;
+    static const int data_len = NVME_IDENTIFY_DATA_SIZE;
     uint32_t min_nsid = le32_to_cpu(c->nsid);
     uint64_t prp1 = le64_to_cpu(c->prp1);
     uint64_t prp2 = le64_to_cpu(c->prp2);
-- 
2.25.1




* [PATCH v6 06/42] nvme: add identify cns values in header
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (4 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 05/42] nvme: use constant for identify data size Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:37   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 07/42] nvme: refactor nvme_addr_read Klaus Jensen
                   ` (37 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f716f690a594..b38d7e548a60 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -709,11 +709,11 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
     NvmeIdentify *c = (NvmeIdentify *)cmd;
 
     switch (le32_to_cpu(c->cns)) {
-    case 0x00:
+    case NVME_ID_CNS_NS:
         return nvme_identify_ns(n, c);
-    case 0x01:
+    case NVME_ID_CNS_CTRL:
         return nvme_identify_ctrl(n, c);
-    case 0x02:
+    case NVME_ID_CNS_NS_ACTIVE_LIST:
         return nvme_identify_nslist(n, c);
     default:
         trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
-- 
2.25.1




* [PATCH v6 07/42] nvme: refactor nvme_addr_read
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (5 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 06/42] nvme: add identify cns values in header Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:38   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 08/42] nvme: add support for the abort command Klaus Jensen
                   ` (36 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Pull the controller memory buffer check into its own function. The check
will be used on its own in later patches.
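The extracted check is a half-open interval test, [addr, addr + size). The sketch below shows that logic in isolation. Its assumption is that QEMU's hwaddr and the Int128 region size are replaced by plain uint64_t, and the CmbRegion type is hypothetical. As in nvme_addr_read, a caller would still test cmbsz first, because the CMB region only exists when one is configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the CMB MemoryRegion; QEMU's hwaddr and
 * Int128 size are simplified to uint64_t for this sketch.
 */
typedef struct CmbRegion {
    uint64_t addr;  /* guest physical base address of the CMB */
    uint64_t size;  /* size in bytes */
} CmbRegion;

/* True if addr falls in the half-open interval [addr, addr + size). */
static bool addr_is_cmb(const CmbRegion *cmb, uint64_t addr)
{
    uint64_t low = cmb->addr;
    uint64_t hi  = cmb->addr + cmb->size;

    return addr >= low && addr < hi;
}
```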

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b38d7e548a60..08a83d449de3 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -52,14 +52,22 @@
 
 static void nvme_process_sq(void *opaque);
 
+static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
+{
+    hwaddr low = n->ctrl_mem.addr;
+    hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
+
+    return addr >= low && addr < hi;
+}
+
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-    if (n->cmbsz && addr >= n->ctrl_mem.addr &&
-                addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
+    if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
         memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
-    } else {
-        pci_dma_read(&n->parent_obj, addr, buf, size);
+        return;
     }
+
+    pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
-- 
2.25.1




* [PATCH v6 08/42] nvme: add support for the abort command
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (6 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 07/42] nvme: refactor nvme_addr_read Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:38   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 09/42] nvme: add max_ioqpairs device parameter Klaus Jensen
                   ` (35 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.1 ("Abort command").

The Abort command is a best-effort command; for now, the device never
aborts the given command and always completes the Abort with bit 0 of
Dword 0 set to 1 ("command not aborted").
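For reference, the sketch below decodes the Abort command's Dword 10: SQID in bits 15:0, which is what nvme_abort() validates, and the CID of the command to abort in bits 31:16. It also shows the completion result a controller reports when it does not abort. The type and function names are hypothetical. The sketch assumes cdw10 has already been converted from little-endian, as le32_to_cpu() does in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of Abort command Dword 10. */
typedef struct AbortArgs {
    uint16_t sqid;  /* bits 15:0  - submission queue of the target command */
    uint16_t cid;   /* bits 31:16 - identifier of the command to abort */
} AbortArgs;

/* Assumes cdw10 is already in host byte order. */
static AbortArgs abort_decode_cdw10(uint32_t cdw10)
{
    AbortArgs a = {
        .sqid = (uint16_t)(cdw10 & 0xffff),
        .cid  = (uint16_t)(cdw10 >> 16),
    };
    return a;
}

/* Completion Dword 0, bit 0: 1 = command not aborted, 0 = aborted. */
#define ABORT_RESULT_NOT_ABORTED 0x1u

/* A controller that never aborts always reports "not aborted". */
static uint32_t abort_best_effort_result(void)
{
    return ABORT_RESULT_NOT_ABORTED;
}
```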

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 08a83d449de3..7cf7cf55143e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -729,6 +729,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
     }
 }
 
+static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+    uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
+
+    req->cqe.result = 1;
+    if (nvme_check_sqid(n, sqid)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
 static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
 {
     trace_nvme_dev_setfeat_timestamp(ts);
@@ -863,6 +875,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return nvme_create_cq(n, cmd);
     case NVME_ADM_CMD_IDENTIFY:
         return nvme_identify(n, cmd);
+    case NVME_ADM_CMD_ABORT:
+        return nvme_abort(n, cmd, req);
     case NVME_ADM_CMD_SET_FEATURES:
         return nvme_set_feature(n, cmd, req);
     case NVME_ADM_CMD_GET_FEATURES:
@@ -1375,6 +1389,19 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     id->ieee[1] = 0x02;
     id->ieee[2] = 0xb3;
     id->oacs = cpu_to_le16(0);
+
+    /*
+     * Because the controller always completes the Abort command immediately,
+     * there can never be more than one concurrently executing Abort command,
+     * so this value is never used for anything. Note that there can easily be
+     * many Abort commands in the queues, but they are not considered
+     * "executing" until processed by nvme_abort.
+     *
+     * The specification recommends a value of 3 for Abort Command Limit (four
+     * concurrently outstanding Abort commands), so let's use that, though it
+     * is inconsequential.
+     */
+    id->acl = 3;
     id->frmw = 7 << 1;
     id->lpa = 1 << 0;
     id->sqes = (0x6 << 4) | 0x6;
-- 
2.25.1




* [PATCH v6 09/42] nvme: add max_ioqpairs device parameter
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (7 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 08/42] nvme: add support for the abort command Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:39   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 10/42] nvme: refactor device realization Klaus Jensen
                   ` (34 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

The num_queues device parameter has a slightly confusing meaning: it
counts the admin queue pair, which is not optional, and it is really
the maximum number of queues allowed rather than a fixed count.

Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
but keep num_queues for compatibility.
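The arithmetic that follows from the new parameter can be sketched as below. The legacy num_queues value maps to num_queues - 1 I/O pairs. The Number of Queues feature reports 0's based counts in both 16-bit halves. BAR0 holds 0x1008 bytes of registers plus admin doorbells, and each I/O pair adds an 8-byte SQ tail/CQ head doorbell pair. The sketch assumes a doorbell stride of 0. The function names are hypothetical, and pow2ceil_u64() only stands in for QEMU's pow2ceil().

```c
#include <assert.h>
#include <stdint.h>

/* Legacy num_queues counts the admin queue pair; max_ioqpairs does not. */
static uint32_t ioqpairs_from_num_queues(uint32_t num_queues)
{
    return num_queues - 1;
}

/*
 * Number of Queues feature (FID 0x07) result: NSQA in bits 15:0 and
 * NCQA in bits 31:16, both 0's based, so N pairs are reported as N - 1.
 */
static uint32_t numq_result(uint32_t max_ioqpairs)
{
    return (max_ioqpairs - 1) | ((max_ioqpairs - 1) << 16);
}

/* Round up to the next power of two (stand-in for QEMU's pow2ceil()). */
static uint64_t pow2ceil_u64(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/*
 * BAR0 size: 0x1008 covers the registers plus the admin SQ/CQ doorbells;
 * each I/O queue pair adds two 4-byte doorbells (doorbell stride 0).
 */
static uint64_t nvme_reg_size(uint32_t max_ioqpairs)
{
    return pow2ceil_u64(0x1008 + 2 * (uint64_t)max_ioqpairs * 4);
}
```

With the new default of 64 I/O queue pairs, the feature reports 0x003f003f and BAR0 rounds up to 0x2000 bytes.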

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 45 ++++++++++++++++++++++++++-------------------
 hw/block/nvme.h |  4 +++-
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7cf7cf55143e..7dfd8a1a392d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,7 +19,7 @@
  *      -drive file=<file>,if=none,id=<drive_id>
  *      -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
  *              cmb_size_mb=<cmb_size_mb[optional]>, \
- *              num_queues=<N[optional]>
+ *              max_ioqpairs=<N[optional]>
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -27,6 +27,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "qemu/error-report.h"
 #include "hw/block/block.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pci.h"
@@ -72,12 +73,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-    return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
+    return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-    return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
+    return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -639,7 +640,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
         trace_nvme_dev_err_invalid_create_cq_addr(prp1);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
-    if (unlikely(vector > n->params.num_queues)) {
+    if (unlikely(vector > n->params.max_ioqpairs + 1)) {
         trace_nvme_dev_err_invalid_create_cq_vector(vector);
         return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
     }
@@ -803,8 +804,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
         break;
     case NVME_NUMBER_OF_QUEUES:
-        result = cpu_to_le32((n->params.num_queues - 2) |
-                             ((n->params.num_queues - 2) << 16));
+        result = cpu_to_le32((n->params.max_ioqpairs - 1) |
+                             ((n->params.max_ioqpairs - 1) << 16));
         trace_nvme_dev_getfeat_numq(result);
         break;
     case NVME_TIMESTAMP:
@@ -848,10 +849,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_NUMBER_OF_QUEUES:
         trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
                                     ((dw11 >> 16) & 0xFFFF) + 1,
-                                    n->params.num_queues - 1,
-                                    n->params.num_queues - 1);
-        req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
-                                      ((n->params.num_queues - 2) << 16));
+                                    n->params.max_ioqpairs,
+                                    n->params.max_ioqpairs);
+        req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
+                                      ((n->params.max_ioqpairs - 1) << 16));
         break;
     case NVME_TIMESTAMP:
         return nvme_set_feature_timestamp(n, cmd);
@@ -924,12 +925,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
     blk_drain(n->conf.blk);
 
-    for (i = 0; i < n->params.num_queues; i++) {
+    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
         if (n->sq[i] != NULL) {
             nvme_free_sq(n->sq[i], n);
         }
     }
-    for (i = 0; i < n->params.num_queues; i++) {
+    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
         if (n->cq[i] != NULL) {
             nvme_free_cq(n->cq[i], n);
         }
@@ -1332,9 +1333,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     int64_t bs_size;
     uint8_t *pci_conf;
 
-    if (!n->params.num_queues) {
-        error_setg(errp, "num_queues can't be zero");
-        return;
+    if (n->params.num_queues) {
+        warn_report("nvme: num_queues is deprecated; please use max_ioqpairs "
+                    "instead");
+
+        n->params.max_ioqpairs = n->params.num_queues - 1;
+    }
+
+    if (!n->params.max_ioqpairs) {
+        error_setg(errp, "max_ioqpairs can't be less than 1");
     }
 
     if (!n->conf.blk) {
@@ -1365,19 +1372,19 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     pcie_endpoint_cap_init(pci_dev, 0x80);
 
     n->num_namespaces = 1;
-    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
+    n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
     n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-    n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
-    n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
+    n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
+    n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
 
     memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
                           "nvme", n->reg_size);
     pci_register_bar(pci_dev, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
-    msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
+    msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 9957c4a200e2..98f5b9479244 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -6,11 +6,13 @@
 #define DEFINE_NVME_PROPERTIES(_state, _props) \
     DEFINE_PROP_STRING("serial", _state, _props.serial), \
     DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
-    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64)
+    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 0), \
+    DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64)
 
 typedef struct NvmeParams {
     char     *serial;
     uint32_t num_queues;
+    uint32_t max_ioqpairs;
     uint32_t cmb_size_mb;
 } NvmeParams;
 
-- 
2.25.1




* [PATCH v6 10/42] nvme: refactor device realization
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (8 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 09/42] nvme: add max_ioqpairs device parameter Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:40   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 11/42] nvme: add temperature threshold feature Klaus Jensen
                   ` (33 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

This patch splits up nvme_realize into multiple individual functions,
each initializing a different subset of the device.
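One of the split-out pieces, nvme_init_cmb(), packs the CMBSZ register. The sketch below re-derives that packing and the NVME_CMBSZ_GETSIZE arithmetic in isolation. It uses the NVMe bit layout: SQS bit 0, CQS bit 1, LISTS bit 2, RDS bit 3, WDS bit 4, SZU bits 11:8 and SZ bits 31:12. The CMBSZ_* names and both functions are hypothetical, not QEMU's NVME_CMBSZ_* helpers, and the size is computed in 64 bits to avoid overflow. With SZU = 2 the size unit is 1 MiB, so cmb_size_mb maps directly onto SZ.

```c
#include <assert.h>
#include <stdint.h>

/* CMBSZ register fields (hypothetical names). */
#define CMBSZ_SQS       (1u << 0)
#define CMBSZ_CQS       (1u << 1)
#define CMBSZ_LISTS     (1u << 2)
#define CMBSZ_RDS       (1u << 3)
#define CMBSZ_WDS       (1u << 4)
#define CMBSZ_SZU_SHIFT 8
#define CMBSZ_SZ_SHIFT  12

/* Same settings as nvme_init_cmb(): SQS, RDS, WDS and SZU = 2 (1 MiB). */
static uint32_t cmbsz_pack(uint32_t cmb_size_mb)
{
    return CMBSZ_SQS | CMBSZ_RDS | CMBSZ_WDS |
           (2u << CMBSZ_SZU_SHIFT) |
           (cmb_size_mb << CMBSZ_SZ_SHIFT);
}

/* SZ * 2^(12 + 4 * SZU), i.e. NVME_CMBSZ_GETSIZE computed in 64 bits. */
static uint64_t cmbsz_getsize(uint32_t cmbsz)
{
    uint64_t sz  = cmbsz >> CMBSZ_SZ_SHIFT;           /* bits 31:12 */
    uint32_t szu = (cmbsz >> CMBSZ_SZU_SHIFT) & 0xf;  /* bits 11:8 */

    return sz << (12 + 4 * szu);
}
```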

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c | 178 ++++++++++++++++++++++++++++++------------------
 hw/block/nvme.h |  23 ++++++-
 2 files changed, 134 insertions(+), 67 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7dfd8a1a392d..665485045066 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -44,6 +44,8 @@
 #include "trace.h"
 #include "nvme.h"
 
+#define NVME_CMB_BIR 2
+
 #define NVME_GUEST_ERR(trace, fmt, ...) \
     do { \
         (trace_##trace)(__VA_ARGS__); \
@@ -63,7 +65,7 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-    if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
+    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
         memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
         return;
     }
@@ -157,7 +159,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
     if (unlikely(!prp1)) {
         trace_nvme_dev_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
-    } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
+    } else if (n->bar.cmbsz && prp1 >= n->ctrl_mem.addr &&
                prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
         qsg->nsg = 0;
         qemu_iovec_init(iov, num_prps);
@@ -1324,14 +1326,9 @@ static const MemoryRegionOps nvme_cmb_ops = {
     },
 };
 
-static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
 {
-    NvmeCtrl *n = NVME(pci_dev);
-    NvmeIdCtrl *id = &n->id_ctrl;
-
-    int i;
-    int64_t bs_size;
-    uint8_t *pci_conf;
+    NvmeParams *params = &n->params;
 
     if (n->params.num_queues) {
         warn_report("nvme: num_queues is deprecated; please use max_ioqpairs "
@@ -1340,57 +1337,100 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         n->params.max_ioqpairs = n->params.num_queues - 1;
     }
 
-    if (!n->params.max_ioqpairs) {
-        error_setg(errp, "max_ioqpairs can't be less than 1");
+    if (params->max_ioqpairs < 1 ||
+        params->max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
+        error_setg(errp, "nvme: max_ioqpairs must be between 1 and %d", PCI_MSIX_FLAGS_QSIZE);
+        return -1;
     }
 
     if (!n->conf.blk) {
-        error_setg(errp, "drive property not set");
-        return;
+        error_setg(errp, "nvme: block backend not configured");
+        return -1;
     }
 
-    bs_size = blk_getlength(n->conf.blk);
-    if (bs_size < 0) {
-        error_setg(errp, "could not get backing file size");
-        return;
+    if (!params->serial) {
+        error_setg(errp, "nvme: serial not configured");
+        return -1;
     }
 
-    if (!n->params.serial) {
-        error_setg(errp, "serial property not set");
-        return;
-    }
+    return 0;
+}
+
+static int nvme_init_blk(NvmeCtrl *n, Error **errp)
+{
     blkconf_blocksizes(&n->conf);
     if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
                                        false, errp)) {
-        return;
+        return -1;
     }
 
-    pci_conf = pci_dev->config;
-    pci_conf[PCI_INTERRUPT_PIN] = 1;
-    pci_config_set_prog_interface(pci_dev->config, 0x2);
-    pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
-    pcie_endpoint_cap_init(pci_dev, 0x80);
+    return 0;
+}
 
+static void nvme_init_state(NvmeCtrl *n)
+{
     n->num_namespaces = 1;
     n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
-    n->ns_size = bs_size / (uint64_t)n->num_namespaces;
-
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
     n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
     n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
+}
 
-    memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
-                          "nvme", n->reg_size);
-    pci_register_bar(pci_dev, 0,
-        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
-        &n->iomem);
+static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+    NVME_CMBLOC_SET_BIR(n->bar.cmbloc, NVME_CMB_BIR);
+    NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
+
+    NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
+    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+    NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
+    NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2);
+    NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+
+    n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+    memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
+                          "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+    pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
+                     PCI_BASE_ADDRESS_SPACE_MEMORY |
+                     PCI_BASE_ADDRESS_MEM_TYPE_64 |
+                     PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
+}
+
+static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+    uint8_t *pci_conf = pci_dev->config;
+
+    pci_conf[PCI_INTERRUPT_PIN] = 1;
+    pci_config_set_prog_interface(pci_conf, 0x2);
+    pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
+    pci_config_set_device_id(pci_conf, 0x5845);
+    pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+    pcie_endpoint_cap_init(pci_dev, 0x80);
+
+    memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
+                          n->reg_size);
+    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+                     PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem);
     msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
 
+    if (n->params.cmb_size_mb) {
+        nvme_init_cmb(n, pci_dev);
+    }
+}
+
+static void nvme_init_ctrl(NvmeCtrl *n)
+{
+    NvmeIdCtrl *id = &n->id_ctrl;
+    NvmeParams *params = &n->params;
+    uint8_t *pci_conf = n->parent_obj.config;
+
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
     strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
     strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
-    strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+    strpadcpy((char *)id->sn, sizeof(id->sn), params->serial, ' ');
     id->rab = 6;
     id->ieee[0] = 0x00;
     id->ieee[1] = 0x02;
@@ -1431,46 +1471,54 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 
     n->bar.vs = 0x00010200;
     n->bar.intmc = n->bar.intms = 0;
+}
 
-    if (n->params.cmb_size_mb) {
+static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+    int64_t bs_size;
+    NvmeIdNs *id_ns = &ns->id_ns;
 
-        NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
-        NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
+    bs_size = blk_getlength(n->conf.blk);
+    if (bs_size < 0) {
+        error_setg_errno(errp, -bs_size, "blk_getlength");
+        return -1;
+    }
 
-        NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
-        NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
-        NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
-        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
-        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
-        NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
-        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+    n->ns_size = bs_size;
 
-        n->cmbloc = n->bar.cmbloc;
-        n->cmbsz = n->bar.cmbsz;
+    id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(n, ns));
 
-        n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-        memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
-                              "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-        pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
-            PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
-            PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
+    /* no thin provisioning */
+    id_ns->ncap = id_ns->nsze;
+    id_ns->nuse = id_ns->ncap;
 
+    return 0;
+}
+
+static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int i;
+
+    if (nvme_check_constraints(n, errp)) {
+        return;
+    }
+
+    nvme_init_state(n);
+
+    if (nvme_init_blk(n, errp)) {
+        return;
     }
 
     for (i = 0; i < n->num_namespaces; i++) {
-        NvmeNamespace *ns = &n->namespaces[i];
-        NvmeIdNs *id_ns = &ns->id_ns;
-        id_ns->nsfeat = 0;
-        id_ns->nlbaf = 0;
-        id_ns->flbas = 0;
-        id_ns->mc = 0;
-        id_ns->dpc = 0;
-        id_ns->dps = 0;
-        id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
-        id_ns->ncap  = id_ns->nuse = id_ns->nsze =
-            cpu_to_le64(n->ns_size >>
-                id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas)].ds);
+        if (nvme_init_namespace(n, &n->namespaces[i], errp)) {
+            return;
+        }
     }
+
+    nvme_init_pci(n, pci_dev);
+    nvme_init_ctrl(n);
 }
 
 static void nvme_exit(PCIDevice *pci_dev)
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 98f5b9479244..b7c465560eea 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -67,6 +67,22 @@ typedef struct NvmeNamespace {
     NvmeIdNs        id_ns;
 } NvmeNamespace;
 
+static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
+{
+    NvmeIdNs *id_ns = &ns->id_ns;
+    return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
+static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
+{
+    return nvme_ns_lbaf(ns)->ds;
+}
+
+static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
+{
+    return 1 << nvme_ns_lbads(ns);
+}
+
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
@@ -88,8 +104,6 @@ typedef struct NvmeCtrl {
     uint32_t    num_namespaces;
     uint32_t    max_q_ents;
     uint64_t    ns_size;
-    uint32_t    cmbsz;
-    uint32_t    cmbloc;
     uint8_t     *cmbuf;
     uint64_t    irq_status;
     uint64_t    host_timestamp;                 /* Timestamp sent by the host */
@@ -103,4 +117,9 @@ typedef struct NvmeCtrl {
     NvmeIdCtrl      id_ctrl;
 } NvmeCtrl;
 
+static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
+{
+    return n->ns_size >> nvme_ns_lbads(ns);
+}
+
 #endif /* HW_NVME_H */
-- 
2.25.1

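For review purposes, here is a minimal standalone sketch of the two values that `nvme_check_constraints()` and `nvme_init_state()` derive from `max_ioqpairs`. The upper bound comes from MSI-X: the table size field (`PCI_MSIX_FLAGS_QSIZE`, 0x07ff) encodes N - 1, so at most 2048 vectors exist, and the device requests `max_ioqpairs + 1` of them (one extra for the admin queue). The BAR0 formula assumes what `0x1008 + 2 * max_ioqpairs * 4` implies: doorbells start at 0x1000 with a 4-byte stride. `pow2ceil()` is reimplemented locally, and all `sketch_*` names are hypothetical, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as PCI_MSIX_FLAGS_QSIZE: the MSI-X table size field (N - 1). */
#define SKETCH_MSIX_QSIZE 0x07ffu

/* Constraint from nvme_check_constraints(): 1 <= max_ioqpairs <= QSIZE. */
static bool sketch_ioqpairs_valid(uint32_t max_ioqpairs)
{
    return max_ioqpairs >= 1 && max_ioqpairs <= SKETCH_MSIX_QSIZE;
}

/* Smallest power of two >= v (local stand-in for QEMU's pow2ceil()). */
static uint64_t sketch_pow2ceil(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/*
 * BAR0 size as in nvme_init_state(): 0x1000 bytes of controller registers,
 * 0x8 bytes for the admin SQ tail / CQ head doorbells, then 2 * 4 bytes of
 * doorbells per I/O queue pair, rounded up to a power of two.
 */
static uint64_t sketch_reg_size(uint32_t max_ioqpairs)
{
    assert(sketch_ioqpairs_valid(max_ioqpairs));
    return sketch_pow2ceil(0x1008 + 2 * (uint64_t)max_ioqpairs * 4);
}
```

With 64 I/O queue pairs the register file plus doorbells fits in 8 KiB, and even the maximum of 2047 pairs needs only a 32 KiB BAR.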

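`nvme_init_cmb()` advertises the Controller Memory Buffer through the CMBSZ register. The sketch below packs the same fields by hand, using the CMBSZ layout from the NVMe 1.3 specification (SQS bit 0, CQS bit 1, LISTS bit 2, RDS bit 3, WDS bit 4, SZU bits 11:8, SZ bits 31:12). It then decodes the size as SZ * 4 KiB * 16^SZU, which is what `NVME_CMBSZ_GETSIZE()` is assumed to compute: SZU = 2 selects 1 MiB units, so `cmb_size_mb` maps directly onto SZ. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack CMBSZ like nvme_init_cmb(): SQS, RDS and WDS set; CQS, LISTS clear. */
static uint32_t sketch_cmbsz(uint32_t cmb_size_mb)
{
    uint32_t v = 0;

    assert(cmb_size_mb <= 0xfffff);   /* SZ is a 20-bit field */
    v |= 1u << 0;                     /* SQS: SQs may be placed in the CMB */
    v |= 1u << 3;                     /* RDS: CMB usable for read data */
    v |= 1u << 4;                     /* WDS: CMB usable for write data */
    v |= 2u << 8;                     /* SZU = 2: 1 MiB granularity */
    v |= cmb_size_mb << 12;           /* SZ: size in SZU units */
    return v;
}

/* Decode the CMB size in bytes: SZ * 2^(12 + 4 * SZU). */
static uint64_t sketch_cmbsz_getsize(uint32_t cmbsz)
{
    uint32_t szu = (cmbsz >> 8) & 0xf;
    uint64_t sz = cmbsz >> 12;

    assert(szu <= 6);                 /* SZU values above 6 are reserved */
    return sz << (12 + 4 * szu);
}
```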

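The namespace size fields set in `nvme_init_namespace()` come down to a single shift. The LBA data size (LBADS, stored as a power of two in `lbaf[0].ds`) is `BDRV_SECTOR_BITS`, i.e. 9 for 512-byte blocks, and `nvme_ns_nlbas()` shifts the backing size right by LBADS. A hypothetical sketch of that logic in host byte order (no `cpu_to_le64()`); a partial trailing block is simply not exposed.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the Identify Namespace fields touched by the patch (host order). */
typedef struct {
    uint64_t nsze;  /* namespace size, in LBAs */
    uint64_t ncap;  /* namespace capacity, in LBAs */
    uint64_t nuse;  /* namespace utilization, in LBAs */
} SketchIdNs;

/* Mirrors nvme_ns_lbads_bytes(): LBA size in bytes from LBADS. */
static uint64_t sketch_lbads_bytes(uint8_t lbads)
{
    assert(lbads < 64);
    return 1ull << lbads;
}

/* Mirrors nvme_init_namespace() with nlbas = ns_size >> lbads. */
static SketchIdNs sketch_init_ns(uint64_t ns_size, uint8_t lbads)
{
    SketchIdNs id;

    assert(lbads < 64);
    id.nsze = ns_size >> lbads;
    id.ncap = id.nsze;  /* no thin provisioning: every LBA is allocated */
    id.nuse = id.ncap;
    return id;
}
```

For a 1 GiB image with 512-byte LBAs this gives NSZE = NCAP = NUSE = 2097152.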

* [PATCH v6 11/42] nvme: add temperature threshold feature
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (9 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 10/42] nvme: refactor device realization Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:40   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 12/42] nvme: add support for the get log page command Klaus Jensen
                   ` (32 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

It might seem weird to implement this feature for an emulated device,
but it is mandatory to support, and it is useful for testing
asynchronous event request support, which will be added in a later
patch.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c      | 48 ++++++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme.h      |  2 ++
 include/block/nvme.h |  8 +++++++-
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 665485045066..64c42101df5c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -45,6 +45,9 @@
 #include "nvme.h"
 
 #define NVME_CMB_BIR 2
+#define NVME_TEMPERATURE 0x143
+#define NVME_TEMPERATURE_WARNING 0x157
+#define NVME_TEMPERATURE_CRITICAL 0x175
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
     do { \
@@ -798,9 +801,31 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
     uint32_t dw10 = le32_to_cpu(cmd->cdw10);
+    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
     uint32_t result;
 
     switch (dw10) {
+    case NVME_TEMPERATURE_THRESHOLD:
+        result = 0;
+
+        /*
+         * The controller only implements the Composite Temperature sensor, so
+         * return 0 for all other sensors.
+         */
+        if (NVME_TEMP_TMPSEL(dw11)) {
+            break;
+        }
+
+        switch (NVME_TEMP_THSEL(dw11)) {
+        case 0x0:
+            result = cpu_to_le16(n->features.temp_thresh_hi);
+            break;
+        case 0x1:
+            result = cpu_to_le16(n->features.temp_thresh_low);
+            break;
+        }
+
+        break;
     case NVME_VOLATILE_WRITE_CACHE:
         result = blk_enable_write_cache(n->conf.blk);
         trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
@@ -845,6 +870,23 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     uint32_t dw11 = le32_to_cpu(cmd->cdw11);
 
     switch (dw10) {
+    case NVME_TEMPERATURE_THRESHOLD:
+        if (NVME_TEMP_TMPSEL(dw11)) {
+            break;
+        }
+
+        switch (NVME_TEMP_THSEL(dw11)) {
+        case 0x0:
+            n->features.temp_thresh_hi = NVME_TEMP_TMPTH(dw11);
+            break;
+        case 0x1:
+            n->features.temp_thresh_low = NVME_TEMP_TMPTH(dw11);
+            break;
+        default:
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+
+        break;
     case NVME_VOLATILE_WRITE_CACHE:
         blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
         break;
@@ -1374,6 +1416,7 @@ static void nvme_init_state(NvmeCtrl *n)
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
     n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
     n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
+    n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -1451,6 +1494,11 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->acl = 3;
     id->frmw = 7 << 1;
     id->lpa = 1 << 0;
+
+    /* recommended default value (~70 C) */
+    id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
+    id->cctemp = cpu_to_le16(NVME_TEMPERATURE_CRITICAL);
+
     id->sqes = (0x6 << 4) | 0x6;
     id->cqes = (0x4 << 4) | 0x4;
     id->nn = cpu_to_le32(n->num_namespaces);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index b7c465560eea..8cda5f02c622 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -108,6 +108,7 @@ typedef struct NvmeCtrl {
     uint64_t    irq_status;
     uint64_t    host_timestamp;                 /* Timestamp sent by the host */
     uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
+    uint16_t    temperature;
 
     NvmeNamespace   *namespaces;
     NvmeSQueue      **sq;
@@ -115,6 +116,7 @@ typedef struct NvmeCtrl {
     NvmeSQueue      admin_sq;
     NvmeCQueue      admin_cq;
     NvmeIdCtrl      id_ctrl;
+    NvmeFeatureVal  features;
 } NvmeCtrl;
 
 static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
diff --git a/include/block/nvme.h b/include/block/nvme.h
index a083c1b3a613..91fc4738a3e0 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -688,7 +688,13 @@ enum NvmeIdCtrlOncs {
 typedef struct NvmeFeatureVal {
     uint32_t    arbitration;
     uint32_t    power_mgmt;
-    uint32_t    temp_thresh;
+    union {
+        struct {
+            uint16_t temp_thresh_hi;
+            uint16_t temp_thresh_low;
+        };
+        uint32_t temp_thresh;
+    };
     uint32_t    err_rec;
     uint32_t    volatile_wc;
     uint32_t    num_queues;
-- 
2.25.1

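NVMe reports temperatures in Kelvin, which explains the unusual-looking constants. Below is a small sketch of the conversion, using an integer offset of 273 so the results are approximate. It also shows the over/under-threshold test that later patches in this series apply to `temperature`, `temp_thresh_hi` and `temp_thresh_low`. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVME_TEMPERATURE          0x143 /* 323 K, about 50 C */
#define NVME_TEMPERATURE_WARNING  0x157 /* 343 K, about 70 C */
#define NVME_TEMPERATURE_CRITICAL 0x175 /* 373 K, about 100 C */

/* Kelvin to degrees Celsius, using 273 instead of 273.15. */
static int sketch_kelvin_to_celsius(uint16_t kelvin)
{
    return (int)kelvin - 273;
}

/* True if temp is over the high threshold or under the low threshold. */
static bool sketch_temp_out_of_range(uint16_t temp, uint16_t hi, uint16_t low)
{
    return temp > hi || temp < low;
}
```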

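For the Temperature Threshold feature, CDW11 carries three fields. Per the NVMe 1.3 layout they are TMPTH in bits 15:0, TMPSEL in bits 19:16, and THSEL in bits 21:20. A standalone sketch of the same dispatch as `nvme_set_feature()` and `nvme_get_feature()` could look like this. The extraction helpers are local stand-ins assumed to match `NVME_TEMP_TMPTH()`, `NVME_TEMP_TMPSEL()` and `NVME_TEMP_THSEL()`, and the `-1` return only marks where the patch returns `NVME_INVALID_FIELD | NVME_DNR`.

```c
#include <assert.h>
#include <stdint.h>

/* CDW11 fields for the Temperature Threshold feature. */
static uint16_t sketch_tmpth(uint32_t dw11)  { return dw11 & 0xffff; }
static uint8_t  sketch_tmpsel(uint32_t dw11) { return (dw11 >> 16) & 0xf; }
static uint8_t  sketch_thsel(uint32_t dw11)  { return (dw11 >> 20) & 0x3; }

typedef struct {
    uint16_t temp_thresh_hi;   /* THSEL 0: over temperature threshold */
    uint16_t temp_thresh_low;  /* THSEL 1: under temperature threshold */
} SketchFeatures;

/* Set Features: only the composite sensor (TMPSEL 0) is stored. */
static int sketch_set_temp_thresh(SketchFeatures *f, uint32_t dw11)
{
    if (sketch_tmpsel(dw11)) {
        return 0;                  /* other sensors: accepted but ignored */
    }
    switch (sketch_thsel(dw11)) {
    case 0x0:
        f->temp_thresh_hi = sketch_tmpth(dw11);
        return 0;
    case 0x1:
        f->temp_thresh_low = sketch_tmpth(dw11);
        return 0;
    default:
        return -1;                 /* reserved THSEL value */
    }
}

/* Get Features: report 0 for anything but the composite sensor. */
static uint32_t sketch_get_temp_thresh(const SketchFeatures *f, uint32_t dw11)
{
    if (sketch_tmpsel(dw11)) {
        return 0;
    }
    switch (sketch_thsel(dw11)) {
    case 0x0:
        return f->temp_thresh_hi;
    case 0x1:
        return f->temp_thresh_low;
    default:
        return 0;
    }
}
```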


* [PATCH v6 12/42] nvme: add support for the get log page command
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (10 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 11/42] nvme: add temperature threshold feature Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:40   ` Maxim Levitsky
  2020-03-16 14:28 ` [PATCH v6 13/42] nvme: add support for the asynchronous event request command Klaus Jensen
                   ` (31 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Add support for the Get Log Page command and basic implementations of
the mandatory Error Information, SMART / Health Information and Firmware
Slot Information log pages.

In violation of the specification, the SMART / Health Information log
page does not persist information over the lifetime of the controller
because the device has no place to store such persistent state.

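Note that the LPA field in the Identify Controller data structure
intentionally has bit 0 cleared because there is no namespace-specific
information in the SMART / Health Information log page. Bit 2 is set to
indicate support for extended data (NUMDU, LPOL and LPOU) in the Get
Log Page command.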

Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.10 ("Get Log Page command").

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c       | 138 +++++++++++++++++++++++++++++++++++++++++-
 hw/block/nvme.h       |  10 +++
 hw/block/trace-events |   2 +
 3 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 64c42101df5c..83ff3fbfb463 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -569,6 +569,138 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
     return NVME_SUCCESS;
 }
 
+static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
+                                uint64_t off, NvmeRequest *req)
+{
+    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
+    uint32_t nsid = le32_to_cpu(cmd->nsid);
+
+    uint32_t trans_len;
+    time_t current_ms;
+    uint64_t units_read = 0, units_written = 0;
+    uint64_t read_commands = 0, write_commands = 0;
+    NvmeSmartLog smart;
+    BlockAcctStats *s;
+
+    if (nsid && nsid != 0xffffffff) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    s = blk_get_stats(n->conf.blk);
+
+    units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
+    units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
+    read_commands = s->nr_ops[BLOCK_ACCT_READ];
+    write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
+
+    if (off > sizeof(smart)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    trans_len = MIN(sizeof(smart) - off, buf_len);
+
+    memset(&smart, 0x0, sizeof(smart));
+
+    smart.data_units_read[0] = cpu_to_le64(units_read / 1000);
+    smart.data_units_written[0] = cpu_to_le64(units_written / 1000);
+    smart.host_read_commands[0] = cpu_to_le64(read_commands);
+    smart.host_write_commands[0] = cpu_to_le64(write_commands);
+
+    smart.temperature[0] = n->temperature & 0xff;
+    smart.temperature[1] = (n->temperature >> 8) & 0xff;
+
+    if ((n->temperature > n->features.temp_thresh_hi) ||
+        (n->temperature < n->features.temp_thresh_low)) {
+        smart.critical_warning |= NVME_SMART_TEMPERATURE;
+    }
+
+    current_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+    smart.power_on_hours[0] =
+        cpu_to_le64((((current_ms - n->starttime_ms) / 1000) / 60) / 60);
+
+    return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
+                             prp2);
+}
+
+static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
+                                 uint64_t off, NvmeRequest *req)
+{
+    uint32_t trans_len;
+    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
+    NvmeFwSlotInfoLog fw_log;
+
+    if (off > sizeof(fw_log)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    memset(&fw_log, 0, sizeof(NvmeFwSlotInfoLog));
+
+    trans_len = MIN(sizeof(fw_log) - off, buf_len);
+
+    return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1,
+                             prp2);
+}
+
+static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
+                                uint64_t off, NvmeRequest *req)
+{
+    uint32_t trans_len;
+    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
+    uint8_t errlog[64];
+
+    if (off > sizeof(errlog)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    memset(errlog, 0x0, sizeof(errlog));
+
+    trans_len = MIN(sizeof(errlog) - off, buf_len);
+
+    return nvme_dma_read_prp(n, errlog, trans_len, prp1, prp2);
+}
+
+static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
+    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
+    uint32_t dw12 = le32_to_cpu(cmd->cdw12);
+    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
+    uint8_t  lid = dw10 & 0xff;
+    uint8_t  rae = (dw10 >> 15) & 0x1;
+    uint32_t numdl, numdu;
+    uint64_t off, lpol, lpou;
+    size_t   len;
+
+    numdl = (dw10 >> 16);
+    numdu = (dw11 & 0xffff);
+    lpol = dw12;
+    lpou = dw13;
+
+    len = (((numdu << 16) | numdl) + 1) << 2;
+    off = (lpou << 32ULL) | lpol;
+
+    if (off & 0x3) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    trace_nvme_dev_get_log(nvme_cid(req), lid, rae, len, off);
+
+    switch (lid) {
+    case NVME_LOG_ERROR_INFO:
+        return nvme_error_info(n, cmd, len, off, req);
+    case NVME_LOG_SMART_INFO:
+        return nvme_smart_info(n, cmd, len, off, req);
+    case NVME_LOG_FW_SLOT_INFO:
+        return nvme_fw_log_info(n, cmd, len, off, req);
+    default:
+        trace_nvme_dev_err_invalid_log_page(nvme_cid(req), lid);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+}
+
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
     n->cq[cq->cqid] = NULL;
@@ -914,6 +1046,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return nvme_del_sq(n, cmd);
     case NVME_ADM_CMD_CREATE_SQ:
         return nvme_create_sq(n, cmd);
+    case NVME_ADM_CMD_GET_LOG_PAGE:
+        return nvme_get_log(n, cmd, req);
     case NVME_ADM_CMD_DELETE_CQ:
         return nvme_del_cq(n, cmd);
     case NVME_ADM_CMD_CREATE_CQ:
@@ -1416,7 +1550,9 @@ static void nvme_init_state(NvmeCtrl *n)
     n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
     n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
     n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
+    n->temperature = NVME_TEMPERATURE;
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
+    n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -1493,7 +1629,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
      */
     id->acl = 3;
     id->frmw = 7 << 1;
-    id->lpa = 1 << 0;
+    id->lpa = 1 << 2;
 
     /* recommended default value (~70 C) */
     id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 8cda5f02c622..ebeee2edc4f4 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -109,6 +109,7 @@ typedef struct NvmeCtrl {
     uint64_t    host_timestamp;                 /* Timestamp sent by the host */
     uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
     uint16_t    temperature;
+    uint64_t    starttime_ms;
 
     NvmeNamespace   *namespaces;
     NvmeSQueue      **sq;
@@ -124,4 +125,13 @@ static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
     return n->ns_size >> nvme_ns_lbads(ns);
 }
 
+static inline uint16_t nvme_cid(NvmeRequest *req)
+{
+    if (req) {
+        return le16_to_cpu(req->cqe.cid);
+    }
+
+    return 0xffff;
+}
+
 #endif /* HW_NVME_H */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index ade506ea2bb2..7da088479f39 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -46,6 +46,7 @@ nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
 nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
 nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
 nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
+nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
 nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
@@ -85,6 +86,7 @@ nvme_dev_err_invalid_create_cq_qflags(uint16_t qflags) "failed creating completi
 nvme_dev_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx16""
 nvme_dev_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
 nvme_dev_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
+nvme_dev_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 0x%"PRIx16""
 nvme_dev_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
 nvme_dev_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
 nvme_dev_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
-- 
2.25.1

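The dword arithmetic in `nvme_get_log()` is easy to get wrong, so here is a standalone sketch of the same decoding. LID is in CDW10 bits 7:0, RAE in bit 15, NUMDL in bits 31:16, and NUMDU in CDW11 bits 15:0. The 64-bit log page offset is split across CDW12 (LPOL) and CDW13 (LPOU). NUMD is a 0's based dword count, hence the `+ 1` and the `<< 2`. The struct and function names are hypothetical. The sketch also widens NUMD to 64 bits before adding 1, so the largest value cannot wrap.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  lid;  /* log page identifier */
    uint8_t  rae;  /* retain asynchronous event */
    uint64_t len;  /* transfer length in bytes */
    uint64_t off;  /* byte offset into the log page */
} SketchLogReq;

/* Returns 0 on success, -1 if the offset is not dword aligned. */
static int sketch_decode_get_log(uint32_t dw10, uint32_t dw11, uint32_t dw12,
                                 uint32_t dw13, SketchLogReq *r)
{
    uint64_t numdl = dw10 >> 16;
    uint64_t numdu = dw11 & 0xffff;
    uint64_t numd = (numdu << 16) | numdl;      /* 0's based dword count */

    r->lid = dw10 & 0xff;
    r->rae = (dw10 >> 15) & 0x1;
    r->len = (numd + 1) << 2;
    r->off = ((uint64_t)dw13 << 32) | dw12;

    return (r->off & 0x3) ? -1 : 0;
}
```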

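Each log page handler rejects offsets past the end of the page and clamps the transfer to whatever remains after the offset. The hypothetical helper below has the same semantics as the `off > sizeof(...)` check and `MIN(sizeof(...) - off, buf_len)` in `nvme_smart_info()`, `nvme_fw_log_info()` and `nvme_error_info()`. The tests use 512 bytes, which is the size of both the SMART / Health Information and the Firmware Slot Information log pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns -1 if off lies beyond the end of the log. Otherwise it stores the
 * number of bytes to transfer: the smaller of what remains after off and
 * what the host asked for. off == log_size yields a zero-length transfer.
 */
static int sketch_log_window(size_t log_size, uint64_t off, uint64_t buf_len,
                             uint64_t *trans_len)
{
    uint64_t remaining;

    if (off > log_size) {
        return -1;
    }
    remaining = log_size - off;
    *trans_len = remaining < buf_len ? remaining : buf_len;
    return 0;
}
```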

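In the SMART / Health Information page, Data Units Read and Data Units Written count thousands of 512-byte units, and Power On Hours is derived from the elapsed virtual clock. The NVMe specification describes the data unit counters as rounded up. This sketch rounds up accordingly, whereas the patch divides with truncation. Names are hypothetical and all values are in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes -> 512-byte units (>> BDRV_SECTOR_BITS) -> thousands, rounded up. */
static uint64_t sketch_data_units(uint64_t bytes)
{
    uint64_t units512 = bytes >> 9;

    return (units512 + 999) / 1000;
}

/* Whole hours elapsed between two millisecond timestamps. */
static uint64_t sketch_power_on_hours(int64_t now_ms, int64_t start_ms)
{
    assert(now_ms >= start_ms);
    return (uint64_t)(now_ms - start_ms) / 1000 / 60 / 60;
}
```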

* [PATCH v6 13/42] nvme: add support for the asynchronous event request command
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (11 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 12/42] nvme: add support for the get log page command Klaus Jensen
@ 2020-03-16 14:28 ` Klaus Jensen
  2020-03-25 10:41   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 14/42] nvme: add missing mandatory features Klaus Jensen
                   ` (30 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:28 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
Section 5.2 ("Asynchronous Event Request command").

Mostly imported from Keith's qemu-nvme tree, modified to cap the number
of queued events (configurable with the aer_max_queued device
parameter). The spec states that the controller *should* retain events,
so this is done on a best-effort basis: events that arrive while the
queue is full are dropped.

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c       | 178 ++++++++++++++++++++++++++++++++++++++++--
 hw/block/nvme.h       |  14 +++-
 hw/block/trace-events |   9 +++
 include/block/nvme.h  |   8 +-
 4 files changed, 199 insertions(+), 10 deletions(-)

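To make the interplay of `aer_queue`, `aer_reqs`, `outstanding_aers` and `aer_mask` easier to review, here is a single-threaded sketch of the same bookkeeping. It uses fixed-size arrays instead of QTAILQ. It assumes room for `aerl + 1` outstanding requests, which is how many the check in `nvme_aer()` admits. `SKETCH_MAX_QUEUED` and `SKETCH_AERL` stand in for the `aer_max_queued` and `aerl` parameters, and none of the `sketch_*` names exist in QEMU. As in the patch:

- requests are consumed last-in first-out;
- events are delivered in queue order;
- an event type stays masked once posted, until its log page is read with RAE cleared;
- events beyond the queue limit are dropped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_QUEUED 4   /* stands in for params.aer_max_queued */
#define SKETCH_AERL       3   /* stands in for params.aerl */
#define SKETCH_MAX_POSTED 16

typedef struct {
    uint8_t event_type;
    uint8_t event_info;
    uint8_t log_page;
} SketchEvent;

typedef struct {
    SketchEvent queue[SKETCH_MAX_QUEUED];   /* pending events, FIFO */
    unsigned queued;
    int reqs[SKETCH_AERL + 1];              /* cids of outstanding AERs */
    unsigned outstanding;
    uint8_t mask;                           /* one bit per event type */
    int posted_cid[SKETCH_MAX_POSTED];      /* completions "posted" so far */
    SketchEvent posted[SKETCH_MAX_POSTED];
    unsigned nposted;
} SketchAer;

static void sketch_aer_init(SketchAer *s)
{
    memset(s, 0, sizeof(*s));
}

/* Like nvme_process_aers(): pair queued, unmasked events with requests. */
static void sketch_process(SketchAer *s)
{
    unsigned i = 0;

    while (i < s->queued && s->outstanding) {
        SketchEvent ev = s->queue[i];

        if (s->mask & (1u << ev.event_type)) {
            i++;                            /* masked: leave it queued */
            continue;
        }
        memmove(&s->queue[i], &s->queue[i + 1],
                (s->queued - i - 1) * sizeof(ev));
        s->queued--;
        s->mask |= 1u << ev.event_type;
        s->outstanding--;
        assert(s->nposted < SKETCH_MAX_POSTED);
        s->posted_cid[s->nposted] = s->reqs[s->outstanding];
        s->posted[s->nposted++] = ev;
    }
}

/* Like nvme_enqueue_event(): drop the event if the queue is full. */
static void sketch_enqueue(SketchAer *s, uint8_t type, uint8_t info,
                           uint8_t log_page)
{
    assert(type < 8);
    if (s->queued == SKETCH_MAX_QUEUED) {
        return;
    }
    s->queue[s->queued++] = (SketchEvent){ type, info, log_page };
    sketch_process(s);
}

/* Like nvme_aer(): -1 corresponds to NVME_AER_LIMIT_EXCEEDED. */
static int sketch_aer(SketchAer *s, int cid)
{
    if (s->outstanding > SKETCH_AERL) {
        return -1;
    }
    s->reqs[s->outstanding++] = cid;
    sketch_process(s);
    return 0;
}

/* Like nvme_clear_events(): reading the log page with RAE = 0 unmasks. */
static void sketch_clear(SketchAer *s, uint8_t type)
{
    s->mask &= (uint8_t)~(1u << type);
    sketch_process(s);
}
```

The mask is what keeps a guest from being flooded: a second SMART event is delivered only after the host has read the SMART / Health Information log page without setting RAE.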
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 83ff3fbfb463..ff8975cd6667 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -325,6 +325,85 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
     timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
 }
 
+static void nvme_process_aers(void *opaque)
+{
+    NvmeCtrl *n = opaque;
+    NvmeAsyncEvent *event, *next;
+
+    trace_nvme_dev_process_aers(n->aer_queued);
+
+    QTAILQ_FOREACH_SAFE(event, &n->aer_queue, entry, next) {
+        NvmeRequest *req;
+        NvmeAerResult *result;
+
+        /* can't post cqe if there is nothing to complete */
+        if (!n->outstanding_aers) {
+            trace_nvme_dev_no_outstanding_aers();
+            break;
+        }
+
+        /* ignore if masked (cqe posted, but event not cleared) */
+        if (n->aer_mask & (1 << event->result.event_type)) {
+            trace_nvme_dev_aer_masked(event->result.event_type, n->aer_mask);
+            continue;
+        }
+
+        QTAILQ_REMOVE(&n->aer_queue, event, entry);
+        n->aer_queued--;
+
+        n->aer_mask |= 1 << event->result.event_type;
+        n->outstanding_aers--;
+
+        req = n->aer_reqs[n->outstanding_aers];
+
+        result = (NvmeAerResult *) &req->cqe.result;
+        result->event_type = event->result.event_type;
+        result->event_info = event->result.event_info;
+        result->log_page = event->result.log_page;
+        g_free(event);
+
+        req->status = NVME_SUCCESS;
+
+        trace_nvme_dev_aer_post_cqe(result->event_type, result->event_info,
+                                    result->log_page);
+
+        nvme_enqueue_req_completion(&n->admin_cq, req);
+    }
+}
+
+static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
+                               uint8_t event_info, uint8_t log_page)
+{
+    NvmeAsyncEvent *event;
+
+    trace_nvme_dev_enqueue_event(event_type, event_info, log_page);
+
+    if (n->aer_queued == n->params.aer_max_queued) {
+        trace_nvme_dev_enqueue_event_noqueue(n->aer_queued);
+        return;
+    }
+
+    event = g_new(NvmeAsyncEvent, 1);
+    event->result = (NvmeAerResult) {
+        .event_type = event_type,
+        .event_info = event_info,
+        .log_page   = log_page,
+    };
+
+    QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry);
+    n->aer_queued++;
+
+    nvme_process_aers(n);
+}
+
+static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
+{
+    n->aer_mask &= ~(1 << event_type);
+    if (!QTAILQ_EMPTY(&n->aer_queue)) {
+        nvme_process_aers(n);
+    }
+}
+
 static void nvme_rw_cb(void *opaque, int ret)
 {
     NvmeRequest *req = opaque;
@@ -569,8 +648,9 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
-                                uint64_t off, NvmeRequest *req)
+static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
+                                uint32_t buf_len, uint64_t off,
+                                NvmeRequest *req)
 {
     uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
     uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
@@ -619,6 +699,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
     smart.power_on_hours[0] =
         cpu_to_le64((((current_ms - n->starttime_ms) / 1000) / 60) / 60);
 
+    if (!rae) {
+        nvme_clear_events(n, NVME_AER_TYPE_SMART);
+    }
+
     return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
                              prp2);
 }
@@ -643,14 +727,19 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
                              prp2);
 }
 
-static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
-                                uint64_t off, NvmeRequest *req)
+static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
+                                uint32_t buf_len, uint64_t off,
+                                NvmeRequest *req)
 {
     uint32_t trans_len;
     uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
     uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
     uint8_t errlog[64];
 
+    if (!rae) {
+        nvme_clear_events(n, NVME_AER_TYPE_ERROR);
+    }
+
     if (off > sizeof(errlog)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
@@ -690,9 +779,9 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
     switch (lid) {
     case NVME_LOG_ERROR_INFO:
-        return nvme_error_info(n, cmd, len, off, req);
+        return nvme_error_info(n, cmd, rae, len, off, req);
     case NVME_LOG_SMART_INFO:
-        return nvme_smart_info(n, cmd, len, off, req);
+        return nvme_smart_info(n, cmd, rae, len, off, req);
     case NVME_LOG_FW_SLOT_INFO:
         return nvme_fw_log_info(n, cmd, len, off, req);
     default:
@@ -969,6 +1058,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         break;
     case NVME_TIMESTAMP:
         return nvme_get_feature_timestamp(n, cmd);
+    case NVME_ASYNCHRONOUS_EVENT_CONF:
+        result = cpu_to_le32(n->features.async_config);
+        break;
     default:
         trace_nvme_dev_err_invalid_getfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1018,6 +1110,14 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
             return NVME_INVALID_FIELD | NVME_DNR;
         }
 
+        if (((n->temperature > n->features.temp_thresh_hi) ||
+            (n->temperature < n->features.temp_thresh_low)) &&
+            NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) {
+            nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
+                               NVME_AER_INFO_SMART_TEMP_THRESH,
+                               NVME_LOG_SMART_INFO);
+        }
+
         break;
     case NVME_VOLATILE_WRITE_CACHE:
         blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
@@ -1032,6 +1132,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         break;
     case NVME_TIMESTAMP:
         return nvme_set_feature_timestamp(n, cmd);
+    case NVME_ASYNCHRONOUS_EVENT_CONF:
+        n->features.async_config = dw11;
+        break;
     default:
         trace_nvme_dev_err_invalid_setfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1039,6 +1142,25 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
+static uint16_t nvme_aer(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+    trace_nvme_dev_aer(nvme_cid(req));
+
+    if (n->outstanding_aers > n->params.aerl) {
+        trace_nvme_dev_aer_aerl_exceeded();
+        return NVME_AER_LIMIT_EXCEEDED;
+    }
+
+    n->aer_reqs[n->outstanding_aers] = req;
+    n->outstanding_aers++;
+
+    if (!QTAILQ_EMPTY(&n->aer_queue)) {
+        nvme_process_aers(n);
+    }
+
+    return NVME_NO_COMPLETE;
+}
+
 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
     switch (cmd->opcode) {
@@ -1060,6 +1182,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return nvme_set_feature(n, cmd, req);
     case NVME_ADM_CMD_GET_FEATURES:
         return nvme_get_feature(n, cmd, req);
+    case NVME_ADM_CMD_ASYNC_EV_REQ:
+        return nvme_aer(n, cmd, req);
     default:
         trace_nvme_dev_err_invalid_admin_opc(cmd->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
@@ -1114,6 +1238,15 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
         }
     }
 
+    while (!QTAILQ_EMPTY(&n->aer_queue)) {
+        NvmeAsyncEvent *event = QTAILQ_FIRST(&n->aer_queue);
+        QTAILQ_REMOVE(&n->aer_queue, event, entry);
+        g_free(event);
+    }
+
+    n->aer_queued = 0;
+    n->outstanding_aers = 0;
+
     blk_flush(n->conf.blk);
     n->bar.cc = 0;
 }
@@ -1210,6 +1343,8 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
     nvme_set_timestamp(n, 0ULL);
 
+    QTAILQ_INIT(&n->aer_queue);
+
     return 0;
 }
 
@@ -1402,6 +1537,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            "completion queue doorbell write"
                            " for nonexistent queue,"
                            " sqid=%"PRIu32", ignoring", qid);
+
+            if (n->outstanding_aers) {
+                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                                   NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
+                                   NVME_LOG_ERROR_INFO);
+            }
+
             return;
         }
 
@@ -1412,6 +1554,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            " beyond queue size, sqid=%"PRIu32","
                            " new_head=%"PRIu16", ignoring",
                            qid, new_head);
+
+            if (n->outstanding_aers) {
+                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                                   NVME_AER_INFO_ERR_INVALID_DB_VALUE,
+                                   NVME_LOG_ERROR_INFO);
+            }
+
             return;
         }
 
@@ -1440,6 +1589,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            "submission queue doorbell write"
                            " for nonexistent queue,"
                            " sqid=%"PRIu32", ignoring", qid);
+
+            if (n->outstanding_aers) {
+                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                                   NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
+                                   NVME_LOG_ERROR_INFO);
+            }
+
             return;
         }
 
@@ -1450,6 +1606,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
                            " beyond queue size, sqid=%"PRIu32","
                            " new_tail=%"PRIu16", ignoring",
                            qid, new_tail);
+
+            if (n->outstanding_aers) {
+                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
+                                   NVME_AER_INFO_ERR_INVALID_DB_VALUE,
+                                   NVME_LOG_ERROR_INFO);
+            }
+
             return;
         }
 
@@ -1553,6 +1716,7 @@ static void nvme_init_state(NvmeCtrl *n)
     n->temperature = NVME_TEMPERATURE;
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+    n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -1628,6 +1792,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
      * inconsequential.
      */
     id->acl = 3;
+    id->aerl = n->params.aerl;
     id->frmw = 7 << 1;
     id->lpa = 1 << 2;
 
@@ -1713,6 +1878,7 @@ static void nvme_exit(PCIDevice *pci_dev)
     g_free(n->namespaces);
     g_free(n->cq);
     g_free(n->sq);
+    g_free(n->aer_reqs);
 
     if (n->params.cmb_size_mb) {
         g_free(n->cmbuf);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index ebeee2edc4f4..b709a8bb8d40 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -7,17 +7,21 @@
     DEFINE_PROP_STRING("serial", _state, _props.serial), \
     DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
     DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 0), \
-    DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64)
+    DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64), \
+    DEFINE_PROP_UINT8("aerl", _state, _props.aerl, 3), \
+    DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64)
 
 typedef struct NvmeParams {
     char     *serial;
     uint32_t num_queues;
     uint32_t max_ioqpairs;
     uint32_t cmb_size_mb;
+    uint8_t  aerl;
+    uint32_t aer_max_queued;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
-    QSIMPLEQ_ENTRY(NvmeAsyncEvent) entry;
+    QTAILQ_ENTRY(NvmeAsyncEvent) entry;
     NvmeAerResult result;
 } NvmeAsyncEvent;
 
@@ -104,6 +108,7 @@ typedef struct NvmeCtrl {
     uint32_t    num_namespaces;
     uint32_t    max_q_ents;
     uint64_t    ns_size;
+    uint8_t     outstanding_aers;
     uint8_t     *cmbuf;
     uint64_t    irq_status;
     uint64_t    host_timestamp;                 /* Timestamp sent by the host */
@@ -111,6 +116,11 @@ typedef struct NvmeCtrl {
     uint16_t    temperature;
     uint64_t    starttime_ms;
 
+    uint8_t     aer_mask;
+    NvmeRequest **aer_reqs;
+    QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
+    int         aer_queued;
+
     NvmeNamespace   *namespaces;
     NvmeSQueue      **sq;
     NvmeCQueue      **cq;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 7da088479f39..3952c36774cf 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -47,6 +47,15 @@ nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_
 nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
 nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
 nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
+nvme_dev_process_aers(int queued) "queued %d"
+nvme_dev_aer(uint16_t cid) "cid %"PRIu16""
+nvme_dev_aer_aerl_exceeded(void) "aerl exceeded"
+nvme_dev_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
+nvme_dev_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+nvme_dev_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+nvme_dev_enqueue_event_noqueue(int queued) "queued %d"
+nvme_dev_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
+nvme_dev_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
 nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 91fc4738a3e0..f2a8b07c0f2f 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -425,8 +425,8 @@ enum NvmeAsyncEventRequest {
     NVME_AER_TYPE_SMART                     = 1,
     NVME_AER_TYPE_IO_SPECIFIC               = 6,
     NVME_AER_TYPE_VENDOR_SPECIFIC           = 7,
-    NVME_AER_INFO_ERR_INVALID_SQ            = 0,
-    NVME_AER_INFO_ERR_INVALID_DB            = 1,
+    NVME_AER_INFO_ERR_INVALID_DB_REGISTER   = 0,
+    NVME_AER_INFO_ERR_INVALID_DB_VALUE      = 1,
     NVME_AER_INFO_ERR_DIAG_FAIL             = 2,
     NVME_AER_INFO_ERR_PERS_INTERNAL_ERR     = 3,
     NVME_AER_INFO_ERR_TRANS_INTERNAL_ERR    = 4,
@@ -717,6 +717,10 @@ typedef struct NvmeFeatureVal {
 #define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
 #define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
 
+#define NVME_AEC_SMART(aec)         (aec & 0xff)
+#define NVME_AEC_NS_ATTR(aec)       ((aec >> 8) & 0x1)
+#define NVME_AEC_FW_ACTIVATION(aec) ((aec >> 9) & 0x1)
+
 enum NvmeFeatureIds {
     NVME_ARBITRATION                = 0x1,
     NVME_POWER_MANAGEMENT           = 0x2,
-- 
2.25.1
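For illustration, the AERL bookkeeping in nvme_aer() can be modelled on its own. AERL is a 0's based value, so with the default aerl=3 four Asynchronous Event Request commands may be outstanding at once. That is why aer_reqs is allocated with aerl + 1 entries. The sketch below is a hypothetical stand-alone model: AerState, aer_init(), aer_submit(), aer_free() and the AER_* status values are made up for this example. It also uses a wider counter than the patch's uint8_t outstanding_aers, so aerl=255 cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder status values for this model only. */
#define AER_OK             0   /* request parked, completed later */
#define AER_LIMIT_EXCEEDED 1   /* stands in for NVME_AER_LIMIT_EXCEEDED */

/* Hypothetical model of the AER bookkeeping, not the QEMU NvmeCtrl. */
typedef struct {
    uint8_t aerl;              /* 0's based limit, as reported in id->aerl */
    unsigned int outstanding;  /* number of parked AER commands */
    uint16_t *slots;           /* aerl + 1 entries holding command ids */
} AerState;

static bool aer_init(AerState *s, uint8_t aerl)
{
    s->aerl = aerl;
    s->outstanding = 0;
    /* aerl is 0's based, so aerl + 1 requests may be outstanding at once */
    s->slots = calloc((size_t)aerl + 1, sizeof(*s->slots));
    return s->slots != NULL;
}

static int aer_submit(AerState *s, uint16_t cid)
{
    /* same test as nvme_aer(): true once all aerl + 1 slots are in use */
    if (s->outstanding > s->aerl) {
        return AER_LIMIT_EXCEEDED;
    }

    s->slots[s->outstanding++] = cid;
    return AER_OK;
}

static void aer_free(AerState *s)
{
    free(s->slots);
    s->slots = NULL;
    s->outstanding = 0;
}
```

With aer_init(&s, 3), the first four aer_submit() calls park their command identifiers and the fifth returns AER_LIMIT_EXCEEDED.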




* [PATCH v6 14/42] nvme: add missing mandatory features
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (12 preceding siblings ...)
  2020-03-16 14:28 ` [PATCH v6 13/42] nvme: add support for the asynchronous event request command Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:41   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 15/42] nvme: additional tracing Klaus Jensen
                   ` (29 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Add support for returning a reasonable response to Get/Set Features for
the mandatory features.

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c       | 60 ++++++++++++++++++++++++++++++++++++++++++-
 hw/block/trace-events |  2 ++
 include/block/nvme.h  |  6 ++++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ff8975cd6667..eb9c722df968 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1025,7 +1025,15 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     uint32_t dw11 = le32_to_cpu(cmd->cdw11);
     uint32_t result;
 
+    trace_nvme_dev_getfeat(nvme_cid(req), dw10);
+
     switch (dw10) {
+    case NVME_ARBITRATION:
+        result = cpu_to_le32(n->features.arbitration);
+        break;
+    case NVME_POWER_MANAGEMENT:
+        result = cpu_to_le32(n->features.power_mgmt);
+        break;
     case NVME_TEMPERATURE_THRESHOLD:
         result = 0;
 
@@ -1046,9 +1054,12 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
             break;
         }
 
+        break;
+    case NVME_ERROR_RECOVERY:
+        result = cpu_to_le32(n->features.err_rec);
         break;
     case NVME_VOLATILE_WRITE_CACHE:
-        result = blk_enable_write_cache(n->conf.blk);
+        result = cpu_to_le32(blk_enable_write_cache(n->conf.blk));
         trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
         break;
     case NVME_NUMBER_OF_QUEUES:
@@ -1058,6 +1069,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         break;
     case NVME_TIMESTAMP:
         return nvme_get_feature_timestamp(n, cmd);
+    case NVME_INTERRUPT_COALESCING:
+        result = cpu_to_le32(n->features.int_coalescing);
+        break;
+    case NVME_INTERRUPT_VECTOR_CONF:
+        if ((dw11 & 0xffff) > n->params.max_ioqpairs) {
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+
+        result = cpu_to_le32(n->features.int_vector_config[dw11 & 0xffff]);
+        break;
+    case NVME_WRITE_ATOMICITY:
+        result = cpu_to_le32(n->features.write_atomicity);
+        break;
     case NVME_ASYNCHRONOUS_EVENT_CONF:
         result = cpu_to_le32(n->features.async_config);
         break;
@@ -1093,6 +1117,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     uint32_t dw10 = le32_to_cpu(cmd->cdw10);
     uint32_t dw11 = le32_to_cpu(cmd->cdw11);
 
+    trace_nvme_dev_setfeat(nvme_cid(req), dw10, dw11);
+
     switch (dw10) {
     case NVME_TEMPERATURE_THRESHOLD:
         if (NVME_TEMP_TMPSEL(dw11)) {
@@ -1120,6 +1146,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
         break;
     case NVME_VOLATILE_WRITE_CACHE:
+        if (blk_enable_write_cache(n->conf.blk)) {
+            blk_flush(n->conf.blk);
+        }
+
         blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
         break;
     case NVME_NUMBER_OF_QUEUES:
@@ -1135,6 +1165,13 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_ASYNCHRONOUS_EVENT_CONF:
         n->features.async_config = dw11;
         break;
+    case NVME_ARBITRATION:
+    case NVME_POWER_MANAGEMENT:
+    case NVME_ERROR_RECOVERY:
+    case NVME_INTERRUPT_COALESCING:
+    case NVME_INTERRUPT_VECTOR_CONF:
+    case NVME_WRITE_ATOMICITY:
+        return NVME_FEAT_NOT_CHANGABLE | NVME_DNR;
     default:
         trace_nvme_dev_err_invalid_setfeat(dw10);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1716,6 +1753,25 @@ static void nvme_init_state(NvmeCtrl *n)
     n->temperature = NVME_TEMPERATURE;
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+
+    /*
+     * There is no limit on the number of commands that the controller may
+     * launch at one time from a particular Submission Queue.
+     */
+    n->features.arbitration = NVME_ARB_AB_NOLIMIT;
+
+    n->features.int_vector_config = g_malloc0_n(n->params.max_ioqpairs + 1,
+        sizeof(*n->features.int_vector_config));
+
+    for (int i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        n->features.int_vector_config[i] = i;
+
+        /* interrupt coalescing is not supported for the admin queue */
+        if (i == 0) {
+            n->features.int_vector_config[i] |= NVME_INTVC_NOCOALESCING;
+        }
+    }
+
     n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 }
 
@@ -1804,6 +1860,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->cqes = (0x4 << 4) | 0x4;
     id->nn = cpu_to_le32(n->num_namespaces);
     id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
+
     id->psd[0].mp = cpu_to_le16(0x9c4);
     id->psd[0].enlat = cpu_to_le32(0x10);
     id->psd[0].exlat = cpu_to_le32(0x4);
@@ -1879,6 +1936,7 @@ static void nvme_exit(PCIDevice *pci_dev)
     g_free(n->cq);
     g_free(n->sq);
     g_free(n->aer_reqs);
+    g_free(n->features.int_vector_config);
 
     if (n->params.cmb_size_mb) {
         g_free(n->cmbuf);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 3952c36774cf..4cf39961989d 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -41,6 +41,8 @@ nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
 nvme_dev_identify_ctrl(void) "identify controller"
 nvme_dev_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
 nvme_dev_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
+nvme_dev_getfeat(uint16_t cid, uint32_t fid) "cid %"PRIu16" fid 0x%"PRIx32""
+nvme_dev_setfeat(uint16_t cid, uint32_t fid, uint32_t val) "cid %"PRIu16" fid 0x%"PRIx32" val 0x%"PRIx32""
 nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
 nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
 nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f2a8b07c0f2f..ecc02fbe8bb8 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -490,7 +490,8 @@ enum NvmeStatusCodes {
     NVME_FW_REQ_RESET           = 0x010b,
     NVME_INVALID_QUEUE_DEL      = 0x010c,
     NVME_FID_NOT_SAVEABLE       = 0x010d,
-    NVME_FID_NOT_NSID_SPEC      = 0x010f,
+    NVME_FEAT_NOT_CHANGABLE     = 0x010e,
+    NVME_FEAT_NOT_NS_SPEC       = 0x010f,
     NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
     NVME_CONFLICTING_ATTRS      = 0x0180,
     NVME_INVALID_PROT_INFO      = 0x0181,
@@ -706,6 +707,7 @@ typedef struct NvmeFeatureVal {
 } NvmeFeatureVal;
 
 #define NVME_ARB_AB(arb)    (arb & 0x7)
+#define NVME_ARB_AB_NOLIMIT 0x7
 #define NVME_ARB_LPW(arb)   ((arb >> 8) & 0xff)
 #define NVME_ARB_MPW(arb)   ((arb >> 16) & 0xff)
 #define NVME_ARB_HPW(arb)   ((arb >> 24) & 0xff)
@@ -713,6 +715,8 @@ typedef struct NvmeFeatureVal {
 #define NVME_INTC_THR(intc)     (intc & 0xff)
 #define NVME_INTC_TIME(intc)    ((intc >> 8) & 0xff)
 
+#define NVME_INTVC_NOCOALESCING (0x1 << 16)
+
 #define NVME_TEMP_THSEL(temp)  ((temp >> 20) & 0x3)
 #define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
 #define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
-- 
2.25.1
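A minimal sketch of the Interrupt Vector Configuration table set up in nvme_init_state(), assuming the layout from the NVMe specification (bits 15:0 hold the interrupt vector, bit 16 is Coalescing Disable). The table holds max_ioqpairs + 1 entries, one for the admin queue plus one per I/O queue pair, so the largest valid index is max_ioqpairs and any larger requested vector must be rejected. IntVcTable, intvc_init() and intvc_get() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bit 16 (Coalescing Disable) of an Interrupt Vector Configuration value */
#define INTVC_NOCOALESCING (UINT32_C(1) << 16)

/* Hypothetical model: one entry per interrupt vector, admin queue included. */
typedef struct {
    uint32_t max_ioqpairs;
    uint32_t *int_vector_config;   /* max_ioqpairs + 1 entries */
} IntVcTable;

static int intvc_init(IntVcTable *t, uint32_t max_ioqpairs)
{
    size_t n = (size_t)max_ioqpairs + 1;

    t->max_ioqpairs = max_ioqpairs;
    t->int_vector_config = calloc(n, sizeof(*t->int_vector_config));
    if (!t->int_vector_config) {
        return -1;
    }

    for (size_t i = 0; i < n; i++) {
        t->int_vector_config[i] = (uint32_t)i;  /* bits 15:0: vector number */
    }

    /* interrupt coalescing is not supported for the admin queue (vector 0) */
    t->int_vector_config[0] |= INTVC_NOCOALESCING;
    return 0;
}

/* Returns 0 and stores the value on success, -1 for an out-of-range vector. */
static int intvc_get(const IntVcTable *t, uint32_t dw11, uint32_t *result)
{
    uint32_t iv = dw11 & 0xffff;

    /* valid indices are 0..max_ioqpairs, so reject anything above that */
    if (iv > t->max_ioqpairs) {
        return -1;
    }

    *result = t->int_vector_config[iv];
    return 0;
}
```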




* [PATCH v6 15/42] nvme: additional tracing
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (13 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 14/42] nvme: add missing mandatory features Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:42   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 16/42] nvme: make sure ncqr and nsqr is valid Klaus Jensen
                   ` (28 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

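Add trace calls for nvme_enqueue_req_completion, MMIO reads and writes,
and doorbell writes.

Also, streamline the nvme_dev_identify_ns and nvme_dev_identify_nslist
trace events: they do not need to repeat the command, since that is
already in the trace event name. Their nsid argument is widened to 32
bits to match the NSID field.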

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c       | 10 ++++++++++
 hw/block/trace-events |  9 +++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index eb9c722df968..85c7c86b35f0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -320,6 +320,8 @@ static void nvme_post_cqes(void *opaque)
 static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
 {
     assert(cq->cqid == req->sq->cqid);
+    trace_nvme_dev_enqueue_req_completion(nvme_cid(req), cq->cqid,
+                                          req->status);
     QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
     QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
     timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
@@ -1527,6 +1529,8 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
     uint8_t *ptr = (uint8_t *)&n->bar;
     uint64_t val = 0;
 
+    trace_nvme_dev_mmio_read(addr);
+
     if (unlikely(addr & (sizeof(uint32_t) - 1))) {
         NVME_GUEST_ERR(nvme_dev_ub_mmiord_misaligned32,
                        "MMIO read not 32-bit aligned,"
@@ -1601,6 +1605,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
             return;
         }
 
+        trace_nvme_dev_mmio_doorbell_cq(cq->cqid, new_head);
+
         start_sqs = nvme_cq_full(cq) ? 1 : 0;
         cq->head = new_head;
         if (start_sqs) {
@@ -1653,6 +1659,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
             return;
         }
 
+        trace_nvme_dev_mmio_doorbell_sq(sq->sqid, new_tail);
+
         sq->tail = new_tail;
         timer_mod(sq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
     }
@@ -1661,6 +1669,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 static void nvme_mmio_write(void *opaque, hwaddr addr, uint64_t data,
     unsigned size)
 {
+    trace_nvme_dev_mmio_write(addr, data);
+
     NvmeCtrl *n = (NvmeCtrl *)opaque;
     if (addr < sizeof(n->bar)) {
         nvme_write_bar(n, addr, data, size);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 4cf39961989d..dde1d22bc39a 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -39,8 +39,8 @@ nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size,
 nvme_dev_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
 nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
 nvme_dev_identify_ctrl(void) "identify controller"
-nvme_dev_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
-nvme_dev_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
+nvme_dev_identify_ns(uint32_t ns) "nsid %"PRIu32""
+nvme_dev_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 nvme_dev_getfeat(uint16_t cid, uint32_t fid) "cid %"PRIu16" fid 0x%"PRIx32""
 nvme_dev_setfeat(uint16_t cid, uint32_t fid, uint32_t val) "cid %"PRIu16" fid 0x%"PRIx32" val 0x%"PRIx32""
 nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
@@ -54,10 +54,13 @@ nvme_dev_aer(uint16_t cid) "cid %"PRIu16""
 nvme_dev_aer_aerl_exceeded(void) "aerl exceeded"
 nvme_dev_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
 nvme_dev_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+nvme_dev_enqueue_req_completion(uint16_t cid, uint16_t cqid, uint16_t status) "cid %"PRIu16" cqid %"PRIu16" status 0x%"PRIx16""
 nvme_dev_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 nvme_dev_enqueue_event_noqueue(int queued) "queued %d"
 nvme_dev_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
 nvme_dev_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
+nvme_dev_mmio_read(uint64_t addr) "addr 0x%"PRIx64""
+nvme_dev_mmio_write(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" data 0x%"PRIx64""
 nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
@@ -70,6 +73,8 @@ nvme_dev_mmio_start_success(void) "setting controller enable bit succeeded"
 nvme_dev_mmio_stopped(void) "cleared controller enable bit"
 nvme_dev_mmio_shutdown_set(void) "shutdown bit set"
 nvme_dev_mmio_shutdown_cleared(void) "shutdown bit cleared"
+nvme_dev_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_head %"PRIu16""
+nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "sqid %"PRIu16" new_tail %"PRIu16""
 
 # nvme traces for error conditions
 nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
-- 
2.25.1




* [PATCH v6 16/42] nvme: make sure ncqr and nsqr is valid
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (14 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 15/42] nvme: additional tracing Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:42   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 17/42] nvme: add log specific field to trace events Klaus Jensen
                   ` (27 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

0xffff is not an allowed value for NCQR and NSQR in Set Features on
Number of Queues.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 85c7c86b35f0..e56142c4ea99 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1155,6 +1155,14 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
         break;
     case NVME_NUMBER_OF_QUEUES:
+        /*
+         * NVMe v1.3, Section 5.21.1.7: 0xffff is not an allowed value for NCQR
+         * and NSQR.
+         */
+        if ((dw11 & 0xffff) == 0xffff || ((dw11 >> 16) & 0xffff) == 0xffff) {
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+
         trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
                                     ((dw11 >> 16) & 0xFFFF) + 1,
                                     n->params.max_ioqpairs,
-- 
2.25.1
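The Number of Queues check can be illustrated with a small decoder, assuming the NVMe v1.3 CDW11 layout (NSQR in bits 15:0, NCQR in bits 31:16, both 0's based). numq_decode() is a hypothetical helper, not a function in hw/block/nvme.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode CDW11 of Set Features (Number of Queues).
 * Bits 15:0 are NSQR and bits 31:16 are NCQR, both 0's based.
 */
static bool numq_decode(uint32_t dw11, uint32_t *nsq, uint32_t *ncq)
{
    uint16_t nsqr = dw11 & 0xffff;
    uint16_t ncqr = (dw11 >> 16) & 0xffff;

    /* 0xffff would mean 65536 queues and is explicitly disallowed */
    if (nsqr == 0xffff || ncqr == 0xffff) {
        return false;
    }

    /* convert from 0's based values to actual queue counts */
    *nsq = (uint32_t)nsqr + 1;
    *ncq = (uint32_t)ncqr + 1;
    return true;
}
```

For example, dw11 = 0x00030001 requests 2 submission queues and 4 completion queues, while any value with 0xffff in either half is rejected.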




* [PATCH v6 17/42] nvme: add log specific field to trace events
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (15 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 16/42] nvme: make sure ncqr and nsqr is valid Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:43   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 18/42] nvme: support identify namespace descriptor list Klaus Jensen
                   ` (26 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

The Log Specific Field (LSP) is not used yet, but include it in the
trace event.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 3 ++-
 hw/block/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e56142c4ea99..16de3ca1c5d5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -760,6 +760,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     uint32_t dw12 = le32_to_cpu(cmd->cdw12);
     uint32_t dw13 = le32_to_cpu(cmd->cdw13);
     uint8_t  lid = dw10 & 0xff;
+    uint8_t  lsp = (dw10 >> 8) & 0xf;
     uint8_t  rae = (dw10 >> 15) & 0x1;
     uint32_t numdl, numdu;
     uint64_t off, lpol, lpou;
@@ -777,7 +778,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    trace_nvme_dev_get_log(nvme_cid(req), lid, rae, len, off);
+    trace_nvme_dev_get_log(nvme_cid(req), lid, lsp, rae, len, off);
 
     switch (lid) {
     case NVME_LOG_ERROR_INFO:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index dde1d22bc39a..13e2c71664f6 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -48,7 +48,7 @@ nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
 nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
 nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
 nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
-nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
+nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
 nvme_dev_process_aers(int queued) "queued %d"
 nvme_dev_aer(uint16_t cid) "cid %"PRIu16""
 nvme_dev_aer_aerl_exceeded(void) "aerl exceeded"
-- 
2.25.1
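For reference, the fields traced by nvme_dev_get_log all come from CDW10 of Get Log Page. The hypothetical decoder below uses the same shifts and masks as nvme_get_log() in this patch (LID in bits 7:0, LSP in bits 11:8, RAE in bit 15); GetLogDw10 and get_log_decode_dw10() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the Get Log Page CDW10 fields traced above. */
typedef struct {
    uint8_t lid;   /* Log Page Identifier, bits 7:0 */
    uint8_t lsp;   /* Log Specific Field, bits 11:8 */
    uint8_t rae;   /* Retain Asynchronous Event, bit 15 */
} GetLogDw10;

static GetLogDw10 get_log_decode_dw10(uint32_t dw10)
{
    GetLogDw10 f;

    f.lid = dw10 & 0xff;
    f.lsp = (dw10 >> 8) & 0xf;
    f.rae = (dw10 >> 15) & 0x1;
    return f;
}
```

For example, dw10 = 0x8501 decodes to lid 0x1, lsp 0x5 and rae 1.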




* [PATCH v6 18/42] nvme: support identify namespace descriptor list
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (16 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 17/42] nvme: add log specific field to trace events Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:43   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 19/42] nvme: enforce valid queue creation sequence Klaus Jensen
                   ` (25 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

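Since we are not providing the NGUID or EUI64 fields, we must report a
Namespace UUID in the Namespace Identification Descriptor list. We do
not have any way of storing a persistent unique identifier, so conjure
up a UUID whose first four bytes are the namespace ID (big-endian) and
whose remaining bytes are zero.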

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 38 ++++++++++++++++++++++++++++++++++++++
 hw/block/trace-events |  1 +
 2 files changed, 39 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 16de3ca1c5d5..007f8817f101 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -942,6 +942,42 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
     return ret;
 }
 
+static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
+{
+    uint32_t nsid = le32_to_cpu(c->nsid);
+    uint64_t prp1 = le64_to_cpu(c->prp1);
+    uint64_t prp2 = le64_to_cpu(c->prp2);
+
+    void *list;
+    uint16_t ret;
+    NvmeIdNsDescr *ns_descr;
+
+    trace_nvme_dev_identify_ns_descr_list(nsid);
+
+    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
+        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
+        return NVME_INVALID_NSID | NVME_DNR;
+    }
+
+    list = g_malloc0(NVME_IDENTIFY_DATA_SIZE);
+    ns_descr = list;
+
+    /*
+     * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
+     * structure, a Namespace UUID (nidt = 0x3) must be reported in the
+     * Namespace Identification Descriptor. Add a very basic Namespace UUID
+     * here.
+     */
+    ns_descr->nidt = NVME_NIDT_UUID;
+    ns_descr->nidl = NVME_NIDT_UUID_LEN;
+    stl_be_p((uint8_t *)ns_descr + sizeof(*ns_descr), nsid);
+
+    ret = nvme_dma_read_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
+                            prp2);
+    g_free(list);
+    return ret;
+}
+
 static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
 {
     NvmeIdentify *c = (NvmeIdentify *)cmd;
@@ -953,6 +989,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
         return nvme_identify_ctrl(n, c);
     case NVME_ID_CNS_NS_ACTIVE_LIST:
         return nvme_identify_nslist(n, c);
+    case NVME_ID_CNS_NS_DESCR_LIST:
+        return nvme_identify_ns_descr_list(n, c);
     default:
         trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
         return NVME_INVALID_FIELD | NVME_DNR;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 13e2c71664f6..4cde0844ef64 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -41,6 +41,7 @@ nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
 nvme_dev_identify_ctrl(void) "identify controller"
 nvme_dev_identify_ns(uint32_t ns) "nsid %"PRIu32""
 nvme_dev_identify_nslist(uint32_t ns) "nsid %"PRIu32""
+nvme_dev_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32""
 nvme_dev_getfeat(uint16_t cid, uint32_t fid) "cid %"PRIu16" fid 0x%"PRIx32""
 nvme_dev_setfeat(uint16_t cid, uint32_t fid, uint32_t val) "cid %"PRIu16" fid 0x%"PRIx32" val 0x%"PRIx32""
 nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
-- 
2.25.1
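A stand-alone sketch of the descriptor list that nvme_identify_ns_descr_list() builds, assuming a 4-byte descriptor header (NIDT, NIDL, two reserved bytes) followed directly by the 16-byte identifier. The UUID has to be written at a byte offset of sizeof(header) into the buffer; adding sizeof(*ns_descr) to a typed NvmeIdNsDescr pointer would instead skip that many whole headers. IdNsDescrHdr and build_ns_descr_list() are hypothetical names, and the big-endian store is done by hand to mirror what stl_be_p() does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IDENTIFY_DATA_SIZE 4096
#define NIDT_UUID          0x3   /* Namespace Identifier Type: UUID */
#define NIDT_UUID_LEN      16    /* a UUID is 16 bytes long */

/* Descriptor header; the identifier itself follows immediately after it. */
typedef struct {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd2[2];
} IdNsDescrHdr;

/* Fill buf with a single UUID descriptor whose first 4 bytes encode nsid. */
static void build_ns_descr_list(uint8_t *buf, uint32_t nsid)
{
    IdNsDescrHdr hdr = { .nidt = NIDT_UUID, .nidl = NIDT_UUID_LEN };
    uint8_t *uuid = buf + sizeof(hdr);   /* byte offset, not element offset */

    memset(buf, 0, IDENTIFY_DATA_SIZE);
    memcpy(buf, &hdr, sizeof(hdr));

    /* big-endian store of nsid; the remaining 12 UUID bytes stay zero */
    uuid[0] = (uint8_t)(nsid >> 24);
    uuid[1] = (uint8_t)(nsid >> 16);
    uuid[2] = (uint8_t)(nsid >> 8);
    uuid[3] = (uint8_t)nsid;
}
```

For nsid 0x01020304 the buffer starts with 03 10 00 00 01 02 03 04, followed by zeros.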




* [PATCH v6 19/42] nvme: enforce valid queue creation sequence
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (17 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 18/42] nvme: support identify namespace descriptor list Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:43   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 20/42] nvme: provide the mandatory subnqn field Klaus Jensen
                   ` (24 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Support returning Command Sequence Error if Set Features on Number of
Queues is called after queues have been created.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 7 +++++++
 hw/block/nvme.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 007f8817f101..b40d27cddc46 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -881,6 +881,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
     cq = g_malloc0(sizeof(*cq));
     nvme_init_cq(cq, n, prp1, cqid, vector, qsize + 1,
         NVME_CQ_FLAGS_IEN(qflags));
+
+    n->qs_created = true;
     return NVME_SUCCESS;
 }
 
@@ -1194,6 +1196,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
         break;
     case NVME_NUMBER_OF_QUEUES:
+        if (n->qs_created) {
+            return NVME_CMD_SEQ_ERROR | NVME_DNR;
+        }
+
         /*
          * NVMe v1.3, Section 5.21.1.7: 0xffff is not an allowed value for NCQR
          * and NSQR.
@@ -1332,6 +1338,7 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
     n->aer_queued = 0;
     n->outstanding_aers = 0;
+    n->qs_created = false;
 
     blk_flush(n->conf.blk);
     n->bar.cc = 0;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index b709a8bb8d40..b4d1738a3d0a 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -99,6 +99,7 @@ typedef struct NvmeCtrl {
     BlockConf    conf;
     NvmeParams   params;
 
+    bool        qs_created;
     uint32_t    page_size;
     uint16_t    page_bits;
     uint16_t    max_prp_ents;
-- 
2.25.1




* [PATCH v6 20/42] nvme: provide the mandatory subnqn field
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (18 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 19/42] nvme: enforce valid queue creation sequence Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:43   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 21/42] nvme: bump supported version to v1.3 Klaus Jensen
                   ` (23 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>
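
The two calls in this patch build the NQN by copying the fixed prefix
"nqn.2019-08.org.qemu:" and appending the device serial, truncating to the
256-byte SUBNQN field of the Identify Controller data structure while
keeping the string NUL-terminated. A standard-C sketch of the same result,
using snprintf() in place of QEMU's pstrcpy()/pstrcat(); the helper name
and the example serial are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SUBNQN_LEN 256  /* size of the Identify Controller SUBNQN field */

/* Hypothetical helper: write "<prefix><serial>" into dst, truncating to
 * dst_len - 1 characters and always NUL-terminating, which matches the
 * result of pstrcpy() followed by pstrcat(). */
static void build_subnqn(char *dst, size_t dst_len, const char *serial)
{
    assert(dst_len > 0);
    snprintf(dst, dst_len, "%s%s", "nqn.2019-08.org.qemu:", serial);
}
```

For a serial of "deadbeef" this yields "nqn.2019-08.org.qemu:deadbeef".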

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b40d27cddc46..74061d08fd2e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1925,6 +1925,9 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->nn = cpu_to_le32(n->num_namespaces);
     id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
 
+    pstrcpy((char *) id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:");
+    pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
+
     id->psd[0].mp = cpu_to_le16(0x9c4);
     id->psd[0].enlat = cpu_to_le32(0x10);
     id->psd[0].exlat = cpu_to_le32(0x4);
-- 
2.25.1




* [PATCH v6 21/42] nvme: bump supported version to v1.3
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (19 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 20/42] nvme: provide the mandatory subnqn field Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:44   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 22/42] nvme: memset preallocated requests structures Klaus Jensen
                   ` (22 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>
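
NVME_SPEC_VER follows the layout of the Version (VS) register, which this
patch also reports in the Identify Controller VER field: major version in
bits 31:16, minor version in bits 15:08 and tertiary version in bits
07:00. So 0x00010300 reads as 1.3.0 and the previous 0x00010200 as 1.2.0.
A small decoding sketch (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of an NVMe version value. */
struct nvme_version {
    uint16_t major;     /* MJR, bits 31:16 */
    uint8_t  minor;     /* MNR, bits 15:08 */
    uint8_t  tertiary;  /* TER, bits 07:00 */
};

static struct nvme_version decode_nvme_version(uint32_t vs)
{
    struct nvme_version v = {
        .major    = (uint16_t)(vs >> 16),
        .minor    = (uint8_t)((vs >> 8) & 0xff),
        .tertiary = (uint8_t)(vs & 0xff),
    };
    return v;
}

static uint32_t encode_nvme_version(uint16_t major, uint8_t minor,
                                    uint8_t ter)
{
    return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | ter;
}
```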

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 74061d08fd2e..26c4b6e69f72 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -44,6 +44,7 @@
 #include "trace.h"
 #include "nvme.h"
 
+#define NVME_SPEC_VER 0x00010300
 #define NVME_CMB_BIR 2
 #define NVME_TEMPERATURE 0x143
 #define NVME_TEMPERATURE_WARNING 0x157
@@ -1898,6 +1899,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->ieee[0] = 0x00;
     id->ieee[1] = 0x02;
     id->ieee[2] = 0xb3;
+    id->ver = cpu_to_le32(NVME_SPEC_VER);
     id->oacs = cpu_to_le16(0);
 
     /*
@@ -1942,7 +1944,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     NVME_CAP_SET_CSS(n->bar.cap, 1);
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
-    n->bar.vs = 0x00010200;
+    n->bar.vs = NVME_SPEC_VER;
     n->bar.intmc = n->bar.intms = 0;
 }
 
-- 
2.25.1




* [PATCH v6 22/42] nvme: memset preallocated requests structures
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (20 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 21/42] nvme: bump supported version to v1.3 Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:44   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 23/42] nvme: add mapping helpers Klaus Jensen
                   ` (21 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

This is in preparation for subsequent patches that change how QSGs/IOVs
are handled. It is important that the qsg and iov members of the
NvmeRequest are initially zeroed, so allocate the preallocated requests
with g_new0 instead of g_new.
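
Zeroing matters because the mapping helpers added in the next patch treat
a NULL qsg->sg or iov->iov as "not yet initialised" and initialise lazily;
with g_new those pointers would start out as heap garbage. A standalone
sketch of the idea, using calloc() as the standard-C counterpart of g_new0
and hypothetical stand-in types that model only the pointers the mapping
code inspects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMUSGList/QEMUIOVector. */
struct sg_model  { void *sg;  int nsg;  };
struct iov_model { void *iov; int niov; };

struct req_model {
    struct sg_model  qsg;
    struct iov_model iov;
};

/* A request is "unmapped" when neither list has been initialised. */
static bool req_is_unmapped(const struct req_model *req)
{
    return req->qsg.sg == NULL && req->iov.iov == NULL;
}

/* calloc() zero-fills like g_new0(); malloc(), like g_new(), does not. */
static struct req_model *alloc_requests(size_t count)
{
    return calloc(count, sizeof(struct req_model));
}
```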

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 26c4b6e69f72..08267e847671 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -597,7 +597,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
     sq->size = size;
     sq->cqid = cqid;
     sq->head = sq->tail = 0;
-    sq->io_req = g_new(NvmeRequest, sq->size);
+    sq->io_req = g_new0(NvmeRequest, sq->size);
 
     QTAILQ_INIT(&sq->req_list);
     QTAILQ_INIT(&sq->out_req_list);
-- 
2.25.1




* [PATCH v6 23/42] nvme: add mapping helpers
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (21 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 22/42] nvme: memset preallocated requests structures Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:45   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 24/42] nvme: remove redundant has_sg member Klaus Jensen
                   ` (20 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Add nvme_map_addr, nvme_map_addr_cmb and nvme_addr_to_cmb helpers and
use them in nvme_map_prp.

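This fixes a bug where, for a CMB transfer in which PRP2 points directly
at the remaining data (rather than at a PRP list), the device would add
the buffer to the I/O vector with the wrong length (trans_len instead of
the remaining len).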
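
The helpers split the work into a check of whether an address falls in the
Controller Memory Buffer, a translation from bus address to an offset into
the host-side cmbuf array, and a rule that one request must not mix CMB and
host memory addresses (reported as Invalid Use of Controller Memory
Buffer). A self-contained sketch of that logic, assuming a CMB described by
a base address, a size and a backing array; the cmb_model type, the
function names and the half-open [addr, addr + len) range convention are
this sketch's own choices:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a CMB: guest-visible base/size plus host backing. */
struct cmb_model {
    uint64_t base;
    uint64_t size;
    uint8_t *buf;
};

/* True if addr itself lies inside the CMB (cf. nvme_addr_is_cmb). */
static bool cmb_has_addr(const struct cmb_model *cmb, uint64_t addr)
{
    return addr >= cmb->base && addr - cmb->base < cmb->size;
}

/* True if the whole range [addr, addr + len) lies inside the CMB. */
static bool cmb_contains(const struct cmb_model *cmb, uint64_t addr,
                         size_t len)
{
    return len > 0 && cmb_has_addr(cmb, addr) &&
           len <= cmb->size - (addr - cmb->base);
}

/* Translate a CMB bus address to a host pointer (cf. nvme_addr_to_cmb). */
static uint8_t *cmb_to_host(const struct cmb_model *cmb, uint64_t addr)
{
    assert(cmb_has_addr(cmb, addr));
    return &cmb->buf[addr - cmb->base];
}

enum map_kind   { MAP_NONE, MAP_CMB, MAP_HOST };
enum map_status { MAP_OK, MAP_ERR_MIXED, MAP_ERR_RANGE };

/* Map one segment; the first segment picks a side and later segments
 * of the same request must stay on it. */
static enum map_status map_segment(enum map_kind *kind,
                                   const struct cmb_model *cmb,
                                   uint64_t addr, size_t len)
{
    enum map_kind want = cmb_has_addr(cmb, addr) ? MAP_CMB : MAP_HOST;

    if (*kind != MAP_NONE && *kind != want) {
        return MAP_ERR_MIXED;   /* cf. NVME_INVALID_USE_OF_CMB */
    }
    if (want == MAP_CMB && !cmb_contains(cmb, addr, len)) {
        return MAP_ERR_RANGE;   /* cf. NVME_DATA_TRAS_ERROR */
    }
    *kind = want;
    return MAP_OK;
}
```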

Fixes: b2b2b67a00574 ("nvme: Add support for Read Data and Write Data in CMBs.")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 97 +++++++++++++++++++++++++++++++++++--------
 hw/block/trace-events |  1 +
 2 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 08267e847671..187c816eb6ad 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -59,6 +59,11 @@
 
 static void nvme_process_sq(void *opaque);
 
+static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
+{
+    return &n->cmbuf[addr - n->ctrl_mem.addr];
+}
+
 static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 {
     hwaddr low = n->ctrl_mem.addr;
@@ -70,7 +75,7 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
     if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
-        memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
+        memcpy(buf, nvme_addr_to_cmb(n, addr), size);
         return;
     }
 
@@ -153,29 +158,79 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
     }
 }
 
+static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
+                                  size_t len)
+{
+    if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
+        return NVME_DATA_TRAS_ERROR;
+    }
+
+    qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
+
+    return NVME_SUCCESS;
+}
+
+static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
+                              hwaddr addr, size_t len)
+{
+    if (nvme_addr_is_cmb(n, addr)) {
+        if (qsg && qsg->sg) {
+            return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+        }
+
+        assert(iov);
+
+        if (!iov->iov) {
+            qemu_iovec_init(iov, 1);
+        }
+
+        return nvme_map_addr_cmb(n, iov, addr, len);
+    }
+
+    if (iov && iov->iov) {
+        return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+    }
+
+    assert(qsg);
+
+    if (!qsg->sg) {
+        pci_dma_sglist_init(qsg, &n->parent_obj, 1);
+    }
+
+    qemu_sglist_add(qsg, addr, len);
+
+    return NVME_SUCCESS;
+}
+
 static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                              uint64_t prp2, uint32_t len, NvmeCtrl *n)
 {
     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
     trans_len = MIN(len, trans_len);
     int num_prps = (len >> n->page_bits) + 1;
+    uint16_t status;
 
     if (unlikely(!prp1)) {
         trace_nvme_dev_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
-    } else if (n->bar.cmbsz && prp1 >= n->ctrl_mem.addr &&
-               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
-        qsg->nsg = 0;
+    }
+
+    if (nvme_addr_is_cmb(n, prp1)) {
         qemu_iovec_init(iov, num_prps);
-        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
     } else {
         pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
-        qemu_sglist_add(qsg, prp1, trans_len);
     }
+
+    status = nvme_map_addr(n, qsg, iov, prp1, trans_len);
+    if (status) {
+        goto unmap;
+    }
+
     len -= trans_len;
     if (len) {
         if (unlikely(!prp2)) {
             trace_nvme_dev_err_invalid_prp2_missing();
+            status = NVME_INVALID_FIELD | NVME_DNR;
             goto unmap;
         }
         if (len > n->page_size) {
@@ -192,6 +247,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                 if (i == n->max_prp_ents - 1 && len > n->page_size) {
                     if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
                         trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
+                        status = NVME_INVALID_FIELD | NVME_DNR;
                         goto unmap;
                     }
 
@@ -205,14 +261,14 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
 
                 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
                     trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
+                    status = NVME_INVALID_FIELD | NVME_DNR;
                     goto unmap;
                 }
 
                 trans_len = MIN(len, n->page_size);
-                if (qsg->nsg){
-                    qemu_sglist_add(qsg, prp_ent, trans_len);
-                } else {
-                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
+                status = nvme_map_addr(n, qsg, iov, prp_ent, trans_len);
+                if (status) {
+                    goto unmap;
                 }
                 len -= trans_len;
                 i++;
@@ -220,20 +276,27 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
         } else {
             if (unlikely(prp2 & (n->page_size - 1))) {
                 trace_nvme_dev_err_invalid_prp2_align(prp2);
+                status = NVME_INVALID_FIELD | NVME_DNR;
                 goto unmap;
             }
-            if (qsg->nsg) {
-                qemu_sglist_add(qsg, prp2, len);
-            } else {
-                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - n->ctrl_mem.addr], trans_len);
+            status = nvme_map_addr(n, qsg, iov, prp2, len);
+            if (status) {
+                goto unmap;
             }
         }
     }
     return NVME_SUCCESS;
 
- unmap:
-    qemu_sglist_destroy(qsg);
-    return NVME_INVALID_FIELD | NVME_DNR;
+unmap:
+    if (iov && iov->iov) {
+        qemu_iovec_destroy(iov);
+    }
+
+    if (qsg && qsg->sg) {
+        qemu_sglist_destroy(qsg);
+    }
+
+    return status;
 }
 
 static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 4cde0844ef64..adf11313f956 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -33,6 +33,7 @@ nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 nvme_dev_irq_pin(void) "pulsing IRQ pin"
 nvme_dev_irq_masked(void) "IRQ is masked"
 nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
+nvme_dev_map_prp(uint16_t cid, uint8_t opc, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" opc 0x%"PRIx8" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
 nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
 nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
 nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
-- 
2.25.1




* [PATCH v6 24/42] nvme: remove redundant has_sg member
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (22 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 23/42] nvme: add mapping helpers Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:45   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 25/42] nvme: refactor dma read/write Klaus Jensen
                   ` (19 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

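Remove the redundant has_sg member from NvmeRequest. Whether a request
holds a scatter/gather list or an I/O vector that must be freed can be
determined from the nalloc fields of req->qsg and req->iov, and block
accounting is now started with the size of whichever one is in use.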
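
With has_sg gone, the completion callback decides what to free from the
lists themselves: a non-zero nalloc means the list was initialised for this
request, which relies on the zero-initialised request structures from the
previous patch. A standalone sketch of that cleanup rule; the list_model
and req_model types and the list_* helpers are hypothetical, not QEMU's:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in: nalloc > 0 means "initialised, must be freed". */
struct list_model {
    void *entries;
    int   nalloc;
};

struct req_model {
    struct list_model qsg;  /* scatter/gather list for host memory */
    struct list_model iov;  /* I/O vector for CMB-backed transfers */
};

static void list_init(struct list_model *l, int n)
{
    l->entries = calloc((size_t)n, sizeof(void *));
    l->nalloc = l->entries ? n : 0;
}

static void list_destroy(struct list_model *l)
{
    free(l->entries);
    l->entries = NULL;
    l->nalloc = 0;
}

/* Same shape as the new nvme_rw_cb cleanup; returns how many lists
 * were freed. */
static int req_cleanup(struct req_model *req)
{
    int freed = 0;

    if (req->qsg.nalloc) {
        list_destroy(&req->qsg);
        freed++;
    }
    if (req->iov.nalloc) {
        list_destroy(&req->iov);
        freed++;
    }
    return freed;
}
```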

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 18 ++++++++++++------
 hw/block/nvme.h |  1 -
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 187c816eb6ad..e40c080c3b48 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -484,16 +484,20 @@ static void nvme_rw_cb(void *opaque, int ret)
         block_acct_failed(blk_get_stats(n->conf.blk), &req->acct);
         req->status = NVME_INTERNAL_DEV_ERROR;
     }
-    if (req->has_sg) {
+
+    if (req->qsg.nalloc) {
         qemu_sglist_destroy(&req->qsg);
     }
+    if (req->iov.nalloc) {
+        qemu_iovec_destroy(&req->iov);
+    }
+
     nvme_enqueue_req_completion(cq, req);
 }
 
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     NvmeRequest *req)
 {
-    req->has_sg = false;
     block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
          BLOCK_ACCT_FLUSH);
     req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
@@ -517,7 +521,6 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
         return NVME_LBA_RANGE | NVME_DNR;
     }
 
-    req->has_sg = false;
     block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
                      BLOCK_ACCT_WRITE);
     req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
@@ -554,16 +557,19 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct);
     if (req->qsg.nsg > 0) {
-        req->has_sg = true;
+        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->qsg.size,
+                         acct);
+
         req->aiocb = is_write ?
             dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
                           nvme_rw_cb, req) :
             dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
                          nvme_rw_cb, req);
     } else {
-        req->has_sg = false;
+        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->iov.size,
+                         acct);
+
         req->aiocb = is_write ?
             blk_aio_pwritev(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
                             req) :
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index b4d1738a3d0a..442b17bf1701 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -29,7 +29,6 @@ typedef struct NvmeRequest {
     struct NvmeSQueue       *sq;
     BlockAIOCB              *aiocb;
     uint16_t                status;
-    bool                    has_sg;
     NvmeCqe                 cqe;
     BlockAcctCookie         acct;
     QEMUSGList              qsg;
-- 
2.25.1




* [PATCH v6 25/42] nvme: refactor dma read/write
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (23 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 24/42] nvme: remove redundant has_sg member Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:46   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 26/42] nvme: pass request along for tracing Klaus Jensen
                   ` (18 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Refactor the nvme_dma_{read,write}_prp functions into a common function
taking a DMADirection parameter.
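
In the combined function, the direction only selects which way bytes move
between the device-side buffer ptr and the guest memory described by the
PRPs: DMA_DIRECTION_TO_DEVICE fills ptr from guest memory (as for Set
Features Timestamp), while DMA_DIRECTION_FROM_DEVICE copies ptr out to the
guest (as for Identify and Get Log Page). A standalone sketch of that
dispatch over a flat model of guest memory; the enum, the dma_copy
function and its -1 error convention are hypothetical, not QEMU's DMA API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical direction enum with the same meaning as QEMU's
 * DMADirection values used in this patch. */
enum dma_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE };

/* Copy len bytes between the device buffer and a flat model of guest
 * memory; returns 0 on success, -1 if the transfer would overrun it. */
static int dma_copy(uint8_t *dev_buf, size_t len,
                    uint8_t *guest, size_t guest_len, enum dma_dir dir)
{
    if (len > guest_len) {
        return -1;  /* cf. the residual/short-copy checks */
    }
    if (dir == DIR_TO_DEVICE) {
        memcpy(dev_buf, guest, len);  /* guest -> device */
    } else {
        memcpy(guest, dev_buf, len);  /* device -> guest */
    }
    return 0;
}
```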

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 89 ++++++++++++++++++++++++-------------------------
 1 file changed, 43 insertions(+), 46 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e40c080c3b48..809d00443369 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -299,55 +299,50 @@ unmap:
     return status;
 }
 
-static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-                                   uint64_t prp1, uint64_t prp2)
+static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+                             uint64_t prp1, uint64_t prp2, DMADirection dir)
 {
     QEMUSGList qsg;
     QEMUIOVector iov;
     uint16_t status = NVME_SUCCESS;
 
-    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
-        return NVME_INVALID_FIELD | NVME_DNR;
+    status = nvme_map_prp(&qsg, &iov, prp1, prp2, len, n);
+    if (status) {
+        return status;
     }
-    if (qsg.nsg > 0) {
-        if (dma_buf_write(ptr, len, &qsg)) {
-            status = NVME_INVALID_FIELD | NVME_DNR;
-        }
-        qemu_sglist_destroy(&qsg);
-    } else {
-        if (qemu_iovec_to_buf(&iov, 0, ptr, len) != len) {
-            status = NVME_INVALID_FIELD | NVME_DNR;
-        }
-        qemu_iovec_destroy(&iov);
-    }
-    return status;
-}
 
-static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-    uint64_t prp1, uint64_t prp2)
-{
-    QEMUSGList qsg;
-    QEMUIOVector iov;
-    uint16_t status = NVME_SUCCESS;
+    if (qsg.nsg > 0) {
+        uint64_t residual;
 
-    trace_nvme_dev_dma_read(prp1, prp2);
+        if (dir == DMA_DIRECTION_TO_DEVICE) {
+            residual = dma_buf_write(ptr, len, &qsg);
+        } else {
+            residual = dma_buf_read(ptr, len, &qsg);
+        }
 
-    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
-        return NVME_INVALID_FIELD | NVME_DNR;
-    }
-    if (qsg.nsg > 0) {
-        if (unlikely(dma_buf_read(ptr, len, &qsg))) {
+        if (unlikely(residual)) {
             trace_nvme_dev_err_invalid_dma();
             status = NVME_INVALID_FIELD | NVME_DNR;
         }
+
         qemu_sglist_destroy(&qsg);
     } else {
-        if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
+        size_t bytes;
+
+        if (dir == DMA_DIRECTION_TO_DEVICE) {
+            bytes = qemu_iovec_to_buf(&iov, 0, ptr, len);
+        } else {
+            bytes = qemu_iovec_from_buf(&iov, 0, ptr, len);
+        }
+
+        if (unlikely(bytes != len)) {
             trace_nvme_dev_err_invalid_dma();
             status = NVME_INVALID_FIELD | NVME_DNR;
         }
+
         qemu_iovec_destroy(&iov);
     }
+
     return status;
 }
 
@@ -775,8 +770,8 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
         nvme_clear_events(n, NVME_AER_TYPE_SMART);
     }
 
-    return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
-                             prp2);
+    return nvme_dma_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2,
+                        DMA_DIRECTION_FROM_DEVICE);
 }
 
 static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
@@ -795,8 +790,8 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
 
     trans_len = MIN(sizeof(fw_log) - off, buf_len);
 
-    return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1,
-                             prp2);
+    return nvme_dma_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, prp2,
+                        DMA_DIRECTION_FROM_DEVICE);
 }
 
 static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
@@ -820,7 +815,8 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
 
     trans_len = MIN(sizeof(errlog) - off, buf_len);
 
-    return nvme_dma_read_prp(n, errlog, trans_len, prp1, prp2);
+    return nvme_dma_prp(n, errlog, trans_len, prp1, prp2,
+                        DMA_DIRECTION_FROM_DEVICE);
 }
 
 static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
@@ -963,8 +959,8 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
 
     trace_nvme_dev_identify_ctrl();
 
-    return nvme_dma_read_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
-        prp1, prp2);
+    return nvme_dma_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), prp1,
+                        prp2, DMA_DIRECTION_FROM_DEVICE);
 }
 
 static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
@@ -983,8 +979,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
 
     ns = &n->namespaces[nsid - 1];
 
-    return nvme_dma_read_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns),
-        prp1, prp2);
+    return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
+                        prp2, DMA_DIRECTION_FROM_DEVICE);
 }
 
 static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
@@ -1009,7 +1005,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
             break;
         }
     }
-    ret = nvme_dma_read_prp(n, (uint8_t *)list, data_len, prp1, prp2);
+    ret = nvme_dma_prp(n, (uint8_t *)list, data_len, prp1, prp2,
+                       DMA_DIRECTION_FROM_DEVICE);
     g_free(list);
     return ret;
 }
@@ -1044,8 +1041,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
     ns_descr->nidl = NVME_NIDT_UUID_LEN;
     stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
 
-    ret = nvme_dma_read_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
-                            prp2);
+    ret = nvme_dma_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
+                       prp2, DMA_DIRECTION_FROM_DEVICE);
     g_free(list);
     return ret;
 }
@@ -1128,8 +1125,8 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
 
     uint64_t timestamp = nvme_get_timestamp(n);
 
-    return nvme_dma_read_prp(n, (uint8_t *)&timestamp,
-                                 sizeof(timestamp), prp1, prp2);
+    return nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
+                        prp2, DMA_DIRECTION_FROM_DEVICE);
 }
 
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
@@ -1214,8 +1211,8 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
     uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
     uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
 
-    ret = nvme_dma_write_prp(n, (uint8_t *)&timestamp,
-                                sizeof(timestamp), prp1, prp2);
+    ret = nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
+                       prp2, DMA_DIRECTION_TO_DEVICE);
     if (ret != NVME_SUCCESS) {
         return ret;
     }
-- 
2.25.1




* [PATCH v6 26/42] nvme: pass request along for tracing
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (24 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 25/42] nvme: refactor dma read/write Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:55   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 27/42] nvme: add request mapping helper Klaus Jensen
                   ` (17 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 67 +++++++++++++++++++++++++------------------
 hw/block/trace-events |  2 +-
 2 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 809d00443369..3e9c2ed434c2 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -202,14 +202,18 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
-                             uint64_t prp2, uint32_t len, NvmeCtrl *n)
+static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
+                             uint64_t prp1, uint64_t prp2, uint32_t len,
+                             NvmeRequest *req)
 {
     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
     trans_len = MIN(len, trans_len);
     int num_prps = (len >> n->page_bits) + 1;
     uint16_t status;
 
+    trace_nvme_dev_map_prp(nvme_cid(req), trans_len, len, prp1, prp2,
+                           num_prps);
+
     if (unlikely(!prp1)) {
         trace_nvme_dev_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -300,13 +304,14 @@ unmap:
 }
 
 static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-                             uint64_t prp1, uint64_t prp2, DMADirection dir)
+                             uint64_t prp1, uint64_t prp2, DMADirection dir,
+                             NvmeRequest *req)
 {
     QEMUSGList qsg;
     QEMUIOVector iov;
     uint16_t status = NVME_SUCCESS;
 
-    status = nvme_map_prp(&qsg, &iov, prp1, prp2, len, n);
+    status = nvme_map_prp(n, &qsg, &iov, prp1, prp2, len, req);
     if (status) {
         return status;
     }
@@ -547,7 +552,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
         return NVME_LBA_RANGE | NVME_DNR;
     }
 
-    if (nvme_map_prp(&req->qsg, &req->iov, prp1, prp2, data_size, n)) {
+    if (nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, data_size, req)) {
         block_acct_invalid(blk_get_stats(n->conf.blk), acct);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
@@ -771,7 +776,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
     }
 
     return nvme_dma_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2,
-                        DMA_DIRECTION_FROM_DEVICE);
+                        DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
@@ -791,7 +796,7 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
     trans_len = MIN(sizeof(fw_log) - off, buf_len);
 
     return nvme_dma_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, prp2,
-                        DMA_DIRECTION_FROM_DEVICE);
+                        DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
@@ -816,7 +821,7 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
     trans_len = MIN(sizeof(errlog) - off, buf_len);
 
     return nvme_dma_prp(n, errlog, trans_len, prp1, prp2,
-                        DMA_DIRECTION_FROM_DEVICE);
+                        DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
@@ -952,7 +957,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
+static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
+                                   NvmeRequest *req)
 {
     uint64_t prp1 = le64_to_cpu(c->prp1);
     uint64_t prp2 = le64_to_cpu(c->prp2);
@@ -960,10 +966,11 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
     trace_nvme_dev_identify_ctrl();
 
     return nvme_dma_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), prp1,
-                        prp2, DMA_DIRECTION_FROM_DEVICE);
+                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
+                                 NvmeRequest *req)
 {
     NvmeNamespace *ns;
     uint32_t nsid = le32_to_cpu(c->nsid);
@@ -980,10 +987,11 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
     ns = &n->namespaces[nsid - 1];
 
     return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
-                        prp2, DMA_DIRECTION_FROM_DEVICE);
+                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
+                                     NvmeRequest *req)
 {
     static const int data_len = NVME_IDENTIFY_DATA_SIZE;
     uint32_t min_nsid = le32_to_cpu(c->nsid);
@@ -1006,12 +1014,13 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
         }
     }
     ret = nvme_dma_prp(n, (uint8_t *)list, data_len, prp1, prp2,
-                       DMA_DIRECTION_FROM_DEVICE);
+                       DMA_DIRECTION_FROM_DEVICE, req);
     g_free(list);
     return ret;
 }
 
-static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
+static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
+                                            NvmeRequest *req)
 {
     uint32_t nsid = le32_to_cpu(c->nsid);
     uint64_t prp1 = le64_to_cpu(c->prp1);
@@ -1042,24 +1051,24 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
     stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
 
     ret = nvme_dma_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
-                       prp2, DMA_DIRECTION_FROM_DEVICE);
+                       prp2, DMA_DIRECTION_FROM_DEVICE, req);
     g_free(list);
     return ret;
 }
 
-static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
     NvmeIdentify *c = (NvmeIdentify *)cmd;
 
     switch (le32_to_cpu(c->cns)) {
     case NVME_ID_CNS_NS:
-        return nvme_identify_ns(n, c);
+        return nvme_identify_ns(n, c, req);
     case NVME_ID_CNS_CTRL:
-        return nvme_identify_ctrl(n, c);
+        return nvme_identify_ctrl(n, c, req);
     case NVME_ID_CNS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist(n, c);
+        return nvme_identify_nslist(n, c, req);
     case NVME_ID_CNS_NS_DESCR_LIST:
-        return nvme_identify_ns_descr_list(n, c);
+        return nvme_identify_ns_descr_list(n, c, req);
     default:
         trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1118,7 +1127,8 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
     return cpu_to_le64(ts.all);
 }
 
-static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
+                                           NvmeRequest *req)
 {
     uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
     uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
@@ -1126,7 +1136,7 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
     uint64_t timestamp = nvme_get_timestamp(n);
 
     return nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
-                        prp2, DMA_DIRECTION_FROM_DEVICE);
+                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
@@ -1178,7 +1188,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         trace_nvme_dev_getfeat_numq(result);
         break;
     case NVME_TIMESTAMP:
-        return nvme_get_feature_timestamp(n, cmd);
+        return nvme_get_feature_timestamp(n, cmd, req);
     case NVME_INTERRUPT_COALESCING:
         result = cpu_to_le32(n->features.int_coalescing);
         break;
@@ -1204,7 +1214,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
+                                           NvmeRequest *req)
 {
     uint16_t ret;
     uint64_t timestamp;
@@ -1212,7 +1223,7 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
     uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
 
     ret = nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
-                       prp2, DMA_DIRECTION_TO_DEVICE);
+                       prp2, DMA_DIRECTION_TO_DEVICE, req);
     if (ret != NVME_SUCCESS) {
         return ret;
     }
@@ -1283,7 +1294,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
                                       ((n->params.max_ioqpairs - 1) << 16));
         break;
     case NVME_TIMESTAMP:
-        return nvme_set_feature_timestamp(n, cmd);
+        return nvme_set_feature_timestamp(n, cmd, req);
     case NVME_ASYNCHRONOUS_EVENT_CONF:
         n->features.async_config = dw11;
         break;
@@ -1334,7 +1345,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     case NVME_ADM_CMD_CREATE_CQ:
         return nvme_create_cq(n, cmd);
     case NVME_ADM_CMD_IDENTIFY:
-        return nvme_identify(n, cmd);
+        return nvme_identify(n, cmd, req);
     case NVME_ADM_CMD_ABORT:
         return nvme_abort(n, cmd, req);
     case NVME_ADM_CMD_SET_FEATURES:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index adf11313f956..e31e652fa04e 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -33,7 +33,7 @@ nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 nvme_dev_irq_pin(void) "pulsing IRQ pin"
 nvme_dev_irq_masked(void) "IRQ is masked"
 nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
-nvme_dev_map_prp(uint16_t cid, uint8_t opc, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" opc 0x%"PRIx8" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
+nvme_dev_map_prp(uint16_t cid, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
 nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
 nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
 nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
-- 
2.25.1




* [PATCH v6 27/42] nvme: add request mapping helper
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (25 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 26/42] nvme: pass request along for tracing Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:56   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 28/42] nvme: verify validity of prp lists in the cmb Klaus Jensen
                   ` (16 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Introduce the nvme_map helper to remove some noise in the main nvme_rw
function.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3e9c2ed434c2..850087aac967 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -351,6 +351,15 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
     return status;
 }
 
+static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
+                         QEMUIOVector *iov, size_t len, NvmeRequest *req)
+{
+    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
+    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
+
+    return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -534,8 +543,6 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
     uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
     uint64_t slba = le64_to_cpu(rw->slba);
-    uint64_t prp1 = le64_to_cpu(rw->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(rw->dptr.prp2);
 
     uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
     uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
@@ -552,7 +559,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
         return NVME_LBA_RANGE | NVME_DNR;
     }
 
-    if (nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, data_size, req)) {
+    if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
         block_acct_invalid(blk_get_stats(n->conf.blk), acct);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
-- 
2.25.1




* [PATCH v6 28/42] nvme: verify validity of prp lists in the cmb
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (26 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 27/42] nvme: add request mapping helper Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:56   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 29/42] nvme: refactor request bounds checking Klaus Jensen
                   ` (15 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Before this patch, the device already supported PRP lists in the CMB,
but it neither checked their validity nor announced the support in the
LISTS field.

If some of the PRPs in a PRP list are in the CMB, then ALL entries must
be there. This patch verifies that and properly announces support for
PRP lists in the CMB.
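
To illustrate the all-or-nothing rule, here is a minimal standalone sketch. It is not QEMU code: the cmb_window struct and both helpers are hypothetical, the CMB is assumed to be a single contiguous guest-physical window, and every entry is compared against the location of the list that holds it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CMB window: one contiguous guest-physical range. */
struct cmb_window {
    uint64_t base; /* first address of the CMB */
    uint64_t size; /* size in bytes; 0 means no CMB */
};

static bool addr_is_cmb(const struct cmb_window *cmb, uint64_t addr)
{
    return cmb->size != 0 && addr >= cmb->base && addr - cmb->base < cmb->size;
}

/*
 * True if every entry lies on the same side of the CMB boundary as the
 * PRP list that holds it, i.e. all in the CMB or all in host memory.
 */
static bool prp_list_cmb_consistent(const struct cmb_window *cmb,
                                    uint64_t list_addr,
                                    const uint64_t *entries, size_t nents)
{
    bool list_in_cmb = addr_is_cmb(cmb, list_addr);

    for (size_t i = 0; i < nents; i++) {
        if (addr_is_cmb(cmb, entries[i]) != list_in_cmb) {
            return false; /* mixed placement: invalid use of the CMB */
        }
    }
    return true;
}
```

A false result corresponds to the NVME_INVALID_USE_OF_CMB | NVME_DNR status that the patch returns.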

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 850087aac967..eecfad694bf8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -210,6 +210,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
     trans_len = MIN(len, trans_len);
     int num_prps = (len >> n->page_bits) + 1;
     uint16_t status;
+    bool prp_list_in_cmb = false;
 
     trace_nvme_dev_map_prp(nvme_cid(req), trans_len, len, prp1, prp2,
                            num_prps);
@@ -237,11 +238,16 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
             status = NVME_INVALID_FIELD | NVME_DNR;
             goto unmap;
         }
+
         if (len > n->page_size) {
             uint64_t prp_list[n->max_prp_ents];
             uint32_t nents, prp_trans;
             int i = 0;
 
+            if (nvme_addr_is_cmb(n, prp2)) {
+                prp_list_in_cmb = true;
+            }
+
             nents = (len + n->page_size - 1) >> n->page_bits;
             prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
             nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
@@ -255,6 +261,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
                         goto unmap;
                     }
 
+                    if (prp_list_in_cmb != nvme_addr_is_cmb(n, prp_ent)) {
+                        status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
+                        goto unmap;
+                    }
+
                     i = 0;
                     nents = (len + n->page_size - 1) >> n->page_bits;
                     prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
@@ -274,6 +285,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
                 if (status) {
                     goto unmap;
                 }
+
                 len -= trans_len;
                 i++;
             }
@@ -1931,7 +1943,7 @@ static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
 
     NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
     NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
-    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 1);
     NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
     NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
     NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2);
-- 
2.25.1




* [PATCH v6 29/42] nvme: refactor request bounds checking
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (27 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 28/42] nvme: verify validity of prp lists in the cmb Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:56   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 30/42] nvme: add check for mdts Klaus Jensen
                   ` (14 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index eecfad694bf8..ba520c76bae5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -491,6 +491,20 @@ static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
     }
 }
 
+static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
+                                         uint64_t slba, uint32_t nlb,
+                                         NvmeRequest *req)
+{
+    uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
+
+    if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
+        trace_nvme_dev_err_invalid_lba_range(slba, nlb, nsze);
+        return NVME_LBA_RANGE | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
 static void nvme_rw_cb(void *opaque, int ret)
 {
     NvmeRequest *req = opaque;
@@ -536,10 +550,11 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
     uint64_t offset = slba << data_shift;
     uint32_t count = nlb << data_shift;
+    uint16_t status;
 
-    if (unlikely(slba + nlb > ns->id_ns.nsze)) {
-        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
-        return NVME_LBA_RANGE | NVME_DNR;
+    status = nvme_check_bounds(n, ns, slba, nlb, req);
+    if (status) {
+        return status;
     }
 
     block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
@@ -562,13 +577,14 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     uint64_t data_offset = slba << data_shift;
     int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
     enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
+    uint16_t status;
 
     trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
 
-    if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
+    status = nvme_check_bounds(n, ns, slba, nlb, req);
+    if (status) {
         block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
-        return NVME_LBA_RANGE | NVME_DNR;
+        return status;
     }
 
     if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
-- 
2.25.1
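
Patch 29 has no commit message body. Besides converting nsze with le64_to_cpu, nvme_check_bounds adds a `UINT64_MAX - slba < nlb` test that rejects LBA ranges whose end would wrap around. The sketch below contrasts the old and new comparisons using plain C types; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch style check: slba + nlb can wrap around for a huge slba. */
static bool lba_range_ok_old(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    return slba + nlb <= nsze;
}

/* Patched style check: reject wraparound before comparing with nsze. */
static bool lba_range_ok_new(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    if (UINT64_MAX - slba < nlb) {
        return false; /* slba + nlb would overflow 64 bits */
    }
    return slba + nlb <= nsze;
}
```

With slba = UINT64_MAX - 1 and nlb = 4, slba + nlb wraps to 2, so the old check accepts the range while the new one rejects it.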




* [PATCH v6 30/42] nvme: add check for mdts
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (28 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 29/42] nvme: refactor request bounds checking Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:57   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 31/42] nvme: add check for prinfo Klaus Jensen
                   ` (13 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Add an 'mdts' device parameter to control the Maximum Data Transfer Size
(MDTS) of the controller, and check that it is respected.
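
The check added by the patch treats mdts as a power-of-two multiple of the page size, with 0 meaning no limit. Below is a minimal sketch of that rule. The helper is hypothetical, and a 4096-byte page size is assumed in the usage note.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: is a transfer of len bytes allowed when the
 * controller reports mdts and uses page_size-byte pages?
 * The limit is page_size * 2^mdts bytes; mdts == 0 means no limit.
 */
static bool transfer_len_ok(uint64_t page_size, uint8_t mdts, size_t len)
{
    if (mdts == 0) {
        return true;
    }
    /* Limit does not fit in 64 bits, so no size_t length can exceed it. */
    if (mdts >= 64 || page_size > (UINT64_MAX >> mdts)) {
        return true;
    }
    return len <= (page_size << mdts);
}
```

With the patch's default of mdts=7 and 4 KiB pages, the limit is 4096 << 7 = 524288 bytes (512 KiB). The extra guard keeps the shift from overflowing for large mdts values.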

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 29 ++++++++++++++++++++++++++++-
 hw/block/nvme.h       |  4 +++-
 hw/block/trace-events |  1 +
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ba520c76bae5..7d5340c272c6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,7 +19,8 @@
  *      -drive file=<file>,if=none,id=<drive_id>
  *      -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
  *              cmb_size_mb=<cmb_size_mb[optional]>, \
- *              max_ioqpairs=<N[optional]>
+ *              max_ioqpairs=<N[optional]>, \
+ *              mdts=<mdts[optional]>
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -491,6 +492,19 @@ static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
     }
 }
 
+static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
+                                       NvmeRequest *req)
+{
+    uint8_t mdts = n->params.mdts;
+
+    if (mdts && len > n->page_size << mdts) {
+        trace_nvme_dev_err_mdts(nvme_cid(req), n->page_size << mdts, len);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
 static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
                                          uint64_t slba, uint32_t nlb,
                                          NvmeRequest *req)
@@ -581,6 +595,12 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
 
     trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
 
+    status = nvme_check_mdts(n, data_size, req);
+    if (status) {
+        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
+        return status;
+    }
+
     status = nvme_check_bounds(n, ns, slba, nlb, req);
     if (status) {
         block_acct_invalid(blk_get_stats(n->conf.blk), acct);
@@ -871,6 +891,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     uint32_t numdl, numdu;
     uint64_t off, lpol, lpou;
     size_t   len;
+    uint16_t status;
 
     numdl = (dw10 >> 16);
     numdu = (dw11 & 0xffff);
@@ -886,6 +907,11 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
     trace_nvme_dev_get_log(nvme_cid(req), lid, lsp, rae, len, off);
 
+    status = nvme_check_mdts(n, len, req);
+    if (status) {
+        return status;
+    }
+
     switch (lid) {
     case NVME_LOG_ERROR_INFO:
         return nvme_error_info(n, cmd, rae, len, off, req);
@@ -2011,6 +2037,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->ieee[0] = 0x00;
     id->ieee[1] = 0x02;
     id->ieee[2] = 0xb3;
+    id->mdts = params->mdts;
     id->ver = cpu_to_le32(NVME_SPEC_VER);
     id->oacs = cpu_to_le16(0);
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 442b17bf1701..b05c2153aebf 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -9,7 +9,8 @@
     DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 0), \
     DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64), \
     DEFINE_PROP_UINT8("aerl", _state, _props.aerl, 3), \
-    DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64)
+    DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64), \
+    DEFINE_PROP_UINT8("mdts", _state, _props.mdts, 7)
 
 typedef struct NvmeParams {
     char     *serial;
@@ -18,6 +19,7 @@ typedef struct NvmeParams {
     uint32_t cmb_size_mb;
     uint8_t  aerl;
     uint32_t aer_max_queued;
+    uint8_t  mdts;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
diff --git a/hw/block/trace-events b/hw/block/trace-events
index e31e652fa04e..2df6aa38df1b 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -79,6 +79,7 @@ nvme_dev_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_
 nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_tail %"PRIu16""
 
 # nvme traces for error conditions
+nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
 nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
 nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
 nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
-- 
2.25.1




* [PATCH v6 31/42] nvme: add check for prinfo
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (29 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 30/42] nvme: add check for mdts Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:57   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 32/42] nvme: allow multiple aios per command Klaus Jensen
                   ` (12 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

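Check the validity of the PRINFO field. Reject PRACT if the namespace is
not formatted with protection information, and reject any PRCHK bit for
Write Zeroes.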
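
The sketch below shows both checks the patch performs. The PRCHK constants mirror the NVME_RW_PRINFO_* values in include/block/nvme.h. PRACT at bit 13 of the control field is taken from the NVMe PRINFO layout, because that value is not visible in the hunk, so treat it as an assumption. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* PRINFO bits within the 16-bit control field of a read/write command. */
enum {
    PRINFO_PRACT       = 1 << 13, /* protection information action */
    PRINFO_PRCHK_GUARD = 1 << 12, /* check the guard field */
    PRINFO_PRCHK_APP   = 1 << 11, /* check the application tag */
    PRINFO_PRCHK_REF   = 1 << 10, /* check the reference tag */
    PRINFO_PRCHK_MASK  = 7 << 10, /* all three PRCHK bits */
};

/* Low bits of the namespace DPS field: protection type, 0 = none. */
#define DPS_PI_TYPE_MASK 0x7

/* PRACT is only valid if the namespace is formatted with protection info. */
static bool prinfo_ok(uint16_t ctrl, uint8_t dps)
{
    return !((ctrl & PRINFO_PRACT) && !(dps & DPS_PI_TYPE_MASK));
}

/* Write Zeroes additionally rejects any PRCHK bit. */
static bool write_zeroes_prinfo_ok(uint16_t ctrl, uint8_t dps)
{
    return prinfo_ok(ctrl, dps) && !(ctrl & PRINFO_PRCHK_MASK);
}
```

In the patch, the first failure maps to NVME_INVALID_FIELD | NVME_DNR and the second to NVME_INVALID_PROT_INFO | NVME_DNR.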

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 50 ++++++++++++++++++++++++++++++++++++-------
 hw/block/trace-events |  1 +
 include/block/nvme.h  |  1 +
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7d5340c272c6..0d2b5b45b0c5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -505,6 +505,17 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
     return NVME_SUCCESS;
 }
 
+static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
+                                         uint16_t ctrl, NvmeRequest *req)
+{
+    if ((ctrl & NVME_RW_PRINFO_PRACT) && !(ns->id_ns.dps & DPS_TYPE_MASK)) {
+        trace_nvme_dev_err_prinfo(nvme_cid(req), ctrl);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
 static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
                                          uint64_t slba, uint32_t nlb,
                                          NvmeRequest *req)
@@ -564,11 +575,22 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
     uint64_t offset = slba << data_shift;
     uint32_t count = nlb << data_shift;
+    uint16_t ctrl = le16_to_cpu(rw->control);
     uint16_t status;
 
+    status = nvme_check_prinfo(n, ns, ctrl, req);
+    if (status) {
+        goto invalid;
+    }
+
+    if (ctrl & NVME_RW_PRINFO_PRCHK_MASK) {
+        status = NVME_INVALID_PROT_INFO | NVME_DNR;
+        goto invalid;
+    }
+
     status = nvme_check_bounds(n, ns, slba, nlb, req);
     if (status) {
-        return status;
+        goto invalid;
     }
 
     block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
@@ -576,6 +598,10 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
                                         BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
     return NVME_NO_COMPLETE;
+
+invalid:
+    block_acct_invalid(blk_get_stats(n->conf.blk), BLOCK_ACCT_WRITE);
+    return status;
 }
 
 static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
@@ -584,6 +610,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
     uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
     uint64_t slba = le64_to_cpu(rw->slba);
+    uint16_t ctrl = le16_to_cpu(rw->control);
 
     uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
     uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
@@ -597,19 +624,22 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
 
     status = nvme_check_mdts(n, data_size, req);
     if (status) {
-        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-        return status;
+        goto invalid;
+    }
+
+    status = nvme_check_prinfo(n, ns, ctrl, req);
+    if (status) {
+        goto invalid;
     }
 
     status = nvme_check_bounds(n, ns, slba, nlb, req);
     if (status) {
-        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-        return status;
+        goto invalid;
     }
 
-    if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
-        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-        return NVME_INVALID_FIELD | NVME_DNR;
+    status = nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req);
+    if (status) {
+        goto invalid;
     }
 
     if (req->qsg.nsg > 0) {
@@ -633,6 +663,10 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     }
 
     return NVME_NO_COMPLETE;
+
+invalid:
+    block_acct_invalid(blk_get_stats(n->conf.blk), acct);
+    return status;
 }
 
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 2df6aa38df1b..2aceb0537e05 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -80,6 +80,7 @@ nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
 
 # nvme traces for error conditions
 nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
+nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
 nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
 nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
 nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index ecc02fbe8bb8..293d68553538 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -394,6 +394,7 @@ enum {
     NVME_RW_PRINFO_PRCHK_GUARD  = 1 << 12,
     NVME_RW_PRINFO_PRCHK_APP    = 1 << 11,
     NVME_RW_PRINFO_PRCHK_REF    = 1 << 10,
+    NVME_RW_PRINFO_PRCHK_MASK   = 7 << 10,
 };
 
 typedef struct NvmeDsmCmd {
-- 
2.25.1




* [PATCH v6 32/42] nvme: allow multiple aios per command
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (30 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 31/42] nvme: add check for prinfo Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:57   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 33/42] nvme: use preallocated qsg/iov in nvme_dma_prp Klaus Jensen
                   ` (11 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

This refactors how the device issues asynchronous block backend
requests. The NvmeRequest now holds a queue of NvmeAIOs associated with
the command, which allows multiple aios to be issued for a single
command. The device posts a completion queue entry only when all of
them have completed.

Because the device is currently guaranteed to issue only a single aio
per command, the benefit is not immediately obvious. However, this
functionality is required to support metadata, the Dataset Management
command and other features.
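
The patch tracks the aios of a command in a QTAILQ on the NvmeRequest. The completion rule itself can be sketched with a plain counter. The struct and functions below are hypothetical and are not part of QEMU.

```c
#include <stdbool.h>

/* Hypothetical, simplified request; QEMU's NvmeRequest looks different. */
struct request {
    unsigned outstanding; /* aios issued but not yet completed */
    int status;           /* first error seen, 0 on success */
    bool completed;       /* set when the completion would be posted */
};

/* Stand-in for enqueueing the completion queue entry. */
static void post_completion(struct request *req)
{
    req->completed = true;
}

/* Account for one more aio issued on behalf of req. */
static void request_add_aio(struct request *req)
{
    req->outstanding++;
}

/* Per-aio completion callback; ret < 0 reports an error. */
static void aio_done(struct request *req, int ret)
{
    if (ret < 0 && req->status == 0) {
        req->status = ret; /* keep only the first failure */
    }
    if (--req->outstanding == 0) {
        post_completion(req); /* the last aio has finished */
    }
}
```

With a counter like this, the count must not reach zero before the last aio has been issued. One way to ensure that is to register all aios before submitting any of them.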

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c       | 377 +++++++++++++++++++++++++++++++-----------
 hw/block/nvme.h       | 129 +++++++++++++--
 hw/block/trace-events |   6 +
 3 files changed, 407 insertions(+), 105 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 0d2b5b45b0c5..817384e3b1a9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -59,6 +59,7 @@
     } while (0)
 
 static void nvme_process_sq(void *opaque);
+static void nvme_aio_cb(void *opaque, int ret);
 
 static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
 {
@@ -373,6 +374,99 @@ static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
     return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
 }
 
+static void nvme_aio_destroy(NvmeAIO *aio)
+{
+    g_free(aio);
+}
+
+static inline void nvme_req_register_aio(NvmeRequest *req, NvmeAIO *aio,
+                                         NvmeAIOOp opc)
+{
+    aio->opc = opc;
+
+    trace_nvme_dev_req_register_aio(nvme_cid(req), aio, blk_name(aio->blk),
+                                    aio->offset, aio->len,
+                                    nvme_aio_opc_str(aio), req);
+
+    if (req) {
+        QTAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
+    }
+}
+
+static void nvme_submit_aio(NvmeAIO *aio)
+{
+    BlockBackend *blk = aio->blk;
+    BlockAcctCookie *acct = &aio->acct;
+    BlockAcctStats *stats = blk_get_stats(blk);
+
+    bool is_write;
+
+    switch (aio->opc) {
+    case NVME_AIO_OPC_NONE:
+        break;
+
+    case NVME_AIO_OPC_FLUSH:
+        block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
+        aio->aiocb = blk_aio_flush(blk, nvme_aio_cb, aio);
+        break;
+
+    case NVME_AIO_OPC_WRITE_ZEROES:
+        block_acct_start(stats, acct, aio->len, BLOCK_ACCT_WRITE);
+        aio->aiocb = blk_aio_pwrite_zeroes(blk, aio->offset, aio->len,
+                                           BDRV_REQ_MAY_UNMAP, nvme_aio_cb,
+                                           aio);
+        break;
+
+    case NVME_AIO_OPC_READ:
+    case NVME_AIO_OPC_WRITE:
+        is_write = (aio->opc == NVME_AIO_OPC_WRITE);
+
+        block_acct_start(stats, acct, aio->len,
+                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
+
+        if (aio->qsg) {
+            if (is_write) {
+                aio->aiocb = dma_blk_write(blk, aio->qsg, aio->offset,
+                                           BDRV_SECTOR_SIZE, nvme_aio_cb, aio);
+            } else {
+                aio->aiocb = dma_blk_read(blk, aio->qsg, aio->offset,
+                                          BDRV_SECTOR_SIZE, nvme_aio_cb, aio);
+            }
+        } else {
+            if (is_write) {
+                aio->aiocb = blk_aio_pwritev(blk, aio->offset, aio->iov, 0,
+                                             nvme_aio_cb, aio);
+            } else {
+                aio->aiocb = blk_aio_preadv(blk, aio->offset, aio->iov, 0,
+                                            nvme_aio_cb, aio);
+            }
+        }
+
+        break;
+    }
+}
+
+static void nvme_rw_aio(BlockBackend *blk, uint64_t offset, NvmeRequest *req)
+{
+    NvmeAIO *aio;
+    size_t len = req->qsg.nsg > 0 ? req->qsg.size : req->iov.size;
+
+    aio = g_new0(NvmeAIO, 1);
+
+    *aio = (NvmeAIO) {
+        .blk = blk,
+        .offset = offset,
+        .len = len,
+        .req = req,
+        .qsg = req->qsg.sg ? &req->qsg : NULL,
+        .iov = req->iov.iov ? &req->iov : NULL,
+    };
+
+    nvme_req_register_aio(req, aio, nvme_req_is_write(req) ?
+                          NVME_AIO_OPC_WRITE : NVME_AIO_OPC_READ);
+    nvme_submit_aio(aio);
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -396,6 +490,7 @@ static void nvme_post_cqes(void *opaque)
         nvme_inc_cq_tail(cq);
         pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
             sizeof(req->cqe));
+        nvme_req_clear(req);
         QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
     }
     if (cq->tail != cq->head) {
@@ -406,8 +501,8 @@ static void nvme_post_cqes(void *opaque)
 static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
 {
     assert(cq->cqid == req->sq->cqid);
-    trace_nvme_dev_enqueue_req_completion(nvme_cid(req), cq->cqid,
-                                          req->status);
+    trace_nvme_dev_enqueue_req_completion(nvme_cid(req), cq->cqid, req->status);
+
     QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
     QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
     timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
@@ -505,9 +600,11 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
     return NVME_SUCCESS;
 }
 
-static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
-                                         uint16_t ctrl, NvmeRequest *req)
+static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, uint16_t ctrl,
+                                         NvmeRequest *req)
 {
+    NvmeNamespace *ns = req->ns;
+
     if ((ctrl & NVME_RW_PRINFO_PRACT) && !(ns->id_ns.dps & DPS_TYPE_MASK)) {
         trace_nvme_dev_err_prinfo(nvme_cid(req), ctrl);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -516,10 +613,10 @@ static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
     return NVME_SUCCESS;
 }
 
-static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
-                                         uint64_t slba, uint32_t nlb,
-                                         NvmeRequest *req)
+static inline uint16_t nvme_check_bounds(NvmeCtrl *n, uint64_t slba,
+                                         uint32_t nlb, NvmeRequest *req)
 {
+    NvmeNamespace *ns = req->ns;
     uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
 
     if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
@@ -530,55 +627,154 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
     return NVME_SUCCESS;
 }
 
-static void nvme_rw_cb(void *opaque, int ret)
+static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeNamespace *ns = req->ns;
+    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
+    uint16_t ctrl = le16_to_cpu(rw->control);
+    size_t len = req->nlb << nvme_ns_lbads(ns);
+    uint16_t status;
+
+    status = nvme_check_mdts(n, len, req);
+    if (status) {
+        return status;
+    }
+
+    status = nvme_check_prinfo(n, ctrl, req);
+    if (status) {
+        return status;
+    }
+
+    status = nvme_check_bounds(n, req->slba, req->nlb, req);
+    if (status) {
+        return status;
+    }
+
+    return NVME_SUCCESS;
+}
+
+static void nvme_rw_cb(NvmeRequest *req, void *opaque)
 {
-    NvmeRequest *req = opaque;
     NvmeSQueue *sq = req->sq;
     NvmeCtrl *n = sq->ctrl;
     NvmeCQueue *cq = n->cq[sq->cqid];
 
-    if (!ret) {
-        block_acct_done(blk_get_stats(n->conf.blk), &req->acct);
-        req->status = NVME_SUCCESS;
-    } else {
-        block_acct_failed(blk_get_stats(n->conf.blk), &req->acct);
-        req->status = NVME_INTERNAL_DEV_ERROR;
-    }
-
-    if (req->qsg.nalloc) {
-        qemu_sglist_destroy(&req->qsg);
-    }
-    if (req->iov.nalloc) {
-        qemu_iovec_destroy(&req->iov);
-    }
+    trace_nvme_dev_rw_cb(nvme_cid(req), req->cmd.nsid);
 
     nvme_enqueue_req_completion(cq, req);
 }
 
-static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
-    NvmeRequest *req)
+static void nvme_aio_cb(void *opaque, int ret)
 {
-    block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
-         BLOCK_ACCT_FLUSH);
-    req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
+    NvmeAIO *aio = opaque;
+    NvmeRequest *req = aio->req;
+
+    BlockBackend *blk = aio->blk;
+    BlockAcctCookie *acct = &aio->acct;
+    BlockAcctStats *stats = blk_get_stats(blk);
+
+    Error *local_err = NULL;
+
+    trace_nvme_dev_aio_cb(nvme_cid(req), aio, blk_name(blk), aio->offset,
+                          nvme_aio_opc_str(aio), req);
+
+    if (req) {
+        QTAILQ_REMOVE(&req->aio_tailq, aio, tailq_entry);
+    }
+
+    if (!ret) {
+        block_acct_done(stats, acct);
+    } else {
+        block_acct_failed(stats, acct);
+
+        if (req) {
+            uint16_t status;
+
+            switch (aio->opc) {
+            case NVME_AIO_OPC_READ:
+                status = NVME_UNRECOVERED_READ;
+                break;
+            case NVME_AIO_OPC_WRITE:
+            case NVME_AIO_OPC_WRITE_ZEROES:
+                status = NVME_WRITE_FAULT;
+                break;
+            default:
+                status = NVME_INTERNAL_DEV_ERROR;
+                break;
+            }
+
+            trace_nvme_dev_err_aio(nvme_cid(req), aio, blk_name(blk),
+                                   aio->offset, nvme_aio_opc_str(aio), req,
+                                   status);
+
+            error_setg_errno(&local_err, -ret, "aio failed");
+            error_report_err(local_err);
+
+            /*
+             * An Internal Error trumps all other errors. For other errors,
+             * only set the first error encountered. Any additional errors will
+             * be recorded in the error information log page.
+             */
+            if (!req->status ||
+                nvme_status_is_error(status, NVME_INTERNAL_DEV_ERROR)) {
+                req->status = status;
+            }
+        }
+    }
+
+    if (aio->cb) {
+        aio->cb(aio, aio->cb_arg, ret);
+    }
+
+    if (req && QTAILQ_EMPTY(&req->aio_tailq)) {
+        if (req->cb) {
+            req->cb(req, req->cb_arg);
+        } else {
+            NvmeSQueue *sq = req->sq;
+            NvmeCtrl *n = sq->ctrl;
+            NvmeCQueue *cq = n->cq[sq->cqid];
+
+            nvme_enqueue_req_completion(cq, req);
+        }
+    }
+
+    nvme_aio_destroy(aio);
+}
+
+static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+{
+    NvmeAIO *aio = g_new0(NvmeAIO, 1);
+
+    *aio = (NvmeAIO) {
+        .blk = n->conf.blk,
+        .req = req,
+    };
+
+    nvme_req_register_aio(req, aio, NVME_AIO_OPC_FLUSH);
+    nvme_submit_aio(aio);
 
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
-    NvmeRequest *req)
+static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
-    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
-    const uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
-    const uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
-    uint64_t slba = le64_to_cpu(rw->slba);
-    uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
-    uint64_t offset = slba << data_shift;
-    uint32_t count = nlb << data_shift;
+    NvmeAIO *aio;
+
+    NvmeNamespace *ns = req->ns;
+    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
     uint16_t ctrl = le16_to_cpu(rw->control);
+
+    int64_t offset;
+    size_t count;
     uint16_t status;
 
-    status = nvme_check_prinfo(n, ns, ctrl, req);
+    req->slba = le64_to_cpu(rw->slba);
+    req->nlb  = le16_to_cpu(rw->nlb) + 1;
+
+    trace_nvme_dev_write_zeroes(nvme_cid(req), le32_to_cpu(cmd->nsid),
+                                req->slba, req->nlb);
+
+    status = nvme_check_prinfo(n, ctrl, req);
     if (status) {
         goto invalid;
     }
@@ -588,15 +784,26 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
         goto invalid;
     }
 
-    status = nvme_check_bounds(n, ns, slba, nlb, req);
+    status = nvme_check_bounds(n, req->slba, req->nlb, req);
     if (status) {
         goto invalid;
     }
 
-    block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
-                     BLOCK_ACCT_WRITE);
-    req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
-                                        BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
+    offset = req->slba << nvme_ns_lbads(ns);
+    count = req->nlb << nvme_ns_lbads(ns);
+
+    aio = g_new0(NvmeAIO, 1);
+
+    *aio = (NvmeAIO) {
+        .blk = n->conf.blk,
+        .offset = offset,
+        .len = count,
+        .req = req,
+    };
+
+    nvme_req_register_aio(req, aio, NVME_AIO_OPC_WRITE_ZEROES);
+    nvme_submit_aio(aio);
+
     return NVME_NO_COMPLETE;
 
 invalid:
@@ -604,63 +811,36 @@ invalid:
     return status;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
-    NvmeRequest *req)
+static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
-    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
-    uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
-    uint64_t slba = le64_to_cpu(rw->slba);
-    uint16_t ctrl = le16_to_cpu(rw->control);
+    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+    NvmeNamespace *ns = req->ns;
+    uint32_t len;
+    int status;
 
-    uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
-    uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
-    uint64_t data_size = (uint64_t)nlb << data_shift;
-    uint64_t data_offset = slba << data_shift;
-    int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
-    enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
-    uint16_t status;
+    enum BlockAcctType acct =
+        nvme_req_is_write(req) ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 
-    trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
+    req->nlb  = le16_to_cpu(rw->nlb) + 1;
+    req->slba = le64_to_cpu(rw->slba);
 
-    status = nvme_check_mdts(n, data_size, req);
-    if (status) {
-        goto invalid;
-    }
+    len = req->nlb << nvme_ns_lbads(ns);
 
-    status = nvme_check_prinfo(n, ns, ctrl, req);
-    if (status) {
-        goto invalid;
-    }
+    trace_nvme_dev_rw(nvme_req_is_write(req) ? "write" : "read", req->nlb,
+                      req->nlb << nvme_ns_lbads(req->ns), req->slba);
 
-    status = nvme_check_bounds(n, ns, slba, nlb, req);
+    status = nvme_check_rw(n, req);
     if (status) {
         goto invalid;
     }
 
-    status = nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req);
+    status = nvme_map(n, cmd, &req->qsg, &req->iov, len, req);
     if (status) {
         goto invalid;
     }
 
-    if (req->qsg.nsg > 0) {
-        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->qsg.size,
-                         acct);
-
-        req->aiocb = is_write ?
-            dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
-                          nvme_rw_cb, req) :
-            dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
-                         nvme_rw_cb, req);
-    } else {
-        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->iov.size,
-                         acct);
-
-        req->aiocb = is_write ?
-            blk_aio_pwritev(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
-                            req) :
-            blk_aio_preadv(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
-                           req);
-    }
+    nvme_rw_aio(n->conf.blk, req->slba << nvme_ns_lbads(ns), req);
+    nvme_req_set_cb(req, nvme_rw_cb, NULL);
 
     return NVME_NO_COMPLETE;
 
@@ -671,23 +851,26 @@ invalid:
 
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
-    NvmeNamespace *ns;
     uint32_t nsid = le32_to_cpu(cmd->nsid);
 
+    trace_nvme_dev_io_cmd(nvme_cid(req), nsid, le16_to_cpu(req->sq->sqid),
+                          cmd->opcode);
+
     if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
         trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
-    ns = &n->namespaces[nsid - 1];
+    req->ns = &n->namespaces[nsid - 1];
+
     switch (cmd->opcode) {
     case NVME_CMD_FLUSH:
-        return nvme_flush(n, ns, cmd, req);
+        return nvme_flush(n, cmd, req);
     case NVME_CMD_WRITE_ZEROS:
-        return nvme_write_zeros(n, ns, cmd, req);
+        return nvme_write_zeroes(n, cmd, req);
     case NVME_CMD_WRITE:
     case NVME_CMD_READ:
-        return nvme_rw(n, ns, cmd, req);
+        return nvme_rw(n, cmd, req);
     default:
         trace_nvme_dev_err_invalid_opc(cmd->opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
@@ -711,6 +894,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
     NvmeRequest *req, *next;
     NvmeSQueue *sq;
     NvmeCQueue *cq;
+    NvmeAIO *aio;
     uint16_t qid = le16_to_cpu(c->qid);
 
     if (unlikely(!qid || nvme_check_sqid(n, qid))) {
@@ -723,8 +907,11 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
     sq = n->sq[qid];
     while (!QTAILQ_EMPTY(&sq->out_req_list)) {
         req = QTAILQ_FIRST(&sq->out_req_list);
-        assert(req->aiocb);
-        blk_aio_cancel(req->aiocb);
+        while (!QTAILQ_EMPTY(&req->aio_tailq)) {
+            aio = QTAILQ_FIRST(&req->aio_tailq);
+            assert(aio->aiocb);
+            blk_aio_cancel(aio->aiocb);
+        }
     }
     if (!nvme_check_cqid(n, sq->cqid)) {
         cq = n->cq[sq->cqid];
@@ -761,6 +948,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
     QTAILQ_INIT(&sq->out_req_list);
     for (i = 0; i < sq->size; i++) {
         sq->io_req[i].sq = sq;
+        QTAILQ_INIT(&(sq->io_req[i].aio_tailq));
         QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
     }
     sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
@@ -1474,8 +1662,9 @@ static void nvme_process_sq(void *opaque)
         req = QTAILQ_FIRST(&sq->req_list);
         QTAILQ_REMOVE(&sq->req_list, req, entry);
         QTAILQ_INSERT_TAIL(&sq->out_req_list, req, entry);
-        memset(&req->cqe, 0, sizeof(req->cqe));
+
         req->cqe.cid = cmd.cid;
+        memcpy(&req->cmd, &cmd, sizeof(NvmeCmd));
 
         status = sq->sqid ? nvme_io_cmd(n, &cmd, req) :
             nvme_admin_cmd(n, &cmd, req);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index b05c2153aebf..5d5fa8c8833a 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -27,16 +27,58 @@ typedef struct NvmeAsyncEvent {
     NvmeAerResult result;
 } NvmeAsyncEvent;
 
-typedef struct NvmeRequest {
-    struct NvmeSQueue       *sq;
-    BlockAIOCB              *aiocb;
-    uint16_t                status;
-    NvmeCqe                 cqe;
-    BlockAcctCookie         acct;
-    QEMUSGList              qsg;
-    QEMUIOVector            iov;
-    QTAILQ_ENTRY(NvmeRequest)entry;
-} NvmeRequest;
+typedef struct NvmeRequest NvmeRequest;
+typedef void NvmeRequestCompletionFunc(NvmeRequest *req, void *opaque);
+
+struct NvmeRequest {
+    struct NvmeSQueue    *sq;
+    struct NvmeNamespace *ns;
+
+    NvmeCqe  cqe;
+    NvmeCmd  cmd;
+    uint16_t status;
+
+    uint64_t slba;
+    uint32_t nlb;
+
+    QEMUSGList   qsg;
+    QEMUIOVector iov;
+
+    NvmeRequestCompletionFunc *cb;
+    void                      *cb_arg;
+
+    QTAILQ_HEAD(, NvmeAIO)    aio_tailq;
+    QTAILQ_ENTRY(NvmeRequest) entry;
+};
+
+static inline void nvme_req_clear(NvmeRequest *req)
+{
+    req->ns = NULL;
+    memset(&req->cqe, 0, sizeof(req->cqe));
+    req->status = NVME_SUCCESS;
+    req->slba = req->nlb = 0x0;
+    req->cb = req->cb_arg = NULL;
+
+    if (req->qsg.sg) {
+        qemu_sglist_destroy(&req->qsg);
+    }
+
+    if (req->iov.iov) {
+        qemu_iovec_destroy(&req->iov);
+    }
+}
+
+static inline void nvme_req_set_cb(NvmeRequest *req,
+                                   NvmeRequestCompletionFunc *cb, void *cb_arg)
+{
+    req->cb = cb;
+    req->cb_arg = cb_arg;
+}
+
+static inline void nvme_req_clear_cb(NvmeRequest *req)
+{
+    req->cb = req->cb_arg = NULL;
+}
 
 typedef struct NvmeSQueue {
     struct NvmeCtrl *ctrl;
@@ -88,6 +130,60 @@ static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
     return 1 << nvme_ns_lbads(ns);
 }
 
+typedef enum NvmeAIOOp {
+    NVME_AIO_OPC_NONE         = 0x0,
+    NVME_AIO_OPC_FLUSH        = 0x1,
+    NVME_AIO_OPC_READ         = 0x2,
+    NVME_AIO_OPC_WRITE        = 0x3,
+    NVME_AIO_OPC_WRITE_ZEROES = 0x4,
+} NvmeAIOOp;
+
+typedef struct NvmeAIO NvmeAIO;
+typedef void NvmeAIOCompletionFunc(NvmeAIO *aio, void *opaque, int ret);
+
+struct NvmeAIO {
+    NvmeRequest *req;
+
+    NvmeAIOOp       opc;
+    int64_t         offset;
+    size_t          len;
+    BlockBackend    *blk;
+    BlockAIOCB      *aiocb;
+    BlockAcctCookie acct;
+
+    NvmeAIOCompletionFunc *cb;
+    void                  *cb_arg;
+
+    QEMUSGList   *qsg;
+    QEMUIOVector *iov;
+
+    QTAILQ_ENTRY(NvmeAIO) tailq_entry;
+};
+
+static inline const char *nvme_aio_opc_str(NvmeAIO *aio)
+{
+    switch (aio->opc) {
+    case NVME_AIO_OPC_NONE:         return "NVME_AIO_OP_NONE";
+    case NVME_AIO_OPC_FLUSH:        return "NVME_AIO_OP_FLUSH";
+    case NVME_AIO_OPC_READ:         return "NVME_AIO_OP_READ";
+    case NVME_AIO_OPC_WRITE:        return "NVME_AIO_OP_WRITE";
+    case NVME_AIO_OPC_WRITE_ZEROES: return "NVME_AIO_OP_WRITE_ZEROES";
+    default:                        return "NVME_AIO_OP_UNKNOWN";
+    }
+}
+
+static inline bool nvme_req_is_write(NvmeRequest *req)
+{
+    switch (req->cmd.opcode) {
+    case NVME_CMD_WRITE:
+    case NVME_CMD_WRITE_UNCOR:
+    case NVME_CMD_WRITE_ZEROS:
+        return true;
+    default:
+        return false;
+    }
+}
+
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
@@ -140,10 +236,21 @@ static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
 static inline uint16_t nvme_cid(NvmeRequest *req)
 {
     if (req) {
-        return le16_to_cpu(req->cqe.cid);
+        return le16_to_cpu(req->cmd.cid);
     }
 
     return 0xffff;
 }
 
+static inline bool nvme_status_is_error(uint16_t status, uint16_t err)
+{
+    /* strip DNR and MORE */
+    return (status & 0xfff) == err;
+}
+
+static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
+{
+    return req->sq->ctrl;
+}
+
 #endif /* HW_NVME_H */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 2aceb0537e05..aa449e314818 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -34,7 +34,12 @@ nvme_dev_irq_pin(void) "pulsing IRQ pin"
 nvme_dev_irq_masked(void) "IRQ is masked"
 nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 nvme_dev_map_prp(uint16_t cid, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
+nvme_dev_req_register_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, uint64_t count, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" count %"PRIu64" opc \"%s\" req %p"
+nvme_dev_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
+nvme_dev_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
 nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
+nvme_dev_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
+nvme_dev_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
 nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
 nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
 nvme_dev_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
@@ -81,6 +86,7 @@ nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
 # nvme traces for error conditions
 nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
 nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
+nvme_dev_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req, uint16_t status) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p status 0x%"PRIx16""
 nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
 nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
 nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
-- 
2.25.1




* [PATCH v6 33/42] nvme: use preallocated qsg/iov in nvme_dma_prp
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (31 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 32/42] nvme: allow multiple aios per command Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:58   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 34/42] pci: pass along the return value of dma_memory_rw Klaus Jensen
                   ` (10 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Since cleanup of the request qsg/iov has been moved to the common
nvme_enqueue_req_completion function, there is no need to use a
stack-allocated qsg/iov in nvme_dma_prp.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 817384e3b1a9..15ca2417af04 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -321,45 +321,39 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
                              uint64_t prp1, uint64_t prp2, DMADirection dir,
                              NvmeRequest *req)
 {
-    QEMUSGList qsg;
-    QEMUIOVector iov;
     uint16_t status = NVME_SUCCESS;
 
-    status = nvme_map_prp(n, &qsg, &iov, prp1, prp2, len, req);
+    status = nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
     if (status) {
         return status;
     }
 
-    if (qsg.nsg > 0) {
+    if (req->qsg.nsg > 0) {
         uint64_t residual;
 
         if (dir == DMA_DIRECTION_TO_DEVICE) {
-            residual = dma_buf_write(ptr, len, &qsg);
+            residual = dma_buf_write(ptr, len, &req->qsg);
         } else {
-            residual = dma_buf_read(ptr, len, &qsg);
+            residual = dma_buf_read(ptr, len, &req->qsg);
         }
 
         if (unlikely(residual)) {
             trace_nvme_dev_err_invalid_dma();
             status = NVME_INVALID_FIELD | NVME_DNR;
         }
-
-        qemu_sglist_destroy(&qsg);
     } else {
         size_t bytes;
 
         if (dir == DMA_DIRECTION_TO_DEVICE) {
-            bytes = qemu_iovec_to_buf(&iov, 0, ptr, len);
+            bytes = qemu_iovec_to_buf(&req->iov, 0, ptr, len);
         } else {
-            bytes = qemu_iovec_from_buf(&iov, 0, ptr, len);
+            bytes = qemu_iovec_from_buf(&req->iov, 0, ptr, len);
         }
 
         if (unlikely(bytes != len)) {
             trace_nvme_dev_err_invalid_dma();
             status = NVME_INVALID_FIELD | NVME_DNR;
         }
-
-        qemu_iovec_destroy(&iov);
     }
 
     return status;
-- 
2.25.1




* [PATCH v6 34/42] pci: pass along the return value of dma_memory_rw
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (32 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 33/42] nvme: use preallocated qsg/iov in nvme_dma_prp Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-16 14:29 ` [PATCH v6 35/42] nvme: handle dma errors Klaus Jensen
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

The nvme device needs the return value of dma_memory_rw to pass the
block/011 test from blktests, so pass it along instead of ignoring it.

There are no existing users of the return value, so this patch should be
safe.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 include/hw/pci/pci.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index cfedf5a995d7..da9057b8db97 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -784,8 +784,7 @@ static inline AddressSpace *pci_get_address_space(PCIDevice *dev)
 static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
                              void *buf, dma_addr_t len, DMADirection dir)
 {
-    dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir);
-    return 0;
+    return dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir);
 }
 
 static inline int pci_dma_read(PCIDevice *dev, dma_addr_t addr,
-- 
2.25.1




* [PATCH v6 35/42] nvme: handle dma errors
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (33 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 34/42] pci: pass along the return value of dma_memory_rw Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:58   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 36/42] nvme: add support for scatter gather lists Klaus Jensen
                   ` (8 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Handling DMA errors gracefully is required for the device to pass the
block/011 test ("disable PCI device while doing I/O") in the blktests
suite.

With this patch, the device passes the test by retrying "critical"
transfers (posting of completion entries and processing of submission
queue entries).

If DMA errors occur at any other point in the execution of the command
(say, while mapping the PRPs), the command is aborted with a Data
Transfer Error status code.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c       | 45 ++++++++++++++++++++++++++++++++-----------
 hw/block/trace-events |  2 ++
 include/block/nvme.h  |  2 +-
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 15ca2417af04..49d323566393 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -74,14 +74,14 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
     return addr >= low && addr < hi;
 }
 
-static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
     if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
         memcpy(buf, nvme_addr_to_cmb(n, addr), size);
-        return;
+        return 0;
     }
 
-    pci_dma_read(&n->parent_obj, addr, buf, size);
+    return pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
@@ -164,7 +164,7 @@ static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
                                   size_t len)
 {
     if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
-        return NVME_DATA_TRAS_ERROR;
+        return NVME_DATA_TRANSFER_ERROR;
     }
 
     qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
@@ -213,6 +213,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
     int num_prps = (len >> n->page_bits) + 1;
     uint16_t status;
     bool prp_list_in_cmb = false;
+    int ret;
 
     trace_nvme_dev_map_prp(nvme_cid(req), trans_len, len, prp1, prp2,
                            num_prps);
@@ -252,7 +253,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
 
             nents = (len + n->page_size - 1) >> n->page_bits;
             prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-            nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+            ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+            if (ret) {
+                trace_nvme_dev_err_addr_read(prp2);
+                status = NVME_DATA_TRANSFER_ERROR;
+                goto unmap;
+            }
             while (len != 0) {
                 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
 
@@ -271,8 +277,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
                     i = 0;
                     nents = (len + n->page_size - 1) >> n->page_bits;
                     prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-                    nvme_addr_read(n, prp_ent, (void *)prp_list,
-                        prp_trans);
+                    ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
+                                         prp_trans);
+                    if (ret) {
+                        trace_nvme_dev_err_addr_read(prp_ent);
+                        status = NVME_DATA_TRANSFER_ERROR;
+                        goto unmap;
+                    }
                     prp_ent = le64_to_cpu(prp_list[i]);
                 }
 
@@ -466,6 +477,7 @@ static void nvme_post_cqes(void *opaque)
     NvmeCQueue *cq = opaque;
     NvmeCtrl *n = cq->ctrl;
     NvmeRequest *req, *next;
+    int ret;
 
     QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
         NvmeSQueue *sq;
@@ -475,15 +487,21 @@ static void nvme_post_cqes(void *opaque)
             break;
         }
 
-        QTAILQ_REMOVE(&cq->req_list, req, entry);
         sq = req->sq;
         req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
         req->cqe.sq_id = cpu_to_le16(sq->sqid);
         req->cqe.sq_head = cpu_to_le16(sq->head);
         addr = cq->dma_addr + cq->tail * n->cqe_size;
+        ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
+                            sizeof(req->cqe));
+        if (ret) {
+            trace_nvme_dev_err_addr_write(addr);
+            timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                      500 * SCALE_MS);
+            break;
+        }
+        QTAILQ_REMOVE(&cq->req_list, req, entry);
         nvme_inc_cq_tail(cq);
-        pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
-            sizeof(req->cqe));
         nvme_req_clear(req);
         QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
     }
@@ -1650,7 +1668,12 @@ static void nvme_process_sq(void *opaque)
 
     while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
         addr = sq->dma_addr + sq->head * n->sqe_size;
-        nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd));
+        if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
+            trace_nvme_dev_err_addr_read(addr);
+            timer_mod(sq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                      500 * SCALE_MS);
+            break;
+        }
         nvme_inc_sq_head(sq);
 
         req = QTAILQ_FIRST(&sq->req_list);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index aa449e314818..d51c09a4e454 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -87,6 +87,8 @@ nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
 nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
 nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
 nvme_dev_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req, uint16_t status) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p status 0x%"PRIx16""
+nvme_dev_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
+nvme_dev_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
 nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
 nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
 nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 293d68553538..d1ccde4cda4b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -458,7 +458,7 @@ enum NvmeStatusCodes {
     NVME_INVALID_OPCODE         = 0x0001,
     NVME_INVALID_FIELD          = 0x0002,
     NVME_CID_CONFLICT           = 0x0003,
-    NVME_DATA_TRAS_ERROR        = 0x0004,
+    NVME_DATA_TRANSFER_ERROR    = 0x0004,
     NVME_POWER_LOSS_ABORT       = 0x0005,
     NVME_INTERNAL_DEV_ERROR     = 0x0006,
     NVME_CMD_ABORT_REQ          = 0x0007,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v6 36/42] nvme: add support for scatter gather lists
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (34 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 35/42] nvme: handle dma errors Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:58   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 37/42] nvme: refactor identify active namespace id list Klaus Jensen
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

For now, support the Data Block, Segment and Last Segment descriptor
types. Bit Bucket and Keyed Data Block descriptors are rejected.

See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
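
A minimal, self-contained sketch of the segment chain walk from Section
4.4 may help when reading nvme_map_sgl() below. It is not the QEMU code.
Guest memory is modelled as a hypothetical table of descriptor arrays (a
Segment or Last Segment descriptor's addr is taken as an index into that
table), and the hypothetical sgl_walk() only sums the Data Block lengths
where the device would map them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor types, taken from the upper nibble of the SGL identifier. */
enum {
    SGL_DATA_BLOCK   = 0x0,
    SGL_BIT_BUCKET   = 0x1, /* rejected by this patch */
    SGL_SEGMENT      = 0x2,
    SGL_LAST_SEGMENT = 0x3,
};

/* 16-byte SGL descriptor; byte 15 is the SGL identifier. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint8_t  rsvd[3];
    uint8_t  type; /* bits 7:4 descriptor type, bits 3:0 sub type */
} SglDesc;

static_assert(sizeof(SglDesc) == 16, "SGL descriptors are 16 bytes");

#define SGL_TYPE(t) (((t) >> 4) & 0xf)

/*
 * Walk an SGL and sum its Data Block lengths into *total.
 * Hypothetical memory model: a (Last) Segment descriptor's addr indexes
 * segs[], and seg_ndesc[i] is the number of descriptors in segs[i].
 * Returns 0 on success, -1 if the SGL is malformed.
 */
static int sgl_walk(SglDesc first, const SglDesc *const *segs,
                    const size_t *seg_ndesc, size_t nsegs, uint64_t *total)
{
    SglDesc cur = first;

    *total = 0;

    /* A single Data Block describes the whole transfer. */
    if (SGL_TYPE(cur.type) == SGL_DATA_BLOCK) {
        *total = cur.len;
        return 0;
    }

    /* Each hop consumes one segment; more hops than segments is a cycle. */
    for (size_t hops = 0; hops < nsegs; hops++) {
        uint8_t t = SGL_TYPE(cur.type);

        if (t != SGL_SEGMENT && t != SGL_LAST_SEGMENT) {
            return -1;
        }
        /* Segment length must be a non-zero multiple of 16 bytes. */
        if (cur.len == 0 || cur.len % sizeof(SglDesc) || cur.addr >= nsegs) {
            return -1;
        }

        const SglDesc *seg = segs[cur.addr];
        size_t n = cur.len / sizeof(SglDesc);
        if (n > seg_ndesc[cur.addr]) {
            return -1;
        }

        /* Only the last descriptor of a segment may point to the next one. */
        bool chained = SGL_TYPE(seg[n - 1].type) != SGL_DATA_BLOCK;
        if (chained && t == SGL_LAST_SEGMENT) {
            return -1; /* a Last Segment must end with a Data Block */
        }

        for (size_t i = 0; i < (chained ? n - 1 : n); i++) {
            if (SGL_TYPE(seg[i].type) != SGL_DATA_BLOCK) {
                return -1;
            }
            *total += seg[i].len;
        }

        if (!chained) {
            return 0;
        }
        cur = seg[n - 1];
    }

    return -1;
}
```

As in the patch, only the last descriptor of a segment may chain to the
next segment, and a Last Segment must end in a Data Block. The hop limit
is an extra guard against cyclic chains that only this sketch adds.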

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme.c       | 310 +++++++++++++++++++++++++++++++++++-------
 hw/block/trace-events |   4 +
 2 files changed, 262 insertions(+), 52 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 49d323566393..b89b96990f52 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -76,7 +76,12 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 
 static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
+    hwaddr hi = addr + size - 1;
+    if (hi < addr) {
+        return 1;
+    }
+
+    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
         memcpy(buf, nvme_addr_to_cmb(n, addr), size);
         return 0;
     }
@@ -328,13 +333,242 @@ unmap:
     return status;
 }
 
-static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
-                             uint64_t prp1, uint64_t prp2, DMADirection dir,
+static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
+                                  QEMUIOVector *iov,
+                                  NvmeSglDescriptor *segment, uint64_t nsgld,
+                                  size_t *len, NvmeRequest *req)
+{
+    dma_addr_t addr, trans_len;
+    uint32_t blk_len;
+    uint16_t status;
+
+    for (int i = 0; i < nsgld; i++) {
+        uint8_t type = NVME_SGL_TYPE(segment[i].type);
+
+        if (type != NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
+            switch (type) {
+            case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
+            case NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK:
+                return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR;
+            default:
+                return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR;
+            }
+        }
+
+        if (*len == 0) {
+            uint32_t sgls = le32_to_cpu(n->id_ctrl.sgls);
+            if (sgls & NVME_CTRL_SGLS_EXCESS_LENGTH) {
+                break;
+            }
+
+            trace_nvme_dev_err_invalid_sgl_excess_length(nvme_cid(req));
+            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+        }
+
+        addr = le64_to_cpu(segment[i].addr);
+        blk_len = le32_to_cpu(segment[i].len);
+
+        if (!blk_len) {
+            continue;
+        }
+
+        if (UINT64_MAX - addr < blk_len) {
+            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+        }
+
+        trans_len = MIN(*len, blk_len);
+
+        status = nvme_map_addr(n, qsg, iov, addr, trans_len);
+        if (status) {
+            return status;
+        }
+
+        *len -= trans_len;
+    }
+
+    return NVME_SUCCESS;
+}
+
+static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
+                             NvmeSglDescriptor sgl, size_t len,
                              NvmeRequest *req)
+{
+    /*
+     * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
+     * dynamically allocating a potentially large SGL. The spec allows the SGL
+     * to be larger than the command transfer size, so it is not bounded by
+     * MDTS.
+     */
+#define SEG_CHUNK_SIZE 256
+
+    NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
+    uint64_t nsgld;
+    uint32_t seg_len;
+    uint16_t status;
+    bool sgl_in_cmb = false;
+    hwaddr addr;
+    int ret;
+
+    sgld = &sgl;
+    addr = le64_to_cpu(sgl.addr);
+
+    trace_nvme_dev_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), req->nlb,
+                           len);
+
+    /*
+     * If the entire transfer can be described with a single data block it can
+     * be mapped directly.
+     */
+    if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
+        status = nvme_map_sgl_data(n, qsg, iov, sgld, 1, &len, req);
+        if (status) {
+            goto unmap;
+        }
+
+        goto out;
+    }
+
+    /*
+     * If the segment is located in the CMB, the submission queue of the
+     * request must also reside there.
+     */
+    if (nvme_addr_is_cmb(n, addr)) {
+        if (!nvme_addr_is_cmb(n, req->sq->dma_addr)) {
+            return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+        }
+
+        sgl_in_cmb = true;
+    }
+
+    for (;;) {
+        seg_len = le32_to_cpu(sgld->len);
+
+        if (!seg_len || seg_len & 0xf) {
+            return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
+        }
+
+        if (UINT64_MAX - addr < seg_len) {
+            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+        }
+
+        nsgld = seg_len / sizeof(NvmeSglDescriptor);
+
+        while (nsgld > SEG_CHUNK_SIZE) {
+            if (nvme_addr_read(n, addr, segment, sizeof(segment))) {
+                trace_nvme_dev_err_addr_read(addr);
+                status = NVME_DATA_TRANSFER_ERROR;
+                goto unmap;
+            }
+
+            status = nvme_map_sgl_data(n, qsg, iov, segment, SEG_CHUNK_SIZE,
+                                       &len, req);
+            if (status) {
+                goto unmap;
+            }
+
+            nsgld -= SEG_CHUNK_SIZE;
+            addr += SEG_CHUNK_SIZE * sizeof(NvmeSglDescriptor);
+        }
+
+        ret = nvme_addr_read(n, addr, segment, nsgld *
+                             sizeof(NvmeSglDescriptor));
+        if (ret) {
+            trace_nvme_dev_err_addr_read(addr);
+            status = NVME_DATA_TRANSFER_ERROR;
+            goto unmap;
+        }
+
+        last_sgld = &segment[nsgld - 1];
+
+        /* if the segment ends with a Data Block, then we are done */
+        if (NVME_SGL_TYPE(last_sgld->type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
+            status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld, &len, req);
+            if (status) {
+                goto unmap;
+            }
+
+            break;
+        }
+
+        /* a Last Segment must end with a Data Block descriptor */
+        if (NVME_SGL_TYPE(sgld->type) == NVME_SGL_DESCR_TYPE_LAST_SEGMENT) {
+            status = NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
+            goto unmap;
+        }
+
+        sgld = last_sgld;
+        addr = le64_to_cpu(sgld->addr);
+
+        /*
+         * Do not map the last descriptor; it will be a Segment or Last Segment
+         * descriptor instead and handled by the next iteration.
+         */
+        status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld - 1, &len, req);
+        if (status) {
+            goto unmap;
+        }
+
+        /*
+         * The next segment must reside in the same memory (CMB or host
+         * memory) as the previous segments of the SGL.
+         */
+        if (sgl_in_cmb != nvme_addr_is_cmb(n, addr)) {
+            status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
+            goto unmap;
+        }
+    }
+
+out:
+    /* if there is any residual left in len, the SGL was too short */
+    if (len) {
+        status = NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+        goto unmap;
+    }
+
+    return NVME_SUCCESS;
+
+unmap:
+    if (iov->iov) {
+        qemu_iovec_destroy(iov);
+    }
+
+    if (qsg->sg) {
+        qemu_sglist_destroy(qsg);
+    }
+
+    return status;
+}
+
+static uint16_t nvme_map(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
+                         size_t len, NvmeRequest *req)
+{
+    uint64_t prp1, prp2;
+
+    switch (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+    case PSDT_PRP:
+        prp1 = le64_to_cpu(req->cmd.dptr.prp1);
+        prp2 = le64_to_cpu(req->cmd.dptr.prp2);
+
+        return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
+    case PSDT_SGL_MPTR_CONTIGUOUS:
+    case PSDT_SGL_MPTR_SGL:
+        /* SGLs shall not be used for Admin commands in NVMe over PCIe */
+        if (!req->sq->sqid) {
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+
+        return nvme_map_sgl(n, qsg, iov, req->cmd.dptr.sgl, len, req);
+    default:
+        return NVME_INVALID_FIELD;
+    }
+}
+
+static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+                         DMADirection dir, NvmeRequest *req)
 {
     uint16_t status = NVME_SUCCESS;
 
-    status = nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
+    status = nvme_map(n, &req->qsg, &req->iov, len, req);
     if (status) {
         return status;
     }
@@ -370,15 +604,6 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
     return status;
 }
 
-static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
-                         QEMUIOVector *iov, size_t len, NvmeRequest *req)
-{
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
-
-    return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
-}
-
 static void nvme_aio_destroy(NvmeAIO *aio)
 {
     g_free(aio);
@@ -846,7 +1071,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         goto invalid;
     }
 
-    status = nvme_map(n, cmd, &req->qsg, &req->iov, len, req);
+    status = nvme_map(n, &req->qsg, &req->iov, len, req);
     if (status) {
         goto invalid;
     }
@@ -1013,8 +1238,6 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
                                 uint32_t buf_len, uint64_t off,
                                 NvmeRequest *req)
 {
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
     uint32_t nsid = le32_to_cpu(cmd->nsid);
 
     uint32_t trans_len;
@@ -1064,16 +1287,14 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
         nvme_clear_events(n, NVME_AER_TYPE_SMART);
     }
 
-    return nvme_dma_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2,
-                        DMA_DIRECTION_FROM_DEVICE, req);
+    return nvme_dma(n, (uint8_t *) &smart + off, trans_len,
+                    DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
                                  uint64_t off, NvmeRequest *req)
 {
     uint32_t trans_len;
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
     NvmeFwSlotInfoLog fw_log;
 
     if (off > sizeof(fw_log)) {
@@ -1084,8 +1305,8 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
 
     trans_len = MIN(sizeof(fw_log) - off, buf_len);
 
-    return nvme_dma_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, prp2,
-                        DMA_DIRECTION_FROM_DEVICE, req);
+    return nvme_dma(n, (uint8_t *) &fw_log + off, trans_len,
+                    DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
@@ -1093,8 +1314,6 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
                                 NvmeRequest *req)
 {
     uint32_t trans_len;
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
     uint8_t errlog[64];
 
     if (!rae) {
@@ -1109,8 +1328,7 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
 
     trans_len = MIN(sizeof(errlog) - off, buf_len);
 
-    return nvme_dma_prp(n, errlog, trans_len, prp1, prp2,
-                        DMA_DIRECTION_FROM_DEVICE, req);
+    return nvme_dma(n, errlog, trans_len, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
@@ -1255,13 +1473,10 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
 static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
                                    NvmeRequest *req)
 {
-    uint64_t prp1 = le64_to_cpu(c->prp1);
-    uint64_t prp2 = le64_to_cpu(c->prp2);
-
     trace_nvme_dev_identify_ctrl();
 
-    return nvme_dma_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), prp1,
-                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
+    return nvme_dma(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
+                    DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
@@ -1269,8 +1484,6 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
 {
     NvmeNamespace *ns;
     uint32_t nsid = le32_to_cpu(c->nsid);
-    uint64_t prp1 = le64_to_cpu(c->prp1);
-    uint64_t prp2 = le64_to_cpu(c->prp2);
 
     trace_nvme_dev_identify_ns(nsid);
 
@@ -1281,8 +1494,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
 
     ns = &n->namespaces[nsid - 1];
 
-    return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
-                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
+    return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns),
+                    DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
@@ -1290,8 +1503,6 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
 {
     static const int data_len = NVME_IDENTIFY_DATA_SIZE;
     uint32_t min_nsid = le32_to_cpu(c->nsid);
-    uint64_t prp1 = le64_to_cpu(c->prp1);
-    uint64_t prp2 = le64_to_cpu(c->prp2);
     uint32_t *list;
     uint16_t ret;
     int i, j = 0;
@@ -1308,8 +1519,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
             break;
         }
     }
-    ret = nvme_dma_prp(n, (uint8_t *)list, data_len, prp1, prp2,
-                       DMA_DIRECTION_FROM_DEVICE, req);
+    ret = nvme_dma(n, (uint8_t *)list, data_len, DMA_DIRECTION_FROM_DEVICE,
+                   req);
     g_free(list);
     return ret;
 }
@@ -1318,8 +1529,6 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
                                             NvmeRequest *req)
 {
     uint32_t nsid = le32_to_cpu(c->nsid);
-    uint64_t prp1 = le64_to_cpu(c->prp1);
-    uint64_t prp2 = le64_to_cpu(c->prp2);
 
     void *list;
     uint16_t ret;
@@ -1345,8 +1554,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
     ns_descr->nidl = NVME_NIDT_UUID_LEN;
     stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
 
-    ret = nvme_dma_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
-                       prp2, DMA_DIRECTION_FROM_DEVICE, req);
+    ret = nvme_dma(n, (uint8_t *)list, NVME_IDENTIFY_DATA_SIZE,
+                   DMA_DIRECTION_FROM_DEVICE, req);
     g_free(list);
     return ret;
 }
@@ -1425,13 +1634,10 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
 static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
                                            NvmeRequest *req)
 {
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
-
     uint64_t timestamp = nvme_get_timestamp(n);
 
-    return nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
-                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
+    return nvme_dma(n, (uint8_t *)&timestamp, sizeof(timestamp),
+                    DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
@@ -1514,11 +1720,9 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
 {
     uint16_t ret;
     uint64_t timestamp;
-    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
-    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
 
-    ret = nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
-                       prp2, DMA_DIRECTION_TO_DEVICE, req);
+    ret = nvme_dma(n, (uint8_t *)&timestamp, sizeof(timestamp),
+                   DMA_DIRECTION_TO_DEVICE, req);
     if (ret != NVME_SUCCESS) {
         return ret;
     }
@@ -2306,6 +2510,8 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->nn = cpu_to_le32(n->num_namespaces);
     id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
 
+    id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT);
+
     pstrcpy((char *) id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:");
     pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index d51c09a4e454..70702cc67d5a 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -34,6 +34,7 @@ nvme_dev_irq_pin(void) "pulsing IRQ pin"
 nvme_dev_irq_masked(void) "IRQ is masked"
 nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 nvme_dev_map_prp(uint16_t cid, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
+nvme_dev_map_sgl(uint16_t cid, uint8_t typ, uint32_t nlb, uint64_t len) "cid %"PRIu16" type 0x%"PRIx8" nlb %"PRIu32" len %"PRIu64""
 nvme_dev_req_register_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, uint64_t count, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" count %"PRIu64" opc \"%s\" req %p"
 nvme_dev_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
 nvme_dev_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
@@ -89,6 +90,9 @@ nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
 nvme_dev_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req, uint16_t status) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p status 0x%"PRIx16""
 nvme_dev_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 nvme_dev_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
+nvme_dev_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
+nvme_dev_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
+nvme_dev_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16""
 nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
 nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
 nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
-- 
2.25.1




* [PATCH v6 37/42] nvme: refactor identify active namespace id list
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (35 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 36/42] nvme: add support for scatter gather lists Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:58   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 38/42] nvme: support multiple namespaces Klaus Jensen
                   ` (6 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Prepare to support inactive namespaces.
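
For reference, Identify with CNS 02h (Active Namespace ID list) returns,
in increasing order, up to 1024 active NSIDs that are greater than the
NSID given in the command. NSIDs are 1-based, which the reworked loop in
this patch makes explicit. A minimal sketch, assuming a hypothetical
active[] array indexed by nsid - 1 to stand in for the inactive
namespaces this patch prepares for (the device also converts each entry
with cpu_to_le32(), which is omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4096-byte Identify data structure / 4-byte NSID entries = 1024. */
#define NSLIST_MAX (4096 / sizeof(uint32_t))

/*
 * Fill list[] with the active NSIDs greater than min_nsid, in increasing
 * order, and return how many were written. active[nsid - 1] is a
 * hypothetical per-namespace flag; NSIDs themselves are 1-based.
 */
static size_t identify_active_nslist(const bool *active,
                                     uint32_t num_namespaces,
                                     uint32_t min_nsid, uint32_t *list)
{
    size_t j = 0;

    assert(active != NULL && list != NULL);

    /* A 64-bit counter cannot wrap even if min_nsid is 0xffffffff. */
    for (uint64_t nsid = (uint64_t)min_nsid + 1;
         nsid <= num_namespaces && j < NSLIST_MAX; nsid++) {
        if (active[nsid - 1]) {
            list[j++] = (uint32_t)nsid;
        }
    }

    return j;
}
```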

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b89b96990f52..bf9fb500842a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1505,16 +1505,16 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
     uint32_t min_nsid = le32_to_cpu(c->nsid);
     uint32_t *list;
     uint16_t ret;
-    int i, j = 0;
+    int j = 0;
 
     trace_nvme_dev_identify_nslist(min_nsid);
 
     list = g_malloc0(data_len);
-    for (i = 0; i < n->num_namespaces; i++) {
-        if (i < min_nsid) {
+    for (int i = 1; i <= n->num_namespaces; i++) {
+        if (i <= min_nsid) {
             continue;
         }
-        list[j++] = cpu_to_le32(i + 1);
+        list[j++] = cpu_to_le32(i);
         if (j == data_len / sizeof(uint32_t)) {
             break;
         }
-- 
2.25.1




* [PATCH v6 38/42] nvme: support multiple namespaces
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (36 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 37/42] nvme: refactor identify active namespace id list Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:59   ` Maxim Levitsky
  2020-03-16 14:29 ` [PATCH v6 39/42] pci: allocate pci id for nvme Klaus Jensen
                   ` (5 subsequent siblings)
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

This adds support for multiple namespaces by introducing a new 'nvme-ns'
device model. The nvme device creates a bus named after the device's
'id' property. The nvme-ns devices then connect to this bus and register
themselves with the nvme device.

This changes how an nvme device is created. Example with two namespaces:

  -drive file=nvme0n1.img,if=none,id=disk1
  -drive file=nvme0n2.img,if=none,id=disk2
  -device nvme,serial=deadbeef,id=nvme0
  -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
  -device nvme-ns,drive=disk2,bus=nvme0,nsid=2

The drive property is retained on the nvme device for backward
compatibility, but it is now optional. Specifying a drive for the nvme
device always creates the namespace with nsid 1.
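
The registration model can be illustrated with a toy controller that
keeps its namespaces in a table indexed by nsid - 1. This is a sketch
under stated assumptions, not nvme_register_namespace(): Ctrl, Ns,
register_ns(), register_legacy_drive() and MAX_NAMESPACES are
hypothetical, and rejecting nsid 0 and tracking num_namespaces as the
highest registered NSID are choices made only for this sketch. Only
nsid_valid() mirrors nvme_nsid_valid() from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NAMESPACES 256 /* hypothetical limit for the sketch */

typedef struct {
    const char *drive; /* backing drive id, e.g. "disk1" */
    uint32_t nsid;
} Ns;

typedef struct {
    Ns *namespaces[MAX_NAMESPACES]; /* slot nsid - 1 */
    uint32_t num_namespaces;        /* sketch: highest registered NSID */
} Ctrl;

/* Same check as nvme_nsid_valid() in the patch. */
static bool nsid_valid(const Ctrl *n, uint32_t nsid)
{
    return nsid && nsid <= n->num_namespaces;
}

/* Register ns in slot nsid - 1; returns -1 for a bad or duplicate NSID. */
static int register_ns(Ctrl *n, Ns *ns)
{
    assert(n && ns);
    if (ns->nsid == 0 || ns->nsid > MAX_NAMESPACES) {
        return -1;
    }
    if (n->namespaces[ns->nsid - 1]) {
        return -1; /* NSID already in use */
    }
    n->namespaces[ns->nsid - 1] = ns;
    if (ns->nsid > n->num_namespaces) {
        n->num_namespaces = ns->nsid;
    }
    return 0;
}

/* Legacy path: drive= on the controller always becomes NSID 1. */
static int register_legacy_drive(Ctrl *n, Ns *ns, const char *drive)
{
    ns->drive = drive;
    ns->nsid = 1;
    return register_ns(n, ns);
}
```

With this layout a gap (say, NSIDs 1 and 3 registered) leaves NSID 2
valid according to nsid_valid() but unpopulated, roughly the situation
an inactive namespace describes.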

Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/Makefile.objs |   2 +-
 hw/block/nvme-ns.c     | 157 +++++++++++++++++++++++++++
 hw/block/nvme-ns.h     |  60 +++++++++++
 hw/block/nvme.c        | 233 ++++++++++++++++++++++++++---------------
 hw/block/nvme.h        |  47 ++++-----
 hw/block/trace-events  |   4 +-
 6 files changed, 389 insertions(+), 114 deletions(-)
 create mode 100644 hw/block/nvme-ns.c
 create mode 100644 hw/block/nvme-ns.h

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index 4b4a2b338dc4..d9141d6a4b9b 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,7 +7,7 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
+common-obj-$(CONFIG_NVME_PCI) += nvme.o nvme-ns.o
 common-obj-$(CONFIG_SWIM) += swim.o
 
 common-obj-$(CONFIG_SH4) += tc58128.o
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
new file mode 100644
index 000000000000..6d975104171d
--- /dev/null
+++ b/hw/block/nvme-ns.c
@@ -0,0 +1,157 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "hw/block/block.h"
+#include "hw/pci/pci.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+
+#include "nvme.h"
+#include "nvme-ns.h"
+
+static int nvme_ns_init(NvmeNamespace *ns)
+{
+    NvmeIdNs *id_ns = &ns->id_ns;
+
+    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+    id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
+
+    /* no thin provisioning */
+    id_ns->ncap = id_ns->nsze;
+    id_ns->nuse = id_ns->ncap;
+
+    return 0;
+}
+
+static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id,
+                            Error **errp)
+{
+    uint64_t perm, shared_perm;
+
+    Error *local_err = NULL;
+    int ret;
+
+    perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
+    shared_perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+        BLK_PERM_GRAPH_MOD;
+
+    ret = blk_set_perm(ns->blk, perm, shared_perm, &local_err);
+    if (ret) {
+        error_propagate_prepend(errp, local_err,
+                                "could not set block permissions: ");
+        return ret;
+    }
+
+    ns->size = blk_getlength(ns->blk);
+    if (ns->size < 0) {
+        error_setg_errno(errp, -ns->size, "could not get blockdev size");
+        return -1;
+    }
+    switch (n->conf.wce) {
+    case ON_OFF_AUTO_ON:
+        n->features.volatile_wc = 1;
+        break;
+    case ON_OFF_AUTO_OFF:
+        n->features.volatile_wc = 0;
+        break;
+    case ON_OFF_AUTO_AUTO:
+        n->features.volatile_wc = blk_enable_write_cache(ns->blk);
+        break;
+    default:
+        abort();
+    }
+
+    blk_set_enable_write_cache(ns->blk, n->features.volatile_wc);
+
+    return 0;
+}
+
+static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
+{
+    if (!ns->blk) {
+        error_setg(errp, "block backend not configured");
+        return 1;
+    }
+
+    return 0;
+}
+
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+    if (nvme_ns_check_constraints(ns, errp)) {
+        return -1;
+    }
+
+    if (nvme_ns_init_blk(n, ns, &n->id_ctrl, errp)) {
+        return -1;
+    }
+
+    nvme_ns_init(ns);
+    if (nvme_register_namespace(n, ns, errp)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static void nvme_ns_realize(DeviceState *dev, Error **errp)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    BusState *s = qdev_get_parent_bus(dev);
+    NvmeCtrl *n = NVME(s->parent);
+    Error *local_err = NULL;
+
+    if (nvme_ns_setup(n, ns, &local_err)) {
+        error_propagate_prepend(errp, local_err,
+                                "could not setup namespace: ");
+        return;
+    }
+}
+
+static Property nvme_ns_props[] = {
+    DEFINE_NVME_NS_PROPERTIES(NvmeNamespace, params),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void nvme_ns_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+    dc->bus_type = TYPE_NVME_BUS;
+    dc->realize = nvme_ns_realize;
+    device_class_set_props(dc, nvme_ns_props);
+    dc->desc = "virtual nvme namespace";
+}
+
+static void nvme_ns_instance_init(Object *obj)
+{
+    NvmeNamespace *ns = NVME_NS(obj);
+    char *bootindex = g_strdup_printf("/namespace@%d,0", ns->params.nsid);
+
+    device_add_bootindex_property(obj, &ns->bootindex, "bootindex",
+                                  bootindex, DEVICE(obj), &error_abort);
+
+    g_free(bootindex);
+}
+
+static const TypeInfo nvme_ns_info = {
+    .name = TYPE_NVME_NS,
+    .parent = TYPE_DEVICE,
+    .class_init = nvme_ns_class_init,
+    .instance_size = sizeof(NvmeNamespace),
+    .instance_init = nvme_ns_instance_init,
+};
+
+static void nvme_ns_register_types(void)
+{
+    type_register_static(&nvme_ns_info);
+}
+
+type_init(nvme_ns_register_types)
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
new file mode 100644
index 000000000000..3c3651d485d0
--- /dev/null
+++ b/hw/block/nvme-ns.h
@@ -0,0 +1,60 @@
+#ifndef NVME_NS_H
+#define NVME_NS_H
+
+#define TYPE_NVME_NS "nvme-ns"
+#define NVME_NS(obj) \
+    OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
+
+#define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
+    DEFINE_PROP_DRIVE("drive", _state, blk), \
+    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
+
+typedef struct NvmeNamespaceParams {
+    uint32_t nsid;
+} NvmeNamespaceParams;
+
+typedef struct NvmeNamespace {
+    DeviceState  parent_obj;
+    BlockBackend *blk;
+    int32_t      bootindex;
+    int64_t      size;
+
+    NvmeIdNs            id_ns;
+    NvmeNamespaceParams params;
+} NvmeNamespace;
+
+static inline uint32_t nvme_nsid(NvmeNamespace *ns)
+{
+    if (ns) {
+        return ns->params.nsid;
+    }
+
+    return -1;
+}
+
+static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
+{
+    NvmeIdNs *id_ns = &ns->id_ns;
+    return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
+static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
+{
+    return nvme_ns_lbaf(ns)->ds;
+}
+
+static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
+{
+    return 1 << nvme_ns_lbads(ns);
+}
+
+static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
+{
+    return ns->size >> nvme_ns_lbads(ns);
+}
+
+typedef struct NvmeCtrl NvmeCtrl;
+
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+
+#endif /* NVME_NS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bf9fb500842a..88a0499d0fe0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -17,10 +17,11 @@
 /**
  * Usage: add options:
  *      -drive file=<file>,if=none,id=<drive_id>
- *      -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
+ *      -device nvme,serial=<serial>,id=<bus_name>, \
  *              cmb_size_mb=<cmb_size_mb[optional]>, \
  *              max_ioqpairs=<N[optional]>, \
  *              mdts=<mdts[optional]>
+ *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -44,6 +45,7 @@
 #include "qemu/cutils.h"
 #include "trace.h"
 #include "nvme.h"
+#include "nvme-ns.h"
 
 #define NVME_SPEC_VER 0x00010300
 #define NVME_CMB_BIR 2
@@ -89,6 +91,11 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
     return pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
+static uint16_t nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
+{
+    return nsid && nsid <= n->num_namespaces;
+}
+
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
     return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
@@ -892,11 +899,12 @@ static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
 
 static void nvme_rw_cb(NvmeRequest *req, void *opaque)
 {
+    NvmeNamespace *ns = req->ns;
     NvmeSQueue *sq = req->sq;
     NvmeCtrl *n = sq->ctrl;
     NvmeCQueue *cq = n->cq[sq->cqid];
 
-    trace_nvme_dev_rw_cb(nvme_cid(req), req->cmd.nsid);
+    trace_nvme_dev_rw_cb(nvme_cid(req), nvme_nsid(ns));
 
     nvme_enqueue_req_completion(cq, req);
 }
@@ -980,10 +988,11 @@ static void nvme_aio_cb(void *opaque, int ret)
 
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
+    NvmeNamespace *ns = req->ns;
     NvmeAIO *aio = g_new0(NvmeAIO, 1);
 
     *aio = (NvmeAIO) {
-        .blk = n->conf.blk,
+        .blk = ns->blk,
         .req = req,
     };
 
@@ -1008,8 +1017,8 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     req->slba = le64_to_cpu(rw->slba);
     req->nlb  = le16_to_cpu(rw->nlb) + 1;
 
-    trace_nvme_dev_write_zeroes(nvme_cid(req), le32_to_cpu(cmd->nsid),
-                                req->slba, req->nlb);
+    trace_nvme_dev_write_zeroes(nvme_cid(req), nvme_nsid(ns), req->slba,
+                                req->nlb);
 
     status = nvme_check_prinfo(n, ctrl, req);
     if (status) {
@@ -1032,7 +1041,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     aio = g_new0(NvmeAIO, 1);
 
     *aio = (NvmeAIO) {
-        .blk = n->conf.blk,
+        .blk = ns->blk,
         .offset = offset,
         .len = count,
         .req = req,
@@ -1044,7 +1053,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 
 invalid:
-    block_acct_invalid(blk_get_stats(n->conf.blk), BLOCK_ACCT_WRITE);
+    block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE);
     return status;
 }
 
@@ -1060,11 +1069,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
     req->nlb  = le16_to_cpu(rw->nlb) + 1;
     req->slba = le64_to_cpu(rw->slba);
-
     len = req->nlb << nvme_ns_lbads(ns);
 
-    trace_nvme_dev_rw(nvme_req_is_write(req) ? "write" : "read", req->nlb,
-                      req->nlb << nvme_ns_lbads(req->ns), req->slba);
+    trace_nvme_dev_rw(nvme_cid(req), nvme_req_is_write(req) ? "write" : "read",
+                      nvme_nsid(ns), req->nlb, req->nlb << nvme_ns_lbads(ns),
+                      req->slba);
 
     status = nvme_check_rw(n, req);
     if (status) {
@@ -1076,13 +1085,13 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         goto invalid;
     }
 
-    nvme_rw_aio(n->conf.blk, req->slba << nvme_ns_lbads(ns), req);
+    nvme_rw_aio(ns->blk, req->slba << nvme_ns_lbads(ns), req);
     nvme_req_set_cb(req, nvme_rw_cb, NULL);
 
     return NVME_NO_COMPLETE;
 
 invalid:
-    block_acct_invalid(blk_get_stats(n->conf.blk), acct);
+    block_acct_invalid(blk_get_stats(ns->blk), acct);
     return status;
 }
 
@@ -1093,12 +1102,15 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     trace_nvme_dev_io_cmd(nvme_cid(req), nsid, le16_to_cpu(req->sq->sqid),
                           cmd->opcode);
 
-    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
-        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
+    if (!nvme_nsid_valid(n, nsid)) {
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
-    req->ns = &n->namespaces[nsid - 1];
+    req->ns = nvme_ns(n, nsid);
+
+    if (unlikely(!req->ns)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
 
     switch (cmd->opcode) {
     case NVME_CMD_FLUSH:
@@ -1245,18 +1257,24 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
     uint64_t units_read = 0, units_written = 0;
     uint64_t read_commands = 0, write_commands = 0;
     NvmeSmartLog smart;
-    BlockAcctStats *s;
 
     if (nsid && nsid != 0xffffffff) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    s = blk_get_stats(n->conf.blk);
+    for (int i = 1; i <= n->num_namespaces; i++) {
+        NvmeNamespace *ns = nvme_ns(n, i);
+        if (!ns) {
+            continue;
+        }
 
-    units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
-    units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
-    read_commands = s->nr_ops[BLOCK_ACCT_READ];
-    write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
+        BlockAcctStats *s = blk_get_stats(ns->blk);
+
+        units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
+        units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
+        read_commands += s->nr_ops[BLOCK_ACCT_READ];
+        write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
+    }
 
     if (off > sizeof(smart)) {
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1482,19 +1500,24 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
 static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
                                  NvmeRequest *req)
 {
-    NvmeNamespace *ns;
+    NvmeIdNs *id_ns, inactive = { 0 };
     uint32_t nsid = le32_to_cpu(c->nsid);
+    NvmeNamespace *ns;
 
     trace_nvme_dev_identify_ns(nsid);
 
-    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
-        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
+    if (!nvme_nsid_valid(n, nsid)) {
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
-    ns = &n->namespaces[nsid - 1];
+    ns = nvme_ns(n, nsid);
+    if (unlikely(!ns)) {
+        id_ns = &inactive;
+    } else {
+        id_ns = &ns->id_ns;
+    }
 
-    return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns),
+    return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs),
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
@@ -1511,7 +1534,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
 
     list = g_malloc0(data_len);
     for (int i = 1; i <= n->num_namespaces; i++) {
-        if (i <= min_nsid) {
+        if (i <= min_nsid || !nvme_ns(n, i)) {
             continue;
         }
         list[j++] = cpu_to_le32(i);
@@ -1536,11 +1559,14 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
 
     trace_nvme_dev_identify_ns_descr_list(nsid);
 
-    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
-        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
+    if (!nvme_nsid_valid(n, nsid)) {
         return NVME_INVALID_NSID | NVME_DNR;
     }
 
+    if (unlikely(!nvme_ns(n, nsid))) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
     list = g_malloc0(NVME_IDENTIFY_DATA_SIZE);
     ns_descr = list;
 
@@ -1680,7 +1706,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         result = cpu_to_le32(n->features.err_rec);
         break;
     case NVME_VOLATILE_WRITE_CACHE:
-        result = cpu_to_le32(blk_enable_write_cache(n->conf.blk));
+        result = cpu_to_le32(n->features.volatile_wc);
         trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
         break;
     case NVME_NUMBER_OF_QUEUES:
@@ -1734,6 +1760,8 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
 
 static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 {
+    NvmeNamespace *ns;
+
     uint32_t dw10 = le32_to_cpu(cmd->cdw10);
     uint32_t dw11 = le32_to_cpu(cmd->cdw11);
 
@@ -1766,12 +1794,23 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
         break;
     case NVME_VOLATILE_WRITE_CACHE:
-        if (blk_enable_write_cache(n->conf.blk)) {
-            blk_flush(n->conf.blk);
+        n->features.volatile_wc = dw11;
+
+        for (int i = 1; i <= n->num_namespaces; i++) {
+            ns = nvme_ns(n, i);
+            if (!ns) {
+                continue;
+            }
+
+            if (blk_enable_write_cache(ns->blk)) {
+                blk_flush(ns->blk);
+            }
+
+            blk_set_enable_write_cache(ns->blk, dw11 & 1);
         }
 
-        blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
         break;
+
     case NVME_NUMBER_OF_QUEUES:
         if (n->qs_created) {
             return NVME_CMD_SEQ_ERROR | NVME_DNR;
@@ -1898,9 +1937,17 @@ static void nvme_process_sq(void *opaque)
 
 static void nvme_clear_ctrl(NvmeCtrl *n)
 {
+    NvmeNamespace *ns;
     int i;
 
-    blk_drain(n->conf.blk);
+    for (i = 1; i <= n->num_namespaces; i++) {
+        ns = nvme_ns(n, i);
+        if (!ns) {
+            continue;
+        }
+
+        blk_drain(ns->blk);
+    }
 
     for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
         if (n->sq[i] != NULL) {
@@ -1923,7 +1970,15 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
     n->outstanding_aers = 0;
     n->qs_created = false;
 
-    blk_flush(n->conf.blk);
+    for (i = 1; i <= n->num_namespaces; i++) {
+        ns = nvme_ns(n, i);
+        if (!ns) {
+            continue;
+        }
+
+        blk_flush(ns->blk);
+    }
+
     n->bar.cc = 0;
 }
 
@@ -2360,17 +2415,17 @@ static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
         n->params.max_ioqpairs = n->params.num_queues - 1;
     }
 
+    if (n->namespace.blk) {
+        warn_report("nvme: drive is deprecated; please use an nvme-ns device "
+                    "instead");
+    }
+
     if (params->max_ioqpairs < 1 ||
         params->max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
         error_setg(errp, "nvme: max_ioqpairs must be ");
         return -1;
     }
 
-    if (!n->conf.blk) {
-        error_setg(errp, "nvme: block backend not configured");
-        return -1;
-    }
-
     if (!params->serial) {
         error_setg(errp, "nvme: serial not configured");
         return -1;
@@ -2379,22 +2434,10 @@ static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
     return 0;
 }
 
-static int nvme_init_blk(NvmeCtrl *n, Error **errp)
-{
-    blkconf_blocksizes(&n->conf);
-    if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
-                                       false, errp)) {
-        return -1;
-    }
-
-    return 0;
-}
-
 static void nvme_init_state(NvmeCtrl *n)
 {
-    n->num_namespaces = 1;
+    n->num_namespaces = NVME_MAX_NAMESPACES;
     n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
-    n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
     n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
     n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
     n->temperature = NVME_TEMPERATURE;
@@ -2509,7 +2552,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->cqes = (0x4 << 4) | 0x4;
     id->nn = cpu_to_le32(n->num_namespaces);
     id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
-
+    id->vwc = 0x1;
     id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT);
 
     pstrcpy((char *) id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:");
@@ -2518,9 +2561,6 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     id->psd[0].mp = cpu_to_le16(0x9c4);
     id->psd[0].enlat = cpu_to_le32(0x10);
     id->psd[0].exlat = cpu_to_le32(0x4);
-    if (blk_enable_write_cache(n->conf.blk)) {
-        id->vwc = 1;
-    }
 
     n->bar.cap = 0;
     NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
@@ -2533,25 +2573,34 @@ static void nvme_init_ctrl(NvmeCtrl *n)
     n->bar.intmc = n->bar.intms = 0;
 }
 
-static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 {
-    int64_t bs_size;
-    NvmeIdNs *id_ns = &ns->id_ns;
+    uint32_t nsid = nvme_nsid(ns);
 
-    bs_size = blk_getlength(n->conf.blk);
-    if (bs_size < 0) {
-        error_setg_errno(errp, -bs_size, "blk_getlength");
+    if (nsid > NVME_MAX_NAMESPACES) {
+        error_setg(errp, "invalid nsid (must be between 0 and %d)",
+                   NVME_MAX_NAMESPACES);
         return -1;
     }
 
-    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
-    n->ns_size = bs_size;
+    if (!nsid) {
+        for (int i = 1; i <= n->num_namespaces; i++) {
+            NvmeNamespace *ns = nvme_ns(n, i);
+            if (!ns) {
+                nsid = i;
+                break;
+            }
+        }
+    } else {
+        if (n->namespaces[nsid - 1]) {
+            error_setg(errp, "nsid must be unique");
+            return -1;
+        }
+    }
 
-    id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(n, ns));
+    trace_nvme_dev_register_namespace(nsid);
 
-    /* no thin provisioning */
-    id_ns->ncap = id_ns->nsze;
-    id_ns->nuse = id_ns->ncap;
+    n->namespaces[nsid - 1] = ns;
 
     return 0;
 }
@@ -2559,26 +2608,28 @@ static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
     NvmeCtrl *n = NVME(pci_dev);
-    int i;
+    NvmeNamespace *ns;
 
     if (nvme_check_constraints(n, errp)) {
         return;
     }
 
+    qbus_create_inplace(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS,
+                        &pci_dev->qdev, n->parent_obj.qdev.id);
+
     nvme_init_state(n);
-
-    if (nvme_init_blk(n, errp)) {
-        return;
-    }
-
-    for (i = 0; i < n->num_namespaces; i++) {
-        if (nvme_init_namespace(n, &n->namespaces[i], errp)) {
-            return;
-        }
-    }
-
     nvme_init_pci(n, pci_dev);
     nvme_init_ctrl(n);
+
+    /* setup a namespace if the controller drive property was given */
+    if (n->namespace.blk) {
+        ns = &n->namespace;
+        ns->params.nsid = 1;
+
+        if (nvme_ns_setup(n, ns, errp)) {
+            return;
+        }
+    }
 }
 
 static void nvme_exit(PCIDevice *pci_dev)
@@ -2599,7 +2650,8 @@ static void nvme_exit(PCIDevice *pci_dev)
 }
 
 static Property nvme_props[] = {
-    DEFINE_BLOCK_PROPERTIES(NvmeCtrl, conf),
+    DEFINE_BLOCK_PROPERTIES_BASE(NvmeCtrl, conf), \
+    DEFINE_PROP_DRIVE("drive", NvmeCtrl, namespace.blk), \
     DEFINE_NVME_PROPERTIES(NvmeCtrl, params),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -2631,26 +2683,35 @@ static void nvme_instance_init(Object *obj)
 {
     NvmeCtrl *s = NVME(obj);
 
-    device_add_bootindex_property(obj, &s->conf.bootindex,
-                                  "bootindex", "/namespace@1,0",
-                                  DEVICE(obj), &error_abort);
+    if (s->namespace.blk) {
+        device_add_bootindex_property(obj, &s->conf.bootindex,
+                                      "bootindex", "/namespace@1,0",
+                                      DEVICE(obj), &error_abort);
+    }
 }
 
 static const TypeInfo nvme_info = {
     .name          = TYPE_NVME,
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(NvmeCtrl),
-    .class_init    = nvme_class_init,
     .instance_init = nvme_instance_init,
+    .class_init    = nvme_class_init,
     .interfaces = (InterfaceInfo[]) {
         { INTERFACE_PCIE_DEVICE },
         { }
     },
 };
 
+static const TypeInfo nvme_bus_info = {
+    .name = TYPE_NVME_BUS,
+    .parent = TYPE_BUS,
+    .instance_size = sizeof(NvmeBus),
+};
+
 static void nvme_register_types(void)
 {
     type_register_static(&nvme_info);
+    type_register_static(&nvme_bus_info);
 }
 
 type_init(nvme_register_types)
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 5d5fa8c8833a..c66f6cd8413a 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,9 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-ns.h"
+
+#define NVME_MAX_NAMESPACES 256
 
 #define DEFINE_NVME_PROPERTIES(_state, _props) \
     DEFINE_PROP_STRING("serial", _state, _props.serial), \
@@ -110,26 +113,6 @@ typedef struct NvmeCQueue {
     QTAILQ_HEAD(, NvmeRequest) req_list;
 } NvmeCQueue;
 
-typedef struct NvmeNamespace {
-    NvmeIdNs        id_ns;
-} NvmeNamespace;
-
-static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
-{
-    NvmeIdNs *id_ns = &ns->id_ns;
-    return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
-}
-
-static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
-{
-    return nvme_ns_lbaf(ns)->ds;
-}
-
-static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
-{
-    return 1 << nvme_ns_lbads(ns);
-}
-
 typedef enum NvmeAIOOp {
     NVME_AIO_OPC_NONE         = 0x0,
     NVME_AIO_OPC_FLUSH        = 0x1,
@@ -184,6 +167,13 @@ static inline bool nvme_req_is_write(NvmeRequest *req)
     }
 }
 
+#define TYPE_NVME_BUS "nvme-bus"
+#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
+
+typedef struct NvmeBus {
+    BusState parent_bus;
+} NvmeBus;
+
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
@@ -193,8 +183,9 @@ typedef struct NvmeCtrl {
     MemoryRegion iomem;
     MemoryRegion ctrl_mem;
     NvmeBar      bar;
-    BlockConf    conf;
     NvmeParams   params;
+    NvmeBus      bus;
+    BlockConf    conf;
 
     bool        qs_created;
     uint32_t    page_size;
@@ -205,7 +196,6 @@ typedef struct NvmeCtrl {
     uint32_t    reg_size;
     uint32_t    num_namespaces;
     uint32_t    max_q_ents;
-    uint64_t    ns_size;
     uint8_t     outstanding_aers;
     uint8_t     *cmbuf;
     uint64_t    irq_status;
@@ -219,7 +209,8 @@ typedef struct NvmeCtrl {
     QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
     int         aer_queued;
 
-    NvmeNamespace   *namespaces;
+    NvmeNamespace   namespace;
+    NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
     NvmeSQueue      **sq;
     NvmeCQueue      **cq;
     NvmeSQueue      admin_sq;
@@ -228,9 +219,13 @@ typedef struct NvmeCtrl {
     NvmeFeatureVal  features;
 } NvmeCtrl;
 
-static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
+static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
 {
-    return n->ns_size >> nvme_ns_lbads(ns);
+    if (!nsid || nsid > n->num_namespaces) {
+        return NULL;
+    }
+
+    return n->namespaces[nsid - 1];
 }
 
 static inline uint16_t nvme_cid(NvmeRequest *req)
@@ -253,4 +248,6 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
     return req->sq->ctrl;
 }
 
+int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+
 #endif /* HW_NVME_H */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 70702cc67d5a..3d907eaf0800 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -29,6 +29,7 @@ hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int t
 
 # nvme.c
 # nvme traces for successful events
+nvme_dev_register_namespace(uint32_t nsid) "nsid %"PRIu32""
 nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 nvme_dev_irq_pin(void) "pulsing IRQ pin"
 nvme_dev_irq_masked(void) "IRQ is masked"
@@ -38,7 +39,7 @@ nvme_dev_map_sgl(uint16_t cid, uint8_t typ, uint32_t nlb, uint64_t len) "cid %"P
 nvme_dev_req_register_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, uint64_t count, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" count %"PRIu64" opc \"%s\" req %p"
 nvme_dev_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
 nvme_dev_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
-nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
+nvme_dev_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
 nvme_dev_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 nvme_dev_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
 nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
@@ -98,7 +99,6 @@ nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or no
 nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
 nvme_dev_err_invalid_prp2_missing(void) "PRP2 is null and more data to be transferred"
 nvme_dev_err_invalid_prp(void) "invalid PRP"
-nvme_dev_err_invalid_ns(uint32_t ns, uint32_t limit) "invalid namespace %u not within 1-%u"
 nvme_dev_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8""
 nvme_dev_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8""
 nvme_dev_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64""
-- 
2.25.1
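The namespace bookkeeping in this patch can be sketched as standalone C. The patch keeps a fixed table of NVME_MAX_NAMESPACES pointers, has nvme_ns() return NULL for unattached ids, and has nvme_register_namespace() pick the first free id when nsid is 0. The names Ctrl, Namespace, ctrl_ns() and ctrl_register_ns() below are hypothetical stand-ins, not QEMU APIs. The sketch also rejects a full table explicitly. In the loop in nvme_register_namespace() above, nsid would stay 0 in that case and namespaces[nsid - 1] would then be indexed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NAMESPACES 256 /* plays the role of NVME_MAX_NAMESPACES */

/* Hypothetical, trimmed-down stand-ins for NvmeNamespace and NvmeCtrl. */
typedef struct Namespace {
    uint32_t nsid; /* 0 asks the controller to pick a free id */
} Namespace;

typedef struct Ctrl {
    uint32_t num_namespaces;
    Namespace *namespaces[MAX_NAMESPACES]; /* NULL = no namespace attached */
} Ctrl;

/* Like nvme_ns(): NULL for an out-of-range or unattached nsid. */
static Namespace *ctrl_ns(Ctrl *n, uint32_t nsid)
{
    if (nsid == 0 || nsid > n->num_namespaces) {
        return NULL;
    }
    return n->namespaces[nsid - 1];
}

/* Attach ns and return its nsid, or return 0 if it cannot be attached. */
static uint32_t ctrl_register_ns(Ctrl *n, Namespace *ns)
{
    uint32_t nsid;

    assert(n != NULL && ns != NULL);
    nsid = ns->nsid;

    if (nsid > n->num_namespaces) {
        return 0; /* out of range */
    }

    if (nsid == 0) {
        /* pick the lowest free id */
        for (uint32_t i = 1; i <= n->num_namespaces; i++) {
            if (!ctrl_ns(n, i)) {
                nsid = i;
                break;
            }
        }
        if (nsid == 0) {
            return 0; /* table full: never index namespaces[nsid - 1] */
        }
    } else if (n->namespaces[nsid - 1]) {
        return 0; /* nsid must be unique */
    }

    ns->nsid = nsid;
    n->namespaces[nsid - 1] = ns;
    return nsid;
}
```

Take a zero-initialised Ctrl with num_namespaces set to MAX_NAMESPACES. A namespace requesting nsid 1 gets 1. A second one requesting nsid 0 is assigned 2. A further request for nsid 1 returns 0.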




* [PATCH v6 39/42] pci: allocate pci id for nvme
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (37 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 38/42] nvme: support multiple namespaces Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-16 14:29 ` [PATCH v6 40/42] nvme: change controller pci id Klaus Jensen
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

The emulated nvme device (hw/block/nvme.c) is currently using an
internal Intel device id.

Prepare to change that by allocating a device id under the 1b36 (Red
Hat, Inc.) vendor id.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 MAINTAINERS            |  1 +
 docs/specs/nvme.txt    | 25 +++++++++++++++++++++++++
 docs/specs/pci-ids.txt |  1 +
 include/hw/pci/pci.h   |  1 +
 4 files changed, 28 insertions(+)
 create mode 100644 docs/specs/nvme.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 32867bc63670..fcffe790ef40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1698,6 +1698,7 @@ L: qemu-block@nongnu.org
 S: Supported
 F: hw/block/nvme*
 F: tests/qtest/nvme-test.c
+F: docs/specs/nvme.txt
 
 megasas
 M: Hannes Reinecke <hare@suse.com>
diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
new file mode 100644
index 000000000000..b51552cb5c3f
--- /dev/null
+++ b/docs/specs/nvme.txt
@@ -0,0 +1,25 @@
+NVM Express Controller
+======================
+
+The nvme device (-device nvme) emulates an NVM Express Controller.
+
+
+Reference Specifications
+------------------------
+
+The device currently implements most mandatory features of NVMe v1.3d, see
+
+  https://nvmexpress.org/resources/specifications/
+
+for the specification.
+
+
+Known issues
+------------
+
+* The device has no way of storing persistent state, so minor parts of the
+  implementation are in violation of the specification:
+    - SMART/Health accounting numbers are reset across power cycles
+
+* Interrupt Coalescing is not supported and is disabled by default, in violation
+  of the specification.
diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
index 4d53e5c7d9d5..abbdbca6be38 100644
--- a/docs/specs/pci-ids.txt
+++ b/docs/specs/pci-ids.txt
@@ -63,6 +63,7 @@ PCI devices (other than virtio):
 1b36:000b  PCIe Expander Bridge (-device pxb-pcie)
 1b36:000d  PCI xhci usb host adapter
 1b36:000f  mdpy (mdev sample device), linux/samples/vfio-mdev/mdpy.c
+1b36:0010  PCIe NVMe device (-device nvme)
 
 All these devices are documented in docs/specs.
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index da9057b8db97..92231885bc23 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -104,6 +104,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_REDHAT_XHCI        0x000d
 #define PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE 0x000e
 #define PCI_DEVICE_ID_REDHAT_MDPY        0x000f
+#define PCI_DEVICE_ID_REDHAT_NVME        0x0010
 #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
 
 #define FMT_PCIBUS                      PRIx64
-- 
2.25.1




* [PATCH v6 40/42] nvme: change controller pci id
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (38 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 39/42] pci: allocate pci id for nvme Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-16 14:29 ` [PATCH v6 41/42] nvme: remove redundant NvmeCmd pointer parameter Klaus Jensen
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

There are two reasons for changing this:

  1. The nvme device currently uses an internal Intel device id.

  2. Since commits "nvme: fix write zeroes offset and count" and "nvme:
     support multiple namespaces" the controller device no longer has
     the quirks that the Linux kernel thinks it has.

     As the quirks are applied based on the PCI vendor and device id,
     change both ids to get rid of the quirks.

To keep backward compatibility, add a new 'x-use-intel-id' parameter to
the nvme device to force use of the Intel vendor and device id. The
parameter is off by default, but a compat property turns it on for
machine types 4.2 and older.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c   | 13 +++++++++----
 hw/block/nvme.h   |  4 +++-
 hw/core/machine.c |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 88a0499d0fe0..f176d859a85e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2493,8 +2493,15 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
 
     pci_conf[PCI_INTERRUPT_PIN] = 1;
     pci_config_set_prog_interface(pci_conf, 0x2);
-    pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
-    pci_config_set_device_id(pci_conf, 0x5845);
+
+    if (n->params.use_intel_id) {
+        pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
+        pci_config_set_device_id(pci_conf, 0x5845);
+    } else {
+        pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT);
+        pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME);
+    }
+
     pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
     pcie_endpoint_cap_init(pci_dev, 0x80);
 
@@ -2669,8 +2676,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
     pc->realize = nvme_realize;
     pc->exit = nvme_exit;
     pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
-    pc->vendor_id = PCI_VENDOR_ID_INTEL;
-    pc->device_id = 0x5845;
     pc->revision = 2;
 
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index c66f6cd8413a..70df17e6f893 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -13,7 +13,8 @@
     DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64), \
     DEFINE_PROP_UINT8("aerl", _state, _props.aerl, 3), \
     DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64), \
-    DEFINE_PROP_UINT8("mdts", _state, _props.mdts, 7)
+    DEFINE_PROP_UINT8("mdts", _state, _props.mdts, 7), \
+    DEFINE_PROP_BOOL("x-use-intel-id", _state, _props.use_intel_id, false)
 
 typedef struct NvmeParams {
     char     *serial;
@@ -23,6 +24,7 @@ typedef struct NvmeParams {
     uint8_t  aerl;
     uint32_t aer_max_queued;
     uint8_t  mdts;
+    bool     use_intel_id;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9e8c06036faf..fe7dbca0b9a2 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -39,6 +39,7 @@ GlobalProperty hw_compat_4_2[] = {
     { "usb-redir", "suppress-remote-wake", "off" },
     { "qxl", "revision", "4" },
     { "qxl-vga", "revision", "4" },
+    { "nvme", "x-use-intel-id", "on"},
 };
 const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
 
-- 
2.25.1




* [PATCH v6 41/42] nvme: remove redundant NvmeCmd pointer parameter
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (39 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 40/42] nvme: change controller pci id Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-16 14:29 ` [PATCH v6 42/42] nvme: make lba data size configurable Klaus Jensen
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

The command struct is available in the NvmeRequest that we generally
pass around anyway.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/block/nvme.c | 164 +++++++++++++++++++++++-------------------------
 1 file changed, 78 insertions(+), 86 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f176d859a85e..4f1504fc00fe 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -986,7 +986,7 @@ static void nvme_aio_cb(void *opaque, int ret)
     nvme_aio_destroy(aio);
 }
 
-static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeNamespace *ns = req->ns;
     NvmeAIO *aio = g_new0(NvmeAIO, 1);
@@ -1002,12 +1002,12 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeAIO *aio;
 
     NvmeNamespace *ns = req->ns;
-    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
     uint16_t ctrl = le16_to_cpu(rw->control);
 
     int64_t offset;
@@ -1057,9 +1057,9 @@ invalid:
     return status;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
+    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
     NvmeNamespace *ns = req->ns;
     uint32_t len;
     int status;
@@ -1095,12 +1095,12 @@ invalid:
     return status;
 }
 
-static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
-    uint32_t nsid = le32_to_cpu(cmd->nsid);
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 
     trace_nvme_dev_io_cmd(nvme_cid(req), nsid, le16_to_cpu(req->sq->sqid),
-                          cmd->opcode);
+                          req->cmd.opcode);
 
     if (!nvme_nsid_valid(n, nsid)) {
         return NVME_INVALID_NSID | NVME_DNR;
@@ -1112,16 +1112,16 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    switch (cmd->opcode) {
+    switch (req->cmd.opcode) {
     case NVME_CMD_FLUSH:
-        return nvme_flush(n, cmd, req);
+        return nvme_flush(n, req);
     case NVME_CMD_WRITE_ZEROS:
-        return nvme_write_zeroes(n, cmd, req);
+        return nvme_write_zeroes(n, req);
     case NVME_CMD_WRITE:
     case NVME_CMD_READ:
-        return nvme_rw(n, cmd, req);
+        return nvme_rw(n, req);
     default:
-        trace_nvme_dev_err_invalid_opc(cmd->opcode);
+        trace_nvme_dev_err_invalid_opc(req->cmd.opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
 }
@@ -1137,10 +1137,10 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
     }
 }
 
-static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeDeleteQ *c = (NvmeDeleteQ *)cmd;
-    NvmeRequest *req, *next;
+    NvmeDeleteQ *c = (NvmeDeleteQ *) &req->cmd;
+    NvmeRequest *next;
     NvmeSQueue *sq;
     NvmeCQueue *cq;
     NvmeAIO *aio;
@@ -1208,10 +1208,10 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
     n->sq[sqid] = sq;
 }
 
-static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeSQueue *sq;
-    NvmeCreateSq *c = (NvmeCreateSq *)cmd;
+    NvmeCreateSq *c = (NvmeCreateSq *) &req->cmd;
 
     uint16_t cqid = le16_to_cpu(c->cqid);
     uint16_t sqid = le16_to_cpu(c->sqid);
@@ -1246,11 +1246,10 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
-                                uint32_t buf_len, uint64_t off,
-                                NvmeRequest *req)
+static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
+                                uint64_t off, NvmeRequest *req)
 {
-    uint32_t nsid = le32_to_cpu(cmd->nsid);
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 
     uint32_t trans_len;
     time_t current_ms;
@@ -1309,8 +1308,8 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
-                                 uint64_t off, NvmeRequest *req)
+static uint16_t nvme_fw_log_info(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
+                                 NvmeRequest *req)
 {
     uint32_t trans_len;
     NvmeFwSlotInfoLog fw_log;
@@ -1327,9 +1326,8 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
-                                uint32_t buf_len, uint64_t off,
-                                NvmeRequest *req)
+static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
+                                uint64_t off, NvmeRequest *req)
 {
     uint32_t trans_len;
     uint8_t errlog[64];
@@ -1349,12 +1347,12 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
     return nvme_dma(n, errlog, trans_len, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 {
-    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
-    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
-    uint32_t dw12 = le32_to_cpu(cmd->cdw12);
-    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
+    uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
+    uint32_t dw12 = le32_to_cpu(req->cmd.cdw12);
+    uint32_t dw13 = le32_to_cpu(req->cmd.cdw13);
     uint8_t  lid = dw10 & 0xff;
     uint8_t  lsp = (dw10 >> 8) & 0xf;
     uint8_t  rae = (dw10 >> 15) & 0x1;
@@ -1384,11 +1382,11 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 
     switch (lid) {
     case NVME_LOG_ERROR_INFO:
-        return nvme_error_info(n, cmd, rae, len, off, req);
+        return nvme_error_info(n, rae, len, off, req);
     case NVME_LOG_SMART_INFO:
-        return nvme_smart_info(n, cmd, rae, len, off, req);
+        return nvme_smart_info(n, rae, len, off, req);
     case NVME_LOG_FW_SLOT_INFO:
-        return nvme_fw_log_info(n, cmd, len, off, req);
+        return nvme_fw_log_info(n, len, off, req);
     default:
         trace_nvme_dev_err_invalid_log_page(nvme_cid(req), lid);
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1406,9 +1404,9 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
     }
 }
 
-static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeDeleteQ *c = (NvmeDeleteQ *)cmd;
+    NvmeDeleteQ *c = (NvmeDeleteQ *) &req->cmd;
     NvmeCQueue *cq;
     uint16_t qid = le16_to_cpu(c->qid);
 
@@ -1446,10 +1444,10 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
 }
 
-static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
+static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeCQueue *cq;
-    NvmeCreateCq *c = (NvmeCreateCq *)cmd;
+    NvmeCreateCq *c = (NvmeCreateCq *) &req->cmd;
     uint16_t cqid = le16_to_cpu(c->cqid);
     uint16_t vector = le16_to_cpu(c->irq_vector);
     uint16_t qsize = le16_to_cpu(c->qsize);
@@ -1488,8 +1486,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
-                                   NvmeRequest *req)
+static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req)
 {
     trace_nvme_dev_identify_ctrl();
 
@@ -1497,11 +1494,10 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
-                                 NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeIdNs *id_ns, inactive = { 0 };
-    uint32_t nsid = le32_to_cpu(c->nsid);
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
     NvmeNamespace *ns;
 
     trace_nvme_dev_identify_ns(nsid);
@@ -1521,11 +1517,10 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
-                                     NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
 {
     static const int data_len = NVME_IDENTIFY_DATA_SIZE;
-    uint32_t min_nsid = le32_to_cpu(c->nsid);
+    uint32_t min_nsid = le32_to_cpu(req->cmd.nsid);
     uint32_t *list;
     uint16_t ret;
     int j = 0;
@@ -1548,10 +1543,9 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
     return ret;
 }
 
-static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
-                                            NvmeRequest *req)
+static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
 {
-    uint32_t nsid = le32_to_cpu(c->nsid);
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 
     void *list;
     uint16_t ret;
@@ -1586,28 +1580,28 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
     return ret;
 }
 
-static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 {
-    NvmeIdentify *c = (NvmeIdentify *)cmd;
+    NvmeIdentify *c = (NvmeIdentify *) &req->cmd;
 
     switch (le32_to_cpu(c->cns)) {
     case NVME_ID_CNS_NS:
-        return nvme_identify_ns(n, c, req);
+        return nvme_identify_ns(n, req);
     case NVME_ID_CNS_CTRL:
-        return nvme_identify_ctrl(n, c, req);
+        return nvme_identify_ctrl(n, req);
     case NVME_ID_CNS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist(n, c, req);
+        return nvme_identify_nslist(n, req);
     case NVME_ID_CNS_NS_DESCR_LIST:
-        return nvme_identify_ns_descr_list(n, c, req);
+        return nvme_identify_ns_descr_list(n, req);
     default:
         trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 }
 
-static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req)
 {
-    uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
+    uint16_t sqid = le32_to_cpu(req->cmd.cdw10) & 0xffff;
 
     req->cqe.result = 1;
     if (nvme_check_sqid(n, sqid)) {
@@ -1657,8 +1651,7 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
     return cpu_to_le64(ts.all);
 }
 
-static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
-                                           NvmeRequest *req)
+static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeRequest *req)
 {
     uint64_t timestamp = nvme_get_timestamp(n);
 
@@ -1666,10 +1659,10 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req)
 {
-    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
-    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
+    uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
     uint32_t result;
 
     trace_nvme_dev_getfeat(nvme_cid(req), dw10);
@@ -1715,7 +1708,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
         trace_nvme_dev_getfeat_numq(result);
         break;
     case NVME_TIMESTAMP:
-        return nvme_get_feature_timestamp(n, cmd, req);
+        return nvme_get_feature_timestamp(n, req);
     case NVME_INTERRUPT_COALESCING:
         result = cpu_to_le32(n->features.int_coalescing);
         break;
@@ -1741,8 +1734,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
-                                           NvmeRequest *req)
+static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeRequest *req)
 {
     uint16_t ret;
     uint64_t timestamp;
@@ -1758,12 +1750,12 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeNamespace *ns;
 
-    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
-    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
+    uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
 
     trace_nvme_dev_setfeat(nvme_cid(req), dw10, dw11);
 
@@ -1832,7 +1824,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
                                       ((n->params.max_ioqpairs - 1) << 16));
         break;
     case NVME_TIMESTAMP:
-        return nvme_set_feature_timestamp(n, cmd, req);
+        return nvme_set_feature_timestamp(n, req);
     case NVME_ASYNCHRONOUS_EVENT_CONF:
         n->features.async_config = dw11;
         break;
@@ -1850,7 +1842,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static uint16_t nvme_aer(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 {
     trace_nvme_dev_aer(nvme_cid(req));
 
@@ -1869,31 +1861,31 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
+static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
-    switch (cmd->opcode) {
+    switch (req->cmd.opcode) {
     case NVME_ADM_CMD_DELETE_SQ:
-        return nvme_del_sq(n, cmd);
+        return nvme_del_sq(n, req);
     case NVME_ADM_CMD_CREATE_SQ:
-        return nvme_create_sq(n, cmd);
+        return nvme_create_sq(n, req);
     case NVME_ADM_CMD_GET_LOG_PAGE:
-        return nvme_get_log(n, cmd, req);
+        return nvme_get_log(n, req);
     case NVME_ADM_CMD_DELETE_CQ:
-        return nvme_del_cq(n, cmd);
+        return nvme_del_cq(n, req);
     case NVME_ADM_CMD_CREATE_CQ:
-        return nvme_create_cq(n, cmd);
+        return nvme_create_cq(n, req);
     case NVME_ADM_CMD_IDENTIFY:
-        return nvme_identify(n, cmd, req);
+        return nvme_identify(n, req);
     case NVME_ADM_CMD_ABORT:
-        return nvme_abort(n, cmd, req);
+        return nvme_abort(n, req);
     case NVME_ADM_CMD_SET_FEATURES:
-        return nvme_set_feature(n, cmd, req);
+        return nvme_set_feature(n, req);
     case NVME_ADM_CMD_GET_FEATURES:
-        return nvme_get_feature(n, cmd, req);
+        return nvme_get_feature(n, req);
     case NVME_ADM_CMD_ASYNC_EV_REQ:
-        return nvme_aer(n, cmd, req);
+        return nvme_aer(n, req);
     default:
-        trace_nvme_dev_err_invalid_admin_opc(cmd->opcode);
+        trace_nvme_dev_err_invalid_admin_opc(req->cmd.opcode);
         return NVME_INVALID_OPCODE | NVME_DNR;
     }
 }
@@ -1926,8 +1918,8 @@ static void nvme_process_sq(void *opaque)
         req->cqe.cid = cmd.cid;
         memcpy(&req->cmd, &cmd, sizeof(NvmeCmd));
 
-        status = sq->sqid ? nvme_io_cmd(n, &cmd, req) :
-            nvme_admin_cmd(n, &cmd, req);
+        status = sq->sqid ? nvme_io_cmd(n, req) :
+            nvme_admin_cmd(n, req);
         if (status != NVME_NO_COMPLETE) {
             req->status = status;
             nvme_enqueue_req_completion(cq, req);
-- 
2.25.1
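Patch 41 works because nvme_process_sq() already copies each fetched submission queue entry into req->cmd (the memcpy in the last hunk). That lets every handler read the command from the request instead of taking a separate NvmeCmd pointer. The sketch below shows the pattern on its own. It is not QEMU code. NvmeCmd, NvmeRequest and NvmeCtrl are heavily simplified stand-ins, handle_io_cmd() and submit() are hypothetical names, and the endianness conversion the real device does with le32_to_cpu() is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the QEMU NVMe types. */
typedef struct NvmeCmd {
    uint8_t  opcode;
    uint16_t cid;
    uint32_t nsid;
} NvmeCmd;

typedef struct NvmeRequest {
    NvmeCmd  cmd;      /* private copy of the submission queue entry */
    uint16_t status;
} NvmeRequest;

typedef struct NvmeCtrl {
    uint32_t num_namespaces;
} NvmeCtrl;

/* Opcode and status code values follow the NVMe spec encodings. */
enum { CMD_FLUSH = 0x00, CMD_WRITE = 0x01, CMD_READ = 0x02 };
enum { SC_SUCCESS = 0x00, SC_INVALID_OPCODE = 0x01, SC_INVALID_NSID = 0x0b };

/* Handlers take only (n, req); the command is reached through req->cmd. */
static uint16_t handle_flush(NvmeCtrl *n, NvmeRequest *req)
{
    (void)n;
    (void)req;
    return SC_SUCCESS;
}

static uint16_t handle_rw(NvmeCtrl *n, NvmeRequest *req)
{
    (void)n;
    (void)req;
    return SC_SUCCESS;
}

static uint16_t handle_io_cmd(NvmeCtrl *n, NvmeRequest *req)
{
    uint32_t nsid = req->cmd.nsid;

    if (nsid == 0 || nsid > n->num_namespaces) {
        return SC_INVALID_NSID;
    }

    switch (req->cmd.opcode) {
    case CMD_FLUSH:
        return handle_flush(n, req);
    case CMD_WRITE:
    case CMD_READ:
        return handle_rw(n, req);
    default:
        return SC_INVALID_OPCODE;
    }
}

/* The queue-processing loop copies each fetched entry into its request once. */
static uint16_t submit(NvmeCtrl *n, NvmeRequest *req, const NvmeCmd *sqe)
{
    assert(n != NULL && req != NULL && sqe != NULL);
    memcpy(&req->cmd, sqe, sizeof(NvmeCmd));
    req->status = handle_io_cmd(n, req);
    return req->status;
}
```

One plausible benefit of this design is that a handler which returns NVME_NO_COMPLETE and finishes later can still read the command from the request it owns, rather than through a pointer to the caller's stack-local copy.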


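nvme_get_log() decodes its arguments from CDW10 through CDW13, but the hunk above only shows the lid, lsp and rae extraction. The following sketch completes the decoding under the assumption that the fields follow the NVMe 1.3 Get Log Page layout: NUMDL in CDW10[31:16], NUMDU in CDW11[15:00], and the byte offset split across LPOL (CDW12) and LPOU (CDW13). GetLogArgs and decode_get_log() are hypothetical names, not the device's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Get Log Page command, decoded from CDW10..CDW13. */
typedef struct GetLogArgs {
    uint8_t  lid;   /* log page identifier, CDW10[07:00] */
    uint8_t  lsp;   /* log specific field, CDW10[11:08] */
    uint8_t  rae;   /* retain asynchronous event, CDW10[15] */
    uint64_t len;   /* transfer length in bytes, from NUMDU:NUMDL */
    uint64_t off;   /* log page offset in bytes, LPOU:LPOL */
} GetLogArgs;

static GetLogArgs decode_get_log(uint32_t dw10, uint32_t dw11,
                                 uint32_t dw12, uint32_t dw13)
{
    GetLogArgs a;
    uint32_t numdl = dw10 >> 16;        /* CDW10[31:16] */
    uint32_t numdu = dw11 & 0xffff;     /* CDW11[15:00] */
    uint32_t numd = (numdu << 16) | numdl;

    a.lid = dw10 & 0xff;
    a.lsp = (dw10 >> 8) & 0xf;
    a.rae = (dw10 >> 15) & 0x1;
    /* NUMD is a zero-based dword count; widen before adding 1. */
    a.len = ((uint64_t)numd + 1) << 2;
    a.off = ((uint64_t)dw13 << 32) | dw12;
    return a;
}
```

For example, a SMART log request with RAE set and NUMDL = 127 (dw10 = 0x007f8002) decodes to lid 0x02, rae 1 and a 512-byte transfer.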


* [PATCH v6 42/42] nvme: make lba data size configurable
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (40 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 41/42] nvme: remove redundant NvmeCmd pointer parameter Klaus Jensen
@ 2020-03-16 14:29 ` Klaus Jensen
  2020-03-25 10:59   ` Maxim Levitsky
  2020-03-16 19:30 ` [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces no-reply
  2020-03-25 10:35 ` Maxim Levitsky
  43 siblings, 1 reply; 121+ messages in thread
From: Klaus Jensen @ 2020-03-16 14:29 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Klaus Jensen,
	Keith Busch, Javier Gonzalez, Maxim Levitsky

From: Klaus Jensen <k.jensen@samsung.com>

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
---
 hw/block/nvme-ns.c | 7 ++++++-
 hw/block/nvme-ns.h | 4 +++-
 hw/block/nvme.c    | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 6d975104171d..d7e5c81c5f16 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -18,7 +18,7 @@ static int nvme_ns_init(NvmeNamespace *ns)
 {
     NvmeIdNs *id_ns = &ns->id_ns;
 
-    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+    id_ns->lbaf[0].ds = ns->params.lbads;
     id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
 
     /* no thin provisioning */
@@ -78,6 +78,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
         return 1;
     }
 
+    if (ns->params.lbads < 9 || ns->params.lbads > 12) {
+        error_setg(errp, "unsupported lbads (supported: 9-12)");
+        return 1;
+    }
+
     return 0;
 }
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 3c3651d485d0..43b78f8b8d9c 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -7,10 +7,12 @@
 
 #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
     DEFINE_PROP_DRIVE("drive", _state, blk), \
-    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
+    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
+    DEFINE_PROP_UINT8("lbads", _state, _props.lbads, BDRV_SECTOR_BITS)
 
 typedef struct NvmeNamespaceParams {
     uint32_t nsid;
+    uint8_t  lbads;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 4f1504fc00fe..61a9da970d41 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2624,6 +2624,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     if (n->namespace.blk) {
         ns = &n->namespace;
         ns->params.nsid = 1;
+        ns->params.lbads = BDRV_SECTOR_BITS;
 
         if (nvme_ns_setup(n, ns, errp)) {
             return;
-- 
2.25.1
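Patch 42 stores the LBA data size as a power-of-two exponent: lbads=9 (the BDRV_SECTOR_BITS default) means 512-byte blocks, and the new constraint check limits the value to 9-12, i.e. 512 to 4096 bytes. The sketch below shows that arithmetic. The helper names are hypothetical, and namespace_nlbas() assumes the block count is the backing size shifted right by lbads, which this patch does not show directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the range check added to nvme_ns_check_constraints(). */
static bool lbads_supported(uint8_t lbads)
{
    return lbads >= 9 && lbads <= 12;
}

/* The LBA data size is 2^lbads bytes. */
static uint32_t lba_size_bytes(uint8_t lbads)
{
    assert(lbads_supported(lbads));
    return UINT32_C(1) << lbads;
}

/* Assumed: number of whole logical blocks in a backing image of `size` bytes. */
static uint64_t namespace_nlbas(uint64_t size, uint8_t lbads)
{
    return size >> lbads;
}
```

With a 1 GiB backing image, lbads=12 gives 262144 blocks of 4096 bytes, while the default lbads=9 gives 2097152 blocks of 512 bytes.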




* Re: [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (41 preceding siblings ...)
  2020-03-16 14:29 ` [PATCH v6 42/42] nvme: make lba data size configurable Klaus Jensen
@ 2020-03-16 19:30 ` no-reply
  2020-03-25 10:35 ` Maxim Levitsky
  43 siblings, 0 replies; 121+ messages in thread
From: no-reply @ 2020-03-16 19:30 UTC (permalink / raw)
  To: its
  Cc: kwolf, beata.michalska, qemu-block, qemu-devel, mreitz, kbusch,
	its, javier.gonz, mlevitsk

Patchew URL: https://patchew.org/QEMU/20200316142928.153431-1-its@irrelevant.dk/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces
Message-id: 20200316142928.153431-1-its@irrelevant.dk
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
   509f617..a98135f  master     -> master
 - [tag update]      patchew/20200315144653.22660-1-armbru@redhat.com -> patchew/20200315144653.22660-1-armbru@redhat.com
 - [tag update]      patchew/20200315235716.28448-1-philmd@redhat.com -> patchew/20200315235716.28448-1-philmd@redhat.com
 - [tag update]      patchew/20200316001111.31004-1-philmd@redhat.com -> patchew/20200316001111.31004-1-philmd@redhat.com
 - [tag update]      patchew/20200316060631.30052-1-vsementsov@virtuozzo.com -> patchew/20200316060631.30052-1-vsementsov@virtuozzo.com
 - [tag update]      patchew/20200316103203.10046-1-ovoshcha@redhat.com -> patchew/20200316103203.10046-1-ovoshcha@redhat.com
 - [tag update]      patchew/20200316120049.11225-1-philmd@redhat.com -> patchew/20200316120049.11225-1-philmd@redhat.com
 - [tag update]      patchew/20200316160702.478964-1-stefanha@redhat.com -> patchew/20200316160702.478964-1-stefanha@redhat.com
 * [new tag]         patchew/20200316172155.971-1-alex.bennee@linaro.org -> patchew/20200316172155.971-1-alex.bennee@linaro.org
 * [new tag]         patchew/20200316174610.115820-1-jandryuk@gmail.com -> patchew/20200316174610.115820-1-jandryuk@gmail.com
Switched to a new branch 'test'
29ae701 nvme: make lba data size configurable
18cddc9 nvme: remove redundant NvmeCmd pointer parameter
526882e nvme: change controller pci id
98c330b pci: allocate pci id for nvme
0e37aa0 nvme: support multiple namespaces
6983597 nvme: refactor identify active namespace id list
d918ef5 nvme: add support for scatter gather lists
30f6663 nvme: handle dma errors
0cdbf87 pci: pass along the return value of dma_memory_rw
3a6a832 nvme: use preallocated qsg/iov in nvme_dma_prp
50022cc nvme: allow multiple aios per command
0720b5d nvme: add check for prinfo
fec3f89 nvme: add check for mdts
13ce0f5 nvme: refactor request bounds checking
9e8d597 nvme: verify validity of prp lists in the cmb
52c3589 nvme: add request mapping helper
5cbee0c nvme: pass request along for tracing
85e635e nvme: refactor dma read/write
91c10c5 nvme: remove redundant has_sg member
8c73469 nvme: add mapping helpers
26c1cba nvme: memset preallocated requests structures
5f27206 nvme: bump supported version to v1.3
6301b23 nvme: provide the mandatory subnqn field
e42793c nvme: enforce valid queue creation sequence
6cb67c5 nvme: support identify namespace descriptor list
2fd6cd0 nvme: add log specific field to trace events
28646e0 nvme: make sure ncqr and nsqr is valid
f093fa6 nvme: additional tracing
e01e2c3 nvme: add missing mandatory features
acc8277 nvme: add support for the asynchronous event request command
751053c nvme: add support for the get log page command
a75d78af nvme: add temperature threshold feature
90c3b3a nvme: refactor device realization
c6909e3 nvme: add max_ioqpairs device parameter
cf80062 nvme: add support for the abort command
c150866 nvme: refactor nvme_addr_read
153786f nvme: add identify cns values in header
7fb5521 nvme: use constant for identify data size
a350897 nvme: bump spec data structures to v1.3
b3abe79 nvme: move device parameters to separate struct
5a50ee9 nvme: remove superfluous breaks
79122a7 nvme: rename trace events to nvme_dev

=== OUTPUT BEGIN ===
1/42 Checking commit 79122a7973d8 (nvme: rename trace events to nvme_dev)
2/42 Checking commit 5a50ee90c197 (nvme: remove superfluous breaks)
3/42 Checking commit b3abe7986742 (nvme: move device parameters to separate struct)
ERROR: Macros with complex values should be enclosed in parenthesis
#179: FILE: hw/block/nvme.h:6:
+#define DEFINE_NVME_PROPERTIES(_state, _props) \
+    DEFINE_PROP_STRING("serial", _state, _props.serial), \
+    DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
+    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64)

total: 1 errors, 0 warnings, 181 lines checked

Patch 3/42 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/42 Checking commit a350897dab5c (nvme: bump spec data structures to v1.3)
5/42 Checking commit 7fb5521cf3df (nvme: use constant for identify data size)
6/42 Checking commit 153786f01b2c (nvme: add identify cns values in header)
7/42 Checking commit c15086697656 (nvme: refactor nvme_addr_read)
8/42 Checking commit cf8006250540 (nvme: add support for the abort command)
9/42 Checking commit c6909e377ba7 (nvme: add max_ioqpairs device parameter)
10/42 Checking commit 90c3b3a96d0e (nvme: refactor device realization)
11/42 Checking commit a75d78afec33 (nvme: add temperature threshold feature)
12/42 Checking commit 751053c2eebe (nvme: add support for the get log page command)
13/42 Checking commit acc82772cb85 (nvme: add support for the asynchronous event request command)
14/42 Checking commit e01e2c38efb6 (nvme: add missing mandatory features)
15/42 Checking commit f093fa63dd1d (nvme: additional tracing)
16/42 Checking commit 28646e0cb43e (nvme: make sure ncqr and nsqr is valid)
17/42 Checking commit 2fd6cd0b68b4 (nvme: add log specific field to trace events)
18/42 Checking commit 6cb67c52235e (nvme: support identify namespace descriptor list)
19/42 Checking commit e42793ca050c (nvme: enforce valid queue creation sequence)
20/42 Checking commit 6301b23c832c (nvme: provide the mandatory subnqn field)
21/42 Checking commit 5f272069c824 (nvme: bump supported version to v1.3)
22/42 Checking commit 26c1cba5e438 (nvme: memset preallocated requests structures)
23/42 Checking commit 8c73469d1135 (nvme: add mapping helpers)
24/42 Checking commit 91c10c5b8a10 (nvme: remove redundant has_sg member)
25/42 Checking commit 85e635e6490b (nvme: refactor dma read/write)
26/42 Checking commit 5cbee0c45413 (nvme: pass request along for tracing)
27/42 Checking commit 52c35897ba1d (nvme: add request mapping helper)
28/42 Checking commit 9e8d597376f9 (nvme: verify validity of prp lists in the cmb)
29/42 Checking commit 13ce0f521731 (nvme: refactor request bounds checking)
30/42 Checking commit fec3f89de690 (nvme: add check for mdts)
31/42 Checking commit 0720b5d83eef (nvme: add check for prinfo)
32/42 Checking commit 50022cc39bd7 (nvme: allow multiple aios per command)
33/42 Checking commit 3a6a83209c32 (nvme: use preallocated qsg/iov in nvme_dma_prp)
34/42 Checking commit 0cdbf87e80ff (pci: pass along the return value of dma_memory_rw)
35/42 Checking commit 30f6663d58dd (nvme: handle dma errors)
36/42 Checking commit d918ef503f4c (nvme: add support for scatter gather lists)
37/42 Checking commit 6983597bfc2c (nvme: refactor identify active namespace id list)
38/42 Checking commit 0e37aa02f23a (nvme: support multiple namespaces)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#43: 
new file mode 100644

ERROR: Macros with complex values should be enclosed in parenthesis
#218: FILE: hw/block/nvme-ns.h:8:
+#define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
+    DEFINE_PROP_DRIVE("drive", _state, blk), \
+    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)

total: 1 errors, 1 warnings, 822 lines checked

Patch 38/42 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

39/42 Checking commit 98c330b4dd11 (pci: allocate pci id for nvme)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#31: 
new file mode 100644

total: 0 errors, 1 warnings, 46 lines checked

Patch 39/42 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
40/42 Checking commit 526882e1651a (nvme: change controller pci id)
41/42 Checking commit 18cddc952a86 (nvme: remove redundant NvmeCmd pointer parameter)
42/42 Checking commit 29ae701c9d3a (nvme: make lba data size configurable)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200316142928.153431-1-its@irrelevant.dk/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
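Checkpatch flags DEFINE_NVME_PROPERTIES and DEFINE_NVME_NS_PROPERTIES because their bodies contain top-level commas. These macros are meant to be spliced into a Property array initializer, where (in QEMU) each DEFINE_PROP_* helper expands to a brace-enclosed element, so wrapping the whole expansion in parentheses would not compile. The self-contained sketch below reproduces that shape with a simplified, hypothetical Property type and PROP_U32/PROP_U8 helpers; these are not QEMU's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for QEMU's Property descriptor. */
typedef struct Property {
    const char *name;
    size_t      offset;
    uint64_t    defval;
} Property;

typedef struct NsParams {
    uint32_t nsid;
    uint8_t  lbads;
} NsParams;

typedef struct NsState {
    NsParams params;
} NsState;

/* Each helper expands to one brace-enclosed array element. */
#define PROP_U32(_n, _s, _f, _d) { (_n), offsetof(_s, _f), (_d) }
#define PROP_U8(_n, _s, _f, _d)  { (_n), offsetof(_s, _f), (_d) }

/*
 * Expands to two comma-separated array elements. Parenthesizing the
 * expansion would no longer form a valid initializer list.
 */
#define DEFINE_NS_PROPS(_state, _props) \
    PROP_U32("nsid", _state, _props.nsid, 0), \
    PROP_U8("lbads", _state, _props.lbads, 9)

static const Property ns_props[] = {
    DEFINE_NS_PROPS(NsState, params),
    { NULL, 0, 0 }  /* terminator */
};

static_assert(sizeof(ns_props) / sizeof(ns_props[0]) == 3,
              "two properties plus the terminator");
```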


* Re: [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces
  2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
                   ` (42 preceding siblings ...)
  2020-03-16 19:30 ` [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces no-reply
@ 2020-03-25 10:35 ` Maxim Levitsky
  43 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:35 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Hi,
> 
> So this patchset kinda blew up in size (wrt. number of patches) after
> Maxim's comments (26 -> 42), but Maxim's comments about splitting up a
> bunch of the patches made a lot of sense.
I don't think this is bad.
You might actually have found the ultimate question of life, the universe and everything.
;-)

Best regards,
	Maxim Levitsky

> 
> v6 primarily splits up the big nasty patches into more digestible parts.
> Specifically, the 'nvme: refactor prp mapping' and 'nvme: allow multiple
> aios per command' patches have been split up according to Maxim's
> comments. Most additions to the shared include/block/nvme.h have also
> been consolidated into a single patch (also according to Maxim's
> comments). A lot of the patches still carry a 'Reviewed-By', but
> git-backport-diff reports some changes due to changes/additions in some
> of the early patches.
> 
> The only real "addition" is a new "max_ioqpairs" parameter for the
> device. This is to fix some confusion about the current "num_queues"
> parameter. See "nvme: add max_ioqpairs device parameter".
> 
> Maxim, I responded to your comments in the original thread and I believe
> that all your comments have been addressed.
> 
> Also, I *did* change the line indentation style - I hope I caught 'em
> all :)
> 
> 
> Klaus Jensen (42):
>   nvme: rename trace events to nvme_dev
>   nvme: remove superfluous breaks
>   nvme: move device parameters to separate struct
>   nvme: bump spec data structures to v1.3
>   nvme: use constant for identify data size
>   nvme: add identify cns values in header
>   nvme: refactor nvme_addr_read
>   nvme: add support for the abort command
>   nvme: add max_ioqpairs device parameter
>   nvme: refactor device realization
>   nvme: add temperature threshold feature
>   nvme: add support for the get log page command
>   nvme: add support for the asynchronous event request command
>   nvme: add missing mandatory features
>   nvme: additional tracing
>   nvme: make sure ncqr and nsqr is valid
>   nvme: add log specific field to trace events
>   nvme: support identify namespace descriptor list
>   nvme: enforce valid queue creation sequence
>   nvme: provide the mandatory subnqn field
>   nvme: bump supported version to v1.3
>   nvme: memset preallocated requests structures
>   nvme: add mapping helpers
>   nvme: remove redundant has_sg member
>   nvme: refactor dma read/write
>   nvme: pass request along for tracing
>   nvme: add request mapping helper
>   nvme: verify validity of prp lists in the cmb
>   nvme: refactor request bounds checking
>   nvme: add check for mdts
>   nvme: add check for prinfo
>   nvme: allow multiple aios per command
>   nvme: use preallocated qsg/iov in nvme_dma_prp
>   pci: pass along the return value of dma_memory_rw
>   nvme: handle dma errors
>   nvme: add support for scatter gather lists
>   nvme: refactor identify active namespace id list
>   nvme: support multiple namespaces
>   pci: allocate pci id for nvme
>   nvme: change controller pci id
>   nvme: remove redundant NvmeCmd pointer parameter
>   nvme: make lba data size configurable
> 
>  MAINTAINERS            |    1 +
>  block/nvme.c           |   18 +-
>  docs/specs/nvme.txt    |   25 +
>  docs/specs/pci-ids.txt |    1 +
>  hw/block/Makefile.objs |    2 +-
>  hw/block/nvme-ns.c     |  162 ++++
>  hw/block/nvme-ns.h     |   62 ++
>  hw/block/nvme.c        | 2041 ++++++++++++++++++++++++++++++++--------
>  hw/block/nvme.h        |  205 +++-
>  hw/block/trace-events  |  206 ++--
>  hw/core/machine.c      |    1 +
>  include/block/nvme.h   |  178 +++-
>  include/hw/pci/pci.h   |    4 +-
>  13 files changed, 2347 insertions(+), 559 deletions(-)
>  create mode 100644 docs/specs/nvme.txt
>  create mode 100644 hw/block/nvme-ns.c
>  create mode 100644 hw/block/nvme-ns.h
> 



* Re: [PATCH v6 01/42] nvme: rename trace events to nvme_dev
  2020-03-16 14:28 ` [PATCH v6 01/42] nvme: rename trace events to nvme_dev Klaus Jensen
@ 2020-03-25 10:36   ` Maxim Levitsky
  2020-03-31  5:38     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:36 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Change the prefix of all nvme device related trace events to 'nvme_dev'
> to not clash with trace events from the nvme block driver.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/block/nvme.c       | 188 +++++++++++++++++++++---------------------
>  hw/block/trace-events | 172 +++++++++++++++++++-------------------
>  2 files changed, 180 insertions(+), 180 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index d28335cbf377..3e4b18956ed2 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -112,16 +112,16 @@ static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
>  {
>      if (cq->irq_enabled) {
>          if (msix_enabled(&(n->parent_obj))) {
> -            trace_nvme_irq_msix(cq->vector);
> +            trace_nvme_dev_irq_msix(cq->vector);
>              msix_notify(&(n->parent_obj), cq->vector);
>          } else {
> -            trace_nvme_irq_pin();
> +            trace_nvme_dev_irq_pin();
>              assert(cq->cqid < 64);
>              n->irq_status |= 1 << cq->cqid;
>              nvme_irq_check(n);
>          }
>      } else {
> -        trace_nvme_irq_masked();
> +        trace_nvme_dev_irq_masked();
>      }
>  }
>  
> @@ -146,7 +146,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>      int num_prps = (len >> n->page_bits) + 1;
>  
>      if (unlikely(!prp1)) {
> -        trace_nvme_err_invalid_prp();
> +        trace_nvme_dev_err_invalid_prp();
>          return NVME_INVALID_FIELD | NVME_DNR;
>      } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
>                 prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
> @@ -160,7 +160,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>      len -= trans_len;
>      if (len) {
>          if (unlikely(!prp2)) {
> -            trace_nvme_err_invalid_prp2_missing();
> +            trace_nvme_dev_err_invalid_prp2_missing();
>              goto unmap;
>          }
>          if (len > n->page_size) {
> @@ -176,7 +176,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>  
>                  if (i == n->max_prp_ents - 1 && len > n->page_size) {
>                      if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
> -                        trace_nvme_err_invalid_prplist_ent(prp_ent);
> +                        trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
>                          goto unmap;
>                      }
>  
> @@ -189,7 +189,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>                  }
>  
>                  if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
> -                    trace_nvme_err_invalid_prplist_ent(prp_ent);
> +                    trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
>                      goto unmap;
>                  }
>  
> @@ -204,7 +204,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>              }
>          } else {
>              if (unlikely(prp2 & (n->page_size - 1))) {
> -                trace_nvme_err_invalid_prp2_align(prp2);
> +                trace_nvme_dev_err_invalid_prp2_align(prp2);
>                  goto unmap;
>              }
>              if (qsg->nsg) {
> @@ -252,20 +252,20 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>      QEMUIOVector iov;
>      uint16_t status = NVME_SUCCESS;
>  
> -    trace_nvme_dma_read(prp1, prp2);
> +    trace_nvme_dev_dma_read(prp1, prp2);
>  
>      if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>      if (qsg.nsg > 0) {
>          if (unlikely(dma_buf_read(ptr, len, &qsg))) {
> -            trace_nvme_err_invalid_dma();
> +            trace_nvme_dev_err_invalid_dma();
>              status = NVME_INVALID_FIELD | NVME_DNR;
>          }
>          qemu_sglist_destroy(&qsg);
>      } else {
>          if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
> -            trace_nvme_err_invalid_dma();
> +            trace_nvme_dev_err_invalid_dma();
>              status = NVME_INVALID_FIELD | NVME_DNR;
>          }
>          qemu_iovec_destroy(&iov);
> @@ -354,7 +354,7 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      uint32_t count = nlb << data_shift;
>  
>      if (unlikely(slba + nlb > ns->id_ns.nsze)) {
> -        trace_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
> +        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
>          return NVME_LBA_RANGE | NVME_DNR;
>      }
>  
> @@ -382,11 +382,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
>      enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
>  
> -    trace_nvme_rw(is_write ? "write" : "read", nlb, data_size, slba);
> +    trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
>  
>      if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
>          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> -        trace_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
> +        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
>          return NVME_LBA_RANGE | NVME_DNR;
>      }
>  
> @@ -421,7 +421,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      uint32_t nsid = le32_to_cpu(cmd->nsid);
>  
>      if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> -        trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
> +        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
>          return NVME_INVALID_NSID | NVME_DNR;
>      }
>  
> @@ -435,7 +435,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      case NVME_CMD_READ:
>          return nvme_rw(n, ns, cmd, req);
>      default:
> -        trace_nvme_err_invalid_opc(cmd->opcode);
> +        trace_nvme_dev_err_invalid_opc(cmd->opcode);
>          return NVME_INVALID_OPCODE | NVME_DNR;
>      }
>  }
> @@ -460,11 +460,11 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
>      uint16_t qid = le16_to_cpu(c->qid);
>  
>      if (unlikely(!qid || nvme_check_sqid(n, qid))) {
> -        trace_nvme_err_invalid_del_sq(qid);
> +        trace_nvme_dev_err_invalid_del_sq(qid);
>          return NVME_INVALID_QID | NVME_DNR;
>      }
>  
> -    trace_nvme_del_sq(qid);
> +    trace_nvme_dev_del_sq(qid);
>  
>      sq = n->sq[qid];
>      while (!QTAILQ_EMPTY(&sq->out_req_list)) {
> @@ -528,26 +528,26 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
>      uint16_t qflags = le16_to_cpu(c->sq_flags);
>      uint64_t prp1 = le64_to_cpu(c->prp1);
>  
> -    trace_nvme_create_sq(prp1, sqid, cqid, qsize, qflags);
> +    trace_nvme_dev_create_sq(prp1, sqid, cqid, qsize, qflags);
>  
>      if (unlikely(!cqid || nvme_check_cqid(n, cqid))) {
> -        trace_nvme_err_invalid_create_sq_cqid(cqid);
> +        trace_nvme_dev_err_invalid_create_sq_cqid(cqid);
>          return NVME_INVALID_CQID | NVME_DNR;
>      }
>      if (unlikely(!sqid || !nvme_check_sqid(n, sqid))) {
> -        trace_nvme_err_invalid_create_sq_sqid(sqid);
> +        trace_nvme_dev_err_invalid_create_sq_sqid(sqid);
>          return NVME_INVALID_QID | NVME_DNR;
>      }
>      if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
> -        trace_nvme_err_invalid_create_sq_size(qsize);
> +        trace_nvme_dev_err_invalid_create_sq_size(qsize);
>          return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
>      }
>      if (unlikely(!prp1 || prp1 & (n->page_size - 1))) {
> -        trace_nvme_err_invalid_create_sq_addr(prp1);
> +        trace_nvme_dev_err_invalid_create_sq_addr(prp1);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>      if (unlikely(!(NVME_SQ_FLAGS_PC(qflags)))) {
> -        trace_nvme_err_invalid_create_sq_qflags(NVME_SQ_FLAGS_PC(qflags));
> +        trace_nvme_dev_err_invalid_create_sq_qflags(NVME_SQ_FLAGS_PC(qflags));
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>      sq = g_malloc0(sizeof(*sq));
> @@ -573,17 +573,17 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeCmd *cmd)
>      uint16_t qid = le16_to_cpu(c->qid);
>  
>      if (unlikely(!qid || nvme_check_cqid(n, qid))) {
> -        trace_nvme_err_invalid_del_cq_cqid(qid);
> +        trace_nvme_dev_err_invalid_del_cq_cqid(qid);
>          return NVME_INVALID_CQID | NVME_DNR;
>      }
>  
>      cq = n->cq[qid];
>      if (unlikely(!QTAILQ_EMPTY(&cq->sq_list))) {
> -        trace_nvme_err_invalid_del_cq_notempty(qid);
> +        trace_nvme_dev_err_invalid_del_cq_notempty(qid);
>          return NVME_INVALID_QUEUE_DEL;
>      }
>      nvme_irq_deassert(n, cq);
> -    trace_nvme_del_cq(qid);
> +    trace_nvme_dev_del_cq(qid);
>      nvme_free_cq(cq, n);
>      return NVME_SUCCESS;
>  }
> @@ -616,27 +616,27 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
>      uint16_t qflags = le16_to_cpu(c->cq_flags);
>      uint64_t prp1 = le64_to_cpu(c->prp1);
>  
> -    trace_nvme_create_cq(prp1, cqid, vector, qsize, qflags,
> -                         NVME_CQ_FLAGS_IEN(qflags) != 0);
> +    trace_nvme_dev_create_cq(prp1, cqid, vector, qsize, qflags,
> +                             NVME_CQ_FLAGS_IEN(qflags) != 0);
>  
>      if (unlikely(!cqid || !nvme_check_cqid(n, cqid))) {
> -        trace_nvme_err_invalid_create_cq_cqid(cqid);
> +        trace_nvme_dev_err_invalid_create_cq_cqid(cqid);
>          return NVME_INVALID_CQID | NVME_DNR;
>      }
>      if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
> -        trace_nvme_err_invalid_create_cq_size(qsize);
> +        trace_nvme_dev_err_invalid_create_cq_size(qsize);
>          return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
>      }
>      if (unlikely(!prp1)) {
> -        trace_nvme_err_invalid_create_cq_addr(prp1);
> +        trace_nvme_dev_err_invalid_create_cq_addr(prp1);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>      if (unlikely(vector > n->num_queues)) {
> -        trace_nvme_err_invalid_create_cq_vector(vector);
> +        trace_nvme_dev_err_invalid_create_cq_vector(vector);
>          return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
>      }
>      if (unlikely(!(NVME_CQ_FLAGS_PC(qflags)))) {
> -        trace_nvme_err_invalid_create_cq_qflags(NVME_CQ_FLAGS_PC(qflags));
> +        trace_nvme_dev_err_invalid_create_cq_qflags(NVME_CQ_FLAGS_PC(qflags));
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>  
> @@ -651,7 +651,7 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
>      uint64_t prp1 = le64_to_cpu(c->prp1);
>      uint64_t prp2 = le64_to_cpu(c->prp2);
>  
> -    trace_nvme_identify_ctrl();
> +    trace_nvme_dev_identify_ctrl();
>  
>      return nvme_dma_read_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
>          prp1, prp2);
> @@ -664,10 +664,10 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
>      uint64_t prp1 = le64_to_cpu(c->prp1);
>      uint64_t prp2 = le64_to_cpu(c->prp2);
>  
> -    trace_nvme_identify_ns(nsid);
> +    trace_nvme_dev_identify_ns(nsid);
>  
>      if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> -        trace_nvme_err_invalid_ns(nsid, n->num_namespaces);
> +        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
>          return NVME_INVALID_NSID | NVME_DNR;
>      }
>  
> @@ -687,7 +687,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
>      uint16_t ret;
>      int i, j = 0;
>  
> -    trace_nvme_identify_nslist(min_nsid);
> +    trace_nvme_dev_identify_nslist(min_nsid);
>  
>      list = g_malloc0(data_len);
>      for (i = 0; i < n->num_namespaces; i++) {
> @@ -716,14 +716,14 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>      case 0x02:
>          return nvme_identify_nslist(n, c);
>      default:
> -        trace_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns));
> +        trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>  }
>  
>  static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
>  {
> -    trace_nvme_setfeat_timestamp(ts);
> +    trace_nvme_dev_setfeat_timestamp(ts);
>  
>      n->host_timestamp = le64_to_cpu(ts);
>      n->timestamp_set_qemu_clock_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> @@ -756,7 +756,7 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
>      /* If the host timestamp is non-zero, set the timestamp origin */
>      ts.origin = n->host_timestamp ? 0x01 : 0x00;
>  
> -    trace_nvme_getfeat_timestamp(ts.all);
> +    trace_nvme_dev_getfeat_timestamp(ts.all);
>  
>      return cpu_to_le64(ts.all);
>  }
> @@ -780,17 +780,17 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      switch (dw10) {
>      case NVME_VOLATILE_WRITE_CACHE:
>          result = blk_enable_write_cache(n->conf.blk);
> -        trace_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
> +        trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
>          break;
>      case NVME_NUMBER_OF_QUEUES:
>          result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
> -        trace_nvme_getfeat_numq(result);
> +        trace_nvme_dev_getfeat_numq(result);
>          break;
>      case NVME_TIMESTAMP:
>          return nvme_get_feature_timestamp(n, cmd);
>          break;
>      default:
> -        trace_nvme_err_invalid_getfeat(dw10);
> +        trace_nvme_dev_err_invalid_getfeat(dw10);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>  
> @@ -826,9 +826,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> -        trace_nvme_setfeat_numq((dw11 & 0xFFFF) + 1,
> -                                ((dw11 >> 16) & 0xFFFF) + 1,
> -                                n->num_queues - 1, n->num_queues - 1);
> +        trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
> +                                    ((dw11 >> 16) & 0xFFFF) + 1,
> +                                    n->num_queues - 1, n->num_queues - 1);
>          req->cqe.result =
>              cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
>          break;
> @@ -838,7 +838,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          break;
>  
>      default:
> -        trace_nvme_err_invalid_setfeat(dw10);
> +        trace_nvme_dev_err_invalid_setfeat(dw10);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>      return NVME_SUCCESS;
> @@ -862,7 +862,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      case NVME_ADM_CMD_GET_FEATURES:
>          return nvme_get_feature(n, cmd, req);
>      default:
> -        trace_nvme_err_invalid_admin_opc(cmd->opcode);
> +        trace_nvme_dev_err_invalid_admin_opc(cmd->opcode);
>          return NVME_INVALID_OPCODE | NVME_DNR;
>      }
>  }
> @@ -925,77 +925,77 @@ static int nvme_start_ctrl(NvmeCtrl *n)
>      uint32_t page_size = 1 << page_bits;
>  
>      if (unlikely(n->cq[0])) {
> -        trace_nvme_err_startfail_cq();
> +        trace_nvme_dev_err_startfail_cq();
>          return -1;
>      }
>      if (unlikely(n->sq[0])) {
> -        trace_nvme_err_startfail_sq();
> +        trace_nvme_dev_err_startfail_sq();
>          return -1;
>      }
>      if (unlikely(!n->bar.asq)) {
> -        trace_nvme_err_startfail_nbarasq();
> +        trace_nvme_dev_err_startfail_nbarasq();
>          return -1;
>      }
>      if (unlikely(!n->bar.acq)) {
> -        trace_nvme_err_startfail_nbaracq();
> +        trace_nvme_dev_err_startfail_nbaracq();
>          return -1;
>      }
>      if (unlikely(n->bar.asq & (page_size - 1))) {
> -        trace_nvme_err_startfail_asq_misaligned(n->bar.asq);
> +        trace_nvme_dev_err_startfail_asq_misaligned(n->bar.asq);
>          return -1;
>      }
>      if (unlikely(n->bar.acq & (page_size - 1))) {
> -        trace_nvme_err_startfail_acq_misaligned(n->bar.acq);
> +        trace_nvme_dev_err_startfail_acq_misaligned(n->bar.acq);
>          return -1;
>      }
>      if (unlikely(NVME_CC_MPS(n->bar.cc) <
>                   NVME_CAP_MPSMIN(n->bar.cap))) {
> -        trace_nvme_err_startfail_page_too_small(
> +        trace_nvme_dev_err_startfail_page_too_small(
>                      NVME_CC_MPS(n->bar.cc),
>                      NVME_CAP_MPSMIN(n->bar.cap));
>          return -1;
>      }
>      if (unlikely(NVME_CC_MPS(n->bar.cc) >
>                   NVME_CAP_MPSMAX(n->bar.cap))) {
> -        trace_nvme_err_startfail_page_too_large(
> +        trace_nvme_dev_err_startfail_page_too_large(
>                      NVME_CC_MPS(n->bar.cc),
>                      NVME_CAP_MPSMAX(n->bar.cap));
>          return -1;
>      }
>      if (unlikely(NVME_CC_IOCQES(n->bar.cc) <
>                   NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) {
> -        trace_nvme_err_startfail_cqent_too_small(
> +        trace_nvme_dev_err_startfail_cqent_too_small(
>                      NVME_CC_IOCQES(n->bar.cc),
>                      NVME_CTRL_CQES_MIN(n->bar.cap));
>          return -1;
>      }
>      if (unlikely(NVME_CC_IOCQES(n->bar.cc) >
>                   NVME_CTRL_CQES_MAX(n->id_ctrl.cqes))) {
> -        trace_nvme_err_startfail_cqent_too_large(
> +        trace_nvme_dev_err_startfail_cqent_too_large(
>                      NVME_CC_IOCQES(n->bar.cc),
>                      NVME_CTRL_CQES_MAX(n->bar.cap));
>          return -1;
>      }
>      if (unlikely(NVME_CC_IOSQES(n->bar.cc) <
>                   NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) {
> -        trace_nvme_err_startfail_sqent_too_small(
> +        trace_nvme_dev_err_startfail_sqent_too_small(
>                      NVME_CC_IOSQES(n->bar.cc),
>                      NVME_CTRL_SQES_MIN(n->bar.cap));
>          return -1;
>      }
>      if (unlikely(NVME_CC_IOSQES(n->bar.cc) >
>                   NVME_CTRL_SQES_MAX(n->id_ctrl.sqes))) {
> -        trace_nvme_err_startfail_sqent_too_large(
> +        trace_nvme_dev_err_startfail_sqent_too_large(
>                      NVME_CC_IOSQES(n->bar.cc),
>                      NVME_CTRL_SQES_MAX(n->bar.cap));
>          return -1;
>      }
>      if (unlikely(!NVME_AQA_ASQS(n->bar.aqa))) {
> -        trace_nvme_err_startfail_asqent_sz_zero();
> +        trace_nvme_dev_err_startfail_asqent_sz_zero();
>          return -1;
>      }
>      if (unlikely(!NVME_AQA_ACQS(n->bar.aqa))) {
> -        trace_nvme_err_startfail_acqent_sz_zero();
> +        trace_nvme_dev_err_startfail_acqent_sz_zero();
>          return -1;
>      }
>  
> @@ -1018,14 +1018,14 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>      unsigned size)
>  {
>      if (unlikely(offset & (sizeof(uint32_t) - 1))) {
> -        NVME_GUEST_ERR(nvme_ub_mmiowr_misaligned32,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_misaligned32,
>                         "MMIO write not 32-bit aligned,"
>                         " offset=0x%"PRIx64"", offset);
>          /* should be ignored, fall through for now */
>      }
>  
>      if (unlikely(size < sizeof(uint32_t))) {
> -        NVME_GUEST_ERR(nvme_ub_mmiowr_toosmall,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_toosmall,
>                         "MMIO write smaller than 32-bits,"
>                         " offset=0x%"PRIx64", size=%u",
>                         offset, size);
> @@ -1035,32 +1035,32 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>      switch (offset) {
>      case 0xc:   /* INTMS */
>          if (unlikely(msix_enabled(&(n->parent_obj)))) {
> -            NVME_GUEST_ERR(nvme_ub_mmiowr_intmask_with_msix,
> +            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_intmask_with_msix,
>                             "undefined access to interrupt mask set"
>                             " when MSI-X is enabled");
>              /* should be ignored, fall through for now */
>          }
>          n->bar.intms |= data & 0xffffffff;
>          n->bar.intmc = n->bar.intms;
> -        trace_nvme_mmio_intm_set(data & 0xffffffff,
> +        trace_nvme_dev_mmio_intm_set(data & 0xffffffff,
>                                   n->bar.intmc);
Indentation: realign this continuation line with the opening parenthesis of the renamed call.

>          nvme_irq_check(n);
>          break;
>      case 0x10:  /* INTMC */
>          if (unlikely(msix_enabled(&(n->parent_obj)))) {
> -            NVME_GUEST_ERR(nvme_ub_mmiowr_intmask_with_msix,
> +            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_intmask_with_msix,
>                             "undefined access to interrupt mask clr"
>                             " when MSI-X is enabled");
>              /* should be ignored, fall through for now */
>          }
>          n->bar.intms &= ~(data & 0xffffffff);
>          n->bar.intmc = n->bar.intms;
> -        trace_nvme_mmio_intm_clr(data & 0xffffffff,
> +        trace_nvme_dev_mmio_intm_clr(data & 0xffffffff,
>                                   n->bar.intmc);
Indentation: realign this continuation line with the opening parenthesis of the renamed call.

>          nvme_irq_check(n);
>          break;
>      case 0x14:  /* CC */
> -        trace_nvme_mmio_cfg(data & 0xffffffff);
> +        trace_nvme_dev_mmio_cfg(data & 0xffffffff);
>          /* Windows first sends data, then sends enable bit */
>          if (!NVME_CC_EN(data) && !NVME_CC_EN(n->bar.cc) &&
>              !NVME_CC_SHN(data) && !NVME_CC_SHN(n->bar.cc))
> @@ -1071,42 +1071,42 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>          if (NVME_CC_EN(data) && !NVME_CC_EN(n->bar.cc)) {
>              n->bar.cc = data;
>              if (unlikely(nvme_start_ctrl(n))) {
> -                trace_nvme_err_startfail();
> +                trace_nvme_dev_err_startfail();
>                  n->bar.csts = NVME_CSTS_FAILED;
>              } else {
> -                trace_nvme_mmio_start_success();
> +                trace_nvme_dev_mmio_start_success();
>                  n->bar.csts = NVME_CSTS_READY;
>              }
>          } else if (!NVME_CC_EN(data) && NVME_CC_EN(n->bar.cc)) {
> -            trace_nvme_mmio_stopped();
> +            trace_nvme_dev_mmio_stopped();
>              nvme_clear_ctrl(n);
>              n->bar.csts &= ~NVME_CSTS_READY;
>          }
>          if (NVME_CC_SHN(data) && !(NVME_CC_SHN(n->bar.cc))) {
> -            trace_nvme_mmio_shutdown_set();
> +            trace_nvme_dev_mmio_shutdown_set();
>              nvme_clear_ctrl(n);
>              n->bar.cc = data;
>              n->bar.csts |= NVME_CSTS_SHST_COMPLETE;
>          } else if (!NVME_CC_SHN(data) && NVME_CC_SHN(n->bar.cc)) {
> -            trace_nvme_mmio_shutdown_cleared();
> +            trace_nvme_dev_mmio_shutdown_cleared();
>              n->bar.csts &= ~NVME_CSTS_SHST_COMPLETE;
>              n->bar.cc = data;
>          }
>          break;
>      case 0x1C:  /* CSTS */
>          if (data & (1 << 4)) {
> -            NVME_GUEST_ERR(nvme_ub_mmiowr_ssreset_w1c_unsupported,
> +            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_ssreset_w1c_unsupported,
>                             "attempted to W1C CSTS.NSSRO"
>                             " but CAP.NSSRS is zero (not supported)");
>          } else if (data != 0) {
> -            NVME_GUEST_ERR(nvme_ub_mmiowr_ro_csts,
> +            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_ro_csts,
>                             "attempted to set a read only bit"
>                             " of controller status");
>          }
>          break;
>      case 0x20:  /* NSSR */
>          if (data == 0x4E564D65) {
> -            trace_nvme_ub_mmiowr_ssreset_unsupported();
> +            trace_nvme_dev_ub_mmiowr_ssreset_unsupported();
>          } else {
>              /* The spec says that writes of other values have no effect */
>              return;
> @@ -1114,35 +1114,35 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>          break;
>      case 0x24:  /* AQA */
>          n->bar.aqa = data & 0xffffffff;
> -        trace_nvme_mmio_aqattr(data & 0xffffffff);
> +        trace_nvme_dev_mmio_aqattr(data & 0xffffffff);
>          break;
>      case 0x28:  /* ASQ */
>          n->bar.asq = data;
> -        trace_nvme_mmio_asqaddr(data);
> +        trace_nvme_dev_mmio_asqaddr(data);
>          break;
>      case 0x2c:  /* ASQ hi */
>          n->bar.asq |= data << 32;
> -        trace_nvme_mmio_asqaddr_hi(data, n->bar.asq);
> +        trace_nvme_dev_mmio_asqaddr_hi(data, n->bar.asq);
>          break;
>      case 0x30:  /* ACQ */
> -        trace_nvme_mmio_acqaddr(data);
> +        trace_nvme_dev_mmio_acqaddr(data);
>          n->bar.acq = data;
>          break;
>      case 0x34:  /* ACQ hi */
>          n->bar.acq |= data << 32;
> -        trace_nvme_mmio_acqaddr_hi(data, n->bar.acq);
> +        trace_nvme_dev_mmio_acqaddr_hi(data, n->bar.acq);
>          break;
>      case 0x38:  /* CMBLOC */
> -        NVME_GUEST_ERR(nvme_ub_mmiowr_cmbloc_reserved,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_cmbloc_reserved,
>                         "invalid write to reserved CMBLOC"
>                         " when CMBSZ is zero, ignored");
>          return;
>      case 0x3C:  /* CMBSZ */
> -        NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_cmbsz_readonly,
>                         "invalid write to read only CMBSZ, ignored");
>          return;
>      default:
> -        NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiowr_invalid,
>                         "invalid MMIO write,"
>                         " offset=0x%"PRIx64", data=%"PRIx64"",
>                         offset, data);
> @@ -1157,12 +1157,12 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
>      uint64_t val = 0;
>  
>      if (unlikely(addr & (sizeof(uint32_t) - 1))) {
> -        NVME_GUEST_ERR(nvme_ub_mmiord_misaligned32,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiord_misaligned32,
>                         "MMIO read not 32-bit aligned,"
>                         " offset=0x%"PRIx64"", addr);
>          /* should RAZ, fall through for now */
>      } else if (unlikely(size < sizeof(uint32_t))) {
> -        NVME_GUEST_ERR(nvme_ub_mmiord_toosmall,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiord_toosmall,
>                         "MMIO read smaller than 32-bits,"
>                         " offset=0x%"PRIx64"", addr);
>          /* should RAZ, fall through for now */
> @@ -1171,7 +1171,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
>      if (addr < sizeof(n->bar)) {
>          memcpy(&val, ptr + addr, size);
>      } else {
> -        NVME_GUEST_ERR(nvme_ub_mmiord_invalid_ofs,
> +        NVME_GUEST_ERR(nvme_dev_ub_mmiord_invalid_ofs,
>                         "MMIO read beyond last register,"
>                         " offset=0x%"PRIx64", returning 0", addr);
>      }
> @@ -1184,7 +1184,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>      uint32_t qid;
>  
>      if (unlikely(addr & ((1 << 2) - 1))) {
> -        NVME_GUEST_ERR(nvme_ub_db_wr_misaligned,
> +        NVME_GUEST_ERR(nvme_dev_ub_db_wr_misaligned,
>                         "doorbell write not 32-bit aligned,"
>                         " offset=0x%"PRIx64", ignoring", addr);
>          return;
> @@ -1199,7 +1199,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>  
>          qid = (addr - (0x1000 + (1 << 2))) >> 3;
>          if (unlikely(nvme_check_cqid(n, qid))) {
> -            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_cq,
> +            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_cq,
>                             "completion queue doorbell write"
>                             " for nonexistent queue,"
>                             " sqid=%"PRIu32", ignoring", qid);
> @@ -1208,7 +1208,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>  
>          cq = n->cq[qid];
>          if (unlikely(new_head >= cq->size)) {
> -            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_cqhead,
> +            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_cqhead,
>                             "completion queue doorbell write value"
>                             " beyond queue size, sqid=%"PRIu32","
>                             " new_head=%"PRIu16", ignoring",
> @@ -1237,7 +1237,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>  
>          qid = (addr - 0x1000) >> 3;
>          if (unlikely(nvme_check_sqid(n, qid))) {
> -            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_sq,
> +            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_sq,
>                             "submission queue doorbell write"
>                             " for nonexistent queue,"
>                             " sqid=%"PRIu32", ignoring", qid);
> @@ -1246,7 +1246,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>  
>          sq = n->sq[qid];
>          if (unlikely(new_tail >= sq->size)) {
> -            NVME_GUEST_ERR(nvme_ub_db_wr_invalid_sqtail,
> +            NVME_GUEST_ERR(nvme_dev_ub_db_wr_invalid_sqtail,
>                             "submission queue doorbell write value"
>                             " beyond queue size, sqid=%"PRIu32","
>                             " new_tail=%"PRIu16", ignoring",
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index c03e80c2c9c9..ade506ea2bb2 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -29,96 +29,96 @@ hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int t
>  
>  # nvme.c
>  # nvme traces for successful events
> -nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
> -nvme_irq_pin(void) "pulsing IRQ pin"
> -nvme_irq_masked(void) "IRQ is masked"
> -nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
> -nvme_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
> -nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
> -nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
> -nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
> -nvme_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
> -nvme_identify_ctrl(void) "identify controller"
> -nvme_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
> -nvme_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
> -nvme_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
> -nvme_getfeat_numq(int result) "get feature number of queues, result=%d"
> -nvme_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
> -nvme_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
> -nvme_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
> -nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
> -nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
> -nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
> -nvme_mmio_aqattr(uint64_t data) "wrote MMIO, admin queue attributes=0x%"PRIx64""
> -nvme_mmio_asqaddr(uint64_t data) "wrote MMIO, admin submission queue address=0x%"PRIx64""
> -nvme_mmio_acqaddr(uint64_t data) "wrote MMIO, admin completion queue address=0x%"PRIx64""
> -nvme_mmio_asqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin submission queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
> -nvme_mmio_acqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin completion queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
> -nvme_mmio_start_success(void) "setting controller enable bit succeeded"
> -nvme_mmio_stopped(void) "cleared controller enable bit"
> -nvme_mmio_shutdown_set(void) "shutdown bit set"
> -nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
> +nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
> +nvme_dev_irq_pin(void) "pulsing IRQ pin"
> +nvme_dev_irq_masked(void) "IRQ is masked"
> +nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
> +nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
> +nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
> +nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
> +nvme_dev_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
> +nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
> +nvme_dev_identify_ctrl(void) "identify controller"
> +nvme_dev_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
> +nvme_dev_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
> +nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
> +nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
> +nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
> +nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
> +nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
> +nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
> +nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
> +nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
> +nvme_dev_mmio_aqattr(uint64_t data) "wrote MMIO, admin queue attributes=0x%"PRIx64""
> +nvme_dev_mmio_asqaddr(uint64_t data) "wrote MMIO, admin submission queue address=0x%"PRIx64""
> +nvme_dev_mmio_acqaddr(uint64_t data) "wrote MMIO, admin completion queue address=0x%"PRIx64""
> +nvme_dev_mmio_asqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin submission queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
> +nvme_dev_mmio_acqaddr_hi(uint64_t data, uint64_t new_addr) "wrote MMIO, admin completion queue high half=0x%"PRIx64", new_address=0x%"PRIx64""
> +nvme_dev_mmio_start_success(void) "setting controller enable bit succeeded"
> +nvme_dev_mmio_stopped(void) "cleared controller enable bit"
> +nvme_dev_mmio_shutdown_set(void) "shutdown bit set"
> +nvme_dev_mmio_shutdown_cleared(void) "shutdown bit cleared"
>  
>  # nvme traces for error conditions
> -nvme_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
> -nvme_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
> -nvme_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
> -nvme_err_invalid_prp2_missing(void) "PRP2 is null and more data to be transferred"
> -nvme_err_invalid_prp(void) "invalid PRP"
> -nvme_err_invalid_ns(uint32_t ns, uint32_t limit) "invalid namespace %u not within 1-%u"
> -nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8""
> -nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8""
> -nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64""
> -nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16""
> -nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16""
> -nvme_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16""
> -nvme_err_invalid_create_sq_size(uint16_t qsize) "failed creating submission queue, invalid qsize=%"PRIu16""
> -nvme_err_invalid_create_sq_addr(uint64_t addr) "failed creating submission queue, addr=0x%"PRIx64""
> -nvme_err_invalid_create_sq_qflags(uint16_t qflags) "failed creating submission queue, qflags=%"PRIu16""
> -nvme_err_invalid_del_cq_cqid(uint16_t cqid) "failed deleting completion queue, cqid=%"PRIu16""
> -nvme_err_invalid_del_cq_notempty(uint16_t cqid) "failed deleting completion queue, it is not empty, cqid=%"PRIu16""
> -nvme_err_invalid_create_cq_cqid(uint16_t cqid) "failed creating completion queue, cqid=%"PRIu16""
> -nvme_err_invalid_create_cq_size(uint16_t size) "failed creating completion queue, size=%"PRIu16""
> -nvme_err_invalid_create_cq_addr(uint64_t addr) "failed creating completion queue, addr=0x%"PRIx64""
> -nvme_err_invalid_create_cq_vector(uint16_t vector) "failed creating completion queue, vector=%"PRIu16""
> -nvme_err_invalid_create_cq_qflags(uint16_t qflags) "failed creating completion queue, qflags=%"PRIu16""
> -nvme_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx16""
> -nvme_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
> -nvme_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
> -nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
> -nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
> -nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
> -nvme_err_startfail_nbaracq(void) "nvme_start_ctrl failed because the admin completion queue address is null"
> -nvme_err_startfail_asq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin submission queue address is misaligned: 0x%"PRIx64""
> -nvme_err_startfail_acq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin completion queue address is misaligned: 0x%"PRIx64""
> -nvme_err_startfail_page_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too small: log2size=%u, min=%u"
> -nvme_err_startfail_page_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too large: log2size=%u, max=%u"
> -nvme_err_startfail_cqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too small: log2size=%u, min=%u"
> -nvme_err_startfail_cqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too large: log2size=%u, max=%u"
> -nvme_err_startfail_sqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too small: log2size=%u, min=%u"
> -nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too large: log2size=%u, max=%u"
> -nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero"
> -nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero"
> -nvme_err_startfail(void) "setting controller enable bit failed"
> +nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
> +nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
> +nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
> +nvme_dev_err_invalid_prp2_missing(void) "PRP2 is null and more data to be transferred"
> +nvme_dev_err_invalid_prp(void) "invalid PRP"
> +nvme_dev_err_invalid_ns(uint32_t ns, uint32_t limit) "invalid namespace %u not within 1-%u"
> +nvme_dev_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8""
> +nvme_dev_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8""
> +nvme_dev_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64""
> +nvme_dev_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16""
> +nvme_dev_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16""
> +nvme_dev_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16""
> +nvme_dev_err_invalid_create_sq_size(uint16_t qsize) "failed creating submission queue, invalid qsize=%"PRIu16""
> +nvme_dev_err_invalid_create_sq_addr(uint64_t addr) "failed creating submission queue, addr=0x%"PRIx64""
> +nvme_dev_err_invalid_create_sq_qflags(uint16_t qflags) "failed creating submission queue, qflags=%"PRIu16""
> +nvme_dev_err_invalid_del_cq_cqid(uint16_t cqid) "failed deleting completion queue, cqid=%"PRIu16""
> +nvme_dev_err_invalid_del_cq_notempty(uint16_t cqid) "failed deleting completion queue, it is not empty, cqid=%"PRIu16""
> +nvme_dev_err_invalid_create_cq_cqid(uint16_t cqid) "failed creating completion queue, cqid=%"PRIu16""
> +nvme_dev_err_invalid_create_cq_size(uint16_t size) "failed creating completion queue, size=%"PRIu16""
> +nvme_dev_err_invalid_create_cq_addr(uint64_t addr) "failed creating completion queue, addr=0x%"PRIx64""
> +nvme_dev_err_invalid_create_cq_vector(uint16_t vector) "failed creating completion queue, vector=%"PRIu16""
> +nvme_dev_err_invalid_create_cq_qflags(uint16_t qflags) "failed creating completion queue, qflags=%"PRIu16""
> +nvme_dev_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx16""
> +nvme_dev_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
> +nvme_dev_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
> +nvme_dev_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
> +nvme_dev_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
> +nvme_dev_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"
> +nvme_dev_err_startfail_nbaracq(void) "nvme_start_ctrl failed because the admin completion queue address is null"
> +nvme_dev_err_startfail_asq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin submission queue address is misaligned: 0x%"PRIx64""
> +nvme_dev_err_startfail_acq_misaligned(uint64_t addr) "nvme_start_ctrl failed because the admin completion queue address is misaligned: 0x%"PRIx64""
> +nvme_dev_err_startfail_page_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too small: log2size=%u, min=%u"
> +nvme_dev_err_startfail_page_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the page size is too large: log2size=%u, max=%u"
> +nvme_dev_err_startfail_cqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too small: log2size=%u, min=%u"
> +nvme_dev_err_startfail_cqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the completion queue entry size is too large: log2size=%u, max=%u"
> +nvme_dev_err_startfail_sqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too small: log2size=%u, min=%u"
> +nvme_dev_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_start_ctrl failed because the submission queue entry size is too large: log2size=%u, max=%u"
> +nvme_dev_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero"
> +nvme_dev_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero"
> +nvme_dev_err_startfail(void) "setting controller enable bit failed"
>  
>  # Traces for undefined behavior
> -nvme_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64""
> -nvme_ub_mmiowr_toosmall(uint64_t offset, unsigned size) "MMIO write smaller than 32 bits, offset=0x%"PRIx64", size=%u"
> -nvme_ub_mmiowr_intmask_with_msix(void) "undefined access to interrupt mask set when MSI-X is enabled"
> -nvme_ub_mmiowr_ro_csts(void) "attempted to set a read only bit of controller status"
> -nvme_ub_mmiowr_ssreset_w1c_unsupported(void) "attempted to W1C CSTS.NSSRO but CAP.NSSRS is zero (not supported)"
> -nvme_ub_mmiowr_ssreset_unsupported(void) "attempted NVM subsystem reset but CAP.NSSRS is zero (not supported)"
> -nvme_ub_mmiowr_cmbloc_reserved(void) "invalid write to reserved CMBLOC when CMBSZ is zero, ignored"
> -nvme_ub_mmiowr_cmbsz_readonly(void) "invalid write to read only CMBSZ, ignored"
> -nvme_ub_mmiowr_invalid(uint64_t offset, uint64_t data) "invalid MMIO write, offset=0x%"PRIx64", data=0x%"PRIx64""
> -nvme_ub_mmiord_misaligned32(uint64_t offset) "MMIO read not 32-bit aligned, offset=0x%"PRIx64""
> -nvme_ub_mmiord_toosmall(uint64_t offset) "MMIO read smaller than 32-bits, offset=0x%"PRIx64""
> -nvme_ub_mmiord_invalid_ofs(uint64_t offset) "MMIO read beyond last register, offset=0x%"PRIx64", returning 0"
> -nvme_ub_db_wr_misaligned(uint64_t offset) "doorbell write not 32-bit aligned, offset=0x%"PRIx64", ignoring"
> -nvme_ub_db_wr_invalid_cq(uint32_t qid) "completion queue doorbell write for nonexistent queue, cqid=%"PRIu32", ignoring"
> -nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion queue doorbell write value beyond queue size, cqid=%"PRIu32", new_head=%"PRIu16", ignoring"
> -nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring"
> -nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring"
> +nvme_dev_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64""
> +nvme_dev_ub_mmiowr_toosmall(uint64_t offset, unsigned size) "MMIO write smaller than 32 bits, offset=0x%"PRIx64", size=%u"
> +nvme_dev_ub_mmiowr_intmask_with_msix(void) "undefined access to interrupt mask set when MSI-X is enabled"
> +nvme_dev_ub_mmiowr_ro_csts(void) "attempted to set a read only bit of controller status"
> +nvme_dev_ub_mmiowr_ssreset_w1c_unsupported(void) "attempted to W1C CSTS.NSSRO but CAP.NSSRS is zero (not supported)"
> +nvme_dev_ub_mmiowr_ssreset_unsupported(void) "attempted NVM subsystem reset but CAP.NSSRS is zero (not supported)"
> +nvme_dev_ub_mmiowr_cmbloc_reserved(void) "invalid write to reserved CMBLOC when CMBSZ is zero, ignored"
> +nvme_dev_ub_mmiowr_cmbsz_readonly(void) "invalid write to read only CMBSZ, ignored"
> +nvme_dev_ub_mmiowr_invalid(uint64_t offset, uint64_t data) "invalid MMIO write, offset=0x%"PRIx64", data=0x%"PRIx64""
> +nvme_dev_ub_mmiord_misaligned32(uint64_t offset) "MMIO read not 32-bit aligned, offset=0x%"PRIx64""
> +nvme_dev_ub_mmiord_toosmall(uint64_t offset) "MMIO read smaller than 32-bits, offset=0x%"PRIx64""
> +nvme_dev_ub_mmiord_invalid_ofs(uint64_t offset) "MMIO read beyond last register, offset=0x%"PRIx64", returning 0"
> +nvme_dev_ub_db_wr_misaligned(uint64_t offset) "doorbell write not 32-bit aligned, offset=0x%"PRIx64", ignoring"
> +nvme_dev_ub_db_wr_invalid_cq(uint32_t qid) "completion queue doorbell write for nonexistent queue, cqid=%"PRIu32", ignoring"
> +nvme_dev_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion queue doorbell write value beyond queue size, cqid=%"PRIu32", new_head=%"PRIu16", ignoring"
> +nvme_dev_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring"
> +nvme_dev_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring"
>  
>  # xen-block.c
>  xen_block_realize(const char *type, uint32_t disk, uint32_t partition) "%s d%up%u"


Other than indentation nitpicks, no changes vs. v5,
so my Reviewed-by is kept correctly.

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 03/42] nvme: move device parameters to separate struct
  2020-03-16 14:28 ` [PATCH v6 03/42] nvme: move device parameters to separate struct Klaus Jensen
@ 2020-03-25 10:36   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:36 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Move device configuration parameters to a separate struct to make it
> explicit what is configurable and what is set internally.
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/block/nvme.c | 44 ++++++++++++++++++++++----------------------
>  hw/block/nvme.h | 16 +++++++++++++---
>  2 files changed, 35 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 9740948b354a..b532818b4b76 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -64,12 +64,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  
>  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
>  {
> -    return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1;
> +    return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
>  }
>  
>  static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
>  {
> -    return cqid < n->num_queues && n->cq[cqid] != NULL ? 0 : -1;
> +    return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
>  }
>  
>  static void nvme_inc_cq_tail(NvmeCQueue *cq)
> @@ -631,7 +631,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
>          trace_nvme_dev_err_invalid_create_cq_addr(prp1);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
> -    if (unlikely(vector > n->num_queues)) {
> +    if (unlikely(vector > n->params.num_queues)) {
>          trace_nvme_dev_err_invalid_create_cq_vector(vector);
>          return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
>      }
> @@ -783,7 +783,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> -        result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
> +        result = cpu_to_le32((n->params.num_queues - 2) |
> +                             ((n->params.num_queues - 2) << 16));
>          trace_nvme_dev_getfeat_numq(result);
>          break;
>      case NVME_TIMESTAMP:
> @@ -827,9 +828,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      case NVME_NUMBER_OF_QUEUES:
>          trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
>                                      ((dw11 >> 16) & 0xFFFF) + 1,
> -                                    n->num_queues - 1, n->num_queues - 1);
> -        req->cqe.result =
> -            cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
> +                                    n->params.num_queues - 1,
> +                                    n->params.num_queues - 1);
> +        req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
> +                                      ((n->params.num_queues - 2) << 16));
>          break;
>      case NVME_TIMESTAMP:
>          return nvme_set_feature_timestamp(n, cmd);
> @@ -900,12 +902,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
>  
>      blk_drain(n->conf.blk);
>  
> -    for (i = 0; i < n->num_queues; i++) {
> +    for (i = 0; i < n->params.num_queues; i++) {
>          if (n->sq[i] != NULL) {
>              nvme_free_sq(n->sq[i], n);
>          }
>      }
> -    for (i = 0; i < n->num_queues; i++) {
> +    for (i = 0; i < n->params.num_queues; i++) {
>          if (n->cq[i] != NULL) {
>              nvme_free_cq(n->cq[i], n);
>          }
> @@ -1308,7 +1310,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      int64_t bs_size;
>      uint8_t *pci_conf;
>  
> -    if (!n->num_queues) {
> +    if (!n->params.num_queues) {
>          error_setg(errp, "num_queues can't be zero");
>          return;
>      }
> @@ -1324,7 +1326,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>          return;
>      }
>  
> -    if (!n->serial) {
> +    if (!n->params.serial) {
>          error_setg(errp, "serial property not set");
>          return;
>      }
> @@ -1341,25 +1343,25 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      pcie_endpoint_cap_init(pci_dev, 0x80);
>  
>      n->num_namespaces = 1;
> -    n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
> +    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
>      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
>  
>      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> -    n->sq = g_new0(NvmeSQueue *, n->num_queues);
> -    n->cq = g_new0(NvmeCQueue *, n->num_queues);
> +    n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
> +    n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
>  
>      memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
>                            "nvme", n->reg_size);
>      pci_register_bar(pci_dev, 0,
>          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>          &n->iomem);
> -    msix_init_exclusive_bar(pci_dev, n->num_queues, 4, NULL);
> +    msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
>  
>      id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
>      id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
>      strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
>      strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
> -    strpadcpy((char *)id->sn, sizeof(id->sn), n->serial, ' ');
> +    strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
>      id->rab = 6;
>      id->ieee[0] = 0x00;
>      id->ieee[1] = 0x02;
> @@ -1388,7 +1390,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      n->bar.vs = 0x00010200;
>      n->bar.intmc = n->bar.intms = 0;
>  
> -    if (n->cmb_size_mb) {
> +    if (n->params.cmb_size_mb) {
>  
>          NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
>          NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
> @@ -1399,7 +1401,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>          NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
>          NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
>          NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
> -        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->cmb_size_mb);
> +        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
>  
>          n->cmbloc = n->bar.cmbloc;
>          n->cmbsz = n->bar.cmbsz;
> @@ -1438,7 +1440,7 @@ static void nvme_exit(PCIDevice *pci_dev)
>      g_free(n->cq);
>      g_free(n->sq);
>  
> -    if (n->cmb_size_mb) {
> +    if (n->params.cmb_size_mb) {
>          g_free(n->cmbuf);
>      }
>      msix_uninit_exclusive_bar(pci_dev);
> @@ -1446,9 +1448,7 @@ static void nvme_exit(PCIDevice *pci_dev)
>  
>  static Property nvme_props[] = {
>      DEFINE_BLOCK_PROPERTIES(NvmeCtrl, conf),
> -    DEFINE_PROP_STRING("serial", NvmeCtrl, serial),
> -    DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, cmb_size_mb, 0),
> -    DEFINE_PROP_UINT32("num_queues", NvmeCtrl, num_queues, 64),
> +    DEFINE_NVME_PROPERTIES(NvmeCtrl, params),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 557194ee1954..9957c4a200e2 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -1,7 +1,19 @@
>  #ifndef HW_NVME_H
>  #define HW_NVME_H
> +
>  #include "block/nvme.h"
>  
> +#define DEFINE_NVME_PROPERTIES(_state, _props) \
> +    DEFINE_PROP_STRING("serial", _state, _props.serial), \
> +    DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
> +    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64)
> +
> +typedef struct NvmeParams {
> +    char     *serial;
> +    uint32_t num_queues;
> +    uint32_t cmb_size_mb;
> +} NvmeParams;
> +
>  typedef struct NvmeAsyncEvent {
>      QSIMPLEQ_ENTRY(NvmeAsyncEvent) entry;
>      NvmeAerResult result;
> @@ -63,6 +75,7 @@ typedef struct NvmeCtrl {
>      MemoryRegion ctrl_mem;
>      NvmeBar      bar;
>      BlockConf    conf;
> +    NvmeParams   params;
>  
>      uint32_t    page_size;
>      uint16_t    page_bits;
> @@ -71,10 +84,8 @@ typedef struct NvmeCtrl {
>      uint16_t    sqe_size;
>      uint32_t    reg_size;
>      uint32_t    num_namespaces;
> -    uint32_t    num_queues;
>      uint32_t    max_q_ents;
>      uint64_t    ns_size;
> -    uint32_t    cmb_size_mb;
>      uint32_t    cmbsz;
>      uint32_t    cmbloc;
>      uint8_t     *cmbuf;
> @@ -82,7 +93,6 @@ typedef struct NvmeCtrl {
>      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
>      uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
>  
> -    char            *serial;
>      NvmeNamespace   *namespaces;
>      NvmeSQueue      **sq;
>      NvmeCQueue      **cq;
Looks the same; the indentation is indeed fixed.
So this keeps my Reviewed-by as well.

Best regards,
	Maxim Levitsky








* Re: [PATCH v6 04/42] nvme: bump spec data structures to v1.3
  2020-03-16 14:28 ` [PATCH v6 04/42] nvme: bump spec data structures to v1.3 Klaus Jensen
@ 2020-03-25 10:37   ` Maxim Levitsky
  2020-03-31  5:38     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:37 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Add missing fields in the Identify Controller and Identify Namespace
> data structures to bring them in line with NVMe v1.3.
> 
> This also adds data structures and defines for SGL support which
> requires a couple of trivial changes to the nvme block driver as well.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Fam Zheng <fam@euphon.net>
> ---
>  block/nvme.c         |  18 ++---
>  hw/block/nvme.c      |  12 ++--
>  include/block/nvme.h | 153 ++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 151 insertions(+), 32 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index d41c4bda6e39..99b9bb3dac96 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -446,7 +446,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>          error_setg(errp, "Cannot map buffer for DMA");
>          goto out;
>      }
> -    cmd.prp1 = cpu_to_le64(iova);
> +    cmd.dptr.prp1 = cpu_to_le64(iova);
>  
>      if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
>          error_setg(errp, "Failed to identify controller");
> @@ -545,7 +545,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>      }
>      cmd = (NvmeCmd) {
>          .opcode = NVME_ADM_CMD_CREATE_CQ,
> -        .prp1 = cpu_to_le64(q->cq.iova),
> +        .dptr.prp1 = cpu_to_le64(q->cq.iova),
>          .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
>          .cdw11 = cpu_to_le32(0x3),
>      };
> @@ -556,7 +556,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>      }
>      cmd = (NvmeCmd) {
>          .opcode = NVME_ADM_CMD_CREATE_SQ,
> -        .prp1 = cpu_to_le64(q->sq.iova),
> +        .dptr.prp1 = cpu_to_le64(q->sq.iova),
>          .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
>          .cdw11 = cpu_to_le32(0x1 | (n << 16)),
>      };
> @@ -906,16 +906,16 @@ try_map:
>      case 0:
>          abort();
>      case 1:
> -        cmd->prp1 = pagelist[0];
> -        cmd->prp2 = 0;
> +        cmd->dptr.prp1 = pagelist[0];
> +        cmd->dptr.prp2 = 0;
>          break;
>      case 2:
> -        cmd->prp1 = pagelist[0];
> -        cmd->prp2 = pagelist[1];
> +        cmd->dptr.prp1 = pagelist[0];
> +        cmd->dptr.prp2 = pagelist[1];
>          break;
>      default:
> -        cmd->prp1 = pagelist[0];
> -        cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
> +        cmd->dptr.prp1 = pagelist[0];
> +        cmd->dptr.prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
>          break;
>      }
>      trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index b532818b4b76..40cb176dea3c 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -372,8 +372,8 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
>      uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
>      uint64_t slba = le64_to_cpu(rw->slba);
> -    uint64_t prp1 = le64_to_cpu(rw->prp1);
> -    uint64_t prp2 = le64_to_cpu(rw->prp2);
> +    uint64_t prp1 = le64_to_cpu(rw->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(rw->dptr.prp2);
>  
>      uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
>      uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
> @@ -763,8 +763,8 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
>  
>  static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>  {
> -    uint64_t prp1 = le64_to_cpu(cmd->prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>  
>      uint64_t timestamp = nvme_get_timestamp(n);
>  
> @@ -802,8 +802,8 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>  {
>      uint16_t ret;
>      uint64_t timestamp;
> -    uint64_t prp1 = le64_to_cpu(cmd->prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->prp2);
> +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>  
>      ret = nvme_dma_write_prp(n, (uint8_t *)&timestamp,
>                                  sizeof(timestamp), prp1, prp2);
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 8fb941c6537c..a083c1b3a613 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -205,15 +205,53 @@ enum NvmeCmbszMask {
>  #define NVME_CMBSZ_GETSIZE(cmbsz) \
>      (NVME_CMBSZ_SZ(cmbsz) * (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz))))
>  
> +enum NvmeSglDescriptorType {
> +    NVME_SGL_DESCR_TYPE_DATA_BLOCK          = 0x0,
> +    NVME_SGL_DESCR_TYPE_BIT_BUCKET          = 0x1,
> +    NVME_SGL_DESCR_TYPE_SEGMENT             = 0x2,
> +    NVME_SGL_DESCR_TYPE_LAST_SEGMENT        = 0x3,
> +    NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK    = 0x4,
> +
> +    NVME_SGL_DESCR_TYPE_VENDOR_SPECIFIC     = 0xf,
> +};
OK

> +
> +enum NvmeSglDescriptorSubtype {
> +    NVME_SGL_DESCR_SUBTYPE_ADDRESS = 0x0,
> +};
OK
> +
> +typedef struct NvmeSglDescriptor {
> +    uint64_t addr;
> +    uint32_t len;
> +    uint8_t  rsvd[3];
> +    uint8_t  type;
> +} NvmeSglDescriptor;
> +
> +#define NVME_SGL_TYPE(type)     ((type >> 4) & 0xf)
> +#define NVME_SGL_SUBTYPE(type)  (type & 0xf)
OK

> +
> +typedef union NvmeCmdDptr {
> +    struct {
> +        uint64_t    prp1;
> +        uint64_t    prp2;
> +    };
> +
> +    NvmeSglDescriptor sgl;
> +} NvmeCmdDptr;

> +
> +enum NvmePsdt {
> +    PSDT_PRP                 = 0x0,
> +    PSDT_SGL_MPTR_CONTIGUOUS = 0x1,
> +    PSDT_SGL_MPTR_SGL        = 0x2,
> +};
OK
> +
>  typedef struct NvmeCmd {
>      uint8_t     opcode;
> -    uint8_t     fuse;
> +    uint8_t     flags;
Yep, that makes sense, since this field contains
more than the fused operation control bits.
>      uint16_t    cid;
>      uint32_t    nsid;
>      uint64_t    res1;
>      uint64_t    mptr;
> -    uint64_t    prp1;
> -    uint64_t    prp2;
> +    NvmeCmdDptr dptr;
>      uint32_t    cdw10;
>      uint32_t    cdw11;
>      uint32_t    cdw12;
> @@ -222,6 +260,9 @@ typedef struct NvmeCmd {
>      uint32_t    cdw15;
>  } NvmeCmd;
>  
> +#define NVME_CMD_FLAGS_FUSE(flags) (flags & 0x3)
> +#define NVME_CMD_FLAGS_PSDT(flags) ((flags >> 6) & 0x3)
OK
> +
>  enum NvmeAdminCommands {
>      NVME_ADM_CMD_DELETE_SQ      = 0x00,
>      NVME_ADM_CMD_CREATE_SQ      = 0x01,
> @@ -321,8 +362,7 @@ typedef struct NvmeRwCmd {
>      uint32_t    nsid;
>      uint64_t    rsvd2;
>      uint64_t    mptr;
> -    uint64_t    prp1;
> -    uint64_t    prp2;
> +    NvmeCmdDptr dptr;
>      uint64_t    slba;
>      uint16_t    nlb;
>      uint16_t    control;
> @@ -362,8 +402,7 @@ typedef struct NvmeDsmCmd {
>      uint16_t    cid;
>      uint32_t    nsid;
>      uint64_t    rsvd2[2];
> -    uint64_t    prp1;
> -    uint64_t    prp2;
> +    NvmeCmdDptr dptr;
>      uint32_t    nr;
>      uint32_t    attributes;
>      uint32_t    rsvd12[4];
> @@ -427,6 +466,12 @@ enum NvmeStatusCodes {
>      NVME_CMD_ABORT_MISSING_FUSE = 0x000a,
>      NVME_INVALID_NSID           = 0x000b,
>      NVME_CMD_SEQ_ERROR          = 0x000c,
> 

> +    NVME_INVALID_SGL_SEG_DESCR  = 0x000d,
> +    NVME_INVALID_NUM_SGL_DESCRS = 0x000e,
> +    NVME_DATA_SGL_LEN_INVALID   = 0x000f,
> +    NVME_MD_SGL_LEN_INVALID     = 0x0010,
> +    NVME_SGL_DESCR_TYPE_INVALID = 0x0011,
> +    NVME_INVALID_USE_OF_CMB     = 0x0012,
OK

>      NVME_LBA_RANGE              = 0x0080,
>      NVME_CAP_EXCEEDED           = 0x0081,
>      NVME_NS_NOT_READY           = 0x0082,
> @@ -515,7 +560,7 @@ enum NvmeSmartWarn {
>      NVME_SMART_FAILED_VOLATILE_MEDIA  = 1 << 4,
>  };
>  
> -enum LogIdentifier {
> +enum NvmeLogIdentifier {
>      NVME_LOG_ERROR_INFO     = 0x01,
>      NVME_LOG_SMART_INFO     = 0x02,
>      NVME_LOG_FW_SLOT_INFO   = 0x03,
> @@ -533,6 +578,15 @@ typedef struct NvmePSD {
>      uint8_t     resv[16];
>  } NvmePSD;
>  
> +#define NVME_IDENTIFY_DATA_SIZE 4096
> +
> +enum {
> +    NVME_ID_CNS_NS             = 0x0,
> +    NVME_ID_CNS_CTRL           = 0x1,
> +    NVME_ID_CNS_NS_ACTIVE_LIST = 0x2,
> +    NVME_ID_CNS_NS_DESCR_LIST  = 0x3,
> +};
OK


> +
>  typedef struct NvmeIdCtrl {
>      uint16_t    vid;
>      uint16_t    ssvid;
> @@ -543,7 +597,15 @@ typedef struct NvmeIdCtrl {
>      uint8_t     ieee[3];
>      uint8_t     cmic;
>      uint8_t     mdts;
> -    uint8_t     rsvd255[178];
> +    uint16_t    cntlid;
> +    uint32_t    ver;
> +    uint32_t    rtd3r;
> +    uint32_t    rtd3e;
> +    uint32_t    oaes;
> +    uint32_t    ctratt;
> +    uint8_t     rsvd100[12];
> +    uint8_t     fguid[16];
> +    uint8_t     rsvd128[128];
>      uint16_t    oacs;
>      uint8_t     acl;
>      uint8_t     aerl;
> @@ -551,10 +613,28 @@ typedef struct NvmeIdCtrl {
>      uint8_t     lpa;
>      uint8_t     elpe;
>      uint8_t     npss;
> -    uint8_t     rsvd511[248];
> +    uint8_t     avscc;
> +    uint8_t     apsta;
> +    uint16_t    wctemp;
> +    uint16_t    cctemp;
> +    uint16_t    mtfa;
> +    uint32_t    hmpre;
> +    uint32_t    hmmin;
> +    uint8_t     tnvmcap[16];
> +    uint8_t     unvmcap[16];
> +    uint32_t    rpmbs;
> +    uint16_t    edstt;
> +    uint8_t     dsto;
> +    uint8_t     fwug;
> +    uint16_t    kas;
> +    uint16_t    hctma;
> +    uint16_t    mntmt;
> +    uint16_t    mxtmt;
> +    uint32_t    sanicap;
> +    uint8_t     rsvd332[180];
>      uint8_t     sqes;
>      uint8_t     cqes;
> -    uint16_t    rsvd515;
> +    uint16_t    maxcmd;
>      uint32_t    nn;
>      uint16_t    oncs;
>      uint16_t    fuses;
> @@ -562,8 +642,14 @@ typedef struct NvmeIdCtrl {
>      uint8_t     vwc;
>      uint16_t    awun;
>      uint16_t    awupf;
> -    uint8_t     rsvd703[174];
> -    uint8_t     rsvd2047[1344];
> +    uint8_t     nvscc;
> +    uint8_t     rsvd531;
> +    uint16_t    acwu;
> +    uint8_t     rsvd534[2];
> +    uint32_t    sgls;
> +    uint8_t     rsvd540[228];
> +    uint8_t     subnqn[256];
> +    uint8_t     rsvd1024[1024];
>      NvmePSD     psd[32];
>      uint8_t     vs[1024];
>  } NvmeIdCtrl;
I checked the diff versus V5, cross-referenced it with the spec, and it looks correct now;
plus, you documented even more fields, which is welcome.
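
One way to keep this layout honest is to pin the spec offsets with compile-time checks. The sketch below is a condensed, hypothetical copy of the patched struct. Fields hidden by the hunk boundaries (sn, mn, fr, rab, frmw and fna) are filled in from the NVMe 1.3 Identify Controller table, and psd is flattened to raw bytes. The offsets are the byte positions from that table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical condensed copy of the patched NvmeIdCtrl. */
typedef struct IdCtrlCopy {
    uint16_t vid, ssvid;
    uint8_t  sn[20], mn[40], fr[8];
    uint8_t  rab, ieee[3], cmic, mdts;
    uint16_t cntlid;
    uint32_t ver, rtd3r, rtd3e, oaes, ctratt;
    uint8_t  rsvd100[12], fguid[16], rsvd128[128];
    uint16_t oacs;
    uint8_t  acl, aerl, frmw, lpa, elpe, npss, avscc, apsta;
    uint16_t wctemp, cctemp, mtfa;
    uint32_t hmpre, hmmin;
    uint8_t  tnvmcap[16], unvmcap[16];
    uint32_t rpmbs;
    uint16_t edstt;
    uint8_t  dsto, fwug;
    uint16_t kas, hctma, mntmt, mxtmt;
    uint32_t sanicap;
    uint8_t  rsvd332[180];
    uint8_t  sqes, cqes;
    uint16_t maxcmd;
    uint32_t nn;
    uint16_t oncs, fuses;
    uint8_t  fna, vwc;
    uint16_t awun, awupf;
    uint8_t  nvscc, rsvd531;
    uint16_t acwu;
    uint8_t  rsvd534[2];
    uint32_t sgls;
    uint8_t  rsvd540[228], subnqn[256], rsvd1024[1024];
    uint8_t  psd[32][32];   /* 32 power state descriptors of 32 bytes */
    uint8_t  vs[1024];
} IdCtrlCopy;

/* Fails the build if a field is not at its spec byte offset. */
#define IDCTRL_AT(field, off) \
    _Static_assert(offsetof(IdCtrlCopy, field) == (off), #field " offset")

IDCTRL_AT(cntlid, 78);
IDCTRL_AT(fguid, 112);
IDCTRL_AT(oacs, 256);
IDCTRL_AT(wctemp, 266);
IDCTRL_AT(sanicap, 328);
IDCTRL_AT(sqes, 512);
IDCTRL_AT(maxcmd, 514);
IDCTRL_AT(sgls, 536);
IDCTRL_AT(subnqn, 768);
IDCTRL_AT(psd, 2048);
IDCTRL_AT(vs, 3072);
_Static_assert(sizeof(IdCtrlCopy) == 4096, "Identify data is 4 KiB");
```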


> @@ -589,6 +675,16 @@ enum NvmeIdCtrlOncs {
>  #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
>  #define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf)
>  
> +#define NVME_CTRL_SGLS_SUPPORTED_MASK            (0x3 <<  0)
> +#define NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT    (0x1 <<  0)
> +#define NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT (0x1 <<  1)
> +#define NVME_CTRL_SGLS_KEYED                     (0x1 <<  2)
> +#define NVME_CTRL_SGLS_BITBUCKET                 (0x1 << 16)
> +#define NVME_CTRL_SGLS_MPTR_CONTIGUOUS           (0x1 << 17)
> +#define NVME_CTRL_SGLS_EXCESS_LENGTH             (0x1 << 18)
> +#define NVME_CTRL_SGLS_MPTR_SGL                  (0x1 << 19)
> +#define NVME_CTRL_SGLS_ADDR_OFFSET               (0x1 << 20)
OK
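
Here is a minimal host-side sketch of reading the SGLS field with these masks. It assumes the NVMe 1.3 encoding of bits 1:0: 00b not supported, 01b supported, 10b supported with Dword alignment and granularity, and 11b reserved. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the SGLS bits from the patch. */
#define NVME_CTRL_SGLS_SUPPORTED_MASK            (0x3 <<  0)
#define NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT    (0x1 <<  0)
#define NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT (0x1 <<  1)
#define NVME_CTRL_SGLS_BITBUCKET                 (0x1 << 16)

/* SGLs usable for NVM commands: bits 1:0 are 01b or 10b (11b is reserved). */
static int nvme_sgls_supported(uint32_t sgls)
{
    uint32_t s = sgls & NVME_CTRL_SGLS_SUPPORTED_MASK;

    return s == NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT ||
           s == NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT;
}

/* Data blocks must be Dword aligned and a multiple of Dwords in length. */
static int nvme_sgls_dword_aligned(uint32_t sgls)
{
    return (sgls & NVME_CTRL_SGLS_SUPPORTED_MASK) ==
           NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT;
}
```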
> +
>  typedef struct NvmeFeatureVal {
>      uint32_t    arbitration;
>      uint32_t    power_mgmt;
> @@ -611,6 +707,10 @@ typedef struct NvmeFeatureVal {
>  #define NVME_INTC_THR(intc)     (intc & 0xff)
>  #define NVME_INTC_TIME(intc)    ((intc >> 8) & 0xff)
>  
> +#define NVME_TEMP_THSEL(temp)  ((temp >> 20) & 0x3)
Nitpick: if we are adding this, I would add a #define for the values as well.
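
For example, the value #defines could look like this. The names are hypothetical. The values come from the NVMe 1.3 Temperature Threshold feature, where THSEL 0h selects the over-temperature threshold and 1h the under-temperature threshold, and TMPTH is in Kelvin. The encode helper is only there to show how the three fields pack into Dword 11.

```c
#include <assert.h>
#include <stdint.h>

/* The patch's accessors, with the macro argument parenthesized. */
#define NVME_TEMP_THSEL(temp)  (((temp) >> 20) & 0x3)
#define NVME_TEMP_TMPSEL(temp) (((temp) >> 16) & 0xf)
#define NVME_TEMP_TMPTH(temp)  (((temp) >>  0) & 0xffff)

/* Hypothetical names for the Threshold Type Select values. */
enum {
    NVME_TEMP_THSEL_OVER  = 0x0,
    NVME_TEMP_THSEL_UNDER = 0x1,
};

/* Hypothetical names for Threshold Temperature Select: 0h is the
 * composite temperature, 1h-8h are sensors 1-8, Fh is all sensors. */
enum {
    NVME_TEMP_TMPSEL_COMPOSITE = 0x0,
    NVME_TEMP_TMPSEL_ALL       = 0xf,
};

/* Packs a Set Features (Temperature Threshold) Dword 11; tmpth in Kelvin. */
static uint32_t nvme_temp_dw11(uint32_t thsel, uint32_t tmpsel, uint16_t tmpth)
{
    return ((thsel & 0x3) << 20) | ((tmpsel & 0xf) << 16) | tmpth;
}
```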

> +#define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
> +#define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
> +
>  enum NvmeFeatureIds {
>      NVME_ARBITRATION                = 0x1,
>      NVME_POWER_MANAGEMENT           = 0x2,
> @@ -653,18 +753,37 @@ typedef struct NvmeIdNs {
>      uint8_t     mc;
>      uint8_t     dpc;
>      uint8_t     dps;
> -
>      uint8_t     nmic;
>      uint8_t     rescap;
>      uint8_t     fpi;
>      uint8_t     dlfeat;
> -
> -    uint8_t     res34[94];
> +    uint16_t    nawun;
> +    uint16_t    nawupf;
> +    uint16_t    nacwu;
> +    uint16_t    nabsn;
> +    uint16_t    nabo;
> +    uint16_t    nabspf;
> +    uint16_t    noiob;
> +    uint8_t     nvmcap[16];
> +    uint8_t     rsvd64[40];
> +    uint8_t     nguid[16];
> +    uint64_t    eui64;
>      NvmeLBAF    lbaf[16];
> -    uint8_t     res192[192];
> +    uint8_t     rsvd192[192];
>      uint8_t     vs[3712];
>  } NvmeIdNs;
Also checked this against V5, looks OK now
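
The same idea works for the namespace structure. Below is a hypothetical condensed copy of the patched NvmeIdNs, with a few NVMe 1.3 offsets checked at compile time. NvmeLBAF is assumed to use the 4-byte ms/ds/rp layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match QEMU's NvmeLBAF (4 bytes per LBA format). */
typedef struct LbafCopy {
    uint16_t ms;
    uint8_t  ds;
    uint8_t  rp;
} LbafCopy;

/* Hypothetical condensed copy of the patched NvmeIdNs. */
typedef struct IdNsCopy {
    uint64_t nsze, ncap, nuse;
    uint8_t  nsfeat, nlbaf, flbas, mc, dpc, dps;
    uint8_t  nmic, rescap, fpi, dlfeat;
    uint16_t nawun, nawupf, nacwu, nabsn, nabo, nabspf, noiob;
    uint8_t  nvmcap[16], rsvd64[40], nguid[16];
    uint64_t eui64;
    LbafCopy lbaf[16];
    uint8_t  rsvd192[192], vs[3712];
} IdNsCopy;

/* Fails the build if a field is not at its spec byte offset. */
#define IDNS_AT(field, off) \
    _Static_assert(offsetof(IdNsCopy, field) == (off), #field " offset")

IDNS_AT(nawun, 34);
IDNS_AT(noiob, 46);
IDNS_AT(nvmcap, 48);
IDNS_AT(nguid, 104);
IDNS_AT(eui64, 120);
IDNS_AT(lbaf, 128);
IDNS_AT(vs, 384);
_Static_assert(sizeof(IdNsCopy) == 4096, "Identify data is 4 KiB");
```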

>  
> +typedef struct NvmeIdNsDescr {
> +    uint8_t nidt;
> +    uint8_t nidl;
> +    uint8_t rsvd2[2];
> +} NvmeIdNsDescr;
OK



> +
> +#define NVME_NIDT_UUID_LEN 16
> +
> +enum {
> +    NVME_NIDT_UUID = 0x3,
Very minor nitpick: I would add the other values as well, just for the sake
of better understanding what this is.
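
Something along these lines would cover the other two NVMe 1.3 identifier types. The EUI64 and NGUID names are hypothetical. The walker is a sketch of how a host could find one identifier in the 4096-byte CNS 03h list. It assumes the list ends at a descriptor with NIDT 0 or at the end of the buffer. nvme_find_nid() is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NVME_IDENTIFY_DATA_SIZE 4096
#define NVME_NIDT_UUID_LEN      16

/* Header of one Namespace Identification Descriptor (as in the patch). */
typedef struct NvmeIdNsDescr {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd2[2];
} NvmeIdNsDescr;

/* NVMe 1.3 Namespace Identifier Types; only UUID is defined by the patch. */
enum {
    NVME_NIDT_EUI64 = 0x1, /* hypothetical name, NIDL = 8 */
    NVME_NIDT_NGUID = 0x2, /* hypothetical name, NIDL = 16 */
    NVME_NIDT_UUID  = 0x3, /* NIDL = NVME_NIDT_UUID_LEN */
};

/*
 * Hypothetical helper: scan a CNS 03h list for identifier type nidt.
 * Returns a pointer to the identifier bytes and stores NIDL in *len,
 * or returns NULL if the type is absent or the list is malformed.
 */
static const uint8_t *nvme_find_nid(const uint8_t *list, uint8_t nidt,
                                    uint8_t *len)
{
    size_t off = 0;

    while (off + sizeof(NvmeIdNsDescr) <= NVME_IDENTIFY_DATA_SIZE) {
        uint8_t type = list[off + offsetof(NvmeIdNsDescr, nidt)];
        uint8_t nidl = list[off + offsetof(NvmeIdNsDescr, nidl)];
        size_t next = off + sizeof(NvmeIdNsDescr) + nidl;

        if (type == 0 || next > NVME_IDENTIFY_DATA_SIZE) {
            break; /* zeroed tail ends the list; overrun means malformed */
        }
        if (type == nidt) {
            *len = nidl;
            return list + off + sizeof(NvmeIdNsDescr);
        }
        off = next;
    }
    return NULL;
}
```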

> +};
>  
>  /*Deallocate Logical Block Features*/
>  #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)       ((dlfeat) & 0x10)

Looks very good.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 05/42] nvme: use constant for identify data size
  2020-03-16 14:28 ` [PATCH v6 05/42] nvme: use constant for identify data size Klaus Jensen
@ 2020-03-25 10:37   ` Maxim Levitsky
  2020-03-31  5:38     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:37 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 40cb176dea3c..f716f690a594 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -679,7 +679,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
>  
>  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
>  {
> -    static const int data_len = 4 * KiB;
> +    static const int data_len = NVME_IDENTIFY_DATA_SIZE;
>      uint32_t min_nsid = le32_to_cpu(c->nsid);
>      uint64_t prp1 = le64_to_cpu(c->prp1);
>      uint64_t prp2 = le64_to_cpu(c->prp2);

I would probably squash this with some other refactoring patch,
but I absolutely don't mind leaving this as is.
Fine-grained patches never cause any harm.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 06/42] nvme: add identify cns values in header
  2020-03-16 14:28 ` [PATCH v6 06/42] nvme: add identify cns values in header Klaus Jensen
@ 2020-03-25 10:37   ` Maxim Levitsky
  2020-03-31  5:39     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:37 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f716f690a594..b38d7e548a60 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -709,11 +709,11 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>      NvmeIdentify *c = (NvmeIdentify *)cmd;
>  
>      switch (le32_to_cpu(c->cns)) {
> -    case 0x00:
> +    case NVME_ID_CNS_NS:
>          return nvme_identify_ns(n, c);
> -    case 0x01:
> +    case NVME_ID_CNS_CTRL:
>          return nvme_identify_ctrl(n, c);
> -    case 0x02:
> +    case NVME_ID_CNS_NS_ACTIVE_LIST:
>          return nvme_identify_nslist(n, c);
>      default:
>          trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));

This is a very good candidate to be squashed with patch 5 IMHO,
but you can leave this as is as well. I don't mind.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky








* Re: [PATCH v6 07/42] nvme: refactor nvme_addr_read
  2020-03-16 14:28 ` [PATCH v6 07/42] nvme: refactor nvme_addr_read Klaus Jensen
@ 2020-03-25 10:38   ` Maxim Levitsky
  2020-03-31  5:39     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:38 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Pull the controller memory buffer check to its own function. The check
> will be used on its own in later patches.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index b38d7e548a60..08a83d449de3 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -52,14 +52,22 @@
>  
>  static void nvme_process_sq(void *opaque);
>  
> +static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> +{
> +    hwaddr low = n->ctrl_mem.addr;
> +    hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
> +
> +    return addr >= low && addr < hi;
> +}
> +
>  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
> -    if (n->cmbsz && addr >= n->ctrl_mem.addr &&
> -                addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
> +    if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
>          memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
> -    } else {
> -        pci_dma_read(&n->parent_obj, addr, buf, size);
> +        return;
>      }
> +
> +    pci_dma_read(&n->parent_obj, addr, buf, size);
>  }
>  
>  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)

Note that this patch still contains a bug: it removes the check against the accessed
size, which you fix in a later patch.
I prefer not to add a bug in the first place.
However, if you have a reason for this, I won't mind.
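
For illustration, here is a size-aware variant of the range check. It accepts an access only when the whole [addr, addr + size) range falls inside the buffer, and it is written so that addr + size cannot overflow. range_within() and the hwaddr typedef are stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr; /* stand-in for QEMU's hwaddr */

/* True only if [addr, addr + size) lies entirely inside [base, base + len). */
static bool range_within(hwaddr base, uint64_t len, hwaddr addr, uint64_t size)
{
    uint64_t off;

    if (addr < base) {
        return false;
    }
    off = addr - base;
    /* compare against the remaining space instead of computing addr + size */
    return off <= len && size <= len - off;
}
```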

Best regards,
	Maxim Levitsky








* Re: [PATCH v6 08/42] nvme: add support for the abort command
  2020-03-16 14:28 ` [PATCH v6 08/42] nvme: add support for the abort command Klaus Jensen
@ 2020-03-25 10:38   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:38 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> Section 5.1 ("Abort command").
> 
> The Abort command is a best effort command; for now, the device always
> fails to abort the given command.
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 08a83d449de3..7cf7cf55143e 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -729,6 +729,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>      }
>  }
>  
> +static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +    uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
> +
> +    req->cqe.result = 1;
> +    if (nvme_check_sqid(n, sqid)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    return NVME_SUCCESS;
> +}
> +
>  static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
>  {
>      trace_nvme_dev_setfeat_timestamp(ts);
> @@ -863,6 +875,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          return nvme_create_cq(n, cmd);
>      case NVME_ADM_CMD_IDENTIFY:
>          return nvme_identify(n, cmd);
> +    case NVME_ADM_CMD_ABORT:
> +        return nvme_abort(n, cmd, req);
>      case NVME_ADM_CMD_SET_FEATURES:
>          return nvme_set_feature(n, cmd, req);
>      case NVME_ADM_CMD_GET_FEATURES:
> @@ -1375,6 +1389,19 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      id->ieee[1] = 0x02;
>      id->ieee[2] = 0xb3;
>      id->oacs = cpu_to_le16(0);
> +
> +    /*
> +     * Because the controller always completes the Abort command immediately,
> +     * there can never be more than one concurrently executing Abort command,
> +     * so this value is never used for anything. Note that there can easily be
> +     * many Abort commands in the queues, but they are not considered
> +     * "executing" until processed by nvme_abort.
> +     *
> +     * The specification recommends a value of 3 for Abort Command Limit (four
> +     * concurrently outstanding Abort commands), so let's use that though it is
> +     * inconsequential.
> +     */
> +    id->acl = 3;
>      id->frmw = 7 << 1;
>      id->lpa = 1 << 0;
>      id->sqes = (0x6 << 4) | 0x6;

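For reference, here are the fields this handler touches, per NVMe 1.3. SQID is in CDW10 bits 15:0 and CID in bits 31:16. Bit 0 of completion Dword 0 set to 1 means the command was not aborted, which is why the patch sets req->cqe.result = 1. ACL is 0's based, so acl = 3 means four concurrent Abort commands. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Submission Queue Identifier, Abort CDW10 bits 15:0. */
static inline uint16_t nvme_abort_sqid(uint32_t cdw10)
{
    return cdw10 & 0xffff;
}

/* Command Identifier, Abort CDW10 bits 31:16. */
static inline uint16_t nvme_abort_cid(uint32_t cdw10)
{
    return (cdw10 >> 16) & 0xffff;
}

/* Completion Dword 0, bit 0: 1 = command not aborted, 0 = aborted. */
#define NVME_ABORT_RESULT_NOT_ABORTED 0x1

/* ACL is 0's based: the number of concurrent Abort commands is acl + 1. */
static inline unsigned nvme_acl_limit(uint8_t acl)
{
    return acl + 1u;
}
```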
You forgot to carry over my Reviewed-by from the previous version.
I see that you also fixed the whitespace problem, thanks!
So,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky







* Re: [PATCH v6 09/42] nvme: add max_ioqpairs device parameter
  2020-03-16 14:28 ` [PATCH v6 09/42] nvme: add max_ioqpairs device parameter Klaus Jensen
@ 2020-03-25 10:39   ` Maxim Levitsky
  2020-03-31  5:40     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:39 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The num_queues device parameter has a slightly confusing meaning because
> it accounts for the admin queue pair which is not really optional.
> Secondly, it is really a maximum value of queues allowed.
> 
> Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
> but keep num_queues for compatibility.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 45 ++++++++++++++++++++++++++-------------------
>  hw/block/nvme.h |  4 +++-
>  2 files changed, 29 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 7cf7cf55143e..7dfd8a1a392d 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -19,7 +19,7 @@
>   *      -drive file=<file>,if=none,id=<drive_id>
>   *      -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
>   *              cmb_size_mb=<cmb_size_mb[optional]>, \
> - *              num_queues=<N[optional]>
> + *              max_ioqpairs=<N[optional]>
>   *
>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> @@ -27,6 +27,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/units.h"
> +#include "qemu/error-report.h"
>  #include "hw/block/block.h"
>  #include "hw/pci/msix.h"
>  #include "hw/pci/pci.h"
> @@ -72,12 +73,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  
>  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
>  {
> -    return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
> +    return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
>  }
>  
>  static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
>  {
> -    return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
> +    return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
>  }
>  
>  static void nvme_inc_cq_tail(NvmeCQueue *cq)
> @@ -639,7 +640,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
>          trace_nvme_dev_err_invalid_create_cq_addr(prp1);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
> -    if (unlikely(vector > n->params.num_queues)) {
> +    if (unlikely(vector > n->params.max_ioqpairs + 1)) {
>          trace_nvme_dev_err_invalid_create_cq_vector(vector);
>          return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
>      }
> @@ -803,8 +804,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> -        result = cpu_to_le32((n->params.num_queues - 2) |
> -                             ((n->params.num_queues - 2) << 16));
> +        result = cpu_to_le32((n->params.max_ioqpairs - 1) |
> +                             ((n->params.max_ioqpairs - 1) << 16));
>          trace_nvme_dev_getfeat_numq(result);
>          break;
>      case NVME_TIMESTAMP:
> @@ -848,10 +849,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      case NVME_NUMBER_OF_QUEUES:
>          trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
>                                      ((dw11 >> 16) & 0xFFFF) + 1,
> -                                    n->params.num_queues - 1,
> -                                    n->params.num_queues - 1);
> -        req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
> -                                      ((n->params.num_queues - 2) << 16));
> +                                    n->params.max_ioqpairs,
> +                                    n->params.max_ioqpairs);
> +        req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
> +                                      ((n->params.max_ioqpairs - 1) << 16));
>          break;
>      case NVME_TIMESTAMP:
>          return nvme_set_feature_timestamp(n, cmd);
> @@ -924,12 +925,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
>  
>      blk_drain(n->conf.blk);
>  
> -    for (i = 0; i < n->params.num_queues; i++) {
> +    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
>          if (n->sq[i] != NULL) {
>              nvme_free_sq(n->sq[i], n);
>          }
>      }
> -    for (i = 0; i < n->params.num_queues; i++) {
> +    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
>          if (n->cq[i] != NULL) {
>              nvme_free_cq(n->cq[i], n);
>          }
> @@ -1332,9 +1333,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      int64_t bs_size;
>      uint8_t *pci_conf;
>  
> -    if (!n->params.num_queues) {
> -        error_setg(errp, "num_queues can't be zero");
> -        return;
> +    if (n->params.num_queues) {
> +        warn_report("nvme: num_queues is deprecated; please use max_ioqpairs "
> +                    "instead");
> +
> +        n->params.max_ioqpairs = n->params.num_queues - 1;
> +    }
> +
> +    if (!n->params.max_ioqpairs) {
> +        error_setg(errp, "max_ioqpairs can't be less than 1");
>      }
This is not even a nitpick, just an idea.

It might be worth allowing max_ioqpairs=0 to simulate a 'broken'
nvme controller. I know that the kernel has special handling for such controllers,
which consists of creating only the control character device (/dev/nvme*), through
which the user can submit commands to try to 'fix' the controller (maybe by re-uploading
firmware or something like that).
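
For reference, the sketch below shows how the two parameters relate and how the patch computes the 0's based Number of Queues result (NSQA in bits 15:0, NCQA in bits 31:16). It is hypothetical and stand-alone. Note that if max_ioqpairs=0 were ever allowed, max_ioqpairs - 1 would wrap around, so that path would need a guard.

```c
#include <assert.h>
#include <stdint.h>

/* The deprecated num_queues counts the admin pair as well. */
static uint32_t nvme_ioqpairs_from_num_queues(uint32_t num_queues)
{
    return num_queues - 1; /* caller must ensure num_queues >= 1 */
}

/* Number of Queues feature, Dword 0: both counts are 0's based. */
static uint32_t nvme_numq_result(uint32_t max_ioqpairs)
{
    uint32_t n = max_ioqpairs - 1; /* requires max_ioqpairs >= 1 */

    return (n & 0xffff) | ((n & 0xffff) << 16);
}
```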


>  
>      if (!n->conf.blk) {
> @@ -1365,19 +1372,19 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      pcie_endpoint_cap_init(pci_dev, 0x80);
>  
>      n->num_namespaces = 1;
> -    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
> +    n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);

I hate to say it, but it looks like this thing (which I mentioned to you in V5)
was a pre-existing bug, which is indeed fixed now.
In theory such fixes should go into separate patches, but in this case I guess it would
be too much to ask.
Maybe mention this in the commit message instead, so that this fix doesn't stay hidden?
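
Here is a quick check of the arithmetic, assuming a doorbell stride of 4 bytes (CAP.DSTRD = 0). The doorbells start at 0x1000, and each queue pair, the admin pair included, uses an SQ tail and a CQ head doorbell. The doorbell area therefore ends at 0x1000 + 2 * (max_ioqpairs + 1) * 4, which equals the new 0x1008 + 2 * max_ioqpairs * 4. pow2ceil_u64() is a stand-in for QEMU's pow2ceil().

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v (v must not exceed 2^63). */
static uint64_t pow2ceil_u64(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* End of the doorbell area: pair 0 is the admin queue pair. */
static uint64_t nvme_doorbell_end(uint32_t max_ioqpairs)
{
    return 0x1000 + 2ull * (max_ioqpairs + 1ull) * 4;
}
```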


>      n->ns_size = bs_size / (uint64_t)n->num_namespaces;
>  
>      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> -    n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
> -    n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
> +    n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
> +    n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
>  
>      memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
>                            "nvme", n->reg_size);
>      pci_register_bar(pci_dev, 0,
>          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>          &n->iomem);
> -    msix_init_exclusive_bar(pci_dev, n->params.num_queues, 4, NULL);
> +    msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
>  
>      id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
>      id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 9957c4a200e2..98f5b9479244 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -6,11 +6,13 @@
>  #define DEFINE_NVME_PROPERTIES(_state, _props) \
>      DEFINE_PROP_STRING("serial", _state, _props.serial), \
>      DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
> -    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64)
> +    DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 0), \
> +    DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64)
>  
>  typedef struct NvmeParams {
>      char     *serial;
>      uint32_t num_queues;
> +    uint32_t max_ioqpairs;
>      uint32_t cmb_size_mb;
>  } NvmeParams;
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky









* Re: [PATCH v6 10/42] nvme: refactor device realization
  2020-03-16 14:28 ` [PATCH v6 10/42] nvme: refactor device realization Klaus Jensen
@ 2020-03-25 10:40   ` Maxim Levitsky
  2020-03-31  5:40     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:40 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> This patch splits up nvme_realize into multiple individual functions,
> each initializing a different subset of the device.
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c | 178 ++++++++++++++++++++++++++++++------------------
>  hw/block/nvme.h |  23 ++++++-
>  2 files changed, 134 insertions(+), 67 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 7dfd8a1a392d..665485045066 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -44,6 +44,8 @@
>  #include "trace.h"
>  #include "nvme.h"
>  
> +#define NVME_CMB_BIR 2
> +
>  #define NVME_GUEST_ERR(trace, fmt, ...) \
>      do { \
>          (trace_##trace)(__VA_ARGS__); \
> @@ -63,7 +65,7 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>  
>  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
> -    if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> +    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
>          memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
>          return;
>      }
> @@ -157,7 +159,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>      if (unlikely(!prp1)) {
>          trace_nvme_dev_err_invalid_prp();
>          return NVME_INVALID_FIELD | NVME_DNR;
> -    } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
> +    } else if (n->bar.cmbsz && prp1 >= n->ctrl_mem.addr &&
>                 prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
>          qsg->nsg = 0;
>          qemu_iovec_init(iov, num_prps);
> @@ -1324,14 +1326,9 @@ static const MemoryRegionOps nvme_cmb_ops = {
>      },
>  };
>  
> -static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> +static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
>  {
> -    NvmeCtrl *n = NVME(pci_dev);
> -    NvmeIdCtrl *id = &n->id_ctrl;
> -
> -    int i;
> -    int64_t bs_size;
> -    uint8_t *pci_conf;
> +    NvmeParams *params = &n->params;
>  
>      if (n->params.num_queues) {
>          warn_report("nvme: num_queues is deprecated; please use max_ioqpairs "
> @@ -1340,57 +1337,100 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>          n->params.max_ioqpairs = n->params.num_queues - 1;
>      }
>  
> -    if (!n->params.max_ioqpairs) {
> -        error_setg(errp, "max_ioqpairs can't be less than 1");
> +    if (params->max_ioqpairs < 1 ||
> +        params->max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
> +        error_setg(errp, "nvme: max_ioqpairs must be ");
It looks like the error message is incomplete now.
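
Here is a sketch of what a complete check might look like. It uses snprintf instead of QEMU's error_setg() so it can stand alone, and the message wording is only a suggestion. The upper bound follows from MSI-X: the table size field is encoded as N - 1 in 11 bits (PCI_MSIX_FLAGS_QSIZE = 0x7ff). That allows at most 2048 vectors: one for the admin queue plus 2047 for I/O queues.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_MSIX_FLAGS_QSIZE 0x07FF /* as in the Linux/QEMU pci_regs.h */

/* Hypothetical stand-alone check: 0 on success, -1 with msg filled in. */
static int check_max_ioqpairs(uint32_t max_ioqpairs, char *msg, size_t msglen)
{
    if (max_ioqpairs < 1 || max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
        snprintf(msg, msglen, "nvme: max_ioqpairs must be between 1 and %d",
                 PCI_MSIX_FLAGS_QSIZE);
        return -1;
    }
    return 0;
}
```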
> +        return -1;
>      }
>  
>      if (!n->conf.blk) {
> -        error_setg(errp, "drive property not set");
> -        return;
> +        error_setg(errp, "nvme: block backend not configured");
> +        return -1;
>      }
>  
> -    bs_size = blk_getlength(n->conf.blk);
> -    if (bs_size < 0) {
> -        error_setg(errp, "could not get backing file size");
> -        return;
> +    if (!params->serial) {
> +        error_setg(errp, "nvme: serial not configured");
> +        return -1;
>      }
>  
> -    if (!n->params.serial) {
> -        error_setg(errp, "serial property not set");
> -        return;
> -    }
> +    return 0;
> +}
> +
> +static int nvme_init_blk(NvmeCtrl *n, Error **errp)
> +{
>      blkconf_blocksizes(&n->conf);
>      if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
>                                         false, errp)) {
> -        return;
> +        return -1;
>      }
>  
> -    pci_conf = pci_dev->config;
> -    pci_conf[PCI_INTERRUPT_PIN] = 1;
> -    pci_config_set_prog_interface(pci_dev->config, 0x2);
> -    pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
> -    pcie_endpoint_cap_init(pci_dev, 0x80);
> +    return 0;
> +}
>  
> +static void nvme_init_state(NvmeCtrl *n)
> +{
>      n->num_namespaces = 1;
>      n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
> -    n->ns_size = bs_size / (uint64_t)n->num_namespaces;
> -
>      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>      n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
>      n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
> +}
>  
> -    memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
> -                          "nvme", n->reg_size);
> -    pci_register_bar(pci_dev, 0,
> -        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> -        &n->iomem);
> +static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
> +{
> +    NVME_CMBLOC_SET_BIR(n->bar.cmbloc, NVME_CMB_BIR);
> +    NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
> +
> +    NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
> +    NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
> +    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
> +    NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
> +    NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
> +    NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2);
> +    NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
> +
> +    n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
> +    memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
> +                          "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
> +    pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
> +                     PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                     PCI_BASE_ADDRESS_MEM_TYPE_64 |
> +                     PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
> +}
> +
> +static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
> +{
> +    uint8_t *pci_conf = pci_dev->config;
> +
> +    pci_conf[PCI_INTERRUPT_PIN] = 1;
> +    pci_config_set_prog_interface(pci_conf, 0x2);
> +    pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
> +    pci_config_set_device_id(pci_conf, 0x5845);
> +    pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
> +    pcie_endpoint_cap_init(pci_dev, 0x80);
> +
> +    memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
> +                          n->reg_size);
> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                     PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem);
>      msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
>  
> +    if (n->params.cmb_size_mb) {
> +        nvme_init_cmb(n, pci_dev);
> +    }
> +}
> +
> +static void nvme_init_ctrl(NvmeCtrl *n)
> +{
> +    NvmeIdCtrl *id = &n->id_ctrl;
> +    NvmeParams *params = &n->params;
> +    uint8_t *pci_conf = n->parent_obj.config;
> +
>      id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
>      id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
>      strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
>      strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
> -    strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
> +    strpadcpy((char *)id->sn, sizeof(id->sn), params->serial, ' ');
>      id->rab = 6;
>      id->ieee[0] = 0x00;
>      id->ieee[1] = 0x02;
> @@ -1431,46 +1471,54 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>  
>      n->bar.vs = 0x00010200;
>      n->bar.intmc = n->bar.intms = 0;
> +}
>  
> -    if (n->params.cmb_size_mb) {
> +static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> +{
> +    int64_t bs_size;
> +    NvmeIdNs *id_ns = &ns->id_ns;
>  
> -        NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
> -        NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
> +    bs_size = blk_getlength(n->conf.blk);
> +    if (bs_size < 0) {
> +        error_setg_errno(errp, -bs_size, "blk_getlength");
> +        return -1;
> +    }
>  
> -        NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
> -        NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
> -        NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
> -        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
> -        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
> -        NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
> -        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
> +    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> +    n->ns_size = bs_size;
>  
> -        n->cmbloc = n->bar.cmbloc;
> -        n->cmbsz = n->bar.cmbsz;
> +    id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(n, ns));
>  
> -        n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
> -        memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
> -                              "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
> -        pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
> -            PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
> -            PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
> +    /* no thin provisioning */
> +    id_ns->ncap = id_ns->nsze;
> +    id_ns->nuse = id_ns->ncap;
>  
> +    return 0;
> +}
> +
> +static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> +{
> +    NvmeCtrl *n = NVME(pci_dev);
> +    int i;
> +
> +    if (nvme_check_constraints(n, errp)) {
> +        return;
> +    }
> +
> +    nvme_init_state(n);
> +
> +    if (nvme_init_blk(n, errp)) {
> +        return;
>      }
>  
>      for (i = 0; i < n->num_namespaces; i++) {
> -        NvmeNamespace *ns = &n->namespaces[i];
> -        NvmeIdNs *id_ns = &ns->id_ns;
> -        id_ns->nsfeat = 0;
> -        id_ns->nlbaf = 0;
> -        id_ns->flbas = 0;
> -        id_ns->mc = 0;
> -        id_ns->dpc = 0;
> -        id_ns->dps = 0;
> -        id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> -        id_ns->ncap  = id_ns->nuse = id_ns->nsze =
> -            cpu_to_le64(n->ns_size >>
> -                id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas)].ds);
> +        if (nvme_init_namespace(n, &n->namespaces[i], errp)) {
> +            return;
> +        }
>      }
> +
> +    nvme_init_pci(n, pci_dev);
> +    nvme_init_ctrl(n);
>  }
>  
>  static void nvme_exit(PCIDevice *pci_dev)
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 98f5b9479244..b7c465560eea 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -67,6 +67,22 @@ typedef struct NvmeNamespace {
>      NvmeIdNs        id_ns;
>  } NvmeNamespace;
>  
> +static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
> +{
> +    NvmeIdNs *id_ns = &ns->id_ns;
> +    return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
> +}
> +
> +static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
> +{
> +    return nvme_ns_lbaf(ns)->ds;
> +}
> +
> +static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
> +{
> +    return 1 << nvme_ns_lbads(ns);
> +}
> +
>  #define TYPE_NVME "nvme"
>  #define NVME(obj) \
>          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
> @@ -88,8 +104,6 @@ typedef struct NvmeCtrl {
>      uint32_t    num_namespaces;
>      uint32_t    max_q_ents;
>      uint64_t    ns_size;
> -    uint32_t    cmbsz;
> -    uint32_t    cmbloc;
>      uint8_t     *cmbuf;
>      uint64_t    irq_status;
>      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
> @@ -103,4 +117,9 @@ typedef struct NvmeCtrl {
>      NvmeIdCtrl      id_ctrl;
>  } NvmeCtrl;
>  
> +static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
> +{
> +    return n->ns_size >> nvme_ns_lbads(ns);
> +}
> +
>  #endif /* HW_NVME_H */

Small nitpick: to be honest, this is not only a refactoring of the device realization, since you also (rightfully)
removed the duplicated cmbsz/cmbloc fields, so I would mention this in the commit message.
But that doesn't matter much, so

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
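
The namespace sizing in nvme_init_namespace() reduces to one shift: NSZE is the backing size in bytes shifted right by LBADS, the log2 of the LBA data size in the LBA format selected by FLBAS bits 3:0. The standalone sketch below reproduces that arithmetic; the struct and function names are hypothetical stand-ins for NvmeIdNs, NvmeLBAF and the nvme_ns_* helpers, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for NvmeLBAF. */
typedef struct {
    uint8_t ds;             /* LBA data size as a power of two (9 => 512 B) */
} LbaFormat;

/* Hypothetical, trimmed-down stand-in for NvmeIdNs. */
typedef struct {
    uint8_t   flbas;        /* bits 3:0 select the active LBA format */
    LbaFormat lbaf[16];
} IdNs;

/* log2 of the block size of the active LBA format. */
static uint8_t ns_lbads(const IdNs *id_ns)
{
    return id_ns->lbaf[id_ns->flbas & 0xf].ds;
}

/* Block size of the active LBA format in bytes. */
static size_t ns_lbads_bytes(const IdNs *id_ns)
{
    return (size_t)1 << ns_lbads(id_ns);
}

/* Number of logical blocks backed by 'size_bytes' (NSZE = NCAP = NUSE). */
static uint64_t ns_nlbas(uint64_t size_bytes, const IdNs *id_ns)
{
    return size_bytes >> ns_lbads(id_ns);
}
```

With the single 512-byte format the device advertises (LBADS = BDRV_SECTOR_BITS = 9), a 1 GiB image gives 2^21 logical blocks, and since there is no thin provisioning, NCAP and NUSE take the same value.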

* Re: [PATCH v6 11/42] nvme: add temperature threshold feature
  2020-03-16 14:28 ` [PATCH v6 11/42] nvme: add temperature threshold feature Klaus Jensen
@ 2020-03-25 10:40   ` Maxim Levitsky
  2020-03-31  5:40     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:40 UTC
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> It might seem weird to implement this feature for an emulated device,
> but it is mandatory to support, and the feature is useful for testing
> asynchronous event request support, which will be added in a later
> patch.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c      | 48 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/block/nvme.h      |  2 ++
>  include/block/nvme.h |  8 +++++++-
>  3 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 665485045066..64c42101df5c 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -45,6 +45,9 @@
>  #include "nvme.h"
>  
>  #define NVME_CMB_BIR 2
> +#define NVME_TEMPERATURE 0x143
> +#define NVME_TEMPERATURE_WARNING 0x157
> +#define NVME_TEMPERATURE_CRITICAL 0x175
>  
>  #define NVME_GUEST_ERR(trace, fmt, ...) \
>      do { \
> @@ -798,9 +801,31 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
>      uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> +    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
>      uint32_t result;
>  
>      switch (dw10) {
> +    case NVME_TEMPERATURE_THRESHOLD:
> +        result = 0;
> +
> +        /*
> +         * The controller only implements the Composite Temperature sensor, so
> +         * return 0 for all other sensors.
> +         */
> +        if (NVME_TEMP_TMPSEL(dw11)) {
> +            break;
> +        }
> +
> +        switch (NVME_TEMP_THSEL(dw11)) {
> +        case 0x0:
> +            result = cpu_to_le16(n->features.temp_thresh_hi);
> +            break;
> +        case 0x1:
> +            result = cpu_to_le16(n->features.temp_thresh_low);
> +            break;
> +        }
> +
> +        break;
>      case NVME_VOLATILE_WRITE_CACHE:
>          result = blk_enable_write_cache(n->conf.blk);
>          trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
> @@ -845,6 +870,23 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      uint32_t dw11 = le32_to_cpu(cmd->cdw11);
>  
>      switch (dw10) {
> +    case NVME_TEMPERATURE_THRESHOLD:
> +        if (NVME_TEMP_TMPSEL(dw11)) {
> +            break;
> +        }
> +
> +        switch (NVME_TEMP_THSEL(dw11)) {
> +        case 0x0:
> +            n->features.temp_thresh_hi = NVME_TEMP_TMPTH(dw11);
> +            break;
> +        case 0x1:
> +            n->features.temp_thresh_low = NVME_TEMP_TMPTH(dw11);
> +            break;
> +        default:
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +        }
> +
> +        break;
>      case NVME_VOLATILE_WRITE_CACHE:
>          blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
>          break;
> @@ -1374,6 +1416,7 @@ static void nvme_init_state(NvmeCtrl *n)
>      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>      n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
>      n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
> +    n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
>  }
>  
>  static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
> @@ -1451,6 +1494,11 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->acl = 3;
>      id->frmw = 7 << 1;
>      id->lpa = 1 << 0;
> +
> +    /* recommended default value (~70 C) */
> +    id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
> +    id->cctemp = cpu_to_le16(NVME_TEMPERATURE_CRITICAL);
> +
>      id->sqes = (0x6 << 4) | 0x6;
>      id->cqes = (0x4 << 4) | 0x4;
>      id->nn = cpu_to_le32(n->num_namespaces);
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index b7c465560eea..8cda5f02c622 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -108,6 +108,7 @@ typedef struct NvmeCtrl {
>      uint64_t    irq_status;
>      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
>      uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
> +    uint16_t    temperature;
You forgot to move this too; the field is not used until the get log page patch.

>  
>      NvmeNamespace   *namespaces;
>      NvmeSQueue      **sq;
> @@ -115,6 +116,7 @@ typedef struct NvmeCtrl {
>      NvmeSQueue      admin_sq;
>      NvmeCQueue      admin_cq;
>      NvmeIdCtrl      id_ctrl;
> +    NvmeFeatureVal  features;
>  } NvmeCtrl;
>  
>  static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index a083c1b3a613..91fc4738a3e0 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -688,7 +688,13 @@ enum NvmeIdCtrlOncs {
>  typedef struct NvmeFeatureVal {
>      uint32_t    arbitration;
>      uint32_t    power_mgmt;
> -    uint32_t    temp_thresh;
> +    union {
> +        struct {
> +            uint16_t temp_thresh_hi;
> +            uint16_t temp_thresh_low;
> +        };
> +        uint32_t temp_thresh;
> +    };
>      uint32_t    err_rec;
>      uint32_t    volatile_wc;
>      uint32_t    num_queues;

With the 'temperature' field removed from the header:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
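
For reference, the Temperature Threshold feature packs three fields into CDW11 (per the NVMe specification: TMPTH in bits 15:0, in Kelvin; TMPSEL in bits 19:16, where 0h selects the Composite Temperature; THSEL in bits 21:20, where 00b is the over-temperature and 01b the under-temperature threshold). The sketch below decodes them. The TEMP_* macros only mirror the NVME_TEMP_* accessors used in the patch, and temp_dw11() and kelvin_to_celsius() are hypothetical helpers. The conversion also shows that the constants in the patch correspond to roughly 50, 70 and 100 degrees C.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative CDW11 field accessors for Feature 04h (Temperature Threshold). */
#define TEMP_TMPTH(dw11)   ((uint16_t)((dw11) & 0xffff))   /* threshold, Kelvin */
#define TEMP_TMPSEL(dw11)  (((dw11) >> 16) & 0xf)          /* 0h = Composite */
#define TEMP_THSEL(dw11)   (((dw11) >> 20) & 0x3)          /* 0 = over, 1 = under */

/* Build a CDW11 value for a Set Features command (hypothetical helper). */
static uint32_t temp_dw11(uint16_t tmpth, uint8_t tmpsel, uint8_t thsel)
{
    return ((uint32_t)(thsel & 0x3) << 20) |
           ((uint32_t)(tmpsel & 0xf) << 16) | tmpth;
}

/* Whole-degree approximation: Celsius = Kelvin - 273 (ignoring the .15). */
static int kelvin_to_celsius(uint16_t kelvin)
{
    return (int)kelvin - 273;
}
```

A host setting the under-temperature threshold of the composite sensor would send temp_dw11(tmpth, 0, 1); the device code above stores it in temp_thresh_low.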

* Re: [PATCH v6 12/42] nvme: add support for the get log page command
  2020-03-16 14:28 ` [PATCH v6 12/42] nvme: add support for the get log page command Klaus Jensen
@ 2020-03-25 10:40   ` Maxim Levitsky
  2020-03-31  5:41     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:40 UTC
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Add support for the Get Log Page command and basic implementations of
> the mandatory Error Information, SMART / Health Information and Firmware
> Slot Information log pages.
> 
> In violation of the specification, the SMART / Health Information log
> page does not persist information over the lifetime of the controller
> because the device has no place to store such persistent state.
> 
> Note that the LPA field in the Identify Controller data structure
> intentionally has bit 0 cleared because there is no namespace specific
> information in the SMART / Health information log page.
> 
> Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> Section 5.10 ("Get Log Page command").
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c       | 138 +++++++++++++++++++++++++++++++++++++++++-
>  hw/block/nvme.h       |  10 +++
>  hw/block/trace-events |   2 +
>  3 files changed, 149 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 64c42101df5c..83ff3fbfb463 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -569,6 +569,138 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
>      return NVME_SUCCESS;
>  }
>  
> +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> +                                uint64_t off, NvmeRequest *req)
> +{
> +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> +    uint32_t nsid = le32_to_cpu(cmd->nsid);
> +
> +    uint32_t trans_len;
> +    time_t current_ms;
> +    uint64_t units_read = 0, units_written = 0;
> +    uint64_t read_commands = 0, write_commands = 0;
> +    NvmeSmartLog smart;
> +    BlockAcctStats *s;
> +
> +    if (nsid && nsid != 0xffffffff) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    s = blk_get_stats(n->conf.blk);
> +
> +    units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> +    units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
> +    read_commands = s->nr_ops[BLOCK_ACCT_READ];
> +    write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
> +
> +    if (off > sizeof(smart)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    trans_len = MIN(sizeof(smart) - off, buf_len);
> +
> +    memset(&smart, 0x0, sizeof(smart));
> +
> +    smart.data_units_read[0] = cpu_to_le64(units_read / 1000);
> +    smart.data_units_written[0] = cpu_to_le64(units_written / 1000);
> +    smart.host_read_commands[0] = cpu_to_le64(read_commands);
> +    smart.host_write_commands[0] = cpu_to_le64(write_commands);
> +
> +    smart.temperature[0] = n->temperature & 0xff;
> +    smart.temperature[1] = (n->temperature >> 8) & 0xff;
> +
> +    if ((n->temperature > n->features.temp_thresh_hi) ||
> +        (n->temperature < n->features.temp_thresh_low)) {
> +        smart.critical_warning |= NVME_SMART_TEMPERATURE;
> +    }
> +
> +    current_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> +    smart.power_on_hours[0] =
> +        cpu_to_le64((((current_ms - n->starttime_ms) / 1000) / 60) / 60);
Oh, I didn't notice that you didn't have the endian conversion in v5; it is of course
needed here.
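
The need for the conversion is easiest to see with a portable stand-in for cpu_to_le64(): SMART log fields are little-endian in the buffer handed to the guest, regardless of host byte order. A minimal sketch, with hypothetical helper names, that computes power-on hours from two millisecond timestamps the same way the patch does and stores the result byte by byte:

```c
#include <assert.h>
#include <stdint.h>

/* Whole hours elapsed between two millisecond timestamps (truncating). */
static uint64_t power_on_hours(int64_t now_ms, int64_t start_ms)
{
    return (uint64_t)(now_ms - start_ms) / 1000 / 60 / 60;
}

/*
 * Store a 64-bit value little-endian, independent of host byte order.
 * This is the effect cpu_to_le64() has on a field that is later copied
 * verbatim into guest memory.
 */
static void store_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}
```

On a little-endian host cpu_to_le64() is a no-op, which is why the missing conversion would only show up on big-endian hosts.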

> +
> +    return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
> +                             prp2);
> +}
> +
> +static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> +                                 uint64_t off, NvmeRequest *req)
> +{
> +    uint32_t trans_len;
> +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> +    NvmeFwSlotInfoLog fw_log;
> +
> +    if (off > sizeof(fw_log)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    memset(&fw_log, 0, sizeof(NvmeFwSlotInfoLog));
> +
> +    trans_len = MIN(sizeof(fw_log) - off, buf_len);
> +
> +    return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1,
> +                             prp2);
> +}
> +
> +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> +                                uint64_t off, NvmeRequest *req)
> +{
> +    uint32_t trans_len;
> +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> +    uint8_t errlog[64];
I would replace this with sizeof(NvmeErrorLogEntry)
(and add NvmeErrorLogEntry to nvme.h), just for the sake of consistency
and in case we end up reporting errors to the log in the future.
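
A sketch of what such an entry could look like, following the 64-byte Error Information Log Entry layout in the NVMe 1.3 specification. The type name follows the suggestion above; the field names are illustrative only. With natural alignment the layout has no padding, and the static assertions would catch it if a change introduced any.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Error Information Log Entry (Log Identifier 01h), 64 bytes. */
typedef struct NvmeErrorLogEntry {
    uint64_t error_count;           /* bytes 07:00 */
    uint16_t sqid;                  /* bytes 09:08 */
    uint16_t cid;                   /* bytes 11:10 */
    uint16_t status_field;          /* bytes 13:12 */
    uint16_t param_error_location;  /* bytes 15:14 */
    uint64_t lba;                   /* bytes 23:16 */
    uint32_t nsid;                  /* bytes 27:24 */
    uint8_t  vs;                    /* byte 28: vendor specific info available */
    uint8_t  rsvd29[3];             /* bytes 31:29 */
    uint64_t csi;                   /* bytes 39:32: command specific info */
    uint8_t  rsvd40[24];            /* bytes 63:40 */
} NvmeErrorLogEntry;

static_assert(offsetof(NvmeErrorLogEntry, lba) == 16, "LBA at byte 16");
static_assert(offsetof(NvmeErrorLogEntry, csi) == 32, "CSI at byte 32");
static_assert(sizeof(NvmeErrorLogEntry) == 64, "entry must be 64 bytes");
```

The buffer in nvme_error_info() could then be declared as a zeroed NvmeErrorLogEntry, with sizeof() used for the offset check and the transfer length.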


> +
> +    if (off > sizeof(errlog)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    memset(errlog, 0x0, sizeof(errlog));
> +
> +    trans_len = MIN(sizeof(errlog) - off, buf_len);
> +
> +    return nvme_dma_read_prp(n, errlog, trans_len, prp1, prp2);
> +}
Besides this, it looks good now.

> +
> +static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +    uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> +    uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> +    uint32_t dw12 = le32_to_cpu(cmd->cdw12);
> +    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> +    uint8_t  lid = dw10 & 0xff;
> +    uint8_t  rae = (dw10 >> 15) & 0x1;
> +    uint32_t numdl, numdu;
> +    uint64_t off, lpol, lpou;
> +    size_t   len;
> +
> +    numdl = (dw10 >> 16);
> +    numdu = (dw11 & 0xffff);
> +    lpol = dw12;
> +    lpou = dw13;
> +
> +    len = (((numdu << 16) | numdl) + 1) << 2;
> +    off = (lpou << 32ULL) | lpol;
> +
> +    if (off & 0x3) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    trace_nvme_dev_get_log(nvme_cid(req), lid, rae, len, off);
> +
> +    switch (lid) {
> +    case NVME_LOG_ERROR_INFO:
> +        return nvme_error_info(n, cmd, len, off, req);
> +    case NVME_LOG_SMART_INFO:
> +        return nvme_smart_info(n, cmd, len, off, req);
> +    case NVME_LOG_FW_SLOT_INFO:
> +        return nvme_fw_log_info(n, cmd, len, off, req);
> +    default:
> +        trace_nvme_dev_err_invalid_log_page(nvme_cid(req), lid);
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +}
> +
>  static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
>  {
>      n->cq[cq->cqid] = NULL;
> @@ -914,6 +1046,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          return nvme_del_sq(n, cmd);
>      case NVME_ADM_CMD_CREATE_SQ:
>          return nvme_create_sq(n, cmd);
> +    case NVME_ADM_CMD_GET_LOG_PAGE:
> +        return nvme_get_log(n, cmd, req);
>      case NVME_ADM_CMD_DELETE_CQ:
>          return nvme_del_cq(n, cmd);
>      case NVME_ADM_CMD_CREATE_CQ:
> @@ -1416,7 +1550,9 @@ static void nvme_init_state(NvmeCtrl *n)
>      n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>      n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
>      n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
> +    n->temperature = NVME_TEMPERATURE;
>      n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
> +    n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
>  }
>  
>  static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
> @@ -1493,7 +1629,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>       */
>      id->acl = 3;
>      id->frmw = 7 << 1;
> -    id->lpa = 1 << 0;
> +    id->lpa = 1 << 2;
>  
>      /* recommended default value (~70 C) */
>      id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 8cda5f02c622..ebeee2edc4f4 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -109,6 +109,7 @@ typedef struct NvmeCtrl {
>      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
>      uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
>      uint16_t    temperature;
> +    uint64_t    starttime_ms;
>  
>      NvmeNamespace   *namespaces;
>      NvmeSQueue      **sq;
> @@ -124,4 +125,13 @@ static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
>      return n->ns_size >> nvme_ns_lbads(ns);
>  }
>  
> +static inline uint16_t nvme_cid(NvmeRequest *req)
> +{
> +    if (req) {
> +        return le16_to_cpu(req->cqe.cid);
> +    }
> +
> +    return 0xffff;
> +}
> +
>  #endif /* HW_NVME_H */
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index ade506ea2bb2..7da088479f39 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -46,6 +46,7 @@ nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
>  nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
>  nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
>  nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
> +nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
>  nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
> @@ -85,6 +86,7 @@ nvme_dev_err_invalid_create_cq_qflags(uint16_t qflags) "failed creating completi
>  nvme_dev_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx16""
>  nvme_dev_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32""
>  nvme_dev_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32""
> +nvme_dev_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 0x%"PRIx16""
>  nvme_dev_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues"
>  nvme_dev_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues"
>  nvme_dev_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null"


Best regards,
	Maxim Levitsky
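
To make the size and offset arithmetic in nvme_get_log() concrete: NUMD is a 0's based dword count split across CDW10 bits 31:16 (NUMDL) and CDW11 bits 15:0 (NUMDU), and the log page offset is a byte offset split across CDW12 (LPOL) and CDW13 (LPOU); LID sits in CDW10 bits 7:0 and RAE in bit 15. The sketch below uses hypothetical names and no QEMU dependencies. It mirrors that decoding plus the offset clamp done by the individual log page handlers, computing the length in 64 bits so the largest NUMD value does not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint8_t log_lid(uint32_t dw10) { return dw10 & 0xff; }
static bool    log_rae(uint32_t dw10) { return (dw10 >> 15) & 0x1; }

/* Transfer length in bytes: (NUMDU:NUMDL + 1) dwords. */
static uint64_t log_len(uint32_t dw10, uint32_t dw11)
{
    uint64_t numdl = dw10 >> 16;
    uint64_t numdu = dw11 & 0xffff;

    return (((numdu << 16) | numdl) + 1) << 2;
}

/* Byte offset into the log page: LPOU:LPOL. */
static uint64_t log_off(uint32_t dw12, uint32_t dw13)
{
    return ((uint64_t)dw13 << 32) | dw12;
}

/*
 * Bytes to copy from a log of 'log_size' bytes starting at 'off', or false
 * if the offset is invalid (not dword aligned, or past the end of the log).
 */
static bool log_trans_len(uint64_t log_size, uint64_t off, uint64_t buf_len,
                          uint64_t *trans_len)
{
    if ((off & 0x3) || off > log_size) {
        return false;
    }
    *trans_len = (log_size - off < buf_len) ? log_size - off : buf_len;
    return true;
}
```

For example, reading the 512-byte SMART / Health log in one go means LID 02h with NUMDL = 127; a request starting at offset 500 then only returns the remaining 12 bytes.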

* Re: [PATCH v6 13/42] nvme: add support for the asynchronous event request command
  2020-03-16 14:28 ` [PATCH v6 13/42] nvme: add support for the asynchronous event request command Klaus Jensen
@ 2020-03-25 10:41   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:41 UTC
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> Section 5.2 ("Asynchronous Event Request command").
> 
> Mostly imported from Keith's qemu-nvme tree. Modified to enforce a
> maximum number of queued events (controllable with the aer_max_queued
> device parameter). The spec states that the controller *should* retain
> events, so we make a best effort here.
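
The interplay between the bounded event queue, the outstanding AER commands and per-type masking (implemented below in nvme_enqueue_event(), nvme_process_aers(), nvme_aer() and nvme_clear_events()) can be modelled in isolation. The following is a simplified, hypothetical sketch: it tracks only event types, ignores the CQE contents and which stored request gets completed, and uses small fixed limits in place of the aerl and aer_max_queued parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EV_TYPE_ERROR   0   /* Asynchronous Event Type 000b */
#define EV_TYPE_SMART   1   /* Asynchronous Event Type 001b */
#define MAX_QUEUED      4   /* stands in for the aer_max_queued parameter */
#define MAX_OUTSTANDING 4   /* stands in for aerl + 1 (AERL is 0's based) */

typedef struct {
    uint8_t  type[MAX_QUEUED];  /* queued event types, oldest first */
    unsigned queued;
    unsigned outstanding;       /* AER commands waiting for an event */
    uint32_t mask;              /* types reported but not yet cleared */
    unsigned posted;            /* completions posted so far */
} AerModel;

/* Complete outstanding AERs with queued events whose type is not masked. */
static void aer_process(AerModel *m)
{
    unsigned i = 0;

    while (i < m->queued && m->outstanding > 0) {
        uint8_t t = m->type[i];

        if (m->mask & (1u << t)) {
            i++;                /* keep it queued until the host clears it */
            continue;
        }
        for (unsigned j = i + 1; j < m->queued; j++) {
            m->type[j - 1] = m->type[j];
        }
        m->queued--;
        m->mask |= 1u << t;     /* mask the type until its log is read */
        m->outstanding--;
        m->posted++;
    }
}

/* Queue an event; drop it (best effort) if the queue is full. */
static bool aer_enqueue(AerModel *m, uint8_t type)
{
    if (m->queued == MAX_QUEUED) {
        return false;
    }
    m->type[m->queued++] = type;
    aer_process(m);
    return true;
}

/* The host submitted an Asynchronous Event Request command. */
static bool aer_submit(AerModel *m)
{
    if (m->outstanding == MAX_OUTSTANDING) {
        return false;           /* Asynchronous Event Request Limit Exceeded */
    }
    m->outstanding++;
    aer_process(m);
    return true;
}

/* The host read the associated log page with RAE cleared. */
static void aer_clear(AerModel *m, uint8_t type)
{
    m->mask &= ~(1u << type);
    aer_process(m);
}
```

A masked event is not lost: it stays queued until the host reads the corresponding log page with RAE cleared, at which point it can complete the next outstanding AER.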
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/block/nvme.c       | 178 ++++++++++++++++++++++++++++++++++++++++--
>  hw/block/nvme.h       |  14 +++-
>  hw/block/trace-events |   9 +++
>  include/block/nvme.h  |   8 +-
>  4 files changed, 199 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 83ff3fbfb463..ff8975cd6667 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -325,6 +325,85 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
>      timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
>  }
>  
> +static void nvme_process_aers(void *opaque)
> +{
> +    NvmeCtrl *n = opaque;
> +    NvmeAsyncEvent *event, *next;
> +
> +    trace_nvme_dev_process_aers(n->aer_queued);
> +
> +    QTAILQ_FOREACH_SAFE(event, &n->aer_queue, entry, next) {
> +        NvmeRequest *req;
> +        NvmeAerResult *result;
> +
> +        /* can't post cqe if there is nothing to complete */
> +        if (!n->outstanding_aers) {
> +            trace_nvme_dev_no_outstanding_aers();
> +            break;
> +        }
> +
> +        /* ignore if masked (cqe posted, but event not cleared) */
> +        if (n->aer_mask & (1 << event->result.event_type)) {
> +            trace_nvme_dev_aer_masked(event->result.event_type, n->aer_mask);
> +            continue;
> +        }
> +
> +        QTAILQ_REMOVE(&n->aer_queue, event, entry);
> +        n->aer_queued--;
> +
> +        n->aer_mask |= 1 << event->result.event_type;
> +        n->outstanding_aers--;
> +
> +        req = n->aer_reqs[n->outstanding_aers];
> +
> +        result = (NvmeAerResult *) &req->cqe.result;
> +        result->event_type = event->result.event_type;
> +        result->event_info = event->result.event_info;
> +        result->log_page = event->result.log_page;
> +        g_free(event);
> +
> +        req->status = NVME_SUCCESS;
> +
> +        trace_nvme_dev_aer_post_cqe(result->event_type, result->event_info,
> +                                    result->log_page);
> +
> +        nvme_enqueue_req_completion(&n->admin_cq, req);
> +    }
> +}
> +
> +static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
> +                               uint8_t event_info, uint8_t log_page)
> +{
> +    NvmeAsyncEvent *event;
> +
> +    trace_nvme_dev_enqueue_event(event_type, event_info, log_page);
> +
> +    if (n->aer_queued == n->params.aer_max_queued) {
> +        trace_nvme_dev_enqueue_event_noqueue(n->aer_queued);
> +        return;
> +    }
> +
> +    event = g_new(NvmeAsyncEvent, 1);
> +    event->result = (NvmeAerResult) {
> +        .event_type = event_type,
> +        .event_info = event_info,
> +        .log_page   = log_page,
> +    };
> +
> +    QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry);
> +    n->aer_queued++;
> +
> +    nvme_process_aers(n);
> +}
> +
> +static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
> +{
> +    n->aer_mask &= ~(1 << event_type);
> +    if (!QTAILQ_EMPTY(&n->aer_queue)) {
> +        nvme_process_aers(n);
> +    }
> +}
> +
>  static void nvme_rw_cb(void *opaque, int ret)
>  {
>      NvmeRequest *req = opaque;
> @@ -569,8 +648,9 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
>      return NVME_SUCCESS;
>  }
>  
> -static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> -                                uint64_t off, NvmeRequest *req)
> +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
> +                                uint32_t buf_len, uint64_t off,
> +                                NvmeRequest *req)
>  {
>      uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
>      uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> @@ -619,6 +699,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
>      smart.power_on_hours[0] =
>          cpu_to_le64((((current_ms - n->starttime_ms) / 1000) / 60) / 60);
>  
> +    if (!rae) {
> +        nvme_clear_events(n, NVME_AER_TYPE_SMART);
> +    }
> +
>      return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
>                               prp2);
>  }
> @@ -643,14 +727,19 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
>                               prp2);
>  }
>  
> -static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> -                                uint64_t off, NvmeRequest *req)
> +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
> +                                uint32_t buf_len, uint64_t off,
> +                                NvmeRequest *req)
>  {
>      uint32_t trans_len;
>      uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
>      uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>      uint8_t errlog[64];
>  
> +    if (!rae) {
> +        nvme_clear_events(n, NVME_AER_TYPE_ERROR);
> +    }
> +
>      if (off > sizeof(errlog)) {
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
> @@ -690,9 +779,9 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  
>      switch (lid) {
>      case NVME_LOG_ERROR_INFO:
> -        return nvme_error_info(n, cmd, len, off, req);
> +        return nvme_error_info(n, cmd, rae, len, off, req);
>      case NVME_LOG_SMART_INFO:
> -        return nvme_smart_info(n, cmd, len, off, req);
> +        return nvme_smart_info(n, cmd, rae, len, off, req);
>      case NVME_LOG_FW_SLOT_INFO:
>          return nvme_fw_log_info(n, cmd, len, off, req);
>      default:
> @@ -969,6 +1058,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          break;
>      case NVME_TIMESTAMP:
>          return nvme_get_feature_timestamp(n, cmd);
> +    case NVME_ASYNCHRONOUS_EVENT_CONF:
> +        result = cpu_to_le32(n->features.async_config);
> +        break;
>      default:
>          trace_nvme_dev_err_invalid_getfeat(dw10);
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -1018,6 +1110,14 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>              return NVME_INVALID_FIELD | NVME_DNR;
>          }
>  
> +        if (((n->temperature > n->features.temp_thresh_hi) ||
> +            (n->temperature < n->features.temp_thresh_low)) &&
> +            NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) {
> +            nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
> +                               NVME_AER_INFO_SMART_TEMP_THRESH,
> +                               NVME_LOG_SMART_INFO);
> +        }
> +
>          break;
>      case NVME_VOLATILE_WRITE_CACHE:
>          blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
> @@ -1032,6 +1132,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          break;
>      case NVME_TIMESTAMP:
>          return nvme_set_feature_timestamp(n, cmd);
> +    case NVME_ASYNCHRONOUS_EVENT_CONF:
> +        n->features.async_config = dw11;
> +        break;
>      default:
>          trace_nvme_dev_err_invalid_setfeat(dw10);
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -1039,6 +1142,25 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      return NVME_SUCCESS;
>  }
>  
> +static uint16_t nvme_aer(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +    trace_nvme_dev_aer(nvme_cid(req));
> +
> +    if (n->outstanding_aers > n->params.aerl) {
> +        trace_nvme_dev_aer_aerl_exceeded();
> +        return NVME_AER_LIMIT_EXCEEDED;
> +    }
> +
> +    n->aer_reqs[n->outstanding_aers] = req;
> +    n->outstanding_aers++;
> +
> +    if (!QTAILQ_EMPTY(&n->aer_queue)) {
> +        nvme_process_aers(n);
> +    }
> +
> +    return NVME_NO_COMPLETE;
> +}
> +
>  static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
>      switch (cmd->opcode) {
> @@ -1060,6 +1182,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          return nvme_set_feature(n, cmd, req);
>      case NVME_ADM_CMD_GET_FEATURES:
>          return nvme_get_feature(n, cmd, req);
> +    case NVME_ADM_CMD_ASYNC_EV_REQ:
> +        return nvme_aer(n, cmd, req);
>      default:
>          trace_nvme_dev_err_invalid_admin_opc(cmd->opcode);
>          return NVME_INVALID_OPCODE | NVME_DNR;
> @@ -1114,6 +1238,15 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
>          }
>      }
>  
> +    while (!QTAILQ_EMPTY(&n->aer_queue)) {
> +        NvmeAsyncEvent *event = QTAILQ_FIRST(&n->aer_queue);
> +        QTAILQ_REMOVE(&n->aer_queue, event, entry);
> +        g_free(event);
> +    }
> +
> +    n->aer_queued = 0;
> +    n->outstanding_aers = 0;
> +
>      blk_flush(n->conf.blk);
>      n->bar.cc = 0;
>  }
> @@ -1210,6 +1343,8 @@ static int nvme_start_ctrl(NvmeCtrl *n)
>  
>      nvme_set_timestamp(n, 0ULL);
>  
> +    QTAILQ_INIT(&n->aer_queue);
> +
>      return 0;
>  }
>  
> @@ -1402,6 +1537,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>                             "completion queue doorbell write"
>                             " for nonexistent queue,"
>                             " sqid=%"PRIu32", ignoring", qid);
> +
> +            if (n->outstanding_aers) {
> +                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
> +                                   NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
> +                                   NVME_LOG_ERROR_INFO);
> +            }
> +
>              return;
>          }
>  
> @@ -1412,6 +1554,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>                             " beyond queue size, sqid=%"PRIu32","
>                             " new_head=%"PRIu16", ignoring",
>                             qid, new_head);
> +
> +            if (n->outstanding_aers) {
> +                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
> +                                   NVME_AER_INFO_ERR_INVALID_DB_VALUE,
> +                                   NVME_LOG_ERROR_INFO);
> +            }
> +
>              return;
>          }
>  
> @@ -1440,6 +1589,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>                             "submission queue doorbell write"
>                             " for nonexistent queue,"
>                             " sqid=%"PRIu32", ignoring", qid);
> +
> +            if (n->outstanding_aers) {
> +                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
> +                                   NVME_AER_INFO_ERR_INVALID_DB_REGISTER,
> +                                   NVME_LOG_ERROR_INFO);
> +            }
> +
>              return;
>          }
>  
> @@ -1450,6 +1606,13 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>                             " beyond queue size, sqid=%"PRIu32","
>                             " new_tail=%"PRIu16", ignoring",
>                             qid, new_tail);
> +
> +            if (n->outstanding_aers) {
> +                nvme_enqueue_event(n, NVME_AER_TYPE_ERROR,
> +                                   NVME_AER_INFO_ERR_INVALID_DB_VALUE,
> +                                   NVME_LOG_ERROR_INFO);
> +            }
> +
>              return;
>          }
>  
> @@ -1553,6 +1716,7 @@ static void nvme_init_state(NvmeCtrl *n)
>      n->temperature = NVME_TEMPERATURE;
>      n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
>      n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> +    n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
>  }
>  
>  static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
> @@ -1628,6 +1792,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>       * inconsequential.
>       */
>      id->acl = 3;
> +    id->aerl = n->params.aerl;
>      id->frmw = 7 << 1;
>      id->lpa = 1 << 2;
>  
> @@ -1713,6 +1878,7 @@ static void nvme_exit(PCIDevice *pci_dev)
>      g_free(n->namespaces);
>      g_free(n->cq);
>      g_free(n->sq);
> +    g_free(n->aer_reqs);
>  
>      if (n->params.cmb_size_mb) {
>          g_free(n->cmbuf);
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index ebeee2edc4f4..b709a8bb8d40 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -7,17 +7,21 @@
>      DEFINE_PROP_STRING("serial", _state, _props.serial), \
>      DEFINE_PROP_UINT32("cmb_size_mb", _state, _props.cmb_size_mb, 0), \
>      DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 0), \
> -    DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64)
> +    DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64), \
> +    DEFINE_PROP_UINT8("aerl", _state, _props.aerl, 3), \
> +    DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64)
>  
>  typedef struct NvmeParams {
>      char     *serial;
>      uint32_t num_queues;
>      uint32_t max_ioqpairs;
>      uint32_t cmb_size_mb;
> +    uint8_t  aerl;
> +    uint32_t aer_max_queued;
>  } NvmeParams;
>  
>  typedef struct NvmeAsyncEvent {
> -    QSIMPLEQ_ENTRY(NvmeAsyncEvent) entry;
> +    QTAILQ_ENTRY(NvmeAsyncEvent) entry;
>      NvmeAerResult result;
>  } NvmeAsyncEvent;
>  
> @@ -104,6 +108,7 @@ typedef struct NvmeCtrl {
>      uint32_t    num_namespaces;
>      uint32_t    max_q_ents;
>      uint64_t    ns_size;
> +    uint8_t     outstanding_aers;
>      uint8_t     *cmbuf;
>      uint64_t    irq_status;
>      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
> @@ -111,6 +116,11 @@ typedef struct NvmeCtrl {
>      uint16_t    temperature;
>      uint64_t    starttime_ms;
>  
> +    uint8_t     aer_mask;
> +    NvmeRequest **aer_reqs;
> +    QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
> +    int         aer_queued;
> +
>      NvmeNamespace   *namespaces;
>      NvmeSQueue      **sq;
>      NvmeCQueue      **cq;
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 7da088479f39..3952c36774cf 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -47,6 +47,15 @@ nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_
>  nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
>  nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
>  nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
> +nvme_dev_process_aers(int queued) "queued %d"
> +nvme_dev_aer(uint16_t cid) "cid %"PRIu16""
> +nvme_dev_aer_aerl_exceeded(void) "aerl exceeded"
> +nvme_dev_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
> +nvme_dev_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
> +nvme_dev_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
> +nvme_dev_enqueue_event_noqueue(int queued) "queued %d"
> +nvme_dev_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
> +nvme_dev_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
>  nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 91fc4738a3e0..f2a8b07c0f2f 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -425,8 +425,8 @@ enum NvmeAsyncEventRequest {
>      NVME_AER_TYPE_SMART                     = 1,
>      NVME_AER_TYPE_IO_SPECIFIC               = 6,
>      NVME_AER_TYPE_VENDOR_SPECIFIC           = 7,
> -    NVME_AER_INFO_ERR_INVALID_SQ            = 0,
> -    NVME_AER_INFO_ERR_INVALID_DB            = 1,
> +    NVME_AER_INFO_ERR_INVALID_DB_REGISTER   = 0,
> +    NVME_AER_INFO_ERR_INVALID_DB_VALUE      = 1,
>      NVME_AER_INFO_ERR_DIAG_FAIL             = 2,
>      NVME_AER_INFO_ERR_PERS_INTERNAL_ERR     = 3,
>      NVME_AER_INFO_ERR_TRANS_INTERNAL_ERR    = 4,
> @@ -717,6 +717,10 @@ typedef struct NvmeFeatureVal {
>  #define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
>  #define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
>  
> +#define NVME_AEC_SMART(aec)         (aec & 0xff)
> +#define NVME_AEC_NS_ATTR(aec)       ((aec >> 8) & 0x1)
> +#define NVME_AEC_FW_ACTIVATION(aec) ((aec >> 9) & 0x1)
> +
>  enum NvmeFeatureIds {
>      NVME_ARBITRATION                = 0x1,
>      NVME_POWER_MANAGEMENT           = 0x2,

The indentation issues indeed look like they are all fixed, and all the
other minor changes are OK as well. So,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
	Maxim Levitsky
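For illustration, a much simplified, standalone model of the AER bookkeeping this patch adds. It is not QEMU's code, and every name in it is hypothetical. It assumes aerl = 3, which allows 4 outstanding requests because AERL is a 0's based value (matching the aerl + 1 allocation of aer_reqs). It uses a small fixed event queue in place of the QTAILQ bounded by aer_max_queued, and it assumes that an event type stays masked after it is posted until the host reads the corresponding log page with RAE cleared.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EVENTS 8  /* hypothetical stand-in for aer_max_queued */
#define MAX_REQS   4  /* aerl + 1 with aerl = 3 (AERL is 0's based) */

typedef struct {
    uint8_t type, info, log_page;
} aer_event;

typedef struct {
    aer_event events[MAX_EVENTS]; /* FIFO of pending events */
    int       nevents;
    uint16_t  reqs[MAX_REQS];     /* command ids of outstanding AERs */
    int       nreqs;
    uint8_t   mask;               /* event types awaiting a log page read */
    int       completed;          /* AER completions posted so far */
} aer_state;

/* Pair pending, unmasked events with outstanding requests, oldest first. */
static void aer_process(aer_state *s)
{
    int i = 0;

    while (s->nreqs > 0 && i < s->nevents) {
        aer_event ev = s->events[i];

        if (s->mask & (1u << ev.type)) {
            i++; /* masked: leave it queued for later */
            continue;
        }
        s->mask |= 1u << ev.type;

        for (int j = i; j < s->nevents - 1; j++) { /* dequeue event i */
            s->events[j] = s->events[j + 1];
        }
        s->nevents--;

        for (int j = 0; j < s->nreqs - 1; j++) { /* complete oldest AER */
            s->reqs[j] = s->reqs[j + 1];
        }
        s->nreqs--;
        s->completed++;
    }
}

/* Returns -1 if the event queue is full and the event is dropped. */
static int aer_enqueue_event(aer_state *s, aer_event ev)
{
    if (s->nevents == MAX_EVENTS) {
        return -1;
    }
    s->events[s->nevents++] = ev;
    aer_process(s);
    return 0;
}

/* Returns -1 when more than aerl + 1 AERs would be outstanding. */
static int aer_submit(aer_state *s, uint16_t cid)
{
    if (s->nreqs == MAX_REQS) {
        return -1; /* Asynchronous Event Request Limit Exceeded */
    }
    s->reqs[s->nreqs++] = cid;
    aer_process(s);
    return 0;
}

/* Reading the log page with RAE = 0 unmasks that event type. */
static void aer_log_read(aer_state *s, uint8_t type)
{
    s->mask &= ~(1u << type);
    aer_process(s);
}
```

In this model an event that arrives with no outstanding AER simply waits in the queue, and a second event of the same type is held back until the log page is read.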

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v6 14/42] nvme: add missing mandatory features
  2020-03-16 14:29 ` [PATCH v6 14/42] nvme: add missing mandatory features Klaus Jensen
@ 2020-03-25 10:41   ` Maxim Levitsky
  2020-03-31  5:41     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:41 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Add support for returning a reasonable response to Get/Set Features of
> mandatory features.
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c       | 60 ++++++++++++++++++++++++++++++++++++++++++-
>  hw/block/trace-events |  2 ++
>  include/block/nvme.h  |  6 ++++-
>  3 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ff8975cd6667..eb9c722df968 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1025,7 +1025,15 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      uint32_t dw11 = le32_to_cpu(cmd->cdw11);
>      uint32_t result;
>  
> +    trace_nvme_dev_getfeat(nvme_cid(req), dw10);
> +
>      switch (dw10) {
> +    case NVME_ARBITRATION:
> +        result = cpu_to_le32(n->features.arbitration);
> +        break;
> +    case NVME_POWER_MANAGEMENT:
> +        result = cpu_to_le32(n->features.power_mgmt);
> +        break;
>      case NVME_TEMPERATURE_THRESHOLD:
>          result = 0;
>  
> @@ -1046,9 +1054,12 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>              break;
>          }
>  
> +        break;
> +    case NVME_ERROR_RECOVERY:
> +        result = cpu_to_le32(n->features.err_rec);
>          break;
>      case NVME_VOLATILE_WRITE_CACHE:
> -        result = blk_enable_write_cache(n->conf.blk);
> +        result = cpu_to_le32(blk_enable_write_cache(n->conf.blk));
>          trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> @@ -1058,6 +1069,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          break;
>      case NVME_TIMESTAMP:
>          return nvme_get_feature_timestamp(n, cmd);
> +    case NVME_INTERRUPT_COALESCING:
> +        result = cpu_to_le32(n->features.int_coalescing);
> +        break;
> +    case NVME_INTERRUPT_VECTOR_CONF:
> +        if ((dw11 & 0xffff) > n->params.max_ioqpairs + 1) {
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +        }
I still think that this should be >=, since the interrupt vector index is
zero-based while max_ioqpairs + 1 is a count. For example, if we have 3 I/O
queues, then we have 4 queues in total, which translates to IRQ numbers 0..3.

BTW, the user of the device doesn't have to use a 1:1 mapping between qid and
MSI interrupt index. In fact, when MSI is not used, all the queues map to the
same vector, which from the point of view of the device will be interrupt 0.
So IMHO it makes some sense to have a num_irqs parameter or similar, even if
it is technically equal to the number of queues.
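For illustration, a minimal standalone sketch of the bound being suggested. It assumes, as in the patch, that the table holds max_ioqpairs + 1 entries, with index 0 for the admin queue and bit 16 used as the "no coalescing" flag. The names ivc_table, ivc_init and ivc_get are hypothetical, and -1 stands in for NVME_INVALID_FIELD | NVME_DNR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bit 16 of the Interrupt Vector Configuration value: Coalescing Disable. */
#define INTVC_NOCOALESCING (1u << 16)

/* Hypothetical stand-in for the per-controller feature table. */
typedef struct {
    uint32_t  max_ioqpairs;      /* number of I/O queue pairs */
    uint32_t *int_vector_config; /* max_ioqpairs + 1 entries, 0 = admin */
} ivc_table;

/* One identity-mapped entry per queue (admin + I/O), as in the patch. */
static int ivc_init(ivc_table *t, uint32_t max_ioqpairs)
{
    uint32_t nvec = max_ioqpairs + 1;

    t->max_ioqpairs = max_ioqpairs;
    t->int_vector_config = calloc(nvec, sizeof(*t->int_vector_config));
    if (!t->int_vector_config) {
        return -1;
    }
    for (uint32_t i = 0; i < nvec; i++) {
        t->int_vector_config[i] = i;
    }
    /* Interrupt coalescing is not supported for the admin queue. */
    t->int_vector_config[0] |= INTVC_NOCOALESCING;
    return 0;
}

/*
 * Get Features lookup. Valid indices are 0..max_ioqpairs, so anything
 * >= max_ioqpairs + 1 is rejected; a "> max_ioqpairs + 1" check would let
 * index max_ioqpairs + 1 read one element past the end of the array.
 */
static int ivc_get(const ivc_table *t, uint32_t dw11, uint32_t *result)
{
    uint32_t iv = dw11 & 0xffff;

    if (iv >= t->max_ioqpairs + 1) {
        return -1; /* Invalid Field in Command */
    }
    *result = t->int_vector_config[iv];
    return 0;
}
```

With max_ioqpairs = 3 this accepts vectors 0..3 and rejects 4, which is exactly the case the "> max_ioqpairs + 1" comparison lets through.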


> +
> +        result = cpu_to_le32(n->features.int_vector_config[dw11 & 0xffff]);
> +        break;
> +    case NVME_WRITE_ATOMICITY:
> +        result = cpu_to_le32(n->features.write_atomicity);
> +        break;
>      case NVME_ASYNCHRONOUS_EVENT_CONF:
>          result = cpu_to_le32(n->features.async_config);
>          break;
> @@ -1093,6 +1117,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      uint32_t dw10 = le32_to_cpu(cmd->cdw10);
>      uint32_t dw11 = le32_to_cpu(cmd->cdw11);
>  
> +    trace_nvme_dev_setfeat(nvme_cid(req), dw10, dw11);
> +
>      switch (dw10) {
>      case NVME_TEMPERATURE_THRESHOLD:
>          if (NVME_TEMP_TMPSEL(dw11)) {
> @@ -1120,6 +1146,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  
>          break;
>      case NVME_VOLATILE_WRITE_CACHE:
> +        if (blk_enable_write_cache(n->conf.blk)) {
> +            blk_flush(n->conf.blk);
> +        }

(Not your fault, but) the blk_enable_write_cache function name is highly misleading,
since it doesn't enable anything; it just returns whether the write cache is enabled.
It really should be called blk_get_enable_write_cache.

> +
>          blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> @@ -1135,6 +1165,13 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      case NVME_ASYNCHRONOUS_EVENT_CONF:
>          n->features.async_config = dw11;
>          break;
> +    case NVME_ARBITRATION:
> +    case NVME_POWER_MANAGEMENT:
> +    case NVME_ERROR_RECOVERY:
> +    case NVME_INTERRUPT_COALESCING:
> +    case NVME_INTERRUPT_VECTOR_CONF:
> +    case NVME_WRITE_ATOMICITY:
> +        return NVME_FEAT_NOT_CHANGABLE | NVME_DNR;
>      default:
>          trace_nvme_dev_err_invalid_setfeat(dw10);
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -1716,6 +1753,25 @@ static void nvme_init_state(NvmeCtrl *n)
>      n->temperature = NVME_TEMPERATURE;
>      n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
>      n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> +
> +    /*
> +     * There is no limit on the number of commands that the controller may
> +     * launch at one time from a particular Submission Queue.
> +     */
> +    n->features.arbitration = NVME_ARB_AB_NOLIMIT;
> +
> +    n->features.int_vector_config = g_malloc0_n(n->params.max_ioqpairs + 1,
> +        sizeof(*n->features.int_vector_config));
> +
> +    for (int i = 0; i < n->params.max_ioqpairs + 1; i++) {
> +        n->features.int_vector_config[i] = i;
> +
> +        /* interrupt coalescing is not supported for the admin queue */
> +        if (i == 0) {
> +            n->features.int_vector_config[i] |= NVME_INTVC_NOCOALESCING;
> +        }
> +    }
> +
>      n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
>  }
>  
> @@ -1804,6 +1860,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->cqes = (0x4 << 4) | 0x4;
>      id->nn = cpu_to_le32(n->num_namespaces);
>      id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
> +
Unrelated whitespace change
>      id->psd[0].mp = cpu_to_le16(0x9c4);
>      id->psd[0].enlat = cpu_to_le32(0x10);
>      id->psd[0].exlat = cpu_to_le32(0x4);
> @@ -1879,6 +1936,7 @@ static void nvme_exit(PCIDevice *pci_dev)
>      g_free(n->cq);
>      g_free(n->sq);
>      g_free(n->aer_reqs);
> +    g_free(n->features.int_vector_config);
>  
>      if (n->params.cmb_size_mb) {
>          g_free(n->cmbuf);
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 3952c36774cf..4cf39961989d 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -41,6 +41,8 @@ nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
>  nvme_dev_identify_ctrl(void) "identify controller"
>  nvme_dev_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
>  nvme_dev_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
> +nvme_dev_getfeat(uint16_t cid, uint32_t fid) "cid %"PRIu16" fid 0x%"PRIx32""
> +nvme_dev_setfeat(uint16_t cid, uint32_t fid, uint32_t val) "cid %"PRIu16" fid 0x%"PRIx32" val 0x%"PRIx32""
>  nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
>  nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
>  nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index f2a8b07c0f2f..ecc02fbe8bb8 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -490,7 +490,8 @@ enum NvmeStatusCodes {
>      NVME_FW_REQ_RESET           = 0x010b,
>      NVME_INVALID_QUEUE_DEL      = 0x010c,
>      NVME_FID_NOT_SAVEABLE       = 0x010d,
> -    NVME_FID_NOT_NSID_SPEC      = 0x010f,
> +    NVME_FEAT_NOT_CHANGABLE     = 0x010e,
> +    NVME_FEAT_NOT_NS_SPEC       = 0x010f,
>      NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
>      NVME_CONFLICTING_ATTRS      = 0x0180,
>      NVME_INVALID_PROT_INFO      = 0x0181,
> @@ -706,6 +707,7 @@ typedef struct NvmeFeatureVal {
>  } NvmeFeatureVal;
>  
>  #define NVME_ARB_AB(arb)    (arb & 0x7)
> +#define NVME_ARB_AB_NOLIMIT 0x7
>  #define NVME_ARB_LPW(arb)   ((arb >> 8) & 0xff)
>  #define NVME_ARB_MPW(arb)   ((arb >> 16) & 0xff)
>  #define NVME_ARB_HPW(arb)   ((arb >> 24) & 0xff)
> @@ -713,6 +715,8 @@ typedef struct NvmeFeatureVal {
>  #define NVME_INTC_THR(intc)     (intc & 0xff)
>  #define NVME_INTC_TIME(intc)    ((intc >> 8) & 0xff)
>  
> +#define NVME_INTVC_NOCOALESCING (0x1 << 16)
> +
>  #define NVME_TEMP_THSEL(temp)  ((temp >> 20) & 0x3)
>  #define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
>  #define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)


Best regards,
	Maxim Levitsky


* Re: [PATCH v6 15/42] nvme: additional tracing
  2020-03-16 14:29 ` [PATCH v6 15/42] nvme: additional tracing Klaus Jensen
@ 2020-03-25 10:42   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:42 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Add additional trace calls for nvme_enqueue_req_completion, mmio and
> doorbell writes.
> 
> Also, streamline nvme_identify_ns and nvme_identify_ns_list. They do not
> need to repeat the command, it is already in the trace name.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c       | 10 ++++++++++
>  hw/block/trace-events |  9 +++++++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index eb9c722df968..85c7c86b35f0 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -320,6 +320,8 @@ static void nvme_post_cqes(void *opaque)
>  static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
>  {
>      assert(cq->cqid == req->sq->cqid);
> +    trace_nvme_dev_enqueue_req_completion(nvme_cid(req), cq->cqid,
> +                                          req->status);
>      QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
>      QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
>      timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
> @@ -1527,6 +1529,8 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
>      uint8_t *ptr = (uint8_t *)&n->bar;
>      uint64_t val = 0;
>  
> +    trace_nvme_dev_mmio_read(addr);
> +
>      if (unlikely(addr & (sizeof(uint32_t) - 1))) {
>          NVME_GUEST_ERR(nvme_dev_ub_mmiord_misaligned32,
>                         "MMIO read not 32-bit aligned,"
> @@ -1601,6 +1605,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>              return;
>          }
>  
> +        trace_nvme_dev_mmio_doorbell_cq(cq->cqid, new_head);
> +
>          start_sqs = nvme_cq_full(cq) ? 1 : 0;
>          cq->head = new_head;
>          if (start_sqs) {
> @@ -1653,6 +1659,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>              return;
>          }
>  
> +        trace_nvme_dev_mmio_doorbell_sq(sq->sqid, new_tail);
> +
>          sq->tail = new_tail;
>          timer_mod(sq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
>      }
> @@ -1661,6 +1669,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>  static void nvme_mmio_write(void *opaque, hwaddr addr, uint64_t data,
>      unsigned size)
>  {
> +    trace_nvme_dev_mmio_write(addr, data);
> +
>      NvmeCtrl *n = (NvmeCtrl *)opaque;
>      if (addr < sizeof(n->bar)) {
>          nvme_write_bar(n, addr, data, size);
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 4cf39961989d..dde1d22bc39a 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -39,8 +39,8 @@ nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size,
>  nvme_dev_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
>  nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
>  nvme_dev_identify_ctrl(void) "identify controller"
> -nvme_dev_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
> -nvme_dev_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
> +nvme_dev_identify_ns(uint32_t ns) "nsid %"PRIu32""
> +nvme_dev_identify_nslist(uint32_t ns) "nsid %"PRIu32""
>  nvme_dev_getfeat(uint16_t cid, uint32_t fid) "cid %"PRIu16" fid 0x%"PRIx32""
>  nvme_dev_setfeat(uint16_t cid, uint32_t fid, uint32_t val) "cid %"PRIu16" fid 0x%"PRIx32" val 0x%"PRIx32""
>  nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"
> @@ -54,10 +54,13 @@ nvme_dev_aer(uint16_t cid) "cid %"PRIu16""
>  nvme_dev_aer_aerl_exceeded(void) "aerl exceeded"
>  nvme_dev_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
>  nvme_dev_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
> +nvme_dev_enqueue_req_completion(uint16_t cid, uint16_t cqid, uint16_t status) "cid %"PRIu16" cqid %"PRIu16" status 0x%"PRIx16""
>  nvme_dev_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
>  nvme_dev_enqueue_event_noqueue(int queued) "queued %d"
>  nvme_dev_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
>  nvme_dev_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
> +nvme_dev_mmio_read(uint64_t addr) "addr 0x%"PRIx64""
> +nvme_dev_mmio_write(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" data 0x%"PRIx64""
>  nvme_dev_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  nvme_dev_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  nvme_dev_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""
> @@ -70,6 +73,8 @@ nvme_dev_mmio_start_success(void) "setting controller enable bit succeeded"
>  nvme_dev_mmio_stopped(void) "cleared controller enable bit"
>  nvme_dev_mmio_shutdown_set(void) "shutdown bit set"
>  nvme_dev_mmio_shutdown_cleared(void) "shutdown bit cleared"
> +nvme_dev_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_head %"PRIu16""
> +nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_tail %"PRIu16""
>  
>  # nvme traces for error conditions
>  nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
Looks good.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
	Maxim Levitsky


* Re: [PATCH v6 16/42] nvme: make sure ncqr and nsqr is valid
  2020-03-16 14:29 ` [PATCH v6 16/42] nvme: make sure ncqr and nsqr is valid Klaus Jensen
@ 2020-03-25 10:42   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:42 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> 0xffff is not an allowed value for NCQR and NSQR in Set Features on
> Number of Queues.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/block/nvme.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 85c7c86b35f0..e56142c4ea99 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1155,6 +1155,14 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> +        /*
> +         * NVMe v1.3, Section 5.21.1.7: 0xffff is not an allowed value for NCQR
> +         * and NSQR.
> +         */
> +        if ((dw11 & 0xffff) == 0xffff || ((dw11 >> 16) & 0xffff) == 0xffff) {
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +        }
> +
>          trace_nvme_dev_setfeat_numq((dw11 & 0xFFFF) + 1,
>                                      ((dw11 >> 16) & 0xFFFF) + 1,
>                                      n->params.max_ioqpairs,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
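For reference, a standalone sketch of the check this patch adds. It assumes the NVMe v1.3 layout of Command Dword 11 for the Number of Queues feature: NSQR in bits 15:00 and NCQR in bits 31:16, both 0's based, so 0xffff would mean 65536 queues. decode_num_queues is a hypothetical helper, and -1 stands in for NVME_INVALID_FIELD | NVME_DNR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode Set Features / Number of Queues CDW11. On success, return 0 and
 * the requested 1-based submission and completion queue counts.
 */
static int decode_num_queues(uint32_t dw11, uint32_t *nsq, uint32_t *ncq)
{
    uint32_t nsqr = dw11 & 0xffff;         /* bits 15:00, 0's based */
    uint32_t ncqr = (dw11 >> 16) & 0xffff; /* bits 31:16, 0's based */

    if (nsqr == 0xffff || ncqr == 0xffff) {
        return -1; /* Invalid Field in Command */
    }
    *nsq = nsqr + 1;
    *ncq = ncqr + 1;
    return 0;
}
```

For example, dw11 = 0x00030007 requests 8 submission queues and 4 completion queues.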


* Re: [PATCH v6 17/42] nvme: add log specific field to trace events
  2020-03-16 14:29 ` [PATCH v6 17/42] nvme: add log specific field to trace events Klaus Jensen
@ 2020-03-25 10:43   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:43 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The LSP field is not used directly now, but include it in the trace.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c       | 3 ++-
>  hw/block/trace-events | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index e56142c4ea99..16de3ca1c5d5 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -760,6 +760,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      uint32_t dw12 = le32_to_cpu(cmd->cdw12);
>      uint32_t dw13 = le32_to_cpu(cmd->cdw13);
>      uint8_t  lid = dw10 & 0xff;
> +    uint8_t  lsp = (dw10 >> 8) & 0xf;
>      uint8_t  rae = (dw10 >> 15) & 0x1;
>      uint32_t numdl, numdu;
>      uint64_t off, lpol, lpou;
> @@ -777,7 +778,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>  
> -    trace_nvme_dev_get_log(nvme_cid(req), lid, rae, len, off);
> +    trace_nvme_dev_get_log(nvme_cid(req), lid, lsp, rae, len, off);
>  
>      switch (lid) {
>      case NVME_LOG_ERROR_INFO:
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index dde1d22bc39a..13e2c71664f6 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -48,7 +48,7 @@ nvme_dev_getfeat_numq(int result) "get feature number of queues, result=%d"
>  nvme_dev_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
>  nvme_dev_setfeat_timestamp(uint64_t ts) "set feature timestamp = 0x%"PRIx64""
>  nvme_dev_getfeat_timestamp(uint64_t ts) "get feature timestamp = 0x%"PRIx64""
> -nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
> +nvme_dev_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64""
>  nvme_dev_process_aers(int queued) "queued %d"
>  nvme_dev_aer(uint16_t cid) "cid %"PRIu16""
>  nvme_dev_aer_aerl_exceeded(void) "aerl exceeded"
Perfect!
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
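A small sketch showing where LSP sits relative to the other Get Log Page fields that nvme_get_log decodes from Command Dword 10. It assumes the NVMe v1.3 layout (LID in bits 07:00, LSP in 11:08, RAE in bit 15, NUMDL in 31:16). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  lid;   /* Log Page Identifier, bits 07:00 */
    uint8_t  lsp;   /* Log Specific Field, bits 11:08 */
    uint8_t  rae;   /* Retain Asynchronous Event, bit 15 */
    uint16_t numdl; /* Number of Dwords Lower, bits 31:16 */
} get_log_dw10;

static get_log_dw10 decode_get_log_dw10(uint32_t dw10)
{
    get_log_dw10 f = {
        .lid   = dw10 & 0xff,
        .lsp   = (dw10 >> 8) & 0xf,
        .rae   = (dw10 >> 15) & 0x1,
        .numdl = dw10 >> 16,
    };
    return f;
}
```

For example, dw10 = 0x00ff8302 decodes to LID 0x02, LSP 0x3, RAE 1 and NUMDL 0xff.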


* Re: [PATCH v6 18/42] nvme: support identify namespace descriptor list
  2020-03-16 14:29 ` [PATCH v6 18/42] nvme: support identify namespace descriptor list Klaus Jensen
@ 2020-03-25 10:43   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:43 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Since we are not providing the NGUID or EUI64 fields, we must support
> the Namespace UUID. We do not have any way of storing a persistent
> unique identifier, so conjure up a UUID that is just the namespace id.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c       | 38 ++++++++++++++++++++++++++++++++++++++
>  hw/block/trace-events |  1 +
>  2 files changed, 39 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 16de3ca1c5d5..007f8817f101 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -942,6 +942,42 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
>      return ret;
>  }
>  
> +static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
> +{
> +    uint32_t nsid = le32_to_cpu(c->nsid);
> +    uint64_t prp1 = le64_to_cpu(c->prp1);
> +    uint64_t prp2 = le64_to_cpu(c->prp2);
> +
> +    void *list;
> +    uint16_t ret;
> +    NvmeIdNsDescr *ns_descr;
> +
> +    trace_nvme_dev_identify_ns_descr_list(nsid);
> +
> +    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> +        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
> +        return NVME_INVALID_NSID | NVME_DNR;
> +    }
> +
> +    list = g_malloc0(NVME_IDENTIFY_DATA_SIZE);
> +    ns_descr = list;
> +
> +    /*
> +     * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
> +     * structure, a Namespace UUID (nidt = 0x3) must be reported in the
> +     * Namespace Identification Descriptor. Add a very basic Namespace UUID
> +     * here.
> +     */
> +    ns_descr->nidt = NVME_NIDT_UUID;
> +    ns_descr->nidl = NVME_NIDT_UUID_LEN;
> +    stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
> +
> +    ret = nvme_dma_read_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
> +                            prp2);
> +    g_free(list);
> +    return ret;
> +}
> +
>  static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>  {
>      NvmeIdentify *c = (NvmeIdentify *)cmd;
> @@ -953,6 +989,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
>          return nvme_identify_ctrl(n, c);
>      case NVME_ID_CNS_NS_ACTIVE_LIST:
>          return nvme_identify_nslist(n, c);
> +    case NVME_ID_CNS_NS_DESCR_LIST:
> +        return nvme_identify_ns_descr_list(n, c);
>      default:
>          trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
>          return NVME_INVALID_FIELD | NVME_DNR;
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 13e2c71664f6..4cde0844ef64 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -41,6 +41,7 @@ nvme_dev_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
>  nvme_dev_identify_ctrl(void) "identify controller"
>  nvme_dev_identify_ns(uint32_t ns) "nsid %"PRIu32""
>  nvme_dev_identify_nslist(uint32_t ns) "nsid %"PRIu32""
> +nvme_dev_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32""
>  nvme_dev_getfeat(uint16_t cid, uint32_t fid) "cid %"PRIu16" fid 0x%"PRIx32""
>  nvme_dev_setfeat(uint16_t cid, uint32_t fid, uint32_t val) "cid %"PRIu16" fid 0x%"PRIx32" val 0x%"PRIx32""
>  nvme_dev_getfeat_vwcache(const char* result) "get feature volatile write cache, result=%s"

I think that we should add the namespace UUID as a device parameter,
but it's OK to do this in a follow-up patch.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
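For illustration, a byte-level sketch of the descriptor list this patch returns. It assumes the NVMe v1.3 descriptor layout (byte 0 NIDT, byte 1 NIDL, bytes 2-3 reserved, then NIDL bytes of identifier) and a 4096-byte Identify buffer. The function and macro names are hypothetical. Note that in the patch, ns_descr + sizeof(*ns_descr) is pointer arithmetic on an NvmeIdNsDescr pointer, so it advances by that many elements rather than bytes. If that header struct is 4 bytes, the nsid lands at byte offset 16 (the last four bytes of the UUID field) instead of directly after the header. Either placement gives each namespace a distinct UUID. The sketch below places it deliberately at offset 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IDENTIFY_DATA_SIZE 4096
#define NIDT_UUID          0x3 /* Namespace Identifier Type: UUID */
#define NIDT_UUID_LEN      16  /* NIDL for a UUID, in bytes */
#define NIDT_HDR_LEN       4   /* NIDT, NIDL and two reserved bytes */

/*
 * Fill a Namespace Identification Descriptor list with a single UUID
 * descriptor: the first four UUID bytes hold the nsid in big-endian order,
 * and everything else is zero. All offsets are computed in bytes.
 */
static void fill_ns_descr_list(uint8_t *list, uint32_t nsid)
{
    uint8_t *uuid = list + NIDT_HDR_LEN;

    memset(list, 0, IDENTIFY_DATA_SIZE);
    list[0] = NIDT_UUID;
    list[1] = NIDT_UUID_LEN;

    uuid[0] = (uint8_t)(nsid >> 24);
    uuid[1] = (uint8_t)(nsid >> 16);
    uuid[2] = (uint8_t)(nsid >> 8);
    uuid[3] = (uint8_t)nsid;
}
```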


* Re: [PATCH v6 19/42] nvme: enforce valid queue creation sequence
  2020-03-16 14:29 ` [PATCH v6 19/42] nvme: enforce valid queue creation sequence Klaus Jensen
@ 2020-03-25 10:43   ` Maxim Levitsky
  2020-03-31  5:41     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:43 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Support returning Command Sequence Error if Set Features on Number of
> Queues is called after queues have been created.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 7 +++++++
>  hw/block/nvme.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 007f8817f101..b40d27cddc46 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -881,6 +881,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
>      cq = g_malloc0(sizeof(*cq));
>      nvme_init_cq(cq, n, prp1, cqid, vector, qsize + 1,
>          NVME_CQ_FLAGS_IEN(qflags));
> +
> +    n->qs_created = true;
Very minor nitpick: maybe it is worth mentioning in a comment
why this is only needed on CQ creation, as you explained to me.


>      return NVME_SUCCESS; 
>  }
>  
> @@ -1194,6 +1196,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> +        if (n->qs_created) {
> +            return NVME_CMD_SEQ_ERROR | NVME_DNR;
> +        }
> +
>          /*
>           * NVMe v1.3, Section 5.21.1.7: 0xffff is not an allowed value for NCQR
>           * and NSQR.
> @@ -1332,6 +1338,7 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
>  
>      n->aer_queued = 0;
>      n->outstanding_aers = 0;
> +    n->qs_created = false;
>  
>      blk_flush(n->conf.blk);
>      n->bar.cc = 0;
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index b709a8bb8d40..b4d1738a3d0a 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -99,6 +99,7 @@ typedef struct NvmeCtrl {
>      BlockConf    conf;
>      NvmeParams   params;
>  
> +    bool        qs_created;
>      uint32_t    page_size;
>      uint16_t    page_bits;
>      uint16_t    max_prp_ents;

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
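For illustration, a minimal state model of the sequencing rule this patch enforces. All names are hypothetical. The status values mirror the NVMe generic Command Sequence Error (0x0c) and the Do Not Retry bit as QEMU encodes it (0x4000), and are assumptions for this sketch. The comment on create_io_cq gives an assumed rationale for why only CQ creation sets the flag: a Create I/O Submission Queue command must reference an existing I/O Completion Queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status values mirroring NVMe generic status codes. */
#define SC_SUCCESS       0x0000
#define SC_CMD_SEQ_ERROR 0x000c /* Command Sequence Error */
#define SC_DNR           0x4000 /* Do Not Retry bit, QEMU encoding */

typedef struct {
    bool qs_created;
} ctrl_state;

/*
 * Assumed rationale: an I/O SQ can only be created against an existing
 * I/O CQ, so setting the flag on CQ creation covers every queue creation.
 */
static uint16_t create_io_cq(ctrl_state *c)
{
    c->qs_created = true;
    return SC_SUCCESS;
}

/* Set Features / Number of Queues is only allowed before any queue exists. */
static uint16_t set_num_queues(ctrl_state *c)
{
    if (c->qs_created) {
        return SC_CMD_SEQ_ERROR | SC_DNR;
    }
    return SC_SUCCESS;
}

/* A controller reset deletes all I/O queues, so the flag is cleared. */
static void ctrl_reset(ctrl_state *c)
{
    c->qs_created = false;
}
```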


* Re: [PATCH v6 20/42] nvme: provide the mandatory subnqn field
  2020-03-16 14:29 ` [PATCH v6 20/42] nvme: provide the mandatory subnqn field Klaus Jensen
@ 2020-03-25 10:43   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:43 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index b40d27cddc46..74061d08fd2e 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1925,6 +1925,9 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->nn = cpu_to_le32(n->num_namespaces);
>      id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
>  
> +    pstrcpy((char *) id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:");
> +    pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
> +
>      id->psd[0].mp = cpu_to_le16(0x9c4);
>      id->psd[0].enlat = cpu_to_le32(0x10);
>      id->psd[0].exlat = cpu_to_le32(0x4);
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
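The subnqn above is built with QEMU's pstrcpy/pstrcat, which truncate to the destination size and always NUL-terminate. The following is a standalone sketch of the same construction. The helpers are hypothetical stand-ins that mimic those semantics (they are not QEMU's code), and the sketch assumes the 256-byte SUBNQN field of the Identify Controller data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBNQN_LEN 256  /* size of the Identify Controller SUBNQN field */

/* Copy at most size - 1 bytes of src and always NUL-terminate. */
static void sketch_pstrcpy(char *buf, size_t size, const char *src)
{
    size_t n;

    if (size == 0) {
        return;
    }
    n = strlen(src);
    if (n > size - 1) {
        n = size - 1;
    }
    memcpy(buf, src, n);
    buf[n] = '\0';
}

/* Append src, truncating so the result still fits in size bytes.
 * buf must already hold a NUL-terminated string shorter than size. */
static void sketch_pstrcat(char *buf, size_t size, const char *src)
{
    size_t len = strlen(buf);

    if (len < size) {
        sketch_pstrcpy(buf + len, size - len, src);
    }
}

/* Build "nqn.2019-08.org.qemu:<serial>" as the patch does. */
static void sketch_build_subnqn(char out[SUBNQN_LEN], const char *serial)
{
    sketch_pstrcpy(out, SUBNQN_LEN, "nqn.2019-08.org.qemu:");
    sketch_pstrcat(out, SUBNQN_LEN, serial);
}
```

An overly long serial therefore yields a truncated but still terminated NQN rather than an overflow.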






* Re: [PATCH v6 21/42] nvme: bump supported version to v1.3
  2020-03-16 14:29 ` [PATCH v6 21/42] nvme: bump supported version to v1.3 Klaus Jensen
@ 2020-03-25 10:44   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:44 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 74061d08fd2e..26c4b6e69f72 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -44,6 +44,7 @@
>  #include "trace.h"
>  #include "nvme.h"
>  
> +#define NVME_SPEC_VER 0x00010300
>  #define NVME_CMB_BIR 2
>  #define NVME_TEMPERATURE 0x143
>  #define NVME_TEMPERATURE_WARNING 0x157
> @@ -1898,6 +1899,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->ieee[0] = 0x00;
>      id->ieee[1] = 0x02;
>      id->ieee[2] = 0xb3;
> +    id->ver = cpu_to_le32(NVME_SPEC_VER);
>      id->oacs = cpu_to_le16(0);
>  
>      /*
> @@ -1942,7 +1944,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      NVME_CAP_SET_CSS(n->bar.cap, 1);
>      NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
>  
> -    n->bar.vs = 0x00010200;
> +    n->bar.vs = NVME_SPEC_VER;
>      n->bar.intmc = n->bar.intms = 0;
>  }
>  
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
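NVME_SPEC_VER uses the layout shared by the VS register and the Identify Controller VER field. The major version sits in bits 31:16, the minor version in bits 15:8, and the tertiary version in bits 7:0. The helper names in this sketch are hypothetical. The id->ver store goes through cpu_to_le32 because the Identify data structure is defined as little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a VS/VER value: MJR in bits 31:16, MNR in 15:8, TER in 7:0. */
static uint32_t sketch_nvme_ver(uint16_t mjr, uint8_t mnr, uint8_t ter)
{
    return ((uint32_t)mjr << 16) | ((uint32_t)mnr << 8) | ter;
}

static uint16_t sketch_ver_major(uint32_t vs)
{
    return (uint16_t)(vs >> 16);
}

static uint8_t sketch_ver_minor(uint32_t vs)
{
    return (uint8_t)((vs >> 8) & 0xff);
}
```

So the old hardcoded 0x00010200 is v1.2 and the new NVME_SPEC_VER 0x00010300 is v1.3.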






* Re: [PATCH v6 22/42] nvme: memset preallocated requests structures
  2020-03-16 14:29 ` [PATCH v6 22/42] nvme: memset preallocated requests structures Klaus Jensen
@ 2020-03-25 10:44   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:44 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> This is preparatory to subsequent patches that change how QSGs/IOVs are
> handled. It is important that the qsg and iov members of the NvmeRequest
> are initially zeroed.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 26c4b6e69f72..08267e847671 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -597,7 +597,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
>      sq->size = size;
>      sq->cqid = cqid;
>      sq->head = sq->tail = 0;
> -    sq->io_req = g_new(NvmeRequest, sq->size);
> +    sq->io_req = g_new0(NvmeRequest, sq->size);
>  
>      QTAILQ_INIT(&sq->req_list);
>      QTAILQ_INIT(&sq->out_req_list);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
	Maxim Levitsky
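The commit message's point is that later code decides whether the qsg/iov members still need initialization by checking for a NULL pointer, which only works if the request array starts zero-filled. This sketch uses calloc as the plain-C analogue of g_new0, with hypothetical stand-ins for QEMUSGList/QEMUIOVector. It relies on all-bits-zero being a NULL pointer, as it is in practice on mainstream platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a NULL pointer means "not initialized yet". */
typedef struct { void *sg;  int nalloc; } SketchSGList;
typedef struct { void *iov; int nalloc; } SketchIOVec;

typedef struct {
    SketchSGList qsg;
    SketchIOVec iov;
} SketchRequest;

/* Like g_new0(): the array comes back zero-filled, so every request starts
 * with qsg.sg == NULL and iov.iov == NULL. g_new() gives no such guarantee. */
static SketchRequest *sketch_alloc_requests(size_t n)
{
    return calloc(n, sizeof(SketchRequest));
}

/* Lazy-init check in the style of nvme_map_addr's "if (!qsg->sg)". */
static bool sketch_needs_sg_init(const SketchRequest *req)
{
    return req->qsg.sg == NULL;
}
```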






* Re: [PATCH v6 23/42] nvme: add mapping helpers
  2020-03-16 14:29 ` [PATCH v6 23/42] nvme: add mapping helpers Klaus Jensen
@ 2020-03-25 10:45   ` Maxim Levitsky
  2020-03-31  5:44     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:45 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Add nvme_map_addr, nvme_map_addr_cmb and nvme_addr_to_cmb helpers and
> use them in nvme_map_prp.
> 
> This fixes a bug where in the case of a CMB transfer, the device would
> map to the buffer with a wrong length.
> 
> Fixes: b2b2b67a00574 ("nvme: Add support for Read Data and Write Data in CMBs.")
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c       | 97 +++++++++++++++++++++++++++++++++++--------
>  hw/block/trace-events |  1 +
>  2 files changed, 81 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 08267e847671..187c816eb6ad 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -59,6 +59,11 @@
>  
>  static void nvme_process_sq(void *opaque);
>  
> +static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
> +{
> +    return &n->cmbuf[addr - n->ctrl_mem.addr];
> +}
> +
>  static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>  {
>      hwaddr low = n->ctrl_mem.addr;
> @@ -70,7 +75,7 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
>      if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
> -        memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
> +        memcpy(buf, nvme_addr_to_cmb(n, addr), size);
>          return;
>      }
>  
> @@ -153,29 +158,79 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
>      }
>  }
>  
> +static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
> +                                  size_t len)
> +{
> +    if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> +        return NVME_DATA_TRAS_ERROR;
> +    }

I just noticed that, in theory (not that it really matters), addr + len
refers to the first byte past the end of the transfer, which is not itself
part of the transfer.
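To make the off-by-one concrete: nvme_addr_is_cmb treats [low, low + size) as the CMB. Checking addr + len therefore rejects a transfer that ends exactly on the last CMB byte. The sketch below instead checks the last byte actually touched, and it is written so that addr + len cannot overflow. The helper names are hypothetical and this is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* [low, low + size) is the CMB window, as in nvme_addr_is_cmb(). */
static bool sketch_addr_in(uint64_t low, uint64_t size, uint64_t addr)
{
    return addr >= low && addr - low < size;
}

/* A transfer of len bytes at addr touches addr .. addr + len - 1, so the
 * whole range fits if addr is inside and len does not exceed what is left
 * of the window after addr. No addr + len sum is formed. */
static bool sketch_range_in(uint64_t low, uint64_t size,
                            uint64_t addr, uint64_t len)
{
    if (!sketch_addr_in(low, size, addr)) {
        return false;
    }
    return len <= size - (addr - low);
}
```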


> +
> +    qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
Also interesting: we can end up adding a zero-sized iovec.
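If zero-length entries are undesirable, one option is to refuse them before they are appended. The sketch below uses a hypothetical fixed-capacity stand-in for QEMUIOVector (not QEMU's API). Whether a zero length should be an error or a silent no-op is a judgment call for the device model.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SEGS 8

/* Hypothetical, fixed-capacity stand-in for QEMUIOVector. */
typedef struct {
    struct { void *base; size_t len; } seg[SKETCH_MAX_SEGS];
    int niov;
    size_t size;
} SketchIOV;

/* Returns 0 on success, -1 if the segment is rejected. Zero-length
 * segments are refused instead of being appended as empty entries. */
static int sketch_iov_add(SketchIOV *iov, void *base, size_t len)
{
    if (len == 0 || iov->niov == SKETCH_MAX_SEGS) {
        return -1;
    }
    iov->seg[iov->niov].base = base;
    iov->seg[iov->niov].len = len;
    iov->niov++;
    iov->size += len;
    return 0;
}
```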


> +
> +    return NVME_SUCCESS;
> +}
> +
> +static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> +                              hwaddr addr, size_t len)
> +{
> +    if (nvme_addr_is_cmb(n, addr)) {
> +        if (qsg && qsg->sg) {
> +            return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +        }
> +
> +        assert(iov);
> +
> +        if (!iov->iov) {
> +            qemu_iovec_init(iov, 1);
> +        }
> +
> +        return nvme_map_addr_cmb(n, iov, addr, len);
> +    }
> +
> +    if (iov && iov->iov) {
> +        return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +    }
> +
> +    assert(qsg);
> +
> +    if (!qsg->sg) {
> +        pci_dma_sglist_init(qsg, &n->parent_obj, 1);
> +    }
> +
> +    qemu_sglist_add(qsg, addr, len);
> +
> +    return NVME_SUCCESS;
> +}
Looks very good.
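As I read nvme_map_addr, the first mapped segment decides whether the command uses the QEMUIOVector (CMB) or the QEMUSGList (host memory), and any later segment of the other kind fails with Invalid Use of Controller Memory Buffer. Below is a sketch of that rule with hypothetical names and a symbolic status enum instead of the real status values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MAP_NONE, MAP_HOST, MAP_CMB } SketchMapKind;
typedef enum { SK_OK, SK_INVALID_USE_OF_CMB } SketchStatus;

typedef struct {
    uint64_t cmb_lo, cmb_size;  /* CMB window [cmb_lo, cmb_lo + cmb_size) */
} SketchCmbCtrl;

typedef struct {
    SketchMapKind kind;  /* decided by the first mapped segment */
    uint64_t total;      /* bytes mapped so far */
} SketchMapping;

static bool sketch_is_cmb(const SketchCmbCtrl *n, uint64_t addr)
{
    return addr >= n->cmb_lo && addr - n->cmb_lo < n->cmb_size;
}

static SketchStatus sketch_map_addr(const SketchCmbCtrl *n, SketchMapping *m,
                                    uint64_t addr, uint64_t len)
{
    SketchMapKind kind = sketch_is_cmb(n, addr) ? MAP_CMB : MAP_HOST;

    if (m->kind != MAP_NONE && m->kind != kind) {
        /* a single command may not mix CMB and host memory */
        return SK_INVALID_USE_OF_CMB;
    }
    m->kind = kind;
    m->total += len;
    return SK_OK;
}
```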

> +
>  static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>                               uint64_t prp2, uint32_t len, NvmeCtrl *n)
>  {
>      hwaddr trans_len = n->page_size - (prp1 % n->page_size);
>      trans_len = MIN(len, trans_len);
>      int num_prps = (len >> n->page_bits) + 1;
> +    uint16_t status;
>  
>      if (unlikely(!prp1)) {
>          trace_nvme_dev_err_invalid_prp();
>          return NVME_INVALID_FIELD | NVME_DNR;
> -    } else if (n->bar.cmbsz && prp1 >= n->ctrl_mem.addr &&
> -               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
> -        qsg->nsg = 0;
> +    }
> +
> +    if (nvme_addr_is_cmb(n, prp1)) {
>          qemu_iovec_init(iov, num_prps);
> -        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
>      } else {
>          pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
> -        qemu_sglist_add(qsg, prp1, trans_len);
>      }
> +
> +    status = nvme_map_addr(n, qsg, iov, prp1, trans_len);
> +    if (status) {
> +        goto unmap;
> +    }
> +
>      len -= trans_len;
>      if (len) {
>          if (unlikely(!prp2)) {
>              trace_nvme_dev_err_invalid_prp2_missing();
> +            status = NVME_INVALID_FIELD | NVME_DNR;
>              goto unmap;
>          }
>          if (len > n->page_size) {
> @@ -192,6 +247,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>                  if (i == n->max_prp_ents - 1 && len > n->page_size) {
>                      if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
>                          trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
> +                        status = NVME_INVALID_FIELD | NVME_DNR;
>                          goto unmap;
>                      }
>  
> @@ -205,14 +261,14 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>  
>                  if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
>                      trace_nvme_dev_err_invalid_prplist_ent(prp_ent);
> +                    status = NVME_INVALID_FIELD | NVME_DNR;
>                      goto unmap;
>                  }
>  
>                  trans_len = MIN(len, n->page_size);
> -                if (qsg->nsg){
> -                    qemu_sglist_add(qsg, prp_ent, trans_len);
> -                } else {
> -                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
> +                status = nvme_map_addr(n, qsg, iov, prp_ent, trans_len);
> +                if (status) {
> +                    goto unmap;
>                  }
>                  len -= trans_len;
>                  i++;
> @@ -220,20 +276,27 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
>          } else {
>              if (unlikely(prp2 & (n->page_size - 1))) {
>                  trace_nvme_dev_err_invalid_prp2_align(prp2);
> +                status = NVME_INVALID_FIELD | NVME_DNR;
>                  goto unmap;
>              }
> -            if (qsg->nsg) {
> -                qemu_sglist_add(qsg, prp2, len);
> -            } else {
> -                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - n->ctrl_mem.addr], trans_len);
> +            status = nvme_map_addr(n, qsg, iov, prp2, len);
> +            if (status) {
> +                goto unmap;
>              }
>          }
>      }
>      return NVME_SUCCESS;
>  
> - unmap:
> -    qemu_sglist_destroy(qsg);
> -    return NVME_INVALID_FIELD | NVME_DNR;
> +unmap:
> +    if (iov && iov->iov) {
> +        qemu_iovec_destroy(iov);
> +    }
> +
> +    if (qsg && qsg->sg) {
> +        qemu_sglist_destroy(qsg);
> +    }
> +
> +    return status;
>  }
>  
>  static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 4cde0844ef64..adf11313f956 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -33,6 +33,7 @@ nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
>  nvme_dev_irq_pin(void) "pulsing IRQ pin"
>  nvme_dev_irq_masked(void) "IRQ is masked"
>  nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
> +nvme_dev_map_prp(uint16_t cid, uint8_t opc, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" opc 0x%"PRIx8" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
>  nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
>  nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
>  nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"


Looks very good overall,

Best regards,
	Maxim Levitsky
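For reference, here is a sketch of the PRP length arithmetic in nvme_map_prp as I read it. PRP1 covers the bytes from its page offset to the end of that page. PRP2 is a direct data pointer only if the remainder fits in one page; otherwise it points to a PRP list. The helper names are hypothetical, and the example values assume a 4 KiB page size.

```c
#include <assert.h>
#include <stdint.h>

/* How PRP2 is interpreted once PRP1 has been consumed. */
typedef enum { PRP2_UNUSED, PRP2_DATA, PRP2_LIST } SketchPrp2Use;

/* Bytes covered by PRP1: from its page offset to the end of that page,
 * capped at the transfer length (trans_len in nvme_map_prp). */
static uint64_t sketch_prp1_len(uint64_t prp1, uint64_t len, uint64_t page_size)
{
    uint64_t first = page_size - (prp1 % page_size);

    return len < first ? len : first;
}

static SketchPrp2Use sketch_prp2_use(uint64_t prp1, uint64_t len,
                                     uint64_t page_size)
{
    uint64_t rest = len - sketch_prp1_len(prp1, len, page_size);

    if (rest == 0) {
        return PRP2_UNUSED;
    }
    /* mirrors "if (len > n->page_size)" after subtracting trans_len */
    return rest > page_size ? PRP2_LIST : PRP2_DATA;
}

/* (len >> page_bits) + 1 as in the patch; it only sizes the initial
 * sglist/iovec allocation, which can grow as entries are added. */
static int sketch_num_prps(uint64_t len, unsigned page_bits)
{
    return (int)(len >> page_bits) + 1;
}
```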







* Re: [PATCH v6 24/42] nvme: remove redundant has_sg member
  2020-03-16 14:29 ` [PATCH v6 24/42] nvme: remove redundant has_sg member Klaus Jensen
@ 2020-03-25 10:45   ` Maxim Levitsky
  2020-03-31  5:44     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:45 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Remove the has_sg member from NvmeRequest since it's redundant.

To be honest, this patch also replaces dma_acct_start with block_acct_start,
which looks right to me. IMHO it's OK to have both changes in the same patch,
but that should be mentioned in the commit message.
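Here is a sketch of what the patch relies on once has_sg is gone. Completion-time cleanup keys off whether each list was ever allocated (nalloc != 0), which holds for requests allocated zero-filled by the earlier g_new0 change. The accounting size comes from whichever list was used, as with the block_acct_start calls. The types and functions are hypothetical stand-ins, not QEMU's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal lists: nalloc == 0 means never initialized. */
typedef struct { uint64_t *seg; int nsg, nalloc; uint64_t size; } SkSGList;
typedef struct { void **iov; int niov, nalloc; uint64_t size; } SkIOVec;

typedef struct { SkSGList qsg; SkIOVec iov; } SkRequest;

/* Bytes reported to block accounting, taken from whichever list was used
 * (req->qsg.size or req->iov.size in the patch). */
static uint64_t sk_acct_bytes(const SkRequest *req)
{
    return req->qsg.nsg > 0 ? req->qsg.size : req->iov.size;
}

/* Cleanup without a separate has_sg flag; flush and write zeroes never
 * touch either list, so both checks are false for them. */
static void sk_req_cleanup(SkRequest *req)
{
    if (req->qsg.nalloc) {
        free(req->qsg.seg);
        req->qsg = (SkSGList){0};
    }
    if (req->iov.nalloc) {
        free(req->iov.iov);
        req->iov = (SkIOVec){0};
    }
}
```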

With this fixed,
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky

> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 18 ++++++++++++------
>  hw/block/nvme.h |  1 -
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 187c816eb6ad..e40c080c3b48 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -484,16 +484,20 @@ static void nvme_rw_cb(void *opaque, int ret)
>          block_acct_failed(blk_get_stats(n->conf.blk), &req->acct);
>          req->status = NVME_INTERNAL_DEV_ERROR;
>      }
> -    if (req->has_sg) {
> +
> +    if (req->qsg.nalloc) {
>          qemu_sglist_destroy(&req->qsg);
>      }
> +    if (req->iov.nalloc) {
> +        qemu_iovec_destroy(&req->iov);
> +    }
> +
>      nvme_enqueue_req_completion(cq, req);
>  }
>  
>  static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      NvmeRequest *req)
>  {
> -    req->has_sg = false;
>      block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
>           BLOCK_ACCT_FLUSH);
>      req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
> @@ -517,7 +521,6 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>          return NVME_LBA_RANGE | NVME_DNR;
>      }
>  
> -    req->has_sg = false;
>      block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
>                       BLOCK_ACCT_WRITE);
>      req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
> @@ -554,16 +557,19 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>  
> -    dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct);
>      if (req->qsg.nsg > 0) {
> -        req->has_sg = true;
> +        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->qsg.size,
> +                         acct);
> +
>          req->aiocb = is_write ?
>              dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
>                            nvme_rw_cb, req) :
>              dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
>                           nvme_rw_cb, req);
>      } else {
> -        req->has_sg = false;
> +        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->iov.size,
> +                         acct);
> +
>          req->aiocb = is_write ?
>              blk_aio_pwritev(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
>                              req) :
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index b4d1738a3d0a..442b17bf1701 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -29,7 +29,6 @@ typedef struct NvmeRequest {
>      struct NvmeSQueue       *sq;
>      BlockAIOCB              *aiocb;
>      uint16_t                status;
> -    bool                    has_sg;
>      NvmeCqe                 cqe;
>      BlockAcctCookie         acct;
>      QEMUSGList              qsg;







* Re: [PATCH v6 25/42] nvme: refactor dma read/write
  2020-03-16 14:29 ` [PATCH v6 25/42] nvme: refactor dma read/write Klaus Jensen
@ 2020-03-25 10:46   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:46 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Refactor the nvme_dma_{read,write}_prp functions into a common function
> taking a DMADirection parameter.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 89 ++++++++++++++++++++++++-------------------------
>  1 file changed, 43 insertions(+), 46 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index e40c080c3b48..809d00443369 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -299,55 +299,50 @@ unmap:
>      return status;
>  }
>  
> -static uint16_t nvme_dma_write_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> -                                   uint64_t prp1, uint64_t prp2)
> +static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> +                             uint64_t prp1, uint64_t prp2, DMADirection dir)
>  {
>      QEMUSGList qsg;
>      QEMUIOVector iov;
>      uint16_t status = NVME_SUCCESS;
>  
> -    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
> -        return NVME_INVALID_FIELD | NVME_DNR;
> +    status = nvme_map_prp(&qsg, &iov, prp1, prp2, len, n);
> +    if (status) {
> +        return status;
>      }
> -    if (qsg.nsg > 0) {
> -        if (dma_buf_write(ptr, len, &qsg)) {
> -            status = NVME_INVALID_FIELD | NVME_DNR;
> -        }
> -        qemu_sglist_destroy(&qsg);
> -    } else {
> -        if (qemu_iovec_to_buf(&iov, 0, ptr, len) != len) {
> -            status = NVME_INVALID_FIELD | NVME_DNR;
> -        }
> -        qemu_iovec_destroy(&iov);
> -    }
> -    return status;
> -}
>  
> -static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> -    uint64_t prp1, uint64_t prp2)
> -{
> -    QEMUSGList qsg;
> -    QEMUIOVector iov;
> -    uint16_t status = NVME_SUCCESS;
> +    if (qsg.nsg > 0) {
> +        uint64_t residual;
>  
> -    trace_nvme_dev_dma_read(prp1, prp2);
> +        if (dir == DMA_DIRECTION_TO_DEVICE) {
> +            residual = dma_buf_write(ptr, len, &qsg);
> +        } else {
> +            residual = dma_buf_read(ptr, len, &qsg);
> +        }
>  
> -    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
> -        return NVME_INVALID_FIELD | NVME_DNR;
> -    }
> -    if (qsg.nsg > 0) {
> -        if (unlikely(dma_buf_read(ptr, len, &qsg))) {
> +        if (unlikely(residual)) {
>              trace_nvme_dev_err_invalid_dma();
>              status = NVME_INVALID_FIELD | NVME_DNR;
>          }
> +
>          qemu_sglist_destroy(&qsg);
>      } else {
> -        if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
> +        size_t bytes;
> +
> +        if (dir == DMA_DIRECTION_TO_DEVICE) {
> +            bytes = qemu_iovec_to_buf(&iov, 0, ptr, len);
> +        } else {
> +            bytes = qemu_iovec_from_buf(&iov, 0, ptr, len);
> +        }
> +
> +        if (unlikely(bytes != len)) {
>              trace_nvme_dev_err_invalid_dma();
>              status = NVME_INVALID_FIELD | NVME_DNR;
>          }
> +
>          qemu_iovec_destroy(&iov);
>      }
> +
>      return status;
>  }
>  
> @@ -775,8 +770,8 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>          nvme_clear_events(n, NVME_AER_TYPE_SMART);
>      }
>  
> -    return nvme_dma_read_prp(n, (uint8_t *) &smart + off, trans_len, prp1,
> -                             prp2);
> +    return nvme_dma_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2,
> +                        DMA_DIRECTION_FROM_DEVICE);
>  }
>  
>  static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> @@ -795,8 +790,8 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
>  
>      trans_len = MIN(sizeof(fw_log) - off, buf_len);
>  
> -    return nvme_dma_read_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1,
> -                             prp2);
> +    return nvme_dma_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, prp2,
> +                        DMA_DIRECTION_FROM_DEVICE);
>  }
>  
>  static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
> @@ -820,7 +815,8 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>  
>      trans_len = MIN(sizeof(errlog) - off, buf_len);
>  
> -    return nvme_dma_read_prp(n, errlog, trans_len, prp1, prp2);
> +    return nvme_dma_prp(n, errlog, trans_len, prp1, prp2,
> +                        DMA_DIRECTION_FROM_DEVICE);
>  }
>  
>  static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> @@ -963,8 +959,8 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
>  
>      trace_nvme_dev_identify_ctrl();
>  
> -    return nvme_dma_read_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
> -        prp1, prp2);
> +    return nvme_dma_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), prp1,
> +                        prp2, DMA_DIRECTION_FROM_DEVICE);
>  }
>  
>  static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
> @@ -983,8 +979,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
>  
>      ns = &n->namespaces[nsid - 1];
>  
> -    return nvme_dma_read_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns),
> -        prp1, prp2);
> +    return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
> +                        prp2, DMA_DIRECTION_FROM_DEVICE);
>  }
>  
>  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
> @@ -1009,7 +1005,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
>              break;
>          }
>      }
> -    ret = nvme_dma_read_prp(n, (uint8_t *)list, data_len, prp1, prp2);
> +    ret = nvme_dma_prp(n, (uint8_t *)list, data_len, prp1, prp2,
> +                       DMA_DIRECTION_FROM_DEVICE);
>      g_free(list);
>      return ret;
>  }
> @@ -1044,8 +1041,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
>      ns_descr->nidl = NVME_NIDT_UUID_LEN;
>      stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
>  
> -    ret = nvme_dma_read_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
> -                            prp2);
> +    ret = nvme_dma_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
> +                       prp2, DMA_DIRECTION_FROM_DEVICE);
>      g_free(list);
>      return ret;
>  }
> @@ -1128,8 +1125,8 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>  
>      uint64_t timestamp = nvme_get_timestamp(n);
>  
> -    return nvme_dma_read_prp(n, (uint8_t *)&timestamp,
> -                                 sizeof(timestamp), prp1, prp2);
> +    return nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
> +                        prp2, DMA_DIRECTION_FROM_DEVICE);
>  }
>  
>  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> @@ -1214,8 +1211,8 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>      uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
>      uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>  
> -    ret = nvme_dma_write_prp(n, (uint8_t *)&timestamp,
> -                                sizeof(timestamp), prp1, prp2);
> +    ret = nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
> +                       prp2, DMA_DIRECTION_TO_DEVICE);
>      if (ret != NVME_SUCCESS) {
>          return ret;
>      }


Looks OK to me.
It was a bit difficult to read the diff, so I also read the code after it was applied.
I hope I didn't miss anything.
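To double-check the direction mapping in the refactored nvme_dma_prp, here is a reduced sketch. DMA_DIRECTION_TO_DEVICE copies guest segments into the device buffer (the dma_buf_write / qemu_iovec_to_buf path). DMA_DIRECTION_FROM_DEVICE copies the device buffer out to the segments (dma_buf_read / qemu_iovec_from_buf). The names are hypothetical. The return value is a simplification: it counts how many of the len bytes could not be transferred, which is not exactly QEMU's residual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same meaning as the DMADirection values used in the patch. */
typedef enum { SK_TO_DEVICE, SK_FROM_DEVICE } SkDir;

typedef struct { unsigned char *base; size_t len; } SkSeg;

/* Copy between a contiguous device-side buffer and guest-side segments.
 * TO_DEVICE: segments -> ptr; FROM_DEVICE: ptr -> segments.
 * Returns the number of bytes of len that did not fit in the segments. */
static size_t sk_dma_rw(unsigned char *ptr, size_t len, SkSeg *segs, int nseg,
                        SkDir dir)
{
    for (int i = 0; i < nseg && len > 0; i++) {
        size_t n = segs[i].len < len ? segs[i].len : len;

        if (dir == SK_TO_DEVICE) {
            memcpy(ptr, segs[i].base, n);
        } else {
            memcpy(segs[i].base, ptr, n);
        }
        ptr += n;
        len -= n;
    }
    return len;
}
```

One function with a direction parameter keeps the mapping and cleanup shared, which is the point of the refactor.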


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 26/42] nvme: pass request along for tracing
  2020-03-16 14:29 ` [PATCH v6 26/42] nvme: pass request along for tracing Klaus Jensen
@ 2020-03-25 10:55   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:55 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c       | 67 +++++++++++++++++++++++++------------------
>  hw/block/trace-events |  2 +-
>  2 files changed, 40 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 809d00443369..3e9c2ed434c2 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -202,14 +202,18 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>      return NVME_SUCCESS;
>  }
>  
> -static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
> -                             uint64_t prp2, uint32_t len, NvmeCtrl *n)
> +static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> +                             uint64_t prp1, uint64_t prp2, uint32_t len,
> +                             NvmeRequest *req)
>  {
>      hwaddr trans_len = n->page_size - (prp1 % n->page_size);
>      trans_len = MIN(len, trans_len);
>      int num_prps = (len >> n->page_bits) + 1;
>      uint16_t status;
>  
> +    trace_nvme_dev_map_prp(nvme_cid(req), trans_len, len, prp1, prp2,
> +                           num_prps);
> +
>      if (unlikely(!prp1)) {
>          trace_nvme_dev_err_invalid_prp();
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -300,13 +304,14 @@ unmap:
>  }
>  
>  static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> -                             uint64_t prp1, uint64_t prp2, DMADirection dir)
> +                             uint64_t prp1, uint64_t prp2, DMADirection dir,
> +                             NvmeRequest *req)
>  {
>      QEMUSGList qsg;
>      QEMUIOVector iov;
>      uint16_t status = NVME_SUCCESS;
>  
> -    status = nvme_map_prp(&qsg, &iov, prp1, prp2, len, n);
> +    status = nvme_map_prp(n, &qsg, &iov, prp1, prp2, len, req);
>      if (status) {
>          return status;
>      }
> @@ -547,7 +552,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>          return NVME_LBA_RANGE | NVME_DNR;
>      }
>  
> -    if (nvme_map_prp(&req->qsg, &req->iov, prp1, prp2, data_size, n)) {
> +    if (nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, data_size, req)) {
>          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
> @@ -771,7 +776,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>      }
>  
>      return nvme_dma_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2,
> -                        DMA_DIRECTION_FROM_DEVICE);
> +                        DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> @@ -791,7 +796,7 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
>      trans_len = MIN(sizeof(fw_log) - off, buf_len);
>  
>      return nvme_dma_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, prp2,
> -                        DMA_DIRECTION_FROM_DEVICE);
> +                        DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
> @@ -816,7 +821,7 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>      trans_len = MIN(sizeof(errlog) - off, buf_len);
>  
>      return nvme_dma_prp(n, errlog, trans_len, prp1, prp2,
> -                        DMA_DIRECTION_FROM_DEVICE);
> +                        DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> @@ -952,7 +957,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
>      return NVME_SUCCESS;
>  }
>  
> -static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
> +static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
> +                                   NvmeRequest *req)
>  {
>      uint64_t prp1 = le64_to_cpu(c->prp1);
>      uint64_t prp2 = le64_to_cpu(c->prp2);
> @@ -960,10 +966,11 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c)
>      trace_nvme_dev_identify_ctrl();
>  
>      return nvme_dma_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), prp1,
> -                        prp2, DMA_DIRECTION_FROM_DEVICE);
> +                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
> -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
> +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
> +                                 NvmeRequest *req)
>  {
>      NvmeNamespace *ns;
>      uint32_t nsid = le32_to_cpu(c->nsid);
> @@ -980,10 +987,11 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
>      ns = &n->namespaces[nsid - 1];
>  
>      return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
> -                        prp2, DMA_DIRECTION_FROM_DEVICE);
> +                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
> -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
> +static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
> +                                     NvmeRequest *req)
>  {
>      static const int data_len = NVME_IDENTIFY_DATA_SIZE;
>      uint32_t min_nsid = le32_to_cpu(c->nsid);
> @@ -1006,12 +1014,13 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
>          }
>      }
>      ret = nvme_dma_prp(n, (uint8_t *)list, data_len, prp1, prp2,
> -                       DMA_DIRECTION_FROM_DEVICE);
> +                       DMA_DIRECTION_FROM_DEVICE, req);
>      g_free(list);
>      return ret;
>  }
>  
> -static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
> +static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
> +                                            NvmeRequest *req)
>  {
>      uint32_t nsid = le32_to_cpu(c->nsid);
>      uint64_t prp1 = le64_to_cpu(c->prp1);
> @@ -1042,24 +1051,24 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c)
>      stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
>  
>      ret = nvme_dma_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
> -                       prp2, DMA_DIRECTION_FROM_DEVICE);
> +                       prp2, DMA_DIRECTION_FROM_DEVICE, req);
>      g_free(list);
>      return ret;
>  }
>  
> -static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
> +static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
>      NvmeIdentify *c = (NvmeIdentify *)cmd;
>  
>      switch (le32_to_cpu(c->cns)) {
>      case NVME_ID_CNS_NS:
> -        return nvme_identify_ns(n, c);
> +        return nvme_identify_ns(n, c, req);
>      case NVME_ID_CNS_CTRL:
> -        return nvme_identify_ctrl(n, c);
> +        return nvme_identify_ctrl(n, c, req);
>      case NVME_ID_CNS_NS_ACTIVE_LIST:
> -        return nvme_identify_nslist(n, c);
> +        return nvme_identify_nslist(n, c, req);
>      case NVME_ID_CNS_NS_DESCR_LIST:
> -        return nvme_identify_ns_descr_list(n, c);
> +        return nvme_identify_ns_descr_list(n, c, req);
>      default:
>          trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -1118,7 +1127,8 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
>      return cpu_to_le64(ts.all);
>  }
>  
> -static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
> +static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
> +                                           NvmeRequest *req)
>  {
>      uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
>      uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> @@ -1126,7 +1136,7 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>      uint64_t timestamp = nvme_get_timestamp(n);
>  
>      return nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
> -                        prp2, DMA_DIRECTION_FROM_DEVICE);
> +                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> @@ -1178,7 +1188,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          trace_nvme_dev_getfeat_numq(result);
>          break;
>      case NVME_TIMESTAMP:
> -        return nvme_get_feature_timestamp(n, cmd);
> +        return nvme_get_feature_timestamp(n, cmd, req);
>      case NVME_INTERRUPT_COALESCING:
>          result = cpu_to_le32(n->features.int_coalescing);
>          break;
> @@ -1204,7 +1214,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      return NVME_SUCCESS;
>  }
>  
> -static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
> +static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
> +                                           NvmeRequest *req)
>  {
>      uint16_t ret;
>      uint64_t timestamp;
> @@ -1212,7 +1223,7 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
>      uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>  
>      ret = nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
> -                       prp2, DMA_DIRECTION_TO_DEVICE);
> +                       prp2, DMA_DIRECTION_TO_DEVICE, req);
>      if (ret != NVME_SUCCESS) {
>          return ret;
>      }
> @@ -1283,7 +1294,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>                                        ((n->params.max_ioqpairs - 1) << 16));
>          break;
>      case NVME_TIMESTAMP:
> -        return nvme_set_feature_timestamp(n, cmd);
> +        return nvme_set_feature_timestamp(n, cmd, req);
>      case NVME_ASYNCHRONOUS_EVENT_CONF:
>          n->features.async_config = dw11;
>          break;
> @@ -1334,7 +1345,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      case NVME_ADM_CMD_CREATE_CQ:
>          return nvme_create_cq(n, cmd);
>      case NVME_ADM_CMD_IDENTIFY:
> -        return nvme_identify(n, cmd);
> +        return nvme_identify(n, cmd, req);
>      case NVME_ADM_CMD_ABORT:
>          return nvme_abort(n, cmd, req);
>      case NVME_ADM_CMD_SET_FEATURES:
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index adf11313f956..e31e652fa04e 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -33,7 +33,7 @@ nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
>  nvme_dev_irq_pin(void) "pulsing IRQ pin"
>  nvme_dev_irq_masked(void) "IRQ is masked"
>  nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
> -nvme_dev_map_prp(uint16_t cid, uint8_t opc, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" opc 0x%"PRIx8" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
> +nvme_dev_map_prp(uint16_t cid, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
>  nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
>  nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
>  nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"


Another very easy-to-review patch.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky





^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v6 27/42] nvme: add request mapping helper
  2020-03-16 14:29 ` [PATCH v6 27/42] nvme: add request mapping helper Klaus Jensen
@ 2020-03-25 10:56   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:56 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Introduce the nvme_map helper to remove some noise in the main nvme_rw
> function.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 3e9c2ed434c2..850087aac967 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -351,6 +351,15 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>      return status;
>  }
>  
> +static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
> +                         QEMUIOVector *iov, size_t len, NvmeRequest *req)
> +{
> +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> +
> +    return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
> +}
> +
>  static void nvme_post_cqes(void *opaque)
>  {
>      NvmeCQueue *cq = opaque;
> @@ -534,8 +543,6 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
>      uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
>      uint64_t slba = le64_to_cpu(rw->slba);
> -    uint64_t prp1 = le64_to_cpu(rw->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(rw->dptr.prp2);
>  
>      uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
>      uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
> @@ -552,7 +559,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>          return NVME_LBA_RANGE | NVME_DNR;
>      }
>  
> -    if (nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, data_size, req)) {
> +    if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
>          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 28/42] nvme: verify validity of prp lists in the cmb
  2020-03-16 14:29 ` [PATCH v6 28/42] nvme: verify validity of prp lists in the cmb Klaus Jensen
@ 2020-03-25 10:56   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:56 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Before this patch the device already supported this, but it did not
> check for the validity of it nor announced the support in the LISTS
> field.
> 
> If some of the PRPs in a PRP list are in the CMB, then ALL entries must
> be there. This patch makes sure that is verified as well as properly
> announcing support for PRP lists in the CMB.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 850087aac967..eecfad694bf8 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -210,6 +210,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>      trans_len = MIN(len, trans_len);
>      int num_prps = (len >> n->page_bits) + 1;
>      uint16_t status;
> +    bool prp_list_in_cmb = false;
>  
>      trace_nvme_dev_map_prp(nvme_cid(req), trans_len, len, prp1, prp2,
>                             num_prps);
> @@ -237,11 +238,16 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>              status = NVME_INVALID_FIELD | NVME_DNR;
>              goto unmap;
>          }
> +
>          if (len > n->page_size) {
>              uint64_t prp_list[n->max_prp_ents];
>              uint32_t nents, prp_trans;
>              int i = 0;
>  
> +            if (nvme_addr_is_cmb(n, prp2)) {
> +                prp_list_in_cmb = true;
> +            }
> +
>              nents = (len + n->page_size - 1) >> n->page_bits;
>              prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
>              nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
> @@ -255,6 +261,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>                          goto unmap;
>                      }
>  
> +                    if (prp_list_in_cmb != nvme_addr_is_cmb(n, prp_ent)) {
> +                        status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +                        goto unmap;
> +                    }
> +
>                      i = 0;
>                      nents = (len + n->page_size - 1) >> n->page_bits;
>                      prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> @@ -274,6 +285,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>                  if (status) {
>                      goto unmap;
>                  }
> +
>                  len -= trans_len;
>                  i++;
>              }
> @@ -1931,7 +1943,7 @@ static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
>  
>      NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
>      NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
> -    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
> +    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 1);
>      NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
>      NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
>      NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
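
For illustration, the rule from the commit message (if any PRP in a list lies in the CMB, all of them must) can be exercised outside QEMU. This is a minimal sketch, not the device code: struct cmb_region, addr_in_cmb() and prp_list_cmb_consistent() are hypothetical stand-ins for the controller state and nvme_addr_is_cmb(). Like the patch, it uses the location of the list itself (PRP2) as the reference that every entry must match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a controller memory buffer window. */
struct cmb_region {
    uint64_t base;
    uint64_t size;
};

static bool addr_in_cmb(const struct cmb_region *cmb, uint64_t addr)
{
    /* Subtract instead of computing base + size, which could overflow. */
    return addr >= cmb->base && addr - cmb->base < cmb->size;
}

/*
 * True if every entry agrees with the PRP list itself about CMB
 * placement: either the list and all entries are in the CMB, or none are.
 */
static bool prp_list_cmb_consistent(const struct cmb_region *cmb,
                                    uint64_t list_addr,
                                    const uint64_t *entries, size_t nents)
{
    bool list_in_cmb = addr_in_cmb(cmb, list_addr);

    for (size_t i = 0; i < nents; i++) {
        if (addr_in_cmb(cmb, entries[i]) != list_in_cmb) {
            return false; /* mixed placement */
        }
    }
    return true;
}
```

A false result corresponds to the case where the patch fails the command with NVME_INVALID_USE_OF_CMB | NVME_DNR.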






* Re: [PATCH v6 29/42] nvme: refactor request bounds checking
  2020-03-16 14:29 ` [PATCH v6 29/42] nvme: refactor request bounds checking Klaus Jensen
@ 2020-03-25 10:56   ` Maxim Levitsky
  2020-03-31  5:44     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:56 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index eecfad694bf8..ba520c76bae5 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -491,6 +491,20 @@ static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
>      }
>  }
>  
> +static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> +                                         uint64_t slba, uint32_t nlb,
> +                                         NvmeRequest *req)
> +{
> +    uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
> +
> +    if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
> +        trace_nvme_dev_err_invalid_lba_range(slba, nlb, nsze);
> +        return NVME_LBA_RANGE | NVME_DNR;
> +    }
> +
> +    return NVME_SUCCESS;
> +}
Looks good.

> +
>  static void nvme_rw_cb(void *opaque, int ret)
>  {
>      NvmeRequest *req = opaque;
> @@ -536,10 +550,11 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
>      uint64_t offset = slba << data_shift;
>      uint32_t count = nlb << data_shift;
> +    uint16_t status;
>  
> -    if (unlikely(slba + nlb > ns->id_ns.nsze)) {
> -        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
> -        return NVME_LBA_RANGE | NVME_DNR;
> +    status = nvme_check_bounds(n, ns, slba, nlb, req);
> +    if (status) {
> +        return status;
>      }
>  
>      block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
> @@ -562,13 +577,14 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      uint64_t data_offset = slba << data_shift;
>      int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
>      enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
> +    uint16_t status;
>  
>      trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
>  
> -    if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
> +    status = nvme_check_bounds(n, ns, slba, nlb, req);
> +    if (status) {
>          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> -        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
> -        return NVME_LBA_RANGE | NVME_DNR;
> +        return status;
>      }
>  
>      if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
Looks good as well. Once we get support for discard, it will
use this too, but for now indeed only write zeroes and read/write
need bounds checking on the I/O path.
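
To make the overflow guard in nvme_check_bounds concrete, here is a small standalone sketch comparing the old "slba + nlb > nsze" test with the new one. The helper names are hypothetical; only the two conditions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch form: slba + nlb can wrap around for a huge slba. */
static bool lba_range_ok_naive(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    return !(slba + nlb > nsze);
}

/* Form used by nvme_check_bounds: reject wrap-around first. */
static bool lba_range_ok(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    if (UINT64_MAX - slba < nlb) {
        return false; /* slba + nlb would overflow */
    }
    return slba + nlb <= nsze;
}
```

With slba = UINT64_MAX and nlb = 2 the sum wraps to 1, so the old check accepts the request while the new one rejects it.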

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky








* Re: [PATCH v6 30/42] nvme: add check for mdts
  2020-03-16 14:29 ` [PATCH v6 30/42] nvme: add check for mdts Klaus Jensen
@ 2020-03-25 10:57   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:57 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Add 'mdts' device parameter to control the Maximum Data Transfer Size of
> the controller and check that it is respected.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c       | 29 ++++++++++++++++++++++++++++-
>  hw/block/nvme.h       |  4 +++-
>  hw/block/trace-events |  1 +
>  3 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ba520c76bae5..7d5340c272c6 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -19,7 +19,8 @@
>   *      -drive file=<file>,if=none,id=<drive_id>
>   *      -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
>   *              cmb_size_mb=<cmb_size_mb[optional]>, \
> - *              max_ioqpairs=<N[optional]>
> + *              max_ioqpairs=<N[optional]>, \
> + *              mdts=<mdts[optional]>
>   *
>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> @@ -491,6 +492,19 @@ static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
>      }
>  }
>  
> +static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
> +                                       NvmeRequest *req)
> +{
> +    uint8_t mdts = n->params.mdts;
> +
> +    if (mdts && len > n->page_size << mdts) {
> +        trace_nvme_dev_err_mdts(nvme_cid(req), n->page_size << mdts, len);
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    return NVME_SUCCESS;
> +}
> +
>  static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
>                                           uint64_t slba, uint32_t nlb,
>                                           NvmeRequest *req)
> @@ -581,6 +595,12 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>  
>      trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
>  
> +    status = nvme_check_mdts(n, data_size, req);
> +    if (status) {
> +        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> +        return status;
> +    }
> +
>      status = nvme_check_bounds(n, ns, slba, nlb, req);
>      if (status) {
>          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> @@ -871,6 +891,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      uint32_t numdl, numdu;
>      uint64_t off, lpol, lpou;
>      size_t   len;
> +    uint16_t status;
>  
>      numdl = (dw10 >> 16);
>      numdu = (dw11 & 0xffff);
> @@ -886,6 +907,11 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  
>      trace_nvme_dev_get_log(nvme_cid(req), lid, lsp, rae, len, off);
>  
> +    status = nvme_check_mdts(n, len, req);
> +    if (status) {
> +        return status;
> +    }
> +
>      switch (lid) {
>      case NVME_LOG_ERROR_INFO:
>          return nvme_error_info(n, cmd, rae, len, off, req);
> @@ -2011,6 +2037,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->ieee[0] = 0x00;
>      id->ieee[1] = 0x02;
>      id->ieee[2] = 0xb3;
> +    id->mdts = params->mdts;
>      id->ver = cpu_to_le32(NVME_SPEC_VER);
>      id->oacs = cpu_to_le16(0);
>  
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 442b17bf1701..b05c2153aebf 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -9,7 +9,8 @@
>      DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 0), \
>      DEFINE_PROP_UINT32("max_ioqpairs", _state, _props.max_ioqpairs, 64), \
>      DEFINE_PROP_UINT8("aerl", _state, _props.aerl, 3), \
> -    DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64)
> +    DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64), \
> +    DEFINE_PROP_UINT8("mdts", _state, _props.mdts, 7)
>  
>  typedef struct NvmeParams {
>      char     *serial;
> @@ -18,6 +19,7 @@ typedef struct NvmeParams {
>      uint32_t cmb_size_mb;
>      uint8_t  aerl;
>      uint32_t aer_max_queued;
> +    uint8_t  mdts;
>  } NvmeParams;
>  
>  typedef struct NvmeAsyncEvent {
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index e31e652fa04e..2df6aa38df1b 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -79,6 +79,7 @@ nvme_dev_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_
>  nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_tail %"PRIu16""
>  
>  # nvme traces for error conditions
> +nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
>  nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
>  nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
>  nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
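
The limit enforced by nvme_check_mdts is the memory page size shifted left by MDTS, with 0 meaning "no limit". Below is a standalone sketch of just that comparison, using a hypothetical helper name. It assumes mdts is small enough that the shift cannot overflow, which the patch's expression also relies on. With the default mdts=7 and an assumed 4 KiB page size, the limit works out to 4096 << 7 = 524288 bytes (512 KiB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the comparison in nvme_check_mdts. Assumes page_size << mdts
 * fits in size_t; larger mdts values would need an explicit cap.
 */
static bool transfer_within_mdts(size_t page_size, uint8_t mdts, size_t len)
{
    if (mdts == 0) {
        return true; /* MDTS of 0: no transfer size limit */
    }
    return len <= (page_size << mdts);
}
```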






* Re: [PATCH v6 31/42] nvme: add check for prinfo
  2020-03-16 14:29 ` [PATCH v6 31/42] nvme: add check for prinfo Klaus Jensen
@ 2020-03-25 10:57   ` Maxim Levitsky
  2020-03-31  5:45     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:57 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Check the validity of the PRINFO field.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c       | 50 ++++++++++++++++++++++++++++++++++++-------
>  hw/block/trace-events |  1 +
>  include/block/nvme.h  |  1 +
>  3 files changed, 44 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 7d5340c272c6..0d2b5b45b0c5 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -505,6 +505,17 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
>      return NVME_SUCCESS;
>  }
>  
> +static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> +                                         uint16_t ctrl, NvmeRequest *req)
> +{
> +    if ((ctrl & NVME_RW_PRINFO_PRACT) && !(ns->id_ns.dps & DPS_TYPE_MASK)) {
> +        trace_nvme_dev_err_prinfo(nvme_cid(req), ctrl);
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }

I refreshed my (still very limited) knowledge of metadata
and protection information, and this is what I found:

I think that this is very far from complete, because we also have:

1. PRCHK. According to the spec it is independent of PRACT.
   When any of its bits is set and protection is enabled
   (the DPS field of the namespace), the 8 bytes of protection
   information are checked (optionally using the EILBRT and
   ELBAT/ELBATM fields of the command, and a CRC of the data
   for the guard field).

   So this field should also be checked to be zero when protection is disabled.
   (I don't see an explicit requirement for that in the spec, but neither do I
   see such a requirement for PRACT.)

2. The protection values to be written/checked ((E)ILBRT/(E)LBATM/(E)LBAT).
   Same here, but in addition these should not be set for reads when PRCHK
   is not set, and some of them are specific to the protection type.


The spec does mention the 'Invalid Protection Information' error code, which
refers to invalid values in the PRINFO field, so I think this error code
should be returned instead of 'Invalid Field'.

Another thing to optionally check is that the metadata pointer (used for
separate metadata) is zero as long as we don't support metadata.
(Again, I don't see an explicit requirement for this in the spec, but it mentions:

"This field is valid only if the command has metadata that is not interleaved with
the logical block data, as specified in the Format NVM command"

)
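
Here is a rough sketch of the stricter check suggested above. It assumes the PRINFO bit layout from include/block/nvme.h (PRCHK bits 12..10, per NVME_RW_PRINFO_PRCHK_MASK in this patch; PRACT at bit 13, following the spec's CDW12 layout) and assumes that DPS bits 2:0 hold the protection type. The constant, enum and function names are hypothetical. The sketch deliberately ignores the (E)ILBRT/(E)LBAT/(E)LBATM fields from point 2.

```c
#include <assert.h>
#include <stdint.h>

/* PRINFO bits in the 16-bit control field (PRACT position assumed). */
enum {
    PRINFO_PRACT      = 1 << 13,
    PRINFO_PRCHK_MASK = 7 << 10, /* GUARD | APP | REF */
};

#define DPS_PI_TYPE_MASK 0x7 /* hypothetical name: DPS bits 2:0 */

/* Hypothetical result values; the device would return real NVMe codes. */
enum prinfo_status {
    PRINFO_OK,
    PRINFO_INVALID_PROT_INFO,
};

/*
 * With protection disabled on the namespace, reject any PRINFO bit
 * (PRACT or any PRCHK bit) as Invalid Protection Information.
 */
static enum prinfo_status check_prinfo(uint16_t ctrl, uint8_t dps)
{
    int pi_enabled = (dps & DPS_PI_TYPE_MASK) != 0;

    if (!pi_enabled && (ctrl & (PRINFO_PRACT | PRINFO_PRCHK_MASK))) {
        return PRINFO_INVALID_PROT_INFO;
    }
    return PRINFO_OK;
}
```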


> +
> +    return NVME_SUCCESS;
> +}
> +
>  static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
>                                           uint64_t slba, uint32_t nlb,
>                                           NvmeRequest *req)
> @@ -564,11 +575,22 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
>      uint64_t offset = slba << data_shift;
>      uint32_t count = nlb << data_shift;
> +    uint16_t ctrl = le16_to_cpu(rw->control);
>      uint16_t status;
>  
> +    status = nvme_check_prinfo(n, ns, ctrl, req);
> +    if (status) {
> +        goto invalid;
> +    }
> +
> +    if (ctrl & NVME_RW_PRINFO_PRCHK_MASK) {
> +        status = NVME_INVALID_PROT_INFO | NVME_DNR;
> +        goto invalid;
> +    }
> +
>      status = nvme_check_bounds(n, ns, slba, nlb, req);
>      if (status) {
> -        return status;
> +        goto invalid;
>      }
>  
>      block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
> @@ -576,6 +598,10 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
>                                          BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
>      return NVME_NO_COMPLETE;
> +
> +invalid:
> +    block_acct_invalid(blk_get_stats(n->conf.blk), BLOCK_ACCT_WRITE);
> +    return status;
>  }
>  
>  static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> @@ -584,6 +610,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
>      uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
>      uint64_t slba = le64_to_cpu(rw->slba);
> +    uint16_t ctrl = le16_to_cpu(rw->control);
>  
>      uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
>      uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
> @@ -597,19 +624,22 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>  
>      status = nvme_check_mdts(n, data_size, req);
>      if (status) {
> -        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> -        return status;
> +        goto invalid;
> +    }
> +
> +    status = nvme_check_prinfo(n, ns, ctrl, req);
> +    if (status) {
> +        goto invalid;
>      }
>  
>      status = nvme_check_bounds(n, ns, slba, nlb, req);
>      if (status) {
> -        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> -        return status;
> +        goto invalid;
>      }
>  
> -    if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
> -        block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> -        return NVME_INVALID_FIELD | NVME_DNR;
> +    status = nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req);
> +    if (status) {
> +        goto invalid;
>      }
>  
>      if (req->qsg.nsg > 0) {
> @@ -633,6 +663,10 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>      }
>  
>      return NVME_NO_COMPLETE;
> +
> +invalid:
> +    block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> +    return status;
>  }
>  
>  static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 2df6aa38df1b..2aceb0537e05 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -80,6 +80,7 @@ nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
>  
>  # nvme traces for error conditions
>  nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
> +nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
>  nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
>  nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
>  nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index ecc02fbe8bb8..293d68553538 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -394,6 +394,7 @@ enum {
>      NVME_RW_PRINFO_PRCHK_GUARD  = 1 << 12,
>      NVME_RW_PRINFO_PRCHK_APP    = 1 << 11,
>      NVME_RW_PRINFO_PRCHK_REF    = 1 << 10,
> +    NVME_RW_PRINFO_PRCHK_MASK   = 7 << 10,
>  };
>  
>  typedef struct NvmeDsmCmd {


Best regards,
	Maxim Levitsky






* Re: [PATCH v6 32/42] nvme: allow multiple aios per command
  2020-03-16 14:29 ` [PATCH v6 32/42] nvme: allow multiple aios per command Klaus Jensen
@ 2020-03-25 10:57   ` Maxim Levitsky
  2020-03-31  5:47     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:57 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> This refactors how the device issues asynchronous block backend
> requests. The NvmeRequest now holds a queue of NvmeAIOs that are
> associated with the command. This allows multiple aios to be issued for
> a command. Only when all requests have been completed will the device
> post a completion queue entry.
> 
> Because the device is currently guaranteed to only issue a single aio
> request per command, the benefit is not immediately obvious. But this
> functionality is required to support metadata, the dataset management
> command and other features.
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c       | 377 +++++++++++++++++++++++++++++++-----------
>  hw/block/nvme.h       | 129 +++++++++++++--
>  hw/block/trace-events |   6 +
>  3 files changed, 407 insertions(+), 105 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 0d2b5b45b0c5..817384e3b1a9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -59,6 +59,7 @@
>      } while (0)
>  
>  static void nvme_process_sq(void *opaque);
> +static void nvme_aio_cb(void *opaque, int ret);
>  
>  static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
>  {
> @@ -373,6 +374,99 @@ static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
>      return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
>  }
>  
> +static void nvme_aio_destroy(NvmeAIO *aio)
> +{
> +    g_free(aio);
> +}
> +
> +static inline void nvme_req_register_aio(NvmeRequest *req, NvmeAIO *aio,
I guess I'd call this nvme_req_add_aio,
or nvme_add_aio_to_reg.
Thoughts?
Alternatively, you can leave the name as is, but add a comment on top explaining it.
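
The commit message says that a completion queue entry is posted only once every aio registered on a request has finished. Here is a simplified sketch of that rule outside QEMU. struct request, struct aio, req_add_aio() and aio_done() are hypothetical stand-ins for NvmeRequest, NvmeAIO, nvme_req_register_aio() and the completion callback. The patch uses a QTAILQ; this sketch uses a plain singly linked list instead. Keeping the first error is also this sketch's own choice, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct request;

/* Hypothetical, simplified stand-in for NvmeAIO. */
struct aio {
    struct request *req;
    struct aio *next;
};

/* Hypothetical, simplified stand-in for NvmeRequest. */
struct request {
    struct aio *aios; /* outstanding aios for this command */
    int status;       /* first error seen, 0 on success */
    bool completed;   /* set when the CQE would be posted */
};

/* Allocate an aio and link it into the request's outstanding list. */
static struct aio *req_add_aio(struct request *req)
{
    struct aio *aio = calloc(1, sizeof(*aio));

    if (!aio) {
        return NULL;
    }
    aio->req = req;
    aio->next = req->aios;
    req->aios = aio;
    return aio;
}

/* Completion callback: unlink the aio; complete the request on the last one. */
static void aio_done(struct aio *aio, int ret)
{
    struct request *req = aio->req;
    struct aio **pp = &req->aios;

    while (*pp != aio) {
        pp = &(*pp)->next;
    }
    *pp = aio->next;
    free(aio);

    if (ret && !req->status) {
        req->status = ret;
    }
    if (!req->aios) {
        req->completed = true; /* here the device would post the CQE */
    }
}
```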

> +                                         NvmeAIOOp opc)
> +{
> +    aio->opc = opc;
> +
> +    trace_nvme_dev_req_register_aio(nvme_cid(req), aio, blk_name(aio->blk),
> +                                    aio->offset, aio->len,
> +                                    nvme_aio_opc_str(aio), req);
> +
> +    if (req) {
> +        QTAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
> +    }
> +}
> +
> +static void nvme_submit_aio(NvmeAIO *aio)
OK, this name makes sense.
Also, please add a comment on top.
> +{
> +    BlockBackend *blk = aio->blk;
> +    BlockAcctCookie *acct = &aio->acct;
> +    BlockAcctStats *stats = blk_get_stats(blk);
> +
> +    bool is_write;
> +
> +    switch (aio->opc) {
> +    case NVME_AIO_OPC_NONE:
> +        break;
> +
> +    case NVME_AIO_OPC_FLUSH:
> +        block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
> +        aio->aiocb = blk_aio_flush(blk, nvme_aio_cb, aio);
> +        break;
> +
> +    case NVME_AIO_OPC_WRITE_ZEROES:
> +        block_acct_start(stats, acct, aio->len, BLOCK_ACCT_WRITE);
> +        aio->aiocb = blk_aio_pwrite_zeroes(blk, aio->offset, aio->len,
> +                                           BDRV_REQ_MAY_UNMAP, nvme_aio_cb,
> +                                           aio);
> +        break;
> +
> +    case NVME_AIO_OPC_READ:
> +    case NVME_AIO_OPC_WRITE:
> +        is_write = (aio->opc == NVME_AIO_OPC_WRITE);
> +
> +        block_acct_start(stats, acct, aio->len,
> +                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +        if (aio->qsg) {
> +            if (is_write) {
> +                aio->aiocb = dma_blk_write(blk, aio->qsg, aio->offset,
> +                                           BDRV_SECTOR_SIZE, nvme_aio_cb, aio);
> +            } else {
> +                aio->aiocb = dma_blk_read(blk, aio->qsg, aio->offset,
> +                                          BDRV_SECTOR_SIZE, nvme_aio_cb, aio);
> +            }
> +        } else {
> +            if (is_write) {
> +                aio->aiocb = blk_aio_pwritev(blk, aio->offset, aio->iov, 0,
> +                                             nvme_aio_cb, aio);
> +            } else {
> +                aio->aiocb = blk_aio_preadv(blk, aio->offset, aio->iov, 0,
> +                                            nvme_aio_cb, aio);
> +            }
> +        }
Looks much better that way than with an early return!

> +
> +        break;
> +    }
> +}
> +
> +static void nvme_rw_aio(BlockBackend *blk, uint64_t offset, NvmeRequest *req)
> +{
> +    NvmeAIO *aio;
> +    size_t len = req->qsg.nsg > 0 ? req->qsg.size : req->iov.size;
> +
> +    aio = g_new0(NvmeAIO, 1);
> +
> +    *aio = (NvmeAIO) {
> +        .blk = blk,
> +        .offset = offset,
> +        .len = len,
> +        .req = req,
> +        .qsg = req->qsg.sg ? &req->qsg : NULL,
> +        .iov = req->iov.iov ? &req->iov : NULL,
OK, this is the fix for the bug I mentioned in v5; looks good.

> +    };
> +
> +    nvme_req_register_aio(req, aio, nvme_req_is_write(req) ?
> +                          NVME_AIO_OPC_WRITE : NVME_AIO_OPC_READ);
> +    nvme_submit_aio(aio);
> +}
> +
>  static void nvme_post_cqes(void *opaque)
>  {
>      NvmeCQueue *cq = opaque;
> @@ -396,6 +490,7 @@ static void nvme_post_cqes(void *opaque)
>          nvme_inc_cq_tail(cq);
>          pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
>              sizeof(req->cqe));
> +        nvme_req_clear(req);
>          QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
>      }
>      if (cq->tail != cq->head) {
> @@ -406,8 +501,8 @@ static void nvme_post_cqes(void *opaque)
>  static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
>  {
>      assert(cq->cqid == req->sq->cqid);
> -    trace_nvme_dev_enqueue_req_completion(nvme_cid(req), cq->cqid,
> -                                          req->status);
> +    trace_nvme_dev_enqueue_req_completion(nvme_cid(req), cq->cqid, req->status);
> +
>      QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
>      QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
>      timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
> @@ -505,9 +600,11 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
>      return NVME_SUCCESS;
>  }
>  
> -static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> -                                         uint16_t ctrl, NvmeRequest *req)
> +static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, uint16_t ctrl,
> +                                         NvmeRequest *req)
>  {
> +    NvmeNamespace *ns = req->ns;
> +
This should go to the patch that added nvme_check_prinfo

>      if ((ctrl & NVME_RW_PRINFO_PRACT) && !(ns->id_ns.dps & DPS_TYPE_MASK)) {
>          trace_nvme_dev_err_prinfo(nvme_cid(req), ctrl);
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -516,10 +613,10 @@ static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
>      return NVME_SUCCESS;
>  }
>  
> -static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> -                                         uint64_t slba, uint32_t nlb,
> -                                         NvmeRequest *req)
> +static inline uint16_t nvme_check_bounds(NvmeCtrl *n, uint64_t slba,
> +                                         uint32_t nlb, NvmeRequest *req)
>  {
> +    NvmeNamespace *ns = req->ns;
>      uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
This should go to the patch that added nvme_check_bounds as well

>  
>      if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
> @@ -530,55 +627,154 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
>      return NVME_SUCCESS;
>  }
>  
> -static void nvme_rw_cb(void *opaque, int ret)
> +static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
> +{
> +    NvmeNamespace *ns = req->ns;
> +    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
> +    uint16_t ctrl = le16_to_cpu(rw->control);
> +    size_t len = req->nlb << nvme_ns_lbads(ns);
> +    uint16_t status;
> +
> +    status = nvme_check_mdts(n, len, req);
> +    if (status) {
> +        return status;
> +    }
> +
> +    status = nvme_check_prinfo(n, ctrl, req);
> +    if (status) {
> +        return status;
> +    }
> +
> +    status = nvme_check_bounds(n, req->slba, req->nlb, req);
> +    if (status) {
> +        return status;
> +    }
> +
> +    return NVME_SUCCESS;
> +}

Nitpick: I hate to say it, but nvme_check_rw should be in a separate patch as well.
It would also make the diff more readable (when adding a function and changing a function
at the same time, you get a diff between two unrelated things).


> +
> +static void nvme_rw_cb(NvmeRequest *req, void *opaque)
>  {
> -    NvmeRequest *req = opaque;
>      NvmeSQueue *sq = req->sq;
>      NvmeCtrl *n = sq->ctrl;
>      NvmeCQueue *cq = n->cq[sq->cqid];
>  
> -    if (!ret) {
> -        block_acct_done(blk_get_stats(n->conf.blk), &req->acct);
> -        req->status = NVME_SUCCESS;
> -    } else {
> -        block_acct_failed(blk_get_stats(n->conf.blk), &req->acct);
> -        req->status = NVME_INTERNAL_DEV_ERROR;
> -    }
> -
> -    if (req->qsg.nalloc) {
> -        qemu_sglist_destroy(&req->qsg);
> -    }
> -    if (req->iov.nalloc) {
> -        qemu_iovec_destroy(&req->iov);
> -    }
> +    trace_nvme_dev_rw_cb(nvme_cid(req), req->cmd.nsid);
>  
>      nvme_enqueue_req_completion(cq, req);
>  }
>  
> -static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> -    NvmeRequest *req)
> +static void nvme_aio_cb(void *opaque, int ret)
>  {
> -    block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
> -         BLOCK_ACCT_FLUSH);
> -    req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
> +    NvmeAIO *aio = opaque;
> +    NvmeRequest *req = aio->req;
> +
> +    BlockBackend *blk = aio->blk;
> +    BlockAcctCookie *acct = &aio->acct;
> +    BlockAcctStats *stats = blk_get_stats(blk);
> +
> +    Error *local_err = NULL;
> +
> +    trace_nvme_dev_aio_cb(nvme_cid(req), aio, blk_name(blk), aio->offset,
> +                          nvme_aio_opc_str(aio), req);
> +
> +    if (req) {
> +        QTAILQ_REMOVE(&req->aio_tailq, aio, tailq_entry);
> +    }
> +
> +    if (!ret) {
> +        block_acct_done(stats, acct);
> +    } else {
> +        block_acct_failed(stats, acct);
> +
> +        if (req) {
> +            uint16_t status;
> +
> +            switch (aio->opc) {
> +            case NVME_AIO_OPC_READ:
> +                status = NVME_UNRECOVERED_READ;
> +                break;
> +            case NVME_AIO_OPC_WRITE:
> +            case NVME_AIO_OPC_WRITE_ZEROES:
> +                status = NVME_WRITE_FAULT;
> +                break;
> +            default:
> +                status = NVME_INTERNAL_DEV_ERROR;
> +                break;
> +            }
> +
> +            trace_nvme_dev_err_aio(nvme_cid(req), aio, blk_name(blk),
> +                                   aio->offset, nvme_aio_opc_str(aio), req,
> +                                   status);
> +
> +            error_setg_errno(&local_err, -ret, "aio failed");
> +            error_report_err(local_err);
> +
> +            /*
> +             * An Internal Error trumps all other errors. For other errors,
> +             * only set the first error encountered. Any additional errors will
> +             * be recorded in the error information log page.
> +             */
> +            if (!req->status ||
> +                nvme_status_is_error(status, NVME_INTERNAL_DEV_ERROR)) {
> +                req->status = status;
> +            }
> +        }
> +    }
> +
> +    if (aio->cb) {
> +        aio->cb(aio, aio->cb_arg, ret);
> +    }
> +
> +    if (req && QTAILQ_EMPTY(&req->aio_tailq)) {
> +        if (req->cb) {
> +            req->cb(req, req->cb_arg);
> +        } else {
> +            NvmeSQueue *sq = req->sq;
> +            NvmeCtrl *n = sq->ctrl;
> +            NvmeCQueue *cq = n->cq[sq->cqid];
> +
> +            nvme_enqueue_req_completion(cq, req);
> +        }
> +    }
> +
> +    nvme_aio_destroy(aio);
> +}
> +
> +static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +{
> +    NvmeAIO *aio = g_new0(NvmeAIO, 1);
> +
> +    *aio = (NvmeAIO) {
> +        .blk = n->conf.blk,
> +        .req = req,
> +    };
> +
> +    nvme_req_register_aio(req, aio, NVME_AIO_OPC_FLUSH);
> +    nvme_submit_aio(aio);
>  
>      return NVME_NO_COMPLETE;
>  }
>  
> -static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> -    NvmeRequest *req)
> +static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
Very small nitpick about zeros/zeroes: to be honest, this rename should move to a refactoring patch.

>  {
> -    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
> -    const uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
> -    const uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
> -    uint64_t slba = le64_to_cpu(rw->slba);
> -    uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
> -    uint64_t offset = slba << data_shift;
> -    uint32_t count = nlb << data_shift;
> +    NvmeAIO *aio;
> +
> +    NvmeNamespace *ns = req->ns;
> +    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
>      uint16_t ctrl = le16_to_cpu(rw->control);
> +
> +    int64_t offset;
> +    size_t count;
>      uint16_t status;
>  
> -    status = nvme_check_prinfo(n, ns, ctrl, req);
> +    req->slba = le64_to_cpu(rw->slba);
> +    req->nlb  = le16_to_cpu(rw->nlb) + 1;
> +
> +    trace_nvme_dev_write_zeroes(nvme_cid(req), le32_to_cpu(cmd->nsid),
> +                                req->slba, req->nlb);
> +
> +    status = nvme_check_prinfo(n, ctrl, req);
>      if (status) {
>          goto invalid;
>      }
> @@ -588,15 +784,26 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
>          goto invalid;
>      }
>  
> -    status = nvme_check_bounds(n, ns, slba, nlb, req);
> +    status = nvme_check_bounds(n, req->slba, req->nlb, req);
>      if (status) {
>          goto invalid;
>      }
>  
> -    block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
> -                     BLOCK_ACCT_WRITE);
> -    req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
> -                                        BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
> +    offset = req->slba << nvme_ns_lbads(ns);
> +    count = req->nlb << nvme_ns_lbads(ns);
> +
> +    aio = g_new0(NvmeAIO, 1);
> +
> +    *aio = (NvmeAIO) {
> +        .blk = n->conf.blk,
> +        .offset = offset,
> +        .len = count,
> +        .req = req,
> +    };
> +
> +    nvme_req_register_aio(req, aio, NVME_AIO_OPC_WRITE_ZEROES);
> +    nvme_submit_aio(aio);
> +
>      return NVME_NO_COMPLETE;
>  
>  invalid:
> @@ -604,63 +811,36 @@ invalid:
>      return status;
>  }
>  
> -static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> -    NvmeRequest *req)
> +static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
> -    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
> -    uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
> -    uint64_t slba = le64_to_cpu(rw->slba);
> -    uint16_t ctrl = le16_to_cpu(rw->control);
> +    NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
> +    NvmeNamespace *ns = req->ns;
> +    uint32_t len;
> +    int status;
>  
> -    uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
> -    uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
> -    uint64_t data_size = (uint64_t)nlb << data_shift;
> -    uint64_t data_offset = slba << data_shift;
> -    int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
> -    enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
> -    uint16_t status;
> +    enum BlockAcctType acct =
> +        nvme_req_is_write(req) ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
>  
> -    trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
> +    req->nlb  = le16_to_cpu(rw->nlb) + 1;
> +    req->slba = le64_to_cpu(rw->slba);
>  
> -    status = nvme_check_mdts(n, data_size, req);
> -    if (status) {
> -        goto invalid;
> -    }
> +    len = req->nlb << nvme_ns_lbads(ns);
>  
> -    status = nvme_check_prinfo(n, ns, ctrl, req);
> -    if (status) {
> -        goto invalid;
> -    }
> +    trace_nvme_dev_rw(nvme_req_is_write(req) ? "write" : "read", req->nlb,
> +                      req->nlb << nvme_ns_lbads(req->ns), req->slba);
>  
> -    status = nvme_check_bounds(n, ns, slba, nlb, req);
> +    status = nvme_check_rw(n, req);
>      if (status) {
>          goto invalid;
>      }
>  
> -    status = nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req);
> +    status = nvme_map(n, cmd, &req->qsg, &req->iov, len, req);
>      if (status) {
>          goto invalid;
>      }
>  
> -    if (req->qsg.nsg > 0) {
> -        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->qsg.size,
> -                         acct);
> -
> -        req->aiocb = is_write ?
> -            dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
> -                          nvme_rw_cb, req) :
> -            dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
> -                         nvme_rw_cb, req);
> -    } else {
> -        block_acct_start(blk_get_stats(n->conf.blk), &req->acct, req->iov.size,
> -                         acct);
> -
> -        req->aiocb = is_write ?
> -            blk_aio_pwritev(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
> -                            req) :
> -            blk_aio_preadv(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
> -                           req);
> -    }
> +    nvme_rw_aio(n->conf.blk, req->slba << nvme_ns_lbads(ns), req);
> +    nvme_req_set_cb(req, nvme_rw_cb, NULL);
>  
>      return NVME_NO_COMPLETE;
>  
> @@ -671,23 +851,26 @@ invalid:
>  
>  static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
> -    NvmeNamespace *ns;
>      uint32_t nsid = le32_to_cpu(cmd->nsid);
>  
> +    trace_nvme_dev_io_cmd(nvme_cid(req), nsid, le16_to_cpu(req->sq->sqid),
> +                          cmd->opcode);
> +
>      if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
>          trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
>          return NVME_INVALID_NSID | NVME_DNR;
>      }
>  
> -    ns = &n->namespaces[nsid - 1];
> +    req->ns = &n->namespaces[nsid - 1];
> +
>      switch (cmd->opcode) {
>      case NVME_CMD_FLUSH:
> -        return nvme_flush(n, ns, cmd, req);
> +        return nvme_flush(n, cmd, req);
>      case NVME_CMD_WRITE_ZEROS:
> -        return nvme_write_zeros(n, ns, cmd, req);
> +        return nvme_write_zeroes(n, cmd, req);
>      case NVME_CMD_WRITE:
>      case NVME_CMD_READ:
> -        return nvme_rw(n, ns, cmd, req);
> +        return nvme_rw(n, cmd, req);
>      default:
>          trace_nvme_dev_err_invalid_opc(cmd->opcode);
>          return NVME_INVALID_OPCODE | NVME_DNR;
> @@ -711,6 +894,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
>      NvmeRequest *req, *next;
>      NvmeSQueue *sq;
>      NvmeCQueue *cq;
> +    NvmeAIO *aio;
>      uint16_t qid = le16_to_cpu(c->qid);
>  
>      if (unlikely(!qid || nvme_check_sqid(n, qid))) {
> @@ -723,8 +907,11 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeCmd *cmd)
>      sq = n->sq[qid];
>      while (!QTAILQ_EMPTY(&sq->out_req_list)) {
>          req = QTAILQ_FIRST(&sq->out_req_list);
> -        assert(req->aiocb);
> -        blk_aio_cancel(req->aiocb);
> +        while (!QTAILQ_EMPTY(&req->aio_tailq)) {
> +            aio = QTAILQ_FIRST(&req->aio_tailq);
> +            assert(aio->aiocb);
> +            blk_aio_cancel(aio->aiocb);
> +        }
>      }
>      if (!nvme_check_cqid(n, sq->cqid)) {
>          cq = n->cq[sq->cqid];
> @@ -761,6 +948,7 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
>      QTAILQ_INIT(&sq->out_req_list);
>      for (i = 0; i < sq->size; i++) {
>          sq->io_req[i].sq = sq;
> +        QTAILQ_INIT(&(sq->io_req[i].aio_tailq));
>          QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
>      }
>      sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
> @@ -1474,8 +1662,9 @@ static void nvme_process_sq(void *opaque)
>          req = QTAILQ_FIRST(&sq->req_list);
>          QTAILQ_REMOVE(&sq->req_list, req, entry);
>          QTAILQ_INSERT_TAIL(&sq->out_req_list, req, entry);
> -        memset(&req->cqe, 0, sizeof(req->cqe));
> +
>          req->cqe.cid = cmd.cid;
> +        memcpy(&req->cmd, &cmd, sizeof(NvmeCmd));
>  
>          status = sq->sqid ? nvme_io_cmd(n, &cmd, req) :
>              nvme_admin_cmd(n, &cmd, req);
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index b05c2153aebf..5d5fa8c8833a 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -27,16 +27,58 @@ typedef struct NvmeAsyncEvent {
>      NvmeAerResult result;
>  } NvmeAsyncEvent;
>  
> -typedef struct NvmeRequest {
> -    struct NvmeSQueue       *sq;
> -    BlockAIOCB              *aiocb;
> -    uint16_t                status;
> -    NvmeCqe                 cqe;
> -    BlockAcctCookie         acct;
> -    QEMUSGList              qsg;
> -    QEMUIOVector            iov;
> -    QTAILQ_ENTRY(NvmeRequest)entry;
> -} NvmeRequest;
> +typedef struct NvmeRequest NvmeRequest;
> +typedef void NvmeRequestCompletionFunc(NvmeRequest *req, void *opaque);
> +
> +struct NvmeRequest {
> +    struct NvmeSQueue    *sq;
> +    struct NvmeNamespace *ns;
> +
> +    NvmeCqe  cqe;
> +    NvmeCmd  cmd;
> +    uint16_t status;
> +
> +    uint64_t slba;
> +    uint32_t nlb;
> +
> +    QEMUSGList   qsg;
> +    QEMUIOVector iov;
> +
> +    NvmeRequestCompletionFunc *cb;
> +    void                      *cb_arg;
> +
> +    QTAILQ_HEAD(, NvmeAIO)    aio_tailq;
> +    QTAILQ_ENTRY(NvmeRequest) entry;
> +};
> +
> +static inline void nvme_req_clear(NvmeRequest *req)
> +{
> +    req->ns = NULL;
> +    memset(&req->cqe, 0, sizeof(req->cqe));
> +    req->status = NVME_SUCCESS;
> +    req->slba = req->nlb = 0x0;
> +    req->cb = req->cb_arg = NULL;
> +
> +    if (req->qsg.sg) {
> +        qemu_sglist_destroy(&req->qsg);
> +    }
> +
> +    if (req->iov.iov) {
> +        qemu_iovec_destroy(&req->iov);
> +    }
> +}
> +
> +static inline void nvme_req_set_cb(NvmeRequest *req,
> +                                   NvmeRequestCompletionFunc *cb, void *cb_arg)
> +{
> +    req->cb = cb;
> +    req->cb_arg = cb_arg;
> +}
> +
> +static inline void nvme_req_clear_cb(NvmeRequest *req)
> +{
> +    req->cb = req->cb_arg = NULL;
> +}
>  
>  typedef struct NvmeSQueue {
>      struct NvmeCtrl *ctrl;
> @@ -88,6 +130,60 @@ static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
>      return 1 << nvme_ns_lbads(ns);
>  }
>  
> +typedef enum NvmeAIOOp {
> +    NVME_AIO_OPC_NONE         = 0x0,
> +    NVME_AIO_OPC_FLUSH        = 0x1,
> +    NVME_AIO_OPC_READ         = 0x2,
> +    NVME_AIO_OPC_WRITE        = 0x3,
> +    NVME_AIO_OPC_WRITE_ZEROES = 0x4,
> +} NvmeAIOOp;
> +
> +typedef struct NvmeAIO NvmeAIO;
> +typedef void NvmeAIOCompletionFunc(NvmeAIO *aio, void *opaque, int ret);
> +
> +struct NvmeAIO {
> +    NvmeRequest *req;
> +
> +    NvmeAIOOp       opc;
> +    int64_t         offset;
> +    size_t          len;
> +    BlockBackend    *blk;
> +    BlockAIOCB      *aiocb;
> +    BlockAcctCookie acct;
> +
> +    NvmeAIOCompletionFunc *cb;
> +    void                  *cb_arg;
> +
> +    QEMUSGList   *qsg;
> +    QEMUIOVector *iov;
> +
> +    QTAILQ_ENTRY(NvmeAIO) tailq_entry;
> +};
> +
> +static inline const char *nvme_aio_opc_str(NvmeAIO *aio)
> +{
> +    switch (aio->opc) {
> +    case NVME_AIO_OPC_NONE:         return "NVME_AIO_OP_NONE";
> +    case NVME_AIO_OPC_FLUSH:        return "NVME_AIO_OP_FLUSH";
> +    case NVME_AIO_OPC_READ:         return "NVME_AIO_OP_READ";
> +    case NVME_AIO_OPC_WRITE:        return "NVME_AIO_OP_WRITE";
> +    case NVME_AIO_OPC_WRITE_ZEROES: return "NVME_AIO_OP_WRITE_ZEROES";
> +    default:                        return "NVME_AIO_OP_UNKNOWN";
> +    }
> +}
> +
> +static inline bool nvme_req_is_write(NvmeRequest *req)
> +{
> +    switch (req->cmd.opcode) {
> +    case NVME_CMD_WRITE:
> +    case NVME_CMD_WRITE_UNCOR:
> +    case NVME_CMD_WRITE_ZEROS:
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
>  #define TYPE_NVME "nvme"
>  #define NVME(obj) \
>          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
> @@ -140,10 +236,21 @@ static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
>  static inline uint16_t nvme_cid(NvmeRequest *req)
>  {
>      if (req) {
> -        return le16_to_cpu(req->cqe.cid);
> +        return le16_to_cpu(req->cmd.cid);
>      }
>  
>      return 0xffff;
>  }
>  
> +static inline bool nvme_status_is_error(uint16_t status, uint16_t err)
> +{
> +    /* strip DNR and MORE */
> +    return (status & 0xfff) == err;
> +}
> +
> +static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
> +{
> +    return req->sq->ctrl;
> +}
> +
>  #endif /* HW_NVME_H */
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 2aceb0537e05..aa449e314818 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -34,7 +34,12 @@ nvme_dev_irq_pin(void) "pulsing IRQ pin"
>  nvme_dev_irq_masked(void) "IRQ is masked"
>  nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
>  nvme_dev_map_prp(uint16_t cid, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
> +nvme_dev_req_register_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, uint64_t count, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" count %"PRIu64" opc \"%s\" req %p"
> +nvme_dev_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
> +nvme_dev_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
>  nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
> +nvme_dev_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
> +nvme_dev_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
>  nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
>  nvme_dev_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
>  nvme_dev_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
> @@ -81,6 +86,7 @@ nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
>  # nvme traces for error conditions
>  nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
>  nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
> +nvme_dev_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req, uint16_t status) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p status 0x%"PRIx16""
>  nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
>  nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
>  nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
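A minimal standalone sketch of the completion tracking this patch introduces: each request keeps a queue of outstanding AIOs (cf. aio_tailq), and only the callback that empties that queue completes the request. It assumes the BSD <sys/queue.h> TAILQ macros as a stand-in for QEMU's QTAILQ; struct io_req, struct io_aio and the io_* helpers are hypothetical names, not the device code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

struct io_aio;

/* Hypothetical stand-in for NvmeRequest. */
struct io_req {
    bool completed;                    /* set when the last AIO finishes */
    TAILQ_HEAD(, io_aio) aio_tailq;    /* outstanding AIOs */
};

/* Hypothetical stand-in for NvmeAIO. */
struct io_aio {
    struct io_req *req;
    TAILQ_ENTRY(io_aio) tailq_entry;
};

static void io_req_init(struct io_req *req)
{
    req->completed = false;
    TAILQ_INIT(&req->aio_tailq);
}

/* Attach a new AIO to the request (cf. nvme_req_register_aio). */
static struct io_aio *io_req_add_aio(struct io_req *req)
{
    struct io_aio *aio = calloc(1, sizeof(struct io_aio));

    assert(aio != NULL);
    aio->req = req;
    TAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
    return aio;
}

/*
 * AIO completion (cf. nvme_aio_cb): unlink the AIO and complete the
 * request only when no other AIOs are still in flight.
 */
static void io_aio_complete(struct io_aio *aio)
{
    struct io_req *req = aio->req;

    TAILQ_REMOVE(&req->aio_tailq, aio, tailq_entry);
    if (TAILQ_EMPTY(&req->aio_tailq)) {
        req->completed = true;  /* the device would enqueue the CQE here */
    }
    free(aio);
}
```

The AIOs may finish in any order; the request stays pending until the last one returns. This is also why nvme_del_sq() in the patch cancels every entry on aio_tailq instead of a single aiocb.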

The patch is still too large IMHO to review properly, and a few things can be split from it.
I tried my best to review it, but I might have missed something.
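A sketch of the status-merging rule described in the comment in nvme_aio_cb (an Internal Device Error overrides everything, otherwise the first error is kept), combined with the DNR/MORE masking done by nvme_status_is_error(). The numeric status values are assumed to match include/block/nvme.h; the ST_* macros and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status values assumed to mirror include/block/nvme.h. */
#define ST_SUCCESS            0x0000
#define ST_INTERNAL_DEV_ERROR 0x0006
#define ST_WRITE_FAULT        0x0280
#define ST_UNRECOVERED_READ   0x0281
#define ST_MORE               0x2000
#define ST_DNR                0x4000

/* Same check as nvme_status_is_error(): ignore the DNR and MORE bits. */
static bool status_is_error(uint16_t status, uint16_t err)
{
    return (status & 0xfff) == err;
}

/*
 * Merge the status of a failed AIO into the request status: keep the
 * first error, except that an Internal Device Error always wins.
 */
static uint16_t merge_status(uint16_t cur, uint16_t status)
{
    if (!cur || status_is_error(status, ST_INTERNAL_DEV_ERROR)) {
        return status;
    }
    return cur;
}
```

Because the mask drops DNR, an Internal Device Error with DNR set still compares equal. Once it is recorded, later media errors do not replace it.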

Best regards,
	Maxim Levitsky

* Re: [PATCH v6 33/42] nvme: use preallocated qsg/iov in nvme_dma_prp
  2020-03-16 14:29 ` [PATCH v6 33/42] nvme: use preallocated qsg/iov in nvme_dma_prp Klaus Jensen
@ 2020-03-25 10:58   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:58 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Since clean up of the request qsg/iov has been moved to the common
> nvme_enqueue_req_completion function, there is no need to use a stack
> allocated qsg/iov in nvme_dma_prp.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/block/nvme.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 817384e3b1a9..15ca2417af04 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -321,45 +321,39 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>                               uint64_t prp1, uint64_t prp2, DMADirection dir,
>                               NvmeRequest *req)
>  {
> -    QEMUSGList qsg;
> -    QEMUIOVector iov;
>      uint16_t status = NVME_SUCCESS;
>  
> -    status = nvme_map_prp(n, &qsg, &iov, prp1, prp2, len, req);
> +    status = nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
>      if (status) {
>          return status;
>      }
>  
> -    if (qsg.nsg > 0) {
> +    if (req->qsg.nsg > 0) {
>          uint64_t residual;
>  
>          if (dir == DMA_DIRECTION_TO_DEVICE) {
> -            residual = dma_buf_write(ptr, len, &qsg);
> +            residual = dma_buf_write(ptr, len, &req->qsg);
>          } else {
> -            residual = dma_buf_read(ptr, len, &qsg);
> +            residual = dma_buf_read(ptr, len, &req->qsg);
>          }
>  
>          if (unlikely(residual)) {
>              trace_nvme_dev_err_invalid_dma();
>              status = NVME_INVALID_FIELD | NVME_DNR;
>          }
> -
> -        qemu_sglist_destroy(&qsg);
>      } else {
>          size_t bytes;
>  
>          if (dir == DMA_DIRECTION_TO_DEVICE) {
> -            bytes = qemu_iovec_to_buf(&iov, 0, ptr, len);
> +            bytes = qemu_iovec_to_buf(&req->iov, 0, ptr, len);
>          } else {
> -            bytes = qemu_iovec_from_buf(&iov, 0, ptr, len);
> +            bytes = qemu_iovec_from_buf(&req->iov, 0, ptr, len);
>          }
>  
>          if (unlikely(bytes != len)) {
>              trace_nvme_dev_err_invalid_dma();
>              status = NVME_INVALID_FIELD | NVME_DNR;
>          }
> -
> -        qemu_iovec_destroy(&iov);
>      }
>  
>      return status;
Only minor changes from the previous version,
so 
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
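A minimal sketch of the ownership pattern this commit relies on. The request owns its I/O vector, handlers only append to it, and a single common cleanup routine (cf. nvme_req_clear) frees it when the request is recycled. As a result, a helper like nvme_dma_prp needs no stack-allocated vector and no local teardown. It uses POSIX struct iovec in place of QEMUIOVector; struct dma_req and the dma_req_* functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical request that owns its I/O vector (cf. req->iov). */
struct dma_req {
    struct iovec *iov;
    int niov;
};

/* Append one mapped segment, growing the request-owned vector. */
static void dma_req_iov_add(struct dma_req *req, void *base, size_t len)
{
    struct iovec *iov = realloc(req->iov,
                                (req->niov + 1) * sizeof(struct iovec));

    assert(iov != NULL);
    req->iov = iov;
    req->iov[req->niov].iov_base = base;
    req->iov[req->niov].iov_len = len;
    req->niov++;
}

/* Gather the segments into a linear buffer (cf. qemu_iovec_to_buf). */
static size_t dma_req_iov_to_buf(const struct dma_req *req, void *buf,
                                 size_t len)
{
    size_t done = 0;

    for (int i = 0; i < req->niov && done < len; i++) {
        size_t n = req->iov[i].iov_len;

        if (n > len - done) {
            n = len - done;
        }
        memcpy((char *)buf + done, req->iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/*
 * Common cleanup (cf. nvme_req_clear): runs once when the request is
 * recycled, so individual handlers never free the vector themselves.
 */
static void dma_req_clear(struct dma_req *req)
{
    free(req->iov);
    req->iov = NULL;
    req->niov = 0;
}
```

A handler then compares the returned byte count with the expected length, as nvme_dma_prp does with `bytes != len`, and simply returns. The vector is released later on the common path.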

Best regards,
	Maxim Levitsky

* Re: [PATCH v6 35/42] nvme: handle dma errors
  2020-03-16 14:29 ` [PATCH v6 35/42] nvme: handle dma errors Klaus Jensen
@ 2020-03-25 10:58   ` Maxim Levitsky
  2020-03-31  5:47     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:58 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Handling DMA errors gracefully is required for the device to pass the
> block/011 test ("disable PCI device while doing I/O") in the blktests
> suite.
> 
> With this patch the device passes the test by retrying "critical"
> transfers (posting of completion entries and processing of submission
> queue entries).
> 
> If DMA errors occur at any other point in the execution of the command
> (say, while mapping the PRPs), the command is aborted with a Data
> Transfer Error status code.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c       | 45 ++++++++++++++++++++++++++++++++-----------
>  hw/block/trace-events |  2 ++
>  include/block/nvme.h  |  2 +-
>  3 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 15ca2417af04..49d323566393 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -74,14 +74,14 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>      return addr >= low && addr < hi;
>  }
>  
> -static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> +static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
>      if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
>          memcpy(buf, nvme_addr_to_cmb(n, addr), size);
> -        return;
> +        return 0;
>      }
>  
> -    pci_dma_read(&n->parent_obj, addr, buf, size);
> +    return pci_dma_read(&n->parent_obj, addr, buf, size);
>  }
>  
>  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
> @@ -164,7 +164,7 @@ static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
>                                    size_t len)
>  {
>      if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> -        return NVME_DATA_TRAS_ERROR;
> +        return NVME_DATA_TRANSFER_ERROR;

Minor nitpick: this is also a non-functional refactoring.
I don't think that each piece of a refactoring should be in a separate patch,
so I usually group all the non-functional (aka cosmetic) refactoring in one patch, usually the first in the series.
But I try not to leave such refactoring in the functional patches.

However, since there are not many cases like that left, I don't mind leaving this particular case as is.

>      }
>  
>      qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
> @@ -213,6 +213,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>      int num_prps = (len >> n->page_bits) + 1;
>      uint16_t status;
>      bool prp_list_in_cmb = false;
> +    int ret;
>  
>      trace_nvme_dev_map_prp(nvme_cid(req), trans_len, len, prp1, prp2,
>                             num_prps);
> @@ -252,7 +253,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>  
>              nents = (len + n->page_size - 1) >> n->page_bits;
>              prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> -            nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
> +            ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
> +            if (ret) {
> +                trace_nvme_dev_err_addr_read(prp2);
> +                status = NVME_DATA_TRANSFER_ERROR;
> +                goto unmap;
> +            }
>              while (len != 0) {
>                  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
>  
> @@ -271,8 +277,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>                      i = 0;
>                      nents = (len + n->page_size - 1) >> n->page_bits;
>                      prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> -                    nvme_addr_read(n, prp_ent, (void *)prp_list,
> -                        prp_trans);
> +                    ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
> +                                         prp_trans);
> +                    if (ret) {
> +                        trace_nvme_dev_err_addr_read(prp_ent);
> +                        status = NVME_DATA_TRANSFER_ERROR;
> +                        goto unmap;
> +                    }
>                      prp_ent = le64_to_cpu(prp_list[i]);
>                  }
>  
> @@ -466,6 +477,7 @@ static void nvme_post_cqes(void *opaque)
>      NvmeCQueue *cq = opaque;
>      NvmeCtrl *n = cq->ctrl;
>      NvmeRequest *req, *next;
> +    int ret;
>  
>      QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
>          NvmeSQueue *sq;
> @@ -475,15 +487,21 @@ static void nvme_post_cqes(void *opaque)
>              break;
>          }
>  
> -        QTAILQ_REMOVE(&cq->req_list, req, entry);
>          sq = req->sq;
>          req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
>          req->cqe.sq_id = cpu_to_le16(sq->sqid);
>          req->cqe.sq_head = cpu_to_le16(sq->head);
>          addr = cq->dma_addr + cq->tail * n->cqe_size;
> +        ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
> +                            sizeof(req->cqe));
> +        if (ret) {
> +            trace_nvme_dev_err_addr_write(addr);
> +            timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +                      500 * SCALE_MS);
OK, this looks good.
> +            break;
> +        }
> +        QTAILQ_REMOVE(&cq->req_list, req, entry);
>          nvme_inc_cq_tail(cq);
> -        pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
> -            sizeof(req->cqe));
>          nvme_req_clear(req);
>          QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
>      }
> @@ -1650,7 +1668,12 @@ static void nvme_process_sq(void *opaque)
>  
>      while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
>          addr = sq->dma_addr + sq->head * n->sqe_size;
> -        nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd));
> +        if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
> +            trace_nvme_dev_err_addr_read(addr);
> +            timer_mod(sq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +                      500 * SCALE_MS);
> +            break;
> +        }
>          nvme_inc_sq_head(sq);
>  
>          req = QTAILQ_FIRST(&sq->req_list);
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index aa449e314818..d51c09a4e454 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -87,6 +87,8 @@ nvme_dev_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_
>  nvme_dev_err_mdts(uint16_t cid, size_t mdts, size_t len) "cid %"PRIu16" mdts %"PRIu64" len %"PRIu64""
>  nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
>  nvme_dev_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req, uint16_t status) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p status 0x%"PRIx16""
> +nvme_dev_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
> +nvme_dev_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
>  nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
>  nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
>  nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 293d68553538..d1ccde4cda4b 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -458,7 +458,7 @@ enum NvmeStatusCodes {
>      NVME_INVALID_OPCODE         = 0x0001,
>      NVME_INVALID_FIELD          = 0x0002,
>      NVME_CID_CONFLICT           = 0x0003,
> -    NVME_DATA_TRAS_ERROR        = 0x0004,
> +    NVME_DATA_TRANSFER_ERROR    = 0x0004,
>      NVME_POWER_LOSS_ABORT       = 0x0005,
>      NVME_INTERNAL_DEV_ERROR     = 0x0006,
>      NVME_CMD_ABORT_REQ          = 0x0007,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 36/42] nvme: add support for scatter gather lists
  2020-03-16 14:29 ` [PATCH v6 36/42] nvme: add support for scatter gather lists Klaus Jensen
@ 2020-03-25 10:58   ` Maxim Levitsky
  2020-03-31  5:48     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:58 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> For now, support the Data Block, Segment and Last Segment descriptor
> types.
> 
> See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme.c       | 310 +++++++++++++++++++++++++++++++++++-------
>  hw/block/trace-events |   4 +
>  2 files changed, 262 insertions(+), 52 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 49d323566393..b89b96990f52 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -76,7 +76,12 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>  
>  static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
> -    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
> +    hwaddr hi = addr + size;
> +    if (hi < addr) {
> +        return 1;
> +    }
> +
> +    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {

I would suggest splitting this into a separate patch as well, since it contains not just one but two bug fixes
for this function, and they are not related to SG lists.
Or at least move this to 'nvme: refactor nvme_addr_read' and rename that patch
to something like 'nvme: fix and refactor nvme_addr_read'.
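
The two fixes in nvme_addr_read() are (1) rejecting a range whose end wraps past UINT64_MAX and (2) requiring the whole range, not just its first byte, to lie inside the CMB. A minimal standalone sketch of such a check, assuming the CMB is one contiguous window; struct window and the helper names are hypothetical, not QEMU's. It tests the last byte, addr + len - 1, so a buffer ending exactly at the end of the window is accepted; whether testing addr + size (one past the end), as the hunk above does, behaves the same depends on how nvme_addr_is_cmb() treats its upper bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window [base, base + size) standing in for the CMB. */
struct window {
    uint64_t base;
    uint64_t size;
};

static bool window_contains(const struct window *w, uint64_t addr)
{
    /* addr - base is only meaningful once addr >= base */
    return addr >= w->base && addr - w->base < w->size;
}

/*
 * Return true only if every byte of [addr, addr + len) lies inside the
 * window. Empty ranges and ranges whose last byte would wrap are rejected.
 */
static bool range_in_window(const struct window *w, uint64_t addr,
                            uint64_t len)
{
    if (len == 0 || UINT64_MAX - addr < len - 1) {
        return false;   /* empty, or addr + len - 1 would wrap */
    }

    /* the window is contiguous, so checking both ends is enough */
    return window_contains(w, addr) && window_contains(w, addr + len - 1);
}
```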


>          memcpy(buf, nvme_addr_to_cmb(n, addr), size);
>          return 0;
>      }
> @@ -328,13 +333,242 @@ unmap:
>      return status;
>  }
>  
> -static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> -                             uint64_t prp1, uint64_t prp2, DMADirection dir,
> +static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
> +                                  QEMUIOVector *iov,
> +                                  NvmeSglDescriptor *segment, uint64_t nsgld,
> +                                  size_t *len, NvmeRequest *req)
> +{
> +    dma_addr_t addr, trans_len;
> +    uint32_t blk_len;
> +    uint16_t status;
> +
> +    for (int i = 0; i < nsgld; i++) {
> +        uint8_t type = NVME_SGL_TYPE(segment[i].type);
> +
> +        if (type != NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> +            switch (type) {
> +            case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
> +            case NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK:
> +                return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR;
> +            default:
To be honest, I don't like that 'default'.
I would explicitly list which descriptor types remain
(I think Segment and Last Segment, plus the various reserved types).
In fact, for the reserved types you probably also want to return NVME_SGL_DESCR_TYPE_INVALID.

Also, this function really begs for a description in front of it,
something like 'map a section of an SG list, assuming that it only contains SGL Data Block descriptors;
the caller has to ensure this'.
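
A sketch of the explicit switch suggested above, with a short description in front of it. The type codes are the SGL descriptor types from NVMe 1.3d, Section 4.4 (the type sits in bits 7:4 of the SGL identifier byte); the enum and function names are hypothetical, and the result is a classification rather than a QEMU status code. Segment and Last Segment get their own case, Vendor Specific gets an explicit case, and only the reserved codes 0x5 to 0xe reach default.

```c
#include <stdint.h>

/* SGL descriptor type codes (NVMe 1.3d, Section 4.4); names illustrative. */
enum sgl_type {
    SGL_TYPE_DATA_BLOCK       = 0x0,
    SGL_TYPE_BIT_BUCKET       = 0x1,
    SGL_TYPE_SEGMENT          = 0x2,
    SGL_TYPE_LAST_SEGMENT     = 0x3,
    SGL_TYPE_KEYED_DATA_BLOCK = 0x4,
    SGL_TYPE_VENDOR_SPECIFIC  = 0xf,
};

enum sgl_check {
    SGL_CHECK_OK,            /* Data Block: can be mapped */
    SGL_CHECK_TYPE_INVALID,  /* unsupported, vendor-specific or reserved */
    SGL_CHECK_MISPLACED_SEG, /* segment pointer where data was expected */
};

static uint8_t sgl_type_of(uint8_t sgl_id)
{
    return sgl_id >> 4;      /* bits 3:0 are the sub type, ignored here */
}

/*
 * Classify one descriptor from a run that is expected to contain only
 * Data Block descriptors; the caller has to ensure the run excludes the
 * segment's trailing Segment/Last Segment pointer.
 */
static enum sgl_check check_data_descriptor(uint8_t sgl_id)
{
    switch (sgl_type_of(sgl_id)) {
    case SGL_TYPE_DATA_BLOCK:
        return SGL_CHECK_OK;
    case SGL_TYPE_SEGMENT:
    case SGL_TYPE_LAST_SEGMENT:
        /* only valid as the final descriptor of a segment */
        return SGL_CHECK_MISPLACED_SEG;
    case SGL_TYPE_BIT_BUCKET:
    case SGL_TYPE_KEYED_DATA_BLOCK:
    case SGL_TYPE_VENDOR_SPECIFIC:
        return SGL_CHECK_TYPE_INVALID;
    default:
        /* reserved codes 0x5 - 0xe */
        return SGL_CHECK_TYPE_INVALID;
    }
}
```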


> +                return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR;
> +            }
> +        }
> +
> +        if (*len == 0) {
> +            uint16_t sgls = le16_to_cpu(n->id_ctrl.sgls);
Nitpick: I would add a small comment here as well, describing
what this does (we reach this point if the SG list covers more than what
was specified in the command, and the NVME_CTRL_SGLS_EXCESS_LENGTH controller
capability indicates that we support just throwing the extra data away).
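
The branch in question can be modelled in isolation. The sketch below mirrors the check order in nvme_map_sgl_data() (the remaining-length test comes before the zero-length skip and the wrap check), leaving the descriptor type check out. excess_ok stands in for the NVME_CTRL_SGLS_EXCESS_LENGTH bit; struct data_desc, map_data_run() and the status enum are hypothetical, and instead of mapping memory the function only counts the bytes that would be mapped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative Data Block descriptor: just an address and a length. */
struct data_desc {
    uint64_t addr;
    uint32_t len;
};

enum map_status { MAP_OK, MAP_LEN_INVALID };

static enum map_status map_data_run(const struct data_desc *d, size_t n,
                                    size_t *remaining, bool excess_ok,
                                    uint64_t *mapped)
{
    for (size_t i = 0; i < n; i++) {
        /*
         * Nothing left to transfer, but the SGL still has descriptors:
         * the SGL describes more data than the command asked for.
         */
        if (*remaining == 0) {
            if (excess_ok) {
                break;              /* controller may discard the surplus */
            }
            return MAP_LEN_INVALID; /* i.e. Data SGL Length Invalid */
        }

        if (d[i].len == 0) {
            continue;               /* zero-length Data Blocks are skipped */
        }

        if (UINT64_MAX - d[i].addr < d[i].len) {
            return MAP_LEN_INVALID; /* addr + len would wrap */
        }

        size_t take = *remaining < d[i].len ? *remaining : d[i].len;
        *mapped += take;            /* a device would map addr..addr+take */
        *remaining -= take;
    }
    return MAP_OK;
}
```

As in the patch, a residual *remaining after the last descriptor (an SGL that is too short) is left for the caller to reject.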

> +            if (sgls & NVME_CTRL_SGLS_EXCESS_LENGTH) {
> +                break;
> +            }
> +
> +            trace_nvme_dev_err_invalid_sgl_excess_length(nvme_cid(req));
> +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> +        }
> +
> +        addr = le64_to_cpu(segment[i].addr);
> +        blk_len = le32_to_cpu(segment[i].len);
> +
> +        if (!blk_len) {
> +            continue;
> +        }
> +
> +        if (UINT64_MAX - addr < blk_len) {
> +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> +        }
Good!
> +
> +        trans_len = MIN(*len, blk_len);
> +
> +        status = nvme_map_addr(n, qsg, iov, addr, trans_len);
> +        if (status) {
> +            return status;
> +        }
> +
> +        *len -= trans_len;
> +    }
> +
> +    return NVME_SUCCESS;
> +}
> +
> +static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> +                             NvmeSglDescriptor sgl, size_t len,
>                               NvmeRequest *req)
> +{
> +    /*
> +     * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
> +     * dynamically allocating a potentially large SGL. The spec allows the SGL
> +     * to be larger than the command transfer size, so it is not bounded by
> +     * MDTS.
> +     */
Now this is a very good comment!

However, I don't fully understand the note about the SGL. I assume you mean
that the data the SGL covers should still be less than MDTS, but the SGL chain itself,
if assembled in a really inefficient way (like 1 byte per data descriptor), might be larger.
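
To put numbers on that reading: each SGL descriptor is 16 bytes, so 256 descriptors are exactly 4096 bytes, matching the "one 4k page" in the comment. If every Data Block covered a single byte, a 4 KiB transfer would need 4096 descriptors, i.e. 64 KiB of SGL read in 16 chunks. A small sketch of that arithmetic; the helper names are hypothetical, and segment_reads() counts nvme_addr_read() calls the way the chunked loop below issues them (full chunks while more than 256 descriptors remain, then one read for the rest).

```c
#include <stdint.h>

#define SGL_DESC_SIZE 16u   /* every SGL descriptor is 16 bytes */
#define SEG_CHUNK     256u  /* descriptors per read, as in the patch */

/* SGL bytes needed if each Data Block covers desc_bytes (> 0) bytes. */
static uint64_t sgl_bytes_needed(uint64_t transfer_len, uint64_t desc_bytes)
{
    uint64_t ndesc = (transfer_len + desc_bytes - 1) / desc_bytes;

    return ndesc * SGL_DESC_SIZE;
}

/* Reads issued for one segment holding nsgld (>= 1) descriptors. */
static uint64_t segment_reads(uint64_t nsgld)
{
    uint64_t reads = 0;

    while (nsgld > SEG_CHUNK) {   /* full 4 KiB chunks */
        nsgld -= SEG_CHUNK;
        reads++;
    }
    return reads + 1;             /* final read for the remainder */
}
```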


> +    const int SEG_CHUNK_SIZE = 256;
> +
> +    NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
> +    uint64_t nsgld;
> +    uint32_t seg_len;
> +    uint16_t status;
> +    bool sgl_in_cmb = false;
> +    hwaddr addr;
> +    int ret;
> +
> +    sgld = &sgl;
> +    addr = le64_to_cpu(sgl.addr);
> +
> +    trace_nvme_dev_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), req->nlb,
> +                           len);
> +
> +    /*
> +     * If the entire transfer can be described with a single data block it can
> +     * be mapped directly.
> +     */
> +    if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> +        status = nvme_map_sgl_data(n, qsg, iov, sgld, 1, &len, req);
> +        if (status) {
> +            goto unmap;
> +        }
> +
> +        goto out;
> +    }
> +
> +    /*
> +     * If the segment is located in the CMB, the submission queue of the
> +     * request must also reside there.
> +     */
> +    if (nvme_addr_is_cmb(n, addr)) {
> +        if (!nvme_addr_is_cmb(n, req->sq->dma_addr)) {
> +            return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +        }
> +
> +        sgl_in_cmb = true;
> +    }
> +
> +    for (;;) {
> +        seg_len = le32_to_cpu(sgld->len);
> +
> +        if (!seg_len || seg_len & 0xf) {
> +            return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
> +        }
It might be worth noting here that we are dealing with an SGL (Last) Segment descriptor,
whose length indeed must be non-zero and a multiple of 16.
For a moment I confused this with the alignment requirements on the data itself.
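
A standalone sketch of that check, with the distinction spelled out in a comment. The function name is hypothetical; it validates the Length field of a Segment or Last Segment descriptor, which sizes the next list of 16-byte descriptors. Data Block lengths carry no such constraint here, since the controller advertises NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT.

```c
#include <stdbool.h>
#include <stdint.h>

#define SGL_DESC_SIZE 16u   /* sizeof(NvmeSglDescriptor) */

/*
 * Validate the Length field of a Segment or Last Segment descriptor.
 * It is the size of the next segment (a list of descriptors), not of a
 * data buffer, so it must be a non-zero multiple of 16.
 * On success *nsgld is the number of descriptors in that segment.
 */
static bool segment_len_valid(uint64_t seg_addr, uint32_t seg_len,
                              uint64_t *nsgld)
{
    if (seg_len == 0 || (seg_len & 0xf) != 0) {
        return false;   /* Invalid SGL Segment Descriptor */
    }
    if (UINT64_MAX - seg_addr < seg_len) {
        return false;   /* segment would wrap the address space */
    }
    *nsgld = seg_len / SGL_DESC_SIZE;
    return true;
}
```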

> +
> +        if (UINT64_MAX - addr < seg_len) {
> +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> +        }
> +
> +        nsgld = seg_len / sizeof(NvmeSglDescriptor);
> +
> +        while (nsgld > SEG_CHUNK_SIZE) {
> +            if (nvme_addr_read(n, addr, segment, sizeof(segment))) {
> +                trace_nvme_dev_err_addr_read(addr);
> +                status = NVME_DATA_TRANSFER_ERROR;
> +                goto unmap;
> +            }
> +
> +            status = nvme_map_sgl_data(n, qsg, iov, segment, SEG_CHUNK_SIZE,
> +                                       &len, req);
> +            if (status) {
> +                goto unmap;
> +            }
> +
> +            nsgld -= SEG_CHUNK_SIZE;
> +            addr += SEG_CHUNK_SIZE * sizeof(NvmeSglDescriptor);
> +        }
> +
> +        ret = nvme_addr_read(n, addr, segment, nsgld *
> +                             sizeof(NvmeSglDescriptor));
> +        if (ret) {
> +            trace_nvme_dev_err_addr_read(addr);
> +            status = NVME_DATA_TRANSFER_ERROR;
> +            goto unmap;
> +        }
> +
> +        last_sgld = &segment[nsgld - 1];
> +
> +        /* if the segment ends with a Data Block, then we are done */
> +        if (NVME_SGL_TYPE(last_sgld->type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> +            status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld, &len, req);
> +            if (status) {
> +                goto unmap;
> +            }
> +
> +            break;
> +        }
> +
> +        /* a Last Segment must end with a Data Block descriptor */
> +        if (NVME_SGL_TYPE(sgld->type) == NVME_SGL_DESCR_TYPE_LAST_SEGMENT) {
> +            status = NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
> +            goto unmap;
> +        }
> +
> +        sgld = last_sgld;
> +        addr = le64_to_cpu(sgld->addr);
> +
> +        /*
> +         * Do not map the last descriptor; it will be a Segment or Last Segment
> +         * descriptor instead and handled by the next iteration.
> +         */
> +        status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld - 1, &len, req);
> +        if (status) {
> +            goto unmap;
> +        }
> +
> +        /*
> +         * If the next segment is in the CMB, make sure that the sgl was
> +         * already located there.
> +         */
> +        if (sgl_in_cmb != nvme_addr_is_cmb(n, addr)) {
> +            status = NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +            goto unmap;
> +        }
> +    }
> +
> +out:
> +    /* if there is any residual left in len, the SGL was too short */
> +    if (len) {
> +        status = NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> +        goto unmap;
> +    }
> +
> +    return NVME_SUCCESS;
> +
> +unmap:
> +    if (iov->iov) {
> +        qemu_iovec_destroy(iov);
> +    }
> +
> +    if (qsg->sg) {
> +        qemu_sglist_destroy(qsg);
> +    }
> +
> +    return status;
> +}
Looks OK, I might have missed something.
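
A stripped-down model of the segment-walking loop can make its termination rules easier to check by hand. This sketch is not the patch's code: descriptors are a simplified struct rather than the packed NvmeSglDescriptor, guest memory is a flat array addressed by byte offset, and data mapping, chunked reads, excess-length handling and the CMB checks are left out. As a defensive choice of the sketch, a hypothetical max_segs bound stops a chain whose last descriptor points back at an earlier segment from looping forever.

```c
#include <stddef.h>
#include <stdint.h>

/* SGL descriptor type codes used by this sketch (NVMe 1.3d, 4.4). */
enum { T_DATA = 0x0, T_SEGMENT = 0x2, T_LAST_SEGMENT = 0x3 };

/* Simplified descriptor; the real one packs the type into byte 15. */
struct desc {
    uint64_t addr;   /* data address, or byte offset of the next segment */
    uint32_t len;    /* data length, or size of the next segment */
    uint8_t  type;
};

enum walk_status { WALK_OK, WALK_BAD_SEGMENT, WALK_BAD_DESC };

/* On success *total is the number of data bytes the chain describes. */
static enum walk_status walk_sgl(const struct desc *mem, size_t mem_ndesc,
                                 struct desc first, size_t max_segs,
                                 uint64_t *total)
{
    const struct desc *sgld = &first;
    size_t segs = 0;

    *total = 0;

    /* a single Data Block in the command describes the whole transfer */
    if (first.type == T_DATA) {
        *total = first.len;
        return WALK_OK;
    }

    for (;;) {
        if (sgld->type != T_SEGMENT && sgld->type != T_LAST_SEGMENT) {
            return WALK_BAD_DESC;
        }
        if (++segs > max_segs) {
            return WALK_BAD_SEGMENT;   /* chain too long or cyclic */
        }

        /* non-zero multiple of 16; alignment and bounds are toy-model rules */
        uint64_t idx = sgld->addr / 16;
        if (sgld->len == 0 || sgld->len % 16 != 0 || sgld->addr % 16 != 0 ||
            idx > mem_ndesc || sgld->len / 16 > mem_ndesc - idx) {
            return WALK_BAD_SEGMENT;
        }

        const struct desc *seg = &mem[idx];
        size_t nsgld = sgld->len / 16;
        const struct desc *last = &seg[nsgld - 1];

        /* every descriptor except the last must be a Data Block */
        for (size_t i = 0; i + 1 < nsgld; i++) {
            if (seg[i].type != T_DATA) {
                return WALK_BAD_DESC;
            }
            *total += seg[i].len;
        }

        /* the chain ends when a segment ends with a Data Block */
        if (last->type == T_DATA) {
            *total += last->len;
            return WALK_OK;
        }

        /* a Last Segment must end with a Data Block */
        if (sgld->type == T_LAST_SEGMENT) {
            return WALK_BAD_SEGMENT;
        }

        sgld = last;   /* follow the pointer; its type is checked above */
    }
}
```

For example, a Segment holding two Data Blocks and a Last Segment pointer, followed by a Last Segment holding two Data Blocks, is walked in two iterations and yields the sum of the four lengths.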

> +
> +static uint16_t nvme_map(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> +                         size_t len, NvmeRequest *req)
> +{
> +    uint64_t prp1, prp2;
> +
> +    switch (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
> +    case PSDT_PRP:
> +        prp1 = le64_to_cpu(req->cmd.dptr.prp1);
> +        prp2 = le64_to_cpu(req->cmd.dptr.prp2);
> +
> +        return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
> +    case PSDT_SGL_MPTR_CONTIGUOUS:
> +    case PSDT_SGL_MPTR_SGL:
> +        /* SGLs shall not be used for Admin commands in NVMe over PCIe */
> +        if (!req->sq->sqid) {
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +        }
> +
> +        return nvme_map_sgl(n, qsg, iov, req->cmd.dptr.sgl, len, req);
> +    default:
> +        return NVME_INVALID_FIELD;
> +    }
> +}
Looks OK
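
For reference, a sketch of the PSDT decode that nvme_map() switches on. PSDT sits in bits 7:6 of the command's flags byte (bits 15:14 of CDW0); 00b selects PRPs, 01b and 10b select SGLs, and 11b is reserved. The enum values follow the patch's PSDT_* names, but choose_xfer() and its return type are hypothetical, and sqid 0 is treated as the Admin Submission Queue, where the patch rejects SGLs.

```c
#include <stdint.h>

/* PSDT values (bits 7:6 of the command's flags byte). */
enum psdt {
    PSDT_PRP                 = 0x0,
    PSDT_SGL_MPTR_CONTIGUOUS = 0x1,
    PSDT_SGL_MPTR_SGL        = 0x2,
    PSDT_RESERVED            = 0x3,
};

enum xfer { XFER_PRP, XFER_SGL, XFER_INVALID };

/* Pick how the data pointer of a command should be interpreted. */
static enum xfer choose_xfer(uint8_t flags, uint16_t sqid)
{
    switch ((flags >> 6) & 0x3) {
    case PSDT_PRP:
        return XFER_PRP;
    case PSDT_SGL_MPTR_CONTIGUOUS:
    case PSDT_SGL_MPTR_SGL:
        /* no SGLs for Admin commands in NVMe over PCIe */
        return sqid == 0 ? XFER_INVALID : XFER_SGL;
    default:
        return XFER_INVALID;   /* 11b is reserved */
    }
}
```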
> +
> +static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> +                         DMADirection dir, NvmeRequest *req)
>  {
>      uint16_t status = NVME_SUCCESS;
>  
> -    status = nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
> +    status = nvme_map(n, &req->qsg, &req->iov, len, req);
>      if (status) {
>          return status;
>      }
> @@ -370,15 +604,6 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>      return status;
>  }
>  
> -static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
> -                         QEMUIOVector *iov, size_t len, NvmeRequest *req)
> -{
> -    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> -
> -    return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
> -}
> -
>  static void nvme_aio_destroy(NvmeAIO *aio)
>  {
>      g_free(aio);
> @@ -846,7 +1071,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          goto invalid;
>      }
>  
> -    status = nvme_map(n, cmd, &req->qsg, &req->iov, len, req);
> +    status = nvme_map(n, &req->qsg, &req->iov, len, req);
>      if (status) {
>          goto invalid;
>      }
> @@ -1013,8 +1238,6 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>                                  uint32_t buf_len, uint64_t off,
>                                  NvmeRequest *req)
>  {
> -    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>      uint32_t nsid = le32_to_cpu(cmd->nsid);
>  
>      uint32_t trans_len;
> @@ -1064,16 +1287,14 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>          nvme_clear_events(n, NVME_AER_TYPE_SMART);
>      }
>  
> -    return nvme_dma_prp(n, (uint8_t *) &smart + off, trans_len, prp1, prp2,
> -                        DMA_DIRECTION_FROM_DEVICE, req);
> +    return nvme_dma(n, (uint8_t *) &smart + off, trans_len,
> +                    DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
>                                   uint64_t off, NvmeRequest *req)
>  {
>      uint32_t trans_len;
> -    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>      NvmeFwSlotInfoLog fw_log;
>  
>      if (off > sizeof(fw_log)) {
> @@ -1084,8 +1305,8 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
>  
>      trans_len = MIN(sizeof(fw_log) - off, buf_len);
>  
> -    return nvme_dma_prp(n, (uint8_t *) &fw_log + off, trans_len, prp1, prp2,
> -                        DMA_DIRECTION_FROM_DEVICE, req);
> +    return nvme_dma(n, (uint8_t *) &fw_log + off, trans_len,
> +                    DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
> @@ -1093,8 +1314,6 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>                                  NvmeRequest *req)
>  {
>      uint32_t trans_len;
> -    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>      uint8_t errlog[64];
>  
>      if (!rae) {
> @@ -1109,8 +1328,7 @@ static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>  
>      trans_len = MIN(sizeof(errlog) - off, buf_len);
>  
> -    return nvme_dma_prp(n, errlog, trans_len, prp1, prp2,
> -                        DMA_DIRECTION_FROM_DEVICE, req);
> +    return nvme_dma(n, errlog, trans_len, DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> @@ -1255,13 +1473,10 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
>  static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
>                                     NvmeRequest *req)
>  {
> -    uint64_t prp1 = le64_to_cpu(c->prp1);
> -    uint64_t prp2 = le64_to_cpu(c->prp2);
> -
>      trace_nvme_dev_identify_ctrl();
>  
> -    return nvme_dma_prp(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), prp1,
> -                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
> +    return nvme_dma(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
> +                    DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
> @@ -1269,8 +1484,6 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
>  {
>      NvmeNamespace *ns;
>      uint32_t nsid = le32_to_cpu(c->nsid);
> -    uint64_t prp1 = le64_to_cpu(c->prp1);
> -    uint64_t prp2 = le64_to_cpu(c->prp2);
>  
>      trace_nvme_dev_identify_ns(nsid);
>  
> @@ -1281,8 +1494,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
>  
>      ns = &n->namespaces[nsid - 1];
>  
> -    return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
> -                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
> +    return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns),
> +                    DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
> @@ -1290,8 +1503,6 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
>  {
>      static const int data_len = NVME_IDENTIFY_DATA_SIZE;
>      uint32_t min_nsid = le32_to_cpu(c->nsid);
> -    uint64_t prp1 = le64_to_cpu(c->prp1);
> -    uint64_t prp2 = le64_to_cpu(c->prp2);
>      uint32_t *list;
>      uint16_t ret;
>      int i, j = 0;
> @@ -1308,8 +1519,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
>              break;
>          }
>      }
> -    ret = nvme_dma_prp(n, (uint8_t *)list, data_len, prp1, prp2,
> -                       DMA_DIRECTION_FROM_DEVICE, req);
> +    ret = nvme_dma(n, (uint8_t *)list, data_len, DMA_DIRECTION_FROM_DEVICE,
> +                   req);
>      g_free(list);
>      return ret;
>  }
> @@ -1318,8 +1529,6 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
>                                              NvmeRequest *req)
>  {
>      uint32_t nsid = le32_to_cpu(c->nsid);
> -    uint64_t prp1 = le64_to_cpu(c->prp1);
> -    uint64_t prp2 = le64_to_cpu(c->prp2);
>  
>      void *list;
>      uint16_t ret;
> @@ -1345,8 +1554,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
>      ns_descr->nidl = NVME_NIDT_UUID_LEN;
>      stl_be_p(ns_descr + sizeof(*ns_descr), nsid);
>  
> -    ret = nvme_dma_prp(n, (uint8_t *) list, NVME_IDENTIFY_DATA_SIZE, prp1,
> -                       prp2, DMA_DIRECTION_FROM_DEVICE, req);
> +    ret = nvme_dma(n, (uint8_t *)list, NVME_IDENTIFY_DATA_SIZE,
> +                   DMA_DIRECTION_FROM_DEVICE, req);
>      g_free(list);
>      return ret;
>  }
> @@ -1425,13 +1634,10 @@ static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n)
>  static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
>                                             NvmeRequest *req)
>  {
> -    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> -
>      uint64_t timestamp = nvme_get_timestamp(n);
>  
> -    return nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
> -                        prp2, DMA_DIRECTION_FROM_DEVICE, req);
> +    return nvme_dma(n, (uint8_t *)&timestamp, sizeof(timestamp),
> +                    DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
>  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> @@ -1514,11 +1720,9 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
>  {
>      uint16_t ret;
>      uint64_t timestamp;
> -    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> -    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
>  
> -    ret = nvme_dma_prp(n, (uint8_t *)&timestamp, sizeof(timestamp), prp1,
> -                       prp2, DMA_DIRECTION_TO_DEVICE, req);
> +    ret = nvme_dma(n, (uint8_t *)&timestamp, sizeof(timestamp),
> +                   DMA_DIRECTION_TO_DEVICE, req);
>      if (ret != NVME_SUCCESS) {
>          return ret;
>      }
> @@ -2306,6 +2510,8 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->nn = cpu_to_le32(n->num_namespaces);
>      id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
>  
> +    id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT);
> +
>      pstrcpy((char *) id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:");
>      pstrcat((char *) id->subnqn, sizeof(id->subnqn), n->params.serial);
>  
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index d51c09a4e454..70702cc67d5a 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -34,6 +34,7 @@ nvme_dev_irq_pin(void) "pulsing IRQ pin"
>  nvme_dev_irq_masked(void) "IRQ is masked"
>  nvme_dev_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
>  nvme_dev_map_prp(uint16_t cid, uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "cid %"PRIu16" trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
> +nvme_dev_map_sgl(uint16_t cid, uint8_t typ, uint32_t nlb, uint64_t len) "cid %"PRIu16" type 0x%"PRIx8" nlb %"PRIu32" len %"PRIu64""
>  nvme_dev_req_register_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, uint64_t count, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" count %"PRIu64" opc \"%s\" req %p"
>  nvme_dev_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
>  nvme_dev_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
> @@ -89,6 +90,9 @@ nvme_dev_err_prinfo(uint16_t cid, uint16_t ctrl) "cid %"PRIu16" ctrl %"PRIu16""
>  nvme_dev_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req, uint16_t status) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p status 0x%"PRIx16""
>  nvme_dev_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
>  nvme_dev_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
> +nvme_dev_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
> +nvme_dev_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
> +nvme_dev_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16""
>  nvme_dev_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
>  nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or not page aligned: 0x%"PRIx64""
>  nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""


Best regards,
	Maxim Levitsky







* Re: [PATCH v6 37/42] nvme: refactor identify active namespace id list
  2020-03-16 14:29 ` [PATCH v6 37/42] nvme: refactor identify active namespace id list Klaus Jensen
@ 2020-03-25 10:58   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:58 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Prepare to support inactive namespaces.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/block/nvme.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index b89b96990f52..bf9fb500842a 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1505,16 +1505,16 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
>      uint32_t min_nsid = le32_to_cpu(c->nsid);
>      uint32_t *list;
>      uint16_t ret;
> -    int i, j = 0;
> +    int j = 0;
>  
>      trace_nvme_dev_identify_nslist(min_nsid);
>  
>      list = g_malloc0(data_len);
> -    for (i = 0; i < n->num_namespaces; i++) {
> -        if (i < min_nsid) {
> +    for (int i = 1; i <= n->num_namespaces; i++) {
> +        if (i <= min_nsid) {
>              continue;
>          }
> -        list[j++] = cpu_to_le32(i + 1);
> +        list[j++] = cpu_to_le32(i);
>          if (j == data_len / sizeof(uint32_t)) {
>              break;
>          }
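
The refactor only changes the indexing: the old loop used a 0-based i, skipped i < min_nsid and emitted i + 1, which selects exactly the NSIDs greater than min_nsid, the same set the new 1-based loop produces. A standalone sketch of that selection; the function name is hypothetical and the cpu_to_le32() conversion is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define IDENTIFY_DATA_SIZE 4096u   /* NVME_IDENTIFY_DATA_SIZE */

/*
 * Fill list with the NSIDs greater than min_nsid, in increasing order,
 * stopping once the 4 KiB buffer (1024 entries) is full. NSIDs are
 * 1-based, so the loop runs from 1 to num_ns inclusive.
 * Returns the number of entries written.
 */
static size_t active_nslist(uint32_t *list, uint32_t num_ns,
                            uint32_t min_nsid)
{
    const size_t max_ents = IDENTIFY_DATA_SIZE / sizeof(uint32_t);
    size_t j = 0;

    for (uint32_t nsid = 1; nsid <= num_ns; nsid++) {
        if (nsid <= min_nsid) {
            continue;
        }
        list[j++] = nsid;
        if (j == max_ents) {
            break;
        }
    }
    return j;
}
```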


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky







* Re: [PATCH v6 38/42] nvme: support multiple namespaces
  2020-03-16 14:29 ` [PATCH v6 38/42] nvme: support multiple namespaces Klaus Jensen
@ 2020-03-25 10:59   ` Maxim Levitsky
  2020-03-31  5:48     ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:59 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> This adds support for multiple namespaces by introducing a new 'nvme-ns'
> device model. The nvme device creates a bus named from the device name
> ('id'). The nvme-ns devices then connect to this and registers
> themselves with the nvme device.
> 
> This changes how an nvme device is created. Example with two namespaces:
> 
>   -drive file=nvme0n1.img,if=none,id=disk1
>   -drive file=nvme0n2.img,if=none,id=disk2
>   -device nvme,serial=deadbeef,id=nvme0
>   -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
>   -device nvme-ns,drive=disk2,bus=nvme0,nsid=2
> 
> The drive property is kept on the nvme device to keep the change
> backward compatible, but the property is now optional. Specifying a
> drive for the nvme device will always create the namespace with nsid 1.
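
The registration step each nvme-ns device performs on realize is not shown in this part of the diff. A minimal sketch of what such bookkeeping could look like, assuming a fixed-size table indexed by nsid - 1; struct ctrl, register_ns() and MAX_NAMESPACES are hypothetical names, not the patch's nvme_register_namespace().

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NAMESPACES 256u   /* hypothetical limit for this sketch */

struct ctrl {
    const char *ns[MAX_NAMESPACES];   /* backing drive id, NULL if free */
    uint32_t num_namespaces;          /* highest registered NSID */
};

enum reg_status { REG_OK, REG_BAD_NSID, REG_IN_USE };

/* Register a namespace under nsid; NSID 0 is never a valid namespace. */
static enum reg_status register_ns(struct ctrl *c, uint32_t nsid,
                                   const char *drive)
{
    if (nsid == 0 || nsid > MAX_NAMESPACES) {
        return REG_BAD_NSID;
    }
    if (c->ns[nsid - 1] != NULL) {
        return REG_IN_USE;            /* two nvme-ns devices, same nsid */
    }
    c->ns[nsid - 1] = drive;
    if (nsid > c->num_namespaces) {
        c->num_namespaces = nsid;
    }
    return REG_OK;
}
```

With the command line above, disk1 and disk2 would occupy slots 0 and 1, and a third nvme-ns reusing nsid=2 would be refused.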
> 
> Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/Makefile.objs |   2 +-
>  hw/block/nvme-ns.c     | 157 +++++++++++++++++++++++++++
>  hw/block/nvme-ns.h     |  60 +++++++++++
>  hw/block/nvme.c        | 233 ++++++++++++++++++++++++++---------------
>  hw/block/nvme.h        |  47 ++++-----
>  hw/block/trace-events  |   4 +-
>  6 files changed, 389 insertions(+), 114 deletions(-)
>  create mode 100644 hw/block/nvme-ns.c
>  create mode 100644 hw/block/nvme-ns.h
> 
> diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> index 4b4a2b338dc4..d9141d6a4b9b 100644
> --- a/hw/block/Makefile.objs
> +++ b/hw/block/Makefile.objs
> @@ -7,7 +7,7 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
>  common-obj-$(CONFIG_XEN) += xen-block.o
>  common-obj-$(CONFIG_ECC) += ecc.o
>  common-obj-$(CONFIG_ONENAND) += onenand.o
> -common-obj-$(CONFIG_NVME_PCI) += nvme.o
> +common-obj-$(CONFIG_NVME_PCI) += nvme.o nvme-ns.o
>  common-obj-$(CONFIG_SWIM) += swim.o
>  
>  common-obj-$(CONFIG_SH4) += tc58128.o
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> new file mode 100644
> index 000000000000..6d975104171d
> --- /dev/null
> +++ b/hw/block/nvme-ns.c
> @@ -0,0 +1,157 @@
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/cutils.h"
> +#include "qemu/log.h"
> +#include "hw/block/block.h"
> +#include "hw/pci/pci.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/block-backend.h"
> +#include "qapi/error.h"
> +
> +#include "hw/qdev-properties.h"
> +#include "hw/qdev-core.h"
> +
> +#include "nvme.h"
> +#include "nvme-ns.h"
> +
> +static int nvme_ns_init(NvmeNamespace *ns)
> +{
> +    NvmeIdNs *id_ns = &ns->id_ns;
> +
> +    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> +    id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
> +
> +    /* no thin provisioning */
> +    id_ns->ncap = id_ns->nsze;
> +    id_ns->nuse = id_ns->ncap;
> +
> +    return 0;
> +}
Looks great!

> +
> +static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id,
> +                            Error **errp)
> +{
> +    uint64_t perm, shared_perm;
> +
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
> +    shared_perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> +        BLK_PERM_GRAPH_MOD;
> +
> +    ret = blk_set_perm(ns->blk, perm, shared_perm, &local_err);
> +    if (ret) {
> +        error_propagate_prepend(errp, local_err,
> +                                "could not set block permissions: ");
> +        return ret;
> +    }
> +
> +    ns->size = blk_getlength(ns->blk);
> +    if (ns->size < 0) {
> +        error_setg_errno(errp, -ns->size, "could not get blockdev size");
> +        return -1;
> +    }
> +
> +    switch (n->conf.wce) {
> +    case ON_OFF_AUTO_ON:
> +        n->features.volatile_wc = 1;
> +        break;
> +    case ON_OFF_AUTO_OFF:
> +        n->features.volatile_wc = 0;
> +    case ON_OFF_AUTO_AUTO:
> +        n->features.volatile_wc = blk_enable_write_cache(ns->blk);
> +        break;
> +    default:
> +        abort();
> +    }
> +
> +    blk_set_enable_write_cache(ns->blk, n->features.volatile_wc);
> +
> +    return 0;
> +}

This needs review from someone who knows the block layer better than I do.
I still think that maybe you can somehow use blkconf_apply_backend_options
(or even extend it to suit your needs). I'll leave this to the block layer folks.
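
One detail in the quoted switch may be worth a second look: the ON_OFF_AUTO_OFF case has no break, so it falls through into ON_OFF_AUTO_AUTO and volatile_wc ends up taken from blk_enable_write_cache() instead of staying 0. A minimal sketch of the presumably intended three-way mapping; enum on_off_auto and resolve_wce() are hypothetical stand-ins for QEMU's OnOffAuto handling, with backend_wc playing the role of the backend's current write-cache setting.

```c
#include <stdbool.h>

/* Stand-in for QEMU's OnOffAuto; names are illustrative. */
enum on_off_auto { OOA_AUTO, OOA_ON, OOA_OFF };

/* Explicit on/off wins; auto keeps what the block backend reports. */
static bool resolve_wce(enum on_off_auto wce, bool backend_wc)
{
    bool volatile_wc;

    switch (wce) {
    case OOA_ON:
        volatile_wc = true;
        break;
    case OOA_OFF:
        volatile_wc = false;
        break;          /* without this, OFF would fall into AUTO */
    case OOA_AUTO:
    default:
        volatile_wc = backend_wc;
        break;
    }
    return volatile_wc;
}
```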



> +
> +static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
> +{
> +    if (!ns->blk) {
> +        error_setg(errp, "block backend not configured");
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> +{
> +    if (nvme_ns_check_constraints(ns, errp)) {
> +        return -1;
> +    }
> +
> +    if (nvme_ns_init_blk(n, ns, &n->id_ctrl, errp)) {
> +        return -1;
> +    }
> +
> +    nvme_ns_init(ns);
> +    if (nvme_register_namespace(n, ns, errp)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static void nvme_ns_realize(DeviceState *dev, Error **errp)
> +{
> +    NvmeNamespace *ns = NVME_NS(dev);
> +    BusState *s = qdev_get_parent_bus(dev);
> +    NvmeCtrl *n = NVME(s->parent);
> +    Error *local_err = NULL;
> +
> +    if (nvme_ns_setup(n, ns, &local_err)) {
> +        error_propagate_prepend(errp, local_err,
> +                                "could not setup namespace: ");
> +        return;
> +    }
> +}
> +
> +static Property nvme_ns_props[] = {
> +    DEFINE_NVME_NS_PROPERTIES(NvmeNamespace, params),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void nvme_ns_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> +
> +    dc->bus_type = TYPE_NVME_BUS;
> +    dc->realize = nvme_ns_realize;
> +    device_class_set_props(dc, nvme_ns_props);
> +    dc->desc = "virtual nvme namespace";
> +}
> +
> +static void nvme_ns_instance_init(Object *obj)
> +{
> +    NvmeNamespace *ns = NVME_NS(obj);
> +    char *bootindex = g_strdup_printf("/namespace@%d,0", ns->params.nsid);
> +
> +    device_add_bootindex_property(obj, &ns->bootindex, "bootindex",
> +                                  bootindex, DEVICE(obj), &error_abort);
> +
> +    g_free(bootindex);
> +}
> +
> +static const TypeInfo nvme_ns_info = {
> +    .name = TYPE_NVME_NS,
> +    .parent = TYPE_DEVICE,
> +    .class_init = nvme_ns_class_init,
> +    .instance_size = sizeof(NvmeNamespace),
> +    .instance_init = nvme_ns_instance_init,
> +};
> +
> +static void nvme_ns_register_types(void)
> +{
> +    type_register_static(&nvme_ns_info);
> +}
> +
> +type_init(nvme_ns_register_types)
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> new file mode 100644
> index 000000000000..3c3651d485d0
> --- /dev/null
> +++ b/hw/block/nvme-ns.h
> @@ -0,0 +1,60 @@
> +#ifndef NVME_NS_H
> +#define NVME_NS_H
> +
> +#define TYPE_NVME_NS "nvme-ns"
> +#define NVME_NS(obj) \
> +    OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
> +
> +#define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
> +    DEFINE_PROP_DRIVE("drive", _state, blk), \
> +    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
> +
> +typedef struct NvmeNamespaceParams {
> +    uint32_t nsid;
> +} NvmeNamespaceParams;
> +
> +typedef struct NvmeNamespace {
> +    DeviceState  parent_obj;
> +    BlockBackend *blk;
> +    int32_t      bootindex;
> +    int64_t      size;
> +
> +    NvmeIdNs            id_ns;
> +    NvmeNamespaceParams params;
> +} NvmeNamespace;
> +
> +static inline uint32_t nvme_nsid(NvmeNamespace *ns)
> +{
> +    if (ns) {
> +        return ns->params.nsid;
> +    }
> +
> +    return -1;
> +}
> +
> +static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
> +{
> +    NvmeIdNs *id_ns = &ns->id_ns;
> +    return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
> +}
> +
> +static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
> +{
> +    return nvme_ns_lbaf(ns)->ds;
> +}
> +
> +static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
> +{
> +    return 1 << nvme_ns_lbads(ns);
> +}
> +
> +static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
> +{
> +    return ns->size >> nvme_ns_lbads(ns);
> +}
> +
> +typedef struct NvmeCtrl NvmeCtrl;
> +
> +int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
> +
> +#endif /* NVME_NS_H */
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index bf9fb500842a..88a0499d0fe0 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -17,10 +17,11 @@
>  /**
>   * Usage: add options:
>   *      -drive file=<file>,if=none,id=<drive_id>
> - *      -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
> + *      -device nvme,serial=<serial>,id=<bus_name>, \
>   *              cmb_size_mb=<cmb_size_mb[optional]>, \
>   *              max_ioqpairs=<N[optional]>, \
>   *              mdts=<mdts[optional]>
> + *      -device nvme-ns,drive=<drive_id>,bus=bus_name,nsid=1
>   *
>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> @@ -44,6 +45,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "nvme.h"
> +#include "nvme-ns.h"
>  
>  #define NVME_SPEC_VER 0x00010300
>  #define NVME_CMB_BIR 2
> @@ -89,6 +91,11 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>      return pci_dma_read(&n->parent_obj, addr, buf, size);
>  }
>  
> +static uint16_t nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
> +{
> +    return nsid && nsid <= n->num_namespaces;
> +}
This is correct.
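The two-step check used in nvme_io_cmd() (range check first, then slot lookup) can be modeled on its own. In this sketch, `Ns`, `nsid_valid()` and `ns_lookup()` are hypothetical stand-ins for NvmeNamespace, nvme_nsid_valid() and nvme_ns(). The sketch returns `bool` where the quoted helper returns `uint16_t`, which arguably states the intent more directly. An NSID that passes the range check can still map to an empty slot, which is why the caller also tests for NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical namespace type; only the NSID matters here. */
typedef struct Ns {
    uint32_t nsid;
} Ns;

/* Valid I/O NSIDs are 1..num_namespaces; 0 is never a valid NSID. */
static bool nsid_valid(uint32_t nsid, uint32_t num_namespaces)
{
    return nsid != 0 && nsid <= num_namespaces;
}

/*
 * Look up an NSID in a table of num_namespaces slots. A NULL result for
 * an in-range NSID means the slot is unallocated (inactive namespace).
 */
static Ns *ns_lookup(Ns *const *table, uint32_t num_namespaces, uint32_t nsid)
{
    if (!nsid_valid(nsid, num_namespaces)) {
        return NULL;
    }
    return table[nsid - 1];
}
```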

> +
>  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
>  {
>      return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
> @@ -892,11 +899,12 @@ static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
>  
>  static void nvme_rw_cb(NvmeRequest *req, void *opaque)
>  {
> +    NvmeNamespace *ns = req->ns;
>      NvmeSQueue *sq = req->sq;
>      NvmeCtrl *n = sq->ctrl;
>      NvmeCQueue *cq = n->cq[sq->cqid];
>  
> -    trace_nvme_dev_rw_cb(nvme_cid(req), req->cmd.nsid);
> +    trace_nvme_dev_rw_cb(nvme_cid(req), nvme_nsid(ns));
>  
>      nvme_enqueue_req_completion(cq, req);
>  }
> @@ -980,10 +988,11 @@ static void nvme_aio_cb(void *opaque, int ret)
>  
>  static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
> +    NvmeNamespace *ns = req->ns;
>      NvmeAIO *aio = g_new0(NvmeAIO, 1);
>  
>      *aio = (NvmeAIO) {
> -        .blk = n->conf.blk,
> +        .blk = ns->blk,
>          .req = req,
>      };
>  
> @@ -1008,8 +1017,8 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      req->slba = le64_to_cpu(rw->slba);
>      req->nlb  = le16_to_cpu(rw->nlb) + 1;
>  
> -    trace_nvme_dev_write_zeroes(nvme_cid(req), le32_to_cpu(cmd->nsid),
> -                                req->slba, req->nlb);
> +    trace_nvme_dev_write_zeroes(nvme_cid(req), nvme_nsid(ns), req->slba,
> +                                req->nlb);
>  
>      status = nvme_check_prinfo(n, ctrl, req);
>      if (status) {
> @@ -1032,7 +1041,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      aio = g_new0(NvmeAIO, 1);
>  
>      *aio = (NvmeAIO) {
> -        .blk = n->conf.blk,
> +        .blk = ns->blk,
>          .offset = offset,
>          .len = count,
>          .req = req,
> @@ -1044,7 +1053,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      return NVME_NO_COMPLETE;
>  
>  invalid:
> -    block_acct_invalid(blk_get_stats(n->conf.blk), BLOCK_ACCT_WRITE);
> +    block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE);
>      return status;
>  }
>  
> @@ -1060,11 +1069,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  
>      req->nlb  = le16_to_cpu(rw->nlb) + 1;
>      req->slba = le64_to_cpu(rw->slba);
> -
>      len = req->nlb << nvme_ns_lbads(ns);
>  
> -    trace_nvme_dev_rw(nvme_req_is_write(req) ? "write" : "read", req->nlb,
> -                      req->nlb << nvme_ns_lbads(req->ns), req->slba);
> +    trace_nvme_dev_rw(nvme_cid(req), nvme_req_is_write(req) ? "write" : "read",
> +                      nvme_nsid(ns), req->nlb, req->nlb << nvme_ns_lbads(ns),
> +                      req->slba);
>  
>      status = nvme_check_rw(n, req);
>      if (status) {
> @@ -1076,13 +1085,13 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          goto invalid;
>      }
>  
> -    nvme_rw_aio(n->conf.blk, req->slba << nvme_ns_lbads(ns), req);
> +    nvme_rw_aio(ns->blk, req->slba << nvme_ns_lbads(ns), req);
>      nvme_req_set_cb(req, nvme_rw_cb, NULL);
>  
>      return NVME_NO_COMPLETE;
>  
>  invalid:
> -    block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> +    block_acct_invalid(blk_get_stats(ns->blk), acct);
>      return status;
>  }
>  
> @@ -1093,12 +1102,15 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>      trace_nvme_dev_io_cmd(nvme_cid(req), nsid, le16_to_cpu(req->sq->sqid),
>                            cmd->opcode);
>  
> -    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> -        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
> +    if (!nvme_nsid_valid(n, nsid)) {
>          return NVME_INVALID_NSID | NVME_DNR;
>      }
>  
> -    req->ns = &n->namespaces[nsid - 1];
> +    req->ns = nvme_ns(n, nsid);
> +
> +    if (unlikely(!req->ns)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
>  
>      switch (cmd->opcode) {
>      case NVME_CMD_FLUSH:
> @@ -1245,18 +1257,24 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint8_t rae,
>      uint64_t units_read = 0, units_written = 0;
>      uint64_t read_commands = 0, write_commands = 0;
>      NvmeSmartLog smart;
> -    BlockAcctStats *s;
>  
>      if (nsid && nsid != 0xffffffff) {
>          return NVME_INVALID_FIELD | NVME_DNR;
>      }
>  
> -    s = blk_get_stats(n->conf.blk);
> +    for (int i = 1; i <= n->num_namespaces; i++) {
> +        NvmeNamespace *ns = nvme_ns(n, i);
> +        if (!ns) {
> +            continue;
> +        }
>  
> -    units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> -    units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
> -    read_commands = s->nr_ops[BLOCK_ACCT_READ];
> -    write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
> +        BlockAcctStats *s = blk_get_stats(ns->blk);
> +
> +        units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> +        units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
> +        read_commands += s->nr_ops[BLOCK_ACCT_READ];
> +        write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
> +    }
>  
>      if (off > sizeof(smart)) {
>          return NVME_INVALID_FIELD | NVME_DNR;
> @@ -1482,19 +1500,24 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeIdentify *c,
>  static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c,
>                                   NvmeRequest *req)
>  {
> -    NvmeNamespace *ns;
> +    NvmeIdNs *id_ns, inactive = { 0 };
>      uint32_t nsid = le32_to_cpu(c->nsid);
> +    NvmeNamespace *ns;
>  
>      trace_nvme_dev_identify_ns(nsid);
>  
> -    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> -        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
> +    if (!nvme_nsid_valid(n, nsid)) {
Great!
>          return NVME_INVALID_NSID | NVME_DNR;
>      }
>  
> -    ns = &n->namespaces[nsid - 1];
> +    ns = nvme_ns(n, nsid);
> +    if (unlikely(!ns)) {
> +        id_ns = &inactive;
> +    } else {
> +        id_ns = &ns->id_ns;
> +    }
>  
> -    return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns),
> +    return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs),
>                      DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
> @@ -1511,7 +1534,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c,
>  
>      list = g_malloc0(data_len);
>      for (int i = 1; i <= n->num_namespaces; i++) {
> -        if (i <= min_nsid) {
> +        if (i <= min_nsid || !nvme_ns(n, i)) {
>              continue;
>          }
>          list[j++] = cpu_to_le32(i);
> @@ -1536,11 +1559,14 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeIdentify *c,
>  
>      trace_nvme_dev_identify_ns_descr_list(nsid);
>  
> -    if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> -        trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
> +    if (!nvme_nsid_valid(n, nsid)) {
>          return NVME_INVALID_NSID | NVME_DNR;
>      }
>  
> +    if (unlikely(!nvme_ns(n, nsid))) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
I double-checked with the spec, and that is correct.
> +    }
> +
>      list = g_malloc0(NVME_IDENTIFY_DATA_SIZE);
>      ns_descr = list;
>  
> @@ -1680,7 +1706,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>          result = cpu_to_le32(n->features.err_rec);
>          break;
>      case NVME_VOLATILE_WRITE_CACHE:
> -        result = cpu_to_le32(blk_enable_write_cache(n->conf.blk));
> +        result = cpu_to_le32(n->features.volatile_wc);
>          trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
>          break;
>      case NVME_NUMBER_OF_QUEUES:
> @@ -1734,6 +1760,8 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd,
>  
>  static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  {
> +    NvmeNamespace *ns;
> +
>      uint32_t dw10 = le32_to_cpu(cmd->cdw10);
>      uint32_t dw11 = le32_to_cpu(cmd->cdw11);
>  
> @@ -1766,12 +1794,23 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  
>          break;
>      case NVME_VOLATILE_WRITE_CACHE:
> -        if (blk_enable_write_cache(n->conf.blk)) {
> -            blk_flush(n->conf.blk);
> +        n->features.volatile_wc = dw11;
> +
> +        for (int i = 1; i <= n->num_namespaces; i++) {
> +            ns = nvme_ns(n, i);
> +            if (!ns) {
> +                continue;
> +            }
> +
> +            if (blk_enable_write_cache(ns->blk)) {
> +                blk_flush(ns->blk);
> +            }
> +
> +            blk_set_enable_write_cache(ns->blk, dw11 & 1);
>          }
Good.
>  
> -        blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
>          break;
> +
>      case NVME_NUMBER_OF_QUEUES:
>          if (n->qs_created) {
>              return NVME_CMD_SEQ_ERROR | NVME_DNR;
> @@ -1898,9 +1937,17 @@ static void nvme_process_sq(void *opaque)
>  
>  static void nvme_clear_ctrl(NvmeCtrl *n)
>  {
> +    NvmeNamespace *ns;
>      int i;
>  
> -    blk_drain(n->conf.blk);
> +    for (i = 1; i <= n->num_namespaces; i++) {
> +        ns = nvme_ns(n, i);
> +        if (!ns) {
> +            continue;
> +        }
> +
> +        blk_drain(ns->blk);
> +    }
>  
>      for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
>          if (n->sq[i] != NULL) {
> @@ -1923,7 +1970,15 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
>      n->outstanding_aers = 0;
>      n->qs_created = false;
>  
> -    blk_flush(n->conf.blk);
> +    for (i = 1; i <= n->num_namespaces; i++) {
> +        ns = nvme_ns(n, i);
> +        if (!ns) {
> +            continue;
> +        }
> +
> +        blk_flush(ns->blk);
> +    }
> +
>      n->bar.cc = 0;
>  }
>  
> @@ -2360,17 +2415,17 @@ static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
>          n->params.max_ioqpairs = n->params.num_queues - 1;
>      }
>  
> +    if (n->namespace.blk) {
> +        warn_report("nvme: drive is deprecated; please use an nvme-ns device "
> +                    "instead");
> +    }
> +
Good.

>      if (params->max_ioqpairs < 1 ||
>          params->max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
>          error_setg(errp, "nvme: max_ioqpairs must be ");
>          return -1;
>      }
>  
> -    if (!n->conf.blk) {
> -        error_setg(errp, "nvme: block backend not configured");
> -        return -1;
> -    }
> -
>      if (!params->serial) {
>          error_setg(errp, "nvme: serial not configured");
>          return -1;
> @@ -2379,22 +2434,10 @@ static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
>      return 0;
>  }
>  
> -static int nvme_init_blk(NvmeCtrl *n, Error **errp)
> -{
> -    blkconf_blocksizes(&n->conf);
> -    if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
> -                                       false, errp)) {
> -        return -1;
> -    }
> -
> -    return 0;
> -}
> -
>  static void nvme_init_state(NvmeCtrl *n)
>  {
> -    n->num_namespaces = 1;
> +    n->num_namespaces = NVME_MAX_NAMESPACES;
Perfect!
>      n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
> -    n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>      n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
>      n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
>      n->temperature = NVME_TEMPERATURE;
> @@ -2509,7 +2552,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->cqes = (0x4 << 4) | 0x4;
>      id->nn = cpu_to_le32(n->num_namespaces);
>      id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
> -
> +    id->vwc = 0x1;
>      id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT);
>  
>      pstrcpy((char *) id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:");
> @@ -2518,9 +2561,6 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      id->psd[0].mp = cpu_to_le16(0x9c4);
>      id->psd[0].enlat = cpu_to_le32(0x10);
>      id->psd[0].exlat = cpu_to_le32(0x4);
> -    if (blk_enable_write_cache(n->conf.blk)) {
> -        id->vwc = 1;
> -    }
Shouldn't that be kept? Suppose the user used the legacy 'drive' option and
that drive had no write cache enabled: the controller would still report VWC.
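One way to keep the old behaviour for the legacy path, sketched with hypothetical names: derive VWC from the legacy drive's cache setting when the controller-level 'drive' property is used, and otherwise fall back to the patch's unconditional `id->vwc = 0x1`. This is only an illustration of the question, not a tested change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the Identify Controller VWC field.
 * has_legacy_drive mirrors "the controller 'drive' property was set";
 * legacy_wc mirrors blk_enable_write_cache() for that drive.
 */
static uint8_t id_ctrl_vwc(bool has_legacy_drive, bool legacy_wc)
{
    if (has_legacy_drive) {
        /* keep the pre-patch behaviour: advertise VWC only if enabled */
        return legacy_wc ? 0x1 : 0x0;
    }

    /* nvme-ns devices only: advertise VWC unconditionally, as the patch does */
    return 0x1;
}
```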

>  
>      n->bar.cap = 0;
>      NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
> @@ -2533,25 +2573,34 @@ static void nvme_init_ctrl(NvmeCtrl *n)
>      n->bar.intmc = n->bar.intms = 0;
>  }
>  
> -static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> +int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
>  {
> -    int64_t bs_size;
> -    NvmeIdNs *id_ns = &ns->id_ns;
> +    uint32_t nsid = nvme_nsid(ns);
>  
> -    bs_size = blk_getlength(n->conf.blk);
> -    if (bs_size < 0) {
> -        error_setg_errno(errp, -bs_size, "blk_getlength");
> +    if (nsid > NVME_MAX_NAMESPACES) {
> +        error_setg(errp, "invalid nsid (must be between 0 and %d)",
> +                   NVME_MAX_NAMESPACES);
>          return -1;
>      }
>  
> -    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> -    n->ns_size = bs_size;
> +    if (!nsid) {
> +        for (int i = 1; i <= n->num_namespaces; i++) {
> +            NvmeNamespace *ns = nvme_ns(n, i);
> +            if (!ns) {
> +                nsid = i;
> +                break;
> +            }
> +        }
This misses an error case where all the namespaces are already allocated.
Yes, it would be insane to allocate all 256 namespaces, but still.
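If the quoted loop finds no free slot, nsid stays 0, and since nsid is a uint32_t, n->namespaces[nsid - 1] would then index slot 0xffffffff. A hypothetical helper that makes the "table full" case explicit, relying on the fact that 0 is never a valid NSID:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the lowest unused NSID (1-based), or 0 if all 'num' slots are
 * taken. 0 is never a valid NSID, so it can double as "table full".
 */
static uint32_t first_free_nsid(void *const *slots, uint32_t num)
{
    assert(slots != NULL || num == 0);

    for (uint32_t i = 1; i <= num; i++) {
        if (slots[i - 1] == NULL) {
            return i;
        }
    }

    return 0;
}
```

The caller would then report an error through errp (the exact message is up to the author) and return -1 when the result is 0, instead of storing into the namespace table.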


> +    } else {
> +        if (n->namespaces[nsid - 1]) {
> +            error_setg(errp, "nsid must be unique");

I would change that error message to something like
"namespace id %d is already in use".

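In QEMU the message would be built by error_setg(), which takes a printf-style format, e.g. error_setg(errp, "namespace id %" PRIu32 " is already in use", nsid). The standalone sketch below only checks that formatting with plain snprintf(); format_nsid_in_use() is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the suggested duplicate-NSID message into buf.
 * Returns 0 on success, -1 if the message did not fit.
 */
static int format_nsid_in_use(char *buf, size_t len, uint32_t nsid)
{
    int ret = snprintf(buf, len, "namespace id %" PRIu32 " is already in use",
                       nsid);

    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```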

> +            return -1;
> +        }
> +    }
>  
> -    id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(n, ns));
> +    trace_nvme_dev_register_namespace(nsid);
>  
> -    /* no thin provisioning */
> -    id_ns->ncap = id_ns->nsze;
> -    id_ns->nuse = id_ns->ncap;
> +    n->namespaces[nsid - 1] = ns;
>  
>      return 0;
>  }
> @@ -2559,26 +2608,28 @@ static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
>  static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>  {
>      NvmeCtrl *n = NVME(pci_dev);
> -    int i;
> +    NvmeNamespace *ns;
>  
>      if (nvme_check_constraints(n, errp)) {
>          return;
>      }
>  
> +    qbus_create_inplace(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS,
> +                        &pci_dev->qdev, n->parent_obj.qdev.id);
> +
>      nvme_init_state(n);
> -
> -    if (nvme_init_blk(n, errp)) {
> -        return;
> -    }
> -
> -    for (i = 0; i < n->num_namespaces; i++) {
> -        if (nvme_init_namespace(n, &n->namespaces[i], errp)) {
> -            return;
> -        }
> -    }
> -
>      nvme_init_pci(n, pci_dev);
>      nvme_init_ctrl(n);
> +
> +    /* setup a namespace if the controller drive property was given */
> +    if (n->namespace.blk) {
> +        ns = &n->namespace;
> +        ns->params.nsid = 1;
> +
> +        if (nvme_ns_setup(n, ns, errp)) {
> +            return;
> +        }
> +    }
>  }
>  
>  static void nvme_exit(PCIDevice *pci_dev)
> @@ -2599,7 +2650,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>  }
>  
>  static Property nvme_props[] = {
> -    DEFINE_BLOCK_PROPERTIES(NvmeCtrl, conf),
> +    DEFINE_BLOCK_PROPERTIES_BASE(NvmeCtrl, conf), \
> +    DEFINE_PROP_DRIVE("drive", NvmeCtrl, namespace.blk), \
>      DEFINE_NVME_PROPERTIES(NvmeCtrl, params),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -2631,26 +2683,35 @@ static void nvme_instance_init(Object *obj)
>  {
>      NvmeCtrl *s = NVME(obj);
>  
> -    device_add_bootindex_property(obj, &s->conf.bootindex,
> -                                  "bootindex", "/namespace@1,0",
> -                                  DEVICE(obj), &error_abort);
> +    if (s->namespace.blk) {
> +        device_add_bootindex_property(obj, &s->conf.bootindex,
> +                                      "bootindex", "/namespace@1,0",
> +                                      DEVICE(obj), &error_abort);
> +    }
>  }
>  
>  static const TypeInfo nvme_info = {
>      .name          = TYPE_NVME,
>      .parent        = TYPE_PCI_DEVICE,
>      .instance_size = sizeof(NvmeCtrl),
> -    .class_init    = nvme_class_init,
>      .instance_init = nvme_instance_init,
> +    .class_init    = nvme_class_init,
>      .interfaces = (InterfaceInfo[]) {
>          { INTERFACE_PCIE_DEVICE },
>          { }
>      },
>  };
>  
> +static const TypeInfo nvme_bus_info = {
> +    .name = TYPE_NVME_BUS,
> +    .parent = TYPE_BUS,
> +    .instance_size = sizeof(NvmeBus),
> +};
> +
>  static void nvme_register_types(void)
>  {
>      type_register_static(&nvme_info);
> +    type_register_static(&nvme_bus_info);
>  }
>  
>  type_init(nvme_register_types)
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 5d5fa8c8833a..c66f6cd8413a 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -2,6 +2,9 @@
>  #define HW_NVME_H
>  
>  #include "block/nvme.h"
> +#include "nvme-ns.h"
> +
> +#define NVME_MAX_NAMESPACES 256
>  
>  #define DEFINE_NVME_PROPERTIES(_state, _props) \
>      DEFINE_PROP_STRING("serial", _state, _props.serial), \
> @@ -110,26 +113,6 @@ typedef struct NvmeCQueue {
>      QTAILQ_HEAD(, NvmeRequest) req_list;
>  } NvmeCQueue;
>  
> -typedef struct NvmeNamespace {
> -    NvmeIdNs        id_ns;
> -} NvmeNamespace;
> -
> -static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
> -{
> -    NvmeIdNs *id_ns = &ns->id_ns;
> -    return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
> -}
> -
> -static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
> -{
> -    return nvme_ns_lbaf(ns)->ds;
> -}
> -
> -static inline size_t nvme_ns_lbads_bytes(NvmeNamespace *ns)
> -{
> -    return 1 << nvme_ns_lbads(ns);
> -}
> -
>  typedef enum NvmeAIOOp {
>      NVME_AIO_OPC_NONE         = 0x0,
>      NVME_AIO_OPC_FLUSH        = 0x1,
> @@ -184,6 +167,13 @@ static inline bool nvme_req_is_write(NvmeRequest *req)
>      }
>  }
>  
> +#define TYPE_NVME_BUS "nvme-bus"
> +#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
> +
> +typedef struct NvmeBus {
> +    BusState parent_bus;
> +} NvmeBus;
> +
>  #define TYPE_NVME "nvme"
>  #define NVME(obj) \
>          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
> @@ -193,8 +183,9 @@ typedef struct NvmeCtrl {
>      MemoryRegion iomem;
>      MemoryRegion ctrl_mem;
>      NvmeBar      bar;
> -    BlockConf    conf;
>      NvmeParams   params;
> +    NvmeBus      bus;
> +    BlockConf    conf;
>  
>      bool        qs_created;
>      uint32_t    page_size;
> @@ -205,7 +196,6 @@ typedef struct NvmeCtrl {
>      uint32_t    reg_size;
>      uint32_t    num_namespaces;
>      uint32_t    max_q_ents;
> -    uint64_t    ns_size;
>      uint8_t     outstanding_aers;
>      uint8_t     *cmbuf;
>      uint64_t    irq_status;
> @@ -219,7 +209,8 @@ typedef struct NvmeCtrl {
>      QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
>      int         aer_queued;
>  
> -    NvmeNamespace   *namespaces;
> +    NvmeNamespace   namespace;
> +    NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
>      NvmeSQueue      **sq;
>      NvmeCQueue      **cq;
>      NvmeSQueue      admin_sq;
> @@ -228,9 +219,13 @@ typedef struct NvmeCtrl {
>      NvmeFeatureVal  features;
>  } NvmeCtrl;
>  
> -static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
> +static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>  {
> -    return n->ns_size >> nvme_ns_lbads(ns);
> +    if (!nsid || nsid > n->num_namespaces) {
> +        return NULL;
> +    }
> +
> +    return n->namespaces[nsid - 1];
>  }
>  
>  static inline uint16_t nvme_cid(NvmeRequest *req)
> @@ -253,4 +248,6 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
>      return req->sq->ctrl;
>  }
>  
> +int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
> +
>  #endif /* HW_NVME_H */
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 70702cc67d5a..3d907eaf0800 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -29,6 +29,7 @@ hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int t
>  
>  # nvme.c
>  # nvme traces for successful events
> +nvme_dev_register_namespace(uint32_t nsid) "nsid %"PRIu32""
>  nvme_dev_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
>  nvme_dev_irq_pin(void) "pulsing IRQ pin"
>  nvme_dev_irq_masked(void) "IRQ is masked"
> @@ -38,7 +39,7 @@ nvme_dev_map_sgl(uint16_t cid, uint8_t typ, uint32_t nlb, uint64_t len) "cid %"P
>  nvme_dev_req_register_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, uint64_t count, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" count %"PRIu64" opc \"%s\" req %p"
>  nvme_dev_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p"
>  nvme_dev_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
> -nvme_dev_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
> +nvme_dev_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
>  nvme_dev_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
>  nvme_dev_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
>  nvme_dev_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
> @@ -98,7 +99,6 @@ nvme_dev_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is null or no
>  nvme_dev_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
>  nvme_dev_err_invalid_prp2_missing(void) "PRP2 is null and more data to be transferred"
>  nvme_dev_err_invalid_prp(void) "invalid PRP"
> -nvme_dev_err_invalid_ns(uint32_t ns, uint32_t limit) "invalid namespace %u not within 1-%u"
>  nvme_dev_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8""
>  nvme_dev_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8""
>  nvme_dev_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64""

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 42/42] nvme: make lba data size configurable
  2020-03-16 14:29 ` [PATCH v6 42/42] nvme: make lba data size configurable Klaus Jensen
@ 2020-03-25 10:59   ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-25 10:59 UTC (permalink / raw)
  To: Klaus Jensen, qemu-block
  Cc: Kevin Wolf, Beata Michalska, qemu-devel, Max Reitz, Keith Busch,
	Javier Gonzalez

On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Keith Busch <kbusch@kernel.org>
> ---
>  hw/block/nvme-ns.c | 7 ++++++-
>  hw/block/nvme-ns.h | 4 +++-
>  hw/block/nvme.c    | 1 +
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index 6d975104171d..d7e5c81c5f16 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -18,7 +18,7 @@ static int nvme_ns_init(NvmeNamespace *ns)
>  {
>      NvmeIdNs *id_ns = &ns->id_ns;
>  
> -    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> +    id_ns->lbaf[0].ds = ns->params.lbads;
>      id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
>  
>      /* no thin provisioning */
> @@ -78,6 +78,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
>          return 1;
>      }
>  
> +    if (ns->params.lbads < 9 || ns->params.lbads > 12) {
> +        error_setg(errp, "unsupported lbads (supported: 9-12)");
> +        return 1;
> +    }
> +
>      return 0;
>  }
>  
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index 3c3651d485d0..43b78f8b8d9c 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -7,10 +7,12 @@
>  
>  #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
>      DEFINE_PROP_DRIVE("drive", _state, blk), \
> -    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
> +    DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
> +    DEFINE_PROP_UINT8("lbads", _state, _props.lbads, BDRV_SECTOR_BITS)
>  
>  typedef struct NvmeNamespaceParams {
>      uint32_t nsid;
> +    uint8_t  lbads;
>  } NvmeNamespaceParams;
>  
>  typedef struct NvmeNamespace {
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 4f1504fc00fe..61a9da970d41 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2624,6 +2624,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>      if (n->namespace.blk) {
>          ns = &n->namespace;
>          ns->params.nsid = 1;
> +        ns->params.lbads = BDRV_SECTOR_BITS;
>  
>          if (nvme_ns_setup(n, ns, errp)) {
>              return;

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
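As a quick sanity check of what the new `lbads` parameter controls, the sketch below restates the arithmetic of nvme_ns_lbads_bytes() and nvme_ns_nlbas() together with the 9-12 range enforced by nvme_ns_check_constraints(). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* lbads is log2 of the LBA data size; the patch accepts 9..12 (512 B..4 KiB). */
static bool lbads_supported(uint8_t lbads)
{
    return lbads >= 9 && lbads <= 12;
}

/* Bytes per logical block, like nvme_ns_lbads_bytes(). */
static uint64_t lba_bytes(uint8_t lbads)
{
    assert(lbads_supported(lbads));
    return UINT64_C(1) << lbads;
}

/* LBAs for a backing size in bytes, like nvme_ns_nlbas(); a partial tail block is dropped. */
static uint64_t nlbas(uint64_t size, uint8_t lbads)
{
    assert(lbads_supported(lbads));
    return size >> lbads;
}
```

With the patch applied, the value is set per namespace, e.g. `-device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=1,lbads=12` for 4 KiB blocks, while the legacy controller 'drive' path stays at BDRV_SECTOR_BITS (512-byte blocks).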

Best regards,
	Maxim Levitsky






* Re: [PATCH v6 01/42] nvme: rename trace events to nvme_dev
  2020-03-25 10:36   ` Maxim Levitsky
@ 2020-03-31  5:38     ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:38 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:36, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Change the prefix of all nvme device related trace events to 'nvme_dev'
> > to not clash with trace events from the nvme block driver.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  hw/block/nvme.c       | 188 +++++++++++++++++++++---------------------
> >  hw/block/trace-events | 172 +++++++++++++++++++-------------------
> >  2 files changed, 180 insertions(+), 180 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index d28335cbf377..3e4b18956ed2 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1035,32 +1035,32 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
> >      switch (offset) {
> >      case 0xc:   /* INTMS */
> >          if (unlikely(msix_enabled(&(n->parent_obj)))) {
> > -            NVME_GUEST_ERR(nvme_ub_mmiowr_intmask_with_msix,
> > +            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_intmask_with_msix,
> >                             "undefined access to interrupt mask set"
> >                             " when MSI-X is enabled");
> >              /* should be ignored, fall through for now */
> >          }
> >          n->bar.intms |= data & 0xffffffff;
> >          n->bar.intmc = n->bar.intms;
> > -        trace_nvme_mmio_intm_set(data & 0xffffffff,
> > +        trace_nvme_dev_mmio_intm_set(data & 0xffffffff,
> >                                   n->bar.intmc);
> Indentation.
> 

Fixed.

> >          nvme_irq_check(n);
> >          break;
> >      case 0x10:  /* INTMC */
> >          if (unlikely(msix_enabled(&(n->parent_obj)))) {
> > -            NVME_GUEST_ERR(nvme_ub_mmiowr_intmask_with_msix,
> > +            NVME_GUEST_ERR(nvme_dev_ub_mmiowr_intmask_with_msix,
> >                             "undefined access to interrupt mask clr"
> >                             " when MSI-X is enabled");
> >              /* should be ignored, fall through for now */
> >          }
> >          n->bar.intms &= ~(data & 0xffffffff);
> >          n->bar.intmc = n->bar.intms;
> > -        trace_nvme_mmio_intm_clr(data & 0xffffffff,
> > +        trace_nvme_dev_mmio_intm_clr(data & 0xffffffff,
> >                                   n->bar.intmc);
> Indention.
> 

Fixed.

> 
> 
> Other that indention nitpicks, no changes vs V5,
> so my reviewed-by kept correctly.
> 
> Best regards,
> 	Maxim Levitsky
> 


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v6 04/42] nvme: bump spec data structures to v1.3
  2020-03-25 10:37   ` Maxim Levitsky
@ 2020-03-31  5:38     ` Klaus Birkelund Jensen
  2020-03-31 10:43       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:38 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:37, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Add missing fields in the Identify Controller and Identify Namespace
> > data structures to bring them in line with NVMe v1.3.
> > 
> > This also adds data structures and defines for SGL support which
> > requires a couple of trivial changes to the nvme block driver as well.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Fam Zheng <fam@euphon.net>
> > ---
> >  block/nvme.c         |  18 ++---
> >  hw/block/nvme.c      |  12 ++--
> >  include/block/nvme.h | 153 ++++++++++++++++++++++++++++++++++++++-----
> >  3 files changed, 151 insertions(+), 32 deletions(-)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index d41c4bda6e39..99b9bb3dac96 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -589,6 +675,16 @@ enum NvmeIdCtrlOncs {
> >  #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
> >  #define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf)
> >  
> > +#define NVME_CTRL_SGLS_SUPPORTED_MASK            (0x3 <<  0)
> > +#define NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT    (0x1 <<  0)
> > +#define NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT (0x1 <<  1)
> > +#define NVME_CTRL_SGLS_KEYED                     (0x1 <<  2)
> > +#define NVME_CTRL_SGLS_BITBUCKET                 (0x1 << 16)
> > +#define NVME_CTRL_SGLS_MPTR_CONTIGUOUS           (0x1 << 17)
> > +#define NVME_CTRL_SGLS_EXCESS_LENGTH             (0x1 << 18)
> > +#define NVME_CTRL_SGLS_MPTR_SGL                  (0x1 << 19)
> > +#define NVME_CTRL_SGLS_ADDR_OFFSET               (0x1 << 20)
> OK
> > +
> >  typedef struct NvmeFeatureVal {
> >      uint32_t    arbitration;
> >      uint32_t    power_mgmt;
> > @@ -611,6 +707,10 @@ typedef struct NvmeFeatureVal {
> >  #define NVME_INTC_THR(intc)     (intc & 0xff)
> >  #define NVME_INTC_TIME(intc)    ((intc >> 8) & 0xff)
> >  
> > +#define NVME_TEMP_THSEL(temp)  ((temp >> 20) & 0x3)
> Nitpick: If we are adding this, I'll add a #define for the values as well
> 

Done. And used in the subsequent "nvme: add temperature threshold
feature" patch.
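As a rough illustration, here is how the three fields pack into CDW11 of Set Features for the Temperature Threshold feature: THSEL in bits 21:20, TMPSEL in bits 19:16, and TMPTH in bits 15:0, in Kelvin. This is a minimal sketch. The THSEL value names and the encode/decode helpers are hypothetical. The extractor macros follow the quoted patch, with the macro argument parenthesized.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors as in the quoted patch (argument parenthesized). */
#define NVME_TEMP_THSEL(temp)  (((temp) >> 20) & 0x3)
#define NVME_TEMP_TMPSEL(temp) (((temp) >> 16) & 0xf)
#define NVME_TEMP_TMPTH(temp)  (((temp) >>  0) & 0xffff)

/* Hypothetical names for the THSEL values. */
enum {
    NVME_TEMP_THSEL_OVER  = 0x0,  /* over temperature threshold */
    NVME_TEMP_THSEL_UNDER = 0x1,  /* under temperature threshold */
};

/* Hypothetical decoded view of CDW11 for the Temperature Threshold feature. */
typedef struct TempThreshold {
    unsigned thsel;   /* which threshold is addressed */
    unsigned tmpsel;  /* 0 = composite temperature, 1..8 = temperature sensor N */
    uint16_t tmpth;   /* threshold value in Kelvin */
} TempThreshold;

static uint32_t temp_threshold_encode(unsigned thsel, unsigned tmpsel,
                                      uint16_t tmpth)
{
    /* THSEL is 2 bits wide and TMPSEL is 4 bits wide. */
    assert(thsel <= 0x3 && tmpsel <= 0xf);
    return ((uint32_t)thsel << 20) | ((uint32_t)tmpsel << 16) | tmpth;
}

static TempThreshold temp_threshold_decode(uint32_t dw11)
{
    TempThreshold t = {
        .thsel  = NVME_TEMP_THSEL(dw11),
        .tmpsel = NVME_TEMP_TMPSEL(dw11),
        .tmpth  = (uint16_t)NVME_TEMP_TMPTH(dw11),
    };
    return t;
}
```

Encoding and then decoding returns the same three fields, so the macros can be checked against a host-side encoder.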

> > +#define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
> > +#define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
> > +
> >  enum NvmeFeatureIds {
> >      NVME_ARBITRATION                = 0x1,
> >      NVME_POWER_MANAGEMENT           = 0x2,
> > @@ -653,18 +753,37 @@ typedef struct NvmeIdNs {
> >      uint8_t     mc;
> >      uint8_t     dpc;
> >      uint8_t     dps;
> > -
> >      uint8_t     nmic;
> >      uint8_t     rescap;
> >      uint8_t     fpi;
> >      uint8_t     dlfeat;
> > -
> > -    uint8_t     res34[94];
> > +    uint16_t    nawun;
> > +    uint16_t    nawupf;
> > +    uint16_t    nacwu;
> > +    uint16_t    nabsn;
> > +    uint16_t    nabo;
> > +    uint16_t    nabspf;
> > +    uint16_t    noiob;
> > +    uint8_t     nvmcap[16];
> > +    uint8_t     rsvd64[40];
> > +    uint8_t     nguid[16];
> > +    uint64_t    eui64;
> >      NvmeLBAF    lbaf[16];
> > -    uint8_t     res192[192];
> > +    uint8_t     rsvd192[192];
> >      uint8_t     vs[3712];
> >  } NvmeIdNs;
> Also checked this against V5, looks OK now
> 
> >  
> > +typedef struct NvmeIdNsDescr {
> > +    uint8_t nidt;
> > +    uint8_t nidl;
> > +    uint8_t rsvd2[2];
> > +} NvmeIdNsDescr;
> OK
> 
> 
> 
> > +
> > +#define NVME_NIDT_UUID_LEN 16
> > +
> > +enum {
> > +    NVME_NIDT_UUID = 0x3,
> Very minor nitpick: I'll would add others as well just for the sake
> of better understanding what this is
> 

Done.
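For reference, here is a sketch of walking the Namespace Identification Descriptor list returned by Identify CNS 03h, using the header layout quoted above. Each descriptor is the 4-byte header followed by NIDL bytes of identifier. The list ends at a descriptor with NIDT = 0. The EUI64/NGUID enum names and `find_ns_id()` are illustrative only and are not necessarily what the revised patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header layout as in the quoted NvmeIdNsDescr. */
typedef struct NvmeIdNsDescr {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd2[2];
} NvmeIdNsDescr;

/* Namespace Identifier Types; only NVME_NIDT_UUID is in the quoted patch. */
enum {
    NVME_NIDT_EUI64 = 0x1,  /* NIDL = 8 */
    NVME_NIDT_NGUID = 0x2,  /* NIDL = 16 */
    NVME_NIDT_UUID  = 0x3,  /* NIDL = 16 */
};

/* Return a pointer to the identifier of the first descriptor of type nidt
 * (storing its length in *nidl_out), or NULL if there is none. */
static const uint8_t *find_ns_id(const uint8_t *buf, size_t len,
                                 uint8_t nidt, uint8_t *nidl_out)
{
    size_t pos = 0;

    assert(buf && nidl_out);
    while (pos + sizeof(NvmeIdNsDescr) <= len) {
        NvmeIdNsDescr d;
        memcpy(&d, buf + pos, sizeof(d));
        if (d.nidt == 0) {
            break;                  /* end of the descriptor list */
        }
        size_t id_off = pos + sizeof(d);
        if (id_off + d.nidl > len) {
            break;                  /* truncated descriptor */
        }
        if (d.nidt == nidt) {
            *nidl_out = d.nidl;
            return buf + id_off;
        }
        pos = id_off + d.nidl;      /* skip header and identifier */
    }
    return NULL;
}
```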

> > +};
> >  
> >  /*Deallocate Logical Block Features*/
> >  #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)       ((dlfeat) & 0x10)
> 
> Looks very good.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 
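Here is a small sketch of how a host or test could interpret the SGLS bits defined in the hunk above. It assumes the NVMe 1.3 encoding of bits 1:0: 00b means not supported, 01b means supported with no alignment requirement, 10b means supported with dword alignment, and 11b is reserved. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the masks from the quoted patch. */
#define NVME_CTRL_SGLS_SUPPORTED_MASK            (0x3 <<  0)
#define NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT    (0x1 <<  0)
#define NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT (0x1 <<  1)
#define NVME_CTRL_SGLS_BITBUCKET                 (0x1 << 16)

/* SGLs are usable only for the two defined "supported" encodings. */
static bool sgls_supported(uint32_t sgls)
{
    uint32_t s = sgls & NVME_CTRL_SGLS_SUPPORTED_MASK;
    return s == NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT ||
           s == NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT;
}

/* Data blocks must be dword aligned when bits 1:0 are 10b. */
static bool sgls_needs_dword_alignment(uint32_t sgls)
{
    return (sgls & NVME_CTRL_SGLS_SUPPORTED_MASK) ==
           NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT;
}

/* The Bit Bucket descriptor bit only means something if SGLs are supported. */
static bool sgls_bit_bucket(uint32_t sgls)
{
    return sgls_supported(sgls) && (sgls & NVME_CTRL_SGLS_BITBUCKET) != 0;
}
```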



* Re: [PATCH v6 05/42] nvme: use constant for identify data size
  2020-03-25 10:37   ` Maxim Levitsky
@ 2020-03-31  5:38     ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:38 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:37, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 40cb176dea3c..f716f690a594 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -679,7 +679,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify *c)
> >  
> >  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
> >  {
> > -    static const int data_len = 4 * KiB;
> > +    static const int data_len = NVME_IDENTIFY_DATA_SIZE;
> >      uint32_t min_nsid = le32_to_cpu(c->nsid);
> >      uint64_t prp1 = le64_to_cpu(c->prp1);
> >      uint64_t prp2 = le64_to_cpu(c->prp2);
> 
> I'll probably squash this with some other refactoring patch,
> but I absolutely don't mind leaving this as is.
> Fine grained patches never cause any harm.
> 

Squashed into 06/42.

> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 



* Re: [PATCH v6 06/42] nvme: add identify cns values in header
  2020-03-25 10:37   ` Maxim Levitsky
@ 2020-03-31  5:39     ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:37, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index f716f690a594..b38d7e548a60 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -709,11 +709,11 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
> >      NvmeIdentify *c = (NvmeIdentify *)cmd;
> >  
> >      switch (le32_to_cpu(c->cns)) {
> > -    case 0x00:
> > +    case NVME_ID_CNS_NS:
> >          return nvme_identify_ns(n, c);
> > -    case 0x01:
> > +    case NVME_ID_CNS_CTRL:
> >          return nvme_identify_ctrl(n, c);
> > -    case 0x02:
> > +    case NVME_ID_CNS_NS_ACTIVE_LIST:
> >          return nvme_identify_nslist(n, c);
> >      default:
> >          trace_nvme_dev_err_invalid_identify_cns(le32_to_cpu(c->cns));
> 
> This is a very good candidate to be squished with the patch 5 IMHO,
> but you can leave this as is as well. I don't mind.
> 

Squashed!

> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 
> 
> 



* Re: [PATCH v6 07/42] nvme: refactor nvme_addr_read
  2020-03-25 10:38   ` Maxim Levitsky
@ 2020-03-31  5:39     ` Klaus Birkelund Jensen
  2020-03-31 10:41       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:38, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Pull the controller memory buffer check to its own function. The check
> > will be used on its own in later patches.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index b38d7e548a60..08a83d449de3 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -52,14 +52,22 @@
> >  
> >  static void nvme_process_sq(void *opaque);
> >  
> > +static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> > +{
> > +    hwaddr low = n->ctrl_mem.addr;
> > +    hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
> > +
> > +    return addr >= low && addr < hi;
> > +}
> > +
> >  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> >  {
> > -    if (n->cmbsz && addr >= n->ctrl_mem.addr &&
> > -                addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
> > +    if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> >          memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
> > -    } else {
> > -        pci_dma_read(&n->parent_obj, addr, buf, size);
> > +        return;
> >      }
> > +
> > +    pci_dma_read(&n->parent_obj, addr, buf, size);
> >  }
> >  
> >  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
> 
> Note that this patch still contains a bug that it removes the check against the accessed
> size, which you fix in later patch.
> I prefer to not add a bug in first place
> However if you have a reason for this, I won't mind.
> 

So yeah. The reason is that there is actually no bug at this point,
because the controller only supports PRPs. I actually thought there was
a bug as well and reported it to qemu-security some months ago as a
potential out-of-bounds access. I was then schooled by Keith on how PRPs
work ;) Below is a paraphrased version of Keith's analysis.

PRPs do not cross page boundaries:

    trans_len = n->page_size - (prp1 % n->page_size);

The PRPs are always verified to be page aligned:

    if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {

and the transfer length won't exceed the page size. So, since the start
address is within the CMB, and the CMB is MB-aligned and sized in whole
MBs, a PRP transfer can never cross outside it.

I could add the check at this point (because it *is* needed once SGLs
are introduced), but I think it would just be noise: I would need to
explain why the check is there even though it is not really needed at
this point. Instead, I'm adding a new patch before the SGL patch that
explains this.
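The argument can be checked mechanically. The sketch below uses hypothetical helper names and defines `MiB` locally. It asserts the stated preconditions: a 1 MiB-aligned and 1 MiB-sized CMB, and a power-of-two page size no larger than 1 MiB. It then verifies that a transfer bounded by `n->page_size - (prp1 % n->page_size)` never leaves the CMB. It models only the first PRP. Later entries are page aligned and at most one page long, so the same reasoning applies to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ull * 1024ull)

/* Length of the first PRP transfer: from prp1 to the end of its page. */
static uint64_t prp1_trans_len(uint64_t prp1, uint64_t page_size)
{
    return page_size - (prp1 % page_size);
}

/* Under the stated preconditions, the page containing addr lies wholly
 * inside the CMB, so the last byte transferred is inside it too. */
static bool prp_transfer_stays_in_cmb(uint64_t cmb_base, uint64_t cmb_size,
                                      uint64_t addr, uint64_t page_size)
{
    assert(cmb_base % MiB == 0 && cmb_size % MiB == 0 && cmb_size > 0);
    assert(page_size && (page_size & (page_size - 1)) == 0 &&
           page_size <= MiB);
    assert(addr >= cmb_base && addr < cmb_base + cmb_size);

    uint64_t last = addr + prp1_trans_len(addr, page_size) - 1;
    return last < cmb_base + cmb_size;
}
```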



* Re: [PATCH v6 09/42] nvme: add max_ioqpairs device parameter
  2020-03-25 10:39   ` Maxim Levitsky
@ 2020-03-31  5:40     ` Klaus Birkelund Jensen
  2020-03-31  9:48       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:39, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > The num_queues device paramater has a slightly confusing meaning because
> > it accounts for the admin queue pair which is not really optional.
> > Secondly, it is really a maximum value of queues allowed.
> > 
> > Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
> > but keep num_queues for compatibility.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c | 45 ++++++++++++++++++++++++++-------------------
> >  hw/block/nvme.h |  4 +++-
> >  2 files changed, 29 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 7cf7cf55143e..7dfd8a1a392d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1332,9 +1333,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >      int64_t bs_size;
> >      uint8_t *pci_conf;
> >  
> > -    if (!n->params.num_queues) {
> > -        error_setg(errp, "num_queues can't be zero");
> > -        return;
> > +    if (n->params.num_queues) {
> > +        warn_report("nvme: num_queues is deprecated; please use max_ioqpairs "
> > +                    "instead");
> > +
> > +        n->params.max_ioqpairs = n->params.num_queues - 1;
> > +    }
> > +
> > +    if (!n->params.max_ioqpairs) {
> > +        error_setg(errp, "max_ioqpairs can't be less than 1");
> >      }
> This is not even a nitpick, but just and idea.
> 
> It might be worth it to allow max_ioqpairs=0 to simulate a 'broken'
> nvme controller. I know that kernel has special handling for such controllers,
> which include only creation of the control character device (/dev/nvme*) through
> which the user can submit commands to try and 'fix' the controller (by re-uploading firmware
> maybe or something like that).
> 
> 

Not sure about the implications of this, so I'll leave that on the TODO
:) But a controller with no I/O queues is an "Administrative Controller"
and perfectly legal in NVMe v1.4 AFAIK.

> >  
> >      if (!n->conf.blk) {
> > @@ -1365,19 +1372,19 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >      pcie_endpoint_cap_init(pci_dev, 0x80);
> >  
> >      n->num_namespaces = 1;
> > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
> > +    n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
> 
> I hate to say it, but it looks like this thing (which I mentioned to you in V5)
> was pre-existing bug, which is indeed fixed now.
> In theory such fixes should go to separate patches, but in this case, I guess it would
> be too much to ask for it.
> Maybe mention this in the commit message instead, so that this fix doesn't stay hidden like that?
> 
> 

I'm convinced now. I have added a preparatory bugfix patch before this
patch.
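To make the arithmetic concrete, doorbell registers start at offset 0x1000. Assuming a 4-byte doorbell stride (CAP.DSTRD = 0), each queue pair uses 8 bytes for its SQ tail and CQ head doorbells, and the admin pair occupies 0x1000..0x1007. The sketch compares the old and new formulas. `pow2ceil_u64()` is a simple stand-in for QEMU's `pow2ceil()`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's pow2ceil(): smallest power of two >= v. */
static uint64_t pow2ceil_u64(uint64_t v)
{
    assert(v <= (UINT64_C(1) << 63));  /* larger values have no 64-bit result */
    uint64_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* New formula: 0x1008 + 8 * max_ioqpairs == 0x1000 + 8 * (max_ioqpairs + 1),
 * i.e. exactly the end of the last doorbell. */
static uint64_t reg_size_new(uint32_t max_ioqpairs)
{
    return pow2ceil_u64(0x1008 + 2 * (uint64_t)max_ioqpairs * 4);
}

/* Old formula: num_queues already includes the admin queue, so the 0x1004
 * base and the "+ 1" over-count the register space by 12 bytes. */
static uint64_t reg_size_old(uint32_t num_queues)
{
    return pow2ceil_u64(0x1004 + 2 * ((uint64_t)num_queues + 1) * 4);
}
```

After rounding up, the over-count is usually invisible. At 512 total queues (511 I/O queue pairs), however, it doubles the region from 0x2000 to 0x4000.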

> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 




* Re: [PATCH v6 10/42] nvme: refactor device realization
  2020-03-25 10:40   ` Maxim Levitsky
@ 2020-03-31  5:40     ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:40, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > This patch splits up nvme_realize into multiple individual functions,
> > each initializing a different subset of the device.
> > 
> > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c | 178 ++++++++++++++++++++++++++++++------------------
> >  hw/block/nvme.h |  23 ++++++-
> >  2 files changed, 134 insertions(+), 67 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 7dfd8a1a392d..665485045066 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1340,57 +1337,100 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >          n->params.max_ioqpairs = n->params.num_queues - 1;
> >      }
> >  
> > -    if (!n->params.max_ioqpairs) {
> > -        error_setg(errp, "max_ioqpairs can't be less than 1");
> > +    if (params->max_ioqpairs < 1 ||
> > +        params->max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
> > +        error_setg(errp, "nvme: max_ioqpairs must be ");
> Looks like the error message is not complete now.

Fixed!
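For illustration only, here is a stand-alone version of the range check from the hunk, with a complete message. The wording of the message is a guess, not the text of the fixed patch, and `check_max_ioqpairs()` is hypothetical. `PCI_MSIX_FLAGS_QSIZE` is defined locally with the value 0x07ff it has in `pci_regs.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* MSI-X table size mask, as in pci_regs.h. */
#define PCI_MSIX_FLAGS_QSIZE 0x07ff

/* Returns false and fills err if max_ioqpairs is outside [1, QSIZE]. */
static bool check_max_ioqpairs(uint32_t max_ioqpairs, char *err, size_t errlen)
{
    assert(err && errlen > 0);
    if (max_ioqpairs < 1 || max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
        snprintf(err, errlen, "nvme: max_ioqpairs must be between 1 and %d",
                 PCI_MSIX_FLAGS_QSIZE);
        return false;
    }
    return true;
}
```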

> 
> Small nitpick: To be honest this not only refactoring in the device realization since you also (rightfully)
> removed the duplicated cmbsz/cmbloc so I would add a mention for this in the commit message.
> But that doesn't matter that much, so
> 

You are right. I've added it as a separate patch.

> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 




* Re: [PATCH v6 11/42] nvme: add temperature threshold feature
  2020-03-25 10:40   ` Maxim Levitsky
@ 2020-03-31  5:40     ` Klaus Birkelund Jensen
  2020-03-31  9:46       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:40, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > It might seem wierd to implement this feature for an emulated device,
> > but it is mandatory to support and the feature is useful for testing
> > asynchronous event request support, which will be added in a later
> > patch.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c      | 48 ++++++++++++++++++++++++++++++++++++++++++++
> >  hw/block/nvme.h      |  2 ++
> >  include/block/nvme.h |  8 +++++++-
> >  3 files changed, 57 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> > index b7c465560eea..8cda5f02c622 100644
> > --- a/hw/block/nvme.h
> > +++ b/hw/block/nvme.h
> > @@ -108,6 +108,7 @@ typedef struct NvmeCtrl {
> >      uint64_t    irq_status;
> >      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
> >      uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
> > +    uint16_t    temperature;
> You forgot to move this too.
> 

Fixed!
> 
> With 'temperature' field removed from the header:
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 




* Re: [PATCH v6 12/42] nvme: add support for the get log page command
  2020-03-25 10:40   ` Maxim Levitsky
@ 2020-03-31  5:41     ` Klaus Birkelund Jensen
  2020-03-31  9:45       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:40, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Add support for the Get Log Page command and basic implementations of
> > the mandatory Error Information, SMART / Health Information and Firmware
> > Slot Information log pages.
> > 
> > In violation of the specification, the SMART / Health Information log
> > page does not persist information over the lifetime of the controller
> > because the device has no place to store such persistent state.
> > 
> > Note that the LPA field in the Identify Controller data structure
> > intentionally has bit 0 cleared because there is no namespace specific
> > information in the SMART / Health information log page.
> > 
> > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > Section 5.10 ("Get Log Page command").
> > 
> > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c       | 138 +++++++++++++++++++++++++++++++++++++++++-
> >  hw/block/nvme.h       |  10 +++
> >  hw/block/trace-events |   2 +
> >  3 files changed, 149 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 64c42101df5c..83ff3fbfb463 100644
> >
> > +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> > +                                uint64_t off, NvmeRequest *req)
> > +{
> > +    uint32_t trans_len;
> > +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> > +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> > +    uint8_t errlog[64];
> I'll would replace this with sizeof(NvmeErrorLogEntry)
> (and add NvmeErrorLogEntry to the nvme.h), just for the sake of consistency,
> and in case we end up reporting some errors to the log in the future.
> 

NvmeErrorLog is already in nvme.h; fixed to actually use it.

> 
> > +
> > +    if (off > sizeof(errlog)) {
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> > +
> > +    memset(errlog, 0x0, sizeof(errlog));
> > +
> > +    trans_len = MIN(sizeof(errlog) - off, buf_len);
> > +
> > +    return nvme_dma_read_prp(n, errlog, trans_len, prp1, prp2);
> > +}
> Besides this, looks good now.
> 
> 
> Best regards,
> 	Maxim Levitsky
> 
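Here is a generic sketch of the offset and length arithmetic used by nvme_error_info() above. The names are hypothetical: `log_page_slice()`, and `LOG_INVALID_FIELD` standing in for `NVME_INVALID_FIELD`. The sketch differs from the quoted hunk in two ways. It copies from `page + off` rather than from the start of the buffer; the hunk's approach is harmless there only because the error log is all zeroes. It also rejects `off == page_len`, which the quoted `off > sizeof(errlog)` check lets through as a zero-length transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_INVALID_FIELD 0x0002  /* stand-in for NVME_INVALID_FIELD */

/* Copy up to buf_len bytes of a log page, starting at byte offset off. */
static uint16_t log_page_slice(const uint8_t *page, size_t page_len,
                               uint64_t off, uint32_t buf_len,
                               uint8_t *dst, uint32_t *trans_len)
{
    assert(page && dst && trans_len);
    if (off >= page_len) {
        return LOG_INVALID_FIELD;      /* offset past the end of the page */
    }
    size_t avail = page_len - (size_t)off;
    *trans_len = (uint32_t)(avail < buf_len ? avail : buf_len);
    memcpy(dst, page + off, *trans_len);
    return 0;
}
```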




* Re: [PATCH v6 14/42] nvme: add missing mandatory features
  2020-03-25 10:41   ` Maxim Levitsky
@ 2020-03-31  5:41     ` Klaus Birkelund Jensen
  2020-03-31  9:39       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:41, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Add support for returning a resonable response to Get/Set Features of
> > mandatory features.
> > 
> > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c       | 60 ++++++++++++++++++++++++++++++++++++++++++-
> >  hw/block/trace-events |  2 ++
> >  include/block/nvme.h  |  6 ++++-
> >  3 files changed, 66 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index ff8975cd6667..eb9c722df968 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1058,6 +1069,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >          break;
> >      case NVME_TIMESTAMP:
> >          return nvme_get_feature_timestamp(n, cmd);
> > +    case NVME_INTERRUPT_COALESCING:
> > +        result = cpu_to_le32(n->features.int_coalescing);
> > +        break;
> > +    case NVME_INTERRUPT_VECTOR_CONF:
> > +        if ((dw11 & 0xffff) > n->params.max_ioqpairs + 1) {
> > +            return NVME_INVALID_FIELD | NVME_DNR;
> > +        }
> I still think that this should be >= since the interrupt vector is not zero based.
> So if we have for example 3 IO queues, then we have 4 queues in total
> which translates to irq numbers 0..3.
> 

Yes, you are right. The device supports max_ioqpairs + 1 IVs (0 through
max_ioqpairs), so accessing IV max_ioqpairs + 1 would go one beyond the array.

Fixed.

> BTW the user of the device doesn't have to have 1:1 mapping between qid and msi interrupt index,
> in fact when MSI is not used, all the queues will map to the same vector, which will be interrupt 0
> from point of view of the device IMHO.
> So it kind of makes sense IMHO to have num_irqs or something, even if it technically equals to number of queues.
> 

Yeah, but the device will still *support* the N IVs, so they can still
be configured even though they will not be used. So I don't think we
need to introduce an additional parameter?
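The off-by-one shows up in a tiny model. It assumes the Interrupt Vector Configuration layout of CDW11: IV in bits 15:0 and Coalescing Disable in bit 16. With 3 I/O queue pairs, the valid vectors are 0..3. Both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Vectors are zero-based: with max_ioqpairs I/O queues plus the admin
 * queue, valid IVs are 0..max_ioqpairs. */
static bool iv_valid(uint32_t dw11, uint32_t max_ioqpairs)
{
    uint32_t iv = dw11 & 0xffff;   /* bit 16 (CD) is ignored here */
    return iv < max_ioqpairs + 1;  /* i.e. reject iv >= max_ioqpairs + 1 */
}

/* The check as originally quoted; it accepts one vector too many. */
static bool iv_valid_quoted(uint32_t dw11, uint32_t max_ioqpairs)
{
    return !((dw11 & 0xffff) > max_ioqpairs + 1);
}
```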

> > @@ -1120,6 +1146,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  
> >          break;
> >      case NVME_VOLATILE_WRITE_CACHE:
> > +        if (blk_enable_write_cache(n->conf.blk)) {
> > +            blk_flush(n->conf.blk);
> > +        }
> 
> (not your fault) but the blk_enable_write_cache function name is highly misleading,
> since it doesn't enable anything but just gets the flag if the write cache is enabled.
> It really should be called blk_get_enable_write_cache.
> 

Agreed :)

> > @@ -1804,6 +1860,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> >      id->cqes = (0x4 << 4) | 0x4;
> >      id->nn = cpu_to_le32(n->num_namespaces);
> >      id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
> > +
> Unrelated whitespace change

Fixed.

> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 



* Re: [PATCH v6 19/42] nvme: enforce valid queue creation sequence
  2020-03-25 10:43   ` Maxim Levitsky
@ 2020-03-31  5:41     ` Klaus Birkelund Jensen
  2020-03-31  9:31       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:43, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Support returning Command Sequence Error if Set Features on Number of
> > Queues is called after queues have been created.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c | 7 +++++++
> >  hw/block/nvme.h | 1 +
> >  2 files changed, 8 insertions(+)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 007f8817f101..b40d27cddc46 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -881,6 +881,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
> >      cq = g_malloc0(sizeof(*cq));
> >      nvme_init_cq(cq, n, prp1, cqid, vector, qsize + 1,
> >          NVME_CQ_FLAGS_IEN(qflags));
> > +
> > +    n->qs_created = true;
> Very minor nitpick, maybe it is worth mentioning in a comment,
> why this is only needed in CQ creation, as you explained to me.
> 

Added.
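Here is a toy model of the sequencing rule, with hypothetical names and status values. 0x0c is the generic Command Sequence Error status, and 0x100 models the command-specific Completion Queue Invalid status. The explanation given off-list is not quoted here. The reasoning in the comment is only an assumption about why flagging CQ creation is enough: an I/O SQ must name an existing I/O CQ, so a CQ is always created first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SC_SUCCESS        0x0000
#define SC_CMD_SEQ_ERROR  0x000c  /* generic: Command Sequence Error */
#define SC_INVALID_CQ     0x0100  /* command specific: Completion Queue Invalid */
#define MAX_QUEUES        4

typedef struct Ctrl {
    bool cq_exists[MAX_QUEUES];  /* indexed by CQ id; 0 is the admin CQ */
    bool qs_created;             /* set once any I/O queue has been created */
} Ctrl;

static uint16_t create_cq(Ctrl *c, uint16_t cqid)
{
    assert(cqid > 0 && cqid < MAX_QUEUES);
    c->cq_exists[cqid] = true;
    /* Any I/O SQ needs an existing I/O CQ, so the first I/O queue created
     * is always a CQ: flagging it here also covers SQ creation. */
    c->qs_created = true;
    return SC_SUCCESS;
}

static uint16_t create_sq(Ctrl *c, uint16_t cqid)
{
    assert(cqid < MAX_QUEUES);
    if (cqid == 0 || !c->cq_exists[cqid]) {
        return SC_INVALID_CQ;
    }
    return SC_SUCCESS;
}

/* Set Features (Number of Queues) is only allowed before queue creation. */
static uint16_t set_features_num_queues(Ctrl *c)
{
    return c->qs_created ? SC_CMD_SEQ_ERROR : SC_SUCCESS;
}
```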

> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 
> 



* Re: [PATCH v6 23/42] nvme: add mapping helpers
  2020-03-25 10:45   ` Maxim Levitsky
@ 2020-03-31  5:44     ` Klaus Birkelund Jensen
  2020-03-31  9:30       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:45, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Add nvme_map_addr, nvme_map_addr_cmb and nvme_addr_to_cmb helpers and
> > use them in nvme_map_prp.
> > 
> > This fixes a bug where in the case of a CMB transfer, the device would
> > map to the buffer with a wrong length.
> > 
> > Fixes: b2b2b67a00574 ("nvme: Add support for Read Data and Write Data in CMBs.")
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c       | 97 +++++++++++++++++++++++++++++++++++--------
> >  hw/block/trace-events |  1 +
> >  2 files changed, 81 insertions(+), 17 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 08267e847671..187c816eb6ad 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -153,29 +158,79 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
> >      }
> >  }
> >  
> > +static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
> > +                                  size_t len)
> > +{
> > +    if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> > +        return NVME_DATA_TRAS_ERROR;
> > +    }
> 
> I just noticed that
> in theory (not that it really matters) but addr+len refers to the byte which is already 
> not the part of the transfer.
> 

Oh. Good catch - and I think that it does matter? Can't we end up
rejecting a valid access? Anyway, I fixed it with a '- 1'.

> 
> > +
> > +    qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
> Also intersting is we can add 0 sized iovec.
> 

I added a check that rejects a zero len. This also ensures that the
'- 1' fix above never computes 'addr + 0 - 1'.
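The fix can be sketched as follows, with hypothetical names and a plain `uint64_t` standing in for QEMU's `hwaddr`. The sketch rejects `len == 0` before testing the last byte at `addr + len - 1`. It also refuses ranges whose end address wraps around, a case the discussion above does not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr;  /* stand-in for QEMU's hwaddr */

static bool addr_is_cmb(hwaddr base, uint64_t size, hwaddr addr)
{
    return addr >= base && addr < base + size;
}

/* Whether [addr, addr + len) lies entirely inside the CMB. Testing
 * addr + len would check the first byte *after* the transfer and reject
 * a transfer that ends exactly at the end of the CMB. */
static bool range_is_cmb(hwaddr base, uint64_t size, hwaddr addr, uint64_t len)
{
    if (len == 0 || addr + (len - 1) < addr) {
        return false;  /* empty transfer, or the end address wraps around */
    }
    return addr_is_cmb(base, size, addr) &&
           addr_is_cmb(base, size, addr + len - 1);
}
```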




* Re: [PATCH v6 24/42] nvme: remove redundant has_sg member
  2020-03-25 10:45   ` Maxim Levitsky
@ 2020-03-31  5:44     ` Klaus Birkelund Jensen
  2020-03-31  9:25       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:45, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Remove the has_sg member from NvmeRequest since it's redundant.
> 
> To be honest this patch also replaces the dma_acct_start with block_acct_start
> which looks right to me, and IMHO its OK to have both in the same patch,
> but that should be mentioned in the commit message
> 

I pulled it to a separate patch :)

> With this fixed,
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 




* Re: [PATCH v6 29/42] nvme: refactor request bounds checking
  2020-03-25 10:56   ` Maxim Levitsky
@ 2020-03-31  5:44     ` Klaus Birkelund Jensen
  2020-03-31  9:23       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:56, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c | 28 ++++++++++++++++++++++------
> >  1 file changed, 22 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index eecfad694bf8..ba520c76bae5 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -562,13 +577,14 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> >      uint64_t data_offset = slba << data_shift;
> >      int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
> >      enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
> > +    uint16_t status;
> >  
> >      trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
> >  
> > -    if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
> > +    status = nvme_check_bounds(n, ns, slba, nlb, req);
> > +    if (status) {
> >          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> > -        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
> > -        return NVME_LBA_RANGE | NVME_DNR;
> > +        return status;
> >      }
> >  
> >      if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
> Looks good as well. Once we get support for discard, it will
> use this too, but for now indeed only write zeroes and read/write
> need bounds checking on the IO path.
> 

I have that patch in the submission queue and the check is factored out
there ;)
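
For reference, a bounds check that cannot be fooled by a start LBA close to UINT64_MAX can be sketched as below. This is a standalone illustration, not the code from the patch: `lba_range_ok` is a hypothetical name, `nlb` is assumed to be the actual block count (the command's 0's based NLB field plus one), and `nsze` is assumed to already be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if blocks [slba, slba + nlb) fit inside
 * a namespace of nsze blocks. The first test rejects ranges whose end would
 * wrap past UINT64_MAX, which a plain "slba + nlb > nsze" comparison misses.
 */
static bool lba_range_ok(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    if (UINT64_MAX - slba < nlb) {
        return false;   /* slba + nlb would overflow */
    }
    return slba + nlb <= nsze;
}
```

The overflow test comes first so that the addition in the second comparison is always well defined.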

> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 




* Re: [PATCH v6 31/42] nvme: add check for prinfo
  2020-03-25 10:57   ` Maxim Levitsky
@ 2020-03-31  5:45     ` Klaus Birkelund Jensen
  2020-03-31  9:17       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:45 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:57, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Check the validity of the PRINFO field.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  hw/block/nvme.c       | 50 ++++++++++++++++++++++++++++++++++++-------
> >  hw/block/trace-events |  1 +
> >  include/block/nvme.h  |  1 +
> >  3 files changed, 44 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 7d5340c272c6..0d2b5b45b0c5 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -505,6 +505,17 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
> >      return NVME_SUCCESS;
> >  }
> >  
> > +static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> > +                                         uint16_t ctrl, NvmeRequest *req)
> > +{
> > +    if ((ctrl & NVME_RW_PRINFO_PRACT) && !(ns->id_ns.dps & DPS_TYPE_MASK)) {
> > +        trace_nvme_dev_err_prinfo(nvme_cid(req), ctrl);
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> 
> I refreshed my (still very limited) knowledge of the metadata
> and the protection info, and this is what I found:
> 
> I think that this is very far from complete, because we also have:
> 
> 1. PRCHK. According to the spec it is independent of PRACT.
>    When any of its bits are set,
>    together with enabled protection (the DPS field in the namespace),
>    the 8 bytes of protection info are checked (optionally using
>    the EILBRT and ELBAT/ELBATM fields in the command and a CRC of the data for the guard field).
> 
>    So this field should also be checked to be zero when protection is disabled
>    (I don't see an explicit requirement for that in the spec, but neither do I see
>    such a requirement for PRACT).
> 
> 2. The protection values to be written / checked ((E)ILBRT/(E)LBATM/(E)LBAT).
>    Same here, but also these should not be set when PRCHK is not set for reads,
>    plus some are protection type specific.
> 
> 
> The spec does mention the 'Invalid Protection Information' error code, which
> refers to invalid values in the PRINFO field.
> So I think this error code should be returned instead of 'Invalid Field'.
> 
> Another thing to optionally check is that the metadata pointer for separate metadata
> is zero as long as we don't support metadata
> (again, I don't see an explicit requirement for this in the spec, but it mentions:
> 
> "This field is valid only if the command has metadata that is not interleaved with
> the logical block data, as specified in the Format NVM command"
> 
> )
> 

I'm kinda inclined to just drop this patch. The spec actually says that
the PRACT and PRCHK fields are used only if the namespace is formatted
to use end-to-end protection information. Since we do not support that,
I don't think we even need to check it.

Any opinion on this?
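
The two readings discussed above can be sketched as follows, assuming the spec layout in which PRINFO occupies bits 13:10 of the 16-bit Control field (PRACT in bit 13, the three PRCHK bits below it) and the low three bits of DPS select the protection information type. The macro and function names are hypothetical, and mapping a rejection to a status code (Invalid Field or Invalid Protection Information) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the Control field (CDW12 31:16). */
#define PRINFO_PRACT       (1u << 13)
#define PRINFO_PRCHK_MASK  (7u << 10)   /* guard, application tag, reference tag */
#define PRINFO_MASK        (PRINFO_PRACT | PRINFO_PRCHK_MASK)
#define DPS_PI_TYPE_MASK   0x7u         /* 0 means protection information is disabled */

/* Klaus's reading: PRINFO is only interpreted when the namespace has PI enabled. */
static bool prinfo_is_interpreted(uint8_t dps)
{
    return (dps & DPS_PI_TYPE_MASK) != 0;
}

/*
 * Maxim's stricter alternative: with PI disabled, reject any PRINFO bit
 * (PRACT or PRCHK), not only PRACT. Returns true if the command is acceptable.
 */
static bool prinfo_strict_ok(uint16_t ctrl, uint8_t dps)
{
    return prinfo_is_interpreted(dps) || (ctrl & PRINFO_MASK) == 0;
}
```

Under the first reading the device simply never looks at PRINFO while DPS is zero; the second turns stray bits into an error instead.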



* Re: [PATCH v6 32/42] nvme: allow multiple aios per command
  2020-03-25 10:57   ` Maxim Levitsky
@ 2020-03-31  5:47     ` Klaus Birkelund Jensen
  2020-03-31  9:10       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:47 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:57, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > This refactors how the device issues asynchronous block backend
> > requests. The NvmeRequest now holds a queue of NvmeAIOs that are
> > associated with the command. This allows multiple aios to be issued for
> > a command. Only when all requests have been completed will the device
> > post a completion queue entry.
> > 
> > Because the device is currently guaranteed to only issue a single aio
> > request per command, the benefit is not immediately obvious. But this
> > functionality is required to support metadata, the dataset management
> > command and other features.
> > 
> > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c       | 377 +++++++++++++++++++++++++++++++-----------
> >  hw/block/nvme.h       | 129 +++++++++++++--
> >  hw/block/trace-events |   6 +
> >  3 files changed, 407 insertions(+), 105 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 0d2b5b45b0c5..817384e3b1a9 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -373,6 +374,99 @@ static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
> >      return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
> >  }
> >  
> > +static void nvme_aio_destroy(NvmeAIO *aio)
> > +{
> > +    g_free(aio);
> > +}
> > +
> > +static inline void nvme_req_register_aio(NvmeRequest *req, NvmeAIO *aio,
> I would probably call this nvme_req_add_aio,
> or nvme_add_aio_to_reg.
> Thoughts?
> Also you can leave this as is, but add a comment on top explaining this
> 

nvme_req_add_aio it is :) And comment added.

> > +                                         NvmeAIOOp opc)
> > +{
> > +    aio->opc = opc;
> > +
> > +    trace_nvme_dev_req_register_aio(nvme_cid(req), aio, blk_name(aio->blk),
> > +                                    aio->offset, aio->len,
> > +                                    nvme_aio_opc_str(aio), req);
> > +
> > +    if (req) {
> > +        QTAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
> > +    }
> > +}
> > +
> > +static void nvme_submit_aio(NvmeAIO *aio)
> OK, this name makes sense
> Also please add a comment on top.

Done.
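
The completion rule described in the commit message (post the CQE only when every aio issued for the command has finished) can be illustrated with a minimal sketch. The patch keeps the aios on a QTAILQ from QEMU's queue.h; the sketch below uses a plain singly linked list so it compiles on its own, and `Aio`, `Request`, `req_add_aio` and `aio_done` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-ins for NvmeAIO and NvmeRequest. */
typedef struct Aio {
    struct Aio *next;
} Aio;

typedef struct Request {
    Aio *pending;     /* outstanding aios issued for this command */
    int cqes_posted;  /* completion queue entries posted for this command */
} Request;

/* Counterpart of nvme_req_add_aio: track a newly issued aio on the request. */
static void req_add_aio(Request *req, Aio *aio)
{
    aio->next = req->pending;
    req->pending = aio;
}

/* Completion callback for one aio; the CQE is posted only after the last one. */
static void aio_done(Request *req, Aio *aio)
{
    Aio **pp = &req->pending;

    while (*pp != aio) {
        assert(*pp != NULL);   /* the aio must belong to this request */
        pp = &(*pp)->next;
    }
    *pp = aio->next;           /* unlink it */
    free(aio);

    if (req->pending == NULL) {
        req->cqes_posted++;    /* stands in for posting the completion entry */
    }
}
```

The aios may complete in any order; exactly one completion entry is posted, after the last one.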

> > @@ -505,9 +600,11 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
> >      return NVME_SUCCESS;
> >  }
> >  
> > -static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> > -                                         uint16_t ctrl, NvmeRequest *req)
> > +static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, uint16_t ctrl,
> > +                                         NvmeRequest *req)
> >  {
> > +    NvmeNamespace *ns = req->ns;
> > +
> This should go to the patch that added nvme_check_prinfo
> 

Probably killing that patch.

> > @@ -516,10 +613,10 @@ static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> >      return NVME_SUCCESS;
> >  }
> >  
> > -static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> > -                                         uint64_t slba, uint32_t nlb,
> > -                                         NvmeRequest *req)
> > +static inline uint16_t nvme_check_bounds(NvmeCtrl *n, uint64_t slba,
> > +                                         uint32_t nlb, NvmeRequest *req)
> >  {
> > +    NvmeNamespace *ns = req->ns;
> >      uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
> This should go to the patch that added nvme_check_bounds as well
> 

We can't really, because the NvmeRequest does not hold a reference to
the namespace as a struct member at that point. This is also an issue
with the nvme_check_prinfo function above.

> >  
> >      if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
> > @@ -530,55 +627,154 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> >      return NVME_SUCCESS;
> >  }
> >  
> > -static void nvme_rw_cb(void *opaque, int ret)
> > +static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
> > +{
> > +    NvmeNamespace *ns = req->ns;
> > +    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
> > +    uint16_t ctrl = le16_to_cpu(rw->control);
> > +    size_t len = req->nlb << nvme_ns_lbads(ns);
> > +    uint16_t status;
> > +
> > +    status = nvme_check_mdts(n, len, req);
> > +    if (status) {
> > +        return status;
> > +    }
> > +
> > +    status = nvme_check_prinfo(n, ctrl, req);
> > +    if (status) {
> > +        return status;
> > +    }
> > +
> > +    status = nvme_check_bounds(n, req->slba, req->nlb, req);
> > +    if (status) {
> > +        return status;
> > +    }
> > +
> > +    return NVME_SUCCESS;
> > +}
> 
> Nitpick: I hate to say it, but nvme_check_rw should be in a separate patch as well.
> It will also make the diff more readable (when adding a function and changing a function
> at the same time, you get a diff between two unrelated things).
> 

Done, but had to do it as a follow up patch.

> >  
> > -static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> > -    NvmeRequest *req)
> > +static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> Very small nitpick about zeros/zeroes: This should move to some refactoring patch to be honest. 
> 

Done ;)

> 
> The patch is still too large IMHO to review properly, and a few things can be split from it.
> I tried my best to review it but I might have missed something.
> 

Yeah, I know, but thanks for trying!



* Re: [PATCH v6 35/42] nvme: handle dma errors
  2020-03-25 10:58   ` Maxim Levitsky
@ 2020-03-31  5:47     ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:47 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:58, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > Handling DMA errors gracefully is required for the device to pass the
> > block/011 test ("disable PCI device while doing I/O") in the blktests
> > suite.
> > 
> > With this patch the device passes the test by retrying "critical"
> > transfers (posting of completion entries and processing of submission
> > queue entries).
> > 
> > If DMA errors occur at any other point in the execution of the command
> > (say, while mapping the PRPs), the command is aborted with a Data
> > Transfer Error status code.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c       | 45 ++++++++++++++++++++++++++++++++-----------
> >  hw/block/trace-events |  2 ++
> >  include/block/nvme.h  |  2 +-
> >  3 files changed, 37 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 15ca2417af04..49d323566393 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -164,7 +164,7 @@ static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
> >                                    size_t len)
> >  {
> >      if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> > -        return NVME_DATA_TRAS_ERROR;
> > +        return NVME_DATA_TRANSFER_ERROR;
> 
> Minor nitpick: this is also a non functional refactoring.
> I don't think that each piece of refactoring should be in a separate patch,
> so I usually group all the non-functional (aka cosmetic) refactoring in one patch, usually the first in the series.
> But I try not to leave such refactoring in the functional patches.
> 
> However, since there are not many cases like that left, I don't mind leaving this particular case as is.
> 

Noted. Keeping it here for now ;)

> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Best regards,
> 	Maxim Levitsky
> 
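
The split described in the commit message, retrying critical transfers but failing the command for other DMA errors, can be sketched like this. The `Bus` model that fails a configurable number of operations is purely hypothetical and only exists to exercise both paths; 0x04 is the generic Data Transfer Error status code, and whether DNR is set is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SC_SUCCESS             0x0000
#define SC_DATA_TRANSFER_ERROR 0x0004   /* generic status "Data Transfer Error" */

/* Hypothetical bus model: the next 'fail_count' DMA operations fail. */
typedef struct {
    int fail_count;
} Bus;

static bool bus_dma(Bus *bus)
{
    if (bus->fail_count > 0) {
        bus->fail_count--;
        return false;
    }
    return true;
}

/*
 * Critical transfer (e.g. posting a completion entry): on failure the entry
 * stays pending and the caller is expected to try again later, for example
 * from a timer. Returns true once the entry has been written.
 */
static bool try_post_cqe(Bus *bus, int *pending_cqes)
{
    if (!bus_dma(bus)) {
        return false;          /* keep it pending; retry later */
    }
    (*pending_cqes)--;
    return true;
}

/* Non-critical transfer (e.g. while mapping PRPs): fail the command instead. */
static uint16_t transfer_data(Bus *bus)
{
    return bus_dma(bus) ? SC_SUCCESS : SC_DATA_TRANSFER_ERROR;
}
```

Retrying only the queue entry transfers keeps the host and device views of the queues consistent, while a failed data transfer is reported back through an ordinary completion status.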




* Re: [PATCH v6 36/42] nvme: add support for scatter gather lists
  2020-03-25 10:58   ` Maxim Levitsky
@ 2020-03-31  5:48     ` Klaus Birkelund Jensen
  2020-03-31  8:51       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:48 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:58, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > For now, support the Data Block, Segment and Last Segment descriptor
> > types.
> > 
> > See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
> > 
> > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > Acked-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/nvme.c       | 310 +++++++++++++++++++++++++++++++++++-------
> >  hw/block/trace-events |   4 +
> >  2 files changed, 262 insertions(+), 52 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 49d323566393..b89b96990f52 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -76,7 +76,12 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> >  
> >  static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> >  {
> > -    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
> > +    hwaddr hi = addr + size;
> > +    if (hi < addr) {
> > +        return 1;
> > +    }
> > +
> > +    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
> 
> I would suggest splitting this into a separate patch as well, since it contains not just one but two bug fixes
> for this function and they are not related to sg lists.
> Or at least move this to 'nvme: refactor nvme_addr_read' and rename this patch
> to something like 'nvme: fix and refactor nvme_addr_read'
> 

I've split it into a patch.
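
Both fixes Maxim mentions (guarding against `addr + size` wrapping, and requiring the end of the range, not just the start, to be inside the CMB) come down to a range-in-window test. A sketch that avoids forming `addr + len` at all, under the assumption that the window is described by a base and a size (`range_in_window` is a hypothetical helper, not a QEMU function):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [addr, addr + len) lies entirely inside the window
 * [base, base + size). Only subtractions that cannot underflow are used,
 * so no intermediate value can wrap around.
 */
static bool range_in_window(uint64_t base, uint64_t size,
                            uint64_t addr, uint64_t len)
{
    if (addr < base) {
        return false;
    }

    uint64_t off = addr - base;     /* offset of the range inside the window */
    return off <= size && len <= size - off;
}
```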

> 
> >          memcpy(buf, nvme_addr_to_cmb(n, addr), size);
> >          return 0;
> >      }
> > @@ -328,13 +333,242 @@ unmap:
> >      return status;
> >  }
> >  
> > -static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> > -                             uint64_t prp1, uint64_t prp2, DMADirection dir,
> > +static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
> > +                                  QEMUIOVector *iov,
> > +                                  NvmeSglDescriptor *segment, uint64_t nsgld,
> > +                                  size_t *len, NvmeRequest *req)
> > +{
> > +    dma_addr_t addr, trans_len;
> > +    uint32_t blk_len;
> > +    uint16_t status;
> > +
> > +    for (int i = 0; i < nsgld; i++) {
> > +        uint8_t type = NVME_SGL_TYPE(segment[i].type);
> > +
> > +        if (type != NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> > +            switch (type) {
> > +            case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
> > +            case NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK:
> > +                return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR;
> > +            default:
> To be honest I don't like that 'default'
> I would explicitly list which descriptor types remain
> (I think Segment and Last Segment, and various reserved types).
> In fact, for the reserved types you probably also want to return NVME_SGL_DESCR_TYPE_INVALID.
> 

I "negated" the logic which I think is more readable. I still really
want to keep the default, for instance, nvme v1.4 adds a new type that
we do not support (the Transport SGL Data Block descriptor).
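
One possible arrangement that keeps a `default` case for types added by later spec revisions while still rejecting reserved types, combining both suggestions above, is sketched below. The type values follow the NVMe SGL descriptor type encoding (upper nibble of the SGL identifier byte); treating Segment and Last Segment as Invalid Number of SGL Descriptors mirrors what the patch's `default` branch does. All names are hypothetical and the status values omit DNR.

```c
#include <stdint.h>

/* SGL descriptor type (upper nibble of the SGL identifier byte). */
enum {
    SGL_TYPE_DATA_BLOCK           = 0x0,
    SGL_TYPE_BIT_BUCKET           = 0x1,
    SGL_TYPE_SEGMENT              = 0x2,
    SGL_TYPE_LAST_SEGMENT         = 0x3,
    SGL_TYPE_KEYED_DATA_BLOCK     = 0x4,
    SGL_TYPE_TRANSPORT_DATA_BLOCK = 0x5,   /* added in NVMe 1.4 */
};

/* Generic command status codes (DNR not included). */
enum {
    SC_SUCCESS                = 0x00,
    SC_INVALID_NUM_SGL_DESCRS = 0x0e,
    SC_SGL_DESCR_TYPE_INVALID = 0x11,
};

static uint8_t sgl_type(uint8_t sgl_id)
{
    return sgl_id >> 4;
}

/*
 * Status for a descriptor found where only data descriptors are allowed.
 * A Segment or Last Segment descriptor here means the segment is malformed;
 * everything else that is not a plain Data Block (unsupported, reserved or
 * newer types) falls through to "type invalid".
 */
static uint16_t check_data_descr(uint8_t sgl_id)
{
    switch (sgl_type(sgl_id)) {
    case SGL_TYPE_DATA_BLOCK:
        return SC_SUCCESS;
    case SGL_TYPE_SEGMENT:
    case SGL_TYPE_LAST_SEGMENT:
        return SC_INVALID_NUM_SGL_DESCRS;
    default:
        return SC_SGL_DESCR_TYPE_INVALID;
    }
}
```

Listing the two segment types explicitly means a type introduced by a future revision automatically lands in the "type invalid" branch rather than being misreported as a descriptor count problem.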

> Also this function really begs for a description before it,
> something like 'map an SGL section, assuming that it only contains SGL data descriptors;
> the caller has to ensure this'.
> 

Done.

> 
> > +                return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR;
> > +            }
> > +        }
> > +
> > +        if (*len == 0) {
> > +            uint16_t sgls = le16_to_cpu(n->id_ctrl.sgls);
> Nitpick: I would add a small comment here as well describing
> what this does (we reach this point if the SGL covers more than what
> was specified in the command, and the NVME_CTRL_SGLS_EXCESS_LENGTH controller
> capability indicates that we support just throwing the extra data away).
> 

Adding a comment. It's the other way around. The size as indicated by
NLB (or whatever applies, depending on the command) is the "authoritative" source
of information for the size of the payload. We will never accept an SGL
that is too short such that we lose or throw away data, but we might
accept ignoring parts of the SGL.
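
The length rule described above (the command's transfer size is authoritative, an SGL that is too short is always rejected, and leftover descriptors are tolerated only when the controller advertises support for SGLs longer than the data) can be sketched over a plain array of data descriptor lengths. The `excess_ok` flag stands in for the NVME_CTRL_SGLS_EXCESS_LENGTH bit in the Identify Controller SGLS field; `account_sgl` is hypothetical, and where the actual code performs the too-short check may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Generic command status codes (DNR not included). */
enum {
    SC_SUCCESS              = 0x00,
    SC_DATA_SGL_LEN_INVALID = 0x0f,
};

/*
 * 'len' is the transfer size derived from the command (e.g. from NLB).
 * Data descriptors are consumed until 'len' is covered; a descriptor that
 * is only partially needed is simply truncated.
 */
static uint16_t account_sgl(const uint32_t *blk_lens, size_t n, size_t len,
                            bool excess_ok)
{
    for (size_t i = 0; i < n; i++) {
        if (len == 0) {
            /* descriptors left over after the payload is fully covered */
            return excess_ok ? SC_SUCCESS : SC_DATA_SGL_LEN_INVALID;
        }

        size_t trans = len < blk_lens[i] ? len : blk_lens[i];
        len -= trans;
    }

    /* an SGL that does not cover the whole payload is never accepted */
    return len == 0 ? SC_SUCCESS : SC_DATA_SGL_LEN_INVALID;
}
```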

> > +            if (sgls & NVME_CTRL_SGLS_EXCESS_LENGTH) {
> > +                break;
> > +            }
> > +
> > +            trace_nvme_dev_err_invalid_sgl_excess_length(nvme_cid(req));
> > +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> > +        }
> > +
> > +        addr = le64_to_cpu(segment[i].addr);
> > +        blk_len = le32_to_cpu(segment[i].len);
> > +
> > +        if (!blk_len) {
> > +            continue;
> > +        }
> > +
> > +        if (UINT64_MAX - addr < blk_len) {
> > +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> > +        }
> Good!
> > +
> > +        trans_len = MIN(*len, blk_len);
> > +
> > +        status = nvme_map_addr(n, qsg, iov, addr, trans_len);
> > +        if (status) {
> > +            return status;
> > +        }
> > +
> > +        *len -= trans_len;
> > +    }
> > +
> > +    return NVME_SUCCESS;
> > +}
> > +
> > +static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> > +                             NvmeSglDescriptor sgl, size_t len,
> >                               NvmeRequest *req)
> > +{
> > +    /*
> > +     * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
> > +     * dynamically allocating a potentially large SGL. The spec allows the SGL
> > +     * to be larger than the command transfer size, so it is not bounded by
> > +     * MDTS.
> > +     */
> Now this is a very good comment!
> 
> However, I don't fully understand the note about the SGL. I assume you mean
> that the data the SGL covers still should be less than MDTS, but the actual SGL chain,
> if assembled in a really inefficient way (like 1 byte per data descriptor), might be larger.
> 

Exactly. I'll rephrase.
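
The chunked read described in the comment can be sketched as below: with 16-byte descriptors, 256 of them fill one 4 KiB page, so a fixed stack buffer suffices no matter how long the segment is. Here `memcpy` stands in for reading the segment from guest memory, byte order is ignored, and `walk_segment` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as an NVMe SGL descriptor. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint8_t  rsvd[3];
    uint8_t  type;
} SglDescriptor;

static_assert(sizeof(SglDescriptor) == 16, "SGL descriptors are 16 bytes");

enum { SEG_CHUNK_SIZE = 256 };   /* 256 * 16 bytes = one 4 KiB page */

/*
 * Walk a segment of 'nsgld' descriptors without allocating memory
 * proportional to its size: copy at most SEG_CHUNK_SIZE descriptors at a
 * time into a fixed buffer. Returns the number of chunks read and adds the
 * data lengths to *total_len.
 */
static size_t walk_segment(const SglDescriptor *seg, uint64_t nsgld,
                           uint64_t *total_len)
{
    SglDescriptor chunk[SEG_CHUNK_SIZE];
    size_t nchunks = 0;

    while (nsgld > 0) {
        uint64_t n = nsgld < SEG_CHUNK_SIZE ? nsgld : SEG_CHUNK_SIZE;

        memcpy(chunk, seg, n * sizeof(chunk[0]));   /* stands in for a guest read */
        for (uint64_t i = 0; i < n; i++) {
            *total_len += chunk[i].len;
        }
        seg += n;
        nsgld -= n;
        nchunks++;
    }
    return nchunks;
}
```

With one byte of data per descriptor, 600 descriptors describe only 600 bytes but occupy 9600 bytes of SGL, which is exactly the case where the chain outgrows the data it describes.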

> 
> > +    const int SEG_CHUNK_SIZE = 256;
> > +
> > +    NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
> > +    uint64_t nsgld;
> > +    uint32_t seg_len;
> > +    uint16_t status;
> > +    bool sgl_in_cmb = false;
> > +    hwaddr addr;
> > +    int ret;
> > +
> > +    sgld = &sgl;
> > +    addr = le64_to_cpu(sgl.addr);
> > +
> > +    trace_nvme_dev_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), req->nlb,
> > +                           len);
> > +
> > +    /*
> > +     * If the entire transfer can be described with a single data block it can
> > +     * be mapped directly.
> > +     */
> > +    if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> > +        status = nvme_map_sgl_data(n, qsg, iov, sgld, 1, &len, req);
> > +        if (status) {
> > +            goto unmap;
> > +        }
> > +
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * If the segment is located in the CMB, the submission queue of the
> > +     * request must also reside there.
> > +     */
> > +    if (nvme_addr_is_cmb(n, addr)) {
> > +        if (!nvme_addr_is_cmb(n, req->sq->dma_addr)) {
> > +            return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> > +        }
> > +
> > +        sgl_in_cmb = true;
> > +    }
> > +
> > +    for (;;) {
> > +        seg_len = le32_to_cpu(sgld->len);
> > +
> > +        if (!seg_len || seg_len & 0xf) {
> > +            return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
> > +        }
> It might be worth noting here that we are dealing with an SGL (Last) Segment descriptor,
> and its length indeed must be non-zero and a multiple of 16.
> Otherwise it is easy to confuse this with the alignment requirements on the data itself, as I did for a moment.
> 

Done.
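
The distinction asked to be documented can be shown in a few lines: the Length field of a Segment or Last Segment descriptor is the byte size of the segment it points to, so it must describe a whole, non-zero number of 16-byte descriptors. The patch tests `seg_len & 0xf`, which is equivalent to `seg_len % 16` for unsigned values; `segment_descr_count` is a hypothetical helper.

```c
#include <stdint.h>

enum { SGL_DESCR_SIZE = 16 };

/*
 * Returns the number of descriptors in the segment described by a
 * (Last) Segment descriptor's Length field, or 0 if the length is invalid
 * (zero or not a multiple of 16), which the patch reports as
 * Invalid SGL Segment Descriptor.
 */
static uint32_t segment_descr_count(uint32_t seg_len)
{
    if (seg_len == 0 || seg_len % SGL_DESCR_SIZE != 0) {
        return 0;
    }
    return seg_len / SGL_DESCR_SIZE;
}
```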

> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 
> 



* Re: [PATCH v6 38/42] nvme: support multiple namespaces
  2020-03-25 10:59   ` Maxim Levitsky
@ 2020-03-31  5:48     ` Klaus Birkelund Jensen
  2020-03-31  8:47       ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31  5:48 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 25 12:59, Maxim Levitsky wrote:
> On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > This adds support for multiple namespaces by introducing a new 'nvme-ns'
> > device model. The nvme device creates a bus named from the device name
> > ('id'). The nvme-ns devices then connect to this and register
> > themselves with the nvme device.
> > 
> > This changes how an nvme device is created. Example with two namespaces:
> > 
> >   -drive file=nvme0n1.img,if=none,id=disk1
> >   -drive file=nvme0n2.img,if=none,id=disk2
> >   -device nvme,serial=deadbeef,id=nvme0
> >   -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
> >   -device nvme-ns,drive=disk2,bus=nvme0,nsid=2
> > 
> > The drive property is kept on the nvme device to keep the change
> > backward compatible, but the property is now optional. Specifying a
> > drive for the nvme device will always create the namespace with nsid 1.
> > 
> > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Reviewed-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  hw/block/Makefile.objs |   2 +-
> >  hw/block/nvme-ns.c     | 157 +++++++++++++++++++++++++++
> >  hw/block/nvme-ns.h     |  60 +++++++++++
> >  hw/block/nvme.c        | 233 ++++++++++++++++++++++++++---------------
> >  hw/block/nvme.h        |  47 ++++-----
> >  hw/block/trace-events  |   4 +-
> >  6 files changed, 389 insertions(+), 114 deletions(-)
> >  create mode 100644 hw/block/nvme-ns.c
> >  create mode 100644 hw/block/nvme-ns.h
> > 
> > diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> > index 4b4a2b338dc4..d9141d6a4b9b 100644
> > --- a/hw/block/Makefile.objs
> > +++ b/hw/block/Makefile.objs

> > @@ -2518,9 +2561,6 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> >      id->psd[0].mp = cpu_to_le16(0x9c4);
> >      id->psd[0].enlat = cpu_to_le32(0x10);
> >      id->psd[0].exlat = cpu_to_le32(0x4);
> > -    if (blk_enable_write_cache(n->conf.blk)) {
> > -        id->vwc = 1;
> > -    }
> Shouldn't that be kept? Assuming that the user used the legacy 'drive' option,
> and it had no write cache enabled.
> 

When using the drive option we still end up calling the same code that
handles the "new style" namespaces and that code will handle the write
cache similarly.

> >  
> >      n->bar.cap = 0;
> >      NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
> > @@ -2533,25 +2573,34 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> >      n->bar.intmc = n->bar.intms = 0;
> >  }
> >  
> > -static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> > +int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> >  {
> > -    int64_t bs_size;
> > -    NvmeIdNs *id_ns = &ns->id_ns;
> > +    uint32_t nsid = nvme_nsid(ns);
> >  
> > -    bs_size = blk_getlength(n->conf.blk);
> > -    if (bs_size < 0) {
> > -        error_setg_errno(errp, -bs_size, "blk_getlength");
> > +    if (nsid > NVME_MAX_NAMESPACES) {
> > +        error_setg(errp, "invalid nsid (must be between 0 and %d)",
> > +                   NVME_MAX_NAMESPACES);
> >          return -1;
> >      }
> >  
> > -    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> > -    n->ns_size = bs_size;
> > +    if (!nsid) {
> > +        for (int i = 1; i <= n->num_namespaces; i++) {
> > +            NvmeNamespace *ns = nvme_ns(n, i);
> > +            if (!ns) {
> > +                nsid = i;
> > +                break;
> > +            }
> > +        }
> This misses an edge error case, where all the namespaces are allocated.
> Yes, it would be insane to allocate all 256 namespaces but still.
> 

Impressive catch! Fixed!

> 
> > +    } else {
> > +        if (n->namespaces[nsid - 1]) {
> > +            error_setg(errp, "nsid must be unique");
> 
> I would change that error message to something like
> "namespace id %d is already in use".
> 

Done.
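
A sketch of the namespace id assignment with the "all ids taken" case handled and the reworded duplicate-id message. `Ctrl`, `register_ns` and `MAX_NAMESPACES` are hypothetical stand-ins (the latter set to the 256 mentioned in the review), and unlike the patch, which scans up to `n->num_namespaces`, the sketch scans every slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NAMESPACES 256   /* hypothetical stand-in for NVME_MAX_NAMESPACES */

typedef struct {
    void *namespaces[MAX_NAMESPACES];   /* slot i holds nsid i + 1 */
} Ctrl;

/*
 * Assign 'ns' to a namespace id; nsid == 0 means "pick the first free one".
 * Returns the assigned nsid, or -1 with a message written to 'err'.
 */
static int register_ns(Ctrl *n, uint32_t nsid, void *ns, char *err, size_t errlen)
{
    if (nsid > MAX_NAMESPACES) {
        snprintf(err, errlen, "invalid nsid (must be between 0 and %d)",
                 MAX_NAMESPACES);
        return -1;
    }

    if (nsid == 0) {
        for (uint32_t i = 1; i <= MAX_NAMESPACES; i++) {
            if (!n->namespaces[i - 1]) {
                nsid = i;
                break;
            }
        }
        if (nsid == 0) {
            /* the edge case from the review: every id is already taken */
            snprintf(err, errlen, "no free namespace id");
            return -1;
        }
    } else if (n->namespaces[nsid - 1]) {
        snprintf(err, errlen, "namespace id %u is already in use", (unsigned)nsid);
        return -1;
    }

    n->namespaces[nsid - 1] = ns;
    return (int)nsid;
}
```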




* Re: [PATCH v6 38/42] nvme: support multiple namespaces
  2020-03-31  5:48     ` Klaus Birkelund Jensen
@ 2020-03-31  8:47       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  8:47 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:48 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:59, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > This adds support for multiple namespaces by introducing a new 'nvme-ns'
> > > device model. The nvme device creates a bus named from the device name
> > > ('id'). The nvme-ns devices then connect to this and register
> > > themselves with the nvme device.
> > > 
> > > This changes how an nvme device is created. Example with two namespaces:
> > > 
> > >   -drive file=nvme0n1.img,if=none,id=disk1
> > >   -drive file=nvme0n2.img,if=none,id=disk2
> > >   -device nvme,serial=deadbeef,id=nvme0
> > >   -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
> > >   -device nvme-ns,drive=disk2,bus=nvme0,nsid=2
> > > 
> > > The drive property is kept on the nvme device to keep the change
> > > backward compatible, but the property is now optional. Specifying a
> > > drive for the nvme device will always create the namespace with nsid 1.
> > > 
> > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > Reviewed-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/Makefile.objs |   2 +-
> > >  hw/block/nvme-ns.c     | 157 +++++++++++++++++++++++++++
> > >  hw/block/nvme-ns.h     |  60 +++++++++++
> > >  hw/block/nvme.c        | 233 ++++++++++++++++++++++++++---------------
> > >  hw/block/nvme.h        |  47 ++++-----
> > >  hw/block/trace-events  |   4 +-
> > >  6 files changed, 389 insertions(+), 114 deletions(-)
> > >  create mode 100644 hw/block/nvme-ns.c
> > >  create mode 100644 hw/block/nvme-ns.h
> > > 
> > > diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> > > index 4b4a2b338dc4..d9141d6a4b9b 100644
> > > --- a/hw/block/Makefile.objs
> > > +++ b/hw/block/Makefile.objs
> > > @@ -2518,9 +2561,6 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> > >      id->psd[0].mp = cpu_to_le16(0x9c4);
> > >      id->psd[0].enlat = cpu_to_le32(0x10);
> > >      id->psd[0].exlat = cpu_to_le32(0x4);
> > > -    if (blk_enable_write_cache(n->conf.blk)) {
> > > -        id->vwc = 1;
> > > -    }
> > 
> > Shouldn't that be kept? Assuming that the user used the legacy 'drive' option,
> > and it had no write cache enabled.
> > 
> 
> When using the drive option we still end up calling the same code that
> handles the "new style" namespaces and that code will handle the write
> cache similarly.
OK. That makes sense.

> 
> > >  
> > >      n->bar.cap = 0;
> > >      NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
> > > @@ -2533,25 +2573,34 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> > >      n->bar.intmc = n->bar.intms = 0;
> > >  }
> > >  
> > > -static int nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> > > +int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> > >  {
> > > -    int64_t bs_size;
> > > -    NvmeIdNs *id_ns = &ns->id_ns;
> > > +    uint32_t nsid = nvme_nsid(ns);
> > >  
> > > -    bs_size = blk_getlength(n->conf.blk);
> > > -    if (bs_size < 0) {
> > > -        error_setg_errno(errp, -bs_size, "blk_getlength");
> > > +    if (nsid > NVME_MAX_NAMESPACES) {
> > > +        error_setg(errp, "invalid nsid (must be between 0 and %d)",
> > > +                   NVME_MAX_NAMESPACES);
> > >          return -1;
> > >      }
> > >  
> > > -    id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> > > -    n->ns_size = bs_size;
> > > +    if (!nsid) {
> > > +        for (int i = 1; i <= n->num_namespaces; i++) {
> > > +            NvmeNamespace *ns = nvme_ns(n, i);
> > > +            if (!ns) {
> > > +                nsid = i;
> > > +                break;
> > > +            }
> > > +        }
> > 
> > This misses an edge error case, where all the namespaces are allocated.
> > Yes, it would be insane to allocate all 256 namespaces but still.
> > 
> 
> Impressive catch! Fixed!
Thanks!
> 
> > 
> > > +    } else {
> > > +        if (n->namespaces[nsid - 1]) {
> > > +            error_setg(errp, "nsid must be unique");
> > 
> > I would change that error message to something like
> > "namespace id %d is already in use".
> > 
> 
> Done.
> 

Best regards,
	Maxim Levitsky




* Re: [PATCH v6 36/42] nvme: add support for scatter gather lists
  2020-03-31  5:48     ` Klaus Birkelund Jensen
@ 2020-03-31  8:51       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  8:51 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:48 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:58, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > For now, support the Data Block, Segment and Last Segment descriptor
> > > types.
> > > 
> > > See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
> > > 
> > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/nvme.c       | 310 +++++++++++++++++++++++++++++++++++-------
> > >  hw/block/trace-events |   4 +
> > >  2 files changed, 262 insertions(+), 52 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 49d323566393..b89b96990f52 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -76,7 +76,12 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> > >  
> > >  static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> > >  {
> > > -    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
> > > +    hwaddr hi = addr + size;
> > > +    if (hi < addr) {
> > > +        return 1;
> > > +    }
> > > +
> > > +    if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
> > 
> > I would suggest splitting this into a separate patch as well, since it contains not just one but two bug fixes
> > for this function and they are not related to sg lists.
> > Or at least move this to 'nvme: refactor nvme_addr_read' and rename this patch
> > to something like 'nvme: fix and refactor nvme_addr_read'
> > 
> 
> I've split it into a patch.
> 
> > 
> > >          memcpy(buf, nvme_addr_to_cmb(n, addr), size);
> > >          return 0;
> > >      }
> > > @@ -328,13 +333,242 @@ unmap:
> > >      return status;
> > >  }
> > >  
> > > -static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> > > -                             uint64_t prp1, uint64_t prp2, DMADirection dir,
> > > +static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
> > > +                                  QEMUIOVector *iov,
> > > +                                  NvmeSglDescriptor *segment, uint64_t nsgld,
> > > +                                  size_t *len, NvmeRequest *req)
> > > +{
> > > +    dma_addr_t addr, trans_len;
> > > +    uint32_t blk_len;
> > > +    uint16_t status;
> > > +
> > > +    for (int i = 0; i < nsgld; i++) {
> > > +        uint8_t type = NVME_SGL_TYPE(segment[i].type);
> > > +
> > > +        if (type != NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> > > +            switch (type) {
> > > +            case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
> > > +            case NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK:
> > > +                return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR;
> > > +            default:
> > 
> > To be honest I don't like that 'default'
> > I would explicitly state which segment types remain
> > (I think the segment and last segment types, and the various reserved types).
> > In fact, for the reserved types you probably also want to return NVME_SGL_DESCR_TYPE_INVALID.
> > 
> 
> I "negated" the logic which I think is more readable. I still really
> want to keep the default, for instance, nvme v1.4 adds a new type that
> we do not support (the Transport SGL Data Block descriptor).
OK, I'll take a look at that in the next version of the patches.

> 
> > Also, this function really begs for a description above it,
> > something like 'map a sg list section, assuming that it only contains SGL data descriptors,
> > caller has to ensure this'.
> > 
> 
> Done.
Thanks a lot!
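A minimal sketch of the explicit-listing style discussed above, kept separate from the QEMU code: segment descriptor types are named explicitly, while a `default` still catches Bit Bucket, Keyed Data Block, reserved values and newer types such as the NVMe 1.4 Transport SGL Data Block. It assumes the descriptor type sits in bits 7:4 of the SGL identifier byte (NVMe 1.3d, Section 4.4); all names here are hypothetical, not the QEMU macros.

```c
#include <stdint.h>

/* Assumed layout: descriptor type in bits 7:4 of the SGL identifier,
 * descriptor subtype in bits 3:0. */
#define SGL_TYPE(id) (((id) >> 4) & 0xf)

enum sgl_type {
    SGL_DATA_BLOCK       = 0x0,
    SGL_BIT_BUCKET       = 0x1,
    SGL_SEGMENT          = 0x2,
    SGL_LAST_SEGMENT     = 0x3,
    SGL_KEYED_DATA_BLOCK = 0x4,
    /* 0x5 (Transport SGL Data Block) only exists as of NVMe 1.4. */
};

enum sgl_verdict {
    SGL_OK,           /* plain data block: map it */
    SGL_NOT_DATA,     /* segment descriptor where only data was expected */
    SGL_TYPE_INVALID, /* unsupported, reserved or unknown type */
};

/* Classify a descriptor inside a segment that the caller has already
 * trimmed so that it should contain data descriptors only. */
static enum sgl_verdict classify_data_descriptor(uint8_t id)
{
    switch (SGL_TYPE(id)) {
    case SGL_DATA_BLOCK:
        return SGL_OK;
    case SGL_SEGMENT:
    case SGL_LAST_SEGMENT:
        return SGL_NOT_DATA;
    default:
        /* Bit Bucket, Keyed Data Block, reserved and future types. */
        return SGL_TYPE_INVALID;
    }
}
```

Keeping the `default` means a type the device has never heard of is rejected as invalid instead of being silently treated as a segment.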
> 
> > 
> > > +                return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR;
> > > +            }
> > > +        }
> > > +
> > > +        if (*len == 0) {
> > > +            uint16_t sgls = le16_to_cpu(n->id_ctrl.sgls);
> > 
> > Nitpick: I would add a small comment here as well describing
> > what this does (we reach this point if the SG list covers more than what
> > was specified in the command, and the NVME_CTRL_SGLS_EXCESS_LENGTH controller
> > capability indicates that we support just throwing the extra data away).
> > 
> 
> Adding a comment. It's the other way around. The size as indicated by
> NLB (or whatever depending on the command) is the "authoritative" source
> of information for the size of the payload. We will never accept an SGL
> that is too short such that we lose or throw away data, but we might
> accept ignoring parts of the SGL.
Yes, that is what I meant. Thanks!
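The rule Klaus describes can be sketched on its own: the command-derived length is authoritative, a too-short SGL is always rejected, and surplus SGL length is tolerated only when the excess-length capability is set. This is a simplified model, not the QEMU function: `struct data_desc` and the result codes are hypothetical, and the actual mapping step is reduced to a byte counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified data block descriptor. */
struct data_desc {
    uint64_t addr;
    uint32_t len;
};

enum map_result { MAP_OK, MAP_SGL_TOO_SHORT, MAP_EXCESS_NOT_ALLOWED };

/* `len` is the transfer size derived from the command (e.g. from NLB). */
static enum map_result consume_sgl(const struct data_desc *d, size_t n,
                                   size_t len, bool excess_allowed,
                                   size_t *mapped)
{
    *mapped = 0;
    for (size_t i = 0; i < n; i++) {
        size_t dlen = d[i].len;

        if (dlen == 0) {
            continue;                    /* empty descriptors are skipped */
        }
        if (dlen > len) {
            /* This descriptor (or any after it) describes more than the
             * command transfers; legal only with excess length support. */
            if (!excess_allowed) {
                return MAP_EXCESS_NOT_ALLOWED;
            }
            dlen = len;                  /* ignore the surplus */
        }
        *mapped += dlen;                 /* real code would map dlen bytes */
        len -= dlen;
    }
    /* Running out of descriptors before `len` is covered is always an error. */
    return len == 0 ? MAP_OK : MAP_SGL_TOO_SHORT;
}
```

For example, two 512-byte descriptors against a 768-byte command map 768 bytes when excess length is supported, and fail otherwise.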

> 
> > > +            if (sgls & NVME_CTRL_SGLS_EXCESS_LENGTH) {
> > > +                break;
> > > +            }
> > > +
> > > +            trace_nvme_dev_err_invalid_sgl_excess_length(nvme_cid(req));
> > > +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> > > +        }
> > > +
> > > +        addr = le64_to_cpu(segment[i].addr);
> > > +        blk_len = le32_to_cpu(segment[i].len);
> > > +
> > > +        if (!blk_len) {
> > > +            continue;
> > > +        }
> > > +
> > > +        if (UINT64_MAX - addr < blk_len) {
> > > +            return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
> > > +        }
> > 
> > Good!
> > > +
> > > +        trans_len = MIN(*len, blk_len);
> > > +
> > > +        status = nvme_map_addr(n, qsg, iov, addr, trans_len);
> > > +        if (status) {
> > > +            return status;
> > > +        }
> > > +
> > > +        *len -= trans_len;
> > > +    }
> > > +
> > > +    return NVME_SUCCESS;
> > > +}
> > > +
> > > +static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> > > +                             NvmeSglDescriptor sgl, size_t len,
> > >                               NvmeRequest *req)
> > > +{
> > > +    /*
> > > +     * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
> > > +     * dynamically allocating a potentially large SGL. The spec allows the SGL
> > > +     * to be larger than the command transfer size, so it is not bounded by
> > > +     * MDTS.
> > > +     */
> > 
> > Now this is a very good comment!
> > 
> > However, I don't fully understand the note about the SGL. I assume that you mean
> > that the data the SGL covers still has to be within MDTS, but the actual SGL chain,
> > if assembled in a really inefficient way (like 1 byte per data descriptor), might be larger.
> > 
> 
> Exactly. I'll rephrase.
Thanks!
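A short worked example of why the SGL itself is not bounded by MDTS. It assumes 16-byte descriptors and the 256-descriptor (4 KiB) chunk size from the quoted comment. A 4096-byte payload described at 1 byte per data descriptor needs 4096 descriptors, which is 65536 bytes of SGL, 16 times the payload. That is why the segment is read in fixed-size chunks. The helper names are hypothetical.

```c
#include <stdint.h>

#define SGL_DESC_SIZE   16u   /* every SGL descriptor is 16 bytes */
#define SEG_CHUNK_DESCS 256u  /* 256 * 16 = 4096 bytes, one 4 KiB page */

/* Bytes of SGL needed if each data descriptor covers only `bytes_per_desc`
 * bytes of a `transfer`-byte payload (worst case: 1 byte per descriptor). */
static uint64_t sgl_bytes_needed(uint64_t transfer, uint64_t bytes_per_desc)
{
    uint64_t ndesc = (transfer + bytes_per_desc - 1) / bytes_per_desc;

    return ndesc * SGL_DESC_SIZE;
}

/* Number of fixed-size reads needed to walk a segment of `nsgld` descriptors. */
static uint64_t seg_chunks(uint64_t nsgld)
{
    return (nsgld + SEG_CHUNK_DESCS - 1) / SEG_CHUNK_DESCS;
}
```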
> 
> > 
> > > +    const int SEG_CHUNK_SIZE = 256;
> > > +
> > > +    NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
> > > +    uint64_t nsgld;
> > > +    uint32_t seg_len;
> > > +    uint16_t status;
> > > +    bool sgl_in_cmb = false;
> > > +    hwaddr addr;
> > > +    int ret;
> > > +
> > > +    sgld = &sgl;
> > > +    addr = le64_to_cpu(sgl.addr);
> > > +
> > > +    trace_nvme_dev_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), req->nlb,
> > > +                           len);
> > > +
> > > +    /*
> > > +     * If the entire transfer can be described with a single data block it can
> > > +     * be mapped directly.
> > > +     */
> > > +    if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
> > > +        status = nvme_map_sgl_data(n, qsg, iov, sgld, 1, &len, req);
> > > +        if (status) {
> > > +            goto unmap;
> > > +        }
> > > +
> > > +        goto out;
> > > +    }
> > > +
> > > +    /*
> > > +     * If the segment is located in the CMB, the submission queue of the
> > > +     * request must also reside there.
> > > +     */
> > > +    if (nvme_addr_is_cmb(n, addr)) {
> > > +        if (!nvme_addr_is_cmb(n, req->sq->dma_addr)) {
> > > +            return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> > > +        }
> > > +
> > > +        sgl_in_cmb = true;
> > > +    }
> > > +
> > > +    for (;;) {
> > > +        seg_len = le32_to_cpu(sgld->len);
> > > +
> > > +        if (!seg_len || seg_len & 0xf) {
> > > +            return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
> > > +        }
> > 
> > It might be worth noting here that we are dealing with sgl (last) segment descriptor
> > and its length indeed must be non zero and multiple of 16.
> > Otherwise, for a moment I confused this with the alignment requirements on the data itself.
> > 
> 
> Done.
Thanks as well!
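A minimal sketch of the check being discussed, written as a standalone helper (the name is hypothetical). For a Segment or Last Segment descriptor, the length field is the size of the list of 16-byte descriptors it points to. It must therefore be non-zero and a multiple of 16, which is the `seg_len & 0xf` test in the patch. It says nothing about the alignment of the data itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define SGL_DESC_SIZE 16u

/* Validate a (Last) Segment descriptor length and derive how many
 * descriptors the pointed-to segment contains. */
static bool seg_len_valid(uint32_t len, uint32_t *nsgld)
{
    if (len == 0 || (len & (SGL_DESC_SIZE - 1)) != 0) {
        return false;
    }
    *nsgld = len / SGL_DESC_SIZE;
    return true;
}
```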
> 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > 
> > 
> > 
> > 
> 

Best regards,
	Maxim Levitsky





* Re: [PATCH v6 32/42] nvme: allow multiple aios per command
  2020-03-31  5:47     ` Klaus Birkelund Jensen
@ 2020-03-31  9:10       ` Maxim Levitsky
  2020-04-08 15:02         ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:10 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:47 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:57, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > This refactors how the device issues asynchronous block backend
> > > requests. The NvmeRequest now holds a queue of NvmeAIOs that are
> > > associated with the command. This allows multiple aios to be issued for
> > > a command. Only when all requests have been completed will the device
> > > post a completion queue entry.
> > > 
> > > Because the device is currently guaranteed to only issue a single aio
> > > request per command, the benefit is not immediately obvious. But this
> > > functionality is required to support metadata, the dataset management
> > > command and other features.
> > > 
> > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/nvme.c       | 377 +++++++++++++++++++++++++++++++-----------
> > >  hw/block/nvme.h       | 129 +++++++++++++--
> > >  hw/block/trace-events |   6 +
> > >  3 files changed, 407 insertions(+), 105 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 0d2b5b45b0c5..817384e3b1a9 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -373,6 +374,99 @@ static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, QEMUSGList *qsg,
> > >      return nvme_map_prp(n, qsg, iov, prp1, prp2, len, req);
> > >  }
> > >  
> > > +static void nvme_aio_destroy(NvmeAIO *aio)
> > > +{
> > > +    g_free(aio);
> > > +}
> > > +
> > > +static inline void nvme_req_register_aio(NvmeRequest *req, NvmeAIO *aio,
> > 
> > I would call this nvme_req_add_aio,
> > or nvme_add_aio_to_req.
> > Thoughts?
> > Also you can leave this as is, but add a comment on top explaining this
> > 
> 
> nvme_req_add_aio it is :) And comment added.
Thanks a lot!
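The idea in the commit message can be shown without the QEMU types: a request tracks its outstanding aios, remembers the first failure, and posts its completion queue entry only when the last aio finishes. This is a hypothetical simplification that uses a counter instead of the QTAILQ of NvmeAIOs. It assumes, as in the patch, that all aios are added to the request before any of them is submitted. Otherwise an early completion could post the CQE too soon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NvmeRequest. */
struct req {
    unsigned outstanding;  /* aios added but not yet completed */
    uint16_t status;       /* first error seen, 0 == success */
    bool cqe_posted;       /* set when the completion entry would be posted */
};

/* Register an aio with the request (cf. nvme_req_add_aio). */
static void req_add_aio(struct req *r)
{
    r->outstanding++;
}

/* Completion callback of one aio. */
static void req_aio_done(struct req *r, uint16_t aio_status)
{
    if (aio_status && !r->status) {
        r->status = aio_status;  /* keep the first failure */
    }
    if (--r->outstanding == 0) {
        r->cqe_posted = true;    /* last aio done: post the CQE now */
    }
}
```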

> 
> > > +                                         NvmeAIOOp opc)
> > > +{
> > > +    aio->opc = opc;
> > > +
> > > +    trace_nvme_dev_req_register_aio(nvme_cid(req), aio, blk_name(aio->blk),
> > > +                                    aio->offset, aio->len,
> > > +                                    nvme_aio_opc_str(aio), req);
> > > +
> > > +    if (req) {
> > > +        QTAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
> > > +    }
> > > +}
> > > +
> > > +static void nvme_submit_aio(NvmeAIO *aio)
> > 
> > OK, this name makes sense
> > Also please add a comment on top.
> 
> Done.
Thanks!
> 
> > > @@ -505,9 +600,11 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
> > >      return NVME_SUCCESS;
> > >  }
> > >  
> > > -static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> > > -                                         uint16_t ctrl, NvmeRequest *req)
> > > +static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, uint16_t ctrl,
> > > +                                         NvmeRequest *req)
> > >  {
> > > +    NvmeNamespace *ns = req->ns;
> > > +
> > 
> > This should go to the patch that added nvme_check_prinfo
> > 
> 
> Probably killing that patch.

Yea, I also agree on that. Once we properly support metadata,
then we can add all the checks for its correctness.

> 
> > > @@ -516,10 +613,10 @@ static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> > >      return NVME_SUCCESS;
> > >  }
> > >  
> > > -static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> > > -                                         uint64_t slba, uint32_t nlb,
> > > -                                         NvmeRequest *req)
> > > +static inline uint16_t nvme_check_bounds(NvmeCtrl *n, uint64_t slba,
> > > +                                         uint32_t nlb, NvmeRequest *req)
> > >  {
> > > +    NvmeNamespace *ns = req->ns;
> > >      uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
> > 
> > This should go to the patch that added nvme_check_bounds as well
> > 
> 
> We can't really, because the NvmeRequest does not hold a reference to
> the namespace as a struct member at that point. This is also an issue
> with the nvme_check_prinfo function above.

I see it now. The changes to NvmeRequest, together with this, are a good candidate
to split out of this patch, to bring it down to a size that is easy to review.

> 
> > >  
> > >      if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
> > > @@ -530,55 +627,154 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> > >      return NVME_SUCCESS;
> > >  }
> > >  
> > > -static void nvme_rw_cb(void *opaque, int ret)
> > > +static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req)
> > > +{
> > > +    NvmeNamespace *ns = req->ns;
> > > +    NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
> > > +    uint16_t ctrl = le16_to_cpu(rw->control);
> > > +    size_t len = req->nlb << nvme_ns_lbads(ns);
> > > +    uint16_t status;
> > > +
> > > +    status = nvme_check_mdts(n, len, req);
> > > +    if (status) {
> > > +        return status;
> > > +    }
> > > +
> > > +    status = nvme_check_prinfo(n, ctrl, req);
> > > +    if (status) {
> > > +        return status;
> > > +    }
> > > +
> > > +    status = nvme_check_bounds(n, req->slba, req->nlb, req);
> > > +    if (status) {
> > > +        return status;
> > > +    }
> > > +
> > > +    return NVME_SUCCESS;
> > > +}
> > 
> > Nitpick: I hate to say it but nvme_check_rw should be in a separate patch as well.
> > It will also make the diff more readable (when adding a function and changing a function
> > at the same time, you get a diff between two unrelated things)
> > 
> 
> Done, but had to do it as a follow up patch.
I guess it won't help to do this in a followup patch since this won't simplify this
patch. I'll take a look when you publish the next version.

> 
> > >  
> > > -static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> > > -    NvmeRequest *req)
> > > +static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > 
> > Very small nitpick about zeros/zeroes: This should move to some refactoring patch to be honest. 
> > 
> 
> Done ;)
> 
> > 
> > The patch is still too large IMHO to review properly and few things can be split from it.
> > I tried my best to review it but I might have missed something.
> > 
> 
> Yeah, I know, but thanks for trying!

Thanks to you too.

Best regards,
	Maxim Levitsky

> 





* Re: [PATCH v6 31/42] nvme: add check for prinfo
  2020-03-31  5:45     ` Klaus Birkelund Jensen
@ 2020-03-31  9:17       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:17 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:45 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:57, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Check the validity of the PRINFO field.
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > ---
> > >  hw/block/nvme.c       | 50 ++++++++++++++++++++++++++++++++++++-------
> > >  hw/block/trace-events |  1 +
> > >  include/block/nvme.h  |  1 +
> > >  3 files changed, 44 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 7d5340c272c6..0d2b5b45b0c5 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -505,6 +505,17 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len,
> > >      return NVME_SUCCESS;
> > >  }
> > >  
> > > +static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> > > +                                         uint16_t ctrl, NvmeRequest *req)
> > > +{
> > > +    if ((ctrl & NVME_RW_PRINFO_PRACT) && !(ns->id_ns.dps & DPS_TYPE_MASK)) {
> > > +        trace_nvme_dev_err_prinfo(nvme_cid(req), ctrl);
> > > +        return NVME_INVALID_FIELD | NVME_DNR;
> > > +    }
> > 
> > I refreshed my (still very limited) knowledge on the metadata
> > and the protection info, and this is what I found:
> > 
> > I think that this is very far from complete, because we also have:
> > 
> > 1. PRCHECK. According to the spec it is independent of PRACT,
> >    and when any of its bits are set,
> >    together with enabled protection (the DPS field in the namespace),
> >    then the 8 bytes of protection info are checked (optionally using
> >    the EILBRT and ELBAT/ELBATM fields in the command, and the CRC of the data for the guard field).
> > 
> >    So this field should also be checked to be zero when protection is disabled
> >    (I don't see an explicit requirement for that in the spec, but neither I see
> >    such requirement for PRACT)
> > 
> > 2. The protection values to be written / checked ((E)ILBRT/(E)LBATM/(E)LBAT)
> >    Same here, but also these should not be set when PRCHECK is not set for reads,
> >    plus some are protection type specific.
> > 
> > 
> > The spec does mention the 'Invalid Protection Information' error code which
> > refers to invalid values in the PRINFO field.
> > So this error code I think should be returned instead of the 'Invalid field'
> > 
> > Another thing to optionally check is that the metadata pointer for separate metadata
> > is zero as long as we don't support metadata
> > (again I don't see an explicit requirement for this in the spec, but it mentions:
> > 
> > "This field is valid only if the command has metadata that is not interleaved with
> > the logical block data, as specified in the Format NVM command"
> > 
> > )
> > 
> 
> I'm kinda inclined to just drop this patch. The spec actually says that
> the PRACT and PRCHK fields are used only if the namespace is formatted
> to use end-to-end protection information. Since we do not support that,
> I don't think we even need to check it.
> 
> Any opinion on this?
Yep. I also think so.

In nvme-mdev I added as many checks as possible on all the reserved fields, and
on all the unused fields like these, to make it as defensive as possible and to reduce
the attack surface to a minimum.
This can be done later when all the dust settles; it's not a high priority for sure.
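Should the more defensive variant ever be revisited, it could look roughly like this sketch. When the namespace has end-to-end protection disabled, any PRACT or PRCHK bit is rejected instead of ignored. The bit positions are assumptions taken from the NVMe 1.3d Read/Write CDW12 layout: PRINFO is CDW12 bits 29:26, which is bits 13:10 of the 16-bit control field. The DPS protection type is assumed to be in bits 2:0. The helper name is hypothetical. It returns a bool so the caller can choose between Invalid Field and Invalid Protection Information.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 16-bit control field (upper half of CDW12). */
#define RW_PRINFO_PRACT (1u << 13)  /* protection information action */
#define RW_PRINFO_PRCHK (7u << 10)  /* guard, app tag, ref tag checks */
/* DPS bits 2:0 select the protection information type; 0 = disabled. */
#define DPS_TYPE_MASK   0x7u

static bool prinfo_valid(uint16_t control, uint8_t dps)
{
    bool pi_enabled = (dps & DPS_TYPE_MASK) != 0;
    uint16_t prinfo = control & (RW_PRINFO_PRACT | RW_PRINFO_PRCHK);

    /* With PI disabled, every PRINFO bit must be clear. */
    return pi_enabled || prinfo == 0;
}
```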


Best regards,
	Maxim Levitsky


> 





* Re: [PATCH v6 29/42] nvme: refactor request bounds checking
  2020-03-31  5:44     ` Klaus Birkelund Jensen
@ 2020-03-31  9:23       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:23 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:44 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:56, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > ---
> > >  hw/block/nvme.c | 28 ++++++++++++++++++++++------
> > >  1 file changed, 22 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index eecfad694bf8..ba520c76bae5 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -562,13 +577,14 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
> > >      uint64_t data_offset = slba << data_shift;
> > >      int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
> > >      enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
> > > +    uint16_t status;
> > >  
> > >      trace_nvme_dev_rw(is_write ? "write" : "read", nlb, data_size, slba);
> > >  
> > > -    if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
> > > +    status = nvme_check_bounds(n, ns, slba, nlb, req);
> > > +    if (status) {
> > >          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
> > > -        trace_nvme_dev_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
> > > -        return NVME_LBA_RANGE | NVME_DNR;
> > > +        return status;
> > >      }
> > >  
> > >      if (nvme_map(n, cmd, &req->qsg, &req->iov, data_size, req)) {
> > 
> > Looks good as well. Once we get support for discard, it will
> > use this too, but for now indeed only write zeroes and read/write
> > need bounds checking on the IO path.
> > 
> 
> I have that patch in the submission queue and the check is factored out
> there ;)
Perfect!
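For reference, the property the factored-out check has to provide is shown in this standalone sketch. The helper name is hypothetical. It assumes `nlb` is already the real block count; the NLB field in the command is 0's based, so the value is NLB + 1. The original inline `slba + nlb > nsze` test could wrap around for huge `slba`, so the sketch checks for overflow first. This mirrors `UINT64_MAX - slba < nlb` in the refactored code.

```c
#include <stdbool.h>
#include <stdint.h>

/* true if blocks [slba, slba + nlb) fit in a namespace of nsze blocks. */
static bool lba_range_ok(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    if (UINT64_MAX - slba < nlb) {
        return false;  /* slba + nlb would overflow */
    }
    return slba + nlb <= nsze;
}
```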

> 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> 
> 

Best regards,
	Maxim Levitsky




* Re: [PATCH v6 24/42] nvme: remove redundant has_sg member
  2020-03-31  5:44     ` Klaus Birkelund Jensen
@ 2020-03-31  9:25       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:25 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:44 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:45, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Remove the has_sg member from NvmeRequest since it's redundant.
> > 
> > To be honest this patch also replaces the dma_acct_start with block_acct_start
> > which looks right to me, and IMHO it's OK to have both in the same patch,
> > but that should be mentioned in the commit message
> > 
> 
> I pulled it to a separate patch :)

Cool. Thanks

> 
> > With this fixed,
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> 
> 

Best regards,
	Maxim Levitsky





* Re: [PATCH v6 23/42] nvme: add mapping helpers
  2020-03-31  5:44     ` Klaus Birkelund Jensen
@ 2020-03-31  9:30       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:30 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:44 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:45, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Add nvme_map_addr, nvme_map_addr_cmb and nvme_addr_to_cmb helpers and
> > > use them in nvme_map_prp.
> > > 
> > > This fixes a bug where in the case of a CMB transfer, the device would
> > > map to the buffer with a wrong length.
> > > 
> > > Fixes: b2b2b67a00574 ("nvme: Add support for Read Data and Write Data in CMBs.")
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > ---
> > >  hw/block/nvme.c       | 97 +++++++++++++++++++++++++++++++++++--------
> > >  hw/block/trace-events |  1 +
> > >  2 files changed, 81 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 08267e847671..187c816eb6ad 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -153,29 +158,79 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
> > >      }
> > >  }
> > >  
> > > +static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
> > > +                                  size_t len)
> > > +{
> > > +    if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> > > +        return NVME_DATA_TRAS_ERROR;
> > > +    }
> > 
> > I just noticed that, in theory (not that it really matters),
> > addr+len refers to the first byte that is no longer
> > part of the transfer.
> > 
> 
> Oh. Good catch - and I think that it does matter? Can't we end up
> rejecting a valid access? Anyway, I fixed it with a '- 1'.

Actually, thinking about it again, we can indeed reject the access if the data happens
to include the last byte of the CMB. That can absolutely happen.

When I wrote this I was thinking the other way around: that we might reject data
that is in regular RAM and 'touches' the CMB, which indeed won't happen since
RAM usually doesn't come close to MMIO ranges.

Anyway, there is no reason not to fix such issues.

> 
> > 
> > > +
> > > +    qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
> > 
> > Also interesting: we can end up adding a 0-sized iovec.
> > 
> 
> I added a check on len. This also makes sure the above '- 1' fix doesn't
> cause an 'addr + 0 - 1' to be done.
Yes that is what I was thinking, len=0 needs a special case here.
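The agreed fix (check the last byte, `addr + len - 1`, and special-case `len == 0`) could look like this standalone sketch. The `struct cmb_window` type and helper names are hypothetical. Containment is tested with a subtraction, so `base + size` is never computed. If both the first and the last byte lie inside the contiguous window, every byte in between does too. A zero length is reported as not mappable, on the assumption that the caller skips empty chunks before adding an iovec.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CMB window covering [base, base + size). */
struct cmb_window {
    uint64_t base;
    uint64_t size;
};

static bool addr_in_cmb(const struct cmb_window *c, uint64_t addr)
{
    return addr >= c->base && addr - c->base < c->size;
}

/* true if every byte of [addr, addr + len) lies inside the CMB. */
static bool range_in_cmb(const struct cmb_window *c, uint64_t addr, size_t len)
{
    if (len == 0) {
        return false;  /* nothing to map; also avoids computing addr - 1 */
    }
    if (addr + (len - 1) < addr) {
        return false;  /* range wraps past UINT64_MAX */
    }
    /* Check the last byte of the transfer, not the one just past it. */
    return addr_in_cmb(c, addr) && addr_in_cmb(c, addr + (len - 1));
}
```

With a 4 KiB CMB at 0x1000, a transfer covering the whole window is accepted. The old `addr + len` check would have rejected it, because 0x2000 lies outside the CMB.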


Best regards,
	Maxim Levitsky





* Re: [PATCH v6 19/42] nvme: enforce valid queue creation sequence
  2020-03-31  5:41     ` Klaus Birkelund Jensen
@ 2020-03-31  9:31       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:31 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:41 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:43, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Support returning Command Sequence Error if Set Features on Number of
> > > Queues is called after queues have been created.
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > ---
> > >  hw/block/nvme.c | 7 +++++++
> > >  hw/block/nvme.h | 1 +
> > >  2 files changed, 8 insertions(+)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 007f8817f101..b40d27cddc46 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -881,6 +881,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
> > >      cq = g_malloc0(sizeof(*cq));
> > >      nvme_init_cq(cq, n, prp1, cqid, vector, qsize + 1,
> > >          NVME_CQ_FLAGS_IEN(qflags));
> > > +
> > > +    n->qs_created = true;
> > 
> > Very minor nitpick: maybe it is worth mentioning in a comment
> > why this is only needed in CQ creation, as you explained to me.
> > 
> 
> Added.

Thanks a lot!
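A minimal model of the sequencing rule, with hypothetical names. The explanation given off-list is not quoted here, so the reasoning below is an assumption. An I/O submission queue can only be created against an existing I/O completion queue, so setting the flag on CQ creation already covers every case in which I/O queues exist. Command Sequence Error is taken to be generic status 0x0C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller state. */
struct ctrl {
    bool qs_created;  /* set once any I/O queue has been created */
};

enum {
    SC_SUCCESS       = 0x00,
    SC_CMD_SEQ_ERROR = 0x0c,  /* Command Sequence Error (generic status) */
};

/* Any I/O SQ needs an existing I/O CQ, so marking CQ creation suffices. */
static void create_io_cq(struct ctrl *n)
{
    n->qs_created = true;
}

/* Set Features, Number of Queues: only allowed before queues exist. */
static uint16_t set_feature_num_queues(struct ctrl *n)
{
    return n->qs_created ? SC_CMD_SEQ_ERROR : SC_SUCCESS;
}
```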
> 
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > 
> > 
> > 
> > 
> 
> 

Best regards,
	Maxim Levitsky




* Re: [PATCH v6 14/42] nvme: add missing mandatory features
  2020-03-31  5:41     ` Klaus Birkelund Jensen
@ 2020-03-31  9:39       ` Maxim Levitsky
  2020-04-08 11:28         ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:39 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:41 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:41, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Add support for returning a reasonable response to Get/Set Features of
> > > mandatory features.
> > > 
> > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/nvme.c       | 60 ++++++++++++++++++++++++++++++++++++++++++-
> > >  hw/block/trace-events |  2 ++
> > >  include/block/nvme.h  |  6 ++++-
> > >  3 files changed, 66 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index ff8975cd6667..eb9c722df968 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -1058,6 +1069,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > >          break;
> > >      case NVME_TIMESTAMP:
> > >          return nvme_get_feature_timestamp(n, cmd);
> > > +    case NVME_INTERRUPT_COALESCING:
> > > +        result = cpu_to_le32(n->features.int_coalescing);
> > > +        break;
> > > +    case NVME_INTERRUPT_VECTOR_CONF:
> > > +        if ((dw11 & 0xffff) > n->params.max_ioqpairs + 1) {
> > > +            return NVME_INVALID_FIELD | NVME_DNR;
> > > +        }
> > 
> > I still think that this should be >= since the interrupt vector is not zero based.
> > So if we have for example 3 IO queues, then we have 4 queues in total
> > which translates to irq numbers 0..3.
> > 
> 
> Yes you are right. The device will support max_ioqpairs + 1 IVs, so
> trying to access that would actually go 1 beyond the array.
> 
> Fixed.
> 
> > BTW the user of the device doesn't have to have 1:1 mapping between qid and msi interrupt index,
> > in fact when MSI is not used, all the queues will map to the same vector, which will be interrupt 0
> > from point of view of the device IMHO.
> > So it kind of makes sense IMHO to have num_irqs or something, even if it technically equals the number of queues.
> > 
> 
> Yeah, but the device will still *support* the N IVs, so they can still
> be configured even though they will not be used. So I don't think we
> need to introduce an additional parameter?

Yes and no.
I wasn't thinking of adding a new parameter for the number of supported interrupt vectors,
but just of having an internal variable to represent it, so that in the future we could support
the case where these are not equal.

Also from point of view of validating the users of this virtual nvme drive, I think it kind
of makes sense to allow having less supported IRQ vectors than IO queues, so to check
how userspace copes with it. It is valid after all to have same interrupt vector shared between
multiple queues.

In fact, in theory (though that would greatly complicate the implementation), we should even support
the case where the number of submission queues is not equal to the number of completion queues. Yes, nobody does that in real hardware,
and at least the Linux nvme driver hard-assumes a 1:1 SQ/CQ mapping, but still.

My nvme-mdev doesn't make this assumption (nor any assumptions about interrupt vector counts)
and allows the user to have any SQ/CQ mapping as far as the spec allows
(but it does hardcode the maximum number of SQs/CQs supported).

BTW, I haven't looked at that, but we should check that the virtual nvme drive can cope with using legacy
interrupts (that is, with MSI disabled); nvme-mdev does support this and was tested with it.


> 
> > > @@ -1120,6 +1146,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > >  
> > >          break;
> > >      case NVME_VOLATILE_WRITE_CACHE:
> > > +        if (blk_enable_write_cache(n->conf.blk)) {
> > > +            blk_flush(n->conf.blk);
> > > +        }
> > 
> > (not your fault) but the blk_enable_write_cache function name is highly misleading,
> > since it doesn't enable anything but just gets the flag if the write cache is enabled.
> > It really should be called blk_get_enable_write_cache.
> > 
> 
> Agreed :)
> 
> > > @@ -1804,6 +1860,7 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> > >      id->cqes = (0x4 << 4) | 0x4;
> > >      id->nn = cpu_to_le32(n->num_namespaces);
> > >      id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS | NVME_ONCS_TIMESTAMP);
> > > +
> > 
> > Unrelated whitespace change
> 
> Fixed.
> 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > 
> > 
> > 

Best regards,
	Maxim Levitsky

> 
> 





* Re: [PATCH v6 12/42] nvme: add support for the get log page command
  2020-03-31  5:41     ` Klaus Birkelund Jensen
@ 2020-03-31  9:45       ` Maxim Levitsky
  2020-03-31 12:49         ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:45 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:41 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:40, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Add support for the Get Log Page command and basic implementations of
> > > the mandatory Error Information, SMART / Health Information and Firmware
> > > Slot Information log pages.
> > > 
> > > In violation of the specification, the SMART / Health Information log
> > > page does not persist information over the lifetime of the controller
> > > because the device has no place to store such persistent state.
> > > 
> > > Note that the LPA field in the Identify Controller data structure
> > > intentionally has bit 0 cleared because there is no namespace specific
> > > information in the SMART / Health information log page.
> > > 
> > > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > > Section 5.10 ("Get Log Page command").
> > > 
> > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/nvme.c       | 138 +++++++++++++++++++++++++++++++++++++++++-
> > >  hw/block/nvme.h       |  10 +++
> > >  hw/block/trace-events |   2 +
> > >  3 files changed, 149 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 64c42101df5c..83ff3fbfb463 100644
> > > 
> > > +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> > > +                                uint64_t off, NvmeRequest *req)
> > > +{
> > > +    uint32_t trans_len;
> > > +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> > > +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> > > +    uint8_t errlog[64];
> > 
> > I would replace this with sizeof(NvmeErrorLogEntry)
> > (and add NvmeErrorLogEntry to nvme.h), for the sake of consistency
> > and in case we end up reporting errors to the log in the future.
> > 
> 
> NvmeErrorLog is already in nvme.h; fixed to actually use it.
True! To be honest, I would rename it to NvmeErrorLogEntry
(in the patch that added many NVMe spec changes), but I don't mind
keeping it as is.
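The sketch below shows, for illustration, the offset and length clamping a Get Log Page handler such as nvme_error_info performs. The buffer size comes from a struct rather than a literal 64. The struct is a hypothetical mirror of the 64-byte Error Information log entry, and its field names are assumptions rather than copies from nvme.h. `log_page_copy` is a hypothetical helper that copies into a plain host buffer instead of DMA-ing through PRPs. Unlike the quoted hunk, it starts copying at `off`. The quoted code reads from the start of `errlog`, which is harmless there only because the buffer is all zeros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a 64-byte Error Information log entry. */
typedef struct ErrorLogEntry {
    uint64_t error_count;
    uint16_t sqid;
    uint16_t cid;
    uint16_t status_field;
    uint16_t param_error_location;
    uint64_t lba;
    uint32_t nsid;
    uint8_t  vs;
    uint8_t  rsvd29[3];
    uint64_t cs;
    uint8_t  rsvd40[24];
} ErrorLogEntry;

_Static_assert(sizeof(ErrorLogEntry) == 64, "error log entry must be 64 bytes");

#define STATUS_SUCCESS       0x0
#define STATUS_INVALID_FIELD 0x2   /* Invalid Field in Command */

/*
 * Copy up to buf_len bytes of a log page, starting at byte offset off,
 * into dst. *copied receives the number of bytes actually transferred.
 */
static uint16_t log_page_copy(const void *page, size_t page_len, uint64_t off,
                              uint32_t buf_len, void *dst, size_t *copied)
{
    size_t trans_len;

    if (off > page_len) {
        return STATUS_INVALID_FIELD;   /* offset beyond the end of the page */
    }

    trans_len = page_len - off;        /* bytes left after the offset */
    if (trans_len > buf_len) {
        trans_len = buf_len;           /* never overrun the host buffer */
    }

    memcpy(dst, (const uint8_t *)page + off, trans_len);
    *copied = trans_len;
    return STATUS_SUCCESS;
}
```

With a zeroed entry, an offset of 16 and a large host buffer transfer the remaining 48 bytes. An offset of 65 is rejected.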


> 
> > 
> > > +
> > > +    if (off > sizeof(errlog)) {
> > > +        return NVME_INVALID_FIELD | NVME_DNR;
> > > +    }
> > > +
> > > +    memset(errlog, 0x0, sizeof(errlog));
> > > +
> > > +    trans_len = MIN(sizeof(errlog) - off, buf_len);
> > > +
> > > +    return nvme_dma_read_prp(n, errlog, trans_len, prp1, prp2);
> > > +}
> > 
> > Besides this, looks good now.
> > 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> 
> 

Best regards,
	Maxim Levitsky





* Re: [PATCH v6 11/42] nvme: add temperature threshold feature
  2020-03-31  5:40     ` Klaus Birkelund Jensen
@ 2020-03-31  9:46       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:46 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:40 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:40, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > It might seem weird to implement this feature for an emulated device,
> > > but it is mandatory to support and the feature is useful for testing
> > > asynchronous event request support, which will be added in a later
> > > patch.
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/nvme.c      | 48 ++++++++++++++++++++++++++++++++++++++++++++
> > >  hw/block/nvme.h      |  2 ++
> > >  include/block/nvme.h |  8 +++++++-
> > >  3 files changed, 57 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> > > index b7c465560eea..8cda5f02c622 100644
> > > --- a/hw/block/nvme.h
> > > +++ b/hw/block/nvme.h
> > > @@ -108,6 +108,7 @@ typedef struct NvmeCtrl {
> > >      uint64_t    irq_status;
> > >      uint64_t    host_timestamp;                 /* Timestamp sent by the host */
> > >      uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
> > > +    uint16_t    temperature;
> > 
> > You forgot to move this too.
> > 
> 
> Fixed!

Thanks.

> > 
> > With 'temperature' field removed from the header:
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> 
> 

Best regards,
	Maxim Levitsky




* Re: [PATCH v6 09/42] nvme: add max_ioqpairs device parameter
  2020-03-31  5:40     ` Klaus Birkelund Jensen
@ 2020-03-31  9:48       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31  9:48 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:40 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:39, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > The num_queues device parameter has a slightly confusing meaning because
> > > it accounts for the admin queue pair, which is not really optional.
> > > Secondly, it is really the maximum number of queues allowed.
> > > 
> > > Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
> > > but keep num_queues for compatibility.
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > ---
> > >  hw/block/nvme.c | 45 ++++++++++++++++++++++++++-------------------
> > >  hw/block/nvme.h |  4 +++-
> > >  2 files changed, 29 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 7cf7cf55143e..7dfd8a1a392d 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -1332,9 +1333,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> > >      int64_t bs_size;
> > >      uint8_t *pci_conf;
> > >  
> > > -    if (!n->params.num_queues) {
> > > -        error_setg(errp, "num_queues can't be zero");
> > > -        return;
> > > +    if (n->params.num_queues) {
> > > +        warn_report("nvme: num_queues is deprecated; please use max_ioqpairs "
> > > +                    "instead");
> > > +
> > > +        n->params.max_ioqpairs = n->params.num_queues - 1;
> > > +    }
> > > +
> > > +    if (!n->params.max_ioqpairs) {
> > > +        error_setg(errp, "max_ioqpairs can't be less than 1");
> > >      }
> > 
> > This is not even a nitpick, just an idea.
> > 
> > It might be worth it to allow max_ioqpairs=0 to simulate a 'broken'
> > nvme controller. I know that the kernel has special handling for such controllers,
> > which consists of creating only the control character device (/dev/nvme*), through
> > which the user can submit commands to try to 'fix' the controller (maybe by re-uploading
> > firmware or something like that).
> > 
> > 
> 
> Not sure about the implications of this, so I'll leave that on the TODO
> :) But a controller with no I/O queues is an "Administrative Controller"
> and perfectly legal in NVMe v1.4 AFAIK.
That's what I was thinking as well. Keeping this on a TODO list is perfectly fine.
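The sketch below is a hedged, standalone version of the parameter handling in the quoted nvme_realize hunk. QEMU's warn_report()/error_setg() are replaced by fprintf() and a boolean return. The struct and function names are hypothetical. The sketch returns after reporting the error. In the quoted hunk, the `return;` that followed the old error_setg() call appears to have been dropped together with the removed lines. Allowing max_ioqpairs=0 for an Administrative Controller, as discussed above, would amount to removing the second check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the device parameters touched by this patch. */
typedef struct NvmeParamsSketch {
    uint32_t num_queues;    /* deprecated: also counts the admin queue pair */
    uint32_t max_ioqpairs;  /* I/O queue pairs only */
} NvmeParamsSketch;

/*
 * Translate the deprecated parameter and validate the result.
 * Returns false (after printing an error) if the configuration is invalid.
 */
static bool nvme_params_resolve(NvmeParamsSketch *p)
{
    if (p->num_queues) {
        fprintf(stderr, "warning: num_queues is deprecated; "
                        "please use max_ioqpairs instead\n");
        /* num_queues includes the admin queue pair, so subtract it */
        p->max_ioqpairs = p->num_queues - 1;
    }

    if (!p->max_ioqpairs) {
        fprintf(stderr, "error: max_ioqpairs can't be less than 1\n");
        return false;   /* stop here instead of continuing realization */
    }

    return true;
}
```

For example, num_queues=65 resolves to max_ioqpairs=64, while num_queues=1 leaves no I/O queue pairs and is rejected.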

> 
> > >  
> > >      if (!n->conf.blk) {
> > > @@ -1365,19 +1372,19 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> > >      pcie_endpoint_cap_init(pci_dev, 0x80);
> > >  
> > >      n->num_namespaces = 1;
> > > -    n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
> > > +    n->reg_size = pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4);
> > 
> > I hate to say it, but it looks like this thing (which I mentioned to you in V5)
> > was a pre-existing bug, which is indeed fixed now.
> > In theory such fixes should go to separate patches, but in this case I guess it would
> > be too much to ask for.
> > Maybe mention this in the commit message instead, so that this fix doesn't stay hidden?
> > 
> > 
> 
> I'm convinced now. I have added a preparatory bugfix patch before this
> patch.
Thanks a lot!
Sorry for not noticing it before.
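Here is the arithmetic made concrete. With the doorbell stride at its minimum of 4 bytes (CAP.DSTRD = 0), queue pair y has its SQ tail doorbell at 0x1000 + (2y) * 4 and its CQ head doorbell at 0x1000 + (2y + 1) * 4. Covering the admin pair plus max_ioqpairs I/O pairs therefore ends at 0x1008 + 2 * max_ioqpairs * 4. In this small standalone sketch, pow2ceil_u64 is a local stand-in for QEMU's pow2ceil(), and the other names are hypothetical.

```c
#include <stdint.h>

#define NVME_DB_BASE   0x1000u  /* first doorbell register (SQ0TDBL) */
#define NVME_DB_STRIDE 4u       /* bytes per doorbell, assuming CAP.DSTRD = 0 */

/* Smallest power of two >= v (v > 0); stand-in for QEMU's pow2ceil(). */
static uint64_t pow2ceil_u64(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Offset of the SQ tail doorbell for queue qid. */
static uint64_t sq_tail_db(uint32_t qid)
{
    return NVME_DB_BASE + (2ull * qid) * NVME_DB_STRIDE;
}

/* Offset of the CQ head doorbell for queue qid. */
static uint64_t cq_head_db(uint32_t qid)
{
    return NVME_DB_BASE + (2ull * qid + 1) * NVME_DB_STRIDE;
}

/*
 * Register region size: the admin pair (qid 0) plus max_ioqpairs I/O pairs,
 * i.e. 0x1000 + 2 * (max_ioqpairs + 1) * 4 == 0x1008 + 2 * max_ioqpairs * 4,
 * rounded up to a power of two.
 */
static uint64_t nvme_reg_size(uint32_t max_ioqpairs)
{
    return pow2ceil_u64(NVME_DB_BASE +
                        2ull * (max_ioqpairs + 1ull) * NVME_DB_STRIDE);
}
```

The last doorbell for qid = max_ioqpairs ends exactly at 0x1008 + 8 * max_ioqpairs. The old expression (with num_queues = max_ioqpairs + 1) evaluated to 12 bytes more than that before rounding.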

> 
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 

Best regards,
	Maxim Levitsky
> 
> 





* Re: [PATCH v6 07/42] nvme: refactor nvme_addr_read
  2020-03-31  5:39     ` Klaus Birkelund Jensen
@ 2020-03-31 10:41       ` Maxim Levitsky
  2020-03-31 12:48         ` Klaus Birkelund Jensen
  0 siblings, 1 reply; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31 10:41 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:39 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:38, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Pull the controller memory buffer check to its own function. The check
> > > will be used on its own in later patches.
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > ---
> > >  hw/block/nvme.c | 16 ++++++++++++----
> > >  1 file changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index b38d7e548a60..08a83d449de3 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -52,14 +52,22 @@
> > >  
> > >  static void nvme_process_sq(void *opaque);
> > >  
> > > +static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> > > +{
> > > +    hwaddr low = n->ctrl_mem.addr;
> > > +    hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
> > > +
> > > +    return addr >= low && addr < hi;
> > > +}
> > > +
> > >  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> > >  {
> > > -    if (n->cmbsz && addr >= n->ctrl_mem.addr &&
> > > -                addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
> > > +    if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> > >          memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
> > > -    } else {
> > > -        pci_dma_read(&n->parent_obj, addr, buf, size);
> > > +        return;
> > >      }
> > > +
> > > +    pci_dma_read(&n->parent_obj, addr, buf, size);
> > >  }
> > >  
> > >  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
> > 
> > Note that this patch still contains a bug: it removes the check against the accessed
> > size, which you fix in a later patch.
> > I prefer not to add a bug in the first place.
> > However, if you have a reason for this, I won't mind.
> > 
> 
> So yeah. The reason is that there is actually no bug at this point
> because the controller only supports PRPs. I actually thought there was
> a bug as well and reported it to qemu-security some months ago as a
> potential out-of-bounds access. I was then schooled by Keith on how PRPs
> work ;) Below is a paraphrased version of Keith's analysis.
> 
> PRPs do not cross page boundaries:
True

> 
>     trans_len = n->page_size - (prp1 % n->page_size);
> 
> The PRPs are always verified to be page aligned:
True
> 
>     if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
> 
> and the transfer length won't exceed the page size. So, since the start
> address is within the CMB, and the CMB is MB aligned and sized in MB
> granularity, we can never cross outside it
> with PRPs.
I understand now; however, the reason I am arguing about this is
that this patch actually _removes_ the size bound check.

Before the patch, it was:

n->cmbsz && addr >= n->ctrl_mem.addr &&
      addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))

> 
> I could add the check at this point (because it *is* needed once
> SGLs are introduced), but I think it would just be noise, and I would
> need to explain why the check is there but not really needed at this
> point. Instead, I'm adding a new patch before the SGL patch that explains
> this.
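Keith's argument can be checked in a few lines. The helper names below are hypothetical. page_size is assumed to be a power of two that divides the CMB's MB alignment. The first two helpers mirror the two quoted expressions, and the third is a full range check. A first PRP entry never extends past the end of its page, and later entries are page aligned and at most one page long. Any such transfer that starts inside an MB-aligned, MB-sized CMB therefore also ends inside it. An SGL data block has no such page constraint, which is why the full range check becomes necessary once SGLs are supported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes the first PRP entry can describe: up to the end of its page. */
static uint64_t prp1_trans_len(uint64_t prp1, uint64_t page_size)
{
    return page_size - (prp1 % page_size);
}

/* Subsequent PRP entries must be non-zero and page aligned. */
static bool prp_entry_valid(uint64_t prp_ent, uint64_t page_size)
{
    return prp_ent != 0 && (prp_ent & (page_size - 1)) == 0;
}

/*
 * True if [addr, addr + len) lies entirely inside [base, base + size).
 * Written without computing addr + len, so it cannot overflow.
 */
static bool range_in_region(uint64_t addr, uint64_t len,
                            uint64_t base, uint64_t size)
{
    return addr >= base && addr - base <= size && len <= size - (addr - base);
}
```

For every start address inside a 1 MiB CMB at 0x100000, the first-PRP transfer stays within the CMB. A 4 KiB transfer starting 16 bytes before the end of the CMB, as an SGL could describe, does not.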


Best regards,
	Maxim Levitsky




* Re: [PATCH v6 04/42] nvme: bump spec data structures to v1.3
  2020-03-31  5:38     ` Klaus Birkelund Jensen
@ 2020-03-31 10:43       ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31 10:43 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 07:38 +0200, Klaus Birkelund Jensen wrote:
> On Mar 25 12:37, Maxim Levitsky wrote:
> > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Add missing fields in the Identify Controller and Identify Namespace
> > > data structures to bring them in line with NVMe v1.3.
> > > 
> > > This also adds data structures and defines for SGL support which
> > > requires a couple of trivial changes to the nvme block driver as well.
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > Acked-by: Fam Zheng <fam@euphon.net>
> > > ---
> > >  block/nvme.c         |  18 ++---
> > >  hw/block/nvme.c      |  12 ++--
> > >  include/block/nvme.h | 153 ++++++++++++++++++++++++++++++++++++++-----
> > >  3 files changed, 151 insertions(+), 32 deletions(-)
> > > 
> > > diff --git a/block/nvme.c b/block/nvme.c
> > > index d41c4bda6e39..99b9bb3dac96 100644
> > > --- a/block/nvme.c
> > > +++ b/block/nvme.c
> > > @@ -589,6 +675,16 @@ enum NvmeIdCtrlOncs {
> > >  #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
> > >  #define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf)
> > >  
> > > +#define NVME_CTRL_SGLS_SUPPORTED_MASK            (0x3 <<  0)
> > > +#define NVME_CTRL_SGLS_SUPPORTED_NO_ALIGNMENT    (0x1 <<  0)
> > > +#define NVME_CTRL_SGLS_SUPPORTED_DWORD_ALIGNMENT (0x1 <<  1)
> > > +#define NVME_CTRL_SGLS_KEYED                     (0x1 <<  2)
> > > +#define NVME_CTRL_SGLS_BITBUCKET                 (0x1 << 16)
> > > +#define NVME_CTRL_SGLS_MPTR_CONTIGUOUS           (0x1 << 17)
> > > +#define NVME_CTRL_SGLS_EXCESS_LENGTH             (0x1 << 18)
> > > +#define NVME_CTRL_SGLS_MPTR_SGL                  (0x1 << 19)
> > > +#define NVME_CTRL_SGLS_ADDR_OFFSET               (0x1 << 20)
> > 
> > OK
> > > +
> > >  typedef struct NvmeFeatureVal {
> > >      uint32_t    arbitration;
> > >      uint32_t    power_mgmt;
> > > @@ -611,6 +707,10 @@ typedef struct NvmeFeatureVal {
> > >  #define NVME_INTC_THR(intc)     (intc & 0xff)
> > >  #define NVME_INTC_TIME(intc)    ((intc >> 8) & 0xff)
> > >  
> > > +#define NVME_TEMP_THSEL(temp)  ((temp >> 20) & 0x3)
> > 
> > Nitpick: If we are adding this, I would add a #define for the values as well.
> > 
> 
> Done. And used in the subsequent "nvme: add temperature threshold
> feature" patch.
Thank you!
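As an illustration of the requested value #defines, here is a sketch of encoding and decoding the Set Features CDW11 for the Temperature Threshold feature (FID 04h). THSEL 0h selects the over-temperature threshold and 1h the under-temperature threshold. TMPTH is in Kelvin. The enum names and the `temp_cdw11` helper are hypothetical, and the names actually added in the follow-up patch may differ. The macros use the same bit layout as the quoted ones, but the argument is parenthesized so that an expression argument expands as intended.

```c
#include <stdint.h>

/* Same bit layout as the quoted macros, with the argument parenthesized. */
#define TEMP_THSEL(temp)  (((temp) >> 20) & 0x3)
#define TEMP_TMPSEL(temp) (((temp) >> 16) & 0xf)
#define TEMP_TMPTH(temp)  (((temp) >>  0) & 0xffff)

/* Hypothetical names for the Threshold Type Select values. */
enum {
    TEMP_THSEL_OVER  = 0x0,   /* over temperature threshold */
    TEMP_THSEL_UNDER = 0x1,   /* under temperature threshold */
};

/* Hypothetical helper: build a CDW11 value for the Temperature Threshold feature. */
static uint32_t temp_cdw11(uint32_t thsel, uint32_t tmpsel, uint32_t tmpth_kelvin)
{
    return ((thsel & 0x3) << 20) | ((tmpsel & 0xf) << 16) | (tmpth_kelvin & 0xffff);
}
```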

> 
> > > +#define NVME_TEMP_TMPSEL(temp) ((temp >> 16) & 0xf)
> > > +#define NVME_TEMP_TMPTH(temp)  ((temp >>  0) & 0xffff)
> > > +
> > >  enum NvmeFeatureIds {
> > >      NVME_ARBITRATION                = 0x1,
> > >      NVME_POWER_MANAGEMENT           = 0x2,
> > > @@ -653,18 +753,37 @@ typedef struct NvmeIdNs {
> > >      uint8_t     mc;
> > >      uint8_t     dpc;
> > >      uint8_t     dps;
> > > -
> > >      uint8_t     nmic;
> > >      uint8_t     rescap;
> > >      uint8_t     fpi;
> > >      uint8_t     dlfeat;
> > > -
> > > -    uint8_t     res34[94];
> > > +    uint16_t    nawun;
> > > +    uint16_t    nawupf;
> > > +    uint16_t    nacwu;
> > > +    uint16_t    nabsn;
> > > +    uint16_t    nabo;
> > > +    uint16_t    nabspf;
> > > +    uint16_t    noiob;
> > > +    uint8_t     nvmcap[16];
> > > +    uint8_t     rsvd64[40];
> > > +    uint8_t     nguid[16];
> > > +    uint64_t    eui64;
> > >      NvmeLBAF    lbaf[16];
> > > -    uint8_t     res192[192];
> > > +    uint8_t     rsvd192[192];
> > >      uint8_t     vs[3712];
> > >  } NvmeIdNs;
> > 
> > Also checked this against V5, looks OK now
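Checks like this can be made mechanical by pinning the offsets with static assertions. The struct below is a hypothetical, unpacked mirror of the quoted v1.3 layout. It includes the leading fields not shown in the hunk (nsze through dps), taken from the specification. The assertions compare it against the byte offsets of the NVMe 1.3 Identify Namespace data structure. QEMU's own header may use different attributes (such as packing), so treat this only as a sketch of the technique.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an LBA Format entry (4 bytes). */
typedef struct LbafSketch {
    uint16_t ms;
    uint8_t  ds;
    uint8_t  rp;
} LbafSketch;

/* Hypothetical mirror of the v1.3 Identify Namespace layout quoted above. */
typedef struct IdNsSketch {
    uint64_t   nsze;
    uint64_t   ncap;
    uint64_t   nuse;
    uint8_t    nsfeat;
    uint8_t    nlbaf;
    uint8_t    flbas;
    uint8_t    mc;
    uint8_t    dpc;
    uint8_t    dps;
    uint8_t    nmic;
    uint8_t    rescap;
    uint8_t    fpi;
    uint8_t    dlfeat;
    uint16_t   nawun;
    uint16_t   nawupf;
    uint16_t   nacwu;
    uint16_t   nabsn;
    uint16_t   nabo;
    uint16_t   nabspf;
    uint16_t   noiob;
    uint8_t    nvmcap[16];
    uint8_t    rsvd64[40];
    uint8_t    nguid[16];
    uint64_t   eui64;
    LbafSketch lbaf[16];
    uint8_t    rsvd192[192];
    uint8_t    vs[3712];
} IdNsSketch;

/* Byte offsets from the NVMe 1.3 Identify Namespace data structure. */
_Static_assert(offsetof(IdNsSketch, nawun)   == 34,   "nawun");
_Static_assert(offsetof(IdNsSketch, noiob)   == 46,   "noiob");
_Static_assert(offsetof(IdNsSketch, nvmcap)  == 48,   "nvmcap");
_Static_assert(offsetof(IdNsSketch, nguid)   == 104,  "nguid");
_Static_assert(offsetof(IdNsSketch, eui64)   == 120,  "eui64");
_Static_assert(offsetof(IdNsSketch, lbaf)    == 128,  "lbaf");
_Static_assert(offsetof(IdNsSketch, rsvd192) == 192,  "rsvd192");
_Static_assert(offsetof(IdNsSketch, vs)      == 384,  "vs");
_Static_assert(sizeof(IdNsSketch)            == 4096, "size");
```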
> > 
> > >  
> > > +typedef struct NvmeIdNsDescr {
> > > +    uint8_t nidt;
> > > +    uint8_t nidl;
> > > +    uint8_t rsvd2[2];
> > > +} NvmeIdNsDescr;
> > 
> > OK
> > 
> > 
> > 
> > > +
> > > +#define NVME_NIDT_UUID_LEN 16
> > > +
> > > +enum {
> > > +    NVME_NIDT_UUID = 0x3,
> > 
> > Very minor nitpick: I would add the others as well, just to make it clearer
> > what this is.
> > 
> 
> Done.
Thanks!
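For context on the other identifier types, here is a sketch of filling a Namespace Identification Descriptor list. Each descriptor is the 4-byte header from the quoted NvmeIdNsDescr (NIDT, NIDL, two reserved bytes) followed by NIDL bytes of identifier. The NIDT values are 1h for IEEE EUI-64 (8 bytes), 2h for NGUID (16 bytes) and 3h for UUID (16 bytes). The enum names and `ns_descr_append` are hypothetical. The sketch assumes the rest of the zero-initialized 4096-byte buffer stays zero, which is how the end of the list is recognized.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Namespace Identifier Types; names hypothetical. */
enum {
    NIDT_EUI64 = 0x1,   /* IEEE Extended Unique Identifier, 8 bytes */
    NIDT_NGUID = 0x2,   /* Namespace Globally Unique Identifier, 16 bytes */
    NIDT_UUID  = 0x3,   /* Universally Unique Identifier, 16 bytes */
};

/* Same layout as the quoted NvmeIdNsDescr header. */
typedef struct IdNsDescrSketch {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd2[2];
} IdNsDescrSketch;

/*
 * Append one descriptor (header followed by the identifier) at offset pos
 * in a zero-initialized list buffer. Returns the new offset, or 0 if the
 * descriptor does not fit.
 */
static size_t ns_descr_append(uint8_t *list, size_t list_len, size_t pos,
                              uint8_t nidt, const uint8_t *nid, uint8_t nidl)
{
    IdNsDescrSketch hdr = { .nidt = nidt, .nidl = nidl };

    if (pos + sizeof(hdr) + nidl > list_len) {
        return 0;
    }
    memcpy(list + pos, &hdr, sizeof(hdr));
    memcpy(list + pos + sizeof(hdr), nid, nidl);
    return pos + sizeof(hdr) + nidl;
}
```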
> 
> > > +};
> > >  
> > >  /*Deallocate Logical Block Features*/
> > >  #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)       ((dlfeat) & 0x10)
> > 
> > Looks very good.
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> 
> 

Best regards,
	Maxim Levitsky





* Re: [PATCH v6 07/42] nvme: refactor nvme_addr_read
  2020-03-31 10:41       ` Maxim Levitsky
@ 2020-03-31 12:48         ` Klaus Birkelund Jensen
  2020-03-31 14:46           ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31 12:48 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 31 13:41, Maxim Levitsky wrote:
> On Tue, 2020-03-31 at 07:39 +0200, Klaus Birkelund Jensen wrote:
> > On Mar 25 12:38, Maxim Levitsky wrote:
> > > Note that this patch still contains a bug: it removes the check against the accessed
> > > size, which you fix in a later patch.
> > > I prefer not to add a bug in the first place.
> > > However, if you have a reason for this, I won't mind.
> > > 
> > 
> > So yeah. The reason is that there is actually no bug at this point
> > because the controller only supports PRPs. I actually thought there was
> > a bug as well and reported it to qemu-security some months ago as a
> > potential out-of-bounds access. I was then schooled by Keith on how PRPs
> > work ;) Below is a paraphrased version of Keith's analysis.
> > 
> > PRPs do not cross page boundaries:
> True
> 
> > 
> >     trans_len = n->page_size - (prp1 % n->page_size);
> > 
> > The PRPs are always verified to be page aligned:
> True
> > 
> >     if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
> > 
> > and the transfer length won't exceed the page size. So, since the start
> > address is within the CMB, and the CMB is MB aligned and sized in MB
> > granularity, we can never cross outside it
> > with PRPs.
> I understand now; however, the reason I am arguing about this is
> that this patch actually _removes_ the size bound check.
> 
> Before the patch, it was:
> 
> n->cmbsz && addr >= n->ctrl_mem.addr &&
>       addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))
> 

I don't think it does - the check is just moved to nvme_addr_is_cmb:

    static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
    {
        hwaddr low = n->ctrl_mem.addr;
        hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);

        return addr >= low && addr < hi;
    }

We check that `addr` is less than `hi`. Maybe the name is unfortunate...
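To make the half-open interval explicit, here is a minimal standalone sketch of the same predicate with concrete values. The struct is a hypothetical stand-in for n->ctrl_mem, whose size QEMU keeps as an Int128. Like the original, it only checks the start address, not the extent of the access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMB base/size kept in the controller state. */
typedef struct CmbRegion {
    uint64_t addr;
    uint64_t size;
} CmbRegion;

/*
 * Same test as nvme_addr_is_cmb(): addr must lie in the half-open
 * interval [low, hi), so hi itself is already outside the CMB.
 */
static bool addr_is_cmb(const CmbRegion *cmb, uint64_t addr)
{
    uint64_t low = cmb->addr;
    uint64_t hi  = cmb->addr + cmb->size;

    return addr >= low && addr < hi;
}
```

With a 1 MiB CMB at 0xfe000000, both 0xfe000000 and 0xfe0fffff are inside, while 0xfe100000 and 0xfdffffff are not.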





* Re: [PATCH v6 12/42] nvme: add support for the get log page command
  2020-03-31  9:45       ` Maxim Levitsky
@ 2020-03-31 12:49         ` Klaus Birkelund Jensen
  2020-03-31 14:47           ` Maxim Levitsky
  0 siblings, 1 reply; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-03-31 12:49 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 31 12:45, Maxim Levitsky wrote:
> On Tue, 2020-03-31 at 07:41 +0200, Klaus Birkelund Jensen wrote:
> > On Mar 25 12:40, Maxim Levitsky wrote:
> > > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > > From: Klaus Jensen <k.jensen@samsung.com>
> > > > 
> > > > Add support for the Get Log Page command and basic implementations of
> > > > the mandatory Error Information, SMART / Health Information and Firmware
> > > > Slot Information log pages.
> > > > 
> > > > In violation of the specification, the SMART / Health Information log
> > > > page does not persist information over the lifetime of the controller
> > > > because the device has no place to store such persistent state.
> > > > 
> > > > Note that the LPA field in the Identify Controller data structure
> > > > intentionally has bit 0 cleared because there is no namespace specific
> > > > information in the SMART / Health information log page.
> > > > 
> > > > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > > > Section 5.10 ("Get Log Page command").
> > > > 
> > > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > > ---
> > > >  hw/block/nvme.c       | 138 +++++++++++++++++++++++++++++++++++++++++-
> > > >  hw/block/nvme.h       |  10 +++
> > > >  hw/block/trace-events |   2 +
> > > >  3 files changed, 149 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > > index 64c42101df5c..83ff3fbfb463 100644
> > > > 
> > > > +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> > > > +                                uint64_t off, NvmeRequest *req)
> > > > +{
> > > > +    uint32_t trans_len;
> > > > +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> > > > +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> > > > +    uint8_t errlog[64];
> > > 
> > > I would replace this with sizeof(NvmeErrorLogEntry)
> > > (and add NvmeErrorLogEntry to nvme.h), for the sake of consistency
> > > and in case we end up reporting errors to the log in the future.
> > > 
> > 
> > NvmeErrorLog is already in nvme.h; fixed to actually use it.
> True! To be honest, I would rename it to NvmeErrorLogEntry
> (in the patch that added many NVMe spec changes), but I don't mind
> keeping it as is.
> 
 
It is used in the block driver (block/nvme.c) as well, and I'd rather
not involve that too much in this series.



* Re: [PATCH v6 07/42] nvme: refactor nvme_addr_read
  2020-03-31 12:48         ` Klaus Birkelund Jensen
@ 2020-03-31 14:46           ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31 14:46 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 14:48 +0200, Klaus Birkelund Jensen wrote:
> On Mar 31 13:41, Maxim Levitsky wrote:
> > On Tue, 2020-03-31 at 07:39 +0200, Klaus Birkelund Jensen wrote:
> > > On Mar 25 12:38, Maxim Levitsky wrote:
> > > > Note that this patch still contains a bug: it removes the check against the accessed
> > > > size, which you fix in a later patch.
> > > > I prefer not to add a bug in the first place.
> > > > However, if you have a reason for this, I won't mind.
> > > > 
> > > 
> > > So yeah. The reason is that there is actually no bug at this point
> > > because the controller only supports PRPs. I actually thought there was
> > > a bug as well and reported it to qemu-security some months ago as a
> > > potential out-of-bounds access. I was then schooled by Keith on how PRPs
> > > work ;) Below is a paraphrased version of Keith's analysis.
> > > 
> > > PRPs do not cross page boundaries:
> > 
> > True
> > 
> > > 
> > >     trans_len = n->page_size - (prp1 % n->page_size);
> > > 
> > > The PRPs are always verified to be page aligned:
> > 
> > True
> > > 
> > >     if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
> > > 
> > > and the transfer length won't exceed the page size. So, since the start
> > > address is within the CMB, and the CMB is MB aligned and sized in MB
> > > granularity, we can never cross outside it
> > > with PRPs.
> > 
> > I understand now; however, the reason I am arguing about this is
> > that this patch actually _removes_ the size bound check.
> > 
> > Before the patch, it was:
> > 
> > n->cmbsz && addr >= n->ctrl_mem.addr &&
> >       addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))
> > 
> 
> I don't think it does - the check is just moved to nvme_addr_is_cmb:
> 
>     static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>     {
>         hwaddr low = n->ctrl_mem.addr;
>         hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
> 
>         return addr >= low && addr < hi;
>     }
> 
> We check that `addr` is less than `hi`. Maybe the name is unfortunate...
> 
> 
Oh, I am just blind! Sorry about that.
You are 100% right.

Best regards,
	Maxim Levitsky




* Re: [PATCH v6 12/42] nvme: add support for the get log page command
  2020-03-31 12:49         ` Klaus Birkelund Jensen
@ 2020-03-31 14:47           ` Maxim Levitsky
  0 siblings, 0 replies; 121+ messages in thread
From: Maxim Levitsky @ 2020-03-31 14:47 UTC (permalink / raw)
  To: Klaus Birkelund Jensen
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Tue, 2020-03-31 at 14:49 +0200, Klaus Birkelund Jensen wrote:
> On Mar 31 12:45, Maxim Levitsky wrote:
> > On Tue, 2020-03-31 at 07:41 +0200, Klaus Birkelund Jensen wrote:
> > > On Mar 25 12:40, Maxim Levitsky wrote:
> > > > On Mon, 2020-03-16 at 07:28 -0700, Klaus Jensen wrote:
> > > > > From: Klaus Jensen <k.jensen@samsung.com>
> > > > > 
> > > > > Add support for the Get Log Page command and basic implementations of
> > > > > the mandatory Error Information, SMART / Health Information and Firmware
> > > > > Slot Information log pages.
> > > > > 
> > > > > In violation of the specification, the SMART / Health Information log
> > > > > page does not persist information over the lifetime of the controller
> > > > > because the device has no place to store such persistent state.
> > > > > 
> > > > > Note that the LPA field in the Identify Controller data structure
> > > > > intentionally has bit 0 cleared because there is no namespace specific
> > > > > information in the SMART / Health information log page.
> > > > > 
> > > > > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > > > > Section 5.10 ("Get Log Page command").
> > > > > 
> > > > > Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> > > > > Acked-by: Keith Busch <kbusch@kernel.org>
> > > > > ---
> > > > >  hw/block/nvme.c       | 138 +++++++++++++++++++++++++++++++++++++++++-
> > > > >  hw/block/nvme.h       |  10 +++
> > > > >  hw/block/trace-events |   2 +
> > > > >  3 files changed, 149 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > > > index 64c42101df5c..83ff3fbfb463 100644
> > > > > 
> > > > > +static uint16_t nvme_error_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> > > > > +                                uint64_t off, NvmeRequest *req)
> > > > > +{
> > > > > +    uint32_t trans_len;
> > > > > +    uint64_t prp1 = le64_to_cpu(cmd->dptr.prp1);
> > > > > +    uint64_t prp2 = le64_to_cpu(cmd->dptr.prp2);
> > > > > +    uint8_t errlog[64];
> > > > 
> > > > I would replace this with sizeof(NvmeErrorLogEntry)
> > > > (and add NvmeErrorLogEntry to nvme.h), for the sake of consistency
> > > > and in case we end up reporting errors to the log in the future.
> > > > 
> > > 
> > > NvmeErrorLog is already in nvme.h; fixed to actually use it.
> > 
> > True! To be honest, I would rename it to NvmeErrorLogEntry
> > (in the patch that added many NVMe spec changes), but I don't mind
> > keeping it as is.
> > 
> 
>  
> It is used in the block driver (block/nvme.c) as well, and I'd rather
> not involve that too much in this series.

All right, this can always be done later.
Best regards,
	Maxim Levitsky
> 





* Re: [PATCH v6 14/42] nvme: add missing mandatory features
  2020-03-31  9:39       ` Maxim Levitsky
@ 2020-04-08 11:28         ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-04-08 11:28 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 31 12:39, Maxim Levitsky wrote:
> On Tue, 2020-03-31 at 07:41 +0200, Klaus Birkelund Jensen wrote:
> > On Mar 25 12:41, Maxim Levitsky wrote:
> > > BTW, the user of the device doesn't have to have a 1:1 mapping between qid and MSI interrupt index;
> > > in fact, when MSI is not used, all the queues will map to the same vector, which will be interrupt 0
> > > from the point of view of the device IMHO.
> > > So it kind of makes sense IMHO to have num_irqs or something, even if it technically equals the number of queues.
> > > 
> > 
> > Yeah, but the device will still *support* the N IVs, so they can still
> > be configured even though they will not be used. So I don't think we
> > need to introduce an additional parameter?
> 
> Yes and no.
> I wasn't thinking of adding a new parameter for the number of supported interrupt vectors,
> but just of having an internal variable to represent it, so that in the future we could support
> the case where these are not equal.
> 
> Also, from the point of view of validating the users of this virtual nvme drive, I think it
> makes sense to allow fewer supported IRQ vectors than I/O queues, to check
> how userspace copes with it. It is valid, after all, to have the same interrupt vector shared between
> multiple queues.
> 

I see that this could be useful for testing, but I think we can defer
that to a later patch. Would you be okay with that for now?

> In fact, in theory (though that would complicate the implementation greatly), we should even support
> the case where the number of submission queues is not equal to the number of completion queues. Yes, nobody does this in real hardware,
> and at least the Linux nvme driver assumes a 1:1 SQ/CQ mapping, but still.
> 

It is not the hardware that decides this, and I believe that there
definitely are applications that choose to associate multiple SQs with
a single CQ. The CQ is an attribute of the SQ, and the IV of the CQ is
also specified in the create command. I believe this is already
supported.

> My nvme-mdev doesn't make this assumption (nor any assumptions about interrupt vector counts)
> and allows the user to have any SQ/CQ mapping as far as the spec allows
> (but it does hardcode the maximum number of SQs/CQs supported).
> 
> BTW, I haven't looked at that, but we should check that the virtual nvme drive can cope with legacy
> interrupts (that is, with MSI disabled); nvme-mdev does support this and was tested with it.
> 

Yes, this is definitely not very well tested.

If you insist on all of the above being implemented, then I will do it,
but I would rather defer this to later patches as this series is already
pretty large ;)
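The mapping works like this: the host picks the interrupt vector when it creates a completion queue, and it picks the completion queue when it creates a submission queue. Several SQs can therefore share one CQ, and several CQs can share one vector. Below is a hedged sketch of that bookkeeping with hypothetical names and a fixed queue limit. The admin queue pair is not modelled. Setting num_vectors = 1 stands in for a controller limited to a single vector, such as one using pin-based interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUEUES 8   /* hypothetical limit; qid 0 (admin) is not modelled */

/* Hypothetical minimal model of the queue state a controller tracks. */
typedef struct QueueModel {
    bool     cq_valid[MAX_QUEUES];
    uint16_t cq_iv[MAX_QUEUES];     /* interrupt vector chosen at CQ creation */
    bool     sq_valid[MAX_QUEUES];
    uint16_t sq_cqid[MAX_QUEUES];   /* CQ chosen at SQ creation */
    uint16_t num_vectors;           /* supported interrupt vectors */
} QueueModel;

/* Create I/O Completion Queue: the host picks the interrupt vector. */
static bool create_cq(QueueModel *m, uint16_t cqid, uint16_t iv)
{
    if (cqid == 0 || cqid >= MAX_QUEUES || m->cq_valid[cqid] ||
        iv >= m->num_vectors) {
        return false;
    }
    m->cq_valid[cqid] = true;
    m->cq_iv[cqid] = iv;
    return true;
}

/* Create I/O Submission Queue: the host picks the CQ it completes to. */
static bool create_sq(QueueModel *m, uint16_t sqid, uint16_t cqid)
{
    if (sqid == 0 || sqid >= MAX_QUEUES || m->sq_valid[sqid] ||
        cqid >= MAX_QUEUES || !m->cq_valid[cqid]) {
        return false;
    }
    m->sq_valid[sqid] = true;
    m->sq_cqid[sqid] = cqid;
    return true;
}

/* Interrupt vector signalled when a command from sqid completes. */
static uint16_t sq_vector(const QueueModel *m, uint16_t sqid)
{
    return m->cq_iv[m->sq_cqid[sqid]];
}
```

With a single vector, two SQs attached to the same CQ both end up signalling vector 0. A CQ requesting vector 1 is rejected.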



* Re: [PATCH v6 32/42] nvme: allow multiple aios per command
  2020-03-31  9:10       ` Maxim Levitsky
@ 2020-04-08 15:02         ` Klaus Birkelund Jensen
  0 siblings, 0 replies; 121+ messages in thread
From: Klaus Birkelund Jensen @ 2020-04-08 15:02 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Beata Michalska, qemu-block, qemu-devel, Max Reitz,
	Keith Busch, Javier Gonzalez

On Mar 31 12:10, Maxim Levitsky wrote:
> On Tue, 2020-03-31 at 07:47 +0200, Klaus Birkelund Jensen wrote:
> > On Mar 25 12:57, Maxim Levitsky wrote:
> > > On Mon, 2020-03-16 at 07:29 -0700, Klaus Jensen wrote:
> > > > @@ -516,10 +613,10 @@ static inline uint16_t nvme_check_prinfo(NvmeCtrl *n, NvmeNamespace *ns,
> > > >      return NVME_SUCCESS;
> > > >  }
> > > >  
> > > > -static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
> > > > -                                         uint64_t slba, uint32_t nlb,
> > > > -                                         NvmeRequest *req)
> > > > +static inline uint16_t nvme_check_bounds(NvmeCtrl *n, uint64_t slba,
> > > > +                                         uint32_t nlb, NvmeRequest *req)
> > > >  {
> > > > +    NvmeNamespace *ns = req->ns;
> > > >      uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
> > > 
> > > This should go into the patch that added nvme_check_bounds as well.
> > > 
> > 
> > We can't really, because the NvmeRequest does not hold a reference to
> > the namespace as a struct member at that point. This is also an issue
> > with the nvme_check_prinfo function above.
> 
> I see it now. The changes to NvmeRequest, together with this, are a good candidate
> to split out of this patch, to bring it down to a size that is easy to review.
> 

I'm factoring those changes and other stuff out into separate patches!
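The following sketch shows what the refactored signature buys: once the request carries its namespace, a bounds check needs only the request and the LBA range. The quoted nvme_check_bounds body is truncated in this excerpt, so this is not its exact logic. The types are hypothetical stand-ins. nlb is assumed to be the actual block count, already converted from the 0's based NLB field. The comparison is written as a subtraction so that a huge slba cannot wrap around.

```c
#include <stdint.h>

#define STATUS_SUCCESS   0x0000
#define STATUS_LBA_RANGE 0x0080   /* LBA Out of Range */

/* Hypothetical stand-ins: the request carries a pointer to its namespace. */
typedef struct NsSketch {
    uint64_t nsze;   /* namespace size in logical blocks */
} NsSketch;

typedef struct RequestSketch {
    NsSketch *ns;
} RequestSketch;

/* Reject accesses that run past the end of the namespace. */
static uint16_t check_bounds(const RequestSketch *req, uint64_t slba, uint32_t nlb)
{
    uint64_t nsze = req->ns->nsze;

    /* equivalent to slba + nlb > nsze, but without overflow */
    if (slba > nsze || nlb > nsze - slba) {
        return STATUS_LBA_RANGE;
    }
    return STATUS_SUCCESS;
}
```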






Thread overview: 121+ messages
2020-03-16 14:28 [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces Klaus Jensen
2020-03-16 14:28 ` [PATCH v6 01/42] nvme: rename trace events to nvme_dev Klaus Jensen
2020-03-25 10:36   ` Maxim Levitsky
2020-03-31  5:38     ` Klaus Birkelund Jensen
2020-03-16 14:28 ` [PATCH v6 02/42] nvme: remove superfluous breaks Klaus Jensen
2020-03-16 14:28 ` [PATCH v6 03/42] nvme: move device parameters to separate struct Klaus Jensen
2020-03-25 10:36   ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 04/42] nvme: bump spec data structures to v1.3 Klaus Jensen
2020-03-25 10:37   ` Maxim Levitsky
2020-03-31  5:38     ` Klaus Birkelund Jensen
2020-03-31 10:43       ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 05/42] nvme: use constant for identify data size Klaus Jensen
2020-03-25 10:37   ` Maxim Levitsky
2020-03-31  5:38     ` Klaus Birkelund Jensen
2020-03-16 14:28 ` [PATCH v6 06/42] nvme: add identify cns values in header Klaus Jensen
2020-03-25 10:37   ` Maxim Levitsky
2020-03-31  5:39     ` Klaus Birkelund Jensen
2020-03-16 14:28 ` [PATCH v6 07/42] nvme: refactor nvme_addr_read Klaus Jensen
2020-03-25 10:38   ` Maxim Levitsky
2020-03-31  5:39     ` Klaus Birkelund Jensen
2020-03-31 10:41       ` Maxim Levitsky
2020-03-31 12:48         ` Klaus Birkelund Jensen
2020-03-31 14:46           ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 08/42] nvme: add support for the abort command Klaus Jensen
2020-03-25 10:38   ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 09/42] nvme: add max_ioqpairs device parameter Klaus Jensen
2020-03-25 10:39   ` Maxim Levitsky
2020-03-31  5:40     ` Klaus Birkelund Jensen
2020-03-31  9:48       ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 10/42] nvme: refactor device realization Klaus Jensen
2020-03-25 10:40   ` Maxim Levitsky
2020-03-31  5:40     ` Klaus Birkelund Jensen
2020-03-16 14:28 ` [PATCH v6 11/42] nvme: add temperature threshold feature Klaus Jensen
2020-03-25 10:40   ` Maxim Levitsky
2020-03-31  5:40     ` Klaus Birkelund Jensen
2020-03-31  9:46       ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 12/42] nvme: add support for the get log page command Klaus Jensen
2020-03-25 10:40   ` Maxim Levitsky
2020-03-31  5:41     ` Klaus Birkelund Jensen
2020-03-31  9:45       ` Maxim Levitsky
2020-03-31 12:49         ` Klaus Birkelund Jensen
2020-03-31 14:47           ` Maxim Levitsky
2020-03-16 14:28 ` [PATCH v6 13/42] nvme: add support for the asynchronous event request command Klaus Jensen
2020-03-25 10:41   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 14/42] nvme: add missing mandatory features Klaus Jensen
2020-03-25 10:41   ` Maxim Levitsky
2020-03-31  5:41     ` Klaus Birkelund Jensen
2020-03-31  9:39       ` Maxim Levitsky
2020-04-08 11:28         ` Klaus Birkelund Jensen
2020-03-16 14:29 ` [PATCH v6 15/42] nvme: additional tracing Klaus Jensen
2020-03-25 10:42   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 16/42] nvme: make sure ncqr and nsqr is valid Klaus Jensen
2020-03-25 10:42   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 17/42] nvme: add log specific field to trace events Klaus Jensen
2020-03-25 10:43   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 18/42] nvme: support identify namespace descriptor list Klaus Jensen
2020-03-25 10:43   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 19/42] nvme: enforce valid queue creation sequence Klaus Jensen
2020-03-25 10:43   ` Maxim Levitsky
2020-03-31  5:41     ` Klaus Birkelund Jensen
2020-03-31  9:31       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 20/42] nvme: provide the mandatory subnqn field Klaus Jensen
2020-03-25 10:43   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 21/42] nvme: bump supported version to v1.3 Klaus Jensen
2020-03-25 10:44   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 22/42] nvme: memset preallocated requests structures Klaus Jensen
2020-03-25 10:44   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 23/42] nvme: add mapping helpers Klaus Jensen
2020-03-25 10:45   ` Maxim Levitsky
2020-03-31  5:44     ` Klaus Birkelund Jensen
2020-03-31  9:30       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 24/42] nvme: remove redundant has_sg member Klaus Jensen
2020-03-25 10:45   ` Maxim Levitsky
2020-03-31  5:44     ` Klaus Birkelund Jensen
2020-03-31  9:25       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 25/42] nvme: refactor dma read/write Klaus Jensen
2020-03-25 10:46   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 26/42] nvme: pass request along for tracing Klaus Jensen
2020-03-25 10:55   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 27/42] nvme: add request mapping helper Klaus Jensen
2020-03-25 10:56   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 28/42] nvme: verify validity of prp lists in the cmb Klaus Jensen
2020-03-25 10:56   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 29/42] nvme: refactor request bounds checking Klaus Jensen
2020-03-25 10:56   ` Maxim Levitsky
2020-03-31  5:44     ` Klaus Birkelund Jensen
2020-03-31  9:23       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 30/42] nvme: add check for mdts Klaus Jensen
2020-03-25 10:57   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 31/42] nvme: add check for prinfo Klaus Jensen
2020-03-25 10:57   ` Maxim Levitsky
2020-03-31  5:45     ` Klaus Birkelund Jensen
2020-03-31  9:17       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 32/42] nvme: allow multiple aios per command Klaus Jensen
2020-03-25 10:57   ` Maxim Levitsky
2020-03-31  5:47     ` Klaus Birkelund Jensen
2020-03-31  9:10       ` Maxim Levitsky
2020-04-08 15:02         ` Klaus Birkelund Jensen
2020-03-16 14:29 ` [PATCH v6 33/42] nvme: use preallocated qsg/iov in nvme_dma_prp Klaus Jensen
2020-03-25 10:58   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 34/42] pci: pass along the return value of dma_memory_rw Klaus Jensen
2020-03-16 14:29 ` [PATCH v6 35/42] nvme: handle dma errors Klaus Jensen
2020-03-25 10:58   ` Maxim Levitsky
2020-03-31  5:47     ` Klaus Birkelund Jensen
2020-03-16 14:29 ` [PATCH v6 36/42] nvme: add support for scatter gather lists Klaus Jensen
2020-03-25 10:58   ` Maxim Levitsky
2020-03-31  5:48     ` Klaus Birkelund Jensen
2020-03-31  8:51       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 37/42] nvme: refactor identify active namespace id list Klaus Jensen
2020-03-25 10:58   ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 38/42] nvme: support multiple namespaces Klaus Jensen
2020-03-25 10:59   ` Maxim Levitsky
2020-03-31  5:48     ` Klaus Birkelund Jensen
2020-03-31  8:47       ` Maxim Levitsky
2020-03-16 14:29 ` [PATCH v6 39/42] pci: allocate pci id for nvme Klaus Jensen
2020-03-16 14:29 ` [PATCH v6 40/42] nvme: change controller pci id Klaus Jensen
2020-03-16 14:29 ` [PATCH v6 41/42] nvme: remove redundant NvmeCmd pointer parameter Klaus Jensen
2020-03-16 14:29 ` [PATCH v6 42/42] nvme: make lba data size configurable Klaus Jensen
2020-03-25 10:59   ` Maxim Levitsky
2020-03-16 19:30 ` [PATCH v6 00/42] nvme: support NVMe v1.3d, SGLs and multiple namespaces no-reply
2020-03-25 10:35 ` Maxim Levitsky
