* [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
@ 2022-06-16 12:34 Jinhao Fan
  2022-06-16 12:34 ` [PATCH v3 1/2] hw/nvme: Implement " Jinhao Fan
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Jinhao Fan @ 2022-06-16 12:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: its, kbusch, Jinhao Fan

This patch adds shadow doorbell buffer support from NVMe 1.3 to the QEMU
NVMe device. The Doorbell Buffer Config admin command is implemented so that
the guest can enable the shadow doorbell buffer. When this feature is enabled,
each SQ/CQ is associated with two buffers, i.e., a Shadow Doorbell buffer and
an EventIdx buffer. According to the spec, each queue's doorbell register
is only updated when the Shadow Doorbell buffer value changes from being
less than or equal to the value of the corresponding EventIdx buffer
entry to being greater than that value. Therefore, the number of MMIOs
on the doorbell registers is greatly reduced.
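
The host-side check implied by that rule can be sketched as follows (this is
only an illustration of the spec wording, using wraparound-safe 16-bit
arithmetic; the names are made up and not taken from any particular driver):

    /* Ring the doorbell only if the shadow value moved from <= EventIdx
     * to > EventIdx (modulo 2^16) since the last value written. */
    static inline bool dbbuf_need_mmio(uint16_t event_idx, uint16_t new_idx,
                                       uint16_t old_idx)
    {
        return (uint16_t)(new_idx - event_idx - 1) <
               (uint16_t)(new_idx - old_idx);
    }

In the common case the device keeps EventIdx close to its consumer position,
so the host can skip the doorbell write entirely.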

This patch is adapted from Huaicheng Li's patch[1] in 2018.

[1] https://patchwork.kernel.org/project/qemu-devel/patch/20180305194906.GA3630@gmail.com/

IOPS comparison with FIO:

iodepth    1      2      4      8
  QEMU   25.1k  25.9k  24.5k  24.0k
 +dbbuf  29.1k  60.1k  99.8k  82.5k

MMIOs per IO, measured by perf-kvm:

iodepth    1      2      4      8
  QEMU   2.01   1.99   1.99   1.99
 +dbbuf  1.00   0.52   0.27   0.46

The tests were done on Ubuntu 22.04 (kernel 5.15.0-33) on an Intel(R)
Xeon(R) Gold 6248R CPU @ 3.00GHz.
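
The exact perf-kvm invocation is not recorded here; one way to obtain
host-side MMIO-exit counts of this kind (option spelling may vary between
perf versions) is:

    perf kvm stat record -p $(pidof qemu-system-x86_64) sleep 30
    perf kvm stat report --event=mmio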

QEMU setup:

bin/x86_64-softmmu/qemu-system-x86_64 \
    -name "nvme-test" \
    -machine accel=kvm \
    -cpu host \
    -smp 4 \
    -m 8G \
    -daemonize \
    -device virtio-scsi-pci,id=scsi0 \
    -device scsi-hd,drive=hd0 \
    -drive file=$OSIMGF,if=none,aio=native,cache=none,format=qcow2,id=hd0,snapshot=on \
    -drive "id=nvm,if=none,file=null-co://,file.read-zeroes=on,format=raw" \
    -device nvme,serial=deadbeef,drive=nvm \
    -net user,hostfwd=tcp::8080-:22 \
    -net nic,model=virtio

FIO configuration:

[global]
ioengine=libaio
filename=/dev/nvme0n1
thread=1
group_reporting=1
direct=1
verify=0
time_based=1
ramp_time=0
runtime=30
;size=1G
;iodepth=1
rw=randread
bs=4k

[test]
numjobs=1

Changes since v2:
  - Do not ignore admin queue updates in nvme_process_db and nvme_post_cqes
  - Calculate db_addr and ei_addr in hard-coded way

Changes since v1:
  - Add compatibility with hosts that do not use admin queue shadow doorbell

Jinhao Fan (2):
  hw/nvme: Implement shadow doorbell buffer support
  hw/nvme: Add trace events for shadow doorbell buffer

 hw/nvme/ctrl.c       | 118 ++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h       |   8 +++
 hw/nvme/trace-events |   5 ++
 include/block/nvme.h |   2 +
 4 files changed, 132 insertions(+), 1 deletion(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-06-16 12:34 [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Jinhao Fan
@ 2022-06-16 12:34 ` Jinhao Fan
  2022-06-27 19:17   ` Keith Busch
  2022-12-07 17:49   ` Guenter Roeck
  2022-06-16 12:34 ` [PATCH v3 2/2] hw/nvme: Add trace events for shadow doorbell buffer Jinhao Fan
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Jinhao Fan @ 2022-06-16 12:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: its, kbusch, Jinhao Fan

Implement the Doorbell Buffer Config command (Section 5.7 in NVMe Spec 1.3)
and the Shadow Doorbell buffer & EventIdx buffer handling logic (Section 7.13
in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
command, the nvme_dbbuf_config function tries to associate each existing
SQ and CQ with its Shadow Doorbell buffer and EventIdx buffer address.
Queues created after the Doorbell Buffer Config command will have the
doorbell buffers associated with them when they are initialized.

In nvme_process_sq and nvme_post_cqes, proactively check for Shadow
Doorbell buffer changes instead of waiting for doorbell register changes.
This reduces the number of MMIOs.

In nvme_process_db(), update the shadow doorbell buffer value with
the doorbell register value if it is the admin queue. This is a hack,
since hosts such as the Linux NVMe driver and SPDK do not use the shadow
doorbell buffer for the admin queue. Copying the doorbell register
value to the shadow doorbell buffer allows us to support these hosts
as well as spec-compliant hosts that use the shadow doorbell buffer for
the admin queue.

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
---
 hw/nvme/ctrl.c       | 113 ++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h       |   8 +++
 include/block/nvme.h |   2 +
 3 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 03760ddeae..f3aaff3e8d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -223,6 +223,7 @@ static const uint32_t nvme_cse_acs[256] = {
     [NVME_ADM_CMD_GET_FEATURES]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_ASYNC_EV_REQ]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_NS_ATTACHMENT]    = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
+    [NVME_ADM_CMD_DBBUF_CONFIG]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_FORMAT_NVM]       = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
@@ -1304,6 +1305,12 @@ static inline void nvme_blk_write(BlockBackend *blk, int64_t offset,
     }
 }
 
+static void nvme_update_cq_head(NvmeCQueue *cq)
+{
+    pci_dma_read(&cq->ctrl->parent_obj, cq->db_addr, &cq->head,
+            sizeof(cq->head));
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -1316,6 +1323,10 @@ static void nvme_post_cqes(void *opaque)
         NvmeSQueue *sq;
         hwaddr addr;
 
+        if (n->dbbuf_enabled) {
+            nvme_update_cq_head(cq);
+        }
+
         if (nvme_cq_full(cq)) {
             break;
         }
@@ -4256,6 +4267,11 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
     }
     sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
 
+    if (n->dbbuf_enabled) {
+        sq->db_addr = n->dbbuf_dbs + (sqid << 3);
+        sq->ei_addr = n->dbbuf_eis + (sqid << 3);
+    }
+
     assert(n->cq[cqid]);
     cq = n->cq[cqid];
     QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry);
@@ -4615,6 +4631,10 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     cq->head = cq->tail = 0;
     QTAILQ_INIT(&cq->req_list);
     QTAILQ_INIT(&cq->sq_list);
+    if (n->dbbuf_enabled) {
+        cq->db_addr = n->dbbuf_dbs + (cqid << 3) + (1 << 2);
+        cq->ei_addr = n->dbbuf_eis + (cqid << 3) + (1 << 2);
+    }
     n->cq[cqid] = cq;
     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
 }
@@ -5767,6 +5787,50 @@ out:
     return status;
 }
 
+static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req)
+{
+    uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1);
+    uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2);
+    int i;
+
+    /* Address should be page aligned */
+    if (dbs_addr & (n->page_size - 1) || eis_addr & (n->page_size - 1)) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    /* Save shadow buffer base addr for use during queue creation */
+    n->dbbuf_dbs = dbs_addr;
+    n->dbbuf_eis = eis_addr;
+    n->dbbuf_enabled = true;
+
+    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeSQueue *sq = n->sq[i];
+        NvmeCQueue *cq = n->cq[i];
+
+        if (sq) {
+            /* 
+             * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3)
+             * nvme_process_db() uses this hard-coded way to calculate
+             * doorbell offsets. Be consistent with that here.
+             */
+            sq->db_addr = dbs_addr + (i << 3);
+            sq->ei_addr = eis_addr + (i << 3);
+            pci_dma_write(&n->parent_obj, sq->db_addr, &sq->tail,
+                    sizeof(sq->tail));
+        }
+
+        if (cq) {
+            /* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */
+            cq->db_addr = dbs_addr + (i << 3) + (1 << 2);
+            cq->ei_addr = eis_addr + (i << 3) + (1 << 2);
+            pci_dma_write(&n->parent_obj, cq->db_addr, &cq->head,
+                    sizeof(cq->head));
+        }
+    }
+
+    return NVME_SUCCESS;
+}
+
 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
     trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
@@ -5809,6 +5873,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
         return nvme_aer(n, req);
     case NVME_ADM_CMD_NS_ATTACHMENT:
         return nvme_ns_attachment(n, req);
+    case NVME_ADM_CMD_DBBUF_CONFIG:
+        return nvme_dbbuf_config(n, req);
     case NVME_ADM_CMD_FORMAT_NVM:
         return nvme_format(n, req);
     default:
@@ -5818,6 +5884,18 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
     return NVME_INVALID_OPCODE | NVME_DNR;
 }
 
+static void nvme_update_sq_eventidx(const NvmeSQueue *sq)
+{
+    pci_dma_write(&sq->ctrl->parent_obj, sq->ei_addr, &sq->tail,
+                  sizeof(sq->tail));
+}
+
+static void nvme_update_sq_tail(NvmeSQueue *sq)
+{
+    pci_dma_read(&sq->ctrl->parent_obj, sq->db_addr, &sq->tail,
+                 sizeof(sq->tail));
+}
+
 static void nvme_process_sq(void *opaque)
 {
     NvmeSQueue *sq = opaque;
@@ -5829,6 +5907,10 @@ static void nvme_process_sq(void *opaque)
     NvmeCmd cmd;
     NvmeRequest *req;
 
+    if (n->dbbuf_enabled) {
+        nvme_update_sq_tail(sq);
+    }
+
     while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
         addr = sq->dma_addr + sq->head * n->sqe_size;
         if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
@@ -5852,6 +5934,11 @@ static void nvme_process_sq(void *opaque)
             req->status = status;
             nvme_enqueue_req_completion(cq, req);
         }
+
+        if (n->dbbuf_enabled) {
+            nvme_update_sq_eventidx(sq);
+            nvme_update_sq_tail(sq);
+        }
     }
 }
 
@@ -5889,6 +5976,9 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
     n->aer_queued = 0;
     n->outstanding_aers = 0;
     n->qs_created = false;
+    n->dbbuf_dbs = 0;
+    n->dbbuf_eis = 0;
+    n->dbbuf_enabled = false;
 }
 
 static void nvme_ctrl_shutdown(NvmeCtrl *n)
@@ -6398,6 +6488,10 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 
         start_sqs = nvme_cq_full(cq) ? 1 : 0;
         cq->head = new_head;
+        if (!qid && n->dbbuf_enabled) {
+            pci_dma_write(&n->parent_obj, cq->db_addr, &cq->head,
+                          sizeof(cq->head));
+        }
         if (start_sqs) {
             NvmeSQueue *sq;
             QTAILQ_FOREACH(sq, &cq->sq_list, entry) {
@@ -6455,6 +6549,23 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
         trace_pci_nvme_mmio_doorbell_sq(sq->sqid, new_tail);
 
         sq->tail = new_tail;
+        if (!qid && n->dbbuf_enabled) {
+            /*
+             * The spec states "the host shall also update the controller's
+             * corresponding doorbell property to match the value of that entry
+             * in the Shadow Doorbell buffer."
+             *
+             * Since this context is currently a VM trap, we can safely enforce
+             * the requirement from the device side in case the host is
+             * misbehaving.
+             *
+             * Note, we shouldn't have to do this, but various drivers
+             * including ones that run on Linux, are not updating Admin Queues,
+             * so we can't trust reading it for an appropriate sq tail.
+             */
+            pci_dma_write(&n->parent_obj, sq->db_addr, &sq->tail,
+                          sizeof(sq->tail));
+        }
         timer_mod(sq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
     }
 }
@@ -6733,7 +6844,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 
     id->mdts = n->params.mdts;
     id->ver = cpu_to_le32(NVME_SPEC_VER);
-    id->oacs = cpu_to_le16(NVME_OACS_NS_MGMT | NVME_OACS_FORMAT);
+    id->oacs = cpu_to_le16(NVME_OACS_NS_MGMT | NVME_OACS_FORMAT | NVME_OACS_DBBUF);
     id->cntrltype = 0x1;
 
     /*
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 6773819325..4452e4b1bf 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -334,6 +334,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
     case NVME_ADM_CMD_GET_FEATURES:     return "NVME_ADM_CMD_GET_FEATURES";
     case NVME_ADM_CMD_ASYNC_EV_REQ:     return "NVME_ADM_CMD_ASYNC_EV_REQ";
     case NVME_ADM_CMD_NS_ATTACHMENT:    return "NVME_ADM_CMD_NS_ATTACHMENT";
+    case NVME_ADM_CMD_DBBUF_CONFIG:     return "NVME_ADM_CMD_DBBUF_CONFIG";
     case NVME_ADM_CMD_FORMAT_NVM:       return "NVME_ADM_CMD_FORMAT_NVM";
     default:                            return "NVME_ADM_CMD_UNKNOWN";
     }
@@ -365,6 +366,8 @@ typedef struct NvmeSQueue {
     uint32_t    tail;
     uint32_t    size;
     uint64_t    dma_addr;
+    uint64_t    db_addr;
+    uint64_t    ei_addr;
     QEMUTimer   *timer;
     NvmeRequest *io_req;
     QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -382,6 +385,8 @@ typedef struct NvmeCQueue {
     uint32_t    vector;
     uint32_t    size;
     uint64_t    dma_addr;
+    uint64_t    db_addr;
+    uint64_t    ei_addr;
     QEMUTimer   *timer;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -432,6 +437,9 @@ typedef struct NvmeCtrl {
     uint64_t    starttime_ms;
     uint16_t    temperature;
     uint8_t     smart_critical_warning;
+    uint64_t    dbbuf_dbs;
+    uint64_t    dbbuf_eis;
+    bool        dbbuf_enabled;
 
     struct {
         MemoryRegion mem;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3737351cc8..5b522d7b0e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -595,6 +595,7 @@ enum NvmeAdminCommands {
     NVME_ADM_CMD_ACTIVATE_FW    = 0x10,
     NVME_ADM_CMD_DOWNLOAD_FW    = 0x11,
     NVME_ADM_CMD_NS_ATTACHMENT  = 0x15,
+    NVME_ADM_CMD_DBBUF_CONFIG   = 0x7c,
     NVME_ADM_CMD_FORMAT_NVM     = 0x80,
     NVME_ADM_CMD_SECURITY_SEND  = 0x81,
     NVME_ADM_CMD_SECURITY_RECV  = 0x82,
@@ -1134,6 +1135,7 @@ enum NvmeIdCtrlOacs {
     NVME_OACS_FORMAT    = 1 << 1,
     NVME_OACS_FW        = 1 << 2,
     NVME_OACS_NS_MGMT   = 1 << 3,
+    NVME_OACS_DBBUF     = 1 << 8,
 };
 
 enum NvmeIdCtrlOncs {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] hw/nvme: Add trace events for shadow doorbell buffer
  2022-06-16 12:34 [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Jinhao Fan
  2022-06-16 12:34 ` [PATCH v3 1/2] hw/nvme: Implement " Jinhao Fan
@ 2022-06-16 12:34 ` Jinhao Fan
  2022-06-17 11:54 ` [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Klaus Jensen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Jinhao Fan @ 2022-06-16 12:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: its, kbusch, Jinhao Fan

When the shadow doorbell buffer is enabled, doorbell registers are updated
lazily; the actual queue head and tail pointers are stored in the Shadow
Doorbell buffers.

Add trace events for updates of the Shadow Doorbell buffers and EventIdx
buffers. Also add a trace event for the Doorbell Buffer Config command.
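
The new events can be enabled at run time with QEMU's -trace option, for
example (illustrative; any glob matching the event names works):

    qemu-system-x86_64 ... \
        -trace "pci_nvme_dbbuf_config" \
        -trace "pci_nvme_shadow_doorbell_*" \
        -trace "pci_nvme_eventidx_*"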

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
---
 hw/nvme/ctrl.c       | 5 +++++
 hw/nvme/trace-events | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f3aaff3e8d..c952c34f94 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1309,6 +1309,7 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
 {
     pci_dma_read(&cq->ctrl->parent_obj, cq->db_addr, &cq->head,
             sizeof(cq->head));
+    trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
 static void nvme_post_cqes(void *opaque)
@@ -5828,6 +5829,8 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req)
         }
     }
 
+    trace_pci_nvme_dbbuf_config(dbs_addr, eis_addr);
+
     return NVME_SUCCESS;
 }
 
@@ -5888,12 +5891,14 @@ static void nvme_update_sq_eventidx(const NvmeSQueue *sq)
 {
     pci_dma_write(&sq->ctrl->parent_obj, sq->ei_addr, &sq->tail,
                   sizeof(sq->tail));
+    trace_pci_nvme_eventidx_sq(sq->sqid, sq->tail);
 }
 
 static void nvme_update_sq_tail(NvmeSQueue *sq)
 {
     pci_dma_read(&sq->ctrl->parent_obj, sq->db_addr, &sq->tail,
                  sizeof(sq->tail));
+    trace_pci_nvme_shadow_doorbell_sq(sq->sqid, sq->tail);
 }
 
 static void nvme_process_sq(void *opaque)
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index ff1b458969..00ee42f475 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -3,6 +3,7 @@ pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 pci_nvme_irq_pin(void) "pulsing IRQ pin"
 pci_nvme_irq_masked(void) "IRQ is masked"
 pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
+pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
 pci_nvme_map_addr_cmb(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
 pci_nvme_map_prp(uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
@@ -81,6 +82,8 @@ pci_nvme_enqueue_event_noqueue(int queued) "queued %d"
 pci_nvme_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
 pci_nvme_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
 pci_nvme_enqueue_req_completion(uint16_t cid, uint16_t cqid, uint32_t dw0, uint32_t dw1, uint16_t status) "cid %"PRIu16" cqid %"PRIu16" dw0 0x%"PRIx32" dw1 0x%"PRIx32" status 0x%"PRIx16""
+pci_nvme_eventidx_cq(uint16_t cqid, uint16_t new_eventidx) "cqid %"PRIu16" new_eventidx %"PRIu16""
+pci_nvme_eventidx_sq(uint16_t sqid, uint16_t new_eventidx) "sqid %"PRIu16" new_eventidx %"PRIu16""
 pci_nvme_mmio_read(uint64_t addr, unsigned size) "addr 0x%"PRIx64" size %d"
 pci_nvme_mmio_write(uint64_t addr, uint64_t data, unsigned size) "addr 0x%"PRIx64" data 0x%"PRIx64" size %d"
 pci_nvme_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_head %"PRIu16""
@@ -97,6 +100,8 @@ pci_nvme_mmio_start_success(void) "setting controller enable bit succeeded"
 pci_nvme_mmio_stopped(void) "cleared controller enable bit"
 pci_nvme_mmio_shutdown_set(void) "shutdown bit set"
 pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
+pci_nvme_shadow_doorbell_cq(uint16_t cqid, uint16_t new_shadow_doorbell) "cqid %"PRIu16" new_shadow_doorbell %"PRIu16""
+pci_nvme_shadow_doorbell_sq(uint16_t sqid, uint16_t new_shadow_doorbell) "sqid %"PRIu16" new_shadow_doorbell %"PRIu16""
 pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
 pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
 pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
  2022-06-16 12:34 [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Jinhao Fan
  2022-06-16 12:34 ` [PATCH v3 1/2] hw/nvme: Implement " Jinhao Fan
  2022-06-16 12:34 ` [PATCH v3 2/2] hw/nvme: Add trace events for shadow doorbell buffer Jinhao Fan
@ 2022-06-17 11:54 ` Klaus Jensen
  2022-06-17 12:47   ` Jinhao Fan
  2022-06-17 20:35 ` Keith Busch
  2022-06-27  6:00 ` Klaus Jensen
  4 siblings, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-06-17 11:54 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, kbusch

[-- Attachment #1: Type: text/plain, Size: 2813 bytes --]

On Jun 16 20:34, Jinhao Fan wrote:
> This patch adds shadow doorbell buffer support in NVMe 1.3 to QEMU
> NVMe. The Doorbell Buffer Config admin command is implemented for the
> guest to enable shadow doobell buffer. When this feature is enabled, each
> SQ/CQ is associated with two buffers, i.e., Shadow Doorbell buffer and
> EventIdx buffer. According to the Spec, each queue's doorbell register
> is only updated when the Shadow Doorbell buffer value changes from being
> less than or equal to the value of the corresponding EventIdx buffer
> entry to being greater than that value. Therefore, the number of MMIO's
> on the doorbell registers is greatly reduced.
> 
> This patch is adapted from Huaicheng Li's patch[1] in 2018.
> 
> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20180305194906.GA3630@gmail.com/
> 
> IOPS comparison with FIO:
> 
> iodepth    1      2      4      8
>   QEMU   25.1k  25.9k  24.5k  24.0k
>  +dbbuf  29.1k  60.1k  99.8k  82.5k
> 
> MMIO's per IO measured by perf-kvm:
> 
> iodepth    1      2      4      8
>   QEMU   2.01   1.99   1.99   1.99
>  +dbbuf  1.00   0.52   0.27   0.46
> 
> The tests are done on Ubuntu 22.04 with 5.15.0-33 kernel with Intel(R) 
> Xeon(R) Gold 6248R CPU @ 3.00GHz.
> 
> QEMU set up:
> 
> bin/x86_64-softmmu/qemu-system-x86_64 \
>     -name "nvme-test" \
>     -machine accel=kvm \
>     -cpu host \
>     -smp 4 \
>     -m 8G \
>     -daemonize \
>     -device virtio-scsi-pci,id=scsi0 \
>     -device scsi-hd,drive=hd0 \
>     -drive file=$OSIMGF,if=none,aio=native,cache=none,format=qcow2,id=hd0,snapshot=on \
>     -drive "id=nvm,if=none,file=null-co://,file.read-zeroes=on,format=raw" \
>     -device nvme,serial=deadbeef,drive=nvm \
>     -net user,hostfwd=tcp::8080-:22 \
>     -net nic,model=virtio
> 
> FIO configuration:
> 
> [global]
> ioengine=libaio
> filename=/dev/nvme0n1
> thread=1
> group_reporting=1
> direct=1
> verify=0
> time_based=1
> ramp_time=0
> runtime=30
> ;size=1G
> ;iodepth=1
> rw=randread
> bs=4k
> 
> [test]
> numjobs=1
> 
> Changes since v2:
>   - Do not ignore admin queue updates in nvme_process_db and nvme_post_cqes
>   - Calculate db_addr and ei_addr in hard-coded way
> 
> Changes since v1:
>   - Add compatibility with hosts that do not use admin queue shadow doorbell
> 
> Jinhao Fan (2):
>   hw/nvme: Implement shadow doorbell buffer support
>   hw/nvme: Add trace events for shadow doorbell buffer
> 
>  hw/nvme/ctrl.c       | 118 ++++++++++++++++++++++++++++++++++++++++++-
>  hw/nvme/nvme.h       |   8 +++
>  hw/nvme/trace-events |   5 ++
>  include/block/nvme.h |   2 +
>  4 files changed, 132 insertions(+), 1 deletion(-)
> 
> -- 
> 2.25.1
> 

LGTM,

Reviewed-by: Klaus Jensen <k.jensen@samsung.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
  2022-06-17 11:54 ` [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Klaus Jensen
@ 2022-06-17 12:47   ` Jinhao Fan
  2022-06-17 12:56     ` Klaus Jensen
  0 siblings, 1 reply; 26+ messages in thread
From: Jinhao Fan @ 2022-06-17 12:47 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: qemu-devel, kbusch



> On Jun 17, 2022, at 7:54 PM, Klaus Jensen <its@irrelevant.dk> wrote:
> 
> On Jun 16 20:34, Jinhao Fan wrote:
>> This patch adds shadow doorbell buffer support in NVMe 1.3 to QEMU
>> NVMe. The Doorbell Buffer Config admin command is implemented for the
>> guest to enable shadow doobell buffer. When this feature is enabled, each
>> SQ/CQ is associated with two buffers, i.e., Shadow Doorbell buffer and
>> EventIdx buffer. According to the Spec, each queue's doorbell register
>> is only updated when the Shadow Doorbell buffer value changes from being
>> less than or equal to the value of the corresponding EventIdx buffer
>> entry to being greater than that value. Therefore, the number of MMIO's
>> on the doorbell registers is greatly reduced.
>> 
>> This patch is adapted from Huaicheng Li's patch[1] in 2018.
>> 
>> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20180305194906.GA3630@gmail.com/
>> 
>> IOPS comparison with FIO:
>> 
>> iodepth    1      2      4      8
>>  QEMU   25.1k  25.9k  24.5k  24.0k
>> +dbbuf  29.1k  60.1k  99.8k  82.5k
>> 
>> MMIO's per IO measured by perf-kvm:
>> 
>> iodepth    1      2      4      8
>>  QEMU   2.01   1.99   1.99   1.99
>> +dbbuf  1.00   0.52   0.27   0.46
>> 
>> The tests are done on Ubuntu 22.04 with 5.15.0-33 kernel with Intel(R) 
>> Xeon(R) Gold 6248R CPU @ 3.00GHz.
>> 
>> QEMU set up:
>> 
>> bin/x86_64-softmmu/qemu-system-x86_64 \
>>    -name "nvme-test" \
>>    -machine accel=kvm \
>>    -cpu host \
>>    -smp 4 \
>>    -m 8G \
>>    -daemonize \
>>    -device virtio-scsi-pci,id=scsi0 \
>>    -device scsi-hd,drive=hd0 \
>>    -drive file=$OSIMGF,if=none,aio=native,cache=none,format=qcow2,id=hd0,snapshot=on \
>>    -drive "id=nvm,if=none,file=null-co://,file.read-zeroes=on,format=raw" \
>>    -device nvme,serial=deadbeef,drive=nvm \
>>    -net user,hostfwd=tcp::8080-:22 \
>>    -net nic,model=virtio
>> 
>> FIO configuration:
>> 
>> [global]
>> ioengine=libaio
>> filename=/dev/nvme0n1
>> thread=1
>> group_reporting=1
>> direct=1
>> verify=0
>> time_based=1
>> ramp_time=0
>> runtime=30
>> ;size=1G
>> ;iodepth=1
>> rw=randread
>> bs=4k
>> 
>> [test]
>> numjobs=1
>> 
>> Changes since v2:
>>  - Do not ignore admin queue updates in nvme_process_db and nvme_post_cqes
>>  - Calculate db_addr and ei_addr in hard-coded way
>> 
>> Changes since v1:
>>  - Add compatibility with hosts that do not use admin queue shadow doorbell
>> 
>> Jinhao Fan (2):
>>  hw/nvme: Implement shadow doorbell buffer support
>>  hw/nvme: Add trace events for shadow doorbell buffer
>> 
>> hw/nvme/ctrl.c       | 118 ++++++++++++++++++++++++++++++++++++++++++-
>> hw/nvme/nvme.h       |   8 +++
>> hw/nvme/trace-events |   5 ++
>> include/block/nvme.h |   2 +
>> 4 files changed, 132 insertions(+), 1 deletion(-)
>> 
>> -- 
>> 2.25.1
>> 
> 
> LGTM,
> 
> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> 

Great!

I have two questions:

How many “Reviewed-by”’s do I need to get my patch applied?

Do I need to post a v4 patch to add the “Reviewed-by”’s in my commit 
message?

Thanks,
Jinhao Fan



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
  2022-06-17 12:47   ` Jinhao Fan
@ 2022-06-17 12:56     ` Klaus Jensen
  2022-06-17 13:02       ` Jinhao Fan
  0 siblings, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-06-17 12:56 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, kbusch

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

On Jun 17 20:47, Jinhao Fan wrote:
> 
> 
> > On Jun 17, 2022, at 7:54 PM, Klaus Jensen <its@irrelevant.dk> wrote:
> > 
> > LGTM,
> > 
> > Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> > 
> 
> Great!
> 
> I have two questions:
> 
> How many “Reviewed-by”’s do I need to get my patch applied?
> 

That depends ;) The maintainers decide that.

> Do I need to post a v4 patch to add the “Reviewed-by”’s in my commit 
> message?
> 

Nope, the maintainer will pick that up when applying.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
  2022-06-17 12:56     ` Klaus Jensen
@ 2022-06-17 13:02       ` Jinhao Fan
  0 siblings, 0 replies; 26+ messages in thread
From: Jinhao Fan @ 2022-06-17 13:02 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: qemu-devel, kbusch

> On Jun 17, 2022, at 8:56 PM, Klaus Jensen <its@irrelevant.dk> wrote:
> 
> On Jun 17 20:47, Jinhao Fan wrote:
>> 
>> 
>>> On Jun 17, 2022, at 7:54 PM, Klaus Jensen <its@irrelevant.dk> wrote:
>>> 
>>> LGTM,
>>> 
>>> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
>>> 
>> 
>> Great!
>> 
>> I have two questions:
>> 
>> How many “Reviewed-by”’s do I need to get my patch applied?
>> 
> 
> That depends ;) The maintainers decide that.
> 
>> Do I need to post a v4 patch to add the “Reviewed-by”’s in my commit 
>> message?
>> 
> 
> Nope, the maintainer will pick that up when applying.

Gotcha! Thanks!


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
  2022-06-16 12:34 [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Jinhao Fan
                   ` (2 preceding siblings ...)
  2022-06-17 11:54 ` [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Klaus Jensen
@ 2022-06-17 20:35 ` Keith Busch
  2022-06-27  6:00 ` Klaus Jensen
  4 siblings, 0 replies; 26+ messages in thread
From: Keith Busch @ 2022-06-17 20:35 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, its

On Thu, Jun 16, 2022 at 08:34:06PM +0800, Jinhao Fan wrote:
> This patch adds shadow doorbell buffer support in NVMe 1.3 to QEMU
> NVMe. The Doorbell Buffer Config admin command is implemented for the
> guest to enable shadow doobell buffer. When this feature is enabled, each
> SQ/CQ is associated with two buffers, i.e., Shadow Doorbell buffer and
> EventIdx buffer. According to the Spec, each queue's doorbell register
> is only updated when the Shadow Doorbell buffer value changes from being
> less than or equal to the value of the corresponding EventIdx buffer
> entry to being greater than that value. Therefore, the number of MMIO's
> on the doorbell registers is greatly reduced.

Looks good to me, and passes my sanity tests.

Reviewed-by: Keith Busch <kbusch@kernel.org>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support
  2022-06-16 12:34 [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Jinhao Fan
                   ` (3 preceding siblings ...)
  2022-06-17 20:35 ` Keith Busch
@ 2022-06-27  6:00 ` Klaus Jensen
  4 siblings, 0 replies; 26+ messages in thread
From: Klaus Jensen @ 2022-06-27  6:00 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, kbusch

[-- Attachment #1: Type: text/plain, Size: 2800 bytes --]

On Jun 16 20:34, Jinhao Fan wrote:
> This patch adds shadow doorbell buffer support in NVMe 1.3 to QEMU
> NVMe. The Doorbell Buffer Config admin command is implemented for the
> guest to enable shadow doobell buffer. When this feature is enabled, each
> SQ/CQ is associated with two buffers, i.e., Shadow Doorbell buffer and
> EventIdx buffer. According to the Spec, each queue's doorbell register
> is only updated when the Shadow Doorbell buffer value changes from being
> less than or equal to the value of the corresponding EventIdx buffer
> entry to being greater than that value. Therefore, the number of MMIO's
> on the doorbell registers is greatly reduced.
> 
> This patch is adapted from Huaicheng Li's patch[1] in 2018.
> 
> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20180305194906.GA3630@gmail.com/
> 
> IOPS comparison with FIO:
> 
> iodepth    1      2      4      8
>   QEMU   25.1k  25.9k  24.5k  24.0k
>  +dbbuf  29.1k  60.1k  99.8k  82.5k
> 
> MMIO's per IO measured by perf-kvm:
> 
> iodepth    1      2      4      8
>   QEMU   2.01   1.99   1.99   1.99
>  +dbbuf  1.00   0.52   0.27   0.46
> 
> The tests are done on Ubuntu 22.04 with 5.15.0-33 kernel with Intel(R) 
> Xeon(R) Gold 6248R CPU @ 3.00GHz.
> 
> QEMU set up:
> 
> bin/x86_64-softmmu/qemu-system-x86_64 \
>     -name "nvme-test" \
>     -machine accel=kvm \
>     -cpu host \
>     -smp 4 \
>     -m 8G \
>     -daemonize \
>     -device virtio-scsi-pci,id=scsi0 \
>     -device scsi-hd,drive=hd0 \
>     -drive file=$OSIMGF,if=none,aio=native,cache=none,format=qcow2,id=hd0,snapshot=on \
>     -drive "id=nvm,if=none,file=null-co://,file.read-zeroes=on,format=raw" \
>     -device nvme,serial=deadbeef,drive=nvm \
>     -net user,hostfwd=tcp::8080-:22 \
>     -net nic,model=virtio
> 
> FIO configuration:
> 
> [global]
> ioengine=libaio
> filename=/dev/nvme0n1
> thread=1
> group_reporting=1
> direct=1
> verify=0
> time_based=1
> ramp_time=0
> runtime=30
> ;size=1G
> ;iodepth=1
> rw=randread
> bs=4k
> 
> [test]
> numjobs=1
> 
> Changes since v2:
>   - Do not ignore admin queue updates in nvme_process_db and nvme_post_cqes
>   - Calculate db_addr and ei_addr in hard-coded way
> 
> Changes since v1:
>   - Add compatibility with hosts that do not use admin queue shadow doorbell
> 
> Jinhao Fan (2):
>   hw/nvme: Implement shadow doorbell buffer support
>   hw/nvme: Add trace events for shadow doorbell buffer
> 
>  hw/nvme/ctrl.c       | 118 ++++++++++++++++++++++++++++++++++++++++++-
>  hw/nvme/nvme.h       |   8 +++
>  hw/nvme/trace-events |   5 ++
>  include/block/nvme.h |   2 +
>  4 files changed, 132 insertions(+), 1 deletion(-)
> 
> -- 
> 2.25.1
> 
> 

Jinhao,

Thanks, applied to nvme-next!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-06-16 12:34 ` [PATCH v3 1/2] hw/nvme: Implement " Jinhao Fan
@ 2022-06-27 19:17   ` Keith Busch
  2022-06-27 19:33     ` Klaus Jensen
  2022-12-07 17:49   ` Guenter Roeck
  1 sibling, 1 reply; 26+ messages in thread
From: Keith Busch @ 2022-06-27 19:17 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, its

On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
>      }
>      sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
>  
> +    if (n->dbbuf_enabled) {
> +        sq->db_addr = n->dbbuf_dbs + (sqid << 3);
> +        sq->ei_addr = n->dbbuf_eis + (sqid << 3);
> +    }
> +
>      assert(n->cq[cqid]);
>      cq = n->cq[cqid];
>      QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry);
> @@ -4615,6 +4631,10 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
>      cq->head = cq->tail = 0;
>      QTAILQ_INIT(&cq->req_list);
>      QTAILQ_INIT(&cq->sq_list);
> +    if (n->dbbuf_enabled) {
> +        cq->db_addr = n->dbbuf_dbs + (cqid << 3) + (1 << 2);
> +        cq->ei_addr = n->dbbuf_eis + (cqid << 3) + (1 << 2);
> +    }
>      n->cq[cqid] = cq;
>      cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
>  }

I just notice these address calculations changed from previous versions. What
happened to taking the doorbell stride into account? Spec says the shadows and
events follow the same stride spacing as the registers.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-06-27 19:17   ` Keith Busch
@ 2022-06-27 19:33     ` Klaus Jensen
  2022-06-28  0:29       ` Jinhao Fan
  0 siblings, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-06-27 19:33 UTC (permalink / raw)
  To: Keith Busch; +Cc: Jinhao Fan, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1238 bytes --]

On Jun 27 13:17, Keith Busch wrote:
> On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> >      }
> >      sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
> >  
> > +    if (n->dbbuf_enabled) {
> > +        sq->db_addr = n->dbbuf_dbs + (sqid << 3);
> > +        sq->ei_addr = n->dbbuf_eis + (sqid << 3);
> > +    }
> > +
> >      assert(n->cq[cqid]);
> >      cq = n->cq[cqid];
> >      QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry);
> > @@ -4615,6 +4631,10 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
> >      cq->head = cq->tail = 0;
> >      QTAILQ_INIT(&cq->req_list);
> >      QTAILQ_INIT(&cq->sq_list);
> > +    if (n->dbbuf_enabled) {
> > +        cq->db_addr = n->dbbuf_dbs + (cqid << 3) + (1 << 2);
> > +        cq->ei_addr = n->dbbuf_eis + (cqid << 3) + (1 << 2);
> > +    }
> >      n->cq[cqid] = cq;
> >      cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
> >  }
> 
> I just notice these address calculations changed from previous versions. What
> happened to taking the doorbell stride into account? Spec says the shadows and
> events follow the same stride spacing as the registers.

The stride is fixed at zero.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-06-27 19:33     ` Klaus Jensen
@ 2022-06-28  0:29       ` Jinhao Fan
  0 siblings, 0 replies; 26+ messages in thread
From: Jinhao Fan @ 2022-06-28  0:29 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Keith Busch, qemu-devel



> On Jun 28, 2022, at 3:33 AM, Klaus Jensen <its@irrelevant.dk> wrote:
> 
> On Jun 27 13:17, Keith Busch wrote:
>> On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
>>>     }
>>>     sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
>>> 
>>> +    if (n->dbbuf_enabled) {
>>> +        sq->db_addr = n->dbbuf_dbs + (sqid << 3);
>>> +        sq->ei_addr = n->dbbuf_eis + (sqid << 3);
>>> +    }
>>> +
>>>     assert(n->cq[cqid]);
>>>     cq = n->cq[cqid];
>>>     QTAILQ_INSERT_TAIL(&(cq->sq_list), sq, entry);
>>> @@ -4615,6 +4631,10 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
>>>     cq->head = cq->tail = 0;
>>>     QTAILQ_INIT(&cq->req_list);
>>>     QTAILQ_INIT(&cq->sq_list);
>>> +    if (n->dbbuf_enabled) {
>>> +        cq->db_addr = n->dbbuf_dbs + (cqid << 3) + (1 << 2);
>>> +        cq->ei_addr = n->dbbuf_eis + (cqid << 3) + (1 << 2);
>>> +    }
>>>     n->cq[cqid] = cq;
>>>     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
>>> }
>> 
>> I just notice these address calculations changed from previous versions. What
>> happened to taking the doorbell stride into account? Spec says the shadows and
>> events follow the same stride spacing as the registers.
> 
> The stride is fixed at zero.

Yes. I noticed that nvme_process_db() uses a similar hard-coded approach to get
the qid from the doorbell register offset, so I hard-coded it here as well. I can
submit a tiny patch later to wrap this calculation.
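
For reference, the general form with a non-zero stride would be something
like the following hypothetical helper (not part of this patch):

    /* Byte offset of a queue's doorbell/shadow entry; illustrative only. */
    static inline uint64_t nvme_db_offset(uint16_t qid, bool is_cq,
                                          uint8_t dstrd)
    {
        return (2 * qid + (is_cq ? 1 : 0)) * (4 << dstrd);
    }

With CAP.DSTRD fixed at 0 this reduces to (qid << 3) for SQs and
(qid << 3) + (1 << 2) for CQs, which is what the patch hard-codes.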




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-06-16 12:34 ` [PATCH v3 1/2] hw/nvme: Implement " Jinhao Fan
  2022-06-27 19:17   ` Keith Busch
@ 2022-12-07 17:49   ` Guenter Roeck
  2022-12-08  7:16     ` Klaus Jensen
  1 sibling, 1 reply; 26+ messages in thread
From: Guenter Roeck @ 2022-12-07 17:49 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, its, kbusch

Hi,

On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
> and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
> in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
> command, the nvme_dbbuf_config function tries to associate each existing
> SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
> Queues created after the Doorbell Buffer Config command will have the
> doorbell buffers associated with them when they are initialized.
> 
> In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
> Doorbell buffer changes instead of wait for doorbell register changes.
> This reduces the number of MMIOs.
> 
> In nvme_process_db(), update the shadow doorbell buffer value with
> the doorbell register value if it is the admin queue. This is a hack
> since hosts like Linux NVMe driver and SPDK do not use shadow
> doorbell buffer for the admin queue. Copying the doorbell register
> value to the shadow doorbell buffer allows us to support these hosts
> as well as spec-compliant hosts that use shadow doorbell buffer for
> the admin queue.
> 
> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>

I noticed that I can no longer boot Linux kernels from nvme on riscv64
systems. The problem is seen with qemu v7.1 and qemu v7.2-rc4.
The log shows:

[   35.904128] nvme nvme0: I/O 642 (I/O Cmd) QID 1 timeout, aborting
[   35.905000] EXT4-fs (nvme0n1): mounting ext2 file system using the ext4 subsystem
[   66.623863] nvme nvme0: I/O 643 (I/O Cmd) QID 1 timeout, aborting
[   97.343989] nvme nvme0: Abort status: 0x0
[   97.344355] nvme nvme0: Abort status: 0x0
[   97.344647] nvme nvme0: I/O 7 QID 0 timeout, reset controller
[   97.350568] nvme nvme0: I/O 644 (I/O Cmd) QID 1 timeout, aborting

This is with the mainline Linux kernel (v6.1-rc8).

Bisect points to this patch. Reverting this patch and a number of associated
patches (to fix conflicts) fixes the problem.

06143d8771 Revert "hw/nvme: Implement shadow doorbell buffer support"
acb4443e3a Revert "hw/nvme: Use ioeventfd to handle doorbell updates"
d5fd309feb Revert "hw/nvme: do not enable ioeventfd by default"
1ca1e6c47c Revert "hw/nvme: unregister the event notifier handler on the main loop"
2d26abd51e Revert "hw/nvme: skip queue processing if notifier is cleared"
99d411b5a5 Revert "hw/nvme: reenable cqe batching"
2293d3ca6c Revert "hw/nvme: Add trace events for shadow doorbell buffer"

Qemu command line:

qemu-system-riscv64 -M virt -m 512M \
     -kernel arch/riscv/boot/Image -snapshot \
     -device nvme,serial=foo,drive=d0 \
     -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
     -bios default \
     -append "root=/dev/nvme0n1 console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
     -nographic -monitor none

Guenter


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-07 17:49   ` Guenter Roeck
@ 2022-12-08  7:16     ` Klaus Jensen
  2022-12-08  8:08       ` Klaus Jensen
  0 siblings, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-12-08  7:16 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jinhao Fan, qemu-devel, kbusch

[-- Attachment #1: Type: text/plain, Size: 3525 bytes --]

On Dec  7 09:49, Guenter Roeck wrote:
> Hi,
> 
> On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> > Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
> > and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
> > in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
> > command, the nvme_dbbuf_config function tries to associate each existing
> > SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
> > Queues created after the Doorbell Buffer Config command will have the
> > doorbell buffers associated with them when they are initialized.
> > 
> > In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
> > Doorbell buffer changes instead of wait for doorbell register changes.
> > This reduces the number of MMIOs.
> > 
> > In nvme_process_db(), update the shadow doorbell buffer value with
> > the doorbell register value if it is the admin queue. This is a hack
> > since hosts like Linux NVMe driver and SPDK do not use shadow
> > doorbell buffer for the admin queue. Copying the doorbell register
> > value to the shadow doorbell buffer allows us to support these hosts
> > as well as spec-compliant hosts that use shadow doorbell buffer for
> > the admin queue.
> > 
> > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> 
> I noticed that I can no longer boot Linux kernels from nvme on riscv64
> systems. The problem is seen with qemu v7.1 and qemu v7.2-rc4.
> The log shows:
> 
> [   35.904128] nvme nvme0: I/O 642 (I/O Cmd) QID 1 timeout, aborting
> [   35.905000] EXT4-fs (nvme0n1): mounting ext2 file system using the ext4 subsystem
> [   66.623863] nvme nvme0: I/O 643 (I/O Cmd) QID 1 timeout, aborting
> [   97.343989] nvme nvme0: Abort status: 0x0
> [   97.344355] nvme nvme0: Abort status: 0x0
> [   97.344647] nvme nvme0: I/O 7 QID 0 timeout, reset controller
> [   97.350568] nvme nvme0: I/O 644 (I/O Cmd) QID 1 timeout, aborting
> 
> This is with the mainline Linux kernel (v6.1-rc8).
> 
> Bisect points to this patch. Reverting this patch and a number of associated
> patches (to fix conflicts) fixes the problem.
> 
> 06143d8771 Revert "hw/nvme: Implement shadow doorbell buffer support"
> acb4443e3a Revert "hw/nvme: Use ioeventfd to handle doorbell updates"
> d5fd309feb Revert "hw/nvme: do not enable ioeventfd by default"
> 1ca1e6c47c Revert "hw/nvme: unregister the event notifier handler on the main loop"
> 2d26abd51e Revert "hw/nvme: skip queue processing if notifier is cleared"
> 99d411b5a5 Revert "hw/nvme: reenable cqe batching"
> 2293d3ca6c Revert "hw/nvme: Add trace events for shadow doorbell buffer"
> 
> Qemu command line:
> 
> qemu-system-riscv64 -M virt -m 512M \
>      -kernel arch/riscv/boot/Image -snapshot \
>      -device nvme,serial=foo,drive=d0 \
>      -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>      -bios default \
>      -append "root=/dev/nvme0n1 console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
>      -nographic -monitor none
> 
> Guenter

Hi Guenter,

Thanks for the bisect.

The shadow doorbell is also an obvious candidate for this regression. I
wonder if this could be a kernel bug, since we are not observing this on
other architectures. The memory barriers required are super finicky, but
in QEMU all the operations are associated with full memory barriers. The
barriers are more fine grained in the kernel though.

I will dig into this together with Keith.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08  7:16     ` Klaus Jensen
@ 2022-12-08  8:08       ` Klaus Jensen
  2022-12-08 18:47         ` Guenter Roeck
  0 siblings, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-12-08  8:08 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jinhao Fan, qemu-devel, kbusch

[-- Attachment #1: Type: text/plain, Size: 4057 bytes --]

On Dec  8 08:16, Klaus Jensen wrote:
> On Dec  7 09:49, Guenter Roeck wrote:
> > Hi,
> > 
> > On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> > > Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
> > > and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
> > > in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
> > > command, the nvme_dbbuf_config function tries to associate each existing
> > > SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
> > > Queues created after the Doorbell Buffer Config command will have the
> > > doorbell buffers associated with them when they are initialized.
> > > 
> > > In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
> > > Doorbell buffer changes instead of wait for doorbell register changes.
> > > This reduces the number of MMIOs.
> > > 
> > > In nvme_process_db(), update the shadow doorbell buffer value with
> > > the doorbell register value if it is the admin queue. This is a hack
> > > since hosts like Linux NVMe driver and SPDK do not use shadow
> > > doorbell buffer for the admin queue. Copying the doorbell register
> > > value to the shadow doorbell buffer allows us to support these hosts
> > > as well as spec-compliant hosts that use shadow doorbell buffer for
> > > the admin queue.
> > > 
> > > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > 
> > I noticed that I can no longer boot Linux kernels from nvme on riscv64
> > systems. The problem is seen with qemu v7.1 and qemu v7.2-rc4.
> > The log shows:
> > 
> > [   35.904128] nvme nvme0: I/O 642 (I/O Cmd) QID 1 timeout, aborting
> > [   35.905000] EXT4-fs (nvme0n1): mounting ext2 file system using the ext4 subsystem
> > [   66.623863] nvme nvme0: I/O 643 (I/O Cmd) QID 1 timeout, aborting
> > [   97.343989] nvme nvme0: Abort status: 0x0
> > [   97.344355] nvme nvme0: Abort status: 0x0
> > [   97.344647] nvme nvme0: I/O 7 QID 0 timeout, reset controller
> > [   97.350568] nvme nvme0: I/O 644 (I/O Cmd) QID 1 timeout, aborting
> > 
> > This is with the mainline Linux kernel (v6.1-rc8).
> > 
> > Bisect points to this patch. Reverting this patch and a number of associated
> > patches (to fix conflicts) fixes the problem.
> > 
> > 06143d8771 Revert "hw/nvme: Implement shadow doorbell buffer support"
> > acb4443e3a Revert "hw/nvme: Use ioeventfd to handle doorbell updates"
> > d5fd309feb Revert "hw/nvme: do not enable ioeventfd by default"
> > 1ca1e6c47c Revert "hw/nvme: unregister the event notifier handler on the main loop"
> > 2d26abd51e Revert "hw/nvme: skip queue processing if notifier is cleared"
> > 99d411b5a5 Revert "hw/nvme: reenable cqe batching"
> > 2293d3ca6c Revert "hw/nvme: Add trace events for shadow doorbell buffer"
> > 
> > Qemu command line:
> > 
> > qemu-system-riscv64 -M virt -m 512M \
> >      -kernel arch/riscv/boot/Image -snapshot \
> >      -device nvme,serial=foo,drive=d0 \
> >      -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> >      -bios default \
> >      -append "root=/dev/nvme0n1 console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
> >      -nographic -monitor none
> > 
> > Guenter
> 
> Hi Guenter,
> 
> Thanks for the bisect.
> 
> The shadow doorbell is also an obvious candidate for this regression. I
> wonder if this could be a kernel bug, since we are not observing this on
> other architectures. The memory barriers required are super finicky, but
> in QEMU all the operations are associated with full memory barriers. The
> barriers are more fine grained in the kernel though.
> 
> I will dig into this together with Keith.

A cq head doorbell mmio is skipped... And it is not the fault of the
kernel. The kernel is well within its rights to skip the mmio, since the cq
eventidx is not properly updated.

Adding that, it boots properly on riscv. But I'm perplexed as to why
this didn't show up on our regularly tested platforms.

Gonna try to get this in for 7.2!
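
For the record, the missing piece is a CQ-side counterpart to
nvme_update_sq_eventidx(), along these lines (a sketch based on the above,
not necessarily the exact patch that will be posted):

    static void nvme_update_cq_eventidx(const NvmeCQueue *cq)
    {
        pci_dma_write(&cq->ctrl->parent_obj, cq->ei_addr, &cq->head,
                      sizeof(cq->head));
    }

called from nvme_post_cqes() before re-reading the shadow head, so the
guest's doorbell-skip check sees a current EventIdx and still rings the
CQ head doorbell when it has to.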

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08  8:08       ` Klaus Jensen
@ 2022-12-08 18:47         ` Guenter Roeck
  2022-12-08 20:13           ` Guenter Roeck
  0 siblings, 1 reply; 26+ messages in thread
From: Guenter Roeck @ 2022-12-08 18:47 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Jinhao Fan, qemu-devel, kbusch

On Thu, Dec 08, 2022 at 09:08:12AM +0100, Klaus Jensen wrote:
> On Dec  8 08:16, Klaus Jensen wrote:
> > On Dec  7 09:49, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> > > > Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
> > > > and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
> > > > in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
> > > > command, the nvme_dbbuf_config function tries to associate each existing
> > > > SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
> > > > Queues created after the Doorbell Buffer Config command will have the
> > > > doorbell buffers associated with them when they are initialized.
> > > > 
> > > > In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
> > > > Doorbell buffer changes instead of wait for doorbell register changes.
> > > > This reduces the number of MMIOs.
> > > > 
> > > > In nvme_process_db(), update the shadow doorbell buffer value with
> > > > the doorbell register value if it is the admin queue. This is a hack
> > > > since hosts like Linux NVMe driver and SPDK do not use shadow
> > > > doorbell buffer for the admin queue. Copying the doorbell register
> > > > value to the shadow doorbell buffer allows us to support these hosts
> > > > as well as spec-compliant hosts that use shadow doorbell buffer for
> > > > the admin queue.
> > > > 
> > > > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > > 
> > > I noticed that I can no longer boot Linux kernels from nvme on riscv64
> > > systems. The problem is seen with qemu v7.1 and qemu v7.2-rc4.
> > > The log shows:
> > > 
> > > [   35.904128] nvme nvme0: I/O 642 (I/O Cmd) QID 1 timeout, aborting
> > > [   35.905000] EXT4-fs (nvme0n1): mounting ext2 file system using the ext4 subsystem
> > > [   66.623863] nvme nvme0: I/O 643 (I/O Cmd) QID 1 timeout, aborting
> > > [   97.343989] nvme nvme0: Abort status: 0x0
> > > [   97.344355] nvme nvme0: Abort status: 0x0
> > > [   97.344647] nvme nvme0: I/O 7 QID 0 timeout, reset controller
> > > [   97.350568] nvme nvme0: I/O 644 (I/O Cmd) QID 1 timeout, aborting
> > > 
> > > This is with the mainline Linux kernel (v6.1-rc8).
> > > 
> > > Bisect points to this patch. Reverting this patch and a number of associated
> > > patches (to fix conflicts) fixes the problem.
> > > 
> > > 06143d8771 Revert "hw/nvme: Implement shadow doorbell buffer support"
> > > acb4443e3a Revert "hw/nvme: Use ioeventfd to handle doorbell updates"
> > > d5fd309feb Revert "hw/nvme: do not enable ioeventfd by default"
> > > 1ca1e6c47c Revert "hw/nvme: unregister the event notifier handler on the main loop"
> > > 2d26abd51e Revert "hw/nvme: skip queue processing if notifier is cleared"
> > > 99d411b5a5 Revert "hw/nvme: reenable cqe batching"
> > > 2293d3ca6c Revert "hw/nvme: Add trace events for shadow doorbell buffer"
> > > 
> > > Qemu command line:
> > > 
> > > qemu-system-riscv64 -M virt -m 512M \
> > >      -kernel arch/riscv/boot/Image -snapshot \
> > >      -device nvme,serial=foo,drive=d0 \
> > >      -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > >      -bios default \
> > >      -append "root=/dev/nvme0n1 console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
> > >      -nographic -monitor none
> > > 
> > > Guenter
> > 
> > Hi Guenter,
> > 
> > Thanks for the bisect.
> > 
> > The shadow doorbell is also an obvious candidate for this regression. I
> > wonder if this could be a kernel bug, since we are not observing this on
> > other architectures. The memory barriers required are super finicky, but
> > in QEMU all the operations are associated with full memory barriers. The
> > barriers are more fine grained in the kernel though.
> > 
> > I will dig into this together with Keith.
> 
> A cq head doorbell mmio is skipped... And it is not the fault of the
> kernel. The kernel is in it's good right to skip the mmio since the cq
> eventidx is not properly updated.
> 
> Adding that and it boots properly on riscv. But I'm perplexed as to why
> this didnt show up on our regularly tested platforms.
> 
> Gonna try to get this in for 7.2!

I see another problem with sparc64.

[    5.261508] could not locate request for tag 0x0
[    5.261711] nvme nvme0: invalid id 0 completed on queue 1

That is seen repeatedly until the request times out. I'll test with
your patch to see if it resolves this problem as well, and will bisect
otherwise.

Guenter


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08 18:47         ` Guenter Roeck
@ 2022-12-08 20:13           ` Guenter Roeck
  2022-12-08 20:28             ` Keith Busch
  2022-12-08 20:39             ` Guenter Roeck
  0 siblings, 2 replies; 26+ messages in thread
From: Guenter Roeck @ 2022-12-08 20:13 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Jinhao Fan, qemu-devel, kbusch

On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > 
> > A cq head doorbell mmio is skipped... And it is not the fault of the
> > kernel. The kernel is in it's good right to skip the mmio since the cq
> > eventidx is not properly updated.
> > 
> > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > this didnt show up on our regularly tested platforms.
> > 
> > Gonna try to get this in for 7.2!
> 
> I see another problem with sparc64.
> 
> [    5.261508] could not locate request for tag 0x0
> [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> 
> That is seen repeatedly until the request times out. I'll test with
> your patch to see if it resolves this problem as well, and will bisect
> otherwise.
> 
The second problem is unrelated to the doorbell problem.
It is first seen in qemu v7.1. I'll try to bisect.

Guenter


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08 20:13           ` Guenter Roeck
@ 2022-12-08 20:28             ` Keith Busch
  2022-12-08 21:16               ` Guenter Roeck
  2022-12-08 20:39             ` Guenter Roeck
  1 sibling, 1 reply; 26+ messages in thread
From: Keith Busch @ 2022-12-08 20:28 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Klaus Jensen, Jinhao Fan, qemu-devel, Keith Busch

When the request times out, the kernel should be printing the command ID.
What does that say? The driver thinks the 0 is invalid, so I'm just curious
what value it's expecting.

On Thu, Dec 8, 2022, 8:13 PM Guenter Roeck <linux@roeck-us.net> wrote:

> On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > >
> > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > eventidx is not properly updated.
> > >
> > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > this didnt show up on our regularly tested platforms.
> > >
> > > Gonna try to get this in for 7.2!
> >
> > I see another problem with sparc64.
> >
> > [    5.261508] could not locate request for tag 0x0
> > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> >
> > That is seen repeatedly until the request times out. I'll test with
> > your patch to see if it resolves this problem as well, and will bisect
> > otherwise.
> >
> The second problem is unrelated to the doorbell problem.
> It is first seen in qemu v7.1. I'll try to bisect.
>
> Guenter
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08 20:13           ` Guenter Roeck
  2022-12-08 20:28             ` Keith Busch
@ 2022-12-08 20:39             ` Guenter Roeck
  2022-12-09 11:00               ` Guenter Roeck
  2022-12-12  9:58               ` Klaus Jensen
  1 sibling, 2 replies; 26+ messages in thread
From: Guenter Roeck @ 2022-12-08 20:39 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Jinhao Fan, qemu-devel, kbusch

On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > 
> > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > eventidx is not properly updated.
> > > 
> > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > this didnt show up on our regularly tested platforms.
> > > 
> > > Gonna try to get this in for 7.2!
> > 
> > I see another problem with sparc64.
> > 
> > [    5.261508] could not locate request for tag 0x0
> > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > 
> > That is seen repeatedly until the request times out. I'll test with
> > your patch to see if it resolves this problem as well, and will bisect
> > otherwise.
> > 
> The second problem is unrelated to the doorbell problem.
> It is first seen in qemu v7.1. I'll try to bisect.
> 

Unfortunately, the problem observed with sparc64 also bisects to this
patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
does not fix it (which is why I initially thought it was unrelated).

I used the following qemu command line.

qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
    -device nvme,serial=foo,drive=d0,bus=pciB \
    -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
    -kernel arch/sparc/boot/image -no-reboot \
    -append "root=/dev/nvme0n1 console=ttyS0" \
    -nographic -monitor none

Guenter


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08 20:28             ` Keith Busch
@ 2022-12-08 21:16               ` Guenter Roeck
  0 siblings, 0 replies; 26+ messages in thread
From: Guenter Roeck @ 2022-12-08 21:16 UTC (permalink / raw)
  To: Keith Busch; +Cc: Klaus Jensen, Jinhao Fan, qemu-devel, Keith Busch

On 12/8/22 12:28, Keith Busch wrote:
> When the request times out, the kernel should be printing the command ID. What does that say? The driver thinks the 0 is invalid, so I'm just curious what value it's expecting.
> 

After some time I see the following.

...
[   88.071197] nvme nvme0: invalid id 0 completed on queue 1
[   88.071514] could not locate request for tag 0x0
[   88.071802] nvme nvme0: invalid id 0 completed on queue 1
[   88.072135] could not locate request for tag 0x0
[   88.072426] nvme nvme0: invalid id 0 completed on queue 1
[   88.072720] could not locate request for tag 0x0
[   88.073007] nvme nvme0: invalid id 0 completed on queue 1
[   88.073343] nvme nvme0: request 0x50 genctr mismatch (got 0x0 expected 0x1)
[   88.073774] nvme nvme0: invalid id 80 completed on queue 1
[   88.074127] nvme nvme0: request 0x4f genctr mismatch (got 0x0 expected 0x1)
[   88.074556] nvme nvme0: invalid id 79 completed on queue 1
[   88.074903] nvme nvme0: request 0x4e genctr mismatch (got 0x0 expected 0x1)
[   88.075318] nvme nvme0: invalid id 78 completed on queue 1
[   88.075803] nvme nvme0: request 0x45 genctr mismatch (got 0x0 expected 0x1)
[   88.076239] nvme nvme0: invalid id 69 completed on queue 1
[   88.076585] nvme nvme0: request 0x46 genctr mismatch (got 0x0 expected 0x1)
[   88.076990] nvme nvme0: invalid id 70 completed on queue 1
[   88.077314] nvme nvme0: request 0x47 genctr mismatch (got 0x0 expected 0x1)
[   88.077744] nvme nvme0: invalid id 71 completed on queue 1
[   88.078064] nvme nvme0: request 0x48 genctr mismatch (got 0x0 expected 0x1)
[   88.078465] nvme nvme0: invalid id 72 completed on queue 1
[   88.078792] nvme nvme0: request 0x49 genctr mismatch (got 0x0 expected 0x1)
[   88.079190] nvme nvme0: invalid id 73 completed on queue 1
[   88.079522] nvme nvme0: request 0x4a genctr mismatch (got 0x0 expected 0x1)
[   88.079918] nvme nvme0: invalid id 74 completed on queue 1
[   88.080243] nvme nvme0: request 0x4b genctr mismatch (got 0x0 expected 0x1)
[   88.080630] nvme nvme0: invalid id 75 completed on queue 1
[   88.080963] nvme nvme0: request 0x4c genctr mismatch (got 0x0 expected 0x1)
[   88.081361] nvme nvme0: invalid id 76 completed on queue 1
[   88.081687] nvme nvme0: request 0x4d genctr mismatch (got 0x0 expected 0x1)
[   88.082081] nvme nvme0: invalid id 77 completed on queue 1
[   89.061345] irq 9: nobody cared (try booting with the "irqpoll" option)
[   89.061794] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.1.0-rc8+ #1
[   89.062220] Call Trace:
[   89.062383] [<0000000000eb7518>] __report_bad_irq+0x38/0xb4
[   89.062685] [<00000000004e6e58>] note_interrupt+0x318/0x380
[   89.063000] [<00000000004e2f00>] handle_irq_event+0x80/0xc0
[   89.063296] [<00000000004e7e10>] handle_fasteoi_irq+0x90/0x220
[   89.063631] [<00000000004e18e8>] generic_handle_irq+0x28/0x40
[   89.063946] [<0000000000ede2ec>] handler_irq+0xac/0x100
[   89.064255] [<00000000004274b0>] sys_call_table+0x760/0x970
[   89.064578] [<0000000000eb2ee0>] ffs+0x0/0x18
[   89.064848] [<000000000042be0c>] do_softirq_own_stack+0x2c/0x40
[   89.065195] [<000000000046fd50>] __irq_exit_rcu+0xf0/0x140
[   89.065520] [<0000000000470744>] irq_exit+0x4/0x40
[   89.065830] [<0000000000ede3c4>] timer_interrupt+0x84/0xc0
[   89.066184] [<00000000004274f8>] sys_call_table+0x7a8/0x970
[   89.066546] [<00000000008f75e0>] blk_mq_do_dispatch_sched+0xa0/0x380
[   89.066940] [<00000000008f7ad4>] __blk_mq_sched_dispatch_requests+0x94/0x160
[   89.067359] [<00000000008f7bfc>] blk_mq_sched_dispatch_requests+0x3c/0x80
[   89.067774] handlers:
[   89.067952] [<(____ptrval____)>] nvme_irq
[   89.068254] [<(____ptrval____)>] nvme_irq
[   89.068538] Disabling IRQ #9
[   89.069837] random: crng init done
[   89.183077] could not locate request for tag 0x0
[   89.183475] nvme nvme0: invalid id 0 completed on queue 1
[   89.183824] could not locate request for tag 0x0
...
[   89.766750] nvme nvme0: invalid id 0 completed on queue 1
[   89.767076] could not locate request for tag 0x0
[   89.767361] nvme nvme0: invalid id 0 completed on queue 1
[   89.767701] nvme nvme0: request 0x4d genctr mismatch (got 0x0 expected 0x1)
[   89.768114] nvme nvme0: invalid id 77 completed on queue 1
[   89.768455] nvme nvme0: request 0x4c genctr mismatch (got 0x0 expected 0x1)
[   89.768876] nvme nvme0: invalid id 76 completed on queue 1
[   89.769215] nvme nvme0: request 0x4b genctr mismatch (got 0x0 expected 0x1)
[   89.769630] nvme nvme0: invalid id 75 completed on queue 1
[   89.769991] nvme nvme0: request 0x4a genctr mismatch (got 0x0 expected 0x1)
[   89.770409] nvme nvme0: invalid id 74 completed on queue 1
[   89.770750] nvme nvme0: request 0x49 genctr mismatch (got 0x0 expected 0x1)
[   89.771171] nvme nvme0: invalid id 73 completed on queue 1
[   89.771513] nvme nvme0: request 0x48 genctr mismatch (got 0x0 expected 0x1)
[   89.771934] nvme nvme0: invalid id 72 completed on queue 1
[   89.772286] nvme nvme0: request 0x47 genctr mismatch (got 0x0 expected 0x1)
[   89.772707] nvme nvme0: invalid id 71 completed on queue 1
[   89.773048] nvme nvme0: request 0x46 genctr mismatch (got 0x0 expected 0x1)
[   89.773478] nvme nvme0: invalid id 70 completed on queue 1
[   89.773958] nvme nvme0: request 0x51 genctr mismatch (got 0x0 expected 0x1)
[   89.774393] nvme nvme0: invalid id 81 completed on queue 1
[   89.774740] nvme nvme0: request 0x50 genctr mismatch (got 0x0 expected 0x1)
[   89.775163] nvme nvme0: invalid id 80 completed on queue 1
[   89.775514] nvme nvme0: request 0x4f genctr mismatch (got 0x0 expected 0x1)
[   89.775902] nvme nvme0: invalid id 79 completed on queue 1
[   89.776239] nvme nvme0: request 0x4e genctr mismatch (got 0x0 expected 0x1)
[   89.776649] nvme nvme0: invalid id 78 completed on queue 1
[   89.782920] could not locate request for tag 0x0


Does that help ? If not I can add some additional log messages.

Guenter

> On Thu, Dec 8, 2022, 8:13 PM Guenter Roeck <linux@roeck-us.net> wrote:
> 
>     On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
>      > >
>      > > A cq head doorbell mmio is skipped... And it is not the fault of the
>      > > kernel. The kernel is in it's good right to skip the mmio since the cq
>      > > eventidx is not properly updated.
>      > >
>      > > Adding that and it boots properly on riscv. But I'm perplexed as to why
>      > > this didnt show up on our regularly tested platforms.
>      > >
>      > > Gonna try to get this in for 7.2!
>      >
>      > I see another problem with sparc64.
>      >
>      > [    5.261508] could not locate request for tag 0x0
>      > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
>      >
>      > That is seen repeatedly until the request times out. I'll test with
>      > your patch to see if it resolves this problem as well, and will bisect
>      > otherwise.
>      >
>     The second problem is unrelated to the doorbell problem.
>     It is first seen in qemu v7.1. I'll try to bisect.
> 
>     Guenter
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08 20:39             ` Guenter Roeck
@ 2022-12-09 11:00               ` Guenter Roeck
  2022-12-12  9:58               ` Klaus Jensen
  1 sibling, 0 replies; 26+ messages in thread
From: Guenter Roeck @ 2022-12-09 11:00 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Jinhao Fan, qemu-devel, kbusch

On Thu, Dec 08, 2022 at 12:39:57PM -0800, Guenter Roeck wrote:
> On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > > 
> > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > > eventidx is not properly updated.
> > > > 
> > > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > > this didnt show up on our regularly tested platforms.
> > > > 
> > > > Gonna try to get this in for 7.2!
> > > 
> > > I see another problem with sparc64.
> > > 
> > > [    5.261508] could not locate request for tag 0x0
> > > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > > 
> > > That is seen repeatedly until the request times out. I'll test with
> > > your patch to see if it resolves this problem as well, and will bisect
> > > otherwise.
> > > 
> > The second problem is unrelated to the doorbell problem.
> > It is first seen in qemu v7.1. I'll try to bisect.
> > 
> 
> Unfortunately, the problem observed with sparc64 also bisects to this
> patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> does not fix it (which is why I initially thought it was unrelated).
> 
> I used the following qemu command line.
> 
> qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
>     -device nvme,serial=foo,drive=d0,bus=pciB \
>     -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>     -kernel arch/sparc/boot/image -no-reboot \
>     -append "root=/dev/nvme0n1 console=ttyS0" \
>     -nographic -monitor none
> 

Now that the tests have completed, it turns out the problem shows up on
various emulated platforms with big-endian CPUs.

Example from arm64be:

[    4.736752] nvme nvme0: pci function 0000:00:02.0
[    4.737829] nvme 0000:00:02.0: enabling device (0000 -> 0002)
[    4.774673] nvme nvme0: 2/0/0 default/read/poll queues
[    4.779331] nvme nvme0: Ignoring bogus Namespace Identifiers
[    4.799400] could not locate request for tag 0x0
[    4.799533] nvme nvme0: invalid id 0 completed on queue 2
[    4.799612] could not locate request for tag 0x0
[    4.799676] nvme nvme0: invalid id 0 completed on queue 2
[    4.799744] could not locate request for tag 0x0

powerpc:

could not locate request for tag 0x0
nvme nvme0: invalid id 0 completed on queue 1
could not locate request for tag 0x0
nvme nvme0: invalid id 0 completed on queue 1

trace logs (arm64be, good, qemu v7.0):

pci_nvme_admin_cmd cid 2864 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 2864 cns 0x5 ctrlid 0 csi 0x0
pci_nvme_identify_ns_csi nsid=1, csi=0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44a84000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44a84000 len 4096
pci_nvme_enqueue_req_completion cid 2864 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 0
pci_nvme_mmio_write addr 0x1004 data 0x10 size 4
pci_nvme_mmio_doorbell_cq cqid 0 new_head 16
pci_nvme_mmio_write addr 0x1008 data 0x1 size 4
pci_nvme_mmio_doorbell_sq sqid 1 new_tail 1
pci_nvme_io_cmd cid 32770 nsid 0x1 sqid 1 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_read cid 32770 nsid 1 nlb 8 count 4096 lba 0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44879000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44879000 len 4096
pci_nvme_rw_cb cid 32770 blk 'd0'
pci_nvme_rw_complete_cb cid 32770 blk 'd0'
pci_nvme_enqueue_req_completion cid 32770 cqid 1 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 1
pci_nvme_mmio_write addr 0x100c data 0x1 size 4
pci_nvme_mmio_doorbell_cq cqid 1 new_head 1
pci_nvme_mmio_write addr 0x1008 data 0x2 size 4
pci_nvme_mmio_doorbell_sq sqid 1 new_tail 2

trace log (arm64be, bad, qemu v7.2):

pci_nvme_admin_cmd cid 5184 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 5184 cns 0x5 ctrlid 0 csi 0x0
pci_nvme_identify_ns_csi nsid=1, csi=0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44e56000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44e56000 len 4096
pci_nvme_enqueue_req_completion cid 5184 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_update_sq_eventidx sqid 0 new_eventidx 18
pci_nvme_update_sq_tail sqid 0 new_tail 18
pci_nvme_update_cq_eventidx cqid 0 new_eventidx 16
pci_nvme_update_cq_head cqid 0 new_head 16
pci_nvme_irq_msix raising MSI-X IRQ vector 0
pci_nvme_mmio_write addr 0x1004 data 0x11 size 4
pci_nvme_mmio_doorbell_cq cqid 0 new_head 17
pci_nvme_mmio_write addr 0x1010 data 0x1 size 4
pci_nvme_mmio_doorbell_sq sqid 2 new_tail 1
pci_nvme_update_sq_tail sqid 2 new_tail 0
pci_nvme_io_cmd cid 16384 nsid 0x1 sqid 2 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_read cid 16384 nsid 1 nlb 8 count 4096 lba 0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44cea000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44cea000 len 4096
pci_nvme_update_sq_eventidx sqid 2 new_eventidx 0
pci_nvme_update_sq_tail sqid 2 new_tail 0
pci_nvme_io_cmd cid 0 nsid 0x0 sqid 2 opc 0x0 opname 'NVME_NVM_CMD_FLUSH'
pci_nvme_enqueue_req_completion cid 0 cqid 2 dw0 0x0 dw1 0x0 status 0x400b
pci_nvme_err_req_status cid 0 nsid 0 status 0x400b opc 0x0
pci_nvme_update_sq_eventidx sqid 2 new_eventidx 0
pci_nvme_update_sq_tail sqid 2 new_tail 0

[flush command repeated many times]

Guenter


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-08 20:39             ` Guenter Roeck
  2022-12-09 11:00               ` Guenter Roeck
@ 2022-12-12  9:58               ` Klaus Jensen
  2022-12-12 13:39                 ` Guenter Roeck
  1 sibling, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-12-12  9:58 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jinhao Fan, qemu-devel, kbusch

On Dec  8 12:39, Guenter Roeck wrote:
> On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > > 
> > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > > eventidx is not properly updated.
> > > > 
> > > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > > this didnt show up on our regularly tested platforms.
> > > > 
> > > > Gonna try to get this in for 7.2!
> > > 
> > > I see another problem with sparc64.
> > > 
> > > [    5.261508] could not locate request for tag 0x0
> > > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > > 
> > > That is seen repeatedly until the request times out. I'll test with
> > > your patch to see if it resolves this problem as well, and will bisect
> > > otherwise.
> > > 
> > The second problem is unrelated to the doorbell problem.
> > It is first seen in qemu v7.1. I'll try to bisect.
> > 
> 
> Unfortunately, the problem observed with sparc64 also bisects to this
> patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> does not fix it (which is why I initially thought it was unrelated).
> 
> I used the following qemu command line.
> 
> qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
>     -device nvme,serial=foo,drive=d0,bus=pciB \
>     -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>     -kernel arch/sparc/boot/image -no-reboot \
>     -append "root=/dev/nvme0n1 console=ttyS0" \
>     -nographic -monitor none
> 

Hi Guenter,

Thank you very much for the detailed reports and I apologize for the
fallout of this.

I think this is related to missing endian conversions when handling the
shadow doorbells. I'm not sure if there is a kernel issue here as well,
because as far as I can tell, the shadow doorbells are updated like so:

  old_value = *dbbuf_db;
  *dbbuf_db = value;

(where `value` is the new head/tail value depending on if this is an sq
or cq).

Keith, is the kernel doing something magically here I am not aware of,
or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well?
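
For illustration, an endian-aware version of that sequence could look
roughly like the sketch below (kernel-style C; the helper name is made
up and the __le32 typing of the dbbuf entries is an assumption on my
part, not what the driver currently does):

  /* sketch only: treat the shadow doorbell/eventidx entries as little-endian */
  static bool nvme_dbbuf_update_and_check_event_le(u16 value, __le32 *dbbuf_db,
                                                   __le32 *dbbuf_ei)
  {
          u16 old_value, event_idx;

          old_value = (u16)le32_to_cpu(*dbbuf_db);   /* convert on load */
          *dbbuf_db = cpu_to_le32(value);            /* convert on store */

          /* make the shadow write visible before reading the eventidx */
          mb();

          event_idx = (u16)le32_to_cpu(*dbbuf_ei);

          /* ring the hardware doorbell only when crossing the eventidx */
          return (u16)(value - event_idx - 1) < (u16)(value - old_value);
  }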

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-12  9:58               ` Klaus Jensen
@ 2022-12-12 13:39                 ` Guenter Roeck
  2022-12-12 13:45                   ` Klaus Jensen
  0 siblings, 1 reply; 26+ messages in thread
From: Guenter Roeck @ 2022-12-12 13:39 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Jinhao Fan, qemu-devel, kbusch

On 12/12/22 01:58, Klaus Jensen wrote:
> On Dec  8 12:39, Guenter Roeck wrote:
>> On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
>>> On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
>>>>>
>>>>> A cq head doorbell mmio is skipped... And it is not the fault of the
>>>>> kernel. The kernel is in it's good right to skip the mmio since the cq
>>>>> eventidx is not properly updated.
>>>>>
>>>>> Adding that and it boots properly on riscv. But I'm perplexed as to why
>>>>> this didnt show up on our regularly tested platforms.
>>>>>
>>>>> Gonna try to get this in for 7.2!
>>>>
>>>> I see another problem with sparc64.
>>>>
>>>> [    5.261508] could not locate request for tag 0x0
>>>> [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
>>>>
>>>> That is seen repeatedly until the request times out. I'll test with
>>>> your patch to see if it resolves this problem as well, and will bisect
>>>> otherwise.
>>>>
>>> The second problem is unrelated to the doorbell problem.
>>> It is first seen in qemu v7.1. I'll try to bisect.
>>>
>>
>> Unfortunately, the problem observed with sparc64 also bisects to this
>> patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
>> does not fix it (which is why I initially thought it was unrelated).
>>
>> I used the following qemu command line.
>>
>> qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
>>      -device nvme,serial=foo,drive=d0,bus=pciB \
>>      -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>>      -kernel arch/sparc/boot/image -no-reboot \
>>      -append "root=/dev/nvme0n1 console=ttyS0" \
>>      -nographic -monitor none
>>
> 
> Hi Guenter,
> 
> Thank you very much for the detailed reports and I apologize for the
> fallout of this.
> 
> I think this is related to missing endian conversions when handling the
> shadow doorbells. I'm not sure if there is a kernel issue here as well,
> because as far as I can tell, the shadow doorbells are updated like so:
> 
>    old_value = *dbbuf_db;
>    *dbbuf_db = value;
> 
> (where `value` is the new head/tail value depending on if this is an sq
> or cq).
> 
> Keith, is the kernel doing something magically here I am not aware of,
> or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well?

Wouldn't that mean that nvme doorbell support would be broken in Linux
on all big endian systems ? Maybe it is, but it seems a bit unlikely.

Thanks,
Guenter



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-12 13:39                 ` Guenter Roeck
@ 2022-12-12 13:45                   ` Klaus Jensen
  2022-12-12 14:27                     ` Guenter Roeck
  0 siblings, 1 reply; 26+ messages in thread
From: Klaus Jensen @ 2022-12-12 13:45 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jinhao Fan, qemu-devel, kbusch

On Dec 12 05:39, Guenter Roeck wrote:
> On 12/12/22 01:58, Klaus Jensen wrote:
> > On Dec  8 12:39, Guenter Roeck wrote:
> > > On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > > > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > > > > 
> > > > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > > > > eventidx is not properly updated.
> > > > > > 
> > > > > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > > > > this didnt show up on our regularly tested platforms.
> > > > > > 
> > > > > > Gonna try to get this in for 7.2!
> > > > > 
> > > > > I see another problem with sparc64.
> > > > > 
> > > > > [    5.261508] could not locate request for tag 0x0
> > > > > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > > > > 
> > > > > That is seen repeatedly until the request times out. I'll test with
> > > > > your patch to see if it resolves this problem as well, and will bisect
> > > > > otherwise.
> > > > > 
> > > > The second problem is unrelated to the doorbell problem.
> > > > It is first seen in qemu v7.1. I'll try to bisect.
> > > > 
> > > 
> > > Unfortunately, the problem observed with sparc64 also bisects to this
> > > patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> > > does not fix it (which is why I initially thought it was unrelated).
> > > 
> > > I used the following qemu command line.
> > > 
> > > qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
> > >      -device nvme,serial=foo,drive=d0,bus=pciB \
> > >      -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > >      -kernel arch/sparc/boot/image -no-reboot \
> > >      -append "root=/dev/nvme0n1 console=ttyS0" \
> > >      -nographic -monitor none
> > > 
> > 
> > Hi Guenter,
> > 
> > Thank you very much for the detailed reports and I apologize for the
> > fallout of this.
> > 
> > I think this is related to missing endian conversions when handling the
> > shadow doorbells. I'm not sure if there is a kernel issue here as well,
> > because as far as I can tell, the shadow doorbells are updated like so:
> > 
> >    old_value = *dbbuf_db;
> >    *dbbuf_db = value;
> > 
> > (where `value` is the new head/tail value depending on if this is an sq
> > or cq).
> > 
> > Keith, is the kernel doing something magically here I am not aware of,
> > or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well?
> 
> Wouldn't that mean that nvme doorbell support would be broken in Linux
> on all big endian systems ? Maybe it is, but it seems a bit unlikely.
> 

No, not broken in general - only for shadow doorbells. On regular MMIO,
the Linux helpers take care of endianness (and so do the MMIO callbacks
in QEMU). However, for shadow doorbells, the doorbell value is written
to host memory - and Linux (and QEMU) does not handle that correctly
wrt. endianness.

I got it all working with the series I just posted for QEMU (v4) and
fixing endianness conversion for the above in the kernel (patch pending
for linux-nvme).
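
To give an idea of the device side, every access to the guest-resident
buffers needs a conversion; a minimal sketch using the generic
pci_dma_read()/pci_dma_write() helpers (the function names here are
made up, and the actual v4 series may structure this differently):

  /* sketch: load a shadow doorbell value the guest wrote into its buffer */
  static uint32_t nvme_shadow_db_load(NvmeCtrl *n, hwaddr addr)
  {
      uint32_t v = 0;

      pci_dma_read(PCI_DEVICE(n), addr, &v, sizeof(v));
      return le32_to_cpu(v);   /* buffer contents are little-endian per spec */
  }

  /* sketch: publish a new eventidx value into the guest buffer */
  static void nvme_eventidx_store(NvmeCtrl *n, hwaddr addr, uint32_t value)
  {
      value = cpu_to_le32(value);
      pci_dma_write(PCI_DEVICE(n), addr, &value, sizeof(value));
  }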

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-12 13:45                   ` Klaus Jensen
@ 2022-12-12 14:27                     ` Guenter Roeck
  2022-12-12 15:18                       ` Klaus Jensen
  0 siblings, 1 reply; 26+ messages in thread
From: Guenter Roeck @ 2022-12-12 14:27 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Jinhao Fan, qemu-devel, kbusch

On 12/12/22 05:45, Klaus Jensen wrote:
> On Dec 12 05:39, Guenter Roeck wrote:
>> On 12/12/22 01:58, Klaus Jensen wrote:
>>> On Dec  8 12:39, Guenter Roeck wrote:
>>>> On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
>>>>> On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
>>>>>>>
>>>>>>> A cq head doorbell mmio is skipped... And it is not the fault of the
>>>>>>> kernel. The kernel is in it's good right to skip the mmio since the cq
>>>>>>> eventidx is not properly updated.
>>>>>>>
>>>>>>> Adding that and it boots properly on riscv. But I'm perplexed as to why
>>>>>>> this didnt show up on our regularly tested platforms.
>>>>>>>
>>>>>>> Gonna try to get this in for 7.2!
>>>>>>
>>>>>> I see another problem with sparc64.
>>>>>>
>>>>>> [    5.261508] could not locate request for tag 0x0
>>>>>> [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
>>>>>>
>>>>>> That is seen repeatedly until the request times out. I'll test with
>>>>>> your patch to see if it resolves this problem as well, and will bisect
>>>>>> otherwise.
>>>>>>
>>>>> The second problem is unrelated to the doorbell problem.
>>>>> It is first seen in qemu v7.1. I'll try to bisect.
>>>>>
>>>>
>>>> Unfortunately, the problem observed with sparc64 also bisects to this
>>>> patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
>>>> does not fix it (which is why I initially thought it was unrelated).
>>>>
>>>> I used the following qemu command line.
>>>>
>>>> qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
>>>>       -device nvme,serial=foo,drive=d0,bus=pciB \
>>>>       -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>>>>       -kernel arch/sparc/boot/image -no-reboot \
>>>>       -append "root=/dev/nvme0n1 console=ttyS0" \
>>>>       -nographic -monitor none
>>>>
>>>
>>> Hi Guenter,
>>>
>>> Thank you very much for the detailed reports and I apologize for the
>>> fallout of this.
>>>
>>> I think this is related to missing endian conversions when handling the
>>> shadow doorbells. I'm not sure if there is a kernel issue here as well,
>>> because as far as I can tell, the shadow doorbells are updated like so:
>>>
>>>     old_value = *dbbuf_db;
>>>     *dbbuf_db = value;
>>>
>>> (where `value` is the new head/tail value depending on if this is an sq
>>> or cq).
>>>
>>> Keith, is the kernel doing something magically here I am not aware of,
>>> or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well?
>>
>> Wouldn't that mean that nvme doorbell support would be broken in Linux
>> on all big endian systems ? Maybe it is, but it seems a bit unlikely.
>>
> 
> No, not broken in general - only for shadow doorbells. On regular MMIO,
> the linux helpers takes care of endianness (and so does the MMIO
> callbacks in QEMU). However, for shadow doorbells, the doorbell value is
> written to host memory - and Linux (and QEMU) does not handle that
> correctly wrt. endianness.
> 
> I got it all working with the series I just posted for QEMU (v4) and
> fixing endianness conversion for the above in the kernel (patch pending
> for linux-nvme).

Hmm, interesting. I guess I'll wait for the Linux patch to be posted.

That makes me wonder, though: Are the Linux changes really necessary ?
If this never worked, would it be possible to adjust the qemu code
in a way that it just works with the existing Linux code ?

Alternatively, would it be possible to add a runtime flag to qemu
that would let me disable shadow doorbell support ? I am testing
all Linux kernel branches, and without such a flag I'd have to carry
a patch just disabling it in qemu until all kernel branches are fixed
(if that ever happens).

Thanks,
Guenter



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support
  2022-12-12 14:27                     ` Guenter Roeck
@ 2022-12-12 15:18                       ` Klaus Jensen
  0 siblings, 0 replies; 26+ messages in thread
From: Klaus Jensen @ 2022-12-12 15:18 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jinhao Fan, qemu-devel, kbusch

On Dec 12 06:27, Guenter Roeck wrote:
> On 12/12/22 05:45, Klaus Jensen wrote:
> > On Dec 12 05:39, Guenter Roeck wrote:
> > > On 12/12/22 01:58, Klaus Jensen wrote:
> > > > On Dec  8 12:39, Guenter Roeck wrote:
> > > > > On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > > > > > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > > > > > > 
> > > > > > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > > > > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > > > > > > eventidx is not properly updated.
> > > > > > > > 
> > > > > > > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > > > > > > this didnt show up on our regularly tested platforms.
> > > > > > > > 
> > > > > > > > Gonna try to get this in for 7.2!
> > > > > > > 
> > > > > > > I see another problem with sparc64.
> > > > > > > 
> > > > > > > [    5.261508] could not locate request for tag 0x0
> > > > > > > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > > > > > > 
> > > > > > > That is seen repeatedly until the request times out. I'll test with
> > > > > > > your patch to see if it resolves this problem as well, and will bisect
> > > > > > > otherwise.
> > > > > > > 
> > > > > > The second problem is unrelated to the doorbell problem.
> > > > > > It is first seen in qemu v7.1. I'll try to bisect.
> > > > > > 
> > > > > 
> > > > > Unfortunately, the problem observed with sparc64 also bisects to this
> > > > > patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> > > > > does not fix it (which is why I initially thought it was unrelated).
> > > > > 
> > > > > I used the following qemu command line.
> > > > > 
> > > > > qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
> > > > >       -device nvme,serial=foo,drive=d0,bus=pciB \
> > > > >       -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > > > >       -kernel arch/sparc/boot/image -no-reboot \
> > > > >       -append "root=/dev/nvme0n1 console=ttyS0" \
> > > > >       -nographic -monitor none
> > > > > 
> > > > 
> > > > Hi Guenter,
> > > > 
> > > > Thank you very much for the detailed reports and I apologize for the
> > > > fallout of this.
> > > > 
> > > > I think this is related to missing endian conversions when handling the
> > > > shadow doorbells. I'm not sure if there is a kernel issue here as well,
> > > > because as far as I can tell, the shadow doorbells are updated like so:
> > > > 
> > > >     old_value = *dbbuf_db;
> > > >     *dbbuf_db = value;
> > > > 
> > > > (where `value` is the new head/tail value depending on if this is an sq
> > > > or cq).
> > > > 
> > > > Keith, is the kernel doing something magically here I am not aware of,
> > > > or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well?
> > > 
> > > Wouldn't that mean that nvme doorbell support would be broken in Linux
> > > on all big endian systems ? Maybe it is, but it seems a bit unlikely.
> > > 
> > 
> > No, not broken in general - only for shadow doorbells. On regular MMIO,
> > the linux helpers takes care of endianness (and so does the MMIO
> > callbacks in QEMU). However, for shadow doorbells, the doorbell value is
> > written to host memory - and Linux (and QEMU) does not handle that
> > correctly wrt. endianness.
> > 
> > I got it all working with the series I just posted for QEMU (v4) and
> > fixing endianness conversion for the above in the kernel (patch pending
> > for linux-nvme).
> 
> Hmm, interesting. I guess I'll wait for the Linux patch to be posted.
> 
> That makes me wonder, though: Are the Linux changes really necessary ?
> If this never worked, would it be possible to adjust the qemu code
> in a way that it just works with the existing Linux code ?
> 

That would mean that hw/nvme would be non-compliant on big-endian and
I'd rather not leave it in that state.

> Alternatively, would it be possible to add a runtime flag to qemu
> that would let me disable shadow doorbell support ? I am testing
> all Linux kernel branches, and without such a flag I'd have to carry
> a patch just disabling it in qemu until all kernel branches are fixed
> (if that ever happens).
> 

YES. The fallout of this has been bad, so I'll cook up a patch to
disable shadow doorbells at runtime.
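
Rough shape of what I have in mind (the property name and the params
field are placeholders, the actual patch may differ):

  /* sketch: boolean property to gate Doorbell Buffer Config support */
  static Property nvme_props_sketch[] = {
      DEFINE_PROP_BOOL("shadow-doorbells", NvmeCtrl, params.dbbuf_enabled, true),
      DEFINE_PROP_END_OF_LIST(),
  };

When the property is off, the controller would also stop advertising
Doorbell Buffer Config support in OACS and fail the admin command, so
your test setups could simply pass something like
-device nvme,serial=foo,drive=d0,shadow-doorbells=off until the kernel
branches are sorted out.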

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-12-12 15:19 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16 12:34 [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Jinhao Fan
2022-06-16 12:34 ` [PATCH v3 1/2] hw/nvme: Implement " Jinhao Fan
2022-06-27 19:17   ` Keith Busch
2022-06-27 19:33     ` Klaus Jensen
2022-06-28  0:29       ` Jinhao Fan
2022-12-07 17:49   ` Guenter Roeck
2022-12-08  7:16     ` Klaus Jensen
2022-12-08  8:08       ` Klaus Jensen
2022-12-08 18:47         ` Guenter Roeck
2022-12-08 20:13           ` Guenter Roeck
2022-12-08 20:28             ` Keith Busch
2022-12-08 21:16               ` Guenter Roeck
2022-12-08 20:39             ` Guenter Roeck
2022-12-09 11:00               ` Guenter Roeck
2022-12-12  9:58               ` Klaus Jensen
2022-12-12 13:39                 ` Guenter Roeck
2022-12-12 13:45                   ` Klaus Jensen
2022-12-12 14:27                     ` Guenter Roeck
2022-12-12 15:18                       ` Klaus Jensen
2022-06-16 12:34 ` [PATCH v3 2/2] hw/nvme: Add trace events for shadow doorbell buffer Jinhao Fan
2022-06-17 11:54 ` [PATCH v3 0/2] hw/nvme: Add shadow doorbell buffer support Klaus Jensen
2022-06-17 12:47   ` Jinhao Fan
2022-06-17 12:56     ` Klaus Jensen
2022-06-17 13:02       ` Jinhao Fan
2022-06-17 20:35 ` Keith Busch
2022-06-27  6:00 ` Klaus Jensen
