qemu-devel.nongnu.org archive mirror
* [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues
@ 2020-07-04 21:30 Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 01/16] block/nvme: Replace magic value by SCALE_MS definition Philippe Mathieu-Daudé
                   ` (15 more replies)
  0 siblings, 16 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

Hi,

This series is mostly code rearrangement (cleanups) to split the
hardware code from the block driver code, so that we can use
multiple queues on the same hardware, or multiple block drivers
on the same hardware.

Missing review: 5, 6, 14, 15 and 16.

Since v2:
- addressed stefanha's review comments
- added 4 trivial patches (to simplify the last one)
- registered an IRQ notifier for each queuepair (admin and IO)

Since v1:
- rebased
- use SCALE_MS definition
- added Stefan's R-b
- addressed Stefan's review comments
  - use union { NvmeIdCtrl / NvmeIdNs }
  - move irq_notifier to NVMeQueuePair
  - removed patches depending on "a traceable hardware state
    object instead of BDRVNVMeState".

Please review,

Phil.

$ git backport-diff -u v2
Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/16:[----] [--] 'block/nvme: Replace magic value by SCALE_MS definition'
002/16:[----] [--] 'block/nvme: Avoid further processing if trace event not enabled'
003/16:[----] [--] 'block/nvme: Let nvme_create_queue_pair() fail gracefully'
004/16:[----] [--] 'block/nvme: Define QUEUE_INDEX macros to ease code review'
005/16:[down] 'block/nvme: Improve error message when IO queue creation failed'
006/16:[down] 'block/nvme: Use common error path in nvme_add_io_queue()'
007/16:[----] [--] 'block/nvme: Rename local variable'
008/16:[----] [--] 'block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures'
009/16:[----] [--] 'block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset'
010/16:[----] [--] 'block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz)'
011/16:[----] [--] 'block/nvme: Simplify nvme_init_queue() arguments'
012/16:[----] [--] 'block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE'
013/16:[----] [--] 'block/nvme: Simplify nvme_create_queue_pair() arguments'
014/16:[down] 'block/nvme: Extract nvme_poll_queue()'
015/16:[down] 'block/nvme: Move nvme_poll_cb() earlier'
016/16:[0039] [FC] 'block/nvme: Use per-queue AIO context'

Philippe Mathieu-Daudé (16):
  block/nvme: Replace magic value by SCALE_MS definition
  block/nvme: Avoid further processing if trace event not enabled
  block/nvme: Let nvme_create_queue_pair() fail gracefully
  block/nvme: Define QUEUE_INDEX macros to ease code review
  block/nvme: Improve error message when IO queue creation failed
  block/nvme: Use common error path in nvme_add_io_queue()
  block/nvme: Rename local variable
  block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures
  block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset
  block/nvme: Replace qemu_try_blockalign(bs) by
    qemu_try_memalign(pg_sz)
  block/nvme: Simplify nvme_init_queue() arguments
  block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE
  block/nvme: Simplify nvme_create_queue_pair() arguments
  block/nvme: Extract nvme_poll_queue()
  block/nvme: Move nvme_poll_cb() earlier
  block/nvme: Use per-queuepair IRQ notifier and AIO context

 block/nvme.c | 268 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 148 insertions(+), 120 deletions(-)

-- 
2.21.3



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 01/16] block/nvme: Replace magic value by SCALE_MS definition
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 02/16] block/nvme: Avoid further processing if trace event not enabled Philippe Mathieu-Daudé
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

Use the self-explanatory SCALE_MS definition instead of a magic value.
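
For context, the SCALE_* constants count nanoseconds per time unit,
so the arithmetic is unchanged (values assumed from QEMU's
include/qemu/timer.h):

  #define SCALE_MS 1000000    /* assumed: nanoseconds per millisecond */
  #define SCALE_US 1000       /* assumed: nanoseconds per microsecond */
  #define SCALE_NS 1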

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 374e268915..2f5e3c2adf 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -715,7 +715,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     /* Reset device to get a clean state. */
     s->regs->cc = cpu_to_le32(le32_to_cpu(s->regs->cc) & 0xFE);
     /* Wait for CSTS.RDY = 0. */
-    deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * 1000000ULL;
+    deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * SCALE_MS;
     while (le32_to_cpu(s->regs->csts) & 0x1) {
         if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
             error_setg(errp, "Timeout while waiting for device to reset (%"
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 02/16] block/nvme: Avoid further processing if trace event not enabled
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 01/16] block/nvme: Replace magic value by SCALE_MS definition Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 03/16] block/nvme: Let nvme_create_queue_pair() fail gracefully Philippe Mathieu-Daudé
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is
not enabled. This is an untested attempt at performance optimization.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index 2f5e3c2adf..8c30a5fee2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -441,6 +441,9 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 {
     int i;
 
+    if (!trace_event_get_state_backends(TRACE_NVME_SUBMIT_COMMAND_RAW)) {
+        return;
+    }
     for (i = 0; i < 8; ++i) {
         uint8_t *cmdp = (uint8_t *)cmd + i * 8;
         trace_nvme_submit_command_raw(cmdp[0], cmdp[1], cmdp[2], cmdp[3],
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 03/16] block/nvme: Let nvme_create_queue_pair() fail gracefully
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 01/16] block/nvme: Replace magic value by SCALE_MS definition Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 02/16] block/nvme: Avoid further processing if trace event not enabled Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 04/16] block/nvme: Define QUEUE_INDEX macros to ease code review Philippe Mathieu-Daudé
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

As nvme_create_queue_pair() is allowed to fail, replace the
alloc() calls by try_alloc() to avoid aborting QEMU.
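
For reference, the behavioral difference being relied on (standard
GLib semantics, shown here only as a reminder):

  q = g_new0(NVMeQueuePair, 1);      /* aborts QEMU on allocation failure */
  q = g_try_new0(NVMeQueuePair, 1);  /* returns NULL, caller handles failure */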

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 8c30a5fee2..e1893b4e79 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -213,14 +213,22 @@ static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
     int i, r;
     BDRVNVMeState *s = bs->opaque;
     Error *local_err = NULL;
-    NVMeQueuePair *q = g_new0(NVMeQueuePair, 1);
+    NVMeQueuePair *q;
     uint64_t prp_list_iova;
 
+    q = g_try_new0(NVMeQueuePair, 1);
+    if (!q) {
+        return NULL;
+    }
+    q->prp_list_pages = qemu_try_blockalign0(bs,
+                                          s->page_size * NVME_QUEUE_SIZE);
+    if (!q->prp_list_pages) {
+        goto fail;
+    }
     qemu_mutex_init(&q->lock);
     q->s = s;
     q->index = idx;
     qemu_co_queue_init(&q->free_req_queue);
-    q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_NUM_REQS);
     q->completion_bh = aio_bh_new(bdrv_get_aio_context(bs),
                                   nvme_process_completion_bh, q);
     r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 04/16] block/nvme: Define QUEUE_INDEX macros to ease code review
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (2 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 03/16] block/nvme: Let nvme_create_queue_pair() fail gracefully Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 05/16] block/nvme: Improve error message when IO queue creation failed Philippe Mathieu-Daudé
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

Use definitions instead of the magic '0' and '1' indexes. This
will also be useful when we use multiple queues later.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index e1893b4e79..28762d7ee8 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -103,6 +103,9 @@ typedef volatile struct {
 
 QEMU_BUILD_BUG_ON(offsetof(NVMeRegs, doorbells) != 0x1000);
 
+#define QUEUE_INDEX_ADMIN   0
+#define QUEUE_INDEX_IO(n)   (1 + n)
+
 struct BDRVNVMeState {
     AioContext *aio_context;
     QEMUVFIOState *vfio;
@@ -531,7 +534,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     }
     cmd.prp1 = cpu_to_le64(iova);
 
-    if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
+    if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to identify controller");
         goto out;
     }
@@ -555,7 +558,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
 
     cmd.cdw10 = 0;
     cmd.nsid = cpu_to_le32(namespace);
-    if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
+    if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to identify namespace");
         goto out;
     }
@@ -644,7 +647,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
         .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
         .cdw11 = cpu_to_le32(0x3),
     };
-    if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
+    if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to create io queue [%d]", n);
         nvme_free_queue_pair(q);
         return false;
@@ -655,7 +658,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
         .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
         .cdw11 = cpu_to_le32(0x1 | (n << 16)),
     };
-    if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
+    if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to create io queue [%d]", n);
         nvme_free_queue_pair(q);
         return false;
@@ -739,16 +742,18 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 
     /* Set up admin queue. */
     s->queues = g_new(NVMeQueuePair *, 1);
-    s->queues[0] = nvme_create_queue_pair(bs, 0, NVME_QUEUE_SIZE, errp);
-    if (!s->queues[0]) {
+    s->queues[QUEUE_INDEX_ADMIN] = nvme_create_queue_pair(bs, 0,
+                                                          NVME_QUEUE_SIZE,
+                                                          errp);
+    if (!s->queues[QUEUE_INDEX_ADMIN]) {
         ret = -EINVAL;
         goto out;
     }
     s->nr_queues = 1;
     QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
     s->regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
-    s->regs->asq = cpu_to_le64(s->queues[0]->sq.iova);
-    s->regs->acq = cpu_to_le64(s->queues[0]->cq.iova);
+    s->regs->asq = cpu_to_le64(s->queues[QUEUE_INDEX_ADMIN]->sq.iova);
+    s->regs->acq = cpu_to_le64(s->queues[QUEUE_INDEX_ADMIN]->cq.iova);
 
     /* After setting up all control registers we can enable device now. */
     s->regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
@@ -839,7 +844,7 @@ static int nvme_enable_disable_write_cache(BlockDriverState *bs, bool enable,
         .cdw11 = cpu_to_le32(enable ? 0x01 : 0x00),
     };
 
-    ret = nvme_cmd_sync(bs, s->queues[0], &cmd);
+    ret = nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd);
     if (ret) {
         error_setg(errp, "Failed to configure NVMe write cache");
     }
@@ -1056,7 +1061,7 @@ static coroutine_fn int nvme_co_prw_aligned(BlockDriverState *bs,
 {
     int r;
     BDRVNVMeState *s = bs->opaque;
-    NVMeQueuePair *ioq = s->queues[1];
+    NVMeQueuePair *ioq = s->queues[QUEUE_INDEX_IO(0)];
     NVMeRequest *req;
 
     uint32_t cdw12 = (((bytes >> s->blkshift) - 1) & 0xFFFF) |
@@ -1171,7 +1176,7 @@ static coroutine_fn int nvme_co_pwritev(BlockDriverState *bs,
 static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
 {
     BDRVNVMeState *s = bs->opaque;
-    NVMeQueuePair *ioq = s->queues[1];
+    NVMeQueuePair *ioq = s->queues[QUEUE_INDEX_IO(0)];
     NVMeRequest *req;
     NvmeCmd cmd = {
         .opcode = NVME_CMD_FLUSH,
@@ -1202,7 +1207,7 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
                                               BdrvRequestFlags flags)
 {
     BDRVNVMeState *s = bs->opaque;
-    NVMeQueuePair *ioq = s->queues[1];
+    NVMeQueuePair *ioq = s->queues[QUEUE_INDEX_IO(0)];
     NVMeRequest *req;
 
     uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
@@ -1255,7 +1260,7 @@ static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
                                          int bytes)
 {
     BDRVNVMeState *s = bs->opaque;
-    NVMeQueuePair *ioq = s->queues[1];
+    NVMeQueuePair *ioq = s->queues[QUEUE_INDEX_IO(0)];
     NVMeRequest *req;
     NvmeDsmRange *buf;
     QEMUIOVector local_qiov;
@@ -1398,7 +1403,7 @@ static void nvme_aio_unplug(BlockDriverState *bs)
     BDRVNVMeState *s = bs->opaque;
     assert(s->plugged);
     s->plugged = false;
-    for (i = 1; i < s->nr_queues; i++) {
+    for (i = QUEUE_INDEX_IO(0); i < s->nr_queues; i++) {
         NVMeQueuePair *q = s->queues[i];
         qemu_mutex_lock(&q->lock);
         nvme_kick(q);
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 05/16] block/nvme: Improve error message when IO queue creation failed
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (3 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 04/16] block/nvme: Define QUEUE_INDEX macros to ease code review Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-06 10:32   ` Stefan Hajnoczi
  2020-07-04 21:30 ` [PATCH v3 06/16] block/nvme: Use common error path in nvme_add_io_queue() Philippe Mathieu-Daudé
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

Do not use the same error message for different failures.
Display a different error depending on whether the CQ or the SQ
creation failed.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 28762d7ee8..5898a2eab9 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -648,7 +648,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
         .cdw11 = cpu_to_le32(0x3),
     };
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
-        error_setg(errp, "Failed to create io queue [%d]", n);
+        error_setg(errp, "Failed to create CQ io queue [%d]", n);
         nvme_free_queue_pair(q);
         return false;
     }
@@ -659,7 +659,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
         .cdw11 = cpu_to_le32(0x1 | (n << 16)),
     };
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
-        error_setg(errp, "Failed to create io queue [%d]", n);
+        error_setg(errp, "Failed to create SQ io queue [%d]", n);
         nvme_free_queue_pair(q);
         return false;
     }
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 06/16] block/nvme: Use common error path in nvme_add_io_queue()
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (4 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 05/16] block/nvme: Improve error message when IO queue creation failed Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-06 11:38   ` Stefan Hajnoczi
  2020-07-04 21:30 ` [PATCH v3 07/16] block/nvme: Rename local variable Philippe Mathieu-Daudé
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

Rearrange nvme_add_io_queue() to use a common error path.
This will prove useful in a few commits, when we add IRQ
notification to the IO queues.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5898a2eab9..7bec52ca35 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -649,8 +649,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
     };
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to create CQ io queue [%d]", n);
-        nvme_free_queue_pair(q);
-        return false;
+        goto out_error;
     }
     cmd = (NvmeCmd) {
         .opcode = NVME_ADM_CMD_CREATE_SQ,
@@ -660,13 +659,15 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
     };
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to create SQ io queue [%d]", n);
-        nvme_free_queue_pair(q);
-        return false;
+        goto out_error;
     }
     s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
     s->queues[n] = q;
     s->nr_queues++;
     return true;
+out_error:
+    nvme_free_queue_pair(q);
+    return false;
 }
 
 static bool nvme_poll_cb(void *opaque)
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 07/16] block/nvme: Rename local variable
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (5 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 06/16] block/nvme: Use common error path in nvme_add_io_queue() Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 08/16] block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures Philippe Mathieu-Daudé
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

We are going to modify the code in the next commit. Renaming
the 'resp' variable to 'id' first makes the next commit easier
to review. No logical changes.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 7bec52ca35..0e4e5ff107 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -510,8 +510,8 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     BDRVNVMeState *s = bs->opaque;
     NvmeIdCtrl *idctrl;
     NvmeIdNs *idns;
+    uint8_t *id;
     NvmeLBAF *lbaf;
-    uint8_t *resp;
     uint16_t oncs;
     int r;
     uint64_t iova;
@@ -520,14 +520,14 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         .cdw10 = cpu_to_le32(0x1),
     };
 
-    resp = qemu_try_blockalign0(bs, sizeof(NvmeIdCtrl));
-    if (!resp) {
+    id = qemu_try_blockalign0(bs, sizeof(NvmeIdCtrl));
+    if (!id) {
         error_setg(errp, "Cannot allocate buffer for identify response");
         goto out;
     }
-    idctrl = (NvmeIdCtrl *)resp;
-    idns = (NvmeIdNs *)resp;
-    r = qemu_vfio_dma_map(s->vfio, resp, sizeof(NvmeIdCtrl), true, &iova);
+    idctrl = (NvmeIdCtrl *)id;
+    idns = (NvmeIdNs *)id;
+    r = qemu_vfio_dma_map(s->vfio, id, sizeof(NvmeIdCtrl), true, &iova);
     if (r) {
         error_setg(errp, "Cannot map buffer for DMA");
         goto out;
@@ -554,8 +554,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     s->supports_write_zeroes = !!(oncs & NVME_ONCS_WRITE_ZEROS);
     s->supports_discard = !!(oncs & NVME_ONCS_DSM);
 
-    memset(resp, 0, 4096);
-
+    memset(id, 0, 4096);
     cmd.cdw10 = 0;
     cmd.nsid = cpu_to_le32(namespace);
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
@@ -587,8 +586,8 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
 
     s->blkshift = lbaf->ds;
 out:
-    qemu_vfio_dma_unmap(s->vfio, resp);
-    qemu_vfree(resp);
+    qemu_vfio_dma_unmap(s->vfio, id);
+    qemu_vfree(id);
 }
 
 static bool nvme_poll_queues(BDRVNVMeState *s)
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 08/16] block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (6 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 07/16] block/nvme: Rename local variable Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 09/16] block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset Philippe Mathieu-Daudé
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

We allocate a single chunk of memory and then use it for two
different structures. By using a union, we make it clear that
the data is overlapping (and we can remove the casts).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 0e4e5ff107..a611fdd71e 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -508,9 +508,10 @@ static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
 static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
 {
     BDRVNVMeState *s = bs->opaque;
-    NvmeIdCtrl *idctrl;
-    NvmeIdNs *idns;
-    uint8_t *id;
+    union {
+        NvmeIdCtrl ctrl;
+        NvmeIdNs ns;
+    } *id;
     NvmeLBAF *lbaf;
     uint16_t oncs;
     int r;
@@ -520,14 +521,12 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         .cdw10 = cpu_to_le32(0x1),
     };
 
-    id = qemu_try_blockalign0(bs, sizeof(NvmeIdCtrl));
+    id = qemu_try_blockalign0(bs, sizeof(*id));
     if (!id) {
         error_setg(errp, "Cannot allocate buffer for identify response");
         goto out;
     }
-    idctrl = (NvmeIdCtrl *)id;
-    idns = (NvmeIdNs *)id;
-    r = qemu_vfio_dma_map(s->vfio, id, sizeof(NvmeIdCtrl), true, &iova);
+    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
     if (r) {
         error_setg(errp, "Cannot map buffer for DMA");
         goto out;
@@ -539,22 +538,22 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         goto out;
     }
 
-    if (le32_to_cpu(idctrl->nn) < namespace) {
+    if (le32_to_cpu(id->ctrl.nn) < namespace) {
         error_setg(errp, "Invalid namespace");
         goto out;
     }
-    s->write_cache_supported = le32_to_cpu(idctrl->vwc) & 0x1;
-    s->max_transfer = (idctrl->mdts ? 1 << idctrl->mdts : 0) * s->page_size;
+    s->write_cache_supported = le32_to_cpu(id->ctrl.vwc) & 0x1;
+    s->max_transfer = (id->ctrl.mdts ? 1 << id->ctrl.mdts : 0) * s->page_size;
     /* For now the page list buffer per command is one page, to hold at most
      * s->page_size / sizeof(uint64_t) entries. */
     s->max_transfer = MIN_NON_ZERO(s->max_transfer,
                           s->page_size / sizeof(uint64_t) * s->page_size);
 
-    oncs = le16_to_cpu(idctrl->oncs);
+    oncs = le16_to_cpu(id->ctrl.oncs);
     s->supports_write_zeroes = !!(oncs & NVME_ONCS_WRITE_ZEROS);
     s->supports_discard = !!(oncs & NVME_ONCS_DSM);
 
-    memset(id, 0, 4096);
+    memset(id, 0, sizeof(*id));
     cmd.cdw10 = 0;
     cmd.nsid = cpu_to_le32(namespace);
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
@@ -562,11 +561,11 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         goto out;
     }
 
-    s->nsze = le64_to_cpu(idns->nsze);
-    lbaf = &idns->lbaf[NVME_ID_NS_FLBAS_INDEX(idns->flbas)];
+    s->nsze = le64_to_cpu(id->ns.nsze);
+    lbaf = &id->ns.lbaf[NVME_ID_NS_FLBAS_INDEX(id->ns.flbas)];
 
-    if (NVME_ID_NS_DLFEAT_WRITE_ZEROES(idns->dlfeat) &&
-            NVME_ID_NS_DLFEAT_READ_BEHAVIOR(idns->dlfeat) ==
+    if (NVME_ID_NS_DLFEAT_WRITE_ZEROES(id->ns.dlfeat) &&
+            NVME_ID_NS_DLFEAT_READ_BEHAVIOR(id->ns.dlfeat) ==
                     NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ZEROES) {
         bs->supported_write_flags |= BDRV_REQ_MAY_UNMAP;
     }
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 09/16] block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (7 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 08/16] block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 10/16] block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) Philippe Mathieu-Daudé
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

In the next commit we'll get rid of qemu_try_blockalign().
To ease review, first replace qemu_try_blockalign0() by explicit
calls to qemu_try_blockalign() and memset().
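
For reference, qemu_try_blockalign0() is essentially the non-zeroing
allocation followed by a memset, so the replacement is
behavior-preserving (a sketch of the assumed helper shape, not a
verbatim copy from the tree):

  static void *try_blockalign0_sketch(BlockDriverState *bs, size_t size)
  {
      void *mem = qemu_try_blockalign(bs, size);

      if (mem) {
          memset(mem, 0, size);
      }
      return mem;
  }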

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index a611fdd71e..9c118c000d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -174,12 +174,12 @@ static void nvme_init_queue(BlockDriverState *bs, NVMeQueue *q,
 
     bytes = ROUND_UP(nentries * entry_bytes, s->page_size);
     q->head = q->tail = 0;
-    q->queue = qemu_try_blockalign0(bs, bytes);
-
+    q->queue = qemu_try_blockalign(bs, bytes);
     if (!q->queue) {
         error_setg(errp, "Cannot allocate queue");
         return;
     }
+    memset(q->queue, 0, bytes);
     r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
     if (r) {
         error_setg(errp, "Cannot map queue");
@@ -223,11 +223,12 @@ static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
     if (!q) {
         return NULL;
     }
-    q->prp_list_pages = qemu_try_blockalign0(bs,
+    q->prp_list_pages = qemu_try_blockalign(bs,
                                           s->page_size * NVME_QUEUE_SIZE);
     if (!q->prp_list_pages) {
         goto fail;
     }
+    memset(q->prp_list_pages, 0, s->page_size * NVME_QUEUE_SIZE);
     qemu_mutex_init(&q->lock);
     q->s = s;
     q->index = idx;
@@ -521,7 +522,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         .cdw10 = cpu_to_le32(0x1),
     };
 
-    id = qemu_try_blockalign0(bs, sizeof(*id));
+    id = qemu_try_blockalign(bs, sizeof(*id));
     if (!id) {
         error_setg(errp, "Cannot allocate buffer for identify response");
         goto out;
@@ -531,8 +532,9 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         error_setg(errp, "Cannot map buffer for DMA");
         goto out;
     }
-    cmd.prp1 = cpu_to_le64(iova);
 
+    memset(id, 0, sizeof(*id));
+    cmd.prp1 = cpu_to_le64(iova);
     if (nvme_cmd_sync(bs, s->queues[QUEUE_INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to identify controller");
         goto out;
@@ -1283,11 +1285,11 @@ static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
 
     assert(s->nr_queues > 1);
 
-    buf = qemu_try_blockalign0(bs, s->page_size);
+    buf = qemu_try_blockalign(bs, s->page_size);
     if (!buf) {
         return -ENOMEM;
     }
-
+    memset(buf, 0, s->page_size);
     buf->nlb = cpu_to_le32(bytes >> s->blkshift);
     buf->slba = cpu_to_le64(offset >> s->blkshift);
     buf->cattr = 0;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 10/16] block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz)
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (8 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 09/16] block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 11/16] block/nvme: Simplify nvme_init_queue() arguments Philippe Mathieu-Daudé
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

qemu_try_blockalign() is a generic API that calls back into the
block driver to return its page alignment. As we call it from
within the very same driver, we already know the page alignment
stored in our state. Remove the indirection and use the value
from BDRVNVMeState.
This change is required to later remove the BlockDriverState
argument, to make nvme_init_queue() per hardware, and not per
block driver.
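
The indirection being removed looks roughly like this (a sketch under
the assumption that the generic helper resolves the alignment through
the block driver):

  static void *try_blockalign_sketch(BlockDriverState *bs, size_t size)
  {
      /* bdrv_opt_mem_align(bs) queries the driver for its alignment */
      return qemu_try_memalign(bdrv_opt_mem_align(bs), size);
  }

Since s->page_size already satisfies the device's alignment
requirement, the driver can call qemu_try_memalign(s->page_size, ...)
directly.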

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 9c118c000d..9566001ba6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -174,7 +174,7 @@ static void nvme_init_queue(BlockDriverState *bs, NVMeQueue *q,
 
     bytes = ROUND_UP(nentries * entry_bytes, s->page_size);
     q->head = q->tail = 0;
-    q->queue = qemu_try_blockalign(bs, bytes);
+    q->queue = qemu_try_memalign(s->page_size, bytes);
     if (!q->queue) {
         error_setg(errp, "Cannot allocate queue");
         return;
@@ -223,7 +223,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
     if (!q) {
         return NULL;
     }
-    q->prp_list_pages = qemu_try_blockalign(bs,
+    q->prp_list_pages = qemu_try_memalign(s->page_size,
                                           s->page_size * NVME_QUEUE_SIZE);
     if (!q->prp_list_pages) {
         goto fail;
@@ -522,7 +522,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         .cdw10 = cpu_to_le32(0x1),
     };
 
-    id = qemu_try_blockalign(bs, sizeof(*id));
+    id = qemu_try_memalign(s->page_size, sizeof(*id));
     if (!id) {
         error_setg(errp, "Cannot allocate buffer for identify response");
         goto out;
@@ -1141,7 +1141,7 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
     }
     trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
-    buf = qemu_try_blockalign(bs, bytes);
+    buf = qemu_try_memalign(s->page_size, bytes);
 
     if (!buf) {
         return -ENOMEM;
@@ -1285,7 +1285,7 @@ static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
 
     assert(s->nr_queues > 1);
 
-    buf = qemu_try_blockalign(bs, s->page_size);
+    buf = qemu_try_memalign(s->page_size, s->page_size);
     if (!buf) {
         return -ENOMEM;
     }
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 11/16] block/nvme: Simplify nvme_init_queue() arguments
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (9 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 10/16] block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 12/16] block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE Philippe Mathieu-Daudé
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

nvme_init_queue() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState to simplify.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 9566001ba6..97a63be9d8 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -165,10 +165,9 @@ static QemuOptsList runtime_opts = {
     },
 };
 
-static void nvme_init_queue(BlockDriverState *bs, NVMeQueue *q,
+static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
                             int nentries, int entry_bytes, Error **errp)
 {
-    BDRVNVMeState *s = bs->opaque;
     size_t bytes;
     int r;
 
@@ -251,14 +250,14 @@ static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
         req->prp_list_iova = prp_list_iova + i * s->page_size;
     }
 
-    nvme_init_queue(bs, &q->sq, size, NVME_SQ_ENTRY_BYTES, &local_err);
+    nvme_init_queue(s, &q->sq, size, NVME_SQ_ENTRY_BYTES, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto fail;
     }
     q->sq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale];
 
-    nvme_init_queue(bs, &q->cq, size, NVME_CQ_ENTRY_BYTES, &local_err);
+    nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto fail;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 12/16] block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (10 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 11/16] block/nvme: Simplify nvme_init_queue() arguments Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 13/16] block/nvme: Simplify nvme_create_queue_pair() arguments Philippe Mathieu-Daudé
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

BDRV_POLL_WHILE() is defined as:

  #define BDRV_POLL_WHILE(bs, cond) ({          \
      BlockDriverState *bs_ = (bs);             \
      AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \
                     cond); })

As we will remove the BlockDriverState use in the next commit,
start by using the exploded version of BDRV_POLL_WHILE().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 97a63be9d8..b2fc3c300a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -493,6 +493,7 @@ static void nvme_cmd_sync_cb(void *opaque, int ret)
 static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
                          NvmeCmd *cmd)
 {
+    AioContext *aio_context = bdrv_get_aio_context(bs);
     NVMeRequest *req;
     int ret = -EINPROGRESS;
     req = nvme_get_free_req(q);
@@ -501,7 +502,7 @@ static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
     }
     nvme_submit_command(q, req, cmd, nvme_cmd_sync_cb, &ret);
 
-    BDRV_POLL_WHILE(bs, ret == -EINPROGRESS);
+    AIO_WAIT_WHILE(aio_context, ret == -EINPROGRESS);
     return ret;
 }
 
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 13/16] block/nvme: Simplify nvme_create_queue_pair() arguments
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (11 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 12/16] block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 14/16] block/nvme: Extract nvme_poll_queue() Philippe Mathieu-Daudé
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

nvme_create_queue_pair() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState and AioContext to simplify.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index b2fc3c300a..51ac36dc4f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -208,12 +208,12 @@ static void nvme_free_req_queue_cb(void *opaque)
     qemu_mutex_unlock(&q->lock);
 }
 
-static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
+static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
+                                             AioContext *aio_context,
                                              int idx, int size,
                                              Error **errp)
 {
     int i, r;
-    BDRVNVMeState *s = bs->opaque;
     Error *local_err = NULL;
     NVMeQueuePair *q;
     uint64_t prp_list_iova;
@@ -232,8 +232,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
     q->s = s;
     q->index = idx;
     qemu_co_queue_init(&q->free_req_queue);
-    q->completion_bh = aio_bh_new(bdrv_get_aio_context(bs),
-                                  nvme_process_completion_bh, q);
+    q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
     r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
                           s->page_size * NVME_NUM_REQS,
                           false, &prp_list_iova);
@@ -637,7 +636,8 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
     NvmeCmd cmd;
     int queue_size = NVME_QUEUE_SIZE;
 
-    q = nvme_create_queue_pair(bs, n, queue_size, errp);
+    q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
+                               n, queue_size, errp);
     if (!q) {
         return false;
     }
@@ -683,6 +683,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
                      Error **errp)
 {
     BDRVNVMeState *s = bs->opaque;
+    AioContext *aio_context = bdrv_get_aio_context(bs);
     int ret;
     uint64_t cap;
     uint64_t timeout_ms;
@@ -743,7 +744,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 
     /* Set up admin queue. */
     s->queues = g_new(NVMeQueuePair *, 1);
-    s->queues[QUEUE_INDEX_ADMIN] = nvme_create_queue_pair(bs, 0,
+    s->queues[QUEUE_INDEX_ADMIN] = nvme_create_queue_pair(s, aio_context, 0,
                                                           NVME_QUEUE_SIZE,
                                                           errp);
     if (!s->queues[QUEUE_INDEX_ADMIN]) {
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 14/16] block/nvme: Extract nvme_poll_queue()
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (12 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 13/16] block/nvme: Simplify nvme_create_queue_pair() arguments Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-06 11:40   ` Stefan Hajnoczi
  2020-07-04 21:30 ` [PATCH v3 15/16] block/nvme: Move nvme_poll_cb() earlier Philippe Mathieu-Daudé
  2020-07-04 21:30 ` [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context Philippe Mathieu-Daudé
  15 siblings, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

As we want to do per-queue polling, extract the nvme_poll_queue()
method which operates on a single queue.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
Stefan better double check here!
---
 block/nvme.c | 44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 51ac36dc4f..a6ff660ad2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -590,31 +590,41 @@ out:
     qemu_vfree(id);
 }
 
+static bool nvme_poll_queue(NVMeQueuePair *q)
+{
+    bool progress = false;
+
+    const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
+    NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
+
+    /*
+     * Do an early check for completions. q->lock isn't needed because
+     * nvme_process_completion() only runs in the event loop thread and
+     * cannot race with itself.
+     */
+    if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
+        return false;
+    }
+
+    qemu_mutex_lock(&q->lock);
+    while (nvme_process_completion(q)) {
+        /* Keep polling */
+        progress = true;
+    }
+    qemu_mutex_unlock(&q->lock);
+
+    return progress;
+}
+
 static bool nvme_poll_queues(BDRVNVMeState *s)
 {
     bool progress = false;
     int i;
 
     for (i = 0; i < s->nr_queues; i++) {
-        NVMeQueuePair *q = s->queues[i];
-        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
-        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
-
-        /*
-         * Do an early check for completions. q->lock isn't needed because
-         * nvme_process_completion() only runs in the event loop thread and
-         * cannot race with itself.
-         */
-        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
-            continue;
-        }
-
-        qemu_mutex_lock(&q->lock);
-        while (nvme_process_completion(q)) {
-            /* Keep polling */
+        if (nvme_poll_queue(s->queues[i])) {
             progress = true;
         }
-        qemu_mutex_unlock(&q->lock);
     }
     return progress;
 }
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 15/16] block/nvme: Move nvme_poll_cb() earlier
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (13 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 14/16] block/nvme: Extract nvme_poll_queue() Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-06 11:41   ` Stefan Hajnoczi
  2020-07-04 21:30 ` [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context Philippe Mathieu-Daudé
  15 siblings, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

We are going to use this callback in nvme_add_io_queue()
in the next commit. To avoid forward-declaring it, move
it before. No logical change.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index a6ff660ad2..42c0d5284f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -638,6 +638,15 @@ static void nvme_handle_event(EventNotifier *n)
     nvme_poll_queues(s);
 }
 
+static bool nvme_poll_cb(void *opaque)
+{
+    EventNotifier *e = opaque;
+    BDRVNVMeState *s = container_of(e, BDRVNVMeState, irq_notifier);
+
+    trace_nvme_poll_cb(s);
+    return nvme_poll_queues(s);
+}
+
 static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
 {
     BDRVNVMeState *s = bs->opaque;
@@ -680,15 +689,6 @@ out_error:
     return false;
 }
 
-static bool nvme_poll_cb(void *opaque)
-{
-    EventNotifier *e = opaque;
-    BDRVNVMeState *s = container_of(e, BDRVNVMeState, irq_notifier);
-
-    trace_nvme_poll_cb(s);
-    return nvme_poll_queues(s);
-}
-
 static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
                      Error **errp)
 {
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context
  2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
                   ` (14 preceding siblings ...)
  2020-07-04 21:30 ` [PATCH v3 15/16] block/nvme: Move nvme_poll_cb() earlier Philippe Mathieu-Daudé
@ 2020-07-04 21:30 ` Philippe Mathieu-Daudé
  2020-07-06  9:45   ` Philippe Mathieu-Daudé
  2020-07-06 12:04   ` Stefan Hajnoczi
  15 siblings, 2 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-04 21:30 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, Max Reitz,
	Philippe Mathieu-Daudé

To be able to use multiple queues on the same hardware, each
queuepair needs to be able to receive IRQ notifications in the
correct AIO context.

The AIO context and the notification handler have to be specific
to each queue, not to the block driver. Move aio_context and
irq_notifier from BDRVNVMeState to NVMeQueuePair.

Before this patch, only the admin queuepair had an EventNotifier,
and it checked all queues when notified by IRQ.
After this patch, each queuepair (admin or IO) has its own IRQ
notifier hooked up to VFIO.

AioContexts must be identical across all queuepairs and
BlockDriverStates. Although they all have their own AioContext
pointer there is no true support for different AioContexts yet.
(For example, nvme_cmd_sync() is called with a bs argument but
AIO_WAIT_WHILE(q->aio_context, ...) uses the queuepair
aio_context so the assumption is that they match.)

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
v3:
- Add notifier to IO queuepairs
- Reword with Stefan help

I'd like to split this into smaller changes, but I'm not sure
if it is possible...
Maybe move EventNotifier first (keeping aio_context shared),
then move AioContext per queuepair?
---
 block/nvme.c | 102 +++++++++++++++++++++++++--------------------------
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 42c0d5284f..fcf8d93fb2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -60,6 +60,8 @@ typedef struct {
 
 typedef struct {
     QemuMutex   lock;
+    AioContext *aio_context;
+    EventNotifier irq_notifier;
 
     /* Read from I/O code path, initialized under BQL */
     BDRVNVMeState   *s;
@@ -107,7 +109,6 @@ QEMU_BUILD_BUG_ON(offsetof(NVMeRegs, doorbells) != 0x1000);
 #define QUEUE_INDEX_IO(n)   (1 + n)
 
 struct BDRVNVMeState {
-    AioContext *aio_context;
     QEMUVFIOState *vfio;
     NVMeRegs *regs;
     /* The submission/completion queue pairs.
@@ -120,7 +121,6 @@ struct BDRVNVMeState {
     /* How many uint32_t elements does each doorbell entry take. */
     size_t doorbell_scale;
     bool write_cache_supported;
-    EventNotifier irq_notifier;
 
     uint64_t nsze; /* Namespace size reported by identify command */
     int nsid;      /* The namespace id to read/write data. */
@@ -227,11 +227,17 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
     if (!q->prp_list_pages) {
         goto fail;
     }
+    r = event_notifier_init(&q->irq_notifier, 0);
+    if (r) {
+        error_setg(errp, "Failed to init event notifier");
+        goto fail;
+    }
     memset(q->prp_list_pages, 0, s->page_size * NVME_QUEUE_SIZE);
     qemu_mutex_init(&q->lock);
     q->s = s;
     q->index = idx;
     qemu_co_queue_init(&q->free_req_queue);
+    q->aio_context = aio_context;
     q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
     r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
                           s->page_size * NVME_NUM_REQS,
@@ -325,7 +331,7 @@ static void nvme_put_free_req_locked(NVMeQueuePair *q, NVMeRequest *req)
 static void nvme_wake_free_req_locked(NVMeQueuePair *q)
 {
     if (!qemu_co_queue_empty(&q->free_req_queue)) {
-        replay_bh_schedule_oneshot_event(q->s->aio_context,
+        replay_bh_schedule_oneshot_event(q->aio_context,
                 nvme_free_req_queue_cb, q);
     }
 }
@@ -492,7 +498,6 @@ static void nvme_cmd_sync_cb(void *opaque, int ret)
 static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
                          NvmeCmd *cmd)
 {
-    AioContext *aio_context = bdrv_get_aio_context(bs);
     NVMeRequest *req;
     int ret = -EINPROGRESS;
     req = nvme_get_free_req(q);
@@ -501,7 +506,7 @@ static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
     }
     nvme_submit_command(q, req, cmd, nvme_cmd_sync_cb, &ret);
 
-    AIO_WAIT_WHILE(aio_context, ret == -EINPROGRESS);
+    AIO_WAIT_WHILE(q->aio_context, ret == -EINPROGRESS);
     return ret;
 }
 
@@ -616,47 +621,35 @@ static bool nvme_poll_queue(NVMeQueuePair *q)
     return progress;
 }
 
-static bool nvme_poll_queues(BDRVNVMeState *s)
-{
-    bool progress = false;
-    int i;
-
-    for (i = 0; i < s->nr_queues; i++) {
-        if (nvme_poll_queue(s->queues[i])) {
-            progress = true;
-        }
-    }
-    return progress;
-}
-
 static void nvme_handle_event(EventNotifier *n)
 {
-    BDRVNVMeState *s = container_of(n, BDRVNVMeState, irq_notifier);
+    NVMeQueuePair *q = container_of(n, NVMeQueuePair, irq_notifier);
 
-    trace_nvme_handle_event(s);
+    trace_nvme_handle_event(q);
     event_notifier_test_and_clear(n);
-    nvme_poll_queues(s);
+    nvme_poll_queue(q);
 }
 
 static bool nvme_poll_cb(void *opaque)
 {
     EventNotifier *e = opaque;
-    BDRVNVMeState *s = container_of(e, BDRVNVMeState, irq_notifier);
+    NVMeQueuePair *q = container_of(e, NVMeQueuePair, irq_notifier);
 
-    trace_nvme_poll_cb(s);
-    return nvme_poll_queues(s);
+    trace_nvme_poll_cb(q);
+    return nvme_poll_queue(q);
 }
 
-static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
+static bool nvme_add_io_queue(BlockDriverState *bs,
+                              AioContext *aio_context, Error **errp)
 {
     BDRVNVMeState *s = bs->opaque;
     int n = s->nr_queues;
     NVMeQueuePair *q;
     NvmeCmd cmd;
     int queue_size = NVME_QUEUE_SIZE;
+    int ret;
 
-    q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
-                               n, queue_size, errp);
+    q = nvme_create_queue_pair(s, aio_context, n, queue_size, errp);
     if (!q) {
         return false;
     }
@@ -683,6 +676,17 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
     s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
     s->queues[n] = q;
     s->nr_queues++;
+
+    ret = qemu_vfio_pci_init_irq(s->vfio,
+                                 &s->queues[n]->irq_notifier,
+                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
+    if (ret) {
+        goto out_error;
+    }
+    aio_set_event_notifier(aio_context,
+                           &s->queues[n]->irq_notifier,
+                           false, nvme_handle_event, nvme_poll_cb);
+
     return true;
 out_error:
     nvme_free_queue_pair(q);
@@ -704,12 +708,6 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     qemu_co_queue_init(&s->dma_flush_queue);
     s->device = g_strdup(device);
     s->nsid = namespace;
-    s->aio_context = bdrv_get_aio_context(bs);
-    ret = event_notifier_init(&s->irq_notifier, 0);
-    if (ret) {
-        error_setg(errp, "Failed to init event notifier");
-        return ret;
-    }
 
     s->vfio = qemu_vfio_open_pci(device, errp);
     if (!s->vfio) {
@@ -784,12 +782,14 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
         }
     }
 
-    ret = qemu_vfio_pci_init_irq(s->vfio, &s->irq_notifier,
+    ret = qemu_vfio_pci_init_irq(s->vfio,
+                                 &s->queues[QUEUE_INDEX_ADMIN]->irq_notifier,
                                  VFIO_PCI_MSIX_IRQ_INDEX, errp);
     if (ret) {
         goto out;
     }
-    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
+    aio_set_event_notifier(aio_context,
+                           &s->queues[QUEUE_INDEX_ADMIN]->irq_notifier,
                            false, nvme_handle_event, nvme_poll_cb);
 
     nvme_identify(bs, namespace, &local_err);
@@ -800,7 +800,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     }
 
     /* Set up command queues. */
-    if (!nvme_add_io_queue(bs, errp)) {
+    if (!nvme_add_io_queue(bs, aio_context, errp)) {
         ret = -EIO;
     }
 out:
@@ -869,12 +869,14 @@ static void nvme_close(BlockDriverState *bs)
     BDRVNVMeState *s = bs->opaque;
 
     for (i = 0; i < s->nr_queues; ++i) {
-        nvme_free_queue_pair(s->queues[i]);
+        NVMeQueuePair *q = s->queues[i];
+
+        aio_set_event_notifier(q->aio_context,
+                               &q->irq_notifier, false, NULL, NULL);
+        event_notifier_cleanup(&q->irq_notifier);
+        nvme_free_queue_pair(q);
     }
     g_free(s->queues);
-    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
-                           false, NULL, NULL);
-    event_notifier_cleanup(&s->irq_notifier);
     qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE);
     qemu_vfio_close(s->vfio);
 
@@ -1086,7 +1088,7 @@ static coroutine_fn int nvme_co_prw_aligned(BlockDriverState *bs,
         .cdw12 = cpu_to_le32(cdw12),
     };
     NVMeCoData data = {
-        .ctx = bdrv_get_aio_context(bs),
+        .ctx = ioq->aio_context,
         .ret = -EINPROGRESS,
     };
 
@@ -1195,7 +1197,7 @@ static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
         .nsid = cpu_to_le32(s->nsid),
     };
     NVMeCoData data = {
-        .ctx = bdrv_get_aio_context(bs),
+        .ctx = ioq->aio_context,
         .ret = -EINPROGRESS,
     };
 
@@ -1236,7 +1238,7 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
     };
 
     NVMeCoData data = {
-        .ctx = bdrv_get_aio_context(bs),
+        .ctx = ioq->aio_context,
         .ret = -EINPROGRESS,
     };
 
@@ -1286,7 +1288,7 @@ static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
     };
 
     NVMeCoData data = {
-        .ctx = bdrv_get_aio_context(bs),
+        .ctx = ioq->aio_context,
         .ret = -EINPROGRESS,
     };
 
@@ -1379,10 +1381,10 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
 
         qemu_bh_delete(q->completion_bh);
         q->completion_bh = NULL;
-    }
 
-    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
-                           false, NULL, NULL);
+        aio_set_event_notifier(bdrv_get_aio_context(bs), &q->irq_notifier,
+                               false, NULL, NULL);
+    }
 }
 
 static void nvme_attach_aio_context(BlockDriverState *bs,
@@ -1390,13 +1392,11 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 {
     BDRVNVMeState *s = bs->opaque;
 
-    s->aio_context = new_context;
-    aio_set_event_notifier(new_context, &s->irq_notifier,
-                           false, nvme_handle_event, nvme_poll_cb);
-
     for (int i = 0; i < s->nr_queues; i++) {
         NVMeQueuePair *q = s->queues[i];
 
+        aio_set_event_notifier(new_context, &q->irq_notifier,
+                               false, nvme_handle_event, nvme_poll_cb);
         q->completion_bh =
             aio_bh_new(new_context, nvme_process_completion_bh, q);
     }
-- 
2.21.3




* Re: [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context
  2020-07-04 21:30 ` [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context Philippe Mathieu-Daudé
@ 2020-07-06  9:45   ` Philippe Mathieu-Daudé
  2020-07-06 12:04   ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-06  9:45 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Maxim Levitsky, qemu-block, Max Reitz

On 7/4/20 11:30 PM, Philippe Mathieu-Daudé wrote:
> To be able to use multiple queues on the same hardware,
> we need to have each queuepair able to receive IRQ
> notifications in the correct AIO context.
> 
> The AIO context and the notification handler have to be specific
> to each queue, not to the block driver. Move aio_context and
> irq_notifier from BDRVNVMeState to NVMeQueuePair.
> 
> Before this patch, only the admin queuepair had an EventNotifier
> and was checking all queues when notified by IRQ.
> After this patch, each queuepair (admin or io) is hooked with its
> own IRQ notifier up to VFIO.

Hmm, I should also add a note that we currently use only a single IO
queuepair: nvme_add_io_queue() is called once in nvme_init().

Now, after this patch, we should be able to call it more than once...
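
Something like this in nvme_init(), as a rough sketch (untested; the
queue count is an arbitrary example value, and the IRQ setup would
likely also need one MSI-X vector per queuepair):

    /* Sketch only, not in this patch: create several IO queuepairs.
     * A real version would query how many IO queues the controller
     * actually supports instead of hardcoding 4.
     */
    /* Set up command queues. */
    for (unsigned i = 0; i < 4; i++) {
        if (!nvme_add_io_queue(bs, aio_context, errp)) {
            ret = -EIO;
            goto out;
        }
    }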

> 
> AioContexts must be identical across all queuepairs and
> BlockDriverStates. Although they all have their own AioContext
> pointer there is no true support for different AioContexts yet.
> (For example, nvme_cmd_sync() is called with a bs argument but
> AIO_WAIT_WHILE(q->aio_context, ...) uses the queuepair
> aio_context so the assumption is that they match.)
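
Not part of this patch, but if we ever want to make that assumption
explicit, a trivial sketch would be an assert in nvme_cmd_sync():

    static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
                             NvmeCmd *cmd)
    {
        ...
        /* sketch: document the "contexts match" assumption */
        assert(q->aio_context == bdrv_get_aio_context(bs));

        AIO_WAIT_WHILE(q->aio_context, ret == -EINPROGRESS);
        return ret;
    }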
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> v3:
> - Add notifier to IO queuepairs
> - Reword with Stefan help
> 
> I'd like to split this into smaller changes, but I'm not sure
> if it is possible...
> Maybe move EventNotifier first (keeping aio_context shared),
> then move AioContext per queuepair?
> ---
>  block/nvme.c | 102 +++++++++++++++++++++++++--------------------------
>  1 file changed, 51 insertions(+), 51 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 42c0d5284f..fcf8d93fb2 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -60,6 +60,8 @@ typedef struct {
>  
>  typedef struct {
>      QemuMutex   lock;
> +    AioContext *aio_context;
> +    EventNotifier irq_notifier;
>  
>      /* Read from I/O code path, initialized under BQL */
>      BDRVNVMeState   *s;
> @@ -107,7 +109,6 @@ QEMU_BUILD_BUG_ON(offsetof(NVMeRegs, doorbells) != 0x1000);
>  #define QUEUE_INDEX_IO(n)   (1 + n)
>  
>  struct BDRVNVMeState {
> -    AioContext *aio_context;
>      QEMUVFIOState *vfio;
>      NVMeRegs *regs;
>      /* The submission/completion queue pairs.
> @@ -120,7 +121,6 @@ struct BDRVNVMeState {
>      /* How many uint32_t elements does each doorbell entry take. */
>      size_t doorbell_scale;
>      bool write_cache_supported;
> -    EventNotifier irq_notifier;
>  
>      uint64_t nsze; /* Namespace size reported by identify command */
>      int nsid;      /* The namespace id to read/write data. */
> @@ -227,11 +227,17 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
>      if (!q->prp_list_pages) {
>          goto fail;
>      }
> +    r = event_notifier_init(&q->irq_notifier, 0);
> +    if (r) {
> +        error_setg(errp, "Failed to init event notifier");
> +        goto fail;
> +    }
>      memset(q->prp_list_pages, 0, s->page_size * NVME_QUEUE_SIZE);
>      qemu_mutex_init(&q->lock);
>      q->s = s;
>      q->index = idx;
>      qemu_co_queue_init(&q->free_req_queue);
> +    q->aio_context = aio_context;
>      q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
>      r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>                            s->page_size * NVME_NUM_REQS,
> @@ -325,7 +331,7 @@ static void nvme_put_free_req_locked(NVMeQueuePair *q, NVMeRequest *req)
>  static void nvme_wake_free_req_locked(NVMeQueuePair *q)
>  {
>      if (!qemu_co_queue_empty(&q->free_req_queue)) {
> -        replay_bh_schedule_oneshot_event(q->s->aio_context,
> +        replay_bh_schedule_oneshot_event(q->aio_context,
>                  nvme_free_req_queue_cb, q);
>      }
>  }
> @@ -492,7 +498,6 @@ static void nvme_cmd_sync_cb(void *opaque, int ret)
>  static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
>                           NvmeCmd *cmd)
>  {
> -    AioContext *aio_context = bdrv_get_aio_context(bs);
>      NVMeRequest *req;
>      int ret = -EINPROGRESS;
>      req = nvme_get_free_req(q);
> @@ -501,7 +506,7 @@ static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
>      }
>      nvme_submit_command(q, req, cmd, nvme_cmd_sync_cb, &ret);
>  
> -    AIO_WAIT_WHILE(aio_context, ret == -EINPROGRESS);
> +    AIO_WAIT_WHILE(q->aio_context, ret == -EINPROGRESS);
>      return ret;
>  }
>  
> @@ -616,47 +621,35 @@ static bool nvme_poll_queue(NVMeQueuePair *q)
>      return progress;
>  }
>  
> -static bool nvme_poll_queues(BDRVNVMeState *s)
> -{
> -    bool progress = false;
> -    int i;
> -
> -    for (i = 0; i < s->nr_queues; i++) {
> -        if (nvme_poll_queue(s->queues[i])) {
> -            progress = true;
> -        }
> -    }
> -    return progress;
> -}
> -
>  static void nvme_handle_event(EventNotifier *n)
>  {
> -    BDRVNVMeState *s = container_of(n, BDRVNVMeState, irq_notifier);
> +    NVMeQueuePair *q = container_of(n, NVMeQueuePair, irq_notifier);
>  
> -    trace_nvme_handle_event(s);
> +    trace_nvme_handle_event(q);
>      event_notifier_test_and_clear(n);
> -    nvme_poll_queues(s);
> +    nvme_poll_queue(q);
>  }
>  
>  static bool nvme_poll_cb(void *opaque)
>  {
>      EventNotifier *e = opaque;
> -    BDRVNVMeState *s = container_of(e, BDRVNVMeState, irq_notifier);
> +    NVMeQueuePair *q = container_of(e, NVMeQueuePair, irq_notifier);
>  
> -    trace_nvme_poll_cb(s);
> -    return nvme_poll_queues(s);
> +    trace_nvme_poll_cb(q);
> +    return nvme_poll_queue(q);
>  }
>  
> -static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
> +static bool nvme_add_io_queue(BlockDriverState *bs,
> +                              AioContext *aio_context, Error **errp)
>  {
>      BDRVNVMeState *s = bs->opaque;
>      int n = s->nr_queues;
>      NVMeQueuePair *q;
>      NvmeCmd cmd;
>      int queue_size = NVME_QUEUE_SIZE;
> +    int ret;
>  
> -    q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
> -                               n, queue_size, errp);
> +    q = nvme_create_queue_pair(s, aio_context, n, queue_size, errp);
>      if (!q) {
>          return false;
>      }
> @@ -683,6 +676,17 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>      s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
>      s->queues[n] = q;
>      s->nr_queues++;
> +
> +    ret = qemu_vfio_pci_init_irq(s->vfio,
> +                                 &s->queues[n]->irq_notifier,
> +                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
> +    if (ret) {
> +        goto out_error;
> +    }
> +    aio_set_event_notifier(aio_context,
> +                           &s->queues[n]->irq_notifier,
> +                           false, nvme_handle_event, nvme_poll_cb);
> +
>      return true;
>  out_error:
>      nvme_free_queue_pair(q);
> @@ -704,12 +708,6 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      qemu_co_queue_init(&s->dma_flush_queue);
>      s->device = g_strdup(device);
>      s->nsid = namespace;
> -    s->aio_context = bdrv_get_aio_context(bs);
> -    ret = event_notifier_init(&s->irq_notifier, 0);
> -    if (ret) {
> -        error_setg(errp, "Failed to init event notifier");
> -        return ret;
> -    }
>  
>      s->vfio = qemu_vfio_open_pci(device, errp);
>      if (!s->vfio) {
> @@ -784,12 +782,14 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>          }
>      }
>  
> -    ret = qemu_vfio_pci_init_irq(s->vfio, &s->irq_notifier,
> +    ret = qemu_vfio_pci_init_irq(s->vfio,
> +                                 &s->queues[QUEUE_INDEX_ADMIN]->irq_notifier,
>                                   VFIO_PCI_MSIX_IRQ_INDEX, errp);
>      if (ret) {
>          goto out;
>      }
> -    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
> +    aio_set_event_notifier(aio_context,
> +                           &s->queues[QUEUE_INDEX_ADMIN]->irq_notifier,
>                             false, nvme_handle_event, nvme_poll_cb);
>  
>      nvme_identify(bs, namespace, &local_err);
> @@ -800,7 +800,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      }
>  
>      /* Set up command queues. */
> -    if (!nvme_add_io_queue(bs, errp)) {
> +    if (!nvme_add_io_queue(bs, aio_context, errp)) {
>          ret = -EIO;
>      }
>  out:
> @@ -869,12 +869,14 @@ static void nvme_close(BlockDriverState *bs)
>      BDRVNVMeState *s = bs->opaque;
>  
>      for (i = 0; i < s->nr_queues; ++i) {
> -        nvme_free_queue_pair(s->queues[i]);
> +        NVMeQueuePair *q = s->queues[i];
> +
> +        aio_set_event_notifier(q->aio_context,
> +                               &q->irq_notifier, false, NULL, NULL);
> +        event_notifier_cleanup(&q->irq_notifier);
> +        nvme_free_queue_pair(q);
>      }
>      g_free(s->queues);
> -    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
> -                           false, NULL, NULL);
> -    event_notifier_cleanup(&s->irq_notifier);
>      qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE);
>      qemu_vfio_close(s->vfio);
>  
> @@ -1086,7 +1088,7 @@ static coroutine_fn int nvme_co_prw_aligned(BlockDriverState *bs,
>          .cdw12 = cpu_to_le32(cdw12),
>      };
>      NVMeCoData data = {
> -        .ctx = bdrv_get_aio_context(bs),
> +        .ctx = ioq->aio_context,
>          .ret = -EINPROGRESS,
>      };
>  
> @@ -1195,7 +1197,7 @@ static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
>          .nsid = cpu_to_le32(s->nsid),
>      };
>      NVMeCoData data = {
> -        .ctx = bdrv_get_aio_context(bs),
> +        .ctx = ioq->aio_context,
>          .ret = -EINPROGRESS,
>      };
>  
> @@ -1236,7 +1238,7 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
>      };
>  
>      NVMeCoData data = {
> -        .ctx = bdrv_get_aio_context(bs),
> +        .ctx = ioq->aio_context,
>          .ret = -EINPROGRESS,
>      };
>  
> @@ -1286,7 +1288,7 @@ static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
>      };
>  
>      NVMeCoData data = {
> -        .ctx = bdrv_get_aio_context(bs),
> +        .ctx = ioq->aio_context,
>          .ret = -EINPROGRESS,
>      };
>  
> @@ -1379,10 +1381,10 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
>  
>          qemu_bh_delete(q->completion_bh);
>          q->completion_bh = NULL;
> -    }
>  
> -    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
> -                           false, NULL, NULL);
> +        aio_set_event_notifier(bdrv_get_aio_context(bs), &q->irq_notifier,
> +                               false, NULL, NULL);
> +    }
>  }
>  
>  static void nvme_attach_aio_context(BlockDriverState *bs,
> @@ -1390,13 +1392,11 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
>  {
>      BDRVNVMeState *s = bs->opaque;
>  
> -    s->aio_context = new_context;
> -    aio_set_event_notifier(new_context, &s->irq_notifier,
> -                           false, nvme_handle_event, nvme_poll_cb);
> -
>      for (int i = 0; i < s->nr_queues; i++) {
>          NVMeQueuePair *q = s->queues[i];
>  
> +        aio_set_event_notifier(new_context, &q->irq_notifier,
> +                               false, nvme_handle_event, nvme_poll_cb);
>          q->completion_bh =
>              aio_bh_new(new_context, nvme_process_completion_bh, q);
>      }
> 




* Re: [PATCH v3 05/16] block/nvme: Improve error message when IO queue creation failed
  2020-07-04 21:30 ` [PATCH v3 05/16] block/nvme: Improve error message when IO queue creation failed Philippe Mathieu-Daudé
@ 2020-07-06 10:32   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-07-06 10:32 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Maxim Levitsky, Max Reitz


On Sat, Jul 04, 2020 at 11:30:40PM +0200, Philippe Mathieu-Daudé wrote:
> Do not use the same error message for different failures.
> Display a different error depending on whether it is the CQ or
> the SQ that failed.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH v3 06/16] block/nvme: Use common error path in nvme_add_io_queue()
  2020-07-04 21:30 ` [PATCH v3 06/16] block/nvme: Use common error path in nvme_add_io_queue() Philippe Mathieu-Daudé
@ 2020-07-06 11:38   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-07-06 11:38 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Maxim Levitsky, Max Reitz


On Sat, Jul 04, 2020 at 11:30:41PM +0200, Philippe Mathieu-Daudé wrote:
> Rearrange nvme_add_io_queue() by using a common error path.
> This will prove useful in a few commits, when we add IRQ
> notification to the IO queues.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH v3 14/16] block/nvme: Extract nvme_poll_queue()
  2020-07-04 21:30 ` [PATCH v3 14/16] block/nvme: Extract nvme_poll_queue() Philippe Mathieu-Daudé
@ 2020-07-06 11:40   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-07-06 11:40 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Maxim Levitsky, Max Reitz


On Sat, Jul 04, 2020 at 11:30:49PM +0200, Philippe Mathieu-Daudé wrote:
> As we want to do per-queue polling, extract the nvme_poll_queue()
> method which operates on a single queue.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Stefan better double check here!
> ---
>  block/nvme.c | 44 +++++++++++++++++++++++++++-----------------
>  1 file changed, 27 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH v3 15/16] block/nvme: Move nvme_poll_cb() earlier
  2020-07-04 21:30 ` [PATCH v3 15/16] block/nvme: Move nvme_poll_cb() earlier Philippe Mathieu-Daudé
@ 2020-07-06 11:41   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-07-06 11:41 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Maxim Levitsky, Max Reitz


On Sat, Jul 04, 2020 at 11:30:50PM +0200, Philippe Mathieu-Daudé wrote:
> We are going to use this callback in nvme_add_io_queue()
> in the next commit. To avoid forward-declaring it, move
> it earlier. No logical change.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context
  2020-07-04 21:30 ` [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context Philippe Mathieu-Daudé
  2020-07-06  9:45   ` Philippe Mathieu-Daudé
@ 2020-07-06 12:04   ` Stefan Hajnoczi
  2020-07-06 12:30     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-07-06 12:04 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Maxim Levitsky, Max Reitz


On Sat, Jul 04, 2020 at 11:30:51PM +0200, Philippe Mathieu-Daudé wrote:
> @@ -683,6 +676,17 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>      s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
>      s->queues[n] = q;
>      s->nr_queues++;
> +
> +    ret = qemu_vfio_pci_init_irq(s->vfio,
> +                                 &s->queues[n]->irq_notifier,
> +                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
> +    if (ret) {
> +        goto out_error;
> +    }
> +    aio_set_event_notifier(aio_context,
> +                           &s->queues[n]->irq_notifier,
> +                           false, nvme_handle_event, nvme_poll_cb);

s->queues[n] can be replaced with q to make the code easier to read.

> @@ -784,12 +782,14 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>          }
>      }
>  
> -    ret = qemu_vfio_pci_init_irq(s->vfio, &s->irq_notifier,
> +    ret = qemu_vfio_pci_init_irq(s->vfio,
> +                                 &s->queues[QUEUE_INDEX_ADMIN]->irq_notifier,
>                                   VFIO_PCI_MSIX_IRQ_INDEX, errp);

QEMU is setting up only 1 MSI-X vector that is shared by the admin and
all io queues?

I'm not very familiar with the VFIO ioctls but I guess this call
replaces the admin queue's irq_notifier registration with VFIO. So now
the queue's irq_notifier is signalled on admin cq events. The admin
irq_notifier is no longer signalled. This seems broken.

If there are multiple irq_notifiers then multiple MSI-X vectors are
needed.

Stefan



* Re: [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context
  2020-07-06 12:04   ` Stefan Hajnoczi
@ 2020-07-06 12:30     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-06 12:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Cornelia Huck, qemu-devel,
	Maxim Levitsky, Eric Auger, Alex Williamson, Max Reitz

On 7/6/20 2:04 PM, Stefan Hajnoczi wrote:
> On Sat, Jul 04, 2020 at 11:30:51PM +0200, Philippe Mathieu-Daudé wrote:
>> @@ -683,6 +676,17 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>>      s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
>>      s->queues[n] = q;
>>      s->nr_queues++;
>> +
>> +    ret = qemu_vfio_pci_init_irq(s->vfio,
>> +                                 &s->queues[n]->irq_notifier,
>> +                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
>> +    if (ret) {
>> +        goto out_error;
>> +    }
>> +    aio_set_event_notifier(aio_context,
>> +                           &s->queues[n]->irq_notifier,
>> +                           false, nvme_handle_event, nvme_poll_cb);
> 
> s->queues[n] can be replaced with q to make the code easier to read.

Indeed.
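
Something like this, then (untested):

    ret = qemu_vfio_pci_init_irq(s->vfio, &q->irq_notifier,
                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
    if (ret) {
        goto out_error;
    }
    aio_set_event_notifier(aio_context, &q->irq_notifier,
                           false, nvme_handle_event, nvme_poll_cb);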

> 
>> @@ -784,12 +782,14 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>>          }
>>      }
>>  
>> -    ret = qemu_vfio_pci_init_irq(s->vfio, &s->irq_notifier,
>> +    ret = qemu_vfio_pci_init_irq(s->vfio,
>> +                                 &s->queues[QUEUE_INDEX_ADMIN]->irq_notifier,
>>                                   VFIO_PCI_MSIX_IRQ_INDEX, errp);
> 
> QEMU is setting up only 1 MSI-X vector that is shared by the admin and
> all io queues?
> 
> I'm not very familiar with the VFIO ioctls but I guess this call
> replaces the admin queue's irq_notifier registration with VFIO. So now
> the queue's irq_notifier is signalled on admin cq events. The admin
> irq_notifier is no longer signalled. This seems broken.

I'll look into that. Cc'ing VFIO experts meanwhile...

> 
> If there are multiple irq_notifiers then multiple MSI-X vectors are
> needed.
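
Maybe something along these lines, as a purely hypothetical sketch
(qemu_vfio_pci_init_irqs() does not exist today, it would be a new
helper registering one notifier per MSI-X vector):

    /* Hypothetical: request one MSI-X vector per queuepair notifier
     * instead of re-registering a single shared vector.
     */
    EventNotifier **notifiers = g_new(EventNotifier *, s->nr_queues);

    for (int i = 0; i < s->nr_queues; i++) {
        notifiers[i] = &s->queues[i]->irq_notifier;
    }
    ret = qemu_vfio_pci_init_irqs(s->vfio, notifiers, s->nr_queues,
                                  VFIO_PCI_MSIX_IRQ_INDEX, errp);
    g_free(notifiers);
    if (ret) {
        goto out;
    }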
> 
> Stefan
> 




Thread overview: 24+ messages
2020-07-04 21:30 [PATCH v3 00/16] block/nvme: Various cleanups required to use multiple queues Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 01/16] block/nvme: Replace magic value by SCALE_MS definition Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 02/16] block/nvme: Avoid further processing if trace event not enabled Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 03/16] block/nvme: Let nvme_create_queue_pair() fail gracefully Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 04/16] block/nvme: Define QUEUE_INDEX macros to ease code review Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 05/16] block/nvme: Improve error message when IO queue creation failed Philippe Mathieu-Daudé
2020-07-06 10:32   ` Stefan Hajnoczi
2020-07-04 21:30 ` [PATCH v3 06/16] block/nvme: Use common error path in nvme_add_io_queue() Philippe Mathieu-Daudé
2020-07-06 11:38   ` Stefan Hajnoczi
2020-07-04 21:30 ` [PATCH v3 07/16] block/nvme: Rename local variable Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 08/16] block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 09/16] block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 10/16] block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 11/16] block/nvme: Simplify nvme_init_queue() arguments Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 12/16] block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 13/16] block/nvme: Simplify nvme_create_queue_pair() arguments Philippe Mathieu-Daudé
2020-07-04 21:30 ` [PATCH v3 14/16] block/nvme: Extract nvme_poll_queue() Philippe Mathieu-Daudé
2020-07-06 11:40   ` Stefan Hajnoczi
2020-07-04 21:30 ` [PATCH v3 15/16] block/nvme: Move nvme_poll_cb() earlier Philippe Mathieu-Daudé
2020-07-06 11:41   ` Stefan Hajnoczi
2020-07-04 21:30 ` [PATCH v3 16/16] block/nvme: Use per-queuepair IRQ notifier and AIO context Philippe Mathieu-Daudé
2020-07-06  9:45   ` Philippe Mathieu-Daudé
2020-07-06 12:04   ` Stefan Hajnoczi
2020-07-06 12:30     ` Philippe Mathieu-Daudé
