All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs
@ 2020-10-26 10:54 Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 01/19] block/nvme: Correct minimum device page size Philippe Mathieu-Daudé
                   ` (19 more replies)
  0 siblings, 20 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

This series allow using multiple MSIX IRQs
We currently share a single IRQ between 2 NVMe queues
(ADMIN and I/O). This series still uses 1 shared IRQ
but prepare for using multiple ones.

Since v1:
- Addressed Stefan comment in patch #14
  "Pass minimum page size to qemu_vfio_open_pci"
  (check the page size is in range with device)
- To reduce (and simplify) changes in patch #14, added
  new patch #2 "Introduce device/iommu 'page_size_min' variables"
- Added "Trace controller capabilities" useful to test the
  previous changes
- "Set request_alignment at initialization" reported by Stefan
  (and tested by Eric off-list).

Missing review: 2,3,4,14

$ git backport-diff -u v1
Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respective=
ly

001/19:[----] [--] 'block/nvme: Correct minimum device page size'
002/19:[down] 'block/nvme: Set request_alignment at initialization'
003/19:[down] 'block/nvme: Introduce device/iommu 'page_size_min' variables'
004/19:[down] 'block/nvme: Trace controller capabilities'
005/19:[----] [--] 'util/vfio-helpers: Improve reporting unsupported IOMMU ty=
pe'
006/19:[----] [-C] 'util/vfio-helpers: Trace PCI I/O config accesses'
007/19:[----] [-C] 'util/vfio-helpers: Trace PCI BAR region info'
008/19:[----] [-C] 'util/vfio-helpers: Trace where BARs are mapped'
009/19:[----] [-C] 'util/vfio-helpers: Improve DMA trace events'
010/19:[----] [-C] 'util/vfio-helpers: Convert vfio_dump_mapping to trace eve=
nts'
011/19:[----] [-C] 'util/vfio-helpers: Let qemu_vfio_dma_map() propagate Erro=
r'
012/19:[----] [--] 'util/vfio-helpers: Let qemu_vfio_do_mapping() propagate E=
rror'
013/19:[----] [--] 'util/vfio-helpers: Let qemu_vfio_verify_mappings() use er=
ror_report()'
014/19:[0017] [FC] 'util/vfio-helpers: Pass minimum page size to qemu_vfio_op=
en_pci()'
015/19:[----] [-C] 'util/vfio-helpers: Report error when IOMMU page size is n=
ot supported'
016/19:[----] [--] 'util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs=
()'
017/19:[----] [--] 'util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq()'
018/19:[----] [-C] 'block/nvme: Switch to using the MSIX API'
019/19:[----] [--] 'util/vfio-helpers: Remove now unused qemu_vfio_pci_init_i=
rq()'

Philippe Mathieu-Daud=C3=A9 (19):
  block/nvme: Correct minimum device page size
  block/nvme: Set request_alignment at initialization
  block/nvme: Introduce device/iommu 'page_size_min' variables
  block/nvme: Trace controller capabilities
  util/vfio-helpers: Improve reporting unsupported IOMMU type
  util/vfio-helpers: Trace PCI I/O config accesses
  util/vfio-helpers: Trace PCI BAR region info
  util/vfio-helpers: Trace where BARs are mapped
  util/vfio-helpers: Improve DMA trace events
  util/vfio-helpers: Convert vfio_dump_mapping to trace events
  util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
  util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
  util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report()
  util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci()
  util/vfio-helpers: Report error when IOMMU page size is not supported
  util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs()
  util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq()
  block/nvme: Switch to using the MSIX API
  util/vfio-helpers: Remove now unused qemu_vfio_pci_init_irq()

 include/qemu/vfio-helpers.h |  15 ++-
 block/nvme.c                |  58 +++++++++---
 util/vfio-helpers.c         | 183 +++++++++++++++++++++++++++---------
 block/trace-events          |   1 +
 util/trace-events           |  13 ++-
 5 files changed, 208 insertions(+), 62 deletions(-)

--=20
2.26.2




^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 01/19] block/nvme: Correct minimum device page size
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 17:57   ` Auger Eric
  2020-10-26 10:54 ` [PATCH v2 02/19] block/nvme: Set request_alignment at initialization Philippe Mathieu-Daudé
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

While trying to simplify the code using a macro, we forgot
the 12-bit shift... Correct that.

Fixes: fad1eb68862 ("block/nvme: Use register definitions from 'block/nvme.h'")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index b48f6f25881..029694975b9 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -724,7 +724,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
         goto out;
     }
 
-    s->page_size = MAX(4096, 1 << NVME_CAP_MPSMIN(cap));
+    s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
     s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
     bs->bl.opt_mem_alignment = s->page_size;
     timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 02/19] block/nvme: Set request_alignment at initialization
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 01/19] block/nvme: Correct minimum device page size Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 17:38   ` Auger Eric
  2020-10-27  9:24   ` Stefan Hajnoczi
  2020-10-26 10:54 ` [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables Philippe Mathieu-Daudé
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

When introducing this driver in commit bdd6a90a9e5
("block: Add VFIO based NVMe driver") we correctly
set the request_alignment in nvme_refresh_limits()
but forgot to set it at initialization. Do it now.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nvme.c b/block/nvme.c
index 029694975b9..aa290996679 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -727,6 +727,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
     s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
     bs->bl.opt_mem_alignment = s->page_size;
+    bs->bl.request_alignment = s->page_size;
     timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
 
     /* Reset device to get a clean state. */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 01/19] block/nvme: Correct minimum device page size Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 02/19] block/nvme: Set request_alignment at initialization Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 17:38   ` Auger Eric
  2020-10-27  9:32   ` Stefan Hajnoczi
  2020-10-26 10:54 ` [PATCH v2 04/19] block/nvme: Trace controller capabilities Philippe Mathieu-Daudé
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Introduce device/iommu 'page_size_min' variables to make
the code clearer.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index aa290996679..5abd7257cac 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -690,6 +690,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     uint64_t deadline, now;
     Error *local_err = NULL;
     volatile NvmeBar *regs = NULL;
+    size_t device_page_size_min;
+    size_t iommu_page_size_min = 4096;
 
     qemu_co_mutex_init(&s->dma_map_lock);
     qemu_co_queue_init(&s->dma_flush_queue);
@@ -724,7 +726,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
         goto out;
     }
 
-    s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
+    device_page_size_min = 1u << (12 + NVME_CAP_MPSMIN(cap));
+    s->page_size = MAX(iommu_page_size_min, device_page_size_min);
     s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
     bs->bl.opt_mem_alignment = s->page_size;
     bs->bl.request_alignment = s->page_size;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 04/19] block/nvme: Trace controller capabilities
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (2 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 17:57   ` Auger Eric
  2020-10-27  9:41   ` Stefan Hajnoczi
  2020-10-26 10:54 ` [PATCH v2 05/19] util/vfio-helpers: Improve reporting unsupported IOMMU type Philippe Mathieu-Daudé
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Controllers have different capabilities and report them in the
CAP register. We are particularly interested by the page size
limits.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c       | 10 ++++++++++
 block/trace-events |  1 +
 2 files changed, 11 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index 5abd7257cac..3b6d3972ec2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -720,6 +720,16 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
      * Initialization". */
 
     cap = le64_to_cpu(regs->cap);
+    trace_nvme_controller_capability("Maximum Queue Entries Supported",
+                                     NVME_CAP_MQES(cap));
+    trace_nvme_controller_capability("Contiguous Queues Required",
+                                     NVME_CAP_CQR(cap));
+    trace_nvme_controller_capability("Subsystem  Reset Supported",
+                                     NVME_CAP_NSSRS(cap));
+    trace_nvme_controller_capability("Memory Page Size Minimum",
+                                     NVME_CAP_MPSMIN(cap));
+    trace_nvme_controller_capability("Memory Page Size Maximum",
+                                     NVME_CAP_MPSMAX(cap));
     if (!NVME_CAP_CSS(cap)) {
         error_setg(errp, "Device doesn't support NVMe command set");
         ret = -EINVAL;
diff --git a/block/trace-events b/block/trace-events
index 0e351c3fa3d..3f141dc6801 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -134,6 +134,7 @@ qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t
 qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
 
 # nvme.c
+nvme_controller_capability(const char *desc, uint64_t value) "%s: %"PRIu64
 nvme_kick(void *s, int queue) "s %p queue %d"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 05/19] util/vfio-helpers: Improve reporting unsupported IOMMU type
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (3 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 04/19] block/nvme: Trace controller capabilities Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 06/19] util/vfio-helpers: Trace PCI I/O config accesses Philippe Mathieu-Daudé
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Change the confuse "VFIO IOMMU check failed" error message by
the explicit "VFIO IOMMU Type1 is not supported" once.

Example on POWER:

 $ qemu-system-ppc64 -drive if=none,id=nvme0,file=nvme://0001:01:00.0/1,format=raw
 qemu-system-ppc64: -drive if=none,id=nvme0,file=nvme://0001:01:00.0/1,format=raw: VFIO IOMMU Type1 is not supported

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c469beb0616..14a549510fe 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -300,7 +300,7 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
     }
 
     if (!ioctl(s->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
-        error_setg_errno(errp, errno, "VFIO IOMMU check failed");
+        error_setg_errno(errp, errno, "VFIO IOMMU Type1 is not supported");
         ret = -EINVAL;
         goto fail_container;
     }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 06/19] util/vfio-helpers: Trace PCI I/O config accesses
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (4 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 05/19] util/vfio-helpers: Improve reporting unsupported IOMMU type Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 18:02   ` Auger Eric
  2020-10-26 10:54 ` [PATCH v2 07/19] util/vfio-helpers: Trace PCI BAR region info Philippe Mathieu-Daudé
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

We sometime get kernel panic with some devices on Aarch64
hosts. Alex Williamson suggests it might be broken PCIe
root complex. Add trace event to record the latest I/O
access before crashing. In case, assert our accesses are
aligned.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 8 ++++++++
 util/trace-events   | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 14a549510fe..1d4efafcaa4 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -227,6 +227,10 @@ static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
 {
     int ret;
 
+    trace_qemu_vfio_pci_read_config(buf, ofs, size,
+                                    s->config_region_info.offset,
+                                    s->config_region_info.size);
+    assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
     do {
         ret = pread(s->device, buf, size, s->config_region_info.offset + ofs);
     } while (ret == -1 && errno == EINTR);
@@ -237,6 +241,10 @@ static int qemu_vfio_pci_write_config(QEMUVFIOState *s, void *buf, int size, int
 {
     int ret;
 
+    trace_qemu_vfio_pci_write_config(buf, ofs, size,
+                                     s->config_region_info.offset,
+                                     s->config_region_info.size);
+    assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
     do {
         ret = pwrite(s->device, buf, size, s->config_region_info.offset + ofs);
     } while (ret == -1 && errno == EINTR);
diff --git a/util/trace-events b/util/trace-events
index 24c31803b01..c048f85f828 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -85,3 +85,5 @@ qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size 0x%zx iova 0x%"PRIx64
 qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d iova %p"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
+qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
+qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 07/19] util/vfio-helpers: Trace PCI BAR region info
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (5 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 06/19] util/vfio-helpers: Trace PCI I/O config accesses Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 18:06   ` Auger Eric
  2020-10-26 10:54 ` [PATCH v2 08/19] util/vfio-helpers: Trace where BARs are mapped Philippe Mathieu-Daudé
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

For debug purpose, trace BAR regions info.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 8 ++++++++
 util/trace-events   | 1 +
 2 files changed, 9 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 1d4efafcaa4..cd6287c3a98 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -136,6 +136,7 @@ static inline void assert_bar_index_valid(QEMUVFIOState *s, int index)
 
 static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
 {
+    g_autofree char *barname = NULL;
     assert_bar_index_valid(s, index);
     s->bar_region_info[index] = (struct vfio_region_info) {
         .index = VFIO_PCI_BAR0_REGION_INDEX + index,
@@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
         error_setg_errno(errp, errno, "Failed to get BAR region info");
         return -errno;
     }
+    barname = g_strdup_printf("bar[%d]", index);
+    trace_qemu_vfio_region_info(barname, s->bar_region_info[index].offset,
+                                s->bar_region_info[index].size,
+                                s->bar_region_info[index].cap_offset);
 
     return 0;
 }
@@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
         ret = -errno;
         goto fail;
     }
+    trace_qemu_vfio_region_info("config", s->config_region_info.offset,
+                                s->config_region_info.size,
+                                s->config_region_info.cap_offset);
 
     for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {
         ret = qemu_vfio_pci_init_bar(s, i, errp);
diff --git a/util/trace-events b/util/trace-events
index c048f85f828..4d40c74a21f 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *io
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
+qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs %"PRId32
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 08/19] util/vfio-helpers: Trace where BARs are mapped
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (6 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 07/19] util/vfio-helpers: Trace PCI BAR region info Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 09/19] util/vfio-helpers: Improve DMA trace events Philippe Mathieu-Daudé
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

For debugging purpose, trace where a BAR is mapped.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 2 ++
 util/trace-events   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index cd6287c3a98..278c54902e7 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -166,6 +166,8 @@ void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
     p = mmap(NULL, MIN(size, s->bar_region_info[index].size - offset),
              prot, MAP_SHARED,
              s->device, s->bar_region_info[index].offset + offset);
+    trace_qemu_vfio_pci_map_bar(index, s->bar_region_info[index].offset ,
+                                size, offset, p);
     if (p == MAP_FAILED) {
         error_setg_errno(errp, errno, "Failed to map BAR region");
         p = NULL;
diff --git a/util/trace-events b/util/trace-events
index 4d40c74a21f..50652761a58 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -88,3 +88,4 @@ qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs %"PRId32
+qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, int ofs, void *host) "map region bar#%d ofs 0x%"PRIx64" size %"PRId64" ofs %d host %p"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 09/19] util/vfio-helpers: Improve DMA trace events
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (7 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 08/19] util/vfio-helpers: Trace where BARs are mapped Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 10/19] util/vfio-helpers: Convert vfio_dump_mapping to " Philippe Mathieu-Daudé
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

For debugging purpose, trace where DMA regions are mapped.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 3 ++-
 util/trace-events   | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 278c54902e7..c24a510df82 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -627,7 +627,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
         .vaddr = (uintptr_t)host,
         .size = size,
     };
-    trace_qemu_vfio_do_mapping(s, host, size, iova);
+    trace_qemu_vfio_do_mapping(s, host, iova, size);
 
     if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
         error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
@@ -783,6 +783,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             }
         }
     }
+    trace_qemu_vfio_dma_mapped(s, host, iova0, size);
     if (iova) {
         *iova = iova0;
     }
diff --git a/util/trace-events b/util/trace-events
index 50652761a58..8598066acdb 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -82,8 +82,9 @@ qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%z
 qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size 0x%zx index %d iova 0x%"PRIx64
-qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size 0x%zx iova 0x%"PRIx64
-qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d iova %p"
+qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64 " size 0x%zx"
+qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d &iova %p"
+qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 10/19] util/vfio-helpers: Convert vfio_dump_mapping to trace events
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (8 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 09/19] util/vfio-helpers: Improve DMA trace events Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error Philippe Mathieu-Daudé
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

The QEMU_VFIO_DEBUG definition is only modifiable at build-time.
Trace events can be enabled at run-time. As we prefer the latter,
convert qemu_vfio_dump_mappings() to use trace events instead
of fprintf().

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 19 ++++---------------
 util/trace-events   |  1 +
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c24a510df82..73f7bfa7540 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -521,23 +521,12 @@ QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp)
     return s;
 }
 
-static void qemu_vfio_dump_mapping(IOVAMapping *m)
-{
-    if (QEMU_VFIO_DEBUG) {
-        printf("  vfio mapping %p %" PRIx64 " to %" PRIx64 "\n", m->host,
-               (uint64_t)m->size, (uint64_t)m->iova);
-    }
-}
-
 static void qemu_vfio_dump_mappings(QEMUVFIOState *s)
 {
-    int i;
-
-    if (QEMU_VFIO_DEBUG) {
-        printf("vfio mappings\n");
-        for (i = 0; i < s->nr_mappings; ++i) {
-            qemu_vfio_dump_mapping(&s->mappings[i]);
-        }
+    for (int i = 0; i < s->nr_mappings; ++i) {
+        trace_qemu_vfio_dump_mapping(s->mappings[i].host,
+                                     s->mappings[i].iova,
+                                     s->mappings[i].size);
     }
 }
 
diff --git a/util/trace-events b/util/trace-events
index 8598066acdb..7faad2a718c 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -80,6 +80,7 @@ qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
 qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
 qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_dump_mapping(void *host, uint64_t iova, size_t size) "vfio mapping %p to iova 0x%08" PRIx64 " size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size 0x%zx index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64 " size 0x%zx"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (9 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 10/19] util/vfio-helpers: Convert vfio_dump_mapping to " Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 19:58   ` Auger Eric
  2020-10-26 10:54 ` [PATCH v2 12/19] util/vfio-helpers: Let qemu_vfio_do_mapping() " Philippe Mathieu-Daudé
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Currently qemu_vfio_dma_map() displays errors on stderr.
When using management interface, this information is simply
lost. Pass qemu_vfio_dma_map() an Error* argument so it can
propagate the error to callers.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/qemu/vfio-helpers.h |  2 +-
 block/nvme.c                | 14 +++++++-------
 util/vfio-helpers.c         | 12 +++++++-----
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 4491c8e1a6e..bde9495b254 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -18,7 +18,7 @@ typedef struct QEMUVFIOState QEMUVFIOState;
 QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
 void qemu_vfio_close(QEMUVFIOState *s);
 int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-                      bool temporary, uint64_t *iova_list);
+                      bool temporary, uint64_t *iova_list, Error **errp);
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
 void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
 void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
diff --git a/block/nvme.c b/block/nvme.c
index 3b6d3972ec2..6f1ebdf031f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -167,9 +167,9 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
         return;
     }
     memset(q->queue, 0, bytes);
-    r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
+    r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova, errp);
     if (r) {
-        error_setg(errp, "Cannot map queue");
+        error_prepend(errp, "Cannot map queue: ");
     }
 }
 
@@ -223,7 +223,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
     q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
     r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
                           s->page_size * NVME_NUM_REQS,
-                          false, &prp_list_iova);
+                          false, &prp_list_iova, errp);
     if (r) {
         goto fail;
     }
@@ -514,9 +514,9 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         error_setg(errp, "Cannot allocate buffer for identify response");
         goto out;
     }
-    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
+    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova, errp);
     if (r) {
-        error_setg(errp, "Cannot map buffer for DMA");
+        error_prepend(errp, "Cannot map buffer for DMA: ");
         goto out;
     }
 
@@ -1003,7 +1003,7 @@ try_map:
         r = qemu_vfio_dma_map(s->vfio,
                               qiov->iov[i].iov_base,
                               qiov->iov[i].iov_len,
-                              true, &iova);
+                              true, &iova, NULL);
         if (r == -ENOMEM && retry) {
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
@@ -1450,7 +1450,7 @@ static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size)
     int ret;
     BDRVNVMeState *s = bs->opaque;
 
-    ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL);
+    ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL, NULL);
     if (ret) {
         /* FIXME: we may run out of IOVA addresses after repeated
          * bdrv_register_buf/bdrv_unregister_buf, because nvme_vfio_dma_unmap
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 73f7bfa7540..c03fe0b7156 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -462,7 +462,7 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     trace_qemu_vfio_ram_block_added(s, host, size);
-    qemu_vfio_dma_map(s, host, size, false, NULL);
+    qemu_vfio_dma_map(s, host, size, false, NULL, NULL);
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -477,6 +477,7 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
 
 static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
 {
+    Error *local_err = NULL;
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t length = qemu_ram_get_used_length(rb);
     int ret;
@@ -485,10 +486,11 @@ static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
     if (!host_addr) {
         return 0;
     }
-    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
+    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL, &local_err);
     if (ret) {
-        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-                host_addr, (uint64_t)length);
+        error_reportf_err(local_err,
+                          "qemu_vfio_init_ramblock: failed %p %" PRId64 ":",
+                          host_addr, (uint64_t)length);
     }
     return 0;
 }
@@ -724,7 +726,7 @@ qemu_vfio_find_temp_iova(QEMUVFIOState *s, size_t size, uint64_t *iova)
  * mapping status within this area is not allowed).
  */
 int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-                      bool temporary, uint64_t *iova)
+                      bool temporary, uint64_t *iova, Error **errp)
 {
     int ret = 0;
     int index;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 12/19] util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (10 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 20:02   ` Auger Eric
  2020-10-26 10:54 ` [PATCH v2 13/19] util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report() Philippe Mathieu-Daudé
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Pass qemu_vfio_do_mapping() an Error* argument so it can propagate
any error to callers. Replace error_report() which only report
to the monitor by the more generic error_setg_errno().

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c03fe0b7156..2c4598d7faa 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -609,7 +609,7 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
 
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
-                                uint64_t iova)
+                                uint64_t iova, Error **errp)
 {
     struct vfio_iommu_type1_dma_map dma_map = {
         .argsz = sizeof(dma_map),
@@ -621,7 +621,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
     trace_qemu_vfio_do_mapping(s, host, iova, size);
 
     if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
-        error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
+        error_setg_errno(errp, errno, "VFIO_MAP_DMA failed");
         return -errno;
     }
     return 0;
@@ -757,7 +757,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
                 goto out;
             }
             assert(qemu_vfio_verify_mappings(s));
-            ret = qemu_vfio_do_mapping(s, host, size, iova0);
+            ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
             if (ret) {
                 qemu_vfio_undo_mapping(s, mapping, NULL);
                 goto out;
@@ -768,7 +768,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
                 ret = -ENOMEM;
                 goto out;
             }
-            ret = qemu_vfio_do_mapping(s, host, size, iova0);
+            ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
             if (ret) {
                 goto out;
             }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 13/19] util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report()
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (11 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 12/19] util/vfio-helpers: Let qemu_vfio_do_mapping() " Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-26 10:54 ` [PATCH v2 14/19] util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci() Philippe Mathieu-Daudé
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Instead of displaying the error on stderr, use error_report()
which also report to the monitor.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 2c4598d7faa..488ddfca2a9 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -661,13 +661,13 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
     if (QEMU_VFIO_DEBUG) {
         for (i = 0; i < s->nr_mappings - 1; ++i) {
             if (!(s->mappings[i].host < s->mappings[i + 1].host)) {
-                fprintf(stderr, "item %d not sorted!\n", i);
+                error_report("item %d not sorted!", i);
                 qemu_vfio_dump_mappings(s);
                 return false;
             }
             if (!(s->mappings[i].host + s->mappings[i].size <=
                   s->mappings[i + 1].host)) {
-                fprintf(stderr, "item %d overlap with next!\n", i);
+                error_report("item %d overlap with next!", i);
                 qemu_vfio_dump_mappings(s);
                 return false;
             }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 14/19] util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci()
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (12 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 13/19] util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report() Philippe Mathieu-Daudé
@ 2020-10-26 10:54 ` Philippe Mathieu-Daudé
  2020-10-27  9:50   ` Stefan Hajnoczi
  2020-10-26 10:55 ` [PATCH v2 15/19] util/vfio-helpers: Report error when IOMMU page size is not supported Philippe Mathieu-Daudé
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

The block driver asks for a minimum page size, but it might not
match the minimum IOMMU requirement.

In the next commit qemu_vfio_init_pci() will be able to report
the minimum IOMMU page size back to the block driver.
In preparation, pass the minimum page size as argument to
qemu_vfio_open_pci(). Add a check to be sure the IOMMU minimum
page size is in range with our NVMe device maximum page size.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/qemu/vfio-helpers.h |  3 ++-
 block/nvme.c                | 14 +++++++++++++-
 util/vfio-helpers.c         |  8 +++++++-
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index bde9495b254..4b97a904e93 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -15,7 +15,8 @@
 
 typedef struct QEMUVFIOState QEMUVFIOState;
 
-QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
+QEMUVFIOState *qemu_vfio_open_pci(const char *device, size_t *min_page_size,
+                                  Error **errp);
 void qemu_vfio_close(QEMUVFIOState *s);
 int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
                       bool temporary, uint64_t *iova_list, Error **errp);
diff --git a/block/nvme.c b/block/nvme.c
index 6f1ebdf031f..46b09b3a3a7 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -691,6 +691,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     Error *local_err = NULL;
     volatile NvmeBar *regs = NULL;
     size_t device_page_size_min;
+    size_t device_page_size_max;
     size_t iommu_page_size_min = 4096;
 
     qemu_co_mutex_init(&s->dma_map_lock);
@@ -704,7 +705,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
         return ret;
     }
 
-    s->vfio = qemu_vfio_open_pci(device, errp);
+    s->vfio = qemu_vfio_open_pci(device, &iommu_page_size_min, errp);
     if (!s->vfio) {
         ret = -EINVAL;
         goto out;
@@ -737,6 +738,17 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     }
 
     device_page_size_min = 1u << (12 + NVME_CAP_MPSMIN(cap));
+    device_page_size_max = 1u << (12 + NVME_CAP_MPSMAX(cap));
+    if (iommu_page_size_min > device_page_size_max) {
+        g_autofree char *iommu_page_size_s = size_to_str(iommu_page_size_min);
+        g_autofree char *device_page_size_s = size_to_str(device_page_size_max);
+
+        error_setg(errp, "IOMMU minimum page size (%s)"
+                         " too big for device (max %s)",
+                   iommu_page_size_s, device_page_size_s);
+        ret = -EINVAL;
+        goto out;
+    }
     s->page_size = MAX(iommu_page_size_min, device_page_size_min);
     s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
     bs->bl.opt_mem_alignment = s->page_size;
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 488ddfca2a9..5e288dfa113 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -508,8 +508,14 @@ static void qemu_vfio_open_common(QEMUVFIOState *s)
 
 /**
  * Open a PCI device, e.g. "0000:00:01.0".
+ *
+ * @min_page_size: Pointer holding the minimum page size requested
+ *
+ * If the IOMMU can not be configured with @min_page_size, the minimum
+ * page size is stored in @min_page_size and -EINVAL is returned.
  */
-QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp)
+QEMUVFIOState *qemu_vfio_open_pci(const char *device, size_t *min_page_size,
+                                  Error **errp)
 {
     int r;
     QEMUVFIOState *s = g_new0(QEMUVFIOState, 1);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 15/19] util/vfio-helpers: Report error when IOMMU page size is not supported
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (13 preceding siblings ...)
  2020-10-26 10:54 ` [PATCH v2 14/19] util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci() Philippe Mathieu-Daudé
@ 2020-10-26 10:55 ` Philippe Mathieu-Daudé
  2020-10-26 16:12   ` Auger Eric
  2020-10-26 10:55 ` [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs() Philippe Mathieu-Daudé
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

This driver uses the host page size to align its memory regions,
but this size is not always compatible with the IOMMU. Add a
check if the size matches, and bails out providing a hint what
is the minimum page size the driver should use.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 util/vfio-helpers.c | 28 ++++++++++++++++++++++++++--
 util/trace-events   |  1 +
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 5e288dfa113..874d76c2a2a 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include <sys/ioctl.h>
 #include <linux/vfio.h>
 #include "qapi/error.h"
@@ -288,7 +289,7 @@ static void collect_usable_iova_ranges(QEMUVFIOState *s, void *buf)
 }
 
 static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
-                              Error **errp)
+                              size_t *requested_page_size, Error **errp)
 {
     int ret;
     int i;
@@ -299,6 +300,8 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
     struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
     char *group_file = NULL;
 
+    assert(requested_page_size && is_power_of_2(*requested_page_size));
+
     s->usable_iova_ranges = NULL;
 
     /* Create a new container */
@@ -373,6 +376,27 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
         ret = -errno;
         goto fail;
     }
+    if (!(iommu_info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
+        error_setg(errp, "Failed to get IOMMU page size info");
+        ret = -EINVAL;
+        goto fail;
+    }
+    trace_qemu_vfio_iommu_iova_pgsizes(iommu_info->iova_pgsizes);
+    if (!(iommu_info->iova_pgsizes & *requested_page_size)) {
+        g_autofree char *req_page_size_str = size_to_str(*requested_page_size);
+        g_autofree char *min_page_size_str = NULL;
+        uint64_t pgsizes_masked;
+
+        pgsizes_masked = MAKE_64BIT_MASK(0, ctz64(*requested_page_size));
+        *requested_page_size = 1U << ctz64(iommu_info->iova_pgsizes
+                                           & ~pgsizes_masked);
+        min_page_size_str = size_to_str(*requested_page_size);
+        error_setg(errp, "Unsupported IOMMU page size: %s", req_page_size_str);
+        error_append_hint(errp, "Minimum IOMMU page size: %s\n",
+                          min_page_size_str);
+        ret = -EINVAL;
+        goto fail;
+    }
 
     /*
      * if the kernel does not report usable IOVA regions, choose
@@ -520,7 +544,7 @@ QEMUVFIOState *qemu_vfio_open_pci(const char *device, size_t *min_page_size,
     int r;
     QEMUVFIOState *s = g_new0(QEMUVFIOState, 1);
 
-    r = qemu_vfio_init_pci(s, device, errp);
+    r = qemu_vfio_init_pci(s, device, min_page_size, errp);
     if (r) {
         g_free(s);
         return NULL;
diff --git a/util/trace-events b/util/trace-events
index 7faad2a718c..3c36def9f30 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -87,6 +87,7 @@ qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p host
 qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d &iova %p"
 qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
+qemu_vfio_iommu_iova_pgsizes(uint64_t iova_pgsizes) "iommu page size bitmask: 0x%08"PRIx64
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs %"PRId32
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs()
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (14 preceding siblings ...)
  2020-10-26 10:55 ` [PATCH v2 15/19] util/vfio-helpers: Report error when IOMMU page size is not supported Philippe Mathieu-Daudé
@ 2020-10-26 10:55 ` Philippe Mathieu-Daudé
  2020-10-26 20:24   ` Auger Eric
  2020-10-26 10:55 ` [PATCH v2 17/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq() Philippe Mathieu-Daudé
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

qemu_vfio_pci_init_irq() allows us to initialize any type of IRQ,
but only one. Introduce qemu_vfio_pci_msix_init_irqs() which is
specific to MSIX IRQ type, and allow us to use multiple IRQs
(thus passing multiple eventfd notifiers).
All eventfd notifiers are initialized with the special '-1' value
meaning "un-assigned".

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/qemu/vfio-helpers.h |  6 +++-
 util/vfio-helpers.c         | 65 ++++++++++++++++++++++++++++++++++++-
 util/trace-events           |  1 +
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 4b97a904e93..492072cba2f 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -1,11 +1,13 @@
 /*
  * QEMU VFIO helpers
  *
- * Copyright 2016 - 2018 Red Hat, Inc.
+ * Copyright 2016 - 2020 Red Hat, Inc.
  *
  * Authors:
  *   Fam Zheng <famz@redhat.com>
+ *   Philippe Mathieu-Daudé <philmd@redhat.com>
  *
+ * SPDX-License-Identifier: GPL-2.0-or-later
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
  */
@@ -29,5 +31,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
                              uint64_t offset, uint64_t size);
 int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
                            int irq_type, Error **errp);
+int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
+                                 unsigned *irq_count, Error **errp);
 
 #endif
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 874d76c2a2a..d88e2c7dc1f 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -1,11 +1,13 @@
 /*
  * VFIO utility
  *
- * Copyright 2016 - 2018 Red Hat, Inc.
+ * Copyright 2016 - 2020 Red Hat, Inc.
  *
  * Authors:
  *   Fam Zheng <famz@redhat.com>
+ *   Philippe Mathieu-Daudé <philmd@redhat.com>
  *
+ * SPDX-License-Identifier: GPL-2.0-or-later
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
  */
@@ -230,6 +232,67 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
     return 0;
 }
 
+/**
+ * Initialize device MSIX IRQs and register event notifiers.
+ * @irq_count: pointer to number of MSIX IRQs to initialize
+ *
+ * If the number of IRQs requested exceeds the available on the device,
+ * store the number of available IRQs in @irq_count and return -EOVERFLOW.
+ */
+int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
+                                 unsigned *irq_count, Error **errp)
+{
+    int r;
+    size_t irq_set_size;
+    struct vfio_irq_set *irq_set;
+    struct vfio_irq_info irq_info = {
+        .argsz = sizeof(irq_info),
+        .index = VFIO_PCI_MSIX_IRQ_INDEX
+    };
+
+    if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
+        error_setg_errno(errp, errno, "Failed to get device interrupt info");
+        return -errno;
+    }
+    trace_qemu_vfio_msix_info_irqs(irq_info.count, *irq_count);
+    if (irq_info.count < *irq_count) {
+        error_setg(errp, "Not enough device interrupts available");
+        *irq_count = irq_info.count;
+        return -EOVERFLOW;
+    }
+    if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
+        error_setg(errp, "Device interrupt doesn't support eventfd");
+        return -EINVAL;
+    }
+
+    irq_set_size = sizeof(*irq_set) + *irq_count * sizeof(int32_t);
+    irq_set = g_malloc0(irq_set_size);
+
+    /* Get to a known IRQ state */
+    *irq_set = (struct vfio_irq_set) {
+        .argsz = irq_set_size,
+        .flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
+        .index = VFIO_PCI_MSIX_IRQ_INDEX,
+        .start = 0,
+        .count = *irq_count,
+    };
+
+    for (unsigned i = 0; i < *irq_count; i++) {
+        ((int32_t *)&irq_set->data)[i] = -1; /* un-assigned: skip */
+    }
+    r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (r < 0) {
+        error_setg_errno(errp, errno, "Failed to setup device interrupts");
+        return -errno;
+    } else if (r > 0) {
+        error_setg(errp, "Not enough device interrupts available");
+        *irq_count = r;
+        return -EOVERFLOW;
+    }
+    return 0;
+}
+
 static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
                                      int size, int ofs)
 {
diff --git a/util/trace-events b/util/trace-events
index 3c36def9f30..ec93578b125 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -87,6 +87,7 @@ qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p host
 qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d &iova %p"
 qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
+qemu_vfio_msix_info_irqs(uint32_t count, unsigned asked) "msix irqs %"PRIu32" (asked: %u)"
 qemu_vfio_iommu_iova_pgsizes(uint64_t iova_pgsizes) "iommu page size bitmask: 0x%08"PRIx64
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 17/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq()
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (15 preceding siblings ...)
  2020-10-26 10:55 ` [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs() Philippe Mathieu-Daudé
@ 2020-10-26 10:55 ` Philippe Mathieu-Daudé
  2020-10-26 20:28   ` Auger Eric
  2020-10-26 10:55 ` [PATCH v2 18/19] block/nvme: Switch to using the MSIX API Philippe Mathieu-Daudé
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Introduce qemu_vfio_pci_msix_set_irq() to set the event
notifier of a specific MSIX IRQ. All other registered IRQs
are left unmodified.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/qemu/vfio-helpers.h |  2 ++
 util/vfio-helpers.c         | 35 +++++++++++++++++++++++++++++++++++
 util/trace-events           |  1 +
 3 files changed, 38 insertions(+)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 492072cba2f..4c06694e03a 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -33,5 +33,7 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
                            int irq_type, Error **errp);
 int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
                                  unsigned *irq_count, Error **errp);
+int qemu_vfio_pci_msix_set_irq(QEMUVFIOState *s, unsigned irq_index,
+                               EventNotifier *notifier, Error **errp);
 
 #endif
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index d88e2c7dc1f..18142e6be86 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -232,6 +232,41 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
     return 0;
 }
 
+/**
+ * Initialize a MSIX IRQ and register its event notifier.
+ * @irq_index: MSIX IRQ index
+ * @notifier: notifier for the MSIX IRQ
+ */
+int qemu_vfio_pci_msix_set_irq(QEMUVFIOState *s, unsigned irq_index,
+                               EventNotifier *notifier, Error **errp)
+{
+    int r;
+    int fd = event_notifier_get_fd(notifier);
+    size_t irq_set_size;
+    struct vfio_irq_set *irq_set;
+
+    trace_qemu_vfio_pci_msix_set_irq(irq_index, fd);
+    irq_set_size = sizeof(*irq_set) + sizeof(int32_t);
+    irq_set = g_malloc0(irq_set_size);
+    /* Get to a known IRQ state */
+    *irq_set = (struct vfio_irq_set) {
+        .argsz = irq_set_size,
+        .flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
+        .index = VFIO_PCI_MSIX_IRQ_INDEX,
+        .start = irq_index,
+        .count = 1,
+    };
+    ((int32_t *)&irq_set->data)[0] = fd;
+    r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (r) {
+        error_setg_errno(errp, errno, "Failed to setup device interrupt #%u",
+                         irq_index);
+        return -errno;
+    }
+    return 0;
+}
+
 /**
  * Initialize device MSIX IRQs and register event notifiers.
  * @irq_count: pointer to number of MSIX IRQs to initialize
diff --git a/util/trace-events b/util/trace-events
index ec93578b125..3a56c542a94 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -88,6 +88,7 @@ qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *io
 qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_msix_info_irqs(uint32_t count, unsigned asked) "msix irqs %"PRIu32" (asked: %u)"
+qemu_vfio_pci_msix_set_irq(unsigned irq_index, int fd) "msix irq %u notifier_fd %d"
 qemu_vfio_iommu_iova_pgsizes(uint64_t iova_pgsizes) "iommu page size bitmask: 0x%08"PRIx64
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 18/19] block/nvme: Switch to using the MSIX API
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (16 preceding siblings ...)
  2020-10-26 10:55 ` [PATCH v2 17/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq() Philippe Mathieu-Daudé
@ 2020-10-26 10:55 ` Philippe Mathieu-Daudé
  2020-10-26 20:32   ` Auger Eric
  2020-10-26 10:55 ` [PATCH v2 19/19] util/vfio-helpers: Remove now unused qemu_vfio_pci_init_irq() Philippe Mathieu-Daudé
  2020-10-27  9:55 ` [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Stefan Hajnoczi
  19 siblings, 1 reply; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

In preparation of using multiple IRQs, switch to using the recently
introduced MSIX API. Instead of allocating and assigning IRQ in
a single step, we now have to use two distinct calls.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 block/nvme.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 46b09b3a3a7..191678540b6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -693,6 +693,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     size_t device_page_size_min;
     size_t device_page_size_max;
     size_t iommu_page_size_min = 4096;
+    unsigned irq_count = MSIX_IRQ_COUNT;
 
     qemu_co_mutex_init(&s->dma_map_lock);
     qemu_co_queue_init(&s->dma_flush_queue);
@@ -809,8 +810,17 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
         }
     }
 
-    ret = qemu_vfio_pci_init_irq(s->vfio, s->irq_notifier,
-                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
+    ret = qemu_vfio_pci_msix_init_irqs(s->vfio, &irq_count, errp);
+    if (ret) {
+        if (ret == -EOVERFLOW) {
+            error_append_hint(errp, "%u IRQs requested but only %u available\n",
+                              MSIX_IRQ_COUNT, irq_count);
+        }
+        goto out;
+    }
+
+    ret = qemu_vfio_pci_msix_set_irq(s->vfio, MSIX_SHARED_IRQ_IDX,
+                                     s->irq_notifier, errp);
     if (ret) {
         goto out;
     }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 19/19] util/vfio-helpers: Remove now unused qemu_vfio_pci_init_irq()
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (17 preceding siblings ...)
  2020-10-26 10:55 ` [PATCH v2 18/19] block/nvme: Switch to using the MSIX API Philippe Mathieu-Daudé
@ 2020-10-26 10:55 ` Philippe Mathieu-Daudé
  2020-10-27  9:55 ` [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Stefan Hajnoczi
  19 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-26 10:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Eric Auger,
	Alex Williamson, Stefan Hajnoczi, Philippe Mathieu-Daudé

Our only user, the NVMe block driver, switched to the MSIX API.
As this function is now unused, remove it.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/qemu/vfio-helpers.h |  2 --
 util/vfio-helpers.c         | 43 -------------------------------------
 2 files changed, 45 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 4c06694e03a..f42371d25d6 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -29,8 +29,6 @@ void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
                             Error **errp);
 void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
                              uint64_t offset, uint64_t size);
-int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
-                           int irq_type, Error **errp);
 int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
                                  unsigned *irq_count, Error **errp);
 int qemu_vfio_pci_msix_set_irq(QEMUVFIOState *s, unsigned irq_index,
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 18142e6be86..83d6eef13fb 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -189,49 +189,6 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
     }
 }
 
-/**
- * Initialize device IRQ with @irq_type and register an event notifier.
- */
-int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
-                           int irq_type, Error **errp)
-{
-    int r;
-    struct vfio_irq_set *irq_set;
-    size_t irq_set_size;
-    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
-
-    irq_info.index = irq_type;
-    if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
-        error_setg_errno(errp, errno, "Failed to get device interrupt info");
-        return -errno;
-    }
-    if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
-        error_setg(errp, "Device interrupt doesn't support eventfd");
-        return -EINVAL;
-    }
-
-    irq_set_size = sizeof(*irq_set) + sizeof(int);
-    irq_set = g_malloc0(irq_set_size);
-
-    /* Get to a known IRQ state */
-    *irq_set = (struct vfio_irq_set) {
-        .argsz = irq_set_size,
-        .flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
-        .index = irq_info.index,
-        .start = 0,
-        .count = 1,
-    };
-
-    *(int *)&irq_set->data = event_notifier_get_fd(e);
-    r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
-    g_free(irq_set);
-    if (r) {
-        error_setg_errno(errp, errno, "Failed to setup device interrupt");
-        return -errno;
-    }
-    return 0;
-}
-
 /**
  * Initialize a MSIX IRQ and register its event notifier.
  * @irq_index: MSIX IRQ index
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 15/19] util/vfio-helpers: Report error when IOMMU page size is not supported
  2020-10-26 10:55 ` [PATCH v2 15/19] util/vfio-helpers: Report error when IOMMU page size is not supported Philippe Mathieu-Daudé
@ 2020-10-26 16:12   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 16:12 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
> This driver uses the host page size to align its memory regions,
> but this size is not always compatible with the IOMMU. Add a
> check if the size matches, and bails out providing a hint what
> is the minimum page size the driver should use.
> 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  util/vfio-helpers.c | 28 ++++++++++++++++++++++++++--
>  util/trace-events   |  1 +
>  2 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 5e288dfa113..874d76c2a2a 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include <sys/ioctl.h>
>  #include <linux/vfio.h>
>  #include "qapi/error.h"
> @@ -288,7 +289,7 @@ static void collect_usable_iova_ranges(QEMUVFIOState *s, void *buf)
>  }
>  
>  static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
> -                              Error **errp)
> +                              size_t *requested_page_size, Error **errp)
>  {
>      int ret;
>      int i;
> @@ -299,6 +300,8 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
>      struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
>      char *group_file = NULL;
>  
> +    assert(requested_page_size && is_power_of_2(*requested_page_size));
> +
>      s->usable_iova_ranges = NULL;
>  
>      /* Create a new container */
> @@ -373,6 +376,27 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
>          ret = -errno;
>          goto fail;
>      }
> +    if (!(iommu_info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
> +        error_setg(errp, "Failed to get IOMMU page size info");
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +    trace_qemu_vfio_iommu_iova_pgsizes(iommu_info->iova_pgsizes);
> +    if (!(iommu_info->iova_pgsizes & *requested_page_size)) {
> +        g_autofree char *req_page_size_str = size_to_str(*requested_page_size);
> +        g_autofree char *min_page_size_str = NULL;
> +        uint64_t pgsizes_masked;
> +
> +        pgsizes_masked = MAKE_64BIT_MASK(0, ctz64(*requested_page_size));
> +        *requested_page_size = 1U << ctz64(iommu_info->iova_pgsizes
> +                                           & ~pgsizes_masked);
> +        min_page_size_str = size_to_str(*requested_page_size);
> +        error_setg(errp, "Unsupported IOMMU page size: %s", req_page_size_str);
> +        error_append_hint(errp, "Minimum IOMMU page size: %s\n",
> +                          min_page_size_str);
this blocks the 64kB tentative support. Before I was able to run the UC
with 64kB page host while the MPS used by the device is 4kB. Of course I
have no evidence yet my work is correct - besides it works in my case
for a sepcific device - but at least we should make sure we do not
introduce a new blocker here.

Also as discussed together
f68453237b  block/nvme: Map doorbells pages write-only
causes troubles with 64kB pages as there, you attempt to map 2
consecutive 4kB pages with different attributes. The 2d mmap fails.

Thanks

Eric
> +        ret = -EINVAL;
> +        goto fail;
> +    }
>  
>      /*
>       * if the kernel does not report usable IOVA regions, choose
> @@ -520,7 +544,7 @@ QEMUVFIOState *qemu_vfio_open_pci(const char *device, size_t *min_page_size,
>      int r;
>      QEMUVFIOState *s = g_new0(QEMUVFIOState, 1);
>  
> -    r = qemu_vfio_init_pci(s, device, errp);
> +    r = qemu_vfio_init_pci(s, device, min_page_size, errp);
>      if (r) {
>          g_free(s);
>          return NULL;
> diff --git a/util/trace-events b/util/trace-events
> index 7faad2a718c..3c36def9f30 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -87,6 +87,7 @@ qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p host
>  qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d &iova %p"
>  qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
> +qemu_vfio_iommu_iova_pgsizes(uint64_t iova_pgsizes) "iommu page size bitmask: 0x%08"PRIx64
>  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs %"PRId32
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables
  2020-10-26 10:54 ` [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables Philippe Mathieu-Daudé
@ 2020-10-26 17:38   ` Auger Eric
  2020-10-27  9:32   ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 17:38 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> Introduce device/iommu 'page_size_min' variables to make
> the code clearer.

I am unclear how much the device and the iommu page size must equal. For
instance, in [RFC 0/5] NVMe passthrough: Support 64kB page host, I have
a 64kB host page and an MPS set to 4kB.

Thanks

Eric
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index aa290996679..5abd7257cac 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -690,6 +690,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      uint64_t deadline, now;
>      Error *local_err = NULL;
>      volatile NvmeBar *regs = NULL;
> +    size_t device_page_size_min;
> +    size_t iommu_page_size_min = 4096;
>  
>      qemu_co_mutex_init(&s->dma_map_lock);
>      qemu_co_queue_init(&s->dma_flush_queue);
> @@ -724,7 +726,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>          goto out;
>      }
>  
> -    s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
> +    device_page_size_min = 1u << (12 + NVME_CAP_MPSMIN(cap));
> +    s->page_size = MAX(iommu_page_size_min, device_page_size_min);
>      s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
>      bs->bl.opt_mem_alignment = s->page_size;
>      bs->bl.request_alignment = s->page_size;
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 02/19] block/nvme: Set request_alignment at initialization
  2020-10-26 10:54 ` [PATCH v2 02/19] block/nvme: Set request_alignment at initialization Philippe Mathieu-Daudé
@ 2020-10-26 17:38   ` Auger Eric
  2020-10-27  9:24   ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 17:38 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> When introducing this driver in commit bdd6a90a9e5
> ("block: Add VFIO based NVMe driver") we correctly
> set the request_alignment in nvme_refresh_limits()
> but forgot to set it at initialization. Do it now.
> 
> Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric

> ---
>  block/nvme.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 029694975b9..aa290996679 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -727,6 +727,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
>      s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
>      bs->bl.opt_mem_alignment = s->page_size;
> +    bs->bl.request_alignment = s->page_size;
>      timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
>  
>      /* Reset device to get a clean state. */
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 04/19] block/nvme: Trace controller capabilities
  2020-10-26 10:54 ` [PATCH v2 04/19] block/nvme: Trace controller capabilities Philippe Mathieu-Daudé
@ 2020-10-26 17:57   ` Auger Eric
  2020-10-27  9:41   ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 17:57 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> Controllers have different capabilities and report them in the
> CAP register. We are particularly interested by the page size
> limits.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c       | 10 ++++++++++
>  block/trace-events |  1 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 5abd7257cac..3b6d3972ec2 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -720,6 +720,16 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>       * Initialization". */
>  
>      cap = le64_to_cpu(regs->cap);
> +    trace_nvme_controller_capability("Maximum Queue Entries Supported",
> +                                     NVME_CAP_MQES(cap));
> +    trace_nvme_controller_capability("Contiguous Queues Required",
> +                                     NVME_CAP_CQR(cap));
> +    trace_nvme_controller_capability("Subsystem  Reset Supported",
nit extra space
> +                                     NVME_CAP_NSSRS(cap));
> +    trace_nvme_controller_capability("Memory Page Size Minimum",
> +                                     NVME_CAP_MPSMIN(cap));
> +    trace_nvme_controller_capability("Memory Page Size Maximum",
> +                                     NVME_CAP_MPSMAX(cap));
Don't you want to trace the true MPSMIN/MAX (I mean with the shift)?

I would personally group those traces into a single one and avoid this
generic trace format but well ...

Thanks

Eric
>      if (!NVME_CAP_CSS(cap)) {
>          error_setg(errp, "Device doesn't support NVMe command set");
>          ret = -EINVAL;
> diff --git a/block/trace-events b/block/trace-events
> index 0e351c3fa3d..3f141dc6801 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -134,6 +134,7 @@ qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t
>  qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
>  
>  # nvme.c
> +nvme_controller_capability(const char *desc, uint64_t value) "%s: %"PRIu64
>  nvme_kick(void *s, int queue) "s %p queue %d"
>  nvme_dma_flush_queue_wait(void *s) "s %p"
>  nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 01/19] block/nvme: Correct minimum device page size
  2020-10-26 10:54 ` [PATCH v2 01/19] block/nvme: Correct minimum device page size Philippe Mathieu-Daudé
@ 2020-10-26 17:57   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 17:57 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> While trying to simplify the code using a macro, we forgot
> the 12-bit shift... Correct that.
> 
> Fixes: fad1eb68862 ("block/nvme: Use register definitions from 'block/nvme.h'")
> Reported-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> ---

>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index b48f6f25881..029694975b9 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -724,7 +724,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>          goto out;
>      }
>  
> -    s->page_size = MAX(4096, 1 << NVME_CAP_MPSMIN(cap));
> +    s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
nit the MAX(4096,) could have been removed

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
>      s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
>      bs->bl.opt_mem_alignment = s->page_size;
>      timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 06/19] util/vfio-helpers: Trace PCI I/O config accesses
  2020-10-26 10:54 ` [PATCH v2 06/19] util/vfio-helpers: Trace PCI I/O config accesses Philippe Mathieu-Daudé
@ 2020-10-26 18:02   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 18:02 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> We sometime get kernel panic with some devices on Aarch64
> hosts. Alex Williamson suggests it might be broken PCIe
> root complex. Add trace event to record the latest I/O
> access before crashing. In case, assert our accesses are
> aligned.
> 
> Reviewed-by: Fam Zheng <fam@euphon.net>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  util/vfio-helpers.c | 8 ++++++++
>  util/trace-events   | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 14a549510fe..1d4efafcaa4 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -227,6 +227,10 @@ static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
>  {
>      int ret;
>  
> +    trace_qemu_vfio_pci_read_config(buf, ofs, size,
> +                                    s->config_region_info.offset,
> +                                    s->config_region_info.size);
> +    assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
>      do {
>          ret = pread(s->device, buf, size, s->config_region_info.offset + ofs);
>      } while (ret == -1 && errno == EINTR);
> @@ -237,6 +241,10 @@ static int qemu_vfio_pci_write_config(QEMUVFIOState *s, void *buf, int size, int
>  {
>      int ret;
>  
> +    trace_qemu_vfio_pci_write_config(buf, ofs, size,
> +                                     s->config_region_info.offset,
> +                                     s->config_region_info.size);
> +    assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
>      do {
>          ret = pwrite(s->device, buf, size, s->config_region_info.offset + ofs);
>      } while (ret == -1 && errno == EINTR);
> diff --git a/util/trace-events b/util/trace-events
> index 24c31803b01..c048f85f828 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -85,3 +85,5 @@ qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova
>  qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size 0x%zx iova 0x%"PRIx64
>  qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d iova %p"
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
> +qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
> +qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
I would personally use 0x%PRIx64 for the size too as generally done in
hw/vfio/trace-events .

Thanks

Eric
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 07/19] util/vfio-helpers: Trace PCI BAR region info
  2020-10-26 10:54 ` [PATCH v2 07/19] util/vfio-helpers: Trace PCI BAR region info Philippe Mathieu-Daudé
@ 2020-10-26 18:06   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 18:06 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> For debug purpose, trace BAR regions info.
> 
> Reviewed-by: Fam Zheng <fam@euphon.net>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  util/vfio-helpers.c | 8 ++++++++
>  util/trace-events   | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 1d4efafcaa4..cd6287c3a98 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -136,6 +136,7 @@ static inline void assert_bar_index_valid(QEMUVFIOState *s, int index)
>  
>  static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
>  {
> +    g_autofree char *barname = NULL;
>      assert_bar_index_valid(s, index);
>      s->bar_region_info[index] = (struct vfio_region_info) {
>          .index = VFIO_PCI_BAR0_REGION_INDEX + index,
> @@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
>          error_setg_errno(errp, errno, "Failed to get BAR region info");
>          return -errno;
>      }
> +    barname = g_strdup_printf("bar[%d]", index);
> +    trace_qemu_vfio_region_info(barname, s->bar_region_info[index].offset,
> +                                s->bar_region_info[index].size,
> +                                s->bar_region_info[index].cap_offset);
>  
>      return 0;
>  }
> @@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
>          ret = -errno;
>          goto fail;
>      }
> +    trace_qemu_vfio_region_info("config", s->config_region_info.offset,
> +                                s->config_region_info.size,
> +                                s->config_region_info.cap_offset);
>  
>      for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {
>          ret = qemu_vfio_pci_init_bar(s, i, errp);
> diff --git a/util/trace-events b/util/trace-events
> index c048f85f828..4d40c74a21f 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *io
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
>  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
> +qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs %"PRId32
Use hexa too for the offsets and size. This is generally more readable.

Thanks

Eric
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
  2020-10-26 10:54 ` [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error Philippe Mathieu-Daudé
@ 2020-10-26 19:58   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 19:58 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> Currently qemu_vfio_dma_map() displays errors on stderr.
> When using management interface, this information is simply
> lost. Pass qemu_vfio_dma_map() an Error* argument so it can
Error** or simply error handle
> propagate the error to callers.
> 
> Reviewed-by: Fam Zheng <fam@euphon.net>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  include/qemu/vfio-helpers.h |  2 +-
>  block/nvme.c                | 14 +++++++-------
>  util/vfio-helpers.c         | 12 +++++++-----
>  3 files changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
> index 4491c8e1a6e..bde9495b254 100644
> --- a/include/qemu/vfio-helpers.h
> +++ b/include/qemu/vfio-helpers.h
> @@ -18,7 +18,7 @@ typedef struct QEMUVFIOState QEMUVFIOState;
>  QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
>  void qemu_vfio_close(QEMUVFIOState *s);
>  int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
> -                      bool temporary, uint64_t *iova_list);
> +                      bool temporary, uint64_t *iova_list, Error **errp);
>  int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
>  void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
>  void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
> diff --git a/block/nvme.c b/block/nvme.c
> index 3b6d3972ec2..6f1ebdf031f 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -167,9 +167,9 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
>          return;
>      }
>      memset(q->queue, 0, bytes);
> -    r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
> +    r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova, errp);
>      if (r) {
> -        error_setg(errp, "Cannot map queue");
> +        error_prepend(errp, "Cannot map queue: ");
>      }
>  }
>  
> @@ -223,7 +223,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
>      q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
>      r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>                            s->page_size * NVME_NUM_REQS,
> -                          false, &prp_list_iova);
> +                          false, &prp_list_iova, errp);
>      if (r) {
you may add an associated error_prepend(errp, "") here too to be consistent.
>          goto fail;
>      }
> @@ -514,9 +514,9 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>          error_setg(errp, "Cannot allocate buffer for identify response");
>          goto out;
>      }
> -    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
> +    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova, errp);
>      if (r) {
> -        error_setg(errp, "Cannot map buffer for DMA");
> +        error_prepend(errp, "Cannot map buffer for DMA: ");
>          goto out;
>      }
>  
> @@ -1003,7 +1003,7 @@ try_map:
>          r = qemu_vfio_dma_map(s->vfio,
>                                qiov->iov[i].iov_base,
>                                qiov->iov[i].iov_len,
> -                              true, &iova);
> +                              true, &iova, NULL);
>          if (r == -ENOMEM && retry) {
>              retry = false;
>              trace_nvme_dma_flush_queue_wait(s);
> @@ -1450,7 +1450,7 @@ static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size)
>      int ret;
>      BDRVNVMeState *s = bs->opaque;
>  
> -    ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL);
> +    ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL, NULL);
>      if (ret) {
>          /* FIXME: we may run out of IOVA addresses after repeated
>           * bdrv_register_buf/bdrv_unregister_buf, because nvme_vfio_dma_unmap
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 73f7bfa7540..c03fe0b7156 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -462,7 +462,7 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
>  {
>      QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
>      trace_qemu_vfio_ram_block_added(s, host, size);
> -    qemu_vfio_dma_map(s, host, size, false, NULL);
> +    qemu_vfio_dma_map(s, host, size, false, NULL, NULL);
>  }
>  
>  static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
> @@ -477,6 +477,7 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
>  
>  static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
>  {
> +    Error *local_err = NULL;
>      void *host_addr = qemu_ram_get_host_addr(rb);
>      ram_addr_t length = qemu_ram_get_used_length(rb);
>      int ret;
> @@ -485,10 +486,11 @@ static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
>      if (!host_addr) {
>          return 0;
>      }
> -    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
> +    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL, &local_err);
>      if (ret) {
> -        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
> -                host_addr, (uint64_t)length);
> +        error_reportf_err(local_err,
> +                          "qemu_vfio_init_ramblock: failed %p %" PRId64 ":",
> +                          host_addr, (uint64_t)length);
>      }
>      return 0;
>  }
> @@ -724,7 +726,7 @@ qemu_vfio_find_temp_iova(QEMUVFIOState *s, size_t size, uint64_t *iova)
>   * mapping status within this area is not allowed).
>   */
>  int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
> -                      bool temporary, uint64_t *iova)
> +                      bool temporary, uint64_t *iova, Error **errp)
>  {
where do you set errp. You just add prepend messages above. If I am not
wrong errp must be non NULL to call error_prepend()

Thanks

Eric
>      int ret = 0;
>      int index;
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 12/19] util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
  2020-10-26 10:54 ` [PATCH v2 12/19] util/vfio-helpers: Let qemu_vfio_do_mapping() " Philippe Mathieu-Daudé
@ 2020-10-26 20:02   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 20:02 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> Pass qemu_vfio_do_mapping() an Error* argument so it can propagate
> any error to callers. Replace error_report() which only report
> to the monitor by the more generic error_setg_errno().
> 
> Reviewed-by: Fam Zheng <fam@euphon.net>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  util/vfio-helpers.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index c03fe0b7156..2c4598d7faa 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -609,7 +609,7 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
>  
>  /* Do the DMA mapping with VFIO. */
>  static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
> -                                uint64_t iova)
> +                                uint64_t iova, Error **errp)
>  {
>      struct vfio_iommu_type1_dma_map dma_map = {
>          .argsz = sizeof(dma_map),
> @@ -621,7 +621,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
>      trace_qemu_vfio_do_mapping(s, host, iova, size);
>  
>      if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
> -        error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
> +        error_setg_errno(errp, errno, "VFIO_MAP_DMA failed");
>          return -errno;
>      }
>      return 0;
> @@ -757,7 +757,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
There are other error cases in qemu_vfio_dma_map() where you must set
errp. Otherwise the caller may, in the future test the returned value
and incorrectly handle the errp.

Thanks

Eric
>                  goto out;
>              }
>              assert(qemu_vfio_verify_mappings(s));
> -            ret = qemu_vfio_do_mapping(s, host, size, iova0);
> +            ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
>              if (ret) {
>                  qemu_vfio_undo_mapping(s, mapping, NULL);
>                  goto out;
> @@ -768,7 +768,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>                  ret = -ENOMEM;
>                  goto out;
>              }
> -            ret = qemu_vfio_do_mapping(s, host, size, iova0);
> +            ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
>              if (ret) {
>                  goto out;
>              }
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs()
  2020-10-26 10:55 ` [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs() Philippe Mathieu-Daudé
@ 2020-10-26 20:24   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 20:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
> qemu_vfio_pci_init_irq() allows us to initialize any type of IRQ,
> but only one. Introduce qemu_vfio_pci_msix_init_irqs() which is
> specific to MSIX IRQ type, and allow us to use multiple IRQs
> (thus passing multiple eventfd notifiers).
> All eventfd notifiers are initialized with the special '-1' value
> meaning "un-assigned".
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  include/qemu/vfio-helpers.h |  6 +++-
>  util/vfio-helpers.c         | 65 ++++++++++++++++++++++++++++++++++++-
>  util/trace-events           |  1 +
>  3 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
> index 4b97a904e93..492072cba2f 100644
> --- a/include/qemu/vfio-helpers.h
> +++ b/include/qemu/vfio-helpers.h
> @@ -1,11 +1,13 @@
>  /*
>   * QEMU VFIO helpers
>   *
> - * Copyright 2016 - 2018 Red Hat, Inc.
> + * Copyright 2016 - 2020 Red Hat, Inc.
>   *
>   * Authors:
>   *   Fam Zheng <famz@redhat.com>
> + *   Philippe Mathieu-Daudé <philmd@redhat.com>
>   *
> + * SPDX-License-Identifier: GPL-2.0-or-later
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> @@ -29,5 +31,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
>                               uint64_t offset, uint64_t size);
>  int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
>                             int irq_type, Error **errp);
> +int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
> +                                 unsigned *irq_count, Error **errp);
>  
>  #endif
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 874d76c2a2a..d88e2c7dc1f 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -1,11 +1,13 @@
>  /*
>   * VFIO utility
>   *
> - * Copyright 2016 - 2018 Red Hat, Inc.
> + * Copyright 2016 - 2020 Red Hat, Inc.
>   *
>   * Authors:
>   *   Fam Zheng <famz@redhat.com>
> + *   Philippe Mathieu-Daudé <philmd@redhat.com>
>   *
> + * SPDX-License-Identifier: GPL-2.0-or-later
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> @@ -230,6 +232,67 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
>      return 0;
>  }
>  
> +/**
> + * Initialize device MSIX IRQs and register event notifiers.
> + * @irq_count: pointer to number of MSIX IRQs to initialize
> + *
> + * If the number of IRQs requested exceeds the available on the device,
> + * store the number of available IRQs in @irq_count and return -EOVERFLOW.
> + */
> +int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
> +                                 unsigned *irq_count, Error **errp)
> +{
> +    int r;
> +    size_t irq_set_size;
> +    struct vfio_irq_set *irq_set;
> +    struct vfio_irq_info irq_info = {
> +        .argsz = sizeof(irq_info),
> +        .index = VFIO_PCI_MSIX_IRQ_INDEX
> +    };
> +
> +    if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
> +        error_setg_errno(errp, errno, "Failed to get device interrupt info");
> +        return -errno;
> +    }
> +    trace_qemu_vfio_msix_info_irqs(irq_info.count, *irq_count);
> +    if (irq_info.count < *irq_count) {
> +        error_setg(errp, "Not enough device interrupts available");
> +        *irq_count = irq_info.count;
> +        return -EOVERFLOW;
> +    }
> +    if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
> +        error_setg(errp, "Device interrupt doesn't support eventfd");
> +        return -EINVAL;
> +    }
> +
> +    irq_set_size = sizeof(*irq_set) + *irq_count * sizeof(int32_t);
> +    irq_set = g_malloc0(irq_set_size);
> +
> +    /* Get to a known IRQ state */
> +    *irq_set = (struct vfio_irq_set) {
> +        .argsz = irq_set_size,
> +        .flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
> +        .index = VFIO_PCI_MSIX_IRQ_INDEX,
> +        .start = 0,
> +        .count = *irq_count,
> +    };
> +
> +    for (unsigned i = 0; i < *irq_count; i++) {
> +        ((int32_t *)&irq_set->data)[i] = -1; /* un-assigned: skip */
> +    }
> +    r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
> +    g_free(irq_set);
> +    if (r < 0) {
> +        error_setg_errno(errp, errno, "Failed to setup device interrupts");
> +        return -errno;
> +    } else if (r > 0) {
Can it happen?

Thanks

Eric
> +        error_setg(errp, "Not enough device interrupts available");
> +        *irq_count = r;
> +        return -EOVERFLOW;
> +    }
> +    return 0;
> +}
> +
>  static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
>                                       int size, int ofs)
>  {
> diff --git a/util/trace-events b/util/trace-events
> index 3c36def9f30..ec93578b125 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -87,6 +87,7 @@ qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p host
>  qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size 0x%zx temporary %d &iova %p"
>  qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
> +qemu_vfio_msix_info_irqs(uint32_t count, unsigned asked) "msix irqs %"PRIu32" (asked: %u)"
>  qemu_vfio_iommu_iova_pgsizes(uint64_t iova_pgsizes) "iommu page size bitmask: 0x%08"PRIx64
>  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 17/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq()
  2020-10-26 10:55 ` [PATCH v2 17/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq() Philippe Mathieu-Daudé
@ 2020-10-26 20:28   ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2020-10-26 20:28 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
> Introduce qemu_vfio_pci_msix_set_irq() to set the event
> notifier of a specific MSIX IRQ. All other registered IRQs
> are left unmodified.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric

> ---
>  include/qemu/vfio-helpers.h |  2 ++
>  util/vfio-helpers.c         | 35 +++++++++++++++++++++++++++++++++++
>  util/trace-events           |  1 +
>  3 files changed, 38 insertions(+)
> 
> diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
> index 492072cba2f..4c06694e03a 100644
> --- a/include/qemu/vfio-helpers.h
> +++ b/include/qemu/vfio-helpers.h
> @@ -33,5 +33,7 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
>                             int irq_type, Error **errp);
>  int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
>                                   unsigned *irq_count, Error **errp);
> +int qemu_vfio_pci_msix_set_irq(QEMUVFIOState *s, unsigned irq_index,
> +                               EventNotifier *notifier, Error **errp);
>  
>  #endif
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index d88e2c7dc1f..18142e6be86 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -232,6 +232,41 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
>      return 0;
>  }
>  
> +/**
> + * Initialize a MSIX IRQ and register its event notifier.
> + * @irq_index: MSIX IRQ index
> + * @notifier: notifier for the MSIX IRQ
> + */
> +int qemu_vfio_pci_msix_set_irq(QEMUVFIOState *s, unsigned irq_index,
> +                               EventNotifier *notifier, Error **errp)
> +{
> +    int r;
> +    int fd = event_notifier_get_fd(notifier);
> +    size_t irq_set_size;
> +    struct vfio_irq_set *irq_set;
> +
> +    trace_qemu_vfio_pci_msix_set_irq(irq_index, fd);
> +    irq_set_size = sizeof(*irq_set) + sizeof(int32_t);
> +    irq_set = g_malloc0(irq_set_size);
> +    /* Get to a known IRQ state */
> +    *irq_set = (struct vfio_irq_set) {
> +        .argsz = irq_set_size,
> +        .flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
> +        .index = VFIO_PCI_MSIX_IRQ_INDEX,
> +        .start = irq_index,
> +        .count = 1,
> +    };
> +    ((int32_t *)&irq_set->data)[0] = fd;
> +    r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
> +    g_free(irq_set);
> +    if (r) {
> +        error_setg_errno(errp, errno, "Failed to setup device interrupt #%u",
> +                         irq_index);
> +        return -errno;
> +    }
> +    return 0;
> +}
> +
>  /**
>   * Initialize device MSIX IRQs and register event notifiers.
>   * @irq_count: pointer to number of MSIX IRQs to initialize
> diff --git a/util/trace-events b/util/trace-events
> index ec93578b125..3a56c542a94 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -88,6 +88,7 @@ qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *io
>  qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p host %p <-> iova 0x%"PRIx64" size 0x%zx"
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
>  qemu_vfio_msix_info_irqs(uint32_t count, unsigned asked) "msix irqs %"PRIu32" (asked: %u)"
> +qemu_vfio_pci_msix_set_irq(unsigned irq_index, int fd) "msix irq %u notifier_fd %d"
>  qemu_vfio_iommu_iova_pgsizes(uint64_t iova_pgsizes) "iommu page size bitmask: 0x%08"PRIx64
>  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" size %"PRId64")"
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 18/19] block/nvme: Switch to using the MSIX API
  2020-10-26 10:55 ` [PATCH v2 18/19] block/nvme: Switch to using the MSIX API Philippe Mathieu-Daudé
@ 2020-10-26 20:32   ` Auger Eric
  2020-10-27  9:55     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 38+ messages in thread
From: Auger Eric @ 2020-10-26 20:32 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

Hi Philippe,

On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
> In preparation of using multiple IRQs, switch to using the recently
> introduced MSIX API. Instead of allocating and assigning IRQ in
> a single step, we now have to use two distinct calls.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 46b09b3a3a7..191678540b6 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -693,6 +693,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      size_t device_page_size_min;
>      size_t device_page_size_max;
>      size_t iommu_page_size_min = 4096;
> +    unsigned irq_count = MSIX_IRQ_COUNT;
>  
>      qemu_co_mutex_init(&s->dma_map_lock);
>      qemu_co_queue_init(&s->dma_flush_queue);
> @@ -809,8 +810,17 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>          }
>      }
>  
> -    ret = qemu_vfio_pci_init_irq(s->vfio, s->irq_notifier,
> -                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
> +    ret = qemu_vfio_pci_msix_init_irqs(s->vfio, &irq_count, errp);
> +    if (ret) {
> +        if (ret == -EOVERFLOW) {
> +            error_append_hint(errp, "%u IRQs requested but only %u available\n",
> +                              MSIX_IRQ_COUNT, irq_count);
This message can be directly printed in qemu_vfio_pci_msix_init_irqs()
> +        }
> +        goto out;
> +    }
> +
> +    ret = qemu_vfio_pci_msix_set_irq(s->vfio, MSIX_SHARED_IRQ_IDX,
> +                                     s->irq_notifier, errp);
>      if (ret) {
>          goto out;
>      }
> 
Thanks

Eric



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 02/19] block/nvme: Set request_alignment at initialization
  2020-10-26 10:54 ` [PATCH v2 02/19] block/nvme: Set request_alignment at initialization Philippe Mathieu-Daudé
  2020-10-26 17:38   ` Auger Eric
@ 2020-10-27  9:24   ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-10-27  9:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Eric Auger, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

On Mon, Oct 26, 2020 at 11:54:47AM +0100, Philippe Mathieu-Daudé wrote:
> When introducing this driver in commit bdd6a90a9e5
> ("block: Add VFIO based NVMe driver") we correctly
> set the request_alignment in nvme_refresh_limits()
> but forgot to set it at initialization. Do it now.

This patch is fine but the commit description does not explain why this
change is necessary. Is there a bug and how can it be triggered? Or is
this code change just for consistency?

> Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables
  2020-10-26 10:54 ` [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables Philippe Mathieu-Daudé
  2020-10-26 17:38   ` Auger Eric
@ 2020-10-27  9:32   ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-10-27  9:32 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Eric Auger, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

On Mon, Oct 26, 2020 at 11:54:48AM +0100, Philippe Mathieu-Daudé wrote:
> Introduce device/iommu 'page_size_min' variables to make
> the code clearer.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index aa290996679..5abd7257cac 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -690,6 +690,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      uint64_t deadline, now;
>      Error *local_err = NULL;
>      volatile NvmeBar *regs = NULL;
> +    size_t device_page_size_min;
> +    size_t iommu_page_size_min = 4096;
>  
>      qemu_co_mutex_init(&s->dma_map_lock);
>      qemu_co_queue_init(&s->dma_flush_queue);
> @@ -724,7 +726,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>          goto out;
>      }
>  
> -    s->page_size = MAX(4096, 1u << (12 + NVME_CAP_MPSMIN(cap)));
> +    device_page_size_min = 1u << (12 + NVME_CAP_MPSMIN(cap));
> +    s->page_size = MAX(iommu_page_size_min, device_page_size_min);

It's not clear to me that the 4096 value is related to the IOMMU page
size here. The MAX(4096) expression seems like a sanity-check to me. An
MPS value of 0 is a 4KB page size, so it's never possible to express a
smaller page size. I guess MAX() was used in case a device incorrectly
reports MPSMIN.

I think introducing the concept of IOMMU page size is premature and
maybe not the intention of the existing code, but the concept will be
needed soon, so this patch is okay:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 04/19] block/nvme: Trace controller capabilities
  2020-10-26 10:54 ` [PATCH v2 04/19] block/nvme: Trace controller capabilities Philippe Mathieu-Daudé
  2020-10-26 17:57   ` Auger Eric
@ 2020-10-27  9:41   ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-10-27  9:41 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Eric Auger, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

On Mon, Oct 26, 2020 at 11:54:49AM +0100, Philippe Mathieu-Daudé wrote:
> Controllers have different capabilities and report them in the
> CAP register. We are particularly interested by the page size
> limits.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  block/nvme.c       | 10 ++++++++++
>  block/trace-events |  1 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 5abd7257cac..3b6d3972ec2 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -720,6 +720,16 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>       * Initialization". */
>  
>      cap = le64_to_cpu(regs->cap);
> +    trace_nvme_controller_capability("Maximum Queue Entries Supported",
> +                                     NVME_CAP_MQES(cap));
> +    trace_nvme_controller_capability("Contiguous Queues Required",
> +                                     NVME_CAP_CQR(cap));
> +    trace_nvme_controller_capability("Subsystem  Reset Supported",
> +                                     NVME_CAP_NSSRS(cap));
> +    trace_nvme_controller_capability("Memory Page Size Minimum",
> +                                     NVME_CAP_MPSMIN(cap));
> +    trace_nvme_controller_capability("Memory Page Size Maximum",
> +                                     NVME_CAP_MPSMAX(cap));

This works well for printf-style tracing but it can be tedious to
strcmp() or parse strings from SystemTap and other tracing scripts.

Another way of expressing these trace events is:

  trace_nvme_controller_capabilities(NVME_CAP_MQES(cap),
                                     NVME_CAP_CQR(cap),
				     ...);

Or:

  trace_nvme_controller_capability_mqes(NVME_CAP_MQES(cap));
  trace_nvme_controller_capability_cqr(NVME_CAP_CQR(cap));

One view is that tracing is like printf (the style used in this patch).
Another view is that tracing produces records that scripts can process
(the style I showed). Trace events defined in one style may be hard to
use in the other style (i.e. less human-readable vs harder to process in
a script).

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 14/19] util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci()
  2020-10-26 10:54 ` [PATCH v2 14/19] util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci() Philippe Mathieu-Daudé
@ 2020-10-27  9:50   ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-10-27  9:50 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Eric Auger, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]

On Mon, Oct 26, 2020 at 11:54:59AM +0100, Philippe Mathieu-Daudé wrote:
> @@ -737,6 +738,17 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>      }
>  
>      device_page_size_min = 1u << (12 + NVME_CAP_MPSMIN(cap));
> +    device_page_size_max = 1u << (12 + NVME_CAP_MPSMAX(cap));
> +    if (iommu_page_size_min > device_page_size_max) {
> +        g_autofree char *iommu_page_size_s = size_to_str(iommu_page_size_min);
> +        g_autofree char *device_page_size_s = size_to_str(device_page_size_max);
> +
> +        error_setg(errp, "IOMMU minimum page size (%s)"
> +                         " too big for device (max %s)",
> +                   iommu_page_size_s, device_page_size_s);
> +        ret = -EINVAL;
> +        goto out;
> +    }

I thought you and Eric worked on a solution to support smaller device
pages on bigger IOMMU pages? For example, 4KB device page size on 64KB
IOMMU page size.

Won't this check be removed again very soon? Why add it at all?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 18/19] block/nvme: Switch to using the MSIX API
  2020-10-26 20:32   ` Auger Eric
@ 2020-10-27  9:55     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-27  9:55 UTC (permalink / raw)
  To: Auger Eric, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Alex Williamson,
	Stefan Hajnoczi

On 10/26/20 9:32 PM, Auger Eric wrote:
> Hi Philippe,
> 
> On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
>> In preparation of using multiple IRQs, switch to using the recently
>> introduced MSIX API. Instead of allocating and assigning IRQ in
>> a single step, we now have to use two distinct calls.
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> ---
>>  block/nvme.c | 14 ++++++++++++--
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index 46b09b3a3a7..191678540b6 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -693,6 +693,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>>      size_t device_page_size_min;
>>      size_t device_page_size_max;
>>      size_t iommu_page_size_min = 4096;
>> +    unsigned irq_count = MSIX_IRQ_COUNT;
>>  
>>      qemu_co_mutex_init(&s->dma_map_lock);
>>      qemu_co_queue_init(&s->dma_flush_queue);
>> @@ -809,8 +810,17 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>>          }
>>      }
>>  
>> -    ret = qemu_vfio_pci_init_irq(s->vfio, s->irq_notifier,
>> -                                 VFIO_PCI_MSIX_IRQ_INDEX, errp);
>> +    ret = qemu_vfio_pci_msix_init_irqs(s->vfio, &irq_count, errp);
>> +    if (ret) {
>> +        if (ret == -EOVERFLOW) {
>> +            error_append_hint(errp, "%u IRQs requested but only %u available\n",
>> +                              MSIX_IRQ_COUNT, irq_count);
> This message can be directly printed in qemu_vfio_pci_msix_init_irqs()

Good idea, thanks.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs
  2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
                   ` (18 preceding siblings ...)
  2020-10-26 10:55 ` [PATCH v2 19/19] util/vfio-helpers: Remove now unused qemu_vfio_pci_init_irq() Philippe Mathieu-Daudé
@ 2020-10-27  9:55 ` Stefan Hajnoczi
  19 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-10-27  9:55 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Eric Auger, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

On Mon, Oct 26, 2020 at 11:54:45AM +0100, Philippe Mathieu-Daudé wrote:
> This series allow using multiple MSIX IRQs
> We currently share a single IRQ between 2 NVMe queues
> (ADMIN and I/O). This series still uses 1 shared IRQ
> but prepare for using multiple ones.
> 
> Since v1:
> - Addressed Stefan comment in patch #14
>   "Pass minimum page size to qemu_vfio_open_pci"
>   (check the page size is in range with device)
> - To reduce (and simplify) changes in patch #14, added
>   new patch #2 "Introduce device/iommu 'page_size_min' variables"
> - Added "Trace controller capabilities" useful to test the
>   previous changes
> - "Set request_alignment at initialization" reported by Stefan
>   (and tested by Eric off-list).

The MSI-X patches look good.

I'm confused about the page size patches since they don't solve the 4KB
device page size on 64KB IOMMU page size issue that you and Eric were
working on. I would prefer to hold off on page size changes until that
work is complete.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-10-27  9:57 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-26 10:54 [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Philippe Mathieu-Daudé
2020-10-26 10:54 ` [PATCH v2 01/19] block/nvme: Correct minimum device page size Philippe Mathieu-Daudé
2020-10-26 17:57   ` Auger Eric
2020-10-26 10:54 ` [PATCH v2 02/19] block/nvme: Set request_alignment at initialization Philippe Mathieu-Daudé
2020-10-26 17:38   ` Auger Eric
2020-10-27  9:24   ` Stefan Hajnoczi
2020-10-26 10:54 ` [PATCH v2 03/19] block/nvme: Introduce device/iommu 'page_size_min' variables Philippe Mathieu-Daudé
2020-10-26 17:38   ` Auger Eric
2020-10-27  9:32   ` Stefan Hajnoczi
2020-10-26 10:54 ` [PATCH v2 04/19] block/nvme: Trace controller capabilities Philippe Mathieu-Daudé
2020-10-26 17:57   ` Auger Eric
2020-10-27  9:41   ` Stefan Hajnoczi
2020-10-26 10:54 ` [PATCH v2 05/19] util/vfio-helpers: Improve reporting unsupported IOMMU type Philippe Mathieu-Daudé
2020-10-26 10:54 ` [PATCH v2 06/19] util/vfio-helpers: Trace PCI I/O config accesses Philippe Mathieu-Daudé
2020-10-26 18:02   ` Auger Eric
2020-10-26 10:54 ` [PATCH v2 07/19] util/vfio-helpers: Trace PCI BAR region info Philippe Mathieu-Daudé
2020-10-26 18:06   ` Auger Eric
2020-10-26 10:54 ` [PATCH v2 08/19] util/vfio-helpers: Trace where BARs are mapped Philippe Mathieu-Daudé
2020-10-26 10:54 ` [PATCH v2 09/19] util/vfio-helpers: Improve DMA trace events Philippe Mathieu-Daudé
2020-10-26 10:54 ` [PATCH v2 10/19] util/vfio-helpers: Convert vfio_dump_mapping to " Philippe Mathieu-Daudé
2020-10-26 10:54 ` [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error Philippe Mathieu-Daudé
2020-10-26 19:58   ` Auger Eric
2020-10-26 10:54 ` [PATCH v2 12/19] util/vfio-helpers: Let qemu_vfio_do_mapping() " Philippe Mathieu-Daudé
2020-10-26 20:02   ` Auger Eric
2020-10-26 10:54 ` [PATCH v2 13/19] util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report() Philippe Mathieu-Daudé
2020-10-26 10:54 ` [PATCH v2 14/19] util/vfio-helpers: Pass minimum page size to qemu_vfio_open_pci() Philippe Mathieu-Daudé
2020-10-27  9:50   ` Stefan Hajnoczi
2020-10-26 10:55 ` [PATCH v2 15/19] util/vfio-helpers: Report error when IOMMU page size is not supported Philippe Mathieu-Daudé
2020-10-26 16:12   ` Auger Eric
2020-10-26 10:55 ` [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs() Philippe Mathieu-Daudé
2020-10-26 20:24   ` Auger Eric
2020-10-26 10:55 ` [PATCH v2 17/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_set_irq() Philippe Mathieu-Daudé
2020-10-26 20:28   ` Auger Eric
2020-10-26 10:55 ` [PATCH v2 18/19] block/nvme: Switch to using the MSIX API Philippe Mathieu-Daudé
2020-10-26 20:32   ` Auger Eric
2020-10-27  9:55     ` Philippe Mathieu-Daudé
2020-10-26 10:55 ` [PATCH v2 19/19] util/vfio-helpers: Remove now unused qemu_vfio_pci_init_irq() Philippe Mathieu-Daudé
2020-10-27  9:55 ` [PATCH v2 00/19] util/vfio-helpers: Allow using multiple MSIX IRQs Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.